Senior Site Reliability Engineer (SRE)
📍 Job Overview
- Job Title: Senior Site Reliability Engineer (SRE)
- Company: Tracksuit Limited
- Location: Auckland, Auckland, New Zealand
- Job Type: Full-time
- Category: DevOps, Infrastructure
- Date Posted: 2025-07-12
- Experience Level: 5-10 years
- Remote Status: Hybrid (Auckland, Sydney, London, New York)
🚀 Role Summary
- 📝 Enhancement Note: This role focuses on scaling and maintaining the infrastructure behind Tracksuit's brand platform, ensuring high reliability, security, and scalability. The ideal candidate will have a strong background in SRE or infrastructure roles and be comfortable working with AWS, Kubernetes, and Terraform.
💻 Primary Responsibilities
- 📝 Enhancement Note: The primary responsibilities listed below require a solid understanding of cloud infrastructure, automation, and incident response. The candidate should be comfortable working with various tools and programming languages to ensure the platform's stability and performance.
- Design, Build, and Maintain Resilient Infrastructure: Utilize AWS, Kubernetes, Terraform, and other tools to create secure, scalable, and highly available infrastructure. This includes designing and implementing systems that can withstand failures and scale to meet demand.
- Lead Reliability, Observability, and Monitoring Practices: Develop and maintain strategies for monitoring the platform's health and performance. This includes setting up alerts, defining service level objectives (SLOs), and conducting blameless postmortems after incidents.
- Automate Infrastructure and Deployments: Write scripts and use tools like Terraform and CDK to automate infrastructure provisioning and deployment processes. This helps reduce manual effort, improve speed, and increase confidence in deployments.
- Coach and Support Other Engineers: Share your knowledge and best practices with other engineers on the team. This includes mentoring, leading workshops, and contributing to the team's onboarding process.
- Balance Feature Velocity with Platform Stability: Work closely with the development team to ensure that new features and updates are delivered without compromising the platform's stability. This includes conducting risk assessments, performing code reviews, and participating in on-call rotations.
- Champion Operational Excellence: Embed a culture of reliability and operational excellence across the engineering team. This includes driving initiatives like chaos engineering, automated testing, and continuous improvement.
🎓 Skills & Qualifications
Education: A bachelor's degree in Computer Science, Engineering, or a related field. Relevant experience may be considered in lieu of a degree.
Experience: Proven experience (4+ years) in Site Reliability Engineering or a similar role, with a strong focus on cloud infrastructure, automation, and incident response.
Required Skills:
- Cloud Platform Proficiency: Expertise in AWS (or another major cloud provider) with experience in designing, deploying, and managing scalable infrastructure.
- Containerization and Orchestration: Proficiency in Kubernetes or a similar container orchestration platform.
- Infrastructure as Code (IaC): Experience with Terraform, CDK, or other IaC tools for automating infrastructure provisioning.
- Programming Languages: Proficiency in Python, Bash, or TypeScript for scripting, automation, and tool development.
- Monitoring and Alerting: Experience with monitoring tools like Datadog, Prometheus, or similar platforms for tracking system health and performance.
- Incident Response: A calm and methodical approach to troubleshooting and resolving issues during incidents. Experience with on-call rotations and blameless postmortems.
- Collaboration and Communication: Strong communication skills and the ability to work effectively with cross-functional teams, including developers, product managers, and other stakeholders.
Preferred Skills:
- Chaos Engineering: Experience with chaos engineering tools and practices for improving system resilience and identifying single points of failure.
- CI/CD Pipelines: Proficiency in setting up and maintaining CI/CD pipelines for automated testing and deployment.
- Serverless Architecture: Experience with serverless architecture and platforms like AWS Lambda, Azure Functions, or Google Cloud Functions.
- GitOps: Familiarity with GitOps workflows for managing infrastructure and application configuration.
📊 Web Portfolio & Project Requirements
Portfolio Essentials:
- 📝 Enhancement Note: As this role focuses on infrastructure and DevOps, the portfolio should highlight the candidate's technical skills and accomplishments in these areas. Include case studies or projects that demonstrate the candidate's ability to design, build, and maintain scalable, reliable infrastructure.
- Cloud Infrastructure Projects: Showcase projects that demonstrate your ability to design, deploy, and manage infrastructure on AWS or another major cloud platform.
- Automation and Scripting: Include examples of scripts or tools you've developed to automate infrastructure provisioning, deployment, or other repetitive tasks.
- Incident Response Case Studies: Describe incidents you've responded to and the steps you took to resolve the issue, improve the system, and prevent similar incidents in the future.
- Monitoring and Alerting: Highlight projects or case studies that demonstrate your ability to set up and maintain monitoring and alerting systems for tracking system health and performance.
Technical Documentation:
- 📝 Enhancement Note: The technical documentation should provide a clear and comprehensive overview of the candidate's approach to infrastructure design, automation, and incident response. It should also demonstrate the candidate's ability to communicate complex technical concepts effectively.
- Architecture Diagrams: Include diagrams that illustrate the overall architecture of the systems you've worked on, as well as the specific components and their interactions.
- Automation Scripts and Tools: Provide examples of scripts or tools you've developed to automate infrastructure provisioning, deployment, or other repetitive tasks. Include comments and documentation that explain the purpose and functionality of each script or tool.
- Incident Response Documentation: Document the steps you took to respond to incidents, including the root cause analysis, resolution steps, and any preventative measures you implemented to avoid similar incidents in the future.
- Monitoring and Alerting Documentation: Provide detailed documentation on the monitoring and alerting systems you've implemented, including the metrics you track, the alerts you've configured, and the tools you use to visualize system health and performance.
💵 Compensation & Benefits
Salary Range: $130,000 - $190,000 USD per year (based on the provided range and regional cost of living adjustments)
Benefits:
- Competitive Market Rate Remuneration: Reviewed twice annually. The company's radically transparent compensation policy ensures that salaries are fair across the entire team.
- Annual Company-wide Performance Bonus: Paid to celebrate hitting targets together.
- Employee Share Option Program (ESOP): Ensures that everyone on the team has a share in Tracksuit's success.
- Progressive Health and Wellness Benefits: Includes an annual wellness bonus, access to a premium EAP platform, and 6 weeks of paid annual leave.
- Generous Parental Benefits: Includes 12 weeks' paid parental leave for either caregiver, additional sick leave for IVF, and a gradual return to work.
- Personal L&D Budget: $1,000 for each employee, plus additional growth opportunities including mentorships, speaking engagements, and travel.
- Flexible Working: A balanced approach to WFH/in-office work, with beautiful offices in Auckland, Sydney, London, and New York.
🎯 Team & Company Context
🏢 Company Culture
Industry: Market Research and Brand Tracking
Company Size: Medium (serves 450+ brands across NZ, AU, USA, Canada, and the UK)
Founded: 2019
Team Structure:
- 📝 Enhancement Note: The engineering team at Tracksuit is structured to support the company's growth and ensure the platform's stability and performance. The team includes frontend and backend engineers, as well as DevOps and SRE specialists.
- Web Technology Team: The engineering team is responsible for designing, developing, and maintaining the Tracksuit platform. This includes frontend and backend development, as well as infrastructure and DevOps tasks.
- Cross-functional Collaboration: The engineering team works closely with other departments, including product, design, and customer success, to ensure that the platform meets the needs of Tracksuit's customers.
Development Methodology:
- Agile/Scrum: The engineering team follows Agile/Scrum methodologies, with regular sprint planning, stand-ups, and retrospectives.
- Code Review and Testing: The team emphasizes code review, automated testing, and quality assurance practices to ensure the platform's stability and performance.
- Deployment Strategies: The team uses CI/CD pipelines and automated deployment strategies to ensure that new features and updates are delivered quickly and safely.
Company Website: www.gotracksuit.com
📝 Enhancement Note: Tracksuit's company culture is characterized by transparency, trust, learning, and constant development. The company values collaboration, experimentation, and real impact over checking every box. This culture is reflected in the team's approach to engineering, with a strong focus on operational excellence, automation, and continuous improvement.
📈 Career & Growth Analysis
Web Technology Career Level: Senior Site Reliability Engineer (SRE)
Reporting Structure: The Senior SRE reports directly to the Head of Engineering and works closely with other engineers, product managers, and stakeholders.
Technical Impact: The Senior SRE has a significant impact on the platform's reliability, performance, and scalability. They work closely with other engineers to ensure that new features and updates are delivered without compromising the platform's stability.
Growth Opportunities:
- Technical Leadership: As the company grows, there will be opportunities for the Senior SRE to take on more technical leadership responsibilities, such as mentoring other engineers, driving technical initiatives, and contributing to architecture decisions.
- Team Expansion: With the company's continued growth, there will be opportunities for the Senior SRE to expand their team and take on more management responsibilities, such as hiring, onboarding, and coaching other engineers.
- Emerging Technologies: As the company explores new technologies and platforms, there will be opportunities for the Senior SRE to expand their skill set and take on more specialized roles.
📝 Enhancement Note: Tracksuit's growth and expansion present numerous opportunities for the Senior SRE to grow both technically and professionally. The company's commitment to learning, development, and continuous improvement ensures that employees have the support they need to reach their full potential.
🌐 Work Environment
Office Type: Hybrid (Auckland, Sydney, London, New York)
Office Location(s): Auckland, Auckland, New Zealand; Sydney, New South Wales, Australia; London, England, United Kingdom; New York, New York, United States
Workspace Context:
- 📝 Enhancement Note: Tracksuit's hybrid work environment offers employees the flexibility to work from home or in one of the company's beautiful offices. The company adopts a balanced approach to WFH/in-office work, with a focus on collaboration, productivity, and work-life balance.
- Collaborative Workspace: The company's offices are designed to be collaborative and supportive, with plenty of space for team meetings, workshops, and social events.
- Development Tools and Resources: The engineering team has access to the latest development tools, multiple monitors, and testing devices to ensure that they can work effectively and efficiently.
- Cross-functional Interaction: The engineering team works closely with other departments, including product, design, and customer success. This ensures that the platform meets the needs of Tracksuit's customers and that engineers have the support they need to succeed.
Work Schedule: The company adopts a flexible work schedule, with a focus on productivity and work-life balance. The core hours are 10:00 AM - 4:00 PM NZST, with the option to start earlier or later and make up the hours as needed.
📝 Enhancement Note: Tracksuit's hybrid work environment offers employees the flexibility to balance their work and personal lives, with a focus on productivity, collaboration, and work-life balance. The company's commitment to employee well-being ensures that employees have the support they need to succeed both professionally and personally.
🛠 Technology Stack & Web Infrastructure
Backend & Server Technologies:
- 📝 Enhancement Note: The Senior SRE will work with a wide range of backend and server technologies, including AWS services, Kubernetes, Terraform, and other infrastructure tools. They will also collaborate with frontend and backend engineers to ensure that the platform's architecture is scalable, reliable, and performant.
- AWS Services: The Senior SRE will work with various AWS services, including EC2, RDS, DynamoDB, and Lambda, to design, deploy, and manage scalable infrastructure.
- Kubernetes: The Senior SRE will use Kubernetes or a similar container orchestration platform to manage and deploy applications and services.
- Terraform/CDK: The Senior SRE will use Terraform or CDK to automate infrastructure provisioning and deployment, ensuring that the platform's architecture is consistent, scalable, and reliable.
- CI/CD Pipelines: The Senior SRE will work with CI/CD pipelines to automate testing, deployment, and other repetitive tasks, ensuring that new features and updates are delivered quickly and safely.
Development & DevOps Tools:
- Git: The Senior SRE will use Git for version control and collaborative development.
- CI/CD Tools: The Senior SRE will work with CI/CD tools like Jenkins, CircleCI, or GitHub Actions to automate testing, deployment, and other repetitive tasks.
- Monitoring Tools: The Senior SRE will use monitoring tools like Datadog, Prometheus, or similar platforms to track system health and performance.
- Incident Response Tools: The Senior SRE will use incident response tools like PagerDuty, OpsGenie, or similar platforms to manage and respond to incidents.
📝 Enhancement Note: The Senior SRE will work with a wide range of technologies and tools to ensure that the platform is scalable, reliable, and performant. They will also collaborate with other engineers to ensure that the platform's architecture is consistent, well-documented, and easy to maintain.
👥 Team Culture & Values
Web Development Values:
- 📝 Enhancement Note: Tracksuit's engineering team values collaboration, experimentation, and real impact over checking every box. This culture is reflected in the team's approach to engineering, with a strong focus on operational excellence, automation, and continuous improvement.
- User Experience Focus: The engineering team prioritizes the user experience, ensuring that the platform is intuitive, accessible, and easy to use.
- Performance Optimization: The engineering team focuses on optimizing the platform's performance, ensuring that it is fast, reliable, and scalable.
- Code Quality: The engineering team emphasizes code quality, ensuring that the platform's architecture is well-documented, maintainable, and easy to understand.
- Collaboration and Learning: The engineering team values collaboration and learning, with a strong focus on knowledge sharing, mentoring, and continuous development.
Collaboration Style:
- Cross-functional Integration: The engineering team works closely with other departments, including product, design, and customer success, to ensure that the platform meets the needs of Tracksuit's customers.
- Code Review Culture: The engineering team emphasizes code review, with a focus on knowledge sharing, learning, and continuous improvement.
- Pair Programming: The engineering team encourages pair programming, with a focus on collaboration, learning, and code quality.
⚡ Challenges & Growth Opportunities
Technical Challenges:
- 📝 Enhancement Note: The Senior SRE will face a wide range of technical challenges, including designing and deploying scalable infrastructure, optimizing performance, and responding to incidents. They will also need to stay up-to-date with emerging technologies and best practices in the field.
- Scalability and Performance: The Senior SRE will need to design and deploy scalable infrastructure that can meet the demands of Tracksuit's growing customer base. They will also need to optimize the platform's performance, ensuring that it is fast, reliable, and scalable.
- Incident Response: The Senior SRE will need to respond to incidents quickly and effectively, minimizing downtime and ensuring that the platform's availability is maintained.
- Emerging Technologies: The Senior SRE will need to stay up-to-date with emerging technologies and best practices in the field, ensuring that the platform is at the forefront of industry trends.
Learning & Development Opportunities:
- Technical Skill Development: The Senior SRE will have the opportunity to develop their technical skills, working with a wide range of technologies and tools.
- Leadership Development: As the company grows, the Senior SRE will have the opportunity to take on more technical leadership responsibilities, mentoring other engineers, and driving technical initiatives.
- Architecture and Design: The Senior SRE will have the opportunity to contribute to the platform's architecture and design, ensuring that it is scalable, reliable, and performant.
💡 Interview Preparation
Technical Questions:
- 📝 Enhancement Note: The technical interview for the Senior SRE role will focus on the candidate's knowledge of cloud infrastructure, automation, and incident response. The candidate should be prepared to discuss their experience with AWS, Kubernetes, Terraform, and other relevant technologies.
- Cloud Infrastructure Design: The candidate should be prepared to discuss their experience designing and deploying scalable infrastructure on AWS or another major cloud platform.
- Automation and Scripting: The candidate should be prepared to discuss their experience with automation and scripting, including their approach to infrastructure as code (IaC) and CI/CD pipelines.
- Incident Response: The candidate should be prepared to discuss their experience responding to incidents, including their approach to root cause analysis, resolution, and preventative measures.
Company & Culture Questions:
- 📝 Enhancement Note: The company and culture questions for the Senior SRE role will focus on the candidate's fit with Tracksuit's values, culture, and engineering team. The candidate should be prepared to discuss their approach to collaboration, learning, and continuous improvement.
- Company Values: The candidate should be prepared to discuss their understanding of Tracksuit's values, including transparency, trust, learning, and continuous development.
- Team Dynamics: The candidate should be prepared to discuss their experience working in a collaborative, cross-functional team, and their approach to knowledge sharing, mentoring, and continuous learning.
- Growth and Development: The candidate should be prepared to discuss their long-term career goals and how they align with Tracksuit's growth and development opportunities.
Portfolio Presentation Strategy:
- 📝 Enhancement Note: The portfolio presentation for the Senior SRE role should focus on the candidate's technical skills and accomplishments in infrastructure design, automation, and incident response. The candidate should include case studies, architecture diagrams, and other relevant documentation that demonstrates their ability to design, deploy, and maintain scalable, reliable infrastructure.
- Case Studies: The candidate should include case studies that demonstrate their ability to design, deploy, and maintain scalable, reliable infrastructure. These case studies should include detailed documentation, architecture diagrams, and other relevant information that showcases the candidate's technical skills and accomplishments.
- Architecture Diagrams: The candidate should include architecture diagrams that illustrate the overall architecture of the systems they've worked on, as well as the specific components and their interactions.
- Incident Response Documentation: The candidate should include documentation that demonstrates their approach to incident response, including root cause analysis, resolution steps, and preventative measures.
📌 Application Steps
To apply for this Senior Site Reliability Engineer (SRE) position at Tracksuit Limited:
- 📝 Enhancement Note: Review the job description carefully, ensuring that you meet the required qualifications and experience level. Tailor your resume and portfolio to highlight your relevant skills and accomplishments in infrastructure design, automation, and incident response.
- 📝 Enhancement Note: Prepare for the technical interview by studying the required technologies, practicing coding challenges, and brushing up on your incident response skills. Research the company's values, culture, and engineering team to ensure a strong fit.
- 📝 Enhancement Note: Prepare a comprehensive portfolio that showcases your technical skills and accomplishments in infrastructure design, automation, and incident response. Include case studies, architecture diagrams, and other relevant documentation that demonstrates your ability to design, deploy, and maintain scalable, reliable infrastructure.
- 📝 Enhancement Note: Submit your application through the provided link, ensuring that all required fields are completed accurately and thoroughly. Follow up on your application if you haven't heard back within a week.
📝 Enhancement Note: By following these application steps and preparing thoroughly, you'll increase your chances of success in the Senior Site Reliability Engineer (SRE) role at Tracksuit Limited. Good luck!
Application Requirements
Candidates should have 4+ years of experience in SRE or infrastructure roles, with expertise in building secure, scalable cloud-native systems. Proficiency in AWS, Kubernetes, Terraform, and programming languages like Python or Bash is essential.