Senior Site Reliability Engineer
📍 Job Overview
- Job Title: Senior Site Reliability Engineer
- Company: Axon
- Location: Ho Chi Minh City, Vietnam
- Job Type: On-site
- Category: DevOps Engineer
- Date Posted: 2025-07-28
- Experience Level: 5-10 years
🚀 Role Summary
- Key Responsibilities: Build robust, easy-to-use foundational platforms and tools, exemplify cloud-native site reliability best practices, write performant and maintainable code, employ strong problem-solving skills, influence and educate the engineering organization, and provide robust documentation.
- Key Technologies: Linux, cloud platforms (Azure, AWS), container technologies (Kubernetes, Docker), scripting languages (Python, Go, Bash), code collaboration tools (GitHub, ArgoCD), DevOps CI/CD platforms, observability tools, Infrastructure as Code tools (Terraform, CloudFormation).
📝 Enhancement Note: This role focuses on delivering solutions to real-time problems in mission-critical cloud-native services, requiring a strong background in site reliability engineering, cloud platforms, and container technologies.
💻 Primary Responsibilities
- Build and Maintain Foundational Platforms: Develop and maintain robust, easy-to-use platforms and tools that enable engineering teams to provision services rapidly, consistently, and securely.
- Exemplify Cloud-Native Site Reliability Best Practices: Ensure high availability, scalability, and performance of cloud-native services by implementing site reliability engineering best practices.
- Write Performant and Maintainable Code: Contribute to the development and maintenance of the company's software products by writing clean, efficient, and well-documented code.
- Problem-Solving and Debugging: Employ strong problem-solving skills to debug and resolve issues in cloud-native distributed systems.
- Influence and Educate: Collaborate with the engineering organization to adopt new and improved architectural patterns, and provide guidance on best practices in site reliability engineering.
- Documentation: Create and maintain robust documentation for use by engineers to promote self-service and knowledge sharing.
📝 Enhancement Note: This role requires a strong focus on problem-solving, collaboration, and influencing others to drive improvements in the engineering organization's site reliability practices.
🎓 Skills & Qualifications
Education: Bachelor's degree in Computer Science, Engineering, or a related field. Relevant experience may be considered in lieu of a degree.
Experience: 5+ years of applicable experience in site reliability engineering, DevOps, or a related role.
Required Skills:
- Strong operating system skills, preferably Linux
- Experience operating cloud platforms such as Azure, AWS, or similar
- Experience utilizing container technologies like Kubernetes, Docker, or similar
- Experience using scripting languages such as Python, Go, Bash, or similar
- Experience using code collaboration tools such as GitHub, ArgoCD, or similar
- Experience utilizing DevOps CI/CD platforms to automate provisioning infrastructure, software builds, tests, and releases
- Experience using observability tools such as APM, logging, and metrics to assist with debugging issues
- Experience using Infrastructure as Code tools such as Terraform, CloudFormation, or similar to provision infrastructure
- Experience designing tooling to simplify the operational management of SaaS/PaaS systems
- Familiarity with building flexible and testable Infrastructure as Code modules
- Empathy to support the needs of software engineers
Preferred Skills:
- Experience with cloud-native architectures and microservices
- Familiarity with service mesh technologies (e.g., Istio, Linkerd)
- Knowledge of chaos engineering principles and practices
- Experience with infrastructure automation and configuration management tools (e.g., Ansible, Puppet)
- Familiarity with cloud security best practices and compliance frameworks
📝 Enhancement Note: This role requires a strong background in site reliability engineering, with a focus on cloud platforms, container technologies, and scripting languages. Experience with infrastructure automation and configuration management tools is also beneficial.
📊 Web Portfolio & Project Requirements
Portfolio Essentials:
- Demonstrate experience in building and maintaining cloud-native services, highlighting your problem-solving skills and ability to deliver robust, scalable solutions.
- Showcase your experience with container technologies, scripting languages, and infrastructure as code tools by including relevant projects and code snippets.
- Highlight your ability to collaborate with engineering teams and influence best practices in site reliability engineering.
Technical Documentation:
- Include documentation for your projects, demonstrating your ability to create clear, concise, and comprehensive technical documentation.
- Showcase your understanding of cloud-native architectures and best practices by including architectural diagrams and design decisions in your documentation.
📝 Enhancement Note: This role requires a strong portfolio demonstrating your experience in site reliability engineering, with a focus on cloud platforms, container technologies, and infrastructure as code tools. Your portfolio should also showcase your ability to collaborate with engineering teams and influence best practices.
💵 Compensation & Benefits
Salary Range: $120,000 - $160,000 USD per year (based on market research for senior site reliability engineering roles in Ho Chi Minh City, Vietnam)
Benefits:
- Medical, Dental, and Vision Insurance
- Robust Paid Time Off policy
- Bonuses
- Lunch allowance
- Cell phone stipend
- Free LinkedIn Learning or Udemy account
- Access to 24/7 online emotional and mental support
- Gym membership
- Free parking
- Stocked fridges and pantries - free coffee, cold beverages, snacks
- Annual Company Outing Trip
- Monthly team social activities
Working Hours: Full-time, 40 hours per week, with flexible hours for deployment windows and maintenance.
📝 Enhancement Note: The salary range for this role is based on market research for senior site reliability engineering roles in Ho Chi Minh City, Vietnam. Benefits are comprehensive and include health insurance, paid time off, and various perks to support work-life balance and employee well-being.
🎯 Team & Company Context
🏢 Company Culture
Industry: Technology, focusing on public safety and protecting life through innovative cloud-native software solutions.
Company Size: Medium to large (approximately 1,000 - 5,000 employees globally)
Founded: 1993 (operated as TASER International before rebranding as Axon in 2017)
Team Structure:
- Cross-functional teams consisting of software engineers, QA engineers, product managers, and designers.
- Flat hierarchy with a focus on empowering teams to make decisions and drive innovation.
- Strong collaboration and communication across teams and departments.
Development Methodology:
- Agile/Scrum methodologies with two-week sprints.
- Continuous Integration and Continuous Deployment (CI/CD) pipelines for automated testing and deployment.
- Regular code reviews and pair programming to ensure code quality and knowledge sharing.
Company Website: Axon
📝 Enhancement Note: Axon is a technology company focused on public safety and protecting life, with a strong commitment to innovation, collaboration, and empowering teams to drive results. The company's culture emphasizes continuous learning, improvement, and customer focus.
📈 Career & Growth Analysis
Career Level: Senior Site Reliability Engineer - Responsible for designing, implementing, and maintaining scalable, highly available, and secure cloud-native services. Mentors junior team members and influences best practices across the engineering organization.
Reporting Structure: Reports directly to the Site Reliability Engineering Manager or a similar role within the engineering leadership team.
Technical Impact: Drives improvements in the reliability, performance, and scalability of Axon's cloud-native services, ensuring high availability and minimal downtime. Collaborates with engineering teams to implement best practices in site reliability engineering and cloud-native architectures.
Growth Opportunities:
- Technical Leadership: Transition into a technical lead role, focusing on mentoring and driving best practices across the engineering organization.
- Architecture: Specialize in cloud-native architectures and microservices, driving innovation and adoption across Axon's product portfolio.
- Management: Transition into a management role, leading a team of site reliability engineers and driving the organization's site reliability engineering practices.
📝 Enhancement Note: This role offers significant opportunities for growth and development, with a focus on technical leadership, architecture, and management. Axon's culture of empowerment and innovation provides a strong foundation for career progression.
🌐 Work Environment
Office Type: Modern, collaborative office space with a focus on employee comfort and productivity.
Office Location(s): Ho Chi Minh City, Vietnam (with additional offices in the United States, Europe, and Asia)
Workspace Context:
- Open-plan workspaces with dedicated areas for focused work and collaboration.
- Multiple monitors and testing devices available to support development and debugging tasks.
- Access to cloud-based tools and resources to facilitate remote work and collaboration.
Work Schedule: Flexible hours with a focus on work-life balance; core hours are 8:00 AM to 12:00 PM and 1:00 PM to 5:00 PM Indochina Time (ICT, UTC+7).
📝 Enhancement Note: Axon's work environment is designed to support collaboration, productivity, and work-life balance. Although the role is on-site, the flexible work schedule helps employees maintain a healthy work-life balance while driving innovation and results.
📄 Application & Technical Interview Process
Interview Process:
- Phone Screen: A brief call to discuss your experience, motivations, and fit for the role.
- Technical Deep Dive: A comprehensive technical interview focused on your experience with cloud platforms, container technologies, and site reliability engineering best practices.
- Behavioral and Cultural Fit: An interview to assess your problem-solving skills, communication, and cultural fit within Axon's organization.
- Final Review: A meeting with the hiring manager or a member of the engineering leadership team to discuss your fit for the role and answer any remaining questions.
Portfolio Review Tips:
- Highlight your experience with cloud platforms, container technologies, and infrastructure as code tools.
- Include case studies demonstrating your problem-solving skills and ability to deliver robust, scalable solutions.
- Showcase your ability to collaborate with engineering teams and influence best practices in site reliability engineering.
Technical Challenge Preparation:
- Brush up on your knowledge of cloud-native architectures, container technologies, and infrastructure as code tools.
- Practice problem-solving and debugging exercises to hone your skills in identifying and resolving issues in cloud-native distributed systems.
- Prepare for behavioral questions that assess your communication, collaboration, and problem-solving skills.
ATS Keywords: (Organized by category)
- Programming Languages: Python, Go, Bash, JavaScript, TypeScript
- Web Frameworks: React, Angular, Vue.js
- Server Technologies: Linux, Kubernetes, Docker, AWS, Azure, Google Cloud Platform
- Databases: PostgreSQL, MySQL, MongoDB, Redis
- Tools: GitHub, ArgoCD, Terraform, CloudFormation, Prometheus, Grafana, ELK Stack, Jenkins, GitLab CI/CD
- Methodologies: Agile, Scrum, CI/CD, Infrastructure as Code, Site Reliability Engineering, Chaos Engineering
- Soft Skills: Problem-solving, communication, collaboration, empathy, mentoring, influencing
- Industry Terms: Cloud native, microservices, serverless, containerization, orchestration, observability, monitoring, alerting, automation, configuration management
📝 Enhancement Note: Axon's interview process is designed to assess your technical skills, problem-solving abilities, and cultural fit within the organization. By preparing for the interview process and showcasing your experience with cloud platforms, container technologies, and site reliability engineering best practices, you can demonstrate your qualifications for the Senior Site Reliability Engineer role.
🛠 Technology Stack & Web Infrastructure
Frontend Technologies: (Not applicable for this role)
Backend & Server Technologies:
- Linux (Ubuntu, CentOS)
- Kubernetes (v1.21+)
- Docker (v20.10.7+)
- AWS (EC2, RDS, S3, Lambda, API Gateway)
- Azure (Virtual Machines, App Service, Container Instances, Azure Functions)
- Google Cloud Platform (Compute Engine, Cloud Functions, Cloud Pub/Sub)
Development & DevOps Tools:
- GitHub (v3.1.0+)
- ArgoCD (v2.1.7+)
- Terraform (v1.1.7+)
- Prometheus (v2.32.0+)
- Grafana (v8.5.2)
- ELK Stack (v7.15.1)
- Jenkins (v2.331)
- GitLab CI/CD (v14.0.5)
📝 Enhancement Note: Axon's technology stack is built on cloud-native principles, with a focus on scalability, availability, and performance. The company's use of containerization, orchestration, and infrastructure as code tools enables efficient deployment and management of its cloud-native services.
👥 Team Culture & Values
Web Development Values:
- User-Centric: Focus on delivering innovative solutions that meet the needs of Axon's customers and users.
- Quality-Driven: Emphasize code quality, performance optimization, and accessibility in all development efforts.
- Collaborative: Foster a culture of collaboration, knowledge sharing, and continuous learning.
- Innovative: Encourage experimentation, iteration, and continuous improvement in all aspects of the development process.
Collaboration Style:
- Cross-Functional: Work closely with product managers, designers, and other engineering teams to deliver integrated solutions that meet business objectives and user needs.
- Code Review: Implement a culture of code review to ensure code quality, knowledge sharing, and collective code ownership.
- Pair Programming: Encourage pair programming and knowledge sharing to drive continuous learning and improvement.
📝 Enhancement Note: Axon's web development values emphasize user-centric design, quality-driven development, collaboration, and innovation. The company's collaboration style fosters cross-functional teamwork, code review, and pair programming to drive continuous learning and improvement.
⚡ Challenges & Growth Opportunities
Technical Challenges:
- Cloud-Native Architecture: Design and implement scalable, highly available, and secure cloud-native architectures for Axon's software products.
- Containerization and Orchestration: Manage and optimize containerized applications and orchestration platforms to ensure efficient resource utilization and minimal downtime.
- Observability and Monitoring: Develop and maintain robust monitoring and alerting systems to ensure early detection and resolution of issues in Axon's cloud-native services.
- Infrastructure as Code: Implement and maintain infrastructure as code practices to enable automated provisioning, deployment, and management of Axon's cloud-native services.
Learning & Development Opportunities:
- Technical Specialization: Deepen your expertise in cloud-native architectures, container technologies, and infrastructure as code tools to drive innovation and improvement in Axon's software products.
- Conference Attendance: Attend industry conferences and events to stay up-to-date with the latest trends and best practices in cloud-native software development and site reliability engineering.
- Certification and Community Involvement: Obtain relevant certifications (e.g., AWS Certified Solutions Architect, Certified Kubernetes Administrator) and engage with the developer community to expand your network and knowledge base.
📝 Enhancement Note: Axon's technical challenges and learning opportunities focus on driving innovation and improvement in the company's cloud-native software products and site reliability engineering practices. By embracing these challenges and pursuing continuous learning, you can drive significant impact and growth in your career at Axon.
💡 Interview Preparation
Technical Questions:
- Cloud-Native Architecture: Describe your experience with cloud-native architectures and how you've ensured scalability, availability, and performance in your previous roles.
- Containerization and Orchestration: Explain your approach to containerization and orchestration, and how you've optimized resource utilization and minimized downtime in your previous projects.
- Observability and Monitoring: Discuss your experience with monitoring and alerting systems, and how you've ensured early detection and resolution of issues in cloud-native services.
Company & Culture Questions:
- Site Reliability Engineering Culture: Describe your experience with site reliability engineering best practices and how you've driven improvement in your previous roles.
- Agile Methodologies: Explain your experience with Agile methodologies, and how you've collaborated with cross-functional teams to deliver innovative solutions.
- User Experience Impact: Discuss your approach to understanding and addressing user needs, and how you've ensured that your technical solutions meet business objectives and user expectations.
Portfolio Presentation Strategy:
- Cloud-Native Architecture: Highlight your experience with cloud-native architectures by including architectural diagrams and design decisions in your portfolio.
- Containerization and Orchestration: Demonstrate your expertise in containerization and orchestration by including examples of optimized resource utilization and minimal downtime in your portfolio.
- Observability and Monitoring: Showcase your experience with monitoring and alerting systems by including examples of early detection and resolution of issues in your portfolio.
📝 Enhancement Note: Axon's interview preparation focuses on assessing your technical skills, problem-solving abilities, and cultural fit within the organization. By preparing for the interview process and showcasing your experience with cloud-native architectures, container technologies, and site reliability engineering best practices, you can demonstrate your qualifications for the Senior Site Reliability Engineer role.
📌 Application Steps
To apply for this Senior Site Reliability Engineer position at Axon:
- Customize Your Portfolio: Tailor your portfolio to highlight your experience with cloud platforms, container technologies, and site reliability engineering best practices. Include case studies demonstrating your problem-solving skills and ability to deliver robust, scalable solutions.
- Optimize Your Resume: Highlight your relevant experience with cloud platforms, container technologies, and site reliability engineering in your resume. Include specific project examples and achievements that demonstrate your qualifications for the role.
- Prepare for Technical Interviews: Brush up on your knowledge of cloud-native architectures, container technologies, and infrastructure as code tools. Practice problem-solving and debugging exercises to hone your skills in identifying and resolving issues in cloud-native distributed systems. Prepare for behavioral questions that assess your communication, collaboration, and problem-solving skills.
- Research Axon: Familiarize yourself with Axon's products, services, and company culture. Understand the company's mission, values, and commitment to protecting life through innovative cloud-native software solutions.
⚠️ Important Notice: This enhanced job description includes AI-generated insights and web development/server administration industry-standard assumptions. All details should be verified directly with Axon before making application decisions.
Application Requirements
Candidates should have 5+ years of applicable experience with strong skills in operating systems, cloud platforms, and container technologies. Proficiency in scripting languages and experience with DevOps practices and observability tools are also required.