Site Reliability Engineer Lead / SRE Lead

Sertis
Full-time, Bangkok, Thailand

📍 Job Overview

  • Job Title: Senior-Lead Site Reliability Engineer
  • Company: Sertis
  • Location: Bangkok, Krung Thep Maha Nakhon, Thailand
  • Job Type: Full-time (hybrid)
  • Category: DevOps / Site Reliability Engineering
  • Date Posted: July 4, 2025
  • Experience Level: 5-8 years
  • Remote Status: On-site/Hybrid

🚀 Role Summary

  • Lead the improvement of software development and deployment processes for reliability and efficiency
  • Ensure system availability, performance, and scalability through continuous monitoring and maintenance
  • Automate infrastructure provisioning, configuration management, and deployment processes
  • Design, build, and maintain CI/CD pipelines for efficient and reliable software releases
  • Develop and maintain runbooks/playbooks for incident response and regular maintenance activities
  • Collaborate with cross-functional teams to identify and resolve production issues, establish best practices, and improve overall system quality and efficiency
  • Mentor junior and mid-level engineers in SRE best practices and technical skills development
  • Participate in pre-sales activities, recruitment, and process improvement efforts
  • Advocate for DevOps practices within the company

💻 Primary Responsibilities

  • Process Improvement: Streamline software development and deployment processes to enhance reliability and efficiency
  • System Availability: Ensure high system availability, performance, and scalability through continuous monitoring and maintenance
  • Automation: Automate infrastructure provisioning, configuration management, and deployment processes to reduce manual effort and human error
  • CI/CD Pipeline Management: Design, build, and maintain CI/CD pipelines for efficient and reliable software releases
  • Incident Response: Develop and maintain runbooks/playbooks for incident response and regular maintenance activities to minimize downtime and service disruptions
  • Collaboration: Work closely with cross-functional teams to identify and resolve production issues, establish best practices, and improve overall system quality and efficiency
  • Mentoring: Mentor junior and mid-level engineers in SRE best practices and technical skills development to enhance their career growth and team performance
  • Pre-Sales Support: Participate in pre-sales activities, including gathering customer requirements and estimating project scope
  • Recruitment & Process Improvement: Contribute to recruitment efforts and process improvement initiatives to enhance the overall efficiency and effectiveness of the team

🎓 Skills & Qualifications

Education: Bachelor's degree in Computer Science, Engineering, or a related field with a strong focus on software development, computer networks, or a related discipline

Experience: 5-8 years of hands-on experience in designing, building, and maintaining cloud infrastructure and in applying DevOps and SRE practices to large-scale systems

Required Skills:

  • Strong automation and IaC skills, including experience with tools such as Terraform, AWS CDK, Flux/ArgoCD, Helm, and GitLab CI
  • In-depth knowledge of cloud infrastructure and its components, including virtual machines, serverless, storage, networking, and security, with hands-on experience in deploying and managing applications in cloud environments
  • Experience with container orchestration principles and techniques, including hands-on experience with Docker and platforms such as Kubernetes
  • Familiarity with monitoring tools (Prometheus/Grafana preferred) and defining SLAs
  • Strong problem-solving skills and experience in troubleshooting production issues
  • Excellent communication (written/verbal) skills and the ability to effectively communicate with technical and non-technical stakeholders
  • Ability to work collaboratively with different teams and adapt to a dynamic work environment
  • Leadership skills and ability to mentor junior and mid-level Site Reliability Engineers
  • Familiarity with Agile, DevOps, and SRE best practices and methodologies
  • Proactiveness in keeping up to date with the latest technology and industry trends

Preferred Skills:

  • Experience with penetration testing tools such as Nessus, Nikto, and Nmap
  • Certifications from cloud providers or the CNCF (such as CKA/CKAD or AWS/GCP/Azure certifications)
  • Experience with service mesh technologies, such as Istio or Linkerd
  • Knowledge of canary deployment strategies for testing and deploying new releases in a controlled and safe manner
  • Ability to optimize cloud costs through various strategies
  • Experience working on multiple projects simultaneously and gathering customer requirements during pre-sales interactions

💵 Compensation & Benefits

  • Salary Range: Competitive salary package based on experience and industry standards for the region, with a range of THB 1,500,000 - 2,500,000 per year (approximately USD 45,000 - 75,000 per year)
  • Benefits:
    • Flexible office hours and hybrid work arrangement
    • Learning opportunities and career growth potential in a growing data-driven and AI industry
    • Amazing colleagues and social outings, parties, and events
    • Result-oriented workplace with autonomy to deliver best work
    • Work environment at the frontier of AI industry innovation

🎯 Team & Company Context

Company Culture:

  • Hybrid working environment with flexible office hours
  • Collaborative and informal culture that values learning, growth, and innovation
  • Focus on getting things done while maintaining a down-to-earth and informal atmosphere
  • Work-life balance and emphasis on employee well-being

Development Methodology:

  • Agile and DevOps methodologies for software development and deployment
  • Continuous integration and continuous deployment (CI/CD) pipelines for automated testing and deployment
  • Infrastructure as Code (IaC) practices for automated infrastructure provisioning and management
  • Regular code reviews, testing, and quality assurance processes to ensure code quality and maintainability

📈 Career & Growth Analysis

Career Level: Senior-Lead Site Reliability Engineer, responsible for leading the improvement of software development and deployment processes, ensuring system availability, and mentoring junior and mid-level engineers

Reporting Structure: Reports directly to the Head of Engineering or a relevant manager, depending on the company's organizational structure

Technical Impact: Significant influence on software development processes, system availability, and overall system quality and efficiency. Responsible for defining and implementing best practices for reliability and performance, and for mentoring junior and mid-level engineers in SRE best practices and technical skills

Growth Opportunities:

  • Technical leadership potential with team management and architecture decision-making responsibilities
  • Continuous learning and skill development opportunities in emerging technologies and industry trends
  • Career progression paths in technical leadership, architecture, or management roles within the company or broader industry

🌐 Work Environment

Office Type: Hybrid office environment with flexible office hours and a focus on work-life balance

Office Location(s): Sertis' main office is located in the heart of Bangkok's Phrom Phong District, with additional offices in other regions as the company continues to expand

Workspace Context:

  • Collaborative workspace with cross-functional team interaction and knowledge sharing opportunities
  • Access to modern development tools, multiple monitors, and testing devices to enhance productivity and efficiency
  • Regular team-building activities, social events, and outings to foster a strong team culture and employee engagement

Work Schedule:

  • Flexible office hours with a focus on work-life balance and employee well-being
  • Regular team meetings and one-on-one sessions to discuss project progress, address any challenges, and provide support and guidance as needed
  • Occasional on-call duties to ensure system availability and performance during critical periods

🛠 Technology Stack & Web Infrastructure

Frontend Technologies:

  • Not applicable for this role, as it focuses on backend and infrastructure management

Backend & Server Technologies:

  • Cloud infrastructure management (AWS, GCP, Azure, or other cloud providers)
  • Container orchestration platforms (Kubernetes, Docker, or other container management tools)
  • Infrastructure as Code (IaC) tools (Terraform, AWS CDK, or other IaC tools)
  • CI/CD pipelines and automation tools (GitLab CI, Jenkins, or other CI/CD tools)
  • Monitoring and logging tools (Prometheus, Grafana, ELK Stack, or other monitoring and logging solutions)
  • Configuration management tools (Ansible, Puppet, or other configuration management tools)
  • Serverless architecture and functions (AWS Lambda, Google Cloud Functions, Azure Functions, or other serverless platforms)

Development & DevOps Tools:

  • Version control systems (Git, SVN, or other version control systems)
  • Collaboration and communication tools (Slack, Microsoft Teams, or other collaboration platforms)
  • Project management tools (Jira, Asana, or other project management tools)
  • Documentation and knowledge management tools (Confluence, Notion, or other knowledge management tools)
  • Container registry and image management tools (Docker Hub, Google Container Registry, or other container registry tools)

👥 Team Culture & Values

Engineering Values:

  • Focus on reliability, performance, and scalability to ensure optimal system availability and user experience
  • Emphasis on automation, IaC, and CI/CD pipelines to enhance development efficiency and reduce human error
  • Collaboration and knowledge sharing to foster a strong team culture and continuous learning
  • User-centered design and accessibility standards to ensure a positive user experience for all users
  • Attention to detail and quality assurance processes to ensure code quality and maintainability

Collaboration Style:

  • Agile and DevOps methodologies for software development and deployment
  • Cross-functional integration between developers, designers, and stakeholders
  • Code reviews and pair programming practices to enhance code quality and knowledge sharing
  • Regular team meetings and one-on-one sessions to discuss project progress, address any challenges, and provide support and guidance as needed
  • Open communication and feedback culture to encourage continuous improvement and employee engagement

🛡 Challenges & Growth Opportunities

Technical Challenges:

  • Ensuring high system availability, performance, and scalability through continuous monitoring, maintenance, and optimization
  • Automating infrastructure provisioning, configuration management, and deployment processes to reduce manual effort and human error
  • Implementing and maintaining SLAs to meet or exceed customer expectations and ensure optimal system performance
  • Designing and building new infrastructure to support business growth and evolving user needs
  • Collaborating with cross-functional teams to identify and resolve production issues, establish best practices, and improve overall system quality and efficiency

Learning & Development Opportunities:

  • Continuous learning and skill development in emerging technologies, industry trends, and best practices for reliability and performance
  • Mentoring junior and mid-level engineers in SRE best practices and technical skills development to enhance their career growth and team performance
  • Participation in technical communities, conferences, and training programs to stay up to date with the latest trends and best practices in the industry
  • Contribution to open-source projects, blog posts, and technical documentation to share knowledge and enhance personal branding

💡 Interview Preparation

Technical Questions:

  • In-depth technical questions related to cloud infrastructure, container orchestration, and automation tools
  • Scenario-based questions to assess problem-solving skills, troubleshooting production issues, and incident response strategies
  • Questions related to monitoring tools, defining SLAs, and ensuring system availability and performance
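
When preparing for questions on SLAs and availability, it can help to be fluent in the arithmetic behind an availability target. The sketch below is only an illustration and not part of the posting: the function name allowed_downtime_minutes and the 30-day window are assumptions chosen for the example.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Return the downtime, in minutes, that an availability target permits.

    availability_pct is the target as a percentage (for example 99.9).
    period_days is the measurement window; 30 days is an assumed default.
    """
    if not 0 < availability_pct <= 100:
        raise ValueError("availability_pct must be in the range (0, 100]")
    if period_days <= 0:
        raise ValueError("period_days must be positive")

    total_minutes = period_days * 24 * 60
    # The permitted downtime is the fraction of the window not covered by the target.
    return total_minutes * (1 - availability_pct / 100)
```

For instance, allowed_downtime_minutes(99.9) comes to roughly 43.2 minutes over a 30-day window, and each additional "nine" in the target cuts that allowance by a factor of ten.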

Company & Culture Questions:

  • Questions related to the company's mission, vision, and values, as well as the team's dynamics and work environment
  • Questions related to the role's responsibilities, growth opportunities, and the team's goals and objectives
  • Questions related to the company's products, services, and industry-specific challenges and trends

Portfolio Presentation Strategy:

  • Highlight specific projects that demonstrate the candidate's experience in cloud infrastructure, container orchestration, and automation tools
  • Showcase the candidate's problem-solving skills, incident response strategies, and ability to ensure system availability and performance
  • Emphasize the candidate's ability to collaborate with cross-functional teams, establish best practices, and improve overall system quality and efficiency

📌 Application & Technical Interview Process

Interview Process:

  1. Application Review: The hiring manager or a member of the HR team reviews the candidate's application and resume to assess their qualifications and fit for the role.
  2. Phone Screen: A brief phone call or video conference to discuss the candidate's background, experience, and career goals. This step may also include a technical screening to assess the candidate's problem-solving skills and knowledge of relevant technologies.
  3. Technical Deep Dive: A more in-depth technical interview focused on the candidate's expertise in cloud infrastructure, container orchestration, and automation tools. This step may include hands-on exercises, code reviews, or architecture design challenges.
  4. Behavioral Interview: A structured interview to assess the candidate's soft skills, cultural fit, and problem-solving abilities. This step may include scenario-based questions, role-playing exercises, or case studies.
  5. Final Interview: A final interview with the hiring manager or a panel of stakeholders to discuss the candidate's qualifications, fit for the role, and any remaining questions or concerns.
  6. Offer Extension: If the candidate is selected, the company extends a formal job offer outlining the terms and conditions of employment.

Portfolio Review Tips:

  • Tailor the portfolio to highlight relevant projects that demonstrate the candidate's experience in cloud infrastructure, container orchestration, and automation tools
  • Include specific examples of the candidate's problem-solving skills, incident response strategies, and ability to ensure system availability and performance
  • Emphasize the candidate's ability to collaborate with cross-functional teams, establish best practices, and improve overall system quality and efficiency

Technical Challenge Preparation:

  • Review the job description and required skills to identify the key technologies and concepts that will be assessed during the technical interview
  • Practice hands-on exercises, code reviews, and architecture design challenges to build confidence and demonstrate proficiency in relevant tools and technologies
  • Familiarize yourself with the company's products, services, and industry-specific challenges and trends to showcase your understanding of the business and ability to contribute to its success

🛠 ATS Keywords

Programming Languages:

  • Python, Bash, Go, Java, C++, or other relevant programming languages

Web Frameworks:

  • Not applicable for this role, as it focuses on backend and infrastructure management

Server Technologies:

  • Cloud infrastructure (AWS, GCP, Azure, or other cloud providers)
  • Container orchestration platforms (Kubernetes, Docker, or other container management tools)
  • Infrastructure as Code (IaC) tools (Terraform, AWS CDK, or other IaC tools)
  • CI/CD pipelines and automation tools (GitLab CI, Jenkins, or other CI/CD tools)
  • Monitoring and logging tools (Prometheus, Grafana, ELK Stack, or other monitoring and logging solutions)
  • Configuration management tools (Ansible, Puppet, or other configuration management tools)
  • Serverless architecture and functions (AWS Lambda, Google Cloud Functions, Azure Functions, or other serverless platforms)

Databases:

  • Relational databases (MySQL, PostgreSQL, or other relational databases)
  • NoSQL databases (MongoDB, Cassandra, or other NoSQL databases)
  • Cloud databases (AWS RDS, Google Cloud SQL, Azure SQL Database, or other cloud databases)

Tools:

  • Version control systems (Git, SVN, or other version control systems)
  • Collaboration and communication tools (Slack, Microsoft Teams, or other collaboration platforms)
  • Project management tools (Jira, Asana, or other project management tools)
  • Documentation and knowledge management tools (Confluence, Notion, or other knowledge management tools)
  • Container registry and image management tools (Docker Hub, Google Container Registry, or other container registry tools)

Methodologies:

  • Agile methodologies (Scrum, Kanban, or other Agile methodologies)
  • DevOps methodologies (Infrastructure as Code, Continuous Integration/Continuous Deployment, or other DevOps methodologies)
  • Site Reliability Engineering (Chaos Engineering, Mean Time to Recovery, or other SRE practices and metrics)

Soft Skills:

  • Problem-solving skills, troubleshooting production issues, and incident response strategies
  • Collaboration and communication skills, both written and verbal
  • Leadership skills and ability to mentor junior and mid-level engineers
  • Adaptability and ability to work in a dynamic and fast-paced environment
  • Attention to detail and quality assurance processes to ensure code quality and maintainability

Industry Terms:

  • Cloud infrastructure, container orchestration, automation, IaC, CI/CD, monitoring, logging, configuration management, serverless architecture, and other relevant industry terms

Application Requirements

Candidates should have 5-8 years of experience in cloud infrastructure and SRE practices, with strong skills in automation and container orchestration. Familiarity with Agile methodologies and excellent communication skills are also essential.