Site Reliability Engineer Lead / SRE Lead at Sertis

📍 Job Overview

Job Title: Senior-Lead Site Reliability Engineer
Company: Sertis
Location: Bangkok, Krung Thep Maha Nakhon, Thailand
Job Type: Hybrid
Category: DevOps / Site Reliability Engineering
Date Posted: July 4, 2025
Experience Level: 5-10 years
Remote Status: On-site/Hybrid

🚀 Role Summary

Lead the improvement of software development and deployment processes for reliability and efficiency
Ensure system availability, performance, and scalability through continuous monitoring and maintenance
Automate infrastructure provisioning, configuration management, and deployment processes
Design, build, and maintain CI/CD pipelines for efficient and reliable software releases
Develop and maintain runbooks/playbooks for incident response and regular maintenance activities
Collaborate with cross-functional teams to identify and resolve production issues, establish best practices, and improve overall system quality and efficiency
Mentor junior and mid-level engineers in SRE best practices and technical skills development
Participate in pre-sales activities, recruitment, and process improvement efforts
Advocate for DevOps practices within the company

💻 Primary Responsibilities

Process Improvement: Streamline software development and deployment processes to enhance reliability and efficiency
System Availability: Ensure high system availability, performance, and scalability through continuous monitoring and maintenance
Automation: Automate infrastructure provisioning, configuration management, and deployment processes to reduce manual effort and human error
CI/CD Pipeline Management: Design, build, and maintain CI/CD pipelines for efficient and reliable software releases
Incident Response: Develop and maintain runbooks/playbooks for incident response and regular maintenance activities to minimize downtime and service disruptions
Collaboration: Work closely with cross-functional teams to identify and resolve production issues, establish best practices, and improve overall system quality and efficiency
Mentoring: Mentor junior and mid-level engineers in SRE best practices and technical skills development to enhance their career growth and team performance
Pre-Sales Support: Participate in pre-sales activities, including gathering customer requirements and estimating project scope
Recruitment & Process Improvement: Contribute to recruitment efforts and process improvement initiatives to enhance the overall efficiency and effectiveness of the team

🎓 Skills & Qualifications

Education: Bachelor's degree in Computer Science, Engineering, or a related field, with a strong focus on software development or computer networks

Experience: 5-8 years of hands-on experience designing, building, and maintaining cloud infrastructure and applying DevOps and SRE practices in large-scale systems

Required Skills:

Strong automation and IaC skills, including experience with tools such as Terraform, AWS CDK, Flux/Argo CD, Helm, and GitLab CI
In-depth knowledge of cloud infrastructure and its components, including virtual machines, serverless, storage, networking, and security, with hands-on experience deploying and managing applications in cloud environments
Experience with container orchestration principles and techniques, including hands-on experience with Docker and platforms such as Kubernetes
Familiarity with monitoring tools (Prometheus/Grafana preferred) and defining SLAs
Strong problem-solving skills and experience in troubleshooting production issues
Excellent written and verbal communication skills, with the ability to communicate effectively with technical and non-technical stakeholders
Ability to work collaboratively with different teams and adapt to a dynamic work environment
Leadership skills and ability to mentor junior and mid-level Site Reliability Engineers
Familiarity with Agile, DevOps, and SRE best practices and methodologies
A proactive approach to keeping up to date with the latest technologies and industry trends

Preferred Skills:

Experience with security and penetration testing tools such as Nessus, Nikto, and Nmap
Certifications from cloud providers (AWS, GCP, or Azure) or the CNCF (such as CKA or CKAD)
Experience with service mesh technologies, such as Istio or Linkerd
Knowledge of canary deployment strategies for testing and deploying new releases in a controlled and safe manner
Ability to optimize cloud costs through various strategies
Experience working on multiple projects simultaneously and gathering customer requirements during pre-sales interactions

💵 Compensation & Benefits

Salary Range: Competitive salary package based on experience and industry standards for the region, with a range of THB 1,500,000 - 2,500,000 per year (approximately USD 45,000 - 75,000 per year)
Benefits:
- Flexible office hours and hybrid work arrangement
- Learning opportunities and career growth potential in the growing data and AI industry
- Amazing colleagues and social outings, parties, and events
- Results-oriented workplace with the autonomy to do your best work
- Work environment at the frontier of AI innovation

🎯 Team & Company Context

Company Culture:

Hybrid working environment with flexible office hours
Collaborative and informal culture that values learning, growth, and innovation
Focus on getting things done while maintaining a down-to-earth and informal atmosphere
Work-life balance and emphasis on employee well-being

Development Methodology:

Agile and DevOps methodologies for software development and deployment
Continuous integration and continuous deployment (CI/CD) pipelines for automated testing and deployment
Infrastructure as Code (IaC) practices for automated infrastructure provisioning and management
Regular code reviews, testing, and quality assurance processes to ensure code quality and maintainability

📈 Career & Growth Analysis

Career Level: Senior-Lead Site Reliability Engineer, responsible for leading the improvement of software development and deployment processes, ensuring system availability, and mentoring junior and mid-level engineers

Reporting Structure: Reports directly to the Head of Engineering or a relevant manager, depending on the company's organizational structure

Technical Impact: Significant influence on software development processes, system availability, and overall system quality and efficiency. Responsible for defining and implementing best practices for reliability and performance, and for mentoring junior and mid-level engineers in SRE practices and technical skills development

Growth Opportunities:

Technical leadership potential with team management and architecture decision-making responsibilities
Continuous learning and skill development opportunities in emerging technologies and industry trends
Career progression paths in technical leadership, architecture, or management roles within the company or broader industry

🌐 Work Environment

Office Type: Hybrid office environment with flexible office hours and a focus on work-life balance

Office Location(s): Sertis' main office is located in the heart of Bangkok's Phrom Phong District, with additional offices in other regions as the company continues to expand

Workspace Context:

Collaborative workspace with cross-functional team interaction and knowledge sharing opportunities
Access to modern development tools, multiple monitors, and testing devices to enhance productivity and efficiency
Regular team-building activities, social events, and outings to foster a strong team culture and employee engagement

Work Schedule:

Flexible office hours with a focus on work-life balance and employee well-being
Regular team meetings and one-on-one sessions to discuss project progress, address any challenges, and provide support and guidance as needed
Occasional on-call duties to ensure system availability and performance during critical periods

🛠 Technology Stack & Web Infrastructure

Frontend Technologies:

Not applicable for this role, as it focuses on backend and infrastructure management

Backend & Server Technologies:

Cloud infrastructure management (AWS, GCP, Azure, or other cloud providers)
Container orchestration platforms (Kubernetes, Docker, or other container management tools)
Infrastructure as Code (IaC) tools (Terraform, AWS CDK, or other IaC tools)
CI/CD pipelines and automation tools (GitLab CI, Jenkins, or other CI/CD tools)
Monitoring and logging tools (Prometheus, Grafana, ELK Stack, or other monitoring and logging solutions)
Configuration management tools (Ansible, Puppet, or other configuration management tools)
Serverless architecture and functions (AWS Lambda, Google Cloud Functions, Azure Functions, or other serverless platforms)

Development & DevOps Tools:

Version control systems (Git, SVN, or other version control systems)
Collaboration and communication tools (Slack, Microsoft Teams, or other collaboration platforms)
Project management tools (Jira, Asana, or other project management tools)
Documentation and knowledge management tools (Confluence, Notion, or other knowledge management tools)
Container registry and image management tools (Docker Hub, Google Container Registry, or other container registry tools)

👥 Team Culture & Values

Engineering Values:

Focus on reliability, performance, and scalability to ensure optimal system availability and user experience
Emphasis on automation, IaC, and CI/CD pipelines to enhance development efficiency and reduce human error
Collaboration and knowledge sharing to foster a strong team culture and continuous learning
User-centered design and accessibility standards to ensure a positive user experience for all users
Attention to detail and quality assurance processes to ensure code quality and maintainability

Collaboration Style:

Agile and DevOps methodologies for software development and deployment
Cross-functional integration between developers, designers, and stakeholders
Code reviews and pair programming practices to enhance code quality and knowledge sharing
Regular team meetings and one-on-one sessions to discuss project progress, address any challenges, and provide support and guidance as needed
Open communication and feedback culture to encourage continuous improvement and employee engagement

🛡 Challenges & Growth Opportunities

Technical Challenges:

Ensuring high system availability, performance, and scalability through continuous monitoring, maintenance, and optimization
Automating infrastructure provisioning, configuration management, and deployment processes to reduce manual effort and human error
Implementing and maintaining SLAs to meet or exceed customer expectations and ensure optimal system performance
Designing and building new infrastructure to support business growth and evolving user needs
Collaborating with cross-functional teams to identify and resolve production issues, establish best practices, and improve overall system quality and efficiency

Learning & Development Opportunities:

Continuous learning and skill development in emerging technologies, industry trends, and best practices for reliability and performance
Mentoring junior and mid-level engineers in SRE best practices and technical skills development to enhance their career growth and team performance
Participation in technical communities, conferences, and training programs to stay up to date with the latest trends and best practices in the industry
Contribution to open-source projects, blog posts, and technical documentation to share knowledge and enhance personal branding

💡 Interview Preparation

Technical Questions:

In-depth technical questions related to cloud infrastructure, container orchestration, and automation tools
Scenario-based questions to assess problem-solving skills, troubleshooting production issues, and incident response strategies
Questions related to monitoring tools, defining SLAs, and ensuring system availability and performance

Company & Culture Questions:

Questions related to the company's mission, vision, and values, as well as the team's dynamics and work environment
Questions related to the role's responsibilities, growth opportunities, and the team's goals and objectives
Questions related to the company's products, services, and industry-specific challenges and trends

Portfolio Presentation Strategy:

Highlight specific projects that demonstrate the candidate's experience in cloud infrastructure, container orchestration, and automation tools
Showcase the candidate's problem-solving skills, incident response strategies, and ability to ensure system availability and performance
Emphasize the candidate's ability to collaborate with cross-functional teams, establish best practices, and improve overall system quality and efficiency

📌 Application & Technical Interview Process

Interview Process:

Application Review: The hiring manager or a member of the HR team reviews the candidate's application and resume to assess their qualifications and fit for the role.
Phone Screen: A brief phone call or video conference to discuss the candidate's background, experience, and career goals. This step may also include a technical screening to assess the candidate's problem-solving skills and knowledge of relevant technologies.
Technical Deep Dive: A more in-depth technical interview focused on the candidate's expertise in cloud infrastructure, container orchestration, and automation tools. This step may include hands-on exercises, code reviews, or architecture design challenges.
Behavioral Interview: A structured interview to assess the candidate's soft skills, cultural fit, and problem-solving abilities. This step may include scenario-based questions, role-playing exercises, or case studies.
Final Interview: A final interview with the hiring manager or a panel of stakeholders to discuss the candidate's qualifications, fit for the role, and any remaining questions or concerns.
Offer Extension: If the candidate is selected, the company extends a formal job offer outlining the terms and conditions of employment.

Portfolio Review Tips:

Tailor the portfolio to highlight relevant projects that demonstrate the candidate's experience in cloud infrastructure, container orchestration, and automation tools
Include specific examples of the candidate's problem-solving skills, incident response strategies, and ability to ensure system availability and performance

Technical Challenge Preparation:

Review the job description and required skills to identify the key technologies and concepts that will be assessed during the technical interview
Practice hands-on exercises, code reviews, and architecture design challenges to build confidence and demonstrate proficiency in relevant tools and technologies
Familiarize yourself with the company's products, services, and industry-specific challenges and trends to showcase your understanding of the business and ability to contribute to its success

🛠 ATS Keywords

Programming Languages:

Python, Bash, Go, Java, C++, or other relevant programming languages

Web Frameworks:

Not applicable for this role, as it focuses on backend and infrastructure management

Server Technologies:

Cloud infrastructure (AWS, GCP, Azure, or other cloud providers)
Container orchestration platforms (Kubernetes, Docker, or other container management tools)
Infrastructure as Code (IaC) tools (Terraform, AWS CDK, or other IaC tools)
CI/CD pipelines and automation tools (GitLab CI, Jenkins, or other CI/CD tools)
Monitoring and logging tools (Prometheus, Grafana, ELK Stack, or other monitoring and logging solutions)
Configuration management tools (Ansible, Puppet, or other configuration management tools)
Serverless architecture and functions (AWS Lambda, Google Cloud Functions, Azure Functions, or other serverless platforms)

Databases:

Relational databases (MySQL, PostgreSQL, or other relational databases)
NoSQL databases (MongoDB, Cassandra, or other NoSQL databases)
Cloud databases (AWS RDS, Google Cloud SQL, Azure SQL Database, or other cloud databases)

Tools:

Version control systems (Git, SVN, or other version control systems)
Collaboration and communication tools (Slack, Microsoft Teams, or other collaboration platforms)
Project management tools (Jira, Asana, or other project management tools)
Documentation and knowledge management tools (Confluence, Notion, or other knowledge management tools)
Container registry and image management tools (Docker Hub, Google Container Registry, or other container registry tools)

Methodologies:

Agile methodologies (Scrum, Kanban, or other Agile methodologies)
DevOps methodologies (Infrastructure as Code, Continuous Integration/Continuous Deployment, or other DevOps methodologies)
Site Reliability Engineering practices and metrics (Chaos Engineering, Mean Time to Recovery, or other SRE concepts)

Soft Skills:

Problem-solving skills, troubleshooting production issues, and incident response strategies
Collaboration and communication skills, both written and verbal
Leadership skills and ability to mentor junior and mid-level engineers
Adaptability and ability to work in a dynamic and fast-paced environment
Attention to detail and quality assurance processes to ensure code quality and maintainability

Industry Terms:

Cloud infrastructure, container orchestration, automation, IaC, CI/CD, monitoring, logging, configuration management, serverless architecture, and other relevant industry terms
