📍 Job Overview

Job Title: Site Reliability Engineer (SRE), Cloud Incident Response
Company: SS&C Technologies
Location: Bangkok, Krung Thep Maha Nakhon, Thailand
Job Type: Full-Time
Category: DevOps, Site Reliability Engineering
Date Posted: June 12, 2025
Experience Level: 5-10 years (Fresh graduates welcome for junior roles)
Remote Status: Hybrid

🚀 Role Summary

Ensure the performance, scalability, and reliability of critical cloud-based applications in a global team environment.
Collaborate with cross-functional teams to respond to and resolve application incidents while enhancing application health monitoring.
Define, implement, and track key SRE metrics to drive reliability improvements and reduce incident recurrence.
Partner with development teams to improve application reliability and resilience.

📝 Enhancement Note: This role requires a strong focus on incident response, troubleshooting, and automation to ensure high-quality, reliable services for users.

💻 Primary Responsibilities

Incident Response: Respond to, troubleshoot, and resolve Level 2 application incidents in a follow-the-sun support model.
Application Monitoring: Ensure critical applications are effectively monitored using tools like Prometheus and Grafana, creating and maintaining dashboards and alerts to enhance visibility into application health.
SRE Metrics: Define, implement, and track key SRE metrics (SLOs, SLIs, error budgets) to measure and improve reliability.
Collaboration: Partner with development teams to improve application reliability and resilience, and analyze incident trends to recommend improvements.
Automation: Automate repetitive support tasks to improve efficiency and drive reliability initiatives.
Post-Incident Reviews: Participate in post-incident reviews and drive reliability improvements based on lessons learned.
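
For candidates less familiar with the terms, the relationship between an SLO, an SLI, and an error budget can be sketched in a few lines of Python. This is an illustrative sketch only: the class name, field names, and sample numbers are assumptions made for the example, not SS&C tooling or actual service targets.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical request-based error budget for one SLO window."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int   # requests observed in the window
    failed_requests: int  # requests that violated the SLI

    @property
    def sli(self) -> float:
        """Measured success ratio (the SLI) over the window."""
        if self.total_requests == 0:
            return 1.0
        return 1 - self.failed_requests / self.total_requests

    @property
    def allowed_failures(self) -> float:
        """Number of failures the SLO tolerates in this window."""
        return (1 - self.slo_target) * self.total_requests

    @property
    def budget_remaining(self) -> float:
        """Fraction of the error budget still unspent; negative if overspent."""
        if self.allowed_failures == 0:
            return 0.0 if self.failed_requests else 1.0
        return 1 - self.failed_requests / self.allowed_failures


# Sample numbers are invented for illustration.
budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"SLI: {budget.sli:.5f}, budget remaining: {budget.budget_remaining:.0%}")
```

A negative `budget_remaining` would indicate that the window has already exceeded what the SLO allows, which is typically the signal to prioritize reliability work over new releases.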

📝 Enhancement Note: This role requires strong problem-solving skills, a focus on automation, and the ability to work effectively in a global team environment.

🎓 Skills & Qualifications

Education: A Bachelor’s degree in Computer Science, Computer Engineering, IT, or a related field.

Experience:

Senior Roles: 5+ years of experience in Site Reliability Engineering or similar roles.
Junior Roles: Fresh graduates are welcome to apply.

Required Skills:

Proficiency in one or more programming languages, preferably Java, JavaScript, or Python.
Proven ability to troubleshoot complex systems.
Strong debugging, code optimization, and automation skills.
Experience with relational databases and data analysis.

Preferred Skills:

Experience working in Site Reliability Engineering (SRE) roles or incident response environments.
Hands-on experience with cloud infrastructure, preferably AWS.
Familiarity with observability tools such as Grafana, ELK Stack, or similar.
Experience deploying and managing applications on Kubernetes platforms.
Strong skills in analyzing and troubleshooting issues in large-scale, distributed systems.

📝 Enhancement Note: This role requires a strong technical skill set, with a focus on troubleshooting, automation, and cloud infrastructure management.

📊 Portfolio & Project Requirements

Portfolio Essentials:

Demonstrate your ability to troubleshoot complex systems and automate repetitive tasks.
Showcase your experience with cloud infrastructure and application monitoring tools.
Highlight your problem-solving skills and ability to work effectively in a team environment.

Technical Documentation:

Document your approach to incident response, including your process for troubleshooting, automation, and collaboration with development teams.
Include examples of key SRE metrics you've defined and tracked to improve reliability.
Showcase your ability to analyze incident trends and recommend improvements to reduce recurrence.
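
As one illustration of what an incident-trend analysis in a portfolio might look like, the sketch below counts incidents per root-cause category and computes the mean time to resolve. The incident records, category names, and function names are hypothetical and invented for the example.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

# Hypothetical incident records: (root-cause category, opened, resolved)
INCIDENTS = [
    ("config-change", datetime(2025, 5, 2, 9, 0), datetime(2025, 5, 2, 9, 45)),
    ("capacity", datetime(2025, 5, 9, 14, 0), datetime(2025, 5, 9, 16, 0)),
    ("config-change", datetime(2025, 5, 20, 3, 30), datetime(2025, 5, 20, 4, 0)),
    ("dependency", datetime(2025, 6, 1, 11, 0), datetime(2025, 6, 1, 11, 20)),
    ("config-change", datetime(2025, 6, 7, 22, 0), datetime(2025, 6, 7, 23, 15)),
]


def recurrence_by_cause(incidents):
    """Count incidents per root-cause category, most frequent first."""
    return Counter(cause for cause, _, _ in incidents).most_common()


def mean_time_to_resolve_minutes(incidents):
    """Average minutes from open to resolution across all incidents."""
    return mean(
        (resolved - opened).total_seconds() / 60
        for _, opened, resolved in incidents
    )


for cause, count in recurrence_by_cause(INCIDENTS):
    print(f"{cause}: {count}")
print(f"MTTR: {mean_time_to_resolve_minutes(INCIDENTS):.1f} min")
```

A category that dominates the count, such as the repeated configuration changes in this invented data, is the kind of trend a recommendation to reduce recurrence would usually target first.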

📝 Enhancement Note: This role requires a strong focus on technical documentation, with an emphasis on incident response, automation, and reliability improvements.

💵 Compensation & Benefits

Salary Range: The salary range for this role in Bangkok, Thailand is approximately 300,000 - 450,000 THB per year, based on industry standards for DevOps and Site Reliability Engineering roles. This range takes into account the cost of living in Bangkok and the experience level required for the role.

Benefits:

Retirement Program
Professional Development Reimbursement
Flexible Personal/Vacation Time Off
Sick Leave
Paid Holidays
Business Leave
Maternity Leave
Ordination Leave
Medical, Dental, Vision, and Life Insurance
Annual Health Check Up
Employee Assistance Program
Parental Leave
Well-Stocked Pantry
Provident Fund Contribution
Bonus Scheme
SS&C Stock(s) Allocation
Discounts on fitness clubs, travel, and more!

Working Hours: 40 hours per week, with flexible hours and a hybrid work model.

📝 Enhancement Note: The salary range provided is an estimate based on market research and may vary depending on the candidate's qualifications and experience.

🎯 Team & Company Context

🏢 Company Culture

Industry: SS&C Technologies is a leading financial services and healthcare technology company by revenue, headquartered in Windsor, Connecticut, with over 27,000 employees in 35 countries.

Company Size: With over 27,000 employees, SS&C Technologies offers a large, diverse work environment with ample opportunities for growth and collaboration.

Founded: SS&C Technologies was founded in 1986 and has since grown to become a global leader in financial services and healthcare technology.

Team Structure:

The Global Investor and Distribution Solutions (GIDS) Platform Services team is responsible for ensuring the performance, scalability, and reliability of critical cloud-based applications.
The team operates in a follow-the-sun support model, collaborating with global teams to respond to and resolve application incidents.
The team consists of Site Reliability Engineers, DevOps Engineers, and other technical roles focused on incident response, automation, and reliability improvements.

Development Methodology:

The team follows Agile methodologies, with a focus on continuous integration, continuous deployment, and iterative development.
The team uses tools such as Jira, Confluence, and Bitbucket to manage projects and collaborate on development efforts.
The team emphasizes automation, monitoring, and observability to ensure high-quality, reliable services for users.

Company Website: SS&C Technologies

📝 Enhancement Note: SS&C Technologies offers a large, diverse work environment with ample opportunities for growth and collaboration, particularly for those interested in incident response, automation, and reliability improvements.

📈 Career & Growth Analysis

Career Level: This role is at the intermediate to senior level, with junior openings available to fresh graduates. It focuses on incident response, troubleshooting, and automation, and requires strong technical skills and the ability to work effectively in a global team environment.

Reporting Structure: This role reports directly to the Manager of Platform Services within the Global Investor and Distribution Solutions (GIDS) team. The team consists of Site Reliability Engineers, DevOps Engineers, and other technical roles focused on incident response, automation, and reliability improvements.

Technical Impact: This role has a significant impact on the reliability and performance of critical cloud-based applications, ensuring high-quality, reliable services for users. The role requires strong problem-solving skills, a focus on automation, and the ability to collaborate effectively with development teams to improve application reliability and resilience.

Growth Opportunities:

Technical Growth: This role offers opportunities for technical growth, including the chance to work with cutting-edge cloud infrastructure and observability tools, and to develop expertise in incident response, automation, and reliability improvements.
Leadership Growth: As the team grows, there may be opportunities for technical leadership roles, focusing on mentoring junior team members, driving reliability initiatives, and collaborating with development teams to improve application reliability and resilience.
Career Progression: With over 27,000 employees in 35 countries, SS&C Technologies offers ample opportunities for career progression, including the chance to work on diverse projects and to take on new challenges as the company continues to grow.

📝 Enhancement Note: This role offers strong opportunities for technical growth, with a focus on incident response, automation, and reliability improvements. The role also offers opportunities for leadership growth and career progression within the company.

🌐 Work Environment

Office Type: The SS&C Technologies office in Bangkok is a modern, collaborative workspace designed to foster innovation and teamwork. The office is centrally located, with easy access to public transportation.

Office Location(s): The office is located in the heart of Bangkok, just a short walk from Phromphong BTS or Sukhumvit MRT stations.

Workspace Context:

Collaboration: The office features open-plan workspaces, encouraging collaboration and communication among team members.
Technology: The office is equipped with state-of-the-art technology, including multiple monitors and testing devices, to support day-to-day engineering work.
Flexibility: The hybrid work model offers flexibility, allowing team members to work from home or the office as needed.

Work Schedule: The work schedule is flexible, with a focus on maintaining high-quality, reliable services for users. The team operates in a follow-the-sun support model, with global teams collaborating to respond to and resolve application incidents.

📝 Enhancement Note: The SS&C Technologies office in Bangkok offers a modern, collaborative workspace with state-of-the-art technology and a flexible work schedule.

📄 Application & Technical Interview Process

Interview Process:

Technical Assessment: Evaluates your problem-solving skills and technical expertise, with a focus on incident response, troubleshooting, and automation.
Behavioral Interview: Assesses your ability to work effectively in a global team environment, with a focus on collaboration, communication, and adaptability.
Final Interview: A conversation with the hiring manager about your career goals, technical skills, and fit within the team.

Portfolio Review Tips:

Highlight your ability to troubleshoot complex systems and automate repetitive tasks.
Showcase your experience with cloud infrastructure and application monitoring tools.
Include examples of key SRE metrics you've defined and tracked to improve reliability.
Demonstrate your ability to analyze incident trends and recommend improvements to reduce recurrence.

Technical Challenge Preparation:

Brush up on your incident response, troubleshooting, and automation skills.
Familiarize yourself with cloud infrastructure, preferably AWS, and application monitoring tools such as Prometheus and Grafana.
Prepare for technical questions focused on incident response, automation, and reliability improvements.

📝 Enhancement Note: The interview process for this role focuses on incident response, troubleshooting, and automation, with a strong emphasis on technical skills and the ability to work effectively in a global team environment.

🛠 Technology Stack & Web Infrastructure

Cloud Infrastructure:

AWS (Amazon Web Services)
Other cloud providers may be used as needed

Observability Tools:

Prometheus
Grafana
ELK Stack (Elasticsearch, Logstash, Kibana)
Other observability tools may be used as needed

Programming Languages:

Java
JavaScript
Python
Other programming languages may be used as needed

Databases:

Relational databases (e.g., MySQL, PostgreSQL)
NoSQL databases (e.g., MongoDB, Cassandra)
Other databases may be used as needed

📝 Enhancement Note: This role requires strong technical skills in cloud infrastructure, observability tools, and programming languages, with a focus on incident response, troubleshooting, and automation.

👥 Team Culture & Values

Team Values:

Reliability: Ensure high-quality, reliable services for users through incident response, troubleshooting, and automation.
Collaboration: Work effectively in a global team environment, collaborating with development teams to improve application reliability and resilience.
Continuous Improvement: Continuously improve reliability and performance through incident analysis, automation, and monitoring.
Customer Focus: Focus on the needs of users, ensuring high-quality, reliable services that meet their expectations.

Collaboration Style:

Global Teamwork: Collaborate with global teams in a follow-the-sun support model to respond to and resolve application incidents.
Cross-Functional Collaboration: Work closely with development teams to improve application reliability and resilience, and to drive reliability initiatives.
Knowledge Sharing: Share knowledge and best practices with team members to improve reliability and performance.

📝 Enhancement Note: The team culture at SS&C Technologies emphasizes reliability, collaboration, continuous improvement, and customer focus, with a strong emphasis on incident response, troubleshooting, and automation.

⚡ Challenges & Growth Opportunities

Technical Challenges:

Incident Response: Respond to and resolve complex application incidents in a global team environment, with a focus on automation and reliability improvements.
Automation: Automate repetitive support tasks to improve efficiency and drive reliability initiatives.
Monitoring: Ensure critical applications are effectively monitored using tools like Prometheus and Grafana, creating and maintaining dashboards and alerts to enhance visibility into application health.
Reliability Metrics: Define, implement, and track key SRE metrics (SLOs, SLIs, error budgets) to measure and improve reliability.

Learning & Development Opportunities:

Technical Skills: Develop your technical skills in incident response, automation, and cloud infrastructure management through on-the-job training, workshops, and online courses.
Leadership Skills: Develop your leadership skills through mentoring junior team members, driving reliability initiatives, and collaborating with development teams to improve application reliability and resilience.
Career Progression: Pursue career progression opportunities within the company, including the chance to work on diverse projects and to take on new challenges as the company continues to grow.

📝 Enhancement Note: This role offers strong opportunities for technical growth, with a focus on incident response, automation, and cloud infrastructure management. The role also offers opportunities for leadership growth and career progression within the company.

💡 Interview Preparation

Technical Questions:

Incident Response: Describe your approach to incident response, including your process for troubleshooting, automation, and collaboration with development teams.
Automation: Explain your experience with automation, including your approach to automating repetitive support tasks and driving reliability initiatives.
Monitoring: Discuss your experience with application monitoring tools, such as Prometheus and Grafana, and your approach to creating and maintaining dashboards and alerts to enhance visibility into application health.
Reliability Metrics: Describe your experience with defining, implementing, and tracking key SRE metrics (SLOs, SLIs, error budgets) to measure and improve reliability.

Company & Culture Questions:

Team Dynamics: Describe your experience working in a global team environment, with a focus on collaboration, communication, and adaptability.
Company Culture: Explain what you value in a company culture and how you believe you can contribute to the team's success.
Career Goals: Discuss your career goals and how this role can help you achieve them.

Portfolio Presentation Strategy:

Incident Response: Highlight your ability to troubleshoot complex systems and automate repetitive tasks, with a focus on incident response, automation, and reliability improvements.
Automation: Showcase your experience with automation, including your approach to automating repetitive support tasks and driving reliability initiatives.
Monitoring: Demonstrate your experience with application monitoring tools, such as Prometheus and Grafana, and your approach to creating and maintaining dashboards and alerts to enhance visibility into application health.

📌 Application Steps

To apply for this Site Reliability Engineer (SRE), Cloud Incident Response position at SS&C Technologies:

Resume Optimization: Optimize your resume for site reliability engineering keywords, with a focus on incident response, troubleshooting, and automation.
Portfolio Customization: Customize your portfolio to highlight your ability to troubleshoot complex systems and automate repetitive tasks, with a focus on incident response, automation, and reliability improvements.
Technical Interview Preparation: Brush up on your incident response, troubleshooting, and automation skills, and prepare for technical questions focused on incident response, automation, and reliability improvements.
Company Research: Research SS&C Technologies and the Global Investor and Distribution Solutions (GIDS) Platform Services team, focusing on the company's culture, values, and technical stack.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and web development/server administration industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.

PA2025Q3JB090 Site Reliability Engineer (SRE), Cloud Incident Response