Site Reliability Engineer
📍 Job Overview
- Job Title: Site Reliability Engineer
- Company: ComplyAdvantage
- Location: Lisbon, Portugal
- Job Type: Hybrid (2 days in the office)
- Category: DevOps Engineer
- Date Posted: 2025-07-25
- Experience Level: Mid-Senior Level (5-10 years)
- Remote Status: Hybrid
🚀 Role Summary
- Key Responsibilities: Design, build, and maintain reliable foundational services for CI/CD pipelines and observability platforms. Participate in on-call rotations to respond to production incidents and lead post-incident reviews.
- Key Technologies: Cloud-based infrastructure (AWS/GCP), Kubernetes, Terraform, Helm, CI/CD tooling, observability platforms, Python, and containerized workloads.
📝 Enhancement Note: This role requires a strong background in cloud services, Kubernetes, and CI/CD tooling. Familiarity with the company's tech stack, including AWS/GCP, Kubernetes, Terraform, Helm, and observability platforms, is essential for success in this position.
💻 Primary Responsibilities
- Design and Build: Architect, implement, and maintain highly available and reliable foundational services for CI/CD pipelines, observability platforms, and the Internal Developer Platform.
- Ensure Reliability: Participate in an on-call rotation to respond to and resolve production incidents swiftly. Lead thorough post-incident reviews to identify root causes and implement preventative measures.
- Automate Infrastructure: Manage and automate the cloud infrastructure using Terraform and Helm, adhering to GitOps best practices.
- Collaborate Effectively: Partner closely with development and data engineering teams to ensure seamless deployments and provide robust operational support.
📝 Enhancement Note: This role involves a significant amount of collaboration with other engineering teams. Strong communication and teamwork skills are crucial for success in this position.
🎓 Skills & Qualifications
Education: BSc/BA degree in computer science, engineering, or a related discipline, or equivalent years of experience in the required skills.
Experience: 5-10 years of experience in cloud services, Kubernetes, CI/CD tooling, and observability platforms.
Required Skills:
- Deep expertise in cloud services (AWS and/or GCP)
- Significant experience managing and troubleshooting services within Kubernetes environments
- Proven track record with CI/CD tooling
- Strong proficiency in observability platforms, including monitoring, alerting, and production operations
- Hands-on experience codifying infrastructure with Terraform and Helm charts
- Excellent incident response and troubleshooting abilities
- Proficiency in scripting and automation using Python
- Experience working with containerized workloads
- Experience collaborating with software engineers to support production cloud-native applications
Preferred Skills:
- Familiarity with ArgoCD, GitLab CI, and the Grafana, Mimir, Loki & Prometheus stack
📝 Enhancement Note: While not required, familiarity with the company's preferred tools and technologies can provide a significant advantage in this role.
📊 Web Portfolio & Project Requirements
Portfolio Essentials:
- Demonstrate a strong understanding of cloud services, Kubernetes, and CI/CD tooling through relevant projects and case studies.
- Showcase your ability to design, build, and maintain reliable foundational services for CI/CD pipelines and observability platforms.
- Highlight your experience with Terraform and Helm, and provide examples of infrastructure automation projects.
- Include examples of your incident response and troubleshooting skills, and describe how you have led post-incident reviews.
Technical Documentation:
- Provide clear and concise documentation for your projects, including code comments, version control, and deployment processes.
- Include performance metrics and optimization techniques used in your projects.
📝 Enhancement Note: A well-structured portfolio that demonstrates your technical skills and problem-solving abilities will be crucial for success in this role.
💵 Compensation & Benefits
Salary Range: €60,000 - €90,000 per year (based on market research and company size)
Benefits:
- Equity, as we want you to own a part of what we are building
- Private medical insurance, giving you peace of mind while you excel in your career
- Unlimited time off policy: work-life balance and a focus on well-being are critical to keeping us performing at our best
- We embrace a hybrid approach that requires employees to be in the office for two days a week. We strongly believe that this approach fosters collaboration and enables the building of meaningful relationships
- You will also get a new starter budget to kit out your home office
- Opportunity to work on innovative projects with smart-minded people keen to share their knowledge and continuously improve
- Annual learning budget (prorated based on start date) to drive your performance and career development
Working Hours: 40 hours per week, with flexible deployment windows and maintenance schedules
📝 Enhancement Note: The salary range provided is based on market research and company size. Actual compensation may vary based on experience and qualifications.
🎯 Team & Company Context
🏢 Company Culture
Industry: Financial crime risk data and detection technology
Company Size: Medium-sized company with around 1,000 employees
Founded: 2014
Team Structure:
- The DevOps team is part of the Platform tribe, which is dedicated to building and maintaining foundational systems, tooling, and services for the Technology organization.
- The team collaborates closely with other engineering teams, including development and data engineering teams.
Development Methodology:
- The company uses modern development methodologies, including Agile and Scrum.
- They emphasize engineering excellence and strive to ship the best possible code and solutions to their customers.
Company Website: complyadvantage.com
📝 Enhancement Note: The company's focus on engineering excellence and collaboration makes it an attractive place for DevOps engineers looking to grow their careers in a dynamic and innovative environment.
📈 Career & Growth Analysis
Web Technology Career Level: Mid-Senior Level (5-10 years of experience)
Reporting Structure: The Site Reliability Engineer reports directly to the DevOps team lead and works closely with other engineering teams.
Technical Impact: The Site Reliability Engineer plays a crucial role in ensuring the reliability, scalability, and performance of the company's critical services. Their work directly impacts the user experience and the company's ability to deliver exceptional products to its customers.
Growth Opportunities:
- Technical Growth: The role offers ample opportunities for technical growth, including working with cutting-edge technologies and collaborating with experienced engineers.
- Leadership Potential: With experience, there is potential for growth into a technical leadership role, where you would be responsible for guiding the team's technical direction and mentoring other engineers.
- Career Progression: As the company continues to grow, there may be opportunities for career progression into more senior roles within the DevOps team or other areas of the Technology organization.
📝 Enhancement Note: The company's focus on engineering excellence and innovation provides numerous opportunities for technical growth and career progression.
🌐 Work Environment
Office Type: Hybrid office environment, with employees required to be in the office for two days a week.
Office Location(s): Lisbon, Portugal
Workspace Context:
- The company provides a collaborative workspace with multiple monitors and testing devices available for engineers.
- The workspace is designed to foster cross-functional collaboration between developers, designers, and stakeholders.
Work Schedule: The work schedule is flexible, with deployment windows and maintenance schedules managed by the on-call rotation.
📝 Enhancement Note: The company's hybrid work environment and flexible work schedule allow for a healthy work-life balance while still fostering collaboration and innovation.
📄 Application & Technical Interview Process
Interview Process:
- Technical Assessment: A hands-on technical assessment focused on cloud services, Kubernetes, CI/CD tooling, and observability platforms. This may include live coding exercises and system design discussions.
- Behavioral Interview: A behavioral interview focused on your problem-solving skills, communication abilities, and cultural fit with the company.
- Final Evaluation: A final evaluation based on your technical skills, cultural fit, and alignment with the company's mission and values.
Portfolio Review Tips:
- Highlight your experience with cloud services, Kubernetes, CI/CD tooling, and observability platforms through relevant projects and case studies.
- Emphasize your ability to design, build, and maintain reliable foundational services for CI/CD pipelines and observability platforms.
- Include examples of your incident response and troubleshooting skills, and describe how you have led post-incident reviews.
Technical Challenge Preparation:
- Brush up on your cloud services, Kubernetes, CI/CD tooling, and observability platforms skills.
- Familiarize yourself with the company's tech stack, including AWS/GCP, Kubernetes, Terraform, Helm, and the Grafana, Mimir, Loki & Prometheus stack.
- Prepare for live coding exercises and system design discussions by practicing common interview questions and working through relevant coding challenges.
ATS Keywords:
- AWS, GCP, Kubernetes, Terraform, Helm, CI/CD, Observability, Monitoring, Alerting, Production Operations, Incident Response, Troubleshooting, Scripting, Automation, Containerized Workloads, Collaboration, Agile, Scrum, Engineering Excellence, Innovation, Hybrid Work Environment, Flexible Work Schedule, Technical Growth, Career Progression, Technical Leadership.
📝 Enhancement Note: Familiarize yourself with the company's tech stack and the relevant ATS keywords to optimize your resume and application materials.
🛠 Technology Stack & Web Infrastructure
Cloud-Based Infrastructure: Fully cloud-based with a Kubernetes-focused tech stack. Compute workloads run in Kubernetes clusters across multiple regions.
Backend & Server Technologies:
- Cloud services: AWS and/or GCP
- Container orchestration: Kubernetes
- Infrastructure as Code: Terraform and Helm
- CI/CD tooling: GitLab CI, ArgoCD
- Observability platforms: Grafana, Mimir, Loki & Prometheus
Development & DevOps Tools:
- Version control: Git
- Collaboration: GitLab
- Monitoring: Grafana, Mimir, Loki & Prometheus
- Logging: ELK Stack (Elasticsearch, Logstash, Kibana)
- Infrastructure as Code: Terraform, Helm
- CI/CD: GitLab CI, ArgoCD
📝 Enhancement Note: Familiarize yourself with the company's tech stack and be prepared to discuss your experience with the relevant technologies during the interview process.
👥 Team Culture & Values
Web Development Values:
- Reliability: The company values reliability and strives to ensure the availability and performance of its critical services.
- Collaboration: The company fosters a culture of collaboration and encourages engineers to work closely with other teams to deliver exceptional products.
- Innovation: The company embraces innovation and encourages engineers to explore new technologies and approaches to solve complex problems.
- Continuous Learning: The company values continuous learning and provides opportunities for engineers to develop their skills and advance their careers.
Collaboration Style:
- Cross-functional Integration: The company encourages collaboration between developers, designers, and stakeholders to deliver exceptional products.
- Code Review Culture: The company emphasizes code review and pair programming practices to ensure code quality and knowledge sharing.
- Knowledge Sharing: The company encourages knowledge sharing and provides opportunities for engineers to mentor and learn from one another.
📝 Enhancement Note: The company's focus on collaboration, innovation, and continuous learning makes it an attractive place for DevOps engineers looking to grow their careers in a dynamic and supportive environment.
⚡ Challenges & Growth Opportunities
Technical Challenges:
- Cloud Services: Design, implement, and maintain highly available and reliable foundational services for CI/CD pipelines and observability platforms in a cloud-based environment.
- Incident Response: Respond to and resolve production incidents swiftly, and lead thorough post-incident reviews to identify root causes and implement proactive preventative measures.
- Automation: Manage and automate the cloud infrastructure using Terraform and Helm, adhering to GitOps best practices.
Learning & Development Opportunities:
- Technical Skill Development: Develop your skills in cloud services, Kubernetes, CI/CD tooling, and observability platforms through hands-on projects and collaboration with experienced engineers.
- Emerging Technologies: Stay up-to-date with emerging technologies and trends in cloud services, Kubernetes, and CI/CD tooling.
- Leadership Development: Develop your leadership skills through mentoring, team management, and architecture decision-making opportunities.
📝 Enhancement Note: The company's focus on technical growth and innovation provides numerous opportunities for DevOps engineers to develop their skills and advance their careers.
💡 Interview Preparation
Technical Questions:
- Cloud Services: Describe your experience with cloud services (AWS and/or GCP) and how you have used them to design, implement, and maintain highly available and reliable foundational services for CI/CD pipelines and observability platforms.
- Kubernetes: Explain your experience managing and troubleshooting services in Kubernetes environments, and how you have used Terraform and Helm to codify and automate the surrounding cloud infrastructure.
- Incident Response: Describe your incident response and troubleshooting skills, and provide examples of how you have led post-incident reviews to identify root causes and implement proactive preventative measures.
Company & Culture Questions:
- Company Mission: Explain how your experience and skills align with the company's mission to neutralize the risk of money laundering, terrorist financing, corruption, and other financial crime.
- Company Values: Describe how you embody the company's values, including reliability, collaboration, innovation, and continuous learning.
Portfolio Presentation Strategy:
- Live Demo: Demonstrate your ability to design, build, and maintain reliable foundational services for CI/CD pipelines and observability platforms through a live demo of your portfolio projects.
- Code Walkthrough: Provide a detailed walkthrough of your code, including your use of Terraform, Helm, and other relevant technologies.
- Incident Response Example: Describe an incident you have responded to and how you led a post-incident review to identify root causes and implement proactive preventative measures.
📝 Enhancement Note: Prepare thoroughly for the technical and behavioral interview questions, and be ready to demonstrate your skills and experience through a live demo and code walkthrough.
📌 Application Steps
To apply for this Site Reliability Engineer position:
- Customize Your Resume: Highlight your experience with cloud services, Kubernetes, CI/CD tooling, and observability platforms, and tailor your resume to the specific requirements of this role.
- Prepare Your Portfolio: Showcase your ability to design, build, and maintain reliable foundational services for CI/CD pipelines and observability platforms through relevant projects and case studies.
- Research the Company: Familiarize yourself with the company's mission, values, and tech stack, and be prepared to discuss how your experience and skills align with the company's goals and culture.
- Practice Interview Questions: Prepare for the technical and behavioral interview questions by working through common interview questions and practicing your responses.
⚠️ Important Notice: This enhanced job description includes AI-generated insights and web development/DevOps industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
Application Requirements
Candidates should have deep expertise in cloud services and significant experience managing services within Kubernetes environments. Proficiency in scripting and automation, as well as experience with CI/CD tooling, is also required.