Senior Site Reliability Engineer - AWS Cloud Operations

synava
Full-time, Karlsruhe, Germany

📍 Job Overview

  • Job Title: Senior Site Reliability Engineer - AWS Cloud Operations
  • Company: synava
  • Location: Karlsruhe, Germany
  • Job Type: On-site, Full-time
  • Category: DevOps Engineer, System Administrator, Web Infrastructure
  • Date Posted: July 30, 2025
  • Experience Level: Mid-Senior level (5-10 years)
  • Remote Status: On-site

🚀 Role Summary

  • Design and implement cloud architecture best practices, ensuring reliability, scalability, and security in a complex multi-account AWS environment.
  • Collaborate with DevOps, AWS Admins, and the Developer Team to drive automation, observability, and security initiatives.
  • Lead the design and implementation of scalable, highly available cloud-native systems and telemetry stacks.
  • Foster Infrastructure as Code (IaC) best practices and automate provisioning and configuration tasks.
  • Define and implement Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for active service quality management.

📝 Enhancement Note: This role requires a strong background in AWS cloud operations, with a focus on architecture, automation, and observability. Candidates should be comfortable working in a complex, multi-account environment and driving improvements in reliability, security, and performance.

💻 Primary Responsibilities

  • Architecture & Design:

    • Design and implement cloud architecture best practices.
    • Advise on platform component selection for existing or new products.
    • Collaborate with stakeholders to define and implement scalable, highly available cloud-native systems.
  • Operations & Observability:

    • Design and implement scalable, highly available cloud-native systems within a complex, multi-account organization.
    • Build a cost-efficient, compliant, and organization-wide usable telemetry stack.
    • Support the definition of Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for active service quality management.
  • Automation & Infrastructure as Code:

    • Promote IaC best practices and automate provisioning and configuration tasks to improve efficiency and consistency.
    • Lead the implementation of CI/CD pipelines for operational tasks.
    • Develop strategies for secure, cross-account deployments.
  • Security & Identity:

    • Design and implement an IAM strategy, including concepts such as Zero Trust, RBAC, and SSO.
    • Collaborate with other teams on organizational unit (OU) structures and shared service accounts.
    • Implement compliance automation using policies as code, such as Service Control Policies, AWS Config, and AWS Security Hub.

📝 Enhancement Note: This role involves a mix of strategic planning, hands-on engineering, and collaboration with various teams. Candidates should be comfortable working in a dynamic environment and driving initiatives that improve the overall reliability, security, and performance of the cloud platform.

🎓 Skills & Qualifications

Education: A bachelor's degree in Computer Science, Engineering, or a related field. Relevant experience may be considered in lieu of a degree.

Experience: 5+ years of experience as a Site Reliability Engineer, DevOps Engineer, or Cloud Engineer in production AWS or comparable cloud environments.

Required Skills:

  • Proven experience with AWS services, including EC2, RDS, VPC, IAM, Route 53, CloudTrail, AWS Organizations, Resource Access Manager (RAM), and AWS SSO (now AWS IAM Identity Center).
  • Strong knowledge of Infrastructure as Code (IaC) frameworks, such as AWS CDK, Terraform, and CloudFormation, with a focus on reusable, cloud-agnostic infrastructure patterns.
  • Experience with modern observability tools, such as Prometheus, Grafana, and OpenTelemetry (OTel), for metrics, logging, and distributed tracing.
  • Proficiency in designing secure cloud network architectures and implementing compliant access controls in cloud-native environments.
  • Excellent troubleshooting skills and a strong ownership mindset.
  • Fluent English skills (German is a plus).

Preferred Skills:

  • Experience with Kubernetes deployments, Helm, IAM integration, autoscaling, and advanced troubleshooting.
  • Familiarity with policy-as-code frameworks, such as AWS Service Control Policies (SCPs), AWS Config Rules, or similar governance tools.
  • Knowledge of incident management and compliance processes in cloud environments.

📝 Enhancement Note: Candidates should have a strong technical background in AWS cloud operations, with a focus on architecture, automation, and observability. Relevant experience in other cloud providers or comparable environments may also be considered.

📊 Web Portfolio & Project Requirements

Portfolio Essentials:

  • Demonstrate experience with AWS services, including cloud architecture, automation, and observability.
  • Showcase projects that highlight your ability to design, implement, and manage scalable, highly available cloud-native systems.
  • Include examples of your work with Infrastructure as Code (IaC) frameworks and modern observability tools.
  • Highlight your experience with incident management, security, and compliance in cloud environments.

Technical Documentation:

  • Provide documentation for your projects that demonstrates your code quality, commenting, and documentation standards.
  • Include version control, deployment processes, and server configuration details.
  • Demonstrate your understanding of testing methodologies, performance metrics, and optimization techniques.

📝 Enhancement Note: As this role focuses on cloud architecture, automation, and observability, candidates should emphasize these aspects in their portfolio. Include case studies that showcase your ability to drive improvements in reliability, security, and performance in complex cloud environments.

💵 Compensation & Benefits

Salary Range: €70,000 - €90,000 per year (Based on market research for mid-senior level DevOps engineers in Germany)

Benefits:

  • Competitive salary and benefits package.
  • Flexible working hours and remote work options.
  • An integrative work environment that values individual skills, experiences, and perspectives.
  • Opportunities for professional growth and development.

Working Hours: Full-time (40 hours per week) with flexible working hours and remote work options available.

📝 Enhancement Note: The salary range is estimated based on market research for mid-senior level DevOps engineers in Germany. The actual salary may vary depending on the candidate's experience, skills, and the company's internal salary structure.

🎯 Team & Company Context

🏢 Company Culture

Industry: Healthcare technology, focusing on optimizing workflows in radiology practices and clinics worldwide.

Company Size: Medium-sized company with around 950 clients worldwide, offering a collaborative and innovative work environment.

Founded: 2000, with a strong focus on continuous improvement and technological innovation.

Team Structure:

  • A dedicated Cloud Operations team responsible for the reliability, security, and performance of the cloud platform.
  • Close collaboration with DevOps, AWS Admins, and the Developer Team to drive automation, observability, and security initiatives.
  • Cross-functional collaboration with design, marketing, and business teams to ensure user-focused solutions.

Development Methodology:

  • Agile/Scrum methodologies for project management and sprint planning.
  • Code reviews, testing, and quality assurance practices to ensure high code quality and reliability.
  • CI/CD pipelines and automated deployment strategies to improve efficiency and consistency.

Company Website: www.synava.com

📝 Enhancement Note: The company culture at synava is focused on innovation, collaboration, and continuous improvement. Candidates should be comfortable working in a dynamic environment and driving initiatives that enhance the user experience and improve the overall reliability, security, and performance of the cloud platform.

📈 Career & Growth Analysis

Web Technology Career Level: Mid-Senior level (5-10 years of experience) with a focus on cloud architecture, automation, and observability. This role offers opportunities for technical leadership, mentoring, and career progression within the Cloud Operations team.

Reporting Structure: The Senior Site Reliability Engineer will report directly to the Head of Cloud Operations and work closely with DevOps, AWS Admins, and the Developer Team.

Technical Impact: This role has a significant impact on the reliability, security, and performance of the cloud platform, ensuring that it meets the needs of the company's clients and users.

Growth Opportunities:

  • Technical Leadership: Opportunities to mentor junior team members and drive technical initiatives that improve the overall reliability, security, and performance of the cloud platform.
  • Emerging Technologies: The chance to work with cutting-edge cloud technologies and stay up-to-date with the latest industry trends and best practices.
  • Architecture Decisions: The opportunity to make strategic architecture decisions that drive improvements in reliability, security, and performance.

📝 Enhancement Note: This role offers significant opportunities for technical growth and leadership within the Cloud Operations team. Candidates should be comfortable working in a dynamic environment and driving initiatives that enhance the user experience and improve the overall reliability, security, and performance of the cloud platform.

🌐 Work Environment

Office Type: A modern, collaborative workspace with a focus on innovation and continuous improvement.

Office Location(s): Karlsruhe, Germany, with opportunities for remote work and flexible working hours.

Workspace Context:

  • A dedicated workspace with multiple monitors and testing devices available to ensure optimal productivity.
  • Collaborative workspaces that encourage cross-functional interaction and knowledge sharing.
  • Access to the latest tools and technologies to support cloud architecture, automation, and observability initiatives.

Work Schedule: Full-time (40 hours per week) with flexible working hours and remote work options available. The work schedule may include deployment windows, maintenance, and project deadlines.

📝 Enhancement Note: The work environment at synava is focused on collaboration, innovation, and continuous improvement. Candidates should be comfortable working in a dynamic environment and driving initiatives that enhance the user experience and improve the overall reliability, security, and performance of the cloud platform.

📄 Application & Technical Interview Process

Interview Process:

  1. Technical Assessment: A hands-on technical assessment focused on AWS cloud architecture, automation, and observability. Candidates should be prepared to discuss their experience with AWS services, IaC frameworks, and modern observability tools.
  2. Architecture Review: A review of the candidate's architecture and design decisions, focusing on scalability, security, and performance. Candidates should be prepared to discuss their approach to incident management, security, and compliance in cloud environments.
  3. Behavioral Interview: A discussion of the candidate's problem-solving skills, ownership mindset, and ability to work effectively in a collaborative environment.
  4. Final Evaluation: A review of the candidate's overall fit for the role, with a focus on their technical skills, cultural fit, and potential for growth within the team.

Portfolio Review Tips:

  • Highlight your experience with AWS services, cloud architecture, automation, and observability.
  • Include case studies that demonstrate your ability to drive improvements in reliability, security, and performance in complex cloud environments.
  • Showcase your understanding of incident management, security, and compliance processes in cloud environments.

Technical Challenge Preparation:

  • Brush up on core AWS services and on cloud architecture, automation, and observability.
  • Familiarize yourself with modern observability tools, such as Prometheus, Grafana, and OpenTelemetry (OTel).
  • Prepare for hands-on technical assessments and architecture reviews, focusing on your ability to design, implement, and manage scalable, highly available cloud-native systems.

ATS Keywords: AWS, Cloud Operations, Site Reliability Engineering, DevOps, Automation, Observability, Security, Infrastructure as Code, Kubernetes, IAM, Policy as Code, Troubleshooting, Cloud Architecture, Service Level Objectives, Telemetry, CI/CD Pipelines, Incident Management, Compliance, Agile, Scrum, Code Review, Quality Assurance, Deployment Strategies, Cloud-Native Systems, Multi-Account Organizations, Telemetry Stacks, Service Level Indicators, Zero Trust, RBAC, SSO, Organizational Unit Structures, Shared Service Accounts, Service Control Policies, AWS Config, AWS Security Hub, IaC Best Practices, IaC Frameworks, CloudFormation, Terraform, AWS CDK, Prometheus, Grafana, OpenTelemetry, Distributed Tracing, Network Architecture, Access Controls, Incident Management Processes, Security Processes, Cloud Environments, Agile Practices, Collaboration, User Experience, Performance Optimization, Scalability, High Availability, Reliability, Performance.

📝 Enhancement Note: The interview process for this role is focused on AWS cloud architecture, automation, and observability. Candidates should be prepared to discuss their experience with AWS services, IaC frameworks, and modern observability tools. The technical assessment and architecture review are crucial components of the interview process, with a strong emphasis on the candidate's ability to design, implement, and manage scalable, highly available cloud-native systems.

📌 Application Steps

To apply for this Senior Site Reliability Engineer - AWS Cloud Operations position:

  1. Submit your application through the application link provided.
  2. Customize your portfolio to highlight your experience with AWS services, cloud architecture, automation, and observability. Include case studies that demonstrate your ability to drive improvements in reliability, security, and performance in complex cloud environments.
  3. Optimize your resume for web technology roles, emphasizing your project highlights and technical skills relevant to this position.
  4. Prepare for the technical interview process, focusing on your AWS cloud architecture, automation, and observability skills. Brush up on your knowledge of modern observability tools and incident management processes in cloud environments.
  5. Research the company and its focus on healthcare technology, user experience, and continuous improvement. Be prepared to discuss your fit for the role and your potential contributions to the team's success.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and industry-standard assumptions about web technology roles, cloud architecture, automation, and observability. All details should be verified directly with the hiring organization before making application decisions.

Application Requirements

Candidates should have over 5 years of experience in Site Reliability Engineering, DevOps, or Cloud Engineering in production AWS environments. Strong knowledge of AWS services, Infrastructure as Code frameworks, and modern observability tools is essential, along with excellent troubleshooting skills.