Staff SRE - Observability (x/f/m)
Job Overview
- Job Title: Staff SRE - Observability (x/f/m)
- Company: Doctolib
- Location: Nantes, Pays de la Loire, France
- Job Type: Hybrid
- Category: Site Reliability Engineering
- Date Posted: 2025-07-07
- Experience Level: 10+ years
- Remote Status: On-site/Hybrid
Role Summary
- Lead the observability strategy across Doctolib's platform, focusing on scalable logging and tracing capabilities.
- Identify and drive large-scale reliability initiatives to improve incident detection, response, and postmortem analysis.
- Mentor senior engineers and elevate the craft of reliability engineering across the company.
- Influence strategic decisions by providing technical guidance to leadership and representing the observability discipline in architectural reviews.
Enhancement Note: This role sits at the intersection of infrastructure, developer experience, and product engineering, requiring a strong technical leader with a broad skill set in observability and reliability engineering.
Primary Responsibilities
- Observability Strategy: Develop and execute the observability strategy for Doctolib's platform, focusing on scalable and developer-friendly logging, tracing, and alerting capabilities.
- Reliability Initiatives: Identify, lead, and drive large-scale cross-cutting reliability initiatives that improve Doctolib's operational maturity.
- On-Call Rotation: Participate in the on-call rotation and contribute to improving the on-call experience by refining alerting, reducing noise, and ensuring actionable telemetry.
- Technical Mentorship: Serve as a mentor and technical coach to senior engineers, helping elevate the craft of reliability engineering across the company.
- Strategic Influence: Influence strategic decisions by providing technical guidance to leadership and representing the observability discipline in architectural reviews and platform discussions.
Enhancement Note: This role requires a balance of long-term architecture work and fast, iterative improvements, with a focus on driving consensus and mentoring engineers across teams.
Skills & Qualifications
Education: A bachelor's degree in Computer Science, Engineering, or a related field. Relevant experience may substitute for formal education.
Experience: Extensive experience (8+ years) in SRE, platform engineering, or infrastructure roles within cloud-native environments (preferably AWS, GCP, or Kubernetes-based).
Required Skills:
- Deep expertise in observability tooling and architecture, such as:
  - Logging: Fluent Bit, OpenTelemetry, Loki, Elasticsearch, Logstash, Vector
  - Tracing: OpenTelemetry or proprietary APMs
  - Metrics: Prometheus, Thanos, Datadog, or equivalent
- Strong systems engineering background with fluency in at least one backend programming language (e.g., Go, Python, Ruby).
- Proven ability to lead through influence, set technical direction, drive consensus, and mentor engineers across teams.
- Experience designing and operating high-scale telemetry pipelines and working with developers to improve instrumentation quality.
- Comfortable balancing long-term architecture work with fast, iterative improvements.
- Clear, concise communication skills, both written and verbal, with the ability to drive alignment in ambiguous environments.
Preferred Skills:
- Familiarity with Doctolib's tech stack and healthcare industry context.
- Experience with chaos engineering and fault injection techniques.
- Knowledge of French and ability to work in a bilingual environment.
Enhancement Note: Given the strategic nature of this role, candidates should possess a strong understanding of observability principles, cloud-native architectures, and the ability to thrive in a dynamic, cross-functional environment.
Web Portfolio & Project Requirements
Portfolio Essentials:
- Demonstrate a strong track record of driving observability and reliability improvements in large-scale, cloud-native environments.
- Showcase your ability to lead technical initiatives, mentor engineers, and influence strategic decisions.
- Highlight your expertise in observability tooling, architecture, and best practices.
Technical Documentation:
- Prepare case studies or blog posts detailing your approach to observability challenges, including logging, tracing, and alerting strategies.
- Be ready to discuss your experience with telemetry pipelines, incident response, and postmortem analysis.
Enhancement Note: As this role focuses on technical leadership and strategic influence, a strong portfolio demonstrating your ability to drive change and mentor others will be crucial.
Compensation & Benefits
Salary Range: €80,000 - €120,000 per year (based on experience and market research for SRE roles in the Pays de la Loire region)
Benefits:
- Free Health Insurance for you
- Up to 14 days of RTT (Récupération du Temps de Travail)
- A flexible workplace policy offering both hybrid and office-based modes
- Flexibility days allowing you to work in EU countries and the UK for up to 10 days per year
- Wellbeing program with free mental health and coaching through moka.care
- Special support package for caregivers and workers with disabilities
- Lunch voucher with Swile card
- Work Council subsidy for sport club membership or creative activities
- Bicycle subsidy
- Public transportation reimbursement
- Relocation support for international mobility
Working Hours: Full-time, with flexible working hours and the possibility of remote work.
Enhancement Note: Salary range is estimated based on market research for SRE roles in the Pays de la Loire region, considering the candidate's experience level and the strategic nature of the position.
Team & Company Context
Company Culture:
- Industry: Healthcare technology, with a focus on digital health and telemedicine.
- Company Size: Medium to large (1,000+ employees), with a strong presence in France and expanding internationally.
- Founded: 2013, with a mission to make healthcare more accessible and convenient for patients and healthcare professionals.
Team Structure:
- The Core Reliability & Observability team sits within the broader Engineering organization, working closely with software engineers, product teams, and other SREs.
- The team consists of SREs, software engineers, and a team lead, collaborating to ensure Doctolib's platform remains reliable, debuggable, and scalable.
Development Methodology:
- Agile/Scrum methodologies, with a focus on iterative development, continuous integration, and delivery.
- Code reviews, testing, and quality assurance practices to ensure high standards and maintainability.
- Deployment strategies, CI/CD pipelines, and server management to support Doctolib's growing infrastructure.
Company Website: Doctolib
Enhancement Note: Doctolib's culture values innovation, collaboration, and a patient-centric approach. The company encourages continuous learning and growth, making it an excellent fit for a strategic SRE role focused on driving change and improving reliability.
Career & Growth Analysis
Web Technology Career Level: Staff SRE - Observability, a strategic and senior role focused on driving observability and reliability improvements across the organization.
Reporting Structure: The Staff SRE - Observability will report directly to the Head of SRE and work closely with software engineers, product teams, and other SREs to drive cross-cutting initiatives and elevate Doctolib's operational maturity.
Technical Impact: This role will have a significant impact on Doctolib's platform reliability, performance, and user experience by improving observability, incident response, and postmortem analysis capabilities.
Growth Opportunities:
- Technical Leadership: Grow into a principal or distinguished engineer role, focusing on technical strategy, architecture, and mentoring.
- Management: Transition into a management role, leading a team of SREs and driving the reliability and observability agenda for Doctolib's platform.
- Specialization: Deepen your expertise in a specific area of observability or reliability engineering, becoming a go-to expert within Doctolib and the broader industry.
Enhancement Note: Given the strategic nature of this role, there are ample opportunities for growth and progression within Doctolib's organization, both technically and managerially.
Work Environment
Office Type: Modern, collaborative workspace designed to foster innovation and teamwork.
Office Location(s): Nantes, France, with additional offices in Paris, Lyon, and international locations.
Workspace Context:
- Collaboration: Doctolib's offices are designed to encourage collaboration and cross-functional teamwork, with open-plan workspaces, meeting rooms, and breakout areas.
- Equipment: Doctolib provides modern equipment, including multiple monitors, high-quality audio-visual tools, and ergonomic furniture to support productive work.
- Flexibility: Doctolib offers a flexible workplace policy, allowing employees to work remotely or in the office, depending on their preferences and the needs of their role.
Work Schedule: Full-time, with flexible working hours and the possibility of remote work. Doctolib offers up to 14 days of RTT (Récupération du Temps de Travail) per year.
Enhancement Note: Doctolib's work environment prioritizes collaboration, flexibility, and employee well-being, creating an ideal setting for a strategic SRE role focused on driving change and improving reliability.
Application & Technical Interview Process
Interview Process:
- Phone Screen (30 minutes): A brief conversation with a Tech Recruiter to understand your background, motivations, and fit for the role.
- Technical Interview (1 hour 30 minutes): A deep dive into your technical skills, focusing on your experience with observability tooling, architecture, and best practices. Be prepared to discuss your approach to logging, tracing, and alerting, as well as your experience with telemetry pipelines and incident response.
- System Design Interview (1 hour 30 minutes): An in-depth discussion of your system design and architecture skills, focusing on your ability to make strategic decisions and balance long-term architecture work with fast, iterative improvements.
- Manager Interview (1 hour 15 minutes): A conversation with the Head of SRE to assess your cultural fit, leadership potential, and alignment with Doctolib's mission and values.
Portfolio Review Tips:
- Highlight your experience driving observability and reliability improvements in large-scale, cloud-native environments.
- Showcase your ability to lead technical initiatives, mentor engineers, and influence strategic decisions.
- Demonstrate your expertise in observability tooling, architecture, and best practices through case studies, blog posts, or other relevant examples.
Technical Challenge Preparation:
- Brush up on your knowledge of observability tooling, architecture, and best practices, focusing on logging, tracing, and alerting strategies.
- Prepare for system design questions, focusing on your ability to make strategic decisions and balance long-term architecture work with fast, iterative improvements.
- Practice communicating your ideas clearly and concisely, both verbally and in writing, to demonstrate your ability to drive alignment in ambiguous environments.
ATS Keywords: Site Reliability Engineering, Observability, Reliability Engineering, Cloud-Native Environments, AWS, GCP, Kubernetes, Logging, Tracing, Metrics, Systems Engineering, Backend Programming, Technical Leadership, Telemetry Pipelines, Incident Detection, Response, Postmortem Analysis.
Enhancement Note: To optimize your application and interview preparation, focus on demonstrating your strategic thinking, technical expertise, and ability to drive change in a dynamic, cross-functional environment.
Technology Stack & Web Infrastructure
Observability Tools:
- Logging: Fluent Bit, OpenTelemetry, Loki, Elasticsearch, Logstash, Vector
- Tracing: OpenTelemetry, Jaeger, Zipkin, or proprietary APMs
- Metrics: Prometheus, Thanos, Datadog, or equivalent
- Alerting: PagerDuty, OpsGenie, or other alerting platforms
Cloud & Infrastructure:
- Cloud Providers: AWS, GCP, or a hybrid/multi-cloud environment
- Containerization & Orchestration: Docker, Kubernetes
- Deployment & Delivery: Helm, Argo CD, or other CI/CD tools
- Infrastructure as Code (IaC): Terraform, CloudFormation, or other IaC tools
Programming Languages:
- Backend: Go, Python, Ruby, or other backend programming languages
- Scripting: Bash, PowerShell, or other scripting languages
Enhancement Note: Doctolib's technology stack is primarily cloud-native, with a focus on observability, reliability, and scalability. Familiarity with these tools and technologies will be crucial for success in this role.
Team Culture & Values
Web Development Values:
- Patient-Centric: Prioritize the needs and well-being of patients in all decisions and actions.
- Innovative: Embrace continuous learning and improvement, driving progress through technology and data.
- Collaborative: Work together across teams and disciplines to achieve common goals and deliver exceptional results.
- Quality-Driven: Strive for excellence in all aspects of our work, ensuring high standards and maintaining a strong focus on quality.
Collaboration Style:
- Cross-Functional: Work closely with software engineers, product teams, and other SREs to drive cross-cutting initiatives and elevate Doctolib's operational maturity.
- Mentoring: Provide technical guidance and support to senior engineers, helping them grow and develop their skills in reliability and observability engineering.
- Knowledge Sharing: Contribute to Doctolib's internal wiki, blog, and other knowledge-sharing platforms to ensure best practices and lessons learned are accessible to the entire organization.
Enhancement Note: Doctolib's culture values innovation, collaboration, and a patient-centric approach. The company encourages continuous learning and growth, making it an ideal environment for a strategic SRE role focused on driving change and improving reliability.
Challenges & Growth Opportunities
Technical Challenges:
- Scalability: Design and implement scalable logging, tracing, and alerting solutions that can support Doctolib's growing user base and infrastructure.
- Complexity: Navigate Doctolib's complex, microservices-based architecture and ensure observability, reliability, and performance across the entire platform.
- Interoperability: Integrate and manage multiple observability tools, ensuring they work seamlessly together and provide a comprehensive view of Doctolib's infrastructure.
- Emerging Technologies: Stay up-to-date with the latest observability and reliability engineering trends, and evaluate new tools and technologies for potential integration into Doctolib's stack.
Learning & Development Opportunities:
- Technical Training: Participate in workshops, conferences, and online courses to deepen your expertise in observability, reliability engineering, and related technologies.
- Mentoring: Serve as a mentor to junior engineers, helping them develop their skills and advance their careers within Doctolib's organization.
- Leadership Development: Attend leadership training programs and workshops to enhance your management, communication, and strategic decision-making skills.
Enhancement Note: As a strategic SRE role focused on driving change and improving reliability, you will face numerous technical challenges and growth opportunities. Embrace these as opportunities to learn, grow, and make a significant impact on Doctolib's platform and user experience.
Interview Preparation
Technical Questions:
- Observability Architecture: Describe your approach to designing and implementing scalable, developer-friendly logging, tracing, and alerting solutions in a large-scale, cloud-native environment.
- Incident Response: Walk through your process for detecting, responding to, and analyzing incidents, with a focus on minimizing downtime and ensuring actionable telemetry.
- System Design: Present a system design for a large-scale, microservices-based architecture, focusing on observability, reliability, and performance considerations.
- Leadership & Influence: Discuss your experience driving consensus, mentoring engineers, and influencing strategic decisions in a dynamic, cross-functional environment.
Company & Culture Questions:
- Patient-Centric Focus: Explain how you would ensure that Doctolib's observability and reliability initiatives align with the company's patient-centric mission and values.
- Innovation & Collaboration: Describe your approach to driving innovation and collaboration within a cross-functional team, with a focus on elevating Doctolib's operational maturity.
- Work-Life Balance: Discuss how you maintain a healthy work-life balance, especially in a role that requires on-call rotations and may involve working outside of standard business hours.
Portfolio Presentation Strategy:
- Storytelling: Craft a compelling narrative that highlights your experience driving observability and reliability improvements in large-scale, cloud-native environments.
- Case Studies: Prepare detailed case studies or blog posts that demonstrate your expertise in observability tooling, architecture, and best practices.
- Live Demonstrations: Be ready to walk through your portfolio live, showcasing your ability to lead technical initiatives, mentor engineers, and influence strategic decisions.
Enhancement Note: To excel in the interview process, focus on demonstrating your strategic thinking, technical expertise, and ability to drive change in a dynamic, cross-functional environment. Tailor your responses to Doctolib's patient-centric mission and values, and be prepared to discuss your approach to innovation, collaboration, and work-life balance.
Application Steps
To apply for this Staff SRE - Observability (x/f/m) position at Doctolib:
- Customize Your Resume: Highlight your experience with observability tooling, architecture, and best practices, as well as your ability to lead technical initiatives, mentor engineers, and influence strategic decisions.
- Tailor Your Cover Letter: Explain your motivation for applying to this role and how your background and skills make you a strong fit for Doctolib's organization and mission.
- Prepare Your Portfolio: Showcase your experience driving observability and reliability improvements in large-scale, cloud-native environments, with a focus on logging, tracing, and alerting strategies.
- Research Doctolib: Familiarize yourself with Doctolib's technology stack, industry context, and company culture, ensuring you can articulate your alignment with the organization's mission and values.
- Practice for Interviews: Brush up on your technical skills, system design knowledge, and communication abilities, focusing on your ability to drive change in a dynamic, cross-functional environment.
Important Notice: This enhanced job description includes AI-generated insights and web development/server administration industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
Application Requirements
Candidates should have extensive experience (8+ years) in SRE or related roles within cloud-native environments. Deep expertise in observability tooling and a strong systems engineering background are essential.