Senior Engineering Manager, SRE
📍 Job Overview
- Job Title: Senior Engineering Manager, SRE
- Company: Nexthink
- Location: Madrid, Madrid, Spain
- Job Type: Full-time
- Category: DevOps & SRE
- Date Posted: June 21, 2025
- Experience Level: 10+ years
- Remote Status: Hybrid
🚀 Role Summary
- Lead and drive the adoption of SRE industry best practices in a security- and compliance-centric delivery model.
- Collaborate with development, architecture, and security teams to ensure high-quality products and enterprise-grade practices.
- Manage and inspire a proficient cloud engineering and SRE team to meet or exceed business-defined SLAs.
📝 Enhancement Note: This role requires a strong background in cloud operations engineering and a solid understanding of SRE principles to ensure system reliability and performance at scale.
💻 Primary Responsibilities
- Incident Response & Proactive Monitoring: Oversee all operations and SRE functions, including incident response and proactive monitoring, to ensure system availability and performance.
- Capacity Forecasting & Change Management: Drive capacity forecasting and change management processes to anticipate and address system demands and changes.
- Automation & Infrastructure-as-Code: Implement automation for delivery and operations of platform services using infrastructure-as-code and monitoring-as-code to streamline processes and improve efficiency.
- Service Availability & Scalability: Build and manage service availability, performance, and scalability in production environments to meet business-defined SLAs.
- Collaboration & Stakeholder Management: Collaborate with application and business stakeholders to ensure a high-quality product is developed and deployed in production, and work closely with architecture and security teams to define and implement enterprise-grade practices.
- Compliance & Evidence-Gathering: Own and drive compliance and evidence-gathering activities for audits and regulated deployments to ensure the organization meets all relevant standards and regulations.
- Team Management & Development: Recruit, manage, and inspire a proficient cloud engineering and SRE team, fostering a customer-focused SRE-driven operations culture.
📝 Enhancement Note: This role requires a balance of technical expertise and leadership skills to manage and grow an SRE team, while also collaborating with various stakeholders to ensure system reliability and performance.
🎓 Skills & Qualifications
Education: Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent professional experience.
Experience: 10+ years in cloud operations engineering leadership roles in SaaS companies, with 5+ years in a senior management/leadership role leading large SRE and cloud operations teams.
Required Skills:
- Deep understanding of, and experience working with, one of the three major cloud service providers (AWS, Azure, or GCP), running cloud-native technologies based on Docker, Kubernetes, Istio, and Kafka at scale.
- Experience working with modern CI/CD and automation tools such as Jenkins, Ansible, Terraform, etc.
- Experience building, scaling, and monitoring infrastructure needed for SaaS-based applications and services, with proficiency in APM and infrastructure monitoring tools such as Datadog, New Relic, Sumo Logic, Splunk, Dynatrace, etc.
- Experience managing teams on 24x7 on-call rotations serving global customers.
- Experience creating a strong and passionate customer-focused SRE-driven operations culture.
- Excellent interpersonal and communication skills in English.
Preferred Skills:
- Experience operating workloads in a secured, highly regulated environment such as FedRAMP.
- Knowledge of lean and agile software engineering best practices.
📝 Enhancement Note: Candidates with experience in highly regulated environments and a strong background in agile methodologies may have an advantage in this role.
📊 Web Portfolio & Project Requirements
Portfolio Essentials:
- SRE & Cloud Operations Portfolio: Demonstrate your experience in SRE and cloud operations by showcasing projects that highlight your ability to manage and improve system reliability, performance, and scalability.
- Incident Response & Monitoring: Include case studies or examples of incident response and monitoring strategies you've implemented to ensure system availability and performance.
- Automation & Infrastructure-as-Code: Highlight your experience with automation and infrastructure-as-code by showcasing projects that demonstrate your ability to streamline processes and improve efficiency.
- Compliance & Evidence-Gathering: Showcase your experience with compliance and evidence-gathering activities by including examples of audits or regulated deployments you've successfully managed.
Technical Documentation:
- Documentation Standards: Demonstrate your ability to document technical processes and procedures using clear, concise, and well-organized documentation standards.
- Version Control & Deployment Processes: Highlight your experience with version control systems and deployment processes by including examples of how you've managed and optimized these processes in previous roles.
- Testing Methodologies & Performance Metrics: Showcase your experience with testing methodologies and performance metrics by including examples of how you've used these to improve system reliability, performance, and scalability.
📝 Enhancement Note: Candidates should focus on presenting a portfolio that demonstrates their technical expertise and leadership skills in SRE and cloud operations, with a strong emphasis on incident response, automation, and compliance.
💵 Compensation & Benefits
Salary Range: €120,000 - €160,000 per year (based on regional market research and industry benchmarks for senior SRE management roles in Madrid)
Benefits:
- Permanent Contract and a competitive compensation package (Stock Options also included).
- Amazing centrally located offices near the Bernabeu Stadium.
- Private Health Insurance (Sanitas) and daily meal vouchers of €11, both fully covered by the company.
- Hybrid work model balancing office and remote work, with a structured approach for new hires to foster connections and onboarding.
- Flexible hours and unlimited paid time off on top of the 23 days of holidays offered, plus 3 company-paid volunteer days.
- Up to €25 per month for a gym subscription.
- Flexible compensation plan for childcare & public transportation.
- Reimbursement of up to 50% of the cost of English & Spanish classes.
- Fresh fruit, cookies, soft drinks, and protein shakes at the office.
- Regular company and team events like Pizza talks, Team Building activities, Christmas parties, hosting Meetups at the office, and more!
- Bonuses for referring successful hires after three months of continuous employment.
- Relocation package for people coming from another country.
📝 Enhancement Note: The salary range provided is an estimate based on regional market research and industry benchmarks for senior SRE management roles in Madrid. The actual salary may vary depending on the candidate's experience, skills, and qualifications.
🎯 Team & Company Context
🏢 Company Culture
Industry: Digital Employee Experience Management Software
Company Size: 1,300+ employees across 5 continents
Founded: 2004
Team Structure:
- The SRE team is responsible for ensuring system reliability, performance, and scalability in production environments.
- The team works closely with development, architecture, and security teams to define and implement enterprise-grade practices.
- The team consists of cloud engineers and SRE professionals with a strong focus on incident response, automation, and compliance.
Development Methodology:
- Nexthink operates as One Team, connecting, collaborating, and innovating to continuously grow.
- The company uses agile methodologies to drive software development and delivery.
- The SRE team uses a combination of on-call rotations, incident response, and monitoring strategies to ensure system reliability and performance.
Company Website: Nexthink
📝 Enhancement Note: Nexthink's company culture emphasizes collaboration, innovation, and continuous growth, with a strong focus on digital employee experience management. The SRE team plays a critical role in ensuring the reliability, performance, and scalability of the company's software products.
📈 Career & Growth Analysis
Web Technology Career Level: Senior Engineering Manager, SRE
- Responsible for overseeing all operations and SRE functions, including incident response and monitoring.
- Drives capacity forecasting and automation, and ensures service availability, performance, and compliance in production environments.
- Manages and inspires a proficient cloud engineering and SRE team, fostering a customer-focused SRE-driven operations culture.
Reporting Structure: Reports directly to the VP of Engineering and works closely with development, architecture, and security teams.
Technical Impact: Ensures system reliability, performance, and scalability in production environments, enabling business-defined SLAs and driving enterprise-grade practices.
Growth Opportunities:
- Technical Growth: Expand your expertise in cloud technologies, SRE best practices, and enterprise-grade security practices.
- Leadership Development: Develop your leadership skills by managing and growing a team of cloud engineers and SRE professionals.
- Architecture & Decision-Making: Gain experience in defining and implementing enterprise-grade practices, and make critical decisions that impact system reliability, performance, and scalability.
📝 Enhancement Note: This role offers significant growth opportunities for candidates looking to expand their technical expertise in SRE and cloud operations, as well as develop their leadership skills in managing and growing a team of cloud engineers and SRE professionals.
🌐 Work Environment
Office Type: Hybrid work model balancing office and remote work, with a structured approach for new hires to foster connections and onboarding.
Office Location(s): Madrid, Spain
Workspace Context:
- Collaborative Workspace: The Madrid office is centrally located near the Bernabeu Stadium, providing a collaborative workspace for employees to connect, collaborate, and innovate.
- Development Tools & Resources: Nexthink provides access to modern development tools, resources, and infrastructure to support cloud engineers and SRE professionals in their work.
- Cross-Functional Collaboration: The SRE team works closely with development, architecture, and security teams to ensure system reliability, performance, and scalability in production environments.
Work Schedule: Flexible hours, with unlimited paid time off on top of the 23 days of holidays offered, plus 3 company-paid volunteer days.
📝 Enhancement Note: Nexthink's hybrid work model and flexible hours provide employees with the work-life balance they need to thrive, while also fostering a collaborative and innovative work environment.
📄 Application & Technical Interview Process
Interview Process:
- Technical Phone Screen: A 30-minute phone screen to assess your technical expertise in SRE and cloud operations, with a focus on incident response, automation, and compliance.
- On-Site Technical Deep Dive: A 4-hour on-site technical deep dive to assess your ability to manage and improve system reliability, performance, and scalability, with a focus on incident response, automation, and compliance.
- Behavioral & Cultural Fit Interview: A 1-hour behavioral and cultural fit interview to assess your leadership skills, communication style, and cultural fit within the SRE team and Nexthink as a whole.
- Final Decision: A final decision will be made based on your technical expertise, leadership skills, and cultural fit within the SRE team and Nexthink.
Portfolio Review Tips:
- SRE & Cloud Operations Portfolio: Highlight your experience in SRE and cloud operations by showcasing projects that demonstrate your ability to manage and improve system reliability, performance, and scalability.
- Incident Response & Monitoring: Include case studies or examples of incident response and monitoring strategies you've implemented to ensure system availability and performance.
- Automation & Infrastructure-as-Code: Highlight your experience with automation and infrastructure-as-code by showcasing projects that demonstrate your ability to streamline processes and improve efficiency.
- Compliance & Evidence-Gathering: Showcase your experience with compliance and evidence-gathering activities by including examples of audits or regulated deployments you've successfully managed.
Technical Challenge Preparation:
- Incident Response & Monitoring: Brush up on your incident response and monitoring skills, and be prepared to discuss strategies you've used to ensure system availability and performance in previous roles.
- Automation & Infrastructure-as-Code: Familiarize yourself with modern automation and infrastructure-as-code tools and best practices, and be prepared to discuss how you've used these to streamline processes and improve efficiency in previous roles.
- Compliance & Evidence-Gathering: Review your experience with compliance and evidence-gathering activities, and be prepared to discuss how you've ensured that your organization meets all relevant standards and regulations in previous roles.
📝 Enhancement Note: The interview process for this role is designed to assess the candidate's technical expertise in SRE and cloud operations, as well as their leadership skills and cultural fit within the SRE team and Nexthink as a whole. Candidates should focus on presenting a portfolio that demonstrates their technical expertise and leadership skills in SRE and cloud operations, with a strong emphasis on incident response, automation, and compliance.
🛠 Technology Stack & Web Infrastructure
Cloud Service Providers:
- AWS, Azure, or GCP (depending on the candidate's experience)
Cloud Technologies:
- Docker, Kubernetes, Istio, Kafka (depending on the candidate's experience)
Infrastructure-as-Code & Automation Tools:
- Jenkins, Ansible, Terraform, etc. (depending on the candidate's experience)
APM & Infrastructure Monitoring Tools:
- Datadog, New Relic, Sumo Logic, Splunk, Dynatrace, etc. (depending on the candidate's experience)
📝 Enhancement Note: The technology stack for this role will depend on the candidate's experience and the specific cloud service provider, cloud technologies, infrastructure-as-code tools, and APM & infrastructure monitoring tools they have experience with.
👥 Team Culture & Values
SRE & Cloud Operations Values:
- Reliability: A strong focus on ensuring system reliability, performance, and scalability in production environments.
- Automation: Leveraging automation and infrastructure-as-code to streamline processes and improve efficiency.
- Incident Response: A proactive approach to incident response and monitoring to ensure system availability and performance.
- Compliance: A commitment to ensuring that the organization meets all relevant standards and regulations.
- Customer-Focused: A dedication to understanding and addressing the needs of internal and external customers.
Collaboration Style:
- Cross-Functional Collaboration: The SRE team works closely with development, architecture, and security teams to ensure system reliability, performance, and scalability in production environments.
- Knowledge Sharing: A culture of knowledge sharing and continuous learning, with a focus on staying up-to-date with the latest SRE best practices and cloud technologies.
- Technical Mentoring: Technical mentoring and coaching opportunities to help cloud engineers and SRE professionals grow and develop their skills.
📝 Enhancement Note: Nexthink's SRE team values a strong focus on reliability, automation, incident response, compliance, and customer focus, with a dedication to cross-functional collaboration, knowledge sharing, and technical mentoring.
⚡ Challenges & Growth Opportunities
Technical Challenges:
- Incident Response & Monitoring: Developing and implementing incident response and monitoring strategies to ensure system availability and performance in a highly dynamic and complex environment.
- Automation & Infrastructure-as-Code: Streamlining processes and improving efficiency through automation and infrastructure-as-code, while also ensuring system reliability, performance, and scalability.
- Compliance & Evidence-Gathering: Ensuring that the organization meets all relevant standards and regulations, while also maintaining system reliability, performance, and scalability.
- Emerging Technologies: Staying up-to-date with the latest SRE best practices and cloud technologies, and integrating these into the organization's technology stack.
Learning & Development Opportunities:
- Technical Skill Development: Expanding your expertise in SRE and cloud operations, with a focus on incident response, automation, and compliance.
- Leadership Development: Developing your leadership skills by managing and growing a team of cloud engineers and SRE professionals.
- Architecture & Decision-Making: Gaining experience in defining and implementing enterprise-grade practices, and making critical decisions that impact system reliability, performance, and scalability.
📝 Enhancement Note: This role offers significant technical challenges and growth opportunities for candidates looking to expand their expertise in SRE and cloud operations, as well as develop their leadership skills in managing and growing a team of cloud engineers and SRE professionals.
💡 Interview Preparation
Technical Questions:
- Incident Response & Monitoring: "Can you describe a complex incident you've responded to in the past, and how you ensured system availability and performance in the aftermath?"
- Automation & Infrastructure-as-Code: "How have you used automation and infrastructure-as-code to streamline processes and improve efficiency in previous roles? Can you provide specific examples?"
- Compliance & Evidence-Gathering: "Can you describe a time when you had to ensure that your organization met all relevant standards and regulations? What steps did you take to gather evidence and demonstrate compliance?"
Company & Culture Questions:
- Company Culture: "How do you see yourself fitting into Nexthink's company culture, and how would you contribute to our collaborative and innovative work environment?"
- Team Dynamics: "How do you approach working with cross-functional teams, and how would you ensure effective collaboration with development, architecture, and security teams?"
- Leadership Style: "Can you describe your leadership style, and how you would manage and grow a team of cloud engineers and SRE professionals?"
📌 Application Steps
To apply for this Senior Engineering Manager, SRE position at Nexthink:
- Tailor Your Resume: Highlight your relevant experience in SRE and cloud operations, with a focus on incident response, automation, and compliance. Include specific examples of your technical achievements and leadership skills.
- Prepare Your Portfolio: Showcase your experience in SRE and cloud operations by including projects that demonstrate your ability to manage and improve system reliability, performance, and scalability. Highlight your incident response, automation, and compliance skills, and include examples of audits or regulated deployments you've successfully managed.
- Research Nexthink: Familiarize yourself with Nexthink's company culture, values, and technology stack. Understand how the SRE team fits into the organization's overall structure and how you can contribute to its success.
- Practice Technical Interview Questions: Brush up on your incident response, automation, and compliance skills, and practice answering technical interview questions related to these topics. Be prepared to discuss your leadership style and how you would manage and grow a team of cloud engineers and SRE professionals.
⚠️ Important Notice: This enhanced job description includes AI-generated insights and web development/server administration industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
ATS Keywords:
Programming Languages:
- Python, Bash, PowerShell, Go, Java, C#, etc. (depending on the candidate's experience)
Web Frameworks & Libraries:
- (Not applicable for this role)
Server Technologies:
- AWS, Azure, GCP (depending on the candidate's experience)
Databases:
- (Not applicable for this role)
Tools:
- Jenkins, Ansible, Terraform, Datadog, New Relic, Sumo Logic, Splunk, Dynatrace, etc. (depending on the candidate's experience)
Methodologies:
- Agile, Scrum, Kanban, Lean, DevOps, SRE, etc. (depending on the candidate's experience)
Soft Skills:
- Leadership, Communication, Teamwork, Problem-Solving, Decision-Making, etc. (depending on the candidate's experience)
Industry Terms:
- Cloud Operations, Site Reliability Engineering, Incident Response, Monitoring, Automation, Infrastructure-as-Code, Compliance, etc. (depending on the candidate's experience)
ATS Keyword Distribution: These keywords are naturally integrated throughout the job description, with a focus on incident response, automation, and compliance in SRE and cloud operations.
Application Requirements
Candidates should have a degree in Computer Science or Engineering and over 10 years of experience in cloud operations engineering leadership roles. A deep understanding of cloud technologies and experience with CI/CD and monitoring tools are essential.