Site Reliability Engineer

Nexthink
Full-time, Madrid, Spain

📍 Job Overview

  • Job Title: Site Reliability Engineer
  • Company: Nexthink
  • Location: Madrid, Spain
  • Job Type: Full-time
  • Category: DevOps & Infrastructure
  • Date Posted: June 11, 2025
  • Experience Level: Mid-Senior Level (2-5 years)
  • Remote Status: Hybrid (2 days per week in the office in Madrid, Spain)

🚀 Role Summary

  • 📝 Enhancement Note: This role focuses on maintaining and optimizing Nexthink's cloud infrastructure, ensuring high availability and performance for global customers. It involves managing Kubernetes clusters, automating operations, and collaborating with cross-functional teams.

💻 Primary Responsibilities

  • 📝 Enhancement Note: The primary responsibilities revolve around managing and maintaining Kubernetes clusters, automating routine tasks, and ensuring high system reliability and availability.

  • Manage and maintain Kubernetes clusters to ensure stability, scalability, and high availability, accommodating increasing demands.

  • Automate routine tasks and implement infrastructure as code (IaC) practices to facilitate rapid and reliable deployments, ensuring efficient resource provisioning and management.

  • Participate in an on-call rotation to provide prompt responses and resolution to critical incidents, maintaining the cloud infrastructure's uptime.

  • Proactively identify and troubleshoot system anomalies by collaborating with other teams to address incidents and implement preventive measures to reduce downtime.

  • Set up and maintain comprehensive monitoring and alerting systems to detect anomalies, capacity constraints, and potential performance bottlenecks, ensuring timely responses to alerts and alarms.

  • Continuously assess and optimize the performance of cloud infrastructure and applications to enhance system efficiency and reduce response times.

  • Maintain accurate and up-to-date documentation of processes, procedures, and troubleshooting guides to facilitate knowledge sharing and standardization.

🎓 Skills & Qualifications

Education: Bachelor's degree in Computer Science, Engineering, or a related field. Relevant experience may be considered in lieu of a degree.

Experience: Proven experience (2-5 years) in managing Kubernetes clusters in a production environment.

Required Skills:

  • Strong hands-on experience in managing Kubernetes clusters in a production environment.
  • Knowledge of configuration automation (Ansible), CI/CD (Jenkins), and IaC (Terraform, Crossplane) for infrastructure management, plus proficiency in at least one scripting language (Bash, Python).
  • Familiarity with source code management solutions (GitHub, Bitbucket) and the Atlassian suite (JIRA, Confluence).
  • Experience working in an on-call rotation environment and running operations.
  • Proven problem-solving skills and the ability to troubleshoot complex technical issues.
  • Deep commitment to maintaining high system reliability and availability.
  • Experience with AWS cloud computing platform and related services.
  • Basic knowledge of Kafka (MSK) is a plus.

Preferred Skills:

  • Excellent communication and collaboration skills to work effectively with cross-functional teams.
  • Excellent English communication skills.

📊 Portfolio & Project Requirements

Portfolio Essentials:

  • Demonstrate your experience in managing Kubernetes clusters with case studies showcasing your problem-solving skills and the impact you've made on system reliability and performance.
  • Highlight your automation and IaC skills with examples of scripts, tools, or projects that have streamlined operations and improved efficiency.
  • Showcase your monitoring and alerting expertise with examples of systems you've implemented to detect anomalies and ensure timely responses to alerts.

Technical Documentation:

  • Provide documentation of your processes, procedures, and troubleshooting guides to demonstrate your commitment to knowledge sharing and standardization.
  • Include any relevant certifications or training in Kubernetes, AWS, or related technologies to showcase your expertise.

💵 Compensation & Benefits

Salary Range: €45,000 - €65,000 per year (Estimated based on Madrid market standards for mid-senior level DevOps roles)

Benefits:

  • Permanent Contract and a competitive compensation package (Stock Options also included).
  • Amazing centrally located offices near the Bernabeu Stadium.
  • Private Health Insurance (Sanitas) and daily meal vouchers of €11 entirely covered by the company.
  • Hybrid work model balancing office and remote work, with a structured approach for new hires to foster connections and onboarding.
  • Flexible hours and unlimited paid time off on top of the 23 days of holiday offered, plus 3 company-paid volunteer days.
  • Up to €25 per month for a gym subscription.
  • Flexible compensation plan (retribución flexible) for kindergarten & transport tickets.
  • Reimbursement of up to 50% of the cost of English & Spanish classes.
  • Fresh fruit, cookies, soft drinks, and protein shakes at the office.
  • Regular company and team events like Pizza talks, Team Building activities, Christmas parties, hosting Meetups at the office, and more!
  • Bonuses for referring successful hires after three months of continuous employment.
  • Relocation package for people coming from another country.

🎯 Team & Company Context

🏢 Company Culture

Industry: Nexthink is the leader in digital employee experience management software, providing IT leaders with unprecedented insight into issues impacting employees across applications and networks.

Company Size: Nexthink has over 1,300 customers and 1,000 employees across 5 continents, operating as One Team with a commitment to diversity, inclusion, and equity.

Founded: 2004

Team Structure:

  • The Site Reliability Engineering team is responsible for maintaining and optimizing Nexthink's cloud infrastructure, ensuring high availability and performance for global customers.
  • The team collaborates closely with other departments, such as Software Development, IT, and Customer Success, to address incidents and implement preventive measures.

Development Methodology:

  • Nexthink follows Agile methodologies, focusing on continuous improvement, collaboration, and customer value.
  • The team uses JIRA for project management and GitHub for version control and collaborative development.
  • Infrastructure as Code (IaC) practices are employed to facilitate rapid and reliable deployments, ensuring efficient resource provisioning and management.

Company Website: https://www.nexthink.com/

📝 Enhancement Note: Nexthink's culture emphasizes innovation, collaboration, and continuous learning. The company values diversity and inclusion, with over 75 nationalities represented among its employees.

📈 Career & Growth Analysis

Career Level: This role is at the mid-senior level, focusing on managing and optimizing cloud infrastructure to ensure high availability and performance for global customers.

Reporting Structure: The Site Reliability Engineer reports directly to the Head of Site Reliability Engineering and collaborates closely with other teams, such as Software Development, IT, and Customer Success.

Technical Impact: The Site Reliability Engineer plays a crucial role in maintaining and optimizing Nexthink's cloud infrastructure, ensuring high availability and performance for global customers. Their work directly impacts the user experience and satisfaction of Nexthink's customers.

Growth Opportunities:

  • Technical Growth: Develop expertise in cloud infrastructure management, Kubernetes, and related technologies. Explore opportunities to specialize in specific areas, such as performance optimization, monitoring, or automation.
  • Leadership Development: Gain experience in mentoring junior team members and contributing to the team's growth and success. Explore opportunities to take on more significant responsibilities and lead projects or initiatives.
  • Architecture Decisions: Contribute to the design and implementation of Nexthink's cloud infrastructure, making strategic decisions that impact the company's technical direction.

📝 Enhancement Note: Nexthink offers a dynamic and challenging work environment, providing ample opportunities for growth and development. The company values internal promotions and supports employees in advancing their careers within the organization.

🌐 Work Environment

Office Type: Nexthink's Madrid office is centrally located near the Bernabeu Stadium, offering a modern and collaborative work environment.

Office Location(s): Madrid, Spain

Workspace Context:

  • The Site Reliability Engineer will work in a collaborative environment with other DevOps, IT, and Software Development team members.
  • The team uses state-of-the-art tools and technologies to manage and optimize Nexthink's cloud infrastructure.
  • The office provides ample space for team meetings, brainstorming sessions, and social events.

Work Schedule: The hybrid work model balances office and remote work, with a structured approach for new hires to foster connections and onboarding. The standard workweek is Monday to Friday, with flexible hours and unlimited vacation.

📝 Enhancement Note: Nexthink's work environment fosters collaboration, innovation, and continuous learning. The company offers a flexible and supportive work environment, allowing employees to balance their personal and professional lives.

📄 Application & Technical Interview Process

Interview Process:

  1. Online Assessment: Complete an online assessment to evaluate your technical skills and problem-solving abilities. Focus on Kubernetes, AWS, and related technologies.
  2. Technical Phone Screen: Participate in a phone or video call with a member of the Site Reliability Engineering team to discuss your experience, qualifications, and career goals. Be prepared to answer technical questions related to Kubernetes, AWS, and related technologies.
  3. On-site Interview: Visit Nexthink's Madrid office for an on-site interview with the Site Reliability Engineering team. Expect to discuss your portfolio, case studies, and technical challenges in-depth. Be prepared to present your problem-solving approach and demonstrate your expertise in managing Kubernetes clusters and optimizing cloud infrastructure.
  4. Final Decision: Receive a final decision from the hiring manager, and if successful, proceed to the onboarding process.

Portfolio Review Tips:

  • Highlight your experience in managing Kubernetes clusters, automating operations, and ensuring high system reliability and availability.
  • Showcase your problem-solving skills with case studies demonstrating your ability to troubleshoot complex technical issues and optimize cloud infrastructure.
  • Include any relevant certifications or training in Kubernetes, AWS, or related technologies to showcase your expertise.

Technical Challenge Preparation:

  • Review your knowledge of Kubernetes, AWS, and related technologies, focusing on managing Kubernetes clusters, automating operations, and ensuring high system reliability and availability.
  • Practice explaining your problem-solving approach and demonstrating your expertise in managing Kubernetes clusters and optimizing cloud infrastructure.
  • Familiarize yourself with Nexthink's products and services, understanding the company's mission and values.

📝 Enhancement Note: Nexthink's interview process focuses on evaluating candidates' technical skills, problem-solving abilities, and cultural fit. The company values candidates who can demonstrate their expertise in managing Kubernetes clusters and optimizing cloud infrastructure.

🛠 Technology Stack & Infrastructure

Backend & Server Technologies:

  • Kubernetes: Manage and maintain Kubernetes clusters to ensure stability, scalability, and high availability, accommodating increasing demands.
  • AWS: Utilize AWS cloud computing platform and related services to manage and optimize Nexthink's cloud infrastructure.
  • GitOps: Implement GitOps practices to automate routine tasks and streamline operations, ensuring efficient resource provisioning and management.
  • CI/CD: Use CI/CD pipelines to automate deployment processes and ensure efficient resource management.
  • Monitoring Tools: Set up and maintain comprehensive monitoring and alerting systems to detect anomalies, capacity constraints, and potential performance bottlenecks.

Development & DevOps Tools:

  • Ansible: Use Ansible to automate the configuration of Kubernetes clusters and related infrastructure.
  • Terraform: Implement IaC practices with Terraform to facilitate rapid and reliable deployments, ensuring efficient resource provisioning and management.
  • Jenkins: Utilize Jenkins for CI/CD pipeline automation and deployment processes.
  • GitHub: Use GitHub for version control and collaborative development.
  • JIRA: Use JIRA for project management and issue tracking.

📝 Enhancement Note: Nexthink's technology stack focuses on cloud infrastructure management, ensuring high availability and performance for global customers. The company uses cutting-edge tools and technologies to maintain and optimize its cloud infrastructure.

👥 Team Culture & Values

Engineering Values:

  • Innovation: Nexthink values innovation and encourages its employees to explore new technologies and approaches to improve cloud infrastructure management and optimize performance.
  • Collaboration: The company fosters a collaborative work environment, encouraging team members to share knowledge and learn from one another.
  • Continuous Learning: Nexthink supports its employees' professional development and encourages them to pursue relevant certifications and training opportunities.
  • Customer Focus: The company prioritizes the user experience and strives to provide exceptional service to its global customers.

Collaboration Style:

  • Cross-Functional Integration: The Site Reliability Engineering team collaborates closely with other departments, such as Software Development, IT, and Customer Success, to address incidents and implement preventive measures.
  • Code Review Culture: The team follows best practices for code reviews, ensuring high-quality standards and knowledge sharing.
  • Knowledge Sharing: Nexthink encourages its employees to share their expertise and learn from one another, fostering a culture of continuous learning and improvement.

⚡ Challenges & Growth Opportunities

Technical Challenges:

  • Kubernetes Cluster Management: Manage and maintain Kubernetes clusters to ensure stability, scalability, and high availability, accommodating increasing demands.
  • Automation and IaC: Automate routine tasks and implement IaC practices to facilitate rapid and reliable deployments, ensuring efficient resource provisioning and management.
  • Monitoring and Alerting: Set up and maintain comprehensive monitoring and alerting systems to detect anomalies, capacity constraints, and potential performance bottlenecks, ensuring timely responses to alerts and alarms.
  • Performance Optimization: Continuously assess and optimize the performance of cloud infrastructure and applications to enhance system efficiency and reduce response times.

Learning & Development Opportunities:

  • Technical Skill Development: Develop expertise in cloud infrastructure management, Kubernetes, and related technologies. Explore opportunities to specialize in specific areas, such as performance optimization, monitoring, or automation.
  • Certifications and Training: Pursue relevant certifications and training opportunities in Kubernetes, AWS, or related technologies to showcase your expertise and advance your career.
  • Mentorship and Leadership: Gain experience in mentoring junior team members and contributing to the team's growth and success. Explore opportunities to take on more significant responsibilities and lead projects or initiatives.

💡 Interview Preparation

Technical Questions:

  • Kubernetes Cluster Management: Prepare for questions related to managing Kubernetes clusters, ensuring stability, scalability, and high availability. Be ready to discuss your experience in managing Kubernetes clusters and optimizing cloud infrastructure.
  • Automation and IaC: Expect questions about your experience with automation and IaC practices, focusing on tools like Ansible, Terraform, and Jenkins. Be prepared to discuss your approach to automating routine tasks and streamlining operations.
  • Monitoring and Alerting: Prepare for questions related to setting up and maintaining comprehensive monitoring and alerting systems. Be ready to discuss your experience in detecting anomalies, capacity constraints, and potential performance bottlenecks.

Company & Culture Questions:

  • Nexthink's Mission and Values: Familiarize yourself with Nexthink's mission and values, and be prepared to discuss how your experience and skills align with the company's goals and culture.
  • Team Dynamics: Expect questions about your experience working in a collaborative environment, and be ready to discuss your approach to knowledge sharing and continuous learning.
  • Problem-Solving Approach: Prepare for questions that assess your problem-solving skills and ability to troubleshoot complex technical issues. Be ready to discuss your approach to managing Kubernetes clusters and optimizing cloud infrastructure.

Portfolio Presentation Strategy:

  • Kubernetes Cluster Management: Highlight your experience in managing Kubernetes clusters, demonstrating your ability to ensure stability, scalability, and high availability.
  • Automation and IaC: Showcase your automation and IaC skills with examples of scripts, tools, or projects that have streamlined operations and improved efficiency.
  • Monitoring and Alerting: Include examples of monitoring and alerting systems you've implemented to detect anomalies and ensure timely responses to alerts and alarms.

📌 Application Steps

To apply for this Site Reliability Engineer position at Nexthink:

  1. Submit your application through the application link provided in the job listing.
  2. Tailor your portfolio with case studies and examples showcasing your experience in managing Kubernetes clusters, automating operations, and ensuring high system reliability and availability.
  3. Tailor your resume to infrastructure and reliability roles, highlighting your project experience and technical skills relevant to cloud infrastructure management and optimization.
  4. Prepare for technical interviews by reviewing your knowledge of Kubernetes, AWS, and related technologies, focusing on managing Kubernetes clusters, automating operations, and ensuring high system reliability and availability.
  5. Research Nexthink to understand the company's mission, values, and culture, focusing on the user experience and customer success.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and web development/server administration industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.

Application Requirements

Strong hands-on experience in managing Kubernetes clusters and knowledge of configuration automation and CI/CD practices are required. Familiarity with AWS and excellent communication skills are also essential.