Senior Site Reliability Engineer II

Elsevier
Full-time, London, United Kingdom

📍 Job Overview

  • Job Title: Senior Site Reliability Engineer II
  • Company: Elsevier
  • Location: London, Oxford
  • Job Type: Full-Time
  • Category: DevOps, Infrastructure
  • Date Posted: 2025-07-03
  • Experience Level: Mid-Senior level
  • Remote Status: On-site

🚀 Role Summary

  • Design and manage cloud platforms to ensure reliability and performance of complex systems.
  • Collaborate with development and operations teams to optimize applications for cloud environments.
  • Mentor junior team members and promote best practices in SRE, automation, and cloud architecture.

💻 Primary Responsibilities

  • Cloud Platform Design & Management:

    • Design, deploy, and maintain highly available, scalable Kubernetes clusters on AWS EKS.
    • Manage and optimize cross-portfolio cloud infrastructure using AWS services and supported organizational tooling.
    • Develop and maintain Infrastructure as Code (IaC) solutions to automate provisioning and management of cloud and Kubernetes resources.
  • Automation & Incident Management:

    • Write automation to streamline operational workflows, incident response, and infrastructure management.
    • Implement CI/CD pipelines to facilitate deployments, testing, and validation.
    • Monitor system health, instrument system components, troubleshoot issues, and perform root cause analysis.
  • Collaboration & Mentoring:

    • Collaborate with development and operations teams to optimize applications for cloud environments.
    • Mentor junior team members and promote best practices in SRE, automation, and cloud architecture.
    • Stay current with industry trends, emerging technologies, and best practices for cloud-native and infrastructure management.

🎓 Skills & Qualifications

Education:

  • Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).

Experience:

  • Solid professional experience in SRE, DevOps, or cloud infrastructure roles.
  • Relevant certifications (e.g., AWS Solutions Architect, CKA, Terraform Associate) are a plus.

Required Skills:

  • Extensive experience deploying, managing, and troubleshooting containerized applications.
  • Deep understanding of Kubernetes architecture, networking, security, storage, and operational best practices.
  • Proven expertise with AWS services and architectural principles.
  • Strong knowledge of AWS security, compliance, and best practices.
  • Advanced skills in writing modular, reusable IaC components.
  • Strong Python scripting skills for automation, tooling, and data processing.
  • Ability to develop custom solutions for monitoring, automation, and incident management.
  • Experience designing and maintaining CI/CD workflows using GitHub Actions.
  • Experience supporting highly available, multi-regional critical environments.
  • Proven ability to manage multiple portfolios and complex landscapes.
  • Familiarity with monitoring tools such as New Relic.
  • Knowledge of security best practices, network policies, and enterprise-grade RBAC policies.
  • Strong problem-solving, troubleshooting, and incident management skills.
  • Effective communication and collaboration skills.

Preferred Skills:

  • Experience with multi-cloud environments.
  • Familiarity with Terraform or other IaC tools.
  • Knowledge of container security best practices.

📊 Web Portfolio & Project Requirements

Portfolio Essentials:

  • Demonstrate your ability to design, deploy, and manage Kubernetes clusters on AWS EKS.
  • Showcase your automation skills with examples of streamlined operational workflows and incident response processes.
  • Highlight your problem-solving skills with examples of troubleshooting and root cause analysis.

Technical Documentation:

  • Provide documentation for your solutions, ensuring users have clear instructions and support to effectively implement and operate their systems.
  • Include code comments, version control, and deployment processes in your portfolio.

💵 Compensation & Benefits

Salary Range:

  • £70,000 - £90,000 per annum (based on experience and location)

Benefits:

  • Annual Profit Share Bonus
  • Comprehensive Pension Plan
  • Home, Office or Commuting Allowance
  • Generous Vacation Entitlement and option for Sabbatical Leave
  • Maternity, Paternity, Adoption and Family Care Leave
  • Internal Communities and Networks
  • Recruitment Introduction Reward

Working Hours:

  • Full-time (35-40 hours per week)

🎯 Team & Company Context

Company Culture:

  • Elsevier is a global leader in information and analytics, helping researchers and healthcare professionals advance science and improve health outcomes.
  • The company thrives on excellence, innovation, and a strong dedication to customers, employees, and communities.

Team Structure:

  • The team combines software thinking and service operations to enable and run Elsevier’s large-scale, 24x7, distributed and fault-tolerant systems within agreed reliability objectives.
  • The team works closely with development and operations teams to optimize applications for cloud environments.

Development Methodology:

  • The team follows Agile methodologies, with a focus on sprint planning, code review, testing, and quality assurance practices.
  • CI/CD pipelines are used for deployments, testing, and validation.

Company Website: Elsevier

📈 Career & Growth Analysis

Career Level:

  • This role is at the Mid-Senior level, focusing on designing and managing cloud platforms, mentoring junior team members, and driving best practices in SRE, automation, and cloud architecture.

Reporting Structure:

  • The role reports directly to the SRE Manager and collaborates with development and operations teams.

Technical Impact:

  • The role has a significant impact on Elsevier’s large-scale, 24x7 systems, ensuring high availability, performance, and rapid incident resolution.
  • The role also influences the technical direction of the SRE team, driving best practices and mentoring junior team members.

🌐 Work Environment

Office Type:

  • The role is based in London or Oxford, with on-site work required.

Office Location(s):

  • London - London Wall
  • Oxford

Workspace Context:

  • The workspace is collaborative, with a focus on cross-functional integration between developers, designers, and stakeholders.
  • The team uses modern development tools, multiple monitors, and testing devices to ensure high-quality solutions.

Work Schedule:

  • The role requires a full-time commitment, with flexibility for deployment windows, maintenance, and project deadlines.

📄 Application & Technical Interview Process

Interview Process:

  1. Technical Phone Screen: Assesses your understanding of Kubernetes and AWS, and your automation skills.
  2. On-site Technical Deep Dive: Explores your cloud architecture, automation, and incident management skills through hands-on exercises and case studies.
  3. Behavioral Interview: Evaluates your problem-solving, communication, and collaboration skills.
  4. Final Review: Reviews your qualifications, career goals, and cultural fit.

Portfolio Review Tips:

  • Highlight your cloud architecture, automation, and incident management skills, ideally with live demos.
  • Include examples that demonstrate the quality of your code.

Technical Challenge Preparation:

  • Brush up on your Kubernetes, AWS, and automation skills with hands-on exercises and case studies.
  • Practice problem-solving and communication skills for technical interviews.

🛠 Technology Stack & Web Infrastructure

Cloud Platform:

  • AWS (EKS, RDS, Route 53, CloudFront, etc.)

Containerization:

  • Kubernetes

Infrastructure as Code (IaC):

  • Terraform (preferred) or other IaC tools

Monitoring & Logging:

  • New Relic or other monitoring tools

CI/CD:

  • GitHub Actions or other CI/CD tools

Version Control:

  • Git

Scripting:

  • Python

Documentation:

  • Markdown or other documentation tools

👥 Team Culture & Values

Engineering Values:

  • Elsevier values innovation, collaboration, and a strong dedication to customers, employees, and communities.
  • The SRE team emphasizes reliability, performance, and continuous improvement.

Collaboration Style:

  • The team follows Agile methodologies, with a focus on cross-functional integration, a code review culture, and pair programming practices.
  • Knowledge sharing, technical mentoring, and continuous learning are encouraged.

⚡ Challenges & Growth Opportunities

Technical Challenges:

  • Designing and managing highly available, scalable Kubernetes clusters on AWS EKS.
  • Automating deployment pipelines, testing, and validation processes.
  • Supporting multi-regional critical environments with high availability and rapid incident resolution.

Learning & Development Opportunities:

  • Stay current with industry trends, emerging technologies, and best practices for cloud-native and infrastructure management.
  • Develop your leadership skills by mentoring junior team members and driving best practices in SRE, automation, and cloud architecture.
  • Contribute to Elsevier’s mission of advancing science and improving health outcomes for the benefit of society.

💡 Interview Preparation

Technical Questions:

  1. Cloud Architecture: Describe your approach to designing and managing highly available, scalable Kubernetes clusters on AWS EKS.
  2. Automation: Explain your automation processes for streamlining operational workflows, incident response, and infrastructure management.
  3. Incident Management: Walk through your incident management process, including monitoring, troubleshooting, and root cause analysis.

Company & Culture Questions:

  1. Company Culture: How do you align with Elsevier’s values and culture, particularly in the context of the SRE team?
  2. Team Dynamics: Describe your experience working in a collaborative, cross-functional team environment, and how you have contributed to team success.

📌 Application Steps

To apply for this Senior Site Reliability Engineer II position:

  1. Submit your application through the application link.
  2. Tailor your resume to highlight relevant cloud, Kubernetes, and automation skills, experience, and portfolio work.
  3. Prepare for technical phone screens, on-site technical deep dives, and behavioral interviews.
  4. Research Elsevier’s company culture, mission, and values to ensure a strong cultural fit.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and web technology industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.

Application Requirements

Candidates should have extensive experience with containerized applications and a deep understanding of Kubernetes and AWS services. Strong automation skills and the ability to mentor junior team members are also essential.