Senior Site Reliability Engineer II at Elsevier

📍 Job Overview

Job Title: Senior Site Reliability Engineer II
Company: Elsevier
Location: London, Oxford
Job Type: Full-Time
Category: DevOps, Infrastructure
Date Posted: 2025-07-03
Experience Level: Mid-Senior level
Remote Status: On-site

🚀 Role Summary

Design and manage cloud platforms to ensure reliability and performance of complex systems.
Collaborate with development and operations teams to optimize applications for cloud environments.
Mentor junior team members and promote best practices in SRE, automation, and cloud architecture.

💻 Primary Responsibilities

Cloud Platform Design & Management:
- Design, deploy, and maintain highly available, scalable Kubernetes clusters on AWS EKS.
- Manage and optimize cross-portfolio cloud infrastructure using AWS services and supported organizational tooling.
- Develop and maintain Infrastructure as Code (IaC) solutions to automate provisioning and management of cloud and Kubernetes resources.

Automation & Incident Management:
- Write automation processes to streamline operational workflows, incident response, and infrastructure management.
- Implement CI/CD pipelines to facilitate deployments, testing, and validation.
- Monitor system health, instrument system components, troubleshoot issues, and perform root cause analysis.

Collaboration & Mentoring:
- Collaborate with development and operations teams to optimize applications for cloud environments.
- Mentor junior team members and promote best practices in SRE, automation, and cloud architecture.
- Stay current with industry trends, emerging technologies, and best practices for cloud-native and infrastructure management.

🎓 Skills & Qualifications

Education:

Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).

Experience:

Good experience in SRE, DevOps, or cloud infrastructure roles.
Relevant certifications (e.g., AWS Solutions Architect, CKA, Terraform Associate) are a plus.

Required Skills:

Extensive experience deploying, managing, and troubleshooting containerized applications.
Deep understanding of Kubernetes architecture, networking, security, storage, and operational best practices.
Proven expertise with AWS services and architectural principles.
Strong knowledge of AWS security, compliance, and best practices.
Advanced skills in writing modular, reusable IaC components.
Strong Python scripting skills for automation, tooling, and data processing.
Ability to develop custom solutions for monitoring, automation, and incident management.
Experience designing and maintaining CI/CD workflows using GitHub Actions.
Experience supporting highly available, multi-regional critical environments.
Proven ability to manage multiple portfolios and complex landscapes.
Familiarity with monitoring tools such as New Relic.
Knowledge of security best practices, network policies, and enterprise-grade RBAC policies.
Strong problem-solving, troubleshooting, and incident management skills.
Effective communication and collaboration skills.

Preferred Skills:

Experience with multi-cloud environments.
Familiarity with Terraform or other IaC tools.
Knowledge of container security best practices.

📊 Web Portfolio & Project Requirements

Portfolio Essentials:

Demonstrate your ability to design, deploy, and manage Kubernetes clusters on AWS EKS.
Showcase your automation skills with examples of streamlined operational workflows and incident response processes.
Highlight your problem-solving skills with examples of troubleshooting and root cause analysis.

Technical Documentation:

Provide documentation for your solutions, ensuring users have clear instructions and support to effectively implement and operate their systems.
Include code comments, version control, and deployment processes in your portfolio.

💵 Compensation & Benefits

Salary Range:

£70,000 - £90,000 per annum (based on experience and location)

Benefits:

Annual Profit Share Bonus
Comprehensive Pension Plan
Home, Office or Commuting Allowance
Generous Vacation Entitlement and option for Sabbatical Leave
Maternity, Paternity, Adoption and Family Care Leave
Internal Communities and Networks
Recruitment Introduction Reward

Working Hours:

Full-time (35-40 hours per week)

🎯 Team & Company Context

Company Culture:

Elsevier is a global leader in information and analytics, helping researchers and healthcare professionals advance science and improve health outcomes.
The company thrives on excellence, innovation, and a strong dedication to customers, employees, and communities.

Team Structure:

The team combines software thinking and service operations to enable and run Elsevier’s large-scale, 24x7, distributed and fault-tolerant systems within agreed reliability objectives.
The team works closely with development and operations teams to optimize applications for cloud environments.

Development Methodology:

The team follows Agile methodologies, with a focus on sprint planning, code review, testing, and quality assurance practices.
CI/CD pipelines are used for deployments, testing, and validation.

Company Website: Elsevier

📈 Career & Growth Analysis

Web Technology Career Level:

This role is at the Mid-Senior level, focusing on designing and managing cloud platforms, mentoring junior team members, and driving best practices in SRE, automation, and cloud architecture.

Reporting Structure:

The role reports directly to the SRE Manager and collaborates with development and operations teams.

Technical Impact:

The role has a significant impact on Elsevier’s large-scale, 24x7 systems, ensuring high availability, performance, and rapid incident resolution.
The role also influences the technical direction of the SRE team, driving best practices and mentoring junior team members.

🌐 Work Environment

Office Type:

The role is based in London or Oxford, with on-site work required.

Office Location(s):

London - London Wall
Oxford

Workspace Context:

The workspace is collaborative, with a focus on cross-functional integration between developers, designers, and stakeholders.
The team uses modern development tools, multiple monitors, and testing devices to ensure high-quality solutions.

Work Schedule:

The role requires a full-time commitment, with flexibility for deployment windows, maintenance, and project deadlines.

📄 Application & Technical Interview Process

Interview Process:

Technical Phone Screen: Assess your understanding of Kubernetes, AWS, and automation skills.
On-site Technical Deep Dive: Dive into your cloud architecture, automation, and incident management skills with hands-on exercises and case studies.
Behavioral Interview: Evaluate your problem-solving, communication, and collaboration skills.
Final Review: Review your qualifications, career goals, and cultural fit.

Portfolio Review Tips:

Highlight your cloud architecture, automation, and incident management skills with live demos.
Include examples that demonstrate code quality.

Technical Challenge Preparation:

Brush up on your Kubernetes, AWS, and automation skills with hands-on exercises and case studies.
Practice problem-solving and communication skills for technical interviews.

🛠 Technology Stack & Web Infrastructure

Cloud Platform:

AWS (EKS, RDS, Route 53, CloudFront, etc.)

Containerization:

Kubernetes

Infrastructure as Code (IaC):

Terraform (preferred) or other IaC tools

Monitoring & Logging:

New Relic or other monitoring tools

CI/CD:

GitHub Actions or other CI/CD tools

Version Control:

Scripting:

Python

Documentation:

Markdown or other documentation tools

👥 Team Culture & Values

Web Development Values:

Elsevier values innovation, collaboration, and a strong dedication to customers, employees, and communities.
The SRE team emphasizes reliability, performance, and continuous improvement.

Collaboration Style:

The team follows Agile methodologies, with a focus on cross-functional integration, code review culture, and pair programming practices.
Knowledge sharing, technical mentoring, and continuous learning are encouraged.

⚡ Challenges & Growth Opportunities

Technical Challenges:

Designing and managing highly available, scalable Kubernetes clusters on AWS EKS.
Automating deployment pipelines, testing, and validation processes.
Supporting multi-regional critical environments with high availability and rapid incident resolution.

Learning & Development Opportunities:

Stay current with industry trends, emerging technologies, and best practices for cloud-native and infrastructure management.
Develop your leadership skills by mentoring junior team members and driving best practices in SRE, automation, and cloud architecture.
Contribute to Elsevier’s mission of advancing science and improving health outcomes for the benefit of society.

💡 Interview Preparation

Technical Questions:

Cloud Architecture: Describe your approach to designing and managing highly available, scalable Kubernetes clusters on AWS EKS.
Automation: Explain your automation processes for streamlining operational workflows, incident response, and infrastructure management.
Incident Management: Walk through your incident management process, including monitoring, troubleshooting, and root cause analysis.

Company & Culture Questions:

Company Culture: How do you align with Elsevier’s values and culture, particularly in the context of the SRE team?
Team Dynamics: Describe your experience working in a collaborative, cross-functional team environment, and how you have contributed to team success.

Portfolio Presentation Strategy:

Demonstrate your cloud architecture, automation, and incident management skills with live demos.
Include a demonstration of code quality relevant to this role.

📌 Application Steps

To apply for this Senior Site Reliability Engineer II position:

Submit your application through the application link.
Customize your resume with relevant SRE and cloud infrastructure skills, experience, and portfolio highlights.
Prepare for technical phone screens, on-site technical deep dives, and behavioral interviews.
Research Elsevier’s company culture, mission, and values to ensure a strong cultural fit.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and web technology industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
