📍 Job Overview

Job Title: Senior Site Reliability Engineer II
Company: Elsevier
Location: London, United Kingdom
Job Type: Full-Time
Category: DevOps, Site Reliability Engineering
Date Posted: 2025-07-03

🚀 Role Summary

Design and manage cloud platforms to ensure reliability and performance of complex systems.
Collaborate with development and operations teams to manage cloud infrastructure and streamline operational workflows.
Mentor junior team members and promote best practices in SRE, automation, and cloud architecture.

💻 Primary Responsibilities

Cloud Platform Management:
- Design, deploy, and maintain highly available Kubernetes clusters on AWS EKS.
- Manage and optimize cross-portfolio cloud infrastructure using AWS services and supported organizational tooling.
- Develop and maintain Infrastructure as Code (IaC) solutions to automate provisioning and management of cloud and Kubernetes resources.
Automation and Workflow Optimization:
- Build automation that streamlines operational workflows, incident response, and infrastructure management.
- Implement CI/CD pipelines to facilitate deployments, testing, and validation.
- Support multi-regional critical infrastructure, ensuring high availability and rapid incident resolution.
Collaboration and Knowledge Sharing:
- Collaborate with development and operations teams to optimize applications for cloud environments.
- Maintain comprehensive documentation and best practice guides for solutions, ensuring users have clear instructions and support.
- Monitor system health, instrument system components, troubleshoot issues, and perform root cause analysis.
- Mentor junior team members and promote best practices in SRE, automation, and cloud architecture.

🎓 Skills & Qualifications

Education:

Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).

Experience:

Extensive experience deploying, managing, and troubleshooting containerized applications.
Deep understanding of Kubernetes architecture, networking, security, storage, and operational best practices.
Proven expertise with AWS services and architectural principles.
Strong knowledge of AWS security, compliance, and best practices.
Advanced skills in writing modular, reusable IaC components.
Strong Python scripting skills for automation, tooling, and data processing.
Experience designing and maintaining CI/CD workflows using GitHub Actions.
Familiarity with monitoring tools such as New Relic.
Knowledge of security best practices, network policies, and enterprise-grade RBAC policies.
Strong problem-solving, troubleshooting, and incident management skills.
Effective communication and collaboration skills.

Preferred Skills:

Relevant certifications (e.g., AWS Solutions Architect, CKA, Terraform Associate) are a plus.

📊 Web Portfolio & Project Requirements

Portfolio Essentials:

Demonstrate expertise in cloud-native architecture, Kubernetes, and AWS services through past projects.
Showcase automation skills and the ability to streamline operational workflows.
Highlight experience managing multi-regional critical infrastructure and ensuring high availability.

Technical Documentation:

Provide detailed documentation for cloud infrastructure, Kubernetes clusters, and automation processes.
Include best practice guides and step-by-step instructions for deploying and managing cloud resources.
Demonstrate understanding of security best practices and compliance requirements.

💵 Compensation & Benefits

Salary Range:

£80,000–£100,000 per annum (estimated from market research; final offer depends on experience)

Benefits:

Annual Profit Share Bonus
Comprehensive Pension Plan
Home, Office or Commuting Allowance
Generous Vacation Entitlement
Option for Sabbatical Leave
Maternity Leave
Paternity Leave
Adoption Leave
Family Care Leave
Internal Communities and Networks
Recruitment Introduction Reward

Working Hours:

Full-time position with standard working hours (40 hours per week)

🎯 Team & Company Context

Company Culture:

Elsevier is a global leader in information and analytics, helping researchers and healthcare professionals advance science and improve health outcomes.
The company thrives on excellence, innovation, and a strong dedication to customers, employees, and communities.
Elsevier offers a vibrant, diverse, and collaborative team environment where employees are free to grow and contribute actively.

Team Structure:

The team combines software thinking and service operations to enable and run Elsevier’s large-scale, 24x7, distributed, and fault-tolerant systems within agreed reliability objectives.
The team works closely with development and operations teams to optimize applications for cloud environments and ensure the fast flow of feature and service updates.

Development Methodology:

The team follows Agile methodologies, with a focus on continuous integration, delivery, and improvement.
They use GitHub Actions for CI/CD pipelines and New Relic for monitoring and performance tracking.

Company Website:

Elsevier

📈 Career & Growth Analysis

Career Level:

The Senior Site Reliability Engineer II role focuses on managing complex cloud infrastructure, mentoring junior team members, and driving best practices in SRE, automation, and cloud architecture.

Reporting Structure:

The role reports directly to the Site Reliability Engineering Manager and works closely with development and operations teams.

Technical Impact:

The Senior SRE II role has a significant impact on Elsevier’s large-scale, 24x7 systems, ensuring high availability, performance, and rapid incident resolution.
The role also influences the fast flow of feature and service updates, enabling the company to innovate and adapt quickly to market demands.

🌐 Work Environment

Office Type:

Elsevier offers a flexible work environment, with the option to work from home, the office, or a combination of both.

Office Location(s):

London, United Kingdom (with remote work flexibility)

Workspace Context:

Elsevier provides a collaborative workspace with multiple monitors, testing devices, and development tools to support engineering teams.
The company encourages cross-functional interaction between developers, operations engineers, and other stakeholders to foster innovation.

Work Schedule:

The role follows a standard full-time work schedule with flexible hours to accommodate project deadlines and maintenance windows.

📄 Application & Technical Interview Process

Interview Process:

Technical Assessment (1 hour): Evaluate the candidate's understanding of cloud-native architecture, Kubernetes, and AWS services through a hands-on technical assessment.
Behavioral Interview (45 minutes): Assess the candidate's problem-solving skills, communication, and cultural fit with Elsevier.
Final Interview (30 minutes): Discuss the candidate's career aspirations, motivation, and alignment with the company's mission and values.

Portfolio Review Tips:

Highlight past projects that demonstrate expertise in cloud-native architecture, Kubernetes, and AWS services.
Provide detailed documentation for cloud infrastructure, Kubernetes clusters, and automation processes.
Showcase experience managing multi-regional critical infrastructure and ensuring high availability.

Technical Challenge Preparation:

Brush up on cloud-native architecture, Kubernetes, and AWS services.
Practice designing and deploying Kubernetes clusters on AWS EKS.
Familiarize yourself with Elsevier's development methodologies and tools, such as GitHub Actions and New Relic.

🛠 Technology Stack & Web Infrastructure

Cloud Platform:

AWS EKS (Kubernetes on AWS)
AWS Services (EC2, RDS, ELB, etc.)
Infrastructure as Code (IaC) tools (Terraform, CloudFormation)

Containerization & Orchestration:

Kubernetes
Docker

Monitoring & Logging:

New Relic
ELK Stack (Elasticsearch, Logstash, Kibana)

CI/CD & Automation:

GitHub Actions
Jenkins

Programming Languages:

Python
Bash

Databases:

Amazon RDS (PostgreSQL, MySQL)
Amazon DynamoDB

Caching:

Amazon ElastiCache
Redis

👥 Team Culture & Values

Engineering Values:

Elsevier values innovation, collaboration, and user-centered design across its engineering teams.
The company emphasizes continuous learning, knowledge sharing, and promoting best practices in SRE, automation, and cloud architecture.

Collaboration Style:

Elsevier fosters a collaborative work environment, with cross-functional teams working together to deliver high-quality products and services.
The company encourages code reviews, pair programming, and knowledge sharing to drive technical excellence and innovation.

🌐 Challenges & Growth Opportunities

Technical Challenges:

Designing and managing highly available, scalable Kubernetes clusters on AWS EKS.
Optimizing cloud infrastructure for performance, cost-effectiveness, and security.
Automating deployment pipelines, testing, and validation processes.
Ensuring high availability of multi-regional critical infrastructure and rapid incident resolution.

Learning & Development Opportunities:

Elsevier offers opportunities for career progression, technical skill development, and leadership roles in SRE, automation, and cloud architecture.
The company supports conference attendance, certification, and community involvement to help employees stay current with emerging technologies and best practices.

💡 Interview Preparation

Technical Questions:

Cloud Architecture (30 minutes): Design a highly available, scalable Kubernetes cluster on AWS EKS, considering best practices, security, and cost-effectiveness.
Incident Management (30 minutes): Walk through a recent incident you've handled, describing your approach, tools used, and the outcome.
Automation (30 minutes): Explain a complex automation task you've completed and the tools and techniques you used to streamline the process.

Company & Culture Questions:

Company Mission (15 minutes): Explain how your work aligns with Elsevier's mission to advance science and improve health outcomes.
Team Dynamics (15 minutes): Describe how you've worked effectively in a remote or hybrid team environment, and how you've contributed to a positive team culture.

Portfolio Presentation Strategy:

Highlight past projects that demonstrate expertise in cloud-native architecture, Kubernetes, and AWS services.
Showcase automation skills and experience managing multi-regional critical infrastructure.
Emphasize your ability to collaborate with teams, mentor junior team members, and drive best practices in SRE, automation, and cloud architecture.

📌 Application Steps

To apply for this Senior Site Reliability Engineer II position:

Update Your Resume (15 minutes): Tailor your resume to highlight relevant skills, experience, and achievements in cloud-native architecture, Kubernetes, and AWS services.
Prepare Your Portfolio (30 minutes): Curate a portfolio showcasing your expertise in cloud-native architecture, Kubernetes, and AWS services, with a focus on automation, infrastructure management, and incident response.
Research Elsevier (15 minutes): Familiarize yourself with Elsevier's mission, values, and company culture to ensure a strong fit and alignment with your career goals.
Prepare for Technical Assessment (30 minutes): Brush up on cloud-native architecture, Kubernetes, and AWS services, and practice designing and deploying Kubernetes clusters on AWS EKS.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and web development industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
