Senior Site Reliability Engineer II

Remitly
Full-time • London, United Kingdom

Job Overview

  • Job Title: Senior Site Reliability Engineer II
  • Company: Remitly
  • Location: London, United Kingdom
  • Job Type: Full-Time
  • Category: DevOps, Site Reliability Engineering
  • Date Posted: 2025-07-03

Role Summary

  • Design and manage cloud platforms to ensure reliability and performance of complex systems.
  • Collaborate with teams to manage cloud infrastructure and streamline operational workflows.
  • Mentor junior team members and promote best practices in SRE, automation, and cloud architecture.

Primary Responsibilities

  • Cloud Platform Management:

    • Design, deploy, and maintain highly available Kubernetes clusters on AWS EKS.
    • Manage and optimize cross-portfolio cloud infrastructure using AWS services and supported organizational tooling.
    • Develop and maintain Infrastructure as Code (IaC) solutions to automate provisioning and management of cloud and Kubernetes resources.
  • Automation and Workflow Optimization:

    • Write automation processes to streamline operational workflows, incident response, and infrastructure management.
    • Implement CI/CD pipelines to facilitate deployments, testing, and validation.
    • Support multi-regional critical infrastructure, ensuring high availability and rapid incident resolution.
  • Collaboration and Knowledge Sharing:

    • Collaborate with development and operations teams to optimize applications for cloud environments.
    • Maintain comprehensive documentation and best practice guides for solutions, ensuring users have clear instructions and support.
    • Monitor system health, instrument system components, troubleshoot issues, and perform root cause analysis.
    • Mentor junior team members and promote best practices in SRE, automation, and cloud architecture.

Skills & Qualifications

Education:

  • Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).

Experience:

  • Extensive experience deploying, managing, and troubleshooting containerized applications.
  • Deep understanding of Kubernetes architecture, networking, security, storage, and operational best practices.
  • Proven expertise with AWS services and architectural principles.
  • Strong knowledge of AWS security, compliance, and best practices.
  • Advanced skills in writing modular, reusable IaC components.
  • Strong Python scripting skills for automation, tooling, and data processing.
  • Experience designing and maintaining CI/CD workflows using GitHub Actions.
  • Familiarity with monitoring tools such as New Relic.
  • Knowledge of security best practices, network policies, and enterprise-grade RBAC policies.
  • Strong problem-solving, troubleshooting, and incident management skills.
  • Effective communication and collaboration skills.

Preferred Skills:

  • Relevant certifications (e.g., AWS Solutions Architect, CKA, Terraform Associate) are a plus.

Web Portfolio & Project Requirements

Portfolio Essentials:

  • Demonstrate expertise in cloud-native architecture, Kubernetes, and AWS services through past projects.
  • Showcase automation skills and the ability to streamline operational workflows.
  • Highlight experience managing multi-regional critical infrastructure and ensuring high availability.

Technical Documentation:

  • Provide detailed documentation for cloud infrastructure, Kubernetes clusters, and automation processes.
  • Include best practice guides and step-by-step instructions for deploying and managing cloud resources.
  • Demonstrate understanding of security best practices and compliance requirements.

Compensation & Benefits

Salary Range:

  • £80,000 - £100,000 per annum (based on experience and market research)

Benefits:

  • Annual Profit Share Bonus
  • Comprehensive Pension Plan
  • Home, Office or Commuting Allowance
  • Generous Vacation Entitlement
  • Option for Sabbatical Leave
  • Maternity Leave
  • Paternity Leave
  • Adoption Leave
  • Family Care Leave
  • Internal Communities and Networks
  • Recruitment Introduction Reward

Working Hours:

  • Full-time position with standard working hours (40 hours per week)

Team & Company Context

Company Culture:

  • Elsevier is a global leader in information and analytics, helping researchers and healthcare professionals advance science and improve health outcomes.
  • The company thrives on excellence, innovation, and a strong dedication to customers, employees, and communities.
  • Elsevier offers a vibrant, diverse, and collaborative team environment where employees are free to grow and contribute actively.

Team Structure:

  • The team combines software thinking and service operations to enable and run Elsevier's large-scale, 24x7, distributed, and fault-tolerant systems within agreed reliability objectives.
  • The team works closely with development and operations teams to optimize applications for cloud environments and ensure the fast flow of feature and service updates.

Development Methodology:

  • The team follows Agile methodologies, with a focus on continuous integration, delivery, and improvement.
  • They use GitHub Actions for CI/CD pipelines and New Relic for monitoring and performance tracking.

Career & Growth Analysis

Web Technology Career Level:

  • The Senior Site Reliability Engineer II role focuses on managing complex cloud infrastructure, mentoring junior team members, and driving best practices in SRE, automation, and cloud architecture.

Reporting Structure:

  • The role reports directly to the Site Reliability Engineering Manager and works closely with development and operations teams.

Technical Impact:

  • The Senior SRE II role has a significant impact on Elsevier's large-scale, 24x7 systems, ensuring high availability, performance, and rapid incident resolution.
  • The role also influences the fast flow of feature and service updates, enabling the company to innovate and adapt quickly to market demands.

Work Environment

Office Type:

  • Elsevier offers a flexible work environment, with the option to work from home, the office, or a combination of both.

Office Location(s):

  • London, United Kingdom (with remote work flexibility)

Workspace Context:

  • Elsevier provides a collaborative workspace with multiple monitors, testing devices, and development tools to support web development teams.
  • The company encourages cross-functional interaction between developers, designers, and stakeholders to foster innovation and user-centered design.

Work Schedule:

  • The role follows a standard full-time work schedule with flexible hours to accommodate project deadlines and maintenance windows.

Application & Technical Interview Process

Interview Process:

  1. Technical Assessment (1 hour): Evaluate the candidate's understanding of cloud-native architecture, Kubernetes, and AWS services through a hands-on technical assessment.
  2. Behavioral Interview (45 minutes): Assess the candidate's problem-solving skills, communication, and cultural fit with Elsevier.
  3. Final Interview (30 minutes): Discuss the candidate's career aspirations, motivation, and alignment with the company's mission and values.

Portfolio Review Tips:

  • Highlight past projects that demonstrate expertise in cloud-native architecture, Kubernetes, and AWS services.
  • Provide detailed documentation for cloud infrastructure, Kubernetes clusters, and automation processes.
  • Showcase experience managing multi-regional critical infrastructure and ensuring high availability.

Technical Challenge Preparation:

  • Brush up on cloud-native architecture, Kubernetes, and AWS services.
  • Practice designing and deploying Kubernetes clusters on AWS EKS.
  • Familiarize yourself with Elsevier's development methodologies and tools, such as GitHub Actions and New Relic.

Technology Stack & Web Infrastructure

Cloud Platform:

  • AWS EKS (Kubernetes on AWS)
  • AWS Services (EC2, RDS, ELB, etc.)
  • Infrastructure as Code (IaC) tools (Terraform, CloudFormation)

Containerization & Orchestration:

  • Kubernetes
  • Docker

Monitoring & Logging:

  • New Relic
  • ELK Stack (Elasticsearch, Logstash, Kibana)

CI/CD & Automation:

  • GitHub Actions
  • Jenkins

Programming Languages:

  • Python
  • Bash

Databases:

  • Amazon RDS (PostgreSQL, MySQL)
  • Amazon DynamoDB

Caching:

  • Amazon ElastiCache
  • Redis

Team Culture & Values

Web Development Values:

  • Elsevier values innovation, collaboration, and user-centered design in web development.
  • The company emphasizes continuous learning, knowledge sharing, and promoting best practices in SRE, automation, and cloud architecture.

Collaboration Style:

  • Elsevier fosters a collaborative work environment, with cross-functional teams working together to deliver high-quality web products and services.
  • The company encourages code reviews, pair programming, and knowledge sharing to drive technical excellence and innovation.

Challenges & Growth Opportunities

Technical Challenges:

  • Designing and managing highly available, scalable Kubernetes clusters on AWS EKS.
  • Optimizing cloud infrastructure for performance, cost-effectiveness, and security.
  • Automating deployment pipelines, testing, and validation processes.
  • Ensuring high availability of multi-regional critical infrastructure and rapid incident resolution.

Learning & Development Opportunities:

  • Elsevier offers opportunities for career progression, technical skill development, and leadership roles in SRE, automation, and cloud architecture.
  • The company supports conference attendance, certification, and community involvement to help employees stay current with emerging technologies and best practices.

Interview Preparation

Technical Questions:

  1. Cloud Architecture (30 minutes): Design a highly available, scalable Kubernetes cluster on AWS EKS, considering best practices, security, and cost-effectiveness.
  2. Incident Management (30 minutes): Walk through a recent incident you've handled, describing your approach, tools used, and the outcome.
  3. Automation (30 minutes): Explain a complex automation task you've completed and the tools and techniques you used to streamline the process.

Company & Culture Questions:

  1. Company Mission (15 minutes): Explain how your work aligns with Elsevier's mission to advance science and improve health outcomes.
  2. Team Dynamics (15 minutes): Describe how you've worked effectively in a remote or hybrid team environment, and how you've contributed to a positive team culture.

Portfolio Presentation Strategy:

  • Highlight past projects that demonstrate expertise in cloud-native architecture, Kubernetes, and AWS services.
  • Showcase automation skills and experience managing multi-regional critical infrastructure.
  • Emphasize your ability to collaborate with teams, mentor junior team members, and drive best practices in SRE, automation, and cloud architecture.

Application Steps

To apply for this Senior Site Reliability Engineer II position:

  1. Update Your Resume (15 minutes): Tailor your resume to highlight relevant skills, experience, and achievements in cloud-native architecture, Kubernetes, and AWS services.
  2. Prepare Your Portfolio (30 minutes): Curate a portfolio showcasing your expertise in cloud-native architecture, Kubernetes, and AWS services, with a focus on automation, infrastructure management, and incident response.
  3. Research Elsevier (15 minutes): Familiarize yourself with Elsevier's mission, values, and company culture to ensure a strong fit and alignment with your career goals.
  4. Prepare for Technical Assessment (30 minutes): Brush up on cloud-native architecture, Kubernetes, and AWS services, and practice designing and deploying Kubernetes clusters on AWS EKS.

Important Notice: This enhanced job description includes AI-generated insights and web development industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.

Application Requirements

Candidates should have extensive experience with containerized applications and a deep understanding of Kubernetes and AWS services. Strong automation skills and the ability to mentor junior team members are also essential.