Senior Site Reliability Engineer II
Job Overview
- Job Title: Senior Site Reliability Engineer II
- Company: Elsevier
- Location: London, United Kingdom
- Job Type: Full-Time
- Category: DevOps, Site Reliability Engineering
- Date Posted: 2025-07-03
Role Summary
- Design and manage cloud platforms to ensure reliability and performance of complex systems.
- Collaborate with teams to manage cloud infrastructure and streamline operational workflows.
- Mentor junior team members and promote best practices in SRE, automation, and cloud architecture.
Primary Responsibilities
Cloud Platform Management:
- Design, deploy, and maintain highly available Kubernetes clusters on AWS EKS.
- Manage and optimize cross-portfolio cloud infrastructure using AWS services and supported organizational tooling.
- Develop and maintain Infrastructure as Code (IaC) solutions to automate provisioning and management of cloud and Kubernetes resources.
Automation and Workflow Optimization:
- Build automation to streamline operational workflows, incident response, and infrastructure management.
- Implement CI/CD pipelines to facilitate deployments, testing, and validation.
- Support multi-regional critical infrastructure, ensuring high availability and rapid incident resolution.
Collaboration and Knowledge Sharing:
- Collaborate with development and operations teams to optimize applications for cloud environments.
- Maintain comprehensive documentation and best practice guides for solutions, ensuring users have clear instructions and support.
- Monitor system health, instrument system components, troubleshoot issues, and perform root cause analysis.
- Mentor junior team members and promote best practices in SRE, automation, and cloud architecture.
Skills & Qualifications
Education:
- Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).
Experience:
- Extensive experience deploying, managing, and troubleshooting containerized applications.
- Deep understanding of Kubernetes architecture, networking, security, storage, and operational best practices.
- Proven expertise with AWS services and architectural principles.
- Strong knowledge of AWS security, compliance, and best practices.
- Advanced skills in writing modular, reusable IaC components.
- Strong Python scripting skills for automation, tooling, and data processing.
- Experience designing and maintaining CI/CD workflows using GitHub Actions.
- Familiarity with monitoring tools such as New Relic.
- Knowledge of security best practices, network policies, and enterprise-grade RBAC policies.
- Strong problem-solving, troubleshooting, and incident management skills.
- Effective communication and collaboration skills.
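As a concrete illustration of the RBAC requirement, one common least-privilege check is to flag Role or ClusterRole rules that grant the `*` wildcard. The Python sketch below assumes the manifest has already been parsed into a dict (for example from YAML); the function name `find_wildcard_rules` and the "any wildcard is too broad" policy are hypothetical choices made for illustration, not anything the employer has specified.

```python
"""Flag overly broad rules in a Kubernetes RBAC Role or ClusterRole.

Sketch only: assumes the manifest is already parsed into a dict, and treats
any '*' wildcard as a finding (an illustrative policy, not a standard).
"""
from typing import Any


def find_wildcard_rules(role: dict[str, Any]) -> list[str]:
    """Return human-readable findings for rules that use '*' wildcards."""
    findings: list[str] = []
    name = role.get("metadata", {}).get("name", "<unnamed>")
    for index, rule in enumerate(role.get("rules", [])):
        # Each RBAC rule lists apiGroups, resources and verbs; '*' matches everything.
        for field in ("apiGroups", "resources", "verbs"):
            if "*" in rule.get(field, []):
                findings.append(f"{name}: rule {index} grants '*' in {field}")
    return findings


if __name__ == "__main__":
    # Hypothetical Role used only to exercise the check.
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "deployer", "namespace": "demo"},
        "rules": [
            {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["get", "list", "patch"]},
            {"apiGroups": [""], "resources": ["secrets"], "verbs": ["*"]},
        ],
    }
    for finding in find_wildcard_rules(role):
        print(finding)
```

A check like this is deliberately strict; in practice a team would usually allow-list a few known administrative roles rather than fail on every wildcard.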
Preferred Skills:
- Relevant certifications (e.g., AWS Solutions Architect, CKA, Terraform Associate) are a plus.
Web Portfolio & Project Requirements
Portfolio Essentials:
- Demonstrate expertise in cloud-native architecture, Kubernetes, and AWS services through past projects.
- Showcase automation skills and the ability to streamline operational workflows.
- Highlight experience managing multi-regional critical infrastructure and ensuring high availability.
Technical Documentation:
- Provide detailed documentation for cloud infrastructure, Kubernetes clusters, and automation processes.
- Include best practice guides and step-by-step instructions for deploying and managing cloud resources.
- Demonstrate understanding of security best practices and compliance requirements.
Compensation & Benefits
Salary Range:
- £80,000 - £100,000 per annum (based on experience and market research)
Benefits:
- Annual Profit Share Bonus
- Comprehensive Pension Plan
- Home, Office or Commuting Allowance
- Generous Vacation Entitlement
- Option for Sabbatical Leave
- Maternity Leave
- Paternity Leave
- Adoption Leave
- Family Care Leave
- Internal Communities and Networks
- Recruitment Introduction Reward
Working Hours:
- Full-time position with standard working hours (40 hours per week)
Team & Company Context
Company Culture:
- Elsevier is a global leader in information and analytics, helping researchers and healthcare professionals advance science and improve health outcomes.
- The company thrives on excellence, innovation, and a strong dedication to customers, employees, and communities.
- Elsevier offers a vibrant, diverse, and collaborative team environment where employees are free to grow and contribute actively.
Team Structure:
- The team combines software thinking and service operations to enable and run Elsevier's large-scale, 24x7, distributed, and fault-tolerant systems within agreed reliability objectives.
- The team works closely with development and operations teams to optimize applications for cloud environments and ensure the fast flow of feature and service updates.
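"Agreed reliability objectives" are commonly expressed as service level objectives (SLOs), and a practical way to reason about an SLO is its error budget: the amount of unavailability it permits over a time window. The Python sketch below assumes an availability SLO measured over a 30-day window; the function names and example targets are illustrative assumptions, not figures from this role.

```python
"""Convert an availability SLO into an error budget of allowed downtime.

Illustrative only: the SLO targets and the 30-day window are assumptions,
not figures from the job description.
"""


def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of unavailability allowed in the window for an SLO like 0.999."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - slo)


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget left after observed downtime (negative if overspent)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        print(f"{target:.2%} over 30 days -> {error_budget_minutes(target):.1f} min")
    print(f"budget left after a 20 min outage at 99.9%: {budget_remaining(0.999, 20):.0%}")
```

For example, a 99.9% target over 30 days allows about 43.2 minutes of unavailability (43,200 minutes × 0.001), which is why each extra "nine" shrinks the room for incidents tenfold.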
Development Methodology:
- The team follows Agile methodologies, with a focus on continuous integration, delivery, and improvement.
- They use GitHub Actions for CI/CD pipelines and New Relic for monitoring and performance tracking.
Company Website:
Career & Growth Analysis
Web Technology Career Level:
- Senior Site Reliability Engineer II role focuses on managing complex cloud infrastructure, mentoring junior team members, and driving best practices in SRE, automation, and cloud architecture.
Reporting Structure:
- The role reports directly to the Site Reliability Engineering Manager and works closely with development and operations teams.
Technical Impact:
- The Senior SRE II role has a significant impact on Elsevier's large-scale, 24x7 systems, ensuring high availability, performance, and rapid incident resolution.
- The role also influences the fast flow of feature and service updates, enabling the company to innovate and adapt quickly to market demands.
Work Environment
Office Type:
- Elsevier offers a flexible work environment, with the option to work from home, the office, or a combination of both.
Office Location(s):
- London, United Kingdom (with remote work flexibility)
Workspace Context:
- Elsevier provides a collaborative workspace with multiple monitors, testing devices, and development tools to support web development teams.
- The company encourages cross-functional interaction between developers, designers, and stakeholders to foster innovation and user-centered design.
Work Schedule:
- The role follows a standard full-time work schedule with flexible hours to accommodate project deadlines and maintenance windows.
Application & Technical Interview Process
Interview Process:
- Technical Assessment (1 hour): Evaluate the candidate's understanding of cloud-native architecture, Kubernetes, and AWS services through a hands-on technical assessment.
- Behavioral Interview (45 minutes): Assess the candidate's problem-solving skills, communication, and cultural fit with Elsevier.
- Final Interview (30 minutes): Discuss the candidate's career aspirations, motivation, and alignment with the company's mission and values.
Portfolio Review Tips:
- Highlight past projects that demonstrate expertise in cloud-native architecture, Kubernetes, and AWS services.
- Provide detailed documentation for cloud infrastructure, Kubernetes clusters, and automation processes.
- Showcase experience managing multi-regional critical infrastructure and ensuring high availability.
Technical Challenge Preparation:
- Brush up on cloud-native architecture, Kubernetes, and AWS services.
- Practice designing and deploying Kubernetes clusters on AWS EKS.
- Familiarize yourself with Elsevier's development methodologies and tools, such as GitHub Actions and New Relic.
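When practising EKS cluster design, one basic high-availability property worth verifying is that worker nodes are spread across several availability zones. This sketch assumes node data in the shape produced by `kubectl get nodes -o json` and reads the well-known `topology.kubernetes.io/zone` node label; the helper names `nodes_per_zone` and `is_multi_az` and the three-zone threshold are assumptions made for illustration.

```python
"""Check that EKS worker nodes are spread across enough availability zones.

Sketch assuming node data shaped like `kubectl get nodes -o json` output;
the three-zone default is an illustrative choice, not a stated requirement.
"""
import json
from collections import Counter

ZONE_LABEL = "topology.kubernetes.io/zone"  # well-known Kubernetes node label


def nodes_per_zone(nodes_json: str) -> Counter:
    """Count nodes per availability zone; nodes without the label count as 'unknown'."""
    items = json.loads(nodes_json).get("items", [])
    return Counter(
        item.get("metadata", {}).get("labels", {}).get(ZONE_LABEL, "unknown")
        for item in items
    )


def is_multi_az(nodes_json: str, min_zones: int = 3) -> bool:
    """True if nodes span at least `min_zones` distinct, labelled zones."""
    zones = nodes_per_zone(nodes_json)
    known = [zone for zone in zones if zone != "unknown"]
    return len(known) >= min_zones


if __name__ == "__main__":
    # Hypothetical three-node cluster in the London region (eu-west-2).
    sample = json.dumps({"items": [
        {"metadata": {"name": "node-a", "labels": {ZONE_LABEL: "eu-west-2a"}}},
        {"metadata": {"name": "node-b", "labels": {ZONE_LABEL: "eu-west-2b"}}},
        {"metadata": {"name": "node-c", "labels": {ZONE_LABEL: "eu-west-2b"}}},
    ]})
    print(dict(nodes_per_zone(sample)))
    print("multi-AZ:", is_multi_az(sample))
```

Against a real cluster, the input string could simply be the captured output of `kubectl get nodes -o json`; the same zone label is also what topology spread constraints use to distribute pods.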
Technology Stack & Web Infrastructure
Cloud Platform:
- AWS EKS (Kubernetes on AWS)
- AWS Services (EC2, RDS, ELB, etc.)
- Infrastructure as Code (IaC) tools (Terraform, CloudFormation)
Containerization & Orchestration:
- Kubernetes
- Docker
Monitoring & Logging:
- New Relic
- ELK Stack (Elasticsearch, Logstash, Kibana)
CI/CD & Automation:
- GitHub Actions
- Jenkins
Programming Languages:
- Python
- Bash
Databases:
- Amazon RDS (PostgreSQL, MySQL)
- Amazon DynamoDB
Caching:
- Amazon ElastiCache
- Redis
Team Culture & Values
Web Development Values:
- Elsevier values innovation, collaboration, and user-centered design in web development.
- The company emphasizes continuous learning, knowledge sharing, and promoting best practices in SRE, automation, and cloud architecture.
Collaboration Style:
- Elsevier fosters a collaborative work environment, with cross-functional teams working together to deliver high-quality web products and services.
- The company encourages code reviews, pair programming, and knowledge sharing to drive technical excellence and innovation.
Challenges & Growth Opportunities
Technical Challenges:
- Designing and managing highly available, scalable Kubernetes clusters on AWS EKS.
- Optimizing cloud infrastructure for performance, cost-effectiveness, and security.
- Automating deployment pipelines, testing, and validation processes.
- Keeping multi-regional critical infrastructure highly available and resolving incidents rapidly.
Learning & Development Opportunities:
- Elsevier offers opportunities for career progression, technical skill development, and leadership roles in SRE, automation, and cloud architecture.
- The company supports conference attendance, certification, and community involvement to help employees stay current with emerging technologies and best practices.
Interview Preparation
Technical Questions:
- Cloud Architecture (30 minutes): Design a highly available, scalable Kubernetes cluster on AWS EKS, considering best practices, security, and cost-effectiveness.
- Incident Management (30 minutes): Walk through a recent incident you've handled, describing your approach, tools used, and the outcome.
- Automation (30 minutes): Explain a complex automation task you've completed and the tools and techniques you used to streamline the process.
Company & Culture Questions:
- Company Mission (15 minutes): Explain how your work aligns with Elsevier's mission to advance science and improve health outcomes.
- Team Dynamics (15 minutes): Describe how you've worked effectively in a remote or hybrid team environment, and how you've contributed to a positive team culture.
Portfolio Presentation Strategy:
- Highlight past projects that demonstrate expertise in cloud-native architecture, Kubernetes, and AWS services.
- Showcase automation skills and experience managing multi-regional critical infrastructure.
- Emphasize your ability to collaborate with teams, mentor junior team members, and drive best practices in SRE, automation, and cloud architecture.
Application Steps
To apply for this Senior Site Reliability Engineer II position:
- Update Your Resume (15 minutes): Tailor your resume to highlight relevant skills, experience, and achievements in cloud-native architecture, Kubernetes, and AWS services.
- Prepare Your Portfolio (30 minutes): Curate a portfolio showcasing your expertise in cloud-native architecture, Kubernetes, and AWS services, with a focus on automation, infrastructure management, and incident response.
- Research Elsevier (15 minutes): Familiarize yourself with Elsevier's mission, values, and company culture to ensure a strong fit and alignment with your career goals.
- Prepare for Technical Assessment (30 minutes): Brush up on cloud-native architecture, Kubernetes, and AWS services, and practice designing and deploying Kubernetes clusters on AWS EKS.
Important Notice: This enhanced job description includes AI-generated insights and web development industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
Application Requirements
Candidates should have extensive experience with containerized applications and a deep understanding of Kubernetes and AWS services. Strong automation skills and the ability to mentor junior team members are also essential.