Senior Site Reliability Engineer II
📍 Job Overview
- Job Title: Senior Site Reliability Engineer II
- Company: Elsevier
- Location: London, Oxford
- Job Type: Full-Time
- Category: DevOps, Infrastructure
- Date Posted: 2025-07-03
- Experience Level: Mid-Senior level
- Remote Status: On-site
🚀 Role Summary
- Design and manage cloud platforms to ensure reliability and performance of complex systems.
- Collaborate with development and operations teams to optimize applications for cloud environments.
- Mentor junior team members and promote best practices in SRE, automation, and cloud architecture.
💻 Primary Responsibilities
Cloud Platform Design & Management:
- Design, deploy, and maintain highly available, scalable Kubernetes clusters on AWS EKS.
- Manage and optimize cross-portfolio cloud infrastructure using AWS services and supported organizational tooling.
- Develop and maintain Infrastructure as Code (IaC) solutions to automate provisioning and management of cloud and Kubernetes resources.
Automation & Incident Management:
- Write automation processes to streamline operational workflows, incident response, and infrastructure management.
- Implement CI/CD pipelines to facilitate deployments, testing, and validation.
- Monitor system health, instrument system components, troubleshoot issues, and perform root cause analysis.
Collaboration & Mentoring:
- Collaborate with development and operations teams to optimize applications for cloud environments.
- Mentor junior team members and promote best practices in SRE, automation, and cloud architecture.
- Stay current with industry trends, emerging technologies, and best practices for cloud-native and infrastructure management.
🎓 Skills & Qualifications
Education:
- Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).
Experience:
- Solid hands-on experience in SRE, DevOps, or cloud infrastructure roles.
- Relevant certifications (e.g., AWS Solutions Architect, CKA, Terraform Associate) are a plus.
Required Skills:
- Extensive experience deploying, managing, and troubleshooting containerized applications.
- Deep understanding of Kubernetes architecture, networking, security, storage, and operational best practices.
- Proven expertise with AWS services and architectural principles.
- Strong knowledge of AWS security, compliance, and best practices.
- Advanced skills in writing modular, reusable IaC components.
- Strong Python scripting skills for automation, tooling, and data processing.
- Ability to develop custom solutions for monitoring, automation, and incident management.
- Experience designing and maintaining CI/CD workflows using GitHub Actions.
- Experience supporting highly available, multi-regional critical environments.
- Proven ability to manage multiple portfolios and complex landscapes.
- Familiarity with monitoring tools such as New Relic.
- Knowledge of security best practices, network policies, and enterprise-grade RBAC policies.
- Strong problem-solving, troubleshooting, and incident management skills.
- Effective communication and collaboration skills.
Preferred Skills:
- Experience with multi-cloud environments.
- Familiarity with Terraform or other IaC tools.
- Knowledge of container security best practices.
📊 Portfolio & Project Requirements
Portfolio Essentials:
- Demonstrate your ability to design, deploy, and manage Kubernetes clusters on AWS EKS.
- Showcase your automation skills with examples of streamlined operational workflows and incident response processes.
- Highlight your problem-solving skills with examples of troubleshooting and root cause analysis.
Technical Documentation:
- Provide documentation for your solutions, ensuring users have clear instructions and support to effectively implement and operate their systems.
- Include well-commented code, evidence of version control, and documented deployment processes in your portfolio.
💵 Compensation & Benefits
Salary Range:
- £70,000 - £90,000 per annum (based on experience and location)
Benefits:
- Annual Profit Share Bonus
- Comprehensive Pension Plan
- Home, Office or Commuting Allowance
- Generous Vacation Entitlement and option for Sabbatical Leave
- Maternity, Paternity, Adoption and Family Care Leave
- Internal Communities and Networks
- Recruitment Introduction Reward
Working Hours:
- Full-time (35-40 hours per week)
🎯 Team & Company Context
Company Culture:
- Elsevier is a global leader in information and analytics, helping researchers and healthcare professionals advance science and improve health outcomes.
- The company thrives on excellence, innovation, and a strong dedication to customers, employees, and communities.
Team Structure:
- The team combines software thinking and service operations to enable and run Elsevier’s large-scale, 24x7, distributed and fault-tolerant systems within agreed reliability objectives.
- The team works closely with development and operations teams to optimize applications for cloud environments.
Development Methodology:
- The team follows Agile methodologies, with a focus on sprint planning, code review, testing, and quality assurance practices.
- CI/CD pipelines are used for deployments, testing, and validation.
Company Website: Elsevier
📈 Career & Growth Analysis
Career Level:
- This role is at the Mid-Senior level, focusing on designing and managing cloud platforms, mentoring junior team members, and driving best practices in SRE, automation, and cloud architecture.
Reporting Structure:
- The role reports directly to the SRE Manager and collaborates with development and operations teams.
Technical Impact:
- The role has a significant impact on Elsevier’s large-scale, 24x7 systems, ensuring high availability, performance, and rapid incident resolution.
- The role also influences the technical direction of the SRE team, driving best practices and mentoring junior team members.
🌐 Work Environment
Office Type:
- The role is based in London or Oxford, with on-site work required.
Office Location(s):
- London - London Wall
- Oxford
Workspace Context:
- The workspace is collaborative, with a focus on cross-functional integration between developers, designers, and stakeholders.
- The team uses modern development tools, multiple monitors, and testing devices to ensure high-quality solutions.
Work Schedule:
- The role requires a full-time commitment, with flexibility for deployment windows, maintenance, and project deadlines.
📄 Application & Technical Interview Process
Interview Process:
- Technical Phone Screen: Assesses your knowledge of Kubernetes and AWS and your automation skills.
- On-site Technical Deep Dive: Explores your cloud architecture, automation, and incident management skills through hands-on exercises and case studies.
- Behavioral Interview: Evaluates your problem-solving, communication, and collaboration skills.
- Final Review: Reviews your qualifications, career goals, and cultural fit.
Portfolio Review Tips:
- Highlight your cloud architecture, automation, and incident management skills with live demos.
- Include examples that demonstrate code quality.
Technical Challenge Preparation:
- Brush up on your Kubernetes, AWS, and automation skills with hands-on exercises and case studies.
- Practice problem-solving and communication skills for technical interviews.
🛠 Technology Stack & Infrastructure
Cloud Platform:
- AWS (EKS, RDS, Route 53, CloudFront, etc.)
Containerization:
- Kubernetes
Infrastructure as Code (IaC):
- Terraform (preferred) or other IaC tools
Monitoring & Logging:
- New Relic or other monitoring tools
CI/CD:
- GitHub Actions or other CI/CD tools
Version Control:
- Git
Scripting:
- Python
Documentation:
- Markdown or other documentation tools
👥 Team Culture & Values
Team Values:
- Elsevier values innovation, collaboration, and a strong dedication to customers, employees, and communities.
- The SRE team emphasizes reliability, performance, and continuous improvement.
Collaboration Style:
- The team follows Agile methodologies, with a focus on cross-functional integration, a code review culture, and pair programming practices.
- Knowledge sharing, technical mentoring, and continuous learning are encouraged.
⚡ Challenges & Growth Opportunities
Technical Challenges:
- Designing and managing highly available, scalable Kubernetes clusters on AWS EKS.
- Automating deployment pipelines, testing, and validation processes.
- Supporting multi-regional critical environments with high availability and rapid incident resolution.
Learning & Development Opportunities:
- Stay current with industry trends, emerging technologies, and best practices for cloud-native and infrastructure management.
- Develop your leadership skills by mentoring junior team members and driving best practices in SRE, automation, and cloud architecture.
- Contribute to Elsevier’s mission of advancing science and improving health outcomes for the benefit of society.
💡 Interview Preparation
Technical Questions:
- Cloud Architecture: Describe your approach to designing and managing highly available, scalable Kubernetes clusters on AWS EKS.
- Automation: Explain your automation processes for streamlining operational workflows, incident response, and infrastructure management.
- Incident Management: Walk through your incident management process, including monitoring, troubleshooting, and root cause analysis.
Company & Culture Questions:
- Company Culture: How do you align with Elsevier’s values and culture, particularly in the context of the SRE team?
- Team Dynamics: Describe your experience working in a collaborative, cross-functional team environment, and how you have contributed to team success.
Portfolio Presentation Strategy:
- Demonstrate your cloud architecture, automation, and incident management skills with live demos.
- Include examples that demonstrate code quality.
📌 Application Steps
To apply for this Senior Site Reliability Engineer II position:
- Submit your application through the application link.
- Customize your resume with relevant SRE and cloud infrastructure skills, experience, and portfolio highlights.
- Prepare for technical phone screens, on-site technical deep dives, and behavioral interviews.
- Research Elsevier’s company culture, mission, and values to ensure a strong cultural fit.
⚠️ Important Notice: This enhanced job description includes AI-generated insights and web technology industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
Application Requirements
Candidates should have extensive experience with containerized applications and a deep understanding of Kubernetes and AWS services. Strong automation skills and the ability to mentor junior team members are also essential.