Application Site Reliability Engineer

Capital on Tap
Full-time · London, United Kingdom

📍 Job Overview

  • Job Title: Application Site Reliability Engineer
  • Company: Capital on Tap
  • Location: London, United Kingdom
  • Job Type: Hybrid (2 days per week in the office)
  • Category: DevOps, Site Reliability Engineering
  • Date Posted: June 17, 2025

🚀 Role Summary

  • Key Responsibilities:
    • Design, build, and monitor systems to maximize uptime and efficiency
    • Collaborate with platform teams to build reliable, scalable applications
    • Proactively address potential outages and performance issues
    • Implement structured monitoring and alerting to prevent incidents
    • Define service-level agreements (SLAs) and service-level indicators (SLIs) to ensure reliability
    • Work closely with the product team to launch new features

💻 Primary Responsibilities

  • Design and Implement Highly Available and Scalable Systems: Ensure the reliability and performance of the company's website or application by designing and implementing highly available and scalable systems.
  • Collaborate with Cross-Functional Teams: Define and establish service level objectives (SLOs) and service level agreements (SLAs) for critical systems by collaborating with cross-functional teams.
  • Monitor Systems and Applications: Proactively identify and resolve any performance bottlenecks or availability issues by monitoring systems and applications.
  • Develop and Maintain Monitoring Tools: Provide visibility into system health and performance by developing and maintaining monitoring tools, alerts, and dashboards.
  • Conduct Post-Incident Analyses: Identify root causes and implement preventive measures to avoid future incidents by conducting post-incident analyses.
  • Automate Repetitive Tasks: Improve efficiency and reduce manual intervention by automating repetitive tasks and processes.
  • Create and Maintain Documentation: Keep system knowledge accessible and troubleshooting repeatable by creating and maintaining documentation for system architecture, configuration, and troubleshooting procedures.
  • Perform Capacity Planning: Ensure optimal system performance and scalability by performing capacity planning and resource allocation.
  • Collaborate with Development Teams: Implement and deploy new features and enhancements while ensuring they meet reliability and performance standards by collaborating with development teams.
  • Stay Up to Date with Industry Best Practices: Follow new technologies and emerging trends in site reliability engineering, and apply them where they improve reliability.

🎓 Skills & Qualifications

Education: A relevant degree or equivalent experience in a related field.

Experience: Proven experience in managing a public cloud, preferably Azure.

Required Skills:

  • Experience in managing a public cloud (Azure advantageous)
  • Experience in Azure DevOps, Octopus, Flux, GitHub, or other CI/CD tools
  • Experience in Python, PowerShell, C#, or other scripting languages
  • Experience with Linux and Microsoft Systems
  • Excellent communication skills and ability to collaborate with multiple teams in an agile environment
  • Strong problem-solving and troubleshooting skills
  • Expertise in monitoring and logging tools (Datadog advantageous)
  • Experience with Kubernetes and containerization
  • Experience with setting and adjusting SLOs working with product teams

Preferred Skills:

  • Experience with IaC tools such as Terraform
  • Knowledge of service mesh technologies such as Istio
  • Experience with SQL databases

📊 Web Portfolio & Project Requirements

Portfolio Essentials:

  • Demonstrate experience in managing a public cloud and implementing CI/CD pipelines
  • Showcase problem-solving skills and incident management processes
  • Highlight experience with monitoring tools and alerting systems
  • Display expertise in Kubernetes and containerization

Technical Documentation:

  • Describe the code quality, commenting, and documentation standards you follow
  • Include examples of version control practices, deployment processes, and server configuration
  • Demonstrate testing methodologies, performance metrics, and optimization techniques

💵 Compensation & Benefits

Salary Range: Competitive salary based on experience and industry standards for site reliability engineers in London.

Benefits:

  • Private healthcare, including dental and optician services, through Vitality
  • Worldwide travel insurance through Vitality
  • Anniversary rewards (£250, £500, £750, 4-week fully paid sabbatical)
  • Salary sacrifice pension scheme with up to a 7% match
  • 28 days holiday (plus bank holidays)
  • Annual learning and wellbeing budget
  • Enhanced parental leave
  • Cycle to work scheme
  • Season ticket loan
  • 6 free therapy sessions per year
  • Dog-friendly offices
  • Free drinks and snacks in offices

🎯 Team & Company Context

🏢 Company Culture

Industry: Fintech, focusing on small business credit card and spend management.

Company Size: Medium-sized, with over 200,000 businesses served worldwide and a goal to help 1 million small businesses by 2030.

Founded: 2012, in London, United Kingdom.

Team Structure:

  • Embedded SRE model working closely with platform teams
  • Hybrid work environment with 2 days per week in the office

Development Methodology:

  • Agile environment in a scaling company that encourages innovation and problem-solving
  • Collaborative culture with cross-functional teams and continuous learning

Company Website: Capital on Tap

📝 Enhancement Note: Capital on Tap's mission and culture emphasize empowering small business owners and fostering innovation, collaboration, and continuous learning.

📈 Career & Growth Analysis

Career Level: Mid-level site reliability engineer role with a focus on system design, monitoring, and incident management.

Reporting Structure: Embedded within platform teams, working closely with team leads and other SREs.

Technical Impact: Significant influence on system reliability, performance, and availability, ensuring optimal user experience and business continuity.

Growth Opportunities:

  • Technical leadership and architecture decision-making opportunities
  • Specialization in emerging technologies and trends in site reliability engineering
  • Career progression paths within the growing fintech company

📝 Enhancement Note: Capital on Tap's fast-growing and profitable nature presents numerous career growth opportunities for site reliability engineers.

🌐 Work Environment

Office Type: Hybrid, with 2 days per week in the office, located in Shoreditch, London.

Office Location(s): London, United Kingdom.

Workspace Context:

  • Collaborative workspace with a focus on team interaction and knowledge sharing
  • Access to development tools, multiple monitors, and testing devices
  • Dog-friendly offices with a relaxed and casual work environment

Work Schedule: Flexible working hours, subject to project deadlines and maintenance windows.

📝 Enhancement Note: Capital on Tap's hybrid work arrangement and flexible working hours emphasize work-life balance and employee well-being.

📄 Application & Technical Interview Process

Interview Process:

  1. First stage: 30-minute intro and values call with a talent partner (video call)
  2. Second stage: 45-minute CV overview with the head of the department, engineering team leads, and/or product managers (video call)
  3. Final stage: 60-minute question-and-scenario interview with the SRE team lead (video call)

Portfolio Review Tips:

  • Tailor the portfolio to showcase experience in managing public clouds, CI/CD pipelines, and incident management
  • Highlight problem-solving skills and technical expertise in monitoring tools and alerting systems
  • Include examples of Kubernetes and containerization experience

Technical Challenge Preparation:

  • Familiarize yourself with Azure DevOps, Octopus, Flux, GitHub, or other CI/CD tools
  • Brush up on Python, PowerShell, C#, or other scripting languages
  • Prepare for problem-solving and incident management scenarios

ATS Keywords (for resume optimization, organized by category):

  • Programming Languages: Python, PowerShell, C#
  • CI/CD Tools: Azure DevOps, Octopus, Flux, GitHub
  • Server Technologies: Linux, Microsoft Systems, Kubernetes, Containerization
  • Databases: SQL Databases
  • Tools: Monitoring Tools, Datadog, IaC Tools, Terraform, Service Mesh Technologies, Istio
  • Methodologies: Agile, Scrum, CI/CD, Site Reliability Engineering
  • Soft Skills: Problem-solving, Troubleshooting, Communication, Collaboration, Incident Management
  • Industry Terms: Public Cloud Management, Azure, SLOs, SLAs, IaC

📌 Application Steps

To apply for this site reliability engineer position at Capital on Tap:

  1. Submit your application through the application link provided
  2. Tailor your resume to highlight relevant skills and experience in managing public clouds, CI/CD pipelines, and incident management
  3. Prepare a portfolio showcasing your experience in monitoring tools, alerting systems, and Kubernetes/containerization
  4. Familiarize yourself with Capital on Tap's company culture, mission, and values
  5. Research the company's fintech industry context and small business focus

⚠️ Important Notice: This enhanced job description includes AI-generated insights and web development/server administration industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.

Application Requirements

Required skills include experience in managing a public cloud, CI/CD tools, and scripting languages. Strong problem-solving skills and expertise in monitoring tools are also essential.