Application Site Reliability Engineer at Capital on Tap

📍 Job Overview

Job Title: Application Site Reliability Engineer
Company: Capital on Tap
Location: London, United Kingdom
Job Type: Hybrid (2 days per week in the office)
Category: DevOps, Site Reliability Engineering
Date Posted: June 17, 2025

🚀 Role Summary

Key Responsibilities:
- Design, build, and monitor systems to maximize uptime and efficiency
- Collaborate with platform teams to build reliable, scalable applications
- Proactively address potential outages and performance issues
- Implement structured monitoring and alerting to prevent incidents
- Define service-level agreements (SLAs) and service-level indicators (SLIs) to ensure reliability
- Work closely with the product team to launch new features

💻 Primary Responsibilities

Design and Implement Highly Available and Scalable Systems: Ensure the reliability and performance of the company's website or application by designing and implementing highly available and scalable systems.
Collaborate with Cross-Functional Teams: Define and establish service level objectives (SLOs) and service level agreements (SLAs) for critical systems by collaborating with cross-functional teams.
Monitor Systems and Applications: Proactively identify and resolve any performance bottlenecks or availability issues by monitoring systems and applications.
Develop and Maintain Monitoring Tools: Provide visibility into system health and performance by developing and maintaining monitoring tools, alerts, and dashboards.
Conduct Post-Incident Analyses: Identify root causes and implement preventive measures to avoid future incidents by conducting post-incident analyses.
Automate Repetitive Tasks: Improve efficiency and reduce manual intervention by automating repetitive tasks and processes.
Create and Maintain Documentation: Support consistent operations and faster troubleshooting by creating and maintaining documentation for system architecture, configuration, and troubleshooting procedures.
Perform Capacity Planning: Ensure optimal system performance and scalability by performing capacity planning and resource allocation.
Collaborate with Development Teams: Implement and deploy new features and enhancements while ensuring they meet reliability and performance standards by collaborating with development teams.
Stay Up to Date with Industry Best Practices: Stay informed about industry best practices, new technologies, and emerging trends in site reliability engineering.

🎓 Skills & Qualifications

Education: A relevant degree or equivalent experience in a related field.

Experience: Proven experience in managing a public cloud, preferably Azure.

Required Skills:

Experience in managing a public cloud (Azure advantageous)
Experience with Azure DevOps, Octopus, Flux, GitHub, or other CI/CD tools
Experience with Python, PowerShell, C#, or other programming and scripting languages
Experience with Linux and Microsoft Systems
Excellent communication skills and ability to collaborate with multiple teams in an agile environment
Strong problem-solving and troubleshooting skills
Expertise in monitoring and logging tools (Datadog advantageous)
Experience with Kubernetes and containerization
Experience setting and adjusting SLOs in collaboration with product teams

Preferred Skills:

Experience with IaC tools such as Terraform
Knowledge of service mesh technologies such as Istio
Experience with SQL databases

📊 Web Portfolio & Project Requirements

Portfolio Essentials:

Demonstrate experience in managing a public cloud and implementing CI/CD pipelines
Showcase problem-solving skills and incident management processes
Highlight experience with monitoring tools and alerting systems
Display expertise in Kubernetes and containerization

Technical Documentation:

Provide code quality, commenting, and documentation standards
Include version control, deployment processes, and server configuration
Demonstrate testing methodologies, performance metrics, and optimization techniques

💵 Compensation & Benefits

Salary Range: Competitive salary based on experience and industry standards for site reliability engineers in London.

Benefits:

Private healthcare, including dental and optician services, through Vitality
Worldwide travel insurance through Vitality
Anniversary rewards (£250, £500, £750, 4-week fully paid sabbatical)
Salary sacrifice pension scheme with up to a 7% match
28 days holiday (plus bank holidays)
Annual learning and wellbeing budget
Enhanced parental leave
Cycle to work scheme
Season ticket loan
6 free therapy sessions per year
Dog-friendly offices
Free drinks and snacks in offices

🎯 Team & Company Context

🏢 Company Culture

Industry: Fintech, focusing on small business credit card and spend management.

Company Size: Medium-sized, with over 200,000 businesses served worldwide and a goal to help 1 million small businesses by 2030.

Founded: 2012, in London, United Kingdom.

Team Structure:

Embedded SRE model working closely with platform teams
Hybrid work environment with 1-2 days per week in the office

Development Methodology:

Agile and scaling environment, empowering innovation and problem-solving
Collaborative culture with cross-functional teams and continuous learning

Company Website: Capital on Tap

📝 Enhancement Note: Capital on Tap's mission and culture emphasize empowering small business owners and fostering innovation, collaboration, and continuous learning.

📈 Career & Growth Analysis

Career Level: Mid-level site reliability engineer role with a focus on system design, monitoring, and incident management.

Reporting Structure: Embedded within platform teams, working closely with team leads and other SREs.

Technical Impact: Significant influence on system reliability, performance, and availability, ensuring optimal user experience and business continuity.

Growth Opportunities:

Technical leadership and architecture decision-making opportunities
Specialization in emerging technologies and trends in site reliability engineering
Career progression paths within the growing fintech company

📝 Enhancement Note: Capital on Tap's fast-growing and profitable nature presents numerous career growth opportunities for site reliability engineers.

🌐 Work Environment

Office Type: Hybrid, with 1-2 days per week in the office located in Shoreditch, London.

Office Location(s): London, United Kingdom.

Workspace Context:

Collaborative workspace with a focus on team interaction and knowledge sharing
Access to development tools, multiple monitors, and testing devices
Dog-friendly offices with a relaxed and casual work environment

Work Schedule: Flexible working hours with project deadline and maintenance window considerations.

📝 Enhancement Note: Capital on Tap's hybrid work arrangement and flexible working hours emphasize work-life balance and employee well-being.

📄 Application & Technical Interview Process

Interview Process:

First stage: 30-minute intro and values call with a talent partner (video call)
Second stage: 45-minute CV overview with the head of the department, engineering team leads, and/or product managers (video call)
Final stage: 60-minute interview with questions and scenario-based exercises, led by the SRE team lead (video call)

Portfolio Review Tips:

Tailor the portfolio to showcase experience in managing public clouds, CI/CD pipelines, and incident management
Highlight problem-solving skills and technical expertise in monitoring tools and alerting systems
Include examples of Kubernetes and containerization experience

Technical Challenge Preparation:

Familiarize yourself with Azure DevOps, Octopus, Flux, GitHub, or other CI/CD tools
Brush up on Python, PowerShell, C#, or other programming and scripting languages
Prepare for problem-solving and incident management scenarios
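
One scenario worth rehearsing for the incident management and SLO discussions is an error-budget calculation. The sketch below is illustrative only and is not taken from Capital on Tap's interview material; the function names, the request counts, and the 99.9% target are all assumptions chosen for the example.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Return the availability SLI: the fraction of requests that succeeded."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= good_requests <= total_requests:
        raise ValueError("good_requests must be between 0 and total_requests")
    return good_requests / total_requests


def error_budget_remaining(slo_target: float, good_requests: int, total_requests: int) -> float:
    """Return the unspent fraction of the error budget.

    1.0 means nothing has been spent, 0.0 means the budget is exhausted,
    and a negative value means the SLO has been breached.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1.0 - slo_target) * total_requests
    actual_failures = total_requests - good_requests
    return 1.0 - actual_failures / allowed_failures


if __name__ == "__main__":
    # Hypothetical 30-day window: 1,000,000 requests, 500 of them failed.
    total, good = 1_000_000, 999_500
    print(f"SLI: {availability_sli(good, total):.4%}")
    # With an assumed 99.9% SLO, about 1,000 failures are allowed,
    # so roughly half of the budget is still available.
    print(f"Budget remaining: {error_budget_remaining(0.999, good, total):.1%}")
```

Being able to explain why a negative result should pause feature launches, or why a nearly untouched budget may mean the SLO is too loose, is the kind of reasoning a scenario-based interview tends to probe.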

ATS Keywords (organized by category for resume optimization):

Programming Languages: Python, PowerShell, C#
Web Frameworks & Libraries: N/A
Server Technologies: Linux, Microsoft Systems, Kubernetes, Containerization
Databases: SQL Databases
Tools: Azure DevOps, Octopus, Flux, GitHub, Monitoring Tools, Datadog, IaC Tools, Terraform, Service Mesh Technologies, Istio
Methodologies: Agile, Scrum, CI/CD, Site Reliability Engineering
Soft Skills: Problem-solving, Troubleshooting, Communication, Collaboration, Incident Management
Industry Terms: Public Cloud Management, Azure, SLOs, SLAs, IaC, Terraform, Kubernetes, Containerization, Monitoring Tools, Datadog, Service Mesh Technologies, Istio

📌 Application Steps

To apply for this site reliability engineer position at Capital on Tap:

Submit your application through the application link provided
Tailor your resume to highlight relevant skills and experience in managing public clouds, CI/CD pipelines, and incident management
Prepare a portfolio showcasing your experience in monitoring tools, alerting systems, and Kubernetes/containerization
Familiarize yourself with Capital on Tap's company culture, mission, and values
Research the company's fintech industry context and small business focus

⚠️ Important Notice: This enhanced job description includes AI-generated insights and web development/server administration industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
