Senior Site Reliability Engineer

Veeam Software
Full-time | Pune, India

📍 Job Overview

  • Job Title: Senior Site Reliability Engineer
  • Company: Veeam Software
  • Location: Bengaluru, Karnataka, India & Pune, Maharashtra, India
  • Job Type: On-site
  • Category: DevOps & Site Reliability Engineering

🚀 Role Summary

  • As a Senior Site Reliability Engineer, you will play a foundational role in shaping Veeam's new global Site Reliability Engineering function for their SaaS offering, the Veeam Data Cloud.
  • You will work with cross-functional teams to ensure high availability, performance, and operational excellence as Veeam grows globally.
  • This is a rare opportunity to make a significant impact across architecture, tooling, operations, and culture within Veeam's engineering teams.

💻 Primary Responsibilities

  • Reliability Engineering & Resilience:

    • Design and evolve infrastructure to be highly available, fault-tolerant, and scalable across public clouds (initially Azure, with future expansion plans to other providers).
    • Establish and maintain Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets to define and enforce reliability objectives.
    • Lead incident response and analysis, and run blameless postmortems and sharing sessions that spread learning across the engineering organization and drive improvements to the wider socio-technical system.
  • Observability & Operational Excellence:

    • Drive adoption of deep observability practices, ensuring comprehensive and actionable telemetry, logs, metrics, and tracing.
    • Develop automation and self-healing tools to reduce toil and support Veeam’s fleet management strategy.
    • Participate in on-call rotations and lead operational excellence across the stack.
  • Engineering at Scale:

    • Contribute to infrastructure as code (IaC), CI/CD systems, deployment automation, and scalable config management.
    • Integrate and extend monitoring and chaos engineering tools to validate reliability assumptions under load and failure conditions.
    • Implement testing strategies, canary deployments, and release validation pipelines to protect production environments and allow teams to safely deliver new features as quickly as possible.
  • Collaboration & Culture:

    • Embed within product and platform teams to champion reliability from design through delivery.
    • Contribute to a learning culture focused on continuous improvement and proactive risk management.
    • Mentor engineers and advocate for DevOps/SRE best practices across global teams.

🎓 Skills & Qualifications

Education: A bachelor's degree in Computer Science, Engineering, or a related field.

Experience: 5+ years of hands-on experience in a Software Engineering role, with at least 2 years in Site Reliability, Platform Engineering, or similar.

Required Skills:

  • Deep experience building systems on public cloud providers (Azure preferred).
  • Strong programming skills in JavaScript, Node, TypeScript, Go, Java, C#, or similar.
  • Proven track record in delivering monitoring, alerting, and observability tooling (e.g., Prometheus, Grafana, OpenTelemetry).
  • Experience with IaC tools like Terraform/Pulumi, and container orchestration (e.g., Kubernetes).
  • Solid understanding of distributed systems, cloud networking, and cloud-native system design.

Preferred Skills:

  • Experience working on large-scale B2B SaaS platforms.
  • Background in chaos engineering, resilience testing, performance testing, load testing, or incident learning programs.
  • Familiarity with compliance frameworks (e.g., ISO, SOC 2, GDPR, FedRAMP/CMMC).

📊 Web Portfolio & Project Requirements

  • Portfolio Essentials:

    • Demonstrate your understanding of modern SRE principles and cloud architecture best practices.
    • Showcase your experience with public cloud providers, IaC tools, and container orchestration.
    • Highlight your ability to design scalable, resilient, and observable systems.
  • Technical Documentation:

    • Provide clear and concise code comments, documentation, and version control strategies.
    • Demonstrate your understanding of testing methodologies, performance metrics, and optimization techniques.
    • Showcase your ability to collaborate with cross-functional teams and contribute to a learning culture.

💵 Compensation & Benefits

Salary Range: Competitive compensation tailored to local markets in India.

Benefits:

  • Professional development resources, including internal mentorship, technical training platforms, and volunteer days.
  • Competitive compensation and benefits tailored to local markets in the US, Czechia, India, and Australia.

Working Hours: 40 hours per week, with flexibility for deployment windows, maintenance, and project deadlines.

🎯 Team & Company Context

Company Culture:

  • Industry: Data Resilience, focusing on data backup, recovery, portability, security, and intelligence.
  • Company Size: Medium (550-1,000 employees).
  • Founded: 2006.
  • Team Structure: Embed within product and platform teams to champion reliability from design through delivery, with a focus on collaboration and learning culture.

Development Methodology:

  • Veeam follows Agile methodologies, with a focus on sprint planning, code review, testing, and quality assurance practices.
  • They also employ deployment strategies, CI/CD pipelines, and server management techniques to ensure high availability and performance.

Company Website: Veeam Software

Career & Growth Analysis:

  • Web Technology Career Level: Senior Site Reliability Engineer, focusing on architecture, tooling, operations, and culture.
  • Reporting Structure: Embedded within product and platform teams, with a focus on cross-functional collaboration and learning culture.
  • Technical Impact: Drive high availability, performance, and operational excellence for Veeam's global customer base.

Growth Opportunities:

  • Web Technology Career Progression: Shape the architecture, tooling, operations, and culture of Veeam's global Site Reliability Engineering function.
  • Technical Skill Development: Gain experience with modern SRE principles, cloud architecture best practices, and emerging technologies.
  • Technical Leadership Potential: Mentor engineers and advocate for DevOps/SRE best practices across global teams.

Work Environment:

  • Office Type: On-site, with a focus on collaboration, learning, and continuous improvement.
  • Office Location(s): Bengaluru, Karnataka, India & Pune, Maharashtra, India.
  • Workspace Context: Collaborative workspace with multiple monitors, testing devices, and development tools available.
  • Work Schedule: Flexible schedule that accommodates deployment windows, maintenance, and project deadlines.

📄 Application & Technical Interview Process

Interview Process:

  • Step 1: Technical preparation and coding/configuration assessment, focusing on problem-solving, performance optimization, and system design discussions.
  • Step 2: Web architecture expectations and system design discussion, with a focus on scalability, fault tolerance, and high availability.
  • Step 3: Web development team interaction and cultural fit assessment, with a focus on collaboration and learning culture.
  • Step 4: Final evaluation criteria and technical impact discussion, with a focus on operational excellence and global customer base.

Portfolio Review Tips:

  • Tip 1: Highlight your understanding of modern SRE principles, cloud architecture best practices, and public cloud provider experience.
  • Tip 2: Showcase your ability to design scalable, resilient, and observable systems, with a focus on performance optimization and accessibility standards.
  • Tip 3: Demonstrate your experience with IaC tools, container orchestration, and monitoring/alerting/observability tooling.
  • Tip 4: Emphasize your ability to collaborate with cross-functional teams, contribute to a learning culture, and drive operational excellence.

Technical Challenge Preparation:

  • Preparation 1: Familiarize yourself with Veeam's products, services, and customer base to better understand their data resilience needs.
  • Preparation 2: Brush up on your knowledge of public cloud providers, IaC tools, and container orchestration to ensure you're well-prepared for technical assessments.
  • Preparation 3: Review your understanding of distributed systems, cloud networking, and cloud-native system design to excel in architecture and design discussions.

ATS Keywords (for resume optimization, organized by category):

  • Programming Languages: JavaScript, Node, TypeScript, Go, Java, C#, etc.
  • Web Frameworks: Not applicable for this role.
  • Server Technologies: Azure, Kubernetes, Terraform/Pulumi, Prometheus, Grafana, OpenTelemetry, etc.
  • Databases: Not applicable for this role.
  • Tools: Not applicable for this role.
  • Methodologies: Agile, DevOps, SRE, IaC, CI/CD, etc.
  • Soft Skills: Communication, Collaboration, Problem-Solving, Leadership, Mentoring, etc.
  • Industry Terms: Site Reliability Engineering, DevOps, Cloud Architecture, Cloud-Native, etc.

📝 Enhancement Note: The provided ATS keywords are tailored to the Senior Site Reliability Engineer role, focusing on relevant web development and server administration skills, tools, and methodologies.

📌 Application Steps

To apply for this Senior Site Reliability Engineer position:

  1. Submit your application through the Veeam Software careers page.
  2. Tailor your resume to highlight your web development, server administration, and DevOps/SRE skills, with a focus on relevant keywords and project examples.
  3. Prepare for technical interviews by reviewing Veeam's products, services, and customer base, as well as brushing up on your knowledge of public cloud providers, IaC tools, and container orchestration.
  4. Showcase your understanding of modern SRE principles, cloud architecture best practices, and your ability to design scalable, resilient, and observable systems in your portfolio and technical assessments.
  5. Research Veeam's company culture, values, and mission to ensure a strong cultural fit and alignment with their learning and development opportunities.

📝 Important Notice: This enhanced job description includes AI-generated insights and web development/server administration industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.

Application Requirements

Candidates should have 5+ years of experience in Software Engineering, with at least 2 years in Site Reliability or similar roles. Strong programming skills and experience with public cloud providers, monitoring tools, and IaC are essential.