Lead Site Reliability Engineer - DevOps

Qualys
Full-time • Pune, India

Job Overview

  • Job Title: Lead Site Reliability Engineer - DevOps
  • Company: Qualys
  • Location: Pune, Mahārāshtra, India
  • Job Type: Full-time
  • Category: DevOps Engineering
  • Date Posted: June 27, 2025
  • Experience Level: 5-10 years
  • Remote Status: On-site

🚀 Role Summary

  • Lead the development and maintenance of Qualys' Cloud Platform & Middleware technologies, ensuring reliability, performance, and scalability.
  • Collaborate with a team of engineers and architects to build, deploy, and operate scalable, distributed, and fault-tolerant systems.
  • Drive automation, monitoring, alerting, testing, and deployment processes to optimize day-to-day work.
  • Propose and implement improvements to systems and processes, with a focus on capacity planning, configuration management, and performance tuning.

📝 Enhancement Note: This role requires a strong background in both software development and systems engineering, with a focus on cloud platforms and distributed systems.

💻 Primary Responsibilities

  • System Design & Development: Co-develop and participate in the full lifecycle development of cloud platform services, from inception to improvement, applying scientific principles.
  • System Reliability & Performance: Increase the effectiveness, reliability, and performance of cloud platform technologies by identifying key indicators, automating changes, and evaluating results.
  • Incident Response & Resolution: Lead incident response and participate in on-call rotations. Write detailed postmortem analysis reports, focusing on root cause analysis and improvement.
  • Process Improvement: Propose and drive efficiencies in systems and processes related to capacity planning, configuration management, scaling services, performance tuning, monitoring, alerting, and root cause analysis.
  • Collaboration & Ownership: Participate in the development process, supporting new features, services, and releases. Hold an ownership mindset for the cloud platform technologies.
  • Automation & Deployment: Develop tools and automate processes for large-scale provisioning and deployment of cloud platform technologies.

📝 Enhancement Note: This role requires a deep understanding of distributed systems, cloud platforms, and software development best practices to ensure the reliability and performance of Qualys' technologies.

🎓 Skills & Qualifications

Education: BS/MS degree in Computer Science, Applied Math, or a related field.

Experience: 3+ years of relevant experience in running distributed systems at scale in production.

Required Skills:

  • Expertise in one or more programming languages: Java, Python, or Go.
  • Proficiency in writing bash scripts.
  • Good understanding of SQL and NoSQL systems.
  • Good understanding of systems programming (network stack, file system, OS services).
  • Understanding of network elements such as firewalls, load balancers, DNS, NAT, TLS/SSL, VLANs, etc.
  • Skilled in identifying performance bottlenecks, anomalous system behavior, and determining the root cause of incidents.
  • Knowledge of JVM concepts like garbage collection, heap, stack, profiling, class loading, etc.
  • Knowledge of best practices related to security, performance, high-availability, and disaster recovery.
  • Proven record of handling production issues, planning escalation procedures, conducting post-mortems, impact analysis, risk assessments, and other related procedures.
  • Ability to drive results and set priorities independently.

Preferred Skills:

  • Experience with managing large-scale deployments of search engines (e.g., Elasticsearch), message-oriented middleware (e.g., Kafka), RDBMS systems (e.g., Oracle), NoSQL databases (e.g., Cassandra), and in-memory caching (e.g., Redis, Memcached).
  • Experience with container and orchestration technologies (e.g., Docker, Kubernetes).
  • Experience with monitoring tools (e.g., Graphite, Grafana, Prometheus).
  • Experience with Hashicorp technologies (e.g., Consul, Vault, Terraform, Vagrant).
  • Experience with configuration management tools (e.g., Chef, Puppet, Ansible).
  • In-depth experience with continuous integration and continuous deployment pipelines.
  • Exposure to Maven, Ant, or Gradle for builds.

📝 Enhancement Note: This role requires a strong technical background with a focus on distributed systems, cloud platforms, and software development. Preferred skills indicate a strong focus on cloud-native technologies and DevOps best practices.

📊 Web Portfolio & Project Requirements

Portfolio Essentials:

  • Demonstrate experience with large-scale distributed systems, cloud platforms, and software development projects.
  • Showcase your ability to automate processes, monitor systems, and respond to incidents.
  • Highlight your understanding of system design, capacity planning, and performance tuning.

Technical Documentation:

  • Document your approach to system design, capacity planning, and performance tuning for large-scale distributed systems.
  • Include examples of incident response, postmortem analysis, and process improvement.
  • Demonstrate your understanding of best practices related to security, high-availability, and disaster recovery.

📝 Enhancement Note: This role requires a strong focus on system design, automation, and incident response. Your portfolio should demonstrate your ability to manage and improve large-scale distributed systems.

💵 Compensation & Benefits

Salary Range: INR 1,200,000 - 1,800,000 per annum (Based on experience and qualifications)

Benefits:

  • Competitive health, dental, and vision insurance plans.
  • Retirement savings plans with company matching.
  • Generous time-off policies, including vacation, sick leave, and holidays.
  • Employee stock purchase plan.
  • Tuition reimbursement and professional development opportunities.
  • On-site gym, cafeteria, and other amenities.

Working Hours: 40 hours per week, with flexible hours and on-call rotations as needed.

📝 Enhancement Note: The salary range is estimated based on market research for similar roles in the Pune, India area. Benefits are typical for a large, multinational corporation and may vary based on individual circumstances.

🎯 Team & Company Context

🏢 Company Culture

Industry: Cybersecurity and compliance software.

Company Size: Medium (1,001-5,000 employees)

Founded: 1999

Team Structure:

  • The DevOps team at Qualys consists of Site Reliability Engineers, DevOps Engineers, and Cloud Engineers.
  • The team follows an Agile/Scrum methodology, with regular sprint planning, daily stand-ups, and retrospectives.
  • Cross-functional collaboration with development, QA, and product management teams is essential for success.

Development Methodology:

  • Qualys uses a continuous integration and continuous deployment (CI/CD) pipeline for automated testing, building, and deployment of software.
  • The team follows best practices for version control, code reviews, and quality assurance.
  • Infrastructure as Code (IaC) is used to manage and provision cloud resources.

Company Website: www.qualys.com

📝 Enhancement Note: Qualys is a well-established company in the cybersecurity industry, with a medium-sized team focused on cloud platforms and middleware technologies. The team follows Agile methodologies and emphasizes cross-functional collaboration.

📈 Career & Growth Analysis

Career Level: Lead Site Reliability Engineer - DevOps (senior-level role with significant technical influence and leadership responsibilities)

Reporting Structure: Reports directly to the Director of Site Reliability Engineering and collaborates with other engineering and architecture teams.

Technical Impact: Leads the development and maintenance of Qualys' Cloud Platform & Middleware technologies, ensuring reliability, performance, and scalability. Drives process improvements and sets technical standards for the team.

Growth Opportunities:

  • Technical Leadership: Grow into a Principal Engineer or Architecture role, focusing on technical strategy and mentoring other engineers.
  • Management: Transition into a management role, leading a team of Site Reliability Engineers or DevOps Engineers.
  • Specialization: Deepen expertise in specific cloud platforms, technologies, or domains, becoming a subject matter expert.

📝 Enhancement Note: This role offers significant growth opportunities, both in technical leadership and management. The ideal candidate will have a strong technical background and a desire to take on increasing levels of responsibility.

🌐 Work Environment

Office Type: Modern, collaborative office space with on-site amenities, including a gym, cafeteria, and game room.

Office Location(s): Pune, India

Workspace Context:

  • The workspace is designed to foster collaboration and innovation, with open-plan offices and dedicated team spaces.
  • Multiple monitors and testing devices are provided to support development and debugging activities.
  • The team encourages knowledge sharing, technical mentoring, and continuous learning.

Work Schedule: Standard business hours, with flexible hours and on-call rotations as needed. Project deadlines and maintenance windows may require additional availability.

📝 Enhancement Note: The work environment at Qualys is designed to support collaboration and innovation, with a focus on knowledge sharing and continuous learning. The team offers flexible hours and relies on on-call rotations to maintain system reliability and performance.

📄 Application & Technical Interview Process

Interview Process:

  1. Phone Screen (30 minutes): A brief conversation to assess communication skills, cultural fit, and initial technical fit.
  2. Technical Deep Dive (60-90 minutes): A detailed discussion of your technical background, experience with distributed systems, and problem-solving skills. Expect questions on system design, performance tuning, and incident response.
  3. On-site Interview (4-5 hours): A series of interviews with team members, focusing on your technical skills, cultural fit, and alignment with Qualys' mission and values. Expect a mix of behavioral, technical, and case study questions.
  4. Final Decision: A decision will be made based on the results of the interview process and your overall fit for the role.

Portfolio Review Tips:

  • Highlight your experience with large-scale distributed systems, cloud platforms, and software development projects.
  • Showcase your ability to automate processes, monitor systems, and respond to incidents.
  • Demonstrate your understanding of system design, capacity planning, and performance tuning.

Technical Challenge Preparation:

  • Brush up on your knowledge of distributed systems, cloud platforms, and software development best practices.
  • Familiarize yourself with Qualys' products and services, and be prepared to discuss how your skills and experience align with the company's mission and values.
  • Practice problem-solving exercises and be ready to discuss your approach to system design, capacity planning, and performance tuning.

ATS Keywords: (See the comprehensive list at the end of this document)

📝 Enhancement Note: The interview process at Qualys is designed to assess your technical skills, cultural fit, and alignment with the company's mission and values. Expect a mix of behavioral, technical, and case study questions, with a focus on your experience with distributed systems, cloud platforms, and software development.

🛠 Technology Stack & Web Infrastructure

Frontend Technologies: (Not applicable for this role)

Backend & Server Technologies:

  • Java
  • Python
  • Go
  • Bash Scripting
  • SQL
  • NoSQL (e.g., Cassandra)
  • Message-oriented middleware (e.g., Kafka)
  • Search engines (e.g., Elasticsearch)
  • In-memory caching (e.g., Redis, Memcached)

Development & DevOps Tools:

  • Containerization (e.g., Docker)
  • Orchestration (e.g., Kubernetes)
  • Configuration management (e.g., Ansible, Puppet)
  • Infrastructure as Code (e.g., Terraform)
  • Monitoring (e.g., Prometheus, Grafana)
  • CI/CD pipelines (e.g., Jenkins, GitLab CI)
  • Version control (e.g., Git)
  • Cloud platforms (e.g., AWS, GCP, Azure)

📝 Enhancement Note: This role requires a strong background in software development and systems engineering, with a focus on cloud platforms and distributed systems. The technology stack includes a mix of programming languages, databases, and cloud platforms, with a strong emphasis on DevOps tools and best practices.

👥 Team Culture & Values

Engineering Values:

  • Reliability: Qualys values reliability above all else, ensuring that our products and services are always available and performant.
  • Innovation: We encourage continuous learning and innovation, driving improvements in our technologies and processes.
  • Collaboration: We work together to achieve our goals, fostering a culture of teamwork and knowledge sharing.
  • Customer Focus: We prioritize our customers' needs, ensuring that our technologies meet their evolving requirements.

Collaboration Style:

  • Qualys follows an Agile/Scrum methodology, with regular sprint planning, daily stand-ups, and retrospectives.
  • The team encourages cross-functional collaboration, with regular communication and feedback between engineering, QA, and product management teams.
  • Knowledge sharing, technical mentoring, and continuous learning are essential for success.

📝 Enhancement Note: Qualys values reliability, innovation, collaboration, and customer focus. The team follows an Agile/Scrum methodology and encourages cross-functional collaboration, knowledge sharing, and continuous learning.

⚡ Challenges & Growth Opportunities

Technical Challenges:

  • Scalability: Design and implement scalable, distributed, and fault-tolerant systems to support Qualys' growing customer base.
  • Performance: Identify and address performance bottlenecks, optimizing system performance and user experience.
  • Incident Response: Lead incident response and resolution, minimizing downtime and ensuring business continuity.
  • Process Improvement: Propose and drive efficiencies in systems and processes, improving capacity planning, configuration management, and deployment automation.

Learning & Development Opportunities:

  • Technical Skills: Deepen your expertise in cloud platforms, distributed systems, and software development best practices.
  • Leadership: Develop your leadership skills, mentoring other engineers and driving technical strategy.
  • Certifications: Pursue relevant certifications, such as AWS, GCP, or Azure certifications, to enhance your technical skills and knowledge.
  • Community Involvement: Engage with the local tech community, attending meetups, conferences, and other events to expand your network and learn from other professionals.

📝 Enhancement Note: This role presents significant technical challenges and growth opportunities. The ideal candidate will have a strong background in distributed systems, cloud platforms, and software development, with a desire to take on increasing levels of responsibility and leadership.

💡 Interview Preparation

Technical Questions:

  • System Design: Describe your approach to designing scalable, distributed, and fault-tolerant systems. How do you ensure high availability and performance?
  • Performance Tuning: How do you identify and address performance bottlenecks in large-scale distributed systems? What tools and techniques do you use?
  • Incident Response: Walk us through your process for responding to incidents, from detection to resolution and postmortem analysis. How do you ensure that your systems are resilient and can withstand failures?
  • Process Improvement: How do you drive efficiencies in systems and processes? What metrics do you use to measure success, and how do you continuously improve over time?

Company & Culture Questions:

  • Company Mission: How does this role align with Qualys' mission to make the world's assets secure and compliant? How do you contribute to our customers' success?
  • Team Dynamics: How do you work effectively in a collaborative, cross-functional team environment? How do you handle conflicts or differing opinions?
  • Customer Focus: How do you ensure that your work aligns with our customers' needs and expectations? How do you gather and incorporate feedback into your projects?

Portfolio Presentation Strategy:

  • System Design: Present a case study of a large-scale distributed system you've designed and implemented. Walk us through your approach to system design, capacity planning, and performance tuning.
  • Incident Response: Describe a significant incident you've responded to, from detection to resolution and postmortem analysis. Highlight your leadership and problem-solving skills, as well as your ability to work effectively under pressure.
  • Process Improvement: Present a process improvement initiative you've led, from identification to implementation and evaluation. Highlight your ability to drive change, measure success, and continuously improve over time.

📌 Application Steps

To apply for this Lead Site Reliability Engineer - DevOps position at Qualys:

  1. Submit your application through the Qualys careers portal.
  2. Customize your resume and portfolio to highlight your experience with large-scale distributed systems, cloud platforms, and software development projects.
  3. Prepare for the technical interview process, focusing on system design, performance tuning, incident response, and process improvement.
  4. Research Qualys' products, services, and company culture to ensure a strong fit for your career goals and technical skills.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and web development/server administration industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.


ATS Keywords:

Programming Languages:

  • Java
  • Python
  • Go
  • Bash Scripting

Web Frameworks & Libraries:

  • (Not applicable for this role)

Server Technologies:

  • Message-oriented middleware (e.g., Kafka)
  • Search engines (e.g., Elasticsearch)
  • Containerization (e.g., Docker)
  • Orchestration (e.g., Kubernetes)
  • Configuration management (e.g., Ansible, Puppet)
  • Infrastructure as Code (e.g., Terraform)
  • Monitoring (e.g., Prometheus, Grafana)
  • Cloud platforms (e.g., AWS, GCP, Azure)

Databases:

  • SQL (e.g., MySQL, PostgreSQL)
  • NoSQL (e.g., Cassandra, MongoDB)
  • In-memory databases (e.g., Redis, Memcached)

Tools:

  • Version control (e.g., Git)
  • CI/CD pipelines (e.g., Jenkins, GitLab CI)

Methodologies:

  • Agile/Scrum
  • DevOps
  • Site Reliability Engineering
  • Infrastructure as Code
  • Continuous Integration
  • Continuous Deployment

Soft Skills:

  • Communication
  • Collaboration
  • Problem-solving
  • Leadership
  • Mentoring
  • Process improvement

Industry Terms:

  • Cloud Platforms
  • Distributed Systems
  • Scalability
  • Performance Tuning
  • Incident Response
  • System Design
  • Capacity Planning
  • Configuration Management

Application Requirements

Candidates should have at least 3 years of experience in running distributed systems at scale and expertise in programming languages such as Java, Python, or Go. A strong understanding of systems programming, performance bottlenecks, and security best practices is also required.