Senior Site Reliability Engineer - Incident Management

Qualys
Full-time • Pune, India

Job Overview

  • Job Title: Senior Site Reliability Engineer - Incident Management
  • Company: Qualys
  • Location: Pune, Mahārāshtra, India
  • Job Type: On-site, Full-time
  • Category: DevOps, Site Reliability Engineering
  • Date Posted: 2025-06-30
  • Experience Level: 0-2 years (freshers can apply)

Role Summary

  • Incident Management Focus: Monitor, troubleshoot, and resolve issues in Qualys' infrastructure and services to ensure maximum service availability and performance.
  • Collaborative Environment: Work closely with engineering and technical teams to drive quick resolution and support services.
  • Automation & Documentation: Automate tasks and document processes to improve efficiency and knowledge sharing.

Enhancement Note: This role emphasizes incident management, requiring strong troubleshooting skills and a proactive approach to maintain service reliability and minimize downtime.

Primary Responsibilities

  • Monitoring & Troubleshooting: Monitor system performance using tools like Splunk, Grafana, and Kibana. Troubleshoot and resolve issues promptly to minimize service disruption.
  • Incident Management: Manage incident tickets, track issues, and document resolutions using tools like Jira and ServiceNow. Ensure timely resolution and effective communication with stakeholders.
  • Collaboration: Work closely with engineering and technical teams to identify root causes, drive quick resolution, and prevent future incidents.
  • Automation & Documentation: Automate repetitive tasks and document processes to improve efficiency and knowledge sharing within the team.

Enhancement Note: This role requires a strong focus on incident management, including the ability to drive incident processes end to end and keep stakeholders informed throughout.

Skills & Qualifications

Education: Relevant IT degree or certification in Linux, System Administration, VMware, IT Security, or ITSM/ITIL.

Experience: 0-2 years of IT Operations experience (freshers can apply).

Required Skills:

  • Strong troubleshooting skills and familiarity with monitoring tools (Splunk, Prometheus, Grafana, Kibana).
  • Knowledge of incident management processes and tools (Jira, ServiceNow, PagerDuty).
  • Good understanding of core ITSM functions and the tools that support them.
  • Strong interpersonal skills and ability to interact with employees at all levels.
  • Basic knowledge of DevOps/SRE principles, Python, and cloud technologies.

Preferred Skills:

  • Certifications in Linux, System Admin, VMware, IT Security, or ITSM/ITIL.
  • Familiarity with automation tools and scripting languages.

Enhancement Note: While not required, relevant certifications and familiarity with automation tools can significantly enhance a candidate's prospects in this role.

Portfolio & Project Requirements

  • Incident Management Portfolio: Demonstrate your incident management experience through case studies showcasing your problem-solving skills, process improvement, and communication with stakeholders.
  • Technical Documentation: Highlight your documentation skills by providing examples of incident reports, post-mortem analyses, and process improvements.
  • Automation Projects: Showcase your automation skills through projects demonstrating task automation, script development, or tool integration.

Compensation & Benefits

Salary Range: INR 8-12 LPA (Based on experience and qualifications)

Benefits:

  • Competitive salary and benefits package.
  • Opportunities for professional growth and development.
  • Collaborative and innovative work environment.

Enhancement Note: The salary range is estimated based on market research for similar roles in Pune, India. Please verify with the company for the most accurate and up-to-date information.

🎯 Team & Company Context

🏒 Company Culture

Industry: Cybersecurity and compliance.

Company Size: Medium (1,001-5,000 employees)

Founded: 1999

Team Structure:

  • The Site Reliability Engineering team works closely with engineering and technical teams to ensure service reliability and performance.
  • The team operates on a 24/7/365 basis, with monthly shift rotation.

Development Methodology:

  • The team follows ITIL/ITSM principles for incident management.
  • They use tools like Jira, ServiceNow, and PagerDuty for incident tracking and resolution.

Company Website: Qualys

Enhancement Note: Qualys is a leading provider of cloud-based security and compliance solutions, with a strong focus on innovation and teamwork.

Career & Growth Analysis

Career Level: Senior Site Reliability Engineer - Incident Management, responsible for driving incident management processes and ensuring service reliability and performance.

Reporting Structure: This role reports directly to the Site Reliability Engineering Manager and works closely with engineering and technical teams.

Technical Impact: The Senior Site Reliability Engineer - Incident Management plays a crucial role in maintaining Qualys' infrastructure and services, ensuring maximum service availability and performance.

Growth Opportunities:

  • Develop expertise in incident management processes and tools.
  • Gain experience in automation and scripting languages.
  • Explore opportunities in technical leadership, architecture, or specialized roles within the Site Reliability Engineering team.

Enhancement Note: This role offers significant growth potential, with opportunities to develop expertise in incident management and automation and to explore leadership or specialized roles.

🌐 Work Environment

Office Type: On-site, with a collaborative and innovative work environment.

Office Location(s): Pune, Mahārāshtra, India

Workspace Context:

  • The workspace is designed to foster collaboration and innovation, with multiple monitors and testing devices available.
  • The team works closely together to ensure quick resolution and support services for engineering and technical teams.

Work Schedule: 40 hours per week, with a 24/7/365 on-call rotation.

Enhancement Note: The on-call rotation ensures that the team is available to address incidents and maintain service reliability and performance at all times.

Application & Technical Interview Process

Interview Process:

  1. Online assessment of technical skills and problem-solving abilities.
  2. Technical interview focusing on incident management processes, troubleshooting, and automation.
  3. Behavioral interview to assess communication skills, teamwork, and adaptability.
  4. Final interview with the hiring manager to discuss the role and company culture.

Portfolio Review Tips:

  • Highlight your incident management experience through case studies and process improvement projects.
  • Demonstrate your technical skills through automation projects and scripting examples.
  • Showcase your communication skills through incident reports and stakeholder management examples.

Technical Challenge Preparation:

  • Brush up on your troubleshooting skills and incident management processes.
  • Familiarize yourself with relevant tools like Splunk, Grafana, Kibana, Jira, and ServiceNow.
  • Prepare for behavioral interview questions focusing on communication, teamwork, and adaptability.

ATS Keywords:

  • Incident Management
  • Troubleshooting
  • Monitoring
  • Automation
  • Splunk
  • Grafana
  • Kibana
  • PagerDuty
  • Jira
  • ServiceNow
  • ITSM
  • Linux
  • System Administration
  • VMware
  • IT Security
  • Python
  • Cloud

Enhancement Note: Incorporate relevant ATS keywords throughout your resume and portfolio to optimize your application for this role.

Technology Stack & Infrastructure

Monitoring & Troubleshooting Tools:

  • Splunk
  • Prometheus
  • Grafana
  • Kibana
  • PagerDuty
  • Runscope

Incident Management Tools:

  • Jira
  • ServiceNow

Automation & Scripting Languages:

  • Python
  • Bash
  • PowerShell

Enhancement Note: Familiarize yourself with the relevant tools and technologies used in this role to demonstrate your technical proficiency during the interview process.

Team Culture & Values

Site Reliability Engineering Values:

  • Proactive incident management and prevention.
  • Collaborative problem-solving and knowledge sharing.
  • Continuous improvement and automation.
  • Strong communication and stakeholder management.

Collaboration Style:

  • The Site Reliability Engineering team works closely with engineering and technical teams to drive quick resolution and support services.
  • They follow a collaborative approach to incident management, with a focus on knowledge sharing and continuous improvement.

Enhancement Note: The Site Reliability Engineering team at Qualys values collaboration, proactivity, and continuous improvement in incident management and service reliability.

Challenges & Growth Opportunities

Technical Challenges:

  • Troubleshooting complex infrastructure and service issues.
  • Managing high-pressure incident situations and minimizing downtime.
  • Automating repetitive tasks and improving incident management processes.

Learning & Development Opportunities:

  • Gain expertise in incident management processes and tools.
  • Develop automation and scripting skills.
  • Explore opportunities in technical leadership, architecture, or specialized roles within the Site Reliability Engineering team.

Enhancement Note: This role offers significant technical challenges and growth opportunities, with the potential to develop expertise in incident management and automation and to explore leadership or specialized roles.

Interview Preparation

Technical Questions:

  • Be prepared to discuss your incident management experience and process improvement projects.
  • Demonstrate your troubleshooting skills through case studies and technical challenges.
  • Showcase your automation and scripting skills through relevant projects and examples.

Company & Culture Questions:

  • Research Qualys' company culture and values, focusing on innovation and teamwork.
  • Prepare questions about the Site Reliability Engineering team's structure, collaboration, and growth opportunities.

Enhancement Note: Tailor your interview preparation to emphasize your incident management experience, technical skills, and communication abilities, with a focus on the company's culture and values.

Application Steps

To apply for this Senior Site Reliability Engineer - Incident Management position at Qualys:

  1. Submit your application through the Qualys careers page.
  2. Customize your resume and portfolio to highlight your incident management experience, technical skills, and communication abilities.
  3. Prepare for the interview process by researching the company's culture, values, and technical requirements.
  4. Practice troubleshooting exercises and incident management case studies to demonstrate your technical proficiency.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and web development industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.

Application Requirements

Candidates should have 1-2 years of IT Operations experience or equivalent certification, with knowledge of monitoring tools and incident management processes. Strong interpersonal skills and relevant technical certifications are highly recommended.