Senior Site Reliability Engineer

📍 Job Overview

Job Title: Senior Site Reliability Engineer - Storage
Company: Kyndryl
Location: Noida, Uttar Pradesh, India
Job Type: On-site
Category: DevOps, Infrastructure
Date Posted: 2025-06-25
Experience Level: 10+ years
Remote Status: On-site

🚀 Role Summary

Key Responsibilities: Manage and optimize storage solutions, ensure system reliability, and collaborate with cross-functional teams to drive continuous improvement.
Key Technologies: ZFS, iSCSI, Ubuntu, Linux, Automation, Monitoring Tools

💻 Primary Responsibilities

💾 Storage and Infrastructure Management

Deploy, manage, and optimize storage solutions using ZFS and iSCSI across global data centers.
Implement and maintain automation and monitoring tools such as Puppet, Grafana, Zabbix, and Jenkins to enhance system performance and reliability.
Utilize storcli for managing server storage configurations.
Manage and maintain Ubuntu-based systems, ensuring security and compliance.
Conduct performance tuning and capacity planning for Linux servers.
Develop and implement self-healing systems and automated recovery processes on Linux platforms.

🔄 Reliability Engineering

Develop and implement strategies for improving system availability and performance.
Conduct root-cause analysis and incident response for storage-related issues.
Collaborate with SDEs to support software development infrastructure and deploy new product features.

📈 Operational Excellence

Manage on-call rotations, leveraging automation to minimize operational load.
Develop and maintain documentation for operational procedures and best practices.
Drive continuous improvement and innovation in storage operations.

🤝 Collaboration and Communication

Work closely with cross-functional teams, including SDEs and infrastructure engineers.
Provide technical guidance and support for storage-related challenges.
Present data-driven insights to stakeholders to support decision-making.

👥 Hiring and Team Growth

Manage recruiting and hiring pipeline, including reviewing resumes, designing interview loops, and directly interviewing candidates.
Develop and grow talent through effective mentoring, coaching, succession planning, and retention strategies for key talent.

📄 Processes and Documentation

Design, implement, and improve team processes, including task triage, tooling, knowledge sharing, and quality assurance.
Produce and review documentation artifacts, team health and status reports, and technical designs.

🎓 Skills & Qualifications

Education: Bachelor's degree in Computer Science, Engineering, or a related field.

Experience: 8 to 12 years of experience in site reliability engineering, with a focus on storage solutions and Linux systems.

Required Skills:

Proven experience in site reliability engineering, with a focus on storage solutions and Linux systems.
Strong knowledge of ZFS, iSCSI, and Ubuntu.
Expertise in automation and configuration management tools (e.g., Bash, Ansible, Puppet).
Familiarity with Hashicorp tools, SSH, and LDAP.
Experience with storcli for storage configuration.
Experience with monitoring tools such as Grafana, Zabbix, and InfluxDB.
Ability to conduct root-cause analysis and implement effective solutions.
Strong ownership of the team's assigned problem space: driving predictable delivery, iterating and improving continuously, communicating consistently and effectively, coordinating gracefully with upstream and downstream stakeholders, and reporting project status.
Project management skills, including task estimation, scheduling, Gantt charts, unblocking dependencies, and Agile methodologies, with the attention to detail needed to keep projects on track.
Documentation skills, including writing standard operating procedures, design docs, policy documents, and runbooks.

Preferred Skills:

Ability to program in Python or Rust.
Software development experience, including technical design and deployment.

📊 Portfolio & Project Requirements

Portfolio Essentials:
- Demonstrate experience in managing and optimizing storage solutions using ZFS and iSCSI.
- Showcase proficiency in Linux systems administration and automation tools.
- Highlight successful root-cause analysis and incident response projects.
- Display effective project management and team collaboration skills.
Technical Documentation:
- Provide documentation for operational procedures, best practices, and technical designs related to storage management and reliability engineering.
- Include examples of project status reports and team health assessments.

💵 Compensation & Benefits

Salary Range: INR 2,500,000 - 3,500,000 per annum (region-specific, based on experience level and industry standards)
Benefits:
- Employee Learning Programs (including Microsoft, Google, Amazon, Skillsoft certifications)
- Volunteering and Giving Platform (donate, start fundraisers, volunteer, and search over 2 million non-profit organizations)

🎯 Team & Company Context

🏢 Company Culture

Industry: IT Services and Consulting
Company Size: Large (10,000+ employees)
Founded: 2021 (spun off from IBM's managed infrastructure services business)
Team Structure:
- The DevOps team is part of Skytap on Kyndryl, which builds a public cloud platform product as a dedicated software development team within the broader organization.
Development Methodology:
- Agile/Scrum methodologies and sprint planning
- Code review, testing, and quality assurance practices
- Deployment strategies, CI/CD pipelines, and server management

📈 Career & Growth Analysis

Career Level: Senior Site Reliability Engineer - Storage (highly specialized role with significant impact on system reliability and performance)
Reporting Structure: Reports directly to the Engineering Manager within the Skytap on Kyndryl team
Technical Impact: Drives system reliability, performance, and scalability for the Skytap cloud platform, ensuring high availability and minimal downtime for clients worldwide

🌐 Work Environment

Office Type: On-site (with opportunities for remote work in specific roles)
Office Location(s): Noida, Uttar Pradesh, India (with additional offices in Bangalore, Hyderabad, Chennai, and Pune)
Workspace Context:
- Collaborative workspaces with cross-functional teams, including SDEs and infrastructure engineers
- Access to modern development tools, multiple monitors, and testing devices
- Opportunities for knowledge sharing, technical mentoring, and continuous learning

📄 Application & Technical Interview Process

Interview Process:
- Technical preparation recommendations, including coding/configuration assessments and system design discussions
- System architecture expectations and problem-solving strategies
- Portfolio review and cultural fit assessment
- Final evaluation criteria and technical impact discussion
Portfolio Review Tips:
- Highlight storage management and reliability engineering projects, demonstrating expertise in ZFS, iSCSI, and Linux systems
- Showcase successful root-cause analysis and incident response case studies
- Emphasize project management skills and team collaboration experiences
Technical Challenge Preparation:
- Familiarize yourself with Kyndryl's storage infrastructure and cloud platform (Skytap)
- Brush up on your Linux systems administration and automation skills
- Prepare for coding/configuration assessments and system design discussions

📌 Application Steps

To apply for this Senior Site Reliability Engineer - Storage position at Kyndryl:

Submit your application through the application link provided.
Customize your resume to highlight relevant storage, Linux, and automation skills and experience.
Prepare a comprehensive portfolio showcasing your storage management and reliability engineering projects.
Research Kyndryl's company culture, focusing on the Skytap on Kyndryl team and the broader organization.
Prepare for technical interviews, including coding/configuration assessments and system design discussions.

📝 Enhancement Note: This enhanced job description includes AI-generated insights and industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
