AWS Site Reliability Engineer, Staff

Walmart
Full-time · India

📍 Job Overview

  • Job Title: AWS Site Reliability Engineer, Staff
  • Company: Walmart
  • Location: Hyderabad, Office Level 3 & 4, Block A - East Wing
  • Job Type: On-site
  • Category: DevOps, Site Reliability Engineering
  • Date Posted: June 20, 2025

🚀 Role Summary

  • Key Responsibilities:
    • Manage AWS environments, ensuring high availability, scalability, and security.
    • Collaborate with stakeholders to improve application observability and optimize performance.
    • Lead a team of engineers to implement security features, reduce operational toil, and drive continuous improvement.
    • Stay informed on AWS features, best practices, and the AWS environment at Warner Bros. Discovery (WBD).
    • Work with complementary public cloud leads to facilitate consistency across WBD's management of resources.

💻 Primary Responsibilities

  • AWS Environment Management:
    • Primarily accountable for managing AWS environments.
    • Identify and eliminate performance bottlenecks, and proactively remediate security concerns, through monitoring, profiling, and tuning.
    • Establish and improve SLOs, SLIs, and error budgets to drive system reliability.
    • Collaborate with stakeholders, including application developers, to improve application observability and optimize performance.
    • Lead and mentor a team of engineers who reduce toil across the team's total workload and implement security features, roles, user access, and privileges according to best practices.
    • Proactively identify, design, and implement process and architectural improvements.
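
For candidates who want a concrete picture of the SLO, SLI, and error budget work mentioned above, here is a minimal sketch of the arithmetic for a request-based availability SLO. The function name, the field names, and the sample numbers are illustrative assumptions and do not describe the team's actual tooling.

```python
def error_budget_report(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error-budget usage for a request-based availability SLO.

    slo_target is a fraction strictly between 0 and 1 (0.999 means "99.9% of
    requests succeed"). All names here are hypothetical, for illustration only.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # SLI: the measured fraction of successful requests in the window.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: the number of failures the SLO tolerates in the window.
    allowed_failures = (1 - slo_target) * total_requests
    # Fraction of the budget already spent (values above 1.0 mean it is exhausted).
    budget_consumed = failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_consumed": budget_consumed,
        "slo_met": sli >= slo_target,
    }


if __name__ == "__main__":
    # Hypothetical 30-day window: one million requests, 600 of them failed.
    report = error_budget_report(slo_target=0.999, total_requests=1_000_000, failed_requests=600)
    for name, value in report.items():
        print(f"{name}: {value}")
```

A common way to act on such a report is to slow feature rollouts as the consumed fraction approaches 1.0, though the actual policy is something each team defines for itself.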

🎓 Skills & Qualifications

Education: Bachelor's degree in Computer Science, Engineering, or a related field.

Experience: 8+ years of prior experience in Site Reliability Engineering, DevOps, Cloud Infrastructure, or related fields.

Required Skills:

  • Expertise in AWS and strong experience in Linux/Unix administration, networking, and distributed systems.
  • Proficiency in two or more programming languages, such as Python, Golang, JavaScript, PowerShell, etc.
  • Extensive hands-on experience in container orchestration technologies, such as ECS, EKS, Kubernetes, Docker, etc.
  • Deep knowledge of monitoring, logging, and observability tools (Prometheus, Grafana, ELK, Splunk, etc.).
  • Hands-on experience with IaC using Terraform or CloudFormation templates.
  • Strong background in CI/CD pipelines, GitOps, and infrastructure automation (Terraform, Helm, Ansible, or Chef).

Preferred Skills:

  • Experience with other cloud providers such as Google Cloud Platform (GCP), Azure, Oracle, etc.
  • Knowledge of and passion for media, entertainment, and technology industries.
  • Familiarity with streaming and similar products/services.
  • Experience working in a national or global company.
  • Comfortable working in a highly iterative and somewhat unstructured environment.

Soft Skills:

  • Strong problem-solving, troubleshooting, and debugging skills.
  • Excellent written and verbal communication and collaboration abilities.
  • English language fluency required.
  • Ability to handle multiple assignments concurrently.
  • Passion for automation, reliability, and continuous improvement.
  • Ability to move quickly and intelligently, treating technical debt as a nemesis.
  • Ability to solve problems independently while knowing when to request assistance.

📊 Portfolio & Project Requirements

  • Portfolio Essentials:

    • Demonstrate expertise in AWS, Linux administration, and distributed systems.
    • Showcase hands-on experience with container orchestration technologies and IaC tools like Terraform.
    • Highlight proficiency in programming languages and familiarity with monitoring, logging, and observability tools.
    • Display strong problem-solving skills and the ability to manage enterprise-scale infrastructure and tooling.
  • Technical Documentation:

    • Provide detailed technical documentation, including runbooks, troubleshooting guides, and system diagrams.
    • Showcase your ability to develop and maintain comprehensive documentation for complex systems.

💵 Compensation & Benefits

  • Salary Range: Competitive salary package based on experience and industry standards for Hyderabad, India. Market estimates for a Senior Site Reliability Engineer in Hyderabad range from ₹25-35 lakhs per annum (roughly $29,000 - $41,000 USD at an exchange rate of about ₹85 per USD).

  • Benefits:

    • Comprehensive health, dental, and vision coverage.
    • Retirement savings plans and pension schemes.
    • Generous vacation and time-off policies.
    • Employee discounts and perks.
    • Professional development opportunities and training programs.
    • Flexible work arrangements and remote work options.
  • Working Hours: Full-time position with a standard workweek of 40 hours, with flexibility for deployment windows, maintenance, and project deadlines.

🎯 Team & Company Context

🏢 Company Culture

  • Industry: Media and entertainment technology.
  • Company Size: Large, global organization with a significant presence in the media and entertainment industry.
  • Founded: 1962 (Walmart Inc.).

Team Structure:

  • The AWS Site Reliability Engineer role serves as a technical leader within the Global Infrastructure Cloud Technologies (GICT) team, supporting hundreds of applications, websites, and services in the fleet of Warner Bros. Discovery (WBD) cloud accounts.
  • The role collaborates with other SRE leads, the rest of the cloud engineering team, software developers, and management to build and manage highly resilient and performant infrastructure.

Development Methodology:

  • The team follows Agile methodologies, with a focus on sprint planning, code review, testing, and quality assurance practices.
  • Deployment strategies include CI/CD pipelines and automated deployment tools, with a focus on infrastructure as code (IaC) and automated testing.

📈 Career & Growth Analysis

  • Career Level: Staff-level role with significant technical leadership responsibilities, managing a team of engineers and driving operational excellence across WBD's AWS environment.
  • Reporting Structure: Reports directly to the Sr. Manager of Cloud Engineering, with close collaboration with other SRE leads, cloud engineering team members, and stakeholders.
  • Technical Impact: Responsible for ensuring the reliability, availability, scalability, and security of WBD's cloud infrastructure and services, directly impacting user experience and business outcomes.

Growth Opportunities:

  • Technical Leadership: Develop and refine team management skills by mentoring junior engineers and driving best practices across the organization.
  • Architecture Decisions: Contribute to strategic architecture decisions, driving the evolution of WBD's cloud infrastructure, and ensuring alignment with business objectives.
  • Emerging Technologies: Stay informed on emerging AWS features, best practices, and industry trends, driving innovation and continuous improvement within the team.

🌐 Work Environment

  • Office Type: On-site, with a collaborative workspace designed to facilitate cross-functional team interaction and knowledge sharing.
  • Office Location(s): Hyderabad, Office Level 3 & 4, Block A - East Wing, with additional offices across India and global locations.
  • Workspace Context:
    • The office provides a collaborative workspace with multiple monitors, testing devices, and development tools available to support the team's work.
    • The team encourages knowledge sharing, technical mentoring, and continuous learning to drive professional growth and development.

Work Schedule:

  • Standard workweek of 40 hours, with flexibility for deployment windows, maintenance, and project deadlines.
  • The team follows a hybrid Agile/Scrum methodology, with regular sprint planning, stand-ups, and retrospectives to ensure efficient project management and continuous improvement.

📄 Application & Technical Interview Process

Interview Process:

  1. Phone Screen: A brief conversation to assess communication skills and cultural fit.
  2. Technical Phone Screen: A deeper dive into technical skills, focusing on AWS, Linux administration, and distributed systems.
  3. On-site Interview: A full-day on-site interview, including:
    • Technical Deep Dive: A comprehensive assessment of AWS, Linux administration, and distributed systems skills.
    • Behavioral Questions: Questions focused on problem-solving, collaboration, and leadership skills.
    • Team Fit: A meeting with the team to assess cultural fit and team dynamics.
  4. Final Decision: A review of the candidate's performance throughout the interview process, with a final decision made by the hiring manager.

Portfolio Review Tips:

  • Highlight your expertise in AWS, Linux administration, and distributed systems through live demos and case studies.
  • Showcase your ability to manage enterprise-scale infrastructure and tooling, with a focus on performance optimization, security, and scalability.
  • Demonstrate your problem-solving skills and the ability to drive continuous improvement within complex systems.

Technical Challenge Preparation:

  • Brush up on AWS services, best practices, and hands-on experience with relevant tools and technologies.
  • Familiarize yourself with the latest trends in Site Reliability Engineering, DevOps, and cloud infrastructure management.
  • Prepare for behavioral questions focused on problem-solving, collaboration, and leadership skills, with examples from your previous experiences.

ATS Keywords: AWS, Site Reliability Engineering, DevOps, Cloud Infrastructure, Linux Administration, Distributed Systems, Terraform, IaC, CI/CD Pipelines, GitOps, Monitoring Tools, Incident Management, Security Best Practices, Collaboration, Problem Solving, Automation, Continuous Improvement, Technical Documentation, Leadership, Mentoring, Architecture Decisions, Emerging Technologies.

📌 Application Steps

To apply for this AWS Site Reliability Engineer, Staff position:

  1. Submit your application through the application link provided.
  2. Tailor your resume to highlight relevant SRE and cloud infrastructure skills, experience, and project accomplishments.
  3. Prepare a comprehensive portfolio showcasing your expertise in AWS, Linux administration, and distributed systems, with a focus on performance optimization, security, and scalability.
  4. Research Warner Bros. Discovery's company culture, values, and guiding principles to ensure a strong cultural fit and alignment with your personal beliefs and career aspirations.
  5. Prepare for the interview process by reviewing the job description, practicing common interview questions, and brushing up on your technical skills and industry knowledge.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.

Application Requirements

Candidates should have 8+ years of experience in Site Reliability Engineering or related fields, with expertise in AWS and strong skills in Linux/Unix administration. Proficiency in programming languages and container orchestration technologies is also required.