AWS Site Reliability Engineer, Staff
📍 Job Overview
- Job Title: AWS Site Reliability Engineer, Staff
- Company: Warner Bros. Discovery (WBD)
- Location: Hyderabad, Office Level 3 & 4, Block A - East Wing
- Job Type: On-site
- Category: DevOps, Site Reliability Engineering
- Date Posted: June 20, 2025
🚀 Role Summary
- Key Responsibilities:
- Manage AWS environments, ensuring high availability, scalability, and security.
- Collaborate with stakeholders to improve application observability and optimize performance.
- Lead a team of engineers to implement security features, reduce operational toil, and drive continuous improvement.
- Stay informed on AWS features, best practices, and WBD's AWS environment.
- Work with complementary public cloud leads to facilitate consistency across WBD's management of resources.
💻 Primary Responsibilities
- AWS Environment Management:
- Primarily accountable for managing AWS environments.
- Identify and eliminate performance bottlenecks, and proactively remediate security concerns, through monitoring, profiling, and tuning.
- Establish and improve SLOs, SLIs, and error budgets to drive system reliability.
- Collaborate with stakeholders, including application developers, to improve application observability and optimize performance.
- Lead and mentor a team of engineers to reduce toil across the team's workload and to implement security features, roles, user access, and privileges according to best practices.
- Proactively identify, design, and implement process and architectural improvements.
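For candidates less familiar with the SLO, SLI, and error-budget terms in the list above, the arithmetic can be sketched in a few lines of Python. This is a minimal illustration only: the 99.9% target, the 30-day window, the request counts, and the function names are assumptions chosen for the example, not figures or tooling from this posting.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Downtime (in minutes) that an availability SLO permits over the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo_target)


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent, using a request-based SLI.

    The SLI here is good_events / total_events; the budget is the number of
    failed events the SLO tolerates for the observed traffic.
    """
    if total_events == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = total_events * (1.0 - slo_target)
    actual_failures = total_events - good_events
    if allowed_failures == 0:
        # A 100% target leaves no budget: any failure exhausts it.
        return 1.0 if actual_failures == 0 else 0.0
    return 1.0 - actual_failures / allowed_failures


if __name__ == "__main__":
    # Hypothetical 99.9% availability target over 30 days: about 43.2 minutes.
    print(round(error_budget_minutes(0.999), 1))
    # Hypothetical month: 1,000,000 requests, 500 failed -> about half the budget left.
    print(round(budget_remaining(0.999, 999_500, 1_000_000), 2))
```

A negative result from `budget_remaining` means the budget is overspent, which teams commonly treat as a signal to prioritize reliability work over new releases.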
🎓 Skills & Qualifications
Education: Bachelor's degree in Computer Science, Engineering, or a related field.
Experience: 8+ years of prior experience in Site Reliability Engineering, DevOps, Cloud Infrastructure, or related fields.
Required Skills:
- Expertise in AWS and strong experience in Linux/Unix administration, networking, and distributed systems.
- Proficiency in two or more programming languages, such as Python, Golang, JavaScript, PowerShell, etc.
- Extensive hands-on experience in container orchestration technologies, such as ECS, EKS, Kubernetes, Docker, etc.
- Deep knowledge of monitoring, logging, and observability tools (Prometheus, Grafana, ELK, Splunk, etc.).
- Hands-on experience with IaC using Terraform or CloudFormation templates.
- Strong background in CI/CD pipelines, GitOps, and infrastructure automation (Terraform, Helm, Ansible, or Chef).
Preferred Skills:
- Experience with other cloud providers such as Google Cloud Platform (GCP), Azure, Oracle, etc.
- Knowledge of and passion for media, entertainment, and technology industries.
- Familiarity with streaming and similar products/services.
- Experience working in a national or global company.
- Comfortable working in a highly iterative and somewhat unstructured environment.
Soft Skills:
- Strong problem-solving, troubleshooting, and debugging skills.
- Excellent written and verbal communication and collaboration abilities.
- English language fluency required.
- Ability to handle multiple assignments concurrently.
- Passion for automation, reliability, and continuous improvement.
- Ability to move quickly and intelligently, treating technical debt as a nemesis.
- Ability to solve problems independently while knowing when to request assistance.
📊 Web Portfolio & Project Requirements
- Portfolio Essentials:
- Demonstrate expertise in AWS, Linux administration, and distributed systems.
- Showcase hands-on experience with container orchestration technologies and IaC tools like Terraform.
- Highlight proficiency in programming languages and familiarity with monitoring, logging, and observability tools.
- Display strong problem-solving skills and the ability to manage enterprise-scale infrastructure and tooling.
- Technical Documentation:
- Provide detailed technical documentation, including runbooks, troubleshooting guides, and system diagrams.
- Showcase your ability to develop and maintain comprehensive documentation for complex systems.
💵 Compensation & Benefits
- Salary Range: Competitive salary package based on experience and industry standards for Hyderabad, India. Research shows that the average salary for a Senior Site Reliability Engineer in Hyderabad ranges from ₹25-35 lakhs per annum (approximately $29,000 - $41,000 USD at mid-2025 exchange rates).
- Benefits:
- Comprehensive health, dental, and vision coverage.
- Retirement savings plans and pension schemes.
- Generous vacation and time-off policies.
- Employee discounts and perks.
- Professional development opportunities and training programs.
- Flexible work arrangements and remote work options.
- Working Hours: Full-time position with a standard workweek of 40 hours, with flexibility for deployment windows, maintenance, and project deadlines.
🎯 Team & Company Context
🏢 Company Culture
- Industry: Media and entertainment technology.
- Company Size: Large, global organization with a significant presence in the media and entertainment industry.
- Founded: 2022, through the merger of WarnerMedia and Discovery, Inc.
Team Structure:
- The AWS Site Reliability Engineer role serves as a technical leader within the Global Infrastructure Cloud Technologies (GICT) team, supporting hundreds of applications, websites, and services in the fleet of Warner Bros. Discovery (WBD) cloud accounts.
- The role collaborates with other SRE leads, the rest of the cloud engineering team, software developers, and management to build and manage highly resilient and performant infrastructure.
Development Methodology:
- The team follows Agile methodologies, with a focus on sprint planning, code review, testing, and quality assurance practices.
- Deployment strategies include CI/CD pipelines and automated deployment tools, with a focus on infrastructure as code (IaC) and automated testing.
📈 Career & Growth Analysis
- Career Level: Staff-level role with significant technical leadership responsibilities, managing a team of engineers and driving operational excellence across WBD's AWS environment.
- Reporting Structure: Reports directly to the Sr. Manager of Cloud Engineering, with close collaboration with other SRE leads, cloud engineering team members, and stakeholders.
- Technical Impact: Responsible for ensuring the reliability, availability, scalability, and security of WBD's cloud infrastructure and services, directly impacting user experience and business outcomes.
Growth Opportunities:
- Technical Leadership: Develop and refine team management skills by mentoring junior engineers and driving best practices across the organization.
- Architecture Decisions: Contribute to strategic architecture decisions, driving the evolution of WBD's cloud infrastructure and ensuring alignment with business objectives.
- Emerging Technologies: Stay informed on emerging AWS features, best practices, and industry trends, driving innovation and continuous improvement within the team.
🌐 Work Environment
- Office Type: On-site, with a collaborative workspace designed to facilitate cross-functional team interaction and knowledge sharing.
- Office Location(s): Hyderabad, Office Level 3 & 4, Block A - East Wing, with additional offices across India and global locations.
- Workspace Context:
- The office provides a collaborative workspace with multiple monitors, testing devices, and development tools available to support the team's work.
- The team encourages knowledge sharing, technical mentoring, and continuous learning to drive professional growth and development.
Work Schedule:
- Standard workweek of 40 hours, with flexibility for deployment windows, maintenance, and project deadlines.
- The team follows a hybrid Agile/Scrum methodology, with regular sprint planning, stand-ups, and retrospectives to ensure efficient project management and continuous improvement.
📄 Application & Technical Interview Process
Interview Process:
- Phone Screen: A brief conversation to assess communication skills and cultural fit.
- Technical Phone Screen: A deeper dive into technical skills, focusing on AWS, Linux administration, and distributed systems.
- On-site Interview: A full-day on-site interview, including:
- Technical Deep Dive: A comprehensive assessment of AWS, Linux administration, and distributed systems skills.
- Behavioral Questions: Questions focused on problem-solving, collaboration, and leadership skills.
- Team Fit: A meeting with the team to assess cultural fit and team dynamics.
- Final Decision: A review of the candidate's performance throughout the interview process, with a final decision made by the hiring manager.
Portfolio Review Tips:
- Highlight your expertise in AWS, Linux administration, and distributed systems through live demos and case studies.
- Showcase your ability to manage enterprise-scale infrastructure and tooling, with a focus on performance optimization, security, and scalability.
- Demonstrate your problem-solving skills and the ability to drive continuous improvement within complex systems.
Technical Challenge Preparation:
- Brush up on AWS services, best practices, and hands-on experience with relevant tools and technologies.
- Familiarize yourself with the latest trends in Site Reliability Engineering, DevOps, and cloud infrastructure management.
- Prepare for behavioral questions focused on problem-solving, collaboration, and leadership skills, with examples from your previous experiences.
ATS Keywords: AWS, Site Reliability Engineering, DevOps, Cloud Infrastructure, Linux Administration, Distributed Systems, Terraform, IaC, CI/CD Pipelines, GitOps, Monitoring Tools, Incident Management, Security Best Practices, Collaboration, Problem Solving, Automation, Continuous Improvement, Technical Documentation, Leadership, Mentoring, Architecture Decisions, Emerging Technologies.
📌 Application Steps
To apply for this AWS Site Reliability Engineer, Staff position:
- Submit your application through the application link provided.
- Tailor your resume to highlight relevant site reliability and cloud infrastructure skills, experience, and project accomplishments.
- Prepare a comprehensive portfolio showcasing your expertise in AWS, Linux administration, and distributed systems, with a focus on performance optimization, security, and scalability.
- Research Warner Bros. Discovery's company culture, values, and guiding principles to ensure a strong cultural fit and alignment with your personal beliefs and career aspirations.
- Prepare for the interview process by reviewing the job description, practicing common interview questions, and brushing up on your technical skills and industry knowledge.
⚠️ Important Notice: This enhanced job description includes AI-generated insights and web technology industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
Application Requirements
Candidates should have 8+ years of experience in Site Reliability Engineering or related fields, with expertise in AWS and strong skills in Linux/Unix administration. Proficiency in programming languages and container orchestration technologies is also required.