Senior Manager, Site Reliability Engineering (SRE) - Hybrid - Seattle

Nordstrom
Full-time • $191k-297k/year (USD) • Seattle, United States

πŸ“ Job Overview

  • Job Title: Senior Manager, Site Reliability Engineering (SRE) - Hybrid
  • Company: Nordstrom
  • Location: Seattle, WA
  • Job Type: Full-Time, Hybrid
  • Category: DevOps, Infrastructure
  • Date Posted: July 23, 2025
  • Experience Level: 5-10 years
  • Remote Status: On-site/Hybrid

🚀 Role Summary

  • Lead and mentor a high-performing SRE team to deliver resilient, scalable, and performant systems.
  • Champion automation, collaborate across disciplines, and ensure infrastructure supports business growth and innovation.
  • Ensure the availability and performance of critical services through proactive monitoring, incident response, and root cause analysis.

πŸ“ Enhancement Note: This role requires a strategic and hands-on leader with a strong background in SRE, DevOps, or infrastructure engineering. The ideal candidate will have experience leading teams, driving reliability, and collaborating with various stakeholders to achieve business objectives.

💻 Primary Responsibilities

  • Lead & Inspire: Build and mentor a high-performing SRE team, fostering a culture of ownership, innovation, and continuous learning.
  • Drive Reliability: Ensure the availability and performance of critical services through proactive monitoring, incident response, and root cause analysis.
  • Automate Everything: Reduce manual toil by implementing automation across deployment, recovery, and scaling processes.
  • Monitor & Observe: Define and execute observability strategies using New Relic, Splunk, and other tools to detect and resolve issues before they impact users.
  • Collaborate & Align: Partner with engineering, product, and operations teams to align reliability goals with business priorities.
  • Plan for Scale: Lead capacity planning and performance tuning for services running on AWS EKS and other cloud-native platforms.
  • Measure & Improve: Establish and track SLOs, SLAs, and error budgets. Continuously refine processes to improve system reliability and team efficiency.
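To illustrate the "Measure & Improve" item: an error budget is the amount of failure an SLO still permits, so a 99.9% availability target over one million requests allows roughly 1,000 failed requests. The following is a minimal Python sketch of that arithmetic, assuming a request-based availability SLO. The class name, field names, and sample numbers are hypothetical and do not describe Nordstrom's actual tooling or targets.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical helper for a request-based availability SLO."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int   # requests served in the measurement window
    failed_requests: int  # requests that failed in that window

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of traffic the SLO allows to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # Fraction of the budget still unspent; negative means the SLO is breached.
        allowed = self.allowed_failures
        if allowed == 0:
            return 1.0 if self.failed_requests == 0 else 0.0
        return 1.0 - self.failed_requests / allowed


if __name__ == "__main__":
    budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
    print(f"Allowed failures: {budget.allowed_failures:.0f}")
    print(f"Budget remaining: {budget.remaining_fraction:.0%}")
```

In this sketch, 250 failures against a budget of about 1,000 leaves about three quarters of the budget unspent. A team might use a number like this to decide whether to keep shipping changes or pause for reliability work.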

🎓 Skills & Qualifications

Education: Bachelor's degree in Computer Science, Engineering, or equivalent experience.

Experience: 5+ years in SRE, DevOps, or infrastructure engineering, with 2+ years in a leadership role.

Required Skills:

  • Expertise in cloud platforms (especially AWS), container orchestration (Kubernetes, EKS), and CI/CD pipelines.
  • Proficiency in Python, Go, or Java.
  • Hands-on experience with New Relic, Splunk, and Kubernetes.
  • Strong analytical skills and a passion for root cause analysis and continuous improvement.
  • Clear, concise, and collaborative communicator who thrives in cross-functional environments.

Preferred Skills:

  • Experience with large-scale distributed systems.
  • Familiarity with ITIL or similar incident management frameworks.
  • Cloud certifications (e.g., AWS Solutions Architect, Google Cloud Professional Engineer).

📊 Web Portfolio & Project Requirements

Portfolio Essentials:

  • Demonstrate experience in leading SRE teams and driving reliability in large-scale systems.
  • Showcase automation projects that have reduced manual toil and improved system reliability.
  • Highlight incident response and root cause analysis case studies.

Technical Documentation:

  • Provide documentation of system architecture, deployment processes, and monitoring strategies.
  • Include performance metrics, testing methodologies, and optimization techniques used in previous projects.

💵 Compensation & Benefits

Salary Range: $191,000 - $297,000 per year (USD)

Benefits:

  • Medical, Vision, Dental, Retirement, and Paid Time Away
  • Life Insurance and Disability
  • Merchandise Discount and EAP Resources

Working Hours: 40 hours per week, with flexibility for deployment windows, maintenance, and project deadlines.

🎯 Team & Company Context

Company Culture:

  • Nordstrom is a leading fashion retailer known for its exceptional customer service and high-quality products.
  • With a strong focus on innovation and technology, Nordstrom offers an environment that values collaboration, creativity, and continuous learning.

Team Structure:

  • The SRE team works closely with engineering, product, and operations teams to ensure infrastructure supports business growth and innovation.
  • The team is responsible for the availability and performance of critical services, collaborating with various stakeholders to achieve business objectives.

Development Methodology:

  • Nordstrom follows Agile methodologies, with a focus on continuous integration, continuous deployment, and iterative development.
  • The SRE team uses tools like New Relic, Splunk, and Kubernetes to monitor, observe, and manage system reliability.

Company Website: www.nordstrom.com

πŸ“ Enhancement Note: Nordstrom's culture emphasizes customer focus, innovation, and collaboration. The SRE team plays a crucial role in ensuring the reliability and performance of Nordstrom's online platforms, supporting the company's mission to provide an exceptional customer experience.

📈 Career & Growth Analysis

Web Technology Career Level: This role is a senior management position, responsible for leading a high-performing SRE team and driving reliability in large-scale systems.

Reporting Structure: The Senior Manager, SRE reports directly to the Director of Engineering and oversees a team of SRE engineers.

Technical Impact: This role has a significant impact on Nordstrom's online platforms, ensuring their availability, performance, and scalability. The Senior Manager, SRE works closely with various teams to align reliability goals with business priorities and drive continuous improvement.

Growth Opportunities:

  • Technical Leadership: Develop and mentor team members, fostering a culture of ownership and continuous learning.
  • Architecture Decisions: Influence the design and implementation of large-scale, distributed systems, driving reliability and performance improvements.
  • Cross-Functional Collaboration: Work closely with engineering, product, and operations teams to align reliability goals with business priorities and drive continuous improvement.

πŸ“ Enhancement Note: This role offers significant growth opportunities for experienced SRE professionals looking to advance their careers in a dynamic and innovative environment. The Senior Manager, SRE is responsible for driving reliability in large-scale systems and has a substantial impact on Nordstrom's online platforms and business growth.

🌐 Work Environment

Office Type: Nordstrom's Seattle headquarters offers a modern, collaborative workspace with multiple monitors, testing devices, and development tools available for SRE engineers.

Office Location(s): Nordstrom's Seattle headquarters is located in the heart of downtown Seattle, with easy access to public transportation and nearby amenities.

Workspace Context:

  • The SRE team works in an open, collaborative environment, fostering knowledge sharing and continuous learning.
  • SRE engineers have access to the latest tools and technologies, enabling them to work efficiently and effectively.
  • The team interacts regularly with other departments, including engineering, product, and operations, to ensure infrastructure supports business growth and innovation.

Work Schedule: Nordstrom offers a hybrid work arrangement, with a flexible schedule that balances on-site collaboration and remote work. Working hours are typically 40 hours per week, with flexibility for deployment windows, maintenance, and project deadlines.

πŸ“ Enhancement Note: Nordstrom's Seattle headquarters provides a modern, collaborative workspace that supports the SRE team's mission to ensure the reliability and performance of the company's online platforms. The hybrid work arrangement offers a balance between on-site collaboration and remote work, fostering a flexible and inclusive work environment.

📄 Application & Technical Interview Process

Interview Process:

  1. Technical Assessment: A hands-on technical assessment focused on SRE fundamentals, incident response, and root cause analysis.
  2. System Design Discussion: A discussion on system design, capacity planning, and performance tuning for large-scale, distributed systems.
  3. Behavioral Interview: A behavioral interview focused on leadership, communication, and problem-solving skills.
  4. Final Evaluation: A final evaluation based on technical expertise, cultural fit, and alignment with Nordstrom's values.

Portfolio Review Tips:

  • Highlight automation projects that have reduced manual toil and improved system reliability.
  • Showcase incident response and root cause analysis case studies, demonstrating your ability to drive reliability in large-scale systems.
  • Emphasize your experience leading SRE teams and driving reliability in complex environments.

Technical Challenge Preparation:

  • Brush up on your knowledge of cloud platforms (especially AWS), container orchestration (Kubernetes, EKS), and CI/CD pipelines.
  • Familiarize yourself with New Relic, Splunk, and other monitoring tools used by Nordstrom.
  • Prepare for questions on incident response, root cause analysis, and system design for large-scale, distributed systems.

ATS Keywords:

  • Programming Languages: Python, Go, Java
  • Cloud Platforms: AWS, Kubernetes, EKS
  • Monitoring Tools: New Relic, Splunk
  • Incident Management: Root Cause Analysis, Incident Response, Proactive Monitoring
  • Leadership: Team Management, Mentoring, Cross-Functional Collaboration
  • Soft Skills: Communication, Problem-Solving, Analytical Skills

πŸ“ Enhancement Note: Nordstrom's interview process focuses on assessing technical expertise, leadership, and cultural fit. The Senior Manager, SRE role requires a strong background in SRE, DevOps, or infrastructure engineering, as well as proven leadership and communication skills. Preparation for the technical assessment, system design discussion, and behavioral interview will be crucial for success in this role.

🛠 Technology Stack & Web Infrastructure

Cloud Platforms:

  • AWS: Nordstrom's primary cloud provider, offering a wide range of services for building, deploying, and scaling applications.
  • EKS: Amazon Elastic Kubernetes Service, a fully managed Kubernetes service that makes it easy for developers to run and scale containerized applications using Kubernetes.

Container Orchestration:

  • Kubernetes: An open-source platform for automating deployment, scaling, and management of containerized applications.

Monitoring Tools:

  • New Relic: A software analytics platform that provides real-time performance monitoring, application analytics, and end-user monitoring for web and mobile applications.
  • Splunk: A data analytics platform that provides real-time operational intelligence, monitoring, and analytics for machine data.

CI/CD Pipelines:

  • Jenkins: An open-source automation server with a web interface for managing and executing software development tasks.

Infrastructure as Code (IaC) Tools:

  • Terraform: An open-source infrastructure as code software tool that provides a declarative way to launch and manage infrastructure.

👥 Team Culture & Values

Nordstrom's Core Values:

  • Customer First: Focus on understanding and exceeding customer expectations.
  • Own It: Take personal responsibility for your actions and decisions.
  • Respect & Integrity: Treat others with kindness, fairness, and honesty.
  • Continuous Learning: Embrace a growth mindset and strive for continuous improvement.

SRE Team Culture:

  • Reliability-Focused: Prioritize system availability, performance, and scalability.
  • Automation-Driven: Reduce manual toil and improve efficiency through automation.
  • Collaborative: Work closely with other teams to ensure infrastructure supports business growth and innovation.
  • Customer-Centric: Understand and address the impact of system reliability on Nordstrom's customers.

Collaboration Style:

  • Cross-Functional: Work closely with engineering, product, and operations teams to align reliability goals with business priorities.
  • Peer Review: Encourage knowledge sharing, technical mentoring, and continuous learning within the SRE team.
  • Agile: Follow Agile methodologies, with a focus on continuous integration, continuous deployment, and iterative development.

πŸ“ Enhancement Note: Nordstrom's culture emphasizes customer focus, innovation, and collaboration. The SRE team plays a crucial role in ensuring the reliability and performance of Nordstrom's online platforms, supporting the company's mission to provide an exceptional customer experience. The team's culture is built on a foundation of reliability, automation, collaboration, and continuous learning.

⚑ Challenges & Growth Opportunities

Technical Challenges:

  • Large-Scale Systems: Design, deploy, and manage reliable, scalable, and high-performing systems in a complex, distributed environment.
  • Incident Response: Develop and implement incident response plans to minimize the impact of system failures and ensure rapid recovery.
  • Root Cause Analysis: Identify and address the root causes of system issues to prevent recurring problems and improve overall reliability.
  • Performance Optimization: Continuously monitor and optimize system performance to meet the demands of Nordstrom's growing customer base.

Learning & Development Opportunities:

  • Technical Skill Development: Stay up-to-date with the latest trends and best practices in SRE, DevOps, and infrastructure engineering.
  • Leadership Development: Develop and refine your leadership skills through mentoring, coaching, and hands-on experience.
  • Architecture Decisions: Influence the design and implementation of large-scale, distributed systems, driving reliability and performance improvements.

πŸ“ Enhancement Note: The Senior Manager, SRE role presents numerous technical challenges and growth opportunities for experienced SRE professionals. The ideal candidate will have a strong background in SRE, DevOps, or infrastructure engineering, as well as proven leadership and communication skills. This role offers the chance to drive reliability in large-scale systems, collaborate with various teams, and make a significant impact on Nordstrom's online platforms and business growth.

💡 Interview Preparation

Technical Questions:

  • System Design: Describe your approach to designing and implementing large-scale, distributed systems. How do you ensure reliability, scalability, and performance?
  • Incident Response: Walk us through your incident response process. How do you identify, diagnose, and resolve system issues quickly and effectively?
  • Root Cause Analysis: Explain your root cause analysis methodology. How do you identify the underlying causes of system issues and prevent recurring problems?
  • Performance Optimization: Describe your performance optimization strategies. How do you monitor and improve system performance in a large-scale, distributed environment?

Company & Culture Questions:

  • Nordstrom's Culture: How do you align with Nordstrom's core values, particularly 'Customer First' and 'Own It'?
  • Team Collaboration: Describe your experience working with cross-functional teams. How do you ensure effective collaboration and communication?
  • Agile Methodologies: Explain your experience with Agile methodologies. How do you apply Agile principles to SRE and infrastructure management?

Portfolio Presentation Strategy:

  • Automation Projects: Highlight your automation projects, demonstrating how you have reduced manual toil and improved system reliability.
  • Incident Response Case Studies: Showcase your incident response and root cause analysis case studies, emphasizing your ability to drive reliability in large-scale systems.
  • Leadership Experience: Emphasize your experience leading SRE teams and driving reliability in complex environments.

πŸ“ Enhancement Note: Nordstrom's interview process focuses on assessing technical expertise, leadership, and cultural fit. The Senior Manager, SRE role requires a strong background in SRE, DevOps, or infrastructure engineering, as well as proven leadership and communication skills. Preparation for the technical assessment, system design discussion, and behavioral interview will be crucial for success in this role.

📌 Application Steps

To apply for this Senior Manager, Site Reliability Engineering (SRE) - Hybrid position at Nordstrom:

  1. Customize Your Portfolio: Tailor your portfolio to showcase your experience in leading SRE teams and driving reliability in large-scale systems. Highlight automation projects, incident response case studies, and your approach to system design, performance optimization, and root cause analysis.
  2. Optimize Your Resume: Update your resume to emphasize your relevant skills, experience, and achievements in SRE, DevOps, or infrastructure engineering. Include project highlights that demonstrate your ability to drive reliability and improve system performance.
  3. Prepare for Technical Assessment: Brush up on your knowledge of cloud platforms (especially AWS), container orchestration (Kubernetes, EKS), and CI/CD pipelines. Familiarize yourself with New Relic, Splunk, and other monitoring tools used by Nordstrom. Practice incident response, root cause analysis, and system design questions to ensure you are well-prepared for the technical assessment.
  4. Research Nordstrom: Learn about Nordstrom's business, culture, and values. Understand the company's approach to technology, innovation, and customer experience. Prepare thoughtful questions to ask during the interview process, demonstrating your interest in the role and the company.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and industry-standard assumptions for web development and server administration roles. Verify all details directly with the hiring organization before making application decisions.

Application Requirements

Candidates should have over 5 years of experience in SRE, DevOps, or infrastructure engineering, with at least 2 years in a leadership role. Proficiency in cloud platforms, programming languages, and tools like New Relic and Splunk is essential.