Staff Site Reliability Engineer

Lucid Motors
Full-time · Southfield, United States

📍 Job Overview

  • Job Title: Staff Site Reliability Engineer
  • Company: Lucid Motors
  • Location: Southfield, MI
  • Job Type: Full-time
  • Category: DevOps Engineer
  • Date Posted: 2025-06-04
  • Experience Level: 10+ years
  • Remote Status: On-site

🚀 Role Summary

  • Key Responsibilities: Own and enhance service reliability, lead containerization and deployment, foster DevOps culture, monitor performance, and manage infrastructure as code.
  • Key Technologies: Kubernetes, Helm, Terraform, Prometheus, Grafana, AWS, GCP, Azure, Kafka, Spark, Presto, Airflow, MQTT, Ansible, Chef, Puppet.

💻 Primary Responsibilities

  • Reliability Engineering: Own and enhance the reliability of services deployed across various cloud regions. Proactively monitor, automate, and scale services to ensure seamless uptime and performance.
  • Containerization & Microservices Deployment: Lead the containerization and deployment of microservices and data pipelines on Kubernetes using Helm charts, following best practices for scalability and fault tolerance.
  • DevOps Advocacy: Foster and advocate for a DevOps culture that emphasizes automation, self-service, and engineering excellence. Enable development teams to manage and deploy applications seamlessly with minimal intervention.
  • Performance Monitoring & Autoscaling: Implement autoscaling strategies and monitor the performance of applications and infrastructure with tools like Prometheus, Grafana, and other observability platforms.
  • Site Reliability Engineering (SRE): Perform SRE tasks such as availability monitoring, incident response, post-mortem analysis, and preparing reliability reports for leadership and stakeholders.
  • Tool Deployment & Maintenance: Deploy, configure, and maintain essential cloud services and tools including Kafka, Spark, Presto, Airflow, MQTT, and other microservices platforms in a cloud-native environment.
  • Infrastructure as Code (IaC): Set up and manage cloud infrastructure using tools like Terraform, Cluster API, and other IaC frameworks, ensuring seamless provisioning, management, and scaling of resources.
  • Automated Alerts & Recovery: Continuously enhance and automate alerting, incident detection, and recovery mechanisms for critical applications and services to minimize downtime and improve system reliability.
  • On-Call Rotation: Participate in an on-call rotation to meet business SLAs, quickly troubleshoot and resolve issues, and document runbooks for consistent incident management processes.
  • Agile Collaboration: Work closely with Product Owners, Engineering Managers, and cross-functional teams in Agile Scrum and Kanban workflows to deliver iterative improvements and meet evolving business needs.
  • Impact Analysis & Incident Management: Perform impact analysis during incidents, collaborate with teams for root cause analysis, and implement preventive measures to avoid recurrence.

🎓 Skills & Qualifications

Education: B.S. or M.S. degree in Computer Science, Engineering, or a related technical field; equivalent experience may be considered in lieu of a degree.

Experience: 8+ years in Cloud Infrastructure, Site Reliability Engineering (SRE), DevOps Engineering, or related fields.

Required Skills:

  • 4+ years of hands-on experience deploying, managing, and optimizing containerized applications using Kubernetes in both public and private cloud environments (AWS, GCP, Azure, etc.).
  • 4+ years in Infrastructure-as-Code (IaC) using Terraform, Cluster API, or similar automation frameworks to manage cloud infrastructure.
  • Experience in scripting or programming with Python, Go, Bash/Shell, or similar languages.
  • Strong understanding of using Prometheus, Grafana, and other monitoring and observability tools.
  • Ability to effectively diagnose and resolve performance bottlenecks within AWS at the infrastructure and application layers.
  • Experience with configuration management and automation tools such as Ansible, Chef, or Puppet (preferred but not required).

Preferred Skills:

  • Experience with AWS, GCP, and Azure cloud platforms.
  • Familiarity with data pipelines, ETL processes, and big data technologies.
  • Knowledge of CI/CD pipelines and deployment strategies.
  • Experience with incident management tools and platforms.

📊 Portfolio & Project Requirements

Portfolio Essentials:

  • Demonstrate your experience with containerization, microservices deployment, and infrastructure as code using relevant projects and case studies.
  • Showcase your proficiency in monitoring and observability tools with live demos and performance metrics.
  • Highlight your incident management and problem-solving skills with examples of successful incident resolution and runbook documentation.

Technical Documentation:

  • Provide code samples and documentation demonstrating your scripting and programming skills.
  • Include detailed explanations of your approach to infrastructure as code, highlighting best practices and automation strategies.
  • Showcase your understanding of cloud platforms and deployment strategies with architecture diagrams and deployment plans.

💵 Compensation & Benefits

Salary Range: $150,000 - $200,000 per year (based on regional market rates for senior DevOps/SRE roles in the United States)

Benefits:

  • Competitive salary and equity compensation.
  • Comprehensive health, dental, and vision insurance.
  • 401(k) retirement plan with company match.
  • Employee stock purchase plan.
  • Generous time off and paid holidays.
  • Flexible work arrangements and remote work options.
  • Professional development opportunities and training programs.
  • A dynamic, collaborative, and inclusive work environment.

Working Hours: Full-time, 40 hours per week, with flexible scheduling and on-call rotation responsibilities.

🎯 Team & Company Context

🏢 Company Culture

Industry: Automotive and mobility technology, focusing on luxury electric vehicles and sustainable transportation solutions.

Company Size: Medium to large (approximately 6,000 employees).

Founded: 2007 (as Atieva), with a strong focus on innovation, sustainability, and luxury design.

Team Structure:

  • The SRE team works closely with development, product, and infrastructure teams to ensure service reliability and performance.
  • The team is structured in a matrix organization, with SREs embedded within product-focused squads.
  • The SRE team is responsible for defining and implementing reliability standards, tools, and processes across the organization.

Development Methodology:

  • Agile Scrum and Kanban methodologies are used for software development and project management.
  • Continuous Integration and Continuous Deployment (CI/CD) pipelines are employed for automated testing, deployment, and monitoring.
  • Infrastructure as Code (IaC) is used for automated provisioning, configuration, and management of cloud resources.

Company Website: Lucid Motors

📝 Enhancement Note: Lucid Motors is a rapidly growing company with a strong focus on innovation and sustainability. The SRE team plays a critical role in ensuring the reliability and performance of services that support the development, manufacturing, and sales of luxury electric vehicles.

📈 Career & Growth Analysis

Career Level: Staff Site Reliability Engineer, responsible for leading reliability efforts, mentoring junior team members, and driving strategic initiatives.

Reporting Structure: Reports directly to the Director of Site Reliability Engineering or a similar role, with a matrixed reporting structure to product-focused squad leads.

Technical Impact: Collaborates with development, product, and infrastructure teams to define and implement reliability standards, tools, and processes. Ensures the reliability and performance of critical services and applications that support the company's mission to deliver luxury electric vehicles.

Growth Opportunities:

  • Technical Leadership: Transition into a technical leadership role, managing a team of SREs and driving strategic reliability initiatives across the organization.
  • Architecture & Design: Specialize in system architecture and design, focusing on the reliability and performance of large-scale, distributed systems.
  • Specialization: Develop expertise in specific areas such as data pipelines, big data technologies, or cloud platform specializations (AWS, GCP, Azure).

📝 Enhancement Note: Lucid Motors offers significant growth opportunities for technical professionals looking to advance their careers in a dynamic and innovative environment. The SRE team is critical to the company's success, and there is ample room for career progression and specialization.

🌐 Work Environment

Office Type: Modern, collaborative workspaces designed to facilitate cross-functional teamwork and innovation.

Office Location(s): Southfield, MI, with additional offices in California (including the Newark headquarters), Arizona, and international locations.

Workspace Context:

  • Open-plan workspaces with dedicated team areas and meeting rooms.
  • State-of-the-art hardware and software tools, including multiple monitors and testing devices.
  • Access to on-site cafés, fitness centers, and other employee amenities.

Work Schedule: Full-time, 40 hours per week, with flexible scheduling and on-call rotation responsibilities. The company offers a hybrid work arrangement, allowing employees to work remotely up to two days per week.

📝 Enhancement Note: Lucid Motors provides a modern, collaborative work environment that fosters innovation and cross-functional teamwork. The company offers flexible work arrangements and remote work options to support work-life balance.

📄 Application & Technical Interview Process

Interview Process:

  1. Phone Screen: A brief phone or video call to discuss your background, experience, and career goals.
  2. Technical Deep Dive: A comprehensive technical interview focused on your experience with Kubernetes, Terraform, and other relevant technologies. You will be asked to describe your approach to reliability engineering, containerization, and infrastructure as code.
  3. On-Site Visit: An on-site visit to Lucid Motors' office in Southfield, MI, where you will meet with key stakeholders, tour the facility, and participate in additional interviews and assessments.
  4. Final Decision: A final decision will be made, and you will be notified of the outcome.

Portfolio Review Tips:

  • Highlight your experience with Kubernetes, Terraform, and other relevant technologies with live demos and case studies.
  • Showcase your problem-solving skills and approach to reliability engineering with real-world examples and success stories.
  • Demonstrate your understanding of Lucid Motors' mission and values, and how your skills and experience align with the company's goals.

Technical Challenge Preparation:

  • Brush up on your knowledge of Kubernetes, Terraform, and other relevant technologies with hands-on exercises and tutorials.
  • Familiarize yourself with Lucid Motors' products, services, and company culture to ensure a strong fit and alignment with the company's mission.
  • Prepare for behavioral and situational interview questions that assess your problem-solving skills, teamwork, and cultural fit.

ATS Keywords: Kubernetes, Terraform, Helm, AWS, GCP, Azure, Prometheus, Grafana, IaC, SRE, DevOps, CI/CD, Agile, Scrum, Kanban, incident management, reliability engineering, containerization, microservices, infrastructure as code, cloud platforms, data pipelines, big data technologies.

📝 Enhancement Note: Lucid Motors' interview process is designed to assess your technical skills, problem-solving abilities, and cultural fit. The company values candidates who are passionate about innovation, sustainability, and luxury design, and who are eager to make a significant impact on the future of mobility.

📌 Application Steps

To apply for this Staff Site Reliability Engineer position at Lucid Motors:

  1. Customize Your Application: Tailor your resume and portfolio to highlight your experience with Kubernetes, Terraform, and other relevant technologies. Emphasize your problem-solving skills, reliability engineering approach, and alignment with Lucid Motors' mission and values.
  2. Submit Your Application: Apply through the provided application link, ensuring all required fields are completed accurately and thoroughly.
  3. Prepare for Phone Screen: Brush up on your knowledge of Lucid Motors, its products, and services. Be ready to discuss your background, experience, and career goals in a brief phone or video call.
  4. Research the Company: Familiarize yourself with Lucid Motors' mission, values, and company culture to ensure a strong fit and alignment with the company's goals.
  5. Prepare for Technical Deep Dive: Review your experience with Kubernetes, Terraform, and other relevant technologies. Be ready to discuss your approach to reliability engineering, containerization, and infrastructure as code in a comprehensive technical interview.
  6. Plan Your On-Site Visit: If invited for an on-site visit, prepare for additional interviews and assessments. Familiarize yourself with Lucid Motors' Southfield, MI office and plan your travel arrangements accordingly.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and industry-standard assumptions. All details should be verified directly with Lucid Motors before making application decisions.

Application Requirements

Candidates should have a B.S. or M.S. degree in a related technical field or equivalent experience, with 8+ years in relevant fields. A minimum of 4 years of hands-on experience with Kubernetes and Infrastructure-as-Code using Terraform is required.