Senior DevOps Engineer

Together AI
Full-time | $160k-230k/year (USD) | San Francisco, United States

📍 Job Overview

  • Job Title: Senior DevOps Engineer
  • Company: Together AI
  • Location: San Francisco, California, United States
  • Job Type: On-site
  • Category: DevOps Engineer
  • Date Posted: June 19, 2025
  • Experience Level: 5-10 years
  • Remote Status: Remote OK

🚀 Role Summary

  • 📝 Enhancement Note: This role involves designing, building, and maintaining infrastructure for AI workloads, requiring strong software development fundamentals and systems knowledge.

  • 📝 Enhancement Note: The Senior DevOps Engineer will work closely with internal teams to ensure best practices are appropriately applied, fostering a collaborative environment.

  • 📝 Enhancement Note: This role offers the opportunity to build and optimize cloud infrastructure for GPU-resident applications, driving innovation in AI systems.

💻 Primary Responsibilities

  • 📝 Enhancement Note: The Senior DevOps Engineer will be responsible for introducing tools to facilitate greater automation and operability of services, requiring a strong sense of ownership and responsibility.

  • 📝 Enhancement Note: Designing, building, and maintaining CI/CD infrastructure is a critical aspect of this role, ensuring efficient and reliable software delivery.

  • 📝 Enhancement Note: Architecting, deploying, and scaling observability infrastructure is essential for optimizing cloud triaging and limiting downtime, demonstrating strong problem-solving abilities.

🎓 Skills & Qualifications

Education: Bachelor's degree in Computer Science, Engineering, or a related field. Relevant experience may be considered in lieu of a degree.

Experience: Minimum of 5 years of prior relevant experience in DevOps, cloud computing, data center operations, and Linux systems administration.

Required Skills:

  • Proficiency in at least one of the following programming languages: Go, Python, Java, or C++
  • Experience in designing and building advanced CI/CD pipeline frameworks
  • Experience with cloud computing toolsets like Terraform, Vault, and Packer
  • Experience with configuration management tools like Ansible, Pulumi, Chef, and Puppet
  • Experience with Kubernetes and containerization
  • Strong sense of ownership and desire to build great tools for others

Preferred Skills:

  • Experience with AI workloads and GPU-resident applications
  • Familiarity with open-source research, models, and datasets in the AI domain
  • Knowledge of software, hardware, algorithms, and models co-design for AI systems

📝 Enhancement Note: Familiarity with AI systems and research is listed as preferred rather than required, but it would be beneficial for this role, as the candidate will be working with AI workloads and GPU-resident applications.

📊 Web Portfolio & Project Requirements

Portfolio Essentials:

  • Demonstrate experience in designing, building, and maintaining CI/CD pipelines, showcasing your ability to optimize software delivery processes.
  • Highlight projects that involve infrastructure-as-code, using tools like Terraform and Ansible, to illustrate your proficiency in automating infrastructure management.
  • Include examples of observability infrastructure projects, showcasing your ability to architect, deploy, and scale monitoring and logging solutions.
  • Present case studies that demonstrate your problem-solving skills and ability to optimize cloud triaging and limit downtime.

Technical Documentation:

  • Provide detailed documentation for your CI/CD pipelines, explaining the architecture, components, and workflows.
  • Include configuration management scripts and templates, showcasing your ability to automate infrastructure provisioning and management.
  • Document your approach to observability, explaining the tools, metrics, and alerting strategies you've implemented to monitor and optimize AI workloads.

💵 Compensation & Benefits

Salary Range: $160,000 - $230,000 per year (USD)

Benefits:

  • Health Insurance
  • Startup Equity
  • Competitive Benefits

Working Hours: Full-time position with standard working hours and flexibility around deployment windows, maintenance, and project deadlines.

📝 Enhancement Note: The salary range provided is based on market research for senior DevOps engineer roles in the San Francisco Bay Area, considering the candidate's experience level and the company's size.

🎯 Team & Company Context

🏢 Company Culture

Industry: Artificial Intelligence research and development

Company Size: Medium (101-500 employees)

Founded: 2022

Team Structure:

  • The DevOps team works closely with internal teams, including research, engineering, and product, to ensure best practices are appropriately applied.
  • The team follows Agile methodologies, with a focus on collaboration, continuous improvement, and delivering high-quality infrastructure.
  • The team is responsible for automating everything and building failure-resistant and horizontally scalable cloud infrastructure for GPU-resident applications.

Development Methodology:

  • The team practices infrastructure-as-code, using tools like Terraform and Ansible to automate infrastructure management.
  • They follow CI/CD best practices to ensure efficient and reliable software delivery.
  • The team focuses on building observable systems, using tools like Prometheus and Grafana to monitor and optimize AI workloads.

Company Website: Together AI

📝 Enhancement Note: Together AI is a research-driven AI company focused on lowering the cost of modern AI systems by co-designing software, hardware, algorithms, and models. The company has contributed to leading open-source research, models, and datasets, and its team has been behind technological advancements like FlashAttention, Hyena, FlexGen, and RedPajama.

📈 Career & Growth Analysis

Web Technology Career Level: Senior DevOps Engineer - Responsible for designing, building, and maintaining infrastructure for AI workloads, driving innovation in AI systems, and collaborating with internal teams to ensure best practices are appropriately applied.

Reporting Structure: The Senior DevOps Engineer will report directly to the Head of Cloud Engineering and work closely with internal teams, including research, engineering, and product.

Technical Impact: The Senior DevOps Engineer will have a significant impact on the performance, reliability, and scalability of AI workloads, driving innovation in AI systems and optimizing cloud infrastructure for GPU-resident applications.

Growth Opportunities:

  • 📝 Enhancement Note: Opportunities for growth in this role may include taking on more complex projects, mentoring junior team members, and driving technical initiatives that advance the company's AI infrastructure.

🌐 Work Environment

Office Type: On-site, with remote work options available

Office Location(s): San Francisco, California, United States

Workspace Context:

  • The workspace is designed to facilitate collaboration and knowledge sharing, with multiple monitors and testing devices available for developers.
  • The team interacts regularly, with a focus on code review culture and pair programming practices.
  • The workspace is equipped with state-of-the-art hardware, including GPUs, to support AI workloads and research.

Work Schedule: Standard working hours, with flexibility around deployment windows, maintenance, and project deadlines.

📝 Enhancement Note: The work environment at Together AI fosters collaboration, continuous learning, and innovation, with a focus on driving advancements in AI systems.

📄 Application & Technical Interview Process

Interview Process:

  1. 📝 Enhancement Note: Technical preparation recommendations include brushing up on programming languages, cloud computing tools, and AI workloads.
  2. 📝 Enhancement Note: Familiarize yourself with Together AI's services and research, as the interview process may involve discussing the company's AI systems and infrastructure.
  3. 📝 Enhancement Note: Be prepared to discuss your experience with infrastructure-as-code, CI/CD pipelines, and observability infrastructure, as these topics will be crucial for the technical assessment.
  4. 📝 Enhancement Note: The final evaluation criteria may include your ability to drive innovation in AI systems, optimize cloud infrastructure, and collaborate effectively with internal teams.

Portfolio Review Tips:

  • 📝 Enhancement Note: Highlight your experience with infrastructure-as-code, CI/CD pipelines, and observability infrastructure in your portfolio, as these skills are essential for this role.
  • 📝 Enhancement Note: Include case studies that demonstrate your problem-solving skills and your ability to optimize cloud triaging and limit downtime for AI workloads.

Technical Challenge Preparation:

  • 📝 Enhancement Note: Practice designing, building, and maintaining CI/CD pipelines, as this will be a critical aspect of the technical challenge.
  • 📝 Enhancement Note: Familiarize yourself with cloud computing tools like Terraform, Vault, and Packer, as well as configuration management tools like Ansible, Pulumi, Chef, and Puppet.
  • 📝 Enhancement Note: Brush up on your programming skills, as you may be asked to write code or explain complex technical concepts during the interview process.

ATS Keywords:

  • Programming Languages: Go, Python, Java, C++
  • Web Frameworks: N/A
  • Server Technologies: Linux, Kubernetes, Docker
  • Databases: N/A
  • Tools: Terraform, Ansible, Vault, Packer, Prometheus, Grafana
  • Methodologies: Infrastructure-as-Code, CI/CD, Agile
  • Soft Skills: Collaboration, Problem-Solving, Communication, Responsibility
  • Industry Terms: AI, Machine Learning, Deep Learning, GPU, Cloud Computing, Infrastructure-as-Code, CI/CD, Observability

🛠 Technology Stack & Web Infrastructure

Frontend Technologies: N/A

Backend & Server Technologies:

  • Linux
  • Kubernetes
  • Docker
  • Terraform
  • Ansible
  • Vault
  • Packer
  • Prometheus
  • Grafana

Development & DevOps Tools:

  • Git
  • Jenkins
  • Terraform
  • Ansible
  • Pulumi
  • Chef
  • Puppet
  • Cloud-based infrastructure (AWS, GCP, Azure)

👥 Team Culture & Values

Web Development Values:

  • 📝 Enhancement Note: Together AI values open and transparent AI systems, driving innovation and creating the best outcomes for society.
  • 📝 Enhancement Note: The company fosters a collaborative environment, with a focus on continuous learning, knowledge sharing, and driving advancements in AI systems.

Collaboration Style:

  • 📝 Enhancement Note: Together AI encourages cross-functional integration between developers, researchers, and stakeholders, fostering a collaborative environment that drives innovation in AI systems.
  • 📝 Enhancement Note: The company promotes code review culture and pair programming practices, ensuring high-quality infrastructure and driving technical excellence.

📝 Enhancement Note: Together AI's team culture emphasizes open communication, transparency, and collaboration, with a strong focus on driving advancements in AI systems and optimizing cloud infrastructure for GPU-resident applications.

📌 Application Steps

To apply for this Senior DevOps Engineer position:

  1. Submit your application through the application link.
  2. 📝 Enhancement Note: Customize your resume and portfolio to highlight your experience with infrastructure-as-code, CI/CD pipelines, and observability infrastructure.
  3. 📝 Enhancement Note: Prepare for the technical interview by brushing up on your programming skills, familiarizing yourself with cloud computing tools, and practicing designing, building, and maintaining CI/CD pipelines.
  4. 📝 Enhancement Note: Research Together AI's services and research, and be prepared to discuss the company's AI systems and infrastructure during the interview process.

📝 Enhancement Note: This enhanced job description includes AI-generated insights and web development industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.

Application Requirements

Candidates should have a minimum of 5 years of relevant experience in DevOps, cloud computing, and Linux systems administration. Proficiency in programming languages such as Go, Python, Java, or C++, along with experience in CI/CD pipeline frameworks and cloud computing toolsets is essential.