Senior DevOps Engineer
📍 Job Overview
- Job Title: Senior DevOps Engineer
- Company: Together AI
- Location: San Francisco, California, United States
- Job Type: On-site
- Category: DevOps Engineer
- Date Posted: June 19, 2025
- Experience Level: 5-10 years
- Remote Status: Remote OK
🚀 Role Summary
- 📝 Enhancement Note: This role involves designing, building, and maintaining infrastructure for AI workloads, requiring strong software development fundamentals and systems knowledge.
- 📝 Enhancement Note: The Senior DevOps Engineer will work closely with internal teams to ensure best practices are appropriately applied, fostering a collaborative environment.
- 📝 Enhancement Note: This role offers the opportunity to build and optimize cloud infrastructure for GPU-resident applications, driving innovation in AI systems.
💻 Primary Responsibilities
- 📝 Enhancement Note: The Senior DevOps Engineer will be responsible for introducing tools to facilitate greater automation and operability of services, requiring a strong sense of ownership and responsibility.
- 📝 Enhancement Note: Designing, building, and maintaining CI/CD infrastructure is a critical aspect of this role, ensuring efficient and reliable software delivery.
- 📝 Enhancement Note: Architecting, deploying, and scaling observability infrastructure is essential for optimizing cloud triaging and limiting downtime, demonstrating strong problem-solving abilities.
🎓 Skills & Qualifications
Education: Bachelor's degree in Computer Science, Engineering, or a related field. Relevant experience may be considered in lieu of a degree.
Experience: Minimum of 5 years of relevant experience in DevOps, cloud computing, data center operations, and Linux systems administration.
Required Skills:
- Proficiency in at least one of the following programming languages: Go, Python, Java, or C++
- Experience in designing and building advanced CI/CD pipeline frameworks
- Experience with cloud computing toolsets like Terraform, Vault, and Packer
- Experience with configuration management tools like Ansible, Pulumi, Chef, and Puppet
- Experience with Kubernetes and containerization
- Strong sense of ownership and desire to build great tools for others
Preferred Skills:
- Experience with AI workloads and GPU-resident applications
- Familiarity with open-source research, models, and datasets in the AI domain
- Knowledge of software, hardware, algorithms, and models co-design for AI systems
📝 Enhancement Note: Familiarity with AI systems and research is listed as preferred rather than required; it would be beneficial because the candidate will be working with AI workloads and GPU-resident applications.
📊 Web Portfolio & Project Requirements
Portfolio Essentials:
- Demonstrate experience in designing, building, and maintaining CI/CD pipelines, showcasing your ability to optimize software delivery processes.
- Highlight projects that involve infrastructure-as-code, using tools like Terraform and Ansible, to illustrate your proficiency in automating infrastructure management.
- Include examples of observability infrastructure projects, showcasing your ability to architect, deploy, and scale monitoring and logging solutions.
- Present case studies that demonstrate your problem-solving skills and ability to optimize cloud triaging and limit downtime.
Technical Documentation:
- Provide detailed documentation for your CI/CD pipelines, explaining the architecture, components, and workflows.
- Include configuration management scripts and templates, showcasing your ability to automate infrastructure provisioning and management.
- Document your approach to observability, explaining the tools, metrics, and alerting strategies you've implemented to monitor and optimize AI workloads.
💵 Compensation & Benefits
Salary Range: $160,000 - $230,000 per year (USD)
Benefits:
- Health Insurance
- Startup Equity
- Competitive Benefits
Working Hours: Full-time position with standard working hours, flexible for deployment windows, maintenance, and project deadlines.
📝 Enhancement Note: The salary range provided is based on market research for senior DevOps engineer roles in the San Francisco Bay Area, considering the candidate's experience level and the company's size.
🎯 Team & Company Context
🏢 Company Culture
Industry: Artificial Intelligence research and development
Company Size: Medium (101-500 employees)
Founded: 2022
Team Structure:
- The DevOps team works closely with internal teams, including research, engineering, and product, to ensure best practices are appropriately applied.
- The team follows Agile methodologies, with a focus on collaboration, continuous improvement, and delivering high-quality infrastructure.
- The team is responsible for automating everything and building failure-resistant and horizontally scalable cloud infrastructure for GPU-resident applications.
Development Methodology:
- The team practices infrastructure-as-code, using tools like Terraform and Ansible to automate infrastructure management.
- They follow CI/CD best practices to ensure efficient and reliable software delivery.
- The team focuses on building observable systems, using tools like Prometheus and Grafana to monitor and optimize AI workloads.
Company Website: Together AI
📝 Enhancement Note: Together AI is a research-driven AI company focused on lowering the cost of modern AI systems by co-designing software, hardware, algorithms, and models. The company has contributed to leading open-source research, models, and datasets, and its team has been behind technological advancements like FlashAttention, Hyena, FlexGen, and RedPajama.
📈 Career & Growth Analysis
Web Technology Career Level: Senior DevOps Engineer - Responsible for designing, building, and maintaining infrastructure for AI workloads, driving innovation in AI systems, and collaborating with internal teams to ensure best practices are appropriately applied.
Reporting Structure: The Senior DevOps Engineer will report directly to the Head of Cloud Engineering and work closely with internal teams, including research, engineering, and product.
Technical Impact: The Senior DevOps Engineer will have a significant impact on the performance, reliability, and scalability of AI workloads, driving innovation in AI systems and optimizing cloud infrastructure for GPU-resident applications.
Growth Opportunities:
- 📝 Enhancement Note: Opportunities for growth in this role may include taking on more complex projects, mentoring junior team members, and driving technical initiatives that advance the company's AI infrastructure.
🌐 Work Environment
Office Type: On-site, with remote work options available
Office Location(s): San Francisco, California, United States
Workspace Context:
- The workspace is designed to facilitate collaboration and knowledge sharing, with multiple monitors and testing devices available for developers.
- The team interacts regularly, with a focus on code review culture and pair programming practices.
- The workspace is equipped with state-of-the-art hardware, including GPUs, to support AI workloads and research.
Work Schedule: Standard working hours with flexible time for deployment windows, maintenance, and project deadlines.
📝 Enhancement Note: The work environment at Together AI fosters collaboration, continuous learning, and innovation, with a focus on driving advancements in AI systems.
📄 Application & Technical Interview Process
Interview Process:
- 📝 Enhancement Note: Technical preparation recommendations include brushing up on programming languages, cloud computing tools, and AI workloads.
- 📝 Enhancement Note: Familiarize yourself with Together AI's services and research, as the interview process may involve discussing the company's AI systems and infrastructure.
- 📝 Enhancement Note: Be prepared to discuss your experience with infrastructure-as-code, CI/CD pipelines, and observability infrastructure, as these topics will be crucial for the technical assessment.
- 📝 Enhancement Note: The final evaluation criteria may include your ability to drive innovation in AI systems, optimize cloud infrastructure, and collaborate effectively with internal teams.
Portfolio Review Tips:
- 📝 Enhancement Note: Highlight your experience with infrastructure-as-code, CI/CD pipelines, and observability infrastructure in your portfolio, as these skills are essential for this role.
- 📝 Enhancement Note: Include case studies that demonstrate your problem-solving skills and your ability to optimize cloud triaging and limit downtime for AI workloads.
Technical Challenge Preparation:
- 📝 Enhancement Note: Practice designing, building, and maintaining CI/CD pipelines, as this will be a critical aspect of the technical challenge.
- 📝 Enhancement Note: Familiarize yourself with cloud computing tools like Terraform, Vault, and Packer, as well as configuration management tools like Ansible, Pulumi, Chef, and Puppet.
- 📝 Enhancement Note: Brush up on your programming skills, as you may be asked to write code or explain complex technical concepts during the interview process.
ATS Keywords:
- Programming Languages: Go, Python, Java, C++
- Server Technologies: Linux, Kubernetes, Docker
- Tools: Terraform, Ansible, Vault, Packer, Prometheus, Grafana
- Methodologies: Infrastructure-as-Code, CI/CD, Agile
- Soft Skills: Collaboration, Problem-Solving, Communication, Responsibility
- Industry Terms: AI, Machine Learning, Deep Learning, GPU, Cloud Computing, Infrastructure-as-Code, CI/CD, Observability
🛠 Technology Stack & Web Infrastructure
Backend & Server Technologies:
- Linux
- Kubernetes
- Docker
- Terraform
- Ansible
- Vault
- Packer
- Prometheus
- Grafana
Development & DevOps Tools:
- Git
- Jenkins
- Terraform
- Ansible
- Pulumi
- Chef
- Puppet
- Cloud-based infrastructure (AWS, GCP, Azure)
👥 Team Culture & Values
Web Development Values:
- 📝 Enhancement Note: Together AI values open and transparent AI systems, driving innovation and creating the best outcomes for society.
- 📝 Enhancement Note: The company fosters a collaborative environment, with a focus on continuous learning, knowledge sharing, and driving advancements in AI systems.
Collaboration Style:
- 📝 Enhancement Note: Together AI encourages cross-functional integration between developers, researchers, and stakeholders, fostering a collaborative environment that drives innovation in AI systems.
- 📝 Enhancement Note: The company promotes code review culture and pair programming practices, ensuring high-quality infrastructure and driving technical excellence.
📝 Enhancement Note: Together AI's team culture emphasizes open communication, transparency, and collaboration, with a strong focus on driving advancements in AI systems and optimizing cloud infrastructure for GPU-resident applications.
📌 Application Steps
To apply for this Senior DevOps Engineer position:
- Submit your application through the application link.
- 📝 Enhancement Note: Customize your resume and portfolio to highlight your experience with infrastructure-as-code, CI/CD pipelines, and observability infrastructure.
- 📝 Enhancement Note: Prepare for the technical interview by brushing up on your programming skills, familiarizing yourself with cloud computing tools, and practicing designing, building, and maintaining CI/CD pipelines.
- 📝 Enhancement Note: Review Together AI's services and research, and be prepared to discuss the company's AI systems and infrastructure during the interview process.
📝 Enhancement Note: This enhanced job description includes AI-generated insights and web development industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
Application Requirements
Candidates should have a minimum of 5 years of relevant experience in DevOps, cloud computing, and Linux systems administration. Proficiency in programming languages such as Go, Python, Java, or C++, along with experience in CI/CD pipeline frameworks and cloud computing toolsets is essential.