ML Infrastructure Engineer

Symbolica AI
Full-time · San Francisco, United States

📍 Job Overview

  • Job Title: ML Infrastructure Engineer
  • Company: Symbolica AI
  • Location: San Francisco, California, United States
  • Job Type: Full-time, On-site
  • Category: Infrastructure Engineer
  • Date Posted: June 25, 2025
  • Experience Level: 5-10 years

🚀 Role Summary

  • Design, build, and optimize infrastructure for large-scale machine learning workflows at a cutting-edge AI research lab.
  • Collaborate closely with machine learning scientists, researchers, and engineers to identify and address infrastructure needs.
  • Ensure efficient use of cloud and on-prem hardware for training and inference, and build/maintain CI/CD pipelines tailored for machine learning development.

📝 Enhancement Note: This role requires a strong background in machine learning infrastructure and MLOps to support Symbolica's research and development efforts in applying category theory to enable logical reasoning in machines.

💻 Primary Responsibilities

  • Infrastructure Expansion & Optimization: Expand and improve existing infrastructure for large-scale machine learning workflows, including training systems and model deployment.
  • Tool Development: Develop tools and frameworks to support the global team's experiments, ensuring reproducibility and scalability.
  • Compute Resource Optimization: Optimize compute resources and ensure efficient use of cloud and on-prem hardware for training and inference.
  • CI/CD Pipeline Development: Build and maintain CI/CD pipelines tailored for machine learning development.
  • Collaboration: Collaborate closely with machine learning scientists, researchers, and engineers to identify and address infrastructure needs.

📝 Enhancement Note: This role requires exceptional problem-solving skills and the ability to resolve edge cases with minimal disruption, as well as proficiency in scaling DevOps pipelines for both traditional software and machine learning workloads using orchestration tools such as ZenML and Kubernetes.

🎓 Skills & Qualifications

Education: A bachelor's degree in Computer Science, Engineering, or a related field. Relevant experience may be considered in lieu of a degree.

Experience: 5+ years of experience in software engineering or infrastructure roles, with at least 2 years in machine learning infrastructure or MLOps.

Required Skills:

  • Proficiency in scaling DevOps pipelines and experience with orchestration tools like ZenML and Kubernetes.
  • Experience with Linux, containers, Nix, and Kubernetes.
  • Strong problem-solving skills and the ability to work in a fast-paced, execution-first environment.
  • Interest in ensuring the infrastructure behind models is secure by design.

Preferred Skills:

  • Familiarity with machine learning frameworks (e.g., TensorFlow, PyTorch) and libraries (e.g., NumPy, Pandas).
  • Experience with cloud providers (e.g., AWS, GCP, Azure) and on-prem servers.
  • Knowledge of category theory, formal mathematics, or logical reasoning.

📝 Enhancement Note: This role requires a strong background in machine learning infrastructure and MLOps, as well as a solid understanding of Linux, containers, Nix, and Kubernetes. Familiarity with machine learning frameworks and libraries, as well as category theory, would be a significant advantage.

📊 Portfolio & Project Requirements

Portfolio Essentials:

  • Demonstrate experience in designing, building, and optimizing infrastructure for machine learning workflows.
  • Showcase projects that highlight your ability to collaborate with machine learning teams and address their infrastructure needs.
  • Include examples of CI/CD pipelines you've built and maintained for machine learning development.

Technical Documentation:

  • Document your approach to optimizing compute resources and ensuring efficient use of cloud and on-prem hardware for training and inference.
  • Explain your problem-solving strategies and how you've addressed edge cases in previous projects.
  • Provide code samples and explain your coding style, ensuring it adheres to best practices and is well-commented.

📝 Enhancement Note: This role requires a strong focus on technical documentation, as it is crucial for ensuring the reproducibility and scalability of machine learning experiments. Be prepared to discuss your documentation approach and provide examples of your code and problem-solving strategies.

💵 Compensation & Benefits

Salary Range: $150,000 - $200,000 per year (based on San Francisco market rates for senior infrastructure engineers with machine learning experience)

Benefits:

  • Competitive salary and early-stage equity package.
  • A high-trust, execution-first culture with minimal bureaucracy.
  • Direct ownership of meaningful projects with real business impact.
  • A rare opportunity to sit at the interface between deep research and real-world productization.

Working Hours: Full-time, 40 hours per week. Flexible working hours to accommodate project deadlines and maintenance windows.

📝 Enhancement Note: The salary range for this role is based on market research for senior infrastructure engineers with machine learning experience in San Francisco. The benefits package is designed to attract top talent and support the company's high-trust, execution-first culture.

🎯 Team & Company Context

🏢 Company Culture

Industry: AI research and development, focusing on applying category theory to enable logical reasoning in machines.

Company Size: A small, well-resourced team of experts with a nimble approach to research and development.

Founded: 2022, with a mission to bridge the gap between theoretical mathematics and cutting-edge technologies.

Team Structure:

  • A tight-knit team of machine learning scientists, researchers, engineers, and infrastructure specialists working closely together.
  • A flat hierarchy that encourages collaboration, innovation, and direct ownership of projects.

Development Methodology:

  • Agile and iterative development processes, with a focus on fast-paced, results-driven execution.
  • A tight feedback loop between research and application, where research fuels product-focused machine learning models.

Company Website: Symbolica AI

📝 Enhancement Note: Symbolica AI's company culture is characterized by its high-trust, execution-first approach, which values collaboration, innovation, and direct ownership of projects. This culture is reflected in the company's flat hierarchy and agile development methodologies.

📈 Career & Growth Analysis

ML Infrastructure Engineer Career Level: This role is suited to an experienced infrastructure engineer with a strong background in machine learning infrastructure and MLOps. The ideal candidate will have 5-10 years of experience in software engineering or infrastructure roles, with at least 2 years in machine learning infrastructure or MLOps.

Reporting Structure: This role reports directly to the ML Infrastructure Lead and works closely with the machine learning scientists, researchers, and engineers to identify and address infrastructure needs.

Technical Impact: This role underpins Symbolica's research and development efforts, ensuring the global team has the robust platform it needs to push the boundaries of AI. The infrastructure and tools built in this role directly support the company's mission to create AI systems that transform industries.

Growth Opportunities:

  • Technical Growth: Develop expertise in machine learning infrastructure and MLOps, and stay up-to-date with the latest trends and best practices in the field.
  • Leadership Growth: As the company grows, there may be opportunities to take on more leadership responsibilities and help shape the infrastructure team's direction.
  • Research & Development Growth: Collaborate with machine learning scientists and researchers to explore new approaches to applying category theory to enable logical reasoning in machines.

📝 Enhancement Note: This role offers significant growth opportunities for experienced infrastructure engineers looking to develop their expertise in machine learning infrastructure and MLOps. As the company grows, there may be opportunities to take on more leadership responsibilities and help shape the infrastructure team's direction.

🌐 Work Environment

Office Type: A modern, collaborative workspace designed to facilitate cross-functional collaboration and innovation.

Office Location(s): San Francisco, California, United States.

Workspace Context:

  • Collaborative Workspace: The office features open-plan workspaces that encourage collaboration and communication between team members.
  • State-of-the-art Equipment: Symbolica provides its employees with access to the latest hardware, software, and tools to support their work.
  • Flexible Work Arrangement: While this is an on-site role, Symbolica offers flexible working arrangements to accommodate individual needs and preferences.

Work Schedule: Full-time, 40 hours per week. Flexible working hours to accommodate project deadlines and maintenance windows.

📝 Enhancement Note: Symbolica AI's work environment is designed to facilitate collaboration, innovation, and direct ownership of projects. The company provides its employees with access to the latest hardware, software, and tools to support their work and offers flexible working arrangements to accommodate individual needs and preferences.

📄 Application & Technical Interview Process

Interview Process:

  1. Phone Screen (30 minutes): A brief conversation to assess your communication skills, cultural fit, and understanding of the role.
  2. Technical Deep Dive (60 minutes): A detailed discussion of your technical skills, experience, and approach to machine learning infrastructure and MLOps.
  3. On-site Interview (2-3 hours): A series of interviews with key team members, including the ML Infrastructure Lead, machine learning scientists, researchers, and engineers. This will include a technical challenge and an opportunity to discuss your portfolio and ask questions.
  4. Final Decision: A final decision will be made based on your technical skills, cultural fit, and alignment with Symbolica's mission and values.

Portfolio Review Tips:

  • Highlight your experience in designing, building, and optimizing infrastructure for machine learning workflows.
  • Showcase your ability to collaborate with machine learning teams and address their infrastructure needs.
  • Include examples of CI/CD pipelines you've built and maintained for machine learning development.
  • Explain your approach to optimizing compute resources and ensuring efficient use of cloud and on-prem hardware for training and inference.
  • Provide code samples and explain your coding style, ensuring it adheres to best practices and is well-commented.

Technical Challenge Preparation:

  • Brush up on your knowledge of machine learning infrastructure and MLOps best practices.
  • Familiarize yourself with the latest trends and developments in the field.
  • Prepare for a technical challenge that may involve designing, building, or optimizing infrastructure for machine learning workflows.

ATS Keywords: Machine Learning Infrastructure, MLOps, DevOps, Kubernetes, Linux, Containers, Nix, Problem-Solving, Category Theory, Formal Mathematics, Logical Reasoning, CI/CD Pipelines, Cloud Providers, On-prem Servers, Machine Learning Frameworks, Machine Learning Libraries.

📝 Enhancement Note: The interview process for this role is designed to assess your technical skills, experience, and cultural fit, as well as your ability to collaborate with machine learning teams and address their infrastructure needs. Be prepared to discuss your portfolio, approach to machine learning infrastructure and MLOps, and any relevant technical challenges you've faced in previous roles.

🛠 Technology Stack & Infrastructure

Infrastructure Technologies:

  • Cloud Providers: AWS, GCP, Azure (depending on the specific needs of the project)
  • On-prem Servers: Linux-based servers, with a focus on efficiency, security, and scalability.
  • Containerization: Docker, Kubernetes
  • Orchestration Tools: ZenML, ArgoCD

Machine Learning Technologies:

  • Frameworks: TensorFlow, PyTorch (depending on the specific needs of the project)
  • Libraries: NumPy, Pandas, Scikit-learn, TensorFlow Extended (TFX), MLflow

Development & DevOps Tools:

  • Version Control: Git, GitHub
  • CI/CD Pipelines: GitHub Actions, GitLab CI/CD, CircleCI (depending on the specific needs of the project)
  • Monitoring & Logging: Prometheus, Grafana, ELK Stack (depending on the specific needs of the project)

📝 Enhancement Note: Symbolica AI uses a combination of cloud providers, on-prem servers, containerization, and orchestration tools to support its machine learning infrastructure and MLOps needs. The specific technologies used may vary depending on the needs of the project.
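The orchestration pattern referenced above (composing named steps into reproducible pipelines, which tools like ZenML formalize) can be sketched in plain Python. This is an illustrative, dependency-free sketch of the general idea, not Symbolica's actual tooling or ZenML's real API; the `Pipeline` class and its run fingerprinting are hypothetical:

```python
"""Illustrative sketch of the step/pipeline pattern behind ML
orchestration tools: steps are named, composable units, and every run
is fingerprinted by its config so results stay reproducible."""
import hashlib
import json
from typing import Callable

class Pipeline:
    def __init__(self, name: str):
        self.name = name
        self.steps: list[tuple[str, Callable]] = []

    def step(self, fn: Callable) -> Callable:
        # Register a function as a pipeline step; registration order
        # is execution order in this simplified sketch.
        self.steps.append((fn.__name__, fn))
        return fn

    def run(self, config: dict) -> dict:
        # Fingerprint the config so every run can be traced back to
        # the exact parameters that produced it.
        run_id = hashlib.sha256(
            json.dumps(config, sort_keys=True).encode()
        ).hexdigest()[:12]
        artifact = config
        for _name, fn in self.steps:
            artifact = fn(artifact)  # each step consumes the previous output
        return {"run_id": run_id, "result": artifact}

train = Pipeline("train")

@train.step
def load_data(cfg: dict) -> dict:
    return {**cfg, "rows": 1000}

@train.step
def fit_model(data: dict) -> dict:
    return {"model": "stub", "trained_on": data["rows"]}

out = train.run({"lr": 1e-3, "epochs": 3})
print(out["result"]["trained_on"])  # 1000
```

Because the run ID is derived from the sorted config, identical configs always map to the same ID, which is the property real orchestration tools exploit for caching and experiment tracking.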

👥 Team Culture & Values

Symbolica AI Values:

  • Innovation: We value creativity, curiosity, and a willingness to explore new ideas and approaches.
  • Collaboration: We believe in working together to achieve our goals, and we value open communication, active listening, and mutual respect.
  • Expertise: We strive to be the best in our field, and we value continuous learning, growth, and improvement.
  • Integrity: We act with honesty, transparency, and a commitment to doing the right thing, even when no one is watching.
  • Impact: We are passionate about creating meaningful change in the world, and we value working on projects that have a positive impact on society and the environment.

Collaboration Style:

  • Cross-functional Integration: Symbolica AI encourages collaboration between different teams and disciplines, including machine learning, research, engineering, and infrastructure.
  • Code Review Culture: We value peer review and constructive feedback, and treat code review as a core part of how we maintain quality.
  • Knowledge Sharing: We believe in the power of collective intelligence, and we encourage our team members to share their knowledge and expertise with one another.

📝 Enhancement Note: Symbolica AI's team culture is characterized by its commitment to innovation, collaboration, expertise, integrity, and impact. The company values cross-functional integration, code review culture, and knowledge sharing, and encourages its team members to work together to achieve their goals.

⚡ Challenges & Growth Opportunities

Technical Challenges:

  • Scaling Infrastructure: Design, build, and optimize infrastructure for large-scale machine learning workflows, including training systems and model deployment.
  • Efficient Resource Utilization: Optimize compute resources and ensure efficient use of cloud and on-prem hardware for training and inference.
  • Reproducibility & Scalability: Develop tools and frameworks to support the global team's experiments, ensuring reproducibility and scalability.
  • Security by Design: Ensure that the infrastructure behind models is secure by design, protecting sensitive data and preventing unauthorized access.
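To give a feel for the "efficient resource utilization" challenge above, one common back-of-envelope check is estimating how large a model fits on a single accelerator. The sketch below is an illustrative rule-of-thumb calculation only; the ~16 bytes/parameter figure assumes mixed-precision training with Adam and is not taken from the job description:

```python
"""Illustrative back-of-envelope estimate: how many parameters fit on
one GPU when training with Adam in mixed precision. Rule of thumb:
~16 bytes/parameter (fp16 weights + fp16 grads + fp32 master weights
+ two fp32 Adam moments), activations excluded."""

BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # weights, grads, master, m, v

def max_params(gpu_mem_gib: float, reserve_frac: float = 0.2) -> int:
    """Largest parameter count that fits after reserving `reserve_frac`
    of memory for activations, buffers, and fragmentation."""
    usable = gpu_mem_gib * (1 - reserve_frac) * 1024**3
    return int(usable // BYTES_PER_PARAM)

# An 80 GiB accelerator with 20% held back fits roughly 4.3B parameters.
print(f"{max_params(80) / 1e9:.1f}B params")
```

Estimates like this are only a starting point; in practice, activation memory, sequence length, and sharding strategy dominate the real sizing decision.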

Learning & Development Opportunities:

  • Technical Skill Development: Develop expertise in machine learning infrastructure and MLOps, and stay up-to-date with the latest trends and best practices in the field.
  • Conference Attendance & Certification: Attend industry conferences, obtain relevant certifications, and engage with the machine learning community to expand your knowledge and network.
  • Mentorship & Leadership Development: Collaborate with machine learning scientists, researchers, and engineers to explore new approaches to applying category theory to enable logical reasoning in machines, and develop your leadership skills through mentoring and team management opportunities.

📝 Enhancement Note: This role offers significant technical challenges and learning opportunities for experienced infrastructure engineers looking to deepen their expertise in machine learning infrastructure and MLOps.

💡 Interview Preparation

Technical Questions:

  • Infrastructure Design: Describe your approach to designing, building, and optimizing infrastructure for large-scale machine learning workflows. What considerations do you make when working with cloud providers, on-prem servers, and containerization technologies?
  • Resource Optimization: How do you optimize compute resources and ensure efficient use of cloud and on-prem hardware for training and inference? Can you provide examples of strategies you've used in previous roles?
  • CI/CD Pipeline Development: Walk us through your experience with CI/CD pipeline development for machine learning projects. What tools and technologies have you used, and what challenges have you faced?
  • Security by Design: How do you ensure that the infrastructure behind models is secure by design? What measures do you take to protect sensitive data and prevent unauthorized access?

Company & Culture Questions:

  • Company Culture: How do you see yourself fitting into Symbolica AI's high-trust, execution-first culture? What aspects of our company values resonate with you, and how do you think you can contribute to our mission?
  • Team Collaboration: How do you approach collaboration with machine learning scientists, researchers, and engineers? Can you provide examples of successful collaborations in previous roles?
  • Problem-Solving: Can you describe a challenging technical problem you've faced in a previous role and how you went about solving it? What was the outcome, and what did you learn from the experience?

Portfolio Presentation Strategy:

  • Infrastructure Projects: Highlight your experience in designing, building, and optimizing infrastructure for machine learning workflows. Include examples of projects that demonstrate your ability to collaborate with machine learning teams and address their infrastructure needs.
  • CI/CD Pipeline Projects: Showcase your experience with CI/CD pipeline development for machine learning projects. Include examples of pipelines you've built and maintained, and discuss the challenges you faced and how you overcame them.
  • Technical Documentation: Provide examples of technical documentation you've created, and explain your approach to ensuring that your code and infrastructure are well-documented and easy to understand.


📌 Application Steps

To apply for this ML Infrastructure Engineer position at Symbolica AI:

  1. Review the Job Description: Carefully read the job description and ensure that you meet the required qualifications and skills.
  2. Tailor Your Resume: Highlight your relevant experience, skills, and accomplishments, and tailor your resume to the specific requirements of the role.
  3. Prepare Your Portfolio: Showcase your experience in designing, building, and optimizing infrastructure for machine learning workflows, and include examples of CI/CD pipelines you've built and maintained for machine learning development.
  4. Practice Technical Challenges: Brush up on your knowledge of machine learning infrastructure and MLOps best practices, and prepare for a technical challenge that may involve designing, building, or optimizing infrastructure for machine learning workflows.
  5. Research Symbolica AI: Learn about Symbolica AI's mission, values, and culture, and prepare thoughtful questions to ask during the interview process.

📝 Enhancement Note: Completing these steps thoroughly will prepare you for each stage of the interview process and help you demonstrate your technical skills, experience, and cultural fit.

Application Requirements

5+ years of experience in software engineering or infrastructure roles, with at least 2 years in machine learning infrastructure or MLOps. Proficiency in scaling DevOps pipelines and experience with orchestration tools like ZenML and Kubernetes.