Technical Program Manager III, ML Fleet Capacity Management, Cloud Supply Chain
📍 Job Overview
- Job Title: Technical Program Manager III, ML Fleet Capacity Management, Cloud Supply Chain
- Company: Google
- Location: Sunnyvale, California, United States
- Job Type: Full-time
- Category: Technical Program Management
- Date Posted: August 8, 2025
- Experience Level: 5-10 years
- Remote Status: On-site
🚀 Role Summary
- Lead cross-functional programs to manage ML Fleet capacity and drive resource efficiency
- Develop and maintain resource usage metrics, policies, and governance frameworks for ML Fleet
- Oversee chip/aux operations to maintain a healthy and efficient ML fleet
- Manage communications and escalations related to ML resource allocation and strategic shifts
- Identify gaps and drive initiatives to improve existing tooling and processes
📝 Enhancement Note: This role requires a strong technical background and experience in program management to effectively lead cross-functional teams and drive strategic initiatives in ML Fleet capacity management.
💻 Primary Responsibilities
- Capacity Management: Lead cross-functional programs to manage ML Fleet capacity, including the design, update, and maintenance of the ML Fleet's cluster-level allocation Plan of Record.
- Resource Metrics & Governance: Drive the development, implementation, and ongoing maintenance of fleet-wide accelerator and auxiliary resource usage metrics, policies, and governance frameworks.
- Fleet Operations: Oversee and optimize chip/aux operations to maintain a healthy and efficient ML fleet and Global Quota Marketplace.
- Communication & Escalation: Manage communications and escalations related to ML resource allocation, performance, and strategic shifts for Product Areas and other partners.
- Process Improvement: Identify gaps and drive initiatives to improve existing tooling and processes, enhancing the efficiency, agility, and responsiveness of ML capacity allocation and management.
📝 Enhancement Note: This role requires a balance of technical depth and breadth to effectively manage cross-functional teams, understand ML workloads, and drive strategic initiatives in capacity management and resource optimization.
🎓 Skills & Qualifications
Education: Bachelor's degree in a technical field, or equivalent practical experience.
Experience: 5 years of experience in program management, with a focus on resource management, customer-facing communication, stakeholder management, and supply chain operations.
Required Skills:
- Program management experience, including cross-functional or cross-team project management
- Experience with Machine Learning infrastructure, accelerators (TPUs/GPUs), or managing AI/ML workloads at scale
- Experience with capacity management, supply chain, or demand forecasting processes in a technology context
- Ability to navigate ambiguity, influence without direct authority, and drive consensus across technical and non-technical teams
Preferred Skills:
- Experience managing cross-functional or cross-team projects
- Experience defining and implementing governance frameworks or policies for technical resources
- Familiarity with Google's ML Fleet and global resource allocation processes
📝 Enhancement Note: Candidates with experience in ML infrastructure, capacity management, and stakeholder communication will be well-suited for this role. Familiarity with Google's ML Fleet and global resource allocation processes is a plus.
📊 Portfolio & Project Requirements
Portfolio Essentials:
- Demonstrate experience in program management, resource management, and stakeholder communication through case studies or project examples
- Showcase ability to lead cross-functional teams and drive strategic initiatives in capacity management and resource optimization
- Highlight experience with ML infrastructure, accelerators (TPUs/GPUs), and managing AI/ML workloads at scale
Technical Documentation:
- Provide detailed project documentation, including project charters, stakeholder communication plans, and risk management strategies
- Demonstrate ability to analyze data, develop metrics, and create reports to inform decision-making and resource allocation
- Showcase experience with governance frameworks and policy development for technical resources
📝 Enhancement Note: Candidates should focus on demonstrating their ability to lead cross-functional teams, manage resources, and drive strategic initiatives in capacity management and resource optimization. Familiarity with Google's ML Fleet and global resource allocation processes is a plus.
💵 Compensation & Benefits
Salary Range: The US base salary range for this full-time position is $156,000-$229,000 + bonus + equity + benefits. Individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training.
Benefits:
- Bonus
- Equity
- Benefits
Working Hours: Full-time position with standard working hours, including flexibility for project deadlines and maintenance windows.
📝 Enhancement Note: The provided salary range is determined by role, level, and location.
🎯 Team & Company Context
🏢 Company Culture
Industry: Google operates in the technology industry, with a focus on search, advertising, and cloud computing services. This role is within the Technical Infrastructure team, which keeps Google's product portfolio running and ensures users have the best and fastest experience possible.
Company Size: Google is a large company with a global presence, employing over 135,000 full-time employees worldwide. This role is part of a large, cross-functional team responsible for managing ML Fleet capacity and resource allocation.
Founded: Google was founded in 1998 by Larry Page and Sergey Brin. The company has since grown to become one of the world's leading technology companies, known for its innovative products and services.
Team Structure:
- The ML Fleet Capacity Management team is part of the Technical Infrastructure organization, working cross-functionally with various Product Areas and other partners
- The team is responsible for managing ML Fleet capacity and resource allocation, ensuring optimal utilization and performance of ML accelerators and auxiliary resources
- The team works closely with other technical teams, including ML Infrastructure, Site Reliability Engineering, and Global Quota Management, to maintain a healthy and efficient ML fleet
Development Methodology:
- The team follows Agile methodologies, with a focus on iterative development, continuous improvement, and cross-functional collaboration
- The team uses tools such as Google Workspace, JIRA, and BigQuery to manage projects, track progress, and analyze data
- The team works closely with stakeholders to define project requirements, identify risks, and manage project schedules
Company Website: Google Careers
📝 Enhancement Note: Google's company culture is characterized by innovation, collaboration, and a focus on user experience. The Technical Infrastructure team plays a critical role in keeping Google's products and services running smoothly and efficiently.
📈 Career & Growth Analysis
Career Level: This role is a Technical Program Manager III position, indicating a mid-to-senior level within the technical program management career path. The role requires a strong technical background and proven experience in program management, resource management, and stakeholder communication.
Reporting Structure: The role reports directly to the ML Fleet Capacity Management team lead, working cross-functionally with various Product Areas and other partners. The role may have direct reports or manage projects with team members from other functional areas.
Technical Impact: The role has a significant impact on the performance, efficiency, and scalability of Google's ML Fleet. By effectively managing capacity and resource allocation, the role ensures optimal utilization of ML accelerators and auxiliary resources, enabling Google to deliver high-quality ML services to its users.
Growth Opportunities:
- Technical Leadership: As a Technical Program Manager III, there is potential for growth into senior technical leadership roles, such as Technical Lead or Engineering Manager
- Domain Expertise: The role offers opportunities to deepen technical expertise in ML infrastructure, capacity management, and resource optimization
- Cross-Functional Collaboration: The role provides opportunities to work with various teams and stakeholders, fostering a broad understanding of Google's ML Fleet and global resource allocation processes
📝 Enhancement Note: This role offers significant growth opportunities for technical program managers looking to advance their careers in ML infrastructure, capacity management, and resource optimization. The role's cross-functional nature also provides opportunities to develop a broad understanding of Google's ML Fleet and global resource allocation processes.
🌐 Work Environment
Office Type: Google's Sunnyvale office is a modern, collaborative workspace designed to foster innovation and creativity. The office features open workspaces, meeting rooms, and breakout areas, as well as on-site amenities such as cafes, fitness centers, and wellness spaces.
Office Location(s): Sunnyvale, California, United States
Workspace Context:
- Collaborative Environment: The office features open workspaces, encouraging collaboration and communication among team members
- Technical Infrastructure: The office is equipped with state-of-the-art technology, including high-speed internet, multiple monitors, and testing devices
- Cross-Functional Interaction: The office is home to various teams and functions, providing opportunities for cross-functional collaboration and knowledge sharing
Work Schedule: Full-time position with standard working hours, including flexibility for project deadlines and maintenance windows. The role may require occasional overtime or on-call responsibilities to support critical infrastructure and ensure optimal performance of the ML Fleet.
📝 Enhancement Note: Google's Sunnyvale office provides a modern, collaborative workspace designed to foster innovation and creativity. The office's cross-functional nature offers opportunities for collaboration and knowledge sharing among team members and other functions.
📄 Application & Technical Interview Process
Interview Process:
- Technical Assessment: Candidates can expect a technical assessment focused on program management, resource management, and stakeholder communication skills. The assessment may include case studies, scenario-based questions, and data analysis exercises.
- Behavioral Questions: Candidates can expect behavioral questions focused on problem-solving, decision-making, and leadership skills. The questions may be based on the STAR method (Situation, Task, Action, Result) to assess the candidate's ability to handle complex situations and drive strategic initiatives.
- Cross-Functional Interaction: Candidates can expect to interact with various team members and stakeholders during the interview process, providing an opportunity to assess cultural fit and communication skills.
- Final Evaluation: The final evaluation may include a presentation or project proposal, allowing candidates to demonstrate their ability to lead cross-functional teams and drive strategic initiatives in capacity management and resource optimization.
Portfolio Review Tips:
- Case Studies: Prepare case studies demonstrating experience in program management, resource management, and stakeholder communication. Highlight the impact of your work on capacity management, resource optimization, and strategic initiatives.
- Data Analysis: Prepare data analysis examples showcasing your ability to analyze data, develop metrics, and create reports to inform decision-making and resource allocation.
- Governance Frameworks: Prepare examples demonstrating your experience with governance frameworks and policy development for technical resources.
Technical Challenge Preparation:
- Program Management: Brush up on your program management skills, including project planning, risk management, and stakeholder communication. Familiarize yourself with Agile methodologies and Google's project management tools.
- Resource Management: Review your knowledge of capacity management, demand forecasting, and resource optimization techniques. Familiarize yourself with ML infrastructure and Google's ML Fleet and global resource allocation processes.
- Stakeholder Communication: Prepare for questions about your experience communicating with technical and non-technical stakeholders. Practice explaining complex technical concepts in a clear and concise manner.
📝 Enhancement Note: The interview process for this role is designed to assess the candidate's technical skills, problem-solving abilities, and leadership potential in program management, resource management, and stakeholder communication. Candidates should focus on demonstrating their ability to lead cross-functional teams and drive strategic initiatives in capacity management and resource optimization.
🛠 Technology Stack & Infrastructure
Infrastructure & Capacity Management:
- ML Infrastructure: Familiarity with ML infrastructure, accelerators (TPUs/GPUs), and managing AI/ML workloads at scale is required
- Capacity Management Tools: Experience with capacity management, demand forecasting, and resource optimization tools is preferred
- Governance Frameworks: Experience with governance frameworks and policy development for technical resources is preferred
Development & DevOps Tools:
- Google Workspace: Familiarity with Google Workspace tools, such as Google Docs, Sheets, and Slides, is required
- JIRA: Experience with JIRA for project management and issue tracking is preferred
- BigQuery: Experience with BigQuery for data analysis and reporting is preferred
📝 Enhancement Note: This role requires a strong technical background in ML infrastructure, capacity management, and resource optimization. Familiarity with Google's ML Fleet and global resource allocation processes is a plus.
👥 Team Culture & Values
Team Values:
- Innovation: Google values innovation and encourages its employees to think creatively and challenge the status quo
- Collaboration: Google values collaboration and fosters a culture of teamwork and knowledge sharing
- User Focus: Google values a user-centric approach and prioritizes the needs and experiences of its users
- Data-Driven Decision Making: Google values data-driven decision making and uses data to inform its strategies and product development processes
Collaboration Style:
- Cross-Functional Interaction: The team works closely with various Product Areas and other partners, fostering a collaborative and inclusive culture
- Agile Methodologies: The team follows Agile methodologies, encouraging iterative development, continuous improvement, and cross-functional collaboration
- Knowledge Sharing: The team values knowledge sharing and encourages its members to learn from one another and contribute to the team's collective expertise
📝 Enhancement Note: Google's company culture is characterized by innovation, collaboration, and a focus on user experience. The Technical Infrastructure team plays a critical role in keeping Google's products and services running smoothly and efficiently, and values a data-driven approach to decision-making and problem-solving.
⚡ Challenges & Growth Opportunities
Technical Challenges:
- ML Infrastructure: Stay up-to-date with the latest developments in ML infrastructure, accelerators (TPUs/GPUs), and managing AI/ML workloads at scale
- Capacity Management: Develop expertise in capacity management, demand forecasting, and resource optimization techniques to ensure optimal utilization of ML accelerators and auxiliary resources
- Governance Frameworks: Develop expertise in governance frameworks and policy development for technical resources to ensure compliance and security in ML Fleet capacity management and resource allocation
Learning & Development Opportunities:
- Technical Skill Development: The role offers opportunities to deepen technical expertise in ML infrastructure, capacity management, and resource optimization
- Leadership Development: The role provides opportunities to develop leadership skills through managing cross-functional teams and driving strategic initiatives
- Architecture Decision-Making: The role offers opportunities to participate in architecture decision-making processes, influencing the design and implementation of ML Fleet capacity management and resource allocation strategies
📝 Enhancement Note: This role presents significant technical challenges and growth opportunities for candidates looking to advance their careers in ML infrastructure, capacity management, and resource optimization. The role's cross-functional nature also provides opportunities to develop leadership skills and participate in architecture decision-making processes.
💡 Interview Preparation
Technical Questions:
- Program Management: Prepare for technical questions focused on program management, resource management, and stakeholder communication skills. The questions may include case studies, scenario-based questions, and data analysis exercises.
- Capacity Management: Prepare for technical questions focused on capacity management, demand forecasting, and resource optimization techniques. The questions may include scenario-based questions and data analysis exercises.
- Governance Frameworks: Prepare for technical questions focused on governance frameworks and policy development for technical resources. The questions may include scenario-based questions and data analysis exercises.
Company & Culture Questions:
- Google's ML Fleet: Prepare for questions about Google's ML Fleet and global resource allocation processes. Familiarize yourself with Google's products and services, and be ready to discuss how your work in ML Fleet capacity management and resource allocation would support the company's mission and values.
- Agile Methodologies: Prepare for questions about Agile methodologies and how you have applied them in previous roles. Be ready to discuss your experience with iterative development, continuous improvement, and cross-functional collaboration.
- User Experience: Prepare for questions about user experience and how you have prioritized the needs and experiences of users in previous roles. Be ready to discuss your experience with user-centered design and data-driven decision making.
📌 Application Steps
To apply for this Technical Program Manager III, ML Fleet Capacity Management, Cloud Supply Chain position at Google:
- Submit your application through the Google Careers website
- Portfolio Customization: Tailor your portfolio to highlight your experience in program management, resource management, and stakeholder communication. Include case studies, data analysis examples, and governance framework examples that demonstrate your ability to lead cross-functional teams and drive strategic initiatives in capacity management and resource optimization.
- Resume Optimization: Optimize your resume for technical program management roles, highlighting relevant projects and technical skills. Include keywords from the job description, such as capacity management, ML infrastructure, and governance frameworks, to increase your visibility to recruiters.
- Technical Interview Preparation: Prepare for technical interviews focused on program management, resource management, and stakeholder communication skills. Brush up on your knowledge of ML infrastructure, capacity management, and resource optimization techniques. Familiarize yourself with Google's ML Fleet and global resource allocation processes.
- Company Research: Research Google's company culture, products, and services. Be ready to discuss how your work in ML Fleet capacity management and resource allocation would support the company's mission and values. Familiarize yourself with Google's Agile methodologies and user-centered design principles.
⚠️ Important Notice: This enhanced job description includes AI-generated insights and industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
Application Requirements
Candidates must have a Bachelor's degree in a technical field, or equivalent practical experience, and at least 5 years of experience in program management and resource management. Preferred qualifications include experience with machine learning infrastructure and capacity management processes.