Data Center Operations Manager - Team Lead

FluidStack
Full-time, New York, United States

📍 Job Overview

  • Job Title: Data Center Operations Manager - Team Lead
  • Company: FluidStack
  • Location: New York, New York, United States
  • Job Type: Full-Time, On-Site
  • Category: Data Center Operations, Team Leadership
  • Date Posted: August 2, 2025
  • Experience Level: 5-10 years

🚀 Role Summary

  • Lead and manage all datacenter operations during your assigned shift, ensuring 24/7 reliability of our GPU supercomputing infrastructure.
  • Train and mentor junior technicians to enhance their skills and build a high-performing team.
  • Oversee incident response and troubleshooting, providing technical guidance and escalation support.
  • Develop and implement operational procedures and best practices to improve efficiency and reduce downtime.
  • Coordinate with other shift leads to ensure seamless handovers and consistent operations across all shifts.

📝 Enhancement Note: This role requires a strong technical background in datacenter infrastructure and proven leadership skills to manage a team of technicians and ensure the highest levels of reliability and performance for the GPU supercomputing infrastructure.

💻 Primary Responsibilities

  • Shift Management: Oversee daily operations, ensuring all tasks are completed and deadlines are met.
  • Team Leadership: Mentor and develop junior technicians, fostering a culture of continuous learning and improvement.
  • Incident Management: Lead incident response efforts, troubleshoot issues, and implement corrective actions to minimize downtime.
  • Process Improvement: Develop and implement operational procedures and best practices to enhance efficiency and reduce human error.
  • Cross-Shift Coordination: Collaborate with other shift leads to ensure consistent operations and seamless handovers.
  • Performance Monitoring: Track and analyze key performance indicators (KPIs) to identify trends, optimize processes, and ensure service level agreements (SLAs) are met.

📝 Enhancement Note: This role demands strong technical expertise, excellent communication skills, and the ability to make critical decisions under pressure to maintain the reliability and performance of the GPU supercomputing infrastructure.

🎓 Skills & Qualifications

Education

  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.

Experience

  • 5+ years of experience in datacenter operations, with at least 2 years in a leadership or supervisory role.
  • Proven track record of successfully leading and developing technical teams.
  • Experience with incident management, root cause analysis, and implementing corrective actions.

Required Skills

  • Strong technical background in datacenter infrastructure, including servers, networking, power, and cooling systems.
  • Excellent communication and interpersonal skills with the ability to mentor and inspire team members.
  • Strong problem-solving skills and the ability to make critical decisions under pressure.
  • Proficiency in using DCIM tools and automation platforms.

Preferred Skills

  • Experience with GPU clusters and high-performance computing environments.
  • Familiarity with hyperscale or colocation datacenter environments.
  • Previous experience in a 24/7 shift-based operation.

📝 Enhancement Note: While not required, experience with GPU clusters and high-performance computing environments, as well as familiarity with hyperscale or colocation datacenter environments, would be highly beneficial for this role.

📊 Web Portfolio & Project Requirements

  • Portfolio Essentials: Not applicable. This role focuses on datacenter operations and team leadership rather than web development projects.
  • Technical Documentation: Prepare a document outlining your experience with datacenter operations, team leadership, incident management, and process improvement. Include specific examples of challenges faced and how you overcame them.

💵 Compensation & Benefits

Salary Range

  • Estimated: $120,000 - $160,000 per year (Based on market research for datacenter operations managers in New York)
  • Currency: USD

Benefits

  • Competitive total compensation package (cash + equity).
  • Retirement or pension plan, in line with local norms.
  • Health, dental, and vision insurance.
  • Generous PTO policy, in line with local norms.

🎯 Team & Company Context

🏢 Company Culture

  • Industry: AI Cloud Platform
  • Company Size: Small, highly motivated team focused on providing a world-class supercomputing experience.
  • Founded: Not specified
  • Team Structure: Small, highly motivated team with a focus on customer experience and high standards.
  • Development Methodology: Not specified

Company Website: FluidStack

📝 Enhancement Note: FluidStack is a small, highly motivated team focused on providing a world-class supercomputing experience. They expect their team members to care deeply about the work they do, the products they build, and the experience their customers have in every interaction with them.

📈 Career & Growth Analysis

  • Data Center Operations Manager - Team Lead: This role involves leading a team of technicians responsible for the daily operations and maintenance of the datacenter infrastructure. The ideal candidate will have a strong technical background, proven leadership skills, and experience with incident management and process improvement.
  • Reporting Structure: Reports directly to the Data Center Operations Manager or a similar role.
  • Technical Impact: Directly responsible for the reliability and performance of the GPU supercomputing infrastructure, ensuring minimal downtime and optimal performance.

Growth Opportunities:

  • Team Lead to Manager: Transition to a management role, overseeing multiple teams and larger datacenter operations.
  • Technical Expertise: Develop expertise in specific areas of datacenter operations, such as power and cooling systems or network infrastructure.
  • Cross-Functional Roles: Explore opportunities in related fields, such as IT service management or data center design and construction.

📝 Enhancement Note: This role offers the opportunity to grow within the datacenter operations team, either by managing larger teams or specializing in specific areas of datacenter infrastructure. Additionally, cross-functional roles in related fields may present themselves as the company grows and expands its offerings.

🌐 Work Environment

  • Office Type: On-site, with a focus on collaboration and teamwork.
  • Office Location(s): New York, New York, United States
  • Workspace Context:
    • Collaborative workspace with a focus on teamwork and communication.
    • Access to the latest tools and technologies for datacenter operations and management.
    • Opportunities for professional development and growth within the team and the company.

Work Schedule: 24/7 shift-based operation, with specific shifts to be determined based on business needs and team availability.

📝 Enhancement Note: The work environment at FluidStack is collaborative and focused on teamwork. The company offers opportunities for professional development and growth within the team and the company. The work schedule is 24/7 shift-based, with specific shifts to be determined based on business needs and team availability.

📄 Application & Technical Interview Process

Interview Process:

  1. Phone Screen: A brief phone call to discuss your experience and qualifications for the role.
  2. On-Site Interview: A visit to the FluidStack office in New York, where you will meet with the hiring manager and other team members to discuss your experience and qualifications in more detail. This may include a tour of the datacenter facility.
  3. Technical Assessment: A hands-on assessment of your technical skills, focusing on your experience with datacenter infrastructure, incident management, and process improvement.
  4. Final Interview: A final interview with the hiring manager or another senior team member to discuss your fit for the role and the company culture.

Portfolio Review Tips:

  • Prepare a document outlining your experience with datacenter operations, team leadership, incident management, and process improvement. Include specific examples of challenges faced and how you overcame them.
  • Be prepared to discuss your technical skills and experience with datacenter infrastructure, incident management, and process improvement.

Technical Challenge Preparation:

  • Brush up on your knowledge of datacenter infrastructure, incident management, and process improvement.
  • Prepare for a hands-on assessment of your technical skills, focusing on your experience with datacenter infrastructure, incident management, and process improvement.

ATS Keywords: Datacenter Operations, Team Leadership, Incident Management, Root Cause Analysis, Corrective Actions, DCIM Tools, Automation Platforms, GPU Clusters, High-Performance Computing, Hyperscale Datacenters, Colocation Datacenters, 24/7 Shift Operations, Service Level Agreements (SLAs), Key Performance Indicators (KPIs)

📝 Enhancement Note: The interview process for this role focuses on assessing your technical skills and experience with datacenter infrastructure, incident management, and process improvement. Be prepared to discuss your experience and qualifications in detail and to demonstrate your technical skills through a hands-on assessment.

🛠 Technology Stack & Web Infrastructure

Datacenter Infrastructure:

  • Servers: Not specified
  • Networking: Not specified
  • Power: Not specified
  • Cooling: Not specified

DCIM Tools & Automation Platforms:

  • Not specified

📝 Enhancement Note: The technology stack for this role centers on datacenter infrastructure, including servers, networking, power, and cooling systems. Proficiency with DCIM tools and automation platforms is listed as a required skill, although the specific tools are not specified.

👥 Team Culture & Values

Datacenter Operations Values:

  • High standards: We expect you to care deeply about the work you do, the products you build, and the experience our customers have in every interaction with us.
  • Effectiveness: We value effectiveness, competence, and a growth mindset.
  • Competence: We expect you to be competent in your role and to continuously improve your skills and knowledge.
  • Growth mindset: We value a growth mindset and the ability to learn from failures and setbacks.

Collaboration Style:

  • Collaborative: We work together as a team to ensure the highest levels of reliability and performance for our GPU supercomputing infrastructure.
  • Cross-functional: We collaborate with other teams within the company to ensure our datacenter operations support the broader goals and objectives of the organization.
  • Knowledge sharing: We share our knowledge and expertise with one another to continuously improve our skills and the quality of our work.

📝 Enhancement Note: FluidStack values high standards, effectiveness, competence, and a growth mindset. The team works collaboratively to ensure the highest levels of reliability and performance for the GPU supercomputing infrastructure and supports the broader goals and objectives of the organization.

⚡ Challenges & Growth Opportunities

Technical Challenges:

  • Incident Management: Develop and refine incident management processes to minimize downtime and ensure optimal performance of the GPU supercomputing infrastructure.
  • Root Cause Analysis: Improve your ability to identify and address the root causes of incidents and issues to prevent them from recurring.
  • Process Improvement: Continuously review and improve operational procedures and best practices to enhance efficiency and reduce human error.
  • Team Leadership: Develop your leadership skills to manage and mentor a team of technicians effectively.

Learning & Development Opportunities:

  • Technical Training: Participate in technical training and development opportunities to enhance your skills and knowledge in datacenter operations and management.
  • Conferences & Events: Attend industry conferences and events to stay up-to-date on the latest trends and best practices in datacenter operations and management.
  • Mentorship: Seek out mentorship opportunities within the company to learn from experienced team members and develop your skills and career.

📝 Enhancement Note: This role presents numerous technical challenges and learning opportunities for candidates with a strong technical background and a desire to grow and develop their skills in datacenter operations and management.

💡 Interview Preparation

Technical Questions:

  • Datacenter Infrastructure: Be prepared to discuss your experience with datacenter infrastructure, including servers, networking, power, and cooling systems.
  • Incident Management: Prepare for questions about your experience with incident management, root cause analysis, and implementing corrective actions.
  • Process Improvement: Expect questions about how you have improved processes to enhance efficiency and reduce human error in datacenter operations.

Company & Culture Questions:

  • Company Culture: Research FluidStack's company culture and be prepared to discuss how your values and work style align with the company's mission and goals.
  • Team Dynamics: Prepare for questions about your experience working in a team environment and your ability to collaborate and communicate effectively with others.
  • Customer Focus: Expect questions about your commitment to providing a world-class supercomputing experience for FluidStack's customers.

📌 Application Steps

To apply for this Data Center Operations Manager - Team Lead position at FluidStack:

  1. Submit your application through the application link provided.
  2. Prepare a document outlining your experience with datacenter operations, team leadership, incident management, and process improvement. Include specific examples of challenges faced and how you overcame them.
  3. Research FluidStack's company culture and be prepared to discuss how your values and work style align with the company's mission and goals.
  4. Prepare for a phone screen, on-site interview, technical assessment, and final interview, as outlined in the interview process section.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and datacenter operations industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.


Application Requirements

5+ years of experience in datacenter operations, with at least 2 years in a leadership or supervisory role, is required. A strong technical background in datacenter infrastructure and a proven track record of successfully leading and developing technical teams are essential.