Workload Scheduling HPC Platform Engineer

Qube Research & Technologies
Full_timeLondon, United Kingdom

📍 Job Overview

  • Job Title: Workload Scheduling HPC Platform Engineer
  • Company: Qube Research & Technologies
  • Location: London
  • Job Type: On-site
  • Category: DevOps Engineer
  • Date Posted: June 27, 2025
  • Experience Level: Mid-level (2-5 years)
  • Remote Status: On-site

🚀 Role Summary

  • Develop and maintain High-Performance Computing (HPC) platforms using YellowDog and Ray schedulers across cloud and on-premises infrastructures.
  • Collaborate with internal teams to integrate HPC solutions, enabling scalable and efficient compute capabilities.
  • Improve performance, resilience, and observability of compute infrastructure through continuous improvement initiatives.

📝 Enhancement Note: This role requires a strong background in HPC environments and a solid understanding of cloud services to support both business and technology groups effectively.

💻 Primary Responsibilities

  • Platform Development & Support: Develop and support scalable workload scheduling solutions for HPC environments using YellowDog and Ray schedulers.
  • Team Collaboration: Work closely with internal teams to adopt and optimize HPC platforms, ensuring they meet the needs of various stakeholders.
  • Infrastructure Optimization: Improve the performance, resilience, and observability of compute infrastructure by identifying bottlenecks and implementing enhancements.
  • Automation & Continuous Improvement: Contribute to infrastructure automation and continuous improvement initiatives to streamline workflows and increase efficiency.
  • Expertise Sharing: Share your expertise and support team development through coaching and collaboration, fostering a culture of learning and growth.

📝 Enhancement Note: This role requires a balance of technical expertise and strong communication skills to effectively collaborate with diverse teams and stakeholders.

🎓 Skills & Qualifications

Education: A bachelor's degree in Computer Science, Engineering, or a related field. Relevant experience may be considered in lieu of a degree.

Experience: Proven experience (2-5 years) in engineering and supporting HPC environments, with a strong focus on workload scheduling and large-scale systems.

Required Skills:

  • Experience with at least one HPC scheduler (YellowDog, Ray, Slurm, or IBM Symphony)
  • Deep understanding of both loosely coupled and tightly coupled HPC workloads
  • Experience developing and supporting large-scale systems (5000+ nodes) and high levels of concurrency (100k+ tasks)
  • Strong Linux administration skills
  • Solid understanding of core AWS services, including VPC security, networking, EC2 configuration, and storage services (S3, EFS, EBS, and FSx)
  • Experience with infrastructure-as-code using configuration languages and tools, particularly Terraform and Ansible
  • Proficiency in Python for developing applications and tools
  • Good understanding of monitoring and visualization tools for large-scale systems (CloudWatch, CloudTrail, OpenSearch, Athena)
  • Experience with performance tuning of compute, network, and storage components
  • Strong communication and collaboration skills, with the ability to describe complex issues at the appropriate level for different audiences

Preferred Skills:

  • Experience with identity providers such as Okta and user authorization in large-scale distributed environments
  • Familiarity with containerization (e.g., Docker, Kubernetes) and orchestration tools
  • Knowledge of CI/CD pipelines and deployment strategies
  • Experience with data analysis and visualization tools (e.g., Tableau, Power BI)

📝 Enhancement Note: Candidates with a strong background in both HPC environments and cloud services will be well-positioned to succeed in this role.

📊 Web Portfolio & Project Requirements

Portfolio Essentials:

  • Demonstrate your experience with HPC schedulers by showcasing projects where you've developed, optimized, or maintained large-scale HPC platforms.
  • Highlight your cloud services expertise by presenting projects that showcase your ability to manage and optimize AWS resources.
  • Include examples of your Python development work, emphasizing projects that involve infrastructure automation or data processing.
  • Showcase your Linux administration skills by demonstrating your ability to manage and maintain large-scale systems.

Technical Documentation:

  • Provide clear and concise documentation for your HPC projects, explaining the architecture, configuration, and any performance optimizations implemented.
  • Include documentation for your Python projects, detailing the purpose, functionality, and any relevant code snippets or examples.
  • Demonstrate your understanding of monitoring and visualization tools by including screenshots, graphs, or other visual representations of system performance and health.

📝 Enhancement Note: When preparing your portfolio, focus on projects that showcase your ability to work with large-scale systems and optimize performance in HPC environments.

💵 Compensation & Benefits

Salary Range: £60,000 - £80,000 per annum (Based on market research and regional adjustments for London)

Benefits:

  • Competitive salary and bonus structure
  • Comprehensive health, dental, and vision insurance
  • Generous pension contributions
  • Flexible working hours and remote work options
  • Employee equity and profit-sharing opportunities
  • Professional development and training opportunities
  • A dynamic and inclusive work environment that values diversity and collaboration

Working Hours: Full-time position with standard office hours (Monday-Friday, 9:00 AM - 5:30 PM), with flexibility for occasional overtime or remote work as needed.

📝 Enhancement Note: The salary range provided is based on market research for similar roles in the London area, considering the required experience level and the unique demands of this position.

🎯 Team & Company Context

🏢 Company Culture

Industry: Quantitative and systematic investment management, with a strong focus on technology and data-driven approaches.

Company Size: Medium-sized global investment manager with a collaborative and innovative culture.

Founded: 2003, with a history of growth and success in the investment management industry.

Team Structure:

  • The Workload Scheduling team is part of the broader Technology group, which consists of various teams focused on data, research, trading, and infrastructure.
  • The team works closely with business groups to understand their needs and provide tailored HPC solutions.
  • The team follows Agile methodologies, with a focus on continuous improvement and collaboration.

Development Methodology:

  • The team follows Agile/Scrum methodologies, with sprint planning, daily stand-ups, and regular retrospectives.
  • Code reviews, testing, and quality assurance practices are integral to the development process.
  • Deployment strategies, CI/CD pipelines, and server management are essential aspects of the role.

Company Website: Qube Research & Technologies

📝 Enhancement Note: QRT's company culture emphasizes technology, data, and collaboration, making it an ideal environment for candidates with strong technical skills and a passion for working in a dynamic, innovative setting.

📈 Career & Growth Analysis

Web Technology Career Level: This role is suitable for a mid-level DevOps Engineer with a strong background in HPC environments and cloud services. The position offers opportunities for growth and progression within the Technology group and the broader organization.

Reporting Structure: The role reports directly to the Head of Workload Scheduling, with regular interactions with other team members, business stakeholders, and technology leadership.

Technical Impact: The role has a significant impact on the performance, scalability, and efficiency of QRT's HPC platforms, directly contributing to the organization's ability to process and analyze large datasets effectively.

Growth Opportunities:

  • Technical Growth: Develop expertise in emerging technologies, tools, and best practices related to HPC environments and cloud services.
  • Leadership Growth: Gain experience in coaching and mentoring team members, contributing to the team's development and success.
  • Architecture & Design: Contribute to the design and architecture of HPC platforms, influencing the organization's technology roadmap.

📝 Enhancement Note: This role offers numerous opportunities for growth and development, both technically and professionally, within QRT's collaborative and innovative work environment.

🌐 Work Environment

Office Type: Modern, collaborative office spaces designed to facilitate teamwork and innovation.

Office Location(s): London, with opportunities for remote work and flexible working arrangements.

Workspace Context:

  • Collaborative Environment: The office features open-plan workspaces, meeting rooms, and breakout areas designed to encourage collaboration and communication among team members.
  • Technical Infrastructure: The workspace is equipped with high-performance workstations, multiple monitors, and testing devices to support the development and maintenance of HPC platforms.
  • Cross-functional Collaboration: The team works closely with other technology groups and business stakeholders, fostering a culture of knowledge sharing and continuous learning.

Work Schedule: Full-time position with standard office hours (Monday-Friday, 9:00 AM - 5:30 PM), with flexibility for occasional overtime or remote work as needed.

📝 Enhancement Note: QRT's work environment emphasizes collaboration, innovation, and continuous learning, providing an ideal setting for candidates seeking to grow both technically and professionally.

📄 Application & Technical Interview Process

Interview Process:

  1. Phone/Video Screen: A brief conversation to discuss your background, experience, and fit for the role (30-45 minutes).
  2. Technical Deep Dive: A detailed discussion of your technical skills, experience, and approach to HPC environments and cloud services (60-90 minutes).
  3. Behavioral & Cultural Fit: An assessment of your communication skills, problem-solving abilities, and cultural fit within the team and organization (30-45 minutes).
  4. Final Interview: A meeting with the hiring manager or other senior stakeholders to discuss your qualifications, answer any remaining questions, and make a final decision (30-45 minutes).

Portfolio Review Tips:

  • HPC Projects: Highlight your experience with HPC schedulers by showcasing projects where you've developed, optimized, or maintained large-scale HPC platforms.
  • Cloud Services: Demonstrate your expertise in AWS services by presenting projects that showcase your ability to manage and optimize resources.
  • Python Development: Include examples of your Python development work, emphasizing projects that involve infrastructure automation or data processing.
  • Linux Administration: Showcase your Linux administration skills by demonstrating your ability to manage and maintain large-scale systems.

Technical Challenge Preparation:

  • HPC & Cloud Services: Brush up on your knowledge of HPC schedulers, AWS services, and cloud architecture best practices.
  • Python & Linux: Review your Python development skills and Linux administration fundamentals to ensure you can effectively demonstrate your expertise.
  • Problem-solving & Communication: Prepare for behavioral questions that assess your problem-solving abilities, communication skills, and cultural fit within the team and organization.

ATS Keywords: See the comprehensive list of web development and server administration-relevant keywords for resume optimization, organized by category: programming languages, web frameworks, server technologies, databases, tools, methodologies, soft skills, industry terms

📝 Enhancement Note: The interview process for this role is designed to assess both your technical expertise and cultural fit within the organization, ensuring that you are well-prepared to succeed in the role and contribute to QRT's ongoing success.

🛠 Technology Stack & Web Infrastructure

HPC Schedulers:

  • YellowDog
  • Ray

Cloud Services:

  • AWS (Amazon Web Services)
    • VPC (Virtual Private Cloud)
    • EC2 (Elastic Compute Cloud)
    • S3 (Simple Storage Service)
    • EFS (Elastic File System)
    • EBS (Elastic Block Store)
    • FSx (Amazon FSx)
    • CloudWatch (Amazon CloudWatch)
    • CloudTrail (AWS CloudTrail)
    • OpenSearch (formerly Amazon Elasticsearch Service)
    • Athena (Amazon Athena)

Infrastructure-as-Code Tools:

  • Terraform
  • Ansible

Operating Systems:

  • Linux (Ubuntu, CentOS, etc.)
  • macOS
  • Windows

Programming Languages:

  • Python
  • Bash
  • Shell scripting

Databases:

  • Relational databases (PostgreSQL, MySQL, etc.)
  • NoSQL databases (MongoDB, Cassandra, etc.)

Monitoring & Visualization Tools:

  • Prometheus
  • Grafana
  • ELK Stack (Elasticsearch, Logstash, Kibana)
  • Zabbix
  • Nagios

📝 Enhancement Note: Familiarity with the technology stack listed above is essential for success in this role, as it forms the foundation of QRT's HPC platforms and infrastructure.

👥 Team Culture & Values

Web Development Values:

  • Innovation: Embrace a culture of continuous learning and improvement, driving technological advancements in HPC environments and cloud services.
  • Collaboration: Foster a collaborative work environment that encourages knowledge sharing, teamwork, and mutual support.
  • Quality: Maintain a strong focus on quality and excellence in all aspects of your work, from code development to system design and optimization.
  • Performance: Prioritize performance and efficiency in your work, continually seeking opportunities to improve the scalability, reliability, and speed of QRT's HPC platforms.

Collaboration Style:

  • Cross-functional Integration: Work closely with other technology groups and business stakeholders to understand their needs and provide tailored HPC solutions.
  • Code Review Culture: Participate in code reviews and pair programming practices to ensure high-quality work and knowledge sharing among team members.
  • Knowledge Sharing: Contribute to the team's development and success by sharing your expertise and supporting the growth of your colleagues.

📝 Enhancement Note: QRT's team culture emphasizes collaboration, innovation, and continuous learning, providing an ideal environment for candidates seeking to grow both technically and professionally.

⚡ Challenges & Growth Opportunities

Technical Challenges:

  • HPC Scheduler Optimization: Develop and implement strategies to optimize the performance and efficiency of YellowDog and Ray schedulers in large-scale HPC environments.
  • Cloud Services Management: Manage and optimize AWS resources to ensure cost-effectiveness, scalability, and reliability in QRT's HPC platforms.
  • Performance Tuning: Identify and address performance bottlenecks in HPC workloads, cloud services, and infrastructure components to maximize efficiency and throughput.
  • Emerging Technologies: Stay up-to-date with the latest developments in HPC environments, cloud services, and related technologies, and evaluate their applicability to QRT's infrastructure.

Learning & Development Opportunities:

  • Technical Skill Development: Develop expertise in emerging technologies, tools, and best practices related to HPC environments and cloud services.
  • Conference Attendance: Attend industry conferences, workshops, and training sessions to expand your knowledge and network with other professionals in the field.
  • Certification & Community Involvement: Pursue relevant certifications and engage with online communities to enhance your skills and stay informed about the latest trends and best practices in HPC environments and cloud services.
  • Mentorship & Leadership Development: Seek mentorship opportunities from senior team members and contribute to the development of your colleagues through coaching and knowledge sharing.

📝 Enhancement Note: This role offers numerous opportunities for technical growth and development, both through hands-on experience and structured learning initiatives.

💡 Interview Preparation

Technical Questions:

  • HPC Schedulers: Describe your experience with YellowDog, Ray, or other HPC schedulers, and explain how you've optimized their performance in large-scale environments.
  • Cloud Services: Explain your approach to managing and optimizing AWS resources, and provide examples of your experience with VPC, EC2, S3, and other cloud services.
  • Performance Tuning: Discuss your experience with performance tuning in HPC environments, cloud services, and infrastructure components, and describe your approach to identifying and addressing bottlenecks.
  • Problem-solving: Present a challenging problem you've faced in your previous roles and explain your approach to solving it, emphasizing your technical skills and problem-solving abilities.

Company & Culture Questions:

  • Team Dynamics: Describe your experience working in a collaborative, cross-functional team environment, and explain how you've contributed to the success of your colleagues and the broader organization.
  • Adaptability: Discuss a situation where you had to adapt to significant changes in your role or the broader organization, and explain how you approached the challenge and achieved success.
  • Innovation: Describe a time when you drove technological advancements in your previous roles, and explain how you identified opportunities for improvement and implemented innovative solutions.

Portfolio Presentation Strategy:

  • HPC Projects: Highlight your experience with HPC schedulers by showcasing projects where you've developed, optimized, or maintained large-scale HPC platforms.
  • Cloud Services: Demonstrate your expertise in AWS services by presenting projects that showcase your ability to manage and optimize resources.
  • Python Development: Include examples of your Python development work, emphasizing projects that involve infrastructure automation or data processing.
  • Linux Administration: Showcase your Linux administration skills by demonstrating your ability to manage and maintain large-scale systems.
  • Code Explanation: Clearly explain your approach to system design, architecture, and performance optimization in your projects, emphasizing your technical expertise and problem-solving abilities.

📝 Enhancement Note: The interview process for this role is designed to assess both your technical expertise and cultural fit within the organization, ensuring that you are well-prepared to succeed in the role and contribute to QRT's ongoing success.

📌 Application Steps

To apply for this Workload Scheduling HPC Platform Engineer position at Qube Research & Technologies:

  1. Submit Your Application: Click the application link and provide the required information, including your resume, cover letter, and portfolio.
  2. Prepare Your Portfolio: Tailor your portfolio to showcase your experience with HPC schedulers, cloud services, and related technologies, emphasizing your technical expertise and problem-solving abilities.
  3. Optimize Your Resume: Highlight your relevant skills, experience, and achievements, using web development and server administration-relevant keywords to improve your visibility to Applicant Tracking Systems (ATS).
  4. Research the Company: Familiarize yourself with QRT's company culture, technology stack, and industry focus, ensuring that your application and interview preparation are tailored to the organization's unique needs and values.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and web technology industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.


Content Guidelines (IMPORTANT: Do not include this in the output)

Web Technology-Specific Focus:

  • Tailor every section specifically to DevOps Engineer roles, emphasizing HPC environments, cloud services, and infrastructure management.
  • Include web technology methodologies, cloud architecture principles, and server management practices.
  • Emphasize portfolio requirements, live project demonstrations, and user experience considerations for HPC environments and cloud services.
  • Address DevOps team dynamics, cross-functional collaboration with technology and business teams, and the unique demands of HPC environments and cloud services.
  • Focus on DevOps career progression, emerging technology adoption, and technical specialization in HPC environments and cloud services.

Quality Standards:

  • Ensure no content overlap between sections; each section must contain unique information.
  • Only include Enhancement Notes when making significant inferences about technical responsibilities, team structure, or company information.
  • Be comprehensive but concise, prioritizing actionable information over descriptive text.
  • Strategically distribute web technology and server administration-related keywords throughout all sections naturally.
  • Provide realistic salary ranges based on location, experience level, and the unique demands of the role.

Industry Expertise:

  • Include specific HPC schedulers, cloud services, and infrastructure tools relevant to the role.
  • Address DevOps career progression paths and technical leadership opportunities in HPC environments and cloud services.
  • Provide tactical advice for portfolio development, live demonstrations, and project case studies tailored to HPC environments and cloud services.
  • Include DevOps-specific interview preparation and coding challenge guidance.
  • Emphasize performance optimization, scalability, and user experience principles in HPC environments and cloud services.

Professional Standards:

  • Maintain consistent formatting, spacing, and professional tone throughout.
  • Use web technology and server administration industry terminology appropriately and accurately.
  • Include comprehensive benefits and growth opportunities relevant to DevOps professionals in HPC environments and cloud services.
  • Provide actionable insights that give DevOps candidates a competitive advantage in the job application process.
  • Focus on DevOps team culture, cross-functional collaboration, and user impact measurement in HPC environments and cloud services.

Technical Focus & Portfolio Emphasis:

  • Emphasize HPC environments, cloud services, and infrastructure management best practices.
  • Include specific portfolio requirements tailored to the DevOps discipline and role level, with a strong focus on HPC schedulers, cloud services, and related technologies.
  • Address browser compatibility, accessibility standards, and user experience design principles in the context of HPC environments and cloud services.
  • Focus on problem-solving methods, performance optimization, and scalable architecture in HPC environments and cloud services.
  • Include technical presentation skills and stakeholder communication for HPC projects and cloud services.

Avoid:

  • Generic business jargon not relevant to DevOps roles in HPC environments and cloud services.
  • Placeholder text or incomplete sections.
  • Repetitive content across different sections.
  • Non-technical terminology unless relevant to the specific DevOps role in HPC environments and cloud services.
  • Marketing language unrelated to DevOps, HPC environments, or cloud services.

Generate comprehensive, web technology-focused content that serves as a valuable resource for DevOps professionals evaluating career opportunities and preparing for technical interviews in the web technology industry, with a strong emphasis on HPC environments and cloud services.

Application Requirements

Candidates should have experience with HPC schedulers and large-scale systems, along with a solid understanding of AWS services and infrastructure-as-code tools. Strong Linux administration skills and the ability to work in a fast-paced environment are also essential.