Workload Scheduling HPC Platform Engineer
📍 Job Overview
- Job Title: Workload Scheduling HPC Platform Engineer
- Company: Qube Research & Technologies
- Location: London
- Job Type: On-site
- Category: DevOps Engineer
- Date Posted: June 27, 2025
- Experience Level: Mid-level (2-5 years)
- Remote Status: On-site
🚀 Role Summary
- Develop and maintain High-Performance Computing (HPC) platforms using YellowDog and Ray schedulers across cloud and on-premises infrastructures.
- Collaborate with internal teams to integrate HPC solutions, enabling scalable and efficient compute capabilities.
- Improve performance, resilience, and observability of compute infrastructure through continuous improvement initiatives.
📝 Enhancement Note: This role requires a strong background in HPC environments and a solid understanding of cloud services to support both business and technology groups effectively.
💻 Primary Responsibilities
- Platform Development & Support: Develop and support scalable workload scheduling solutions for HPC environments using YellowDog and Ray schedulers.
- Team Collaboration: Work closely with internal teams to adopt and optimize HPC platforms, ensuring they meet the needs of various stakeholders.
- Infrastructure Optimization: Improve the performance, resilience, and observability of compute infrastructure by identifying bottlenecks and implementing enhancements.
- Automation & Continuous Improvement: Contribute to infrastructure automation and continuous improvement initiatives to streamline workflows and increase efficiency.
- Expertise Sharing: Share your expertise and support team development through coaching and collaboration, fostering a culture of learning and growth.
📝 Enhancement Note: This role requires a balance of technical expertise and strong communication skills to effectively collaborate with diverse teams and stakeholders.
🎓 Skills & Qualifications
Education: A bachelor's degree in Computer Science, Engineering, or a related field. Relevant experience may be considered in lieu of a degree.
Experience: Proven experience (2-5 years) in engineering and supporting HPC environments, with a strong focus on workload scheduling and large-scale systems.
Required Skills:
- Experience with at least one HPC scheduler (YellowDog, Ray, Slurm, or IBM Symphony)
- Deep understanding of both loosely coupled and tightly coupled HPC workloads
- Experience developing and supporting large-scale systems (5000+ nodes) and high levels of concurrency (100k+ tasks)
- Strong Linux administration skills
- Solid understanding of core AWS services, including VPC security, networking, EC2 configuration, and storage services (S3, EFS, EBS, and FSx)
- Experience with infrastructure-as-code using configuration languages and tools, particularly Terraform and Ansible
- Proficiency in Python for developing applications and tools
- Good understanding of monitoring and visualization tools for large-scale systems (CloudWatch, CloudTrail, OpenSearch, Athena)
- Experience with performance tuning of compute, network, and storage components
- Strong communication and collaboration skills, with the ability to describe complex issues at the appropriate level for different audiences
Preferred Skills:
- Experience with identity providers such as Okta and user authorization in large-scale distributed environments
- Familiarity with containerization (e.g., Docker, Kubernetes) and orchestration tools
- Knowledge of CI/CD pipelines and deployment strategies
- Experience with data analysis and visualization tools (e.g., Tableau, Power BI)
📝 Enhancement Note: Candidates with a strong background in both HPC environments and cloud services will be well-positioned to succeed in this role.
📊 Web Portfolio & Project Requirements
Portfolio Essentials:
- Demonstrate your experience with HPC schedulers by showcasing projects where you've developed, optimized, or maintained large-scale HPC platforms.
- Highlight your cloud services expertise by presenting projects that showcase your ability to manage and optimize AWS resources.
- Include examples of your Python development work, emphasizing projects that involve infrastructure automation or data processing.
- Showcase your Linux administration skills by demonstrating your ability to manage and maintain large-scale systems.
Technical Documentation:
- Provide clear and concise documentation for your HPC projects, explaining the architecture, configuration, and any performance optimizations implemented.
- Include documentation for your Python projects, detailing the purpose, functionality, and any relevant code snippets or examples.
- Demonstrate your understanding of monitoring and visualization tools by including screenshots, graphs, or other visual representations of system performance and health.
📝 Enhancement Note: When preparing your portfolio, focus on projects that showcase your ability to work with large-scale systems and optimize performance in HPC environments.
💵 Compensation & Benefits
Salary Range: £60,000 - £80,000 per annum (Based on market research and regional adjustments for London)
Benefits:
- Competitive salary and bonus structure
- Comprehensive health, dental, and vision insurance
- Generous pension contributions
- Flexible working hours and remote work options
- Employee equity and profit-sharing opportunities
- Professional development and training opportunities
- A dynamic and inclusive work environment that values diversity and collaboration
Working Hours: Full-time position with standard office hours (Monday-Friday, 9:00 AM - 5:30 PM), with flexibility for occasional overtime or remote work as needed.
📝 Enhancement Note: The salary range provided is based on market research for similar roles in the London area, considering the required experience level and the unique demands of this position.
🎯 Team & Company Context
🏢 Company Culture
Industry: Quantitative and systematic investment management, with a strong focus on technology and data-driven approaches.
Company Size: Medium-sized global investment manager with a collaborative and innovative culture.
Founded: 2003, with a history of growth and success in the investment management industry.
Team Structure:
- The Workload Scheduling team is part of the broader Technology group, which consists of various teams focused on data, research, trading, and infrastructure.
- The team works closely with business groups to understand their needs and provide tailored HPC solutions.
- The team follows Agile methodologies, with a focus on continuous improvement and collaboration.
Development Methodology:
- The team follows Agile/Scrum methodologies, with sprint planning, daily stand-ups, and regular retrospectives.
- Code reviews, testing, and quality assurance practices are integral to the development process.
- Deployment strategies, CI/CD pipelines, and server management are essential aspects of the role.
Company Website: Qube Research & Technologies
📝 Enhancement Note: QRT's company culture emphasizes technology, data, and collaboration, making it an ideal environment for candidates with strong technical skills and a passion for working in a dynamic, innovative setting.
📈 Career & Growth Analysis
Web Technology Career Level: This role is suitable for a mid-level DevOps Engineer with a strong background in HPC environments and cloud services. The position offers opportunities for growth and progression within the Technology group and the broader organization.
Reporting Structure: The role reports directly to the Head of Workload Scheduling, with regular interactions with other team members, business stakeholders, and technology leadership.
Technical Impact: The role has a significant impact on the performance, scalability, and efficiency of QRT's HPC platforms, directly contributing to the organization's ability to process and analyze large datasets effectively.
Growth Opportunities:
- Technical Growth: Develop expertise in emerging technologies, tools, and best practices related to HPC environments and cloud services.
- Leadership Growth: Gain experience in coaching and mentoring team members, contributing to the team's development and success.
- Architecture & Design: Contribute to the design and architecture of HPC platforms, influencing the organization's technology roadmap.
📝 Enhancement Note: This role offers numerous opportunities for growth and development, both technically and professionally, within QRT's collaborative and innovative work environment.
🌐 Work Environment
Office Type: Modern, collaborative office spaces designed to facilitate teamwork and innovation.
Office Location(s): London, with opportunities for remote work and flexible working arrangements.
Workspace Context:
- Collaborative Environment: The office features open-plan workspaces, meeting rooms, and breakout areas designed to encourage collaboration and communication among team members.
- Technical Infrastructure: The workspace is equipped with high-performance workstations, multiple monitors, and testing devices to support the development and maintenance of HPC platforms.
- Cross-functional Collaboration: The team works closely with other technology groups and business stakeholders, fostering a culture of knowledge sharing and continuous learning.
Work Schedule: Full-time position with standard office hours (Monday-Friday, 9:00 AM - 5:30 PM), with flexibility for occasional overtime or remote work as needed.
📝 Enhancement Note: QRT's work environment emphasizes collaboration, innovation, and continuous learning, providing an ideal setting for candidates seeking to grow both technically and professionally.
📄 Application & Technical Interview Process
Interview Process:
- Phone/Video Screen: A brief conversation to discuss your background, experience, and fit for the role (30-45 minutes).
- Technical Deep Dive: A detailed discussion of your technical skills, experience, and approach to HPC environments and cloud services (60-90 minutes).
- Behavioral & Cultural Fit: An assessment of your communication skills, problem-solving abilities, and cultural fit within the team and organization (30-45 minutes).
- Final Interview: A meeting with the hiring manager or other senior stakeholders to discuss your qualifications, answer any remaining questions, and make a final decision (30-45 minutes).
Portfolio Review Tips:
- HPC Projects: Highlight your experience with HPC schedulers by showcasing projects where you've developed, optimized, or maintained large-scale HPC platforms.
- Cloud Services: Demonstrate your expertise in AWS services by presenting projects that showcase your ability to manage and optimize resources.
- Python Development: Include examples of your Python development work, emphasizing projects that involve infrastructure automation or data processing.
- Linux Administration: Showcase your Linux administration skills by demonstrating your ability to manage and maintain large-scale systems.
Technical Challenge Preparation:
- HPC & Cloud Services: Brush up on your knowledge of HPC schedulers, AWS services, and cloud architecture best practices.
- Python & Linux: Review your Python development skills and Linux administration fundamentals to ensure you can effectively demonstrate your expertise.
- Problem-solving & Communication: Prepare for behavioral questions that assess your problem-solving abilities, communication skills, and cultural fit within the team and organization.
📝 Enhancement Note: The interview process for this role is designed to assess both your technical expertise and cultural fit within the organization, ensuring that you are well-prepared to succeed in the role and contribute to QRT's ongoing success.
🛠 Technology Stack & Web Infrastructure
HPC Schedulers:
- YellowDog
- Ray
Cloud Services:
- AWS (Amazon Web Services)
- VPC (Virtual Private Cloud)
- EC2 (Elastic Compute Cloud)
- S3 (Simple Storage Service)
- EFS (Elastic File System)
- EBS (Elastic Block Store)
- FSx (Amazon FSx)
- CloudWatch (Amazon CloudWatch)
- CloudTrail (AWS CloudTrail)
- OpenSearch (formerly Amazon Elasticsearch Service)
- Athena (Amazon Athena)
Infrastructure-as-Code Tools:
- Terraform
- Ansible
Operating Systems:
- Linux (Ubuntu, CentOS, etc.)
- macOS
- Windows
Programming Languages:
- Python
- Bash
- Shell scripting
Databases:
- Relational databases (PostgreSQL, MySQL, etc.)
- NoSQL databases (MongoDB, Cassandra, etc.)
Monitoring & Visualization Tools:
- Prometheus
- Grafana
- ELK Stack (Elasticsearch, Logstash, Kibana)
- Zabbix
- Nagios
📝 Enhancement Note: Familiarity with the technology stack listed above is essential for success in this role, as it forms the foundation of QRT's HPC platforms and infrastructure.
👥 Team Culture & Values
Web Development Values:
- Innovation: Embrace a culture of continuous learning and improvement, driving technological advancements in HPC environments and cloud services.
- Collaboration: Foster a collaborative work environment that encourages knowledge sharing, teamwork, and mutual support.
- Quality: Maintain a strong focus on quality and excellence in all aspects of your work, from code development to system design and optimization.
- Performance: Prioritize performance and efficiency in your work, continually seeking opportunities to improve the scalability, reliability, and speed of QRT's HPC platforms.
Collaboration Style:
- Cross-functional Integration: Work closely with other technology groups and business stakeholders to understand their needs and provide tailored HPC solutions.
- Code Review Culture: Participate in code reviews and pair programming practices to ensure high-quality work and knowledge sharing among team members.
- Knowledge Sharing: Contribute to the team's development and success by sharing your expertise and supporting the growth of your colleagues.
📝 Enhancement Note: QRT's team culture emphasizes collaboration, innovation, and continuous learning, providing an ideal environment for candidates seeking to grow both technically and professionally.
⚡ Challenges & Growth Opportunities
Technical Challenges:
- HPC Scheduler Optimization: Develop and implement strategies to optimize the performance and efficiency of YellowDog and Ray schedulers in large-scale HPC environments.
- Cloud Services Management: Manage and optimize AWS resources to ensure cost-effectiveness, scalability, and reliability in QRT's HPC platforms.
- Performance Tuning: Identify and address performance bottlenecks in HPC workloads, cloud services, and infrastructure components to maximize efficiency and throughput.
- Emerging Technologies: Stay up-to-date with the latest developments in HPC environments, cloud services, and related technologies, and evaluate their applicability to QRT's infrastructure.
Learning & Development Opportunities:
- Technical Skill Development: Develop expertise in emerging technologies, tools, and best practices related to HPC environments and cloud services.
- Conference Attendance: Attend industry conferences, workshops, and training sessions to expand your knowledge and network with other professionals in the field.
- Certification & Community Involvement: Pursue relevant certifications and engage with online communities to enhance your skills and stay informed about the latest trends and best practices in HPC environments and cloud services.
- Mentorship & Leadership Development: Seek mentorship opportunities from senior team members and contribute to the development of your colleagues through coaching and knowledge sharing.
📝 Enhancement Note: This role offers numerous opportunities for technical growth and development, both through hands-on experience and structured learning initiatives.
💡 Interview Preparation
Technical Questions:
- HPC Schedulers: Describe your experience with YellowDog, Ray, or other HPC schedulers, and explain how you've optimized their performance in large-scale environments.
- Cloud Services: Explain your approach to managing and optimizing AWS resources, and provide examples of your experience with VPC, EC2, S3, and other cloud services.
- Performance Tuning: Discuss your experience with performance tuning in HPC environments, cloud services, and infrastructure components, and describe your approach to identifying and addressing bottlenecks.
- Problem-solving: Present a challenging problem you've faced in your previous roles and explain your approach to solving it, emphasizing your technical skills and problem-solving abilities.
Company & Culture Questions:
- Team Dynamics: Describe your experience working in a collaborative, cross-functional team environment, and explain how you've contributed to the success of your colleagues and the broader organization.
- Adaptability: Discuss a situation where you had to adapt to significant changes in your role or the broader organization, and explain how you approached the challenge and achieved success.
- Innovation: Describe a time when you drove technological advancements in your previous roles, and explain how you identified opportunities for improvement and implemented innovative solutions.
Portfolio Presentation Strategy:
- HPC Projects: Highlight your experience with HPC schedulers by showcasing projects where you've developed, optimized, or maintained large-scale HPC platforms.
- Cloud Services: Demonstrate your expertise in AWS services by presenting projects that showcase your ability to manage and optimize resources.
- Python Development: Include examples of your Python development work, emphasizing projects that involve infrastructure automation or data processing.
- Linux Administration: Showcase your Linux administration skills by demonstrating your ability to manage and maintain large-scale systems.
- Code Explanation: Clearly explain your approach to system design, architecture, and performance optimization in your projects, emphasizing your technical expertise and problem-solving abilities.
📝 Enhancement Note: The interview process for this role is designed to assess both your technical expertise and cultural fit within the organization, ensuring that you are well-prepared to succeed in the role and contribute to QRT's ongoing success.
📌 Application Steps
To apply for this Workload Scheduling HPC Platform Engineer position at Qube Research & Technologies:
- Submit Your Application: Click the application link and provide the required information, including your resume, cover letter, and portfolio.
- Prepare Your Portfolio: Tailor your portfolio to showcase your experience with HPC schedulers, cloud services, and related technologies, emphasizing your technical expertise and problem-solving abilities.
- Optimize Your Resume: Highlight your relevant skills, experience, and achievements, using web development and server administration-relevant keywords to improve your visibility to Applicant Tracking Systems (ATS).
- Research the Company: Familiarize yourself with QRT's company culture, technology stack, and industry focus, ensuring that your application and interview preparation are tailored to the organization's unique needs and values.
⚠️ Important Notice: This enhanced job description includes AI-generated insights and web technology industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
Content Guidelines (IMPORTANT: Do not include this in the output)
Web Technology-Specific Focus:
- Tailor every section specifically to DevOps Engineer roles, emphasizing HPC environments, cloud services, and infrastructure management.
- Include web technology methodologies, cloud architecture principles, and server management practices.
- Emphasize portfolio requirements, live project demonstrations, and user experience considerations for HPC environments and cloud services.
- Address DevOps team dynamics, cross-functional collaboration with technology and business teams, and the unique demands of HPC environments and cloud services.
- Focus on DevOps career progression, emerging technology adoption, and technical specialization in HPC environments and cloud services.
Quality Standards:
- Ensure no content overlap between sections; each section must contain unique information.
- Only include Enhancement Notes when making significant inferences about technical responsibilities, team structure, or company information.
- Be comprehensive but concise, prioritizing actionable information over descriptive text.
- Strategically distribute web technology and server administration-related keywords throughout all sections naturally.
- Provide realistic salary ranges based on location, experience level, and the unique demands of the role.
Industry Expertise:
- Include specific HPC schedulers, cloud services, and infrastructure tools relevant to the role.
- Address DevOps career progression paths and technical leadership opportunities in HPC environments and cloud services.
- Provide tactical advice for portfolio development, live demonstrations, and project case studies tailored to HPC environments and cloud services.
- Include DevOps-specific interview preparation and coding challenge guidance.
- Emphasize performance optimization, scalability, and user experience principles in HPC environments and cloud services.
Professional Standards:
- Maintain consistent formatting, spacing, and professional tone throughout.
- Use web technology and server administration industry terminology appropriately and accurately.
- Include comprehensive benefits and growth opportunities relevant to DevOps professionals in HPC environments and cloud services.
- Provide actionable insights that give DevOps candidates a competitive advantage in the job application process.
- Focus on DevOps team culture, cross-functional collaboration, and user impact measurement in HPC environments and cloud services.
Technical Focus & Portfolio Emphasis:
- Emphasize HPC environments, cloud services, and infrastructure management best practices.
- Include specific portfolio requirements tailored to the DevOps discipline and role level, with a strong focus on HPC schedulers, cloud services, and related technologies.
- Address browser compatibility, accessibility standards, and user experience design principles in the context of HPC environments and cloud services.
- Focus on problem-solving methods, performance optimization, and scalable architecture in HPC environments and cloud services.
- Include technical presentation skills and stakeholder communication for HPC projects and cloud services.
Avoid:
- Generic business jargon not relevant to DevOps roles in HPC environments and cloud services.
- Placeholder text or incomplete sections.
- Repetitive content across different sections.
- Non-technical terminology unless relevant to the specific DevOps role in HPC environments and cloud services.
- Marketing language unrelated to DevOps, HPC environments, or cloud services.
Generate comprehensive, web technology-focused content that serves as a valuable resource for DevOps professionals evaluating career opportunities and preparing for technical interviews in the web technology industry, with a strong emphasis on HPC environments and cloud services.
Application Requirements
Candidates should have experience with HPC schedulers and large-scale systems, along with a solid understanding of AWS services and infrastructure-as-code tools. Strong Linux administration skills and the ability to work in a fast-paced environment are also essential.