Senior Platform Engineer
📍 Job Overview
- Job Title: Senior Platform Engineer with Specialization in HPC, Kubernetes, or Cloud
- Company: Millennium
- Location: Dublin, Leinster, Ireland
- Job Type: On-site, Full-time
- Category: DevOps Engineer, System Administrator, Web Infrastructure
- Date Posted: July 31, 2025
- Experience Level: 5-10 years
- Remote Status: On-site
🚀 Role Summary
- Key Responsibilities: Design, implement, and maintain scalable and reliable infrastructure solutions. Solve complex engineering problems and optimize system performance.
- Key Technologies: High-Performance Computing (HPC), Kubernetes, cloud platforms (Google Cloud Platform - GCP), Linux, software engineering, and container orchestration.
📝 Enhancement Note: This role requires a strong background in one or more of the specified technologies and a deep understanding of infrastructure management, system optimization, and problem-solving.
💻 Primary Responsibilities
- Infrastructure Management: Optimize Linux-based systems for performance and reliability. Design and manage HPC job scheduling systems. Develop and maintain workflow management and batch processing solutions.
- Container Orchestration: Implement and manage container orchestration using Kubernetes. Deploy and manage large-scale storage systems. Ensure security and compliance in cloud environments.
- Cloud Management: Provision and manage cloud environments, with a focus on GCP. Implement and manage cloud-native services and infrastructure on GCP. Ensure security and compliance in cloud environments.
- Collaboration: Contribute to software engineering efforts, CI/CD, and codebase management. Collaborate with development teams to optimize cloud application performance and define cloud strategies and roadmaps.
📝 Enhancement Note: This role involves a high degree of collaboration with development teams and stakeholders, requiring strong communication skills and a customer-focused mindset.
🎓 Skills & Qualifications
Education: Bachelor's degree in Computer Science, Engineering, or a related field. Relevant certifications are a plus.
Experience: 5-10 years of experience in infrastructure management, system administration, or a related role. Proven track record of designing, implementing, and maintaining scalable and reliable infrastructure solutions.
Required Skills:
- General: Ability to review and/or extend open-source platforms to satisfy business requirements. A passion for technology and automation, a deep sense of curiosity, and a willingness to always question. A passion for in-depth understanding of technology and building large-scale systems. Excellent verbal and written communication skills.
- HPC Job Scheduling: Experience in environments at scale (e.g., billions of jobs per week/month). Understanding of cost metrics, preemption, job types, queuing, scheduler, and optimizations. Experience with products like HTCondor, Slurm, Spectrum LSF, Nomad, AWS Batch.
- Container Orchestration (Kubernetes): Experience with Helm, validating/mutating admission controllers, PVs/PVCs, kube-router, BGP. Experience with Docker & registries (e.g., Harbor, Artifactory, GCP Container Registry, AWS Container Registry). Mature approach to dealing with operational complexities of the Kubernetes platform. Experience with Kubernetes security best practices, including RBAC, network policies, and secrets management. Experience with Kubernetes monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack). Experience with Kubernetes cluster management and scaling.
- Storage Systems: Experience deploying and managing petabyte-scale systems supporting varied workloads. Mature approach to assessing price/performance, tiering, and backup requirements. Experience with products like GPFS, NetApp, Pure, Vast, GCP PDs, or other NVMe-specific products. Familiarity with NVMeOF, POSIX, object storage, and various modes of permissioning data.
- Linux: Extensive experience using configuration management systems (e.g., SaltStack, Ansible, Chef) to automate and manage large-scale Linux environments. Deep understanding of Linux kernel components (e.g., VFS, scheduler, memory management, networking stack) and their impact on system performance and stability. Solid troubleshooting experience using advanced tools such as gdb, strace, perf, ltrace, and other OS/application tracing and profiling mechanisms. Experience with containerization and virtualization technologies, including Docker, LXD/LXC, Kubernetes, KVM, QEMU, and VMware. Familiarity with modern Linux security mechanisms, including Kerberos, SELinux, AppArmor, and system hardening techniques. Proficiency in leveraging eBPF for performance monitoring, debugging, and security enforcement in Linux systems.
- Software Engineering: Proficient in one or more programming languages (Python and Go are used). Familiarity with git and CI/CD concepts. Comfortable contributing to a large codebase with varied technologies.
- Cloud (GCP): The chosen candidate will own the infrastructure that makes up the GCP offering and will also own the relationship between the firm and Google. Knowledge of core GCP technologies (Compute, GCS, BigQuery) and an understanding of GCP's underlying architecture. Experience with infrastructure-level GCP technologies (VPC Networking, Interconnect, Billing functionality, Image management) and the ability to optimize for performance and cost-efficiency. Strong understanding of GCP security technologies (Org Policy, VPC Service Controls, Resource Manager Tags, Firewalls, IAM) and the ability to implement security best practices. Knowledge of Terraform and experience managing a CI/CD desired state configuration pipeline for GCP. Familiarity with the gcloud toolset and the ability to write Python code against the Google API for automation and integration. Familiarity with other cloud platforms (AWS, Azure) and their interoperability with GCP. Experience designing and implementing scalable, highly available, and fault-tolerant architectures using GCP services, always following best practices. Proficiency in GCP monitoring and logging tools (e.g., Cloud Monitoring, Cloud Logging, Error Reporting, Trace) to ensure system health and performance, with the ability to set up alerts and dashboards. Experience of cost management, including monitoring and optimizing cloud spend using GCP's cost management tools (e.g., Billing Reports, Budgets, Recommendations). Ability to troubleshoot and resolve complex issues across the GCP stack, leveraging advanced tools and a systematic approach to problem-solving. Strong communication and collaboration skills, with the ability to act as a trusted advisor to colleagues, mentor team members, and share knowledge through documentation and training.
Preferred Skills:
- Experience with infrastructure as code (IaC) tools like Terraform or CloudFormation.
- Familiarity with infrastructure automation tools like Ansible or Puppet.
- Knowledge of IT service management (ITSM) frameworks and processes.
- Experience with Agile methodologies and DevOps practices.
📝 Enhancement Note: This role requires a broad set of skills and experience, with a strong focus on infrastructure management, system optimization, and problem-solving. Candidates should have a proven track record in at least one of the specified technology areas and be eager to learn and adapt in a dynamic environment.
📊 Web Portfolio & Project Requirements
Portfolio Essentials:
- HPC Job Scheduling: Include projects demonstrating your experience with HPC job scheduling systems, such as HTCondor, Slurm, or Spectrum LSF. Highlight your understanding of cost metrics, preemption, job types, queuing, and optimizations.
- Container Orchestration: Showcase your Kubernetes experience by including projects that demonstrate your ability to manage containerized applications, implement Helm charts, and manage Kubernetes clusters. Highlight your understanding of Kubernetes security best practices and monitoring tools.
- Storage Systems: Present projects that showcase your experience deploying and managing large-scale storage systems. Highlight your understanding of storage technologies, such as GPFS, NetApp, or GCP PDs, and your ability to optimize price/performance and backup requirements.
- Linux: Demonstrate your Linux expertise by including projects that showcase your ability to automate and manage large-scale Linux environments using configuration management systems. Highlight your deep understanding of Linux kernel components and your proficiency in leveraging eBPF for performance monitoring and debugging.
- Software Engineering: Display your software engineering skills by including projects that demonstrate your ability to contribute to a large codebase, work with CI/CD pipelines, and write clean, efficient code. Highlight your proficiency in Python and/or Go.
- Cloud (GCP): Showcase your GCP experience by including projects that demonstrate your ability to provision and manage cloud environments, implement and manage cloud-native services, and optimize for performance and cost-efficiency. Highlight your understanding of GCP security technologies and your ability to implement security best practices.
Technical Documentation:
- Code Quality: Document your code quality standards, including coding conventions, commenting, and documentation practices. Explain your approach to version control, code reviews, and testing methodologies.
- Infrastructure as Code (IaC): Document your IaC approach, including the tools you use (e.g., Terraform, CloudFormation) and your strategies for managing infrastructure as code. Explain your approach to version control, code reviews, and testing IaC configurations.
- Monitoring and Logging: Document your monitoring and logging strategies, including the tools you use (e.g., Prometheus, Grafana, ELK stack) and your approach to setting up alerts and dashboards. Explain your approach to performance optimization and system health management.
📝 Enhancement Note: This role requires a strong portfolio that demonstrates your technical expertise and problem-solving skills. Be prepared to showcase your projects, explain your approach to technical challenges, and articulate your understanding of the underlying technologies.
💵 Compensation & Benefits
Salary Range: €80,000 - €120,000 per year (based on experience and market research)
Benefits:
- Competitive salary and bonus structure
- Comprehensive health, dental, and vision insurance
- Retirement savings plan with company match
- Generous vacation and time-off policies
- Professional development opportunities, including training, conferences, and certifications
- Employee assistance program (EAP)
- On-site gym and wellness facilities
- Subsidized lunches and snacks
- Company-sponsored social events and team-building activities
Working Hours: Full-time, Monday-Friday, 9:00 AM - 5:30 PM. Flexible working hours and remote work options may be available for certain roles and teams.
📝 Enhancement Note: This role offers a competitive salary and comprehensive benefits package. The salary range is based on market research and considers the candidate's experience level and the role's complexity.
🎯 Team & Company Context
Company Culture:
- Industry: Financial services and technology
- Company Size: Large (15,000+ employees)
- Founded: 1998
- Team Structure: The platform engineering team is part of the global technology organization, working closely with development teams, product managers, and stakeholders to deliver scalable and reliable infrastructure solutions.
- Development Methodology: Agile/Scrum methodologies are used for software development, with sprint planning, daily stand-ups, and regular retrospectives. Infrastructure as code (IaC) practices are employed to manage infrastructure in a version-controlled and automated manner.
Company Website: Millennium
📝 Enhancement Note: Millennium is a large, global financial services and technology company with a strong focus on innovation and digital transformation. The platform engineering team plays a critical role in delivering scalable and reliable infrastructure solutions that support the company's technology stack and business objectives.
Career & Growth Analysis:
- Career Level: Senior Platform Engineer (Specialization in HPC, Kubernetes, or Cloud)
- Reporting Structure: The Senior Platform Engineer reports directly to the Head of Platform Engineering or a similar role within the global technology organization. The role may have direct reports, depending on the team structure and organizational needs.
- Technical Impact: The Senior Platform Engineer has a significant impact on the company's technology stack, user experience, and infrastructure decisions. Their work directly influences the performance, scalability, and reliability of the company's infrastructure and applications.
- Growth Opportunities:
- Technical Growth: The Senior Platform Engineer can grow their technical expertise by working on cutting-edge projects, staying up-to-date with industry trends, and contributing to open-source communities. They can also advance their career by taking on more complex projects, leading teams, or moving into a technical leadership role.
- Managerial Growth: The Senior Platform Engineer can transition into a management role by demonstrating strong leadership skills, mentoring team members, and driving team success. They can also grow their career by taking on more strategic responsibilities, such as defining infrastructure roadmaps, driving technology adoption, or leading cross-functional initiatives.
📝 Enhancement Note: The Senior Platform Engineer role offers significant growth opportunities, both technically and managerially. Candidates should be eager to take on new challenges, learn continuously, and contribute to the team's success.
Work Environment:
- Office Type: Modern, collaborative office spaces with open-plan work areas, meeting rooms, and breakout spaces.
- Office Location(s): Dublin, Ireland (Headquarters). Millennium has offices in over 100 countries, with remote work options available for certain roles and teams.
- Workspace Context:
- Collaboration: The workspace is designed to foster collaboration, with open-plan work areas, meeting rooms, and breakout spaces. The team works closely together, sharing knowledge and supporting each other's growth.
- Development Tools: The workspace is equipped with modern development tools, including high-performance workstations, multiple monitors, and testing devices. The team uses version control systems, CI/CD pipelines, and infrastructure as code (IaC) tools to manage their work efficiently.
- Work Arrangement: The workspace offers flexible work arrangements, including on-site, hybrid, and remote work options, depending on the role and team preferences.
Work Schedule: The work schedule is typically Monday-Friday, 9:00 AM - 5:30 PM, with flexible working hours and remote work options available for certain roles and teams. The work schedule may vary depending on the project's needs and deadlines.
📝 Enhancement Note: Millennium offers a modern, collaborative work environment that supports the team's success and fosters growth. The workspace is designed to encourage collaboration, knowledge-sharing, and continuous learning.
📄 Application & Technical Interview Process
Interview Process:
- Phone Screen: A brief phone or video call to assess communication skills, cultural fit, and basic technical knowledge.
- Technical Assessment: A hands-on technical assessment, focusing on the candidate's expertise in one or more of the specified technology areas (HPC, Kubernetes, cloud platforms, Linux, software engineering, or GCP). The assessment may include coding challenges, system design exercises, or infrastructure management tasks.
- On-site Interview: A full-day on-site interview, including technical deep dives, system design discussions, and behavioral interviews. The on-site interview may include meetings with team members, stakeholders, and senior leadership.
- Final Evaluation: A final evaluation based on the candidate's performance throughout the interview process, their technical expertise, and their cultural fit with the team.
Portfolio Review Tips:
- HPC Job Scheduling: Highlight your experience with HPC job scheduling systems, such as HTCondor, Slurm, or Spectrum LSF. Explain your understanding of cost metrics, preemption, job types, queuing, and optimizations. Demonstrate your ability to design, implement, and manage HPC job scheduling systems in large-scale environments.
- Container Orchestration: Showcase your Kubernetes experience by including projects that demonstrate your ability to manage containerized applications, implement Helm charts, and manage Kubernetes clusters. Explain your understanding of Kubernetes security best practices and monitoring tools. Demonstrate your ability to design, implement, and manage container orchestration systems in large-scale environments.
- Storage Systems: Present projects that showcase your experience deploying and managing large-scale storage systems. Explain your understanding of storage technologies, such as GPFS, NetApp, or GCP PDs, and your ability to optimize price/performance and backup requirements. Demonstrate your ability to design, implement, and manage storage systems in large-scale environments.
- Linux: Demonstrate your Linux expertise by including projects that showcase your ability to automate and manage large-scale Linux environments using configuration management systems. Explain your deep understanding of Linux kernel components and your proficiency in leveraging eBPF for performance monitoring and debugging. Demonstrate your ability to design, implement, and manage Linux-based systems in large-scale environments.
- Software Engineering: Display your software engineering skills by including projects that demonstrate your ability to contribute to a large codebase, work with CI/CD pipelines, and write clean, efficient code. Explain your understanding of software engineering best practices, version control, and testing methodologies. Demonstrate your ability to design, implement, and maintain software engineering projects in large-scale environments.
- Cloud (GCP): Showcase your GCP experience by including projects that demonstrate your ability to provision and manage cloud environments, implement and manage cloud-native services, and optimize for performance and cost-efficiency. Explain your understanding of GCP security technologies and your ability to implement security best practices. Demonstrate your ability to design, implement, and manage cloud environments using GCP services in large-scale environments.
Technical Challenge Preparation:
- HPC Job Scheduling: Brush up on your knowledge of HPC job scheduling systems, cost metrics, preemption, job types, queuing, and optimizations. Practice designing and implementing HPC job scheduling systems using tools like HTCondor, Slurm, or Spectrum LSF.
- Container Orchestration: Review your Kubernetes knowledge, focusing on container orchestration, Helm charts, and Kubernetes cluster management. Practice designing and implementing container orchestration systems using Kubernetes. Familiarize yourself with Kubernetes security best practices and monitoring tools.
- Storage Systems: Refresh your knowledge of storage technologies, such as GPFS, NetApp, or GCP PDs, and their price/performance and backup requirements. Practice designing and implementing storage systems using tools like NetApp, Pure, or GCP PDs.
- Linux: Review your Linux expertise, focusing on configuration management systems, Linux kernel components, and eBPF for performance monitoring and debugging. Practice designing and implementing Linux-based systems using tools like SaltStack, Ansible, or Chef.
- Software Engineering: Brush up on your software engineering skills, focusing on contributing to large codebases, working with CI/CD pipelines, and writing clean, efficient code. Practice designing and implementing software engineering projects using tools like Git, Jenkins, or CircleCI.
- Cloud (GCP): Review your GCP knowledge, focusing on provisioning and managing cloud environments, implementing and managing cloud-native services, and optimizing for performance and cost-efficiency. Practice designing and implementing cloud environments using GCP services and tools like Terraform.
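As one way to practice the HPC job scheduling item above, the following is a minimal Python sketch of a single-slot priority scheduler that can run with or without preemption. It is a toy model for building intuition, not how HTCondor, Slurm, or Spectrum LSF implement scheduling. The `Job` fields, the `simulate` function, and the rule that a lower priority number means a more urgent job are all assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    arrival: int    # time step at which the job is submitted
    duration: int   # time steps of work needed (assumed >= 1)
    priority: int   # assumption: lower number = more urgent


def simulate(jobs, preemptive):
    """Run jobs on one execution slot; return {job name: completion time}."""
    by_name = {j.name: j for j in jobs}
    pending = sorted(jobs, key=lambda j: j.arrival)
    remaining = {j.name: j.duration for j in jobs}
    queue, finished = [], {}   # queue is a min-heap of (priority, arrival, name)
    running, clock, nxt = None, 0, 0

    while len(finished) < len(jobs):
        # Admit every job that has been submitted by the current time step.
        while nxt < len(pending) and pending[nxt].arrival <= clock:
            j = pending[nxt]
            heapq.heappush(queue, (j.priority, j.arrival, j.name))
            nxt += 1

        # Preemptive mode: the running job re-enters the queue every step,
        # so a more urgent arrival can take the slot away from it.
        if preemptive and running is not None:
            j = by_name[running]
            heapq.heappush(queue, (j.priority, j.arrival, j.name))
            running = None

        # Non-preemptive mode only reaches this when the slot is free.
        if running is None and queue:
            running = heapq.heappop(queue)[2]

        # Do one time step of work on whichever job holds the slot.
        if running is not None:
            remaining[running] -= 1
            if remaining[running] <= 0:
                finished[running] = clock + 1
                running = None
        clock += 1

    return finished


jobs = [Job("batch", arrival=0, duration=5, priority=2),
        Job("urgent", arrival=1, duration=2, priority=1)]
print(simulate(jobs, preemptive=False))  # {'batch': 5, 'urgent': 7}
print(simulate(jobs, preemptive=True))   # {'urgent': 3, 'batch': 7}
```

In this toy run, preemption lets the urgent job finish at time 3 instead of 7, while the batch job finishes two steps later. That is the kind of trade-off a discussion of queuing and cost metrics would need to weigh.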
📝 Enhancement Note: The technical interview process for this role is comprehensive and challenging, focusing on the candidate's expertise in one or more of the specified technology areas. Candidates should be prepared to demonstrate their technical skills, problem-solving abilities, and cultural fit with the team.
💡 Interview Preparation
Technical Questions:
- HPC Job Scheduling:
- Can you explain the difference between preemptive and non-preemptive job scheduling?
- How do you optimize job scheduling in a large-scale environment with billions of jobs per week/month?
- Can you describe your experience with cost metrics and how you've optimized job scheduling to reduce costs?
- Container Orchestration:
- How do you manage containerized applications at scale using Kubernetes?
- Can you explain the difference between validating and mutating admission controllers in Kubernetes?
- How do you ensure the security of your Kubernetes clusters, and what are some best practices you follow?
- Storage Systems:
- How do you manage large-scale storage systems, and what are some best practices you follow?
- Can you describe your experience with price/performance optimization and backup requirements in storage systems?
- How do you ensure the security and compliance of your storage systems?
- Linux:
- Can you explain the roles of the VFS, the scheduler, and memory management in the Linux kernel, and how they interact?
- How do you optimize Linux-based systems for performance and reliability?
- Can you describe your experience with configuration management systems like SaltStack, Ansible, or Chef?
- Software Engineering:
- How do you approach contributing to a large codebase, and what are some best practices you follow?
- Can you describe your experience with version control, CI/CD pipelines, and testing methodologies?
- How do you ensure the quality and maintainability of your code?
- Cloud (GCP):
- How do you manage cloud environments, and what are some best practices you follow?
- Can you describe your experience with GCP security technologies and how you've implemented security best practices?
- How do you optimize cloud environments for performance and cost-efficiency?
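For the Kubernetes admission controller question above, a small sketch can help when preparing an answer. The Python below builds the response body an admission webhook would return in `admission.k8s.io/v1` `AdmissionReview` format. The validating handler only allows or denies a request. The mutating handler allows it and attaches a base64-encoded JSON Patch. The policy itself (requiring or defaulting a `team` label) and the function names are hypothetical, and a real webhook would also need an HTTPS server and a webhook configuration, which are omitted here.

```python
import base64
import json


def _review(uid, response):
    """Wrap a response in the AdmissionReview envelope, echoing the request uid."""
    response["uid"] = uid
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


def validate_pod(review):
    """Validating step: accept or reject the object, never modify it."""
    req = review["request"]
    labels = req["object"]["metadata"].get("labels") or {}
    if "team" not in labels:  # hypothetical policy for illustration
        return _review(req["uid"], {
            "allowed": False,
            "status": {"code": 403, "message": "pods must carry a 'team' label"},
        })
    return _review(req["uid"], {"allowed": True})


def mutate_pod(review):
    """Mutating step: allow the request and attach a JSON Patch that changes it."""
    req = review["request"]
    labels = req["object"]["metadata"].get("labels")
    if labels and "team" in labels:
        return _review(req["uid"], {"allowed": True})  # nothing to change
    if labels:
        ops = [{"op": "add", "path": "/metadata/labels/team", "value": "unassigned"}]
    else:
        # The labels map is missing or empty, so set it as a whole.
        ops = [{"op": "add", "path": "/metadata/labels", "value": {"team": "unassigned"}}]
    patch = base64.b64encode(json.dumps(ops).encode()).decode()
    return _review(req["uid"], {"allowed": True, "patchType": "JSONPatch", "patch": patch})


request = {"request": {"uid": "abc-123", "object": {"metadata": {"name": "demo"}}}}
print(validate_pod(request)["response"]["allowed"])  # False
print(mutate_pod(request)["response"]["patchType"])  # JSONPatch
```

Mutating webhooks run before validating ones in the API server's admission chain. In this sketch, a pod that the mutating step has given a default label would therefore pass the validating step.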
Company & Culture Questions:
- Company Culture:
- How would you describe the company culture at Millennium, and how does it align with your personal values?
- Can you give an example of a time when you had to adapt to a new work environment or company culture, and how did you approach it?
- Team Dynamics:
- How do you collaborate with development teams, and what are some best practices you follow?
- Can you describe a time when you had to resolve a conflict or disagreement within a team, and how did you approach it?
- Problem-Solving:
- Can you describe a complex technical challenge you've faced in the past, and how did you approach solving it?
- How do you ensure the scalability and reliability of your infrastructure solutions, and what are some best practices you follow?
Portfolio Presentation Strategy:
- HPC Job Scheduling:
- Highlight your experience with HPC job scheduling systems, such as HTCondor, Slurm, or Spectrum LSF.
- Explain your understanding of cost metrics, preemption, job types, queuing, and optimizations.
- Demonstrate your ability to design, implement, and manage HPC job scheduling systems in large-scale environments.
- Container Orchestration:
- Showcase your Kubernetes experience by including projects that demonstrate your ability to manage containerized applications, implement Helm charts, and manage Kubernetes clusters.
- Explain your understanding of Kubernetes security best practices and monitoring tools.
- Demonstrate your ability to design, implement, and manage container orchestration systems in large-scale environments.
- Storage Systems:
- Present projects that showcase your experience deploying and managing large-scale storage systems.
- Explain your understanding of storage technologies, such as GPFS, NetApp, or GCP PDs, and your ability to optimize price/performance and backup requirements.
- Demonstrate your ability to design, implement, and manage storage systems in large-scale environments.
- Linux:
- Demonstrate your Linux expertise by including projects that showcase your ability to automate and manage large-scale Linux environments using configuration management systems.
- Explain your deep understanding of Linux kernel components and your proficiency in leveraging eBPF for performance monitoring and debugging.
- Demonstrate your ability to design, implement, and manage Linux-based systems in large-scale environments.
- Software Engineering:
- Display your software engineering skills by including projects that demonstrate your ability to contribute to a large codebase, work with CI/CD pipelines, and write clean, efficient code.
- Explain your understanding of software engineering best practices, version control, and testing methodologies.
- Demonstrate your ability to design, implement, and maintain software engineering projects in large-scale environments.
- Cloud (GCP):
- Showcase your GCP experience by including projects that demonstrate your ability to provision and manage cloud environments, implement and manage cloud-native services, and optimize for performance and cost-efficiency.
- Explain your understanding of GCP security technologies and your ability to implement security best practices.
- Demonstrate your ability to design, implement, and manage cloud environments using GCP services in large-scale environments.
📝 Enhancement Note: The portfolio presentation strategy for this role is critical, as it allows candidates to showcase their technical expertise, problem-solving abilities, and cultural fit with the team. Candidates should be prepared to explain their approach to technical challenges, articulate their understanding of the underlying technologies, and demonstrate their ability to design, implement, and manage infrastructure solutions in large-scale environments.
📌 Application Steps
To apply for this Senior Platform Engineer role with a specialization in HPC, Kubernetes, or Cloud:
- Customize Your Portfolio: Tailor your portfolio to highlight your experience with HPC job scheduling systems, container orchestration, storage systems, Linux, software engineering, and cloud environments (GCP). Ensure your portfolio demonstrates your ability to design, implement, and manage infrastructure solutions in large-scale environments.
- Optimize Your Resume: Highlight your relevant experience, skills, and accomplishments in your resume. Focus on your technical expertise, problem-solving abilities, and cultural fit with the team. Include specific project examples that demonstrate your ability to design, implement, and manage infrastructure solutions in large-scale environments.
- Prepare for Technical Challenges: Brush up on your knowledge of HPC job scheduling systems, container orchestration, storage systems, Linux, software engineering, and cloud environments (GCP). Practice designing and implementing infrastructure solutions using the relevant technologies and tools. Familiarize yourself with the company's technology stack and infrastructure requirements.
- Research the Company: Learn about Millennium's company culture, technology stack, and infrastructure requirements. Understand the company's business objectives and how the Senior Platform Engineer role contributes to its success. Prepare thoughtful questions to ask during the interview process.
📝 Important Notice: This enhanced job description includes AI-generated insights and web development/server administration industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
Application Requirements
Candidates should have deep expertise in HPC, Kubernetes, or cloud platforms, particularly GCP. Strong communication skills and a passion for technology and automation are essential.