Senior Engineer - Cloud Development
📍 Job Overview
- Job Title: Senior Engineer - Cloud Development
- Company: Graphcore
- Location: Bristol, UK
- Job Type: On-site
- Category: Cloud Development
- Date Posted: August 1, 2025
🚀 Role Summary
- Develop and deploy services on Graphcore's fleet of cutting-edge AI systems, working closely with Platform Engineering, Datacentre Operations, and Product Development teams.
- Collaborate with external vendors to specify, benchmark, and integrate third-party products into Graphcore's Cloud Reference Design.
- Maintain and operate Graphcore's AI systems at peak performance in private clouds, working with Datacentre Operations Engineers.
- Operate and extend existing OpenStack-based cloud services, and contribute to the deployment and development of new ones.
- Configure and test new Graphcore AI hardware and systems using Continuous Deployment and Infrastructure-as-code in internal and external datacentres.
💻 Primary Responsibilities
- Service Development and Deployment: Develop and operate end-user services on Graphcore's private clouds, supporting internal users and turning end-user and product requirements into deployed services.
- Cloud Metrics and Analysis: Help build automation to collect and analyse metrics and other data from cloud services, so that issues can be clearly identified and reported. Work with users to pass information about product-related issues to the Engineering and QA departments.
- AI System Maintenance: Work with Datacentre Operations Engineers to maintain and operate the fleet of AI systems at peak performance in Graphcore's private clouds.
- Cloud Service Management: Operate and extend existing OpenStack-based cloud services, and contribute to the deployment and development of new ones.
- Hardware Configuration and Testing: Configure and test new Graphcore AI hardware and systems using Continuous Deployment and Infrastructure-as-code in internal and external datacentres.
- Vendor Collaboration: Work with external vendors of off-the-shelf switches, servers, and storage solutions to specify, benchmark, and integrate third-party products into Graphcore's Cloud Reference Design.
- System Troubleshooting: Drive corrective actions for systems that are not operating correctly, working with Datacentre Operations and Graphcore Engineering as required.
🎓 Skills & Qualifications
Education:
- Bachelor's degree or equivalent practical experience in a relevant subject.
Experience:
- Solid software engineering or IT experience with a proven track record of delivering technical output as an individual contributor.
- Experience specifying, scoping, estimating, and detailing work plans within Agile and Scrum frameworks, including priorities, risks, issues, impacts, and constraints.
- Strong, proven Linux scripting ability (Bash, Python, awk, sed).
- Strong, proven Linux system administration skills (Ubuntu, RHEL, and variants).
- Experience with a version control system (preferably Git) and using it to manage system configuration or automation.
- Experience with Continuous Integration or testing pipelines using GitLab, GitHub, or similar.
- A solid hands-on understanding of the technologies underpinning cloud services: APIs; virtualisation of CPUs, I/O, and systems; virtual networks; block storage; resource management; and monitoring.
- Experience with OpenStack deployments or the technologies they rely on (e.g., Ceph, Open vSwitch, KVM, QEMU).
- Experience with IaC automation tools (Terraform/OpenTofu, Ansible, Packer).
- Experience with container deployment and management tools (e.g., Docker).
- Experience with solutions for monitoring and observability (e.g., Grafana, Prometheus, OpenSearch/Elasticsearch, Loki).
- Good communication and presentation skills, and experience dealing with end-users of IT services.
- An ability to work independently on critical infrastructure without oversight, and with a focus on end-user availability.
Desirable but not required:
- Experience with OpenStack cloud platforms.
- Experience with managing production Kubernetes clusters and workloads.
- Experience with workload queue management systems (e.g., Slurm, LSF).
- Experience with managed switch configuration (e.g., EOS, SONiC, DNOS).
- Programming experience with Python 3, including the use of classes and inheritance.
- Programming experience with Go.
💵 Compensation & Benefits
- Competitive salary
- Flexible working hours
- Generous annual leave policy
- Private medical insurance and health cash plan
- Dental plan
- Pension (matched up to 5%)
- Life assurance and income protection
- Generous parental leave policy
- Employee assistance programme (health, mental wellbeing, and bereavement support)
- Healthy food and snacks at the central Bristol office
- Barista bar
📈 Career & Growth Analysis
- Career Level: Senior Engineer - Cloud Development, responsible for developing and deploying services on Graphcore's fleet of cutting-edge AI systems, collaborating with various teams, and keeping those systems at peak performance.
- Reporting Structure: Reports directly to the Platform Engineering team, working closely with Datacentre Operations, Product Development, and external vendors.
- Technical Impact: Plays a crucial role in Graphcore's cloud infrastructure, ensuring high availability, performance, and scalability of AI services for both internal users and customers.
🌐 Work Environment
- Office Type: On-site, central Bristol office with a collaborative workspace and cross-functional team interaction.
- Office Location(s): Bristol, UK
- Workspace Context: Collaborative workspace with multiple monitors, testing devices, and development tools available for cloud infrastructure management and service deployment.
- Work Schedule: Full-time, with flexible working hours and a focus on deployment windows, maintenance, and project deadlines.
📄 Application & Technical Interview Process
Interview Process:
- Technical preparation recommendations and coding/configuration assessment focus
- Cloud architecture expectations and system design discussion
- Cloud Development team interaction and cultural fit assessment
- Final evaluation criteria and technical impact discussion
Portfolio Review Tips:
- Specific tactical advice for cloud infrastructure portfolio curation and live demo presentation
- Project case study structure with user experience and technical implementation focus
- Code quality demonstration for cloud services
- Company-specific cloud technology considerations and performance optimisation examples
Technical Challenge Preparation:
- Typical cloud infrastructure exercise format and expectations
- Time management and solution architecture for cloud challenges
- Communication and technical explanation articulation for cloud concepts
🛠 Technology Stack & Infrastructure
- Cloud Infrastructure: OpenStack, IaC automation tools (Terraform/OpenTofu, Ansible, Packer), container management tools (e.g., Docker), monitoring solutions (Grafana, Prometheus, OpenSearch/Elasticsearch, Loki)
- AI Systems: Graphcore's fleet of cutting-edge AI systems, including in-house AI systems and off-the-shelf high-performance servers, switches, and storage solutions
- Version Control: Git
- Programming Languages: Bash, Python, Go
- Datacentre Operations: Datacentre Operations Engineers collaborate with the Cloud Development team to maintain and operate Graphcore's AI systems at peak performance in private clouds.
👥 Team Culture & Values
- Cloud Development Values: User-centric, performance-driven, and scalable cloud services, with a focus on end-user availability and peak performance of AI systems.
- Collaboration Style: Cross-functional integration between cloud development, datacentre operations, product development, and external vendors, with a focus on clear communication, knowledge sharing, and continuous learning.
📝 Enhancement Note:
The Senior Engineer - Cloud Development role at Graphcore requires a strong background in cloud infrastructure, deployment using Infrastructure-as-Code, high-performance networking, and storage systems. Candidates should have experience with cloud services, IaC automation tools, container management, and monitoring solutions. Familiarity with OpenStack deployments and the technologies they rely on is also beneficial. The role involves collaborating with various teams, including Datacentre Operations, Product Development, and external vendors, to maintain and operate Graphcore's AI systems at peak performance in private clouds.
Application Requirements
A Bachelor's degree or equivalent experience is required, along with solid software engineering or IT experience. Candidates should have strong Linux scripting and system administration skills, as well as experience with cloud services and automation tools.