Staff Engineer - Cloud Development
📍 Job Overview
- Job Title: Staff Engineer - Cloud Development
- Company: Graphcore
- Location: Bristol, England, United Kingdom
- Job Type: On-site
- Category: DevOps Engineer
- Date Posted: 2025-08-01
🚀 Role Summary
- Develop and deploy services on Graphcore's fleet of cutting-edge AI systems, working closely with Platform Engineering, Datacentre Operations, and Product Development teams.
- Integrate, validate, benchmark, and optimize cloud infrastructure, and develop high-performance AI solutions that combine in-house AI systems with off-the-shelf servers, switches, and storage.
- Collaborate with external vendors to integrate third-party products into the Cloud Reference Design and provide end-user support.
📝 Enhancement Note: This role requires a strong background in cloud infrastructure, deployment using Infrastructure-as-Code, networking, and storage systems, with a proven track record of delivering technical output as an individual contributor.
💻 Primary Responsibilities
- Service Development and Deployment: Develop and operate end-user services on private clouds, turning end-user and product requirements into deployed services. Collaborate with internal users to support their use of services and provide information on product-related issues to Engineering and QA departments.
- Cloud Infrastructure Management: Maintain and operate the fleet of AI systems at peak performance in private clouds, working closely with Datacentre Operations Engineers. Configure and test new Graphcore AI hardware and systems using Continuous Deployment and Infrastructure-as-Code in internal and external datacentres.
- Vendor Collaboration: Work with external vendors of off-the-shelf switches, servers, and storage solutions to integrate third-party products into the Cloud Reference Design.
- Performance Benchmarking and Optimization: Collect and analyze metrics and other data from cloud services to support clear identification and reporting of any issues. Work with users to provide information on product-related issues to Engineering and QA departments.
📝 Enhancement Note: This role requires strong problem-solving skills, attention to detail, and the ability to work independently on critical infrastructure with minimal oversight, focusing on end-user availability.
🎓 Skills & Qualifications
Education: A Bachelor's degree or equivalent practical experience in a relevant subject.
Experience: Solid software engineering or IT experience with a proven track record of delivering technical output as an individual contributor.
Required Skills:
- Strong Linux scripting ability (Bash, Python, awk, sed)
- Strong Linux system administration (Ubuntu, RHEL, and variants)
- Experience with a version control system (preferably Git) and using it to manage system configuration or automation
- Experience with Continuous Integration or testing pipelines using GitLab, GitHub, or similar
- A solid hands-on understanding of the technologies underpinning cloud services (APIs, virtualisation of CPUs, IO, systems), virtual networks, block storage, resource management, and monitoring
- Experience with IaC automation tools (Terraform/OpenTofu, Ansible, Packer)
- Experience with container deployment and management tools (e.g., Docker)
- Experience with solutions for monitoring and observability (e.g., Grafana, Prometheus, OpenSearch/Elasticsearch, Loki)
- Good communication and presentation skills, and experience dealing with end-users of IT services
- An ability to work independently on critical infrastructure with minimal oversight, and with a focus on end-user availability
Preferred Skills:
- Experience with OpenStack cloud platforms
- Experience with managing production Kubernetes clusters and workloads
- Programming experience with Python 3, including classes and inheritance
📝 Enhancement Note: A Bachelor's degree is not strictly required; equivalent practical experience in a relevant field is also accepted. A strong portfolio demonstrating relevant skills and projects is highly valued.
📊 Web Portfolio & Project Requirements
Portfolio Essentials:
- A well-structured portfolio showcasing cloud infrastructure, deployment using Infrastructure-as-Code, networking, and storage systems projects.
- Live demos of cloud services, highlighting performance, scalability, and user experience.
- Documentation of cloud infrastructure, including architecture decisions, performance benchmarks, and user guides.
Technical Documentation:
- Code quality, commenting, and documentation standards for cloud infrastructure projects.
- Version control, deployment processes, and server configuration best practices.
- Testing methodologies, performance metrics, and optimization techniques for cloud services.
📝 Enhancement Note: A strong portfolio should demonstrate the candidate's ability to design, deploy, and maintain cloud infrastructure, with a focus on performance, scalability, and user experience.
💵 Compensation & Benefits
Salary Range: £60,000 - £80,000 per annum (based on experience and qualifications)
Benefits:
- Flexible working
- Generous annual leave policy
- Private medical insurance and health cash plan
- Dental plan
- Pension (matched up to 5%)
- Life assurance and income protection
- Generous parental leave policy
- Employee assistance programme (health, mental wellbeing, and bereavement support)
- Healthy food and snacks at the central Bristol office
- Barista bar
📝 Enhancement Note: The salary range provided is an estimate based on market research for the role, experience level, and location. The actual salary may vary based on the candidate's qualifications, skills, and negotiation.
🎯 Team & Company Context
Company Culture:
- Industry: Artificial Intelligence and cloud computing
- Company Size: Medium (250-999 employees)
- Founded: 2016
- Team Structure: The Platform Engineering team is responsible for providing AI systems to internal users via private clouds and customers via public clouds. The Cloud Development team focuses on cloud integration, validation, performance benchmarking, optimization, and development of high-performance AI solutions.
- Development Methodology: Agile and Scrum frameworks are used for project management, with a focus on priorities, risks, issues, impacts, and constraints.
Company Website: Graphcore AI
📝 Enhancement Note: Graphcore's culture emphasizes continuous learning, innovation, and collaboration, with a focus on clear communication, problem-solving, and end-user support.
🌐 Work Environment
Office Type: On-site, with flexible working options available.
Office Location(s): Bristol, England, United Kingdom
Workspace Context:
- Collaborative workspace with a focus on cross-functional team interaction and knowledge sharing.
- Access to development tools, multiple monitors, and testing devices to support cloud infrastructure development and deployment.
- Opportunities for mentoring, technical training, and career growth within the Platform Engineering organization.
Work Schedule: Full-time, with working hours typically Monday to Friday, 9:00 AM to 5:30 PM. Flexible working hours may be available based on team needs and project deadlines.
📝 Enhancement Note: The work environment at Graphcore prioritizes collaboration, knowledge sharing, and continuous learning, with a focus on supporting the development and deployment of cutting-edge AI systems.
📄 Application & Technical Interview Process
Interview Process:
- Technical Phone Screen: A 30-minute phone screen to assess the candidate's technical skills and understanding of cloud infrastructure, deployment, and networking.
- On-site Technical Interview: A 2-hour on-site interview consisting of a technical deep dive, system design discussion, and live coding challenge.
- Behavioral Interview: A 30-minute behavioral interview to assess the candidate's problem-solving skills, communication, and cultural fit.
- Final Evaluation: A final evaluation based on the candidate's performance throughout the interview process.
Portfolio Review Tips:
- Highlight cloud infrastructure projects that demonstrate your ability to design, deploy, and maintain scalable and performant cloud services.
- Include live demos and user guides to showcase your technical expertise and communication skills.
- Emphasize any projects that involve collaboration with external vendors or integration with third-party products.
Technical Challenge Preparation:
- Brush up on cloud infrastructure, deployment, networking, and storage systems concepts.
- Familiarize yourself with Graphcore's technology stack, including AI hardware, software, and cloud services.
- Practice system design and architecture decision-making exercises, focusing on scalability, performance, and user experience.
📝 Enhancement Note: The interview process at Graphcore is designed to assess the candidate's technical skills, problem-solving abilities, and cultural fit, with a focus on cloud infrastructure, deployment, and networking.
📌 Application Steps
To apply for this Staff Engineer - Cloud Development position at Graphcore:
- Update Your Portfolio: Tailor your portfolio to highlight cloud infrastructure, deployment, networking, and storage systems projects, with a focus on performance, scalability, and user experience.
- Optimize Your Resume: Highlight relevant cloud technology skills, experience, and achievements, emphasizing your ability to design, deploy, and maintain scalable and performant cloud services.
- Prepare for Technical Interviews: Brush up on cloud infrastructure, deployment, networking, and storage systems concepts, and practice system design and architecture decision-making exercises.
- Research Graphcore: Familiarize yourself with Graphcore's technology stack, AI hardware, software, and cloud services, as well as the company's mission, values, and culture.
⚠️ Important Notice: This enhanced job description includes AI-generated insights and web development/server administration industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
Application Requirements
Candidates should have a Bachelor's degree or equivalent experience in a relevant field and solid software engineering or IT experience. Proficiency in Linux scripting, system administration, and cloud service technologies is essential.