Staff Site Reliability Engineer (English speaking, all genders)
Job Overview
- Job Title: Staff Site Reliability Engineer (English speaking, all genders)
- Company: Resourcify
- Location: [Remote - Germany, Austria]
- Job Type: Full-time
- Category: DevOps, Infrastructure
- Date Posted: 2025-07-22
- Experience Level: 10+
- Remote Status: Remote OK
Role Summary
- Key Responsibilities: Plan and evolve cloud infrastructure, streamline release processes, automate operational workflows, and improve system performance and reliability.
- Key Skills: Site Reliability Engineering, DevOps, Infrastructure Engineering, GCP, Terraform, Kubernetes, CI/CD, Scripting, Monitoring, Observability, Incident Response, Database Management, Automation, Mentoring, Communication, AI Tools.
Primary Responsibilities
Enhancement Note: The role focuses on planning and executing infrastructure growth, ensuring high availability, and enhancing system reliability through automation and streamlined processes.
- Infrastructure Planning & Growth: Plan for infrastructure growth to meet the demands of a growing customer base and engineering team.
- Cloud Infrastructure Evolution: Own and evolve the cloud infrastructure (GCP) to ensure high availability, scalability, and cost-efficiency.
- Core Infrastructure Design & Maintenance: Design, build, and maintain core infrastructure, including Terraform, CI/CD pipelines, and Kubernetes.
- Release Process Streamlining: Streamline and improve the release process, reducing manual steps and enabling faster, safer, and more frequent deployments.
- Automation & Workflow Optimization: Automate operational workflows (monitoring, alerting, scaling) and reduce toil by turning manual actions into automation.
- System Performance Monitoring & Improvement: Monitor and continuously improve system performance, reliability, and scalability using metrics and data to guide decisions.
- On-Call & Incident Response: Be part of an on-call rotation, responding to incidents that impact Resourcify's availability and supporting engineers during customer-impacting incidents. Use on-call experience to drive meaningful improvements and prevent incidents.
- Postmortem Analysis & Improvement: Conduct and lead blameless postmortems, identify root causes, and implement long-term fixes that improve system health.
- Collaboration & Troubleshooting: Collaborate with developers to debug complex production issues across services and layers of the stack. Continuously improve operational processes to make them reliable and repeatable.
Skills & Qualifications
Education: Bachelor's degree in Computer Science, Engineering, or a related field. Relevant experience may be considered in lieu of a degree.
Experience: 8+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles, ideally in a fast-paced environment, with demonstrated leadership in driving infrastructure initiatives.
Required Skills:
- Deep expertise in GCP and Infrastructure-as-Code, particularly with Terraform, applied in production environments.
- Strong fundamentals in Linux systems, networking concepts, and secure infrastructure design.
- Proven experience designing and operating Kubernetes infrastructure, including implementing robust security models and best practices in secrets management.
- Strong command of CI/CD design and automation, with hands-on experience in tools enabling safe and frequent deployments. Scripting skills in Bash, Python, or Go to support automation and tooling are a strong plus.
- Experience in architecting and operating monitoring and observability systems (Grafana, Prometheus, Loki) with a focus on proactive detection (alerting on symptoms, not outages).
- A track record of improving deployment frequency, reliability, and team velocity through automation, tooling, and process improvements.
- Leadership experience in incident response, including coordinating cross-team efforts, running postmortems, and implementing systemic fixes.
- Operational experience with databases, including provisioning, backup strategies, performance tuning, and disaster recovery.
- Strong grasp of build-vs-buy decisions and experience evaluating and integrating third-party infrastructure tooling effectively.
- A track record of mentoring engineers, leading initiatives across teams, and elevating infrastructure standards org-wide.
- Excellent communication skills, with the ability to articulate technical trade-offs and architectural decisions clearly to both engineers and leadership.
Preferred Skills:
- Familiarity with Spring Boot applications and the operational considerations that come with running Java-based applications in production.
- Experience or interest in leveraging AI tools to boost engineering productivity, automate repetitive tasks, and enhance platform tooling and developer enablement.
Web Portfolio & Project Requirements
Portfolio Essentials:
- A well-structured and up-to-date resume highlighting relevant experience, projects, and achievements in Site Reliability Engineering, DevOps, or Infrastructure Engineering.
- A portfolio of past projects demonstrating expertise in GCP, Terraform, Kubernetes, CI/CD, monitoring, and automation.
- Case studies or blog posts detailing complex infrastructure challenges faced and the solutions implemented, including the positive impact on system performance, reliability, and scalability.
Technical Documentation:
- Code samples and documentation showcasing scripting skills in Bash, Python, or Go, with a focus on automation and tooling.
- Examples of incident response plans, postmortem analyses, and long-term fixes implemented to improve system health.
- Documentation of operational processes, including deployments, upgrades, and maintenance windows, demonstrating reliability and repeatability.
Compensation & Benefits
Salary Range: €75,000 - €100,000 per year, depending on experience and qualifications. This estimate is based on market research for similar roles in the German and Austrian tech industry, considering the required experience level and regional cost of living.
Benefits:
- Flexible Working: Your hours are flexible, as you'll be measured by your success, not by the number of hours you clock in.
- Extra Time to Recharge: With our flexible vacation policy, you'll have 30+ days of vacation to recharge your batteries and enjoy life outside of work. Not enough? You are more than welcome to take more days!
- Remote Work Options: We are a remote-first company and convinced that we can do excellent work from (almost) anywhere. You have the option to work from our central, beautifully equipped offices in Hamburg, Berlin, and Munich or fully remote within Germany. At the moment, we can also offer permanent employment in Austria.
- Workation: We love a Workation! You can enjoy temporary remote work for up to 3 months provided that you have access to high-speed internet.
- High-Quality Technical Equipment: We provide you with high-quality technical equipment (laptop) and an additional home-office budget to transform your home into the most enjoyable place to work.
Team & Company Context
Company Culture
Industry: Waste management and recycling technology.
Company Size: Medium (51-250 employees).
Founded: 2017.
Team Structure:
- A dedicated team of engineers, designers, and product managers working together to simplify and improve waste management processes.
- A flat organizational structure that encourages collaboration, innovation, and cross-functional teamwork.
- A strong focus on sustainability, circular economy, and making a positive impact on the environment.
Development Methodology:
- Agile/Scrum methodologies with bi-weekly sprint planning, daily stand-ups, and regular retrospectives.
- A focus on continuous integration, continuous deployment, and continuous improvement.
- Collaboration tools such as Jira, Confluence, and Slack to facilitate communication and project management.
Company Website: Resourcify
Enhancement Note: Resourcify's company culture is characterized by a strong commitment to sustainability, a collaborative and innovative work environment, and a focus on continuous improvement and learning.
Career & Growth Analysis
Career Level: Staff Site Reliability Engineer, responsible for planning and executing infrastructure growth, ensuring high availability, and enhancing system reliability through automation and streamlined processes.
Reporting Structure: Reports directly to the Head of Engineering, working closely with the development teams to ensure the reliability and scalability of the platform.
Technical Impact: Plays a crucial role in driving the scalability, reliability, and performance of the Resourcify platform, enabling the company to manage and track waste more efficiently and effectively.
Growth Opportunities:
- Technical Leadership: Opportunities to mentor engineers, lead initiatives across teams, and elevate infrastructure standards org-wide.
- Architecture Decisions: Involvement in strategic architecture decisions, driving the company's technical roadmap, and contributing to the overall success of the platform.
- Emerging Technologies: Exposure to emerging technologies and the opportunity to leverage AI tools to boost engineering productivity, automate repetitive tasks, and enhance platform tooling and developer enablement.
Enhancement Note: Resourcify's career growth opportunities focus on technical leadership, architecture decisions, and emerging technologies, allowing the Staff Site Reliability Engineer to make a significant impact on the company's technical roadmap and platform success.
Work Environment
Office Type: Modern, centrally located offices in Hamburg, Berlin, and Munich, with a remote-first approach that allows employees to work from anywhere within Germany or Austria.
Office Location(s):
- Hamburg: Schopenstehl 13, 20095 Hamburg, Germany
- Berlin: Karl-Liebknecht-Str. 29A, 10178 Berlin, Germany
- Munich: Brienner Str. 45, 80333 Munich, Germany
Workspace Context:
- Collaborative workspaces designed to facilitate teamwork, creativity, and innovation.
- Access to high-quality technical equipment, including laptops, monitors, and testing devices.
- A flexible work environment that allows employees to customize their workspace to suit their preferences and needs.
Work Schedule: Flexible working hours with a focus on results and productivity, rather than the number of hours worked.
Enhancement Note: Resourcify's work environment is characterized by modern, centrally located offices, a remote-first approach, and a flexible work schedule that prioritizes results and productivity.
Application & Technical Interview Process
Interview Process:
- Technical Phone Screen (30 minutes): A brief conversation to assess your technical background, experience, and cultural fit.
- Technical Deep Dive (60-90 minutes): A more in-depth discussion focusing on your technical skills, experience, and problem-solving abilities. Expect to discuss your past projects, infrastructure challenges, and the solutions you implemented.
- Behavioral & Cultural Fit Interview (30-45 minutes): An interview to assess your cultural fit, communication skills, and alignment with Resourcify's values and mission.
- Final Decision & Offer (TBD): A decision will be made based on the interviews, and an offer will be extended to the successful candidate.
Portfolio Review Tips:
- Highlight your past projects that demonstrate your expertise in GCP, Terraform, Kubernetes, CI/CD, monitoring, and automation.
- Focus on the challenges faced, the solutions implemented, and the positive impact on system performance, reliability, and scalability.
- Include code samples, documentation, and any other relevant materials that showcase your technical skills and problem-solving abilities.
Technical Challenge Preparation:
- Brush up on your GCP, Terraform, Kubernetes, CI/CD, and monitoring skills, focusing on hands-on experience and practical applications.
- Familiarize yourself with Resourcify's platform, understanding its waste management and recycling technology focus.
- Prepare for questions about incident response, postmortem analysis, and long-term fixes, demonstrating your ability to identify root causes and implement systemic improvements.
Enhancement Note: Resourcify's interview process focuses on assessing technical skills, cultural fit, and problem-solving abilities, with a strong emphasis on past projects and portfolio materials.
Technology Stack & Web Infrastructure
Frontend Technologies: N/A (not applicable for this role).
Backend & Server Technologies:
- Cloud Infrastructure: Google Cloud Platform (GCP) for high availability, scalability, and cost-efficiency.
- Infrastructure-as-Code: Terraform for designing, provisioning, and managing cloud infrastructure.
- Containerization: Kubernetes for orchestrating and managing containerized applications.
- CI/CD Pipelines: Jenkins or GitLab CI/CD for automated testing, building, and deployment of applications.
- Monitoring & Observability: Prometheus and Grafana for monitoring system performance and alerting on symptoms, not outages. Loki for log aggregation and analysis.
Development & DevOps Tools:
- Version Control: Git for collaborative development and version tracking.
- Code Review: GitHub or GitLab for code review, collaboration, and quality assurance.
- Secret Management: HashiCorp Vault or Google Cloud Secret Manager for secure storage and management of sensitive data.
- Infrastructure Automation: Terraform for automating infrastructure provisioning and management.
- Container Orchestration: Kubernetes for managing and scaling containerized applications.
Enhancement Note: Resourcify's technology stack focuses on GCP, Terraform, Kubernetes, CI/CD, and monitoring tools to ensure high availability, scalability, and system reliability.
Team Culture & Values
Engineering Values:
- Sustainability: A strong commitment to sustainability, circular economy, and making a positive impact on the environment.
- Innovation: Encouraging creativity, collaboration, and continuous learning to drive technological advancements in waste management and recycling.
- Reliability: A focus on high availability, scalability, and system reliability to ensure the platform's performance and stability.
- Collaboration: A collaborative and inclusive work environment that values communication and cross-functional teamwork.
- Continuous Improvement: A commitment to continuous improvement, learning, and adaptation to drive the company's success and growth.
Collaboration Style:
- Cross-Functional Integration: Close collaboration between engineers, designers, and product managers to ensure the platform's user experience, functionality, and technical feasibility.
- Code Review Culture: A strong code review culture that prioritizes quality, security, and maintainability.
- Knowledge Sharing: A culture of knowledge sharing, technical mentoring, and continuous learning to drive team growth and expertise.
Enhancement Note: Resourcify's team culture is characterized by a strong commitment to sustainability, innovation, reliability, collaboration, and continuous improvement, with a focus on cross-functional teamwork and knowledge sharing.
Challenges & Growth Opportunities
Technical Challenges:
- Infrastructure Growth: Planning and executing infrastructure growth to meet the demands of a growing customer base and an expanding engineering team.
- Cloud Infrastructure Evolution: Owning and evolving the cloud infrastructure (GCP) to ensure high availability, scalability, and cost-efficiency.
- System Performance & Reliability: Monitoring and continuously improving system performance, reliability, and scalability using metrics and data to guide decisions.
- Incident Response & Prevention: Responding to incidents that impact Resourcify's availability and driving meaningful improvements to prevent future incidents.
- Emerging Technologies: Staying up-to-date with emerging technologies and leveraging AI tools to boost engineering productivity, automate repetitive tasks, and enhance platform tooling and developer enablement.
Learning & Development Opportunities:
- Technical Skill Development: Opportunities to deepen your expertise in GCP, Terraform, Kubernetes, CI/CD, monitoring, and automation, with a focus on emerging technologies and AI tools.
- Leadership Development: Mentoring engineers, leading initiatives across teams, and elevating infrastructure standards org-wide to drive technical leadership and growth.
- Architecture Decision-Making: Involvement in strategic architecture decisions, driving the company's technical roadmap, and contributing to the overall success of the platform.
Enhancement Note: Resourcify's challenges and growth opportunities focus on infrastructure growth, cloud infrastructure evolution, system performance and reliability, incident response and prevention, and emerging technologies, with a strong emphasis on technical skill development, leadership development, and architecture decision-making.
Interview Preparation
Technical Questions:
- GCP Expertise: Questions about GCP services, best practices, and infrastructure design patterns.
- Terraform Proficiency: Questions about Terraform configuration, provisioning, and management of cloud infrastructure.
- Kubernetes Knowledge: Questions about Kubernetes cluster management, security models, and best practices for secrets management.
- CI/CD Design & Automation: Questions about CI/CD pipeline design, automation, and safe and frequent deployments.
- Monitoring & Observability: Questions about Prometheus, Grafana, Loki, and proactive detection of symptoms, not outages.
- Incident Response & Postmortem Analysis: Questions about incident response, postmortem analysis, and long-term fixes to improve system health.
Company & Culture Questions:
- Resourcify's Mission: Questions about understanding and aligning with Resourcify's mission to simplify and improve waste management and recycling technology.
- Collaboration & Teamwork: Questions about working effectively in a cross-functional team, collaborating with engineers, designers, and product managers.
- Agile Methodologies: Questions about working within an Agile/Scrum environment, sprint planning, daily stand-ups, and retrospectives.
Portfolio Presentation Strategy:
- Project Case Studies: Presenting case studies or blog posts detailing complex infrastructure challenges faced and the solutions implemented, with a focus on system performance, reliability, and scalability.
- Code Samples & Documentation: Showcasing code samples, documentation, and any other relevant materials that demonstrate your technical skills and problem-solving abilities.
- User Experience & Impact: Highlighting the user experience and impact of your projects, with a focus on improving waste management and recycling technology.
Enhancement Note: Resourcify's interview preparation focuses on technical expertise in GCP, Terraform, Kubernetes, CI/CD, monitoring, and incident response, with a strong emphasis on understanding the company's mission, collaboration, and teamwork.
Application Steps
To apply for this Staff Site Reliability Engineer position at Resourcify:
- Submit Your Application: Click the 'Apply Now' button on the job listing or use the application link provided.
- Tailor Your Portfolio: Customize your portfolio to highlight relevant projects, case studies, and code samples that demonstrate your expertise in GCP, Terraform, Kubernetes, CI/CD, monitoring, and automation.
- Optimize Your Resume: Update your resume to emphasize your experience, skills, and achievements in Site Reliability Engineering, DevOps, or Infrastructure Engineering.
- Prepare for Technical Interviews: Brush up on your technical skills, review Resourcify's platform, and prepare for questions about incident response, postmortem analysis, and long-term fixes.
- Research the Company: Familiarize yourself with Resourcify's mission, values, and culture to ensure a strong cultural fit and alignment with the company's goals.
Important Notice: This enhanced job description includes AI-generated insights and web technology industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
Application Requirements
The role requires 8+ years of experience in Site Reliability Engineering or related fields, with expertise in GCP and Infrastructure-as-Code. Candidates should also have experience with Kubernetes, CI/CD automation, and a strong grasp of monitoring systems.