Senior Site Reliability Specialist (SRE)

GHGSat
Full-time

📝 Job Overview

  • Job Title: Senior Site Reliability Specialist (SRE)
  • Company: GHGSat
  • Location: Hybrid (Montreal)
  • Job Type: Full-time
  • Category: DevOps, Site Reliability Engineering
  • Date Posted: 2025-06-17
  • Experience Level: 5-10 years
  • Remote Status: Hybrid

🚀 Role Summary

  • 📝 Enhancement Note: This role is a unique opportunity for an experienced SRE to join a mission-driven company at the forefront of climate change mitigation, using cutting-edge technology to tackle a global challenge.

  • As a Senior Site Reliability Specialist (SRE) at GHGSat, you'll play a pivotal role in designing, evolving, and operating the infrastructure that powers our satellite data processing, analytics pipelines, and customer platforms. You'll be responsible for increasing reliability, shaping modern SRE practices, and supporting the velocity of our development and research teams.

💻 Primary Responsibilities

  • 📝 Enhancement Note: The primary responsibilities of this role require a deep understanding of SRE principles, cloud-native infrastructure, and a strong commitment to automation, security, and collaboration.

  • 🔑 Infrastructure Design & Evolution:

    • Design and evolve GHGSat's infrastructure using modern SRE principles, such as infrastructure-as-code, self-healing systems, and robust observability.
    • Collaborate with development and research teams to understand their systems and ensure SRE supports their velocity rather than slowing it down.
  • 🛠️ Infrastructure Operation & Optimization:

    • Operate and optimize GHGSat's cloud and on-prem services, including Kubernetes clusters, CI/CD pipelines, artifact registries, and custom workloads.
    • Build and refine our observability stack, including logs, metrics, traces, and actionable alerts.
  • 🔒 Security & Compliance:

    • Own and mature GHGSat's IAM strategy, including auditing, lifecycle management, and tooling across Azure AD, AWS IAM, and internal systems.
    • Lead by example in championing best practices around security, ops hygiene, and incident readiness.
  • 🤝 Collaboration & Workflow Improvement:

    • Improve and secure workflows for 80+ engineers and researchers across cloud and hybrid environments.
    • Automate relentlessly – from security audits to deployment flows – to increase efficiency and reduce human error.

🎓 Skills & Qualifications

Education: A bachelor's degree in Computer Science, Engineering, or a related field. Relevant experience may be considered in lieu of a degree.

Experience: 5-10 years of experience in SRE, DevOps, or Systems Engineering roles in fast-moving tech environments.

Required Skills:

  • Deep comfort with Linux, Kubernetes, and cloud-native infrastructure (primarily AWS)
  • Proficiency with Infrastructure as Code tooling (OpenTofu, Ansible, etc.)
  • Practical experience with monitoring, alerting, and incident response
  • Solid understanding of cybersecurity best practices, especially in securing distributed systems and developer workflows
  • Experience supporting CI/CD, container builds, and artifact lifecycles in production environments

Preferred Skills:

  • Bilingual (French/English) and/or excited about space, science, and climate impact
  • Experience with Azure AD and on-prem infrastructure management

📝 Enhancement Note: Given the mission-critical nature of this role, candidates should possess a strong work ethic, excellent communication skills, and a commitment to continuous learning and improvement.

📊 Web Portfolio & Project Requirements

📝 Enhancement Note: While a portfolio of previous SRE projects is not explicitly required, candidates should be prepared to discuss their past experiences in infrastructure design, optimization, and security, as well as any relevant incident response and recovery efforts.

Portfolio Essentials:

  • Case studies or examples demonstrating your experience with infrastructure design, optimization, and security in a production environment
  • Examples of your incident response and recovery efforts, highlighting your problem-solving skills and ability to learn from failures
  • Documentation of your experience with cloud-native infrastructure, CI/CD pipelines, and container builds

Technical Documentation:

  • Detailed documentation of your past SRE projects, including design decisions, implementation details, and lessons learned
  • Examples of your code quality, commenting, and documentation standards
  • Documentation of your experience with version control, deployment processes, and server configuration

💵 Compensation & Benefits

Salary Range: $120,000 - $160,000 CAD per year, depending on experience and qualifications. This estimate is based on market research for SRE roles in Montreal, considering the company's size and the candidate's level of experience.

Benefits:

  • Competitive salary + stock options for all full-time employees
  • Health/Dental benefits
  • Paid Time Off + floating statutory holidays
  • Flexible work environment

Working Hours: Full-time (40 hours/week) with a flexible work schedule and a rotational on-call schedule.

🎯 Team & Company Context

🏢 Company Culture

Industry: GHGSat operates in the space and climate technology sectors, with a focus on using satellite data to detect and measure greenhouse gas emissions.

Company Size: GHGSat is a small but mighty team, offering a creative and highly motivating work environment with high impact and meaningful work for the planet.

Founded: 2011

Team Structure:

  • The Digital Infrastructure team consists of SREs, DevOps engineers, and cloud architects, working closely with development, research, and data science teams.
  • The team follows Agile methodologies, with a focus on collaboration, continuous improvement, and delivering value to customers.

Development Methodology:

  • GHGSat uses Agile/Scrum methodologies for sprint planning, code review, testing, and quality assurance.
  • The team employs CI/CD pipelines for automated deployment and continuous integration.
  • GHGSat prioritizes collaboration and knowledge sharing, with a strong emphasis on learning and growth.

Company Website: GHGSat

📝 Enhancement Note: GHGSat's unique mission and small team size offer an unparalleled opportunity for candidates to have a significant impact on the company's success and contribute to meaningful climate change mitigation efforts.

📈 Career & Growth Analysis

Web Technology Career Level: Senior Site Reliability Specialist (SRE) – This role requires a deep understanding of SRE principles, cloud-native infrastructure, and a strong commitment to automation, security, and collaboration. The ideal candidate will have 5-10 years of experience in SRE, DevOps, or Systems Engineering roles and be ready to take on a leadership role in shaping GHGSat's infrastructure and SRE practices.

Reporting Structure: This role reports directly to the Director of Digital Infrastructure and collaborates closely with development, research, and data science teams.

Technical Impact: The Senior SRE will have a significant impact on GHGSat's infrastructure, security, and reliability, ensuring that our satellite data processing, analytics pipelines, and customer platforms are robust, scalable, and secure.

Growth Opportunities:

  • Technical Leadership: As GHGSat grows, there will be opportunities for the Senior SRE to take on more leadership responsibilities, mentoring junior team members, and driving technical decisions that align with the company's mission and goals.
  • Emerging Technologies: GHGSat is at the forefront of climate change mitigation technology, offering candidates the opportunity to work with cutting-edge tools and techniques, and to stay up-to-date with the latest developments in SRE and cloud-native infrastructure.
  • Career Progression: This role offers a clear path for career progression, with the opportunity to take on more responsibility, grow within the team, and make a significant impact on GHGSat's success.

🌐 Work Environment

Office Type: GHGSat's office is a collaborative, creative workspace designed to foster innovation, learning, and growth. The team prioritizes open communication, knowledge sharing, and a positive work environment.

Office Location(s): Montreal, Quebec, Canada

Workspace Context:

  • GHGSat's office is equipped with modern development tools, multiple monitors, and testing devices to support the team's work.
  • The office is easily accessible by public transportation, with nearby amenities and a vibrant neighborhood.
  • GHGSat offers a flexible work environment, with a hybrid work arrangement that combines remote work and in-office collaboration.

Work Schedule: Full-time (40 hours/week) with a flexible work schedule and a rotational on-call schedule to ensure 24/7 system monitoring and incident response.

📄 Application & Technical Interview Process

Interview Process:

  1. Phone Screen (30 minutes): A brief conversation to discuss your experience, motivations, and alignment with GHGSat's mission.
  2. Technical Deep Dive (60 minutes): A detailed discussion of your SRE experience, focusing on infrastructure design, optimization, and security. Be prepared to discuss your approach to incident response, automation, and collaboration.
  3. Cultural Fit Interview (30 minutes): A conversation with a member of the leadership team to assess your cultural fit, communication skills, and alignment with GHGSat's values.
  4. Final Decision: A decision will be made based on your technical expertise, cultural fit, and alignment with GHGSat's mission.

Portfolio Review Tips:

  • Highlight your experience with infrastructure design, optimization, and security in a production environment.
  • Showcase your incident response and recovery efforts, demonstrating your problem-solving skills and ability to learn from failures.
  • Document your experience with cloud-native infrastructure, CI/CD pipelines, and container builds.

Technical Challenge Preparation:

  • Brush up on your Linux, Kubernetes, and cloud-native infrastructure knowledge (primarily AWS).
  • Familiarize yourself with GHGSat's tech stack and be prepared to discuss how you would approach specific infrastructure challenges.
  • Prepare examples of your experience with monitoring, alerting, incident response, and security best practices.

🛠 Technology Stack & Web Infrastructure

Frontend Technologies: N/A (This role focuses on backend and infrastructure technologies)

Backend & Server Technologies:

  • Linux (Ubuntu, CentOS)
  • Kubernetes (EKS, GKE, AKS)
  • Cloud-native Infrastructure (AWS, GCP, Azure)
  • Infrastructure as Code (Terraform, Ansible)
  • Containerization (Docker)
  • CI/CD Pipelines (Jenkins, GitLab CI/CD, CircleCI)
  • Monitoring & Alerting (Prometheus, Grafana, ELK Stack, Datadog)
  • Incident Response & Recovery (PagerDuty, OpsGenie, On-Call Rotations)
  • Security & Compliance (AWS IAM, Azure AD, Vault, Secret Manager, Open Policy Agent)

Development & DevOps Tools:

  • Version Control (Git, GitHub)
  • Project Management (Jira, Asana)
  • Collaboration (Slack, Microsoft Teams)
  • Documentation (Confluence, Google Drive)

👥 Team Culture & Values

Web Development Values:

  • Reliability: GHGSat values reliability in all aspects of our work, from infrastructure design to incident response and recovery.
  • Automation: We prioritize automation to increase efficiency, reduce human error, and enable our team to focus on high-value tasks.
  • Collaboration: GHGSat fosters a culture of collaboration, knowledge sharing, and open communication.
  • Continuous Learning: We encourage our team members to stay up-to-date with the latest developments in SRE and cloud-native infrastructure.

Collaboration Style:

  • GHGSat uses Agile methodologies, with a focus on collaboration, continuous improvement, and delivering value to customers.
  • The team prioritizes open communication, knowledge sharing, and a positive work environment.
  • GHGSat encourages cross-functional collaboration between SREs, developers, researchers, and data scientists to ensure our infrastructure supports our mission and goals.

⚡ Challenges & Growth Opportunities

Technical Challenges:

  • Scalability: GHGSat's infrastructure must be designed to scale with our growing customer base and data processing needs.
  • Security: The Senior SRE will be responsible for ensuring the security of our infrastructure, data, and customer information.
  • Incident Response: GHGSat operates in a 24/7 environment, requiring the Senior SRE to be prepared to respond to incidents and minimize downtime.

Learning & Development Opportunities:

  • Technical Skill Development: GHGSat encourages its team members to stay up-to-date with the latest developments in SRE and cloud-native infrastructure, offering opportunities for training, certification, and conference attendance.
  • Mentorship: The Senior SRE will have the opportunity to mentor junior team members, helping them develop their skills and advance their careers.
  • Leadership Development: As GHGSat grows, there will be opportunities for the Senior SRE to take on more leadership responsibilities, driving technical decisions that align with the company's mission and goals.

💡 Interview Preparation

Technical Questions:

  • Infrastructure Design: Be prepared to discuss your approach to infrastructure design, optimization, and security in a production environment. Provide specific examples of your past experiences and the challenges you faced.
  • Incident Response: Prepare examples of your incident response and recovery efforts, highlighting your problem-solving skills and ability to learn from failures.
  • Security Best Practices: Be ready to discuss your understanding of security best practices, especially in securing distributed systems and developer workflows.

Company & Culture Questions:

  • Mission Alignment: Be prepared to discuss why you are excited about GHGSat's mission and how your work as a Senior SRE will contribute to our success.
  • Team Dynamics: Prepare to discuss your experience working in a collaborative, cross-functional team environment and how you would contribute to GHGSat's team culture.
  • Adaptability: Be ready to discuss your ability to adapt to a fast-paced, mission-driven environment and your commitment to continuous learning and improvement.

Portfolio Presentation Strategy:

  • Storytelling: Prepare a compelling narrative that showcases your experience with infrastructure design, optimization, and security, as well as your incident response and recovery efforts.
  • Technical Deep Dive: Be ready to provide a detailed technical deep dive into your past SRE projects, highlighting your approach to infrastructure design, optimization, and security.
  • Q&A: Prepare for a Q&A session to address any questions or concerns about your portfolio or GHGSat's mission and culture.

📌 Application Steps

To apply for this Senior Site Reliability Specialist (SRE) position at GHGSat:

  1. Submit your application through the application link provided in the job listing.
  2. Customize your resume to highlight your relevant SRE experience, focusing on infrastructure design, optimization, and security, as well as your incident response and recovery efforts.
  3. Prepare your portfolio to showcase your experience with cloud-native infrastructure, CI/CD pipelines, and container builds, as well as any relevant incident response and recovery efforts.
  4. Research GHGSat's mission, technology stack, and team culture to ensure a strong fit and alignment with the company's values and goals.
  5. Prepare for the technical interview by brushing up on your Linux, Kubernetes, and cloud-native infrastructure knowledge (primarily AWS), and familiarizing yourself with GHGSat's tech stack and interview process.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and web development industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.


Application Requirements

Candidates should have 5-10 years of experience in SRE, DevOps, or Systems Engineering roles, with a strong understanding of Linux, Kubernetes, and cloud-native infrastructure. Practical experience with monitoring, alerting, incident response, and cybersecurity best practices is essential.