📍 Job Overview

Job Title: Lead Site Reliability Engineer (SRE)
Company: Social Links LLC
Location: Remote
Job Type: Full-time
Category: DevOps & Infrastructure
Date Posted: June 26, 2025
Experience Level: Mid-Senior Level (5-10 years)
Remote Status: Remote OK

🚀 Role Summary

Strategic Role: Define and implement SRE practices, manage a team, and drive cloud migration.
Full Ownership: Oversee reliability, observability, and platform resiliency.
Growing Company: Join a global, product-driven company with engineering at the core.
Clear Growth Path: Foundational role with a path towards Head of Infrastructure/SRE.

📝 Enhancement Note: This role offers a unique opportunity to shape the SRE culture and cloud migration strategy for a high-impact, AI-powered platform.

💻 Primary Responsibilities

Define and Implement SRE Practices: Establish SLO/SLA management, incident response, postmortems, and alerting policies.
Lead and Manage Team: Oversee on-prem infrastructure, DevOps/CI/CD workflows, and platform observability.
Architect and Scale Cloud-Native Infrastructure: Utilize AWS services to build and scale reliable, secure, and efficient systems.
Migrate Services and Systems to the Cloud: Oversee the transition of on-premises services and systems to AWS.
Own Logging, Metrics, Recovery Processes, and Secure Runtime Environments: Ensure system reliability, availability, and security.
Implement Infrastructure Automation and Self-Healing Mechanisms: Automate processes to minimize human intervention and maximize system reliability.
Build Internal Documentation, Runbooks, and Operational Guidelines: Establish clear, up-to-date, and accessible documentation for the team.
Mentor and Foster Reliability Culture: Act as a leader and mentor for the reliability culture across engineering.

🎓 Skills & Qualifications

Education: Bachelor's degree in Computer Science, Engineering, or a related field. Relevant work experience may be considered in lieu of a degree.

Experience: 5+ years in infrastructure/SRE/DevOps roles, with 2+ years in technical leadership.

Required Skills:

Expert knowledge of Linux, Bash, and system automation.
Deep understanding of core networking: VPN, TCP/IP, DNS, routing, NAT, firewalls.
Hands-on experience with on-prem operations and modernization.
Experience with monitoring: Zabbix, Prometheus, Grafana.
Proven experience with AWS: EC2, IAM, VPC, EKS, S3, CloudWatch.
Strong skills in CI/CD tooling: GitHub Actions, GitLab CI, ArgoCD, Helm, Kustomize.
Experience implementing SRE disciplines: SLOs, error budgets, incident management.
Proficiency in writing clear documentation and infrastructure standards.
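
For candidates reviewing the SLO and error-budget requirement above, the underlying arithmetic can be sketched as follows. This is a minimal illustration, not Social Links' actual tooling: the class name ErrorBudget, its fields, and the sample numbers are hypothetical and chosen only to show how a target availability translates into an allowed number of failures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical helper: request-based error budget for one SLO window."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int   # requests observed in the SLO window
    failed_requests: int  # requests that violated the SLO in that window

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 means the budget is untouched; 0.0 or below means it is spent.
        allowed = self.allowed_failures
        if allowed == 0:
            return 0.0  # a 100% target (or no traffic) leaves no budget to spend
        return 1.0 - self.failed_requests / allowed


# Illustrative numbers only (assumed, not taken from the posting).
budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"allowed failures: {budget.allowed_failures:.0f}")
print(f"budget remaining: {budget.remaining_fraction:.0%}")
```

With a 99.9% target over 1,000,000 requests, about 1,000 failures are allowed, so 250 failures leave roughly 75% of the budget for the window.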

Preferred Skills:

Experience with OpenFaaS, Kubernetes, Terraform, Ansible.
Familiarity with SOC2, ISO 27001, GDPR compliance practices.
Python scripting for automation.
Experience with Vault, OPA, RBAC, and Zero Trust architectures.

📝 Enhancement Note: Candidates with experience in managing complex, legacy-heavy environments and a proven track record of mentoring and developing others will be highly valued.

📊 Web Portfolio & Project Requirements

Portfolio Essentials:

Demonstrate your experience with on-prem infrastructure management, cloud migration, and SRE practices.
Showcase your ability to lead and mentor teams through case studies or testimonials.
Highlight your proficiency in AWS services and CI/CD tooling with relevant projects or certifications.

Technical Documentation:

Provide examples of internal documentation, runbooks, or operational guidelines you've created to establish best practices.
Include any relevant certifications or training materials that demonstrate your expertise in SRE, AWS, or related technologies.

📝 Enhancement Note: A strong portfolio will showcase your ability to balance tactical tasks and long-term architecture, as well as your strategic foresight in driving reliability and observability.

💵 Compensation & Benefits

Salary Range: $150,000 - $200,000 per year (USD) based on experience and location. This estimate is derived from regional market data for mid-senior level SRE roles in the tech industry.

Benefits:

Stock Options
Flexible remote environment
Leadership visibility

Working Hours: Full-time (40 hours/week) with flexible hours and the ability to work remotely.

📝 Enhancement Note: The salary range provided is an estimate and may vary based on the candidate's experience, location, and the company's final assessment. Research methodology includes data from Glassdoor, Indeed, and Payscale, as well as regional market data for SRE roles.

🎯 Team & Company Context

🏢 Company Culture

Industry: Social Links is a global provider of OSINT and Open Data technologies, developing a modular Open Intelligence Platform that aggregates hundreds of data sources and delivers intelligence through AI agents, pipelines, and customizable workflows.

Company Size: Medium-sized company with a growing engineering team, offering a flexible remote environment and leadership visibility.

Founded: 2015 (10 years ago)

Team Structure:

The SRE team is responsible for on-prem infrastructure, DevOps/CI/CD workflows, and platform observability.
The team reports directly to the CTO and works cross-functionally with engineering, product, and design teams.
The SRE team consists of DevOps and SysOps engineers, with the Lead SRE acting as the technical lead and mentor.

Development Methodology:

Agile/Scrum methodologies with bi-weekly sprint planning.
Code review, testing, and quality assurance practices to ensure system reliability and performance.
Deployment strategies, CI/CD pipelines, and server management to automate processes and minimize human intervention.

Company Website: Social Links

📝 Enhancement Note: Social Links values ownership, reliability-first thinking, and strong communication skills in its team members. The company fosters a culture of mentorship and continuous learning, with a clear growth path for technical leadership.

📈 Career & Growth Analysis

Career Level: Mid-Senior Level SRE with full ownership over reliability, observability, and platform resiliency. This role involves strategic decision-making, team management, and mentorship.

Reporting Structure: The Lead SRE reports directly to the CTO and manages a team of DevOps and SysOps engineers.

Technical Impact: This role has a significant impact on the reliability, performance, and security of the Social Links platform, as well as the team's ability to deliver high-quality features and services to customers.

Growth Opportunities:

Technical Growth: Expand your expertise in AWS services, cloud migration, and SRE best practices. Contribute to the development of internal tools and processes to improve system reliability and efficiency.
Leadership Growth: Mentor team members and foster a culture of reliability and continuous improvement. Collaborate with cross-functional teams to drive strategic initiatives and improve overall platform performance.
Architectural Growth: Design and implement scalable, secure, and efficient systems that support the growth and evolution of the Social Links platform. Contribute to the development of the company's long-term architecture and technology roadmap.

📝 Enhancement Note: The Lead SRE role at Social Links offers a unique opportunity for career growth, with a clear path towards Head of Infrastructure/SRE. This role is ideal for candidates seeking to own reliability architecture and be a key strategic contributor to a high-impact, AI-powered platform.

🌐 Work Environment

Office Type: Flexible remote environment with a strong emphasis on collaboration, communication, and work-life balance.

Office Location(s): Remote (global)

Workspace Context:

Collaboration: Work closely with cross-functional teams, including engineering, product, and design, to ensure system reliability and performance.
Tools & Equipment: Utilize modern development tools, multiple monitors, and testing devices to optimize your work environment.
Interaction: Engage with team members, stakeholders, and customers to gather feedback, identify improvement areas, and drive system reliability enhancements.

Work Schedule: Full-time (40 hours/week) with flexible hours and the ability to work remotely. The work schedule may include deployment windows, maintenance, and project deadlines.

📝 Enhancement Note: Social Links values a flexible work environment that empowers team members to balance their professional and personal lives. The company fosters a culture of open communication, collaboration, and continuous learning.

📄 Application & Technical Interview Process

Interview Process:

Technical Phone Screen: Assess your technical proficiency in Linux, AWS, and SRE practices. Expect questions on system design, incident management, and cloud migration strategies.
On-site Technical Deep Dive: Evaluate your hands-on experience with on-prem infrastructure, AWS services, and CI/CD tooling. Prepare for live coding exercises, architecture discussions, and problem-solving challenges.
Behavioral and Cultural Fit Interview: Assess your communication skills, leadership potential, and cultural fit with the Social Links team. Prepare for questions on mentorship, incident management, and strategic decision-making.
Final Evaluation: Evaluate your overall fit for the role, considering your technical expertise, leadership potential, and cultural alignment with the Social Links team.

Portfolio Review Tips:

Highlight your experience with on-prem infrastructure management, cloud migration, and SRE practices.
Showcase your ability to lead and mentor teams through case studies or testimonials.
Demonstrate your proficiency in AWS services and CI/CD tooling with relevant projects or certifications.

Technical Challenge Preparation:

Brush up on your Linux, AWS, and SRE skills, focusing on system design, incident management, and cloud migration strategies.
Practice live coding exercises and architecture discussions to prepare for the on-site technical deep dive.
Prepare for behavioral and cultural fit interviews by reflecting on your leadership experiences, mentorship strategies, and incident management approaches.

ATS Keywords: [See the comprehensive list of ATS keywords at the end of this document]

📝 Enhancement Note: The interview process for the Lead SRE role at Social Links is designed to evaluate your technical expertise, leadership potential, and cultural fit with the team. Prepare thoroughly and be ready to demonstrate your ability to own reliability architecture and drive strategic initiatives.

🛠 Technology Stack & Web Infrastructure

Frontend Technologies: (Not applicable for this role)

Backend & Server Technologies:

Linux: Expert knowledge required for on-prem infrastructure management and cloud migration.
AWS Services: Proven experience with EC2, IAM, VPC, EKS, S3, CloudWatch, and Route53.
Monitoring Tools: Experience with Zabbix, Prometheus, Grafana, Loki, and Tempo.

Development & DevOps Tools:

CI/CD Tooling: Strong skills in GitHub Actions, GitLab CI, ArgoCD, Helm, and Kustomize.
Infrastructure Automation: Proficiency in Bash scripting, Python scripting, and tools like Ansible, Terraform, and OpenFaaS.
Cloud Infrastructure: Experience with AWS services, including EC2, IAM, VPC, EKS, S3, CloudWatch, and Route53.

📝 Enhancement Note: The technology stack for the Lead SRE role at Social Links is diverse and includes both on-prem and cloud-based infrastructure components. Candidates should have expertise in Linux, AWS services, and CI/CD tooling, as well as experience with monitoring tools and infrastructure automation.

👥 Team Culture & Values

Engineering Values:

Reliability-First: Prioritize system reliability, availability, and performance in all aspects of platform development and maintenance.
User-Centric: Focus on the needs and pain points of users to drive platform improvements and innovations.
Continuous Learning: Foster a culture of continuous learning and improvement, encouraging team members to stay up-to-date with emerging technologies and best practices.
Collaboration: Encourage open communication, collaboration, and knowledge sharing across teams and disciplines.

Collaboration Style:

Cross-Functional Integration: Work closely with engineering, product, and design teams to ensure system reliability and performance.
Code Review Culture: Encourage pair programming and code review practices to maintain high-quality standards and share knowledge across the team.
Knowledge Sharing: Foster a culture of mentorship and continuous learning, with regular training sessions, workshops, and brown bag lunches.

📝 Enhancement Note: Social Links values a culture of collaboration, open communication, and continuous learning. The company fosters a user-centric approach to platform development and maintenance, with a strong emphasis on system reliability and performance.

⚡ Challenges & Growth Opportunities

Technical Challenges:

On-Prem Infrastructure Management: Stabilize and modernize legacy on-prem systems while driving the transition to AWS cloud infrastructure.
Cloud Migration: Architect and scale cloud-native infrastructure using AWS services, ensuring system reliability, security, and efficiency.
SRE Practice Implementation: Define and implement SRE practices, including SLO/SLA management, incident response, postmortems, and alerting policies.
Team Leadership: Manage and mentor a team of DevOps and SysOps engineers, fostering a culture of reliability and continuous improvement.

Learning & Development Opportunities:

Technical Skill Development: Expand your expertise in AWS services, cloud migration, and SRE best practices through workshops, training sessions, and hands-on projects.
Leadership Development: Enhance your leadership skills through mentorship, team management, and architecture decision-making opportunities.
Emerging Technology Adoption: Stay up-to-date with emerging technologies and best practices in the SRE and cloud infrastructure domains.

📝 Enhancement Note: The Lead SRE role at Social Links presents a unique set of technical challenges and growth opportunities. Candidates should be prepared to own reliability architecture, drive strategic initiatives, and foster a culture of continuous learning and improvement.

💡 Interview Preparation

Technical Questions:

Linux & AWS: Expect questions on system design, incident management, and cloud migration strategies. Prepare for live coding exercises and architecture discussions.
CI/CD Tooling: Be ready to discuss your experience with GitHub Actions, GitLab CI, ArgoCD, Helm, and Kustomize, as well as your approach to infrastructure automation and self-healing mechanisms.
SRE Practices: Prepare for questions on SLO/SLA management, incident response, postmortems, and alerting policies. Be ready to discuss your approach to system reliability, availability, and performance.
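
When preparing for the alerting-policy questions above, one commonly discussed technique is burn-rate alerting over two time windows. The sketch below is a hedged illustration, not a description of Social Links' alerting setup: the function names burn_rate and should_page are hypothetical, and the default threshold of 14.4 is an assumption corresponding to spending 2% of a 30-day error budget within one hour (0.02 × 720 hours).

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how fast the error budget is being spent.

    A value of 1.0 means the budget would last exactly the SLO window;
    higher values mean it would run out sooner.
    """
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_page(
    long_window_error_ratio: float,
    short_window_error_ratio: float,
    slo_target: float,
    threshold: float = 14.4,  # assumed: 2% of a 30-day budget spent in 1 hour
) -> bool:
    """Page only if both windows burn fast.

    The long window shows the problem is significant; the short window
    shows it is still happening, which avoids paging on stale spikes.
    """
    return (
        burn_rate(long_window_error_ratio, slo_target) >= threshold
        and burn_rate(short_window_error_ratio, slo_target) >= threshold
    )


# Illustrative inputs only: 2% errors over the long window, 3% over the short one.
print(should_page(0.02, 0.03, slo_target=0.999))
```

Requiring both windows to exceed the threshold is a design choice that trades a slightly slower page for fewer alerts on incidents that have already recovered.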

Company & Culture Questions:

Mentorship: Prepare for questions on your experience mentoring and developing others, as well as your approach to fostering a culture of reliability and continuous improvement.
Incident Management: Be ready to discuss your experience managing complex, legacy-heavy environments calmly and constructively, as well as your approach to incident response and postmortems.
Strategic Foresight: Prepare for questions on your ability to balance tactical tasks and long-term architecture, as well as your approach to driving reliability and observability.

Portfolio Presentation Strategy:

Live Demonstration: Prepare a live demonstration of your experience with on-prem infrastructure management, cloud migration, and SRE practices.
Architecture Walkthrough: Develop a clear and concise architecture walkthrough that highlights your approach to system design, incident management, and cloud migration strategies.
User Experience Showcase: Demonstrate your ability to balance technical requirements with user experience considerations, highlighting your commitment to driving platform reliability and performance.

📌 Application Steps

To apply for this Lead Site Reliability Engineer (SRE) position at Social Links:

Customize Your Portfolio: Highlight your experience with on-prem infrastructure management, cloud migration, and SRE practices. Include relevant projects, case studies, and testimonials that demonstrate your technical expertise and leadership potential.
Optimize Your Resume: Tailor your resume to the specific requirements of the Lead SRE role at Social Links, emphasizing your experience with Linux, AWS, and SRE practices. Include relevant keywords and highlight your leadership potential.
Prepare for Technical Challenges: Brush up on your Linux, AWS, and SRE skills, focusing on system design, incident management, and cloud migration strategies. Practice live coding exercises and architecture discussions to prepare for the on-site technical deep dive.
Research the Company: Familiarize yourself with Social Links' products, services, and company culture. Prepare for behavioral and cultural fit interviews by reflecting on your leadership experiences, mentorship strategies, and incident management approaches.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and web technology industry-standard assumptions. All details should be verified directly with the Social Links hiring organization before making application decisions.

ATS Keywords:

Programming Languages:

Bash
Python (nice to have)

Web Frameworks:

Not applicable for this role

Server Technologies:

Linux
AWS (EC2, IAM, VPC, EKS, S3, CloudWatch, Route53)
On-prem infrastructure (VPNs, networking, firewalls, Zabbix)

Databases:

Not applicable for this role

Tools:

GitHub Actions
GitLab CI
ArgoCD
Helm
Kustomize
Ansible (nice to have)
Terraform (nice to have)
OpenFaaS (nice to have)
Vault, OPA, RBAC, and Zero Trust architectures (nice to have)

Methodologies:

Agile/Scrum methodologies
SRE practices (SLO/SLA management, incident response, postmortems, alerting policies)
Infrastructure as Code (IaC)
Continuous Integration/Continuous Deployment (CI/CD)

Soft Skills:

Leadership and mentorship
Team management
Incident management
Strategic foresight
Technical communication
Collaboration and knowledge sharing
User experience design and interface development
Problem-solving and troubleshooting
System design and architecture
Cloud migration strategies
Infrastructure automation and self-healing mechanisms

Industry Terms:

Site Reliability Engineering (SRE)
Infrastructure as Code (IaC)
Continuous Integration/Continuous Deployment (CI/CD)
Incident Management
Postmortems
Alerting Policies
Service Level Objectives (SLOs)
Service Level Agreements (SLAs)
Error Budgets
Cloud Migration
On-Prem Infrastructure
AWS Services (EC2, IAM, VPC, EKS, S3, CloudWatch, Route53)
Monitoring Tools (Zabbix, Prometheus, Grafana, Loki, Tempo)
DevOps and SysOps
Server Management
System Administration
Web Infrastructure
Reliability Architecture
Cloud-Native Infrastructure
Scalability and Efficiency
Security and Compliance
Open Data Technologies
Open Intelligence Platform
AI Agents and Pipelines
Customizable Workflows
OSINT (Open-Source Intelligence)
SOC2, ISO 27001, GDPR compliance practices
Zero Trust architectures
Mentorship and Leadership Development
Technical Skill Development
Emerging Technologies and Best Practices

SRE Lead