Mid Level System Reliability Engineer (SRE)

Valtech
Full-time

📍 Job Overview

  • Job Title: Mid Level System Reliability Engineer (SRE)
  • Company: Valtech
  • Location: Colombia - Remote
  • Job Type: Full-Time
  • Category: DevOps
  • Date Posted: July 23, 2025

🚀 Role Summary

  • Key Responsibilities:

    • Maintain systems for observability
    • Adjust and maintain SLOs
    • Participate in and facilitate incident resolution
    • Work on proactive improvements to increase reliability
    • Collaborate with teams to analyze failure scenarios and mitigations
    • Assist in creating runbooks for remediation or prevention
    • Reduce toil (work that adds no value)
  • Key Qualifications:

    • 2+ years of experience in DevOps, Support Engineering, or Site Reliability Engineering
    • Knowledge of SLOs and incident resolution
    • Familiarity with automation systems and monitoring tools
    • Excellent command of English
    • Experience with incident management on high-impact platforms
    • Familiarity with corporate environments and international settings

💻 Primary Responsibilities

  • Observability and SLO Maintenance:

    • Maintain systems for observability to monitor performance and detect anomalies
    • Adjust and maintain SLOs to ensure reliability and meet business expectations
  • Incident Resolution and Proactive Improvements:

    • Participate in and facilitate incident resolution, including on-call duty (max 1 shift per month)
    • Work on proactive improvements to increase the reliability of managed platforms
    • Collaborate with teams to analyze anticipated or actual failure scenarios and possible mitigations
    • Assist in creating runbooks to remediate or prevent failure scenarios
  • Non-Value Work Reduction:

    • Identify and reduce work that does not add value, focusing on automation and efficiency

🎓 Skills & Qualifications

Education: A relevant degree or equivalent experience in computer science, software engineering, or a related field.

Experience: 2+ years of experience in DevOps, Support Engineering, or Site Reliability Engineering.

Required Skills:

  • Experience with Site Reliability Engineering (SRE) principles and practices
  • Knowledge of SLOs (Service Level Objectives) and incident management
  • Familiarity with automation systems (e.g., GitHub Actions, Azure DevOps, Ansible, Chef)
  • Proficiency in scripting and programming
  • Experience with monitoring systems (e.g., Datadog, New Relic, Dynatrace, Prometheus, Grafana)
  • Familiarity with pipelining tools (e.g., GitHub Actions, Azure DevOps, GitLab, Jenkins)
  • Knowledge of Docker and serverless services in public cloud providers (AWS, Azure, GCP)
  • Excellent command of English, both written and spoken (at least B2)

Preferred Skills:

  • Experience with high-traffic, 24x7 production environments
  • Familiarity with serverless architectures and infrastructure as code (IaC) tools
  • Knowledge of CI/CD pipelines and deployment strategies
  • Experience with container orchestration platforms (e.g., Kubernetes, Amazon ECS, Google Kubernetes Engine)

📊 Web Portfolio & Project Requirements

Portfolio Essentials:

  • Demonstrate experience with incident management on high-impact platforms
  • Showcase proficiency in scripting and automation with relevant projects
  • Highlight experience with monitoring systems and SLO maintenance
  • Include examples of non-value work reduction and process improvement

Technical Documentation:

  • Provide clear and concise documentation for your projects, covering code quality standards, configuration management, and deployment processes
  • Include testing methodologies, performance metrics, and optimization techniques

💵 Compensation & Benefits

Salary Range: The salary range for this role is not specified in the job description. Based on industry standards for a mid-level SRE role in Colombia, the estimated range is COP 4,000,000 - 6,000,000 per month (approximately USD 1,100 - 1,600 per month).

Benefits:

  • Flexibility, with remote and hybrid work options
  • Career advancement opportunities with international mobility and professional development programs
  • Learning and development opportunities with access to cutting-edge tools, training, and industry experts

Working Hours: Full-time position with a standard 40-hour workweek. Working hours may vary based on project deadlines and maintenance windows.

🎯 Team & Company Context

🏢 Company Culture

Industry: Valtech operates in the experience innovation industry, focusing on digital transformation for leading brands.

Company Size: Valtech is a global company employing over 5,000 people worldwide.

Founded: Valtech was founded in 1993 and has since grown to become a leading experience innovation company.

Team Structure:

  • The SRE team works closely with multidisciplinary delivery teams, focusing on system reliability and stability
  • The team consists of SREs at various levels: intermediate SREs serve as an escalation point for associate SREs and are backed up by senior colleagues

Development Methodology:

  • DevOps is central to how Valtech works: teams are responsible for keeping everyone focused on what happens in production
  • The company uses Agile methodologies, with sprint planning and regular code reviews to ensure quality and collaboration

Company Website: Valtech.com

📈 Career & Growth Analysis

Web Technology Career Level: This role is an intermediate-level position, focusing on system reliability, incident management, and SLO maintenance.

Reporting Structure: The intermediate SRE reports to a senior SRE or team lead, collaborating with delivery teams and other SREs to ensure system reliability and stability.

Technical Impact: The intermediate SRE's work directly influences the reliability and performance of Valtech's managed platforms, ensuring high availability and minimal downtime.

Growth Opportunities:

  • Technical Growth: Develop expertise in SRE principles, incident management, and automation, with opportunities to specialize in specific areas or take on more complex challenges
  • Leadership Potential: Gain experience in mentoring junior SREs, leading projects, and driving technical decision-making processes
  • International Opportunities: With Valtech's global presence, there are opportunities to work on international projects and collaborate with teams across different time zones

🌐 Work Environment

Office Type: Valtech offers a flexible work environment, with remote and hybrid work options available.

Office Location(s): Valtech has offices in over 50 countries, with the option to work remotely or visit any of the global offices.

Workspace Context:

  • Valtech provides a collaborative workspace, with access to necessary tools, multiple monitors, and testing devices
  • The company encourages knowledge sharing, technical mentoring, and continuous learning, fostering a culture of growth and innovation

Work Schedule: The work schedule for this role is flexible, with a standard 40-hour workweek and the possibility of working outside of regular business hours for incident resolution and maintenance tasks.

📄 Application & Technical Interview Process

Interview Process:

  1. Application Review: Valtech's Talent Acquisition team will review your application and reach out if your skills and experience align with the role
  2. Phone or Video Screen: A brief call or video conference to discuss your background, experience, and motivation for the role
  3. Technical Assessment: A hands-on technical assessment, focusing on your SRE skills, incident management experience, and automation proficiency
  4. Final Interview: A final interview with the hiring manager or a panel of team members to discuss your fit for the role, team dynamics, and career growth opportunities

Portfolio Review Tips:

  • Highlight your experience with incident management, SLO maintenance, and automation with relevant projects and case studies
  • Include examples of your technical skills, scripting proficiency, and problem-solving abilities
  • Showcase your understanding of monitoring systems, performance optimization, and user experience considerations

Technical Challenge Preparation:

  • Brush up on your knowledge of SRE principles, incident management best practices, and automation tools
  • Practice coding challenges and incident response scenarios to build your confidence and problem-solving skills
  • Familiarize yourself with Valtech's company culture, values, and mission to demonstrate your fit for the role

ATS Keywords (organized by category):

  • Programming Languages: Python, Bash, JavaScript, Go, PowerShell
  • Web Frameworks & Libraries: Not specified (focus on SRE skills and incident management)
  • Server Technologies: Linux, Windows, AWS, Azure, GCP, Docker, Kubernetes
  • Databases: Not specified (focus on SRE skills and incident management)
  • Tools: Git, GitHub, Jira, Confluence, Jenkins, Ansible, Chef, Puppet, Terraform, CloudFormation
  • Methodologies: Agile, DevOps, ITIL, SRE, Site Reliability Engineering, Incident Management
  • Soft Skills: Problem-solving, communication, teamwork, adaptability, time management
  • Industry Terms: SLO, SLA, MTBF, MTTR, MTTD, MTTA, ATO, Toil, Chaos Engineering, Canary Releases, Blue/Green Deployments

📝 Enhancement Note: Valtech's interview process may vary based on the specific role and team dynamics. Be prepared for a thorough evaluation of your SRE skills, incident management experience, and cultural fit.

🛠 Technology Stack & Web Infrastructure

Frontend Technologies: Not applicable (focus on SRE skills and incident management)

Backend & Server Technologies:

  • Linux and Windows operating systems
  • AWS, Azure, and GCP cloud platforms
  • Docker and containerization
  • Kubernetes and container orchestration
  • Serverless architectures and infrastructure as code (IaC) tools

Development & DevOps Tools:

  • Git version control system
  • GitHub for collaborative development and code review
  • Jira and Confluence for project management and documentation
  • Jenkins, Ansible, Chef, and Puppet for automation and deployment
  • Terraform and CloudFormation for infrastructure as code (IaC) and cloud resource management
  • Monitoring tools (e.g., Datadog, New Relic, Dynatrace, Prometheus, Grafana) for system performance tracking and alerting

📝 Enhancement Note: Valtech's technology stack may vary based on the specific project and team requirements. Familiarize yourself with the company's preferred tools and platforms to demonstrate your adaptability and technical proficiency.

👥 Team Culture & Values

Web Development Values:

  • User Experience: Focus on user needs, accessibility, and performance optimization
  • Collaboration: Work closely with multidisciplinary teams, including designers, developers, and product managers
  • Continuous Learning: Stay up-to-date with emerging technologies, best practices, and industry trends
  • Innovation: Embrace new ideas, take calculated risks, and challenge conventional thinking
  • Quality: Prioritize code quality, testing, and quality assurance processes to ensure reliable and maintainable systems

Collaboration Style:

  • Cross-functional Integration: Work closely with designers, developers, and other teams to ensure a seamless user experience and efficient workflows
  • Code Review Culture: Participate in code reviews to maintain code quality, share knowledge, and foster a collaborative environment
  • Knowledge Sharing: Actively share your expertise with team members by mentoring junior colleagues and contributing to internal knowledge bases

📝 Enhancement Note: Valtech's team culture may vary based on the specific team and project dynamics. Be prepared to adapt to different working styles and collaborate effectively with diverse teams.

🌐 Challenges & Growth Opportunities

Technical Challenges:

  • Incident Management: Develop expertise in incident management, focusing on high-impact incidents and complex system failures
  • Chaos Engineering: Embrace chaos engineering principles to build resilient systems and improve incident response processes
  • Performance Optimization: Continuously optimize system performance, focusing on resource utilization, scalability, and user experience
  • Emerging Technologies: Stay up-to-date with emerging technologies and adapt your skills to support new tools and platforms

Learning & Development Opportunities:

  • Technical Skill Development: Deepen your expertise in SRE principles, incident management, and automation with targeted training and certifications
  • Conference Attendance: Attend industry conferences, webinars, and workshops to expand your knowledge and network with other professionals
  • Technical Mentoring: Seek mentorship from senior SREs, team leads, or industry experts to gain insights into best practices, emerging trends, and career growth strategies

📝 Enhancement Note: Valtech's learning and development opportunities may vary based on the specific role and team dynamics. Be proactive in seeking out growth opportunities and taking initiative to expand your skillset.

💡 Interview Preparation

Technical Questions:

  • SRE Fundamentals: Demonstrate a solid understanding of SRE principles, incident management best practices, and automation tools
  • Incident Response: Walk through real-life incident scenarios, explaining your approach to diagnosis, resolution, and post-mortem analysis
  • System Design: Discuss system design considerations, trade-offs, and decision-making processes for high-availability and fault-tolerant systems

Company & Culture Questions:

  • Company Values: Explain how your personal values align with Valtech's company values and culture
  • Team Dynamics: Describe your experience working in cross-functional teams and your approach to collaboration and communication
  • Adaptability: Share examples of your ability to adapt to new technologies, tools, and working environments

Portfolio Presentation Strategy:

  • Incident Case Studies: Present detailed case studies of your incident management experience, highlighting your problem-solving skills, technical proficiency, and user experience considerations
  • Automation Projects: Showcase your automation projects, demonstrating your ability to streamline processes, reduce toil, and improve system reliability
  • Monitoring & Alerting: Highlight your experience with monitoring systems, alerting strategies, and performance optimization techniques

📝 Enhancement Note: Valtech's interview process may vary based on the specific role and team dynamics. Be prepared to adapt your interview strategy to address the unique requirements and cultural fit expectations of the role and team.

📌 Application Steps

To apply for this mid-level System Reliability Engineer (SRE) position at Valtech:

  1. Customize Your Application: Tailor your resume and cover letter to highlight your SRE skills, incident management experience, and technical proficiency
  2. Prepare Your Portfolio: Showcase your incident management case studies, automation projects, and monitoring strategies, focusing on user experience and performance optimization
  3. Research Valtech: Familiarize yourself with Valtech's company culture, values, and mission to demonstrate your fit for the role and team dynamics
  4. Practice Technical Challenges: Brush up on your SRE skills, incident management best practices, and automation tools to build your confidence and problem-solving skills
  5. Prepare for the Interview: Review the job description, company website, and any additional resources provided to ensure you are well-prepared for the interview process and technical assessments

📝 Important Notice: This enhanced job description includes AI-generated insights and web technology industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.

Application Requirements

You should have 2 or more years of experience in DevOps, Support Engineering, or Site Reliability Engineering, with knowledge of SLOs and incident resolution. Familiarity with automation systems and monitoring tools is essential, along with a good command of English.