Senior Site Reliability Engineer

Wakapi
Full-time · Mendoza, Argentina

📍 Job Overview

  • Job Title: Senior Site Reliability Engineer
  • Company: Wakapi
  • Location: Mendoza, Mendoza, Argentina
  • Job Type: Full-Time
  • Category: DevOps & Infrastructure
  • Date Posted: 2025-03-19
  • Experience Level: 5-10 years
  • Remote Status: On-site

🚀 Role Summary

  • Key Responsibilities: Design, implement, and maintain scalable and highly available systems, ensure observability through metrics and alerting, collaborate with cross-functional teams to enhance platform engineering practices, and improve system performance and reliability.
  • Key Skills: Site Reliability Engineering, DevOps, Service Level Management, Infrastructure-as-Code, Terraform, Monitoring, Logging, Observability, New Relic, Prometheus, Grafana, Datadog, Kafka, AWS, CI/CD, Analytical Skills, Communication Skills.

💻 Primary Responsibilities

🌟 Scalability and High Availability

  • Design, implement, and maintain scalable and highly available systems using load balancing, auto-scaling patterns, canary releases, and blue-green deployments.
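As a rough illustration of the canary-release pattern named above, the sketch below sends a fixed percentage of users to a canary deployment by hashing an identifier, so each user consistently lands on the same side. The function name, the 100-bucket scheme, and the use of a user ID as the routing key are assumptions made for this example, not a description of Wakapi's setup.

```python
import hashlib


def route(user_id: str, canary_percent: int) -> str:
    """Return "canary" or "stable" for a user (hypothetical helper).

    The user ID is hashed into one of 100 buckets; users whose bucket
    number is below canary_percent are routed to the canary deployment.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "canary" if bucket < canary_percent else "stable"


if __name__ == "__main__":
    # Illustrative rollout: roughly 10% of users see the canary.
    users = [f"user-{i}" for i in range(1000)]
    share = sum(route(u, 10) == "canary" for u in users) / len(users)
    print(f"canary share: {share:.1%}")
```

Because the assignment depends only on the hash, raising the percentage keeps every user who was already on the canary there, which tends to make a gradual rollout easier to reason about.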

📈 Monitoring, Logging, and Observability

  • Develop and maintain monitoring and logging dashboards using tools like New Relic, Prometheus, Grafana, and Datadog.
  • Ensure observability through metrics, tracing, log aggregation, and alerting.

🛡️ Alerting and Automation

  • Help teams determine the right settings and thresholds for triggering alerts or automations on their applications.
  • Understand that each application has different performance requirements, such as varying acceptable response times or resource constraints.
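A minimal sketch of what per-application thresholds might look like in practice follows. The application names, the threshold values, and the AlertPolicy structure are all hypothetical; the point is only that the same measurement can be acceptable for one service and alert-worthy for another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertPolicy:
    """Hypothetical per-application alert thresholds."""
    max_p95_latency_ms: float
    max_cpu_utilization: float  # fraction between 0.0 and 1.0


# Illustrative values only: a latency-sensitive API and a tolerant batch job.
POLICIES = {
    "checkout-api": AlertPolicy(max_p95_latency_ms=300, max_cpu_utilization=0.80),
    "batch-reports": AlertPolicy(max_p95_latency_ms=5000, max_cpu_utilization=0.95),
}


def breached(app: str, p95_latency_ms: float, cpu_utilization: float) -> list[str]:
    """Return the names of the thresholds that the measurements exceed."""
    policy = POLICIES[app]
    reasons = []
    if p95_latency_ms > policy.max_p95_latency_ms:
        reasons.append("latency")
    if cpu_utilization > policy.max_cpu_utilization:
        reasons.append("cpu")
    return reasons


if __name__ == "__main__":
    # The same 450 ms reading breaches one policy and not the other.
    print(breached("checkout-api", 450, 0.50))
    print(breached("batch-reports", 450, 0.50))
```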

📊 System Performance and Reliability

  • Monitor and optimize system reliability and performance, using tools like New Relic to track DORA metrics that measure and improve development and operational performance.
  • Ensure compliance with Service Level Management (SLM) targets, such as SLAs and SLOs, by tracking SLIs like uptime, response times, and resolution times.
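To make the relationship between an SLI and an SLO concrete, here is a small sketch that computes an availability SLI from request counts and the share of the error budget still unspent. The function names and the example figures are assumptions for illustration; real targets would come from the agreed SLOs for each service.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Fraction of requests that succeeded; 1.0 when there was no traffic."""
    if total_requests == 0:
        return 1.0
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo: float) -> float:
    """Share of the error budget left, where the budget is 1 - slo.

    1.0 means nothing has been spent, 0.0 means the budget is exhausted,
    and a negative value means the SLO has been missed.
    """
    allowed_failure = 1.0 - slo
    if allowed_failure <= 0:
        raise ValueError("slo must be below 1.0 to leave an error budget")
    observed_failure = 1.0 - sli
    return 1.0 - observed_failure / allowed_failure


if __name__ == "__main__":
    # Illustrative numbers: 500 failures in one million requests, 99.9% SLO.
    sli = availability_sli(999_500, 1_000_000)
    print(f"SLI: {sli:.4%}")
    print(f"Error budget remaining: {error_budget_remaining(sli, 0.999):.1%}")
```

Tracking the remaining budget rather than the raw SLI can make it easier to decide when to slow releases in favor of reliability work.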

💥 Resiliency

  • Implement and advocate for "Chaos" engineering practices to ensure system resiliency.

🤝 Collaboration

  • Work with cross-functional teams to enhance platform engineering practices and gather the right information for metrics analysis.

🎓 Skills & Qualifications

📚 Education

  • Relevant degree in Computer Science, Engineering, or a related field (or equivalent experience).

🕒 Experience

  • Proven experience working with Infrastructure-as-Code tooling, like Terraform, for infrastructure management.
  • Strong understanding of scalability and high availability patterns, including load balancing, auto-scaling, canary releases, and blue-green deployments.
  • Strong understanding of DevOps metrics (like DORA) and their application in measuring and improving development and operational performance.
  • Strong understanding of Service Level Management (SLM) metrics (like SLAs, SLOs, and SLIs) and their importance in defining, monitoring, and ensuring compliance for the services bound to them.
  • Experience with monitoring, logging, and observability tools like New Relic, Prometheus, Grafana, and Datadog.
  • Experience working with Kafka and improving performance of event-driven, real-time data processing and streaming projects and architectures.
  • Familiarity with tooling used for SLM, DevOps, and DORA metrics like Apache DevLake, Grafana, and New Relic.
  • Experience working with AWS, Azure, or GCP for cloud infrastructure management.
  • Experience working with CI/CD pipeline tools such as GitHub Actions, Jenkins, GitLab CI, or similar.
  • Analytical skills, including the ability to analyze and interpret metrics to drive improvements.
  • Strong communication skills to effectively collaborate with team members and stakeholders.

🛠️ Required Skills

  • Infrastructure-as-Code tooling (e.g., Terraform)
  • Scalability and high availability patterns
  • DevOps metrics (DORA)
  • Service Level Management (SLM) metrics (SLAs, SLOs, SLIs)
  • Monitoring, logging, and observability tools (e.g., New Relic, Prometheus, Grafana, Datadog)
  • Kafka experience
  • Cloud infrastructure management (AWS, Azure, GCP)
  • CI/CD pipeline tools (e.g., GitHub Actions, Jenkins, GitLab CI)
  • Analytical and communication skills

🌟 Preferred Skills

  • Familiarity with Observability-as-Code tooling and practices.
  • Familiarity with "Chaos" engineering practices for system resiliency.

📊 Web Portfolio & Project Requirements

  • Portfolio Essentials:

    • Demonstrate experience with Infrastructure-as-Code tooling, such as Terraform, by showcasing projects where you designed, implemented, and maintained scalable and highly available systems.
    • Highlight your monitoring and logging skills by presenting dashboards and reports created using tools like New Relic, Prometheus, Grafana, and Datadog.
    • Showcase your alerting and automation capabilities by providing examples of how you've helped teams determine the right settings and thresholds for triggering alerts or automations on their applications.
    • Display your system performance and reliability improvements by presenting metrics and data that demonstrate how you've applied DORA metrics and ensured compliance with SLM metrics like SLAs, SLOs, and SLIs.
  • Technical Documentation:

    • Provide code examples and documentation that showcase your proficiency with Infrastructure-as-Code tooling, monitoring, logging, and observability tools, and CI/CD pipeline tools.
    • Include any relevant certifications or training that demonstrate your expertise in Site Reliability Engineering, DevOps, and related technologies.

💵 Compensation & Benefits

Salary Range: The estimated salary range for a Senior Site Reliability Engineer in Mendoza, Argentina, is ARS 250,000 - ARS 350,000 per month (USD 2,250 - USD 3,150). This range is based on regional market data and industry standards for similar roles.

Benefits:

  • Competitive salary package
  • Health, dental, and vision insurance
  • Retirement plan contributions
  • Generous vacation and time-off policies
  • Professional development opportunities
  • Company-sponsored events and team-building activities

Working Hours: Full-time position with standard working hours, Monday through Friday, from 9:00 AM to 6:00 PM. Flexible hours and remote work options may be available for specific projects or team needs.

🎯 Team & Company Context

🏢 Company Culture

Industry: Wakapi operates in the e-commerce industry, focusing on providing a seamless and secure online shopping experience for customers.

Company Size: Wakapi is a mid-sized company with a growing team of dedicated professionals. As a Senior Site Reliability Engineer, you will play a critical role in ensuring the stability, performance, and scalability of our platform.

Founded: Wakapi was founded in 2015 and has since grown to become a leading online retailer in Argentina.

Team Structure:

  • The Platform Engineering team consists of experienced Site Reliability Engineers, DevOps Engineers, and Infrastructure Engineers who work collaboratively to ensure the reliability and performance of our platform.
  • The team follows an Agile/Scrum methodology, with regular sprint planning, code reviews, and continuous integration and deployment processes.

Development Methodology:

  • Wakapi follows Agile/Scrum methodologies for software development, with a focus on iterative development, continuous improvement, and customer satisfaction.
  • The team uses tools like Jira, Confluence, and GitHub for project management, collaboration, and version control.
  • Wakapi employs a microservices architecture, with each component of the system running in its own process and communicating with lightweight mechanisms, such as HTTP/REST.

Company Website: Wakapi

📈 Career & Growth Analysis

Web Technology Career Level: Senior Site Reliability Engineer roles require a high level of expertise in infrastructure management, monitoring, and observability. In this position, you will be responsible for designing, implementing, and maintaining scalable and highly available systems, as well as ensuring the reliability and performance of our platform.

Reporting Structure: As a Senior Site Reliability Engineer, you will report directly to the Head of Platform Engineering. You will work closely with other engineers, developers, and stakeholders to ensure the stability, performance, and scalability of our platform.

Technical Impact: In this role, you will have a significant impact on the reliability, performance, and scalability of Wakapi's platform. Your work will directly contribute to enhancing the user experience and driving business growth.

Growth Opportunities:

  • Technical Leadership: As a senior member of the Platform Engineering team, you will have the opportunity to mentor junior engineers and contribute to the development of best practices and standards for infrastructure management, monitoring, and observability.
  • Architecture Decisions: You will play a crucial role in making critical architecture decisions that will shape the future of Wakapi's platform and drive business growth.
  • Emerging Technologies: Wakapi is committed to staying at the forefront of technological innovation. As a Senior Site Reliability Engineer, you will have the opportunity to work with emerging technologies and drive their adoption within the organization.

🌐 Work Environment

Office Type: Wakapi's office is a modern, collaborative workspace designed to foster creativity and productivity. The office features open-plan workspaces, meeting rooms, and breakout areas for team discussions and informal gatherings.

Office Location(s): Wakapi's headquarters are located in Mendoza, Argentina, with additional offices in Buenos Aires and Rosario.

Workspace Context:

  • Collaborative Workspace: Wakapi's office layout encourages collaboration and communication among team members, with open-plan workspaces and dedicated team areas.
  • Development Tools: Wakapi provides its engineers with access to the latest development tools, multiple monitors, and testing devices to ensure optimal productivity and performance.
  • Cross-Functional Collaboration: Wakapi fosters a culture of cross-functional collaboration, with regular interactions between engineering, design, marketing, and business teams.

Work Schedule: Wakapi operates on a standard business hours schedule, Monday through Friday, from 9:00 AM to 6:00 PM. Flexible hours and remote work options may be available for specific projects or team needs.

📄 Application & Technical Interview Process

📝 Interview Process

  1. Technical Assessment: Candidates will be required to complete a technical assessment, focusing on their proficiency with Infrastructure-as-Code tooling, monitoring, logging, and observability tools, as well as their understanding of scalability and high availability patterns.
  2. System Design Discussion: Candidates will be asked to discuss their approach to designing scalable and highly available systems, as well as their experience with event-driven infrastructure projects using tools like Terraform, New Relic, Kubernetes, AWS, and Kafka.
  3. Team Fit Assessment: Candidates will participate in a team fit assessment, where they will have the opportunity to meet with members of the Platform Engineering team and discuss their cultural fit and alignment with Wakapi's values.
  4. Final Evaluation: Candidates will participate in a final evaluation, where they will be assessed on their technical skills, problem-solving abilities, and cultural fit.

📝 Portfolio Review Tips

  • Portfolio Structure: Organize your portfolio to highlight your experience with Infrastructure-as-Code tooling, monitoring, logging, and observability tools, as well as your understanding of scalability and high availability patterns.
  • Case Studies: Include detailed case studies that demonstrate your ability to design, implement, and maintain scalable and highly available systems, as well as your experience with event-driven infrastructure projects using tools like Terraform, New Relic, Kubernetes, AWS, and Kafka.
  • Code Quality: Showcase your proficiency with Infrastructure-as-Code tooling by providing clean, well-documented code examples that demonstrate your ability to design, implement, and maintain scalable and highly available systems.
  • Performance Optimization: Highlight your experience with performance optimization techniques, such as load balancing, auto-scaling, canary releases, and blue-green deployments, as well as your ability to monitor and optimize system performance using tools like New Relic.

🛠️ Technical Challenge Preparation

  • Technical Assessment: Familiarize yourself with the technical assessment format and practice solving problems related to Infrastructure-as-Code tooling, monitoring, logging, and observability tools, as well as scalability and high availability patterns.
  • System Design: Brush up on your system design skills and practice discussing your approach to designing scalable and highly available systems, as well as your experience with event-driven infrastructure projects using tools like Terraform, New Relic, Kubernetes, AWS, and Kafka.
  • Problem-Solving: Hone your problem-solving skills and practice articulating your thought process and technical explanations for complex infrastructure challenges.

💡 ATS Keywords

  • Programming & IaC Languages: Terraform, Bash, Python, Go, Java
  • Cloud & Orchestration Platforms: Kubernetes, AWS, GCP, Azure
  • Observability & Streaming: New Relic, Prometheus, Grafana, Datadog, Kafka
  • Databases: PostgreSQL, MySQL, MongoDB
  • Tools: Jenkins, GitHub Actions, GitLab CI, Apache DevLake
  • Methodologies: Agile, Scrum, DORA, SLM, Infrastructure-as-Code
  • Soft Skills: Analytical, Communication, Collaboration, Problem-Solving, Leadership
  • Industry Terms: Site Reliability Engineering, DevOps, Infrastructure Management, Monitoring, Logging, Observability, Scalability, High Availability, Event-Driven Architecture, Chaos Engineering

🛠️ Technology Stack & Web Infrastructure

💻 Frontend Technologies

  • Wakapi's frontend is built using modern web development practices and frameworks, such as React, Redux, and Next.js.
  • The team follows a mobile-first approach and ensures that the platform is responsive and accessible across various devices and screen sizes.
  • Wakapi's design system is built using Storybook and ensures consistency and reusability across the platform.

🔧 Backend & Server Technologies

  • Wakapi's backend is built using a microservices architecture, with each component of the system running in its own process and communicating with lightweight mechanisms, such as HTTP/REST.
  • The team uses modern languages and runtimes, such as Node.js, Python, and Go, to build scalable and maintainable services.
  • Wakapi's backend services are containerized using Docker and orchestrated using Kubernetes.
  • The team uses AWS for cloud infrastructure management and leverages services like EC2, RDS, and S3 to ensure the scalability, availability, and security of the platform.

🛠️ Development & DevOps Tools

  • Wakapi uses GitHub for version control and collaborative development.
  • The team employs CI/CD pipelines to automate the build, test, and deployment process, ensuring fast and reliable releases.
  • Wakapi uses tools like New Relic, Prometheus, and Datadog for monitoring, logging, and observability, ensuring the reliability and performance of the platform.
  • The team uses infrastructure-as-code tools like Terraform to manage and provision cloud resources, ensuring consistency and automation in the infrastructure management process.

👥 Team Culture & Values

🌟 Web Development Values

  • User-Centric: Wakapi prioritizes the user experience and strives to create a seamless and intuitive online shopping experience for customers.
  • Performance Optimization: The team is committed to optimizing the performance of the platform, ensuring fast and reliable user experiences across various devices and network conditions.
  • Code Quality: Wakapi values clean, well-documented, and maintainable code, ensuring the long-term sustainability and scalability of the platform.
  • Collaboration: Wakapi fosters a culture of collaboration and communication, with regular interactions between engineering, design, marketing, and business teams.

🤝 Collaboration Style

  • Cross-Functional Integration: Wakapi encourages collaboration between different teams, with regular interactions between engineering, design, marketing, and business teams.
  • Code Review Culture: The team values code reviews and pair programming practices, ensuring knowledge sharing and continuous learning among team members.
  • Knowledge Sharing: Wakapi encourages team members to share their knowledge and expertise with others, fostering a culture of continuous learning and growth.

⚡️ Challenges & Growth Opportunities

🌟 Technical Challenges

  • Scalability and High Availability: Design, implement, and maintain scalable and highly available systems using load balancing, auto-scaling patterns, canary releases, and blue-green deployments.
  • Monitoring, Logging, and Observability: Develop and maintain monitoring and logging dashboards using tools like New Relic, Prometheus, Grafana, and Datadog. Ensure observability through metrics, tracing, log aggregation, and alerting.
  • Alerting and Automation: Help teams determine the right settings and thresholds for triggering alerts or automations on their applications. Understand that each application has different performance requirements, such as varying acceptable response times or resource constraints.
  • System Performance and Reliability: Monitor, optimize, and ensure system reliability and performance using tools like New Relic to apply DORA metrics and measure and improve development and operational performance. Ensure compliance with SLM metrics like SLAs, SLOs, and SLIs by tracking uptime, response times, and resolution times.
  • Resiliency: Implement and advocate for "Chaos" engineering practices to ensure system resiliency.

🌱 Learning & Development Opportunities

  • Web Technology Skill Advancement: Wakapi encourages its engineers to stay up-to-date with the latest web technologies and trends. The company provides opportunities for professional development, including conference attendance, certification, and community involvement.
  • Conference Attendance: Wakapi supports its engineers' participation in relevant conferences and events, providing an opportunity to learn from industry experts and network with other professionals.
  • Certification: Wakapi encourages its engineers to pursue relevant certifications, such as those offered by AWS, Google Cloud, or Microsoft Azure, to demonstrate their expertise and commitment to continuous learning.
  • Technical Mentorship: Wakapi fosters a culture of knowledge sharing and mentorship, with senior engineers providing guidance and support to junior team members.
  • Leadership Development: Wakapi offers opportunities for engineers to develop their leadership skills, with roles in technical mentoring, architecture decision-making, and team management.

💡 Interview Preparation

📝 Technical Questions

  • Web Fundamentals: Brush up on your knowledge of web development fundamentals, including HTML, CSS, and JavaScript, as well as your proficiency with modern web development practices and frameworks, such as React, Redux, and Next.js.
  • Web Architecture: Familiarize yourself with web architecture principles and best practices, including microservices architecture, event-driven infrastructure, and containerization using Docker and Kubernetes.

📝 Company & Culture Questions

  • Company Culture: Research Wakapi's company culture and values, and be prepared to discuss how your personal values and work style align with the organization.
  • Web Development Methodology: Familiarize yourself with Wakapi's web development methodology, including Agile/Scrum practices, code review processes, and continuous integration and deployment strategies.
  • User Experience Impact: Prepare to discuss your approach to designing and implementing user-centric features that enhance the online shopping experience for customers.

📝 Portfolio Presentation Strategy

  • Live Website Demonstration: Prepare a live demonstration of your portfolio, showcasing your experience with Infrastructure-as-Code tooling, monitoring, logging, and observability tools, as well as your understanding of scalability and high availability patterns.
  • Code Explanation: Be prepared to explain your code and architecture decisions, demonstrating your ability to design, implement, and maintain scalable and highly available systems.
  • User Experience Showcase: Highlight your experience with performance optimization techniques, such as load balancing, auto-scaling, canary releases, and blue-green deployments, as well as your ability to monitor and optimize system performance using tools like New Relic.

📌 Application Steps

To apply for this Senior Site Reliability Engineer position at Wakapi:

  1. Customize Your Portfolio: Tailor your portfolio to highlight your experience with Infrastructure-as-Code tooling, monitoring, logging, and observability tools, as well as your understanding of scalability and high availability patterns.
  2. Optimize Your Resume: Update your resume to emphasize your relevant skills and experiences, focusing on your proficiency with Infrastructure-as-Code tooling, monitoring, logging, and observability tools, as well as your understanding of scalability and high availability patterns.
  3. Prepare for Technical Interview: Brush up on your technical skills and practice solving problems related to Infrastructure-as-Code tooling, monitoring, logging, and observability tools, as well as scalability and high availability patterns.
  4. Research Wakapi: Learn about Wakapi's company culture, values, and web development methodology, and be prepared to discuss how your personal values and work style align with the organization.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and web development/server administration industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.


Application Requirements

Proven experience with Infrastructure-as-Code tooling and a strong understanding of scalability and high availability patterns are essential. Familiarity with monitoring tools and cloud infrastructure management is also required.