MLOps Engineer (SRE)
📍 Job Overview
- Job Title: MLOps Engineer (SRE)
- Company: Metova
- Location: Mexico (Remote)
- Job Type: Full-Time
- Category: DevOps & Infrastructure
- Date Posted: August 1, 2025
🚀 Role Summary
- 📝 Enhancement Note: This role focuses on MLOps engineering with a strong emphasis on Site Reliability Engineering (SRE) practices, making it an excellent fit for experienced professionals looking to apply their skills in machine learning operations and ensure the reliability, performance, and stability of ML models in production.
- A leading accounting software company in Mexico is seeking a highly skilled MLOps Engineer (SRE) to join their team. This role involves designing and operating observability solutions for ML models in production, collaborating with data science and product teams, and applying SRE practices to ensure model reliability and performance.
💻 Primary Responsibilities
- 📝 Enhancement Note: The primary responsibilities of this role revolve around designing and implementing robust monitoring, alerting, and traceability solutions for ML models in production, as well as collaborating with cross-functional teams to detect and mitigate incidents related to models in production.
- 📝 Enhancement Note: This role requires a strong background in SRE, DevOps, or Platform Engineering with a focus on ML projects, as well as proficiency in various tools and technologies used for ML model monitoring and observability.
- 📝 Enhancement Note: The ideal candidate will have experience in high-transaction environments and be fluent in technical English.
- 📝 Enhancement Note: This role offers the opportunity to work with cutting-edge technologies and apply SRE principles to ensure the reliability and performance of ML models in production.
- 📝 Enhancement Note: The role requires a solid understanding of CI/CD for ML pipelines, as well as experience with orchestrators such as Airflow or Kubeflow, or with experiment tracking tools like MLflow and Weights & Biases.
- 📝 Enhancement Note: The primary responsibilities of this role include:
  - Designing and operating observability solutions for ML models in production.
  - Developing dashboards and metrics to evaluate model performance, cost, and stability.
  - Implementing tools for structured logging, drift monitoring, data quality checks, and inference error tracking.
  - Collaborating with data science and product teams to detect and mitigate incidents related to models in production.
  - Applying SRE practices such as chaos engineering, stress testing, staging testing, and continuous integration.
🎓 Skills & Qualifications
Education:
- 📝 Enhancement Note: A bachelor's degree in Computer Science, Engineering, or a related field is typically required for this role. However, relevant work experience and a strong portfolio may be considered in lieu of formal education.
Experience:
- 📝 Enhancement Note: Candidates should have at least 4 years of experience as an SRE, DevOps, or Platform Engineer with ML projects. Experience in high-transaction environments such as banking, accounting, payroll, or logistics is a plus.
Required Skills:
- 📝 Enhancement Note: Proficiency in Prometheus, Grafana, ELK/EFK, OpenTelemetry, or Datadog is essential for this role, as is experience with Kubernetes, Docker, Helm, and infrastructure automation tools like Terraform or Pulumi.
- 📝 Enhancement Note: Solid fundamentals in CI/CD for ML pipelines, including testing, validation, and rollback, are required for this role.
- 📝 Enhancement Note: Knowledge of model monitoring frameworks such as Evidently, Arize AI, WhyLabs, or similar is highly desirable for this role.
- 📝 Enhancement Note: Experience with orchestrators such as Airflow or Kubeflow, or with experiment tracking tools like MLflow and Weights & Biases, is required for this role.
Preferred Skills:
- 📝 Enhancement Note: Experience in high-transaction environments such as banking, accounting, payroll, or logistics is a plus for this role.
- 📝 Enhancement Note: Familiarity with chaos engineering, stress testing, staging testing, and continuous integration is highly desirable for this role.
📊 Web Portfolio & Project Requirements
Portfolio Essentials:
- 📝 Enhancement Note: A strong portfolio demonstrating experience with ML model monitoring, observability, and SRE practices is essential for this role.
- 📝 Enhancement Note: Include examples of dashboards and metrics developed to evaluate model performance, cost, and stability.
- 📝 Enhancement Note: Highlight experience with structured logging, drift monitoring, data quality, and inference error tools.
- 📝 Enhancement Note: Showcase experience with chaos engineering, stress testing, staging testing, and continuous integration.
Technical Documentation:
- 📝 Enhancement Note: Include documentation demonstrating proficiency in CI/CD for ML pipelines, as well as experience with orchestrators such as Airflow or Kubeflow, or with experiment tracking tools like MLflow and Weights & Biases.
- 📝 Enhancement Note: Highlight experience with Kubernetes, Docker, Helm, and infrastructure automation tools like Terraform or Pulumi.
- 📝 Enhancement Note: Include documentation demonstrating experience with Prometheus, Grafana, ELK/EFK, OpenTelemetry, or Datadog.
💵 Compensation & Benefits
Salary Range:
- 📝 Enhancement Note: The salary range for this role is estimated to be between 120,000 MXN and 180,000 MXN per year, based on industry standards for MLOps Engineer (SRE) roles in Mexico.
Benefits:
- 📝 Enhancement Note: Benefits for this role may include health insurance, retirement plans, and other perks common to the tech industry in Mexico.
Working Hours:
- 📝 Enhancement Note: The working hours for this role are expected to be standard full-time hours, with some flexibility for maintenance windows and project deadlines.
🎯 Team & Company Context
🏢 Company Culture
Industry:
- 📝 Enhancement Note: Metova is a leading company in Mexico specializing in accounting software, with a strong focus on innovation and technology.
Company Size:
- 📝 Enhancement Note: As a leading company in its industry, Metova is expected to have a medium to large-sized team, providing ample opportunities for collaboration and growth.
Founded:
- 📝 Enhancement Note: Metova was founded in 2006, giving it a solid track record of success and stability in the accounting software industry.
Team Structure:
- 📝 Enhancement Note: The MLOps Engineer (SRE) role is expected to be part of a larger team focused on ML model development, deployment, and maintenance, collaborating with data science, product, and engineering teams.
Development Methodology:
- 📝 Enhancement Note: Metova is expected to use Agile methodologies for software development, with a focus on continuous integration and delivery.
Company Website: Metova
📈 Career & Growth Analysis
Web Technology Career Level:
- 📝 Enhancement Note: This role is expected to be at the senior level, requiring a strong background in SRE, DevOps, or Platform Engineering with a focus on ML projects.
Reporting Structure:
- 📝 Enhancement Note: The MLOps Engineer (SRE) is expected to report to the Head of MLOps or a similar role, collaborating with data science, product, and engineering teams.
Technical Impact:
- 📝 Enhancement Note: This role is expected to have a significant impact on the reliability, performance, and stability of ML models in production, as well as the overall quality and user experience of Metova's accounting software products.
Growth Opportunities:
- 📝 Enhancement Note: This role offers excellent opportunities for growth and development, including the chance to work with cutting-edge technologies, apply SRE principles, and collaborate with cross-functional teams.
🌐 Work Environment
Office Type:
- 📝 Enhancement Note: As a remote role, the work environment for this position is expected to be primarily virtual, with occasional in-person meetings or events.
Office Location(s):
- 📝 Enhancement Note: This role is based in Mexico and is expected to require some overlap with Central Standard Time (CST) working hours.
Workspace Context:
- 📝 Enhancement Note: As a remote role, the workspace for this position is expected to be a well-equipped home office with a reliable internet connection and appropriate tools for ML model monitoring and observability.
Work Schedule:
- 📝 Enhancement Note: The work schedule for this role is expected to be standard full-time hours, with some flexibility for maintenance windows, incident response, and project deadlines.
📄 Application & Technical Interview Process
Interview Process:
- 📝 Enhancement Note: The interview process for this role is expected to include technical assessments, coding challenges, and behavioral interviews, focusing on the candidate's experience with ML model monitoring, observability, and SRE practices.
Portfolio Review Tips:
- 📝 Enhancement Note: Prepare a portfolio demonstrating experience with ML model monitoring, observability, and SRE practices, including examples of dashboards, metrics, and incident response strategies.
Technical Challenge Preparation:
- 📝 Enhancement Note: Brush up on your knowledge of ML model monitoring frameworks, observability tools, and SRE practices, and be prepared to discuss your experience with these technologies in the context of real-world projects.
ATS Keywords:
- 📝 Enhancement Note: Include relevant keywords such as MLOps, SRE, DevOps, Platform Engineering, Machine Learning, Airflow, Kubeflow, MLflow, Weights & Biases, Prometheus, Grafana, Kubernetes, Docker, Terraform, CI/CD, Monitoring, and other relevant terms in your resume and application materials.
🛠 Technology Stack & Web Infrastructure
Frontend Technologies:
- 📝 Enhancement Note: As this role focuses on ML model monitoring and observability, frontend technologies are not expected to be a significant part of the technology stack.
Backend & Server Technologies:
- 📝 Enhancement Note: This role is expected to require proficiency in backend and server technologies such as Kubernetes, Docker, Helm, and infrastructure automation tools like Terraform or Pulumi.
Development & DevOps Tools:
- 📝 Enhancement Note: This role is expected to require experience with CI/CD pipelines, as well as tools for ML model monitoring, observability, and incident response, such as Prometheus, Grafana, ELK/EFK, OpenTelemetry, or Datadog.
👥 Team Culture & Values
Web Development Values:
- 📝 Enhancement Note: Metova is expected to value innovation, collaboration, and a strong focus on user experience, with a commitment to delivering high-quality accounting software products.
Collaboration Style:
- 📝 Enhancement Note: As a remote role, the collaboration style for this position is expected to be primarily virtual, with regular communication and coordination with cross-functional teams using tools such as Slack, Microsoft Teams, or Google Workspace.
⚡ Challenges & Growth Opportunities
Technical Challenges:
- 📝 Enhancement Note: This role is expected to present technical challenges related to ML model monitoring, observability, and incident response, as well as the application of SRE principles to ensure the reliability and performance of ML models in production.
Learning & Development Opportunities:
- 📝 Enhancement Note: This role offers excellent opportunities for learning and development, including the chance to work with cutting-edge technologies, apply SRE principles, and collaborate with cross-functional teams.
💡 Interview Preparation
Technical Questions:
- 📝 Enhancement Note: Prepare for technical questions related to ML model monitoring, observability, and incident response, as well as your experience with relevant tools and technologies.
Company & Culture Questions:
- 📝 Enhancement Note: Research Metova's company culture, values, and mission, and be prepared to discuss how your skills and experience align with these aspects of the organization.
Portfolio Presentation Strategy:
- 📝 Enhancement Note: Prepare a portfolio demonstrating your experience with ML model monitoring, observability, and incident response, including examples of dashboards, metrics, and real-world projects.
📌 Application Steps
To apply for this MLOps Engineer (SRE) position at Metova:
- Submit your application through the provided link.
- Customize your portfolio with live demos and concrete examples showcasing your experience with ML model monitoring, observability, and incident response.
- Optimize your resume for MLOps and SRE roles, highlighting your relevant skills and experience with ML model monitoring, observability, and incident response.
- Prepare for technical interviews by brushing up on your knowledge of ML model monitoring frameworks, observability tools, and SRE practices, and be ready to discuss your experience with these technologies in the context of real-world projects.
- Research Metova's company culture, values, and mission, and be prepared to discuss how your skills and experience align with these aspects of the organization.
⚠️ Important Notice: This enhanced job description includes AI-generated insights and industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
Application Requirements
Candidates should have 4+ years of experience as an SRE, DevOps, or Platform Engineer with ML projects. Proficiency in tools like Kubernetes, Docker, and CI/CD for ML pipelines is essential.