📍 Job Overview

Job Title: ML Platform Engineer
Company: 42dot
Location: Pangyo, Gyeonggi-do, South Korea
Job Type: Full-Time, On-site
Category: Data Engineering, Machine Learning
Date Posted: 2025-07-02
Experience Level: Mid-Senior (5-10 years)

🚀 Role Summary

Develop and maintain high-scale, reliable data platforms for managing, visualizing, searching, and serving large-scale datasets for ML model training, fine-tuning, and validation.
Collaborate cross-functionally with ML algorithm, ML application, and cloud infrastructure teams to align ML platforms with the overall autonomous driving system architecture.
Build advanced autonomous driving data SDKs, including scene data search, dataset preparation, and dataset loading.
Optimize data processing pipelines, reduce latencies, and improve Test Procedure (TP) coverage.
Bootstrap and maintain infrastructure for data platform components, including data processing pipelines, databases, data lakehouses, and data serving.

📝 Enhancement Note: This role requires a strong background in data engineering and machine learning platforms, with a focus on large-scale datasets and distributed systems. Familiarity with autonomous driving data and ML model training lifecycles is a plus.

💻 Primary Responsibilities

Data Platform Development: Design, develop, and maintain high-scale, reliable data platforms for managing, visualizing, searching, and serving large-scale datasets for ML model training, fine-tuning, and validation.
Data SDK Development: Build advanced autonomous driving data SDKs, including scene data search, dataset preparation, and dataset loading.
Data Pipeline Optimization: Dig into performance bottlenecks along data processing pipelines, reduce data processing and data search latencies, and improve TP coverage.
Infrastructure Management: Bootstrap and maintain infrastructure for data platform components, including data processing pipelines, databases, data lakehouses, and data serving.
Cross-Functional Collaboration: Collaborate with ML algorithm, ML application, and cloud infrastructure teams to align ML platforms with the overall autonomous driving system architecture.

📝 Enhancement Note: This role involves a significant amount of data pipeline optimization and infrastructure management, requiring strong problem-solving skills and a deep understanding of data technologies and architectures.

🎓 Skills & Qualifications

Education: Bachelor's degree or higher in Computer Science, Engineering, Robotics, or a similar technical field.

Experience: Minimum of 5 years of experience in Data Engineering or ML Platform roles.

Required Skills:

Proficient in Python and solid experience in Python SDK development
Solid working experience in Databases (e.g., MongoDB, PostgreSQL, etc.)
Hands-on experience with data pipeline job orchestration using Databricks Workflows or Apache Airflow, and integrating data pipelines with machine learning models
Extensive experience with data technologies and architectures such as Data Warehouse (e.g., Hive) or Lakehouse (e.g., Delta Lake)
Experience with Apache Spark or other big data computing engines

Preferred Skills:

Experience with autonomous vehicle sensor data (e.g., LiDAR, camera, radar)
Experience with ML model training lifecycle (e.g., data preparation, model training/validation/deployment, etc.)
Understanding of modern AI frameworks (e.g., PyTorch, TensorFlow, etc.)
Understanding of data governance principles, data privacy regulations, and experience implementing security measures to protect data

📝 Enhancement Note: While not strictly required, experience with autonomous driving data and ML model training lifecycles can provide a significant advantage in this role.

📊 Portfolio & Project Requirements

Portfolio Essentials:

Demonstrate experience in developing and maintaining high-scale, reliable data platforms for managing large-scale datasets.
Showcase projects that involve data pipeline job orchestration, data processing pipeline optimization, and data serving.
Highlight any experience with autonomous driving data, ML model training lifecycles, or modern AI frameworks.
Include any relevant certifications or training in data governance, data privacy, or data security.

Technical Documentation:

Document code quality, commenting, and documentation standards for data processing pipelines and data serving SDKs.
Include version control, deployment processes, and server configuration details for data platform components.
Describe testing methodologies, performance metrics, and optimization techniques for data processing pipelines and data serving SDKs.

📝 Enhancement Note: As this role involves a significant amount of data pipeline optimization and infrastructure management, it's essential to provide detailed technical documentation and case studies demonstrating your problem-solving skills and understanding of data technologies and architectures.

💵 Compensation & Benefits

Salary Range: Based on the South Korean job market for mid-senior data engineering roles, the estimated salary range for this position is approximately ₩70,000,000 - ₩90,000,000 per year. This estimate is based on regional market research, company size, and role complexity.

Benefits:

Competitive salary and benefits package
Opportunities for professional growth and development
Collaborative and innovative work environment
Flexible working hours and remote work options

Working Hours: 40 hours per week, with flexible working hours and remote work options available.

📝 Enhancement Note: While the salary range is estimated based on regional market research, the actual salary offer may vary depending on the candidate's experience, skills, and negotiation.

🎯 Team & Company Context

Company Culture:

Industry: Autonomous driving technology and AI
Company Size: Medium-sized (100-500 employees)
Founded: 2017
Team Structure: The data engineering team at 42dot consists of experienced professionals working collaboratively to develop and maintain high-scale, reliable data platforms for autonomous driving ML models.
Development Methodology: Agile/Scrum methodologies, with a focus on continuous integration, continuous deployment, and iterative development.
Company Website: 42dot.ai

📝 Enhancement Note: 42dot's focus on autonomous driving technology and AI requires a strong understanding of data engineering principles and a willingness to adapt to cutting-edge technologies and methodologies.

Career & Growth Analysis:

Career Level: Mid-Senior (5-10 years of experience)
Reporting Structure: This role reports directly to the Data Engineering Manager and collaborates cross-functionally with ML algorithm, ML application, and cloud infrastructure teams.
Technical Impact: The ML Platform Engineer plays a crucial role in developing and maintaining data platforms that support the ML model training and evaluation lifecycle, directly impacting the performance and reliability of autonomous driving systems.

Growth Opportunities:

Technical Growth: Develop expertise in autonomous driving data, ML model training lifecycles, and modern AI frameworks.
Leadership Potential: Demonstrate strong problem-solving skills, technical leadership, and mentoring abilities to advance into senior or management roles.
Architecture Decisions: Contribute to the design and architecture of data platforms, data pipelines, and data serving SDKs, influencing the overall autonomous driving system architecture.

📝 Enhancement Note: As a mid-senior role, this position offers significant growth opportunities for technical professionals looking to advance their careers in data engineering and machine learning platforms.

🌐 Work Environment

Office Type: On-site, with flexible remote work options available.

Office Location(s): Pangyo, Gyeonggi-do, South Korea

Workspace Context:

Collaboration Spaces: Collaborative work environment with dedicated spaces for team meetings and brainstorming sessions.
Equipment: Modern office equipment, including high-performance workstations, multiple monitors, and testing devices, to support data engineering tasks.
Cross-Functional Access: Opportunities for cross-functional collaboration with ML algorithm, ML application, and cloud infrastructure teams, fostering a culture of innovation and continuous learning.

Work Schedule: Flexible working hours, with a focus on results and deliverables rather than strict working hours.

📝 Enhancement Note: While this role is primarily on-site, 42dot offers flexible remote work options to accommodate the needs of its employees and promote work-life balance.

📄 Application & Technical Interview Process

Interview Process:

Document Screening: Review of the candidate's resume and portfolio.
Coding Test: Assessment of the candidate's Python programming skills and data engineering problem-solving abilities.
Technical Phone Screen (1 hour): Discussion of the candidate's technical skills, experience, and career goals.
On-site or Video Interview (3 hours): In-depth discussion of the candidate's technical skills, experience, and cultural fit, as well as a case study or architecture design challenge.

Portfolio Review Tips:

Highlight projects that demonstrate your experience in developing and maintaining high-scale, reliable data platforms for managing large-scale datasets.
Include any relevant certifications or training in data governance, data privacy, or data security.
Showcase your understanding of autonomous driving data, ML model training lifecycles, and modern AI frameworks.

Technical Challenge Preparation:

Brush up on your Python programming skills and data engineering problem-solving abilities.
Familiarize yourself with data technologies and architectures such as Data Warehouse (e.g., Hive) or Lakehouse (e.g., Delta Lake), and Apache Spark or other big data computing engines.
Prepare for case study or architecture design challenges by reviewing relevant industry trends and best practices.
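
As a warm-up for the coding test, the scene data search mentioned in the role summary can be practiced with a small exercise like the one below. This is only a minimal sketch under stated assumptions: the `Scene` record, its fields, and the `search_scenes` helper are hypothetical names invented for illustration, not part of any 42dot SDK, and a real platform would query a database or lakehouse rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    """Hypothetical scene record; the field names are illustrative only."""
    scene_id: str
    sensors: frozenset
    tags: frozenset
    duration_s: float


def search_scenes(scenes, required_sensors=(), required_tags=(), min_duration_s=0.0):
    """Return scenes that carry every required sensor and tag and are long enough."""
    need_sensors = frozenset(required_sensors)
    need_tags = frozenset(required_tags)
    return [
        s for s in scenes
        if need_sensors <= s.sensors      # subset test: all required sensors present
        and need_tags <= s.tags           # subset test: all required tags present
        and s.duration_s >= min_duration_s
    ]


# Made-up sample data for the exercise.
SCENES = [
    Scene("s1", frozenset({"lidar", "camera"}), frozenset({"night", "rain"}), 20.0),
    Scene("s2", frozenset({"camera"}), frozenset({"night"}), 35.0),
    Scene("s3", frozenset({"lidar", "camera", "radar"}), frozenset({"night"}), 12.5),
]

night_lidar = search_scenes(
    SCENES,
    required_sensors=["lidar"],
    required_tags=["night"],
    min_duration_s=15.0,
)
print([s.scene_id for s in night_lidar])  # prints ['s1']
```

A useful follow-up is to explain how the same filter would be pushed down into an indexed store such as PostgreSQL or a Delta Lake table, and what that changes for search latency.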

ATS Keywords:

Programming Languages: Python, SQL
Data Technologies: Apache Spark, Hive, Delta Lake, MongoDB, PostgreSQL, Apache Airflow, Databricks, Apache Kafka, Apache Hadoop
Machine Learning: ML Platform, ML Model Training, ML Model Evaluation, ML Model Deployment
Data Governance: Data Governance, Data Privacy, Data Security, Data Compliance
Infrastructure: Cloud Infrastructure, Data Center, Server Management, Network Management
Soft Skills: Problem-Solving, Teamwork, Communication, Leadership, Mentoring

📝 Enhancement Note: To optimize your resume for the ATS, ensure that relevant keywords are strategically placed throughout the document, focusing on data engineering and machine learning platforms.

🛠 Technology Stack & Infrastructure

Frontend Technologies: Not applicable, as this role focuses on data engineering and machine learning platforms.

Backend & Server Technologies:

Databases: MongoDB, PostgreSQL
Data Processing: Apache Spark, Apache Kafka, Apache Hadoop
Data Orchestration: Apache Airflow, Databricks
Data Warehouse/Lakehouse: Hive, Delta Lake
Cloud Infrastructure: AWS, Google Cloud, Azure

Development & DevOps Tools:

Version Control: Git
CI/CD Pipelines: Jenkins, GitLab CI/CD
Monitoring Tools: Prometheus, Grafana, ELK Stack
Containerization: Docker, Kubernetes
Infrastructure as Code (IaC): Terraform, CloudFormation

📝 Enhancement Note: As a data engineering role focused on machine learning platforms, this position requires a strong understanding of data technologies, data processing pipelines, and cloud infrastructure.

👥 Team Culture & Values

Data Engineering Values:

Data Quality: Prioritize data quality, accuracy, and consistency in all data processing pipelines and data serving SDKs.
Performance Optimization: Continuously optimize data processing pipelines, data search latencies, and TP coverage to improve the efficiency of ML model training and evaluation lifecycles.
Collaboration: Work closely with ML algorithm, ML application, and cloud infrastructure teams to align ML platforms with the overall autonomous driving system architecture.
Innovation: Embrace cutting-edge technologies and methodologies to stay ahead of the curve in autonomous driving data engineering and machine learning platforms.

Collaboration Style:

Cross-Functional Integration: Collaborate closely with ML algorithm, ML application, and cloud infrastructure teams to ensure seamless data flow and efficient ML model training and evaluation lifecycles.
Code Review Culture: Conduct regular code reviews to maintain high coding standards, share knowledge, and ensure code quality and consistency across data processing pipelines and data serving SDKs.
Knowledge Sharing: Foster a culture of continuous learning and knowledge sharing, encouraging team members to stay up-to-date with the latest data engineering trends, best practices, and emerging technologies.

📝 Enhancement Note: 42dot's data engineering team values collaboration, innovation, and continuous learning, fostering a culture of excellence in autonomous driving data engineering and machine learning platforms.

⚡ Challenges & Growth Opportunities

Technical Challenges:

Data Scalability: Develop and maintain high-scale, reliable data platforms capable of managing, visualizing, searching, and serving large-scale datasets for ML model training, fine-tuning, and validation.
Data Latency: Optimize data processing pipelines, reduce data processing and data search latencies, and improve TP coverage to enhance the efficiency of ML model training and evaluation lifecycles.
Data Governance: Implement data governance principles, data privacy regulations, and security measures to protect data and ensure compliance with relevant regulations and standards.
Emerging Technologies: Stay up-to-date with the latest data engineering trends, best practices, and emerging technologies to maintain a competitive edge in autonomous driving data engineering and machine learning platforms.

Learning & Development Opportunities:

Technical Skill Development: Enhance your data engineering skills by working on cutting-edge projects, attending industry conferences, and obtaining relevant certifications.
Leadership Development: Develop your leadership skills by mentoring junior team members, contributing to architecture decisions, and driving innovation in autonomous driving data engineering and machine learning platforms.
Architecture Decision-Making: Contribute to the design and architecture of data platforms, data pipelines, and data serving SDKs, influencing the overall autonomous driving system architecture.

📝 Enhancement Note: As a mid-senior role in data engineering and machine learning platforms, this position offers numerous technical challenges and learning opportunities to drive professional growth and development.

💡 Interview Preparation

Technical Questions:

Data Engineering Fundamentals: Explain your experience with data processing pipelines, data orchestration, and data serving SDKs. Describe your approach to data quality, performance optimization, and data governance.
System Design & Architecture: Discuss your experience with designing and architecting data platforms, data pipelines, and data serving SDKs. Explain your approach to scaling, latency reduction, and data governance.
Problem-Solving: Present a challenging data engineering problem you've faced in the past and explain your approach to solving it. Describe the outcome and any lessons learned.

Company & Culture Questions:

Data Engineering Culture: Explain how you've contributed to a collaborative, innovative, and continuous learning culture in your previous data engineering roles.
Autonomous Driving Data: Describe your experience with autonomous driving data, ML model training lifecycles, and modern AI frameworks. Explain how you've applied this knowledge to develop and maintain high-scale, reliable data platforms.
ML Platform Integration: Discuss your experience integrating ML platforms with overall autonomous driving system architecture. Explain how you've collaborated with ML algorithm, ML application, and cloud infrastructure teams to align ML platforms with system requirements.

Portfolio Presentation Strategy:

Live Demonstration: Prepare a live demonstration of your data engineering projects, showcasing your experience with data processing pipelines, data orchestration, and data serving SDKs.
Code Walkthrough: Be ready to walk through your code, explaining your design choices, optimization techniques, and problem-solving approaches.
Architecture Design: Prepare a high-level architecture design for a hypothetical data platform, demonstrating your understanding of data engineering principles, data technologies, and emerging trends.

📝 Enhancement Note: To excel in the technical interview process, focus on your data engineering experience, problem-solving skills, and ability to collaborate with cross-functional teams to drive innovation in autonomous driving data engineering and machine learning platforms.

📌 Application Steps

To apply for this ML Platform Engineer position at 42dot:

Resume Optimization: Tailor your resume to highlight your data engineering experience, emphasizing projects that demonstrate your ability to develop and maintain high-scale, reliable data platforms for managing large-scale datasets.
Portfolio Customization: Customize your portfolio to showcase your experience with data processing pipelines, data orchestration, and data serving SDKs. Include any relevant certifications or training in data governance, data privacy, or data security.
Technical Interview Preparation: Brush up on your Python programming skills and data engineering problem-solving abilities, and prepare for case study or architecture design challenges.
Company Research: Research 42dot's focus on autonomous driving technology and AI, and familiarize yourself with the company's culture, values, and work environment.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and data engineering industry-standard assumptions. All details should be verified directly with 42dot before making application decisions.
