Senior Platform Engineer
Job Overview
- Job Title: Senior Platform Engineer
- Company: Alembic
- Location: San Francisco, California, United States
- Job Type: Full-Time
- Category: DevOps Engineer, Infrastructure Engineer
- Date Posted: June 11, 2025
- Experience Level: 15-20 years
- Remote Status: On-site
Role Summary
- Design and build scalable infrastructure for Alembic's data-driven platform, powering AI/ML workloads across cloud and on-prem environments.
- Collaborate with the AI Research team to deploy and manage novel ML algorithms, driving next-generation work on GPU-based development efforts.
- Serve as a technical mentor and thought leader, promoting best practices in system design, infrastructure reliability, and code quality across the engineering organization.
Enhancement Note: This role offers a unique opportunity to shape Alembic's infrastructure from the ground up, working on cutting-edge technology and collaborating with a talented AI Research team.
Primary Responsibilities
- Design, build, integrate, and operate core services, data pipelines, and distributed AI/ML workloads using Infrastructure as Code (IaC) tools like Terraform and Ansible.
- Develop and maintain robust CI/CD pipelines using tools such as GitHub Actions, ArgoCD, or Bazel, ensuring reliable and rapid deployments with automated testing and rollback workflows.
- Establish and operate comprehensive observability systems, including metrics, logging, and distributed tracing, using tools like Prometheus, Grafana, Datadog, and OpenTelemetry.
- Work closely with the AI Research team to deploy and manage novel ML algorithms, ensuring seamless integration and optimal performance.
- Mentor and guide other engineers, fostering a culture of best practices, code quality, and continuous learning within the engineering organization.
Enhancement Note: This role requires a strong background in platform engineering, with a proven track record of designing, building, and operating scalable infrastructure in fast-paced environments.
Skills & Qualifications
Education: Bachelor's degree in Computer Science, Engineering, or a related field. Relevant experience may be considered in lieu of a degree.
Experience: 15-20 years of engineering experience, with a significant focus on platform, infrastructure, or DevOps/SRE teams.
Required Skills:
- Proficient in AWS (or GCP/Azure), with a strong understanding of cloud services and architecture.
- Expertise in container orchestration with Kubernetes and service discovery at scale.
- Deep knowledge of DevOps principles, infrastructure as code (Terraform, Ansible), and immutable infrastructure.
- Experience deploying and operating production systems in fast-paced environments, ideally in early- or growth-stage startups.
- Proficient in a systems or scripting language (e.g., Python, Bash).
Preferred Skills:
- Experience with secure networking, secrets management, and managing systems in compliance-heavy environments.
- A bias for simplicity, automation, and building tools that empower developers.
- Experience with GPU-based development efforts and AI/ML workloads.
Enhancement Note: While not explicitly stated, familiarity with data processing pipelines, data warehousing, and big data technologies would be beneficial for this role, given Alembic's focus on marketing data analytics.
Web Portfolio & Project Requirements
Portfolio Essentials:
- Demonstrate your experience with infrastructure as code (IaC) tools, such as Terraform and Ansible, by showcasing projects that exhibit repeatable, auditable, and environment-agnostic infrastructure deployments.
- Highlight your CI/CD pipeline experience by presenting projects that showcase automated testing, rollback, and deployment workflows using tools like GitHub Actions, ArgoCD, or Bazel.
- Showcase your observability systems expertise by presenting projects that demonstrate metrics, logging, and distributed tracing using tools like Prometheus, Grafana, Datadog, and OpenTelemetry.
Technical Documentation:
- Demonstrate your understanding of infrastructure design, operation, and maintenance by providing well-commented and well-documented code examples.
- Explain your approach to incident detection and diagnosis, showcasing your ability to proactively identify and resolve issues in production environments.
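As an illustration of what a well-commented portfolio sample for incident detection might look like, here is a minimal Python sketch. It is not taken from the job listing or from Alembic's stack: the class name `ErrorRateMonitor`, the window size, and the 5% threshold are all hypothetical choices, and a real system would more likely express this as an alerting rule in a tool such as Prometheus or Datadog.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ErrorRateMonitor:
    """Flags a possible incident when the error rate over a sliding
    window of recent requests exceeds a threshold.

    All names and defaults here are illustrative assumptions.
    """

    window_size: int = 100   # hypothetical: number of recent requests tracked
    threshold: float = 0.05  # hypothetical: alert above a 5% error rate
    _outcomes: deque = field(default_factory=deque, init=False, repr=False)

    def record(self, ok: bool) -> None:
        """Record one request outcome and drop the oldest if the window is full."""
        self._outcomes.append(ok)
        if len(self._outcomes) > self.window_size:
            self._outcomes.popleft()

    def error_rate(self) -> float:
        """Return the fraction of failed requests in the current window."""
        if not self._outcomes:
            return 0.0
        failures = sum(1 for ok in self._outcomes if not ok)
        return failures / len(self._outcomes)

    def should_alert(self) -> bool:
        """Alert only on a full window, so a few early failures do not page anyone."""
        window_full = len(self._outcomes) == self.window_size
        return window_full and self.error_rate() > self.threshold


if __name__ == "__main__":
    monitor = ErrorRateMonitor(window_size=10, threshold=0.2)
    for _ in range(10):
        monitor.record(True)
    print(monitor.should_alert())
    for _ in range(3):
        monitor.record(False)
    print(monitor.should_alert())
```

In a portfolio, a sample like this is most useful when paired with a short note on the design choice, for example why the alert waits for a full window rather than firing on the first failures.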
Enhancement Note: While not explicitly stated, providing examples of your collaboration with data science or AI teams on infrastructure projects would strengthen your portfolio for this role.
Compensation & Benefits
Salary Range: $190,000 - $220,000 per year (an estimate based on experience level and market research for senior DevOps engineers in the San Francisco Bay Area)
Benefits: (Not explicitly stated in the job listing, but common for similar roles in the industry)
- Health, dental, and vision insurance
- 401(k) plan with company matching
- Flexible time off and paid holidays
- Professional development opportunities, such as conference attendance and training
- Competitive equity compensation package
Working Hours: Full-time, with a standard workweek of 40 hours, typically Monday through Friday, with flexibility for project deadlines and maintenance windows.
Enhancement Note: Given Alembic's early-stage nature, benefits may be subject to change or improvement as the company grows and secures additional funding.
Team & Company Context
Company Culture
Industry: Marketing technology, data analytics, and AI/ML
Company Size: Early-stage startup with a small but growing engineering team, providing ample opportunities for growth and impact.
Founded: 2021, with a mission to transform how businesses harness and leverage data through cutting-edge, data-driven technology.
Team Structure:
- Small but growing engineering team, collaborating closely with the AI Research team to deploy and manage novel ML algorithms.
- Flat hierarchy with a focus on cross-functional collaboration and open communication.
- Dynamic environment where roles, priorities, and projects may adapt to individual skill sets and goals.
Development Methodology:
- Agile software development methodology, with a focus on iterative development, continuous integration, and rapid deployment.
- Collaborative approach to problem-solving, with a strong emphasis on knowledge sharing and technical mentoring.
- Data-driven decision-making, with a focus on leveraging data to inform product development and business strategy.
Company Website: https://getalembic.com/
Enhancement Note: Alembic's early-stage nature and dynamic environment may appeal to candidates who thrive in fast-paced, adaptable work environments and prefer a more hands-on, in-the-weeds approach to engineering.
Career & Growth Analysis
Career Level: Senior Platform Engineer. The role designs, builds, and operates the scalable infrastructure that powers Alembic's data-driven platform, works with the AI Research team to deploy and manage novel ML algorithms, and serves as a technical mentor and thought leader within the engineering organization.
Reporting Structure: This role reports directly to the CTO or a similar technical leadership position, with a high degree of autonomy and influence over Alembic's infrastructure and platform development.
Technical Impact: This role has a significant impact on Alembic's platform scalability, reliability, and performance, as well as the successful deployment and management of novel ML algorithms.
Growth Opportunities:
- Technical leadership opportunities, as Alembic's engineering team grows and expands.
- Architecture and design decisions, shaping the future of Alembic's platform and infrastructure.
- Mentorship and knowledge-sharing opportunities, fostering a culture of continuous learning and development within the engineering organization.
Enhancement Note: Given Alembic's early-stage nature and rapid growth, this role offers significant opportunities for career growth and development, with the potential to take on increasing responsibility and leadership as the company expands.
Work Environment
Office Type: Modern, collaborative workspace designed to facilitate cross-functional collaboration and open communication.
Office Location(s): San Francisco, California, United States
Workspace Context:
- Collaborative workspace with ample opportunities for interaction and knowledge-sharing between engineers, data scientists, and other team members.
- State-of-the-art development tools, multiple monitors, and testing devices available to support efficient and effective work.
- Flexible work arrangements, with a focus on results and impact rather than strict hours or attendance.
Work Schedule: Full-time, with a standard workweek of 40 hours, typically Monday through Friday, with flexibility for project deadlines and maintenance windows.
Enhancement Note: Alembic's dynamic and adaptable work environment may appeal to candidates who prefer a more flexible, results-driven approach to work, with a focus on collaboration and open communication.
Application & Technical Interview Process
Interview Process:
- Phone or video screen to assess communication skills, cultural fit, and initial technical competency.
- Technical deep-dive to evaluate your understanding of infrastructure design, operation, and maintenance, as well as your ability to collaborate with data science and AI teams on infrastructure projects.
- On-site or virtual final interview with key stakeholders, focusing on your technical leadership potential, problem-solving skills, and alignment with Alembic's mission and values.
Portfolio Review Tips:
- Highlight your experience with infrastructure as code (IaC) tools, CI/CD pipelines, and observability systems, providing concrete examples of your ability to design, build, and operate scalable infrastructure.
- Demonstrate your ability to work collaboratively with data science and AI teams, showcasing your understanding of ML algorithms and GPU-based development efforts.
- Explain your approach to incident detection and diagnosis, showcasing your ability to proactively identify and resolve issues in production environments.
Technical Challenge Preparation:
- Brush up on your knowledge of AWS services, container orchestration with Kubernetes, and infrastructure as code (IaC) tools like Terraform and Ansible.
- Practice designing, building, and operating scalable infrastructure, focusing on repeatable, auditable, and environment-agnostic deployments.
- Prepare for questions about your experience with CI/CD pipelines, automated testing, and deployment workflows, as well as your approach to observability systems and incident management.
ATS Keywords: (Not explicitly stated in the job listing, but relevant for a Senior Platform Engineer role)
- Cloud services: AWS, GCP, Azure
- Containerization: Docker, Kubernetes, Helm
- Infrastructure as Code (IaC): Terraform, Ansible, CloudFormation
- CI/CD: GitHub Actions, ArgoCD, Bazel, Jenkins, CircleCI
- Observability: Prometheus, Grafana, Datadog, OpenTelemetry, ELK Stack
- Programming languages: Python, Bash, Go, Java, C++
- Data processing: Apache Spark, Apache Kafka, Apache Beam, Apache Flink
- Data warehousing: Amazon Redshift, Google BigQuery, Snowflake, Azure Synapse Analytics
- Big data technologies: Hadoop, Hive, Pig, Impala, Presto
- AI/ML: TensorFlow, PyTorch, scikit-learn, Keras, XGBoost, LightGBM
- GPU-based development: CUDA, cuDNN, ROCm, OpenCL
Technology Stack & Web Infrastructure
Cloud Services:
- Primary: AWS (with experience in GCP or Azure a plus)
- Services: EC2, RDS, DynamoDB, S3, Lambda, ECS, EKS, IAM, CloudFormation, AWS Glue, AWS Lake Formation, AWS Data Pipeline, AWS Step Functions
Containerization:
- Primary: Kubernetes (with experience in Docker and Helm a plus)
- Orchestration: ECS, EKS, Kubernetes, Docker Swarm, Nomad
- Service discovery: Kubernetes DNS, CoreDNS, Consul, etcd
Infrastructure as Code (IaC):
- Primary: Terraform, Ansible
- Other tools: CloudFormation, Pulumi, AWS CDK, Azure Resource Manager (ARM), Google Cloud Deployment Manager (GCDM)
CI/CD:
- Primary: GitHub Actions, ArgoCD, Bazel
- Other tools: Jenkins, CircleCI, GitLab CI/CD, Travis CI
Observability:
- Primary: Prometheus, Grafana, Datadog, OpenTelemetry
- Other tools: ELK Stack (Elasticsearch, Logstash, Kibana), New Relic, AppDynamics, Dynatrace, Honeycomb, Zipkin, Jaeger, OpenTracing
Enhancement Note: Given Alembic's focus on marketing data analytics, experience with data processing pipelines, data warehousing, and big data technologies would be beneficial for this role.
Team Culture & Values
Web Development Values:
- Innovation: Embrace cutting-edge technology and continuous learning to drive next-generation work on GPU-based development efforts and AI/ML workloads.
- Simplicity: Prioritize simplicity, automation, and building tools that empower developers and streamline workflows.
- Collaboration: Foster a culture of open communication, knowledge-sharing, and cross-functional collaboration between engineers, data scientists, and other team members.
- Reliability: Ensure the scalability, reliability, and performance of Alembic's platform, with a focus on proactive incident detection and diagnosis.
Collaboration Style:
- Cross-functional: Encourage open communication and collaboration between engineers, data scientists, and other team members, with a focus on knowledge-sharing and technical mentoring.
- Hands-on: Emphasize a hands-on, in-the-weeds approach to engineering, with a focus on fixing broken pipelines, designing the future of Alembic's platform, and driving next-generation work on GPU-based development efforts and AI/ML workloads.
- Mentorship-driven: Foster a culture of continuous learning and development, with a focus on technical mentoring, knowledge-sharing, and skill development opportunities.
Challenges & Growth Opportunities
Technical Challenges:
- Scalability: Design, build, and operate scalable infrastructure that powers Alembic's data-driven platform, with a focus on performance, reliability, and cost-efficiency.
- Collaboration: Work closely with the AI Research team to deploy and manage novel ML algorithms, ensuring seamless integration and optimal performance.
- Innovation: Drive next-generation work on GPU-based development efforts and AI/ML workloads, embracing cutting-edge technology and continuous learning.
- Incident management: Proactively identify and resolve issues in production environments, with a focus on minimizing downtime and ensuring high availability.
Learning & Development Opportunities:
- Technical skill development: Expand your expertise in infrastructure as code (IaC) tools, CI/CD pipelines, and observability systems, as well as emerging technologies in data processing, data warehousing, and big data technologies.
- Leadership development: Develop your leadership skills, with opportunities to take on increasing responsibility and technical mentorship roles as Alembic's engineering team grows and expands.
- Architecture and design: Shape the future of Alembic's platform and infrastructure, with opportunities to make significant architectural decisions and drive next-generation work on GPU-based development efforts and AI/ML workloads.
Interview Preparation
Technical Questions:
- Cloud services: Describe your experience with AWS (or GCP/Azure), highlighting your expertise in cloud services and architecture. Provide examples of your ability to design, build, and operate scalable infrastructure in fast-paced environments.
- Container orchestration: Explain your approach to container orchestration with Kubernetes, with a focus on service discovery, scalability, and reliability. Provide examples of your experience with Kubernetes and other container orchestration platforms.
- Infrastructure as Code (IaC): Discuss your experience with infrastructure as code (IaC) tools like Terraform and Ansible, with a focus on repeatable, auditable, and environment-agnostic infrastructure deployments. Provide examples of your ability to design, build, and operate scalable infrastructure using IaC tools.
- CI/CD pipelines: Describe your experience with CI/CD pipelines, with a focus on automated testing, rollback, and deployment workflows. Provide examples of your ability to design, build, and maintain robust CI/CD pipelines using tools like GitHub Actions, ArgoCD, or Bazel.
Company & Culture Questions:
- Company culture: Explain what attracts you to Alembic's dynamic and adaptable work environment, with a focus on collaboration, open communication, and continuous learning.
- Technical leadership: Describe your approach to technical mentoring and knowledge-sharing, with a focus on fostering a culture of continuous learning and development within the engineering organization.
- AI/ML workloads: Explain your understanding of ML algorithms and GPU-based development efforts, with a focus on collaborating with the AI Research team to deploy and manage novel ML algorithms.
Portfolio Presentation Strategy:
- Infrastructure as Code (IaC): Highlight your experience with infrastructure as code (IaC) tools like Terraform and Ansible, providing concrete examples of your ability to design, build, and operate scalable infrastructure.
- AI/ML collaboration: Demonstrate your ability to work collaboratively with data science and AI teams, showcasing your understanding of ML algorithms and GPU-based development efforts.
- Observability systems: Explain your approach to incident detection and diagnosis, showcasing your ability to proactively identify and resolve issues in production environments.
Enhancement Note: Given Alembic's early-stage nature and dynamic work environment, candidates who thrive in fast-paced, adaptable work environments and prefer a more hands-on, in-the-weeds approach to engineering may be particularly well-suited to this role.
Application Steps
To apply for this Senior Platform Engineer position at Alembic:
- Submit your application through the application link provided in the job listing.
- Customize your portfolio to highlight your experience with infrastructure as code (IaC) tools, CI/CD pipelines, and observability systems, providing concrete examples of your ability to design, build, and operate scalable infrastructure.
- Tailor your resume to emphasize your relevant technical skills, experience, and achievements in platform engineering, infrastructure, or DevOps/SRE roles.
- Prepare for the technical interview by brushing up on your knowledge of AWS services, container orchestration with Kubernetes, and infrastructure as code (IaC) tools like Terraform and Ansible. Practice designing, building, and operating scalable infrastructure, focusing on repeatable, auditable, and environment-agnostic deployments.
- Research Alembic and its mission to transform how businesses harness and leverage data through cutting-edge, data-driven technology. Prepare thoughtful questions about the company, its culture, and the role to demonstrate your interest and engagement.
Important Notice: This enhanced job description includes AI-generated insights and web technology industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
Application Requirements
15-20 years of engineering experience, including significant time spent on platform, infrastructure, or DevOps/SRE teams. Deep experience with AWS, expertise in container orchestration with Kubernetes, and a strong grasp of DevOps principles are essential.