Senior Site Reliability Engineer (Crypto Exchange)

Hyphen Connect Limited
Full-time, Vietnam

📍 Job Overview

  • Job Title: Senior Site Reliability Engineer (Crypto Exchange)
  • Company: Hyphen Connect Limited
  • Location: Vietnam
  • Job Type: Full-Time
  • Category: DevOps Engineer, Site Reliability Engineer
  • Date Posted: 2025-07-28

🚀 Role Summary

  • 📝 Enhancement Note: This role focuses on ensuring the stability, scalability, and performance of a cutting-edge, decentralized crypto exchange, offering institutional-level systems while remaining on-chain and decentralized.

  • A Senior Site Reliability Engineer is required to balance production reliability with engineering-driven automation, reducing manual processes through innovative tooling and process improvements. This role demands a strong commitment to on-call ownership and a passion for building resilient, observable, and self-healing infrastructure.

💻 Primary Responsibilities

  • 📝 Enhancement Note: The primary responsibilities revolve around designing, implementing, and maintaining scalable infrastructure for a high-performance, low-latency trading platform, ensuring system stability, scalability, and security.

  • 💡 Key Responsibilities:

    • Design, implement, and maintain scalable infrastructure for a high-performance, low-latency trading platform.
    • Operate and enhance Kubernetes and Nomad-based environments to ensure system stability, scalability, and security.
    • Develop infrastructure automation and deployment pipelines using Terraform, Ansible, ArgoCD, and GitHub Actions.
    • Collaborate with engineering teams to streamline service onboarding, automate repetitive tasks, and improve deployment efficiency.
    • Enhance observability and reliability through improved logging, metrics, tracing, and alerting using the Grafana ecosystem.
    • Perform root cause analysis and postmortems for production incidents, driving continuous improvements in system resilience and incident response.
    • Work with security and compliance teams to ensure infrastructure meets regulatory and organizational standards.
    • Support multi-environment deployments (dev, staging, testnet, mainnet) with a focus on safe rollouts, rollbacks, and configuration management.
    • Contribute to capacity planning, cost optimization, and infrastructure scaling strategies to support platform growth.

🎓 Skills & Qualifications

Education:

  • A Bachelor's degree in Computer Science, Engineering, or a related field is preferred, but relevant experience may be considered in lieu of a degree.

Experience:

  • 📝 Enhancement Note: Candidates should have over 5 years of experience in DevOps or Site Reliability Engineering, with a strong background in low-latency distributed systems.

  • Required Skills:

    • 5+ years of relevant experience as a DevOps engineer or SRE.
    • Proven ability to participate in an on-call rotation, demonstrating ownership in incident response and a focus on long-term system stability.
    • Extensive experience operating and maintaining low-latency, distributed systems in production environments.
    • Proficiency with cloud-native platforms and container orchestration tools, including AWS, GCP, Kubernetes, and Nomad.
    • Strong knowledge of Linux/Unix internals and the TCP/IP networking stack.
    • Proficiency in one or more of: Bash, Go, or Python.
    • Expertise in root cause analysis, performance tuning, and system-level debugging in complex service architectures.
    • Experience building and managing end-to-end infrastructure, including infrastructure as code, CI/CD pipelines, and monitoring systems.
    • Familiarity with modern GitOps workflows and tools such as GitHub Actions, ArgoCD, Argo Workflows, and Argo Events.
    • Ability to own production systems end-to-end, from infrastructure as code to automated monitoring and deployment workflows.
    • Pragmatic approach with a focus on depth, ownership, and a bias for action over broad familiarity.

  • Preferred Skills:

    • Experience with the Aeron messaging system.

📊 Web Portfolio & Project Requirements

  • 📝 Enhancement Note: While a portfolio is not explicitly mentioned, candidates should be prepared to discuss their past projects, especially those involving low-latency distributed systems, infrastructure automation, and deployment pipelines.

💵 Compensation & Benefits

  • 📝 Enhancement Note: Salary information is not provided, but based on market research for senior site reliability engineering roles in Vietnam, the estimated salary range is ₫150,000,000 - ₫250,000,000 per month (approximately $6,500 - $11,000 per month).

  • Benefits:

    • Competitive salary and benefits package.
    • Opportunity to work on cutting-edge technology in the crypto exchange industry.
    • Collaborative and dynamic work environment.
    • Opportunities for professional growth and development.

🎯 Team & Company Context

🏢 Company Culture

  • Industry: Fintech, Blockchain
  • Company Size: Medium (51-200 employees)
  • Founded: 2021
  • Team Structure:
    • The team consists of experienced professionals in software engineering, blockchain development, and DevOps.
    • The company values collaboration, innovation, and continuous learning.
  • Development Methodology:
    • The company follows Agile/Scrum methodologies for software development.
    • They emphasize code reviews, testing, and quality assurance practices.
    • Deployment strategies include CI/CD pipelines and automated deployment processes.

📈 Career & Growth Analysis

  • Web Technology Career Level: Senior Site Reliability Engineer - This role involves leading infrastructure projects, mentoring junior team members, and driving technical decisions related to system reliability and performance.
  • Reporting Structure: This role reports directly to the Head of Engineering or a similar position, with a matrix reporting structure to other teams for specific projects.
  • Technical Impact: The Senior Site Reliability Engineer will have a significant impact on the platform's stability, scalability, and performance, ensuring that it can handle increased user demand and maintain high availability.

🌐 Work Environment

  • Office Type: Hybrid (remote and on-site)
  • Office Location(s): Ho Chi Minh City, Vietnam
  • Workspace Context:
    • The company provides a collaborative workspace with multiple monitors and testing devices available.
    • The team encourages knowledge sharing, technical mentoring, and continuous learning.
  • Work Schedule: The work schedule is flexible, with a focus on delivering results and meeting project deadlines.

📄 Application & Technical Interview Process

  • 📝 Enhancement Note: The interview process is not explicitly outlined, but candidates can expect technical assessments related to infrastructure management, system reliability, and problem-solving.

🛠 Technology Stack & Web Infrastructure

  • 📝 Enhancement Note: The technology stack is focused on cloud-native platforms, container orchestration, and infrastructure automation tools.

  • Frontend Technologies: N/A (This role is focused on infrastructure and does not involve frontend development)

  • Backend & Server Technologies:

    • Kubernetes
    • Nomad
    • Terraform
    • Ansible
    • ArgoCD
    • GitHub Actions
    • AWS
    • GCP

  • Development & DevOps Tools:

    • Grafana (for monitoring and alerting)
    • Aeron (messaging system, bonus skill)

👥 Team Culture & Values

  • Web Development Values:
    • The company values innovation, collaboration, and a focus on user experience.
    • They prioritize performance optimization, accessibility, and code quality.
  • Collaboration Style:
    • The team encourages cross-functional collaboration between developers, designers, and stakeholders.
    • They emphasize a code review culture and pair programming practices.
    • Knowledge sharing and technical mentoring are encouraged.

⚡ Challenges & Growth Opportunities

  • Technical Challenges:
    • Designing and implementing scalable infrastructure for a high-performance, low-latency trading platform.
    • Ensuring system stability, scalability, and security in a decentralized exchange environment.
    • Improving observability and reliability through enhanced logging, metrics, tracing, and alerting.
    • Performing root cause analysis and driving continuous improvements in system resilience and incident response.
  • Learning & Development Opportunities:
    • Working on cutting-edge technology in the crypto exchange industry.
    • Collaborating with experienced professionals in software engineering, blockchain development, and DevOps.
    • Opportunities for professional growth and development, including mentorship and leadership roles.

💡 Interview Preparation

  • 📝 Enhancement Note: Candidates should prepare for technical interviews focusing on infrastructure management, system reliability, and problem-solving. They should familiarize themselves with the technology stack listed above and be ready to discuss past projects and experience related to low-latency distributed systems, infrastructure automation, and deployment pipelines.

📌 Application Steps

To apply for this Senior Site Reliability Engineer position:

  1. Submit your application through the provided link.
  2. Customize your resume to highlight relevant experience and skills, with a focus on infrastructure management, system reliability, and problem-solving.
  3. Prepare for technical interviews by brushing up on your knowledge of the mentioned technology stack and reviewing past projects involving low-latency distributed systems, infrastructure automation, and deployment pipelines.
  4. Research the company and the crypto exchange industry to demonstrate your understanding of the business and its technology stack.

Application Requirements

Candidates should have over 5 years of experience in DevOps or Site Reliability Engineering, with a strong background in low-latency distributed systems. Proficiency in cloud-native platforms and container orchestration tools is essential.