Senior Site Reliability Engineer (Crypto Exchange)
📍 Job Overview
- Job Title: Senior Site Reliability Engineer (Crypto Exchange)
- Company: Hyphen Connect Limited
- Location: Vietnam
- Job Type: Full-Time
- Category: DevOps Engineer, Site Reliability Engineer
- Date Posted: 2025-07-28
🚀 Role Summary
- 📝 Enhancement Note: This role focuses on ensuring the stability, scalability, and performance of a decentralized crypto exchange that offers institutional-level systems while remaining on-chain.
A Senior Site Reliability Engineer is required to balance production reliability with engineering-driven automation, reducing manual processes through innovative tooling and process improvements. This role demands a strong commitment to on-call ownership and a passion for building resilient, observable, and self-healing infrastructure.
💻 Primary Responsibilities
- 📝 Enhancement Note: The primary responsibilities revolve around designing, implementing, and maintaining scalable infrastructure for a high-performance, low-latency trading platform, ensuring system stability, scalability, and security.
💡 Key Responsibilities:
- Design, implement, and maintain scalable infrastructure for a high-performance, low-latency trading platform.
- Operate and enhance Kubernetes and Nomad-based environments to ensure system stability, scalability, and security.
- Develop infrastructure automation and deployment pipelines using Terraform, Ansible, ArgoCD, and GitHub Actions.
- Collaborate with engineering teams to streamline service onboarding, automate repetitive tasks, and improve deployment efficiency.
- Enhance observability and reliability through improved logging, metrics, tracing, and alerting using the Grafana ecosystem.
- Perform root cause analysis and postmortems for production incidents, driving continuous improvements in system resilience and incident response.
- Work with security and compliance teams to ensure infrastructure meets regulatory and organizational standards.
- Support multi-environment deployments (dev, staging, testnet, mainnet) with a focus on safe rollouts, rollbacks, and configuration management.
- Contribute to capacity planning, cost optimization, and infrastructure scaling strategies to support platform growth.
🎓 Skills & Qualifications
Education:
- A Bachelor's degree in Computer Science, Engineering, or a related field is preferred, but relevant experience may be considered in lieu of a degree.
Experience:
- 📝 Enhancement Note: Candidates should have over 5 years of experience in DevOps or Site Reliability Engineering, with a strong background in low-latency distributed systems.
Required Skills:
- 5+ years of relevant experience as a DevOps engineer or SRE.
- Proven ability to participate in an on-call rotation, demonstrating ownership in incident response and a focus on long-term system stability.
- Extensive experience operating and maintaining low-latency, distributed systems in production environments.
- Proficiency with cloud-native platforms and container orchestration tools, including AWS, GCP, Kubernetes, and Nomad.
- Strong knowledge of Linux/Unix internals and the TCP/IP networking stack.
- Proficiency in one or more of: Bash, Go, or Python.
- Expertise in root cause analysis, performance tuning, and system-level debugging in complex service architectures.
- Experience building and managing end-to-end infrastructure, including infrastructure as code, CI/CD pipelines, and monitoring systems.
- Familiarity with modern GitOps workflows and tools such as GitHub Actions, ArgoCD, Argo Workflows, and Argo Events.
- Ability to own production systems end-to-end, from infrastructure as code to automated monitoring and deployment workflows.
- Pragmatic approach with a focus on depth, ownership, and a bias for action over broad familiarity.
Preferred Skills:
- Experience with the Aeron messaging system.
📊 Web Portfolio & Project Requirements
- 📝 Enhancement Note: While a portfolio is not explicitly mentioned, candidates should be prepared to discuss their past projects, especially those involving low-latency distributed systems, infrastructure automation, and deployment pipelines.
💵 Compensation & Benefits
- 📝 Enhancement Note: Salary information is not provided, but based on market research for senior site reliability engineering roles in Vietnam, the estimated salary range is ₫150,000,000 - ₫250,000,000 per month (approximately $6,500 - $11,000 per month).
Benefits:
- Competitive salary and benefits package.
- Opportunity to work on cutting-edge technology in the crypto exchange industry.
- Collaborative and dynamic work environment.
- Opportunities for professional growth and development.
🎯 Team & Company Context
🏢 Company Culture
- Industry: Fintech, Blockchain
- Company Size: Medium (51-200 employees)
- Founded: 2021
- Team Structure:
  - The team consists of experienced professionals in software engineering, blockchain development, and DevOps.
  - The company values collaboration, innovation, and continuous learning.
- Development Methodology:
  - The company follows Agile/Scrum methodologies for software development.
  - They emphasize code reviews, testing, and quality assurance practices.
  - Deployment strategies include CI/CD pipelines and automated deployment processes.
📈 Career & Growth Analysis
- Web Technology Career Level: Senior Site Reliability Engineer - This role involves leading infrastructure projects, mentoring junior team members, and driving technical decisions related to system reliability and performance.
- Reporting Structure: This role reports directly to the Head of Engineering or a similar position, with a matrix reporting structure to other teams for specific projects.
- Technical Impact: The Senior Site Reliability Engineer will have a significant impact on the platform's stability, scalability, and performance, ensuring that it can handle increased user demand and maintain high availability.
🌐 Work Environment
- Office Type: Hybrid (remote and on-site)
- Office Location(s): Ho Chi Minh City, Vietnam
- Workspace Context:
  - The company provides a collaborative workspace with multiple monitors and testing devices available.
  - The team encourages knowledge sharing, technical mentoring, and continuous learning.
- Work Schedule: The work schedule is flexible, with a focus on delivering results and meeting project deadlines.
📄 Application & Technical Interview Process
- 📝 Enhancement Note: The interview process is not explicitly outlined, but candidates can expect technical assessments related to infrastructure management, system reliability, and problem-solving.
🛠 Technology Stack & Web Infrastructure
- 📝 Enhancement Note: The technology stack is focused on cloud-native platforms, container orchestration, and infrastructure automation tools.
Frontend Technologies: N/A (This role is focused on infrastructure and does not involve frontend development)
Backend & Server Technologies:
- Kubernetes
- Nomad
- Terraform
- Ansible
- ArgoCD
- GitHub Actions
- AWS
- GCP
Development & DevOps Tools:
- Grafana (for monitoring and alerting)
- Aeron (messaging system, bonus skill)
👥 Team Culture & Values
- Web Development Values:
  - The company values innovation, collaboration, and a focus on user experience.
  - They prioritize performance optimization, accessibility, and code quality.
- Collaboration Style:
  - The team encourages cross-functional collaboration between developers, designers, and stakeholders.
  - They emphasize a code review culture and pair programming practices.
  - Knowledge sharing and technical mentoring are encouraged.
⚡ Challenges & Growth Opportunities
- Technical Challenges:
  - Designing and implementing scalable infrastructure for a high-performance, low-latency trading platform.
  - Ensuring system stability, scalability, and security in a decentralized exchange environment.
  - Improving observability and reliability through enhanced logging, metrics, tracing, and alerting.
  - Performing root cause analysis and driving continuous improvements in system resilience and incident response.
- Learning & Development Opportunities:
  - Working on cutting-edge technology in the crypto exchange industry.
  - Collaborating with experienced professionals in software engineering, blockchain development, and DevOps.
  - Opportunities for professional growth and development, including mentorship and leadership roles.
💡 Interview Preparation
- 📝 Enhancement Note: Candidates should prepare for technical interviews focusing on infrastructure management, system reliability, and problem-solving. They should familiarize themselves with the technology stack listed above and be ready to discuss past projects and experience with low-latency distributed systems, infrastructure automation, and deployment pipelines.
📌 Application Steps
To apply for this Senior Site Reliability Engineer position:
- Submit your application through the provided link.
- Customize your resume to highlight relevant experience and skills, with a focus on infrastructure management, system reliability, and problem-solving.
- Prepare for technical interviews by reviewing the technology stack listed above and your past projects involving low-latency distributed systems, infrastructure automation, and deployment pipelines.
- Research the company and the crypto exchange industry to demonstrate your understanding of the business and its technology stack.
Application Requirements
Candidates should have over 5 years of experience in DevOps or Site Reliability Engineering, with a strong background in low-latency distributed systems. Proficiency in cloud-native platforms and container orchestration tools is essential.