Senior Database Reliability Engineer

ClickUp
Full_timeBulgaria

📍 Job Overview

  • Job Title: Senior Database Reliability Engineer
  • Company: ClickUp
  • Location: Bulgaria
  • Job Type: Full-Time
  • Category: DevOps, Infrastructure
  • Date Posted: 2025-08-08
  • Experience Level: 5-10 years
  • Remote Status: Remote OK

🚀 Role Summary

  • Key Responsibilities: Improve ClickUp's infrastructure stability, availability, and reliability. Build software solutions for large-scale distributed systems. Define SLOs, SLIs, and error budgeting. Manage capacity and performance.
  • Key Technologies: Cloud (AWS, GCP, Azure), Infrastructure as Code (IaC), Databases (PostgreSQL, DynamoDB, AuroraDB), Monitoring Tools, Containers (ECS, EKS), JavaScript/TypeScript.

💻 Primary Responsibilities

  • Incident Management: Own and drive the incident management process. Participate in the team's follow-the-sun model.
  • Observability: Improve and own ClickUp's observability across all services. Define SLOs and SLIs, and introduce error budgeting.
  • Automation: Automate critical portions of ClickUp's engineering processes to minimize risk and maximize innovation speed.
  • Capacity Management: Manage capacity and performance to scale ClickUp's infrastructure on public and private clouds worldwide.
  • Tool Development: Build software solutions and tools to enable reliability and operability of large-scale distributed systems.
  • Collaboration: Work cross-functionally with engineering teams to improve ClickUp's infrastructure and services.

📝 Enhancement Note: This role requires a strong software engineering background with a focus on operational and SRE disciplines. Candidates should be comfortable working in a dynamic, high-paced environment and have a deep understanding of large-scale distributed systems.

🎓 Skills & Qualifications

Education: Bachelor's degree in Computer Science, Engineering, or a related field. Relevant experience may substitute for a degree.

Experience: 5-10 years of experience in site reliability engineering, infrastructure management, or a related role. Proven track record of improving system reliability and performance.

Required Skills:

  • Strong software engineering background with operational or SRE mentality.
  • Production experience in major cloud environments (AWS, GCP, Azure).
  • Experience with infrastructure as code (IaC) tools and operations.
  • Proficiency in *nix-based operating systems and advanced troubleshooting commands.
  • Experience with compute services (VMs, containers, orchestration systems).
  • Familiarity with RDBMS and NoSQL storage solutions, indexing, locking, replication, and sharding.
  • Experience with logging, monitoring, and alerting tools. Understanding of SLOs and SLIs.

Preferred Skills:

  • Experience with ClickUp's tech stack (CloudFormation/CDK, ECS, ElasticBeanstalk, PostgreSQL, DynamoDB, AuroraDB, TypeScript, or JavaScript-based frameworks).
  • Familiarity with capacity planning and management.
  • Experience with CI/CD pipelines and deployment automation.

📝 Enhancement Note: While specific technologies are not required, experience with ClickUp's tech stack would be a significant advantage. Candidates should be comfortable learning new technologies quickly and applying them to solve complex problems.

📊 Web Portfolio & Project Requirements

Portfolio Essentials:

  • Demonstrate your understanding of large-scale distributed systems and their internals.
  • Showcase your experience with incident management, error budgeting, and capacity planning.
  • Highlight your ability to automate critical engineering processes and build tools for reliability and operability.

Technical Documentation:

  • Provide detailed documentation of your past projects, including system architecture, database schema, and deployment processes.
  • Include any relevant open-source contributions or blog posts demonstrating your technical expertise.

💵 Compensation & Benefits

Salary Range: $120,000 - $180,000 per year (based on experience and regional adjustments)

Benefits:

  • Competitive health, dental, and vision insurance.
  • 401(k) with company match.
  • Unlimited PTO and flexible work hours.
  • Professional development opportunities and annual learning stipend.
  • Company-sponsored team-building events and outings.

Working Hours: Full-time (40 hours/week) with flexible hours and remote work options.

📝 Enhancement Note: The salary range provided is based on market research for senior database reliability engineering roles in Bulgaria and the United States (given the remote OK status). Benefits may vary based on the candidate's location and individual circumstances.

🎯 Team & Company Context

🏢 Company Culture

Industry: Technology, SaaS

Company Size: Medium to Large (3,000+ employees)

Founded: 2017

Team Structure:

  • ClickUp is organized into cross-functional teams, with each team responsible for a specific aspect of the product.
  • The infrastructure team consists of site reliability engineers, DevOps engineers, and database administrators.
  • The team follows a flat hierarchy, with a strong emphasis on collaboration and autonomy.

Development Methodology:

  • Agile/Scrum methodologies with bi-weekly sprint planning.
  • Regular code reviews, pair programming, and testing practices.
  • Continuous integration and deployment pipelines with automated testing and monitoring.

Company Website: ClickUp

📝 Enhancement Note: ClickUp's culture emphasizes innovation, collaboration, and high-paced growth. The company values autonomy and expects employees to take ownership of their work and contribute to the team's success.

📈 Career & Growth Analysis

Web Technology Career Level: Senior Database Reliability Engineer - Leads the improvement of ClickUp's infrastructure reliability and performance. Mentors junior team members and contributes to architectural decisions.

Reporting Structure: Reports directly to the Director of Site Reliability Engineering or a similar role, depending on the organization's structure.

Technical Impact: Directly influences ClickUp's infrastructure stability, availability, and scalability. Contributes to architectural decisions that impact the entire product.

Growth Opportunities:

  • Technical Growth: Deepen your expertise in large-scale distributed systems, database management, and site reliability engineering.
  • Leadership Growth: Develop your leadership skills by mentoring junior team members and contributing to team strategy.
  • Architectural Growth: Gain experience in designing and implementing large-scale distributed systems and databases.

📝 Enhancement Note: ClickUp's fast-paced growth and flat hierarchy provide ample opportunities for career progression and technical development. Senior Database Reliability Engineers can expect to take on more responsibility and leadership roles as the company expands.

🌐 Work Environment

Office Type: Hybrid (remote work available)

Office Location(s): San Diego, CA, USA; Sofia, Bulgaria; and remote team members worldwide.

Workspace Context:

  • Remote Work: Remote team members receive a home office stipend and are expected to maintain a dedicated workspace.
  • On-site Work: On-site team members work in open, collaborative spaces with access to dedicated meeting rooms and quiet areas.
  • Cross-functional Collaboration: ClickUp's teams work closely together, with regular check-ins and stand-ups to ensure alignment and progress.

Work Schedule: Flexible hours with a focus on results and delivery. Core hours are 10:00 AM - 4:00 PM UTC.

📝 Enhancement Note: ClickUp's hybrid work environment encourages a healthy work-life balance while maintaining a high level of collaboration and productivity. Remote team members can expect to work asynchronously with team members in different time zones.

📄 Application & Technical Interview Process

Interview Process:

  1. Phone Screen: A brief call to discuss your experience, motivations, and expectations for the role.
  2. Technical Deep Dive: A detailed conversation focused on your technical skills, past projects, and problem-solving approach.
  3. System Design: A case study or architecture design exercise to evaluate your ability to make critical decisions and trade-offs in a large-scale system.
  4. Final Interview: A conversation with the hiring manager or a senior team member to assess your cultural fit and long-term potential within the organization.

Portfolio Review Tips:

  • Highlight your experience with large-scale distributed systems, incident management, and error budgeting.
  • Include detailed documentation of your past projects, including system architecture, database schema, and deployment processes.
  • Showcase your ability to automate critical engineering processes and build tools for reliability and operability.

Technical Challenge Preparation:

  • Brush up on your knowledge of large-scale distributed systems, databases, and cloud infrastructure.
  • Practice system design exercises and prepare for architecture trade-off discussions.
  • Familiarize yourself with ClickUp's tech stack and be ready to discuss how you would approach specific challenges within the organization.

ATS Keywords:

  • Programming Languages: JavaScript, TypeScript, Python, Bash
  • Web Frameworks: Node.js, Express.js, AWS Lambda, AWS SAM
  • Databases: PostgreSQL, DynamoDB, AuroraDB, Redis
  • Cloud Platforms: AWS, GCP, Azure
  • Infrastructure as Code (IaC): CloudFormation, CDK, Terraform
  • Monitoring Tools: Prometheus, Grafana, Datadog, New Relic
  • Containerization: Docker, ECS, EKS, Kubernetes
  • CI/CD Pipelines: Jenkins, GitHub Actions, AWS CodePipeline
  • Soft Skills: Problem-solving, troubleshooting, incident management, mentoring, leadership

📝 Enhancement Note: ClickUp's interview process focuses on evaluating candidates' technical skills, problem-solving abilities, and cultural fit. Prepare for a comprehensive assessment of your experience and potential impact on the organization.

🛠 Technology Stack & Web Infrastructure

Cloud Platforms:

  • AWS: ClickUp's primary cloud provider, used for most production workloads.
  • GCP: Used for specific services and workloads, such as machine learning and data processing.
  • Azure: Used for specific services and workloads, such as identity management and authentication.

Infrastructure as Code (IaC):

  • CloudFormation: ClickUp's primary IaC tool, used to manage and provision AWS resources.
  • CDK: ClickUp uses CDK to define and provision cloud resources using TypeScript.
  • Terraform: Used for specific services and workloads, such as multi-cloud deployments and infrastructure management.

Databases:

  • PostgreSQL: ClickUp's primary relational database, used for most production workloads.
  • DynamoDB: Used for NoSQL data storage and caching.
  • AuroraDB: ClickUp's managed relational database service, used for specific workloads and high-availability setups.

Monitoring Tools:

  • Prometheus: ClickUp's primary monitoring tool, used to collect and analyze metrics from its services and infrastructure.
  • Grafana: Used to visualize Prometheus data and create dashboards for ClickUp's services.
  • Datadog: Used for specific services and workloads, such as application performance monitoring and log aggregation.
  • New Relic: Used for specific services and workloads, such as end-user performance monitoring and error tracking.

Containerization:

  • Docker: ClickUp uses Docker to package and deploy its services.
  • ECS: ClickUp's primary container orchestration platform, used to manage and scale its services.
  • EKS: Used for specific services and workloads, such as Kubernetes-based deployments and managed Kubernetes clusters.

CI/CD Pipelines:

  • Jenkins: ClickUp's primary CI/CD pipeline tool, used to automate builds, tests, and deployments.
  • GitHub Actions: Used for specific services and workloads, such as GitHub-based deployments and pull request automation.
  • AWS CodePipeline: Used for specific services and workloads, such as AWS-based deployments and infrastructure as code (IaC) automation.

📝 Enhancement Note: ClickUp's technology stack is designed to be flexible and adaptable, allowing the organization to leverage the best tools for each specific workload and use case. Candidates should be comfortable working with multiple technologies and learning new tools as needed.

👥 Team Culture & Values

Web Development Values:

  • User-Centric: ClickUp prioritizes user experience and user needs in all aspects of its product development.
  • Performance-Driven: ClickUp focuses on building high-performance, scalable, and reliable systems to support its growing user base.
  • Collaborative: ClickUp encourages cross-functional collaboration and knowledge sharing to drive innovation and continuous improvement.
  • Innovative: ClickUp fosters a culture of experimentation, iteration, and continuous learning to stay ahead of the competition.

Collaboration Style:

  • Cross-Functional Integration: ClickUp's teams work closely together, with regular check-ins and stand-ups to ensure alignment and progress.
  • Code Review Culture: ClickUp emphasizes code reviews and pair programming to maintain high coding standards and share knowledge across the team.
  • Knowledge Sharing: ClickUp encourages team members to share their expertise and learn from one another through workshops, brown bag sessions, and mentorship programs.

📝 Enhancement Note: ClickUp's team culture emphasizes collaboration, innovation, and continuous learning. Candidates should be comfortable working in a dynamic, fast-paced environment and contributing to the team's success.

⚡ Challenges & Growth Opportunities

Technical Challenges:

  • Large-Scale Distributed Systems: Design and implement large-scale distributed systems that can handle petabytes of data and serve thousands of users daily.
  • Incident Management: Develop and improve incident management processes to minimize downtime and ensure high availability.
  • Error Budgeting: Define and manage error budgets to balance reliability and innovation.
  • Capacity Planning: Forecast and manage ClickUp's infrastructure capacity to support its growing user base and workloads.
  • Observability: Improve and maintain ClickUp's observability to ensure high performance and identify potential issues proactively.

Learning & Development Opportunities:

  • Technical Skill Development: Deepen your expertise in large-scale distributed systems, database management, and site reliability engineering.
  • Emerging Technologies: Stay up-to-date with the latest trends and best practices in cloud infrastructure, databases, and monitoring tools.
  • Leadership Development: Develop your leadership skills by mentoring junior team members and contributing to team strategy.
  • Architectural Decision-Making: Gain experience in designing and implementing large-scale distributed systems and databases.

📝 Enhancement Note: ClickUp's fast-paced growth and dynamic environment present numerous challenges and opportunities for technical and professional development. Senior Database Reliability Engineers can expect to take on more responsibility and leadership roles as the company expands.

💡 Interview Preparation

Technical Questions:

  • System Design: Describe your approach to designing large-scale distributed systems, including trade-offs, constraints, and performance considerations.
  • Incident Management: Walk through your process for managing and resolving critical incidents, including communication, escalation, and post-mortem analysis.
  • Error Budgeting: Explain how you would define and manage error budgets for a large-scale distributed system, including SLOs, SLIs, and error rate calculations.
  • Capacity Planning: Describe your approach to forecasting and managing the capacity of a large-scale distributed system, including workload modeling, resource allocation, and scaling strategies.

Company & Culture Questions:

  • ClickUp's Culture: Explain how you would contribute to ClickUp's collaborative, innovative, and user-centric culture.
  • Cross-Functional Collaboration: Describe your experience working with cross-functional teams and how you would ensure alignment and progress in a dynamic, fast-paced environment.
  • User Experience Impact: Explain how you would ensure that ClickUp's infrastructure and services support a seamless user experience, including performance, availability, and scalability.

Portfolio Presentation Strategy:

  • System Architecture: Present a detailed architecture diagram of a large-scale distributed system you have designed or contributed to, highlighting your approach to trade-offs, constraints, and performance considerations.
  • Incident Management: Describe a critical incident you have managed and resolved, including your communication strategy, escalation process, and post-mortem analysis.
  • Error Budgeting: Present a detailed error budgeting plan for a large-scale distributed system, including SLOs, SLIs, and error rate calculations.
  • Capacity Planning: Describe your approach to forecasting and managing the capacity of a large-scale distributed system, including workload modeling, resource allocation, and scaling strategies.

📝 Enhancement Note: ClickUp's interview process focuses on evaluating candidates' technical skills, problem-solving abilities, and cultural fit. Prepare for a comprehensive assessment of your experience and potential impact on the organization.

📌 Application Steps

To apply for this Senior Database Reliability Engineer position at ClickUp:

  1. Customize Your Portfolio: Highlight your experience with large-scale distributed systems, incident management, and error budgeting. Include detailed documentation of your past projects, including system architecture, database schema, and deployment processes.
  2. Optimize Your Resume: Emphasize your technical skills, past projects, and achievements in infrastructure management, database administration, and site reliability engineering. Include relevant keywords and phrases to improve search relevance and match with ClickUp's ATS system.
  3. Prepare for Technical Challenges: Brush up on your knowledge of large-scale distributed systems, databases, and cloud infrastructure. Practice system design exercises and architecture trade-off discussions. Familiarize yourself with ClickUp's tech stack and be ready to discuss how you would approach specific challenges within the organization.
  4. Research ClickUp: Learn about ClickUp's product, market position, and competition. Understand ClickUp's user base, target industries, and growth strategies. Prepare thoughtful questions about ClickUp's business, culture, and long-term vision to demonstrate your interest and engagement.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and web development/server administration industry-standard assumptions. All details should be verified directly with ClickUp's hiring organization before making application decisions.

Application Requirements

The ideal candidate should have strong software engineering skills with an operational or SRE mentality, along with production experience in major cloud environments. Familiarity with infrastructure management, operating systems, and observability tools is also essential.