Lead Infrastructure Engineer - Observability

Truist
Full-time, Charlotte, United States

📍 Job Overview

  • Job Title: Lead Infrastructure Engineer - Observability
  • Company: Truist
  • Location: Charlotte, NC (with remote flexibility)
  • Job Type: Full-Time
  • Category: Infrastructure & DevOps
  • Date Posted: July 14, 2025
  • Experience Level: Mid-Senior level (5-10 years)
  • Remote Status: Hybrid (on-site and remote)

🚀 Role Summary

  • Drive the design, implementation, and evolution of enterprise-grade observability capabilities across Truist's technology landscape.
  • Champion a shift from reactive monitoring to proactive, intelligence-driven observability.
  • Lead the strategy for metrics, traces, and synthetic monitoring, enabling end-to-end visibility, accelerated incident response, and a frictionless developer experience.
  • Collaborate with cross-functional teams to embed observability into CI/CD workflows and integrate signal-based insights into reliability, performance, and business outcomes.

📝 Enhancement Note: This role requires a strong background in observability, distributed systems, and cloud-native architectures to succeed in driving enterprise adoption of observability standards and practices.

💻 Primary Responsibilities

  • Observability Platform Architecture: Design and implement a modern, scalable observability platform rooted in OpenTelemetry (OTel) and enriched by complementary technologies.
  • Telemetry Pipeline Standardization: Lead efforts to standardize telemetry pipelines, embed observability into CI/CD workflows, and integrate signal-based insights into reliability, performance, and business outcomes.
  • Incident Response & Resolution: Perform problem tracking, diagnosis, replication, troubleshooting, root-cause analysis, and resolution for complex issues.
  • Team Leadership & Mentoring: Lead a team of observability engineers, providing direction and mentoring less experienced teammates.
  • Vendor Management: Engage and manage outside vendors as needed.

📝 Enhancement Note: This role involves both technical and leadership responsibilities, requiring strong problem-solving skills, project management abilities, and the capacity to influence and drive change across multiple teams.

🎓 Skills & Qualifications

Education: Bachelor's degree in Computer Science, Engineering, or a related field.

Experience: 5+ years (preferably 6+) of experience in development or application support, with a focus on observability, distributed systems, and cloud-native architectures.

Required Skills:

  • In-depth knowledge of information systems and application support best practices.
  • Expertise in OpenTelemetry (OTel), including custom instrumentation, collector configuration, and pipeline design for traces, metrics, and logs.
  • Hands-on experience with observability tooling, such as Prometheus, Grafana, Jaeger, Loki, Elastic, Splunk, and/or Dynatrace in enterprise-grade environments.
  • Strong background in distributed systems, cloud-native architectures, and Kubernetes (K8s).
  • Proficiency in scripting or development languages (e.g., Python, Go, Bash, or Java) to automate telemetry integration, create custom exporters, and contribute to platform tooling.
  • Ability to plan and manage projects, solve complex problems, provide direction, and mentor less experienced teammates.
  • Excellent communication skills, with the ability to interpret and convey complex information.

Preferred Skills:

  • Experience with observability in financial services or large enterprise environments.
  • Familiarity with service meshes, APIs, and event-driven platforms.
  • Knowledge of business processes and competitive strategies related to the IT function.

📊 Web Portfolio & Project Requirements

  • Observability Platform Demonstration: Prepare a live demo showcasing your experience with OpenTelemetry, observability tools, and telemetry pipeline design.
  • Case Studies: Develop case studies highlighting your experience driving observability adoption, improving incident response times, and enhancing developer experience.
  • Code Samples: Include code samples demonstrating your scripting or development skills, focusing on telemetry integration and custom exporter creation.

📝 Enhancement Note: Tailor your portfolio to emphasize your observability expertise, focusing on platform architecture, telemetry pipeline standardization, and incident response and resolution.

💵 Compensation & Benefits

Salary Range: $120,000 - $170,000 per year (based on regional market data for similar roles in the Charlotte, NC area)

Benefits:

  • Medical, dental, and vision insurance
  • Life insurance and disability coverage
  • 401(k) plan with company match
  • Vacation and sick leave
  • Paid holidays
  • Defined benefit pension plan, restricted stock units, and deferred compensation plan (depending on the position and division)
  • Tax-preferred savings accounts

Working Hours: Full-time (40 hours/week) with flexible scheduling for deployment windows and maintenance.

📝 Enhancement Note: The salary range provided is an estimate based on regional market data for similar roles in the Charlotte, NC area. Actual salary may vary based on experience, skills, and company-specific factors.

🎯 Team & Company Context

🏢 Company Culture

Industry: Financial Services

Company Size: Large enterprise (over 50,000 employees)

Founded: 2019 (through the merger of BB&T and SunTrust Banks)

Team Structure:

  • The Observability team is part of the broader Infrastructure & Operations division, working closely with Engineering, SRE, and Platform teams.
  • The team consists of observability engineers, focusing on metrics, traces, and logs, with a mix of senior and mid-level talent.

Development Methodology:

  • Agile/Scrum methodologies for project management and sprint planning
  • Code reviews, testing, and quality assurance practices
  • CI/CD pipelines and automated deployment strategies

📝 Enhancement Note: Truist is a large financial services company with a strong focus on technology and digital transformation. The Observability team plays a critical role in enabling end-to-end visibility, accelerating incident response, and improving the overall developer experience.

📈 Career & Growth Analysis

Career Level: Mid-Senior level (5-10 years of experience) with opportunities to grow into senior, technical leadership, or architecture roles.

Reporting Structure: The Lead Infrastructure Engineer reports directly to the Director of Observability and collaborates with various teams, including Engineering, SRE, and Platform teams.

Technical Impact: This role has a significant impact on Truist's technology landscape, driving the design and adoption of enterprise-grade observability capabilities and influencing telemetry strategies across multiple teams.

Growth Opportunities:

  • Technical Growth: Deepen expertise in observability, distributed systems, and cloud-native architectures, with opportunities to contribute to open-source projects and emerging technologies.
  • Leadership Development: Gain experience leading a team of observability engineers, with potential opportunities to mentor and develop less experienced teammates.
  • Architecture & Strategy: Contribute to the development of Truist's observability strategy, with opportunities to influence architecture decisions and shape the company's technology roadmap.

📝 Enhancement Note: This role offers significant growth opportunities for experienced observability engineers looking to advance their careers in a large enterprise environment.

🌐 Work Environment

Office Type: Hybrid (on-site and remote) with a focus on collaboration and cross-functional teamwork.

Office Location(s): Charlotte, NC (214 North Tryon Street) with remote flexibility for some positions.

Workspace Context:

  • Collaborative Environment: The Observability team works closely with various teams, fostering a collaborative and cross-functional work environment.
  • Modern Workspace: Truist provides state-of-the-art workspaces with multiple monitors, testing devices, and access to the latest development tools.
  • Flexible Scheduling: Truist offers flexible scheduling for deployment windows, maintenance, and project deadlines.

Work Schedule: Full-time (40 hours/week) with flexible scheduling for deployment windows, maintenance, and project deadlines.

📝 Enhancement Note: Truist's hybrid work environment encourages collaboration and cross-functional teamwork, with flexible scheduling to accommodate deployment windows, maintenance, and project deadlines.

📄 Application & Technical Interview Process

Interview Process:

  1. Phone Screen: A brief phone call to discuss your experience, skills, and career goals (15-30 minutes).
  2. Technical Deep Dive: A comprehensive technical interview focused on observability, distributed systems, and cloud-native architectures (60-90 minutes).
  3. Behavioral & Cultural Fit: An in-depth conversation to assess your problem-solving skills, leadership potential, and cultural fit within the Observability team (60-90 minutes).
  4. Final Review: A meeting with the hiring manager and other key stakeholders to discuss your qualifications and fit for the role (30-45 minutes).

Technical Challenge Preparation:

  • Observability Platform Design: Prepare for a hands-on exercise designing and implementing an observability platform using OpenTelemetry and complementary tools.
  • Incident Response Scenario: Practice diagnosing and resolving complex incidents, demonstrating your problem-solving skills and ability to work under pressure.
  • Leadership & Mentoring: Prepare for questions assessing your leadership potential, mentoring skills, and ability to drive change within a team.

ATS Keywords: OpenTelemetry, Prometheus, Grafana, Jaeger, Loki, Elastic, Splunk, Dynatrace, metrics, traces, logs, distributed systems, cloud-native architectures, Kubernetes, CI/CD, incident response, observability, leadership, mentoring, problem-solving, project management.

📝 Enhancement Note: Truist's interview process is designed to assess your technical skills, problem-solving abilities, leadership potential, and cultural fit within the Observability team. Prepare thoroughly for each stage of the interview process, focusing on your observability expertise and relevant experience.

🛠 Technology Stack & Web Infrastructure

Observability Platform:

  • OpenTelemetry (OTel) for metrics, traces, and logs
  • Prometheus for metrics and alerting
  • Grafana for data visualization and dashboarding
  • Jaeger for distributed tracing
  • Loki for log aggregation and querying
  • Elastic, Splunk, or Dynatrace for commercial APM solutions (depending on the specific use case)

Cloud & Infrastructure:

  • Amazon Web Services (AWS) and Microsoft Azure for cloud infrastructure
  • Kubernetes (K8s) for container orchestration
  • Terraform for infrastructure as code (IaC) and provisioning
  • Ansible for configuration management and automation

📝 Enhancement Note: Truist's observability platform is built on OpenTelemetry and enriched by complementary tools, providing a comprehensive solution for metrics, traces, and logs. The company leverages both AWS and Azure for cloud infrastructure, with a strong focus on Kubernetes for container orchestration.

👥 Team Culture & Values

Observability Team Values:

  • Proactive & Intelligence-Driven: Embrace a proactive approach to observability, using data-driven insights to anticipate and mitigate issues before they impact users.
  • Collaborative & Cross-Functional: Foster a collaborative work environment, partnering with Engineering, SRE, and Platform teams to drive observability adoption and enhance the developer experience.
  • Innovative & Adaptable: Stay current with emerging technologies and best practices, continuously improving our observability platform and processes.
  • User-Centric & Business-Aligned: Focus on the user experience and align observability efforts with business outcomes, driving value for Truist and its customers.

Collaboration Style:

  • Cross-Functional Integration: Work closely with Engineering, SRE, and Platform teams to embed observability into CI/CD workflows and integrate signal-based insights into reliability, performance, and business outcomes.
  • Code Review Culture: Encourage a culture of code reviews, pair programming, and knowledge sharing to ensure high-quality observability tools and telemetry pipelines.
  • Mentoring & Knowledge Sharing: Foster a culture of mentoring and knowledge sharing, empowering team members to grow their skills and advance their careers.

📝 Enhancement Note: Truist's Observability team values a proactive, collaborative, and user-centric approach to observability, driving innovation and business alignment within the organization.

⚡ Challenges & Growth Opportunities

Technical Challenges:

  • Observability Platform Scalability: Design and implement a scalable observability platform that can handle Truist's growing infrastructure and user base.
  • Telemetry Data Management: Develop strategies for managing and storing telemetry data efficiently, ensuring optimal performance and cost-effectiveness.
  • Incident Response Optimization: Continuously improve incident response times and resolution processes, minimizing the impact on users and business operations.

Learning & Development Opportunities:

  • Emerging Technologies: Stay current with emerging observability technologies and best practices, attending conferences, webinars, and online courses to expand your knowledge and skillset.
  • Leadership Development: Gain experience leading a team of observability engineers, with opportunities to mentor and develop less experienced teammates.
  • Architecture & Strategy: Contribute to the development of Truist's observability strategy, influencing architecture decisions and shaping the company's technology roadmap.

📝 Enhancement Note: Truist's Observability team faces significant technical challenges, requiring experienced engineers to design and implement scalable observability platforms, manage telemetry data efficiently, and optimize incident response processes. This role offers numerous learning and development opportunities, with a strong focus on emerging technologies, leadership development, and architecture strategy.

💡 Interview Preparation

Technical Questions:

  • Observability Platform Architecture: Describe your approach to designing and implementing a scalable observability platform using OpenTelemetry and complementary tools.
  • Telemetry Pipeline Standardization: Explain your strategy for standardizing telemetry pipelines, embedding observability into CI/CD workflows, and integrating signal-based insights into reliability, performance, and business outcomes.
  • Incident Response & Resolution: Walk through a complex incident you've resolved, detailing your problem-solving approach, root cause analysis, and resolution strategy.

Company & Culture Questions:

  • Observability Team Culture: Explain how you would foster a proactive, collaborative, and user-centric approach to observability within the Truist team.
  • Business Alignment: Describe your experience aligning observability efforts with business outcomes and driving value for the organization and its customers.
  • Technical Leadership: Discuss your experience leading a team of observability engineers, highlighting your mentoring, coaching, and decision-making skills.

📌 Application Steps

To apply for this Lead Infrastructure Engineer - Observability position at Truist:

  1. Tailor Your Resume: Highlight your observability experience, technical skills, and leadership potential, emphasizing your ability to drive enterprise adoption of observability standards and practices.
  2. Prepare Your Portfolio: Include a live demo showcasing your experience with OpenTelemetry, observability tools, and telemetry pipeline design, along with case studies and code samples demonstrating your scripting or development skills.
  3. Research Truist: Familiarize yourself with Truist's company culture, values, and business objectives, focusing on how your observability expertise can drive business value and enhance the developer experience.
  4. Practice Technical Interview Questions: Review the technical interview questions provided and practice your responses, focusing on your problem-solving approach, leadership potential, and cultural fit within the Observability team.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and web technology industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.

Application Requirements

Candidates must have a Bachelor's degree and five years of experience in development or application support, or an equivalent combination of education and work experience. In-depth knowledge of information systems and the ability to mentor less experienced teammates are also required.