Infra Tech Lead Analyst - VP
📍 Job Overview
- Job Title: Infra Tech Lead Analyst - VP
- Company: Citi
- Location: Irving, Texas, United States
- Job Type: Full-time
- Category: Infrastructure
- Date Posted: June 24, 2025
- Experience Level: 5-10 years
- Remote Status: On-site
🚀 Role Summary
- Lead and manage a team of Cloud Operations Support Engineers responsible for monitoring, troubleshooting, and maintaining the health of Citi's AWS and GCP environments.
- Ensure high availability, performance, and operational excellence by driving incident response, automation, and continuous improvement.
- Collaborate with various stakeholders, including product, engineering, and security teams, to deliver value-added outcomes.
📝 Enhancement Note: This role requires a strong technical leader with a proven track record in cloud operations, incident management, and process improvement. The ideal candidate will have experience in a large, complex, and global environment, preferably in the financial services industry.
💻 Primary Responsibilities
- Cloud Environment Management: Monitor AWS and GCP infrastructure and services to ensure availability, performance, and reliability. Perform incident management, including triage, impact assessment, and coordination with engineering teams to resolve issues.
- On-Call Rotation: Participate in on-call rotations for high-severity and major incident support coverage.
- Root Cause Analysis: Provide root cause analysis after service has been restored; design testing approaches, complex processes, and reporting streams; and assist with automating repetitive tasks.
- Technical Direction: Provide technical and strategic direction to team members, and create, maintain, and enhance operational runbooks, SOPs, and knowledge base articles.
- Cloud Resource Provisioning: Support provisioning and configuration of cloud resources across multiple environments and implement and maintain monitoring, logging, and alerting tools.
- Compliance & Disaster Recovery: Ensure ongoing compliance with regulatory requirements and assist in deployments, patching, and disaster recovery procedures.
- Stakeholder Communication: Collaborate with senior stakeholders and act as an SME to other team members, effectively communicating technical concepts to non-technical audiences.
🎓 Skills & Qualifications
Education: Bachelor's degree or equivalent experience in a relevant field.
Experience: At least 6 years of experience in infrastructure delivery roles, with a proven track record of operational process change and improvement. Experience in cloud operations/support and site reliability engineering is required.
Required Skills:
- Hands-on experience with AWS and/or GCP.
- Proficiency with Infrastructure as Code (IaC) tools like Terraform, CloudFormation, or similar.
- Working knowledge of scripting (Bash, Python, or similar).
- Strong understanding of networking, DNS, IAM, load balancing, and cloud-native services.
- Ability to develop projects for designing metrics, analytical tools, benchmarking activities, and best practices.
- Ability to work with virtual/in-person teams and work under pressure to meet deadlines.
- Experience in a financial services or large, complex, and/or global environment is preferred.
Preferred Skills:
- Experience with incident management tools and processes.
- Familiarity with Agile methodologies and DevOps practices.
- Knowledge of ITIL frameworks and IT service management principles.
📊 Web Portfolio & Project Requirements
- Portfolio Essentials: Demonstrate your experience in cloud operations, incident management, and process improvement through case studies, success stories, and testimonials from previous colleagues or stakeholders.
- Technical Documentation: Showcase your technical documentation skills by providing examples of runbooks, SOPs, and knowledge base articles you've created or maintained. Highlight your ability to clearly explain complex technical concepts to both technical and non-technical audiences.
💵 Compensation & Benefits
Salary Range: $125,760 - $188,640 per year (Full-time, primary location: Irving, Texas, United States)
Benefits:
- Medical, dental, and vision coverage.
- 401(k) plan.
- Life, accident, and disability insurance.
- Wellness programs.
- Paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays.
🎯 Team & Company Context
🏢 Company Culture
Industry: Financial Services.
Company Size: Large, global organization with a complex and diverse technology environment.
Founded: 1812 (New York, United States).
Team Structure:
- The team consists of Cloud Operations Support Engineers, focused on monitoring, troubleshooting, and maintaining Citi's AWS and GCP environments.
- The role reports directly to the Head of Cloud Infrastructure and is responsible for leading and managing a team of engineers.
Development Methodology:
- Agile/Scrum methodologies are used for project management and delivery.
- Citi follows ITIL frameworks and IT service management principles for incident, problem, and change management.
Company Website: Citi
📝 Enhancement Note: Citi is a large, global financial services company with a complex and diverse technology environment. The ideal candidate for this role will have experience working in a large, complex, and global environment, preferably in the financial services industry.
📈 Career & Growth Analysis
Career Level: This is a senior-level position requiring a proven track record in cloud operations, incident management, and process improvement. The ideal candidate will have experience leading teams and driving operational excellence in a large, complex, and global environment.
Reporting Structure: The role reports directly to the Head of Cloud Infrastructure and is responsible for leading and managing a team of Cloud Operations Support Engineers.
Technical Impact: The role has a significant impact on Citi's cloud infrastructure, ensuring high availability, performance, and operational excellence. The candidate will drive incident response, automation, and continuous improvement, collaborating with various stakeholders to deliver value-added outcomes.
Growth Opportunities:
- Technical Leadership: The role offers opportunities for technical leadership, mentoring junior team members, and driving operational excellence.
- Architecture Decisions: The candidate will have the opportunity to influence architecture decisions and contribute to the design and implementation of Citi's cloud infrastructure.
- Career Progression: With a proven track record in the role, the candidate may progress to a more senior leadership position within Citi's cloud infrastructure organization or explore other senior-level roles within the company.
📝 Enhancement Note: This role offers significant growth opportunities for the right candidate, including technical leadership, architecture decisions, and career progression within Citi's cloud infrastructure organization or the broader company.
🌐 Work Environment
Office Type: Citi's Irving, Texas, office is a large, modern facility with a collaborative work environment, designed to support the company's global operations.
Office Location(s): Irving, Texas, United States.
Workspace Context:
- The team works in a collaborative environment, with a focus on incident response, automation, and continuous improvement.
- The workspace is equipped with multiple monitors, testing devices, and other tools necessary for effective cloud operations and incident management.
- The team interacts regularly with various stakeholders, including product, engineering, and security teams, to deliver value-added outcomes.
Work Schedule: Full-time, on-site position with a standard workweek of 40 hours. The role participates in an on-call rotation for high-severity and major incident support coverage.
📝 Enhancement Note: The ideal candidate for this role will thrive in a collaborative, fast-paced environment and have experience working with virtual and in-person teams to drive operational excellence.
📄 Application & Technical Interview Process
Interview Process:
- Phone/Video Screen: A brief phone or video call to assess communication skills, cultural fit, and basic technical competencies.
- Technical Deep Dive: A comprehensive technical discussion focused on cloud operations, incident management, and process improvement. Expect to discuss your experience with AWS and GCP, Infrastructure as Code (IaC) tools, scripting, and other relevant technologies.
- Behavioral & Situational Questions: Assess your problem-solving skills, leadership abilities, and cultural fit through behavioral and situational interview questions.
- Final Evaluation: A final interview with senior leadership to evaluate your overall fit for the role and Citi's cloud infrastructure organization.
Portfolio Review Tips:
- Highlight your experience in cloud operations, incident management, and process improvement through case studies, success stories, and testimonials from previous colleagues or stakeholders.
- Showcase your technical documentation skills by providing examples of runbooks, SOPs, and knowledge base articles you've created or maintained.
- Emphasize your ability to clearly explain complex technical concepts to both technical and non-technical audiences.
Technical Challenge Preparation:
- Brush up on your AWS and GCP knowledge, focusing on incident management, automation, and continuous improvement.
- Review your experience with Infrastructure as Code (IaC) tools, scripting, and other relevant technologies.
- Prepare for behavioral and situational interview questions by reflecting on your past experiences and accomplishments in cloud operations, incident management, and process improvement.
ATS Keywords: AWS, GCP, Cloud Operations, Incident Management, Automation, Infrastructure as Code, Scripting, Networking, DNS, IAM, Load Balancing, Cloud Native Services, Agile, ITIL, Technical Leadership, Architecture Decisions, Career Progression, Financial Services, Global Environment.
📝 Enhancement Note: The interview process for this role is designed to assess the candidate's technical competencies, problem-solving skills, leadership abilities, and cultural fit. Expect a comprehensive evaluation of your experience in cloud operations, incident management, and process improvement.
🛠 Technology Stack & Web Infrastructure
Cloud Platforms:
- AWS (Amazon Web Services)
- GCP (Google Cloud Platform)
Infrastructure as Code (IaC) Tools:
- Terraform
- CloudFormation
Scripting Languages:
- Bash
- Python
Monitoring, Logging, & Alerting Tools:
- CloudWatch (AWS)
- Google Cloud Monitoring and Cloud Logging (GCP, formerly Stackdriver)
- Prometheus
Incident Management & Collaboration Tools:
- Jira Service Management (formerly Jira Service Desk)
- Confluence
- Slack
📝 Enhancement Note: The technology stack for this role is focused on AWS and GCP cloud platforms, Infrastructure as Code (IaC) tools, scripting, and monitoring, logging, and alerting tools. The ideal candidate will have hands-on experience with these technologies and a strong understanding of cloud-native services, networking, DNS, IAM, and load balancing.
👥 Team Culture & Values
Cloud Operations Values:
- Proactive Monitoring: Proactively monitor Citi's cloud infrastructure to ensure high availability, performance, and operational excellence.
- Incident Response: Respond promptly and effectively to incidents, minimizing impact and restoring service as quickly as possible.
- Automation & Continuous Improvement: Automate repetitive tasks and continuously improve processes to drive operational excellence.
- Collaboration: Collaborate effectively with various stakeholders, including product, engineering, and security teams, to deliver value-added outcomes.
Collaboration Style:
- Cross-Functional Integration: Work closely with product, engineering, and security teams to ensure alignment with business objectives and technical requirements.
- Code Review Culture: Foster a culture of code review and knowledge sharing to improve the quality and maintainability of code and technical documentation.
- Mentoring & Knowledge Sharing: Encourage mentoring and knowledge sharing to help team members develop their skills and advance their careers.
📝 Enhancement Note: The ideal candidate for this role will share Citi's cloud operations values and thrive in a collaborative, fast-paced environment focused on incident response, automation, and continuous improvement.
⚡ Challenges & Growth Opportunities
Technical Challenges:
- Incident Management: Manage high-severity and major incidents, minimizing impact and restoring service as quickly as possible.
- Automation & Continuous Improvement: Automate repetitive tasks and continuously improve processes to drive operational excellence.
- Compliance & Regulatory Requirements: Ensure ongoing compliance with regulatory requirements and maintain a strong understanding of relevant laws, rules, and regulations.
- Emerging Technologies: Stay up-to-date with emerging cloud technologies and consider their integration into Citi's infrastructure.
Learning & Development Opportunities:
- Technical Skill Development: Expand your knowledge of AWS and GCP cloud platforms, Infrastructure as Code (IaC) tools, scripting, and other relevant technologies.
- Certification & Community Involvement: Pursue relevant certifications and engage with cloud technology communities to stay current with industry trends and best practices.
- Mentorship & Leadership Development: Develop your leadership skills through mentoring junior team members and driving operational excellence.
📝 Enhancement Note: The technical challenges and learning opportunities for this role are focused on incident management, automation, continuous improvement, compliance, and emerging technologies. The ideal candidate will have a strong desire to learn, grow, and drive operational excellence in Citi's cloud infrastructure organization.
💡 Interview Preparation
Technical Questions:
- Cloud Platforms: Demonstrate your hands-on experience with AWS and GCP, discussing your familiarity with incident management, automation, and continuous improvement.
- Infrastructure as Code (IaC) Tools: Explain your proficiency with Terraform, CloudFormation, or similar tools, and provide examples of how you've used them to automate infrastructure and improve processes.
- Scripting Languages: Showcase your working knowledge of scripting languages like Bash and Python, and provide examples of how you've used them to automate tasks and drive operational excellence.
- Incident Management: Discuss your experience with incident management, including triage, impact assessment, and coordination with engineering teams to resolve issues.
- Compliance & Regulatory Requirements: Explain your understanding of relevant laws, rules, and regulations, and how you ensure ongoing compliance in your cloud operations role.
Company & Culture Questions:
- Citi's Cloud Infrastructure: Demonstrate your understanding of Citi's cloud infrastructure and how your experience aligns with the company's technology environment.
- Financial Services Industry: Showcase your experience working in the financial services industry or a large, complex, and global environment, and how it has prepared you for this role.
- Citi's Cloud Operations Values: Explain how you embody Citi's cloud operations values, including proactive monitoring, incident response, automation, and collaboration.
Portfolio Presentation Strategy:
- Cloud Operations Case Studies: Present case studies or success stories demonstrating your experience in cloud operations, incident management, and process improvement.
- Technical Documentation Examples: Showcase examples of runbooks, SOPs, and knowledge base articles you've created or maintained, highlighting your ability to clearly explain complex technical concepts to both technical and non-technical audiences.
- Collaboration & Leadership Examples: Provide examples of how you've collaborated effectively with various stakeholders and driven operational excellence in your previous roles.
📝 Enhancement Note: The interview preparation for this role focuses on assessing the candidate's technical competencies, problem-solving skills, leadership abilities, and cultural fit. Expect a comprehensive evaluation of your experience in cloud operations, incident management, and process improvement, as well as your understanding of Citi's cloud infrastructure and the financial services industry.
📌 Application Steps
To apply for this Infra Tech Lead Analyst - VP position at Citi:
- Review the Job Description: Carefully read and understand the job description, highlighting the required skills, qualifications, and responsibilities.
- Tailor Your Resume: Customize your resume to emphasize your relevant experience, skills, and accomplishments in cloud operations, incident management, and process improvement.
- Prepare Your Portfolio: Curate a portfolio showcasing your experience in cloud operations, incident management, and process improvement, including case studies, success stories, and technical documentation examples.
- Research Citi: Learn about Citi's cloud infrastructure, technology environment, and culture to ensure a strong fit for the role and the company.
- Practice Interview Questions: Prepare for technical, behavioral, and situational interview questions by reflecting on your past experiences and accomplishments in cloud operations, incident management, and process improvement.
- Apply: Submit your application through the application link provided, including your tailored resume and portfolio.
⚠️ Important Notice: This enhanced job description includes AI-generated insights and web technology industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
Application Requirements
Candidates should have at least 6 years of experience in infrastructure delivery and hands-on experience with AWS and/or GCP. Proficiency in Infrastructure as Code tools and scripting is essential, along with strong analytical and communication skills.