Senior Site Reliability Engineer

FIS
Full-time, Serbia

📍 Job Overview

  • Job Title: Senior Site Reliability Engineer
  • Company: FIS
  • Location: Belgrade, Serbia (office code SRB BELG 136B FLR2)
  • Job Type: Full-Time, Hybrid (3 days in-office)
  • Category: DevOps, Infrastructure
  • Date Posted: 2025-07-24
  • Experience Level: 5-10 years

🚀 Role Summary

  • Drive innovation and growth for the Banking Solutions, Payments, and Capital Markets businesses.
  • Help transform the organization into a leader in the competitive banking, payments, and investment landscape.
  • Collaborate with cross-functional teams to ensure application reliability, availability, and performance.
  • Implement automation, monitoring, and incident response strategies to minimize downtime and optimize response times.

📝 Enhancement Note: This role requires a strong background in site reliability engineering, with a focus on cloud platforms, monitoring tools, and incident management. Candidates should be comfortable working in a hybrid environment and collaborating with various teams to drive customer-centric innovation and automation.

💻 Primary Responsibilities

  • Design and Maintain Monitoring Solutions: Develop and maintain monitoring solutions for infrastructure, application performance, and user experience.
  • Implement Automation Tools: Streamline tasks, scale infrastructure, and ensure seamless deployments using automation tools.
  • Ensure Application Reliability and Performance: Keep applications reliable, available, and performant to minimize downtime and optimize response times.
  • Lead Incident Response: Identify, triage, resolve, and conduct post-incident analysis for critical incidents.
  • Capacity Planning and Performance Tuning: Plan for capacity, optimize performance, and ensure resource efficiency.
  • Collaborate with Security Teams: Implement best practices and ensure compliance with security standards.
  • Manage Deployment Pipelines: Ensure consistent and reliable app deployments through configuration management and deployment pipelines.
  • Develop and Test Disaster Recovery Plans: Ensure business continuity by developing and testing disaster recovery plans and backup strategies.
  • Collaborate with Development, QA, DevOps, and Product Teams: Align on reliability goals and incident response processes.
  • Provide 24/7 Support: Participate in on-call rotations and provide support for critical incidents.

📝 Enhancement Note: This role requires a strong focus on incident management, troubleshooting, and collaboration. Candidates should be comfortable working in a dynamic environment and have a proven track record in ensuring application reliability and performance.

🎓 Skills & Qualifications

Education: Bachelor's degree in Computer Science, Engineering, or a related field. Relevant experience may be considered in lieu of a degree.

Experience: 5-10 years of experience in site reliability engineering, with a focus on cloud platforms, monitoring tools, and incident management.

Required Skills:

  • Proficiency in development technologies, architectures, and platforms (web, API).
  • Experience with cloud platforms (AWS, Azure, Google Cloud) and Infrastructure as Code (IaC) tools.
  • Knowledge of monitoring tools (Prometheus, Grafana, DataDog) and logging frameworks (Splunk, ELK Stack).
  • Experience in incident management and post-mortem reviews.
  • Strong troubleshooting skills for complex technical issues.
  • Proficiency in scripting languages (Python, Bash) and automation tools (Terraform, Ansible).
  • Experience with CI/CD pipelines (Jenkins, GitLab CI/CD, Azure DevOps).
  • Ownership approach to engineering and product outcomes.
  • Excellent interpersonal communication, negotiation, and influencing skills.

Preferred Skills:

  • Experience with containerization (Docker) and container orchestration (Kubernetes).
  • Familiarity with infrastructure as code (IaC) tools (Terraform, CloudFormation).
  • Knowledge of infrastructure security best practices and compliance frameworks.
  • Experience with configuration management tools (Ansible, Puppet, Chef).
  • Familiarity with Agile methodologies and DevOps practices.

📝 Enhancement Note: This role requires a strong background in site reliability engineering, with a focus on cloud platforms, monitoring tools, and incident management. Candidates should have a proven track record in ensuring application reliability and performance, as well as strong troubleshooting and communication skills.

📊 Web Portfolio & Project Requirements

Portfolio Essentials:

  • Demonstrate experience in designing and maintaining monitoring solutions for infrastructure, application performance, and user experience.
  • Showcase automation tools implementation to streamline tasks, scale infrastructure, and ensure seamless deployments.
  • Highlight incident management and resolution skills, with examples of post-incident analysis and lessons learned.
  • Display capacity planning and performance tuning projects, with measurable improvements in application reliability, availability, and performance.

Technical Documentation:

  • Document incident response processes, including identification, triage, resolution, and post-incident analysis.
  • Explain capacity planning strategies and performance tuning techniques used to optimize resource efficiency.
  • Describe automation tools implementation and their impact on streamlining tasks and scaling infrastructure.
  • Detail disaster recovery plans and backup strategies, including testing and validation processes.

📝 Enhancement Note: A well-structured portfolio that demonstrates these skills, with concrete incident, automation, and performance examples, will be essential for success in this role.

💵 Compensation & Benefits

Salary Range: The salary range for this role is estimated to be between €60,000 and €80,000 per year, based on regional market data and the candidate's experience level. This estimate is inclusive of base salary and any bonuses or incentives.

Benefits:

  • Private Healthcare
  • 27 Days of Vacation
  • Hybrid Work Model (3 days in-office)

Working Hours: The standard workweek is 40 hours, with flexible working hours and the option to work remotely for part of the week. Deployment windows and maintenance may require occasional availability outside of standard business hours.

📝 Enhancement Note: The salary range provided is an estimate based on regional market data and the candidate's experience level. Actual compensation may vary depending on the company's internal policies and the individual's qualifications.

🎯 Team & Company Context

🏢 Company Culture

Industry: FIS is a global leader in financial services technology, providing innovative software solutions and services to the banking, payments, and capital markets industries.

Company Size: FIS has a large global workforce, with over 55,000 employees across more than 100 countries. This size provides opportunities for collaboration and growth within the organization.

Founded: FIS traces its roots to 1968 (as Systematics) and has since grown into a leading provider of technology solutions for the financial services industry.

Team Structure:

  • The Site Reliability Engineering team is responsible for ensuring the reliability, availability, and performance of FIS's applications and infrastructure.
  • The team works closely with development, QA, DevOps, and product teams to align on reliability goals and incident response processes.
  • The team is organized into smaller, focused groups, each responsible for specific applications or infrastructure components.

Development Methodology:

  • FIS uses Agile methodologies, such as Scrum, to manage development and delivery processes.
  • The company emphasizes collaboration, continuous improvement, and customer-centric innovation.
  • FIS's DevOps approach promotes close collaboration between development, operations, and other teams to ensure efficient and reliable software delivery.

Company Website: fisglobal.com

📝 Enhancement Note: FIS's large global workforce and focus on financial services technology provide opportunities for collaboration and growth within the organization. The company's emphasis on Agile methodologies and DevOps practices fosters a culture of continuous improvement and customer-centric innovation.

📈 Career & Growth Analysis

Career Level: This is a senior-level position requiring a strong background in site reliability engineering, with a focus on cloud platforms, monitoring tools, and incident management.

Reporting Structure: The Senior Site Reliability Engineer reports directly to the Site Reliability Engineering Manager and works closely with development, QA, DevOps, and product teams to align on reliability goals and incident response processes.

Technical Impact: The Senior Site Reliability Engineer plays a critical role in ensuring the reliability, availability, and performance of FIS's applications and infrastructure. Their work directly impacts the user experience and the company's ability to deliver innovative software solutions and services to its customers.

Growth Opportunities:

  • Technical Growth: Deepen expertise in site reliability engineering, cloud platforms, monitoring tools, and incident management. Explore emerging technologies and trends in the field to stay current and relevant.
  • Leadership Growth: Develop leadership skills by mentoring junior team members, driving team initiatives, and contributing to strategic decision-making processes.
  • Architecture and Design: Expand knowledge of application architecture and design principles, contributing to the development of scalable, reliable, and performant systems.

📝 Enhancement Note: This role offers significant opportunities for growth, both technically and in terms of leadership and architecture. Candidates should be eager to learn, adapt, and contribute to the company's transformation journey.

🌐 Work Environment

Office Type: FIS's Belgrade office (SRB BELG 136B FLR2) is a modern, collaborative workspace designed to foster innovation and creativity. The office features open-plan workspaces, meeting rooms, and breakout areas.

Office Location(s): Belgrade, Serbia (SRB BELG 136B FLR2)

Workspace Context:

  • Collaboration: The office encourages collaboration and cross-functional teamwork, with open-plan workspaces and meeting rooms designed for team discussions and brainstorming sessions.
  • Technology: The office is equipped with modern technology, including high-speed internet, multiple monitors, and testing devices, to support the development and testing of FIS's software solutions.
  • Flexibility: The hybrid work model allows employees to balance their time between working on-site and remotely, providing flexibility and work-life balance.

📝 Enhancement Note: FIS's modern, collaborative workspace and hybrid work model provide an ideal environment for candidates seeking a balance between on-site collaboration and remote work. The company's focus on innovation and customer-centricity fosters a dynamic and engaging work environment.

📄 Application & Technical Interview Process

Interview Process:

  1. Technical Phone Screen: A 30-minute phone or video call to assess the candidate's technical skills and understanding of site reliability engineering concepts.
  2. Technical Deep Dive: A 60-minute technical interview focused on the candidate's experience with cloud platforms, monitoring tools, and incident management. The interviewer may ask the candidate to describe their approach to designing and maintaining monitoring solutions, implementing automation tools, and ensuring application reliability and performance.
  3. Behavioral Interview: A 30-minute interview focused on the candidate's problem-solving skills, communication abilities, and cultural fit. The interviewer may ask the candidate to describe a challenging incident they have managed and how they approached resolution and post-incident analysis.
  4. Final Interview: A 30-minute interview with the hiring manager or a member of the leadership team to discuss the candidate's fit for the role and the team, as well as any remaining questions or concerns.

Portfolio Review Tips:

  • Demonstrate Experience: Highlight your experience in designing and maintaining monitoring solutions, implementing automation tools, and ensuring application reliability and performance.
  • Showcase Incident Management: Provide examples of incident management and resolution, with a focus on post-incident analysis and lessons learned.
  • Emphasize Collaboration: Highlight your ability to work effectively with cross-functional teams, including development, QA, DevOps, and product teams.
  • Demonstrate Technical Depth: Showcase your expertise in cloud platforms, monitoring tools, and incident management, with a focus on practical examples and real-world applications.

Technical Challenge Preparation:

  • Brush Up on Technical Skills: Review your knowledge of cloud platforms, monitoring tools, and incident management to ensure you are up-to-date and confident in your abilities.
  • Practice Problem-Solving: Work through practice problems and exercises to hone your troubleshooting and problem-solving skills.
  • Prepare for Behavioral Questions: Reflect on your past experiences and be ready to discuss your approach to incident management, collaboration, and communication.

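ATS Keywords: The tools listed by category under Technology Stack & Web Infrastructure below (cloud platforms, monitoring and logging, IaC, CI/CD, containerization, and configuration management) are likely the most relevant keywords for applicant tracking systems.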

📝 Enhancement Note: FIS's interview process is designed to assess the candidate's technical skills, problem-solving abilities, and cultural fit. Candidates should be prepared to discuss their experience with cloud platforms, monitoring tools, and incident management, as well as their approach to collaboration and communication.

🛠 Technology Stack & Web Infrastructure

Cloud Platforms:

  • Amazon Web Services (AWS)
  • Microsoft Azure
  • Google Cloud Platform (GCP)

Monitoring and Logging Tools:

  • Prometheus
  • Grafana
  • DataDog
  • Splunk
  • ELK Stack (Elasticsearch, Logstash, Kibana)

Scripting Languages and Automation Tools:

  • Python
  • Bash
  • Terraform
  • Ansible

CI/CD Tools:

  • Jenkins
  • GitLab CI/CD
  • Azure DevOps

Infrastructure as Code (IaC) Tools:

  • Terraform
  • CloudFormation
  • Azure Resource Manager (ARM)

Containerization and Orchestration Tools:

  • Docker
  • Kubernetes

Configuration Management Tools:

  • Ansible
  • Puppet
  • Chef

📝 Enhancement Note: FIS's technology stack includes a range of cloud platforms, monitoring tools, and automation tools. Candidates should have experience with at least one cloud platform and be comfortable working with a variety of tools and technologies.

👥 Team Culture & Values

Engineering Values:

  • Innovation: FIS values innovation and encourages its employees to think creatively and challenge the status quo.
  • Customer-Centricity: FIS is committed to delivering exceptional customer experiences and values employees who prioritize customer needs and feedback.
  • Collaboration: FIS fosters a culture of collaboration and teamwork, with a focus on open communication and cross-functional collaboration.
  • Continuous Learning: FIS values continuous learning and encourages its employees to stay current with emerging technologies and industry trends.

Collaboration Style:

  • Cross-Functional Integration: FIS encourages collaboration between different teams, including development, operations, and product teams, to ensure efficient and effective software delivery.
  • Code Review Culture: FIS values code review as a means of ensuring code quality, knowledge sharing, and continuous improvement.
  • Pair Programming: FIS encourages pair programming as a means of fostering knowledge sharing, mentoring, and continuous learning.

📝 Enhancement Note: FIS's culture values innovation, customer-centricity, collaboration, and continuous learning. The company fosters a collaborative environment that encourages open communication and cross-functional teamwork.

⚡ Challenges & Growth Opportunities

Technical Challenges:

  • Scalability: Design and implement monitoring solutions that can scale to support FIS's growing user base and infrastructure.
  • Performance Optimization: Identify and address performance bottlenecks and optimize application performance to meet the company's high standards.
  • Incident Management: Develop and refine incident management processes to ensure efficient and effective response to critical incidents.
  • Automation: Implement automation tools to streamline tasks, scale infrastructure, and ensure seamless deployments.

Learning & Development Opportunities:

  • Technical Skill Development: Deepen your expertise in site reliability engineering, cloud platforms, monitoring tools, and incident management by attending conferences, obtaining certifications, and participating in community events.
  • Mentorship: Seek mentorship opportunities within the company to learn from experienced team members and develop your leadership skills.
  • Architecture Decision-Making: Contribute to strategic decision-making processes related to application architecture and design, expanding your knowledge and influence within the organization.

💡 Interview Preparation

Technical Questions:

  • Cloud Platforms: Describe your experience with cloud platforms (AWS, Azure, GCP) and how you have used them to ensure application reliability and performance.
  • Monitoring Tools: Explain your experience with monitoring tools (Prometheus, Grafana, DataDog) and how you have used them to identify and resolve performance issues.
  • Incident Management: Walk through a challenging incident you have managed, discussing your approach to identification, triage, resolution, and post-incident analysis.
  • Automation Tools: Describe your experience with automation tools (Terraform, Ansible) and how you have used them to streamline tasks and scale infrastructure.

Company & Culture Questions:

  • Agile Methodologies: Explain your experience with Agile methodologies and how you have used them to drive customer-centric innovation and automation.
  • Collaboration: Describe your approach to collaboration and how you have worked effectively with cross-functional teams, including development, QA, DevOps, and product teams.
  • Customer-Centricity: Discuss your understanding of FIS's customer-centric approach and how you have prioritized customer needs and feedback in your previous roles.

Portfolio Presentation Strategy:

  • Live Demonstration: Prepare a live demonstration of your portfolio, showcasing your experience in designing and maintaining monitoring solutions, implementing automation tools, and ensuring application reliability and performance.
  • Code Walkthrough: Be prepared to walk the interviewer through your code, explaining your design decisions, troubleshooting approaches, and optimization techniques.
  • User Experience: Explain how your reliability and performance work has improved the end-user experience in past roles, and how you would apply the same principles for FIS's customers.

📌 Application Steps

To apply for this Senior Site Reliability Engineer position:

  1. Customize Your Portfolio: Tailor your portfolio to highlight your experience in designing and maintaining monitoring solutions, implementing automation tools, and ensuring application reliability and performance.
  2. Optimize Your Resume: Highlight your relevant skills and experience, with a focus on cloud platforms, monitoring tools, and incident management. Include specific examples of your problem-solving abilities and collaboration skills.
  3. Prepare for Technical Interviews: Brush up on your technical skills, practice problem-solving exercises, and prepare for behavioral questions related to incident management, collaboration, and communication.
  4. Research FIS: Learn about FIS's customer-centric approach, Agile methodologies, and commitment to innovation. Be prepared to discuss how your skills and experience align with the company's values and culture.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and site reliability engineering industry-standard assumptions. All details should be verified directly with FIS before making application decisions.


Application Requirements

Candidates should have proficiency in development technologies and experience with cloud platforms and monitoring tools. Strong troubleshooting skills and experience in incident management are also required.