📍 Job Overview

Job Title: Site Reliability Engineer
Company: Feedzai
Location: Portugal (Remote)
Job Type: Full-Time
Category: DevOps Engineer, System Administrator
Date Posted: 2025-06-18
Experience Level: Mid-Senior Level (2-5 years)
Remote Status: Remote OK

🚀 Role Summary

Key Responsibilities: Ensure high availability and reliability of Feedzai's cloud services, optimize performance, and automate infrastructure.
Key Technologies: Cloud (AWS/GCP), Distributed Systems, Go, Python, Kubernetes, Monitoring (Grafana, Prometheus), Infrastructure as Code (IaC).
Key Challenges: Manage complex, low-latency, high-throughput systems; collaborate with cross-functional teams to drive improvements.

📝 Enhancement Note: This role requires a strong background in distributed systems, cloud environments, and automation to succeed in a fast-paced, collaborative environment.

💻 Primary Responsibilities

Capacity Planning & Allocation: Provide recommendations on capacity allocation, considering cost, resilience, and performance.
System Performance & Reliability: Work with product teams to support best practices and drive improvements before and after systems go live.
Automation & Infrastructure: Automate all aspects of cloud infrastructure and incident response; maintain and develop infrastructure as code (IaC).
Incident Response: Participate in incident response, root cause investigation, and resolution; develop playbooks for actionable alerts.
Collaboration & Problem Solving: Utilize problem-solving skills to help prevent and investigate production issues; collaborate with cross-functional teams.

📝 Enhancement Note: This role requires a proactive approach to identifying and mitigating potential issues, as well as strong communication skills to work effectively with various teams.

🎓 Skills & Qualifications

Education: Bachelor's degree in Computer Science, Information Systems, or a related field.

Experience:

2+ years of experience with data structures, algorithms, and programming, including asynchronous and multithreaded designs.
2+ years of experience building scalable and distributed cloud services.
2+ years operating production environments.
1+ year of experience collaborating across teams in a supporting role.

Required Skills:

Programming skills in Go, Python, or similar languages.
Experience with monitoring and observability stacks (Grafana, Prometheus).
Familiarity with Kubernetes, Cloud (AWS/GCP), and HashiCorp tools.
Experience being on-call.

Preferred Skills:

Knowledge or experience with AWS or GCP.
Familiarity with Terraform or other IaC tools.

📝 Enhancement Note: While not required, having experience with AWS or GCP and familiarity with Terraform or other IaC tools can be beneficial for this role.

📊 Portfolio & Project Requirements

Portfolio Essentials:

Demonstrate experience with cloud infrastructure, distributed systems, and automation.
Showcase problem-solving skills through case studies or live projects.
Highlight experience with monitoring and observability tools.

Technical Documentation:

Provide examples of code quality, commenting, and documentation standards.
Showcase version control, deployment processes, and server configuration experience.
Demonstrate understanding of testing methodologies, performance metrics, and optimization techniques.

📝 Enhancement Note: For this role, focus on projects that showcase your ability to manage complex systems, optimize performance, and automate infrastructure.

💵 Compensation & Benefits

Salary Range: €55,000 - €75,000 per year (an estimate based on market research for mid-senior level DevOps roles in Portugal)

Benefits:

Competitive salary and equity compensation.
Health, dental, and vision insurance.
Pension plan with company matching.
Flexible work arrangements and remote work options.
Professional development opportunities, including training, conferences, and certifications.

Working Hours: Full-time (40 hours/week) with flexible working hours; some work falls within scheduled maintenance windows.

📝 Enhancement Note: Salary range is estimated based on market research for mid-senior level DevOps roles in Portugal, considering the required skills and experience for this position.

🎯 Team & Company Context

🏢 Company Culture

Industry: Financial Risk Management, Fintech

Company Size: Medium (250-1,000 employees)

Founded: 2011

Team Structure:

The Platform Engineering area supports the product development lifecycle, from development through testing and deployment to operations and maintenance.
The Site Reliability Engineering team focuses on ensuring high availability, reliability, and performance of Feedzai's cloud services.

Development Methodology:

Feedzai follows a DevOps approach, with close collaboration between development, operations, and other teams.
The company uses Agile methodologies, with a focus on continuous integration, continuous deployment, and continuous improvement.

Company Website: Feedzai

📝 Enhancement Note: Feedzai's culture values collaboration, innovation, and continuous learning, with a strong focus on driving results and challenging the status quo.

📈 Career & Growth Analysis

Career Level: Mid-Senior Level (2-5 years) - Site Reliability Engineer

Reporting Structure: This role reports directly to the Engineering Manager within the Platform Engineering area.

Technical Impact: Site Reliability Engineers at Feedzai have a significant impact on the performance, reliability, and scalability of the company's cloud services, directly contributing to the customer experience and business success.

Growth Opportunities:

Technical Growth: Deepen expertise in cloud environments, distributed systems, and automation; explore emerging technologies and tools.
Leadership Growth: Develop leadership skills through mentoring, coaching, and team management opportunities.
Architecture & Design Growth: Gain experience in designing and implementing large-scale, highly available, and performant systems.

📝 Enhancement Note: Feedzai offers numerous growth opportunities for technical professionals looking to advance their careers in a dynamic and challenging environment.

🌐 Work Environment

Office Type: Hybrid (remote work available)

Office Location(s): Portugal (Lisbon, Porto, and remote)

Workspace Context:

Collaboration: Work closely with cross-functional teams, including product, development, and operations teams.
Tools & Equipment: Access to modern development tools, multiple monitors, and testing devices.
Interaction: Regular team meetings, code reviews, and knowledge-sharing sessions.

Work Schedule: Flexible working hours with maintenance windows and on-call rotations.

📝 Enhancement Note: Feedzai's hybrid work environment encourages collaboration and knowledge-sharing while offering the flexibility to work remotely.

📄 Application & Technical Interview Process

Interview Process:

Technical Phone Screen: Assess problem-solving skills, programming proficiency, and understanding of cloud environments and distributed systems.
System Design & Architecture: Evaluate system design, scalability, and performance optimization skills through a take-home exercise or on-site presentation.
Behavioral & Cultural Fit: Assess communication skills, teamwork, and cultural fit through behavioral questions and case studies.
Final Decision: The hiring team decides based on the candidate's technical skills, problem-solving abilities, and cultural fit.

Portfolio Review Tips:

Highlight projects that demonstrate your ability to manage complex systems, optimize performance, and automate infrastructure.
Showcase your problem-solving skills through case studies or live projects.
Emphasize your experience with cloud environments, distributed systems, and monitoring tools.

Technical Challenge Preparation:

Brush up on your knowledge of cloud environments (AWS/GCP), distributed systems, and automation tools.
Practice system design and architecture exercises to prepare for the take-home exercise or on-site presentation.
Familiarize yourself with Feedzai's product and industry to demonstrate your understanding of the company and its mission.

ATS Keywords (organized by category):

Programming Languages: Go, Python, Bash
Cloud Platforms: AWS, GCP
Infrastructure as Code: Terraform, CloudFormation
Monitoring & Observability: Grafana, Prometheus, ELK Stack
Containerization & Orchestration: Kubernetes, Docker
Problem-Solving Skills: Algorithms, Data Structures, System Design
Soft Skills: Communication, Teamwork, Collaboration

📝 Enhancement Note: Tailor your resume and application materials to highlight the skills and experiences most relevant to this Site Reliability Engineer role at Feedzai.

🛠 Technology Stack & Infrastructure

Cloud Platforms: AWS, GCP

Programming Languages: Go, Python, Bash

Infrastructure as Code: Terraform, CloudFormation

Monitoring & Observability: Grafana, Prometheus, ELK Stack

Containerization & Orchestration: Kubernetes, Docker

Database: PostgreSQL, Redis

Caching: Redis, Memcached

Search: Elasticsearch, Algolia

Messaging: RabbitMQ, Apache Kafka

CI/CD: Jenkins, GitLab CI/CD

Version Control: Git

Project Management: Jira, Confluence

📝 Enhancement Note: Familiarize yourself with Feedzai's technology stack, as it is essential for success in this role.

👥 Team Culture & Values

Engineering Values:

Reliability: Ensure high availability and reliability of Feedzai's cloud services.
Performance: Optimize system performance and scalability to meet business demands.
Automation: Automate infrastructure and incident response processes to improve efficiency and reduce manual effort.
Collaboration: Work closely with cross-functional teams to drive improvements and deliver results.

Collaboration Style:

Cross-Functional Integration: Collaborate with product, development, and operations teams to ensure alignment and successful project delivery.
Code Review Culture: Participate in code reviews and pair programming to maintain high coding standards and share knowledge.
Knowledge Sharing: Contribute to Feedzai's culture of continuous learning and improvement by sharing your expertise with colleagues.

⚡ Challenges & Growth Opportunities

Technical Challenges:

System Complexity: Manage complex, low-latency, high-throughput systems with varying workloads and traffic patterns.
Scalability & Performance: Optimize system performance and scalability to meet growing business demands and ensure consistent user experience.
Incident Response: Participate in incident response and resolution, minimizing downtime and ensuring quick recovery.
Automation & Tooling: Develop and maintain automation tools and scripts to improve efficiency and reduce manual effort.

Learning & Development Opportunities:

Emerging Technologies: Stay up to date with the latest cloud technologies, distributed systems, and automation tools.
Leadership Development: Develop leadership skills through mentoring, coaching, and team management opportunities.
Architecture & Design: Gain experience in designing and implementing large-scale, highly available, and performant systems.

💡 Interview Preparation

Technical Questions:

System Design & Architecture: Describe your approach to designing scalable, highly available, and performant systems. Provide examples of system design patterns and best practices you've used in previous projects.
Incident Response: Walk through a scenario where you had to respond to a critical system incident. Describe your approach, the tools you used, and the outcome.
Automation & Infrastructure as Code: Explain your experience with IaC tools (e.g., Terraform, CloudFormation) and how you've used them to automate infrastructure and incident response processes.

Company & Culture Questions:

Feedzai's Mission: Explain why you're interested in Feedzai's mission to secure the transition to a cashless world and enable digital trust in every transaction and payment type.
Collaboration & Teamwork: Describe your experience working with cross-functional teams and how you've contributed to their success. Provide specific examples of projects or initiatives where your collaboration skills were essential.

Portfolio Presentation Strategy:

System Design & Architecture: Present your approach to designing scalable, highly available, and performant systems using a structured, step-by-step process.
Incident Response: Describe a critical system incident you've responded to, highlighting your problem-solving skills, communication, and collaboration with other teams.
Automation & Infrastructure as Code: Showcase your experience with IaC tools by walking through a project or initiative where you've used them to automate infrastructure and incident response processes.

📝 Enhancement Note: Tailor your interview preparation to the specific requirements and challenges of the Site Reliability Engineer role at Feedzai, focusing on your technical skills, problem-solving abilities, and cultural fit.

📌 Application Steps

To apply for this Site Reliability Engineer position at Feedzai:

Update Your Resume: Highlight your experience with cloud environments, distributed systems, and automation tools. Emphasize your problem-solving skills, system design, and architecture expertise.
Prepare for Technical Phone Screen: Brush up on your knowledge of cloud environments (AWS/GCP), distributed systems, and automation tools. Practice problem-solving exercises and system design questions.
Prepare for System Design & Architecture Interview: Review system design patterns and best practices. Practice system design exercises and prepare a structured, step-by-step presentation approach.
Research Feedzai: Familiarize yourself with Feedzai's product, industry, and company culture. Prepare thoughtful questions to ask during the interview process.

⚠️ Important Notice: This enhanced job description includes AI-generated insights and industry-standard assumptions. Verify all details directly with the hiring organization before making application decisions.
