Site Reliability Engineer (SRE) - Cloud, Systems, Automation, Security

Optimiza
Full-time • Amman, Jordan

📝 Job Overview

  • Job Title: Site Reliability Engineer (SRE) - Cloud, Systems, Automation, Security
  • Company: Optimiza
  • Location: Amman, Al ‘Āşimah, Jordan
  • Job Type: Full-time
  • Category: DevOps Engineer
  • Date Posted: 2025-07-17
  • Experience Level: 5-10 years
  • Remote Status: On-site

🚀 Role Summary

  • Cloud & Systems: Design, automate, and operate cloud-native platforms supporting AI and big data workloads. Collaborate with teams to architect resilient system architectures that meet uptime, performance, and compliance goals.
  • Automation: Automate cloud infrastructure, systems provisioning, and service deployments using modern Infrastructure as Code (IaC) practices.
  • Security: Champion Chaos Engineering and reliability testing to expose failure points before they reach production. Drive adoption of Zero Trust, secrets management, and least-privilege access in system-level architecture.
  • Incident Management: Troubleshoot high-priority infrastructure or service incidents, implement preventive solutions, and ensure transparent incident response. Participate in a 24/7 on-call rotation and ensure incident retrospectives lead to meaningful process and tooling improvements.

📝 Enhancement Note: This role requires a strong focus on cloud-native reliability, security-first design, and operations-as-code to sustain a high-performing SRE culture.

💻 Primary Responsibilities

  • Cloud & Systems Architecture: Design, automate, and operate cloud-native platforms supporting AI and big data workloads. Collaborate with teams to architect resilient system architectures that meet uptime, performance, and compliance goals.
  • Incident Response & Prevention: Troubleshoot high-priority infrastructure or service incidents, implement preventive solutions, and ensure transparent incident response. Participate in a 24/7 on-call rotation and ensure incident retrospectives lead to meaningful process and tooling improvements.
  • Automation & IaC: Lead the automation of cloud infrastructure, systems provisioning, and service deployments using modern Infrastructure as Code (IaC) practices. Continuously refine CI/CD pipelines and service maturity standards to support rapid, safe releases.
  • Security & Compliance: Champion Chaos Engineering and reliability testing to expose failure points before they reach production. Drive adoption of Zero Trust, secrets management, and least-privilege access in system-level architecture. Ensure compliance with security and governance controls, including logging, auditability, and vulnerability management.
  • Monitoring & Alerting: Analyze patterns and metrics to inform Service Level Objectives (SLOs), and integrate automated recovery and self-healing mechanisms. Implement proactive monitoring, alerting, and runbook automation to reduce toil and support predictable incident response.
  • Stakeholder Collaboration: Build trusted relationships with stakeholders by understanding business needs and delivering scalable, secure, and reliable solutions. Collaborate with engineering teams to embed DevSecOps practices.
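
To make the SLO responsibility above more concrete, here is a minimal sketch of an error-budget calculation, the kind of arithmetic that typically sits behind an availability SLO. The 99.9% target, the 30-day window, and the function names are illustrative assumptions, not details taken from this posting.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO.

    slo_target is a fraction, e.g. 0.999 for "three nines" (assumed example).
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo_target)


def budget_remaining(slo_target: float, window_days: int,
                     downtime_minutes: float) -> float:
    """Error budget left after the downtime observed so far (may go negative)."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


if __name__ == "__main__":
    # Hypothetical service: 99.9% SLO, 30-day window, 12.5 minutes of downtime.
    print(f"budget:    {error_budget_minutes(0.999, 30):.1f} min")
    print(f"remaining: {budget_remaining(0.999, 30, 12.5):.1f} min")
```

A negative remaining budget is a common signal to pause risky releases and prioritize reliability work, though the exact policy varies by team.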

📝 Enhancement Note: This role requires a strong focus on incident management, automation, and security to ensure reliable, secure, and compliant cloud-native platforms.

🎓 Skills & Qualifications

Education

  • Bachelor's Degree in Computer Science, Engineering, Cybersecurity, or a related technical field

Experience

  • 5-8 years of experience with DevOps, CI/CD tooling, and production-grade service management
  • 3+ years of experience architecting and operating cloud-based, distributed systems (AWS, Azure, or GCP)

Required Skills

  • Cloud & Systems: Proven experience with cloud-based, distributed systems (AWS, Azure, or GCP) and container orchestration (e.g., Kubernetes)
  • Automation & IaC: Expertise in automation frameworks, IaC tools (e.g., Terraform, Ansible, Pulumi), and scripting (Bash, Python, or Go preferred)
  • Linux Administration: Strong knowledge of Linux systems administration, service hardening, and process-level isolation
  • Security: Solid background in security best practices, including access control, secrets management, audit logging, and runtime protection
  • Monitoring & Observability: Proficiency in monitoring and observability stacks (e.g., Prometheus, Grafana, ELK, Datadog, or similar)
  • Incident Management: Proven experience in incident response, troubleshooting, and preventive solutions implementation
  • Communication: Fluency in English and Arabic is required

Preferred Skills

  • Experience with Chaos Engineering and reliability testing
  • Knowledge of QHSE (Quality Health Safety and Environment), Business Continuity, Information Security, Privacy, Risk, Compliance Management, and Governance of Organizations policies, procedures, plans, and related risk assessments
  • Familiarity with AI and big data workloads

📝 Enhancement Note: Candidates with experience in AI and big data workloads and knowledge of relevant policies and procedures will have a competitive advantage.

📊 Web Portfolio & Project Requirements

Portfolio Essentials

  • Cloud & Systems: Demonstrate your experience in cloud-based, distributed systems (AWS, Azure, or GCP) with case studies showcasing your architecture and deployment processes
  • Automation & IaC: Highlight your expertise in automation frameworks and IaC tools (e.g., Terraform, Ansible, Pulumi) with examples of automated infrastructure and service deployments
  • Security: Showcase your security background with examples of implementing access control, secrets management, audit logging, and runtime protection in your projects
  • Incident Management: Include case studies demonstrating your incident response, troubleshooting, and preventive solutions implementation skills

Technical Documentation

  • Cloud & Systems: Document your cloud-based, distributed systems architecture, including system diagrams, deployment processes, and performance metrics
  • Automation & IaC: Provide detailed documentation of your automation frameworks and IaC tools, including code quality, commenting, and version control practices
  • Security: Include security-related documentation, such as vulnerability assessments, penetration testing reports, and compliance certifications
  • Incident Management: Maintain incident logs, post-mortem reports, and lessons learned documentation to demonstrate your incident management skills

📝 Enhancement Note: Tailor your portfolio to highlight your experience in cloud-native reliability, security-first design, and operations-as-code to showcase your fit for this role.

💵 Compensation & Benefits

Salary Range

  • Estimate: The estimated salary range for this role in Amman, Jordan is JD 12,000 - JD 18,000 per year (USD 17,000 - USD 25,000 per year), based on market research and industry standards for experienced DevOps engineers with cloud and security focus.
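
As a quick sanity check on the currency conversion in the estimate above, the sketch below converts the dinar figures to US dollars. It assumes a rate of roughly 1 JOD = 1.41 USD, in line with the dinar's long-standing dollar peg; the exact rate and the helper name are assumptions for illustration only.

```python
USD_PER_JOD = 1.41  # assumed approximate rate; verify against a current source


def jod_to_usd(amount_jod: float, rate: float = USD_PER_JOD) -> float:
    """Convert an amount in Jordanian dinars to US dollars at the given rate."""
    return amount_jod * rate


low_usd = jod_to_usd(12_000)
high_usd = jod_to_usd(18_000)

if __name__ == "__main__":
    # The two JD endpoints, expressed in USD for the same pay period.
    print(f"JD 12,000 is about USD {low_usd:,.0f}")
    print(f"JD 18,000 is about USD {high_usd:,.0f}")
```

Under that assumed rate, the dinar range lands close to the quoted dollar range when both refer to the same pay period.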

📝 Enhancement Note: Salary estimates are based on regional market research and industry standards for experienced DevOps engineers with cloud and security focus. Actual salary may vary depending on the candidate's experience and skills.

Benefits

  • Class A Medical Insurance

Working Hours

  • Standard Hours: 40 hours per week, with flexible working hours to accommodate project deadlines and maintenance windows
  • On-Call Rotation: Participation in a 24/7 on-call rotation to ensure incident response and system uptime

🎯 Team & Company Context

🏒 Company Culture

Industry: Technology, with a focus on AI and big data workloads

Company Size: Medium-sized company with roughly 50-250 employees, providing opportunities for collaboration and growth

Founded: 2001, with a history of delivering innovative technology solutions in the Middle East and North Africa region

Team Structure:

  • Cloud & Systems: A dedicated team responsible for designing, automating, and operating cloud-native platforms supporting AI and big data workloads
  • Automation & IaC: A team focused on automation frameworks, IaC tools, and CI/CD pipelines to support rapid, safe releases
  • Security: A team dedicated to ensuring the security and compliance of the company's systems and data
  • Incident Management: A team responsible for incident response, troubleshooting, and preventive solutions implementation

Development Methodology:

  • Agile/Scrum: The company follows Agile/Scrum methodologies for project management and software development
  • Code Review & Testing: The company emphasizes code review, testing, and quality assurance practices to ensure high-quality software delivery
  • Deployment Strategies: The company employs deployment strategies, such as blue-green and canary deployments, to minimize downtime and ensure smooth releases
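
For candidates less familiar with canary deployments, the following minimal sketch shows one way a promotion gate might compare a canary's error rate against the stable baseline. The thresholds, parameter names, and the function itself are hypothetical; the posting does not describe how the company actually evaluates canaries.

```python
def canary_is_healthy(baseline_errors: int, baseline_requests: int,
                      canary_errors: int, canary_requests: int,
                      max_ratio: float = 1.5,
                      min_requests: int = 100,
                      floor_rate: float = 0.001) -> bool:
    """Decide whether a canary release looks safe to promote.

    All thresholds are illustrative assumptions:
    - min_requests: minimum canary traffic before any judgement is made
    - max_ratio: how much worse than baseline the canary may be
    - floor_rate: tolerated error rate even when the baseline has no errors
    """
    if canary_requests < min_requests:
        return False  # not enough traffic to judge; keep waiting
    baseline_rate = (baseline_errors / baseline_requests
                     if baseline_requests else 0.0)
    canary_rate = canary_errors / canary_requests
    allowed_rate = max(baseline_rate * max_ratio, floor_rate)
    return canary_rate <= allowed_rate


if __name__ == "__main__":
    # Hypothetical traffic split: baseline at 0.1% errors, canary at 0.5%.
    print(canary_is_healthy(10, 10_000, 5, 1_000))
```

In practice a gate like this usually runs repeatedly during the rollout, with a rollback triggered if the check fails.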

Company Website: Optimiza

📝 Enhancement Note: The company's focus on AI and big data workloads, along with its Agile/Scrum methodologies and deployment strategies, creates an environment that values innovation, collaboration, and continuous improvement.

📈 Career & Growth Analysis

Web Technology Career Level: This role is suitable for experienced DevOps engineers with 5-10 years of experience in cloud-based, distributed systems, and a strong focus on security and incident management. The role offers opportunities for growth into technical leadership positions, such as Senior DevOps Engineer or Technical Lead.

Reporting Structure: This role reports directly to the Head of Engineering and works closely with various teams, including cloud, automation, security, and incident management teams.

Technical Impact: The role has a significant impact on the company's AI and big data workloads, ensuring high availability, performance, and security. The role also influences the company's incident management processes and contributes to the development of its security and compliance posture.

Growth Opportunities:

  • Technical Leadership: The role offers opportunities for growth into technical leadership positions, such as Senior DevOps Engineer or Technical Lead, with a focus on mentoring team members and driving technical decision-making
  • Emerging Technologies: The company's focus on AI and big data workloads provides opportunities for candidates to gain experience with emerging technologies and drive innovation in the field
  • Architecture Decisions: The role involves making critical architecture decisions that impact the company's systems and data, providing opportunities for candidates to demonstrate their technical expertise and leadership

📝 Enhancement Note: This role offers strong growth potential for experienced DevOps engineers looking to advance their careers in cloud-native reliability, security-first design, and operations-as-code.

🌐 Work Environment

Office Type: The company's office is a modern, collaborative workspace designed to facilitate team interaction and knowledge sharing

Office Location(s): The company's main office is located in Amman, Jordan, with additional offices in the Middle East and North Africa region

Workspace Context:

  • Collaborative Workspace: The office features open-plan workspaces, collaboration areas, and meeting rooms to support teamwork and communication
  • Development Tools: The office is equipped with modern development tools, including multiple monitors, testing devices, and high-speed internet connectivity
  • Cross-Functional Collaboration: The office encourages cross-functional collaboration between developers, designers, and other stakeholders to ensure user-focused and innovative solutions

Work Schedule: The company offers flexible working hours to accommodate project deadlines and maintenance windows. The role also requires participation in a 24/7 on-call rotation to ensure incident response and system uptime.

📝 Enhancement Note: The company's modern, collaborative workspace and flexible working hours create an environment that values teamwork, innovation, and work-life balance.

📄 Application & Technical Interview Process

Interview Process:

  1. Technical Assessment: A technical assessment focused on cloud-native reliability, security-first design, and operations-as-code, including hands-on exercises and problem-solving scenarios
  2. System Design Discussion: A system design discussion to evaluate the candidate's ability to architect resilient, secure, and scalable cloud-native platforms
  3. Behavioral & Cultural Fit Assessment: An assessment of the candidate's behavioral and cultural fit with the company's values and team dynamics
  4. Final Evaluation: A final evaluation based on the candidate's performance in the technical assessment, system design discussion, and behavioral and cultural fit assessment

Portfolio Review Tips:

  1. Cloud & Systems: Highlight your experience in cloud-based, distributed systems (AWS, Azure, or GCP) with case studies showcasing your architecture and deployment processes
  2. Automation & IaC: Emphasize your expertise in automation frameworks and IaC tools (e.g., Terraform, Ansible, Pulumi) with examples of automated infrastructure and service deployments
  3. Security: Showcase your security background with examples of implementing access control, secrets management, audit logging, and runtime protection in your projects
  4. Incident Management: Include case studies demonstrating your incident response, troubleshooting, and preventive solutions implementation skills

Technical Challenge Preparation:

  1. Cloud & Systems: Brush up on your knowledge of cloud-based, distributed systems (AWS, Azure, or GCP) and container orchestration (e.g., Kubernetes) to prepare for hands-on exercises and problem-solving scenarios
  2. Automation & IaC: Familiarize yourself with automation frameworks, IaC tools (e.g., Terraform, Ansible, Pulumi), and scripting (Bash, Python, or Go preferred) to demonstrate your expertise in automated infrastructure and service deployments
  3. Security: Review your knowledge of security best practices, including access control, secrets management, audit logging, and runtime protection, to ensure you can effectively address security-related challenges
  4. Incident Management: Prepare for incident response, troubleshooting, and preventive solutions implementation scenarios to demonstrate your incident management skills

ATS Keywords:

  • Cloud & Systems: AWS, Azure, GCP, Kubernetes, IaC, Terraform, Ansible, Pulumi, Cloud-Native, Distributed Systems
  • Automation & IaC: CI/CD, Automation Frameworks, IaC Tools, Scripting, Bash, Python, Go, Infrastructure as Code
  • Security: Zero Trust, Secrets Management, Least-Privilege Access, Access Control, Audit Logging, Runtime Protection, Compliance, Vulnerability Management
  • Incident Management: Incident Response, Troubleshooting, Preventive Solutions, On-Call Rotation, Incident Retrospectives, Mean Time to Resolve (MTTR), Incident Frequency Reduction
  • Monitoring & Observability: Monitoring, Alerting, Observability, Prometheus, Grafana, ELK, Datadog, Service Level Objectives (SLOs), Automated Recovery, Self-Healing Mechanisms
  • Soft Skills: Communication, Collaboration, Problem-Solving, Decision-Making, Leadership, Mentoring, Teamwork

📝 Enhancement Note: Tailor your resume and portfolio to highlight your experience with cloud-native reliability, security-first design, and operations-as-code to showcase your fit for this role.

🛠 Technology Stack & Web Infrastructure

Cloud & Systems:

  • Cloud Providers: AWS, Azure, or GCP
  • Container Orchestration: Kubernetes
  • IaC Tools: Terraform, Ansible, Pulumi
  • Scripting: Bash, Python, or Go

Automation & IaC:

  • CI/CD Tools: Jenkins, GitLab CI/CD, or similar
  • Automation Frameworks: Ansible, Puppet, or similar
  • Infrastructure as Code (IaC): Terraform, CloudFormation, or similar

Security:

  • Identity & Access Management (IAM): Okta, Azure Active Directory, or similar
  • Secrets Management: HashiCorp Vault, AWS Secrets Manager, or similar
  • Vulnerability Management: Nessus, OpenVAS, or similar

Monitoring & Observability:

  • Monitoring Tools: Prometheus, Grafana, ELK, Datadog, or similar
  • Alerting Tools: PagerDuty, OpsGenie, or similar
  • Service Level Objectives (SLOs): SLO Manager, Prometheus, or similar

📝 Enhancement Note: Familiarize yourself with the company's technology stack and infrastructure to ensure a smooth onboarding process and effective collaboration with the team.

👥 Team Culture & Values

Web Development Values:

  • Cloud-Native Reliability: Prioritize cloud-native reliability, security-first design, and operations-as-code to sustain a high-performing SRE culture
  • User Experience: Focus on user experience and user impact to drive innovation and continuous improvement
  • Performance Optimization: Optimize system performance, reliability, and cost efficiency to ensure scalable and secure cloud-native platforms
  • Collaboration & Knowledge Sharing: Encourage collaboration, knowledge sharing, and teamwork to foster a culture of learning and growth

Collaboration Style:

  • Cross-Functional Integration: Facilitate cross-functional integration between developers, designers, and stakeholders to ensure user-focused and innovative solutions
  • Code Review Culture: Implement a code review culture to ensure high-quality software delivery and knowledge sharing
  • Pair Programming: Encourage pair programming and mentoring to foster a culture of learning and growth

📝 Enhancement Note: The company's focus on cloud-native reliability, user experience, and collaboration creates an environment that values innovation, teamwork, and continuous improvement.

⚑ Challenges & Growth Opportunities

Technical Challenges:

  • Cloud-Native Reliability: Design, automate, and operate cloud-native platforms supporting AI and big data workloads, ensuring high availability, performance, and security
  • Security-First Design: Champion Chaos Engineering and reliability testing to expose failure points before they reach production and drive adoption of Zero Trust, secrets management, and least-privilege access in system-level architecture
  • Incident Management: Troubleshoot high-priority infrastructure or service incidents, implement preventive solutions, and ensure transparent incident response
  • Performance Optimization: Identify patterns and analytics to inform Service Level Objectives (SLOs) and integrate automated recovery and self-healing mechanisms to optimize system performance, reliability, and cost efficiency

Learning & Development Opportunities:

  • Technical Skill Development: Develop your expertise in cloud-native reliability, security-first design, and operations-as-code to advance your career in the field
  • Emerging Technologies: Explore emerging technologies in AI and big data workloads to drive innovation and continuous improvement
  • Technical Leadership: Demonstrate your technical expertise and leadership by mentoring team members and driving technical decision-making

📝 Enhancement Note: This role offers strong technical challenges and growth opportunities for experienced DevOps engineers looking to advance their careers in cloud-native reliability, security-first design, and operations-as-code.

💡 Interview Preparation

Technical Questions:

  1. Cloud & Systems: Describe your experience with cloud-based, distributed systems (AWS, Azure, or GCP) and container orchestration (e.g., Kubernetes). How have you designed, automated, and operated cloud-native platforms supporting AI and big data workloads?
  2. Automation & IaC: Explain your expertise in automation frameworks, IaC tools (e.g., Terraform, Ansible, Pulumi), and scripting (Bash, Python, or Go preferred). How have you automated cloud infrastructure, systems provisioning, and service deployments using modern Infrastructure as Code (IaC) practices?
  3. Security: Discuss your background in security best practices, including access control, secrets management, audit logging, and runtime protection. How have you championed Chaos Engineering and reliability testing to expose failure points before they reach production?
  4. Incident Management: Describe your experience in incident response, troubleshooting, and preventive solutions implementation. How have you ensured transparent incident response and minimized downtime and impact on users?

Company & Culture Questions:

  1. Company Culture: How do you see yourself contributing to the company's culture of innovation, collaboration, and continuous improvement?
  2. Team Dynamics: Describe your experience working in a team environment and how you have contributed to team success and growth.
  3. User Focus: How do you ensure that your technical decisions and solutions prioritize user experience and user impact?

Portfolio Presentation Strategy:

  1. Cloud & Systems: Highlight your experience in cloud-based, distributed systems (AWS, Azure, or GCP) with case studies showcasing your architecture and deployment processes
  2. Automation & IaC: Emphasize your expertise in automation frameworks and IaC tools (e.g., Terraform, Ansible, Pulumi) with examples of automated infrastructure and service deployments
  3. Security: Showcase your security background with examples of implementing access control, secrets management, audit logging, and runtime protection in your projects
  4. Incident Management: Include case studies demonstrating your incident response, troubleshooting, and preventive solutions implementation skills

📝 Enhancement Note: Prepare for technical and behavioral questions related to cloud-native reliability, security-first design, operations-as-code, incident management, and user experience to ensure a successful interview.

📌 Application Steps

To apply for this Site Reliability Engineer (SRE) - Cloud, Systems, Automation, Security position:

  1. Tailor Your Resume: Highlight your experience in cloud-native reliability, security-first design, operations-as-code, incident management, and user experience to showcase your fit for this role
  2. Customize Your Portfolio: Showcase your expertise in cloud-based, distributed systems (AWS, Azure, or GCP), automation frameworks, IaC tools, and security best practices with case studies and live demonstrations
  3. Prepare for Technical Challenges: Brush up on your knowledge of cloud-native reliability, security-first design, operations-as-code, incident management, and user experience to ensure success in technical assessments and interviews
  4. Research the Company: Familiarize yourself with the company's mission, values, and culture to ensure a strong fit and effective collaboration with the team

⚠️ Important Notice: This enhanced job description includes AI-generated insights and web development/server administration industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.

Application Requirements

Candidates should have a Bachelor's Degree in a relevant field and 5-8 years of experience in DevOps and cloud-based systems. Expertise in automation frameworks, security best practices, and monitoring tools is essential.