📍 Job Overview

Job Title: Site Reliability Engineer (SRE) - Cloud, Systems, Automation, Security
Company: Optimiza
Location: Amman, Al ‘Āşimah, Jordan
Job Type: On-site
Category: DevOps Engineer
Date Posted: 2025-07-17
Experience Level: 5-10 years
Remote Status: On-site

🚀 Role Summary

Cloud & Systems: Design, automate, and operate cloud-native platforms that support AI and big data workloads. Collaborate with other teams to architect resilient systems that meet uptime, performance, and compliance goals.
Automation: Lead the automation of cloud infrastructure, systems provisioning, and service deployments using modern Infrastructure as Code (IaC) practices.
Security: Champion Chaos Engineering and reliability testing to expose failure points before they reach production. Drive adoption of Zero Trust, secrets management, and least-privilege access in system-level architecture.
Incident Management: Lead the response to high-priority infrastructure or service incidents, troubleshoot them, implement preventive solutions, and ensure transparent incident response. Participate in a 24/7 on-call rotation and ensure incident retrospectives lead to meaningful process and tooling improvements.

📝 Enhancement Note: This role requires a strong focus on cloud-native reliability, security-first design, and operations-as-code to build a high-performing SRE culture.

💻 Primary Responsibilities

Cloud & Systems Architecture: Design, automate, and operate cloud-native platforms that support AI and big data workloads. Collaborate with other teams to architect resilient systems that meet uptime, performance, and compliance goals.
Incident Response & Prevention: Troubleshoot high-priority infrastructure or service incidents, implement preventive solutions, and ensure transparent incident response. Participate in a 24/7 on-call rotation and ensure incident retrospectives lead to meaningful process and tooling improvements.
Automation & IaC: Lead the automation of cloud infrastructure, systems provisioning, and service deployments using modern Infrastructure as Code (IaC) practices. Continuously refine CI/CD pipelines and service maturity standards to support rapid, safe releases.
Security & Compliance: Champion Chaos Engineering and reliability testing to expose failure points before they reach production. Drive adoption of Zero Trust, secrets management, and least-privilege access in system-level architecture. Ensure compliance with security and governance controls, including logging, auditability, and vulnerability management.
Monitoring & Alerting: Analyze operational patterns and metrics to inform Service Level Objectives (SLOs), and integrate automated recovery and self-healing mechanisms. Implement proactive monitoring, alerting, and runbook automation to reduce toil and support predictable incident response.
Stakeholder Collaboration: Build trusted relationships with stakeholders by understanding business needs and delivering scalable, secure, and reliable solutions. Collaborate with engineering teams to embed DevSecOps practices.

📝 Enhancement Note: This role requires a strong focus on incident management, automation, and security to ensure reliable, secure, and compliant cloud-native platforms.

🎓 Skills & Qualifications

Education

Bachelor’s Degree in Computer Science, Engineering, Cybersecurity, or a related technical field

Experience

5-8 years of experience with DevOps, CI/CD tooling, and production-grade service management
3+ years of experience architecting and operating cloud-based, distributed systems (AWS, Azure, or GCP)

Required Skills

Cloud & Systems: Proven experience with cloud-based, distributed systems (AWS, Azure, or GCP) and container orchestration (e.g., Kubernetes)
Automation & IaC: Expertise in automation frameworks, IaC tools (e.g., Terraform, Ansible, Pulumi), and scripting (Bash, Python, or Go preferred)
Linux Administration: Strong knowledge of Linux systems administration, service hardening, and process-level isolation
Security: Solid background in security best practices, including access control, secrets management, audit logging, and runtime protection
Monitoring & Observability: Proficiency in monitoring and observability stacks (e.g., Prometheus, Grafana, ELK, Datadog, or similar)
Incident Management: Proven experience in incident response, troubleshooting, and preventive solutions implementation
Communication: Fluent English and Arabic are required

Preferred Skills

Experience with Chaos Engineering and reliability testing
Knowledge of policies, procedures, plans, and related risk assessments for QHSE (Quality, Health, Safety, and Environment), business continuity, information security, privacy, risk and compliance management, and organizational governance
Familiarity with AI and big data workloads

📝 Enhancement Note: Candidates with experience in AI and big data workloads and knowledge of relevant policies and procedures will have a competitive advantage.

📊 Portfolio & Project Requirements

Portfolio Essentials

Cloud & Systems: Demonstrate your experience in cloud-based, distributed systems (AWS, Azure, or GCP) with case studies showcasing your architecture and deployment processes
Automation & IaC: Highlight your expertise in automation frameworks and IaC tools (e.g., Terraform, Ansible, Pulumi) with examples of automated infrastructure and service deployments
Security: Showcase your security background with examples of implementing access control, secrets management, audit logging, and runtime protection in your projects
Incident Management: Include case studies demonstrating your incident response, troubleshooting, and preventive solutions implementation skills

Technical Documentation

Cloud & Systems: Document your cloud-based, distributed systems architecture, including system diagrams, deployment processes, and performance metrics
Automation & IaC: Provide detailed documentation of your automation and IaC work, including code quality, commenting, and version control practices
Security: Include security-related documentation, such as vulnerability assessments, penetration testing reports, and compliance certifications
Incident Management: Maintain incident logs, post-mortem reports, and lessons learned documentation to demonstrate your incident management skills

📝 Enhancement Note: Tailor your portfolio to highlight your experience in cloud-native reliability, security-first design, and operations-as-code to showcase your fit for this role.

💵 Compensation & Benefits

Salary Range

Estimate: The estimated salary range for this role in Amman, Jordan is JD 12,000 - JD 18,000 per year (approximately USD 17,000 - USD 25,000 per year).

📝 Enhancement Note: Salary estimates are based on regional market research and industry standards for experienced DevOps engineers with a cloud and security focus. Actual salary may vary depending on the candidate's experience and skills.

Benefits

Class A Medical Insurance

Working Hours

Standard Hours: 40 hours per week, with flexible working hours to accommodate project deadlines and maintenance windows
On-Call Rotation: Participation in a 24/7 on-call rotation to ensure incident response and system uptime

🎯 Team & Company Context

🏢 Company Culture

Industry: Technology, with a focus on AI and big data workloads

Company Size: Medium-sized company with roughly 50-250 employees, providing opportunities for collaboration and growth

Founded: 2001, with a history of delivering innovative technology solutions in the Middle East and North Africa region

Team Structure:

Cloud & Systems: A dedicated team responsible for designing, automating, and operating cloud-native platforms supporting AI and big data workloads
Automation & IaC: A team focused on automation frameworks, IaC tools, and CI/CD pipelines to support rapid, safe releases
Security: A team dedicated to ensuring the security and compliance of the company's systems and data
Incident Management: A team responsible for incident response, troubleshooting, and preventive solutions implementation

Development Methodology:

Agile/Scrum: The company follows Agile/Scrum methodologies for project management and software development
Code Review & Testing: The company emphasizes code review, testing, and quality assurance practices to ensure high-quality software delivery
Deployment Strategies: The company employs deployment strategies, such as blue-green and canary deployments, to minimize downtime and ensure smooth releases

Company Website: Optimiza

📝 Enhancement Note: The company's focus on AI and big data workloads, along with its Agile/Scrum methodologies and deployment strategies, creates an environment that values innovation, collaboration, and continuous improvement.

📈 Career & Growth Analysis

Career Level: This role suits experienced DevOps and SRE engineers with 5-10 years of experience, including work on cloud-based, distributed systems, and a strong focus on security and incident management. The role offers opportunities for growth into technical leadership positions, such as Senior DevOps Engineer or Technical Lead.

Reporting Structure: This role reports directly to the Head of Engineering and works closely with various teams, including cloud, automation, security, and incident management teams.

Technical Impact: The role has a significant impact on the company's AI and big data workloads, ensuring high availability, performance, and security. The role also influences the company's incident management processes and contributes to the development of its security and compliance posture.

Growth Opportunities:

Technical Leadership: The role offers opportunities for growth into technical leadership positions, such as Senior DevOps Engineer or Technical Lead, with a focus on mentoring team members and driving technical decision-making
Emerging Technologies: The company's focus on AI and big data workloads provides opportunities for candidates to gain experience with emerging technologies and drive innovation in the field
Architecture Decisions: The role involves making critical architecture decisions that impact the company's systems and data, providing opportunities for candidates to demonstrate their technical expertise and leadership

📝 Enhancement Note: This role offers strong growth potential for experienced DevOps engineers looking to advance their careers in cloud-native reliability, security-first design, and operations-as-code.

🌐 Work Environment

Office Type: The company's office is a modern, collaborative workspace designed to facilitate team interaction and knowledge sharing

Office Location(s): The company's main office is located in Amman, Jordan, with additional offices in the Middle East and North Africa region

Workspace Context:

Collaborative Workspace: The office features open-plan workspaces, collaboration areas, and meeting rooms to support teamwork and communication
Development Tools: The office is equipped with modern development tools, including multiple monitors, testing devices, and high-speed internet connectivity
Cross-Functional Collaboration: The office encourages cross-functional collaboration between developers, designers, and other stakeholders to ensure user-focused and innovative solutions

Work Schedule: The company offers flexible working hours to accommodate project deadlines and maintenance windows. The role also requires participation in a 24/7 on-call rotation to ensure incident response and system uptime.

📝 Enhancement Note: The company's modern, collaborative workspace and flexible working hours create an environment that values teamwork, innovation, and work-life balance.

📄 Application & Technical Interview Process

Interview Process:

Technical Assessment: A technical assessment focused on cloud-native reliability, security-first design, and operations-as-code, including hands-on exercises and problem-solving scenarios
System Design Discussion: A system design discussion to evaluate the candidate's ability to architect resilient, secure, and scalable cloud-native platforms
Behavioral & Cultural Fit Assessment: An assessment of the candidate's behavioral and cultural fit with the company's values and team dynamics
Final Evaluation: A final evaluation based on the candidate's performance in the technical assessment, system design discussion, and behavioral and cultural fit assessment

Portfolio Review Tips:

Cloud & Systems: Highlight your experience in cloud-based, distributed systems (AWS, Azure, or GCP) with case studies showcasing your architecture and deployment processes
Automation & IaC: Emphasize your expertise in automation frameworks and IaC tools (e.g., Terraform, Ansible, Pulumi) with examples of automated infrastructure and service deployments
Security: Showcase your security background with examples of implementing access control, secrets management, audit logging, and runtime protection in your projects
Incident Management: Include case studies demonstrating your incident response, troubleshooting, and preventive solutions implementation skills

Technical Challenge Preparation:

Cloud & Systems: Brush up on your knowledge of cloud-based, distributed systems (AWS, Azure, or GCP) and container orchestration (e.g., Kubernetes) to prepare for hands-on exercises and problem-solving scenarios
Automation & IaC: Familiarize yourself with automation frameworks, IaC tools (e.g., Terraform, Ansible, Pulumi), and scripting (Bash, Python, or Go preferred) to demonstrate your expertise in automated infrastructure and service deployments
Security: Review your knowledge of security best practices, including access control, secrets management, audit logging, and runtime protection, to ensure you can effectively address security-related challenges
Incident Management: Prepare for incident response, troubleshooting, and preventive solutions implementation scenarios to demonstrate your incident management skills

ATS Keywords:

Cloud & Systems: AWS, Azure, GCP, Kubernetes, IaC, Terraform, Ansible, Pulumi, Cloud-Native, Distributed Systems
Automation & IaC: CI/CD, Automation Frameworks, IaC Tools, Scripting, Bash, Python, Go, Infrastructure as Code
Security: Zero Trust, Secrets Management, Least-Privilege Access, Access Control, Audit Logging, Runtime Protection, Compliance, Vulnerability Management
Incident Management: Incident Response, Troubleshooting, Preventive Solutions, On-Call Rotation, Incident Retrospectives, Mean Time to Resolve (MTTR), Incident Frequency Reduction
Monitoring & Observability: Monitoring, Alerting, Observability, Prometheus, Grafana, ELK, Datadog, Service Level Objectives (SLOs), Automated Recovery, Self-Healing Mechanisms
Soft Skills: Communication, Collaboration, Problem-Solving, Decision-Making, Leadership, Mentoring, Teamwork

📝 Enhancement Note: Tailor your resume and portfolio to highlight your experience with cloud-native reliability, security-first design, and operations-as-code to showcase your fit for this role.

🛠 Technology Stack & Web Infrastructure

Cloud & Systems:

Cloud Providers: AWS, Azure, or GCP
Container Orchestration: Kubernetes
IaC Tools: Terraform, Ansible, Pulumi
Scripting: Bash, Python, or Go

Automation & IaC:

CI/CD Tools: Jenkins, GitLab CI/CD, or similar
Automation Frameworks: Ansible, Puppet, or similar
Infrastructure as Code (IaC): Terraform, CloudFormation, or similar

Security:

Identity & Access Management (IAM): Okta, Azure Active Directory, or similar
Secrets Management: HashiCorp Vault, AWS Secrets Manager, or similar
Vulnerability Management: Nessus, OpenVAS, or similar

Monitoring & Observability:

Monitoring Tools: Prometheus, Grafana, ELK, Datadog, or similar
Alerting Tools: PagerDuty, OpsGenie, or similar
Service Level Objectives (SLOs): SLO Manager, Prometheus, or similar

📝 Enhancement Note: Familiarize yourself with the company's technology stack and infrastructure to ensure a smooth onboarding process and effective collaboration with the team.

👥 Team Culture & Values

Engineering Values:

Cloud-Native Reliability: Prioritize cloud-native reliability, security-first design, and operations-as-code to build a high-performing SRE culture
User Experience: Focus on user experience and user impact to drive innovation and continuous improvement
Performance Optimization: Optimize system performance, reliability, and cost efficiency to ensure scalable and secure cloud-native platforms
Collaboration & Knowledge Sharing: Encourage collaboration, knowledge sharing, and teamwork to foster a culture of learning and growth

Collaboration Style:

Cross-Functional Integration: Facilitate cross-functional integration between developers, designers, and stakeholders to ensure user-focused and innovative solutions
Code Review Culture: Implement a code review culture to ensure high-quality software delivery and knowledge sharing
Pair Programming: Encourage pair programming and mentoring to foster a culture of learning and growth

📝 Enhancement Note: The company's focus on cloud-native reliability, user experience, and collaboration creates an environment that values innovation, teamwork, and continuous improvement.

⚡ Challenges & Growth Opportunities

Technical Challenges:

Cloud-Native Reliability: Design, automate, and operate cloud-native platforms supporting AI and big data workloads, ensuring high availability, performance, and security
Security-First Design: Champion Chaos Engineering and reliability testing to expose failure points before they reach production and drive adoption of Zero Trust, secrets management, and least-privilege access in system-level architecture
Incident Management: Troubleshoot high-priority infrastructure or service incidents, implement preventive solutions, and ensure transparent incident response
Performance Optimization: Analyze operational patterns and metrics to inform Service Level Objectives (SLOs), and integrate automated recovery and self-healing mechanisms to optimize system performance, reliability, and cost efficiency

Learning & Development Opportunities:

Technical Skill Development: Develop your expertise in cloud-native reliability, security-first design, and operations-as-code to advance your career in the field
Emerging Technologies: Explore emerging technologies in AI and big data workloads to drive innovation and continuous improvement
Technical Leadership: Demonstrate your technical expertise and leadership by mentoring team members and driving technical decision-making

📝 Enhancement Note: This role offers strong technical challenges and growth opportunities for experienced DevOps engineers looking to advance their careers in cloud-native reliability, security-first design, and operations-as-code.

💡 Interview Preparation

Technical Questions:

Cloud & Systems: Describe your experience with cloud-based, distributed systems (AWS, Azure, or GCP) and container orchestration (e.g., Kubernetes). How have you designed, automated, and operated cloud-native platforms supporting AI and big data workloads?
Automation & IaC: Explain your expertise in automation frameworks, IaC tools (e.g., Terraform, Ansible, Pulumi), and scripting (Bash, Python, or Go preferred). How have you automated cloud infrastructure, systems provisioning, and service deployments using modern Infrastructure as Code (IaC) practices?
Security: Discuss your background in security best practices, including access control, secrets management, audit logging, and runtime protection. How have you championed Chaos Engineering and reliability testing to expose failure points before they reach production?
Incident Management: Describe your experience in incident response, troubleshooting, and preventive solutions implementation. How have you ensured transparent incident response and minimized downtime and impact on users?

Company & Culture Questions:

Company Culture: How do you see yourself contributing to the company's culture of innovation, collaboration, and continuous improvement?
Team Dynamics: Describe your experience working in a team environment and how you have contributed to team success and growth.
User Focus: How do you ensure that your technical decisions and solutions prioritize user experience and user impact?

📝 Enhancement Note: Prepare for technical and behavioral questions related to cloud-native reliability, security-first design, operations-as-code, incident management, and user experience to ensure a successful interview.

📌 Application Steps

To apply for this Site Reliability Engineer (SRE) - Cloud, Systems, Automation, Security position:

Tailor Your Resume: Highlight your experience in cloud-native reliability, security-first design, operations-as-code, incident management, and user experience to showcase your fit for this role
Customize Your Portfolio: Showcase your expertise in cloud-based, distributed systems (AWS, Azure, or GCP), automation frameworks, IaC tools, and security best practices with case studies and live demonstrations
Prepare for Technical Challenges: Brush up on your knowledge of cloud-native reliability, security-first design, operations-as-code, incident management, and user experience to ensure success in technical assessments and interviews
Research the Company: Familiarize yourself with the company's mission, values, and culture to ensure a strong fit and effective collaboration with the team

⚠️ Important Notice: This enhanced job description includes AI-generated insights and web development/server administration industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
