We are looking for a Senior DevOps / Site Reliability Engineer (SRE) with strong expertise in infrastructure automation, CI/CD optimization, observability, and cloud reliability engineering. The role involves building scalable DevOps solutions on Azure, ensuring high availability, resilience, and security of mission-critical systems.
The ideal candidate will have hands-on experience in Infrastructure as Code (IaC), container orchestration, monitoring frameworks, and incident response, while driving DevSecOps alignment across development, QA, and architecture teams.
Key Responsibilities
- CI/CD Pipeline Development – Build and manage scalable CI/CD pipelines for web and mobile applications, enabling automated build, test, and deployment workflows.
- Pull Request Validation Workflows – Implement automated PR pipelines with linting, static analysis, unit testing, and integration checks to enforce code quality.
- Security & Code Quality Automation – Integrate SonarQube, SCA (Software Composition Analysis), and vulnerability scanning tools to enforce compliance and security.
- Environment-Specific Deployments – Configure deployment strategies with approval gates, rollback mechanisms, and environment-specific variables.
- Infrastructure as Code (IaC) – Automate infrastructure provisioning using Terraform and Helm charts.
- Azure Cloud Management – Ensure availability, scalability, and resilience of applications hosted on Azure (AKS, App Services, VMs, Functions, App Gateway, VNets, Key Vault).
- Observability & Monitoring – Implement monitoring with Azure Monitor, Grafana, Prometheus, and Application Insights, and set up custom alerts and dashboards.
- Secrets Management – Manage and secure secrets via Azure Key Vault and integrate them with CI/CD pipelines.
- Incident Response & SRE Practices – Establish on-call rotations, conduct postmortems, and apply reliability engineering practices for system stability.
- Collaboration – Work closely with development, QA, and architecture teams to align with DevSecOps best practices.
- Capacity & Reliability Planning – Contribute to scalability, cost optimization, and long-term infrastructure planning.
Must-Have Skills
- Strong expertise in Azure DevOps (Pipelines, Repos, Artifacts).
- Deep knowledge of Terraform and Helm for IaC and Kubernetes management.
- Hands-on experience with Azure Kubernetes Service (AKS) and related Azure services (Functions, App Gateway, VNets, Key Vault).
- Proficiency in observability tools – Azure Monitor, Application Insights, Prometheus, Grafana.
- Solid understanding of Linux, Docker, Kubernetes, and CI/CD workflows.
DevOps Tech Stack
| Category | Tools / Technologies |
|---|---|
| CI/CD Pipelines | Azure DevOps, GitHub Actions, GitLab CI, Jenkins, Bitrise |
| Version Control | Git, GitHub, GitLab, Bitbucket |
| Infrastructure as Code | Terraform, Ansible, Helm, Bicep |
| Containerization & Orchestration | Docker, Kubernetes, AKS/EKS/GKE, Dapr |
| Code Quality & Security | SonarQube, Snyk, Trivy, Checkmarx, ESLint, Prettier |
| Monitoring & Logging | Prometheus, Grafana, ELK Stack, Azure Monitor, App Insights |
| Artifact Management | JFrog Artifactory, Nexus, GitHub Packages |
| Mobile Build Automation | Fastlane, Bitrise, App Center, Firebase App Distribution |
| Release Management | Azure DevOps Releases, GitHub Environments, Argo CD |
| Secrets Management | Azure Key Vault, HashiCorp Vault, AWS Secrets Manager |
Good to Have
- Experience with GitOps, Argo CD, and service mesh (Istio/Linkerd).
- Knowledge of security tools – Snyk, AquaSec, Trivy.
- Familiarity with FinOps practices for cloud cost monitoring and optimization.
Soft Skills & Competencies
- Strong problem-solving and analytical abilities.
- Ability to manage complex projects and multiple environments.
- Excellent communication and collaboration skills.
- Passion for automation, reliability, and continuous improvement.
Work Environment
This is an on-site role in Abu Dhabi, UAE, within a fast-paced enterprise digital transformation environment. The candidate will be at the center of mission-critical projects, collaborating with cross-functional teams to deliver secure, resilient, and scalable DevOps and SRE solutions.