Job Summary
We are seeking a Senior DevSecOps Engineer to lead the design and governance of secure CI/CD pipelines for AI/ML workloads across Azure and on-premises SUSE Linux GPU clusters. The role is central to ensuring the security, compliance, and reliability of hybrid environments built on NVIDIA H200 GPU infrastructure.
Key Responsibilities
- Design and implement Azure DevOps pipelines for automating deployment of GPU-enabled workloads, containers, and ML models.
- Integrate security scanning tools (SAST, DAST, dependency analysis) within CI/CD workflows.
- Automate SUSE Linux system hardening, patching, and compliance enforcement for GPU nodes.
- Build and manage Infrastructure as Code (IaC) using Terraform and Bicep for hybrid GPU infrastructure provisioning.
- Implement zero-trust architecture and enforce RBAC across hybrid workloads.
- Configure and manage Azure Key Vault, Azure Policy, and Defender for Cloud for secure configurations.
- Monitor GPU utilization and costs using Azure Monitor and NVIDIA DCGM integrations.
- Secure and manage Kubernetes GPU workloads on AKS and Azure Arc-enabled clusters, including the NVIDIA GPU Operator and device plugin DaemonSets.
- Drive continuous improvements in compliance, automation, and observability across cloud and on-prem environments.
Technical Expertise
- Azure DevOps: Repos, Pipelines, Boards, Artifacts
- Operating Systems: SUSE Linux Enterprise Server (SLES) administration using zypper and YaST
- GPU Management: NVIDIA GPU Operator, CUDA runtime, Kubernetes GPU workloads
- Azure Security Suite: Defender for Cloud, Azure Policy, Key Vault, Sentinel
- IaC & Automation: Terraform, Terragrunt, Python scripting
- Monitoring & Logging: Azure Monitor, Grafana, Prometheus
Preferred Skills
- 8+ years in DevOps/DevSecOps, with proven experience across hybrid cloud and on-premises infrastructure.
- Hands-on expertise managing SUSE Linux GPU-enabled systems.
- In-depth knowledge of H200 GPU operations, CUDA libraries, and lifecycle management.
- Experience integrating security and compliance into AI/ML CI/CD pipelines.
- Familiarity with CIS Benchmarks, ISO 27001, and NIST frameworks for hybrid environments.
Technical Screening Rubric (Hands-On Tasks)
Candidates may be assessed on:
- Creating an Azure DevOps pipeline for GPU-enabled container deployment on AKS/Arc clusters.
- Automating SUSE Linux hardening and compliance reporting within CI/CD.
- Deploying and validating NVIDIA GPU Operator in Kubernetes clusters.
- Developing Terraform IaC for provisioning hybrid GPU infrastructure.
- Integrating Defender for Cloud and Key Vault for security compliance validation.