* Design, deploy, and manage production‑grade AWS cloud infrastructure including EKS, EC2, RDS, Lambda, and Step Functions
* Manage multi‑tenant Kubernetes clusters with a focus on autoscaling, workload isolation, network policies (Calico), and policy governance (Kyverno)
* Implement and maintain Infrastructure as Code using Terraform with reusable, version‑controlled modules across multiple environments
* Build, own, and continuously improve CI/CD pipelines using GitLab CI and ArgoCD, following GitOps declarative deployment principles
* Establish and maintain end‑to‑end observability using Prometheus, Grafana, Loki, Thanos, Datadog, and Dynatrace for real‑time alerting and performance insights
* Configure and enforce secure AWS networking (VPC, VPN, NAT, Transit Gateway) and implement IAM, WAF, and KMS security governance
* Manage Linux‑based environments (Amazon Linux, RHEL) including system configuration, networking, and automated patching via Ansible
* Lead cost optimization initiatives through right‑sizing, autoscaling policy design, and resource utilisation analysis
* Perform troubleshooting and root cause analysis for production incidents, ensuring rapid resolution with minimal service impact
* Contribute to security observability initiatives including SIEM integration (Wazuh or equivalent) and LLM‑enabled operational tooling where applicable
* Collaborate closely with program delivery teams, solution architects, and SITA stakeholders to align infrastructure with program objectives
TECHNICAL SKILLS & REQUIREMENTS
Skill Area: Required Proficiency
AWS Cloud Services: EKS, EC2, RDS, Lambda, Step Functions, Route53, CloudFront, WAF, KMS, IAM — hands‑on deployment and governance
CI/CD & GitOps: GitLab CI, ArgoCD declarative deployments, AWS CodeBuild/CodeDeploy; GitOps workflow ownership end‑to‑end
Monitoring & Observability: Prometheus, Grafana, Loki, Thanos, CloudWatch, Datadog, Dynatrace — full‑stack observability and alerting
Networking & Security: VPC, VPN, NAT Gateway, Transit Gateway configuration; IAM, WAF, KMS governance; Wazuh SIEM or equivalent
Linux Administration: Amazon Linux, RHEL/CentOS — system configuration, networking (ufw/firewalld), scripting
Scripting & Automation: Bash scripting for system administration; Ansible for configuration management and patching
AI/LLM Integration (desirable): on‑prem or cloud LLM deployment (OpenShift/Kubernetes), prompt engineering, AI‑enabled workflows
ESSENTIAL REQUIREMENTS
* Minimum 4 years of hands‑on DevOps / Cloud Infrastructure engineering experience in production environments
* Demonstrated experience managing production EKS clusters with Terraform IaC — portfolio evidence or verifiable project history required
* Proficiency in GitOps methodology with ArgoCD and GitLab CI as the primary toolchain
* Strong AWS services knowledge across compute, networking, security, and serverless (EKS, EC2, RDS, Lambda, IAM, WAF, KMS, Route53)
* Full observability stack experience: Prometheus, Grafana, and Loki at minimum; Datadog or Dynatrace highly preferred
* Linux administration proficiency (Amazon Linux / RHEL) with Bash scripting and Ansible automation
DESIRABLE / NICE-TO-HAVE
* Experience with LLM/AI platform deployment on Kubernetes or OpenShift (on‑prem or cloud)
* Familiarity with Wazuh SIEM, Sysdig, or equivalent security event monitoring tooling
* Experience with Rook Ceph or other distributed storage management for persistent workloads
* Serverless migration experience (on‑prem to AWS Lambda / Aurora / Step Functions)
* Karpenter node provisioner experience for cost‑efficient EKS autoscaling
* Aviation or travel industry domain exposure (not mandatory)
EDUCATION & QUALIFICATIONS
A Bachelor's or Master's degree in Computer Science, Software Engineering, or a related technical discipline is preferred.
Relevant certifications will strengthen an application:
* AWS Certified DevOps Engineer – Professional or AWS Certified Solutions Architect
* HashiCorp Terraform Associate or Professional