Job Description
Join us to lead, design, and operate OpenShift infrastructures with advanced automation and governance.
Key Roles and Responsibilities
* Architect, implement, and maintain large‑scale OpenShift clusters.
* Manage core cluster components: API server, etcd, scheduler, controllers, MCO, ingress, registry, SDN/OVN‑Kubernetes.
* Lead zero‑downtime upgrades, multi‑phase version transitions, and rollout/rollback strategies.
* Design cluster capacity models, resource plans, HA/DR topology, infra/worker segmentation.
* Manage Subscription channels, InstallPlans, CSV transitions, CRDs, CatalogSources, and operator dependencies.
* Troubleshoot operator failures, operand issues, and API deprecations.
* Guide developers on operator usage patterns and GitOps workflows.
* Serve as the highest technical escalation point for cluster, node, networking, storage, registry, ingress, and workload issues.
* Deliver RCA, problem prevention plans, and stability recommendations.
* Manage CSI drivers, snapshotting, cloning, PV/PVC design, and storage topology.
* Implement and manage Velero/OADP backup and application‑level restore workflows.
* Define DR strategy, namespace recovery, cluster rebuild workflows.
* Implement CIS benchmarks, RBAC/SCC policies, audit logs, TLS/hardening configurations.
* Manage authentication integrations (AD/LDAP/OAuth).
* Ensure compliance with enterprise governance, patching standards, image policies, and vulnerability remediation.
* Automate infra operations using Ansible, Terraform, Helm, and Bash/Python scripting.
* Integrate OpenShift with CI/CD pipelines (GitOps, Jenkins, Tekton, Argo CD).
* Build reusable automation frameworks for cluster provisioning and operational workflows.
* Configure and optimize Prometheus, Alertmanager, Grafana, Loki/EFK stack.
* Build actionable dashboards, alert rules, and log routing pipelines.
* Lead performance tuning and SLO/SLI management.
* Mentor L1/L2 support teams and provide KT, SOPs, and runbooks.
* Lead war‑rooms, incident bridges, and cross‑team collaboration (Network, Security, VMware, Storage).
* Represent the platform team in architecture and customer‑facing discussions.
Technical Skills
* Strong hands‑on Linux (RHEL/SUSE/Oracle Linux) administration knowledge.
* Deep understanding of Kubernetes internals and OpenShift 3.x/4.x architecture.
* Experience with VMware vSphere (HA/DRS/Networking) and container registries.
* Expertise in:
o Machine Config Operator
o Ingress/Route architecture
o SDN/OVN networking
o Node lifecycle operations
o CRI‑O/Docker, CNI, CSI
* Strong troubleshooting ability using oc/kubectl, journald, tcpdump, strace, Wireshark, systemd tools.
* Solid grasp of Git, DevOps, automation and infra‑as‑code workflows.
* Strong Linux/RHCOS experience and working knowledge of systemd, SELinux, and OS hardening.
* Expertise with oc, kubectl, YAML, RBAC, quotas, projects, and Operators.
* VMware vSphere operational experience:
o Cluster HA/DRS
o VM placement and sizing
o Datastore troubleshooting
* Storage fundamentals: CSI, ONTAP/Trident, NFS, iSCSI, PV/PVC lifecycle.
* Networking fundamentals: SDN, routing, MTU, ingress controllers, load balancers.
* Backup and restore fundamentals (Velero/PowerProtect).
* Monitoring tools (Prometheus, Alertmanager, Grafana).
* Develop and maintain runbooks, SOPs, and automation scripts, with quarterly reviews and version history.
* Collaboration with engineers for issue escalation and resolution.
* Experience working with cross‑functional teams (Cloud, Security, Network, Developers, Firewall).
* Ability to handle on‑call rotation and work in a 24x7 support environment.
* Understanding of ITSM workflows for handling P1 through P4 incidents within SLA.
* Collaborate with VMware, Network, Storage, Security, and Application teams.
* Familiarity with CI/CD concepts and automation workflows.
* Experience with Bash/Shell scripting.
Process & Tools
* Good exposure to various ticketing and monitoring tools.
* Vendor coordination
* Good to have: Red Hat Certified Specialist in OpenShift Administration.
* Terraform/Ansible and CKA (Certified Kubernetes Administrator) certifications.
Behavioral Skills
* Ability to own complex technical problems end‑to‑end.
* Strong communication skills, with the ability to articulate issues to senior stakeholders.
* Experience working with global customers in high‑pressure environments.
* Ability to lead, mentor, and influence without authority.
Benefits
* Regular All‑hands Meetings
* Referral Program
* Comprehensive Learning Offerings
* Company pension plan
* Highly international, high‑performance culture
* Diversity is in our DNA
* Women's network
Experience: 8-10 Years