SVP, Site Reliability Engineering Domain Lead, SRE & Governance, Group Technology

Sankt Gallen

United States Digital Space LLC

EUR 135’000 per year

Listing online since: published 22 hours ago

Description

Roles & Responsibilities

* Manage a large team of Production Support Personnel across multiple geographical locations covering Applications and Infrastructure
* Proactively manage SLAs on Alerts and Incidents (Application & Infra) and reduce Mean Time To Recover (MTTR) by 20%
* Ensure strict adherence to Standard Operating Procedures for recovery across Application and Infrastructure layers
* Deliver a playbook for onboarding new tasks / activities covering both Application and Infrastructure support models
* Identify opportunities to automate Production support activities (App & Infra) and reduce manual interventions
* Drive application and infrastructure improvements including performance, capacity, resilience, and operational stability; eliminate toil through automation
* Automate manual activities/processes and system health checks for Production Applications and Infrastructure; ensure SLIs/SLOs are defined and met
* Follow Production Support Processes and provide inputs to continuously strengthen them for App + Infra operations
* Provide status updates to leads and stakeholders, and work with vendors to review Infra/Application design, fixes, and production deployments
* Coordinate recurring issues and ensure long-term resolution through robust Incident and Problem Management across Infra and Application domains
* Work with Infrastructure, Development, and Platform teams for root cause analysis of complex issues and outages
* Drive strong stakeholder management with focus on service stability, continuous improvement, and delivery excellence across Infra and Applications
* Lead Root Cause Analysis with technology partners and facilitate RCA reviews post incident resolution
* Work with Risk teams to respond to Audit & Risk RFIs; manage audit walkthroughs covering Infrastructure and Application controls

Requirements

* 10–15 years of experience in Banking, including at least 5 years in a Run-the-Bank (RTB) Lead role covering Application and Infrastructure Support
* Strong track record of implementing Site Reliability Engineering (SRE) principles across Applications and Infrastructure, including performance, reliability, monitoring, alerting, and maintenance
* Proactive capacity monitoring and observability of Production Infrastructure (compute, storage, network, platform, MF and DB) with automated alerting and reporting
* Proven experience in automation of Infra & Application support tasks and reducing manual toil
* Build and maintain monitoring and automation solutions for Infrastructure and Application stacks
* Drive service improvements by tracking SLIs/SLOs/SLAs and improving system and infrastructure performance KPIs
* Strong technical understanding across RDBMS, Unix/Linux, Cloud platforms, and Infrastructure components (servers, network, middleware, containers)
* Hands‑on knowledge of infrastructure technologies, especially Linux, databases, and OpenShift (or other container platforms)
* Solid understanding of BAU support, Incident/Problem Management, and escalation management across distributed Infra-App environments
* Good understanding of Infrastructure architecture, capacity planning, DR/BCP, IT security, and regulatory compliance
* Strong collaborator with experience working across global teams and vendors
* Ability to present recommendations effectively in both written and verbal formats
* Proactive, independent, resourceful, and team-oriented mindset

Location: DBS Asia Hub, Job: Technology, Schedule: Regular, Employee Status: Full time
