AI Infrastructure Engineer

Gland

Swissquote

Listing online since: published 4 hours ago

Description

Building the bank of tomorrow takes more than skills.

It means combining our differences to imagine, discuss, code, develop, test, learn… and celebrate every step together. Share our vibes? Join Swissquote to unleash your potential.

We are the Swiss Leader in Online Banking and we provide trading, investing and banking services to more than 650’000 clients through our high-performance, secure digital platforms.

Our 1200+ employees work flexibly, without a dress code, in multicultural teams.

By having a huge impact on the industry, they grow their skills and advance their careers in a fast-paced environment. Have a look behind the scenes by checking Humans of Swissquote on Instagram.

We are all in at Swissquote. As an equal opportunity employer, we welcome candidates from all backgrounds, experiences and perspectives to join our team and contribute to our shared success.

Are you all in? Don’t be shy, apply!

Job Description

You will join the IT Department’s IT Platform Operations team, whose role is to operate the layer between raw infrastructure and the bank’s corporate-facing services: the application-tier middleware fabric, the Kubernetes control plane, and the user-facing surface of the bank’s Sovereign AI Platform.

The ideal candidate will possess deep expertise in operating Kubernetes-native platform engineering systems at scale, and will lead the integration of open-source AI tooling within a regulated corporate environment while ensuring that large language model (LLM) inference scales. Your expertise will help your team deliver the platform on which the bank provides governed access to internal and external AI capabilities (distributed inference, agentic workflows, notebooks, and chatbots), built on top of the GPU and serving substrate provided by the Systems & Storage teams.

With your team, you will work closely with IT Architects, Observability & Performance Analysts, the Cybersecurity function and the Systems teams to plan and execute the department’s long-term objective: a sovereign AI capability that runs under the bank’s own governance (data sovereignty, content safety, prompt-injection defenses, agentic-workflow audit, and cost control on external API spend) and that is AI Act- and DORA-ready by design.

- Design, deploy and operate distributed LLM inference (LLM-d) on Kubernetes, sizing for throughput, tail latency and GPU utilisation against the serving substrate provided by IT Systems Services (ITSS).
- Operate and harden the user-facing AI surface: the Open WebUI cross-department chatbot, JupyterHub notebooks for data scientists, and the agent catalog (agentregistry).
- Build and operate Agentgateway as the governed routing layer to external providers (Anthropic Claude API, OpenAI GPT API), enforcing traffic policy, rate limiting, cost controls and audit logging.
- Implement content-safety, prompt-injection defense and agentic-workflow audit controls, plus the agent-identity model required for EU AI Act and DORA compliance.
- Operate the Kubernetes control plane (etcd, API server, scheduler and controller-manager) with HA sizing and surge-upgrade discipline; contribute to multi-cluster management for the meshed cross-cluster pattern.
- Define SLOs and instrument the platform for performance and availability; lead incident response across the AI platform and control-plane critical path.
- Automate platform provisioning and configuration through Infrastructure as Code and governed automation (AAP), keeping every deployment repeatable, reviewable and auditable.
- Develop and maintain architecture documentation and operational runbooks, and participate in the 24×7 on-call rotation.

Qualifications

Minimum Qualifications

- 7+ years of experience in infrastructure or platform engineering, with at least 3 years operating production Kubernetes and/or machine-learning serving workloads at scale.
- Proven experience managing complex, mission-critical IT environments and contributing to large-scale platform projects.
- Experience in regulated or high-assurance industries such as banking, telco, aviation, pharmaceuticals or government.
- Strong understanding of Kubernetes internals, container runtimes, distributed systems, networking and cloud-native security.
- Excellent interpersonal skills, with the ability to work with cross-functional technical and business teams and with different levels of management to influence decision-making.

Preferred Qualifications

- Hands-on experience with LLM-d or comparable distributed inference / model-serving frameworks (e.g. vLLM, TGI, NVIDIA Triton, Ray Serve, KServe).
- Experience operating JupyterHub, Open WebUI, or similar multi-tenant notebook and chatbot platforms.
- Familiarity with Kubernetes-native agentic frameworks (e.g. kagent), AI traffic-routing / gateway layers (e.g. Agentgateway), and agent-registry / catalog patterns.
- Experience integrating and governing external LLM providers (Anthropic Claude, OpenAI GPT): routing, rate limiting, cost control and audit.
- Proficiency in one or more of the following languages: Python, Go, Rust, Java, C++.
- Comfortable with Infrastructure as Code and governed automation tooling (Ansible / AAP, Terraform, etc.); familiarity with event streaming (Apache Kafka) and observability stacks.
