Senior ML/RL Training Infrastructure Engineer

Zürich
Apple
Posted online since: 3 December
Job description

Ready to transform how billions of people interact with technology? Apple's Core Foundation Models team is driving the intelligence that powers experiences across billions of devices worldwide, and we're looking for exceptional talent to join us. Join our Europe-based applied ML team building the next generation of large-scale ML and RL training infrastructure for Apple's foundation models. We develop high-performance, distributed systems that power cutting-edge foundation model research on a massive scale. We are seeking an engineer who is passionate about designing, optimizing, and scaling the infrastructure that enables state-of-the-art machine learning and reinforcement learning workloads.

As a senior member of the team, you will work closely with researchers and systems engineers to build robust training frameworks, accelerate experimentation, and push the boundaries of performance and efficiency. You will collaborate with teams across Apple's engineering hubs—including New York, Seattle, and Cupertino—to advance the tooling and systems that make large-scale model training possible. If you thrive at the intersection of distributed systems, ML frameworks, and high-performance computing, this is the role for you.



Description


As a core member of our ML infrastructure team, you will design, build, and scale the systems that enable large-scale reinforcement learning for Apple's foundation models. You will focus on TPU-based training with JAX, developing robust, high-performance RL pipelines that support distributed actor/learner architectures, efficient experience replay, and large-scale environment execution.

In this role, you will work across the full stack of RL training systems—from low-level performance tuning and compiler optimization to cluster-level orchestration and resource management. You will ensure that training pipelines are efficient, reliable, reproducible, and observable, enabling research teams to iterate quickly and explore more complex RL environments and models.

Your work will directly impact the scalability, throughput, and stability of RL experiments, helping to unlock new capabilities in agentic reasoning, decision-making, and policy learning for Apple's foundation models. This position is ideal for engineers who enjoy distributed systems, high-performance ML frameworks, and building the infrastructure that makes large-scale RL research possible.



Minimum Qualifications


PhD or MSc in Computer Science, Computer Engineering or a closely related field.
Hands-on experience designing, building, or maintaining large-scale ML training infrastructure.
Strong proficiency with PyTorch or JAX and experience running training workloads on GPUs/TPUs.
Solid understanding of distributed systems concepts (parallelism strategies, fault tolerance, synchronization).



Preferred Qualifications


Practical experience developing or optimizing training loops, RL pipelines, or large-scale model-training frameworks.
Strong software engineering skills in Python, with emphasis on reliability, debuggability, and high-performance execution.
Deep experience with PyTorch/JAX internals, XLA, debugging and performance profiling on GPU/TPU architectures.
Expertise in distributed RL training patterns, including actor/learner architectures, experience replay, and parallel environment execution.
Experience building training services, orchestration tools, or automated pipelines for large-scale experiments.
Proven success diagnosing bottlenecks in large-scale ML jobs (I/O, input pipelines, kernel performance, memory, compilation).
Familiarity with RL-specific infrastructure requirements (e.g., actor/learner architectures, experience replay systems, large-scale environment execution).
Strong software engineering practices: code quality, design reviews, testing, observability, CI/CD.
Experience working with cloud-scale clusters or specialized accelerators (TPU v5/v6, GPU, custom hardware).
Contributions to ML frameworks, distributed training libraries, or high-performance computing systems.
Excellent communication and collaboration skills for working with research and engineering partners.


© 2025 Jobijoba - All rights reserved
