We're looking for a Lead Kernel Engineer/Architect to join our team in Switzerland on a hybrid basis.
Are you passionate about pushing advanced hardware accelerators to their limits? Join us in shaping the future of AI performance and scalability.
As a Lead Kernel Engineer/Architect, you will drive the optimization of critical machine learning operations for large-scale training and inference, working with cutting-edge hardware like TPUs and GPUs, advanced ML models and performance toolchains. Your work will enable faster AI research and production deployments on cloud platforms and within open-source ecosystems.
In this role, you will collaborate with researchers, compiler engineers and framework developers to deliver optimized, high-performance solutions that set the standard for modern AI computation.
Responsibilities
* Design and optimize high-performance kernels for TPU and GPU architectures using low-level programming frameworks such as Pallas, Triton or Mosaic
* Build and maintain performance infrastructure, including benchmarking suites, autotuning systems, regression testing frameworks and tooling
* Collaborate with ML framework developers (e.g., JAX, PyTorch) and compiler teams (XLA/MLIR) to integrate custom kernels and reduce performance bottlenecks
* Track advancements in accelerator hardware, compiler technology and AI model design to identify opportunities for kernel-level optimization
* Develop clear documentation, APIs and supporting OSS components that improve developer usability and adoption
* Analyze and resolve complex performance issues impacting large-scale distributed training and inference systems
Requirements
* Bachelor’s degree or equivalent practical experience
* 12+ years of industry experience in software engineering or systems programming
* 5+ years of experience in software development using C++ or Python
* 3+ years of experience in testing, maintaining or launching software products, and at least 1 year of experience in software design or architecture
* Hands‑on experience in performance optimization at the kernel level for accelerators or high-performance systems
Nice to have
* Proficiency in low-level accelerator programming (CUDA, Triton, Pallas)
* Familiarity with ML frameworks such as JAX or PyTorch and optimization techniques for attention layers, Mixture of Experts (MoE) and precision tuning
* Strong understanding of modern hardware accelerators, including pipelining, data movement and heterogeneous compute
* Knowledge of compiler principles and intermediate representations (e.g., MLIR, OpenXLA)
* Experience building OSS developer infrastructure, APIs and performance-critical libraries
* Excellent problem‑solving skills and ability to collaborate in cross‑functional engineering environments
We offer
* 5 weeks of vacation
* EPAM Employee Stock Purchase Plan (ESPP)
* Enhanced parental leave
* Extended pension plan
* Daily sickness allowance insurance
* Employee assistance program
* Global business travel medical and accident insurance
* Learning and development opportunities including in‑house training and coaching, professional certifications, and courses
Please note that any offers will be subject to appropriate background checks.
We do not accept CVs from recruiting or staffing agencies.
For this position, we are able to consider applications from the following:
* Swiss nationals
* EU/EFTA nationals
* Third‑country nationals based in Switzerland with an appropriate work permit
* Displaced people from Ukraine who are currently in Switzerland and hold, or have already applied for, S permits