Overview
GenPeach AI is a product-driven research lab building vertical multimodal foundation models for hyper-realistic human generation in image and video, designed for emotionally resonant, human-centered AI experiences. Our goal is to create tools that supercharge human creativity rather than replace it.
We train models from scratch: proprietary datasets at massive scale, novel architectures and training recipes, large GPU clusters, and tight product integration so research ships to users quickly.
We are a deeply technical team of around 10 people. We’re advised by Directors from Google DeepMind and backed by leading AI-focused funds and angels from OpenAI, Meta AI, Microsoft AI, Project Prometheus, and Fal. Collectively, our team, advisors, and angels have contributed to models including Meta’s Imagine/MovieGen and foundation-model work behind OpenAI’s Sora, plus Google’s Veo and Gemini.
About GenPeach AI
You’ll join the research team working across image/video generation and multimodal understanding. You’ll work closely with other Research Engineers and Scientists, as well as the Founders, and help turn research into scalable training runs, strong evaluations, and production-ready systems.
Role
We’re hiring an AI Research Engineer to help build and scale GenPeach’s foundation models end-to-end: from implementing new model ideas and training recipes, to owning the parts of the training stack that determine quality and speed, to pushing models through production constraints.
This is a hands-on, high-ownership role. You’ll write research-grade code that becomes production-critical.
Responsibilities
* Implement and iterate on image/video generative model ideas (architecture, losses, conditioning, sampling, distillation, post-training)
* Own training performance end-to-end (distributed training, throughput, memory, stability, debugging scaling failure modes)
* Build the experimentation loop (evals, ablations, reproducibility tooling, reporting, decision hygiene)
* Build and improve VLMs for image/video captioning (data recipes, training strategies, model variants, evaluation)
* Run high-iteration research: read papers when useful, implement ideas, validate empirically
* Create captioning pipelines that improve generation training and product quality
* Partner with inference/product to ship under real constraints (latency, cost, reliability, rollout safety)
* Build demos and prototypes to showcase capabilities and accelerate iteration
Qualifications
Minimum Qualifications
* Strong Python and PyTorch skills (4+ years of experience)
* Experience implementing and training deep learning models (generative models, VLMs, LLMs, vision/video, or adjacent)
* Solid understanding of training dynamics, optimization, and practical debugging
* Ability to drive projects end-to-end with minimal supervision
Preferred Qualifications
* Hands-on experience with diffusion/flow-based image or video generation, or large-scale generative modeling in adjacent domains
* Experience with distributed training at scale (multi-node) and performance tuning (throughput/memory)
* Experience building evaluation frameworks (offline metrics + human eval + regression tracking)
* Strong intuition for data quality and dataset/labeling tradeoffs for training and captioning
* Publications are a plus, but shipped impact and strong technical evidence matter more
What makes this role unique
* Build frontier image/video models and the VLM captioning systems that power them
* Join a lean, senior team that holds a high engineering + research bar
* Direct product impact: your training runs become real user-facing capabilities
* Benchmark against the best in the world and compete on model quality through what we ship
How we work
* You own outcomes end-to-end and are trusted with real responsibility
* Direct, low-ego communication and fast feedback loops
* Bias toward impact: measure → iterate → ship
* Research discipline: clear ablations, reproducibility, and crisp decision-making
Logistics
* Location: Zurich (Switzerland) or Warsaw (Poland), onsite or hybrid. If you’re elsewhere, we’re open to remote (team/timezone fit considered).
* Compensation: competitive salary + meaningful equity (level-dependent)
* Interview process: quick screen → 2x technical rounds (practical + systems) → team fit/values
What we offer
* Visa sponsorship (where applicable); we’ll make a strong effort to relocate you to Switzerland or Poland if desired
* Remote-friendly: work fully remote, hybrid, or on-site from our hubs
* Regular offsites and in-person events to collaborate and connect
* Flexible PTO