The project aims to develop a machine learning system for the automated classification of assay protocol documents. The work involves building a model or agent, tracking and improving its performance on challenging, imbalanced classes, and ensuring it is robust and ready for integration into a production environment. The outcome will help scientists focus more on research, accelerate data access for project teams, and improve overall data quality.
The ideal candidate is an expert AI/NLP consultant with hands-on experience fine-tuning Transformer models and strong proficiency in PyTorch. They should have demonstrable experience with multi-task and multi-label classification, expertise in handling severe class imbalance in text data, and proficiency in deploying ML models as REST APIs. Strong software engineering fundamentals are also essential.
General Information:
• Start date: ASAP
• Latest start date: 01/11/2025
• End date: 31/12/2025
• Extension: to be discussed, depending on new project pipeline
• Workplace: Basel
• Workload: 50-100%
• Home Office: possible
• Travel: No travel intended
Tasks & Responsibilities:
• Review and optimize a Transformer architecture and training pipeline.
• Implement and experiment with advanced techniques to improve performance on fields with severe class imbalance.
• Conduct in-depth error analysis to identify patterns in misclassifications and propose data-driven improvements.
• Refine and validate the data processing, label mapping, and stratified data splitting procedures to ensure maximum reliability.
• Collaborate with our software architect to integrate the final, optimized model into a production-ready API for inference.
• Document the final model architecture, training procedures, performance benchmarks, and best practices for future development.
Must Haves:
• Educational background: advanced degree in AI/NLP or related field
• Min. 3 years of hands-on experience fine-tuning Transformer models
• Demonstrable experience with multi-task and multi-label classification problems, and expertise in handling severe class imbalance in text data
• Proficiency in deploying machine learning models as REST APIs
• Strong proficiency in PyTorch, including creating custom model architectures (e.g., multi-head classifiers) and custom loss functions
• Strong software engineering fundamentals, the ability to write clean, modular, and well-documented Python code, and experience with Docker
• Professional proficiency in English
• Strong analytical, problem-solving, and collaboration skills
Nice to Haves:
• Direct experience working with biomedical, scientific, or other technical document formats.
• Familiarity with advanced data splitting techniques for multi-label datasets.
• Experience with MLOps principles and tools.