AI Engineer / Analyst (Temporary 12 months)
Job Info
Job Identification 427
Job Category INFORMATION TECHNOLOGY
Posting Date 12/08/2025, 10:22 AM
Job Schedule Full time
Contract Type Temporary
Job Description
Mission
We are seeking a Document Analyst / AI Engineer with an AI and financial‑domain background and education to operate, enhance, and industrialize our document classification pipeline across enterprise content systems (Documentum, SharePoint, Confluence). The role blends hands‑on ML engineering, infrastructure‑as‑code (Ansible), model management via NVIDIA NIM, and observability (Grafana), ensuring accurate classification, robust operations at scale (100k+ financial documents), and continuous evaluation and improvement.
This position will maintain and extend the current production pipeline, which uses an unsupervised approach with multi‑class detection, optional summarization, and rigorous evaluation. You will also study and research the migration from an unsupervised to a supervised approach, including creation of the training dataset, training models on our documents, model versioning, and continuous evaluation (precision/recall/F1), as well as comparison of various machine learning approaches.
You will collaborate with ECM/IT stakeholders to integrate and automate flows, secure data, and deliver measurable improvements to precision/recall, throughput, and reliability.
Main responsibilities
Pipeline Operations and Enhancement
Operate and improve the large‑scale document classification pipeline.
Implement and refine multi‑class detection by analyzing sample sequences of PDF pages.
Generate and maintain document summaries via VLM tuning.
Ensure robust long‑running executions on large‑scale corpora, including batching, parallelization, and retry/restart (see the sketch after this list).
Store, load, and update logs and outputs efficiently in a secure SQL database (e.g., PostgreSQL), ensuring availability and consistency.
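To illustrate the batching and retry/restart pattern referenced above, here is a minimal Python sketch; `classify_batch`, the batch structure, and the retry budget are hypothetical stand‑ins, not the production pipeline's actual interfaces.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

MAX_RETRIES = 3  # assumed retry budget, not the pipeline's real setting

def classify_batch(batch):
    # Hypothetical worker: run the classifier on one batch of PDF pages.
    # Stubbed here; the real pipeline would call the model endpoint.
    return batch["id"], ["category-placeholder" for _ in batch["pages"]]

def run_corpus(batches, workers=8):
    """Run batches in parallel, retrying failed batches up to MAX_RETRIES."""
    pending = {batch["id"]: batch for batch in batches}
    attempts = {batch_id: 0 for batch_id in pending}
    results = {}
    while pending:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            futures = {pool.submit(classify_batch, b): batch_id
                       for batch_id, b in pending.items()}
            for fut in as_completed(futures):
                batch_id = futures[fut]
                try:
                    results[batch_id] = fut.result()
                    pending.pop(batch_id)
                except Exception:
                    attempts[batch_id] += 1
                    if attempts[batch_id] >= MAX_RETRIES:
                        pending.pop(batch_id)  # give up; log for manual restart
    return results
```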
Configuration Management and Versioning
Maintain container‑mounted JSON configuration (system, category descriptions, filename mappings).
Version and A/B test category definitions and prompts; manage rollbacks and change logs.
Enforce structured outputs (JSON schema) and guided choices for consistent predictions (see the sketch after this list).
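The structured‑output requirement can be illustrated with a short sketch that validates model responses against a JSON schema with a guided category choice; the `jsonschema` package is assumed, and the category list here is hypothetical (production categories live in the container‑mounted configuration).

```python
import json

import jsonschema

# Hypothetical categories; in production these come from the
# container-mounted JSON configuration.
CATEGORIES = ["invoice", "contract", "report"]

PREDICTION_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": CATEGORIES},  # guided choice
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["category", "confidence"],
    "additionalProperties": False,
}

def parse_prediction(raw: str) -> dict:
    """Parse a model response and validate it against the schema.

    Raises on malformed output so inconsistent predictions are
    rejected before they reach downstream storage.
    """
    prediction = json.loads(raw)
    jsonschema.validate(instance=prediction, schema=PREDICTION_SCHEMA)
    return prediction

print(parse_prediction('{"category": "invoice", "confidence": 0.92}'))
```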
Deployment and Integration
Automate deployments and updates using Ansible (idempotent playbooks, inventories, secrets handling).
Integrate with ECM systems: Documentum (ID mapping, retrieval/update), expanding to SharePoint/Confluence ingestion and publication.
Operate and consume models via NVIDIA NIM (multimodal/classification endpoints), ensuring performance and availability (see the sketch after this list).
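NIM microservices generally expose an OpenAI‑compatible HTTP API, so consuming a classification model can look like the following sketch; the endpoint URL, model name, and prompt are hypothetical placeholders for the actual deployment.

```python
import requests

# Hypothetical on-premise NIM endpoint and model; substitute the real
# host, model name, and authentication used by the deployment.
NIM_URL = "http://nim.internal:8000/v1/chat/completions"
MODEL = "example/vlm-classifier"

def classify_page(page_text: str, timeout: float = 30.0) -> str:
    """Send one page of extracted text to the NIM chat endpoint."""
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "Classify this document page."},
            {"role": "user", "content": page_text},
        ],
        "temperature": 0.0,  # deterministic output for classification
    }
    resp = requests.post(NIM_URL, json=payload, timeout=timeout)
    resp.raise_for_status()  # surface availability problems early
    return resp.json()["choices"][0]["message"]["content"]
```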
Observability, Reliability, and Compliance
Build and maintain Grafana dashboards and alerts (latency, throughput, errors, success rates, SLOs); see the instrumentation sketch after this list.
Implement comprehensive logging and experiment tracking.
Apply security‑by‑design and data protection best practices for potentially sensitive documents.
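Grafana dashboards need a metrics source; assuming Prometheus as that source (an assumption, not stated above), the pipeline can be instrumented with the `prometheus_client` package as sketched below, with metric names chosen purely for illustration.

```python
import time

from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric names; real naming follows the team's conventions.
DOCS_PROCESSED = Counter(
    "docs_processed_total", "Documents classified", ["status"]
)
CLASSIFY_LATENCY = Histogram(
    "classify_latency_seconds", "Per-document classification latency"
)

def process_document(doc):
    """Classify one document while recording latency and success/error."""
    start = time.monotonic()
    try:
        # ... real classification call goes here ...
        DOCS_PROCESSED.labels(status="success").inc()
    except Exception:
        DOCS_PROCESSED.labels(status="error").inc()
        raise
    finally:
        CLASSIFY_LATENCY.observe(time.monotonic() - start)

if __name__ == "__main__":
    # Prometheus scrapes this port; Grafana then queries Prometheus.
    start_http_server(9108)
    while True:
        process_document({"id": 1})
        time.sleep(1.0)
```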
Evaluation and Continuous Improvement
Run evaluator modes (predict, eval, all) and produce metrics: accuracy, precision, recall, F1, confusion matrix, and statistical confidence intervals (see the metrics sketch after this list).
Conduct error analysis, refine prompts and category notes, expand ground truth, and compare runs.
Document findings and recommendations; communicate outcomes to stakeholders and drive iteration.
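For the metrics listed above, a minimal evaluation sketch with scikit‑learn follows; the labels are toy data standing in for the evaluator's ground‑truth set, and confidence intervals would typically be added via bootstrap resampling.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Toy ground truth and predictions; in practice these come from the
# evaluator's predict run against the labeled ground-truth corpus.
y_true = ["invoice", "contract", "invoice", "report", "contract"]
y_pred = ["invoice", "invoice", "invoice", "report", "contract"]

labels = sorted(set(y_true))
accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, average="macro", zero_division=0
)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
print(confusion_matrix(y_true, y_pred, labels=labels))
```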
Collaboration and Coordination
Act as liaison between AI, ECM, and IT teams to align requirements, timelines, and deployments.
Facilitate workshops and status reviews; prepare clear reports and dashboards for management.
Your profile
Must‑have
Master’s degree in Data Science from a highly ranked university, with a strong academic and research background and related large‑scale ML projects/publications.
Prior AI/ML/analysis experience in the financial and IT domains.
Strong Python skills for ML pipelines (PyTorch, TensorFlow, Transformers, LangChain, OpenAI, etc.), including designing model architectures and fine‑tuning transformer models.
Strong Python engineering for production pipelines (FastAPI / back‑end orchestration, PDF/image processing with PyMuPDF/fitz and Pillow, OpenTelemetry, SQLAlchemy, Alembic, Gradio).
Experience with recent multilingual OCR techniques (DeepSeek OCR, Chandra, Docling, etc.), including layout and structure extraction for multipage documents.
Knowledge of statistical testing of results, observation, and experimentation.
Ansible for containerized deployments and configuration management (playbooks, roles, inventories, secrets).
Experience with NVIDIA NIM for serving ML models and containers, ideally for multimodal/VLM use cases.
ECM integration experience, preferably with Documentum (API usage, ID mapping, CRUD), and exposure to SharePoint/Confluence.
Observability using Grafana (dashboards, metrics, alerting) for ML/data pipelines.
Familiarity with cascade classification designs and prompt engineering techniques with structured outputs (JSON schema, guided decoding).
Practical knowledge of batch processing, GPU/CPU parallelization (e.g., ProcessPoolExecutor), and reliability patterns for large‑scale runs.
Solid understanding of classification evaluation methodologies and ML performance metrics.
Nice‑to‑have
Experience with the Hugging Face Hub (loading, uploading, and fine‑tuning models).
Evaluation frameworks for category definitions and prompts.
PostgreSQL for experiment tracking.
CI/CD with Azure DevOps for container building and hardening.
Container technologies such as Podman.
Security and compliance awareness in handling sensitive documents.
On‑premise registry access through Artifactory.
Soft Skills
Clear written and oral communication (technical reports, dashboards, stakeholder updates).
Structured problem‑solving, attention to detail, and ownership mindset.
Ability to collaborate cross‑functionally and drive change.
Language Skills
Fluency in English required; French is a strong plus.
Ability to adapt to bilingual work environments when needed.
Core Competencies: Adherence to the company’s values (Dedication, Conviction, Agility, and Responsibility); compliance with regulations and internal directives.