Overview
Are you a talented Platform Engineer with a strong background in Data Engineering and DevOps and experience supporting the development and scaling of AI infrastructure? This role is ideal for someone who thrives in fast-paced environments and enjoys building robust systems that bridge raw data, machine learning models, and cloud infrastructure.
About the Role
You’ll play a key role in enabling high-performance, secure, and cost-efficient AI solutions by designing and operating scalable data pipelines, embedding workflows, and cloud-based model lifecycles.
Responsibilities
* Data Pipeline Development: Build and maintain SQL-centric ETL/ELT pipelines using Python, PySpark, dbt, and Liquibase. Integrate data from sources such as MSSQL, Snowflake, Databricks, and S3.
* Embeddings & Vector Stores: Develop workflows for chunking and embedding data. Manage and optimize vector databases (e.g., Pinecone, Weaviate) with incremental updates and nightly refreshes.
* MLOps Automation: Orchestrate model training, evaluation, and drift detection using SageMaker Pipelines or Bedrock. Register models via MLflow or SageMaker Model Registry.
* Infrastructure & DevOps: Manage Terraform-based infrastructure, including GPU autoscaling, spot-instance optimization, and quota tracking for Bedrock usage.
* Governance & Compliance: Implement data lineage tracking, encryption, and PII masking. Prepare audit-ready documentation for SOC 2, MiFID, and model risk management.
* Monitoring & Reporting: Use Grafana and Prometheus to monitor data quality, pipeline health, and operational costs. Set up alerts for proactive issue resolution.
Qualifications
* 4+ years in data or ML engineering, with 3+ years managing production-grade MLOps pipelines
* Experience in regulated industries (e.g., finance, healthcare)
* Strong collaboration skills with cross-functional teams (AI researchers, SREs, data stewards)
* Proven ability to take ownership of mission-critical infrastructure
Required Skills
* Cloud & ML Services: AWS Glue, Step Functions, SageMaker, Bedrock, S3
* Infrastructure as Code: Terraform (multi-account setups)
Seniority level
* Not Applicable
Employment type
* Full-time
Job function
* Information Technology