We are seeking a skilled Platform Engineer with a strong background in Data Engineering and DevOps to support the development and scaling of AI infrastructure. This role is ideal for someone who thrives in fast-paced environments and enjoys building robust systems that bridge raw data, machine learning models, and cloud infrastructure. You’ll play a key role in enabling high-performance, secure, and cost-efficient AI solutions by designing and operating scalable data pipelines and embedding workflows, and by managing the full lifecycle of cloud-hosted models.
Key Responsibilities:
Data Pipeline Development:
Build and maintain SQL-centric ETL/ELT pipelines using Python, PySpark, dbt, and Liquibase. Integrate data from sources such as MSSQL, Snowflake, Databricks, and S3.
Embeddings & Vector Stores:
Develop workflows for chunking and embedding data. Manage and optimize vector databases (e.g., Pinecone, Weaviate) with incremental updates and nightly refreshes.
MLOps Automation:
Orchestrate model training, evaluation, and drift detection using SageMaker Pipelines or Bedrock. Register models in MLflow or the SageMaker Model Registry.
Infrastructure & DevOps:
Manage Terraform-based infrastructure, including GPU autoscaling, spot-instance optimization, and quota tracking for Bedrock usage.
Governance & Compliance:
Implement data lineage tracking, encryption, and PII masking. Prepare audit-ready documentation for SOC 2, MiFID, and model risk management.
Monitoring & Reporting:
Use Grafana and Prometheus to monitor data quality, pipeline health, and operational costs. Set up alerts for proactive issue resolution.
Technical Skills:
* Cloud & ML Services: AWS Glue, Step Functions, SageMaker, Bedrock, S3
* Infrastructure as Code: Terraform (multi-account setups)
Qualifications:
* 4+ years in data or ML engineering, with 3+ years managing production-grade MLOps pipelines
* Experience in regulated industries (e.g., finance, healthcare)
* Strong collaboration skills with cross-functional teams (AI researchers, SREs, data stewards)
* Proven ability to take ownership of mission-critical infrastructure