Lightium is building next-generation photonic integrated circuits on thin-film lithium niobate (TFLN). As we scale, we are looking for a data engineer/scientist to help us analyze and visualize our technical data and to help scale the infrastructure that turns that data into insight.

Position Summary

This is a hands-on, build-things role. You will work closely with the Head of Data and our technical characterization team to design and implement the data pipelines, transformations, and analytical models that connect our lab, manufacturing, and enterprise systems into a coherent data platform. We want someone who is comfortable writing production Python, can think clearly about data modelling, and is excited to apply machine learning and AI tooling to real scientific and operational problems. Whether you are a recent graduate or a seasoned industry expert, if you are hungry to build at scale and learn fast, this is the role for you.

Responsibilities

Data Engineering & Pipeline Development
- Design, build, and maintain scalable data pipelines that ingest data from lab instruments, PLM (Aras Innovator), ERP (Oracle NetSuite), MES, and other operational systems.
- Develop and manage ELT/ETL transformations using Python and DBT, applying software engineering best practices: version control, testing, modularity, and documentation.
- Work with Apache Iceberg and cloud object storage (AWS S3 or GCP GCS) to build and manage a scalable data lake that supports both batch and incremental processing patterns.
- Build and operate distributed data processing workflows using Apache Spark for large-scale transformation, aggregation, and feature engineering tasks.
- Implement data quality checks, schema validation, and pipeline monitoring to ensure that data flowing through the platform is reliable, traceable, and fit for purpose.
- Manage and evolve the data warehouse layer (table design, partitioning strategies, naming conventions, and access controls) to support growing analytical workloads.

Dat...