Data Platform Engineer

Hybrid

Full-time

We are looking for a Data Platform Engineer to help build and operate the platform and shared foundations that allow project teams to ingest, structure, govern, process, and reuse data reliably.

About the role

IMU is applying cutting-edge immune system science, data engineering, and machine learning to understand human health in a deeper and more actionable way. As our work moves closer to the clinic, we are building the data infrastructure needed to make complex biological and clinical data reliable, reproducible, governed, and ready for scientific discovery.

We are looking for a Data Platform Engineer to help build and operate the platform behind that work. We think of the data platform as a product, not just infrastructure: it should be reliable, usable, well-documented, and shaped around the needs of the scientists, data scientists, and engineers who depend on it.

You will help turn high-dimensional biological data into trusted, reusable, analysis-ready data products, and you will build the infrastructure that allows teams to run reproducible pipelines at scale.

This is a hands-on engineering role for someone who lives in the cloud, expects infrastructure to be defined as code, and cares deeply about reproducibility. The work connects modern data platform engineering directly to cutting-edge science and, ultimately, better human health.

The role is hybrid, with an expectation of working from our London office a couple of days per week.


Team and ways of working

You will join a small, growing data platform function working closely with computational immunologists, data scientists, software engineers, and clinical-facing teams. This role sits in the platform layer: building the shared foundations that allow project teams to ingest, structure, govern, process, and reuse data reliably.

You will not be working in isolation, but you should be comfortable owning substantial pieces of the platform, making pragmatic technical decisions, and helping set engineering standards as the function scales.


What you will do

  • Build, maintain, and improve cloud-native data platform services used by scientific, data science, and engineering teams.

  • Develop and operate data lake and medallion-style architectures across raw, cleaned, curated, and analysis-ready data layers.

  • Create reliable patterns for data ingestion, validation, transformation, storage, metadata capture, and access control.

  • Support reproducible scientific and data pipelines, including helping to migrate workflows into production-ready, deployable artefacts.

  • Improve data discoverability, lineage, provenance, auditability, and reuse in line with practical FAIR data principles.

  • Build automation, infrastructure, and developer workflows using infrastructure-as-code, CI/CD, containers, and cloud-native engineering practices.

  • Work closely with scientists, software engineers, and platform users to turn real data problems into reliable platform capabilities.

  • Improve reliability, documentation, maintainability, and cost-awareness across data platform services and pipelines.


What we are looking for


Core experience

We do not expect every candidate to have used every tool in our stack. At CV stage, we are mainly looking for evidence of strong engineering judgement and hands-on experience in a few key areas:

  • Strong software engineering experience in Python or another production-grade language.

  • Practical experience building and operating cloud-native systems on AWS.

  • Experience with infrastructure-as-code, ideally Pulumi, Terraform, or an equivalent tool.

  • Experience with data lakes, medallion architectures, lakehouse patterns, or large-scale analytical data platforms.

  • Experience enabling, operating, or productionising data pipelines or scientific workflows.


Technologies we use or value highly

The closer your experience is to this stack, the faster you are likely to be productive, but we care more about depth and judgement than superficial exposure to every tool:

  • AWS

  • Pulumi, Terraform, or equivalent infrastructure-as-code

  • GitHub Actions or equivalent CI/CD

  • SQLMesh or equivalent SQL-based transformation, modelling, testing, and deployment framework

  • Containers and modern DevOps practices

  • Data lakes, object storage, metadata, validation, schema management, and data lifecycle patterns

  • FAIR data principles, lineage, provenance, and governance


Nice to have

  • Experience with Seqera Platform and Nextflow for reproducible scientific workflow execution.

  • Experience with Dagster or Snowflake.

  • Experience in scientific, bioinformatics, computational biology, immunology, clinical, healthcare, or other regulated data environments.

  • Experience with lakehouse technologies such as Delta Lake, Apache Iceberg, Apache Hudi, Databricks, Athena, Glue, Trino, Spark, or DuckDB.

  • Experience with regulated software, security, quality, or healthcare frameworks such as IEC 62304, ISO 27001, ISO 13485, HTA, HIPAA, or similar.

  • Experience building self-service data platforms, internal developer platforms, or platform capabilities for data science teams.


What success looks like

You will help make the data platform a dependable foundation for scientific discovery and decision-making. Success means data is easier to find, easier to trust, easier to reuse, and easier to process reproducibly.

Scientific and data teams should be able to spend less time fighting infrastructure, tracing data provenance, or manually moving data between systems, and more time generating insight.


Why join us

IMU is building a company around a big scientific idea: that a deeper, data-driven understanding of the immune system can change how we understand, monitor, and ultimately improve human health.

We work with rich, complex biological data and combine scientific expertise, machine learning, and modern data infrastructure to generate insight from the immune system. Our work is moving closer to the clinic, which means the platform we build now has to support both rapid discovery and the discipline needed for future clinical and regulated use.

This is a rare opportunity to join at the point where the foundations are still being shaped. You will not be maintaining a legacy estate or simply keeping dashboards alive. You will help define how data is structured, governed, processed, discovered, and reused across the organisation.

The team is collaborative, scientifically curious, and pragmatic. We value strong engineering, clear ownership, sensible architecture, and people who can work effectively across disciplines while staying focused on practical delivery and real scientific outcomes.

You will also have room to grow. The data platform is central to IMU's future, so this role offers the chance to influence technical direction, shape engineering standards, and build systems that can scale with the company as we move from discovery toward clinical impact.

We support hybrid working, with time in our London office a couple of days per week, and flexibility around how people do their best work.
