Senior Software Engineer, Semantics
Hybrid
Full-time
We are looking for a Senior Software Engineer, Semantics to help build the semantic and metadata capabilities of the core Data Platform.
About IMU Biosciences
IMU Biosciences has developed proprietary platform technologies that generate and translate vast system-level immune data into actionable insights and tools to drive the development of precision medicines across a variety of diseases. Built on over a decade of research at King’s College London and the Francis Crick Institute, IMU leverages advanced immune profiling with proprietary AI and machine learning analytics to uncover novel clinical immune signatures. IMU continues to establish partnerships with leading pharma and biotech companies to advance disease diagnosis, optimise product selection, and improve patient stratification and monitoring, while also building its own pipeline of innovative products.
About the role
IMU is applying cutting-edge immune system science, data engineering, and machine learning to understand human health in a deeper and more actionable way. As our work scales, we are building the platform capabilities needed to make complex biological and computational data structured, traceable, reproducible, and reusable across the organisation.
We are looking for a Senior Software Engineer, Semantics to help build the semantic and metadata capabilities of the core Data Platform.
This is not a standalone ontology or knowledge graph initiative. The role sits directly within the Data Platform team and focuses on building practical platform systems that help scientists, computational immunologists, and engineers work with trusted and well-structured scientific data.
You will work closely with the Computational Immunology team to understand how analytical and machine learning workflows produce data, and help ensure those outputs are consistently structured, versioned, lineage-aware, discoverable, and reusable inside the platform.
A core part of the role is helping turn fragmented scientific and computational outputs into usable data products that can be reliably found, assembled, interpreted, and reused by laboratory, project management, Computational Immunology, and Data Platform teams.
The work includes metadata systems, lineage and provenance capture, dataset contracts, FAIR data practices, data catalog capabilities, and semantic integration between pipelines, datasets, and scientific outputs.
This is a hands-on engineering role for someone who enjoys working across platform engineering, scientific workflows, metadata systems, and cloud-native infrastructure, while staying grounded in practical delivery and operational ownership.
The role is hybrid, with an expectation of working from our London office a couple of days per week.
Team and ways of working
You will join a small, growing Data Platform function working closely with Computational Immunology, wet lab, and clinical-facing teams.
This role sits within the platform layer: helping build the shared foundations that allow teams to ingest, structure, govern, discover, process, assemble, and reuse scientific data reliably.
You will not be working in isolation. The role is deeply connected to the wider platform effort and will involve close collaboration with engineers responsible for ingestion, orchestration, infrastructure, security, transformation, and platform operations.
As a senior engineer, you will be expected to:
Own substantial technical problems from discovery through implementation and operation.
Work directly with stakeholders to gather requirements and translate them into practical platform capabilities.
Influence platform architecture and engineering standards alongside the wider Data Platform team.
Make pragmatic technical trade-offs in environments with evolving scientific and operational requirements.
Contribute to shaping how the platform evolves as usage, scale, and regulatory expectations grow.
You should be comfortable balancing hands-on implementation work with technical leadership, collaboration, and operational ownership.
What you will do
Build and maintain metadata, semantic, lineage, and provenance capabilities within the core Data Platform.
Work closely with the Computational Immunology team to understand analytical workflows and translate them into dataset contracts, metadata standards, and reusable platform capabilities.
Develop systems for structuring, validating, registering, versioning, and governing scientific datasets and computational outputs.
Build ingestion pathways that return outputs from analytical and machine learning pipelines to the Data Platform as governed and reusable scientific datasets.
Help establish practical patterns for discovering, assembling, and delivering trusted datasets to laboratory, Computational Immunology, and operational teams.
Improve data discoverability, usability, lineage, provenance, auditability, reproducibility, and reuse through well-structured, named, versioned, archived, and analysis-ready data products aligned with practical FAIR data principles.
Contribute to data catalog and scientific knowledge graph capabilities that connect datasets, workflows, biological entities, analytical outputs, and scientific conclusions.
Build AWS-native services, APIs, automation, and platform tooling using modern engineering practices.
Work closely with scientists, computational immunologists, software engineers, and platform users to turn real scientific and data problems into reliable platform capabilities.
Improve observability, reliability, documentation, maintainability, and operational maturity across semantic and metadata services.
What we are looking for
Core experience
We do not expect every candidate to have used every technology in our stack. We are mainly looking for strong engineering judgement, practical delivery experience, and evidence of building reliable systems in complex data environments.
Strong software engineering experience in Python.
Practical experience building and operating cloud-native systems on AWS.
Experience working with data platforms, metadata systems, or data-intensive distributed systems.
Experience with APIs, distributed services, infrastructure-as-code, CI/CD, and production engineering practices.
Experience working with data lineage, metadata, schema management, versioning, validation, or data governance concepts.
Experience supporting or integrating analytical, machine learning, or scientific workflows into production systems.
Technologies we use or value highly
The closer your experience is to this stack, the faster you are likely to be productive, but we care more about engineering depth and judgement than keyword matching.
AWS ecosystem
Pulumi or equivalent infrastructure-as-code
GitHub Actions or equivalent CI/CD
Nextflow or equivalent scientific workflow orchestration systems
Data lakes, metadata systems, lineage systems, schema management, and governed dataset patterns
FAIR data principles, provenance, auditability, and reproducibility
Knowledge graphs, semantic modelling, ontologies, RDF / OWL, or related technologies
Data catalogs and metadata management systems
Containers and modern DevOps practices
Nice to have
Experience in scientific, bioinformatics, computational biology, immunology, clinical, healthcare, or regulated data environments.
Experience with large-scale analytical or biomedical datasets.
Experience implementing data catalogs, lineage systems, or knowledge graph capabilities.
Experience with lakehouse technologies such as Delta Lake, Apache Iceberg, Athena, Glue, Spark, or DuckDB.
Experience with regulated software, security, quality, or healthcare frameworks such as ISO 27001, IEC 62304, HIPAA, or similar.
Experience building self-service platform capabilities for scientific or data-intensive teams.
What success looks like
You will help make the Data Platform a dependable foundation for scientific discovery and computational research.
Success means computational outputs are no longer treated as disconnected pipeline artifacts, but as trusted and reusable scientific assets with clear structure, metadata, provenance, lineage, and lifecycle management.
Laboratory and Computational Immunology teams should be able to reliably find, assemble, interpret, and reuse trusted datasets without needing bespoke manual data wrangling for each project.
The platform should increasingly support discoverable, versioned, semantically connected scientific datasets that can be reused across workflows, studies, and future AI systems.
Why join us
IMU is building a company around a big scientific idea: that a deeper, data-driven understanding of the immune system can change how we understand, monitor, and ultimately improve human health.
We work with rich, complex biological and computational data and combine scientific expertise, machine learning, and modern platform engineering to generate insight from the immune system. As our work moves closer to the clinic, the systems we build now need to support both rapid scientific discovery and the discipline required for reproducibility, governance, and future regulated use.
This is a rare opportunity to help shape the semantic and metadata foundations of a modern scientific data platform while the architecture and operating model are still being defined.
You will not be maintaining a legacy metadata system or building isolated ontology models disconnected from real workflows. You will help build practical platform capabilities that directly support computational science, reproducibility, AI-readiness, and scientific reuse at scale.
The team is collaborative, scientifically curious, and pragmatic. We value strong engineering, sensible architecture, operational ownership, and people who can work effectively across disciplines while staying focused on practical delivery and real scientific outcomes.
The Data Platform is central to IMU’s future, which means this role offers genuine scope to influence technical direction, shape engineering standards, and help define how scientific data is structured, governed, discovered, and reused across the organisation as the platform scales.
You will join while foundational decisions are still being made, with the opportunity to build systems and patterns that can grow with the company from research-scale workflows toward future clinical and regulated environments.
We support hybrid working, with time in our London office a couple of days per week, and flexibility around how people do their best work.