ML and Virtual Cells
Modern fluorescence microscopy produces massive, high-dimensional datasets: large 5D volumes (x, y, z, time, channels) of living cells under many conditions. Analysing this data at scale requires advanced machine learning (ML) and AI models that go beyond simple quantification, enabling us to extract hidden patterns, predict behaviours, and ultimately generate virtual representations of cells (“Virtual Cells”).
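As a concrete illustration of the 5D layout, the sketch below indexes a synthetic volume with NumPy; the axis order and sizes are assumptions for the example, since real acquisitions record their axis order in file metadata.

```python
import numpy as np

# Illustrative 5D layout (axis order and sizes are assumptions):
# (time, channel, z, y, x) — 100 timepoints, 3 channels, 64 x 512 x 512 voxels.
volume = np.zeros((100, 3, 64, 512, 512), dtype=np.uint16)

stack = volume[10, 0]      # one 3D stack: channel 0 at timepoint 10 -> (64, 512, 512)
series = volume[:, 1, 32]  # mid-plane time series of channel 1 -> (100, 512, 512)
```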
Our computational pipeline is designed to:
- Automate experiment setup and imaging decisions.
- Denoise and enhance image quality for gentle, long-term live-cell imaging.
- Segment and quantify subcellular structures across thousands of cells.
- Build generative models that learn the rules of cell behaviour.
- Create “Virtual Cells” that allow prediction, visualisation, and simulation of cellular processes.
ML in the Imaging Pipeline
1. Experiment Automation
- Cell detection and localisation: Neural networks adapted from DETR (detection transformers) operate directly on raw volumetric OPM data to find cells of interest.
- Automated cell selection: Using user-provided labels (e.g. healthy/unhealthy, cell-cycle stage, receptor localisation), a classifier learns decision boundaries to choose which cells to image in real time (a minimal sketch follows this list).
- Adaptive parameter optimisation: Models predict the best laser power, exposure, and acquisition strategy for each cell, maximising image quality and throughput.
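The selection step can be pictured as a small trained head on top of detector features. The PyTorch sketch below is a generic illustration, not the production pipeline: the class name, feature dimension, and confidence threshold are all assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical selection head: a DETR-style detector yields per-cell feature
# vectors; a small classifier trained on user-provided labels scores them,
# and only high-confidence cells are queued for imaging.
class CellSelector(nn.Module):
    def __init__(self, feat_dim: int = 256, n_classes: int = 2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, n_classes)
        )

    def forward(self, cell_features: torch.Tensor) -> torch.Tensor:
        # cell_features: (n_detected_cells, feat_dim)
        return self.head(cell_features)

selector = CellSelector()
features = torch.randn(12, 256)            # 12 detected cells (dummy features)
scores = selector(features).softmax(dim=-1)
to_image = (scores[:, 1] > 0.9).nonzero()  # indices of cells worth imaging
```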
2. Image Quality and Denoising
- Generative photon models: Deep generative networks trained on photon-counting data denoise extremely low-light images, allowing gentle imaging that preserves cell viability (see the sketch after this list).
- Data augmentation: Synthetic images are generated to increase dataset diversity and improve downstream training.
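One way to picture training under a photon model: corrupt clean reference images with Poisson noise at a low photon budget and teach a network to invert the corruption. The sketch below is a generic illustration (tiny CNN, MSE loss, made-up photon budget), not the project's actual generative network.

```python
import torch
import torch.nn as nn

# Stand-in denoiser; the real models are deep generative networks.
denoiser = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
optim = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

def train_step(clean: torch.Tensor, photons_per_pixel: float = 5.0) -> float:
    # clean: (batch, 1, H, W) with intensities in [0, 1].
    rate = clean * photons_per_pixel
    noisy = torch.poisson(rate) / photons_per_pixel  # photon-limited observation
    loss = nn.functional.mse_loss(denoiser(noisy), clean)
    optim.zero_grad(); loss.backward(); optim.step()
    return loss.item()

loss = train_step(torch.rand(4, 1, 64, 64))
```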
3. Segmentation and Quantification
- Pre-trained architectures: Advanced networks (U-Net, diffusion models, transformers) are adapted to microscopy data using low-rank adaptation (LoRA) for efficient fine-tuning (LoRA is sketched after this list).
- Multi-structure segmentation: Plasma membrane, endosomes, Golgi, ER, nucleus, mitochondria, and more are segmented in full 5D volumes.
- Population statistics: Quantitative descriptors (shape, size, localisation, dynamics) are extracted across thousands of cells and distilled into population-level metrics.
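LoRA itself is simple to state: freeze the pre-trained weight and train only a low-rank additive update. The sketch below shows the idea for a single linear layer; the rank, scaling, and dimensions are illustrative, and in practice the adaptation sits inside the U-Net, diffusion, or transformer backbones mentioned above.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base layer plus a trainable rank-r update (B @ A)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep pre-trained weights fixed
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8192 trainable values vs ~262k frozen ones
```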
Virtual Cells
1. Latent Representations of Cell States
- Each cell state is represented as a point in a learned latent space, defined by the spatial distribution of subcellular structures.
- Generative models capture the co-dependence between structures, allowing us to predict unlabelled organelles from labelled ones or from label-free OP-SLIM reference channels.
- “Average cells” can be reconstructed, representing population-level states in an interpretable form (see the sketch below).
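Under an autoencoder-style assumption (the encoder and decoder below are untrained stand-ins with made-up dimensions), the average-cell idea reduces to decoding the population mean of the per-cell latent vectors:

```python
import torch
import torch.nn as nn

# Stand-ins for a trained encoder/decoder pair over registered cell crops.
enc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 32))
dec = nn.Sequential(nn.Linear(32, 64 * 64), nn.Unflatten(1, (1, 64, 64)))

cells = torch.rand(1000, 1, 64, 64)  # 1000 cell crops (dummy data)
latents = enc(cells)                 # each cell becomes a point in latent space
average_cell = dec(latents.mean(dim=0, keepdim=True))  # decode the population mean
```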
2. Hierarchical Generative Models
- Variational hierarchical models are trained on large 5D datasets to represent cell states and their distributions across populations.
- This allows simulation of new cell states sampled from the learned distribution, providing both statistical insight and realistic visualisations (sampling is sketched below).
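Sampling from such a model can be sketched in a few lines: a population-level latent conditions a cell-level latent, which is decoded into a cell state. Everything below (two levels, layer sizes, Gaussian latents) is a simplified stand-in for the actual variational hierarchy.

```python
import torch
import torch.nn as nn

prior_z2 = nn.Linear(16, 2 * 32)  # z1 -> (mu, log_var) of the cell-level latent
decoder = nn.Linear(32, 64 * 64)  # z2 -> flattened cell state

z1 = torch.randn(8, 16)           # 8 population-level samples
mu, log_var = prior_z2(z1).chunk(2, dim=-1)
z2 = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterised sample
new_states = decoder(z2).view(8, 1, 64, 64)             # 8 simulated cell states
```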
3. Dynamics and Time Evolution
- Generative temporal models (transformer-based) learn trajectories of cells through the latent state space (see the sketch after this list).
- Models capture both short-term transitions (Markovian) and long-range dependencies (non-Markovian).
- These models enable predictive simulations of cell behaviour, including responses to stimuli outside the imaged time window.
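A minimal version of the temporal model is a causally masked transformer over latent trajectories: given the states observed so far, predict the next one. Dimensions and depth below are illustrative, and the readout head is an assumption.

```python
import torch
import torch.nn as nn

d = 32  # latent-state dimension (illustrative)
layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
temporal = nn.TransformerEncoder(layer, num_layers=2)
readout = nn.Linear(d, d)  # hypothetical next-state head

traj = torch.randn(1, 20, d)  # 20 observed latent states (dummy)
mask = nn.Transformer.generate_square_subsequent_mask(20)  # causal attention
pred_next = readout(temporal(traj, mask=mask))[:, -1]      # predicted state 21
```

Rolling this prediction forward autoregressively yields simulated trajectories; the causal mask lets attention reach the full observed history, covering both short-term (Markovian) and long-range (non-Markovian) structure.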
Outputs and Sharing
- Software: ML pipelines, APIs, and GUIs for automated imaging, segmentation, and generative modelling.
- Data: Large-scale 5D datasets with full metadata, available via open repositories.
- Models: Pre-trained generative models of cell states and dynamics for reuse by the bioimaging and AI communities.