This is the official repository for the ICML 2025 spotlight paper Enforcing Latent Euclidean Geometry in Single-Cell VAEs for Manifold Interpolation. Many tools for single-cell RNA-seq (scRNA-seq) operate under the assumption that the latent space exhibits approximate Euclidean geometry, using straight lines to estimate cell state transitions and distances. To support and enhance this assumption, we introduce FlatVI, a representation learning model for scRNA-seq data that promotes locally flat geometry in the latent space, making it a natural complement to existing single-cell analysis pipelines.
FlatVI is a Variational Autoencoder (VAE) trained with a negative binomial likelihood tailored to single-cell data and augmented with a geometric regulariser. The VAE's decoder maps latent representations to the parameters of negative binomial distributions, i.e. to points on the statistical manifold these distributions form. The local geometry of the latent space is governed by the pullback metric induced by the decoder, which we regularise toward a scaled identity matrix. This encourages the latent space to adopt a locally Euclidean structure.
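As a rough illustration of the regularisation described above, the NumPy sketch below does two things for a toy decoder. It computes the pullback metric G(z) = J(z)^T F(mu(z)) J(z), where J is the decoder Jacobian and F the negative binomial Fisher information. It then measures the squared distance of G(z) to a scaled identity alpha * I. Several choices here are hypothetical and not necessarily how FlatVI implements things:

- the decoder is an exponentiated linear map;
- the Fisher information is taken only with respect to the NB means, with the inverse dispersion theta held fixed;
- alpha is set to the mean diagonal entry of G.

In a real model the Jacobian would come from automatic differentiation of a neural decoder.

```python
import numpy as np

def nb_fisher_info(mu, theta):
    """Fisher information of a negative binomial w.r.t. its mean mu,
    holding the inverse dispersion theta fixed: 1 / Var = 1 / (mu + mu**2 / theta)."""
    return 1.0 / (mu + mu**2 / theta)

def toy_decoder(z, W, b):
    """Hypothetical decoder: gene-wise NB means mu(z) = exp(W @ z + b)."""
    return np.exp(W @ z + b)

def pullback_metric(z, W, b, theta):
    """G(z) = J^T diag(F(mu)) J, with J = d mu / d z = diag(mu) @ W for the toy decoder."""
    mu = toy_decoder(z, W, b)
    J = mu[:, None] * W                     # (n_genes, latent_dim)
    F = nb_fisher_info(mu, theta)           # (n_genes,)
    return J.T @ (F[:, None] * J)           # (latent_dim, latent_dim)

def flatness_loss(G, alpha):
    """Squared Frobenius distance between G and the scaled identity alpha * I."""
    return float(np.sum((G - alpha * np.eye(G.shape[0])) ** 2))

# Usage on random toy parameters.
rng = np.random.default_rng(0)
latent_dim, n_genes = 3, 50
W = rng.normal(scale=0.1, size=(n_genes, latent_dim))
b = rng.normal(size=n_genes)
theta = np.full(n_genes, 10.0)
z = rng.normal(size=latent_dim)
G = pullback_metric(z, W, b, theta)
alpha = np.trace(G) / latent_dim  # assumption: scale set to the mean diagonal entry
print(flatness_loss(G, alpha))
```

Minimising this loss over many latent points pushes G(z) toward a multiple of the identity. Locally, distances then behave like scaled Euclidean distances.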
Find our work at:
- OpenReview
- arXiv (coming soon)
The repository is currently undergoing significant restructuring and simplification, as the software is being adapted to the scvi-tools framework structure (see the official repo). The code supporting the ICML 2025 publication will be preserved in a dedicated branch.
All datasets and checkpoints used in this study will be made publicly available on Zenodo by the time of the conference. That said, the datasets are already public and can be accessed from their original publications.
- Clone our repository:
git clone https://github.com/theislab/FlatVI.git
- Create the conda environment:
conda env create -f environment.yml
- Activate the environment:
conda activate flatvi
- Install the FlatVI package in development mode:
cd directory_where_you_have_your_git_repos/FlatVI
pip install -e .
- Create a symlink to the storage folder for experiments:
cd directory_where_you_have_your_git_repos/FlatVI
ln -s folder_for_experiment_storage project_folder
- Create the datasets and experiments folders:
cd project_folder
mkdir datasets
mkdir experiments
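The symlink and folder-creation steps can also be scripted from Python, which is handy if you set up several machines. The helper below is hypothetical and not part of the FlatVI package. The paths are the same placeholders used in the commands, and creating symlinks on Windows may require extra privileges.

```python
import os
from pathlib import Path

def setup_project_folder(repo_dir, storage_dir, link_name="project_folder"):
    """Link an external storage folder into the repository and create the
    datasets/ and experiments/ subfolders. Safe to call more than once."""
    repo_dir = Path(repo_dir)
    storage_dir = Path(storage_dir).resolve()  # absolute target, so the link works from any CWD
    storage_dir.mkdir(parents=True, exist_ok=True)
    link = repo_dir / link_name
    if not link.is_symlink() and not link.exists():
        # Equivalent of: ln -s folder_for_experiment_storage project_folder
        os.symlink(storage_dir, link, target_is_directory=True)
    for sub in ("datasets", "experiments"):
        # Equivalent of: mkdir datasets / mkdir experiments inside project_folder
        (link / sub).mkdir(exist_ok=True)
    return link

# Usage (placeholder paths):
# setup_project_folder("directory_where_you_have_your_git_repos/FlatVI", "folder_for_experiment_storage")
```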
Requirements
See environment.yml for the required packages.
Hydra
Our implementation uses Hydra to manage experiments. The configuration hierarchy can be found in the configs_hydra folder.
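Hydra lets you change any entry of the composed configuration from the command line with dotted key=value overrides. You use +key=value to add a key that is not yet in the config. The pure-Python sketch below mimics only that override step on a nested dict, to show what an override does.

It is not Hydra's implementation and ignores config groups and interpolation. The keys used are hypothetical rather than taken from configs_hydra.

```python
import copy

def parse_value(raw):
    """Interpret an override value as int, float, bool or string (simplified)."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    return raw

def apply_overrides(cfg, overrides):
    """Apply Hydra-style 'a.b=value' overrides to a nested dict without mutating it.
    Keys must already exist unless the override starts with '+'."""
    out = copy.deepcopy(cfg)
    for item in overrides:
        add = item.startswith("+")
        key, _, raw = item.lstrip("+").partition("=")
        *parents, leaf = key.split(".")
        node = out
        for part in parents:
            if part not in node:
                if not add:
                    raise KeyError(f"unknown key {key!r}; prefix with '+' to add it")
                node[part] = {}
            node = node[part]
        if leaf not in node and not add:
            raise KeyError(f"unknown key {key!r}; prefix with '+' to add it")
        node[leaf] = parse_value(raw)
    return out

# Hypothetical config; the real keys live in configs_hydra.
cfg = {"model": {"learning_rate": 1e-3, "regularize": False}, "seed": 0}
print(apply_overrides(cfg, ["model.regularize=true", "seed=42"]))
```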
FlatVI
The source code for the model is in the flatvi folder.
Training scripts
Training scripts are in flatvi/train_hydra:
- train_cfm.py trains conditional flow matching.
- train_vae.py trains the negative binomial variational autoencoder (either with or without regularization).
- train_geodesic_vae.py trains the geodesic autoencoder baseline.
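Conditional flow matching in its basic independent-coupling form (the variant implemented by torchCFM's ConditionalFlowMatcher) regresses a neural velocity field onto the velocity of straight paths between paired samples. The NumPy sketch below builds one training batch.

It is not the code in train_cfm.py, which may use a different coupling (e.g. optimal-transport pairing) or noise level. The random inputs merely stand in for, e.g., latent codes at two time points.

```python
import numpy as np

def sample_cfm_batch(x0, x1, sigma=0.0, rng=None):
    """Draw one I-CFM training batch.

    x0, x1: arrays of shape (batch, dim) holding paired source/target samples.
    Returns t ~ U(0, 1), x_t ~ N(t * x1 + (1 - t) * x0, sigma**2 I) and the
    regression target u_t = x1 - x0 (the velocity of the straight path)."""
    rng = np.random.default_rng() if rng is None else rng
    t = rng.uniform(size=(x0.shape[0], 1))
    mu_t = t * x1 + (1.0 - t) * x0
    x_t = mu_t + sigma * rng.normal(size=x0.shape)
    u_t = x1 - x0
    return t, x_t, u_t

def cfm_loss(v_pred, u_t):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - u_t) ** 2))

# Usage with random stand-ins for latent codes at two time points.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(128, 10))
x1 = rng.normal(loc=2.0, size=(128, 10))
t, x_t, u_t = sample_cfm_batch(x0, x1, sigma=0.1, rng=rng)
# A network v_theta(x_t, t) would be trained to minimise cfm_loss(v_theta(x_t, t), u_t).
```

Because the targets are straight-line velocities, this objective fits naturally with a latent space in which straight lines are meaningful.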
Models
Model scripts are in the flatvi/models folder:
- flatvi/models/base contains standard modules for the variational autoencoder, both with and without regularization, and the geodesic autoencoder baseline.
- flatvi/models/cfm contains modules implementing Conditional Flow Matching, inspired by the torchCFM repo.
- flatvi/models/manifold contains modules for operations on manifolds, such as geodesic distance approximations and metric computations.
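To see why a locally flat metric makes straight lines meaningful, the sketch below measures the length of a straight latent segment under a given metric G(z), such as the pullback metric. This length upper-bounds the geodesic distance between the endpoints. When G(z) = alpha * I everywhere, straight lines are geodesics and the length reduces to sqrt(alpha) times the Euclidean distance. The function name and the midpoint discretisation are illustrative assumptions, not the API of flatvi/models/manifold.

```python
import numpy as np

def straight_line_length(z0, z1, metric, n_steps=100):
    """Riemannian length of z(t) = (1 - t) * z0 + t * z1, t in [0, 1], under
    metric(z) -> (d, d) SPD matrix, using a midpoint Riemann sum of
    sqrt(v^T G(z(t)) v) with constant velocity v = z1 - z0."""
    z0 = np.asarray(z0, dtype=float)
    z1 = np.asarray(z1, dtype=float)
    v = z1 - z0
    dt = 1.0 / n_steps
    length = 0.0
    for i in range(n_steps):
        t = (i + 0.5) * dt
        G = metric((1.0 - t) * z0 + t * z1)
        length += np.sqrt(v @ G @ v) * dt
    return float(length)

# Under a constant metric alpha * I the length is sqrt(alpha) * ||z1 - z0||.
alpha = 4.0
print(straight_line_length([0.0, 0.0], [3.0, 4.0], lambda z: alpha * np.eye(2)))
```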
Bash scripts to launch training are in the scripts folder. To retrain the models, first create a logs folder inside the scripts subfolder you plan to use; Slurm writes its error and output files there. By default, the scripts assume the Slurm scheduling system, but they can be adapted to plain bash commands.
We provide example notebooks in the notebook folder.
@inproceedings{
palma2025enforcing,
title={Enforcing Latent Euclidean Geometry in Single-Cell {VAE}s for Manifold Interpolation},
author={Alessandro Palma and Sergei Rybakov and Leon Hetzel and Stephan G{\"u}nnemann and Fabian J Theis},
booktitle={Forty-second International Conference on Machine Learning},
year={2025},
url={https://openreview.net/forum?id=DoDXFkF10S}
}