A pretrained single-cell gene expression language model.
- clone this repository:
git clone [email protected]:keiserlab/exceiver.git
- install a lightweight packaging tool:
conda install flit
- install this repo in a new environment:
flit install -s
see notebooks/example.ipynb for loading pretrained models:
from exceiver.models import Exceiver
model = Exceiver.load_from_checkpoint("../pretrained_models/exceiver/pretrained_TS_exceiver.ckpt")
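The checkpoint path above is relative, so it only resolves when the code runs from the notebooks directory. The following is a minimal sketch of a loader that checks the path first. The helper name `load_pretrained` and the `device` argument are hypothetical, and it assumes `Exceiver` is a pytorch_lightning `LightningModule`, so that `load_from_checkpoint` accepts `map_location`.

```python
from pathlib import Path


def load_pretrained(ckpt_path, device="cpu"):
    """Load a pretrained Exceiver checkpoint, failing early on a bad path.

    Hypothetical helper: assumes Exceiver is a pytorch_lightning LightningModule.
    """
    ckpt = Path(ckpt_path).expanduser().resolve()
    if not ckpt.is_file():
        raise FileNotFoundError(f"checkpoint not found: {ckpt}")

    # Imported lazily so the path check works even before the package is installed.
    from exceiver.models import Exceiver

    model = Exceiver.load_from_checkpoint(str(ckpt), map_location=device)
    model.eval()  # switch to inference mode (e.g. disables dropout if present)
    return model
```

Calling `load_pretrained("../pretrained_models/exceiver/pretrained_TS_exceiver.ckpt")` from the notebooks directory should behave like the snippet above, with the model placed on the CPU.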
- download the Tabula Sapiens dataset from figshare, specifically TabulaSapiens.h5ad.zip (careful, this link will likely start the download immediately: https://figshare.com/ndownloader/files/34702114)
- run scripts/preprocess.py:
python preprocess.py --ts_path /path/to/download/TabulaSapiens.h5ad \
    --out_path /path/to/prepocessed/TabulaSapiens
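The figshare download is a zip archive, while the preprocessing command takes a `.h5ad` path, so the archive has to be unpacked first. This is a minimal sketch using only the standard library. The helper name `extract_h5ad` is hypothetical, and it assumes the archive contains at least one `.h5ad` member.

```python
import zipfile
from pathlib import Path


def extract_h5ad(zip_path, out_dir):
    """Extract the first .h5ad member of zip_path into out_dir and return its path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        members = [m for m in archive.namelist() if m.endswith(".h5ad")]
        if not members:
            raise ValueError(f"no .h5ad file found in {zip_path}")
        # extract() returns the path of the file it created
        extracted = archive.extract(members[0], path=out_dir)
    return Path(extracted)


# Example (paths are placeholders, matching the command above):
# h5ad = extract_h5ad("/path/to/download/TabulaSapiens.h5ad.zip", "/path/to/download")
```

The returned path can then be passed to `--ts_path`.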
pytorch_lightning makes distributed training easy and exposes a host of hyperparameters on the command line. To train a model, run scripts/train.py:
python train.py --name MODELNAME \
    --data_path /path/to/prepocessed/TabulaSapiens \
    --logs path/to/model/logs \
    --frac 0.15 \
    --num_layers 1 \
    --nhead 4 \
    --query_len 128 \
    --batch_size 64 \
    --min_epochs 5 \
    --max_epochs 10 \
    --strategy ddp \
    --gpus 0,1,2,3
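When scripting several runs, it can help to assemble the training command programmatically instead of editing it by hand. The sketch below uses only the flags shown above. The helper name `build_train_command` is hypothetical and is not part of this repository, and the commented launch line assumes train.py is run from the scripts directory.

```python
import shlex


def build_train_command(name, data_path, logs, gpus=(0, 1, 2, 3), **hparams):
    """Assemble the train.py invocation shown above as an argument list."""
    cmd = ["python", "train.py", "--name", name,
           "--data_path", data_path, "--logs", logs]
    # Each keyword argument becomes a "--flag value" pair, in the order given.
    for flag, value in hparams.items():
        cmd += [f"--{flag}", str(value)]
    cmd += ["--strategy", "ddp", "--gpus", ",".join(str(g) for g in gpus)]
    return cmd


cmd = build_train_command(
    "MODELNAME",
    "/path/to/prepocessed/TabulaSapiens",
    "path/to/model/logs",
    frac=0.15, num_layers=1, nhead=4, query_len=128,
    batch_size=64, min_epochs=5, max_epochs=10,
)
print(shlex.join(cmd))  # shell-quoted form of the command, for logging

# To launch (assumption: run from the scripts directory):
# import subprocess; subprocess.run(cmd, check=True, cwd="scripts")
```

Passing the command as a list avoids shell quoting problems if a path contains spaces.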