This project provides a pipeline for protein function prediction using deep learning and protein language model embeddings.
- Prepare Data:
  - Use `prepare_data.py` to process your raw FASTA and TSV files into training-ready data.
- Extract Embeddings:
  - Generate embeddings for your protein sequences (see `plm.py` or `cluster_embed/`).
- Train Model:
  - Run the main training pipeline: `python train_script.py`
  - The script uses configs in `configs/` and saves results in `outputs/` or `runs/`.
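
To make the data preparation step more concrete, here is a minimal sketch of pairing FASTA sequences with labels from a TSV file. It is not the code in `prepare_data.py`. The two-column TSV layout (protein ID, then label), the function names `read_fasta`, `read_labels`, and `build_examples`, and the toy sequences and labels are all assumptions made for illustration.

```python
import csv
import io


def read_fasta(handle):
    """Yield (sequence_id, sequence) pairs from a FASTA-formatted text stream."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Assumption: the ID is the first whitespace-delimited header token.
            tokens = line[1:].split()
            seq_id, chunks = (tokens[0] if tokens else ""), []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def read_labels(handle):
    """Map each protein ID to its set of labels.

    Assumption: a headerless TSV with the protein ID in column 1
    and a single label in column 2.
    """
    labels = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        labels.setdefault(row[0], set()).add(row[1])
    return labels


def build_examples(fasta_handle, tsv_handle):
    """Join sequences and labels on protein ID, dropping unlabeled sequences."""
    labels = read_labels(tsv_handle)
    return [
        {"id": pid, "sequence": seq, "labels": sorted(labels[pid])}
        for pid, seq in read_fasta(fasta_handle)
        if pid in labels
    ]


# Toy, made-up inputs standing in for real FASTA and TSV files.
fasta_text = ">P1 example protein\nMKT\nAYI\n>P2\nGAVL\n>P3\nMSS\n"
tsv_text = "P1\tF1\nP1\tF2\nP2\tF3\n"

examples = build_examples(io.StringIO(fasta_text), io.StringIO(tsv_text))
for example in examples:
    print(example)
```

With real files, the same functions accept handles from `open(path)`. Sequences without any label (`P3` in the toy input) are dropped here; whether the actual pipeline keeps or discards them is not stated in this README.
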

Key files and directories:

- `train_script.py`: Main entry for training
- `prepare_data.py`: Data preparation
- `configs/`: Experiment/model configs
- `Network/`, `models/`: Model code
- `utils/`: Utilities