ImmuScope: Self-iterative multiple-instance learning enables the prediction of CD4+ T cell immunogenic epitopes
This repository contains the source code for the paper Self-iterative multiple-instance learning enables the prediction of CD4+ T cell immunogenic epitopes.
Accurately predicting the antigen presentation to CD4+ T cells and subsequent induction of immune response is fundamentally important for vaccine development, autoimmune disease treatments, and cancer neoepitope identification. In immunopeptidomics, single-allelic data are highly specific but limited in allele scope, while multi-allelic data offer broader coverage at the cost of weak labeling. Existing computational approaches either overlook the massive multi-allelic data or introduce label ambiguity due to inadequate modeling strategies. Here, we introduce ImmuScope, a weakly supervised deep-learning framework integrating precise MHC-II antigen presentation, CD4+ T cell epitopes, and immunogenicity predictions. ImmuScope leverages self-iterative multiple-instance learning with positive-anchor triplet loss to explore peptide-MHC-II (pMHC-II) binding from weakly labeled multi-allelic data and single-allelic data, comprising over 600,000 ligands across 142 alleles. Moreover, ImmuScope can also interpret the MHC-II binding specificity and motif deconvolution of immunopeptidomics data. We successfully applied ImmuScope to discover melanoma neoantigens, revealing variations in pMHC-II binding and immunogenicity upon epitope mutations. We further employed ImmuScope to assess the effects of SARS-CoV-2 epitope mutations on immune escape, and its predictions aligned well with experimentally determined immune escape dynamics. Overall, ImmuScope provides a comprehensive solution for CD4+ T cell antigen recognition and immunogenicity assessment, with broad potential for advancing vaccine design and personalized immunotherapy.
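To make the terms "multiple-instance learning" and "triplet loss" concrete, here is a minimal, generic sketch. It is not ImmuScope's architecture or its positive-anchor variant of the loss; it only illustrates the textbook ideas, assuming max pooling over a bag of per-allele scores and a Euclidean triplet loss. The allele names and scores are made-up illustration values.

```python
import math


def bag_score(instance_scores):
    """Standard MIL assumption: a bag is positive if at least one
    instance is positive, so the bag score is the maximum instance score."""
    return max(instance_scores)


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Textbook triplet loss: pull the anchor towards the positive and
    push it away from the negative by at least `margin`."""
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


# Hypothetical per-allele scores for one peptide from a multi-allelic sample.
scores = {"allele_A": 0.12, "allele_B": 0.87, "allele_C": 0.05}
peptide_score = bag_score(scores.values())
likely_allele = max(scores, key=scores.get)
```

In a multi-allelic sample only the bag (peptide plus all alleles of the sample) carries a label, which is why the per-allele scores have to be pooled before they can be compared against it.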
- Clone the repository.
git clone https://github.com/shenlongchen/ImmuScope.git
- Create a virtual environment with conda.
conda create -n ImmuScope_env python=3.9.19
conda activate ImmuScope_env
- Install PyTorch>=1.12.1, choosing a build compatible with your CUDA version and your other Python packages.
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
- Install other dependencies.
pip install -r requirements.txt
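After installation, a quick sanity check of the environment can save a failed training run. The helper below is a sketch, not part of the repository: `check_environment` is a hypothetical name, and the default package list only contains `torch` because that is the one dependency named above.

```python
import sys
from importlib.util import find_spec


def check_environment(min_python=(3, 9), packages=("torch",)):
    """Report whether the interpreter is recent enough and whether
    each named top-level package can be found (without importing it)."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in packages:
        report[name] = find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'MISSING'}")
```

Run it inside the activated `ImmuScope_env` environment; any entry reported as missing points to an installation step that did not complete.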
The following data and model weights are available at Zenodo.
tar -xvzf ImmuScope-data.tar.gz
tar -xvzf ImmuScope-weights.tar.gz
In ImmuScope-data.tar.gz File:
- data/raw: Raw data related to antigen presentation and CD4+ T cell epitopes: binding affinity data, EL data, CD4+ T cell epitope benchmarking test data, MHC-II allele list, supported MHC-II pseudo sequences.
- data/el_datasets: single- and multi-allelic antigen presentation training data applied to ImmuScope input.
- data/cd4_datasets: binding affinity training data and CD4+ T cell epitope benchmarking test data applied to ImmuScope input.
- data/im_datasets: Immunogenicity training and testing data applied to ImmuScope input and raw data.
In ImmuScope-weights.tar.gz File:
- weights/EL: model weights for predicting antigen presentation & binding specificity discovery & multi-allelic data binding motif deconvolution.
- weights/CD4: model weights for predicting CD4+ T cell epitope.
- weights/IM: model weights for predicting MHC-II epitope immunogenicity.
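Once both archives are extracted, it can help to confirm that the directories listed above are where the scripts expect them. The sketch below assumes the archives were extracted in the repository root; `missing_dirs` and `EXPECTED_DIRS` are hypothetical names, not part of ImmuScope.

```python
from pathlib import Path

# Directories named in the archive descriptions above.
EXPECTED_DIRS = (
    "data/raw",
    "data/el_datasets",
    "data/cd4_datasets",
    "data/im_datasets",
    "weights/EL",
    "weights/CD4",
    "weights/IM",
)


def missing_dirs(root="."):
    """Return the expected directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    print("all directories found" if not missing else f"missing: {missing}")
```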
ImmuScope/
│── configs/ # Configuration files (e.g., model parameters, training settings)
│── data/ # Dataset storage and preprocessing scripts (Extract ImmuScope-data.tar.gz File)
│── ImmuScope/ # Core project code and implementation
│── results/ # Experiment outputs (e.g. ImmuScope-EL, ImmuScope-CD4 and ImmuScope-IM) and logs
│── weights/ # Trained model weights and checkpoints (Extract ImmuScope-weights.tar.gz File)
│── main_antigen_presentation_5cv.py # 5-fold cross validation for antigen presentation prediction
│── main_antigen_presentation_train.py # Train antigen presentation model with eluted ligand data
│── main_cd4_epitope_train.py # Train CD4+ T cell epitope prediction model
│── main_cd4_epitope_test.py # Test CD4+ T cell epitope prediction model
│── main_immunogenicity_train.py # Train immunogenicity prediction model
│── main_immunogenicity_test.py # Test immunogenicity prediction model
│── README.md # Project documentation
│── requirements.txt # List of dependencies
- 5-fold cross validation for antigen presentation prediction.
python main_antigen_presentation_5cv.py \
    --data-cnf configs/data.yaml \
    --model-cnf configs/ImmuScope-EL.yaml
- Train antigen presentation prediction model using single- and multi-allelic eluted ligand data.
python main_antigen_presentation_train.py \
    --data-cnf configs/data.yaml \
    --model-cnf configs/ImmuScope-EL.yaml
- Train CD4+ T cell epitope prediction model based on antigen presentation prediction model.
python main_cd4_epitope_train.py \
    --data-cnf configs/data.yaml \
    --model-cnf configs/ImmuScope.yaml
- Test CD4+ T cell epitope prediction model on benchmarking test data (trained model weights are available in weights/CD4).
python main_cd4_epitope_test.py \
    --data-cnf configs/data.yaml \
    --model-cnf configs/ImmuScope.yaml
- Train immunogenicity prediction model with immunogenicity data.
python main_immunogenicity_train.py \
    --data-cnf configs/data.yaml \
    --model-cnf configs/ImmuScope-IM.yaml
- Test immunogenicity prediction model on immunogenicity data (trained model weights are available in weights/IM).
python main_immunogenicity_test.py \
    --data-cnf configs/data.yaml \
    --model-cnf configs/ImmuScope-IM.yaml
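All of the commands above share the same shape: a script plus `--data-cnf` and `--model-cnf`. If you want to drive them from Python, a small wrapper along these lines may be convenient. This is a sketch only; the step labels (`el_5cv`, `cd4_test`, and so on), `build_command`, and `run_step` are hypothetical names, while the script and config paths are copied from the commands above.

```python
import subprocess
import sys

# Step label -> (script, model config), taken from the commands above.
STEPS = {
    "el_5cv": ("main_antigen_presentation_5cv.py", "configs/ImmuScope-EL.yaml"),
    "el_train": ("main_antigen_presentation_train.py", "configs/ImmuScope-EL.yaml"),
    "cd4_train": ("main_cd4_epitope_train.py", "configs/ImmuScope.yaml"),
    "cd4_test": ("main_cd4_epitope_test.py", "configs/ImmuScope.yaml"),
    "im_train": ("main_immunogenicity_train.py", "configs/ImmuScope-IM.yaml"),
    "im_test": ("main_immunogenicity_test.py", "configs/ImmuScope-IM.yaml"),
}


def build_command(step, data_cnf="configs/data.yaml"):
    """Build the argument list for one step, using the current interpreter."""
    script, model_cnf = STEPS[step]
    return [sys.executable, script, "--data-cnf", data_cnf, "--model-cnf", model_cnf]


def run_step(step):
    """Run one step from the repository root; raises if the script fails."""
    subprocess.run(build_command(step), check=True)
```

For example, `run_step("cd4_test")` would launch the CD4+ T cell epitope benchmark, provided it is called from the repository root with the weights extracted.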
@article{shen2025self,
title={Self-iterative multiple-instance learning enables the prediction of CD4+ T cell immunogenic epitopes},
author={Shen, Long-Chen and Zhang, Yumeng and Wang, Zhikang and Littler, Dene R and Liu, Yan and Tang, Jinhui and Rossjohn, Jamie and Yu, Dong-Jun and Song, Jiangning},
journal={Nature Machine Intelligence},
pages={1--16},
year={2025},
publisher={Nature Publishing Group UK London}
}
If you have any questions, please contact us at [email protected], [email protected] or [email protected].
