Patient Trajectory Prediction with Clinical Notes Integration

This repository contains the code and resources for our research on patient trajectory prediction. The models integrate both structured data (CCS codes) and unstructured data (clinical notes) using Transformer-based deep learning models.

Project Structure

cross_val_train_ddp.py: Script for cross-validation training of the Transformer encoder-decoder model with integrated clinical notes, using distributed data parallelism.
model.py: Contains the main model architecture.
prepare_notes.py: Script for preprocessing clinical notes.
pretrain_bert.py: Script for pretraining Clinical Mosaic, our custom BERT model.
train.py: Training script for the vanilla Transformer model, without clinical notes or distributed data parallelism.
train_ddp.py: Training script for the Transformer model with injected notes using distributed data parallelism.
train_with_notes.py: Training script for the Transformer model with injected notes without distributed data parallelism.

Subdirectories

literature_models/: Contains notebooks and scripts to reproduce results from various models in the literature using our cross-validation folds.
notebooks/: Jupyter notebooks for data preparation, model training, and evaluation.
- Clinical_Mosaic_MedNLI.ipynb: Notebook for reproducing MedNLI results of Clinical Mosaic.
- prepare_data.ipynb: Notebook for preparing the data (run it after preparing the notes).
stats/: Utilities for visualization and statistical analysis.
tests/: Test scripts to ensure correct preprocessing of notes, especially when using multiple processes/threads.
utils/: Various utility functions and modules for data processing, model training, and evaluation.

Setup

Clone the repository:

```
git clone [repository-url]
cd [repository-name]
```

Set up the environment:

```
conda env create -f environment.yml
conda activate [env-name]
```

Alternatively, if you prefer using pip:

```
pip install -r requirements.txt
```

Usage

Prepare the clinical notes:
```
python prepare_notes.py
```
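Clinical notes are frequently longer than the input limit of a BERT-style encoder. This README does not describe how prepare_notes.py segments them, so the following is only a minimal sketch under that assumption: it splits an already-tokenized note into overlapping windows. The helper `chunk_token_ids` and its `max_len`/`stride` defaults are hypothetical, not the script's actual interface.

```python
from typing import List


def chunk_token_ids(token_ids: List[int], max_len: int = 512, stride: int = 128) -> List[List[int]]:
    """Split a token sequence into windows of at most max_len tokens.

    Consecutive windows overlap by `stride` tokens, so text near a window
    boundary is seen with context in two windows.
    Hypothetical helper: prepare_notes.py may segment notes differently.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    if not token_ids:
        return []
    step = max_len - stride  # how far each new window advances
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the note
        start += step
    return chunks


if __name__ == "__main__":
    for chunk in chunk_token_ids(list(range(10)), max_len=4, stride=1):
        print(chunk)
```

With `max_len=4` and `stride=1`, a 10-token note yields three windows, each sharing one token with its neighbor. A larger overlap costs more compute but gives more context at window boundaries.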
Pretrain Clinical Mosaic:
```
python pretrain_bert.py
```
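pretrain_bert.py defines the actual pretraining objective and hyperparameters for Clinical Mosaic, which are not restated here. For readers new to BERT-style pretraining, the sketch below shows the standard masked-language-modeling corruption from the original BERT recipe: select about 15% of the non-special tokens, replace 80% of those with the mask token, 10% with a random token, and leave 10% unchanged. Whether Clinical Mosaic uses exactly these ratios is an assumption, and `mask_tokens`, `mask_token_id`, and `vocab_size` are illustrative names and values.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Label value skipped by the loss (the default ignore_index of PyTorch's CrossEntropyLoss).
IGNORE_INDEX = -100


def mask_tokens(
    token_ids: Sequence[int],
    mask_token_id: int,
    vocab_size: int,
    special_ids: Sequence[int] = (),
    mlm_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Return (corrupted inputs, labels) following the BERT 80/10/10 masking scheme.

    Hypothetical helper for illustration; not taken from pretrain_bert.py.
    """
    rng = rng or random.Random()
    special = set(special_ids)
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        if tok in special or rng.random() >= mlm_prob:
            continue  # not selected: no prediction target at this position
        labels[i] = tok  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_token_id  # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return inputs, labels


if __name__ == "__main__":
    ids = [2] + list(range(10, 60)) + [3]  # toy ids; 2 and 3 stand in for [CLS]/[SEP]
    inputs, labels = mask_tokens(ids, mask_token_id=4, vocab_size=1000,
                                 special_ids=(2, 3), rng=random.Random(0))
    print(sum(label != IGNORE_INDEX for label in labels), "of", len(ids), "positions selected")
```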
Prepare the data:
- Run the notebook located at notebooks/prepare_data.ipynb.
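The notebook defines the exact input format used by the models. As a hedged sketch of the general idea behind trajectory prediction only, the code below turns one patient's chronologically ordered visits (each a list of CCS codes) into (history, next visit) training pairs. The function `make_trajectory_pairs`, the `[VISIT]` separator token, and the toy codes are assumptions for illustration, not the notebook's actual representation.

```python
from typing import List, Tuple

VISIT_SEP = "[VISIT]"  # hypothetical token marking the end of a past visit


def make_trajectory_pairs(visits: List[List[str]]) -> List[Tuple[List[str], List[str]]]:
    """Pair the flattened history of visits 0..t-1 with the codes of visit t.

    Hypothetical helper; prepare_data.ipynb may build sequences differently.
    """
    pairs = []
    for t in range(1, len(visits)):
        history: List[str] = []
        for visit in visits[:t]:
            history.extend(visit)
            history.append(VISIT_SEP)  # keep visit boundaries visible to the encoder
        pairs.append((history, list(visits[t])))
    return pairs


if __name__ == "__main__":
    toy_visits = [["C1", "C2"], ["C3"], ["C2", "C4"]]  # toy codes, not real CCS categories
    for source, target in make_trajectory_pairs(toy_visits):
        print(source, "->", target)
```

A patient with n visits yields n - 1 pairs; a patient with a single visit yields none, since there is no next visit to predict.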
Train the model:
- For vanilla training:
```
python train.py
```
- For training with notes:
```
python train_with_notes.py
```
- For distributed training with notes:
```
python train_ddp.py
```
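With distributed data parallelism, each process (typically one per GPU) holds a full model replica, trains on its own shard of the data, and averages gradients with the other processes. This README does not say how train_ddp.py shards its data; the pure-Python sketch below mirrors the partitioning used by PyTorch's `torch.utils.data.DistributedSampler` with shuffling off and `drop_last=False`. The index list is padded by wrapping around to a multiple of the world size, then rank r takes every `world_size`-th index starting at r. `shard_indices` is a hypothetical helper, not part of this repository.

```python
import math
from typing import List


def shard_indices(num_samples: int, world_size: int, rank: int) -> List[int]:
    """Return the dataset indices assigned to `rank` (no shuffling, no dropping).

    Hypothetical helper mirroring DistributedSampler's partitioning.
    """
    if world_size <= 0 or not 0 <= rank < world_size:
        raise ValueError("need world_size > 0 and 0 <= rank < world_size")
    if num_samples == 0:
        return []
    per_rank = math.ceil(num_samples / world_size)
    total_size = per_rank * world_size
    # Pad by wrapping around to the start so every rank gets per_rank samples.
    indices = [i % num_samples for i in range(total_size)]
    # Rank r takes positions r, r + world_size, r + 2 * world_size, ...
    return indices[rank::world_size]


if __name__ == "__main__":
    for r in range(4):
        print(r, shard_indices(10, world_size=4, rank=r))
```

With 10 samples on 4 processes, each rank receives 3 indices and two samples are seen twice per epoch. This padding is why the per-epoch sample count under DDP can slightly exceed the dataset size.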
Evaluate the model:
- Use the notebooks in the notebooks/ directory for detailed evaluation and analysis.
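The notebooks define the metrics actually reported. Next-visit code prediction is commonly scored with ranking metrics such as recall@k, so as a hedged illustration the sketch below computes it for a single patient. It uses one common definition (the fraction of true codes that appear among the top-k ranked predictions). Some papers instead divide by min(k, number of true codes), so whether the notebooks use this exact form is an assumption, and `recall_at_k` is a hypothetical name.

```python
from typing import Sequence, Set


def recall_at_k(ranked_predictions: Sequence[str], true_codes: Set[str], k: int) -> float:
    """Fraction of the true next-visit codes found in the top-k predictions.

    Hypothetical helper; check the evaluation notebooks for the definitions used there.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not true_codes:
        raise ValueError("recall is undefined when there are no true codes")
    top_k = set(ranked_predictions[:k])
    return len(top_k & true_codes) / len(true_codes)


if __name__ == "__main__":
    predictions = ["C3", "C1", "C7", "C2"]  # toy ranking, best first
    print(recall_at_k(predictions, {"C1", "C2"}, k=2))
    print(recall_at_k(predictions, {"C1", "C2"}, k=4))
```

Here only "C1" is in the top 2, giving 0.5, while both true codes are in the top 4, giving 1.0. Dataset-level scores are then usually averaged over patients.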

Reproducing Literature Results

To reproduce results from other models in the literature:

Navigate to the literature_models/ directory.
Run the corresponding notebook or script for the model you wish to reproduce.
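The literature models are evaluated on the same cross-validation folds as our model, and the repository code defines how those folds are built. As a hedged illustration of why the unit of splitting matters, the sketch below assigns whole patients (rather than individual visits) to k folds with a fixed seed. No patient then appears in both the training and test data of a fold, and the split is reproducible. Splitting by patient is an assumption here, and `patient_folds`, `split_for_fold`, and the seed value are hypothetical.

```python
import random
from typing import Dict, Iterable, List, Tuple


def patient_folds(patient_ids: Iterable[int], k: int = 5, seed: int = 42) -> Dict[int, int]:
    """Map each unique patient id to a fold index in [0, k).

    Hypothetical helper; the repository's own fold construction may differ.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    # Sort before shuffling so the result does not depend on input order.
    unique_ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique_ids)
    # Deal patients round-robin so fold sizes differ by at most one.
    return {pid: i % k for i, pid in enumerate(unique_ids)}


def split_for_fold(fold_of: Dict[int, int], fold: int) -> Tuple[List[int], List[int]]:
    """Return (train patient ids, test patient ids) for the given fold."""
    test = sorted(p for p, f in fold_of.items() if f == fold)
    train = sorted(p for p, f in fold_of.items() if f != fold)
    return train, test


if __name__ == "__main__":
    folds = patient_folds([5, 3, 3, 9, 1, 7, 2, 8, 6, 4, 10], k=5, seed=0)
    print(split_for_fold(folds, 0))
```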

Visualization

Various visualization scripts and utilities are available in the stats/ directory. Use these to generate plots and analyze the results.

Acknowledgments

The project leading to this publication has received funding from the Excellence Initiative of Aix Marseille Université - A*Midex, a French “Investissements d’Avenir” programme (AMX-21-IET-017).

We would like to thank LIS | Laboratoire d'Informatique et Systèmes, Aix-Marseille University for providing the GPU resources necessary for pretraining and conducting extensive experiments. Additionally, we acknowledge CEDRE | CEntre de formation et de soutien aux Données de la REcherche, Programme 2 du projet France 2030 IDeAL for supporting early-stage experiments and hosting part of the computational infrastructure.

Citation

BibTeX:

```
@misc{klioui2025patienttrajectorypredictionintegrating,
  title         = {Patient Trajectory Prediction: Integrating Clinical Notes with Transformers},
  author        = {Sifal Klioui and Sana Sellami and Youssef Trardi},
  year          = {2025},
  eprint        = {2502.18009},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2502.18009}
}

@article{RNTI/papers/1002990,
  author  = {Sifal Klioui and Sana Sellami and Youssef Trardi},
  title   = {Prédiction de la trajectoire du patient : Intégration des notes cliniques aux transformers},
  journal = {Revue des Nouvelles Technologies de l'Information},
  volume  = {Extraction et Gestion des Connaissances, RNTI-E-41},
  year    = {2025},
  pages   = {135-146}
}
```

More Information

For further details, please refer to the papers cited above, the model’s repository, and the supplementary documentation.

MostHumble/PatientTrajectoryForecasting

Folders and files

Latest commit

History

Repository files navigation

Patient Trajectory Prediction with Clinical Notes Integration

Project Structure

Subdirectories

Setup

Usage

Reproducing Literature Results

Visualization

Acknowledgments

Citation

More Information

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages