Code to run the experiments for the paper "Large Language Models forecast Patient Health Trajectories enabling Digital Twins".
NOTE: We are working on a Python library that will enable easier conversions and that is both indication-agnostic and generally better engineered. This repository leans more towards exploratory experiments, often with dead ends and potentially redundant code. Please raise an issue if something is missing or unclear.
# First install a fresh environment
cd DT-GPT
pip install -r requirements.txt
pip install -e .
This repository contains the implementation of DT-GPT (Digital Twin GPT), a framework for using Large Language Models to forecast patient health trajectories. The codebase is organized into three main components: pipeline infrastructure, experimental runs, and exploratory analyses.
Developed as a collaboration between Roche, Helmholtz Munich, and LMU.
In case of any questions, please reach out to Nikita Makarov or Michael Menden.
The pipeline infrastructure contains the main pipeline components and utilities for running experiments:
- `Experiment.py` - Main experiment orchestration class
- `EvaluationManager.py` - Handles model evaluation and metrics computation
- `MetricManager.py` - Computes various forecasting metrics
- `Splitters.py` - Data splitting strategies for time series forecasting
- `BaselineHelpers.py` - Baseline model implementations
- `DartsHelpers.py` - Integration with Darts time series forecasting library
- `LLMTimeHelpers.py` - Utilities for LLMTime model implementation
- `NeuralForecastHelpers.py` - Neural forecasting model helpers
- `PlottingHelpers.py` - Visualization utilities
- `NormalizationFilterManager.py` - Data preprocessing and filtering
- `MatchingManagers.py` - Patient matching strategies
- `DataFrameConverters.py` - Base classes for data conversion
- `DataFrameConvertTDBDMIMIC.py` - MIMIC-IV dataset conversion utilities
- `DataFrameConvertTemplateTextBasicDescription.py` - Text template generation for LLM input
- `DataProcessorBiomistral.py` - BioMistral model-specific data processing
Experiments on the critical care variables dataset:
- `2_copy_forward_baseline/` - Copy-forward baseline implementations
- `3_1_time_llm/` - Time-LLM model experiments
- `3_2_llmtime/` - LLMTime model experiments
- `3_3_general_llms/` - General LLM experiments (GPT, BioMistral, etc.)
- `3_dart_models_combined/` - Traditional time series models using Darts library
- `4_dt_gpt_instruction/` - DT-GPT instruction-based forecasting experiments
Same experimental structure as above, but applied to the MIMIC-IV dataset:
- Similar folder structure with baseline, LLM, and traditional model experiments
Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset experiments:
- `2_copy_forward/` - Baseline experiments
- `3_dt_gpt/` - DT-GPT experiments
- `4_darts_models/` - Traditional forecasting models
- `5_time_llm/` - Time-LLM experiments
- `6_llmtime/` - LLMTime experiments
- `7_general_llms/` - General LLM experiments
Zero-shot forecasting experiments:
- `2025_02_07_run_on_all_prompts.py` - Zero-shot evaluation script
- `2025_02_10_post_process_200_patient_run.ipynb` - Post-processing of zero-shot results
- `2025_07_28_run_zero_shot_interpretation.py` - Zero-shot interpretation analysis
The primary DT-GPT experiments can be found in the following folders:
- Critical Variables Dataset (`2024_02_05_critical_vars/`):
  - Main DT-GPT experiments: `4_dt_gpt_instruction/`
  - Baseline experiments: `2_copy_forward_baseline/`
  - Comparison LLM experiments: `3_3_general_llms/`
  - Traditional model experiments: `3_dart_models_combined/`
- MIMIC-IV Dataset (`2024_02_08_mimic_iv/`):
  - Main DT-GPT experiments: `4_dt_gpt_instruction/`
  - Similar structure with baseline and comparison model folders
- ADNI Dataset (`2025_02_03_adni/`):
  - Main DT-GPT experiments: `3_dt_gpt/`
  - Baseline experiments: `2_copy_forward/`
  - Comparison model experiments: `4_darts_models/`, `5_time_llm/`, `6_llmtime/`, `7_general_llms/`
- Zero-shot experiment analysis: `3_various_explorations/zero_shot/2025_02_10_post_process_200_patient_run.ipynb`
- Model interpretation results: `3_various_explorations/zero_shot_interpretation/`
The framework implements several forecasting approaches:
- Baseline Models: Copy-forward and simple statistical baselines (see the sketch after this list)
- Traditional Time Series Models: Using the Darts library (Linear model, LSTM, TiDE, ...)
- Specialized Time Series LLMs: Time-LLM and LLMTime
- General Purpose LLMs: BioMistral
- DT-GPT: Our instruction-based approach for patient trajectory forecasting
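The copy-forward baseline is the simplest of these approaches. As a minimal illustrative sketch (not the repository's `BaselineHelpers.py` implementation), it forecasts each target variable as its last observed value for every requested future day:

```python
# Minimal copy-forward sketch (illustrative only, not the repo's BaselineHelpers.py):
# forecast each target variable as its last observed value for all future days.
def copy_forward_forecast(history: dict[str, list[float]],
                          target_days: dict[str, list[int]]) -> dict[str, list[float]]:
    """history: variable name -> chronologically ordered observed values;
    target_days: variable name -> future days (relative to the current day)."""
    forecast = {}
    for variable, days in target_days.items():
        last_value = history[variable][-1]        # last observed measurement
        forecast[variable] = [last_value] * len(days)
    return forecast

# Example: hemoglobin last observed at 11.0, predicted for days 21, 42, and 63
print(copy_forward_forecast({"hemoglobin": [10.5, 11.0]},
                            {"hemoglobin": [21, 42, 63]}))
# -> {'hemoglobin': [11.0, 11.0, 11.0]}
```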
The codebase supports multiple healthcare datasets:
- Critical Care Variables: ICU patient monitoring data
- MIMIC-IV: Publicly available critical care database
- ADNI: Alzheimer's disease progression data
Due to license restrictions, each dataset needs to be obtained from the respective organisation. For any specific data preprocessing questions that are not clear from the code, please reach out to [email protected] or [email protected]. Additionally, we use the custom R library `FlatironTimeSeries` for preprocessing the Flatiron data; please reach out to us if you need access to it.
The data preprocessing scripts are located in their respective dataset folders under `1_data`, with each indication having its own README. Please note that you will need to adjust the paths in the scripts to your specific system setup.
This is the final processed text version for DT-GPT on a synthetic patient, in the NSCLC setup. Note that if you run this in vLLM, you need to use the completions endpoint (i.e., not chat completions or chat templates). The `<patient_prediction>` part is an essential input for the model.
Input:
First, patient chronological patient history up until the current day. Patient visits for the first time, with the following values: Hyperlipidemia, unspecified is diagnosed, Essential (primary) hypertension is diagnosed, ECOG is 1, <...>.
135 days after previous visit, patient visits again, with the following values: ECOG is 1, neutrophils.segmented is 2.8, <...>.
17 days after previous visit, patient visits again, with the following values: ECOG is 1, alanine aminotransferase is 6, <...>.
7 days after previous visit, <...>.
<...>
Next, the baseline data for the patient: birth year is 1953, gender is M, ses index is 3, is cancer advanced is True, histology is Non-squamous cell carcinoma, cancer stage is Stage IV, smoking status is History of smoking, ethnicity is Not Hispanic or Latino, Current line of therapy is Pembrolizumab, Current line number is 2.
Finally, the variables which you should predict, and for which days in the future from the current day: {"hemoglobin": [21, 42, 63, 70, 77, 84]}
Now, your task is as follows: Given the non small cell lung cancer patient's history, please predict for this patient the previously noted down variables and future days, in the same JSON format. <patient_prediction>
Output:
{"hemoglobin": ["10.8", "11.3", "11.2", "11.6", "11.7", "11.7"]}
If you use this code, please cite our paper:
@article{makarov2024large,
title={Large language models forecast patient health trajectories enabling digital twins},
author={Makarov, Nikita and Bordukova, Maria and Rodriguez-Esteban, Raul and Schmich, Fabian and Menden, Michael P},
journal={medRxiv},
pages={2024--07},
year={2024},
publisher={Cold Spring Harbor Laboratory Press}
}
See `License.txt` for licensing information.