AIPAL Validator is a tool designed to streamline the validation process for AIPAL. This software provides a comprehensive FHIR-based validation pipeline for pediatric acute leukemia prediction models.
Repository: https://github.com/UMEssen/aipal-validation
Version: 0.2.0
License: MIT License
Software requirements:
- Python: 3.10 or higher
- R: 4.0 or higher
- Poetry: 1.0 or higher (for dependency management)
- Docker: Optional, for containerized deployment
R packages:
- dplyr
- tidyr
- yaml
- caret
- xgboost
Python packages:
- fhir-pyrate (from GitHub)
- wandb ^0.18.0
- pyyaml ^6.0.1
- python-dotenv ^1.0.0
- psycopg2-binary ^2.9.7
- matplotlib ^3.8.0
- scikit-learn ^1.4.2
- openpyxl ^3.1.3
- xgboost ^2.1.0
- shap ^0.46.0
- seaborn ^0.13.2
- tabulate ^0.9.0
Supported operating systems:
- Linux (Ubuntu 20.04+)
- macOS (10.15+)
- Windows 10 (via Docker)
Hardware requirements:
- Minimum: 8 GB RAM, 2 CPU cores
- Recommended: 16 GB RAM, 4+ CPU cores
- Storage: 2 GB of free disk space
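Local installation:

1. Install R and the required packages:
# Ubuntu/Debian
sudo apt-get install r-base
# macOS
brew install r
# Install R packages (a CRAN mirror must be set when R runs non-interactively)
R -e "install.packages(c('dplyr', 'tidyr', 'yaml', 'caret', 'xgboost'), repos = 'https://cloud.r-project.org')"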
2. Install Poetry:
curl -sSL https://install.python-poetry.org | python3 -
3. Clone the repository and install dependencies:
git clone https://github.com/UMEssen/aipal-validation.git
cd aipal-validation
poetry install
4. Verify the installation:
poetry run aipal_validation --help

Typical installation time: 5-10 minutes on a standard desktop computer.
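Before running the pipeline, it can help to confirm the documented prerequisites programmatically. The following is a minimal Python sketch, not part of aipal_validation (the function names are hypothetical), that checks the interpreter against the stated 3.10 minimum and looks for the R and poetry executables on PATH. It does not check the R version or the installed packages.

```python
"""Pre-flight check for the documented prerequisites (a sketch, not part of aipal_validation)."""
import shutil
import sys

# Minimum Python version from the software requirements list.
MIN_PYTHON = (3, 10)


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_tools(tools=("R", "poetry")):
    """Return the command-line tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python >= 3.10:", python_ok())
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
```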
Docker installation:

1. Install Docker and Docker Compose.
2. Clone the repository:
git clone https://github.com/UMEssen/aipal-validation.git
cd aipal-validation
3. Build and run the container:
docker compose build
docker compose run aipal bash
Typical installation time: 5-10 minutes (including Docker image build)
Demo with synthetic data:

The software includes synthetic test data for demonstration purposes.
Generate the synthetic data and run the pipeline (sampling and test steps):
# Local installation
cd synthetic_test_data
poetry run python generate_syntetic_data.py
cd ..
poetry run aipal_validation \
  --config aipal_validation/config/config_synthetic.yaml \
  --task aipal \
  --step sampling+test \
  --debug

# Docker installation
docker compose run aipal bash
# then, inside the container:
cd synthetic_test_data
python generate_syntetic_data.py
cd ..
python -m aipal_validation \
  --config aipal_validation/config/config_synthetic.yaml \
  --task aipal \
  --step sampling+test \
  --debug
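If you prefer to drive the demo from Python (for example, from a notebook), the local-installation commands above can be wrapped with subprocess. This is a sketch with hypothetical helper names; it uses only the flags that appear in the documented commands and assumes it is started from the repository root of a local (Poetry) installation.

```python
"""Script the documented demo steps from Python (a sketch; helper names are hypothetical)."""
import subprocess
from pathlib import Path

CONFIG = "aipal_validation/config/config_synthetic.yaml"


def build_command(config, task, steps, debug=False, use_poetry=True):
    """Assemble an aipal_validation command from the documented flags.

    Several steps are joined with "+", as in "--step sampling+test".
    use_poetry=False mirrors the Docker form (python -m aipal_validation).
    """
    if use_poetry:
        base = ["poetry", "run", "aipal_validation"]
    else:
        base = ["python", "-m", "aipal_validation"]
    cmd = base + ["--config", config, "--task", task, "--step", "+".join(steps)]
    if debug:
        cmd.append("--debug")
    return cmd


def run_demo(repo_root="."):
    """Generate synthetic data, then run the sampling and test steps (local installation)."""
    root = Path(repo_root)
    # Same as: cd synthetic_test_data && poetry run python generate_syntetic_data.py
    subprocess.run(
        ["poetry", "run", "python", "generate_syntetic_data.py"],
        cwd=root / "synthetic_test_data",
        check=True,
    )
    # Same as the multi-line aipal_validation call, run from the repository root.
    cmd = build_command(CONFIG, "aipal", ["sampling", "test"], debug=True)
    subprocess.run(cmd, cwd=root, check=True)
```

Calling run_demo() stops with a CalledProcessError as soon as either step exits with a non-zero status, because both calls use check=True.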
The demo generates the following files in synthetic_test_data/aipal/:
- data.csv: original synthetic medical data (1000 samples)
- samples.csv: processed data for the ML pipeline
- predict.csv: model predictions with probabilities
- predictions_*.csv: results at different confidence cutoffs
- results.csv: final evaluation metrics (AUC, sensitivity, specificity)
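The metrics reported in results.csv can be illustrated with a small, dependency-free sketch. It assumes binary labels (1 for the class of interest, 0 otherwise, i.e. one-vs-rest) and one predicted probability per sample; the function names are hypothetical and this is not the pipeline's own evaluation code. Applying a cutoff turns probabilities into calls, which mirrors the idea of evaluating predictions at different confidence cutoffs.

```python
"""One-vs-rest AUC, sensitivity and specificity: a stdlib-only sketch."""


def sensitivity_specificity(labels, probs, cutoff):
    """Sensitivity and specificity when a sample is called positive if prob >= cutoff."""
    tp = fn = tn = fp = 0
    for label, prob in zip(labels, probs):
        called_positive = prob >= cutoff
        if label == 1:
            tp += int(called_positive)
            fn += int(not called_positive)
        else:
            fp += int(called_positive)
            tn += int(not called_positive)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def auc(labels, probs):
    """ROC AUC: chance that a random positive scores above a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))
```

For real data, scikit-learn (already a dependency) provides sklearn.metrics.roc_auc_score; the pairwise version here costs O(n_pos x n_neg) and is only meant to show what AUC measures.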
Expected console output includes:
- Data processing statistics
- Model prediction results
- Performance metrics
- Confidence interval calculations
Expected run time: 2-5 minutes on a standard desktop computer
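The console output mentions confidence interval calculations, but this README does not say how the intervals are computed. As an illustration only, here is a percentile bootstrap, one common choice, written as a stdlib-only Python sketch; the function names are hypothetical and the pipeline may use a different method.

```python
"""Percentile bootstrap confidence interval (a generic sketch)."""
import random


def bootstrap_ci(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Return (estimate, lower, upper) for statistic(data) via the percentile bootstrap."""
    if not data:
        raise ValueError("data must not be empty")
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Resample n items with replacement and recompute the statistic.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles of the bootstrap estimates.
    lower = estimates[round(alpha / 2 * (n_boot - 1))]
    upper = estimates[round((1 - alpha / 2) * (n_boot - 1))]
    return statistic(data), lower, upper


def accuracy(pairs):
    """Fraction of (label, prediction) pairs that agree."""
    return sum(label == pred for label, pred in pairs) / len(pairs)
```

Any function of a list of samples can be passed as statistic, for example a sensitivity computed from (label, probability) pairs.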
Using your own data:

1. Prepare your data:
   - Create the directory structure your_cohort_name/aipal/.
   - Place your Excel file in the aipal folder.
   - Ensure the column names match the expected format (see the configuration files).
2. Set the configuration:
# Update run_id in the config to match your cohort name
vim aipal_validation/config/config_training.yaml
3. Generate samples and run validation:
poetry run aipal_validation --task aipal --step sampling
poetry run aipal_validation --task aipal --step test
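The preparation and configuration steps can also be scripted. This sketch creates the your_cohort_name/aipal/ folder and rewrites run_id in a YAML config as plain text, which leaves the rest of the file untouched. It assumes run_id is a top-level key written on a single "run_id: ..." line; the helper names are hypothetical, and because the README does not say where the cohort folder must live, the root directory is a parameter.

```python
"""Prepare a cohort folder and set run_id (a sketch; assumes a top-level "run_id: ..." line)."""
import re
from pathlib import Path


def prepare_cohort_dir(root, cohort):
    """Create <root>/<cohort>/aipal/ (if missing) and return its path."""
    target = Path(root) / cohort / "aipal"
    target.mkdir(parents=True, exist_ok=True)
    return target


def set_run_id(config_text, cohort):
    """Return config_text with the value of the top-level run_id key replaced by cohort.

    Indented (nested) run_id keys are left alone; a trailing comment on the
    run_id line is dropped.
    """
    # A function replacement avoids backslash escapes being interpreted in `cohort`.
    new_text, count = re.subn(r"(?m)^run_id:.*$", lambda _m: f"run_id: {cohort}", config_text)
    if count == 0:
        raise KeyError("no top-level 'run_id:' line found")
    return new_text
```

For example, reading aipal_validation/config/config_training.yaml with Path.read_text(), passing the text through set_run_id(text, "your_cohort_name") and writing the result back with Path.write_text() replaces the vim step.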
Using a FHIR server:

1. Configure the FHIR connection:
   - Update the database connection settings in the configuration files.
   - Set environment variables for the database credentials.
2. Run the complete pipeline:
poetry run aipal_validation --task aipal --step all
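For the credentials step, a fail-fast loader keeps secrets out of the configuration files and reports every missing variable at once. In this sketch the variable names FHIR_USER and FHIR_PASSWORD are hypothetical placeholders, not names the pipeline is known to read; check the configuration files for the actual ones.

```python
"""Read database/FHIR credentials from environment variables (variable names are hypothetical)."""
import os

# Hypothetical names: replace them with the ones the configuration files actually expect.
REQUIRED_VARS = ("FHIR_USER", "FHIR_PASSWORD")


def load_credentials(env=None, required=REQUIRED_VARS):
    """Return {name: value} for the required variables, failing if any is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in required}
```

Since python-dotenv is a dependency, a local .env file can be loaded into os.environ beforehand with from dotenv import load_dotenv followed by load_dotenv().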
Outlier detection:
poetry run aipal_validation --task outlier --step detect
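This README does not describe how the outlier task works. For orientation only, the sketch below flags values outside Tukey's fences, [Q1 - k*IQR, Q3 + k*IQR], for a single numeric feature. It is a generic technique, not necessarily what --task outlier --step detect implements, and the function name is hypothetical.

```python
"""Tukey-fence outlier flagging: a generic illustration, not the pipeline's own method."""
import statistics


def iqr_outliers(values, k=1.5):
    """Return the indices of values outside [Q1 - k*IQR, Q3 + k*IQR].

    Needs at least two values (statistics.quantiles requirement).
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]
```

Larger k values flag fewer points; k = 1.5 is the conventional default for Tukey's fences.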
Model retraining:
poetry run aipal_validation --task retrain --step all
Project structure:

aipal_validation/
├── r/ # R scripts for prediction and model training
├── config/ # Configuration files
├── data_preprocessing/ # Data preprocessing modules
├── eval/ # Evaluation modules
├── fhir/ # FHIR-related modules
├── ml/ # Machine learning modules
├── outlier/ # Outlier detection modules
└── helper/ # Utility functions
To reproduce the results from the associated manuscript:
- Follow the installation guide above
- Use the provided synthetic data or your own dataset
- Run the complete validation pipeline:
poetry run aipal_validation --task aipal --step all
- Evaluation scripts for detailed analysis are available in the eval/ directory.
This software is licensed under the MIT License. See the LICENSE file for details.
If you use this software in your research, please cite:
[Citation information to be added when manuscript is published]
For issues and questions, please open an issue on the GitHub repository.