Building Machine Learning Pipelines for Predicting Pediatric Bone Marrow Transplant Survival

Project Overview

This project uses a dataset of bone marrow transplantation characteristics for pediatric patients from the UCI Machine Learning Repository. The objective is to build a single machine learning pipeline that covers all data cleaning and preprocessing steps and then selects the best classifier for predicting patient survival status.

Dataset Description

The dataset includes various features related to both donors and recipients of hematopoietic stem cells. Key features of interest include:

donor_age: The age of the donor at the time of hematopoietic stem cell apheresis.
donor_age_below_35: Indicator of whether the donor's age is below 35 (yes, no).
donor_ABO: ABO blood group of the donor (0, A, B, AB).
donor_CMV: Presence of cytomegalovirus infection in the donor before transplantation (present, absent).
recipient_age: Age of the recipient at the time of transplantation.
recipient_age_below_10: Indicator of whether the recipient's age is below 10 (yes, no).
recipient_age_int: Discretized intervals of the recipient's age: (0, 5], (5, 10], (10, 20].
recipient_gender: Gender of the recipient (female, male).
recipient_body_mass: Body mass of the recipient at the time of transplantation.
survival_status: Survival status (0 - alive, 1 - dead).
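
Several of the features above are derived from the raw ages. The sketch below shows how such indicators and right-closed age intervals could be computed with pandas. The ages are hypothetical values for illustration, not rows from the dataset, and the strict `<` comparison is an assumption about how "below" was applied.

```python
import pandas as pd

# Hypothetical ages for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "donor_age": [22.8, 41.3, 35.0],
    "recipient_age": [4.5, 9.6, 18.1],
})

# Binary indicators derived from the raw ages (strict "<" is an assumption).
yes_no = {True: "yes", False: "no"}
toy["donor_age_below_35"] = (toy["donor_age"] < 35).map(yes_no)
toy["recipient_age_below_10"] = (toy["recipient_age"] < 10).map(yes_no)

# Right-closed intervals (0, 5], (5, 10], (10, 20], as in recipient_age_int.
toy["recipient_age_int"] = pd.cut(toy["recipient_age"], bins=[0, 5, 10, 20])

print(toy)
```

Because the derived columns repeat information already carried by donor_age and recipient_age, they may be strongly correlated with those columns when used together as model inputs.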

Project Steps

Data Loading:
- Load the dataset into a DataFrame named df.
Data Preprocessing:
- Identify and separate numerical and categorical columns.
- Process numerical and categorical data appropriately. All imported columns are numeric, so dtype alone does not separate the two groups: binary columns (e.g., donor_age_below_35) are encoded as 0 and 1, and categorical columns (e.g., donor_ABO) are encoded as -1, 0, 1, and 2.
Pipeline Construction:
- Construct a machine learning pipeline that handles data preprocessing, including scaling and encoding.
- Ensure the pipeline processes numerical and categorical columns separately.
Model Selection:
- Implement various classifiers and select the best-performing model based on evaluation metrics.
- Use cross-validation to ensure the robustness of the selected model.
Evaluation and Analysis:
- Evaluate the performance of the chosen model using metrics such as accuracy, precision, recall, and F1-score.
- Analyze the results to draw insights and conclusions about the factors influencing pediatric bone marrow transplant survival.
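
The steps above can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions, not the project's actual script: the DataFrame is synthetic random data standing in for df, the distinct-value threshold used to detect categorical columns is hypothetical, and the imputation strategies and the two candidate classifiers are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for df (hypothetical values, NOT the real dataset).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "donor_age": rng.uniform(18, 55, n),
    "recipient_age": rng.uniform(0.5, 20, n),
    "recipient_body_mass": rng.uniform(6, 100, n),
    "donor_age_below_35": rng.integers(0, 2, n),   # binary, coded 0/1
    "donor_ABO": rng.integers(-1, 3, n),           # categorical, coded -1..2
    "survival_status": rng.integers(0, 2, n),
})
X = df.drop(columns="survival_status")
y = df["survival_status"]

# Every column is numeric, so dtype cannot separate the two groups.
# Assumption: a column with few distinct values is treated as categorical.
MAX_CATEGORIES = 7  # hypothetical threshold
cat_cols = [c for c in X.columns if X[c].nunique() <= MAX_CATEGORIES]
num_cols = [c for c in X.columns if c not in cat_cols]

# Numerical and categorical columns are processed by separate sub-pipelines.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="mean")),
                      ("scale", StandardScaler())]), num_cols),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), cat_cols),
])
pipeline = Pipeline([("preprocess", preprocess),
                     ("clf", LogisticRegression(max_iter=1000))])

# Each dict swaps a different classifier into the "clf" step.
param_grid = [
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1.0, 10.0]},
    {"clf": [RandomForestClassifier(random_state=0)], "clf__n_estimators": [50, 100]},
]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# 5-fold cross-validation over all candidates, then refit the best one.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
}
print(type(search.best_estimator_.named_steps["clf"]).__name__, metrics)
```

Keeping the preprocessing inside the pipeline means the imputer, scaler, and encoder are refit on each training fold, so no information from the validation fold leaks into them. The scores printed here come from random data and carry no meaning; only the structure is intended to transfer to the real dataset.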

Conclusion

This project demonstrates how to construct a machine learning pipeline that predicts the survival status of pediatric bone marrow transplant patients. Preprocessing the data inside the pipeline and selecting the best classifier through cross-validation is intended to improve predictive accuracy and to indicate which factors affect patient outcomes.

Dataset Source

The dataset used in this project is available at the UCI Machine Learning Repository: Bone Marrow Transplantation Dataset.

Repository Structure

data/: Contains the dataset used for the project.
notebooks/: Jupyter notebooks with exploratory data analysis, pipeline construction, and model training.
scripts/: Python scripts for data preprocessing, model training, and evaluation.
results/: Evaluation metrics and analysis results.
README.md: Project overview and detailed explanation of the steps involved.

Getting Started

To reproduce the results, follow these steps:

Clone the repository.
Install the necessary dependencies.
Run the preprocessing script to prepare the data.
Execute the model training script to build and evaluate the pipeline.

Feel free to explore the notebooks and scripts to understand the detailed implementation of the project.

By building a robust machine learning pipeline, this project aims to contribute to the predictive modeling of pediatric bone marrow transplantation outcomes, potentially aiding in better decision-making and improved patient care.


XENO2410/Predicting-Pediatric-Bone-Marrow-Transplant-Survival
