This script implements a machine learning approach to predict diabetes outcomes using a diabetes dataset (diabetes_dataset.csv). It compares the performance of Logistic Regression and Linear Discriminant Analysis (LDA) for classification, evaluating how dropping certain features impacts the model, calculating expected loss with different thresholds, and comparing ROC curves for both models.
- Logistic Regression: A model used to predict the probability of diabetes (Outcome) based on various input features.
- Linear Discriminant Analysis (LDA): Another classification model that is compared with Logistic Regression using ROC curves and AUC scores.
- Objective: Determine whether removing the `SkinThickness` feature improves the performance of the Logistic Regression model.
- Approach:
  - Load the dataset and create two feature sets:
    - All features, including `SkinThickness`.
    - All features except `SkinThickness`.
  - Standardize the features using `StandardScaler`.
  - Perform 10-fold cross-validation for both models to compare their performance based on mean accuracy.
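The feature-set comparison above can be sketched roughly as follows. This is a minimal illustration, not the script itself: it uses a synthetic stand-in for `diabetes_dataset.csv`, the column names `f1` to `f3` are hypothetical placeholders, and wrapping `StandardScaler` in a pipeline (so it is refit on each training fold) is an assumption about how the scaling is combined with cross-validation.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def mean_cv_accuracy(X, y, folds=10):
    """Mean accuracy of a scaled Logistic Regression over `folds`-fold CV."""
    # The scaler lives inside the pipeline, so each fold is scaled using
    # statistics from its own training portion only.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()


# Synthetic stand-in for the real data. With the actual file this would be
# df = pd.read_csv("diabetes_dataset.csv"). The names f1..f3 are hypothetical.
X_arr, y_arr = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
df = pd.DataFrame(X_arr, columns=["f1", "f2", "f3", "SkinThickness"])
df["Outcome"] = y_arr

y = df["Outcome"]
X_all = df.drop(columns=["Outcome"])                       # includes SkinThickness
X_reduced = df.drop(columns=["Outcome", "SkinThickness"])  # excludes SkinThickness

acc_all = mean_cv_accuracy(X_all, y)
acc_reduced = mean_cv_accuracy(X_reduced, y)
print(f"With SkinThickness:    {acc_all:.3f}")
print(f"Without SkinThickness: {acc_reduced:.3f}")
```

Whichever feature set yields the higher mean accuracy would then be carried forward to the later parts.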
- Objective: Calculate the expected loss under a cost of 500 per false positive and 2000 per false negative, and compare it across different decision thresholds.
- Approach:
  - Train a Logistic Regression model using the better-performing feature set from Part A.
  - Compute the expected loss for:
    - A baseline model (predicting all zeros).
    - A model with a threshold of 0.5.
    - A model with a calculated "breakeven" threshold.
  - Compare the losses on the test set.
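As a rough sketch of the loss comparison above, the snippet below computes the three losses on a small hand-made example. It rests on two assumptions that may differ from the script: expected loss is taken as the average cost per sample, and the "breakeven" threshold is the probability at which the expected cost of a false positive equals that of a false negative, i.e. `cost_fp / (cost_fp + cost_fn)`. The toy labels and probabilities are illustrative only.

```python
import numpy as np

COST_FP = 500   # cost of a false positive (from the description above)
COST_FN = 2000  # cost of a false negative (from the description above)


def expected_loss(y_true, y_prob, threshold):
    """Average cost per sample when predicting 1 for y_prob >= threshold."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    false_pos = np.sum((y_pred == 1) & (y_true == 0))
    false_neg = np.sum((y_pred == 0) & (y_true == 1))
    return (COST_FP * false_pos + COST_FN * false_neg) / len(y_true)


# Assumed breakeven rule: predict 1 when p * COST_FN > (1 - p) * COST_FP,
# which rearranges to p > COST_FP / (COST_FP + COST_FN).
breakeven = COST_FP / (COST_FP + COST_FN)

# Toy test-set labels and predicted probabilities (illustrative only).
y_true = np.array([0, 0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.3, 0.35, 0.8, 0.6, 0.15])

# Baseline: all-zero probabilities never reach the threshold, so every
# sample is predicted as 0.
loss_baseline = expected_loss(y_true, np.zeros_like(y_prob), threshold=0.5)
loss_half = expected_loss(y_true, y_prob, threshold=0.5)
loss_breakeven = expected_loss(y_true, y_prob, threshold=breakeven)

print(f"Breakeven threshold: {breakeven}")
print(f"Baseline: {loss_baseline}, 0.5: {loss_half}, breakeven: {loss_breakeven}")
```

Because a false negative costs four times as much as a false positive, the breakeven threshold under this rule sits well below 0.5, so more borderline cases are classified as positive.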
- Objective: Compare the performance of Logistic Regression and LDA using ROC curves and AUC scores.
- Approach:
  - Train both Logistic Regression and LDA models on the training set and score them on the test set.
  - Plot the ROC curves for both models and calculate the AUC to compare their effectiveness in classifying diabetes.
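A minimal sketch of the ROC comparison might look like the following. It uses synthetic data in place of `diabetes_dataset.csv`, and it assumes the models are fitted on a training split and scored on a held-out test split. The split ratio and random seeds are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetes features and the Outcome column.
X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
}

auc_scores = {}
fig, ax = plt.subplots()
for name, model in models.items():
    model.fit(X_train_s, y_train)
    # Probability of the positive class, used as the ranking score.
    probs = model.predict_proba(X_test_s)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, probs)
    auc_scores[name] = roc_auc_score(y_test, probs)
    ax.plot(fpr, tpr, label=f"{name} (AUC = {auc_scores[name]:.3f})")

ax.plot([0, 1], [0, 1], linestyle="--", color="gray", label="Chance")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.set_title("ROC curves: Logistic Regression vs. LDA")
ax.legend()
# Call plt.show() to display the figure interactively.
```

The model whose curve lies closer to the top-left corner, and therefore has the higher AUC, ranks positive cases above negative ones more reliably.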
- pandas
- numpy
- scikit-learn
- matplotlib
To install the required libraries, run:
`pip install pandas numpy scikit-learn matplotlib`