This project applies Bayesian methods to analyze two datasets: the National Institute of Diabetes and Digestive and Kidney Diseases’ (NIDDK) diabetes dataset and the PJM Interconnection’s hourly energy consumption dataset. Inspired by the 2017 paper "Deep Learning: A Bayesian Perspective", we implement Bayesian approaches to showcase their strengths, challenges, and the potential for broader applications in statistical analysis.
▪️ Zakarya Elmimouni (ENSAE)
▪️ Ahmed Khairaldin (ENSAE)
▪️ Amine Razig (ENSAE)
The primary objective of this project is to demonstrate the advantages and challenges of Bayesian methods when applied to real-world data. We aim to:
- Explore Bayesian methods for analyzing the diabetes and energy consumption datasets.
- Develop a simulation study to evaluate the methods' robustness and effectiveness.
- Discuss advancements in Bayesian techniques since the publication of the paper.
- National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Diabetes Dataset
  - Description: Medical data on factors influencing diabetes progression, including age, BMI, blood pressure, and glucose levels.
  - Purpose: Analyze how Bayesian methods can provide insights into the relationships between predictors and the likelihood of diabetes.
- PJM Interconnection Hourly Energy Consumption Dataset
  - Description: Time series data capturing hourly energy consumption for regional transmission operators.
  - Purpose: Explore Bayesian approaches for modeling and predicting energy usage patterns over time.
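As an illustration of how a Bayesian model can relate predictors to the likelihood of diabetes, here is a minimal sketch of a random-walk Metropolis sampler for Bayesian logistic regression. It is not taken from the project notebooks. The data are synthetic stand-ins for the real predictors, and the function names, prior scale, and step size are assumptions chosen for the example.

```python
import numpy as np


def log_posterior(w, X, y, prior_std=5.0):
    """Unnormalized log posterior: Bernoulli likelihood with a N(0, prior_std^2) prior on w."""
    logits = X @ w
    # log p(y | w) = sum(y * logit - log(1 + exp(logit))), computed stably with logaddexp
    log_lik = np.sum(y * logits - np.logaddexp(0.0, logits))
    log_prior = -0.5 * np.sum(w ** 2) / prior_std ** 2
    return log_lik + log_prior


def metropolis(X, y, n_samples=4000, step=0.15, seed=0):
    """Random-walk Metropolis: propose w + noise, accept with probability min(1, posterior ratio)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    current = log_posterior(w, X, y)
    samples = np.empty((n_samples, X.shape[1]))
    accepted = 0
    for i in range(n_samples):
        proposal = w + step * rng.standard_normal(w.shape)
        candidate = log_posterior(proposal, X, y)
        if np.log(rng.uniform()) < candidate - current:
            w, current = proposal, candidate
            accepted += 1
        samples[i] = w  # a rejected proposal repeats the current state
    return samples, accepted / n_samples


# Synthetic data (assumption): an intercept plus two standardized predictors.
rng = np.random.default_rng(1)
n = 300
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
true_w = np.array([-0.5, 1.5, -1.0])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)

samples, acc_rate = metropolis(X, y)
posterior = samples[1000:]  # discard the first 1000 draws as burn-in
post_mean = posterior.mean(axis=0)
lower, upper = np.percentile(posterior, [2.5, 97.5], axis=0)

print("acceptance rate:", round(acc_rate, 2))
print("posterior mean:", post_mean)
print("95% credible intervals:", list(zip(lower, upper)))
```

The credible intervals are what a point-estimate fit would not provide: they indicate how strongly the data constrain each coefficient. On real data, the step size would need tuning and convergence should be checked with several chains.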
- MLP
- Markov Chain Monte Carlo
- Dropout Approximation
- Bayesian Neural Networks
  - Inspired by the original paper and implemented to explore complex relationships in the data.
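To make the dropout approximation in the list above concrete, the following sketch shows the prediction-time procedure: dropout stays active, the network is run many times, and the spread of the outputs serves as an uncertainty estimate. It uses plain numpy rather than torch or keras so that it stays self-contained. The weights are random, untrained placeholders, and the function name, layer sizes, and dropout rate are assumptions, not the project's settings.

```python
import numpy as np


def mc_dropout_predict(x, weights, p_drop=0.2, n_passes=200, seed=0):
    """Average n_passes stochastic forward passes of a one-hidden-layer MLP with dropout on."""
    rng = np.random.default_rng(seed)
    (W1, b1), (W2, b2) = weights
    outputs = []
    for _ in range(n_passes):
        h = np.maximum(0.0, x @ W1 + b1)            # ReLU hidden layer
        mask = rng.uniform(size=h.shape) >= p_drop  # keep each unit with probability 1 - p_drop
        h = h * mask / (1.0 - p_drop)               # inverted-dropout rescaling
        outputs.append(h @ W2 + b2)
    outputs = np.stack(outputs)                     # shape: (n_passes, n_points, 1)
    return outputs.mean(axis=0), outputs.std(axis=0)


# Placeholder weights (assumption): in practice these come from a trained network.
rng = np.random.default_rng(42)
hidden = 32
weights = [
    (rng.standard_normal((1, hidden)), rng.standard_normal(hidden)),
    (rng.standard_normal((hidden, 1)) / np.sqrt(hidden), np.zeros(1)),
]

x = np.linspace(-2.0, 2.0, 5).reshape(-1, 1)
mean, std = mc_dropout_predict(x, weights)
for xi, m, s in zip(x[:, 0], mean[:, 0], std[:, 0]):
    print(f"x={xi:+.1f}  predictive mean={m:+.3f}  predictive std={s:.3f}")
```

Setting `p_drop=0.0` makes every pass identical, so the predictive standard deviation collapses to zero. This is a quick check that the reported spread comes from the dropout masks alone.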
Energy Consumption Data
git clone https://github.com/ahmedkakiAK/Bayesian_stats_project.git
cd Bayesian_stats_project
- Experiments are conducted in separate files
Run these notebooks individually to explore specific experiments.
- diabetes_exp.ipynb: notebook for experiments with the diabetes data.
- synth_exp.ipynb: notebook for experiments with synthetic data.
- PMJE_exp.ipynb: notebook for experiments with the hourly power consumption data.
- Key results (predictive performance) are visualized in each method-specific notebook.
- Python 3.8+
- Libraries:
  - numpy
  - pandas
  - matplotlib
  - seaborn
  - scipy
  - sklearn
  - torch
  - tensorflow (for Bayesian Neural Networks)
  - keras
- Optional: GPU for accelerated computation with neural models.
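Before opening the notebooks, it can help to confirm that the environment meets the requirements above. The snippet below is a small convenience sketch, not part of the repository. The `REQUIRED` list mirrors the import names listed above, and `missing_packages` is a hypothetical helper. Note that `sklearn` is the import name of the package installed as `scikit-learn`.

```python
import importlib.util
import sys

# Import names taken from the requirements list above.
REQUIRED = [
    "numpy", "pandas", "matplotlib", "seaborn", "scipy",
    "sklearn", "torch", "tensorflow", "keras",
]


def missing_packages(names):
    """Return the names that cannot be located on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


python_ok = sys.version_info >= (3, 8)
missing = missing_packages(REQUIRED)

print("Python 3.8+:", "yes" if python_ok else "no")
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

The check only locates packages and does not import them, so it runs quickly even when heavy libraries such as tensorflow are installed.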
- Nicholas Polson and Vadim Sokolov. Deep learning: A Bayesian perspective. Bayesian Analysis, 12:1275–1304, 2017.
- Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. 2016.