This is a data science project for my Data Science for Software Engineers class (Course Code 544), in which I earned an A. I used machine learning models to predict a patient's potential level of obesity, and I searched for the models and parameters that gave the highest prediction accuracy.
The Obesity Risk Dataset describes key attributes of individuals, such as their age, gender, diet, and level of activity, along with their level of obesity. There are 16 features in total, excluding the ID number, and the target vector is the individual's level of obesity. The goal of this project is to use these attributes to estimate an individual's risk of obesity.
The three models compared were LogisticRegression, KNeighborsClassifier, and SVC. The best-performing model for this dataset was SVC, with a training score of 0.87 and a validation score of 0.88. I attribute SVC's lead to the dimensionality of the data and its feature count. The optimal parameters for SVC were svc_c = 100 and svc_gamma = 0.01.
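The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it runs on synthetic data from make_classification rather than the Kaggle dataset, and the parameter grids, the pipeline step names ("scale", "model"), the scaling step, and the 80/20 split are all assumptions chosen for the example. The scores it prints will not match the ones reported here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the obesity data (assumption): 16 numeric features, 4 classes.
X, y = make_classification(
    n_samples=600, n_features=16, n_informative=8, n_classes=4, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical search grids; the real project may have searched different values.
candidates = {
    "LogisticRegression": (
        LogisticRegression(max_iter=1000),
        {"model__C": [0.1, 1, 10]},
    ),
    "KNeighborsClassifier": (
        KNeighborsClassifier(),
        {"model__n_neighbors": [3, 5, 11]},
    ),
    "SVC": (
        SVC(),
        {"model__C": [1, 10, 100], "model__gamma": [0.001, 0.01, 0.1]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Scale features first, since KNN and SVC are sensitive to feature scale.
    pipe = Pipeline([("scale", StandardScaler()), ("model", estimator)])
    search = GridSearchCV(pipe, grid, cv=3, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "train_score": search.score(X_train, y_train),
        "val_score": search.score(X_val, y_val),
    }

for name, r in results.items():
    print(f"{name}: train={r['train_score']:.2f} "
          f"val={r['val_score']:.2f} params={r['best_params']}")
```

Putting the scaler inside the pipeline means it is refit on each cross-validation fold, so the validation folds do not leak into the scaling statistics.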
The dataset used can be found at Obesity Risk Dataset on Kaggle.