A study on the use of machine learning techniques for predicting water quality with reduced information.
This study only includes data from Indian lakes. Data has been collected from the CPCB website under NWMP data.
Data is available to download from https://www.kaggle.com/datasets/akkshaysr/nwmp-water-quality-data-for-indian-lakes or from /data/csv in the repository.
All code used for the project is available within /notebooks.
WQI_Prediction.ipynb is the main notebook and contains the methods used for prediction of WQI. effect_of_data_imbalance.ipynb and benchmarks.ipynb are notebooks containing additional details used in the project.
The following Python packages are used in the code and are required to run or modify it:
pandas==1.5.3
numpy==1.23.5
matplotlib==3.6.3
seaborn==0.12.2
scikit-learn==1.2.1
torch==2.0.1
statsmodels==0.14.0
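The pinned versions above can be compared against the local environment before opening the notebooks. The following is a minimal sketch, not part of the repository: the helper name `check_versions` is hypothetical, and it relies only on the standard-library `importlib.metadata` module.

```python
from importlib import metadata

# Pinned versions copied from the package list above.
PINNED = {
    "pandas": "1.5.3",
    "numpy": "1.23.5",
    "matplotlib": "3.6.3",
    "seaborn": "0.12.2",
    "scikit-learn": "1.2.1",
    "torch": "2.0.1",
    "statsmodels": "0.14.0",
}


def check_versions(pinned=PINNED):
    """Return {package: (expected, installed or None)} for every mismatch.

    Hypothetical helper for illustration; it is not provided by the project.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            # Drop a local build suffix such as "+cu118" before comparing.
            installed = metadata.version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_versions().items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every listed package is installed at the pinned version. Newer versions may also work, but that has not been verified here.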
Branch v2023 contains updated data for 2022 and uses ensemble models to further improve accuracy. Different voting strategies for combining the internal models of each ensemble are tested in combining_models.ipynb.