This repository contains three distinct data science projects in Python, each applying data analysis and machine learning techniques to a different problem. The projects are:
- Loan Approval Dataset Analysis: Explores applicant features to inform loan approval decisions.
- Employee Attrition Prediction: Predicts employee attrition using machine learning models.
- Mall Customer Segmentation Analysis: Segments customers into groups for targeted marketing strategies.
Each project is designed to uncover actionable insights and demonstrate proficiency in data cleaning, feature engineering, modeling, and visualization.

Loan Approval Dataset Analysis:
- Goal: Analyze loan applicant data to identify patterns and relationships that support better approval decisions.
- Key Tasks:
  - Data exploration and cleaning.
  - Feature engineering (`Income_to_Loan_Ratio`, `EMI`).
  - Visualizations (boxplots, histograms, scatterplots).
- Insights: Higher-income applicants tend to request larger loans, and the engineered features improve risk assessment.
- Python libraries: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn.
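The two engineered features can be sketched with pandas as below. This is a minimal sketch, not the project's code: the column names `ApplicantIncome`, `LoanAmount` and `Loan_Amount_Term`, the helper name `add_loan_features`, and the fixed 8% annual interest rate are all hypothetical. EMI is computed here with the standard amortised-instalment formula, which may differ from the definition used in the notebook.

```python
import pandas as pd


def add_loan_features(df: pd.DataFrame, annual_rate: float = 0.08) -> pd.DataFrame:
    """Return a copy of df with Income_to_Loan_Ratio and EMI columns added.

    Hypothetical assumptions: income is in 'ApplicantIncome', the principal is in
    'LoanAmount', the term in months is in 'Loan_Amount_Term', and every loan
    uses the same fixed annual interest rate.
    """
    out = df.copy()
    # Income relative to the requested principal: higher means more repayment headroom.
    out["Income_to_Loan_Ratio"] = out["ApplicantIncome"] / out["LoanAmount"]

    r = annual_rate / 12  # monthly interest rate
    n = out["Loan_Amount_Term"]  # number of monthly instalments
    if r == 0:
        # With no interest the instalment is just the principal spread evenly.
        out["EMI"] = out["LoanAmount"] / n
    else:
        # Standard amortised instalment: P * r * (1 + r)^n / ((1 + r)^n - 1).
        growth = (1 + r) ** n
        out["EMI"] = out["LoanAmount"] * r * growth / (growth - 1)
    return out


if __name__ == "__main__":
    # Two made-up applicants, purely to show the call.
    loans = pd.DataFrame({
        "ApplicantIncome": [5000, 3000],
        "LoanAmount": [120000, 60000],
        "Loan_Amount_Term": [360, 180],
    })
    print(add_loan_features(loans)[["Income_to_Loan_Ratio", "EMI"]].round(2))
```

Returning a copy rather than mutating the input keeps the raw data intact, which makes it easy to compare boxplots or histograms of the original and engineered columns side by side.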

Employee Attrition Prediction:
- Goal: Predict employee attrition using machine learning models to identify at-risk employees.
- Key Tasks:
  - Data preparation (encoding, scaling).
  - Model training (KNN, Decision Tree, SVM, Random Forest, MLP).
  - Evaluation (accuracy, precision, recall).
- Insights: KNN performed best, with balanced precision and recall; class imbalance was a challenge.
- Python libraries: Pandas, Scikit-learn, TensorFlow/Keras.
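A compact sketch of the train-and-compare loop for the scikit-learn models follows. It assumes synthetic, imbalanced data from `make_classification` in place of the real attrition dataset and uses a hypothetical helper `compare_models`. The Keras MLP is left out to avoid a TensorFlow dependency, and categorical encoding is not shown because the synthetic features are already numeric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, seed=42):
    """Fit several classifiers and return accuracy/precision/recall per model (hypothetical helper)."""
    # Stratify so the minority "left" class keeps its share in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "SVM": SVC(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        # The scaler sits inside the pipeline, so it is fit on the training split only.
        pipe = make_pipeline(StandardScaler(), model).fit(X_train, y_train)
        pred = pipe.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred, zero_division=0),
            "recall": recall_score(y_test, pred, zero_division=0),
        }
    return results


if __name__ == "__main__":
    # Synthetic stand-in with roughly 84% "stayed" and 16% "left".
    X, y = make_classification(n_samples=1000, n_features=10, weights=[0.84], random_state=0)
    for name, scores in compare_models(X, y).items():
        print(name, {k: round(v, 3) for k, v in scores.items()})
```

Reporting precision and recall alongside accuracy matters on imbalanced data, because a model that always predicts "stays" can still score high accuracy. Passing `class_weight="balanced"` to the Decision Tree, SVM, or Random Forest is one common way to counter the imbalance.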

Mall Customer Segmentation Analysis:
- Goal: Segment mall customers into distinct groups for targeted marketing.
- Key Tasks:
  - Data preparation (encoding, scaling, PCA).
  - Clustering (KMeans, Agglomerative, GMM, BIRCH).
  - Evaluation (Silhouette Score, Davies-Bouldin Index).
- Insights: KMeans was most effective for clear, interpretable clusters.
- Python libraries: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn.
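The clustering comparison can be sketched as follows. It assumes synthetic blobs from `make_blobs` stand in for the mall customer features (already numerically encoded), five clusters, and a hypothetical helper `compare_clusterings`; the project's actual features and cluster count may differ.

```python
from sklearn.cluster import AgglomerativeClustering, Birch, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def compare_clusterings(X, k=5, seed=42):
    """Cluster X with four algorithms and score each labelling (hypothetical helper)."""
    # Standardise so features on different scales (e.g. income vs. spending score)
    # contribute comparably, then project onto two principal components.
    Z = PCA(n_components=2, random_state=seed).fit_transform(StandardScaler().fit_transform(X))
    models = {
        "KMeans": KMeans(n_clusters=k, n_init=10, random_state=seed),
        "Agglomerative": AgglomerativeClustering(n_clusters=k),
        "GMM": GaussianMixture(n_components=k, random_state=seed),
        "BIRCH": Birch(n_clusters=k),
    }
    scores = {}
    for name, model in models.items():
        labels = model.fit_predict(Z)
        scores[name] = {
            "silhouette": silhouette_score(Z, labels),  # in [-1, 1], higher is better
            "davies_bouldin": davies_bouldin_score(Z, labels),  # >= 0, lower is better
        }
    return scores


if __name__ == "__main__":
    # Synthetic stand-in: 200 "customers" with 4 numeric features drawn around 5 centres.
    X, _ = make_blobs(n_samples=200, n_features=4, centers=5, random_state=0)
    for name, s in compare_clusterings(X).items():
        print(f"{name}: silhouette={s['silhouette']:.3f}, DB={s['davies_bouldin']:.3f}")
```

The two metrics are read in opposite directions: the Silhouette Score rewards compact, well-separated clusters (higher is better), while the Davies-Bouldin Index penalises clusters that are large relative to the distance between them (lower is better).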
Key Libraries:
- Pandas: Data cleaning, transformation, and analysis.
- NumPy: Numerical computations.
- Scikit-learn: Machine learning models and clustering.
- TensorFlow/Keras: Deep learning implementation.
- Matplotlib & Seaborn: Data visualization.
Development:
- Google Colab for cloud-based execution and collaboration.
This repository showcases a diverse range of data science tasks, from predictive modeling to unsupervised learning. Each project highlights problem-solving skills, technical proficiency, and the ability to derive meaningful insights from data. The results can be leveraged for decision-making in finance, HR, and marketing domains.
Explore the individual project folders for detailed documentation and code! 🚀