This repository contains techniques and notebooks for Feature Engineering (FE) and Exploratory Data Analysis (EDA). The repository is structured into two main directories:
- Feature Engineering: Covers various feature engineering techniques for handling missing values, encoding categorical data, balancing datasets, and more.
- EDA: Includes exploratory data analysis performed on different datasets.
├── Feature Engineering/
│ ├── Data_encoding_Nominal_OHE.ipynb # One-Hot Encoding for Nominal Data
│ ├── FE-handling_imbalanced_dataset.ipynb # Handling Imbalanced Datasets
│ ├── FE-missing_values.ipynb # Handling Missing Values
│ ├── Label_Ordinal_Encoding.ipynb # Label & Ordinal Encoding
│ ├── Number_summary_&_Box_Plot.ipynb # Summary Statistics & Box Plots
│ ├── Smote.ipynb # Synthetic Minority Over-sampling Technique (SMOTE)
│ ├── Target_Guided_Ordinal.ipynb # Target Guided Ordinal Encoding
│
├── EDA/
│ ├── flight_prices_prediction/ # EDA on Flight Prices Dataset
│ ├── wine-quality/ # EDA on Wine Quality Dataset
│ ├── google_playstore_dataset/ # EDA on Google Play Store Dataset
- Data Encoding Techniques
  - One-Hot Encoding (OHE)
  - Label Encoding
  - Ordinal Encoding, including Target Guided Ordinal Encoding
- Handling Missing Values
  - Imputation techniques
  - Handling missing data in different scenarios
- Dealing with Imbalanced Datasets
  - SMOTE (Synthetic Minority Over-sampling Technique)
  - Other techniques for handling class imbalance
- Exploring Numeric Features
  - Summary statistics
  - Box plots for outlier detection
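
The techniques listed above can be sketched on a small toy table. This is a minimal illustration rather than the code from the notebooks: the DataFrame, its column names (`city`, `size`, `price`, `label`), and the category order are all hypothetical. For class imbalance it uses plain random oversampling, which duplicates minority rows; SMOTE instead synthesizes new points by interpolating between minority neighbours.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are illustrative, not from the notebooks.
df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi", "Pune", "Mumbai", "Delhi", "Pune", "Delhi"],
    "size": ["small", "large", "medium", "small", "large", "medium", "small", "large"],
    "price": [200.0, 150.0, np.nan, 90.0, 170.0, 210.0, 110.0, 1000.0],
    "label": [0, 0, 0, 0, 0, 0, 1, 1],
})

# Missing values: fill the numeric gap with the column median.
df["price"] = df["price"].fillna(df["price"].median())

# One-hot encoding: one indicator column per nominal category.
one_hot = pd.get_dummies(df["city"], prefix="city")

# Ordinal encoding: map categories to integers in an assumed natural order.
size_order = {"small": 0, "medium": 1, "large": 2}
df["size_ord"] = df["size"].map(size_order)

# Target guided ordinal encoding: rank categories by the mean of a target
# column (here `price` is treated as the target, purely for illustration).
city_means = df.groupby("city")["price"].mean().sort_values()
city_rank = {city: rank for rank, city in enumerate(city_means.index)}
df["city_tg"] = df["city"].map(city_rank)

# Outlier detection: the 1.5 * IQR rule that box plot whiskers are based on.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)]

# Class imbalance: random oversampling of the minority class (not SMOTE).
counts = df["label"].value_counts()
minority = counts.idxmin()
extra = df[df["label"] == minority].sample(
    n=counts.max() - counts.min(), replace=True, random_state=0
)
balanced = pd.concat([df, extra], ignore_index=True)
```

In practice, fit imputation values and target-based rankings on the training split only, so that information from the test split does not leak into the features.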
Each dataset in the EDA/ folder contains a detailed analysis covering:
- Data Cleaning
- Visualization
- Summary Statistics
- Insights & Trends
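
As a rough sketch of how the steps above fit together, the following uses a tiny made-up table in place of a real dataset. The column names (`airline`, `stops`, `price`) and values are assumptions for illustration only and do not come from the notebooks.

```python
import pandas as pd

# Hypothetical stand-in for a dataset from EDA/; the real columns will differ.
raw = pd.DataFrame({
    "airline": ["A", "A", "B", "B", "B", "A"],
    "stops": ["0", "1", "1", "2", "2", "0"],
    "price": [3000, 4500, 5200, 7100, 7100, 3000],
})

# Data cleaning: drop exact duplicate rows and convert text to integers.
clean = raw.drop_duplicates().copy()
clean["stops"] = clean["stops"].astype(int)

# Summary statistics: count, mean, std, min, quartiles, max.
summary = clean["price"].describe()

# Insights and trends: average price for each number of stops.
by_stops = clean.groupby("stops")["price"].mean()

# Visualization (needs matplotlib), left commented so the sketch runs headless:
# clean.boxplot(column="price", by="stops")
```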
Clone the repository and run the Jupyter notebooks in your preferred environment.
git clone https://github.com/AlokTheDataGuy/Feature-Engineering-EDA.git
Ensure you have the following dependencies installed:
pip install pandas numpy matplotlib seaborn scikit-learn imbalanced-learn
This project is open-source and available under the MIT License.