AlokTheDataGuy/Feature-Engineering-and-EDA
Feature Engineering & Exploratory Data Analysis

This repository collects Jupyter notebooks on Feature Engineering (FE) and Exploratory Data Analysis (EDA). It is organized into two main directories:

  • Feature Engineering: Techniques for handling missing values, encoding categorical data, balancing imbalanced datasets, and summarizing numeric features.
  • EDA: Exploratory analyses of the flight prices, wine quality, and Google Play Store datasets.

Repository Structure

├── Feature Engineering/
│   ├── Data_encoding_Nominal_OHE.ipynb  # One-Hot Encoding for Nominal Data
│   ├── FE-handling_imbalanced_dataset.ipynb  # Handling Imbalanced Datasets
│   ├── FE-missing_values.ipynb  # Handling Missing Values
│   ├── Label_Ordinal_Encoding.ipynb  # Label & Ordinal Encoding
│   ├── Number_summary_&_Box_Plot.ipynb  # Summary Statistics & Box Plots
│   ├── Smote.ipynb  # Synthetic Minority Over-sampling Technique (SMOTE)
│   └── Target_Guided_Ordinal.ipynb  # Target Guided Ordinal Encoding
│
└── EDA/
    ├── flight_prices_prediction/  # EDA on Flight Prices Dataset
    ├── wine-quality/  # EDA on Wine Quality Dataset
    └── google_playstore_dataset/  # EDA on Google Play Store Dataset

Feature Engineering Notebooks

  1. Data Encoding Techniques

    • One-Hot Encoding (OHE)
    • Label Encoding
    • Ordinal Encoding
    • Target Guided Ordinal Encoding
  2. Handling Missing Values

    • Imputation techniques
    • Handling missing data in different scenarios
  3. Dealing with Imbalanced Datasets

    • SMOTE (Synthetic Minority Over-sampling Technique)
    • Other techniques for handling class imbalance
  4. Exploring Numeric Features

    • Summary statistics
    • Box plots for outlier detection
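
The encoding, imputation, and oversampling techniques listed above can be sketched on a toy table. This is a minimal illustration and not the notebooks' own code: the DataFrame, its column names, the assumed plan ordering, and the `smote_like` helper are all hypothetical. `smote_like` is a simplified NumPy stand-in for the idea behind SMOTE (interpolating between a minority sample and one of its nearest minority neighbours); in practice the `SMOTE` class from imbalanced-learn, which the requirements list, would be the usual choice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; these columns are not taken from the notebooks.
df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi", "Pune", "Mumbai", "Pune"],
    "plan": ["basic", "premium", "standard", "basic", None, "basic"],
    "income": [30.0, 80.0, np.nan, 40.0, 90.0, 50.0],
    "bought": [0, 1, 0, 0, 1, 1],
})

# Missing values: median for the numeric column, mode for the categorical one.
df["income"] = df["income"].fillna(df["income"].median())
df["plan"] = df["plan"].fillna(df["plan"].mode()[0])

# One-hot encoding for a nominal feature (no inherent order).
city_ohe = pd.get_dummies(df["city"], prefix="city", dtype=int)

# Ordinal encoding with an explicit, assumed category order.
plan_order = {"basic": 0, "standard": 1, "premium": 2}
df["plan_ord"] = df["plan"].map(plan_order)

# Target guided ordinal encoding: rank categories by their mean target value.
# On real data, fit this mapping on the training split only to avoid leakage.
target_means = df.groupby("city")["bought"].mean().sort_values()
city_rank = {city: rank for rank, city in enumerate(target_means.index)}
df["city_tg"] = df["city"].map(city_rank)


def smote_like(X_minority, n_new, k=2, seed=0):
    """Simplified SMOTE-style oversampling (illustrative helper, not a library API)."""
    X = np.asarray(X_minority, dtype=float)
    if k >= len(X):
        raise ValueError("k must be smaller than the number of minority samples")
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    # Pick a base sample and one of its k nearest neighbours for each new point.
    base = rng.integers(0, len(X), size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    # Place the synthetic point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return X[base] + gap * (X[chosen] - X[base])


minority = df.loc[df["bought"] == 1, ["income", "plan_ord"]].to_numpy()
synthetic = smote_like(minority, n_new=3)
```

Because each synthetic row is a convex combination of two real minority rows, it always lies between them feature by feature. Note that interpolating an encoded ordinal column, as done here for brevity, can yield non-integer values.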

Exploratory Data Analysis (EDA)

Each dataset folder under EDA/ contains an analysis covering:

  • Data Cleaning
  • Visualization
  • Summary Statistics
  • Insights & Trends
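
The cleaning and summary steps listed above might look roughly like the following. This is a sketch under stated assumptions: the `raw` table, its `price` and `airline` columns, and the helper names are hypothetical stand-ins and are not taken from the repository's datasets or notebooks.

```python
import pandas as pd

# Hypothetical toy data standing in for one of the EDA datasets.
raw = pd.DataFrame({
    "price": ["100", "120", "110", "115", "900", "120", None],
    "airline": ["A", "B", "A", "C", "B", "B", "A"],
})


def clean(frame):
    """Drop exact duplicate rows, coerce price to numeric, drop unparseable rows."""
    out = frame.drop_duplicates().copy()
    out["price"] = pd.to_numeric(out["price"], errors="coerce")
    return out.dropna(subset=["price"]).reset_index(drop=True)


def five_number_summary(series):
    """Minimum, first quartile, median, third quartile, maximum."""
    return series.quantile([0.0, 0.25, 0.5, 0.75, 1.0])


def iqr_outliers(series, k=1.5):
    """Values outside the box-plot whiskers: below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series[(series < q1 - k * iqr) | (series > q3 + k * iqr)]


cleaned = clean(raw)
summary = five_number_summary(cleaned["price"])
outliers = iqr_outliers(cleaned["price"])
mean_price_by_airline = cleaned.groupby("airline")["price"].mean()
```

With matplotlib installed, `cleaned.boxplot(column="price")` would draw the box plot that corresponds to this five-number summary, with the flagged outliers shown as individual points beyond the whiskers.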

Usage

Clone the repository and run the Jupyter notebooks in your preferred environment.

git clone https://github.com/AlokTheDataGuy/Feature-Engineering-EDA.git

Requirements

Install the required dependencies:

pip install pandas numpy matplotlib seaborn scikit-learn imbalanced-learn
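
One way to confirm the environment is ready before opening a notebook is to check that each package can be found. The sketch below is an assumption about how such a check could be written, not something shipped with the repository; the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names. Note that two packages are imported under a different name than the one passed to pip.

```python
from importlib.util import find_spec

# pip name -> import name. scikit-learn and imbalanced-learn are imported
# as sklearn and imblearn respectively.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "imbalanced-learn": "imblearn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```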

License

This project is open-source and available under the MIT License.
