This project analyzes behavioral and pricing patterns in an online clothing store using clickstream data from a real-world e-commerce environment.
It combines exploratory analysis, regression modeling, classification, and pricing anomaly detection to uncover actionable insights into customer behavior, product value, and price perception.
The project's goals are to:
- Identify sales trends and product-level performance using clickstream data.
- Predict product price using features like color, placement, and category.
- Classify products into budget vs. premium pricing tiers using machine learning.
- Detect pricing–perception mismatches by comparing regression-predicted prices with classifier-assigned pricing tiers.
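As a rough illustration of the first goal, product-level performance can be summarized by ranking categories by revenue. This is a minimal sketch, assuming each record is a hypothetical `(category, price)` purchase pair and that revenue is simply the sum of prices; the real dataset's columns would need to be mapped into this shape first, and the values below are made up.

```python
from collections import defaultdict

# Hypothetical purchase records: (category, price in USD).
# Layout and values are illustrative assumptions, not the dataset's schema.
records = [
    ("trousers", 43.0),
    ("skirts", 38.0),
    ("trousers", 57.0),
    ("blouses", 28.0),
    ("skirts", 33.0),
]


def revenue_by_category(rows):
    """Sum price per category and return (category, revenue) pairs, highest first."""
    totals = defaultdict(float)
    for category, price in rows:
        totals[category] += price
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


ranking = revenue_by_category(records)
print(ranking)  # [('trousers', 100.0), ('skirts', 71.0), ('blouses', 28.0)]
```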
The data come from a publicly available e-shop clothing clickstream dataset (2008), which contains:
- Session-level purchase behavior
- Product category and color
- Price and price tier
- Page placement and photography style
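Category, color, placement, and photography style are categorical attributes. If they are stored as integer codes, passing the codes directly to a linear model would impose an arbitrary order and spacing on them (color 3 would sit "between" colors 2 and 4). One common remedy is one-hot encoding. The sketch below is a minimal pure-Python version, assuming rows are dicts with hypothetical column names such as `colour` and `location`; the project's actual encoding step is not described here.

```python
def one_hot(rows, column):
    """Replace an integer-coded categorical column with 0/1 indicator columns."""
    # Collect the distinct levels so every row gets the same indicator columns.
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the one being encoded.
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


# Toy rows; column names and values are illustrative assumptions.
products = [
    {"colour": 3, "location": 1, "price": 38},
    {"colour": 1, "location": 5, "price": 57},
    {"colour": 3, "location": 2, "price": 28},
]

encoded = one_hot(products, "colour")
print(encoded[0])  # {'location': 1, 'price': 38, 'colour_1': 0, 'colour_3': 1}
```

Tree-based models such as Random Forest and XGBoost are less sensitive to arbitrary integer codes, so this step matters most for the linear and logistic models.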
The analysis pipeline consists of:
- Data Cleaning & Feature Engineering
- Exploratory Data Analysis (EDA)
- Regression Models: Linear Regression, Random Forest, XGBoost
- Classification Models: Logistic Regression, Random Forest
- Residual-Based Mispricing Detection
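The residual-based mismatch check can be sketched as follows. This is a hedged illustration, not the project's implementation: a simple group-mean baseline (mean price per category and color) stands in for the Random Forest or XGBoost regressor, the `tier_pred` field stands in for the classifier's output, and the 5-unit residual threshold, field names, and toy values are all assumptions.

```python
from collections import defaultdict
from statistics import mean

# Toy products; field names, values, and tier labels are illustrative assumptions.
products = [
    {"id": "A", "category": "trousers", "colour": 1, "price": 60.0, "tier_pred": "budget"},
    {"id": "B", "category": "trousers", "colour": 1, "price": 40.0, "tier_pred": "budget"},
    {"id": "C", "category": "trousers", "colour": 1, "price": 41.0, "tier_pred": "premium"},
    {"id": "D", "category": "skirts", "colour": 2, "price": 30.0, "tier_pred": "budget"},
    {"id": "E", "category": "skirts", "colour": 2, "price": 32.0, "tier_pred": "budget"},
]


def group_mean_predictions(rows, keys=("category", "colour")):
    """Baseline regressor (swapped in for RF/XGBoost): predict the mean price of each row's group."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row["price"])
    means = {group: mean(prices) for group, prices in groups.items()}
    return [means[tuple(row[k] for k in keys)] for row in rows]


def flag_mismatches(rows, predicted, threshold=5.0):
    """Flag rows priced well above the prediction (positive residual) yet labeled budget-tier."""
    flagged = []
    for row, pred in zip(rows, predicted):
        residual = row["price"] - pred
        if residual > threshold and row["tier_pred"] == "budget":
            flagged.append((row["id"], round(residual, 2)))
    return flagged


preds = group_mean_predictions(products)
print(flag_mismatches(products, preds))  # [('A', 13.0)]
```

Product A costs 13.0 more than its group mean of 47.0 but carries a budget label, so it is the only row flagged. In practice the predictions should come from held-out (for example, cross-validated) data, so that a product's own price does not pull its prediction toward itself.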
Key findings:
- Trousers were the top revenue-generating category and were frequently mispriced.
- Photography style and product color were strongly associated with price.
- Over 5,000 products were flagged as high-priced but classified as budget-tier.