This project implements a Random Forest classifier to detect fraudulent credit card transactions in an imbalanced dataset of more than 284,000 records. After data cleaning, visualization, and a train-test split, the model was trained with 100 estimators and evaluated with a classification report. On the test set, the final model achieved an F1-score of 0.70 and a precision of 84.3%, detecting rare fraud cases despite the severe class imbalance.
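The workflow described above (train-test split, a 100-estimator Random Forest, and a classification report) can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the project's notebook: synthetic data from `make_classification` stands in for the real transaction file, and the stratified split, `test_size=0.2`, `random_state=42`, and the helper name `train_and_report` are hypothetical choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split


def train_and_report(X, y, n_estimators=100, seed=42):
    """Hypothetical helper: split, fit a Random Forest, return the report as a dict."""
    # Stratify so the rare positive class appears in both splits (assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(
        n_estimators=n_estimators, random_state=seed, n_jobs=-1
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # zero_division=0 avoids warnings if the rare class is never predicted.
    return classification_report(y_test, y_pred, output_dict=True, zero_division=0)


# Synthetic stand-in for the real data: 30 features, about 1% positives
# (less extreme than the real 0.17%), with no random label flipping.
X, y = make_classification(
    n_samples=10_000, n_features=30, n_informative=10,
    weights=[0.99, 0.01], flip_y=0, random_state=42,
)
report = train_and_report(X, y)
print(f"positive-class precision: {report['1']['precision']:.3f}")
print(f"positive-class F1:        {report['1']['f1-score']:.3f}")
```

With the real data, `X` and `y` would come from the transaction table instead; the metrics printed here describe the synthetic set only and are not the project's results.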
| Technology | Purpose |
|---|---|
| Python | Primary programming language |
| Pandas | Data preprocessing and manipulation |
| Matplotlib & Seaborn | Data visualization |
| Scikit-learn | Modeling and evaluation (Random Forest) |
| Jupyter Notebook / Colab | Notebook-based development and execution |
- Worked with heavily imbalanced data (0.17% fraud cases)
- Trained a Random Forest model with 100 estimators
- Achieved an F1-score of 0.70 and a precision of 84.3% on the test set
- Visualized a correlation heatmap and the fraud distribution
- Included a confusion matrix and classification report for evaluation
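The reported figures are linked: F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so recall can be recovered as R = F1 · P / (2P − F1). Assuming both numbers refer to the fraud class, an F1 of 0.70 with a precision of 0.843 implies a recall of roughly 0.60. The helpers below are a hypothetical sketch of that arithmetic, including how the same metrics follow from confusion-matrix counts.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute positive-class precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def recall_from_f1_and_precision(f1: float, precision: float) -> float:
    """Invert F1 = 2PR / (P + R) to recover recall R from F1 and precision P."""
    return f1 * precision / (2 * precision - f1)


# Figures quoted in this README (assumed to be fraud-class metrics).
implied_recall = recall_from_f1_and_precision(0.70, 0.843)
print(f"implied recall: {implied_recall:.3f}")  # 0.598
```

Because 0.70 is itself rounded, the implied recall is only approximate (about 0.59 to 0.61), meaning the model would still miss a sizeable share of fraud cases.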
- Hands-on experience handling imbalanced datasets in classification
- Applied a Random Forest classifier with parameter tuning
- Used precision and F1-score to evaluate rare-event prediction
- Visualized data imbalance and performance metrics using Seaborn
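Precision and F1-score matter here because plain accuracy is misleading at a 0.17% fraud rate: a trivial model that labels every transaction legitimate is already about 99.8% accurate while catching no fraud at all. The sketch below makes that concrete on a hypothetical label vector with the same fraud rate; the labels and helper names are illustrative assumptions, not project data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def fraud_recall(y_true, y_pred, positive=1):
    """Share of actual fraud cases (label `positive`) that were flagged."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual = sum(t == positive for t in y_true)
    return tp / actual if actual else 0.0


# Hypothetical labels: 17 frauds in 10,000 transactions (0.17%).
y_true = [1] * 17 + [0] * 9_983
always_legit = [0] * len(y_true)  # baseline that never flags fraud

print(f"accuracy: {accuracy(y_true, always_legit):.4f}")      # 0.9983
print(f"recall:   {fraud_recall(y_true, always_legit):.4f}")  # 0.0000
```

The baseline's near-perfect accuracy hides a recall of zero, which is why rare-event models are judged on fraud-class metrics instead.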
Sumdiboii: Machine Learning Enthusiast & Software Developer
LinkedIn: Sumedh Pimplikar
Detecting the undetectable: this project highlights the power of ensemble learning in uncovering rare patterns in financial fraud detection.
