bigjosher/Fraud-Detection-Model

Banking Fraud Detection Project

This is a complete, end-to-end Fraud Detection project using real-world transactional data. It simulates a practical data analyst workflow for a banking domain, involving Python, SQL, Tableau, and a simple Machine Learning model. The goal is to detect fraudulent transactions, build insights for decision-making, and demonstrate resume-ready data skills.


Dataset Used

  • Source: Kaggle Credit Card Fraud Dataset
  • File: creditcard.csv
  • Duration: 2 days of anonymized transaction data
  • Features: 30 columns (V1–V28, Amount, Time), with Class as the fraud label (1 = fraud, 0 = non-fraud)

Project Structure & Workflow

1. clean_data.py – Data Cleaning & EDA

  • Loads and inspects the raw dataset
  • Checks for duplicates and missing values
  • Visualizes class imbalance and amount distributions
  • Outputs: clean_creditcard.csv
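
The cleaning step above can be sketched roughly as follows. This is a minimal illustration, not the contents of clean_data.py: it assumes pandas is used, and both the function name `clean_transactions` and the choice to drop (rather than only report) duplicates and missing values are assumptions.

```python
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Report, then drop, exact duplicate rows and rows with missing values.

    Hypothetical helper: the real script may only inspect these issues.
    """
    n_duplicates = int(df.duplicated().sum())
    n_missing = int(df.isna().sum().sum())
    print(f"duplicate rows: {n_duplicates}, missing cells: {n_missing}")

    cleaned = df.drop_duplicates().dropna().reset_index(drop=True)

    # Share of each class, to make the imbalance visible before modeling.
    print(cleaned["Class"].value_counts(normalize=True))
    return cleaned
```

In a script this might be called as `clean_transactions(pd.read_csv("creditcard.csv")).to_csv("clean_creditcard.csv", index=False)`.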

2. sql_queries.py – SQL Integration

  • Loads the cleaned CSV into an in-memory SQLite database
  • Runs SQL queries:
    • Total fraud vs non-fraud counts
    • Average transaction amount by class
  • Outputs: sql_avg_amount.csv for use in Tableau
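
The two queries listed above might look like the sketch below, which uses only the standard-library `sqlite3` module. The table name `transactions` and the helper function names are assumptions made for illustration; only the `Amount` and `Class` columns come from the dataset description.

```python
import sqlite3


def load_transactions(rows):
    """Create an in-memory SQLite table from (amount, class) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (Amount REAL, Class INTEGER)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
    conn.commit()
    return conn


def class_counts(conn):
    """Total fraud vs non-fraud counts, as (class, count) rows."""
    query = "SELECT Class, COUNT(*) FROM transactions GROUP BY Class ORDER BY Class"
    return conn.execute(query).fetchall()


def avg_amount_by_class(conn):
    """Average transaction amount per class, as (class, average) rows."""
    query = "SELECT Class, AVG(Amount) FROM transactions GROUP BY Class ORDER BY Class"
    return conn.execute(query).fetchall()
```

In the actual workflow the rows would come from clean_creditcard.csv, and the result of the second query would be written to sql_avg_amount.csv.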

3. fraud_model.py – Machine Learning

  • Scales features and splits the data into training and testing sets
  • Trains a Logistic Regression model to predict fraud
  • Handles class imbalance using class_weight='balanced'
  • Evaluates the model with a confusion matrix and classification report
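
A minimal sketch of such a training step, assuming scikit-learn, is shown below. The function name `train_fraud_model`, the 80/20 split, the stratified sampling, and `max_iter=1000` are illustrative choices and not necessarily what fraud_model.py does; only the Logistic Regression model and `class_weight='balanced'` come from the description above.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def train_fraud_model(X, y, test_size=0.2, seed=42):
    """Train a class-weighted logistic regression and evaluate it on a held-out split."""
    # Stratify so the rare fraud class appears in both splits (an assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )

    # Fit the scaler on the training split only, so test statistics do not leak in.
    scaler = StandardScaler().fit(X_train)

    # class_weight="balanced" reweights classes inversely to their frequency.
    model = LogisticRegression(class_weight="balanced", max_iter=1000)
    model.fit(scaler.transform(X_train), y_train)

    y_pred = model.predict(scaler.transform(X_test))
    return model, confusion_matrix(y_test, y_pred), classification_report(y_test, y_pred)
```

Here `X` would hold the 30 feature columns and `y` the `Class` column.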

4. Tableau Dashboard

  • Visualizes key fraud insights:
    • Bar chart: Fraud vs Non-Fraud Count
    • Line chart: Transactions Over Time
    • KPI Cards: SQL-derived Average Amounts
    • Filters: Transaction Amount Range, Class (Fraud vs Non-Fraud)

Step 1: Clone or download the repo

cd fraud_transaction_model/

Step 2: Install dependencies

pip install -r requirements.txt

Step 3: Run everything

python run_all.py

TABLEAU

Worksheet 1

Show how many transactions were fraud vs non-fraud.

Worksheet 2

Show how transactions, especially fraudulent ones, evolve over time. The chart plots fraud and non-fraud transaction volume together on one time axis. Fraud is rare (its blue line hugs the X-axis), but its timing can still be analyzed for patterns.
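
The same view can be approximated outside Tableau by bucketing transactions by hour. The sketch below assumes the `Time` column holds seconds elapsed since the first transaction, as the Kaggle dataset documents; the helper name `hourly_counts` and the hourly bucket size are illustrative choices.

```python
from collections import Counter


def hourly_counts(rows):
    """Count transactions per (hour, class) from (time_seconds, class) pairs."""
    counts = Counter()
    for time_seconds, label in rows:
        hour = int(time_seconds // 3600)  # 3600 seconds per hourly bucket
        counts[(hour, label)] += 1
    return counts
```

Plotting the counts for class 1 against the hour gives the fraud line, and class 0 gives the non-fraud line.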

Worksheet 3

SQL-calculated average transaction amount.

Python Files

  1. clean_data.py: cleans up the CSV dataset.
  2. sql_queries.py: loads the dataset into an in-memory SQLite database and runs queries against it.
  3. fraud_model.py: a simple logistic regression model that classifies transactions as fraud or non-fraud. Despite the class imbalance, it uses class weighting (class_weight='balanced') and achieves strong recall, which is critical for minimizing undetected fraud.
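
To make the recall claim concrete: recall is the share of actual fraud cases that the model flags, TP / (TP + FN), so every missed fraud (a false negative) lowers it. The helper below is a hypothetical illustration and is not part of the repository. It takes the four confusion-matrix cells in the order scikit-learn's `confusion_matrix(...).ravel()` returns them for a binary problem.

```python
def recall_and_precision(tn, fp, fn, tp):
    """Return (recall, precision) for the fraud class from confusion-matrix cells."""
    # Recall: of all true fraud cases, how many were caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: of all transactions flagged as fraud, how many really were.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision
```

Class weighting typically raises recall at the cost of precision, so both numbers are worth reading together in the classification report.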
