This is a complete, end-to-end Fraud Detection project using real-world transactional data. It simulates a practical data analyst workflow for a banking domain, involving Python, SQL, Tableau, and a simple Machine Learning model. The goal is to detect fraudulent transactions, build insights for decision-making, and demonstrate resume-ready data skills.
- Source: Kaggle Credit Card Fraud Dataset
- File: `creditcard.csv`
- Duration: 2 days of anonymized transaction data
- Features: 30 columns (V1–V28, Amount, Time), with `Class` as the fraud label (1 = fraud, 0 = non-fraud)
- Loads and inspects the raw dataset
- Checks for duplicates and missing values
- Visualizes class imbalance and amount distributions
- Outputs: `clean_creditcard.csv`
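The cleaning step above could look roughly like the following. This is a minimal sketch, not the project's actual script: the function name `clean_transactions` is hypothetical, and it assumes that duplicates and rows with missing values are dropped, whereas the description only says they are checked. The small demo frame is made-up illustration data.

```python
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Report duplicates and missing values, then return a cleaned copy.

    Assumption: offending rows are dropped rather than repaired.
    """
    print(f"Rows: {len(df)}")
    print(f"Duplicate rows: {int(df.duplicated().sum())}")
    print(f"Missing values: {int(df.isna().sum().sum())}")
    return df.drop_duplicates().dropna().reset_index(drop=True)


# Tiny made-up frame so the sketch runs without the Kaggle file.
demo = pd.DataFrame(
    {
        "Time": [0, 0, 1, 2],
        "Amount": [1.0, 1.0, None, 3.0],
        "Class": [0, 0, 0, 1],
    }
)
cleaned_demo = clean_transactions(demo)
# Class balance, as a share of all remaining rows.
print(cleaned_demo["Class"].value_counts(normalize=True))
```

On the real data, the same function would be applied to `pd.read_csv("creditcard.csv")` and the result written with `to_csv("clean_creditcard.csv", index=False)`.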
- Loads the cleaned CSV into an in-memory SQLite database
- Runs SQL queries:
  - Total fraud vs non-fraud counts
  - Average transaction amount by class
- Outputs: `sql_avg_amount.csv` for use in Tableau
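The two queries above can be sketched with the standard-library `sqlite3` module and pandas. The table name `transactions`, the function name `run_fraud_queries`, and the result column aliases are assumptions made for illustration; the project's own script may differ.

```python
import sqlite3

import pandas as pd


def run_fraud_queries(df: pd.DataFrame):
    """Load df into an in-memory SQLite table and run the two summary queries."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table name; any valid identifier would work.
        df.to_sql("transactions", conn, index=False, if_exists="replace")
        counts = pd.read_sql_query(
            "SELECT Class, COUNT(*) AS n_transactions "
            "FROM transactions GROUP BY Class ORDER BY Class",
            conn,
        )
        avg_amount = pd.read_sql_query(
            "SELECT Class, AVG(Amount) AS avg_amount "
            "FROM transactions GROUP BY Class ORDER BY Class",
            conn,
        )
    finally:
        conn.close()
    return counts, avg_amount
```

Assuming the cleaned file is present, `counts, avg_amount = run_fraud_queries(pd.read_csv("clean_creditcard.csv"))` followed by `avg_amount.to_csv("sql_avg_amount.csv", index=False)` would produce the file consumed by Tableau.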
- Trains a Logistic Regression model to predict fraud
- Handles class imbalance using `class_weight='balanced'`
- Scales features and splits data into training/testing
- Evaluates using confusion matrix and classification report
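A minimal scikit-learn sketch of the modeling steps listed above. The function name `train_fraud_model`, the 80/20 stratified split, `random_state=42`, and `max_iter=1000` are assumptions chosen for illustration, not settings confirmed by the project.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_fraud_model(df: pd.DataFrame, test_size: float = 0.2, random_state: int = 42):
    """Fit a class-weighted logistic regression and print evaluation metrics."""
    X = df.drop(columns=["Class"])
    y = df["Class"]

    # Stratify so the rare fraud class appears in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )

    # The scaler sits inside the pipeline, so it is fitted on training data only.
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    )
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    print(confusion_matrix(y_test, y_pred))
    print(classification_report(y_test, y_pred, digits=4))
    return model
```

Placing the scaler in the pipeline is a design choice in this sketch: it keeps test-set statistics out of the scaling step. With fraud this rare, the recall and precision of class 1 in the report are more informative than overall accuracy.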
- Visualizes key fraud insights:
  - Bar chart: Fraud vs Non-Fraud Count
  - Line chart: Transactions Over Time
  - KPI Cards: SQL-derived Average Amounts
- Filters: Transaction Amount Range, Class (Fraud vs Non-Fraud)
cd fraud_transaction_model/
pip install -r requirements.txt
python run_all.py
Show how many transactions were fraud vs non-fraud.
Show how transactions (especially fraud) evolve over time. Transaction volume fluctuates across the two days, and fraud events are plotted alongside non-fraud ones. Fraud is rare (the blue line hugs the X-axis), but its timing can still be analyzed for patterns.
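The same time view can be approximated outside Tableau. The sketch below assumes that `Time` holds seconds elapsed since the first transaction, as in the Kaggle dataset, and buckets it into hours; the function name `hourly_counts` and the hourly granularity are illustrative choices.

```python
import pandas as pd


def hourly_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Count transactions per elapsed hour, one column per Class value."""
    # Assumption: Time is in seconds, so integer division by 3600 gives the hour.
    hours = (df["Time"] // 3600).astype(int).rename("hour")
    return df.groupby([hours, "Class"]).size().unstack(fill_value=0)
```

Calling `hourly_counts(df).plot()` on the cleaned data would draw one line per class, comparable to the line chart described above.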
SQL-calculated average transaction amount.
- `clean_data.py`: Cleans up the CSV dataset.
- `sql_queries.py`: Loads the dataset into an in-memory SQLite database and runs the queries.
- `fraud_model.py`: A simple Logistic Regression model that classifies transactions as fraud or non-fraud. Despite the class imbalance, it uses class weighting and achieves strong recall, which is critical for minimizing undetected fraud.
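A runner such as `run_all.py` could chain the three scripts along these lines. This is a hypothetical sketch: the execution order (clean, then SQL, then model) and the helper names `build_commands` and `run_all` are assumptions, and the project's actual runner may work differently.

```python
import subprocess
import sys

# Assumed order: each step relies on the output of the one before it.
PIPELINE = ["clean_data.py", "sql_queries.py", "fraud_model.py"]


def build_commands(scripts=PIPELINE, python=sys.executable):
    """Return one argument list per script, using the given interpreter."""
    return [[python, script] for script in scripts]


def run_all(scripts=PIPELINE):
    """Run each script in order and stop at the first failure."""
    for cmd in build_commands(scripts):
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)
```

Calling `run_all()` from the project directory executes the steps in sequence; because of `check=True`, a failure in the cleaning step prevents the later steps from running on stale data.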