Built by Srinjay Panja
PhishNet is a system designed to detect phishing URLs in real-time. It utilizes a machine learning model (URLSet Ensemble) trained on URL characteristics, combined with a backend API and a browser extension for seamless integration. When you browse the web, the extension sends the current URL to the backend API, which uses the trained model to predict whether the URL is likely malicious (phishing) or legitimate.
PhishNet is already deployed and ready to use!
- The backend is live at: https://phishnet-pavv.onrender.com
- Anyone can use the PhishNet browser extension from anywhere — no server setup required!
Note: Because of the free Render plan, the backend will "spin down" after a period of inactivity. The first request after a period of inactivity can be delayed by 50 seconds or more while the server wakes up. Subsequent requests will be fast.
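If you want to test the live backend without installing the extension, you can call it directly. The sketch below assumes a JSON prediction endpoint such as `/predict` that accepts a `{"url": ...}` payload and returns a JSON verdict; check `backend/main.py` for the actual route name and request/response format.

```python
# Minimal sketch: query the deployed PhishNet backend from Python.
# NOTE: the route name ("/predict") and the payload/response fields shown here
# are assumptions -- confirm them against backend/main.py before relying on this.
import requests

BACKEND_URL = "https://phishnet-pavv.onrender.com"

def check_url(url: str) -> dict:
    # The first request after idle time may take ~50 s while Render wakes the server.
    response = requests.post(
        f"{BACKEND_URL}/predict",   # assumed endpoint
        json={"url": url},          # assumed payload shape
        timeout=90,                 # generous timeout for cold starts
    )
    response.raise_for_status()
    return response.json()          # e.g. {"prediction": "phishing"} (assumed)

if __name__ == "__main__":
    print(check_url("http://example-suspicious-login.com/verify"))
```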
- Install the Extension:
  - Download just the `extension` folder from this repository (no need to clone the entire repo).
  - Open your browser's extensions page (e.g., `chrome://extensions` for Chrome).
  - Enable Developer Mode.
  - Click "Load unpacked" and select the `extension` folder you downloaded.
- Browse the Web:
- The extension will automatically check URLs using the live backend.
- You’ll see notifications if a site is flagged as phishing.
This project consists of three main components:
- Backend: A FastAPI application that serves the ML model predictions via an API endpoint (a rough serving sketch follows this list).
- Machine Learning (URLSet Ensemble): A model trained using features extracted from the `urlset.csv` dataset. The training scripts and related assets are included.
- Browser Extension: A simple browser extension that communicates with the backend API to check URLs as you visit them.
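The backend is essentially a thin FastAPI wrapper around the pickled model assets. The sketch below is illustrative only: the asset filenames are taken from this README, while the route name, request schema, label encoding, and the placeholder feature extractor are assumptions; the real code lives in `backend/main.py` and `backend/feature_extraction.py`.

```python
# Illustrative sketch of a FastAPI app serving the pickled ensemble.
# Route name, request model, label encoding, and extract_features() are
# assumptions, not the project's actual implementation.
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

with open("urlset_ml_assets/urlset_ensemble_model.pkl", "rb") as f:
    model = pickle.load(f)
with open("urlset_ml_assets/scaler.pkl", "rb") as f:
    scaler = pickle.load(f)
with open("urlset_ml_assets/feature_columns.pkl", "rb") as f:
    feature_columns = pickle.load(f)

class URLRequest(BaseModel):
    url: str

def extract_features(url: str, columns) -> list:
    # Placeholder so the sketch is self-contained; the real logic lives in
    # backend/feature_extraction.py and produces one value per feature column.
    return [0.0] * len(columns)

@app.post("/predict")  # assumed route name
def predict(req: URLRequest):
    features = extract_features(req.url, feature_columns)
    scaled = scaler.transform([features])
    label = model.predict(scaled)[0]
    # Assumes the model encodes phishing as 1 and legitimate as 0.
    return {"url": req.url, "prediction": "phishing" if label == 1 else "legitimate"}
```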
- Real-time URL analysis via browser extension.
- Phishing detection powered by an ensemble of 4 machine learning models: Random Forest, Logistic Regression, Decision Tree, and Gradient Boosting (a construction sketch follows this list).
- FastAPI backend for efficient API request handling.
- Modular structure with separate components for the backend, ML training, and extension.
- Includes scripts for data preprocessing and model retraining.
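For reference, an ensemble of those four classifiers can be assembled with scikit-learn's `VotingClassifier`, roughly as below. The hyperparameters and voting strategy shown are assumptions; `ml_training/train_urlset.py` is the authoritative version.

```python
# Sketch of a four-model voting ensemble like the one PhishNet describes.
# Hyperparameters and the voting strategy are assumptions; see
# ml_training/train_urlset.py for the real configuration.
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=42)),
        ("gb", GradientBoostingClassifier(random_state=42)),
    ],
    voting="soft",  # assumed; hard voting is also plausible
)

# Typical usage once a feature matrix X_train and labels y_train are prepared:
# ensemble.fit(X_train, y_train)
# predictions = ensemble.predict(X_test)
```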
PhishNet/
├── backend/ # FastAPI backend application
│ ├── ml_assets/ # (Contains assets for other models - not used by default)
│ ├── traditional_ml_assets/ # (Contains assets for other models - not used by default)
│ ├── urlset_ml_assets/ # Model assets (scaler, columns, model) - NOT in repo, see below
│ ├── .env.example # Example environment file for MongoDB URI
│   ├── feature_extraction.py # Feature extraction logic for prediction (see sketch below)
│ ├── main.py # FastAPI application entry point
│ └── requirements.txt # Backend Python dependencies
├── backend_env/ # (Ignored) Virtual environment for backend
├── data/ # (Ignored) Datasets
├── extension/ # Browser extension files
│ ├── icons/ # Extension icons
│ ├── background.js # Extension logic
│ └── manifest.json # Extension configuration
├── ml_training/ # Scripts for ML model training
│ ├── feature_extraction.py # Feature extraction logic for training
│ ├── preprocess_urlset.py # Preprocessing script for urlset.csv
│ ├── train_urlset.py # Training script for the URLSet model
│ └── requirements.txt # ML training Python dependencies
├── ml_training_env/ # (Ignored) Virtual environment for ML training
├── README.md # This file
└── ... # Other configuration/installer files
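Both `feature_extraction.py` files turn raw URLs into numeric features for the model. The sketch below illustrates the general idea with a few common, hypothetical URL features; the actual feature set used by PhishNet is defined in the repository's `feature_extraction.py` and may differ.

```python
# Hypothetical URL feature extraction, illustrating what feature_extraction.py
# might compute. None of the features below are guaranteed to match the
# project's real feature set.
from urllib.parse import urlparse

def extract_url_features(url: str) -> dict:
    parsed = urlparse(url if "://" in url else f"http://{url}")
    host = parsed.netloc
    return {
        "url_length": len(url),
        "hostname_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "has_at_symbol": int("@" in url),
        "has_https": int(parsed.scheme == "https"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_subdirectories": parsed.path.count("/"),
    }

if __name__ == "__main__":
    print(extract_url_features("http://secure-login.example.com/account/verify?id=123"))
```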
Note: Virtual environments, large binary/model/data files, and sensitive assets are not stored in the repository. See below for how to obtain required model files.
Note: Manual backend installation and local MongoDB setup are NOT required. The backend is already deployed and ready to use. Most users only need to install the extension as described above.
If you want to deploy your own backend or retrain the model, see the Render deployment and ML training instructions below.
If you want to retrain the URLSet ensemble model using the provided data or your own data:
- Ensure you have completed the ML Training Setup steps (virtual environment activated, dependencies installed).
- Navigate to the ML training directory:
  `cd path/to/PhishNet/ml_training`
- Run the training script:
  `python train_urlset.py`
- This script will typically perform preprocessing (using `preprocess_urlset.py` and `feature_extraction.py`) on the `data/urlset.csv` file and then train the ensemble model.
- Copy Assets: After successful training, new model assets will likely be generated within the `ml_training` directory (or a subdirectory). You need to manually copy the updated assets (e.g., `urlset_ensemble_model.pkl`, `scaler.pkl`, `feature_columns.pkl`, `processed_data.pkl`) to the `backend/urlset_ml_assets/` directory, overwriting the existing files (a small copy script is sketched after this list).
- Restart the backend server for the changes to take effect.
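If you retrain often, the asset copy step can be scripted. The sketch below assumes the new pickles are written directly to `ml_training/`; adjust the source path if your training run writes them to a subdirectory.

```python
# Sketch: copy freshly trained assets from ml_training/ into the backend.
# Assumes the pickles land directly in ml_training/ -- adjust SRC if your
# training run puts them in a subdirectory.
import shutil
from pathlib import Path

SRC = Path("ml_training")
DST = Path("backend/urlset_ml_assets")

ASSETS = [
    "urlset_ensemble_model.pkl",
    "scaler.pkl",
    "feature_columns.pkl",
    "processed_data.pkl",
]

DST.mkdir(parents=True, exist_ok=True)
for name in ASSETS:
    shutil.copy2(SRC / name, DST / name)
    print(f"Copied {name} -> {DST}")
```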
Below are screenshots demonstrating PhishNet in action:
Phishing detected (extension warning):
Phishing site that Chrome's built-in protection fails to detect:
Safe site detected:
MIT License
Copyright 2025 Srinjay Panja
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- This repository does not store model/data files or virtual environments. These are ignored via `.gitignore`.
- GitHub enforces a hard 100 MB limit on files stored directly in the repository.
- Model files are distributed via cloud storage and must be downloaded as described above.