Fine-tuned NLP model that detects toxic vs. non-toxic tone in text. Based on DistilBERT and trained on real-world comment data. Includes training script and a simple script for testing.

Pfauberg/text-tone-classifier

Text Tone Classifier

This project fine-tunes a pretrained DistilBERT model for binary classification: given a sentence, it predicts whether the tone is toxic or not toxic. Training uses a small dataset of online comments.

📁 Important project files

├── main.py             # Trains the model
├── predict.py          # Usage of trained model
├── download_dataset.py # Downloads dataset
├── model/              # Saved model after training (ignored by git)

📊 Dataset

Dataset: Jigsaw Toxic Comment Classification Challenge from Kaggle.
Only train.csv is needed.
Use download_dataset.py to download it into the project folder.
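
The Jigsaw train.csv file carries six 0/1 label columns per comment (toxic, severe_toxic, obscene, threat, insult, identity_hate), while this project needs a single toxic / not toxic label. The README does not say how main.py derives that label, so the following is only a minimal sketch of one plausible approach: mark a comment as toxic if any of the six flags is set. The function name `to_binary_rows` and the sample rows are hypothetical and used for illustration only.

```python
import csv
import io

# Label columns present in the Jigsaw train.csv file.
LABEL_COLUMNS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def to_binary_rows(csv_file):
    """Yield (comment_text, label) pairs; label is 1 if any Jigsaw flag is set.

    Assumption: "any flag set" is used as the binary rule. The actual
    training script may instead use only the `toxic` column.
    """
    for row in csv.DictReader(csv_file):
        label = int(any(row[col] == "1" for col in LABEL_COLUMNS))
        yield row["comment_text"], label


# Tiny made-up sample in the same column layout as train.csv.
sample = io.StringIO(
    "id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate\n"
    "a1,Thanks for the help!,0,0,0,0,0,0\n"
    "a2,You are an idiot,1,0,0,0,1,0\n"
)
rows = list(to_binary_rows(sample))
print(rows)
```

To run the same function on the real file, pass a handle opened with `open("train.csv", newline="", encoding="utf-8")` in place of the in-memory sample.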

🚀 How to run

  1. Install libraries
    pip install transformers datasets scikit-learn pandas accelerate kagglehub

  2. Download dataset
    python download_dataset.py

  3. Train the model
    python main.py

  4. Check tone of your sentence
    python predict.py

Then type a sentence in the console. The model will report whether it is toxic or not.
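
A binary sequence classifier such as DistilBERT typically returns two raw scores (logits) per sentence, which are then turned into a label. How predict.py does this is not shown here, so the sketch below is only an illustration of that final step: it applies a softmax to the two logits and compares the toxic probability against a threshold. The function names, the 0.5 threshold, and the assumption that index 1 means "toxic" are all hypothetical.

```python
import math


def toxic_probability(logits):
    """Softmax over two logits; returns the probability at index 1.

    Assumption: index 0 is "not toxic" and index 1 is "toxic".
    """
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    return exps[1] / sum(exps)


def tone_label(logits, threshold=0.5):
    """Map a pair of logits to a human-readable label."""
    return "toxic" if toxic_probability(logits) >= threshold else "not toxic"


# Made-up logits standing in for real model output.
print(tone_label([2.0, -1.0]))
print(tone_label([-1.0, 2.0]))
```

Raising the threshold makes the classifier flag fewer sentences as toxic, at the cost of missing some borderline cases.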
