This project implements a profanity filter API using machine learning techniques. It consists of two main components: a model training script and a main application that uses the trained model to detect and censor profanity in user-input text.
- `train_model.py`: Script for training the profanity detection model
- `main.py`: Main application for running the profanity filter
- `profanity_model.joblib`: Saved trained model
- `tfidf_vectorizer.joblib`: Saved TF-IDF vectorizer
The model training process involves the following steps:
- Data Loading:
  - The script loads a dataset from `raw_dataset.csv` containing comments and their toxicity labels.
- Data Preprocessing:
  - Combines multiple toxic categories (toxic, severe_toxic, obscene, threat, insult, identity_hate) into a single 'toxic_combined' column.
- Data Splitting:
  - Splits the data into training and testing sets (80% train, 20% test) using `train_test_split` from scikit-learn.
- Feature Extraction:
  - Uses TF-IDF (Term Frequency-Inverse Document Frequency) vectorization to convert text data into numerical features.
  - The `TfidfVectorizer` is configured to use a maximum of 10,000 features and consider both unigrams and bigrams.
- Model Training:
  - Trains a Logistic Regression model on the vectorized training data.
  - The model is configured with a random state for reproducibility and a maximum of 1000 iterations.
- Model Evaluation:
  - Evaluates the model's accuracy on the test set.
- Saving the Model and Vectorizer:
  - Saves the trained model and TF-IDF vectorizer using joblib for later use in the main application (a sketch of the full pipeline follows this list).
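The training script itself is not reproduced in this README, but a minimal sketch of the pipeline described above might look like the following. The text column name `comment_text` and the seed value 42 are assumptions; the rest mirrors the listed configuration.

```python
# Minimal sketch of train_model.py; the "comment_text" column name and the
# seed value are assumptions, the rest follows the configuration above.
import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Data loading
df = pd.read_csv("raw_dataset.csv")

# Data preprocessing: collapse the six toxic categories into one binary label
toxic_cols = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
df["toxic_combined"] = (df[toxic_cols].sum(axis=1) > 0).astype(int)

# Data splitting: 80% train / 20% test
X_train, X_test, y_train, y_test = train_test_split(
    df["comment_text"], df["toxic_combined"], test_size=0.2, random_state=42
)

# Feature extraction: TF-IDF with unigrams and bigrams, capped at 10,000 features
vectorizer = TfidfVectorizer(max_features=10000, ngram_range=(1, 2))
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# Model training: Logistic Regression with a fixed random state and 1000 iterations
model = LogisticRegression(random_state=42, max_iter=1000)
model.fit(X_train_tfidf, y_train)

# Model evaluation: accuracy on the held-out test set
print("Accuracy:", accuracy_score(y_test, model.predict(X_test_tfidf)))

# Saving the model and vectorizer for the main application
joblib.dump(model, "profanity_model.joblib")
joblib.dump(vectorizer, "tfidf_vectorizer.joblib")
```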
The main application uses the trained model to filter profanity in real time:
- Model Loading:
  - Loads the pre-trained Logistic Regression model and TF-IDF vectorizer.
- Profanity Prediction:
  - The `predict_profanity` function takes a text input, vectorizes it using the loaded TF-IDF vectorizer, and predicts the probability of it being profane using the trained model.
- Text Censoring:
  - The `censor_text` function splits the input text into words and checks each word for profanity.
  - Words with a profanity score above 0.5 (adjustable threshold) are replaced with asterisks.
- User Interface:
  - Provides a command-line interface for users to input text.
  - For each input, it displays the profanity score and a censored version of the text.
  - The program runs in a loop until the user types 'quit' (a sketch of this flow follows the list).
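A minimal sketch of how `main.py` could wire these pieces together is shown below; the regex-based word matching (mentioned under the key concepts) and the exact prompt strings are assumptions.

```python
# Minimal sketch of main.py; prompt wording and the regex-based word
# matching are assumptions based on the description above.
import re

import joblib

model = joblib.load("profanity_model.joblib")
vectorizer = joblib.load("tfidf_vectorizer.joblib")


def predict_profanity(text: str) -> float:
    """Return the predicted probability that the given text is profane."""
    features = vectorizer.transform([text])
    return model.predict_proba(features)[0][1]


def censor_text(text: str, threshold: float = 0.5) -> str:
    """Replace words scoring above the threshold with asterisks."""
    def censor_word(match: re.Match) -> str:
        word = match.group(0)
        return "*" * len(word) if predict_profanity(word) > threshold else word

    # \w+ matches whole words, so punctuation and spacing are preserved
    return re.sub(r"\w+", censor_word, text)


if __name__ == "__main__":
    while True:
        user_input = input("Enter text (or 'quit' to exit): ")
        if user_input.lower() == "quit":
            break
        print(f"Profanity score: {predict_profanity(user_input):.4f}")
        print(f"Censored text: {censor_text(user_input)}")
```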
- TF-IDF Vectorization: Converts text data into numerical features, capturing the importance of words in the document corpus (a toy example follows this list).
- Logistic Regression: A binary classification algorithm used to predict the probability of text being profane.
- Train-Test Split: Ensures the model is evaluated on unseen data to assess its generalization capability.
- Joblib: Used for efficient saving and loading of Python objects, particularly useful for large numpy arrays and scikit-learn models.
- Regular Expressions: Used in the censoring process to identify and replace profane words while preserving text structure.
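As a toy illustration of the TF-IDF idea (not part of the project code), the snippet below shows that words appearing in every document receive lower weights than distinctive ones:

```python
# Toy TF-IDF example: common words get low weights, distinctive words high ones.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["you are nice", "you are great", "you are awful"]
vec = TfidfVectorizer()
tfidf = vec.fit_transform(docs)

# In "you are awful", the word "awful" scores higher than "you" or "are",
# because it appears in only one document of the corpus.
for word, score in zip(vec.get_feature_names_out(), tfidf.toarray()[2]):
    print(f"{word}: {score:.3f}")
```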
- Model Accuracy: 0.9549 (95.49%)
- Model Precision: 0.9199 (91.99%)
- Model Recall: 0.6091 (60.91%)
- Model F1 Score: 0.7329 (73.29%)
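These figures can be reproduced with scikit-learn's metric functions; the sketch below assumes the variables from the training sketch earlier and the model's default 0.5 decision threshold.

```python
# Assumes model, X_test_tfidf, and y_test from the training sketch above.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_pred = model.predict(X_test_tfidf)  # default 0.5 decision threshold

print(f"Accuracy:  {accuracy_score(y_test, y_pred):.4f}")
print(f"Precision: {precision_score(y_test, y_pred):.4f}")
print(f"Recall:    {recall_score(y_test, y_pred):.4f}")
print(f"F1 Score:  {f1_score(y_test, y_pred):.4f}")
```

The high precision but comparatively low recall suggests the model rarely flags clean text by mistake, but misses a sizable share of genuinely profane inputs.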
The effectiveness of this profanity filter depends on the quality and diversity of the training data. Regular updates to the training data and model can help improve its accuracy and coverage of different types of profanity.