The Spam Detection System is a machine learning-based application designed to identify and filter spam messages. Leveraging PyTorch and PyTorch Lightning, the system provides an efficient pipeline for training, evaluating, and deploying spam detection models. The project includes a user-friendly Streamlit interface for seamless interaction, making it accessible for developers and non-technical users alike.
- Custom Data Handling: A tailored PyTorch data generator and DataModule for preprocessing datasets and ensuring compatibility with machine learning pipelines.
- Distributed Training: Optimized for scalable training and inference using PyTorch Lightning.
- Interactive Interface: A Streamlit-based app for easy data uploads, training configuration, and real-time visualizations of model performance.
- High Accuracy: Hyperparameter optimization and robust evaluation metrics ensure reliable spam detection.
- Modular Design: The project structure supports customization and future enhancements.
- Clone this repository:
git clone https://github.com/YourUsername/spam-detection-system.git cd spam-detection-system - Create a virtual environment and activate it:
python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate
- Install the required dependencies:
pip install -r requirements.txt
-
Prepare Your Data
- Upload a labeled dataset in CSV format. Ensure it contains message texts and corresponding labels.
-
Run the Streamlit App
- Launch the app for training and inference:
streamlit run Main.py
- Use the app to configure hyperparameters, monitor training progress, and evaluate results.
- Launch the app for training and inference:
-
Train the Model
- Use the app to train the model with your dataset, fine-tune hyperparameters, and observe performance metrics.
-
Model Inference
- Test the trained model using sample inputs to predict spam or legitimate messages.
DataGenerator.py: Custom data loader to tokenize messages and generate input features for the model.prepare_data_module.py: PyTorch Lightning DataModule for handling training, validation, and test datasets.Home.py: Streamlit app homepage with project description and features.
- Programming Languages: Python
- Frameworks: PyTorch, PyTorch Lightning, Streamlit
- Visualization Tools: TensorBoard, Streamlit
Link : https://mini-spam-detector-analyser.streamlit.app
- Integrate more advanced NLP models such as transformers for enhanced performance.
- Extend the app for multi-language spam detection.
- Deploy the application as a web service for broader accessibility.
Contributions are welcome! If you'd like to contribute, please fork the repository, create a feature branch, and submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
Special thanks to all open-source contributors whose libraries and frameworks made this project possible.