This project predicts whether an incoming email is Spam or Ham (not spam) using a Logistic Regression model. The workflow includes preprocessing the dataset, splitting into training/testing sets, and training the model for accurate predictions.
- 
Mail Data Collect email datasets containing labeled examples of spam and ham messages.
 - 
Data Preprocessing
- Cleaning the text (removing stop words, punctuation, and numbers).
 - Converting text into numerical form using techniques like TF-IDF.
 - Handling missing values.
 
 - 
Train-Test Split Split the processed dataset into training and testing subsets to evaluate model performance.
 - 
Model Training Train a Logistic Regression model using the training dataset.
 - 
Prediction
- Input: New email text.
 - Output: Spam or Ham.
 
 
- 
Language: Python
 - 
Libraries:
pandasβ Data manipulationnumpyβ Numerical computationsscikit-learnβ Machine learning algorithms and preprocessingmatplotlib/seabornβ Visualization
 
The trained Logistic Regression model is evaluated using:
- Accuracy
 - Precision
 - Recall
 - F1 Score
 
This report summarizes the performance of the classification model. The model was evaluated on a test set of 15 samples, focusing on its ability to distinguish between "Ham" and "Spam" messages.
| Class | Precision | Recall | F1-Score | Support | 
|---|---|---|---|---|
| Ham | 0.64 | 1.00 | 0.78 | 7 | 
| Spam | 1.00 | 0.50 | 0.67 | 8 | 
- Precision: The percentage of positive predictions that were actually correct.
 - Recall: The percentage of actual positives that were correctly identified.
 - F1-Score: A measure that balances precision and recall.
 - Support: The number of occurrences of each class in the test set.
 
| Metric | Score | 
|---|---|
| Accuracy | 0.73 | 
| Macro Average | 0.72 | 
| Weighted Average | 0.72 | 
- Accuracy: The percentage of total predictions that were correct.
 - Macro Average: The unweighted average of the metrics across all classes.
 - Weighted Average: The average of the metrics, weighted by the number of samples in each class.
 
- Experiment with advanced models like Naive Bayes or Random Forest.
 - Implement real-time email scanning.
 - Create a web-based UI for user interaction.
 
- Husnain Khaliq