💬 Social Media Sentiment Analysis

An end‑to‑end NLP project for analyzing sentiment in social media posts (positive / negative / neutral).
Includes data cleaning, text normalization, feature extraction (TF‑IDF / embeddings), classical ML baselines (Logistic Regression, Naive Bayes, SVM, Random Forest), and an optional deep learning model (Bi‑LSTM). Evaluation reports accuracy, macro F1‑score, and a confusion matrix.

📖 Overview

This repository demonstrates a reproducible sentiment analysis pipeline, from raw text to deployable models.
It covers:

Preprocessing & normalization (lowercasing, punctuation removal, stopwords, lemmatization)
Feature extraction with TF‑IDF (n‑grams) or pretrained embeddings
Model training & comparison (LogReg, NB, SVM, Random Forest)
(Optional) Bi‑LSTM sequence model for improved context handling
Explainability & error analysis (misclassification review)

🗂️ Dataset

Input: CSV with columns such as id, text, label (pos/neg/neu for 3‑class sentiment, or 1/0 for a binary task).
Location: place your files under Dataset/ (e.g., train.csv, test.csv).
Class balance: check for imbalance; consider stratified splits and class weights.
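
The loading and splitting steps above can be sketched as follows. This is a minimal illustration, not the repository's own loader: the in-memory CSV stands in for a file such as Dataset/train.csv, and the function name `load_and_split`, the 25% test share, and the seed are assumptions made for the example.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny in-memory stand-in for Dataset/train.csv (columns: id, text, label).
CSV_TEXT = """id,text,label
1,love this phone,pos
2,great service today,pos
3,best day ever,pos
4,so happy with it,pos
5,terrible battery life,neg
6,worst update so far,neg
7,really disappointed,neg
8,never buying again,neg
9,it arrives on monday,neu
10,the store opens at nine,neu
11,just posted a photo,neu
12,meeting moved to friday,neu
"""


def load_and_split(source, test_size=0.25, seed=42):
    """Read id/text/label rows, drop incomplete or duplicate texts, split with stratification."""
    df = pd.read_csv(source)
    df = df.dropna(subset=["text", "label"]).drop_duplicates(subset="text")
    # stratify keeps the label proportions the same in both splits.
    train_df, test_df = train_test_split(
        df, test_size=test_size, stratify=df["label"], random_state=seed
    )
    return train_df, test_df


train_df, test_df = load_and_split(io.StringIO(CSV_TEXT))

# Relative class frequencies; a skewed result suggests class weights may help.
class_share = train_df["label"].value_counts(normalize=True)
```

With a real file, pass its path instead of the `io.StringIO` object. If `class_share` is far from uniform, `class_weight="balanced"` on the linear models is one common option.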

If you use a public dataset (e.g., Twitter Sentiment, Sentiment140), cite the source in this README.

🧹 Preprocessing

Clean text: lowercase; remove URLs, mentions, and hashtags (optionally keep the hashtag text); handle emojis
Tokenization, stopword removal, lemmatization
TF‑IDF with n‑grams (1–2 or 1–3), max_features cap
(Optional) Emoji/emoticon normalization and slang expansion
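
A minimal cleaning function along these lines might look like the sketch below. It uses only the standard library; the regular expressions, the tiny stopword set, and the name `clean_text` are illustrative assumptions. Lemmatization is left out because it needs an external resource (for example NLTK's WordNet data), and emojis are simply dropped rather than normalized.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")

# Small illustrative stopword list; a real run would likely use NLTK's list.
STOPWORDS = {"a", "an", "the", "is", "it", "to", "and", "of", "at", "so"}


def clean_text(text, keep_hashtag_text=True):
    """Lowercase, strip URLs/mentions/hashtags/punctuation, and drop stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Either keep the word behind '#' or remove the whole hashtag.
    text = HASHTAG_RE.sub(r" \1 " if keep_hashtag_text else " ", text)
    # Anything that is not a-z, 0-9, or whitespace goes (punctuation, emojis).
    text = NON_WORD_RE.sub(" ", text)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


example = clean_text("Loving the new update!! 😍 @acme #BestDay https://t.co/xyz")
# example == "loving new update bestday"
```

Dropping emojis loses sentiment signal on social media text, so mapping them to words first (the optional normalization step above) is often worth trying.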

🧠 Models

Logistic Regression (strong linear baseline on TF‑IDF)
Multinomial Naive Bayes (fast baseline for sparse text)
Linear SVM (robust with TF‑IDF)
Random Forest (non‑linear tree‑ensemble baseline; often weaker than linear models on sparse TF‑IDF)
(Optional) Bi‑LSTM with pretrained word embeddings
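
One way to compare the classical models on equal footing is to wrap each in the same TF‑IDF pipeline and score it with stratified cross-validation. The sketch below assumes scikit-learn; the toy texts, hyperparameters, and short model keys are placeholders chosen for the example, and the scores on such a tiny sample carry no meaning.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data only; replace with the cleaned text and label columns.
texts = [
    "love this phone", "great service today", "best day ever", "so happy with it",
    "terrible battery life", "worst update so far", "really disappointed", "never buying again",
    "it arrives on monday", "the store opens at nine", "just posted a photo", "meeting moved to friday",
]
labels = ["pos"] * 4 + ["neg"] * 4 + ["neu"] * 4

candidates = {
    "logreg": LogisticRegression(max_iter=1000, class_weight="balanced"),
    "nb": MultinomialNB(),
    "svm": LinearSVC(class_weight="balanced"),
    "rf": RandomForestClassifier(n_estimators=100, random_state=42),
}

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
scores = {}
for name, clf in candidates.items():
    # The vectorizer is refit inside every fold, so no test text leaks into the vocabulary.
    pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
    fold_scores = cross_val_score(pipe, texts, labels, cv=cv, scoring="f1_macro")
    scores[name] = float(fold_scores.mean())
```

Macro F1 is used here because it weights the three classes equally, which matters when the neutral class is much larger or smaller than the others.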

📈 Evaluation (replace with your numbers)

| Model | Accuracy | F1 (macro) |
|---|---|---|
| Logistic Regression | 0.90 | 0.89 |
| Multinomial NB | 0.88 | 0.87 |
| Linear SVM | 0.92 | 0.91 |
| Random Forest | 0.86 | 0.85 |
| Bi‑LSTM (optional) | 0.93 | 0.92 |

Export:

Confusion matrix for 3‑class sentiment
Per‑class precision/recall/F1
Top informative features (for linear models)

🧩 Repository Structure (suggested)

Social-Media-Sentiment-Analysis/
├─ Dataset/                      # train/test CSVs
├─ Notebook/
│  └─ Sentiment_Analysis.ipynb   # main notebook
├─ src/                          # optional scripts
│  ├─ data.py                    # loading/cleaning
│  ├─ features.py                # TF-IDF / embeddings
│  ├─ train.py                   # training & tuning
│  └─ eval.py                    # metrics & plots
├─ reports/figures/              # CM, PR curves, feature plots
├─ requirements.txt
├─ .gitignore
└─ README.md

⚙️ Setup & Usage

Clone & install

git clone https://github.com/ziaee-mohammad/Social-Media-Sentiment-Analysis.git
cd Social-Media-Sentiment-Analysis
pip install -r requirements.txt

Run notebook

jupyter notebook Notebook/Sentiment_Analysis.ipynb

(Optional) Run scripts

python -m src.train --model "svm" --ngrams 1,2 --max_features 200000
python -m src.eval  --report
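
For readers writing their own src/train.py, the flags in the command above could be parsed as in this sketch. Only `--model "svm"`, `--ngrams 1,2`, and `--max_features 200000` appear in this README; the other model choices, the defaults, and the helper names are hypothetical.

```python
import argparse


def parse_ngrams(value):
    """Turn '1,2' into the (min_n, max_n) tuple that TfidfVectorizer's ngram_range expects."""
    low, high = (int(part) for part in value.split(","))
    if low < 1 or high < low:
        raise argparse.ArgumentTypeError(f"invalid n-gram range: {value!r}")
    return (low, high)


def build_parser():
    parser = argparse.ArgumentParser(prog="src.train")
    # "svm" comes from the README; the remaining choices are assumed names.
    parser.add_argument("--model", choices=["logreg", "nb", "svm", "rf"], default="svm")
    parser.add_argument("--ngrams", type=parse_ngrams, default=(1, 2))
    parser.add_argument("--max_features", type=int, default=200000)
    return parser


# Parsing an explicit list mirrors the shell command without touching sys.argv.
args = build_parser().parse_args(
    ["--model", "svm", "--ngrams", "1,2", "--max_features", "200000"]
)
```

In a real script, `args.ngrams` and `args.max_features` would be passed straight to the vectorizer, and `args.model` would select the classifier.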

📦 Requirements (example)

pandas
numpy
scikit-learn
nltk
matplotlib
seaborn

If using Bi‑LSTM: add torch (or tensorflow) and suitable tokenizers/embeddings.

✅ Good Practices

Use stratified train/test split
Keep vectorizer + model in a single Pipeline to avoid leakage
Fix random seeds for reproducibility
Save vectorizer/model artifacts if you plan to deploy
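
The pipeline, seed, and artifact points can be combined in a few lines. The sketch below assumes scikit-learn and joblib (installed alongside scikit-learn); the file name, the temporary directory, and the toy data are placeholders.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

SEED = 42  # one fixed seed reused wherever randomness is involved

# Toy data only; replace with the real training split.
texts = [
    "love this phone", "great service today",
    "terrible battery life", "worst update so far",
    "it arrives on monday", "the store opens at nine",
]
labels = ["pos", "pos", "neg", "neg", "neu", "neu"]

# Vectorizer and model live in one object, so the TF-IDF vocabulary is learned
# only from the data passed to fit() and is reused unchanged at predict time.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=200000)),
    ("clf", LinearSVC(random_state=SEED)),
])
pipeline.fit(texts, labels)

# Persist the whole pipeline as a single artifact and load it back.
artifact_path = os.path.join(tempfile.mkdtemp(), "sentiment_pipeline.joblib")
joblib.dump(pipeline, artifact_path)
restored = joblib.load(artifact_path)
```

Saving the pipeline as one file avoids shipping a model with a mismatched vectorizer. Load such artifacts only from trusted sources, since joblib files are pickle-based.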

🏷 Tags

data-science
machine-learning
nlp
sentiment-analysis
text-mining
classification
python
scikit-learn
tf-idf

---

## 👤 Author
**Mohammad Ziaee** — Computer Engineer | AI & Data Science  
📧 [email protected]  
🔗 https://github.com/ziaee-mohammad

---

## 📜 License
MIT — free to use and adapt with attribution.
