ziaee-mohammad/Social-Media-Sentiment-Analysis

💬 Social Media Sentiment Analysis

An end-to-end NLP project for analyzing sentiment in social media posts (positive / negative / neutral).
It includes data cleaning, text normalization, feature extraction (TF-IDF / embeddings), classical ML baselines (Logistic Regression, Naive Bayes, SVM, Random Forest), and an optional deep learning model (Bi-LSTM). Models are evaluated with accuracy, macro F1-score, and a confusion matrix.


📖 Overview

This repository demonstrates a reproducible sentiment analysis pipeline, from raw text to deployable models.
It covers:

  • Preprocessing & normalization (lowercasing, punctuation removal, stopwords, lemmatization)
  • Feature extraction with TF-IDF (n-grams) or pretrained embeddings
  • Model training & comparison (LogReg, NB, SVM, Random Forest)
  • (Optional) Bi-LSTM sequence model for improved context handling
  • Explainability & error analysis (misclassification review)

🗂️ Dataset

  • Input: CSV with columns such as id, text, label (pos/neg/neu, or 1/0 for a binary setup).
  • Location: place your files under Dataset/ (e.g., train.csv, test.csv).
  • Class balance: check for imbalance; consider stratified splits and class weights.
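
The dataset checks above can be sketched as follows. This is a minimal illustration, not the repository's loader: the rows are made up, the CSV is read from an in-memory string so the snippet runs without any files, and the column names simply follow the id/text/label layout described above.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for Dataset/train.csv; these rows are invented for illustration.
csv_text = """id,text,label
1,I love this phone,pos
2,Worst update ever,neg
3,It is a phone,neu
4,Great camera and battery,pos
5,The app keeps crashing,neg
6,Released on Monday,neu
7,Absolutely fantastic service,pos
8,Terrible customer support,neg
9,The package arrived,neu
"""

# In practice this would be pd.read_csv("Dataset/train.csv").
df = pd.read_csv(io.StringIO(csv_text))

# Relative class frequencies reveal imbalance before any modeling.
class_share = df["label"].value_counts(normalize=True)
print(class_share)

# stratify= keeps the class proportions the same in both splits.
train_df, test_df = train_test_split(
    df, test_size=3, stratify=df["label"], random_state=42
)
print(len(train_df), len(test_df))
```

If `class_share` shows a strongly skewed distribution, passing `class_weight="balanced"` to a linear model is one common way to compensate.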

If you use a public dataset (e.g., Twitter Sentiment, Sentiment140), cite the source in this README.


🧹 Preprocessing

  • Clean text: lowercase; remove URLs and mentions; strip hashtags (optionally keeping the hashtag text); handle emojis
  • Tokenization, stopword removal, lemmatization
  • TF-IDF with n-grams (1–2 or 1–3) and a max_features cap
  • (Optional) Emoji/emoticon normalization and slang expansion
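
A minimal sketch of the cleaning steps above, using only the standard library. The function name `clean_text`, the `keep_hashtag_text` flag, and the tiny stopword set are assumptions made for illustration; lemmatization is left out because it needs an external resource such as NLTK's WordNet data.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")

# Tiny illustrative stopword set; a real run would use a fuller list.
# Negations such as "not" are deliberately absent because they carry sentiment.
STOPWORDS = {"a", "an", "the", "is", "it", "this", "to", "and", "of"}


def clean_text(text: str, keep_hashtag_text: bool = True) -> str:
    """Lowercase, strip URLs/mentions/hashtags, drop non-letters and stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Either keep the word behind the '#' or drop the whole hashtag.
    text = HASHTAG_RE.sub(r" \1 " if keep_hashtag_text else " ", text)
    # Removes digits, punctuation, and emojis in one pass.
    text = NON_ALPHA_RE.sub(" ", text)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


sample = "Loving the new #iPhone camera! Thanks @Apple https://t.co/abc123"
print(clean_text(sample))                           # loving new iphone camera thanks
print(clean_text(sample, keep_hashtag_text=False))  # loving new camera thanks
```

Note that this sketch discards emojis entirely; if emojis are treated as sentiment signals, they would need to be mapped to tokens before the non-letter filter runs.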

🧠 Models

  • Logistic Regression (strong linear baseline on TF-IDF)
  • Multinomial Naive Bayes (fast baseline for sparse text)
  • Linear SVM (robust with TF-IDF)
  • Random Forest (tabular baseline)
  • (Optional) Bi-LSTM with pretrained word embeddings
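
One way to compare the classical models listed above is to wrap each in the same TF-IDF pipeline. The sketch below assumes scikit-learn; the toy texts, the dictionary keys, and the hyperparameters are illustrative choices, not settings taken from the notebook.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data, invented for illustration only.
texts = [
    "i love this phone", "great camera and battery", "fantastic service",
    "worst update ever", "the app keeps crashing", "terrible support",
    "it is a phone", "released on monday", "the package arrived",
]
labels = ["pos"] * 3 + ["neg"] * 3 + ["neu"] * 3

# Hypothetical short names for the four classical models.
candidates = {
    "logreg": LogisticRegression(max_iter=1000, class_weight="balanced"),
    "nb": MultinomialNB(),
    "svm": LinearSVC(class_weight="balanced"),
    "rf": RandomForestClassifier(n_estimators=100, random_state=42),
}

fitted = {}
for name, clf in candidates.items():
    # A fresh vectorizer per model keeps each pipeline self-contained.
    pipe = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), max_features=200_000), clf
    )
    pipe.fit(texts, labels)
    fitted[name] = pipe

for name, pipe in fitted.items():
    print(name, pipe.predict(["great camera"])[0])
```

On real data the fitted pipelines would be scored on a held-out split rather than inspected on a single phrase.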

📈 Evaluation (replace with your numbers)

| Model | Accuracy | F1 (macro) |
|---|---|---|
| Logistic Regression | 0.90 | 0.89 |
| Multinomial NB | 0.88 | 0.87 |
| Linear SVM | 0.92 | 0.91 |
| Random Forest | 0.86 | 0.85 |
| Bi-LSTM (optional) | 0.93 | 0.92 |

Export:

  • Confusion matrix for 3-class sentiment
  • Per-class precision/recall/F1
  • Top informative features (for linear models)

🧩 Repository Structure (suggested)

Social-Media-Sentiment-Analysis/
├─ Dataset/                      # train/test CSVs
├─ Notebook/
│  └─ Sentiment_Analysis.ipynb   # main notebook
├─ src/                          # optional scripts
│  ├─ data.py                    # loading/cleaning
│  ├─ features.py                # TF-IDF / embeddings
│  ├─ train.py                   # training & tuning
│  └─ eval.py                    # metrics & plots
├─ reports/figures/              # CM, PR curves, feature plots
├─ requirements.txt
├─ .gitignore
└─ README.md

⚙️ Setup & Usage

  1. Clone & install
git clone https://github.com/ziaee-mohammad/Social-Media-Sentiment-Analysis.git
cd Social-Media-Sentiment-Analysis
pip install -r requirements.txt
  2. Run notebook
jupyter notebook Notebook/Sentiment_Analysis.ipynb
  3. (Optional) Run scripts
python -m src.train --model "svm" --ngrams 1,2 --max_features 200000
python -m src.eval --report
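
Since the scripts under src/ are optional, the following is only a hypothetical sketch of how a training script might parse the flags shown in step 3. The helper names, the defaults, and the model choices other than "svm" are assumptions, not the repository's actual interface.

```python
import argparse
from typing import Tuple


def parse_ngrams(value: str) -> Tuple[int, int]:
    """Turn a string such as '1,2' into an (min_n, max_n) tuple."""
    low, high = (int(part) for part in value.split(","))
    if not 1 <= low <= high:
        raise argparse.ArgumentTypeError("expected MIN,MAX with 1 <= MIN <= MAX")
    return low, high


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="src.train")
    # Only "svm" appears in the usage above; the other choices are hypothetical.
    parser.add_argument("--model", choices=["logreg", "nb", "svm", "rf"], default="svm")
    parser.add_argument("--ngrams", type=parse_ngrams, default=(1, 2))
    parser.add_argument("--max_features", type=int, default=200_000)
    return parser


# Parse the same arguments as the command line in step 3.
args = build_parser().parse_args(
    ["--model", "svm", "--ngrams", "1,2", "--max_features", "200000"]
)
print(args.model, args.ngrams, args.max_features)  # svm (1, 2) 200000
```

The parsed `args.ngrams` tuple can be passed directly as `ngram_range` to scikit-learn's `TfidfVectorizer`.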

📦 Requirements (example)

pandas
numpy
scikit-learn
nltk
matplotlib
seaborn

If using Bi-LSTM: add torch (or tensorflow) and suitable tokenizers/embeddings.


✅ Good Practices

  • Use stratified train/test split
  • Keep vectorizer + model in a single Pipeline to avoid leakage
  • Fix random seeds for reproducibility
  • Save vectorizer/model artifacts if you plan to deploy
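
The pipeline, seeding, and artifact-saving practices above can be combined in a few lines. This sketch assumes scikit-learn and joblib; the file name `sentiment_pipeline.joblib`, the step names, and the toy data are illustrative, and a temporary directory is used so the snippet leaves no files behind.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

SEED = 42  # one fixed seed reused wherever randomness is involved

# Toy data, invented for illustration only.
texts = [
    "i love this phone", "great camera and battery",
    "worst update ever", "the app keeps crashing",
    "it is a phone", "the package arrived",
]
labels = ["pos", "pos", "neg", "neg", "neu", "neu"]

# Vectorizer and model live in one object, so the vocabulary and IDF weights
# are always learned from the same rows the classifier is fitted on.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    # random_state only matters for some solvers; setting it keeps the config explicit.
    ("clf", LogisticRegression(max_iter=1000, random_state=SEED)),
])
pipe.fit(texts, labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sentiment_pipeline.joblib")
    joblib.dump(pipe, path)      # saves vectorizer and model together
    restored = joblib.load(path)

same = list(restored.predict(texts)) == list(pipe.predict(texts))
print(same)
```

Saving the whole pipeline rather than the model alone means the deployed artifact applies exactly the same vectorization as training.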

🏷 Tags

data-science
machine-learning
nlp
sentiment-analysis
text-mining
classification
python
scikit-learn
tf-idf

---

## 👤 Author
**Mohammad Ziaee**, Computer Engineer | AI & Data Science  
📧 [email protected]  
🔗 https://github.com/ziaee-mohammad

---

## 📜 License
MIT: free to use and adapt with attribution.

About

NLP-based sentiment analysis of Twitter and Google Play reviews using machine learning algorithms.
