This repository presents a comprehensive approach to emotion classification on text data labeled with six emotions:
- Sadness
- Joy
- Love
- Anger
- Fear
- Surprise
We explore and compare four deep learning architectures, each in its own notebook:
**CNN** (`cnn.ipynb`)
- Embedding layer, convolutional filters, and global max pooling
- Achieves around 90% accuracy (depending on hyperparameters)
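A minimal Keras sketch of this architecture (layer sizes and vocabulary size are illustrative, not necessarily the notebook's exact values):

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, EMBED_DIM, NUM_CLASSES = 20000, 128, 6  # illustrative values

model = tf.keras.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),          # learned word embeddings
    layers.Conv1D(128, 5, activation="relu"),         # filters over local n-gram windows
    layers.GlobalMaxPooling1D(),                      # keep the strongest response per filter
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),  # one probability per emotion
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```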
**LSTM** (`lstm.ipynb`)
- Incorporates LSTM units to capture sequential text patterns
- Dropout for regularization; RMSprop or Adam as optional optimizers
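A comparable sketch with an LSTM layer (unit count and dropout rate are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers, optimizers

model = tf.keras.Sequential([
    layers.Embedding(20000, 128),
    layers.LSTM(128),                 # carries state across the token sequence
    layers.Dropout(0.5),              # regularization against overfitting
    layers.Dense(6, activation="softmax"),
])
# Either RMSprop or Adam can be used here.
model.compile(optimizer=optimizers.RMSprop(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```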
**BiGRU** (`BiGRU.ipynb`)
- Bidirectional GRU capturing context in both directions
- Approximately 94% accuracy with extensive preprocessing (emoji handling, chat-word cleanup)
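The `Bidirectional` wrapper is the key difference from the LSTM model; a hedged sketch:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Embedding(20000, 128),
    layers.Bidirectional(layers.GRU(128)),  # reads each sequence forward and backward
    layers.Dense(6, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```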
**BERT** (`Bert.ipynb`)
- Fine-tunes a pre-trained BERT model
- Often yields the highest accuracy (~97%) but requires more computational resources
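A fine-tuning sketch using Hugging Face Transformers (the checkpoint, learning rate, and placeholder data are assumptions; see `Bert.ipynb` for the actual setup):

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TFAutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=6)

texts = ["i feel so happy today", "i am scared of the dark"]  # placeholder rows
labels = [1, 4]                                               # placeholder label ids

enc = tokenizer(texts, padding=True, truncation=True,
                max_length=128, return_tensors="tf")
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
model.fit(dict(enc), tf.constant(labels), epochs=1, batch_size=32)
```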
The dataset is sourced from:

N. Elgiriyewithana, "Emotion dataset," Kaggle. https://www.kaggle.com/datasets/nelgiriyewithana/emotions (accessed Apr. 2, 2024).

- Approximately 400,000 tweets, each labeled with one of the six emotions.
- Preprocessing includes tokenization, stopword removal, padding, and optional class balancing.
- Data Preprocessing: removing emojis and chat words, balancing classes, tokenizing, and padding (a sketch follows this list).
- Model Development: each notebook details its training setup, hyperparameters (batch size, learning rate, epochs), and the chosen optimizer.
- Evaluation Metrics: accuracy, precision, recall, F1-score, confusion matrix, and in some notebooks ROC curves.
- Comparison: highlights how model complexity and architecture choice affect classification performance.
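A hedged sketch of the preprocessing step (the chat-word map, regex, and sizes are illustrative; the notebooks use larger dictionaries and their own hyperparameters):

```python
import re
import emoji  # pip install emoji
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Illustrative subset of a chat-word dictionary.
CHAT_WORDS = {"u": "you", "gr8": "great", "idk": "i do not know"}

def clean(text: str) -> str:
    text = emoji.demojize(text)  # e.g. a smiley glyph -> ":smiling_face:"
    words = [CHAT_WORDS.get(w, w) for w in text.lower().split()]
    return re.sub(r"[^a-z\s_:]", "", " ".join(words))  # keep demojized tokens intact

texts = ["im feeling gr8 today", "u make me so angry"]  # placeholder rows from text.csv
cleaned = [clean(t) for t in texts]

tokenizer = Tokenizer(num_words=20000, oov_token="<OOV>")
tokenizer.fit_on_texts(cleaned)
X = pad_sequences(tokenizer.texts_to_sequences(cleaned), maxlen=100)  # uniform-length input
```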
Repository structure:

```
.
├── cnn.ipynb    # CNN-based text classification
├── lstm.ipynb   # LSTM-based approach
├── BiGRU.ipynb  # Bidirectional GRU
├── Bert.ipynb   # BERT-based model
├── text.csv     # Kaggle Emotion dataset
└── README.md    # Project documentation
```
- Clone this repository:

  ```bash
  git clone https://github.com/YourUsername/Text-Emotion-Classification-NLP.git
  cd Text-Emotion-Classification-NLP
  ```
- Obtain and place the Kaggle Emotion dataset:
  - Download Nidula Elgiriyewithana's Emotion dataset from Kaggle and save it as `text.csv` in the project root.
- Launch Jupyter Notebook (or JupyterLab):

  ```bash
  jupyter notebook
  ```
- Open any of the notebooks (e.g., `cnn.ipynb`, `lstm.ipynb`):
  - Run the cells in order to load data, build the model, and evaluate results.
  - Each notebook details its own architecture and training routine.
- Compare results:
  - Look at accuracy, F1-scores, and confusion matrices to see how CNN, LSTM, BiGRU, and BERT differ in performance.
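For example, given a trained Keras `model` and a held-out test split (`X_test` and `y_test` are placeholders), scikit-learn produces the per-class metrics. The emotion order below assumes label ids 0–5 map to the list at the top of this README; verify it against the notebook's encoding:

```python
from sklearn.metrics import classification_report, confusion_matrix

EMOTIONS = ["sadness", "joy", "love", "anger", "fear", "surprise"]  # assumed id order

y_pred = model.predict(X_test).argmax(axis=1)  # most probable class per sample
print(classification_report(y_test, y_pred, target_names=EMOTIONS))
print(confusion_matrix(y_test, y_pred))
```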
Summary of results:
- CNN: about 90% accuracy; excels at extracting local features from text.
- LSTM: captures long-term context; accuracy in the low-to-mid 90s.
- BiGRU: often surpasses the LSTM by capturing bidirectional context; ~94% accuracy.
- BERT: typically the best performer (~97% accuracy), though the most computationally intensive.
This project is licensed under the MIT License.