- We perform Fine-grained Emotion Classification (FEC) on two benchmark datasets: Empathetic Dialogues (ED) (32 labels) and GoEmotions (27 labels).
- Our approach extracts features from three large language models (LLMs): Llama-2-7b-chat, BERT-large, and RoBERTa-large. These features pass through a structured feature fusion step and a lightweight classification model. The feature fusion method and the classifier are adapted from the ACL 2024 paper LLMEmbed and its corresponding GitHub repository.
- Instead of fine-tuning LLMs, we extract their representations and train a compact classifier that integrates semantic knowledge and feature interactions through co-occurrence pooling and power normalization. This method ensures an efficient, scalable, and expressive emotion classification pipeline.
- The code for creating and saving feature tensors is in `llama2_rep_extract.py`. The data for ED and go_emotion is in the `data` folder.
- To extract and save LLaMA2 feature tensors for the training set of the ED dataset, run:
```shell
python llama2_rep_extract.py -device cuda -task ED -mode train
```
- Some important arguments: `-task` denotes the dataset name; possible options are `ED` and `go_emotion`. `-mode` selects the split; possible options are `train`, `test`, and `valid`.
- For ED, the tensors will be saved inside `dataset\ED\llama2_7b_chat`. Similarly for go_emotion.
- For LLaMA2, the possible model choices are the base model Llama-2-7b-hf and the version optimized for dialogue use cases, Llama-2-7b-chat-hf. We used `Llama-2-7b-chat-hf` because `Llama-2-7b-hf` produced NaN values on the validation and test sets.
- For LLaMA2, the embedding dimension is 4096. We average the embeddings across all tokens in each of the last 5 layers and save the resulting tensor of shape (5, 4096) per sample, where 5 corresponds to the last 5 layers. For reference, see the log file `Create_data_LLAMA2_GO.out` in the `Logs` folder.
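The layer-averaging step above can be sketched as follows. This is a minimal illustration of the pooling logic only; the helper name and the dummy tensors standing in for real LLaMA2 hidden states are assumptions, not the repository's code:

```python
import torch

def pool_last5_layers(hidden_states):
    """Average token embeddings within each of the last 5 layers.

    hidden_states: sequence of tensors, each of shape (1, seq_len, hidden_dim),
    as returned by a transformers model called with output_hidden_states=True.
    Returns a tensor of shape (5, hidden_dim).
    """
    last5 = hidden_states[-5:]
    # Mean over the token (seq_len) axis, then drop the batch axis.
    return torch.stack([h.mean(dim=1).squeeze(0) for h in last5])

# Dummy stand-in for LLaMA2-7B output: 33 hidden-state tensors
# (embeddings + 32 layers), 10 tokens, hidden dim 4096.
dummy = [torch.randn(1, 10, 4096) for _ in range(33)]
feats = pool_last5_layers(dummy)
print(feats.shape)  # torch.Size([5, 4096])
```

In the actual script, `hidden_states` would come from a forward pass of `Llama-2-7b-chat-hf` with `output_hidden_states=True`, and the resulting (5, 4096) tensor is what gets saved per sample.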
- The code for creating and saving feature tensors is in `X_rep_extract.py`, where X is bert/roberta. The data for ED and go_emotion is in the `data` folder.
- To extract and save BERT feature tensors for the training set of the ED dataset, run:
```shell
python bert_rep_extract.py -device cuda -task ED -mode train
```
- For ED, the tensors will be saved inside `dataset\ED\bert`; likewise, RoBERTa tensors will be saved inside `dataset\ED\roberta`. The same layout is used for go_emotion.
- For BERT/RoBERTa, the embedding dimension is 1024. We save the representation of the [CLS] token (i.e., the 0th token) from the final layer, storing a tensor of shape (1024,) per sample. For reference, see the log files in the `Logs` folder.
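The [CLS] selection described above is a single indexing operation. The sketch below uses a dummy tensor in place of a real `bert-large` / `roberta-large` output; the function name is illustrative, not the repository's:

```python
import torch

def extract_cls(last_hidden_state):
    """Return the [CLS] (0th token) representation from the final layer.

    last_hidden_state: tensor of shape (1, seq_len, 1024), e.g. the
    last_hidden_state of a bert-large / roberta-large forward pass.
    Returns a tensor of shape (1024,).
    """
    # Batch index 0, token index 0 (the [CLS] position).
    return last_hidden_state[0, 0]

# Dummy stand-in for a bert-large output: batch 1, 12 tokens, 1024 dims.
dummy = torch.randn(1, 12, 1024)
cls_vec = extract_cls(dummy)
print(cls_vec.shape)  # torch.Size([1024])
```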
After feature extraction and saving tensors inside the `data` folder, run the following script to train the FEC model on the ED dataset:
```shell
python main.py -dataset 'ED'
```
There are a few other optional runtime arguments, which can be found in `main.py`.
We report the average results over 5 independent random runs.
The results on the ED dataset are comparable to SOTA approaches, whereas for GoEmotions, the performance is significantly below the current SOTA. The Empathetic Dialogues (ED) dataset consists of conversational dialogues, which align well with the underlying LLaMA-2-7B-Chat model, as it is also fine-tuned for dialogue-based use cases. The GoEmotions (go_emotions) dataset consists of Reddit comments, which differ in style and structure from conversational dialogues. Since LLaMA-2-7B-Chat is optimized for dialogues, this domain mismatch likely contributes to its lower performance on GoEmotions.
Explanation of Structured Feature Fusion and Co-occurrence Pooling Used in the LLM-Fusion Classifier
The code for this module is in `DownstreamModel.py`.
- LLaMA2 produces (5, 4096) embeddings per sample.
- Each 4096-d embedding is compressed to 1024-d using a linear transformation (`4096 → 1024`).
- This results in a compressed representation of shape (5, 1024).
- The compressed LLaMA2 embeddings (5, 1024) are combined with:
- BERT CLS embedding (1024)
- RoBERTa CLS embedding (1024)
- The final representation is stacked into a tensor of shape (7, 1024) per sample.
- Instead of simple concatenation, we compute pairwise feature interactions.
- The interaction matrix (7×7) is computed using dot products.
- This is then flattened into a vector of size 49.
- A nonlinear transformation (`tanh(2*sigma*X)`) is applied to rescale interaction values to (-1, 1). `sigma` is a tunable hyperparameter; we used a default value of 1.
- This ensures balanced feature magnitudes.
- The mean-pooled LLaMA2 embedding (4096-d) is computed.
- The final representation is concatenated as `[interaction_features (49) + mean_LLaMA2 (4096)]`.
- This results in a final feature vector of size 4145 per sample.
- A small feedforward network maps features to logit scores, which are used for classification.
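The steps above can be sketched end to end as a single module. Layer sizes follow the description (4096 → 1024 compression, 7×7 interactions, 4145-d fused vector), but the class name, the hidden width of the feedforward head, and the random inputs are illustrative assumptions rather than the exact code in `DownstreamModel.py`:

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Sketch of the structured fusion and co-occurrence pooling pipeline."""

    def __init__(self, num_classes: int, sigma: float = 1.0):
        super().__init__()
        self.compress = nn.Linear(4096, 1024)  # 4096 -> 1024 per LLaMA2 layer
        self.sigma = sigma
        # Small feedforward head; 512 hidden units is an assumed width.
        self.head = nn.Sequential(
            nn.Linear(49 + 4096, 512), nn.ReLU(), nn.Linear(512, num_classes)
        )

    def forward(self, llama_feats, bert_cls, roberta_cls):
        # llama_feats: (B, 5, 4096); bert_cls, roberta_cls: (B, 1024)
        z = self.compress(llama_feats)                       # (B, 5, 1024)
        stacked = torch.cat(
            [z, bert_cls.unsqueeze(1), roberta_cls.unsqueeze(1)], dim=1
        )                                                    # (B, 7, 1024)
        inter = torch.bmm(stacked, stacked.transpose(1, 2))  # (B, 7, 7) dot products
        inter = torch.tanh(2 * self.sigma * inter)           # rescale to (-1, 1)
        inter = inter.flatten(1)                             # (B, 49)
        mean_llama = llama_feats.mean(dim=1)                 # (B, 4096)
        fused = torch.cat([inter, mean_llama], dim=1)        # (B, 4145)
        return self.head(fused)                              # (B, num_classes) logits

model = FusionClassifier(num_classes=32)  # 32 labels for ED
logits = model(
    torch.randn(2, 5, 4096), torch.randn(2, 1024), torch.randn(2, 1024)
)
print(logits.shape)  # torch.Size([2, 32])
```

Computing the full 7×7 interaction matrix rather than concatenating the seven vectors keeps the fused representation compact (49 interaction values instead of 7168) while still capturing cross-model feature co-occurrence.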