HistoricalEmotionAnalysis

This is the repository of my master's thesis "Computational Linguistics Approaches to Historical Emotion Analysis: Evaluating Word Embeddings, Induction Algorithms, and Lexicon Choice."

Abstract

This master’s thesis investigates potential extensions of the model for historical emotion analysis proposed by Hellrich et al. (2019). For this purpose, the model – based on the combination of historical word embedding, a VAD lexicon, and an induction algorithm – is extended by three word embeddings (PPMI, CBOW, FastText), an induction algorithm (linear regression), and a VAD lexicon (NRC-VAD). In addition, the influence of lexicon size is examined. Model performance can be successfully improved: the best model combines SGNS embedding, linear regression as induction algorithm, and the largest version of the Warriner VAD lexicon used in the original model. However, the results indicate that a stronger contemporary influence has a positive effect on historical emotion analysis. Possible causes for this contradictory outcome are discussed.

Notes

Note that (1) the following data could not be included due to size constraints:

for Section 4.1 (Preprocessing): the preprocessed corpora, the trained embeddings
for Section 4.3 (Investigating Lexicon Size): the randomly generated seed word lexica, the predicted historical vad lexica

Note that (2) the COHA could not be included as it is not open data. I did include an 'Example Corpus' and example preprocessed data (preprocessed_corpus.json and preprocessed_filtered_corpus.json are NOT the corpus files I used, they are here as example data!!), so the code can be tested. But, of course, one cannot recreate my findings with these data alone.

Note that (3), if you decide to retrain the embeddings yourself, results will vary slightly due to the non-deterministic nature of SGNS, CBOW, and FastText. While some outcomes in 4.2 and 4.3 may change, my central findings of 4.4 (discussed in 5.) still hold true.

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
Embeddings		Embeddings
Example Corpus/wlp_1830s_ksq		Example Corpus/wlp_1830s_ksq
HistoricalVAD		HistoricalVAD
VADLexica		VADLexica
LinearRegression.py		LinearRegression.py
Masterarbeit_finished_version.pdf		Masterarbeit_finished_version.pdf
PaRaSim.py		PaRaSim.py
README.md		README.md
RandomWalk.py		RandomWalk.py
correlation.py		correlation.py
functions.py		functions.py
kNN.py		kNN.py
main.ipynb		main.ipynb
model_definition.py		model_definition.py
preprocessed_corpus.json		preprocessed_corpus.json
preprocessed_filtered_corpus.json		preprocessed_filtered_corpus.json
preprocessing.py		preprocessing.py
requirements		requirements
visualizations.ipynb		visualizations.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

HistoricalEmotionAnalysis

Abstract

Notes

About

Uh oh!

Releases

Packages

Languages

NotJona/HistoricalEmotionAnalysis

Folders and files

Latest commit

History

Repository files navigation

HistoricalEmotionAnalysis

Abstract

Notes

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages