Repository for the final project of "Practical Machine Learning for Natural Language Processing" (University of Vienna, Summer Semester 2024).
This project presents a comparative analysis of modern Natural Language Processing (NLP) methodologies for the task of detecting sexist language in tweets. It evaluates the performance and cost of five distinct approaches:
- Fine-Tuned BERT Model for sequence classification. ('germeval.ipynb')
- Zero-Shot and Few-Shot Prompting using DeepSeek and GPT APIs. ('comparisonAPI.ipynb')
- Retrieval-Augmented Generation (RAG) using a VectorStoreIndex. ('comparisonVectorStoreIndex.ipynb')
- Retrieval-Augmented Generation (RAG) using a KeywordTableIndex. ('comparisonKeywordTableIndex.ipynb')
- Combining RAG pipelines with few-shot prompting. ('comparisonAPIRAG.ipynb')
Detecting sexist language is a key challenge in studying online discourse. Automated methods can help researchers analyze large-scale corpora of social media data to understand the prevalence and nature of sexist rhetoric. However, the nuanced and context-dependent nature of such language makes it a difficult classification problem.
The performance of each method was evaluated and compared on a standardized dataset (with dedicated train, development, and test sets) using the F1-score as the primary metric. The implementation details for each approach are as follows:
- Fine-Tuned BERT (bert-base-uncased): Implemented using the Hugging Face transformers library, optimized for sequence classification on our specific dataset.
- API Prompting (DeepSeek & GPT): Leveraged zero-shot and few-shot prompting strategies to query large language models via their APIs for classification, testing their ability to generalize with little to no task-specific training.
- RAG Pipeline (VectorStoreIndex): Implemented a RAG system using LlamaIndex's VectorStoreIndex to retrieve the most semantically similar examples from the training set to provide context for an LLM's classification decision.
- RAG Pipeline (KeywordTableIndex): Implemented an alternative RAG system using a KeywordTableIndex to retrieve examples based on keyword matching, providing a different retrieval strategy for comparison.
- Combined RAG and API Prompting: Paired the two retrieval strategies above (VectorStoreIndex and KeywordTableIndex) with few-shot prompting of the DeepSeek and GPT APIs, so that retrieved training examples provide the in-context demonstrations.
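To illustrate the few-shot prompting strategy described above, the sketch below assembles a classification prompt with labelled in-context examples before the tweet to classify. The function name, example tweets, and label strings are hypothetical and only illustrate the idea; they are not taken from the project's actual notebooks or dataset.

```python
# Hypothetical sketch of few-shot prompt construction for sexism
# classification. Example tweets and labels are invented for
# illustration; the real prompts live in comparisonAPI.ipynb.

FEW_SHOT_EXAMPLES = [
    ("Women belong in the kitchen, not in the office.", "sexist"),
    ("The weather in Vienna is lovely today.", "not sexist"),
]

def build_few_shot_prompt(tweet: str) -> str:
    """Assemble a prompt that inlines labelled examples, then asks
    the model to label the new tweet."""
    lines = ["Classify the following tweet as 'sexist' or 'not sexist'.", ""]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Tweet: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Tweet: {tweet}")
    lines.append("Label:")  # the LLM completes this line with its prediction
    return "\n".join(lines)

prompt = build_few_shot_prompt("Some tweet to classify.")
```

The resulting string would be sent as the user message of an API request; a zero-shot variant simply omits the example pairs.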
The best overall performance was achieved by the API-based approaches (particularly DeepSeek), both with and without RAG. The fine-tuned BERT model achieved middling results, while the pipelines relying on RAG alone yielded the lowest F1-scores.
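For reference, the F1-score used as the primary metric above can be computed for the positive ("sexist") class as follows. This is a minimal plain-Python sketch with hypothetical label strings; the project would more likely use `sklearn.metrics.f1_score` in practice.

```python
def f1_score(y_true, y_pred, positive="sexist"):
    """Binary F1 for the positive class: harmonic mean of
    precision and recall over true/false positives and negatives."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with one true positive, one false positive, and one false negative, precision and recall are both 0.5, giving an F1 of 0.5.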