Yet another toxic comment classification
- Python 3.7 or higher
- GNU Make
- CUDA 10.2 or higher
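The requirements above can be checked before building. The following is a minimal sketch, not part of the repository: `check_requirements` is a hypothetical helper, and looking for `nvcc` on `PATH` is only an assumed proxy for a CUDA install. It checks neither the CUDA version nor whether `make` is the GNU variant.

```python
import shutil
import sys


def check_requirements():
    """Return a mapping from requirement name to whether it appears satisfied."""
    return {
        # The README asks for Python 3.7 or higher.
        "python>=3.7": sys.version_info >= (3, 7),
        # Only checks that some `make` is on PATH, not that it is GNU Make.
        "make": shutil.which("make") is not None,
        # Assumption: a CUDA toolkit install exposes `nvcc` on PATH.
        # The CUDA version (10.2 or higher) is not verified here.
        "nvcc": shutil.which("nvcc") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```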
Clone the repo to your local machine:

    git clone https://github.com/halecakir/toxic-comment-classification

Build the Python virtual environment:

    make venv/bin/activate

Fetch word vector data from multiple sources (glove, google-news, fasttext):

    make fetch_all

Train the model with the Jigsaw data:

    make train ARGS=WORD_VECTOR # WORD_VECTOR ∈ {"google.bin", "fasttext.bin", "glove.txt"}

Test the model:

    make test

Remove all model artifacts:

    make clean

- Try Attention mechanism
- Try transformer-based mechanisms
- Try hybrid (word-level + character-level) word vectors for words that have no pretrained vectors
- Try gradient clipping for exploding gradients
- Add hyperparameter optimization
- Add sanity tests
- Documentation!
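As an illustration of the hybrid word vector item in the list above, here is a minimal sketch of one possible fallback scheme: use the pretrained word-level vector when it exists, otherwise average vectors of the word's character n-grams. All names (`char_ngrams`, `ngram_vector`, `lookup`) are hypothetical, and the hashed n-gram vectors are a stand-in for learned character-level embeddings, not the project's method.

```python
import hashlib
from typing import Dict, List

DIM = 8  # toy dimensionality; must be <= 32 because a SHA-256 digest has 32 bytes


def char_ngrams(word: str, n_min: int = 3, n_max: int = 5) -> List[str]:
    """Return character n-grams of the word, padded with boundary markers."""
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def ngram_vector(ngram: str, dim: int = DIM) -> List[float]:
    """Deterministic vector in [-1, 1] for an n-gram.

    Assumption: hashing replaces learned n-gram embeddings for illustration.
    """
    digest = hashlib.sha256(ngram.encode("utf-8")).digest()
    return [(digest[i] / 255.0) * 2.0 - 1.0 for i in range(dim)]


def lookup(word: str, pretrained: Dict[str, List[float]], dim: int = DIM) -> List[float]:
    """Word-level vector if available, else the mean of character n-gram vectors."""
    if word in pretrained:
        return pretrained[word]
    grams = char_ngrams(word)
    if not grams:
        return [0.0] * dim
    vectors = [ngram_vector(g, dim) for g in grams]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


if __name__ == "__main__":
    pretrained = {"toxic": [0.5] * DIM}
    print(lookup("toxic", pretrained))   # word-level vector
    print(lookup("toxicc", pretrained))  # character-level fallback
```

In a real model the n-gram vectors would be trained jointly with the classifier, so that misspelled or obfuscated words end up near their correctly spelled forms.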
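The gradient clipping item in the list above can be sketched independently of any framework. This is a minimal illustration of clipping by global L2 norm on plain Python lists; `clip_by_global_norm` is a hypothetical name. If the model is trained with PyTorch (an assumption, since the framework is not stated here), `torch.nn.utils.clip_grad_norm_` provides the same behavior.

```python
import math
from typing import List


def clip_by_global_norm(grads: List[List[float]], max_norm: float) -> List[List[float]]:
    """Rescale all gradients so their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # Gradients already within the limit are returned as copies, unchanged.
    if total_norm <= max_norm:
        return [list(grad) for grad in grads]
    # Scaling every component by the same factor preserves the gradient direction.
    scale = max_norm / total_norm
    return [[g * scale for g in grad] for grad in grads]


if __name__ == "__main__":
    grads = [[3.0, 4.0], [12.0]]  # global L2 norm is 13
    print(clip_by_global_norm(grads, max_norm=1.0))
```

Clipping by the global norm rather than per parameter keeps the relative sizes of the gradients intact, which is why it is a common choice against exploding gradients in recurrent models.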