Unofficial PyTorch implementation of the paper "Efficient Estimation of Word Representations in Vector Space", with code for training the model and for demonstrating the properties of the trained embeddings. The focus is on the Skip-gram model only.
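For readers new to Skip-gram: the model learns embeddings by predicting the context words that surround each center word. Below is a minimal sketch of how such (center, context) training pairs might be generated with a fixed window. The function name skipgram_pairs, the window size, and the sample tokens are illustrative assumptions and are not taken from this repository's notebooks.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions.

    Hypothetical helper for illustration; not the repository's own code.
    """
    pairs = []
    for i, center in enumerate(tokens):
        # Clamp the window to the boundaries of the token list.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Made-up sample sentence, window of one word on each side.
pairs = skipgram_pairs(["soft", "cotton", "shirt"], window=1)
for center, context in pairs:
    print(center, "->", context)
```

Each pair becomes one training example: the center word is the input and the context word is the prediction target. A larger window tends to capture more topical similarity, a smaller one more syntactic similarity.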
Key files:
word2vec.pth is a model pre-trained on the Amazon Fashion dataset with a 4000-word vocabulary.
inference.ipynb contains the playground and demonstrates some properties of the model.
train.ipynb trains word2vec from scratch. Use it if you want to customize the training process yourself.
extra/cloud.svg shows a t-SNE visualization of the most distinct word clusters.
git clone https://github.com/tejpaper/word2vec.git
cd word2vec
pip install -r requirements.txt
Word clusters shown: Emotions and feelings, Family, Seasons, Numbers, Colors, Body parts, Clothes, Sizes.
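Clusters like these appear because words used in similar contexts end up with nearby vectors. The sketch below shows one common way to inspect that: ranking neighbors by cosine similarity. The three-dimensional toy vectors and the helper names cosine and nearest are made up for illustration. They are not the embeddings in word2vec.pth, and this is not necessarily how inference.ipynb performs the lookup.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, embeddings, k=2):
    """Return the k words most similar to `word`, excluding the word itself."""
    query = embeddings[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [other for other, _ in scored[:k]]


# Toy embeddings (assumed values): two colors and two seasons.
toy = {
    "red": [0.9, 0.1, 0.0],
    "blue": [0.8, 0.2, 0.1],
    "winter": [0.1, 0.9, 0.2],
    "summer": [0.0, 0.8, 0.3],
}

print(nearest("red", toy, k=1))
print(nearest("winter", toy, k=1))
```

With these toy values, each color's closest neighbor is the other color and each season's closest neighbor is the other season, which is the same grouping effect the t-SNE plot makes visible in two dimensions.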
MIT