kbulutozler/nlp-createML

nlp-createML

This repository collects documented examples of NLP applications built with Apple's Create ML.

Sequence Classification

The data can be found here. It consists of highly polar movie reviews. After downloading and extracting the file, you will have a directory named aclImdb. Put this directory inside this repository.

  • If you don't want to change or redo the preprocessing, go to Datasets/MLTextClassifier/ for ready-to-use dataset files.
  • If you do, use the notebook MLTextClassifier_preprocessing.ipynb to extract the training and testing datasets in JSON format. Change the dataset path to the location of your aclImdb folder.
  • The notebook creates two files: training.json and testing.json. Open Xcode and add these files to the Resources folder of the project.
  • In Xcode, you can train either the default model, which is maximum entropy, or the transfer learning model. My results are in the table:
| Model | Accuracy (%) |
|---|---|
| Original Best* | 88.89 |
| maxEnt | 87.52 |
| transferLearning | 74.45 |
\* Best result reported in the paper, under the column Our Dataset.
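
The preprocessing step above can be sketched in a few lines of Python. This is a minimal sketch, not the code in MLTextClassifier_preprocessing.ipynb: it assumes the standard aclImdb layout (train/pos, train/neg, test/pos, test/neg, one review per .txt file), and the JSON keys "text" and "label" are illustrative names that must match the column names you pass to the text classifier in Xcode. The helper names `load_split` and `write_datasets` are hypothetical.

```python
import json
from pathlib import Path


def load_split(split_dir):
    """Collect one {"text", "label"} record per review file.

    Assumes split_dir contains "pos" and "neg" subfolders of .txt files.
    The key names "text" and "label" are assumptions, not fixed by Create ML.
    """
    records = []
    for label in ("pos", "neg"):
        for path in sorted(Path(split_dir, label).glob("*.txt")):
            text = path.read_text(encoding="utf-8")
            records.append({"text": text, "label": label})
    return records


def write_datasets(dataset_root, out_dir):
    """Write training.json and testing.json; return the record count per file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = {}
    for split, filename in (("train", "training.json"), ("test", "testing.json")):
        records = load_split(Path(dataset_root, split))
        (out_dir / filename).write_text(json.dumps(records), encoding="utf-8")
        counts[filename] = len(records)
    return counts
```

Calling `write_datasets("aclImdb", "Resources")` would then produce the two files in a folder named Resources; adjust both paths to your own setup.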

Token Classification

The data can be found here and here. en_train.csv contains tokens in both normalized and unnormalized form, drawn from a large collection of sentences. pos_tags.csv contains tags for the tokens whose class is "plain" in en_train.csv.

  • If you don't want to change or redo the preprocessing, go to Datasets/MLWordTagger/ for ready-to-use dataset files.
  • If you do, use the notebook MLWordTagger_preprocessing.ipynb to extract the training and testing datasets in JSON format. Change the dataset path to the location of your en_train.csv and pos_tags.csv files.
  • The notebook creates several files of different sizes, because the original dataset is large and slow to process. Since the original dataset has both unnormalized and normalized tokens, the notebook creates training and testing files for each form. Choose the files you want to use, then open Xcode and add them to the Resources folder of the project.
  • In Xcode, you can train either the default model, which is a CRF, or the transfer learning model. My results are in the table:
| Model | Training (Size) | Testing (Size) | Accuracy (%) |
|---|---|---|---|
| crf | normalized (6k) | normalized (3k) | 94.43 |
| crf | normalized (6k) | unnormalized (3k) | 90.89 |
| crf | unnormalized (6k) | normalized (3k) | 76.33 |
| crf | unnormalized (6k) | unnormalized (3k) | 93.78 |
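
A word tagger is trained on whole sentences, so the token-per-row data has to be regrouped into one record per sentence. The sketch below shows only that regrouping step and is not the code in MLWordTagger_preprocessing.ipynb. It assumes the rows have already been reduced to (sentence id, token, tag) triples ordered by sentence; how en_train.csv and pos_tags.csv are joined to get there is not shown. The JSON keys "tokens" and "labels" are illustrative and must match the column names chosen in Xcode, and `to_tagger_records` is a hypothetical helper name.

```python
import json
from itertools import groupby


def to_tagger_records(rows):
    """Group (sentence_id, token, tag) rows into one record per sentence.

    Assumes rows for the same sentence are adjacent, as in a token-per-row CSV.
    The key names "tokens" and "labels" are assumptions for illustration.
    """
    records = []
    for _, group in groupby(rows, key=lambda row: row[0]):
        group = list(group)
        records.append({
            "tokens": [token for _, token, _ in group],
            "labels": [tag for _, _, tag in group],
        })
    return records


# Made-up example rows; the tag names are placeholders, not taken from pos_tags.csv.
rows = [
    (0, "The", "DET"),
    (0, "cat", "NOUN"),
    (1, "Hello", "INTJ"),
]
records = to_tagger_records(rows)
print(json.dumps(records, indent=2))
```

Running the same function once with the unnormalized token column and once with the normalized one would give the two variants of each file described above.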
