Train your own sliding window n-gram Markov prediction model based on your discord messages

Snorfield/Train-Your-Own-Prediction-Model


Train-Your-Own-Prediction-Model

This is a Node.js implementation of a sliding window n-gram Markov model. It is built to parse the format of the Discord data package, but could be readily adapted to other datasets.
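To make the idea concrete, here is a minimal sketch of how a sliding window n-gram table can be built. It is not this project's code: the function name `buildTable`, the whitespace tokenizer, and the `Map` layout are all assumptions made for illustration.

```javascript
// Hypothetical sketch, not this project's model.js or tokenize.js.
// Slides a window of `context` tokens over the text and counts which
// token follows each window.
function buildTable(text, context) {
  // Naive tokenizer (assumption): lowercase, split on whitespace.
  const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
  const table = new Map();

  for (let i = 0; i + context < tokens.length; i++) {
    const key = tokens.slice(i, i + context).join(' ');
    const next = tokens[i + context];

    if (!table.has(key)) table.set(key, new Map());
    const counts = table.get(key);
    counts.set(next, (counts.get(next) || 0) + 1);
  }
  return table;
}

const table = buildTable('the cat sat on the mat and the cat ran', 2);
// "the cat" was followed once by "sat" and once by "ran".
console.log(table.get('the cat'));
```

A larger `context` makes the predictions more specific, but fewer windows from the training text will match a given input.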

Prerequisites

Requires Node.js

Requires your Discord data package, which you can request under User Settings -> Data & Privacy -> Request Your Data. Requesting only your messages is recommended, as that is all this project needs to function.

Usage

Once you receive your messages from Discord, they should be laid out in this format:

index.json
./c34546745847
./c345357567882
./c2342341253467
...

Run the following command to start the parsing and training process, then answer the prompts to configure the paths to your index.json and messages folder.

node setup

After that, you can use your model just by running the following command:

node chat

This will open a chat session with the model.

Since this is a text prediction model, it needs a phrase to start with. If it cannot predict anything from that input, it simply outputs the input phrase.
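The fallback behaviour can be pictured with a small sketch. This is not the project's implementation: the `transitions` table, the `complete` function, and the `maxTokens` limit are hypothetical, and the table is written by hand rather than trained.

```javascript
// Hypothetical sketch of "echo the input when nothing can be predicted".
// A tiny hand-written table stands in for a trained model (assumption).
const transitions = new Map([
  ['hello there', ['general']],
  ['there general', ['kenobi']],
]);

function complete(phrase, context, maxTokens = 20) {
  const output = phrase.split(/\s+/).filter(Boolean);

  for (let i = 0; i < maxTokens; i++) {
    // Look up the last `context` tokens, ignoring case.
    const key = output.slice(-context).join(' ').toLowerCase();
    const candidates = transitions.get(key);
    if (!candidates) break; // nothing known follows this window

    const pick = Math.floor(Math.random() * candidates.length);
    output.push(candidates[pick]);
  }
  return output.join(' ');
}

console.log(complete('Hello there', 2));  // Hello there general kenobi
console.log(complete('Good morning', 2)); // Good morning
```

In the second call no window matches, so the loop exits immediately and the starting phrase comes back unchanged.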

How to Use This In A Project

Okay, so you've trained your model and you want to use its output in another project. Thankfully, this is quite easy with Node.js.

First, you need your model.json file (of course!). Then the only two files you need from this project are tokenize.js and model.js. model.js uses tokenize.js, which is why both are needed.

Finally, import the following functions from model.js:

const { generateTokens, stringifyOutput } = require('./model');

These functions take the following parameters:

generateTokens(context, startingPhrase)
  • context: Number of preceding tokens to look back at when predicting the next token.

  • startingPhrase: Phrase to build upon.

stringifyOutput(array)
  • array: Array of tokens to be formatted as a string (use generateTokens() as input).

generateTokens() returns an array of tokens, and stringifyOutput() can be used to format this array of tokens to display. For example:

stringifyOutput(generateTokens(3, "Hello there"))
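In a larger project it can be convenient to wrap the two calls in a single helper. The sketch below is one possible way to do that: `createResponder` and `reply` are hypothetical names, and returning an empty string for blank input is an assumption of this sketch, not behaviour of model.js.

```javascript
// Hypothetical helper; `model` is expected to expose generateTokens and
// stringifyOutput as described above, e.g. require('./model').
function createResponder(model, context = 3) {
  return function reply(startingPhrase) {
    const phrase = String(startingPhrase).trim();
    // Assumption: skip the model entirely when there is nothing to build on.
    if (phrase === '') return '';

    const tokens = model.generateTokens(context, phrase);
    return model.stringifyOutput(tokens);
  };
}

// Usage in your own project (assumes model.js can find tokenize.js and model.json):
// const reply = createResponder(require('./model'));
// console.log(reply('Hello there'));
```

Passing the model in as an argument keeps the helper independent of where model.js lives in your project.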
