This is a Node.js implementation of a sliding-window n-gram Markov model. It is built to parse the Discord data package format, but could be adapted to other datasets.
Requires Node.js
Requires you to request your data from Discord, which you can do under User Settings -> Data & Privacy -> Request Your Data. It is recommended to only ask for messages, as that's all this project needs to function.
After receiving your messages from Discord, they should be in this format:
index.json
./c34546745847
./c345357567882
./c2342341253467
...
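To confirm that an extracted package matches the layout above before training, a quick check along these lines may help. This is only a sketch: `inspectMessagesFolder` is a hypothetical helper, not part of this project, and it assumes index.json sits in the same folder as the `c...` folders, as in the listing.

```javascript
const fs = require('node:fs');

// Hypothetical helper (not part of this project): reports whether a folder
// looks like the layout shown above, i.e. an index.json file next to
// folders whose names are "c" followed by digits.
function inspectMessagesFolder(messagesDir) {
  const entries = fs.readdirSync(messagesDir, { withFileTypes: true });
  const hasIndex = entries.some((e) => e.isFile() && e.name === 'index.json');
  const channelFolders = entries
    .filter((e) => e.isDirectory() && /^c\d+$/.test(e.name))
    .map((e) => e.name)
    .sort();
  return { hasIndex, channelFolders };
}
```

Calling `inspectMessagesFolder('./messages')` returns an object such as `{ hasIndex, channelFolders }`, where the path `./messages` is just an example location.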
Run the following command to start the parsing and training process, then answer the prompts to configure the paths to your index.json and messages folder.
node setup
After that, you can use your model just by running the following command:
node chat
This will open a chat session with the model.
Since this is a text prediction model, it needs a phrase to start with. If it cannot predict anything from that input, it will just output the input phrase.
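For readers curious how this kind of prediction works in general, here is a minimal sketch of a sliding-window n-gram Markov model. It is not the code in model.js: the names `tokenize`, `train`, and `generate`, the whitespace tokenizer, the `maxTokens` limit, and the injectable `random` parameter are all assumptions made for illustration.

```javascript
// Illustrative sketch only; none of these names are taken from model.js.

// Split text into lowercase word tokens (a deliberately naive tokenizer).
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// Slide a window of `context` tokens over each message and record
// which token followed each window.
function train(messages, context) {
  const table = new Map();
  for (const message of messages) {
    const tokens = tokenize(message);
    for (let i = 0; i + context < tokens.length; i++) {
      const key = tokens.slice(i, i + context).join(' ');
      if (!table.has(key)) table.set(key, []);
      table.get(key).push(tokens[i + context]);
    }
  }
  return table;
}

// Extend the starting phrase one token at a time until no continuation
// is known for the current window or maxTokens is reached.
function generate(table, context, startingPhrase, maxTokens = 20, random = Math.random) {
  const output = tokenize(startingPhrase);
  while (output.length < maxTokens) {
    const key = output.slice(-context).join(' ');
    const candidates = table.get(key);
    if (!candidates) break; // nothing learned for this window
    output.push(candidates[Math.floor(random() * candidates.length)]);
  }
  return output;
}
```

If the starting phrase never appeared in the training data, the loop exits immediately and the result is just the tokens of the input phrase, which mirrors the fallback behavior described above.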
Okay, so you've trained your model and you want to use its output in another project. Thankfully, this is quite easy with Node.js.
Firstly, you need your model.json file (of course!). Then the only two files you need from this project are tokenize.js and model.js. model.js uses tokenize.js, which is why both are needed.
Finally, import the following functions from model.js:
const { generateTokens, stringifyOutput } = require('./model');
These functions take the following parameters:
generateTokens(context, startingPhrase)
- context: Number of tokens to look back on in order to predict the next word.
- startingPhrase: Phrase to build upon.
stringifyOutput(array)
- array: Array of tokens to be formatted as a string (use generateTokens() as input).
generateTokens() returns an array of tokens, and stringifyOutput() formats that array as a string for display. For example:
stringifyOutput(generateTokens(3, "Hello there"))
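In a larger project, it may be convenient to wrap the two calls in one function. The sketch below is one possible way to do that: `makeResponder` is a hypothetical name, not something this project exports, and it assumes both functions are synchronous, as the one-line example above suggests.

```javascript
// Hypothetical helper; makeResponder is not part of this project.
// `model` is whatever require('./model') returns. It is passed in as an
// argument so the helper can also be tried out with a stub object.
function makeResponder(model, context = 3) {
  return function respond(startingPhrase) {
    const tokens = model.generateTokens(context, startingPhrase);
    return model.stringifyOutput(tokens);
  };
}

// Usage in another project (assumes model.js, tokenize.js and model.json
// have been copied next to this file):
// const respond = makeResponder(require('./model'));
// console.log(respond('Hello there'));
```

Passing the module in, rather than requiring it inside the helper, keeps the context size in one place and makes the wrapper easy to test without a trained model.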