To get started with this module, you will need the following requirement: uv
The recommended way to install is with the installation script:
curl -LsSf https://raw.githubusercontent.com/nhamilakis/Lexical-benchmark/refs/heads/dev/install.sh | sh
This script performs all the required setup and creates a local environment with all requirements installed.
This module is mostly a library, meant to be imported to perform the various tasks and to create analytics from the datasets. Two different mediums are also provided:
A set of scripts in the src/scripts/ folder that perform most of the computations.
TBA...
A set of notebooks for exploring the data and plotting analytics.
TBA...
If you are working on a Slurm cluster, all scripts used to run the experiments can be found in src/slurm_scripts/.
For more detailed information on each dataset, see its page:
- ChildRealistic (TBA)
- Details on the word lists used for word validation: wordlist