Recreating the Byte Pair Encoding algorithm used in tokenizers, along with GPT styled Regex Tokenizer.
-
- basically a follow along notebook from Karpathy's tokenizer video.
-
- a cleaned up version of the notebook, BasicTokenizer of the Karpathy exercise.
-
- Regex Tokenizer from the Karpathy Exercise, also implemented handling of special tokens.
- Implement sentence piece