azain47/gpt-tokenizer

GPT Tokenizer

A re-creation of the byte pair encoding (BPE) algorithm used in tokenizers, along with a GPT-style regex tokenizer.
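For orientation, the core BPE training loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this repository: the function names (`get_pair_counts`, `merge`, `train_bpe`, `decode`) are hypothetical, and the sketch assumes byte-level BPE starting from the 256 raw UTF-8 byte values.

```python
from collections import Counter


def get_pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges; new ids start at 256 (assumed byte-level vocab)."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break  # fewer than two tokens left, nothing to merge
        pair = max(counts, key=counts.get)  # most frequent adjacent pair
        new_id = 256 + step
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges, ids


def decode(ids, merges):
    """Rebuild the byte vocabulary from the merges and decode ids back to text."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), idx in merges.items():  # insertion order matches training order
        vocab[idx] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")
```

Calling `train_bpe("aaabdaaabac", 3)` returns the learned merges together with a shorter id sequence, and `decode` on that sequence should give back the original string.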

  • BPEtokenizer.ipynb:

    • A follow-along notebook for Karpathy's tokenizer video.
  • BPETokenizer.py:

    • A cleaned-up version of the notebook; the BasicTokenizer from the Karpathy exercise.
  • GPTTokenizer.py:

    • The Regex Tokenizer from the Karpathy exercise, with handling of special tokens also implemented.
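The idea behind a regex tokenizer with special tokens can be sketched roughly as below. This is an illustration under assumptions, not the contents of GPTTokenizer.py: `SPLIT_PATTERN` is a simplified stand-in written for the standard-library `re` module (the GPT-2 pattern relies on Unicode classes such as `\p{L}`, which need the third-party `regex` package), the function names are hypothetical, and the special-token id is an arbitrary example value. Learned BPE merges are omitted, so ordinary text falls back to raw bytes.

```python
import re

# Simplified approximation of a GPT-style split pattern (assumption, not the
# exact GPT-2 pattern): contractions, letter runs, digit runs, punctuation
# runs, and whitespace.
SPLIT_PATTERN = re.compile(
    r"'(?:s|t|re|ve|m|ll|d)| ?[^\W\d_]+| ?\d+| ?(?:[^\s\w]|_)+|\s+(?!\S)|\s+"
)


def split_on_special(text, special_tokens):
    """Split text so that each special token becomes its own piece."""
    if not special_tokens:
        return [text] if text else []
    # Longest first, so a token that is a prefix of another does not win.
    ordered = sorted(special_tokens, key=len, reverse=True)
    pattern = "(" + "|".join(re.escape(tok) for tok in ordered) + ")"
    return [part for part in re.split(pattern, text) if part]


def encode(text, special_tokens):
    """Encode text to ids; special tokens map directly to their assigned ids."""
    ids = []
    for part in split_on_special(text, special_tokens):
        if part in special_tokens:
            ids.append(special_tokens[part])
        else:
            # Each regex chunk is handled on its own, so merges (not shown
            # here) could never cross a chunk boundary.
            for chunk in SPLIT_PATTERN.findall(part):
                ids.extend(chunk.encode("utf-8"))
    return ids


# Hypothetical special-token table; the id is an example value only.
SPECIAL = {"<|endoftext|>": 100257}
print(encode("hi<|endoftext|>yo", SPECIAL))
```

Splitting on special tokens before applying the regex keeps a string such as `<|endoftext|>` from being broken into punctuation and letter chunks.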

TO-DO

  • Implement SentencePiece