LLM (From Scratch)

In this repo, I'm building a large language model comparable to GPT-2, using the following resources:

  1. The book Build a Large Language Model (From Scratch), by @rasbt.
  2. The YouTube playlist Building LLMs from scratch by Vizuara.
  3. The paper Attention Is All You Need, by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin.

Notebooks

I recommend checking out the notebooks in this order. This follows the plan of the book, although some of the notebooks are now significantly more advanced than what the book itself covers.

  1. gpt: A great place to start. This notebook implements the core GPT architecture as a PyTorch module. It explains a lot of the math behind inference, but leaves out the pieces that are specific to training; a minimal sketch of the central attention block follows this list.
  2. training: Includes both a quick-and-dirty minimal training loop and a much more mature, modular training function with swappable metrics, example generation, and learning-rate schedulers; a rough sketch of that pattern also appears after the list.
  3. project_gutenberg: Follows the training notebook with a real-world(-ish) example. In this case, it trains a model on the deepmind/pg19 dataset, which includes about 28,000 English-language books in the public domain.
  4. openai: This notebook allows you to download and import pre-trained weights from OpenAI for various GPT-2 models.
  5. classifier: Fine-tunes one of the OpenAI models to classify SMS messages as either "spam" or "not spam."
  6. chat: Fine-tunes one of the OpenAI models to follow instructions, such as "Use the word 'artificial' in a sentence."
  7. grading_with_ollama: Uses a production-grade local LLM to evaluate the responses generated by the fine-tuned model in chat.ipynb. This also has the results of my experiment with pre-training a model on Project Gutenberg and then fine-tuning it to follow instructions.
  8. apple_neural: Demonstrates exporting a fine-tuned model using CoreML so you can bundle it in a Swift app.
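
As a taste of what the gpt notebook covers, here is a minimal single-head causal self-attention layer in PyTorch. This is an illustrative sketch only, not the notebook's actual code: the real model adds multi-head attention, layer norm, MLP blocks, and the surrounding embeddings.

import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    # Single-head causal self-attention, just to show the shape of the math.
    def __init__(self, d_model, context_len):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        mask = torch.triu(torch.ones(context_len, context_len), diagonal=1).bool()
        self.register_buffer("mask", mask)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / d ** 0.5                     # (b, t, t)
        scores = scores.masked_fill(self.mask[:t, :t], float("-inf"))   # no peeking ahead
        weights = torch.softmax(scores, dim=-1)
        return self.proj(weights @ v)                                   # weighted sum of values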

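The modular training function mentioned above takes a swappable metrics object; the sketch below shows the general pattern under assumed names (Metrics and train_epoch here are hypothetical, not the repo's actual API).

import torch

class Metrics:
    # Hypothetical metrics interface: anything with a log() method would do
    # (stdout, MLflow, ...).
    def log(self, step, loss):
        print(f"step {step}: loss {loss:.4f}")

def train_epoch(model, loader, optimizer, metrics, device="cpu"):
    model.train()
    for step, (inputs, targets) in enumerate(loader):
        inputs, targets = inputs.to(device), targets.to(device)
        logits = model(inputs)                       # (batch, seq, vocab)
        loss = torch.nn.functional.cross_entropy(
            logits.flatten(0, 1), targets.flatten()
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        metrics.log(step, loss.item())               # swappable reporting
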
Demo

I've bundled a model (using Git LFS), so you can try it out yourself:

python src/demo.py

As an experiment, this demo color-codes each token based on how confident the model was in it, i.e. the probability it assigned to the token it chose. It's an interesting way to see a little deeper into what the model "thinks." I had hoped it might shed some light on hallucinations, but in practice I don't think it does. A rough sketch of the idea is below.
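
The sketch below shows the gist of the confidence tracking; the actual demo.py almost certainly differs in detail (and may sample rather than pick greedily). The idea is simply to read each generated token's probability off the softmaxed logits.

import torch

def generate_with_confidence(model, token_ids, steps):
    # Greedy generation that also records the probability of each chosen token.
    confidences = []
    for _ in range(steps):
        logits = model(token_ids)                      # (1, seq_len, vocab_size)
        probs = torch.softmax(logits[0, -1], dim=-1)   # distribution over the next token
        next_id = probs.argmax()
        confidences.append(probs[next_id].item())      # how "sure" the model was
        token_ids = torch.cat([token_ids, next_id.view(1, 1)], dim=1)
    return token_ids, confidences

Each confidence value can then be mapped to a terminal color, e.g. green for high-probability tokens and red for low-probability ones.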

Current Status

  • All of the book/class material is complete.
  • I've done some serious refactors and made significant extensions beyond that, mainly to explore what a more production-quality system might look like.
  • I would love to go back and enhance some of the notes, but I may or may not ever do that.

To run

  • uv venv && source .venv/bin/activate && uv pip sync pyproject.toml
  • For CUDA support: uv pip install --group cuda
  • The openai notebook requires uv pip install --group dev
  • MLflow is optional, but recommended. If you want to run project_gutenberg without it, make sure to set metrics=training.StdoutMetrics().
  • Open the notebooks in Jupyter

MLflow dashboard

To open the MLflow dashboard (to view training metrics), run this in a terminal:

source .venv/bin/activate
mlflow ui
