LLM (From Scratch)

In this repo, I'm building a large language model comparable to GPT-2, using the following resources:

  1. The book Build a Large Language Model (From Scratch), by @rasbt.
  2. The YouTube playlist Building LLMs from scratch by Vizuara.
  3. The paper Attention Is All You Need, by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin.

Notebooks

I recommend checking out the notebooks in this order. This follows the plan of the book, although some of the notebooks are now significantly more advanced than what the book itself covers.

  1. gpt: A great place to start. This notebook implements the core GPT architecture as a PyTorch module. It explains a lot of the math behind inference, but leaves out the pieces that are specific to training; a minimal sketch of the central attention block follows this list.
  2. training: Includes both a quick-and-dirty minimal training loop and a much more mature, modular training function with swappable metrics, example generation, and learning-rate schedulers; a rough sketch of that pattern also appears after the list.
  3. project_gutenberg: Follows the training notebook with a real-world(-ish) example. In this case, it trains a model on the deepmind/pg19 dataset, which includes about 28,000 English-language books in the public domain.
  4. openai: This notebook allows you to download and import pre-trained weights from OpenAI for various GPT-2 models.
  5. classifier: Fine-tunes one of the OpenAI models to classify SMS messages as either "spam" or "not spam."
  6. chat: Fine-tunes one of the OpenAI models to follow instructions, such as "Use the word 'artificial' in a sentence."
  7. grading_with_ollama: Uses a production-grade local LLM to evaluate the responses generated by the fine-tuned model in chat.ipynb. This also has the results of my experiment with pre-training a model on Project Gutenberg and then fine-tuning it to follow instructions.
  8. apple_neural: Demonstrates exporting a fine-tuned model using CoreML so you can bundle it in a Swift app.
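
As a taste of what the gpt notebook covers, here is a minimal single-head causal self-attention layer in PyTorch. This is an illustrative sketch only, not the notebook's actual code: the real model adds multi-head attention, layer norm, MLP blocks, and the surrounding embeddings.

import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    # Single-head causal self-attention, just to show the shape of the math.
    def __init__(self, d_model, context_len):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        mask = torch.triu(torch.ones(context_len, context_len), diagonal=1).bool()
        self.register_buffer("mask", mask)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / d ** 0.5                     # (b, t, t)
        scores = scores.masked_fill(self.mask[:t, :t], float("-inf"))   # no peeking ahead
        weights = torch.softmax(scores, dim=-1)
        return self.proj(weights @ v)                                   # weighted sum of values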

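The modular training function mentioned above takes a swappable metrics object; the sketch below shows the general pattern under assumed names (Metrics and train_epoch here are hypothetical, not the repo's actual API).

import torch

class Metrics:
    # Hypothetical metrics interface: anything with a log() method would do
    # (stdout, MLflow, ...).
    def log(self, step, loss):
        print(f"step {step}: loss {loss:.4f}")

def train_epoch(model, loader, optimizer, metrics, device="cpu"):
    model.train()
    for step, (inputs, targets) in enumerate(loader):
        inputs, targets = inputs.to(device), targets.to(device)
        logits = model(inputs)                       # (batch, seq, vocab)
        loss = torch.nn.functional.cross_entropy(
            logits.flatten(0, 1), targets.flatten()
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        metrics.log(step, loss.item())               # swappable reporting
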
Demo

I've bundled a model (using Git LFS), so you can try it out yourself:

python src/demo.py

As an experiment, this demo color-codes each token based on how confident the model was in it, i.e. the probability it assigned to the token it chose. It's an interesting way to see a little deeper into what the model "thinks." I had hoped it might shed some light on hallucinations, but in practice I don't think it does. A rough sketch of the idea is below.
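
The sketch below shows the gist of the confidence tracking; the actual demo.py almost certainly differs in detail (and may sample rather than pick greedily). The idea is simply to read each generated token's probability off the softmaxed logits.

import torch

def generate_with_confidence(model, token_ids, steps):
    # Greedy generation that also records the probability of each chosen token.
    confidences = []
    for _ in range(steps):
        logits = model(token_ids)                      # (1, seq_len, vocab_size)
        probs = torch.softmax(logits[0, -1], dim=-1)   # distribution over the next token
        next_id = probs.argmax()
        confidences.append(probs[next_id].item())      # how "sure" the model was
        token_ids = torch.cat([token_ids, next_id.view(1, 1)], dim=1)
    return token_ids, confidences

Each confidence value can then be mapped to a terminal color, e.g. green for high-probability tokens and red for low-probability ones.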

Current Status

  • All of the book/class material is complete.
  • I've done some serious refactors and made significant extensions beyond that, mainly to explore what a more production-quality system might look like.
  • I would love to go back and enhance some of the notes, but I may or may not ever do that.

To run

  • uv venv && source .venv/bin/activate && uv pip sync pyproject.toml
  • For CUDA support: uv pip install --group cuda
  • The openai notebook requires uv pip install --group dev
  • MLflow is optional, but recommended. If you want to run project_gutenberg without it, make sure to set metrics=training.StdoutMetrics().
  • Open the notebooks in Jupyter

MLflow dashboard

To open the MLflow dashboard (to view training metrics), run this in a terminal:

source .venv/bin/activate
mlflow ui
