This workspace showcases what modern AI infrastructure looks like when it is written entirely on top of the Rust standard library. Every crate in the repository is dependency-free and built to be instructive: you can inspect, tweak, and extend the full stack without pulling code from crates.io. The primary purpose is educational: learning how the pieces of a transformer-based language model, autograd engine, tensor library, tokenizer, and support tooling fit together. The code is not optimized for performance or production use; it is noticeably slower than optimized libraries, but it is simple and easy to follow.
Many of the modules were written with the help of large language models (LLMs), which has a noticeable impact on the code. The overall architecture, however, was designed without LLM assistance.
Unsafe code is used liberally inside kernels (e.g., matrix multiplication) to gain some runtime performance, although that practice is better avoided in production code.
I built this project while reading the book *Build a Large Language Model from Scratch* by Sebastian Raschka, which uses PyTorch to replicate a simple LLM (GPT-2). To be a bit picky, that is not really "from scratch": PyTorch is a huge library with many dependencies, and it hides a lot of the important details of training, such as autograd and tensor operations.
So I decided to implement the same thing without any dependencies, using only the Rust standard library. It is ultimately a fun exercise; an LLM can help a lot along the way, but it is not magic: you still need to understand what is going on and guide it to produce the code you want.
- GPT-2 style transformer implemented from scratch in `neural-net`, including embeddings, multi-head attention, feed-forward blocks, layer norms, dropout, and a text generation API.
- Autograd and training utilities in `zero-grad`, supporting forward/backward passes, state dicts, and simple optimization loops.
- Tokenization, dataset, and generation tooling for loading GPT-2 BPE vocabularies (`texten`), turning raw text into training batches, and streaming generated tokens.
- Reusable building blocks like `numeric` (matrix ops), `par-iter` (Rayon-inspired parallel iterators), `small-vec`, `rand`, `json`, `json-derive`, and `zerr`, all written without external crates.
| Crate | Purpose |
| --- | --- |
| `neural-net` | End-to-end neural network stack with GPT-2 model, dataset loaders, text generation utilities, and SafeTensors/Hugging Face weight import helpers. |
| `zero-grad` | Reverse-mode autograd engine, tensor runtime, and training scaffolding used by `neural-net`. |
| `numeric` | Core tensor math and linear algebra primitives (views, matmul, reductions, etc.). |
| `par-iter` | Parallel iterator combinators (map, zip, reduce, cartesian product) built with std threads/atomics. |
| `small-vec` | Small vector optimized for stack storage with transparent heap promotion. |
| `rand` | Minimal random number utilities (e.g., `SimpleRng`) for sampling, dropout, and initialization. |
| `texten` | Zero-dependency BPE tokenizer compatible with GPT-2 vocabularies. |
| `json`, `json-derive` | JSON parser/serializer plus a derive macro crate, used for configuration and tooling. |
| `zerr` | Ergonomic error handling helpers (context chaining, threading support). |
| `json-derive-test` | Integration tests and examples for the derive macros. |
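To give a flavor of how such a building block can be written with the standard library alone, here is a conceptual sketch of the small-vector technique (inline stack storage with transparent heap promotion). It is not the actual `small-vec` implementation or API, just the underlying idea:

```rust
// Conceptual sketch only: elements stay inline on the stack until capacity N is
// exceeded, then the vector transparently promotes its contents to a heap Vec.
enum SmallVec<T, const N: usize> {
    Inline { buf: [Option<T>; N], len: usize },
    Heap(Vec<T>),
}

impl<T, const N: usize> SmallVec<T, N> {
    fn new() -> Self {
        Self::Inline { buf: std::array::from_fn(|_| None), len: 0 }
    }

    fn push(&mut self, value: T) {
        match self {
            // Fast path: there is still room in the inline buffer.
            Self::Inline { buf, len } if *len < N => {
                buf[*len] = Some(value);
                *len += 1;
            }
            // Inline buffer full: move the elements to the heap, then push.
            Self::Inline { buf, .. } => {
                let mut spilled: Vec<T> = buf.iter_mut().filter_map(Option::take).collect();
                spilled.push(value);
                *self = Self::Heap(spilled);
            }
            Self::Heap(v) => v.push(value),
        }
    }

    fn len(&self) -> usize {
        match self {
            Self::Inline { len, .. } => *len,
            Self::Heap(v) => v.len(),
        }
    }
}

fn main() {
    let mut v: SmallVec<u32, 4> = SmallVec::new();
    for i in 0u32..6 {
        v.push(i); // the fifth push promotes the inline buffer to the heap
    }
    assert_eq!(v.len(), 6);
}
```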
In addition, the `notebooks/` folder contains reference datasets (`notebooks/data`) and Jupyter notebooks that showcase training and inference workflows.
The `neural-net` crate assembles a complete GPT-2 style decoder-only transformer:
- `modules::gpt`: model builder with token/position embeddings, transformer stacks, and a language-model head.
- `modules::transformer_block` & `modules::multi_head_attention`: full attention pipeline with dropout, causal masking, and QKV projections.
- `generation::text_simple`: iterative text generation with greedy or top-k sampling, streaming support, and performance metrics.
- `generation::data_loader`: slice datasets, batching, and shuffling utilities for training.
- `state::{binary, safetensors, huggingface}`: load/save checkpoints, including conversion from Hugging Face naming.
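To make the causal-masking step concrete, here is a standalone sketch in plain Rust, deliberately independent of the crate's tensor types: each position may attend only to itself and earlier positions, so scores above the diagonal are set to negative infinity before the softmax.

```rust
// Standalone illustration of causal masking followed by a row-wise softmax.
// Operates on a plain (seq_len x seq_len) score matrix, not the crate's tensors.
fn causal_softmax(scores: &mut Vec<Vec<f32>>) {
    let seq_len = scores.len();
    for (i, row) in scores.iter_mut().enumerate() {
        // Mask out future positions (j > i) so token i cannot attend ahead.
        for j in (i + 1)..seq_len {
            row[j] = f32::NEG_INFINITY;
        }
        // Numerically stable softmax: subtract the row max of the unmasked prefix.
        let max = row[..=i].iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let exp: Vec<f32> = row.iter().map(|&s| (s - max).exp()).collect();
        let sum: f32 = exp.iter().sum();
        for (w, e) in row.iter_mut().zip(exp) {
            *w = e / sum;
        }
    }
}

fn main() {
    // Raw attention scores for a 3-token sequence, before masking.
    let mut scores = vec![
        vec![0.2, 0.9, 0.4],
        vec![0.1, 0.3, 0.8],
        vec![0.5, 0.5, 0.5],
    ];
    causal_softmax(&mut scores);
    for row in &scores {
        println!("{row:?}"); // masked entries become 0.0 after the softmax
    }
}
```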
Because everything lives in this workspace, you can inspect the entire inference/training path: tensors come from `numeric`, gradients from `zero-grad`, scheduling from `par-iter`, tokenization from `texten`, and errors bubble through `zerr`.
```rust
use neural_net::generation::TextGenerator;
use neural_net::modules::GPTModelBuilder;
use neural_net::TensorResult;
use zero_grad::Tensor;

fn main() -> TensorResult<()> {
    // 1. Build a small GPT model (adjust hyperparameters as needed)
    let model = GPTModelBuilder::new()
        .vocab_size(50257)
        .context_length(128)
        .emb_dim(768)
        .n_heads(12)
        .n_layers(12)
        .drop_rate(0.0)
        .build::<f32>()?;

    // 2. Prepare an input prompt (batch size = 1)
    let prompt_ids = Tensor::from(vec![[50256_f32, 318., 257., 2211.]])?; // ~ "hello gpt2"

    // 3. Generate additional tokens with greedy decoding
    let generated = TextGenerator::new()
        .max_new_tokens(32)
        .temperature(1.0)
        .greedy()
        .generate(&model, &prompt_ids, 128)?;

    println!(
        "Generated token ids: {:?}",
        generated.elements().collect::<Vec<_>>()
    );
    Ok(())
}
```
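The hyperparameters above correspond to the GPT-2 small (124M parameter) configuration: a 50,257-token BPE vocabulary, 768-dimensional embeddings, 12 attention heads, and 12 transformer layers. Only the context length differs; GPT-2 uses 1024, while 128 is used here to keep the example small.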
See `neural-net/examples/step.rs` for a runnable training demo that:

- Loads the GPT-2 BPE vocabulary from `notebooks/data/gpt2` using the `texten` tokenizer.
- Creates a `SliceDataset` from `notebooks/data/dataset/verdict.txt` with stride-based sampling.
- Builds the GPT model and runs a single optimizer step with `zero_grad::train`.
Run the example:

```bash
cargo run -p neural-net --release --example step -- ./notebooks/data
```
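The stride-based sampling behind the `SliceDataset` step can be pictured as sliding a fixed-size window over the token stream, with the target being the input shifted right by one token. The sketch below illustrates only that windowing logic; the actual `SliceDataset` API may differ:

```rust
// Minimal sketch of stride-based (input, target) pair construction from a token
// stream for next-token prediction. Illustrative only; not the SliceDataset API.
fn make_pairs(tokens: &[u32], context_len: usize, stride: usize) -> Vec<(Vec<u32>, Vec<u32>)> {
    let mut pairs = Vec::new();
    let mut start = 0;
    // Each window of `context_len` tokens is an input; the target is the same
    // window shifted right by one token.
    while start + context_len + 1 <= tokens.len() {
        let input = tokens[start..start + context_len].to_vec();
        let target = tokens[start + 1..start + context_len + 1].to_vec();
        pairs.push((input, target));
        start += stride;
    }
    pairs
}

fn main() {
    let tokens: Vec<u32> = (0..10).collect();
    // context_len = 4, stride = 2 -> windows starting at positions 0, 2, 4
    for (x, y) in make_pairs(&tokens, 4, 2) {
        println!("input: {x:?} -> target: {y:?}");
    }
}
```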
- Install Rust (stable toolchain is sufficient).
- Clone the repository and navigate into it.
- Verify everything builds and the tests pass:
```bash
cargo test
```
Download a SafeTensors GPT-2 model (e.g., from Hugging Face) and place it at `.data/model.safetensors`. Then run text generation:

```bash
cargo run --release --example safetensors -- .data/model.safetensors
```
It will be pretty slow, but you should see generated tokens printed to the console as they are produced.
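For context, the SafeTensors format itself is easy to read with the standard library alone: the file starts with an 8-byte little-endian header length, followed by a JSON header describing each tensor's dtype, shape, and byte offsets, followed by the raw tensor data. The sketch below just dumps that header and is independent of the `state::safetensors` loader in this repository:

```rust
// Std-only sketch of reading a SafeTensors header. The file layout is:
// 8-byte little-endian u64 header length, then the JSON header, then raw data.
use std::fs::File;
use std::io::Read;

fn main() -> std::io::Result<()> {
    let path = std::env::args().nth(1).expect("usage: <model.safetensors>");
    let mut file = File::open(path)?;

    // First 8 bytes: header length as a little-endian u64.
    let mut len_bytes = [0u8; 8];
    file.read_exact(&mut len_bytes)?;
    let header_len = u64::from_le_bytes(len_bytes) as usize;

    // Next `header_len` bytes: UTF-8 JSON mapping tensor names to
    // {"dtype", "shape", "data_offsets"} entries (offsets are relative to the
    // data section that follows the header).
    let mut header = vec![0u8; header_len];
    file.read_exact(&mut header)?;
    println!("{}", String::from_utf8_lossy(&header));

    Ok(())
}
```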
The notebooks in `notebooks/` mirror some of these flows and include exploratory analysis, GPT fine-tuning experiments, and profiling notes.
- Transparency & auditability – every line is here, so you can review and modify the full stack.
- Educational value – learn how tensors, autograd, attention, tokenization, and training loops are implemented under the hood.
- Portability – minimal compile times and no external supply-chain risk.
- Experimentation playground – swap components (e.g., attention variants, optimizers) without fighting opaque upstream implementations.