Designed and implemented a Transformer model from scratch using NumPy, replicating the "Attention Is All You Need" architecture; ported it to PyTorch for efficient GPU training on a synthetic dataset.
This project implements the Transformer architecture from the paper "Attention Is All You Need" entirely from scratch using NumPy. Due to the computational requirements of training, the implementation was then ported to PyTorch to leverage GPU acceleration.
Source: "Attention Is All You Need"
The initial implementation follows a modular approach, constructing the Transformer step-by-step from the following key components (a short NumPy sketch of two of them follows the list):
- Embedding Layer: Converts token indices into dense vector representations.
- Positional Encoding: Adds sinusoidal positional encodings to retain sequential information.
- Multi-Head Attention: Implements self-attention and masked self-attention (for autoregressive tasks).
- Layer Normalization: Normalizes activations for stable training.
- Feed-Forward Neural Network: Fully connected layers applied after attention.
- Encoder Layer & Encoder: Stacks multiple encoder layers.
- Decoder Layer & Decoder: Implements the decoder with masked self-attention and encoder-decoder attention.
- Dropout Layer: Regularization technique to prevent overfitting.
- Transformer Model: Combines all modules into the final Transformer architecture.
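As a rough illustration of two of these components, here is a minimal NumPy sketch of the sinusoidal positional encoding and the scaled dot-product attention at the core of multi-head attention. The function names and shapes are illustrative and may not match numpy_transformer.ipynb exactly.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings as defined in 'Attention Is All You Need'."""
    positions = np.arange(seq_len)[:, None]                # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                     # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                       # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                  # sine on even dimensions
    pe[:, 1::2] = np.cos(angles[:, 1::2])                  # cosine on odd dimensions
    return pe

def scaled_dot_product_attention(Q, K, V, mask=None):
    """softmax(QK^T / sqrt(d_k)) V, with an optional boolean mask (True = attend)."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)              # block masked positions
    scores -= scores.max(axis=-1, keepdims=True)           # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights
```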
Due to the high computational cost of training a Transformer model from scratch, the full implementation was ported to PyTorch (sketched briefly after the list below), enabling:
- Efficient GPU-accelerated training.
- Autograd support for backpropagation.
- Seamless data loading and batch processing.
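To show how one of the NumPy components above maps onto a PyTorch nn.Module, so that autograd and the GPU handle backpropagation and speed, here is an illustrative multi-head self-attention module. The class and attribute names are assumptions and need not match those used in torch_transformer.ipynb.

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads, dropout=0.1):
        super().__init__()
        assert d_model % num_heads == 0
        self.d_head = d_model // num_heads
        self.num_heads = num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # joint projection to Q, K, V
        self.out = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape each to (B, num_heads, T, d_head)
        q, k, v = (t.view(B, T, self.num_heads, self.d_head).transpose(1, 2)
                   for t in (q, k, v))
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = self.dropout(scores.softmax(dim=-1))
        out = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out(out)   # autograd tracks all ops above for backprop
```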
To validate the port (see the training-loop sketch below):
- A synthetic dataset was generated for testing the Transformer model.
- The PyTorch version was trained on this dataset to validate the implementation.
- Loss reduction and model outputs were monitored to assess training effectiveness.
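For reference, here is a minimal sketch of the kind of validation loop described above, using a toy copy task and PyTorch's built-in nn.Transformer purely as a stand-in; the actual synthetic translation data, model classes, and hyperparameters live in torch_transformer.ipynb.

```python
import torch
import torch.nn as nn

vocab_size, seq_len, batch_size, d_model = 32, 10, 64, 128
device = "cuda" if torch.cuda.is_available() else "cpu"

embed = nn.Embedding(vocab_size, d_model).to(device)
model = nn.Transformer(d_model=d_model, nhead=4, num_encoder_layers=2,
                       num_decoder_layers=2, batch_first=True).to(device)
head = nn.Linear(d_model, vocab_size).to(device)
params = list(embed.parameters()) + list(model.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=5e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(5):
    src = torch.randint(0, vocab_size, (batch_size, seq_len), device=device)
    tgt = src.clone()                                   # copy task: target == source
    tgt_in, tgt_out = tgt[:, :-1], tgt[:, 1:]           # teacher-forcing shift
    T = tgt_in.size(1)
    causal = torch.triu(torch.full((T, T), float("-inf"), device=device), diagonal=1)
    logits = head(model(embed(src), embed(tgt_in), tgt_mask=causal))
    loss = criterion(logits.reshape(-1, vocab_size), tgt_out.reshape(-1))
    optimizer.zero_grad()
    loss.backward()                                     # autograd computes all gradients
    optimizer.step()
    print(f"Epoch {epoch + 1}, Loss: {loss.item():.4f}")
```

On a toy task like this the loss should fall quickly, mirroring the pattern in the sample output below.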
To run the NumPy implementation, open the notebook in Jupyter:

```bash
jupyter notebook numpy_transformer.ipynb
```

This runs the Transformer model built using only NumPy.

To train the PyTorch version:

```bash
jupyter notebook torch_transformer.ipynb
```

Sample training output:

```
Epoch 1/50, Loss: 4.6052, LR: 0.000088
Epoch 2/50, Loss: 4.2000, LR: 0.000177
...
Epoch 50/50, Loss: 0.0500, LR: 0.000500
```

Example prediction after training:

```
Input: I like to read
Predicted translation: <start> j aime lire <end>
```
Key takeaways:
- The PyTorch implementation trained significantly faster thanks to GPU acceleration.
- The model effectively learned the synthetic dataset, confirming the correctness of the implementation.
- Comparing the NumPy and PyTorch versions highlighted the challenges of implementing deep learning architectures manually.
Planned extensions include:
- Training on real-world NLP datasets such as WMT or IWSLT.
- Implementing optimizations like FlashAttention.
- Extending the model to support larger-scale tasks.
