Transformer Playground

A hands-on path to understanding Transformer architectures: from implementing one from scratch to adapting pre-trained models (BERT, GPT).

| Notebook | Concepts | English | Russian |
| --- | --- | --- | --- |
| 1. Tokenization and Encoding | BPE, HuggingFace Tokenizers, Collator | Kaggle / Google Colab | Kaggle / Google Colab |
| 2. Transformer Architecture | Positional Encoding, Attention (KQV, Multi-Head), Encoder-Decoder | Kaggle / Google Colab | Kaggle / Google Colab |
| 3. Functions, Metrics, Tools | BLEU, ROUGE, METEOR, WandB | Kaggle / Google Colab | Kaggle / Google Colab |
| 4. Transformer Training | LR Scheduler, Xavier Initialization, Label Smoothing, Hyperparameter Tuning | Kaggle / Google Colab | Kaggle / Google Colab |
| 5. Complete Transformer | End-to-End Implementation | Kaggle / Google Colab | Kaggle / Google Colab |
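The scaled dot-product attention covered in notebook 2 can be sketched in a few lines of NumPy. This is a minimal single-head illustration (softmax(QKᵀ/√d_k)·V), not the repository's implementation, which uses multi-head attention inside the full encoder-decoder:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_k)
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy example: 3 query positions attending over 4 key/value positions, d_k = 8
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))
k = rng.normal(size=(4, 8))
v = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(q, k, v)
# out has shape (3, 8); each row of w is a probability distribution over keys
```

Multi-head attention repeats this with several learned projections of Q, K, and V and concatenates the results.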

Progress

Basic Components

- Tokenization
- Transformer Architecture
- Functions and Tools
- Transformer Training
- Improvements and Experiments

Resources & References 📚

| Paper | Component/Method | Link |
| --- | --- | --- |
| Attention is All You Need | Transformer Architecture | arXiv:1706.03762 |
| Neural Machine Translation of Rare Words with Subword Units | Byte-Pair Encoding (BPE) | arXiv:1508.07909 |
| SentencePiece: A Simple Language-Independent Subword Tokenizer | SentencePiece Tokenizer | arXiv:1808.06226 |
| Layer Normalization | Layer Normalization | arXiv:1607.06450 |
| Adam: A Method for Stochastic Optimization | Adam Optimizer | arXiv:1412.6980 |
| SGDR: Stochastic Gradient Descent with Warm Restarts | Cosine Learning Rate Scheduler | arXiv:1608.03983 |
| Decoupled Weight Decay Regularization | AdamW Optimizer | arXiv:1711.05101 |
| Understanding the Difficulty of Training Deep Feedforward Neural Networks | Xavier/Glorot Initialization | PMLR 9:249-256 |
| Rethinking the Inception Architecture for Computer Vision | Label Smoothing | arXiv:1512.00567 |
| Practical Bayesian Optimization of Machine Learning Algorithms | Hyperparameter Tuning | arXiv:1206.2944 |
| Beam Search Strategies for Neural Machine Translation | Beam Search Decoding | arXiv:1702.01806 |
| Efficient Transformers: A Survey | Memory Optimization Techniques | arXiv:2009.06732 |
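The SGDR cosine scheduler cited above can be sketched as follows. This is a simplified version with a fixed restart period (the paper also grows the period after each restart via T_mult, omitted here); `base_lr` and `period` are illustrative values, not the repository's settings:

```python
import math

def cosine_lr(step, base_lr=1e-3, period=1000):
    """Cosine annealing with warm restarts (SGDR), fixed restart period.

    The rate starts at base_lr, decays along a half cosine toward 0 over
    `period` steps, then jumps back to base_lr (the "warm restart").
    """
    t = step % period
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t / period))

# At step 0 the rate is base_lr; halfway through a cycle it is base_lr / 2;
# at step `period` the schedule restarts at base_lr again.
```

PyTorch ships this schedule as `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`, which the training notebook's scheduler section corresponds to.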

Note: This is a living project—code and structure may evolve.
