Welcome! This repository offers a two-part guide designed to demystify the internal workings and training lifecycle of modern Large Language Models (LLMs), focusing on the Transformer architecture. We aim to bridge the gap between abstract concepts and concrete examples by visualizing a real model's parameters and explaining how such models learn.
- Part 1 explores and visualizes the architecture, parameters, and dynamic attention mechanisms of the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model to build intuition.
- Part 2 provides a conceptual overview of the LLM training lifecycle, including pre-training, fine-tuning strategies (SFT, alignment with RLHF/GRPO), knowledge distillation, and parameter-efficient techniques (PEFT/LoRA).
This guide is intended for:
- Students learning about AI, Machine Learning, and Natural Language Processing (NLP).
- Developers curious about the models they interact with.
- Researchers looking for practical ways to inspect model internals or understand training paradigms.
- Anyone seeking a deeper understanding of how LLMs function and learn.
Basic familiarity with Python is assumed. Key concepts are explained within the notebooks.
This guide covers the following key areas across two notebooks:
Part 1: Architecture & Visualization (`LLM_Architecture_Visualization.ipynb`)
- Foundations: Core ML/ANN concepts, parameters.
- Input: Tokenization, Token Embeddings (visualized).
- Transformer Blocks: Self-Attention (QKV, Multi-Head, GQA context), Position-wise Feed-Forward Networks (FFN using SwiGLU), Layer Normalization, Residual Connections (components visualized for Layer 0, Middle, Last).
- Output: Final Normalization, Language Modeling Head (visualized, weight tying checked).
- Analysis: Parameter statistics across layers, dynamic attention pattern heatmaps, aggregate weight visualizations (Q, K, V, O, FFN projections across all layers). A minimal loading-and-inspection sketch follows this list.
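For orientation, the block below sketches the kind of inspection Part 1 performs: loading the model with `transformers`, counting parameters, and pulling one layer's attention weights for a short prompt. It is a minimal sketch, not the notebook's actual cells; the prompt and variable names are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,     # half precision: the 1.5B weights fit in roughly 4 GB
    device_map="auto",             # place layers on the GPU if one is available
    attn_implementation="eager",   # eager attention so attention weights can be returned
)

# Parameter statistics (Part 1 breaks these down per layer and per component).
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params / 1e9:.2f}B")

# Run a short prompt and capture the attention maps for visualization.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped (batch, num_heads, seq_len, seq_len).
layer0_attn = outputs.attentions[0][0]
print(f"Layer 0 attention shape: {tuple(layer0_attn.shape)}")
```

Each tensor in `outputs.attentions` can be moved to the CPU and passed to `seaborn.heatmap` to produce attention heatmaps of the kind shown in the notebook.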
Part 2: Training & Fine-tuning Concepts (`LLM_Training_Lifecycle.ipynb`)
- Pre-training: Building foundational knowledge (Next-Token Prediction).
- Knowledge Distillation: Context for the specific `DeepSeek-R1-Distill` model.
- Fine-tuning: Supervised Fine-tuning (SFT) / Instruction Tuning.
- Alignment Tuning: Concepts of RLHF/PPO, GRPO.
- Efficiency: Parameter-Efficient Fine-tuning (PEFT), focusing on LoRA (a brief LoRA sketch follows this list).
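Part 2 stays conceptual, but as a rough illustration of what the PEFT/LoRA section describes, a LoRA adapter could be attached to this model with the Hugging Face `peft` library (not installed or used by the notebooks). The rank, alpha, and target module names below are illustrative choices; `q_proj`/`v_proj` assume the Qwen2-style attention projections used by this model.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

# Low-rank adapters on the attention projections; the base weights stay frozen.
lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling factor for the update
    target_modules=["q_proj", "v_proj"],   # Qwen2 attention projection names
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base, lora_config)
peft_model.print_trainable_parameters()    # typically well under 1% of all weights
```

With this configuration only the adapter matrices receive gradients, which is why LoRA fine-tuning fits on far smaller GPUs than full fine-tuning.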
- Part 1: Architecture & Visualization
- Part 2: Training & Fine-tuning Concepts
- Model ID: `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`
- Link: [Hugging Face Model Card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)
- Open in Colab: Click the "Open In Colab" badges in the Notebooks section above.
- Select Runtime (Part 1): Use a GPU accelerator in Colab (Runtime -> Change runtime type -> T4 GPU) for best performance with model loading and visualization. Part 2 is conceptual and does not need one.
- Run Cells Sequentially: Execute the notebook cells in order.
- Explore: Read the explanations and observe the generated outputs and visualizations in Part 1. Note: The aggregate weight plots near the end of Part 1 can be very resource-intensive (RAM/CPU) and may take significant time to render or cause slowdowns.
- Python libraries: `transformers`, `torch`, `accelerate`, `matplotlib`, `seaborn`, `numpy` (installed by the notebook).
- Internet connection (for model download).
- Sufficient RAM (>= 12GB recommended) and GPU VRAM (>= 8GB recommended); a quick environment check is sketched below.
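If you are unsure whether your runtime meets these recommendations, a quick check along the following lines (a hypothetical snippet, not part of the notebooks) can flag an undersized GPU before the model download starts:

```python
import torch

# Report the detected GPU and its memory before loading the 1.5B model.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
    if vram_gb < 8:
        print("Warning: less than the recommended 8 GB of VRAM.")
else:
    print("No GPU detected; Part 1 will be slow and may exhaust RAM on CPU.")
```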
- Part 1 Focus: Primarily architecture, parameters, and attention visualization. It excludes runnable training code and activation analysis. The aggregate plots are resource-heavy.
- Part 2 Focus: Conceptual explanations only; no runnable training code.
- Model Specificity: While core concepts are general, some implementation details relate to the specific Qwen/DeepSeek model.