# Diabl0

A hybrid world model implementation that combines symbolic tokenization with continuous data processing through a NumPy-based transformer architecture. Built on a simple PyGame foundation, Diabl0 demonstrates how neural architectures can learn emergent representations from gameplay interactions.
Diabl0 processes game state through a complete machine learning pipeline:
- Multi-Stream Tokenization: Separates static environment, dynamic entities, and discrete events into efficient token streams
- Continuous Data Capture: Stores raw sensor data (positions, velocities) in optimized HDF5 format
- Hybrid Fusion: Combines symbolic tokens and continuous data into unified 464-dimensional vectors (see the sketch after this list)
- Transformer Encoding: 6-layer self-attention encoder with anti-collapse mechanisms produces contextualized embeddings
- Provenance Tracking: Maintains causal chains and lineage for interpretability
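To make the fusion step concrete, here is a minimal NumPy sketch of the dimension bookkeeping (illustrative only; the actual logic lives in `core/hybrid/fusion.py` and its API may differ):

```python
import numpy as np

# Hypothetical per-frame inputs, matching the dimensions described above:
static_emb  = np.random.randn(112)   # STATIC stream embedding
dynamic_emb = np.random.randn(112)   # DYNAMIC stream embedding
event_emb   = np.random.randn(112)   # EVENT stream embedding
continuous  = np.random.randn(128)   # MLP/CNN-encoded continuous features

# 3 x 112D symbolic + 128D continuous = 464D fused vector
fused = np.concatenate([static_emb, dynamic_emb, event_emb, continuous])
assert fused.shape == (464,)         # matches the transformer's d_model
```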
## Quick Start

```bash
# Install dependencies
pip install -r requirements.txt

# Download emoji sprites (first time only)
python download_emojis.py

# Run the game
python game.py
```
## Controls
- **WASD**: Move player
- **E**: Start new episode
- **ESC**: Quit
## Architecture
### Data Pipeline
```mermaid
graph TD
  A[Game State] --> B["Multi-Stream Tokenizer<br/>(STATIC/DYNAMIC/EVENT)"]
  B --> C["Hybrid Sequence Builder<br/>(concatenate + encode)"]
  C --> D["464D Fused Vectors<br/>(3×112D symbolic + 128D continuous)"]
  D --> E["Transformer Encoder<br/>(6 layers, 8 heads, 2048D FFN)"]
  E --> F["Contextualized Embeddings (464D)"]
```
### Core Components

**Tokenization** (`core/`)
- `tokenizer.py` - Vocabulary and base tokenizer
- `token_store.py` - Multi-stream token storage
- `hybrid_tokenizer.py` - Main hybrid interface

**Hybrid Processing** (`core/hybrid/`)
- `embeddings.py` - Learned lookup tables for symbolic tokens
- `encoders.py` - MLP/CNN encoders for continuous data
- `fusion.py` - Stream fusion and temporal alignment

**Transformer** (`core/transformer/`)
- `encoder.py` - Full 6-layer transformer encoder
- `attention.py` - Multi-head self-attention mechanism
- `blocks.py` - Encoder blocks with residuals (rank collapse prevention)
- `layers.py` - LayerNorm, Linear, activation functions
- `positional.py` - Sinusoidal position encoding
- `diagnostics.py` - Real-time rank monitoring and alerts
- `visualization.py` - Attention and embedding visualization tools

**Storage & Provenance** (`core/`)
- `storage/episode_manager.py` - Episode lifecycle management
- `storage/continuous_store.py` - HDF5 storage for continuous data
- `lineage/provenance.py` - Token lineage and causal chains

**Game Engine**
- `game_engine.py` - Physics, collision detection, event emission
- `entities/` - Player, obstacles, base entity class
- `rendering.py` - Unified rendering system
- `sprites.py` - Asset management
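As one concrete example from the list above, here is a minimal NumPy sketch of sinusoidal position encoding in the spirit of `positional.py` (the module's exact implementation may differ):

```python
import numpy as np

def sinusoidal_positions(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal position encoding (Vaswani et al., 2017)."""
    pos = np.arange(seq_len)[:, None]    # (seq_len, 1)
    i = np.arange(d_model)[None, :]      # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    # Even dimensions get sine, odd dimensions get cosine
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

pe = sinusoidal_positions(100, 464)      # matches seq_len and d_model
```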
### Rank Collapse Prevention
The transformer implements all critical mechanisms from Dong et al. (2021):
- Residual connections in every encoder block
- FFN expansion (d_ff=2048, ≈4.4× d_model=464)
- Scaled attention (1/√d_k = 0.1313)
- Layer normalization
Effective rank stays above 82 across all layers (critical threshold: 58).
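For illustration, a single-head sketch of the scaled-attention-plus-residual pattern (weight shapes and initialization here are illustrative; the project's `attention.py` and `blocks.py` add multi-head projections, LayerNorm, and the FFN):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 100, 464, 58        # d_k = 464 / 8 heads = 58

x  = rng.standard_normal((seq_len, d_model))
Wq = rng.standard_normal((d_model, d_k)) * 0.02
Wk = rng.standard_normal((d_model, d_k)) * 0.02
Wv = rng.standard_normal((d_model, d_k)) * 0.02
Wo = rng.standard_normal((d_k, d_model)) * 0.02

q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = (q @ k.T) / np.sqrt(d_k)           # scaling: 1/sqrt(58) ≈ 0.1313
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax

x = x + (weights @ v) @ Wo   # residual connection: the key anti-collapse
                             # mechanism from Dong et al. (2021)
```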
### Multi-Stream Architecture

Three separate token streams keep storage efficient and semantics clean (see the sketch after this list):
- STATIC: Emitted once (environment, obstacles)
- DYNAMIC: Delta-based emission (player, enemies)
- EVENT: Always emitted (inputs, collisions)
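A minimal sketch of the DYNAMIC delta policy (the names here are illustrative, not the actual `core/token_store.py` API; STATIC tokens would be emitted once at episode start, and EVENT tokens unconditionally):

```python
# Remember each entity's last emitted state to detect deltas
_last_state = {}

def emit_dynamic(entity_id, state):
    """Emit a token only when an entity's state has changed."""
    if _last_state.get(entity_id) == state:
        return []                          # unchanged: nothing emitted
    _last_state[entity_id] = state
    return [f"DYN:{entity_id}:{state}"]

print(emit_dynamic(1, (0, 0)))   # ['DYN:1:(0, 0)']  first sighting
print(emit_dynamic(1, (0, 0)))   # []                no change, no token
print(emit_dynamic(1, (3, 0)))   # ['DYN:1:(3, 0)']  delta detected
```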
### Real-Time Console Monitoring

Diagnostic tools track model health in real time (see the sketch after this list):
- Rank analysis with configurable thresholds
- Attention pattern detection (dead/collapsed heads)
- Token similarity monitoring
- ASCII visualizations for console output (dashboard-ready data export available)
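For reference, one standard entropy-based definition of effective rank, sketched in NumPy; whether `diagnostics.py` uses this exact formula is an assumption:

```python
import numpy as np

def effective_rank(activations: np.ndarray) -> float:
    """exp(entropy of normalized singular values) -- a common soft rank measure."""
    s = np.linalg.svd(activations, compute_uv=False)
    p = s / s.sum()                        # normalized singular values
    entropy = -np.sum(p * np.log(p + 1e-12))
    return float(np.exp(entropy))

layer_output = np.random.randn(100, 464)   # (seq_len, d_model) activations
rank = effective_rank(layer_output)
if rank < 58:                              # critical threshold noted above
    print(f"rank collapse warning: {rank:.1f}")
```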
## Testing

```bash
# Quick transformer validation (5 core tests)
python tests/test_transformer_quick.py

# Full transformer tests (10 tests)
python tests/test_transformer.py

# Edge case tests (7 tests)
python tests/test_edge_cases.py

# Integration testing
python tests/test_integration_1000frames.py --frames 1000 --verbose

# Stress test suite
python tests/test_integration_1000frames.py --stress
```
All tests include rank preservation validation, performance benchmarks, and stability checks.
## Performance

Current benchmarks (NumPy on CPU):

- Single frame: ~1.3 ms average (meets the <10 ms target)
- Batch processing: 105-142 ms/sample (optimization in progress)
- Throughput: 7-10 samples/sec at batch size 8
- Effective rank: 82-169 across all 6 layers (no collapse)
## Project Structure

```
diabl0/
├── core/
│   ├── entities/       # Game entities (player, obstacles)
│   ├── hybrid/         # Hybrid tokenization components
│   ├── transformer/    # NumPy transformer implementation
│   ├── storage/        # Episode and continuous data management
│   ├── lineage/        # Token provenance tracking
│   └── *.py            # Game engine, tokenizer, rendering
├── assets/sprites/     # PNG sprite assets
├── docs/               # Implementation reports and documentation
├── data/episodes/      # Generated training data
├── game.py             # Main game loop
├── tests/              # Test suites (test_*.py)
└── requirements.txt
```
## Configuration

Transformer:

```python
from core.transformer import TransformerConfig

config = TransformerConfig(
    d_model=464,    # Matches hybrid fusion output
    num_heads=8,    # 8 attention heads
    num_layers=6,   # Encoder depth
    d_ff=2048,      # ~4.4x expansion for rank preservation
    seq_len=100,    # Context window
)
```
Hybrid tokenizer:

```python
from core.hybrid_tokenizer import HybridTokenizerConfig

config = HybridTokenizerConfig(
    sequence_length=100,     # Frames per sequence
    overlap_frames=20,       # Window overlap
    enable_continuous=True,  # Capture continuous data
    enable_lineage=True,     # Track provenance
)
```
## Documentation

Comprehensive documentation is available in `docs/`:

- `transformer_report.md` - Complete implementation report with theoretical validation
- `dimension_fix_summary.md` - Detailed explanation of the dimension calculation fix
- Implementation references to Vaswani et al. (2017) and Dong et al. (2021)
## Installation

```bash
# Clone repository
git clone <repository-url>
cd diabl0

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download sprites
python download_emojis.py

# Run tests
python tests/test_transformer.py
```
## Design Notes

- Pure NumPy implementation (no PyTorch/TensorFlow) for educational clarity
- CPU-only execution (intentional design choice for learning/debugging)
- Comprehensive inline documentation
- All critical sections marked with "CRITICAL" comments
- Type hints for all public APIs
- Configuration-based design (parameterized values)
## Extending

- New Entity Types: Extend the `Entity` base class in `core/entities/` (sketched below)
- New Encoders: Add to `core/hybrid/encoders.py`
- Custom Fusion: Modify `StreamFusion` in `core/hybrid/fusion.py`
- Monitoring: Extend `RankCollapseMonitor` in `core/transformer/diagnostics.py`
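For example, a hypothetical new entity type; the hook names (`update`, `on_collision`) are assumptions about the `Entity` interface, not its confirmed API:

```python
from core.entities import Entity   # assumed import path

class Coin(Entity):
    """A collectible that marks itself collected on contact."""

    def __init__(self, x, y):
        super().__init__(x, y)
        self.collected = False

    def update(self, dt):
        pass                       # static collectible: no physics

    def on_collision(self, other):
        self.collected = True      # engine would emit a pickup EVENT token
```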
## Contributing

All contributions should include:
- Unit tests for new components
- Integration tests for pipeline changes
- Rank preservation validation
- Performance benchmarks
Run the full test suite before submitting:
```bash
python tests/test_transformer_quick.py && \
python tests/test_edge_cases.py && \
python tests/test_transformer.py && \
python tests/test_integration_1000frames.py --frames 100
```
## Requirements

- Python 3.8+
- NumPy ≥1.24
- PyGame ≥2.5
- h5py ≥3.0 (for continuous data storage)
- requests (for emoji sprite downloader)
## References

Theoretical Foundation:
- Vaswani et al. (2017): "Attention Is All You Need"
- Dong et al. (2021): "Attention is not all you need: pure attention loses rank doubly exponentially with depth"
- He et al. (2016): "Deep Residual Learning for Image Recognition"
- Ba et al. (2016): "Layer Normalization"
Implementation Principles:
- Residual connections prevent rank collapse
- FFN expansion maintains representational capacity
- Layer normalization ensures training stability
- Provenance tracking enables interpretability
## License

[Your License Here]
Built with educational intent - heavily commented for learning purposes.