# Diabl0

A hybrid world model implementation that combines symbolic tokenization with continuous data processing through a NumPy-based transformer architecture. Built on a simple PyGame foundation, Diabl0 demonstrates how neural architectures can learn emergent representations from gameplay interactions.
Diabl0 processes game state through a complete machine learning pipeline:
- Multi-Stream Tokenization: Separates static environment, dynamic entities, and discrete events into efficient token streams
- Continuous Data Capture: Stores raw sensor data (positions, velocities) in optimized HDF5 format
- Hybrid Fusion: Combines symbolic tokens and continuous data into unified 464-dimensional vectors (see the sketch after this list)
- Transformer Encoding: 6-layer self-attention encoder with anti-collapse mechanisms produces contextualized embeddings
- Provenance Tracking: Maintains causal chains and lineage for interpretability
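To make the fusion step concrete, here is a minimal NumPy sketch of the dimension bookkeeping (illustrative only; the actual logic lives in `core/hybrid/fusion.py` and its API may differ):

```python
import numpy as np

# Hypothetical per-frame inputs, matching the dimensions described above:
static_emb  = np.random.randn(112)   # STATIC stream embedding
dynamic_emb = np.random.randn(112)   # DYNAMIC stream embedding
event_emb   = np.random.randn(112)   # EVENT stream embedding
continuous  = np.random.randn(128)   # MLP/CNN-encoded continuous features

# 3 x 112D symbolic + 128D continuous = 464D fused vector
fused = np.concatenate([static_emb, dynamic_emb, event_emb, continuous])
assert fused.shape == (464,)         # matches the transformer's d_model
```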
## Quick Start

```bash
# Install dependencies
pip install -r requirements.txt

# Download emoji sprites (first time only)
python download_emojis.py

# Run the game
python game.py
```
## Controls
- **WASD**: Move player
- **E**: Start new episode
- **ESC**: Quit
## Architecture
### Data Pipeline
```mermaid
graph TD
  A[Game State] --> B["Multi-Stream Tokenizer<br/>(STATIC/DYNAMIC/EVENT)"]
  B --> C["Hybrid Sequence Builder<br/>(concatenate + encode)"]
  C --> D["464D Fused Vectors<br/>(3×112D symbolic + 128D continuous)"]
  D --> E["Transformer Encoder<br/>(6 layers, 8 heads, 2048D FFN)"]
  E --> F["Contextualized Embeddings (464D)"]
```
### Core Components

**Tokenization** (`core/`)
- `tokenizer.py` - Vocabulary and base tokenizer
- `token_store.py` - Multi-stream token storage
- `hybrid_tokenizer.py` - Main hybrid interface

**Hybrid Processing** (`core/hybrid/`)
- `embeddings.py` - Learned lookup tables for symbolic tokens
- `encoders.py` - MLP/CNN encoders for continuous data
- `fusion.py` - Stream fusion and temporal alignment

**Transformer** (`core/transformer/`)
- `encoder.py` - Full 6-layer transformer encoder
- `attention.py` - Multi-head self-attention mechanism
- `blocks.py` - Encoder blocks with residuals (rank collapse prevention)
- `layers.py` - LayerNorm, Linear, activation functions
- `positional.py` - Sinusoidal position encoding
- `diagnostics.py` - Real-time rank monitoring and alerts
- `visualization.py` - Attention and embedding visualization tools

**Storage & Provenance** (`core/`)
- `storage/episode_manager.py` - Episode lifecycle management
- `storage/continuous_store.py` - HDF5 storage for continuous data
- `lineage/provenance.py` - Token lineage and causal chains

**Game Engine**
- `game_engine.py` - Physics, collision detection, event emission
- `entities/` - Player, obstacles, base entity class
- `rendering.py` - Unified rendering system
- `sprites.py` - Asset management
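As one concrete example from the list above, here is a minimal NumPy sketch of sinusoidal position encoding in the spirit of `positional.py` (the module's exact implementation may differ):

```python
import numpy as np

def sinusoidal_positions(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal position encoding (Vaswani et al., 2017)."""
    pos = np.arange(seq_len)[:, None]    # (seq_len, 1)
    i = np.arange(d_model)[None, :]      # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    # Even dimensions get sine, odd dimensions get cosine
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

pe = sinusoidal_positions(100, 464)      # matches seq_len and d_model
```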
### Rank Collapse Prevention
The transformer implements all critical mechanisms from Dong et al. (2021):
- Residual connections in every encoder block
- FFN expansion (d_ff=2048, ≈4.4× d_model=464)
- Scaled attention (1/√d_k = 0.1313)
- Layer normalization
Effective rank stays above 82 across all layers (critical threshold: 58).
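For illustration, a single-head sketch of the scaled-attention-plus-residual pattern (weight shapes and initialization here are illustrative; the project's `attention.py` and `blocks.py` add multi-head projections, LayerNorm, and the FFN):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 100, 464, 58        # d_k = 464 / 8 heads = 58

x  = rng.standard_normal((seq_len, d_model))
Wq = rng.standard_normal((d_model, d_k)) * 0.02
Wk = rng.standard_normal((d_model, d_k)) * 0.02
Wv = rng.standard_normal((d_model, d_k)) * 0.02
Wo = rng.standard_normal((d_k, d_model)) * 0.02

q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = (q @ k.T) / np.sqrt(d_k)           # scaling: 1/sqrt(58) ≈ 0.1313
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax

x = x + (weights @ v) @ Wo   # residual connection: the key anti-collapse
                             # mechanism from Dong et al. (2021)
```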
### Multi-Stream Architecture

Three separate token streams keep storage efficient and semantics clean (see the sketch after this list):
- STATIC: Emitted once (environment, obstacles)
- DYNAMIC: Delta-based emission (player, enemies)
- EVENT: Always emitted (inputs, collisions)
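A minimal sketch of the DYNAMIC delta policy (the names here are illustrative, not the actual `core/token_store.py` API; STATIC tokens would be emitted once at episode start, and EVENT tokens unconditionally):

```python
# Remember each entity's last emitted state to detect deltas
_last_state = {}

def emit_dynamic(entity_id, state):
    """Emit a token only when an entity's state has changed."""
    if _last_state.get(entity_id) == state:
        return []                          # unchanged: nothing emitted
    _last_state[entity_id] = state
    return [f"DYN:{entity_id}:{state}"]

print(emit_dynamic(1, (0, 0)))   # ['DYN:1:(0, 0)']  first sighting
print(emit_dynamic(1, (0, 0)))   # []                no change, no token
print(emit_dynamic(1, (3, 0)))   # ['DYN:1:(3, 0)']  delta detected
```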
### Real-Time Console Monitoring

Diagnostic tools track model health in real time (see the sketch after this list):
- Rank analysis with configurable thresholds
- Attention pattern detection (dead/collapsed heads)
- Token similarity monitoring
- ASCII visualizations for console output (dashboard-ready data export available)
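For reference, one standard entropy-based definition of effective rank, sketched in NumPy; whether `diagnostics.py` uses this exact formula is an assumption:

```python
import numpy as np

def effective_rank(activations: np.ndarray) -> float:
    """exp(entropy of normalized singular values) -- a common soft rank measure."""
    s = np.linalg.svd(activations, compute_uv=False)
    p = s / s.sum()                        # normalized singular values
    entropy = -np.sum(p * np.log(p + 1e-12))
    return float(np.exp(entropy))

layer_output = np.random.randn(100, 464)   # (seq_len, d_model) activations
rank = effective_rank(layer_output)
if rank < 58:                              # critical threshold noted above
    print(f"rank collapse warning: {rank:.1f}")
```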
## Testing

```bash
# Quick transformer validation (5 core tests)
python tests/test_transformer_quick.py

# Full transformer tests (10 tests)
python tests/test_transformer.py

# Edge case tests (7 tests)
python tests/test_edge_cases.py

# Integration testing
python tests/test_integration_1000frames.py --frames 1000 --verbose

# Stress test suite
python tests/test_integration_1000frames.py --stress
```
All tests include rank preservation validation, performance benchmarks, and stability checks.
## Performance

Current benchmarks (NumPy on CPU):

- Single frame: ~1.3 ms average (meets the <10 ms target)
- Batch processing: 105-142 ms/sample (optimization in progress)
- Throughput: 7-10 samples/sec at batch size 8
- Effective rank: 82-169 across all 6 layers (no collapse)
## Project Structure

```
diabl0/
├── core/
│   ├── entities/       # Game entities (player, obstacles)
│   ├── hybrid/         # Hybrid tokenization components
│   ├── transformer/    # NumPy transformer implementation
│   ├── storage/        # Episode and continuous data management
│   ├── lineage/        # Token provenance tracking
│   └── *.py            # Game engine, tokenizer, rendering
├── assets/sprites/     # PNG sprite assets
├── docs/               # Implementation reports and documentation
├── data/episodes/      # Generated training data
├── game.py             # Main game loop
├── tests/              # Test suites (test_*.py)
└── requirements.txt
```
## Configuration

Transformer:

```python
from core.transformer import TransformerConfig

config = TransformerConfig(
    d_model=464,    # Matches hybrid fusion output
    num_heads=8,    # 8 attention heads
    num_layers=6,   # Encoder depth
    d_ff=2048,      # ~4.4x expansion for rank preservation
    seq_len=100,    # Context window
)
```
Hybrid tokenizer:

```python
from core.hybrid_tokenizer import HybridTokenizerConfig

config = HybridTokenizerConfig(
    sequence_length=100,     # Frames per sequence
    overlap_frames=20,       # Window overlap
    enable_continuous=True,  # Capture continuous data
    enable_lineage=True,     # Track provenance
)
```
## Documentation

Comprehensive documentation is available in `docs/`:

- `transformer_report.md` - Complete implementation report with theoretical validation
- `dimension_fix_summary.md` - Detailed explanation of the dimension calculation fix
- Implementation references to Vaswani et al. (2017) and Dong et al. (2021)
## Installation

```bash
# Clone repository
git clone <repository-url>
cd diabl0

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download sprites
python download_emojis.py

# Run tests
python tests/test_transformer.py
```
## Design Notes

- Pure NumPy implementation (no PyTorch/TensorFlow) for educational clarity
- CPU-only execution (intentional design choice for learning/debugging)
- Comprehensive inline documentation
- All critical sections marked with "CRITICAL" comments
- Type hints for all public APIs
- Configuration-based design (parameterized values)
## Extending

- New Entity Types: Extend the `Entity` base class in `core/entities/` (sketched below)
- New Encoders: Add to `core/hybrid/encoders.py`
- Custom Fusion: Modify `StreamFusion` in `core/hybrid/fusion.py`
- Monitoring: Extend `RankCollapseMonitor` in `core/transformer/diagnostics.py`
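For example, a hypothetical new entity type; the hook names (`update`, `on_collision`) are assumptions about the `Entity` interface, not its confirmed API:

```python
from core.entities import Entity   # assumed import path

class Coin(Entity):
    """A collectible that marks itself collected on contact."""

    def __init__(self, x, y):
        super().__init__(x, y)
        self.collected = False

    def update(self, dt):
        pass                       # static collectible: no physics

    def on_collision(self, other):
        self.collected = True      # engine would emit a pickup EVENT token
```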
## Contributing

All contributions should include:
- Unit tests for new components
- Integration tests for pipeline changes
- Rank preservation validation
- Performance benchmarks
Run the full test suite before submitting:
```bash
python tests/test_transformer_quick.py && \
python tests/test_edge_cases.py && \
python tests/test_transformer.py && \
python tests/test_integration_1000frames.py --frames 100
```
## Requirements

- Python 3.8+
- NumPy ≥1.24
- PyGame ≥2.5
- h5py ≥3.0 (for continuous data storage)
- requests (for emoji sprite downloader)
## References

Theoretical Foundation:
- Vaswani et al. (2017): "Attention Is All You Need"
- Dong et al. (2021): "Attention is not all you need: pure attention loses rank doubly exponentially with depth"
- He et al. (2016): "Deep Residual Learning for Image Recognition"
- Ba et al. (2016): "Layer Normalization"
Implementation Principles:
- Residual connections prevent rank collapse
- FFN expansion maintains representational capacity
- Layer normalization ensures training stability
- Provenance tracking enables interpretability
## License

[Your License Here]
Built with educational intent - heavily commented for learning purposes.