## Table of Contents

- Project Goals
- Project Description
- Architecture & Tech Stack
- Key Resources
- Prerequisites
- Getting Started
- Usage
- TrackMania Setup (Optional)
- Testing
- Documentation
- Project Structure
- Algorithm Overview
- Troubleshooting
- Contributing
- Team
- License
DeepTactics TrackMania is a student-driven project exploring Reinforcement Learning (RL) in the racing game TrackMania. Our goal is to design, train, and visualize agents capable of completing tracks, improving over time, and eventually outperforming the human players in our group.

*Video (`trackmania-agent.mp4`):* our Rainbow DQN agent navigating a TrackMania track.
## Project Goals

- **Main Goal:** Build an RL system that can successfully complete a TrackMania track.
- **Subgoals:**
  - Achieve competitive performance on challenging tracks
  - Visualize trained agents playing inside the game
  - Document training progress and results
  - Support both local and HPC cluster training
## Project Description

We train RL agents using multiple deep Q-learning methods in TrackMania and various Gymnasium environments. The project emphasizes:
- Implementing state-of-the-art RL algorithms from scratch (DQN, IQN, Rainbow).
- Building shared knowledge through research workshops and collaborative development.
- Using Weights & Biases dashboards to monitor training progress and metrics.
- Combining technical learning with social team-building.
- Ensuring every team member can understand, modify, and train agents independently.
## Architecture & Tech Stack

**Environments:**
- Gymnasium (v1.2.2+): LunarLander, CarRacing, CartPole, Acrobot, MountainCar, Ant (MuJoCo)
- TMRL (custom fork): Real TrackMania game integration via RTGym interface
**RL Algorithms Implemented:**
- DQN: Deep Q-Network with optional Dueling + Prioritized Replay + Double DQN
- IQN: Implicit Quantile Networks with distributional RL and noisy exploration
- Rainbow DQN: Combines IQN, Dueling, PER, DDQN, N-step returns, and noisy layers
- CarRacing variant: CNN-based for image observations
- TrackMania variant: Multi-input (images + car features + action history)
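To make the multi-input idea concrete, here is a minimal PyTorch sketch of such a network, assuming a stack of grayscale frames, a small car-feature vector, and a flattened one-hot action history; layer sizes and input shapes are illustrative assumptions, not the project's exact architecture (see `src/agents/rainbow_tm.py` for the real one):

```python
import torch
import torch.nn as nn


class MultiInputQNet(nn.Module):
    """Illustrative multi-input Q-network: image stack + car features + action history."""

    def __init__(self, n_actions: int, act_buf_len: int = 4, hidden_dim: int = 256):
        super().__init__()
        # Image branch: 4 stacked grayscale frames (assumed 84x84) -> conv features
        self.conv = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        conv_out = 32 * 9 * 9  # follows from the assumed 84x84 input resolution
        # Car-feature branch (e.g. speed, gear, rpm - assumed to be 3 values)
        self.car_mlp = nn.Sequential(nn.Linear(3, hidden_dim), nn.ReLU())
        # Action-history branch: last act_buf_len actions, one-hot and flattened
        self.act_mlp = nn.Sequential(nn.Linear(act_buf_len * n_actions, hidden_dim), nn.ReLU())
        # Fused Q-head over the concatenated features
        self.head = nn.Sequential(
            nn.Linear(conv_out + 2 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_actions),
        )

    def forward(self, images, car_feats, action_history):
        fused = torch.cat(
            [self.conv(images), self.car_mlp(car_feats), self.act_mlp(action_history)], dim=1
        )
        return self.head(fused)
```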
**Tech Stack:**
- Framework: PyTorch 2.7.0+, TorchRL 0.10.1+
- Experiment Tracking: Weights & Biases (WandB)
- Environment Manager: UV (Python package manager)
- Development Tools: Pre-commit hooks, pytest, Git
- Deployment: SLURM cluster support (NTNU HPC with V100 GPUs)
## Key Resources

**Research Papers:**
- DQN Paper (Mnih et al.) - Original Deep Q-Network
- Dueling DQN (Wang et al.) - Value/Advantage decomposition
- Prioritized Experience Replay (Schaul et al.)
- IQN Paper (Dabney et al.) - Implicit Quantile Networks
- Rainbow DQN (Hessel et al.) - Combining improvements
- IMPALA (Espeholt et al.) - CNN architecture
**Frameworks & Tools:**
- TMRL Framework - TrackMania RL interface
- Gymnasium Documentation - Environment library
- Linesight RL (YouTube) - RL tutorials
- TMUnlimiter - TrackMania tools
## Prerequisites

- Git: Version control system. Download Git
- Python 3.13+: Required for the project. Download Python
- UV: Python package and environment manager. Install UV
- CUDA (recommended): For GPU-accelerated training on Windows/Linux
- TrackMania 2020 (optional): Required only for TrackMania training
  - Free version available at trackmania.com
  - OpenPlanet Plugin: Required for TMRL integration. Download
  - See TMRL Setup Guide for complete installation instructions
## Getting Started

Clone the repository:

```bash
git clone https://github.com/CogitoNTNU/DeepTactics-TrackMania.git
cd DeepTactics-TrackMania
```

The project uses uv for dependency management. PyTorch will be installed with the appropriate backend:
- Windows: CUDA 13.0
- Linux: ROCm 6.4 (AMD GPUs)
- macOS: CPU-only
Install dependencies and set up pre-commit hooks:

```bash
uv sync
uv run pre-commit install
```

Generate configuration files from templates:
```bash
uv run create_configs.py
```

This creates:

- `config_files/config.py` - Configuration for Gymnasium environments
- `config_files/tm_config.py` - Configuration for TrackMania
All training settings are controlled from a single config file - no need to edit TMRL's config.json or other files!
Edit the generated config files to customize your training:
For Gymnasium environments (`config_files/config.py`):

```python
# Environment selection
env_name = "LunarLander-v3" # Or: CarRacing-v3, CartPole-v1, etc.
# Algorithm selection
use_DQN = False # Standard DQN
use_IQN = True # Implicit Quantile Networks (recommended)
# Algorithm features
use_dueling = True # Dueling architecture
use_prioritized_replay = True
use_doubleDQN = True
# Hyperparameters
learning_rate = 0.0001
batch_size = 32
discount_factor = 0.997
epsilon_decay_steps = 250_000
# Network architecture
hidden_dim = 128
# Training control
checkpoint = True
resume_from_checkpoint = False  # Set True to continue from checkpoint
```

For TrackMania (`config_files/tm_config.py`):

```python
# Algorithm features
use_dueling = True
use_prioritized_replay = True
use_doubleDQN = True
# TrackMania-specific
crash_detection = True # Penalize velocity drops (crashes)
crash_threshold = 10.0 # Velocity drop threshold
crash_penalty = 10 # Penalty for crashes
# Network architecture
hidden_dim = 256
conv_channels_1 = 16
conv_channels_2 = 32
car_feature_hidden_dim = 256
action_history_hidden_dim = 256
act_buf_len = 4
# Hyperparameters
learning_rate = 0.0001
batch_size = 64
discount_factor = 0.997
n_step_buffer_len = 4
epsilon_decay_steps = 2_000_000
# Training control
checkpoint = True
checkpoint_frequency = 10 # Save every N episodes
resume_from_checkpoint = False
```

That's it! All settings are in one place. No need to configure TMRL's `config.json` separately.
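As an illustration of how the crash-detection settings above could fit together, here is a hedged sketch of velocity-drop reward shaping; the function name and exact logic are assumptions for illustration, not the project's implementation:

```python
def shape_reward(reward: float, prev_speed: float, speed: float,
                 crash_detection: bool = True, crash_threshold: float = 10.0,
                 crash_penalty: float = 10.0) -> float:
    """Penalize sudden velocity drops, which usually indicate a crash."""
    if crash_detection and (prev_speed - speed) > crash_threshold:
        reward -= crash_penalty
    return reward


# Speed fell from 95 to 40 in one step -> treated as a crash: 1.0 - 10.0 = -9.0
shaped = shape_reward(reward=1.0, prev_speed=95.0, speed=40.0)
```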
## Usage

The main entry point is `main.py`. Toggle between TrackMania and Gymnasium training:
```python
# In main.py
run_tm = False  # Set to True for TrackMania, False for Gymnasium
```

Start training:

```bash
uv run main.py
```

**Gymnasium Training** (vector and image environments):
- Automatically selects agent based on config (DQN/IQN/Rainbow)
- Supports 6 environments: LunarLander, CarRacing, CartPole, Acrobot, MountainCar, Ant
- Tracks metrics to WandB (requires login: `wandb login`)
**TrackMania Training** (Windows only):
- Requires TrackMania 2020 with OpenPlanet plugin
- Uses Rainbow agent with multi-input architecture
- Integrates replay buffer saving for crash recovery
- See TMRL Setup Guide for detailed installation instructions
Training metrics are logged to Weights & Biases:
- Episode rewards
- Q-value estimates
- Loss curves
- Epsilon decay
- Race completion times (TrackMania)
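A minimal sketch of how such metrics might be pushed to WandB from a training loop; the metric keys, project name, and the `train_one_episode` helper are illustrative assumptions, not the project's actual code:

```python
import wandb


def train_one_episode():
    # Hypothetical stand-in: the real project computes these inside its training loop.
    return 120.5, 3.2, 0.04, 0.1  # episode_reward, mean_q, loss, epsilon


wandb.init(project="deeptactics-trackmania")  # project name is an assumption

for episode in range(10):
    episode_reward, mean_q, loss, epsilon = train_one_episode()
    wandb.log(
        {"episode_reward": episode_reward, "mean_q_value": mean_q,
         "loss": loss, "epsilon": epsilon},
        step=episode,
    )
```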
Checkpoints are automatically saved to the `checkpoints/` directory:

- `checkpoint_latest.pt` - Most recent checkpoint (for resuming)
- `checkpoint_episode_N.pt` - Periodic snapshots
- `checkpoint_final.pt` - End of training
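For reference, a hedged sketch of what writing and restoring one of these files typically looks like with `torch.save`/`torch.load`; the stored keys are assumptions, and `policy_net`, `target_net`, `optimizer`, and `episode` are assumed to exist in the surrounding training loop:

```python
import torch

# Saving (e.g. every checkpoint_frequency episodes)
torch.save(
    {
        "episode": episode,
        "policy_state_dict": policy_net.state_dict(),
        "target_state_dict": target_net.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
    },
    "checkpoints/checkpoint_latest.pt",
)

# Resuming
ckpt = torch.load("checkpoints/checkpoint_latest.pt")
policy_net.load_state_dict(ckpt["policy_state_dict"])
target_net.load_state_dict(ckpt["target_state_dict"])
optimizer.load_state_dict(ckpt["optimizer_state_dict"])
start_episode = ckpt["episode"] + 1
```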
To resume from a checkpoint, set in config:

```python
resume_from_checkpoint = True
```

## TrackMania Setup (Optional)

To train agents in the actual TrackMania 2020 game, follow these additional steps:
1. **Install TrackMania 2020**
   - Download from trackmania.com (free version works)
   - Launch the game at least once to complete initial setup

2. **Install Visual C++ Runtime** (Windows prerequisite)

   Download and install from: https://aka.ms/vs/16/release/vc_redist.x64.exe

3. **Install OpenPlanet Plugin**
   - Download from openplanet.nl
   - During installation, click "More Info" → "Install Anyway" if prompted about the unsigned certificate
   - Verify installation: launch TM2020, press `F3`; you should see the OpenPlanet menu

4. **Initialize TMRL Environment**

   ```bash
   uv run python -m tmrl --install
   ```

   This creates the `~/TmrlData/` folder with:
   - Pre-trained models (for testing)
   - OpenPlanet plugins (`TMRL_GrabData.op`, `TMRL_SaveGhost.op`)
   - Configuration templates
   - Test maps

5. **Verify OpenPlanet Integration**
   - Launch TrackMania 2020 and any track
   - Press `F3` → `Developer` → `(Re)load plugin` → `TMRL Grab Data`
   - You should see "waiting for incoming connection" in `OpenPlanet > Log`

6. **Configure Training Settings**

   All training settings are in `config_files/tm_config.py` - you don't need to edit TMRL's `config.json`! Our project uses a custom Rainbow DQN agent instead of TMRL's default SAC. All hyperparameters, network architecture, and training settings are controlled from the single config file:

   ```python
   # In config_files/tm_config.py

   # Algorithm features
   use_dueling = True
   use_prioritized_replay = True
   crash_detection = True

   # Network architecture
   hidden_dim = 256
   batch_size = 64

   # Training hyperparameters
   learning_rate = 0.0001
   discount_factor = 0.997
   epsilon_decay_steps = 2_000_000

   # See config_files/static_tm_config.py for all available options
   ```

7. **Run Training**

   ```python
   # In main.py, set:
   run_tm = True
   ```

   ```bash
   # Then start training:
   uv run main.py
   ```
For complete installation instructions, configuration options, troubleshooting, and advanced features, see:
**Complete TMRL Setup Guide**
This includes:
- Linux installation (Steam/Proton)
- Network configuration for distributed training
- Custom reward function recording
- Performance tuning
- Security configuration (TLS/passwords)
- Environment types (FULL vs LIDAR)
## Testing

Run the test suite:

```bash
uv run pytest --doctest-modules --cov=src --cov-report=html
```

View coverage report:
```bash
open htmlcov/index.html   # macOS/Linux
start htmlcov/index.html  # Windows
```

## Documentation

Build and preview the documentation site locally:
```bash
uv run mkdocs build
uv run mkdocs serve
```

This starts a local server at http://127.0.0.1:8000/ with the docs and API reference.
View the latest published documentation: https://cogitontnu.github.io/DeepTactics-TrackMania/
## Project Structure

```
DeepTactics-TrackMania/
├── src/
│   ├── agents/
│   │   ├── DQN.py                # Deep Q-Network agent
│   │   ├── IQN.py                # Implicit Quantile Networks
│   │   ├── rainbow.py            # Rainbow for CarRacing
│   │   └── rainbow_tm.py         # Rainbow for TrackMania
│   ├── helper_functions/
│   │   ├── tm_actions.py         # TrackMania action mapping
│   │   ├── tm_checkpointing.py   # Checkpoint utilities
│   │   └── ant_wrappers.py       # Discrete action wrapper for Ant
│   ├── env.py                    # Gymnasium training script
│   └── env_tm.py                 # TrackMania training script
├── config_files/
│   ├── static_config.py          # Gymnasium config template
│   └── static_tm_config.py       # TrackMania config template
├── main.py                       # Main entry point
├── create_configs.py             # Config generation script
├── run_slurm.slurm               # SLURM cluster deployment
└── pyproject.toml                # Dependencies and project metadata
```
## Algorithm Overview

### DQN

Standard Q-learning with neural network approximation. Supports:
- Dueling architecture: Separates value and advantage streams
- Prioritized Experience Replay: Samples important transitions more frequently
- Double DQN: Reduces overestimation bias
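To illustrate the Double DQN point, a minimal sketch of the target computation (select the next action with the online network, evaluate it with the target network); the function and argument names are illustrative:

```python
import torch


@torch.no_grad()
def double_dqn_targets(policy_net, target_net, rewards, next_states, dones, gamma=0.997):
    """Double DQN target: r + gamma * Q_target(s', argmax_a Q_policy(s', a)) for non-terminal steps."""
    next_actions = policy_net(next_states).argmax(dim=1, keepdim=True)   # action selection (online net)
    next_q = target_net(next_states).gather(1, next_actions).squeeze(1)  # action evaluation (target net)
    return rewards + gamma * next_q * (1.0 - dones)
```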
### IQN

Distributional RL that learns the full distribution of Q-values:
- Quantile regression: More stable than expectation-based methods
- Noisy layers: Built-in exploration without epsilon-greedy
- Cosine embedding: Encodes quantile values
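The cosine embedding maps each sampled quantile fraction τ to a feature vector via cos(π·i·τ), which is then typically passed through a linear layer and merged with the state features. A hedged sketch (the embedding size is an illustrative assumption):

```python
import math
import torch


def cosine_quantile_embedding(taus: torch.Tensor, embedding_dim: int = 64) -> torch.Tensor:
    """Embed quantile fractions taus in [0, 1] as cos(pi * i * tau) for i = 0..embedding_dim-1."""
    i = torch.arange(embedding_dim, dtype=taus.dtype, device=taus.device)
    return torch.cos(math.pi * i * taus.unsqueeze(-1))  # shape: (..., embedding_dim)


taus = torch.rand(32, 8)               # 8 quantile samples per state for a batch of 32
phi = cosine_quantile_embedding(taus)  # shape: (32, 8, 64)
```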
### Rainbow DQN

Combines multiple improvements for state-of-the-art performance:
- IQN (distributional RL)
- Dueling architecture
- Prioritized Experience Replay
- Double DQN
- N-step returns
- Noisy layers for exploration
- TrackMania variant: Multi-input (images + car state + action history)
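Of these pieces, n-step returns are easy to show in isolation: the single-step bootstrap target is replaced by an n-step discounted reward sum. A hedged sketch matching `n_step_buffer_len = 4` from the config (the transition layout is an assumption):

```python
from collections import deque


def n_step_transition(buffer: deque, gamma: float = 0.997):
    """Collapse a window of (state, action, reward, next_state, done) tuples into one n-step transition."""
    state, action = buffer[0][0], buffer[0][1]
    n_step_reward, next_state, done = 0.0, buffer[-1][3], buffer[-1][4]
    for k, (_, _, reward, _, step_done) in enumerate(buffer):
        n_step_reward += (gamma ** k) * reward
        if step_done:  # episode ended inside the window: bootstrap from this step instead
            next_state, done = buffer[k][3], True
            break
    return state, action, n_step_reward, next_state, done
```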
"ModuleNotFoundError: No module named 'tmrl'"
- TMRL is only required for TrackMania training
- Set `run_tm = False` in `main.py` to use Gymnasium environments instead
"CUDA out of memory"
- Reduce `batch_size` in config (e.g., from 64 to 32)
- Use a smaller `hidden_dim` (e.g., 128 instead of 256)
- Close other GPU-intensive applications
**Training diverges / Q-values explode**

- The project uses hard target network updates by default (`tau = 1.0` in config)
- Hard updates copy the entire policy network to the target network periodically
- For soft updates, set `tau` to a small value (e.g., `0.001` or `0.005`); see the sketch after this list for how the update is applied:
  - `tau = 1.0` → Hard update (full copy, recommended and default)
  - `tau = 0.005` → Soft update (θ_target = 0.005·θ_policy + 0.995·θ_target)
  - `tau = 0.0` → No update (target network never changes)
- Warning: soft updates (`tau < 1.0`) can cause training instability and Q-value divergence
- Use a constant learning rate (no decay scheduler) for stable training
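A hedged sketch of how the `tau` setting translates into a target-network update, written as a generic Polyak update (the function name and call site are assumptions; `tau = 1.0` reduces to a full hard copy):

```python
import torch


@torch.no_grad()
def update_target(policy_net: torch.nn.Module, target_net: torch.nn.Module, tau: float = 1.0) -> None:
    """Polyak update: theta_target <- tau * theta_policy + (1 - tau) * theta_target."""
    for p_policy, p_target in zip(policy_net.parameters(), target_net.parameters()):
        p_target.mul_(1.0 - tau).add_(tau * p_policy)
```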
**WandB not logging**

- Run `wandb login` and enter your API key
- Set `wandb_logging = True` in config
- Check your internet connection
**TrackMania connection issues**

- Ensure TrackMania 2020 is running with the OpenPlanet plugin installed
- Reload the `TMRL Grab Data` plugin (`F3` → `Developer` → `(Re)load plugin`)
- Check that the TMRL server is configured correctly
## Contributing

We welcome contributions! This is a learning-focused project where collaboration is key.

**Guidelines:**
- Every line of code should be understandable by all team members
- Document your changes and explain the reasoning
- Run tests before submitting: `uv run pytest`
- Format code with pre-commit hooks: `uv run pre-commit run --all-files`
**Areas for Contribution:**
- Algorithm improvements and hyperparameter tuning
- New environment support
- Visualization tools
- Documentation and tutorials
## Team

This project would not have been possible without the hard work and dedication of all contributors. Thank you for the time and effort you have put into making this project a reality.

- Ludvig Øvrevik - Project Lead
- Brage Kvamme - Project Lead
- Edvard Klavenes - Project Member
- Henrik Øen - Project Member
- Simen Førdestrøm Verhoeven - Project Member
- Eldar Alvik - Project Member
- Kristoffer Seyffarth - Project Member
## License

Distributed under the MIT License. See LICENSE for more information.