
Cogito Project Logo

DeepTactics-TrackMania

🚗 DeepTactics-TrackMania is a student-driven project exploring Reinforcement Learning (RL) in the racing game TrackMania. Our goal is to design, train, and visualize agents capable of completing tracks, improving over time, and eventually outperforming the human players in our group.

🎥 Agent in Action

trackmania-agent.mp4

Our Rainbow DQN agent navigating a TrackMania track


🎯 Project Goals

  • Main Goal:
    Build an RL system that can successfully complete a Trackmania track.

  • Subgoals:

    • Achieve competitive performance on challenging tracks
    • Visualize trained agents playing inside the game
    • Document training progress and results
    • Support both local and HPC cluster training

🧠 Project Description

We train RL agents using multiple deep Q-learning methods in Trackmania and various Gymnasium environments. The project emphasizes:

  • Implementing state-of-the-art RL algorithms from scratch (DQN, IQN, Rainbow).
  • Building shared knowledge through research workshops and collaborative development.
  • Using Weights & Biases dashboards to monitor training progress and metrics.
  • Combining technical learning with social team-building.
  • Ensuring every team member can understand, modify, and train agents independently.

πŸ—οΈ Architecture & Tech Stack

Environments:

  • Gymnasium (v1.2.2+): LunarLander, CarRacing, CartPole, Acrobot, MountainCar, Ant (MuJoCo)
  • TMRL (custom fork): Real TrackMania game integration via RTGym interface

RL Algorithms Implemented:

  • DQN: Deep Q-Network with optional Dueling + Prioritized Replay + Double DQN
  • IQN: Implicit Quantile Networks with distributional RL and noisy exploration
  • Rainbow DQN: Combines IQN, Dueling, PER, DDQN, N-step returns, and noisy layers
    • CarRacing variant: CNN-based for image observations
    • TrackMania variant: Multi-input (images + car features + action history)

Tech Stack:

  • Framework: PyTorch 2.7.0+, TorchRL 0.10.1+
  • Experiment Tracking: Weights & Biases (WandB)
  • Environment Manager: UV (Python package manager)
  • Development Tools: Pre-commit hooks, pytest, Git
  • Deployment: SLURM cluster support (NTNU HPC with V100 GPUs)

📚 Key Resources

Research Papers:

  • DQN: Human-level control through deep reinforcement learning (Mnih et al., 2015)
  • Rainbow: Combining Improvements in Deep Reinforcement Learning (Hessel et al., 2018)
  • IQN: Implicit Quantile Networks for Distributional Reinforcement Learning (Dabney et al., 2018)
  • Beyond the Rainbow (referenced in the project description)

Frameworks & Tools:

  • Gymnasium, TMRL, PyTorch, TorchRL, Weights & Biases, UV


πŸ› οΈ Prerequisites

  • Git: Version control system. Download Git
  • Python 3.13+: Required for the project. Download Python
  • UV: Python package and environment manager. Install UV
  • CUDA (recommended): For GPU-accelerated training on Windows/Linux
  • TrackMania 2020 (optional): Required only for TrackMania training

🚀 Getting Started

1. Clone the Repository

git clone https://github.com/CogitoNTNU/DeepTactics-TrackMania.git
cd DeepTactics-TrackMania

2. Install Dependencies

The project uses uv for dependency management. PyTorch will be installed with the appropriate backend:

  • Windows: CUDA 13.0
  • Linux: ROCm 6.4 (AMD GPUs)
  • macOS: CPU-only

uv sync

3. Set Up Pre-commit Hooks (Development Only)

uv run pre-commit install

4. Create Configuration Files

Generate configuration files from templates:

uv run create_configs.py

This creates:

  • config_files/config.py - Configuration for Gymnasium environments
  • config_files/tm_config.py - Configuration for TrackMania

5. Configure Your Training

All training settings are controlled from a single config file, so there is no need to edit TMRL's config.json or any other files!

Edit the generated config files to customize your training:

For Gymnasium environments (config_files/config.py):

# Environment selection
env_name = "LunarLander-v3"  # Or: CarRacing-v3, CartPole-v1, etc.

# Algorithm selection
use_DQN = False              # Standard DQN
use_IQN = True               # Implicit Quantile Networks (recommended)

# Algorithm features
use_dueling = True           # Dueling architecture
use_prioritized_replay = True
use_doubleDQN = True

# Hyperparameters
learning_rate = 0.0001
batch_size = 32
discount_factor = 0.997
epsilon_decay_steps = 250_000

# Network architecture
hidden_dim = 128

# Training control
checkpoint = True
resume_from_checkpoint = False  # Set True to continue from checkpoint
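
As a note on epsilon_decay_steps above: a common implementation is a linear schedule, sketched below under our own assumptions (epsilon_start and epsilon_end are illustrative names, not guaranteed config keys):

# Hedged sketch: linear epsilon annealing over epsilon_decay_steps.
def epsilon_at(step, epsilon_start=1.0, epsilon_end=0.01,
               epsilon_decay_steps=250_000):
    # Fraction of the decay horizon already consumed, capped at 1.
    fraction = min(step / epsilon_decay_steps, 1.0)
    return epsilon_start + fraction * (epsilon_end - epsilon_start)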

For TrackMania (config_files/tm_config.py):

# Algorithm features
use_dueling = True
use_prioritized_replay = True
use_doubleDQN = True

# TrackMania-specific
crash_detection = True       # Penalize velocity drops (crashes)
crash_threshold = 10.0       # Velocity drop threshold
crash_penalty = 10           # Penalty for crashes

# Network architecture
hidden_dim = 256
conv_channels_1 = 16
conv_channels_2 = 32
car_feature_hidden_dim = 256
action_history_hidden_dim = 256
act_buf_len = 4

# Hyperparameters
learning_rate = 0.0001
batch_size = 64
discount_factor = 0.997
n_step_buffer_len = 4
epsilon_decay_steps = 2_000_000

# Training control
checkpoint = True
checkpoint_frequency = 10    # Save every N episodes
resume_from_checkpoint = False

That's it! All settings are in one place. No need to configure TMRL's config.json separately.
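
For intuition, the crash settings above could combine into a reward-shaping rule along these lines. This is a hedged sketch of the idea only, not the project's actual reward code:

# Penalize sudden velocity drops as a proxy for crashes.
def shaped_reward(reward, prev_speed, speed,
                  crash_detection=True, crash_threshold=10.0, crash_penalty=10):
    # A speed drop larger than crash_threshold between steps counts as a crash.
    if crash_detection and (prev_speed - speed) > crash_threshold:
        reward -= crash_penalty
    return reward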


💻 Usage

Running Training

The main entry point is main.py. Toggle between TrackMania and Gymnasium training:

# In main.py
run_tm = False  # Set to True for TrackMania, False for Gymnasium

Start training:

uv run main.py

Training Modes

Gymnasium Training (vector and image environments):

  • Automatically selects agent based on config (DQN/IQN/Rainbow)
  • Supports 6 environments: LunarLander, CarRacing, CartPole, Acrobot, MountainCar, Ant
  • Tracks metrics to WandB (requires login: wandb login)

TrackMania Training (Windows only):

  • Requires TrackMania 2020 with OpenPlanet plugin
  • Uses Rainbow agent with multi-input architecture
  • Integrates replay buffer saving for crash recovery
  • See TMRL Setup Guide for detailed installation instructions

Monitoring Progress

Training metrics are logged to Weights & Biases (a minimal logging sketch follows this list):

  • Episode rewards
  • Q-value estimates
  • Loss curves
  • Epsilon decay
  • Race completion times (TrackMania)
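
A minimal sketch of this kind of logging with the standard WandB API; the project name and metric keys below are illustrative, not necessarily the ones this repo uses:

import wandb

run = wandb.init(project="deeptactics-trackmania")  # illustrative project name
for episode in range(3):
    # In real training these values come from the agent; zeros are placeholders.
    run.log({"episode_reward": 0.0, "epsilon": 1.0, "loss": 0.0})
run.finish()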

Checkpointing

Checkpoints are automatically saved to the checkpoints/ directory:

  • checkpoint_latest.pt - Most recent checkpoint (for resuming)
  • checkpoint_episode_N.pt - Periodic snapshots
  • checkpoint_final.pt - End of training

To resume from checkpoint, set in config:

resume_from_checkpoint = True
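
For reference, here is a hedged sketch of what saving and resuming can look like in PyTorch. The project's real utilities live in src/helper_functions/tm_checkpointing.py; the dictionary keys below are assumptions, not the project's schema:

import torch
import torch.nn as nn

net = nn.Linear(4, 2)                    # stand-in for the Q-network
opt = torch.optim.Adam(net.parameters())

# Save (illustrative keys)
torch.save({"model": net.state_dict(),
            "optimizer": opt.state_dict(),
            "episode": 0},
           "checkpoint_latest.pt")

# Resume
ckpt = torch.load("checkpoint_latest.pt")
net.load_state_dict(ckpt["model"])
opt.load_state_dict(ckpt["optimizer"])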

🏎️ TrackMania Setup (Optional)

To train agents in the actual TrackMania 2020 game, follow these additional steps:

Quick Setup

  1. Install TrackMania 2020

    • Download from trackmania.com (free version works)
    • Launch the game at least once to complete initial setup
  2. Install Visual C++ Runtime (Windows prerequisite)

    # Download and install from:
    # https://aka.ms/vs/16/release/vc_redist.x64.exe
  3. Install OpenPlanet Plugin

    • Download from openplanet.nl
    • During installation, click "More Info" → "Install Anyway" if prompted about the unsigned certificate
    • Verify the installation: launch TrackMania 2020 and press F3; the OpenPlanet menu should appear
  4. Initialize TMRL Environment

    uv run python -m tmrl --install

    This creates the ~/TmrlData/ folder with:

    • Pre-trained models (for testing)
    • OpenPlanet plugins (TMRL_GrabData.op, TMRL_SaveGhost.op)
    • Configuration templates
    • Test maps
  5. Verify OpenPlanet Integration

    • Launch TrackMania 2020 and any track
    • Press F3 → Developer → (Re)load plugin → TMRL Grab Data
    • You should see "waiting for incoming connection" in OpenPlanet > Log
  6. Configure Training Settings

    All training settings are in config_files/tm_config.py; you don't need to edit TMRL's config.json!

    Our project uses a custom Rainbow DQN agent instead of TMRL's default SAC. All hyperparameters, network architecture, and training settings are controlled from the single config file:

    # In config_files/tm_config.py
    
    # Algorithm features
    use_dueling = True
    use_prioritized_replay = True
    crash_detection = True
    
    # Network architecture
    hidden_dim = 256
    batch_size = 64
    
    # Training hyperparameters
    learning_rate = 0.0001
    discount_factor = 0.997
    epsilon_decay_steps = 2_000_000
    
    # See config_files/static_tm_config.py for all available options
  7. Run Training

    # In main.py, set:
    run_tm = True
    
    # Then start training:
    uv run main.py

Detailed Setup Guide

For complete installation instructions, configuration options, troubleshooting, and advanced features, see:

📘 Complete TMRL Setup Guide

This includes:

  • Linux installation (Steam/Proton)
  • Network configuration for distributed training
  • Custom reward function recording
  • Performance tuning
  • Security configuration (TLS/passwords)
  • Environment types (FULL vs LIDAR)

🧪 Testing

Run the test suite:

uv run pytest --doctest-modules --cov=src --cov-report=html

View coverage report:

open htmlcov/index.html  # macOS/Linux
start htmlcov/index.html # Windows

📖 Documentation

Build and preview the documentation site locally:

uv run mkdocs build
uv run mkdocs serve

This starts a local server at http://127.0.0.1:8000/ with the docs and API reference.

View the latest published documentation: https://cogitontnu.github.io/DeepTactics-TrackMania/


πŸ—οΈ Project Structure

DeepTactics-TrackMania/
├── src/
│   ├── agents/
│   │   ├── DQN.py              # Deep Q-Network agent
│   │   ├── IQN.py              # Implicit Quantile Networks
│   │   ├── rainbow.py          # Rainbow for CarRacing
│   │   └── rainbow_tm.py       # Rainbow for TrackMania
│   ├── helper_functions/
│   │   ├── tm_actions.py       # TrackMania action mapping
│   │   ├── tm_checkpointing.py # Checkpoint utilities
│   │   └── ant_wrappers.py     # Discrete action wrapper for Ant
│   ├── env.py                  # Gymnasium training script
│   └── env_tm.py               # TrackMania training script
├── config_files/
│   ├── static_config.py        # Gymnasium config template
│   └── static_tm_config.py     # TrackMania config template
├── main.py                     # Main entry point
├── create_configs.py           # Config generation script
├── run_slurm.slurm             # SLURM cluster deployment
└── pyproject.toml              # Dependencies and project metadata

🎓 Algorithm Overview

DQN (Deep Q-Network)

Standard Q-learning with neural network approximation. Supports:

  • Dueling architecture: Separates value and advantage streams (sketched after this list)
  • Prioritized Experience Replay: Samples important transitions more frequently
  • Double DQN: Reduces overestimation bias
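
The dueling decomposition can be written as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). A minimal PyTorch sketch of such a head, assuming a simple vector observation (an illustration, not the project's exact network in src/agents/DQN.py):

import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Minimal dueling head: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""
    def __init__(self, obs_dim, n_actions, hidden_dim=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.value = nn.Linear(hidden_dim, 1)              # V(s)
        self.advantage = nn.Linear(hidden_dim, n_actions)  # A(s, a)

    def forward(self, obs):
        h = self.trunk(obs)
        a = self.advantage(h)
        # Subtracting the mean advantage keeps V and A identifiable.
        return self.value(h) + a - a.mean(dim=-1, keepdim=True)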

IQN (Implicit Quantile Networks)

Distributional RL that learns the full distribution of returns rather than a single expected Q-value:

  • Quantile regression: More stable than expectation-based methods
  • Noisy layers: Built-in exploration without epsilon-greedy
  • Cosine embedding: Encodes the sampled quantile fractions (sketched below)
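
The cosine embedding from the IQN paper maps each sampled quantile fraction tau through features cos(pi * i * tau), i = 0, ..., n-1, followed by a linear layer and ReLU. A minimal sketch, with illustrative dimensions:

import math
import torch
import torch.nn as nn

class CosineEmbedding(nn.Module):
    """Embed quantile fractions tau as ReLU(Linear(cos(pi * i * tau)))."""
    def __init__(self, n_cos=64, embed_dim=128):
        super().__init__()
        self.register_buffer("i_pi", math.pi * torch.arange(n_cos).float())
        self.linear = nn.Linear(n_cos, embed_dim)

    def forward(self, tau):
        # tau: (batch, n_quantiles) -> embeddings: (batch, n_quantiles, embed_dim)
        return torch.relu(self.linear(torch.cos(tau.unsqueeze(-1) * self.i_pi)))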

Rainbow DQN

Combines multiple improvements for state-of-the-art performance:

  • IQN (distributional RL)
  • Dueling architecture
  • Prioritized Experience Replay
  • Double DQN
  • N-step returns (sketched after this list)
  • Noisy layers for exploration
  • TrackMania variant: Multi-input (images + car state + action history)
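
For the n-step component (n_step_buffer_len in the TrackMania config), the target accumulates n discounted rewards before bootstrapping from the target network: G = r_t + gamma * r_{t+1} + ... + gamma^(n-1) * r_{t+n-1} + gamma^n * Q_target. A hedged sketch of the arithmetic:

# G = r_t + gamma*r_{t+1} + ... + gamma^{n-1}*r_{t+n-1} + gamma^n * bootstrap_value
def n_step_return(rewards, bootstrap_value, gamma=0.997):
    g = bootstrap_value
    for r in reversed(rewards):  # fold from the last reward back to the first
        g = r + gamma * g
    return g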

🔧 Troubleshooting

Common Issues

"ModuleNotFoundError: No module named 'tmrl'"

  • TMRL is only required for TrackMania training
  • Set run_tm = False in main.py to use Gymnasium environments instead

"CUDA out of memory"

  • Reduce batch_size in config (e.g., from 64 to 32)
  • Use smaller hidden_dim (e.g., 128 instead of 256)
  • Close other GPU-intensive applications

Training diverges / Q-values explode

  • The project uses hard target network updates by default (tau = 1.0 in config)
  • Hard updates periodically copy the entire policy network to the target network
  • For soft updates, set tau to a small value (e.g., 0.001 or 0.005):
    • tau = 1.0 → Hard update (full copy, recommended and default)
    • tau = 0.005 → Soft update (θ_target = 0.005 * θ_policy + 0.995 * θ_target)
    • tau = 0.0 → No update (target network never changes)
  • Warning: Soft updates (tau < 1.0) can cause training instability and Q-value divergence
  • Use a constant learning rate (no decay scheduler) for stable training; a minimal sketch of the update rule follows this list
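
A minimal sketch of the update rule described above (Polyak averaging, where tau = 1.0 reduces to a full hard copy):

import torch

@torch.no_grad()
def update_target(target_net, policy_net, tau=1.0):
    # theta_target <- tau * theta_policy + (1 - tau) * theta_target
    for tp, pp in zip(target_net.parameters(), policy_net.parameters()):
        tp.mul_(1.0 - tau).add_(tau * pp)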

WandB not logging

  • Run wandb login and enter your API key
  • Set wandb_logging = True in config
  • Check internet connection

TrackMania connection issues

  • Ensure TrackMania 2020 is running with the OpenPlanet plugin installed
  • Load the TMRL Grab Data plugin in-game (F3 → Developer → (Re)load plugin)
  • Check that the TMRL server settings are configured correctly

πŸ“ Contributing

We welcome contributions! This is a learning-focused project where collaboration is key.

Guidelines:

  • Every line of code should be understandable by all team members
  • Document your changes and explain the reasoning
  • Run tests before submitting: uv run pytest
  • Format code with pre-commit hooks: uv run pre-commit run --all-files

Areas for Contribution:

  • Algorithm improvements and hyperparameter tuning
  • New environment support
  • Visualization tools
  • Documentation and tutorials

👥 Team

This project would not have been possible without the hard work and dedication of all contributors. Thank you for the time and effort you have put into making this project a reality.

Ludvig Øvrevik - Project Lead
Brage Kvamme - Project Lead
Edvard Klavenes - Project Member
Henrik Øen - Project Member
Simen Førdestrøm Verhoeven - Project Member
Eldar Alvik - Project Member
Kristoffer Seyffarth - Project Member

Group picture

License


Distributed under the MIT License. See LICENSE for more information.
