Open
37 commits
067a1ae
feature(xjy): Enhance text-based games like Jericho with text decodin…
puyuan1996 Jun 3, 2025
36fd720
fix(fir): fix timestep and non-text-based games compatibility for muz…
Firerozes Jun 23, 2025
9e4cb99
fix(pu): fix dtype bug in sez buffer
Jun 23, 2025
a5c1343
fix(pu): fix timestep and reward-type compatibility (#380)
puyuan1996 Jun 29, 2025
8aaac01
fix(fir): fix compatibility of stochastic muzero in collector/evaluat…
Firerozes Jul 1, 2025
527d355
polish(fir): polish ensure_softmax function (#389)
puyuan1996 Jul 21, 2025
2a66cfd
feature(fir): enable independent configuration for reward/value cate…
Firerozes Jul 23, 2025
3148c7e
fix(fir): fix timestep compatibility in muzero_evaluator.py (#386)
Firerozes Jul 23, 2025
005cea1
fix(fir): fix probabilities visualization (#393)
Firerozes Jul 27, 2025
c2eb518
polish(fir): polish softmax (#394)
Firerozes Jul 27, 2025
5c412bb
feature(xjy): add encoder_decoder_type option for jericho's world mod…
xiongjyu Aug 27, 2025
90e44a6
fix(pu): fix pad dtype bug (#412)
puyuan1996 Sep 6, 2025
5069425
fix(pu): fix pos_in_game_segment bug in buffer (#414)
puyuan1996 Sep 10, 2025
da2da95
fix(pu): fix muzero_evaluator compatibility when n_evaluator_episode>…
puyuan1996 Sep 10, 2025
da2a62f
adaptively set the config of batchsize and accumulation_steps in Jeri…
xiongjyu Sep 18, 2025
bbbe505
polish(pu): polish comments and style in entry of scalezero
puyuan1996 Sep 28, 2025
bf9f965
polish(pu): polish comments and style of ctree/tree_search/buffer/com…
puyuan1996 Sep 28, 2025
fb04c7a
polish(pu): polish comments and style of files in lzero.model
puyuan1996 Sep 28, 2025
06148e7
polish(pu): polish comments and style of files in lzero.model.unizero…
puyuan1996 Sep 28, 2025
471ae6a
polish(pu): polish comments and style of unizero_world_models
puyuan1996 Sep 28, 2025
07933a5
polish(pu): polish comments and style of files in policy/
puyuan1996 Sep 28, 2025
df3b644
polish(pu): polish comments and style of files in worker
puyuan1996 Sep 28, 2025
4f89dcc
polish(pu): polish comments and style of files in configs
puyuan1996 Sep 28, 2025
e7a8796
Merge remote-tracking branch 'origin/main' into dev-multitask-balance…
puyuan1996 Sep 28, 2025
ab746d1
fix(pu): fix some merge typo
tAnGjIa520 Sep 28, 2025
0476aca
fix(pu): fix ln norm_type, fix kv_cache rewrite bug, add value_priori…
tAnGjIa520 Sep 28, 2025
2c0a965
fix(pu): fix unizero_mt
tAnGjIa520 Sep 28, 2025
84e6094
polish(pu): add LN in head, polish init_weight, polish adamw
tAnGjIa520 Sep 29, 2025
05da638
fix(pu): fix configure_optimizer_unizero in unizero_mt
tAnGjIa520 Oct 2, 2025
06ad080
feature(pu): add encoder-clip, label smooth, analyze_latent_represent…
tAnGjIa520 Oct 9, 2025
9f69f5a
feature(pu): add encoder-clip, label smooth option in unizero_multit…
tAnGjIa520 Oct 9, 2025
af99278
fix(pu): fix tb log when gpu_num<task_num, fix total_loss += bug, polish
tAnGjIa520 Oct 9, 2025
bf91ca2
polish(pu):polish config
tAnGjIa520 Oct 9, 2025
b18f892
fix(pu): fix encoder-clip bug and num_channel/res bug
tAnGjIa520 Oct 11, 2025
bf3cd12
polish(pu): polish scale_factor in DPS
tAnGjIa520 Oct 12, 2025
b1efa60
tmp
tAnGjIa520 Oct 18, 2025
c2f9817
feature(pu): add some analysis metrics in tensorboard for unizero and…
tAnGjIa520 Oct 23, 2025
2 changes: 1 addition & 1 deletion README.md
@@ -28,7 +28,7 @@
[![GitHub license](https://img.shields.io/github/license/opendilab/LightZero)](https://github.com/opendilab/LightZero/blob/master/LICENSE)
[![discord badge](https://dcbadge.vercel.app/api/server/dkZS2JF56X?style=flat)](https://discord.gg/dkZS2JF56X)

Updated on 2025.04.09 LightZero-v0.2.0
Updated on 2025.06.03 LightZero-v0.2.0

English | [简体中文(Simplified Chinese)](https://github.com/opendilab/LightZero/blob/main/README.zh.md) | [Documentation](https://opendilab.github.io/LightZero) | [LightZero Paper](https://arxiv.org/abs/2310.08348) | [🔥UniZero Paper](https://arxiv.org/abs/2406.10667) | [🔥ReZero Paper](https://arxiv.org/abs/2404.16364)

2 changes: 1 addition & 1 deletion README.zh.md
@@ -27,7 +27,7 @@
[![Contributors](https://img.shields.io/github/contributors/opendilab/LightZero)](https://github.com/opendilab/LightZero/graphs/contributors)
[![GitHub license](https://img.shields.io/github/license/opendilab/LightZero)](https://github.com/opendilab/LightZero/blob/master/LICENSE)

Last updated on 2025.04.09 LightZero-v0.2.0
Last updated on 2025.06.03 LightZero-v0.2.0

[English](https://github.com/opendilab/LightZero/blob/main/README.md) | 简体中文 (Simplified Chinese) | [Documentation](https://opendilab.github.io/LightZero) | [LightZero Paper](https://arxiv.org/abs/2310.08348) | [🔥UniZero Paper](https://arxiv.org/abs/2406.10667) | [🔥ReZero Paper](https://arxiv.org/abs/2404.16364)

7 changes: 4 additions & 3 deletions docs/source/tutorials/algos/customize_algos.md
@@ -119,16 +119,17 @@ Here is an example of unit testing in LightZero. In this example, we test the `i
```Python
import pytest
import torch
from lzero.policy.scaling_transform import inverse_scalar_transform, InverseScalarTransform
from lzero.policy.scaling_transform import DiscreteSupport, inverse_scalar_transform, InverseScalarTransform

@pytest.mark.unittest
def test_scaling_transform():
import time
logit = torch.randn(16, 601)
discrete_support = DiscreteSupport(-300., 301., 1.)
start = time.time()
output_1 = inverse_scalar_transform(logit, 300)
output_1 = inverse_scalar_transform(logit, discrete_support)
print('t1', time.time() - start)
handle = InverseScalarTransform(300)
handle = InverseScalarTransform(discrete_support)
start = time.time()
output_2 = handle(logit)
print('t2', time.time() - start)
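The updated test passes a `DiscreteSupport(-300., 301., 1.)` instead of the bare scale `300`, so the 601 logits per row line up with 601 support atoms. The plain-Python sketch below shows what decoding such logits into a scalar typically involves. It assumes an arange-style support (exclusive `stop`) and the MuZero-style value transform h(x) = sign(x)(sqrt(|x| + 1) - 1) + εx with ε = 0.001; `support_atoms`, `h`, `h_inv` and `decode_scalar` are hypothetical helpers, not LightZero's `scaling_transform` API.

```python
import math
from typing import List, Sequence

# Assumed epsilon of the MuZero-style value transform (not read from LightZero).
EPS = 0.001


def support_atoms(start: float, stop: float, step: float) -> List[float]:
    """Enumerate atoms like numpy.arange(start, stop, step): stop is exclusive."""
    n = math.ceil((stop - start) / step)
    return [start + i * step for i in range(n)]


def h(x: float) -> float:
    """Forward transform: compresses large magnitudes before categorical encoding."""
    return math.copysign(math.sqrt(abs(x) + 1) - 1, x) + EPS * x


def h_inv(y: float) -> float:
    """Closed-form inverse of h (solves a quadratic in sqrt(|x| + 1))."""
    root = (math.sqrt(1 + 4 * EPS * (abs(y) + 1 + EPS)) - 1) / (2 * EPS)
    return math.copysign(root ** 2 - 1, y)


def decode_scalar(logits: Sequence[float], atoms: Sequence[float]) -> float:
    """Softmax over the logits, take the expected atom value, then undo h."""
    if len(logits) != len(atoms):
        raise ValueError("one logit per support atom is required")
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    expected = sum(e / total * a for e, a in zip(exps, atoms))
    return h_inv(expected)


atoms = support_atoms(-300., 301., 1.)
print(len(atoms))  # 601, matching torch.randn(16, 601) in the test
```

Passing the support object rather than a single scale is what lets the range be asymmetric or use a non-unit step; a bare `300` can only describe the symmetric, unit-step case.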
7 changes: 4 additions & 3 deletions docs/source/tutorials/algos/customize_algos_zh.md
@@ -120,16 +120,17 @@ if timestep.done:
```Python
import pytest
import torch
from lzero.policy.scaling_transform import inverse_scalar_transform, InverseScalarTransform
from lzero.policy.scaling_transform import DiscreteSupport, inverse_scalar_transform, InverseScalarTransform

@pytest.mark.unittest
def test_scaling_transform():
import time
logit = torch.randn(16, 601)
discrete_support = DiscreteSupport(-300., 301., 1.)
start = time.time()
output_1 = inverse_scalar_transform(logit, 300)
output_1 = inverse_scalar_transform(logit, discrete_support)
print('t1', time.time() - start)
handle = InverseScalarTransform(300)
handle = InverseScalarTransform(discrete_support)
start = time.time()
output_2 = handle(logit)
print('t2', time.time() - start)
3 changes: 2 additions & 1 deletion docs/source/tutorials/config/config.md
@@ -44,7 +44,8 @@ The `main_config` dictionary contains the main parameter settings for running th
- `downsample`: Whether to downsample the input.
- `norm_type`: The type of normalization used.
- `num_channels`: The number of channels in the convolutional layers (number of features extracted).
- `support_scale`: The range of the value support set (`-support_scale` to `support_scale`).
- `reward_support_range`: The range of the reward support set (`(start, stop, step)`).
- `value_support_range`: The range of the value support set (`(start, stop, step)`).
- `bias`: Whether to use bias terms in the layers.
- `discrete_action_encoding_type`: How discrete actions are encoded.
- `self_supervised_learning_loss`: Whether to use a self-supervised learning loss (as in EfficientZero).
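Both new keys take a `(start, stop, step)` tuple. Assuming numpy-`arange` semantics (exclusive `stop`), which is consistent with the agent configs in this PR replacing `reward_support_size=21` by `(-10., 11., 1.)`, the number of categorical atoms each head predicts can be derived as sketched below. The `support_size` helper and the `model_cfg` values are illustrative only, not taken from LightZero.

```python
import math
from typing import Dict, Tuple

SupportRange = Tuple[float, float, float]


def support_size(support_range: SupportRange) -> int:
    """Number of atoms in an arange-style (start, stop, step) range; stop is exclusive."""
    start, stop, step = support_range
    if step <= 0 or stop <= start:
        raise ValueError(f"invalid support range: {support_range}")
    return math.ceil((stop - start) / step)


# Hypothetical model config fragment: reward and value supports are set independently.
model_cfg: Dict[str, SupportRange] = {
    "reward_support_range": (-10., 11., 1.),
    "value_support_range": (-300., 301., 1.),
}

for key, rng in model_cfg.items():
    print(key, support_size(rng))
```

Splitting the old single `support_scale` into two ranges means a narrow reward head and a wide value head no longer have to share one support.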
3 changes: 2 additions & 1 deletion docs/source/tutorials/config/config_zh.md
@@ -43,7 +43,8 @@
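- `downsample`: Whether to downsample.
- `norm_type`: The normalization method used.
- `num_channels`: The number of features extracted by the convolutional layers.
- `support_scale`: The range of the value support set (-support_scale, support_scale).
- `reward_support_range`: The range of the reward support set (`(start, stop, step)`).<!-- TODO : ADAPT THIS DESCRIPTION, I DON'T SPEAK CHINESE -->
- `value_support_range`: The range of the value support set (`(start, stop, step)`).<!-- TODO : ADAPT THIS DESCRIPTION, I DON'T SPEAK CHINESE -->
- `bias`: Whether to use bias terms.
- `discrete_action_encoding_type`: The encoding type used for the discrete action space.
- `self_supervised_learning_loss`: Whether to use a self-supervised learning loss (following the EfficientZero implementation).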
5 changes: 2 additions & 3 deletions lzero/agent/config/gumbel_muzero/gomoku_play_with_bot.py
@@ -44,9 +44,8 @@
image_channel=3,
num_res_blocks=1,
num_channels=32,
support_scale=10,
reward_support_size=21,
value_support_size=21,
reward_support_range=(-10., 11., 1.),
value_support_range=(-10., 11., 1.),
),
cuda=True,
env_type='board_games',
5 changes: 2 additions & 3 deletions lzero/agent/config/gumbel_muzero/tictactoe_play_with_bot.py
@@ -38,9 +38,8 @@
reward_head_hidden_channels=[8],
value_head_hidden_channels=[8],
policy_head_hidden_channels=[8],
support_scale=10,
reward_support_size=21,
value_support_size=21,
reward_support_range=(-10., 11., 1.),
value_support_range=(-10., 11., 1.),
),
cuda=True,
env_type='board_games',
5 changes: 2 additions & 3 deletions lzero/agent/config/muzero/gomoku_play_with_bot.py
@@ -44,9 +44,8 @@
image_channel=3,
num_res_blocks=1,
num_channels=32,
support_scale=10,
reward_support_size=21,
value_support_size=21,
reward_support_range=(-10., 11., 1.),
value_support_range=(-10., 11., 1.),
),
cuda=True,
env_type='board_games',
5 changes: 2 additions & 3 deletions lzero/agent/config/muzero/tictactoe_play_with_bot.py
@@ -38,9 +38,8 @@
reward_head_hidden_channels=[8],
value_head_hidden_channels=[8],
policy_head_hidden_channels=[8],
support_scale=10,
reward_support_size=21,
value_support_size=21,
reward_support_range=(-10., 11., 1.),
value_support_range=(-10., 11., 1.),
norm_type='BN',
),
cuda=True,
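All four agent configs above apply the same mechanical rewrite: `support_scale=10` together with `reward_support_size=21` and `value_support_size=21` becomes `(-10., 11., 1.)` for each head. The sketch below expresses that mapping for anyone porting an older config. It assumes unit-step supports symmetric about zero, which holds for every config touched here; `migrate_support_keys` is a hypothetical helper and is not shipped with LightZero.

```python
from typing import Any, Dict


def migrate_support_keys(model_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Replace legacy support_scale / *_support_size keys with (start, stop, step) ranges.

    Assumes the legacy support was -scale..scale with step 1 (2 * scale + 1 atoms).
    """
    cfg = dict(model_cfg)  # shallow copy; the caller's dict is left untouched
    scale = cfg.pop("support_scale", None)
    if scale is None:
        return cfg  # nothing to migrate
    expected_size = 2 * scale + 1  # atoms -scale..scale inclusive, step 1
    for head in ("reward", "value"):
        size = cfg.pop(f"{head}_support_size", expected_size)
        if size != expected_size:
            raise ValueError(
                f"{head}_support_size={size} is inconsistent with support_scale={scale}"
            )
        # Exclusive stop, so scale + 1 keeps +scale as the last atom.
        cfg[f"{head}_support_range"] = (-float(scale), float(scale) + 1.0, 1.0)
    return cfg


legacy = dict(num_channels=32, support_scale=10, reward_support_size=21, value_support_size=21)
print(migrate_support_keys(legacy))
```

The explicit size check guards against legacy configs whose `*_support_size` did not actually equal `2 * support_scale + 1`, which a silent conversion would otherwise hide.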
6 changes: 3 additions & 3 deletions lzero/entry/__init__.py
@@ -1,5 +1,6 @@
from .eval_alphazero import eval_alphazero
from .eval_muzero import eval_muzero

from .eval_muzero_with_gym_env import eval_muzero_with_gym_env
from .train_alphazero import train_alphazero
from .train_muzero import train_muzero
@@ -12,6 +13,5 @@
from .train_muzero_multitask_segment_ddp import train_muzero_multitask_segment_ddp
from .train_unizero_multitask_segment_ddp import train_unizero_multitask_segment_ddp
from .train_unizero_multitask_segment_eval import train_unizero_multitask_segment_eval
from .utils import *

from .train_unizero_multitask_balance_segment_ddp import train_unizero_multitask_balance_segment_ddp
from .train_unizero_multitask_balance_segment_ddp import train_unizero_multitask_balance_segment_ddp
from .utils import *
80 changes: 0 additions & 80 deletions lzero/entry/compute_task_weight.py

This file was deleted.

3 changes: 2 additions & 1 deletion lzero/entry/eval_muzero.py
@@ -1,6 +1,7 @@
import os
from functools import partial
from typing import Optional, Tuple
import logging

import numpy as np
import torch
@@ -51,7 +52,7 @@ def eval_muzero(
# Create main components: env, policy
env_fn, collector_env_cfg, evaluator_env_cfg = get_vec_env_setting(cfg.env)
evaluator_env = create_env_manager(cfg.env.manager, [partial(env_fn, cfg=c) for c in evaluator_env_cfg])

# print(f"cfg.seed:{cfg.seed}")
evaluator_env.seed(cfg.seed, dynamic_seed=False)
set_pkg_seed(cfg.seed, use_cuda=cfg.policy.cuda)
