[BUG] It's not clear how to call an advantage module with batched envs and pixel observations. #1522

Open
@skandermoalla

Describe the bug

When a collector returns a tensordict rollout whose pixel observations have shape (N_envs, N_steps, C, H, W), and you want to apply an advantage module whose network starts with conv2d layers:

  1. Applying the module directly crashes, with the conv2d layer rejecting the input size, e.g. RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [2, 128, 4, 84, 84]
  2. Flattening the tensordict first with rollout.reshape(-1), so that the observations have shape [B, C, H, W], and then calling the advantage module runs but issues the warning torchrl/objectives/value/advantages.py:99: UserWarning: Got a tensordict without a time-marked dimension, assuming time is along the last dimension. This leaves you unsure whether the advantages were computed correctly.

So it's not clear how one should proceed.
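
One possible workaround, sketched here in plain PyTorch and not confirmed as the approach torchrl recommends, is to keep the rollout in its (N_envs, N_steps) layout and let the value network itself merge the leading batch dimensions before the convolutions and restore them afterwards. The wrapper name FlattenBatchDims, the layer sizes, and the tensor shapes below are assumptions made for illustration; none of them come from torchrl.

```python
import torch
from torch import nn


class FlattenBatchDims(nn.Module):
    """Hypothetical helper (not a torchrl class): applies a module that expects
    [B, C, H, W] inputs to a tensor of shape [*batch, C, H, W]."""

    def __init__(self, module: nn.Module, feature_ndim: int = 3):
        super().__init__()
        self.module = module
        self.feature_ndim = feature_ndim  # number of trailing dims, here C, H, W

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch_shape = x.shape[: -self.feature_ndim]
        # Merge every leading batch dim (e.g. N_envs and N_steps) into one.
        flat = x.reshape(-1, *x.shape[-self.feature_ndim:])
        out = self.module(flat)
        # Restore the original leading dims so the time axis stays identifiable.
        return out.reshape(*batch_shape, *out.shape[1:])


# Illustrative value network: 4 stacked 84x84 frames in, one scalar value out.
cnn = nn.Sequential(
    nn.Conv2d(4, 8, kernel_size=8, stride=4),  # 84x84 -> 20x20
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 20 * 20, 1),
)
value_net = FlattenBatchDims(cnn)

# Stand-in for the pixel entry of a rollout: (N_envs, N_steps, C, H, W).
pixels = torch.rand(2, 16, 4, 84, 84)
values = value_net(pixels)
print(values.shape)  # leading (N_envs, N_steps) dims are preserved
```

With a network wrapped this way, the rollout would not need to be flattened with rollout.reshape(-1) before the advantage computation, so the time dimension would remain the last batch dimension. Whether the advantage module then picks it up without the warning is not something this sketch verifies.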

  • I have checked that there is no similar issue in the repo (required)
  • I have read the documentation (required)
  • I have provided a minimal working example to reproduce the bug (required)

Labels

bug (Something isn't working)