Any reason for the strict reset condition and simple reward in Inverted_pendulum? #604

benthebear93 · 2025-04-25T07:35:54Z


I have been using the inverted_pendulum env to test different RL methods, mostly TD3.
I found that the env has a rather strict reset condition (the episode resets once the pole angle exceeds 0.2 rad) and a simple reward function.
Also, after a reset it initializes with only small variations (±0.01 rad), so there is almost no difference between episodes.
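To make the behaviour described above concrete, here is a minimal plain-Python sketch. It is a hypothetical illustration, not Brax's source: the names `MAX_ANGLE`, `INIT_NOISE`, `reset_state`, and `reward_and_done` are made up. It also assumes that the "simple reward" is a constant 1.0 per step and that the reset noise is uniform on each joint position.

```python
import random

# Hypothetical constants mirroring the values described in the post.
MAX_ANGLE = 0.2    # rad; the episode ends once |pole angle| exceeds this
INIT_NOISE = 0.01  # rad; half-width of the (assumed uniform) reset perturbation


def reset_state(rng: random.Random, init_q=(0.0, 0.0)):
    """Perturb each initial joint position (cart position, pole angle) by U(-0.01, 0.01)."""
    return [q + rng.uniform(-INIT_NOISE, INIT_NOISE) for q in init_q]


def reward_and_done(pole_angle: float):
    """Assumed constant alive bonus; terminate when the pole leaves the +/-0.2 rad band."""
    reward = 1.0  # assumption: same reward every step, independent of the state
    done = abs(pole_angle) > MAX_ANGLE
    return reward, done


if __name__ == "__main__":
    rng = random.Random(0)
    for episode in range(3):
        cart_x, angle = reset_state(rng)
        print(f"episode {episode}: initial angle = {angle:+.4f} rad")
    print(reward_and_done(0.15))   # (1.0, False)
    print(reward_and_done(-0.25))  # (1.0, True)
```

For scale, 0.01 rad is about 0.57 degrees and 0.2 rad is about 11.5 degrees. Under these assumptions every start state sits very close to upright, and the agent gets the same per-step reward anywhere inside the ±0.2 rad band.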

Is there a reason why the inverted_pendulum env is designed this way?
Maybe it's meant to be kept minimal?
Or is it because the rail is too short to allow the pole to recover from larger deviations?

I know that designing environments isn't the main purpose of Brax, but I'm just curious :)

Replies: 0 comments