Any reason for strict reset condition and simple reward for Inverted_pendulum? #604
benthebear93
started this conversation in
General
Replies: 0 comments
I have been using the inverted_pendulum env to test different RL methods, mostly TD3.
I found that the env has a rather strict reset condition (it resets when the pole angle exceeds 0.2 rad) and a simple reward function.
Also, it initializes with only small variations (±0.01 rad) after a reset, which results in almost no variation between episodes.
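To make the behavior I'm describing concrete, here is a rough plain-Python sketch of it. This is not the Brax source: the function and constant names are hypothetical, the uniform sampling is my assumption about how the ±0.01 rad variation is drawn, and the constant per-step reward is my assumption about what the "simple" reward looks like. Only the 0.2 rad threshold and the 0.01 rad range come from what I observed.

```python
import random

# Values from the post above; every name here is hypothetical, not Brax's API.
ANGLE_LIMIT = 0.2   # rad: the episode resets once |pole angle| exceeds this
INIT_NOISE = 0.01   # rad: size of the perturbation applied after a reset


def sample_initial_angle(rng: random.Random) -> float:
    """Draw the initial pole angle from [-INIT_NOISE, INIT_NOISE].

    Uniform sampling is an assumption made for illustration.
    """
    return rng.uniform(-INIT_NOISE, INIT_NOISE)


def is_done(pole_angle: float) -> bool:
    """Return True when the pole has tipped past the reset threshold."""
    return abs(pole_angle) > ANGLE_LIMIT


def reward(pole_angle: float) -> float:
    """Constant per-step reward (assumed form of the 'simple' reward)."""
    return 1.0


if __name__ == "__main__":
    rng = random.Random(0)
    starts = [sample_initial_angle(rng) for _ in range(5)]
    # Every start lies within INIT_NOISE / ANGLE_LIMIT = 5% of the threshold.
    print([round(a, 4) for a in starts])
    print(any(is_done(a) for a in starts))
```

Under these assumptions, every episode starts within 5% of the reset threshold, which is why the episodes look nearly identical to me.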
Is there a reason why the inverted_pendulum env is designed this way?
Maybe it's meant to be kept minimal?
Or is it because the rail is too short to allow the pole to recover from larger deviations?
I know that designing environments isn't the main purpose of Brax, but I'm just curious :)