This repository is an extension of the SafeDRP repository by kaji-ou, which is itself an extension of DRPChallenge.
We extend the original environment with a hybrid action policy that combines rule-based navigation and reinforcement learning (RL).
To run the base environment within epymarl:

```bash
python3 src/main.py --config=qmix --env-config=gymma with env_args.time_limit=100 env_args.key="drp_env:drp-4agent_map_8x5-v2" env_args.state_repre_flag="onehot_fov"
```
The following command uses the safe wrapper (`dep_env/SafeMarlEnv/env_wrapper`):

```bash
python3 src/main.py --config=qmix --env-config=gymma with env_args.time_limit=100 env_args.key="drp_env:drp_safe-4agent_map_8x5-v2" env_args.state_repre_flag="onehot_fov"
```

The hybrid action policy combines the following components:
- Rule-based algorithm: guides the agent toward the target node via the shortest path when possible.
- RL exploration: enables adaptive behavior through trial-and-error.
- Weighted combination:
  - Early training: 90% rule-based, 10% RL.
  - Over time: the rule-based influence decreases with the episode index (a minimal sketch follows this list).
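The following is a minimal sketch of this schedule, not the repository's actual code: the 0.9 starting weight comes from the description above, while the linear decay form and the decay rate are assumptions for illustration.

```python
import random

def probability_rule_based(episode: int, start: float = 0.9, decay: float = 0.001) -> float:
    """Probability of following the rule-based policy.

    Starts at roughly 90% and decreases with the episode index
    (the linear form and decay rate are assumed values).
    """
    return max(0.0, start - decay * episode)

def choose_action(episode: int, rule_action, rl_action):
    """Pick the rule-based action with the current probability,
    otherwise fall back to the RL (exploratory) action."""
    if random.random() < probability_rule_based(episode):
        return rule_action
    return rl_action
```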
The hybrid policy is implemented through the following functions:
- `probability_rule_based()`: returns the probability of following the rule-based policy (decreases with training).
- `shortest_path_action(joint_action)`: computes valid shortest-path actions.
- `action_policy(joint_action)`: mixes the rule-based and RL policies.
- `action_policy_verifying(next_node, i)`: ensures the validity of chosen actions.
- `get_map_complexity()`: calculates a numerical complexity score for the map.
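For orientation, here is a hedged sketch of how the rule-based and RL parts could be mixed per agent. It assumes the map is available as a `networkx` graph and uses simplified signatures that differ from the ones listed above; it is illustrative only, not the repository's implementation.

```python
import random
import networkx as nx

def shortest_path_action(graph: nx.Graph, current_node: int, goal_node: int) -> int:
    """Next node on a shortest path from the agent's current node to its goal
    (simplified signature; the repository version operates on joint actions)."""
    path = nx.shortest_path(graph, source=current_node, target=goal_node)
    return path[1] if len(path) > 1 else current_node

def action_policy(graph, positions, goals, rl_actions, episode, p_rule):
    """Build a joint action: each agent follows the rule-based shortest-path
    action with probability p_rule(episode), otherwise its RL action."""
    joint_action = []
    for i, rl_action in enumerate(rl_actions):
        if random.random() < p_rule(episode):
            joint_action.append(shortest_path_action(graph, positions[i], goals[i]))
        else:
            joint_action.append(rl_action)
    return joint_action
```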
This design stabilizes early training through rule-based guidance and encourages adaptive, generalized behavior as the RL influence grows.