The following is a mapping from terminology used in the codebase to that used in the paper, where it differs:
- `axe` in the code corresponds to the "salad-making task" in the paper
- `deer` in the code corresponds to the "hunting task" in the paper
- `monster` in the code corresponds to the "scavenging task" in the paper
- `factory` in the code corresponds to the "factory task" in the paper
- `waypoint` in the code refers to the "subgoal reward" in the paper
- Figure 2 (State Visitation Maps and Performance Curve of Episodic and Non-Episodic Learning)
- Figures 4 (Effect of Dynamic Environment on Episodic and Non-Episodic Learning) and 6 (Dynamic Ablations)
- Figure 5 (Shaping Methods for Episodic and Non-Episodic Learning)
- Figure 8 (Shaping Methods for Walled Salad-Making Task)
- Figures 9 and 10 (State Visitation Counts for Walled Salad-Making Task)
- Figures 11 and 12 (Learned Behavior on Salad-Making and Hunting Tasks)
- Table 1 (Hitting Time and Marginal State Entropy for Dynamism and Environment Shaping)
The experiments require a set of validation environments for performance evaluation. Paired with each experiment script is a script called `gen_validation_envs.py`, which outputs a Python pickle file containing these validation environments. The path to this file can be fed directly to the experiment script as its `validation_envs_pkl` argument in the `algorithm_kwargs` dictionary found in each experiment script.
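The wiring described above can be sketched as follows. This is a minimal, hypothetical example: the helper `load_validation_envs` and the contents of `algorithm_kwargs` are assumptions for illustration, not the repo's actual API; only the `validation_envs_pkl` key and the `gen_validation_envs.py` output format (a pickle file) come from the text above.

```python
import pickle

def load_validation_envs(pkl_path):
    """Load the validation environments from a pickle file
    produced by gen_validation_envs.py (sketch)."""
    with open(pkl_path, "rb") as f:
        return pickle.load(f)

# In an experiment script, the path would be supplied roughly like this
# (other keys are placeholders, not the repo's actual settings):
algorithm_kwargs = {
    "validation_envs_pkl": "validation_envs.pkl",  # output of gen_validation_envs.py
}
```

In this sketch the experiment script would unpickle the file once at startup and reuse the same fixed set of environments for every evaluation, so performance curves are comparable across runs.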