This final project attempts to cluster different agent types (random, deterministic, value-based) based on their trajectories. Unfortunately, we can't differentiate between random and value-based agents.
File tree:
/Users/aneesh/Documents/neuro240
├── .gitattributes
├── README.md
├── agents
│   ├── d_agent.py
│   ├── r_agent.py
│   └── v_agent.py
├── dataset
│   ├── gen.py
│   └── storage.py
├── env
│   └── grid.py
├── model
│   ├── full.py
│   ├── ln.py
│   └── pp.py
├── train.py
├── train.sh
└── utils
    ├── log.py
    ├── loss.py
    └── visualize.py
- agents: Contains agent models implementing different strategies.
  - `d_agent.py`: Implements deterministic agents with predefined movement patterns (we loop through samples).
  - `r_agent.py`: Random agents.
  - `v_agent.py`: Agents trained with value iteration.
- dataset: Handles dataset generation and storage.
- env:
  - `grid.py`: Grid environment, with the reward configuration set at initialization.
- model:
  - `full.py`: The combined end-to-end model used to predict actions from latent representations.
  - `ln.py`: Latent space network for clustering trajectories.
  - `pp.py`: Prediction network that uses the latent space to determine the next state/action.
- train.py: Training script.
- train.sh: Script for running on HPC. `python train.py` should work. Be careful with JAX CUDA; future work should convert the codebase to JAX for a major speedup.
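To make the three agent strategies and the grid environment described above more concrete, here is a minimal sketch in plain Python. It is not the code in `agents/` or `env/grid.py`: the `Grid` class, the action encoding, the default reward placement, and the function names `random_agent`, `deterministic_agent`, `value_agent`, and `rollout` are all assumptions made for illustration.

```python
import random

# Hypothetical stand-ins for env/grid.py and the agents/ modules; all names are illustrative.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


class Grid:
    def __init__(self, size=4, rewards=None):
        self.size = size
        # Reward configuration is supplied at construction time (assumed default: bottom-right goal).
        self.rewards = rewards if rewards is not None else {(size - 1, size - 1): 1.0}

    def step(self, state, action):
        """Return (next_state, reward); moves into a wall leave the agent in place."""
        d_row, d_col = ACTIONS[action]
        row = min(max(state[0] + d_row, 0), self.size - 1)
        col = min(max(state[1] + d_col, 0), self.size - 1)
        return (row, col), self.rewards.get((row, col), 0.0)


def random_agent(rng):
    """Pick a uniformly random action in every state."""
    return lambda state: rng.randrange(len(ACTIONS))


def deterministic_agent(pattern):
    """Loop through a fixed action pattern, ignoring the state."""
    step_count = [0]

    def act(state):
        action = pattern[step_count[0] % len(pattern)]
        step_count[0] += 1
        return action

    return act


def value_agent(grid, gamma=0.9, sweeps=100):
    """Run value iteration, then act greedily with respect to the resulting values."""
    states = [(r, c) for r in range(grid.size) for c in range(grid.size)]
    values = {s: 0.0 for s in states}

    def q_value(state, action):
        next_state, reward = grid.step(state, action)
        return reward + gamma * values[next_state]

    for _ in range(sweeps):
        # Synchronous Bellman optimality backup: compute all new values, then swap them in.
        new_values = {s: max(q_value(s, a) for a in range(len(ACTIONS))) for s in states}
        values.update(new_values)

    return lambda state: max(range(len(ACTIONS)), key=lambda a: q_value(state, a))


def rollout(grid, agent, start=(0, 0), steps=8):
    """Collect a trajectory of (state, action) pairs for one agent."""
    state, trajectory = start, []
    for _ in range(steps):
        action = agent(state)
        trajectory.append((state, action))
        state, _reward = grid.step(state, action)
    return trajectory
```

Calling `rollout(Grid(), value_agent(Grid()))` yields the kind of (state, action) trajectory that a clustering model could consume; swapping in `random_agent(random.Random(0))` or `deterministic_agent([3, 1])` gives trajectories for the other two agent types.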
- Agent Types:
  - Toggle experiments by modifying the `agent_types` variable in `train.py`.
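As a rough illustration of what toggling might look like, the sketch below assumes `agent_types` is a list of short string labels. The actual format in `train.py` may differ, and the `AGENT_MODULES` mapping and `select_agents` helper are hypothetical names introduced only for this example.

```python
# Assumption: agent_types is a list of string labels; the real train.py may use another format.
agent_types = ["random", "deterministic"]  # add or drop "value" to change the experiment

# Hypothetical registry mapping each label to the file that implements it.
AGENT_MODULES = {
    "random": "agents/r_agent.py",
    "deterministic": "agents/d_agent.py",
    "value": "agents/v_agent.py",
}


def select_agents(requested):
    """Return the registry entries for the requested labels, rejecting unknown ones."""
    unknown = [name for name in requested if name not in AGENT_MODULES]
    if unknown:
        raise ValueError(f"unknown agent types: {unknown}")
    return {name: AGENT_MODULES[name] for name in requested}


selected = select_agents(agent_types)
```

Failing loudly on an unknown label is a design choice here: a typo in `agent_types` would otherwise silently shrink the experiment.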