jurgisp / memory-maze
Evaluating long-term memory of reinforcement learning algorithms
License: MIT License
Hi, do you provide an optimal planner that generates actions to reach the current target?
I searched the code and found a BFS function, def breadth_first_search(), in memory_maze/oracle.py.
However, this function alone cannot produce optimal actions: the agent's position is continuous, while the path generated by the planner is discrete.
Do you have any other method for generating optimal actions?
Any feedback would be appreciated.
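For what it's worth, one common approach is to follow the discrete BFS path with a simple waypoint controller: steer the continuous agent toward the center of the next cell on the path, turning until roughly aligned and moving forward otherwise. This is purely a sketch under assumed conventions (a forward/turn-left/turn-right action set, and access to the agent's continuous position and heading); none of these names are part of the memory-maze API.

```python
import numpy as np

# Assumed discrete action ids (illustrative, not the actual env encoding).
FORWARD, TURN_LEFT, TURN_RIGHT = 0, 1, 2

def action_toward(agent_pos, agent_dir, waypoint, angle_tol=0.2):
    """Pick a discrete action that moves the agent toward a waypoint.

    agent_pos: (x, y) continuous position
    agent_dir: heading in radians
    waypoint:  (x, y) center of the next cell on the BFS path
    """
    dx, dy = waypoint[0] - agent_pos[0], waypoint[1] - agent_pos[1]
    target = np.arctan2(dy, dx)
    # Signed angle difference wrapped to [-pi, pi].
    diff = (target - agent_dir + np.pi) % (2 * np.pi) - np.pi
    if abs(diff) < angle_tol:
        return FORWARD
    return TURN_LEFT if diff > 0 else TURN_RIGHT

def follow_path(agent_pos, agent_dir, path, reach_tol=0.3):
    """Drop waypoints the agent has already reached, then steer to the next one."""
    while len(path) > 1 and np.hypot(path[0][0] - agent_pos[0],
                                     path[0][1] - agent_pos[1]) < reach_tol:
        path = path[1:]
    return action_toward(agent_pos, agent_dir, path[0])
```

Calling `follow_path` once per step with the current BFS path gives near-optimal (not strictly optimal) behavior, since the agent cuts each corner only up to the tolerance thresholds.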
While exploring the Memory Maze GitHub page, I noticed that the Dropbox link for the offline data (~100 GB) has some problems. I was trying to download the offline data, but the download suddenly stops at a random point. I tried multiple times, and it stopped at a different point every time. I also checked on my colleague's laptop, and I suspect the uploaded data may be corrupted.
Could you check the Dropbox file one more time?
The ability to set a random seed was recently implemented in dm_env. It needs to be added to the gym wrapper too.
It would be nice to have a version of the environment with the raw continuous action space instead of the discretized one.
Is there any way to make the number of colors the same for all maze sizes? I find it very interesting to test generalization to more complex problems by training on easier ones. However, to test memory only (and not representations), it is important that the colors be familiar to the agent (for example, when transferring from a 9x9 to a 15x15 maze).
Thank you for answering my last issue.
You told me that the maze layout changes when I call the reset method.
So I want to know: do the target location and the maze layout both change in every episode?
Also, your description says the agent is prompted to find the target object of a specific color, indicated by the border color in the observation image. I am not clear on the relationship between the target object's color and the border color. The border has a different color. How does the agent find the target of the specific color based on the color of the border?
Hi,
Thanks for making nice benchmark task.
I'm curious whether there is a way to avoid the following warning, which is printed once per episode. Am I using it incorrectly?
This warning seems to appear on other people's terminals as well.
```
WARNING:absl:Cannot set velocity on Entity with no free joint.
```
I'm using this repo with Python 3.8 and the following command:
```
python3 dreamer.py --configs MemoryMaze --logdir ./logdir/test
```
Thanks in advance!
Best regards,
NM512
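For what it's worth, one workaround (an assumption on my side, not an official fix) relies on absl emitting its messages through the standard-library logging framework under the logger name "absl", so raising that logger's threshold before creating the environment should hide the message:

```python
import logging

# absl-py routes its log records through the stdlib logging logger named
# "absl"; raising its level filters out WARNING-level records.
# Note this silences ALL absl warnings, not just this one.
logging.getLogger("absl").setLevel(logging.ERROR)
```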
Dear authors,
This is a great repo. May I ask how many hours and how many CPUs are needed to run the experiments for each seed?
Hi!
I installed the benchmark following the instructions on the website and was trying to play around with it in human mode using `python gui/run_gui.py --env "memory_maze:MemoryMaze-9x9-HD-v0"`. However, the observation never changes no matter what action I take (only "exit" and "reset" work).
I have made sure that my keyboard works correctly, and I have also tried binding the actions to keys other than the arrow keys, but it still doesn't work. When I print out the action at line 174, the correct action is printed after I press a key, but the observation still doesn't change. (I didn't store the full observation, but obs.sum() is always the same no matter what I do, which probably indicates some error.)
Besides that, the only thing being updated is the display in the bottom-left corner. I have no idea what it is, but it seems to change according to my actions.
I have little to no experience in Pygame development so I hope to seek some help here. Let me know if you need more information from me, thanks!
Hi, Great environment. Just wondering, is there a PPO baseline available for this environment?
Hi
I created a fresh conda env (Python 3.9.7) and installed the package with `pip install memory-maze` and `pip install gym pygame pillow imageio`.
When I run `python gui/run_gui.py`, I get the following error:
```
python gui/run_gui.py
Creating environment: memory_maze:MemoryMaze-9x9-v0
Traceback (most recent call last):
  File "/home/leo/memory-maze/gui/run_gui.py", line 231, in <module>
    main()
  File "/home/leo/memory-maze/gui/run_gui.py", line 56, in main
    env = gym.make(args.env, disable_env_checker=True)
  File "/home/leo/miniconda3/envs/memory-maze2/lib/python3.9/site-packages/gym/envs/registration.py", line 540, in make
    importlib.import_module(module)
  File "/home/leo/miniconda3/envs/memory-maze2/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/leo/memory-maze/memory_maze/__init__.py", line 8, in <module>
    from . import tasks
  File "/home/leo/memory-maze/memory_maze/tasks.py", line 2, in <module>
    from dm_control import composer
  File "/home/leo/miniconda3/envs/memory-maze2/lib/python3.9/site-packages/dm_control/composer/__init__.py", line 18, in <module>
    from dm_control.composer.arena import Arena
  File "/home/leo/miniconda3/envs/memory-maze2/lib/python3.9/site-packages/dm_control/composer/arena.py", line 20, in <module>
    from dm_control import mjcf
  File "/home/leo/miniconda3/envs/memory-maze2/lib/python3.9/site-packages/dm_control/mjcf/__init__.py", line 18, in <module>
    from dm_control.mjcf.attribute import Asset
  File "/home/leo/miniconda3/envs/memory-maze2/lib/python3.9/site-packages/dm_control/mjcf/attribute.py", line 28, in <module>
    from dm_control.mujoco.wrapper import util
  File "/home/leo/miniconda3/envs/memory-maze2/lib/python3.9/site-packages/dm_control/mujoco/__init__.py", line 18, in <module>
    from dm_control.mujoco.engine import action_spec
  File "/home/leo/miniconda3/envs/memory-maze2/lib/python3.9/site-packages/dm_control/mujoco/engine.py", line 41, in <module>
    from dm_control import _render
  File "/home/leo/miniconda3/envs/memory-maze2/lib/python3.9/site-packages/dm_control/_render/__init__.py", line 86, in <module>
    Renderer = import_func()
  File "/home/leo/miniconda3/envs/memory-maze2/lib/python3.9/site-packages/dm_control/_render/__init__.py", line 46, in _import_osmesa
    from dm_control._render.pyopengl.osmesa_renderer import OSMesaContext
  File "/home/leo/miniconda3/envs/memory-maze2/lib/python3.9/site-packages/dm_control/_render/pyopengl/osmesa_renderer.py", line 35, in <module>
    from OpenGL import GL
  File "/home/leo/miniconda3/envs/memory-maze2/lib/python3.9/site-packages/OpenGL/GL/__init__.py", line 4, in <module>
    from OpenGL.GL.VERSION.GL_1_1 import *
  File "/home/leo/miniconda3/envs/memory-maze2/lib/python3.9/site-packages/OpenGL/GL/VERSION/GL_1_1.py", line 14, in <module>
    from OpenGL.raw.GL.VERSION.GL_1_1 import *
  File "/home/leo/miniconda3/envs/memory-maze2/lib/python3.9/site-packages/OpenGL/raw/GL/VERSION/GL_1_1.py", line 7, in <module>
    from OpenGL.raw.GL import _errors
  File "/home/leo/miniconda3/envs/memory-maze2/lib/python3.9/site-packages/OpenGL/raw/GL/_errors.py", line 4, in <module>
    _error_checker = _ErrorChecker( _p, _p.GL.glGetError )
AttributeError: 'NoneType' object has no attribute 'glGetError'
```
Someone else mentioned something about pyopengl in #30, could it be the same issue?
The versions I have installed are:
memory-maze-1.0.3
mujoco-3.1.4
pyopengl-3.1.7
Did I do something wrong, is something wrong with my environment, or is it the pip installation?
Thanks
There are pesky `WARNING:absl:Cannot set velocity on Entity with no free joint.` warnings on environment initialization, coming from somewhere inside MuJoCo. They don't mean anything bad, but it would be nice to get rid of them so they don't pollute the logs.
```
Collecting dmc-memory-maze
  Cloning https://github.com/jurgisp/dmc-memory-maze.git to c:\users\hnxcd\appdata\local\temp\pip-install-g2hj9ehh\dmc-memory-maze_276fa8f10fe54bec87532cf80ab27f9d
  Running command git clone -q https://github.com/jurgisp/dmc-memory-maze.git 'C:\Users\HNXCD\AppData\Local\Temp\pip-install-g2hj9ehh\dmc-memory-maze_276fa8f10fe54bec87532cf80ab27f9d'
  fatal: unable to access 'https://github.com/jurgisp/dmc-memory-maze.git/': Failed to connect to github.com port 443 after 21056 ms: Timed out
WARNING: Discarding git+https://github.com/jurgisp/dmc-memory-maze.git#egg=dmc-memory-maze. Command errored out with exit status 128: git clone -q https://github.com/jurgisp/dmc-memory-maze.git 'C:\Users\HNXCD\AppData\Local\Temp\pip-install-g2hj9ehh\dmc-memory-maze_276fa8f10fe54bec87532cf80ab27f9d' Check the logs for full command output.
ERROR: Exception:
Traceback (most recent call last):
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 341, in resolve
    self._add_to_criteria(self.state.criteria, r, parent=None)
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 173, in _add_to_criteria
    raise RequirementsConflicted(criterion)
pip._vendor.resolvelib.resolvers.RequirementsConflicted: Requirements conflict: UnsatisfiableRequirement('dmc-memory-maze')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_internal\resolution\resolvelib\resolver.py", line 95, in resolve
    collected.requirements, max_rounds=try_to_avoid_resolution_too_deep
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 472, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 343, in resolve
    raise ResolutionImpossible(e.criterion.information)
pip._vendor.resolvelib.resolvers.ResolutionImpossible: [RequirementInformation(requirement=UnsatisfiableRequirement('dmc-memory-maze'), parent=None)]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_internal\cli\base_command.py", line 173, in _main
    status = self.run(options, args)
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_internal\cli\req_command.py", line 203, in wrapper
    return func(self, options, args)
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_internal\commands\install.py", line 316, in run
    reqs, check_supported_wheels=not options.target_dir
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_internal\resolution\resolvelib\resolver.py", line 101, in resolve
    collected.constraints,
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_internal\resolution\resolvelib\factory.py", line 630, in get_installation_error
    return self._report_single_requirement_conflict(req, parent)
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_internal\resolution\resolvelib\factory.py", line 580, in _report_single_requirement_conflict
    cands = self._finder.find_all_candidates(req.project_name)
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_internal\index\package_finder.py", line 798, in find_all_candidates
    page_candidates = list(page_candidates_it)
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_internal\index\sources.py", line 134, in page_candidates
    yield from self._candidates_from_page(self._link)
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_internal\index\package_finder.py", line 758, in process_project_url
    html_page = self._link_collector.fetch_page(project_url)
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_internal\index\collector.py", line 490, in fetch_page
    return _get_html_page(location, session=self.session)
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_internal\index\collector.py", line 400, in _get_html_page
    resp = _get_html_response(url, session=session)
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_internal\index\collector.py", line 132, in _get_html_response
    "Cache-Control": "max-age=0",
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_vendor\requests\sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_internal\network\session.py", line 454, in request
    return super().request(method, url, *args, **kwargs)
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_vendor\requests\sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_vendor\requests\sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_vendor\cachecontrol\adapter.py", line 53, in send
    resp = super(CacheControlAdapter, self).send(request, **kw)
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_vendor\requests\adapters.py", line 449, in send
    timeout=timeout
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_vendor\urllib3\connectionpool.py", line 696, in urlopen
    self._prepare_proxy(conn)
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_vendor\urllib3\connectionpool.py", line 964, in _prepare_proxy
    conn.connect()
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_vendor\urllib3\connection.py", line 359, in connect
    conn = self._connect_tls_proxy(hostname, conn)
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_vendor\urllib3\connection.py", line 506, in _connect_tls_proxy
    ssl_context=ssl_context,
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_vendor\urllib3\util\ssl_.py", line 453, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls)
  File "D:\app\Anaconda\envs\memory_maze\lib\site-packages\pip\_vendor\urllib3\util\ssl_.py", line 495, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock)
  File "D:\app\Anaconda\envs\memory_maze\lib\ssl.py", line 407, in wrap_socket
    _context=self, _session=session)
  File "D:\app\Anaconda\envs\memory_maze\lib\ssl.py", line 773, in __init__
    raise ValueError("check_hostname requires server_hostname")
ValueError: check_hostname requires server_hostname
```
Hi! You report mean max reward in your paper for 1000 steps at 4 actions per step (4 Hz). If I ran this environment for 1000 steps at 1 action per step (1 Hz), is it fair to assume the mean max rewards would be a fourth of what you've reported?
Great environment! Thanks!
Hello!
I found that this project was updated a few days ago.
Has the id used to make the env changed?
I remember the old name was 'dmc_memory_maze:MemoryMaze-9x9-v0', but it no longer seems to work.
I would like to know the differences between the new version and the old one.
It seems the current pip release is outdated and does not include the deterministic maze generation introduced six months earlier. Could you update the package to match the current repo, which includes this change?
Hi! Just want to double check. I noticed that rollouts are a lot longer than I expected and created a simple test with random actions:
```python
import gym
import memory_maze
from tqdm.auto import trange

def rollout(env):
    done = False
    obs = env.reset()
    total_reward = 0.0
    while not done:
        obs, reward, done, _ = env.step(env.action_space.sample())
        total_reward += reward
    return total_reward

env = gym.make("MemoryMaze-9x9-v0")
for _ in trange(10):
    rollout(env)
```
On my M1 this takes almost 2 minutes just for 10 random rollouts. Is that normal? It seems very slow (~80 fps).
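For anyone comparing numbers, here is a minimal way to measure steps per second in isolation (plain Python timing, nothing memory-maze-specific; `step_fn` stands in for `env.step` with a sampled action):

```python
import time

def measure_fps(step_fn, n_steps=1000):
    """Time n_steps calls to step_fn and return steps per second."""
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_steps / elapsed
```

For the environment this would be something like `measure_fps(lambda: env.step(env.action_space.sample()))`, measured after a first `env.reset()` so that one-time setup cost is excluded.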
Hello there, here is a possible issue and a solution.
The setup does not pin a dm_control version, and the newest dm_control does not seem to be compatible; it may also pull in the newest mujoco if you don't already have it. However, memory_maze requires methods that no longer exist after mujoco 2.3.7.
Therefore, if you have any versioning issues, try:
dm-control-1.0.11
mujoco-2.3.7
pyopengl-3.1.7
It took me a bit of searching to get everything working, so I hope this saves you the trouble.
Dear creators, great job for this amazing work :)
Please do not hesitate to specify the versions used in your works.
Best!
I was wondering if you could post the code you used to generate the data, as I'd like to do the same with some modifications. With my naive data generation approach, the agent seems to look at walls most of the time.
Regards
Hi, I have a question about the offline data specification.
When I loaded one of the npz files, I noticed that all keys such as 'action', 'reward', and 'terminal' have length 1001.
Did you just put dummy 'action', 'reward', 'terminal' values at the first index?
I mean, if the original sequence is O_0, a_0, r_0, t_0, O_1, a_1, r_1, t_1, ... (O: image, a: action, r: reward, t: terminal), is the offline data stored as O_0, a_-1, r_-1, t_-1, O_1, a_0, r_0, t_0, ... (where a_-1, r_-1, t_-1 are dummy values)?
Thanks.
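To make the layout I am asking about concrete, here is a self-contained sketch with a dummy in-memory npz file (shapes and the dummy-first-element convention are my assumption from the question, not confirmed by the dataset docs):

```python
import io
import numpy as np

# Build a dummy episode: arrays of length T+1, where index t pairs image[t]
# with the action/reward/terminal that *preceded* it, so index 0 holds
# dummy values. Image shape is illustrative.
T = 4
buf = io.BytesIO()
np.savez(
    buf,
    image=np.zeros((T + 1, 64, 64, 3), dtype=np.uint8),
    action=np.zeros(T + 1, dtype=np.int64),    # action[0] would be a dummy
    reward=np.zeros(T + 1, dtype=np.float32),  # reward[0] would be a dummy
    terminal=np.zeros(T + 1, dtype=bool),      # terminal[0] would be a dummy
)
buf.seek(0)
ep = np.load(buf)

# Under this convention, re-aligning to (obs_t, action_t, reward_t) tuples
# just means shifting the non-image arrays by one:
obs, act, rew = ep["image"][:-1], ep["action"][1:], ep["reward"][1:]
assert len(obs) == len(act) == len(rew) == T
```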
Hey,
I am trying to use this environment with Sample Factory. My current setup is 12 CPUs, 80 GB memory, and 1x A100 GPU. I'm training with Async PPO and a total of 24 parallel environment instances. However, I'm noticing extremely high environment step times and RAM usage, which makes it difficult to increase the number of environment instances beyond 24. Is there a way to reduce the memory usage and the environment step times? Maybe the tricks used in the Sample Factory DMLab experiments to cache level data might help?
Thanks
When I use gym to interact with Memory Maze:
Is the structure of the environment fixed once the environment is created?
Or will the env structure be rebuilt every time the reset method is called?
Hey, I'm looking to modify this environment slightly. I'm looking for an environment where the maze does not change after every call to reset (it stays fixed based on the seed used to initialize it), but the player location is randomized on every reset call. Is this possible in the current implementation, or will it require some code changes? Do you have any advice on what changes might be required to get this working?
Can you list the versions of required packages that you used or know work? When I try to run the GUI demo on mac python 3.9.2, I get this error, which I assume is because I have an incompatible version:
```
% python gui/run_gui.py
pygame 2.1.2 (SDL 2.0.18, Python 3.9.2)
Hello from the pygame community. https://www.pygame.org/contribute.html
Creating environment: memory_maze:MemoryMaze-9x9-v0
Traceback (most recent call last):
  File "/Users/zplizzi/temp/memory-maze/gui/run_gui.py", line 228, in <module>
    main()
  File "/Users/zplizzi/temp/memory-maze/gui/run_gui.py", line 53, in main
    env = gym.make(args.env, disable_env_checker=True)
  File "/Users/zplizzi/.pyenv/versions/3.9.2/Python.framework/Versions/3.9/lib/python3.9/site-packages/gym/envs/registration.py", line 156, in make
    return registry.make(id, **kwargs)
  File "/Users/zplizzi/.pyenv/versions/3.9.2/Python.framework/Versions/3.9/lib/python3.9/site-packages/gym/envs/registration.py", line 101, in make
    env = spec.make(**kwargs)
  File "/Users/zplizzi/.pyenv/versions/3.9.2/Python.framework/Versions/3.9/lib/python3.9/site-packages/gym/envs/registration.py", line 70, in make
    env = self.entry_point(**_kwargs)
  File "/Users/zplizzi/.pyenv/versions/3.9.2/Python.framework/Versions/3.9/lib/python3.9/site-packages/memory_maze/__init__.py", line 23, in _make_gym_env
    dmenv = dm_task(**kwargs)
  File "/Users/zplizzi/.pyenv/versions/3.9.2/Python.framework/Versions/3.9/lib/python3.9/site-packages/memory_maze/tasks.py", line 25, in memory_maze_9x9
    return _memory_maze(9, 3, 250, **kwargs)
TypeError: _memory_maze() got an unexpected keyword argument 'disable_env_checker'
```
Hi! Very cool benchmark! Is it possible to publish the code with which the datasets were generated? I am very interested in testing my ideas about memory in the Offline RL setup. However, for a complete benchmark (e.g., for a paper), I would also like to have datasets for the 11x11 and 13x13 mazes.
Hey, I am working on a modified Memory Maze setting where the goal is to remember the cue signal instead of the maze layout. My goal is an environment where the maze layout is provided as input to the agent along with the current observation. The cue signal, which is currently shown as a border in the observation at all times, should be shown only for N frames after the cue changes. I was thinking of using the oracle wrappers but modifying them so that the cue becomes white after N frames. Do you have any suggestions on how I could make this work in the current environment? Or do you have a wrapper that already does this?
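To illustrate the idea, here is a sketch of the pixel-level part: a pure function that overwrites the border once the current cue has been visible for N frames. It could be applied from inside a gym `ObservationWrapper` that tracks the cue's age across steps. The border width (2 px) and white = 255 are my assumptions, not the actual env parameters.

```python
import numpy as np

def fade_cue(obs, age, n_frames=10, border=2):
    """Return obs with the border painted white once age >= n_frames.

    obs:      HxWxC uint8 image whose outer `border` pixels carry the cue color
    age:      number of frames the current cue has already been shown
    n_frames: how long the cue stays visible after a change
    """
    out = obs.copy()
    if age >= n_frames:
        b = border
        out[:b] = 255   # top rows
        out[-b:] = 255  # bottom rows
        out[:, :b] = 255  # left columns
        out[:, -b:] = 255  # right columns
    return out
```

A wrapper around this would reset `age` to 0 whenever the cue color changes (e.g., by comparing a corner border pixel between consecutive observations) and increment it otherwise.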