
Comments (18)

zxzzz0 commented on August 22, 2024

Try this one https://github.com/opendilab/DI-engine/blob/main/dizoo/bsuite/config/serial/memory_len/memory_len_15_r2d2_gtrxl_config.py

from di-engine.

PaParaZz1 commented on August 22, 2024

You can try GTrXL with R2D2 via the above-mentioned link.

As for GTrXL with PPO, we will add a corresponding implementation if necessary. Which RL environment do you want to use GTrXL with PPO on? Some discrete-action env like LunarLander?


hlsafin commented on August 22, 2024

Hmm, oddly enough, I tried that link and it gave me an error. I will run it again and see what the error was. I am just trying to test it out on an Atari env at the moment.


hlsafin commented on August 22, 2024

And I get this error:

    File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 778, in __getattr__
        raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
    torch.nn.modules.module.ModuleAttributeError: 'GTrXLDiscreteHead' object has no attribute 'dropout'


PaParaZz1 commented on August 22, 2024

> And I get this error:
>
>     File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 778, in __getattr__
>         raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
>     torch.nn.modules.module.ModuleAttributeError: 'GTrXLDiscreteHead' object has no attribute 'dropout'

Maybe your config is wrong; you can check it by comparing it with our Atari config. For an Atari env, you need to specify a 3D obs_shape.
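For illustration, here is a hypothetical sketch of the difference (the field names mimic typical DI-engine configs but are not copied from the repository; the Atari config linked in the thread is the source of truth):

```python
# Illustrative only: these dicts mimic the shape of DI-engine model configs.
atari_model = dict(
    obs_shape=[4, 84, 84],  # 3D: stacked frames x height x width
    action_shape=6,         # hypothetical discrete action count
)
vector_model = dict(
    obs_shape=8,            # 1D observation vector, e.g. LunarLander-style
    action_shape=4,
)

# An Atari-style config must carry a 3D observation shape.
assert len(atari_model["obs_shape"]) == 3
print("obs_shape dims:", len(atari_model["obs_shape"]))
```

A scalar obs_shape like the second dict is what a vector-observation env would use, which is why copying a non-Atari config onto Atari can fail inside the model head.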


hlsafin commented on August 22, 2024

oh, I see! thank you!


hlsafin commented on August 22, 2024

Okay, so that code works now. I didn't change the config, but it crashes after a bit. I have an RTX 3080 and 32 GB of RAM.


PaParaZz1 commented on August 22, 2024

What kind of crash error did you get? Can you offer more details, like the error traceback?


hlsafin commented on August 22, 2024

I can't really see; I have everything in that Docker container. It trains for a while, then after several hours it exits, so I am assuming there may have been a memory leak and then SIGKILL was called. Were you able to run training in that environment with no issues over a very long period of time?


PaParaZz1 commented on August 22, 2024

Maybe your problem is OOM due to the huge replay buffer of R2D2-GTrXL; you can try rerunning your experiment with a smaller replay_buffer_size. In our experiment with R2D2-GTrXL on Atari (the training curve is here), it usually needs more than 50-60 GB of RAM, so you should monitor your RAM usage.
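A rough back-of-envelope check makes the scale plausible (the numbers below are assumptions for illustration, not measured from DI-engine internals; real per-sample overhead is larger because each transition also stores next_obs, rewards, hidden states, etc.):

```python
# Illustrative arithmetic only: observation storage of an R2D2-style buffer.
unroll_len = 95              # transitions per stored train_sample (assumed)
obs_bytes = 4 * 84 * 84      # one uint8 stack of four 84x84 Atari frames
per_sample_mb = unroll_len * obs_bytes / 1024 ** 2

for buffer_size in (1_000, 10_000):
    gb = buffer_size * per_sample_mb / 1024
    print(f"buffer_size={buffer_size:>6}: ~{gb:.1f} GB of observations")
```

Even counting only raw observations, a 10,000-sample buffer sits in the tens of gigabytes, so the reported RAM figures are consistent with the buffer dominating memory.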


hlsafin commented on August 22, 2024

I had my replay buffer at 1000. Also, doesn't the replay buffer fill up fairly quickly in the beginning? Or does it keep filling up for several hours, which would cause this OOM? My understanding is that if it can work for the first 3 hours, it shouldn't suddenly break after that; it makes no sense to me.


PaParaZz1 commented on August 22, 2024

For R2D2-GTrXL, each element in the replay buffer is a train_sample, i.e., a list of transitions of length unroll_len, so the buffer will not be full at once. I think you should first monitor the RAM usage in your experiment; you can use a tool like this one.
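Monitoring RAM can also be done with the standard library alone; a minimal sketch (Unix-only, since it relies on `resource.getrusage`):

```python
import resource
import sys

def peak_rss_mb() -> float:
    """Peak resident set size of this process, in MB."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        rss /= 1024  # macOS reports bytes; Linux reports kilobytes
    return rss / 1024

# Call this periodically (e.g. once per training iteration) and log the value;
# a steadily growing curve points at the buffer (or a leak) long before SIGKILL.
print(f"peak RSS: {peak_rss_mb():.1f} MB")
```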


hlsafin commented on August 22, 2024

[screenshot of training log]

This is what I get after 1.6 million env time steps, and the mean reward is still around -20. I use the same config as mentioned before, except I reduced the experience replay buffer from 10,000 to 1,000. Does this look right to you?


PaParaZz1 commented on August 22, 2024

I have rerun pong_r2d2_gtrxl.py with buffer_size 10000 and 1000; here is the raw result:

[screenshot: training curves for buffer_size 10000 vs. 1000]

You can see that the experiment with buffer_size 1000 (blue curve) indeed shows poorer performance and does not begin to rise until 2M env steps. But the experiment with buffer_size 10000 exhibits a result similar to our previous experiment, so there is no bug in our implementation; you just need a larger replay buffer.

BTW, the memory utilization of these two experiments is shown below:

  • buffer_size 10000, max usage 40.0 GB

  • buffer_size 1000, max usage 9.8 GB


hlsafin commented on August 22, 2024

Okay, wow, thank you for the demonstration; I didn't realize that the buffer size played such a huge role. I'm currently trying to tackle Montezuma's Revenge with 8 GPUs. How do I run R2D2-GTrXL on multiple GPUs with data parallelism (I'm currently doing DDP), and what sort of config do you recommend? I'm currently running it on 1 GPU with memory_len 256, unroll_len 95, and seq_len 90, and not much learning is happening. I'm wondering if it's my setup, or if R2D2-GTrXL just isn't able to solve this problem.


PaParaZz1 commented on August 22, 2024

Buffer size is of great importance to off-policy value-based methods; if interested, you can have a look at this paper.

For data parallelism, you can refer to this doc.

To solve Montezuma's Revenge, exploration matters more than a better time-series model like GTrXL; I think you could consider combining RND or Go-Explore with your current code.
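To sketch the RND idea mentioned above (a toy NumPy version for intuition, not DI-engine's implementation): a frozen random "target" network embeds each observation, a trainable "predictor" is regressed onto it, and the prediction error serves as an intrinsic reward that decays for frequently visited states:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "networks"; real RND uses small conv/MLP networks.
obs_dim, feat_dim = 8, 16
W_target = rng.normal(size=(obs_dim, feat_dim))  # frozen random target
W_pred = np.zeros((obs_dim, feat_dim))           # predictor, trained online

def intrinsic_reward(obs):
    # Prediction error: high for novel states, shrinks as the predictor fits.
    err = obs @ W_pred - obs @ W_target
    return float((err ** 2).mean())

def update_predictor(obs, lr=0.01):
    global W_pred
    err = obs @ W_pred - obs @ W_target
    W_pred -= lr * np.outer(obs, err)  # gradient step on the squared error

obs = rng.normal(size=obs_dim)
before = intrinsic_reward(obs)
for _ in range(200):                   # visit the same state repeatedly
    update_predictor(obs)
after = intrinsic_reward(obs)
print(f"intrinsic reward: before={before:.4f}, after={after:.6f}")
```

The intrinsic reward for the repeatedly visited state decays toward zero, which is the exploration bonus RND adds on top of the environment reward.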


hlsafin commented on August 22, 2024

Okay. I was under the impression that games like MiniGrid, which this approach should have no issue solving, and Montezuma's Revenge are similar in that they both have sparse rewards. Maybe I am incorrect here.

Also, I tried running Pong with this approach, and I still wasn't able to match your reward. It stays around -19 for 5 million time steps.


Sino-Huang commented on August 22, 2024

@PaParaZz1 Greetings, a quick question: is there an easy way for me to prepend a Conv2D network to the GTrXL network? It looks like I cannot directly use GTrXL for an Atari environment due to the obs_shape, and there is no obs_shape parameter in GTrXL.

