denys88 / rl_games
RL implementations
License: MIT License
Hi,
I saw you were working on making it possible to train SAC agents with rl_games. Is that possible already? I was checking the configs and couldn't find anything; everything seems PPO-related. So I guess you can't run SAC currently?
Sorry, no time for a PR, but in common/wrappers.py the fix should go in BatchedFrameStack:
```python
def _get_ob(self):
    assert len(self.frames) == self.k
    if self.transpose:
        frames = np.transpose(self.frames, (1, 2, 0))
    else:
        if self.flatten:
            frames = np.array(self.frames)
            shape = np.shape(frames)
            frames = np.transpose(frames, (1, 0, 2))
            frames = np.reshape(frames, (shape[1], shape[0] * shape[2]))
        else:
            frames = np.transpose(self.frames, (1, 0, 2))
    return frames
```
Hi, does anyone know where I can find an example config file like rl_games/configs/ppo_continuous.yaml
except that I want to use CNN to handle the image input? I tried to set the config file as follows:
```yaml
config:
  name: ${resolve_default:FrankaCabinet,${....experiment}}
  full_experiment_name: ${.name}
  env_name: rlgpu
  ppo: True
  mixed_precision: False
  normalize_input: True
  normalize_value: True
  num_actors: ${....task.env.numEnvs}
  reward_shaper:
    scale_value: 0.01
  normalize_advantage: True
  gamma: 0.99
  tau: 0.95
  learning_rate: 5e-4
  lr_schedule: adaptive
  kl_threshold: 0.008
  score_to_win: 10000
  max_epochs: ${resolve_default:1500,${....max_iterations}}
  save_best_after: 200
  save_frequency: 100
  print_stats: True
  grad_norm: 1.0
  entropy_coef: 0.0
  truncate_grads: True
  e_clip: 0.2
  horizon_length: 16
  minibatch_size: 5
  mini_epochs: 8
  critic_coef: 4
  clip_value: True
  seq_len: 4
  bounds_loss_coef: 0.0001
  use_central_value: True
  central_value_config:
    normalize_input: True
    learning_rate: 0.0005
    input_shape: [3, 320, 480]

model:
  name: continuous_a2c_logstd

network:
  name: resnet_actor_critic
  separate: False
  value_shape: 1
  space:
    discrete:
  cnn:
    conv_depths: [16, 32, 32]
    activation: relu
    initializer:
      name: default
    regularizer:
      name: 'None'
  mlp:
    units: [256, 128, 64]
    activation: elu
    d2rl: False
    initializer:
      name: default
    regularizer:
      name: None
```
I set use_central_value to True and set central_value_config. But the following error occurred:
Traceback (most recent call last):
File "train.py", line 133, in launch_rlg_hydra
'checkpoint': cfg.checkpoint
File "/home/quan/rl_games/rl_games/torch_runner.py", line 109, in run
self.run_train(args)
File "/home/quan/rl_games/rl_games/torch_runner.py", line 88, in run_train
agent = self.algo_factory.create(self.algo_name, base_name='run', params=self.params)
File "/home/quan/rl_games/rl_games/common/object_factory.py", line 15, in create
return builder(**kwargs)
File "/home/quan/rl_games/rl_games/torch_runner.py", line 38, in <lambda>
self.algo_factory.register_builder('a2c_continuous', lambda **kwargs : a2c_continuous.A2CAgent(**kwargs))
File "/home/quan/rl_games/rl_games/algos_torch/a2c_continuous.py", line 59, in __init__
self.central_value_net = central_value.CentralValueTrain(**cv_config).to(self.ppo_device)
File "/home/quan/rl_games/rl_games/algos_torch/central_value.py", line 37, in __init__
self.model = network.build(state_config)
File "/home/quan/rl_games/rl_games/algos_torch/models.py", line 28, in build
return self.Network(self.network_builder.build(self.model_class, **config), obs_shape=obs_shape,
File "/home/quan/rl_games/rl_games/algos_torch/network_builder.py", line 766, in build
net = A2CResnetBuilder.Network(self.params, **kwargs)
File "/home/quan/rl_games/rl_games/algos_torch/network_builder.py", line 599, in __init__
NetworkBuilder.BaseNetwork.__init__(self, **kwargs)
File "/home/quan/rl_games/rl_games/algos_torch/network_builder.py", line 35, in __init__
nn.Module.__init__(self, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'num_agents'
So, is there any config file that I can refer to?
Hi, on the computing infrastructure I am using I need to continue interrupted training regularly. I have been trying to use the checkpointing utility (for PPO, but I think these issues appear for all algorithms) to reload the checkpoints, but the training does not actually continue from those checkpoints. I believe that is because other important parameters, such as the optimizer state, are not stored in the checkpoints (please correct me if I am wrong).
In the image below, I interrupted two runs with the same seed at two different states and continued training from the latest checkpoint.
Would it be possible to checkpoint all components of the algorithms to enable continuing training from a checkpoint?
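For reference, a minimal sketch (my assumption of what a fully resumable checkpoint would need, not rl_games' actual checkpoint format) that stores optimizer state and progress counters alongside the weights:

```python
import torch

def save_full_checkpoint(path, model, optimizer, epoch, frame):
    # Store everything needed to resume: weights, optimizer moments, and progress counters.
    torch.save({
        'model': model.state_dict(),
        'optimizer': optimizer.state_dict(),
        'epoch': epoch,
        'frame': frame,
    }, path)

def load_full_checkpoint(path, model, optimizer):
    # Restore weights and optimizer state, and return where training left off.
    ckpt = torch.load(path, map_location='cpu')
    model.load_state_dict(ckpt['model'])
    optimizer.load_state_dict(ckpt['optimizer'])
    return ckpt['epoch'], ckpt['frame']
```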
Running
python runner.py --train --file rl_games/configs/brax/ppo_ant.yaml
trains so fast that step_time becomes 0.0 and then leads to a crash in two different places:
fps step: 1048568.0 fps step and policy inference: 699047.1 fps total: 299589.9 epoch: 3/1000
fps step: 1048576.0 fps step and policy inference: 699047.1 fps total: 299589.9 epoch: 4/1000
fps step: 699050.7 fps step and policy inference: 419429.1 fps total: 262141.5 epoch: 5/1000
fps step: 524284.0 fps step and policy inference: 524284.0 fps total: 253261.5 epoch: 6/1000
fps step: 613857.2 fps step and policy inference: 613857.2 fps total: 282772.3 epoch: 7/1000
fps step: 699043.6 fps step and policy inference: 524280.0 fps total: 259049.0 epoch: 8/1000
fps step: 699040.0 fps step and policy inference: 419433.0 fps total: 233017.7 epoch: 9/1000
fps step: 699054.2 fps step and policy inference: 524286.0 fps total: 262141.5 epoch: 10/1000
fps step: 699054.2 fps step and policy inference: 524282.0 fps total: 262140.5 epoch: 11/1000
fps step: 699047.1 fps step and policy inference: 524284.0 fps total: 262141.5 epoch: 12/1000
fps step: 699043.6 fps step and policy inference: 349519.1 fps total: 209712.6 epoch: 13/1000
fps step: 524282.0 fps step and policy inference: 349520.9 fps total: 209712.3 epoch: 14/1000
fps step: 699040.0 fps step and policy inference: 349520.9 fps total: 234601.1 epoch: 15/1000
Traceback (most recent call last):
File "runner.py", line 67, in <module>
runner.run(args)
File "F:\dev\rl_games\rl_games\torch_runner.py", line 122, in run
self.run_train(args)
File "F:\dev\rl_games\rl_games\torch_runner.py", line 103, in run_train
agent.train()
File "F:\dev\rl_games\rl_games\common\a2c_common.py", line 1158, in train
self.write_stats(total_time, epoch_num, step_time, play_time, update_time, a_losses, c_losses, entropies, kls, last_lr, lr_mul, frame, scaled_time, scaled_play_time, curr_frames)
File "F:\dev\rl_games\rl_games\common\a2c_common.py", line 284, in write_stats
self.writer.add_scalar('performance/step_fps', curr_frames / step_time, frame)
ZeroDivisionError: float division by zero
fps step: 744015.2 fps step and policy inference: 488626.7 fps total: 301957.5 epoch: 741/1000
fps step: 699032.9 fps step and policy inference: 523978.2 fps total: 286478.0 epoch: 742/1000
fps step: 795304.5 fps step and policy inference: 432675.5 fps total: 328279.1 epoch: 743/1000
fps step: 2097216.0 fps step and policy inference: 524284.0 fps total: 299591.2 epoch: 744/1000
fps step: 524276.0 fps step and policy inference: 524276.0 fps total: 299587.9 epoch: 745/1000
fps step: 524286.0 fps step and policy inference: 524286.0 fps total: 299591.2 epoch: 746/1000
Traceback (most recent call last):
File "runner.py", line 67, in <module>
runner.run(args)
File "F:\dev\rl_games\rl_games\torch_runner.py", line 122, in run
self.run_train(args)
File "F:\dev\rl_games\rl_games\torch_runner.py", line 103, in run_train
agent.train()
File "F:\dev\rl_games\rl_games\common\a2c_common.py", line 1153, in train
fps_step = curr_frames / step_time
ZeroDivisionError: float division by zero
Hi there, I am just wondering: in A2CBuilder's Network, which CUDA device is actually used for training, and how would I hand down the sim_device and rl_device variables from env_creator: create_env_thunk in IsaacGymEnvs (or similar) into the Network class, so I can put the tensors onto the right GPU for torchrun multi-GPU training? Or is the right CUDA device set automatically?
Kind regards
I am using rl-games with IsaacGym to train my RL agent. However, when I was trying to use the --checkpoint= command line argument to resume training, I found that the training always restarts from the very beginning. I use rl-games in the way below:
runner = Runner(algo_observer)
runner.load(cfg_train)
runner.reset()
runner.run(args)
and resume my training with command:
$ python ./rlg_train.py --task=[my_task_name] --checkpoint=[absolute path of trained model]
I took a look at the source code and found that the class method Runner.run_train(self) has a duplicated load_config() call:
```python
else:
    self.reset()
    self.load_config(self.default_config)  # <- the duplicated call
```
This line causes the command line argument --checkpoint to be overridden by the configuration in the config file. I think this call should be deleted, and another one should be added in the Runner.run(self, args) function:
```python
if 'checkpoint' in args and args['checkpoint'] is not None:
    if len(args['checkpoint']) > 0:
        self.load_check_point = True  # <- proposed addition
        self.load_path = args['checkpoint']
```
so that I can use the command line argument to resume training without modifying my config file.
Could you please take a look and check if I've gotten it right? Thanks a lot!
Hi,
I have a few doubts with respect to implementing multiple agents in Isaac Gym (or Brax). (Apologies if they are too trivial.)
I want to use 2 or more agents in the same experiment (agents will have different environments, especially if Domain Randomisation is enabled) and train them sequentially (i.e. first Agent 1 gets trained via PPO, then Agent 2 and so on...)
How can I go about implementing this? I am not sure which files I should be modifying and how to configure train.py
to support the above functionality.
Thanks!
Hi, I ran into the error
Traceback (most recent call last):
File "./train.py", line 110, in launch_rlg_hydra
runner.run({
File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/torch_runner.py", line 139, in run
self.run_train()
File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/torch_runner.py", line 125, in run_train
agent.train()
File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/common/a2c_common.py", line 1143, in train
step_time, play_time, update_time, sum_time, a_losses, c_losses, b_losses, entropies, kls, last_lr, lr_mul = self.train_epoch()
File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/common/a2c_common.py", line 1023, in train_epoch
self.train_central_value()
File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/common/a2c_common.py", line 521, in train_central_value
return self.central_value_net.train_net()
File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/algos_torch/central_value.py", line 176, in train_net
loss += self.train_critic(self.dataset[idx])
File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/algos_torch/central_value.py", line 155, in train_critic
loss = self.calc_gradients(input_dict)
File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/algos_torch/central_value.py", line 201, in calc_gradients
values, _ = self.forward(batch_dict)
File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/algos_torch/central_value.py", line 136, in forward
value, rnn_states = self.model(input_dict)
File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/algos_torch/network_builder.py", line 403, in forward
out, states = self.rnn(out, states)
File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 691, in forward
result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: rnn: hx is not contiguous
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
when running IsaacGym (Preview 3)'s ShadowHandOpenAI_LSTM example with the parameter layers in the training config file ShadowHandPPOAsymmLSTM.yaml set to 2. It seems like rl_games doesn't currently support multi-layer LSTMs. Is that true, or is it just a bug?
I was modifying some of the rl_games code when I noticed that newer versions do not work with Isaac Gym. Prior to merge #113 things appear to be working correctly.
Dear colleagues,
Thanks for your great contribution!
I found one problem when:
$ import rl_games.common.env_configurations
My PC will output: Segmentation fault (core dumped)
I tried many different virtual environments, but the same problem occurs.
Finally, I solved the problem by moving import rl_games.envs.test to line 1 in env_configurations.py.
I do not know the exact reason why it works after moving line 3 to the beginning.
Just let you know in case it is a potential bug.
PC environment: python 3.7, ubuntu 18.04, AMD 3990x cpu, NV RTX 3080.
Running
python rlg_train.py --task Ant
from carbgym/python/rlgpu
gives the following error:
Traceback (most recent call last):
  File "rlg_train.py", line 13, in <module>
    from rl_games.common import env_configurations, experiment, vecenv
  File "/home/dcg-adlr-gradeyw-source/rl_games/rl_games/common/env_configurations.py", line 1, in <module>
    from rl_games.common import wrappers
  File "/home/dcg-adlr-gradeyw-source/rl_games/rl_games/common/wrappers.py", line 8, in <module>
    import cv2
  File "/opt/conda/lib/python3.6/site-packages/cv2/__init__.py", line 5, in <module>
    from .cv2 import *
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
Using the master branch of carbgym and Python 3.6
>>> import rl_games
>>> rl_games.__
rl_games.__cached__ rl_games.__doc__ rl_games.__getattribute__( rl_games.__le__( rl_games.__new__( rl_games.__repr__( rl_games.__subclasshook__(
rl_games.__class__( rl_games.__eq__( rl_games.__gt__( rl_games.__loader__ rl_games.__package__ rl_games.__setattr__(
rl_games.__delattr__( rl_games.__file__ rl_games.__hash__( rl_games.__lt__( rl_games.__path__ rl_games.__sizeof__(
rl_games.__dict__ rl_games.__format__( rl_games.__init__( rl_games.__name__ rl_games.__reduce__( rl_games.__spec__
rl_games.__dir__(
I don't see `rl_games.__version__` so if you install rl_games you can't tell which version you have.
Hi! Been using rl_games for a few months, awesome work guys :) Was wondering if the SAC integration will be ready anytime soon to try out?
Thanks!
The performance of your project is astonishing. I want to know why CNN+PPO can be so effective.
Thank you!
Hi to all,
First, congrats on the work. It is truly appealing.
I came to use the RL-games repo through the IsaacGymEnvs repository. I am extending several of my works to operate on this new simulator with IsaacGymEnvs, which has the 1.1.3 version of rl-games as a dependency.
This issue is more of a request/piece of advice for a more organized and structured repository and collaboration framework. Things that I believe could help encourage contributions from third parties and a larger adoption of your repository:

Overloaded terms ((frame, step), (epoch, episode)) lead to confusion and probably bugs, as these are in reality different concepts. Perhaps it would be useful to clarify and standardize some concepts like:
- step: can refer to a simulation step, or to a single agent's simulation/experience step. In the case of parallel simulation, a single simulation step accounts for multiple experience "steps".
- frame: it is unclear how you use this concept; sometimes it seems to refer to a simulation step, and sometimes to epochs.
- epoch: a user-defined numerical value used to trigger a logging sequence of metrics and statistics. The units in which you define this and other frequency variables (e.g., max_iterations) are also ambiguous, as they sometimes appear to be defined in terms of samples of experience collected (preferable) and sometimes in epochs or batches (dependent on the specific batch or epoch size).
- actor: in your implementation of SAC the concepts of actors, agents and envs are constantly interchanged, generating confusion. For the sake of generality (multi-agent), an env might hold multiple agents and each agent could have multiple actor networks.

I was cleaning and fixing some of the bugs in your SAC implementation when I found all of these problems, which have made it really difficult to contribute and to work with the different versions of the repo code that IsaacGymEnvs depends on.
I've been testing the PPO implementation, and it doesn't seem like it is currently possible to export a model as a C++-compatible module.
Is it something you are planning?
If not, I could try to give it a go, though would appreciate it if you have any pointers.
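In case it helps frame the discussion, a minimal TorchScript sketch of one possible route to a C++-loadable module; the placeholder MLP below stands in for whatever trained rl_games network would actually be exported, so treat the shapes and names as assumptions:

```python
import torch
import torch.nn as nn

# Placeholder policy standing in for a trained rl_games network (illustrative only).
policy = nn.Sequential(nn.Linear(60, 128), nn.ELU(), nn.Linear(128, 8))
policy.eval()

example_obs = torch.zeros(1, 60)               # dummy input with the expected shape
traced = torch.jit.trace(policy, example_obs)  # record the forward pass as TorchScript
traced.save("policy_traced.pt")                # loadable from C++ via torch::jit::load
```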
I found that the number of subprocesses in the yaml file is 8.
However, the number of subprocesses used in QMIX is 1.
I am running PPO with wandb integration, but the statistics seem to not be recorded as intended.
I am testing this with Isaac Gym environments but I am unsure if this issue is specific to Isaac Gym.
Steps to reproduce: after installing following the IsaacGymEnvs instructions, run a command like this in the isaacgymenvs/ directory:
python train.py task=Ant headless=True wandb_activate=True wandb_entity=danieltakeshi wandb_project=isaac-gym
where you can replace danieltakeshi with your username, and change isaac-gym to your project.
After I run this, the reward goes up (good) but I also see this on wandb:
The code is recording the reward as a function of iter, step, and time. It stores it in rl_games here:
rl_games/rl_games/common/a2c_common.py
Lines 947 to 955 in d8645b2
The code is storing the statistics with respect to different quantities (epoch, step, and time) to self.writer, which is a tensorboardX.SummaryWriter (link to docs). But the statistics on wandb seem to only show the x-axis as "iter" (which is the same as epoch_num here), and they don't show performance as a function of step or time. Is there a way to address such an issue here?
(Also posting on the Isaac Gym repo isaac-sim/IsaacGymEnvs#87)
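For context, a minimal sketch of the multi-axis logging pattern described above; the tag and variable names are assumptions for illustration, not copied from a2c_common.py:

```python
from tensorboardX import SummaryWriter

# Illustrative placeholder values; in rl_games these come from the training loop.
mean_reward, epoch_num, frame, total_time = 100.0, 10, 160_000, 12

writer = SummaryWriter("runs/example")
# The same scalar is written three times, each against a different x-axis value,
# which is what produces the "reward vs. iter / step / time" plots in tensorboard.
writer.add_scalar("rewards/iter", mean_reward, epoch_num)
writer.add_scalar("rewards/step", mean_reward, frame)
writer.add_scalar("rewards/time", mean_reward, total_time)
writer.close()
```

When a dashboard only keys plots on its own internal step counter, the step- and time-based variants all collapse onto one x-axis, which matches the behaviour described above.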
Hi there,
I was just wondering whether the RNN experience replay is implemented correctly?
The reason I ask is that in play_steps_rnn(), update_data() is called but not update_data_rnn().
Specifically, for replaying experiences with an RNN, a whole seq_len would have to be replayed for the GRU or LSTM to deliver correct results, right? Or is the current state of the GRU/LSTM cells also stored in the replay buffer at each step?
Or maybe I haven't fully understood these concepts in RL yet.
Kind regards
I am trying to use the BlackjackEnv (https://github.com/openai/gym/blob/master/gym/envs/toy_text/blackjack.py) in rl_games. It seems rl_games doesn't support the discrete observation space like:
spaces.Tuple((spaces.Discrete(32), spaces.Discrete(11), spaces.Discrete(2)))
Any plan to support this feature?
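In the meantime, a minimal sketch of a workaround I would try (an assumption on my side, not an rl_games feature): wrap the env so the Tuple of Discrete spaces becomes a flat one-hot Box observation:

```python
import gym
import numpy as np

class TupleDiscreteToBox(gym.ObservationWrapper):
    """Flattens a Tuple of Discrete spaces into a single one-hot Box observation."""

    def __init__(self, env):
        super().__init__(env)
        self.sizes = [space.n for space in env.observation_space.spaces]
        self.observation_space = gym.spaces.Box(
            low=0.0, high=1.0, shape=(sum(self.sizes),), dtype=np.float32)

    def observation(self, obs):
        out = np.zeros(sum(self.sizes), dtype=np.float32)
        offset = 0
        for value, n in zip(obs, self.sizes):
            out[offset + int(value)] = 1.0  # one-hot encode each discrete component
            offset += n
        return out
```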
I am trying to use the Squashed Normal distribution for training a PPO agent to bound the action space. For the SquashedNormal distribution, entropy is assumed to be equal to the entropy of the base (Normal) distribution, which ignores the additional E[log(d(tanh)/dx)] term. Would using the entropy of the underlying Normal distribution as a proxy (since entropy for the new distribution does not have a closed form) cause any stability issues?
Thank you!
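For reference, the change-of-variables identity behind the missing term, with $y = \tanh(x)$ and $x$ drawn from the base Normal distribution:

$$
H[Y] = H[X] + \mathbb{E}_{x}\!\left[\log\left|\frac{d\tanh(x)}{dx}\right|\right]
     = H[X] + \mathbb{E}_{x}\!\left[\log\left(1 - \tanh^2(x)\right)\right]
$$

Since $1 - \tanh^2(x) \le 1$, the correction term is non-positive, so the base Normal entropy used as a proxy is an upper bound on the true entropy of the squashed distribution.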
Hi there
Why is horovod needed if you have ray? Ray can also run on multiple GPUs. And don't the two interfere? And where are the parameters handled, in Ray's database or in hvd?
Kind regards
Hi, I was trying to update rl_games to the latest 1.4.0 version.
However, it turns out that the latest version of rl_games fails to achieve the same performance as 1.1.4, which Isaac Gym requires.
The environment I'm testing with is Humanoid, and the command I used is as follows:
python train.py task=Humanoid headless=True
and
python train.py task=Humanoid checkpoint=runs/Humanoid/nn/last_Humanoid_ep_500_rew_5396.84.pth test=True num_envs=9
When using 1.1.4, the humanoids can run forward, but with the latest version of rl_games, all the humanoids just collapse where they start.
May I know the changes between 1.1.4 and the latest version?
Or should I change something for the yaml config file to make it work in the latest version?
The latest rl_games imports turtle, which imports tkinter, leading to this error. Is this an absolutely unavoidable import?
FYI: I commented out that first line in rl_games/algos_torch/sac_agent.py (from turtle import shape), and things seem to work fine without that import.
Traceback (most recent call last):
File "runner.py", line 44, in <module>
from rl_games.torch_runner import Runner
File "F:\dev\rl_games\rl_games\torch_runner.py", line 19, in <module>
from rl_games.algos_torch import sac_agent
File "F:\dev\rl_games\rl_games\algos_torch\sac_agent.py", line 1, in <module>
from turtle import shape
File "c:\python37\lib\turtle.py", line 107, in <module>
import tkinter as TK
File "c:\python37\lib\tkinter\__init__.py", line 36, in <module>
import _tkinter # If this fails your Python may not be configured for Tk
ModuleNotFoundError: No module named '_tkinter'
In IsaacGymEnvs, rl-games + multi-GPU seems to have some issues. As shown in the screenshot, rl-games + multi-GPU uses twice the amount of data and performs worse than the single-GPU setting on Ant.
This issue tracks the investigation of this problem.
I suggest we first make sure there is no loss in sample efficiency before scaling to more envs, by matching the implementation details of our prototype in CleanRL: https://cleanrl-git-new-multi-gpu-vwxyzjn.vercel.app/rl-algorithms/ppo/#implementation-details_6.
We need to seed the multi-GPU processes with different seeds to decorrelate experience; otherwise the multi-GPU processes will produce the exact same observations.
Configuration-wise we can set the overall seed with params.seed and the env seed with params.config.env_config.seed, so if params.config.env_config.seed is set but params.seed is not, we get identical observations from the environments as shown below.
This is probably OK since the agent still samples different actions, but it's nonetheless a problem. The correct implementation is to use seed = seed + local_rank.
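A minimal sketch of that per-rank seeding, assuming the worker's rank can be read from a torchrun-style environment variable (the variable name is an assumption, not the exact rl_games mechanism):

```python
import os
import numpy as np
import torch

def seed_for_rank(base_seed: int) -> int:
    # Offset the configured seed by the worker's local rank so each GPU process
    # collects decorrelated experience.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    seed = base_seed + local_rank
    np.random.seed(seed)
    torch.manual_seed(seed)
    return seed
```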
After fixing #163, I was able to match the sample efficiency in the single GPU setting:
However, the wall time is slower than I had expected. On a separate benchmark I made with CleanRL, the experiments show horovod should make Ant step 20% faster.
Maybe it's the averaging stats overhead? In the CleanRL benchmark experiments I did not mess with stats at all.
Hi,
When I tried to run branch DM/torch_gpu with the command python3 torch_runner.py --train --file configs/ppo_smac_cnn.yaml, I got the following complaint:
Traceback (most recent call last):
File "torch_runner.py", line 141, in <module>
runner.run(args)
File "torch_runner.py", line 111, in run
self.run_train()
File "torch_runner.py", line 95, in run_train
agent = self.algo_factory.create(self.algo_name, base_name='run', observation_space=obs_space, action_space=action_space, config=self.config)
File "/pymarl/common/object_factory.py", line 12, in create
return builder(**kwargs)
File "torch_runner.py", line 25, in <lambda>
self.algo_factory.register_builder('a2c_discrete', lambda **kwargs : a2c_discrete.DiscreteA2CAgent(**kwargs))
File "/pymarl/algos_torch/a2c_discrete.py", line 18, in __init__
self.model = self.network.build(config)
File "/pymarl/algos_torch/models.py", line 25, in build
(pid=67) Game has started.
return ModelA2C.Network(self.network_builder.build('a2c', **config))
File "/pymarl/algos_torch/network_builder.py", line 297, in build
net = A2CBuilder.Network(self.params, **kwargs)
(pid=67) Sending ResponseJoinGame
File "/pymarl/algos_torch/network_builder.py", line 157, in __init__
'input_size' : self._calc_input_size(input_shape, self.actor_cnn),
File "/pymarl/algos_torch/network_builder.py", line 58, in _calc_input_size
return nn.Sequential(*cnn_layers)(torch.rand(1, *(input_shape))).flatten(1).data.size(1)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 208, in forward
self.padding, self.dilation, self.groups)
TypeError: conv1d(): argument 'padding' (position 5) must be tuple of ints, not str
Would you like to help me solve it? Or could you give me any guideline on how to run the PPO to get the reported performance?
Thanks a lot
I'm running this command to "play" my trained model without using the GPU:
python train.py task=Ant test=True checkpoint=cp.pth num_envs=4 sim_device=cpu rl_device=cpu pipeline=cpu
but I still get this CUDA memory error sometimes if I try to run this while a model is being trained in a different terminal window:
Error executing job with overrides: ['task=Ant', 'test=True', 'checkpoint=cp.pth', 'num_envs=4', 'sim_device=cpu', 'rl_device=cpu', 'pipeline=cpu']
Traceback (most recent call last):
File "train.py", line 134, in <module>
launch_rlg_hydra()
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/main.py", line 52, in decorated_main
config_name=config_name,
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/_internal/utils.py", line 378, in _run_hydra
lambda: hydra.run(
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/_internal/utils.py", line 214, in run_and_report
raise ex
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/_internal/utils.py", line 381, in <lambda>
overrides=args.overrides,
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 111, in run
_ = ret.return_value
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/core/utils.py", line 233, in return_value
raise self._return_value
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/core/utils.py", line 160, in run_job
ret.return_value = task_function(task_cfg)
File "train.py", line 130, in launch_rlg_hydra
'play': cfg.test,
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/torch_runner.py", line 142, in run
player = self.create_player()
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/torch_runner.py", line 128, in create_player
return self.player_factory.create(self.algo_name, config=self.config)
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/common/object_factory.py", line 15, in create
return builder(**kwargs)
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/torch_runner.py", line 29, in <lambda>
self.player_factory.register_builder('a2c_continuous', lambda **kwargs : players.PpoPlayerContinuous(**kwargs))
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/algos_torch/players.py", line 28, in __init__
self.actions_low = torch.from_numpy(self.action_space.low.copy()).float().to(self.device)
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/torch/cuda/__init__.py", line 170, in _lazy_init
torch._C._cuda_init()
RuntimeError: CUDA error: out of memory
I asked in the NVIDIA forum too, but thought I would check here if it's an unavoidable rl_games thing.
Also, the memory error persists until I reboot. Is that a memory leak? Or is there any way rl_games could clear the GPU memory?
Amazing repo! I was wondering if you could help me clarify the confusion I have around the recurrent layer implementations.
I found that the input to A2CBuilder.Network.forward() seems to only have a sequence length of 1, even though in the yaml it's a non-1 value.
I am currently on commit a33b6c4d easy fix (#145), up to date with the most recent master commit.
I ran this command:
python runner.py --train --file rl_games/configs/ppo_lunar_continiuos_torch.yaml
With a breakpoint at rl_games/algos_torch/network_builder.py:341~342, the shapes of a_out, a_states, c_out, c_states are all torch.Size([1, 16, 64]) (seq_length, batch_size, input_dim from the previous mlp).
However, in the yaml file, params.config.seq_length: 4, which I assumed to be the length of the RNN sequence.
I also didn't find a mechanism in the code that passes in a sequence of inputs to the RNN.
I'm wondering if I missed something, or if this feature is not yet implemented?
Trying to run multi-gpu training with horovod, I get the following error:
[1,1]<stderr>:/opt/conda/lib/python3.8/site-packages/horovod/torch/sync_batch_norm.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
[1,1]<stderr>: LooseVersion(torch.__version__) >= LooseVersion('1.5.0') and
[1,1]<stderr>:/opt/conda/lib/python3.8/site-packages/gym/spaces/box.py:84: UserWarning: WARN: Box bound precision lowered by casting to float32
[1,1]<stderr>: logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
[1,1]<stderr>:/workspace/isaacgymenvs/isaacgymenvs/tasks/allegro_hand.py:275: DeprecationWarning: an integer is required (got type isaacgym._bindings.linux-x86_64.gym_38.DofDriveMode). Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
[1,1]<stderr>: asset_options.default_dof_drive_mode = gymapi.DOF_MODE_POS
[1,1]<stderr>:/opt/conda/lib/python3.8/site-packages/horovod/common/util.py:227: DeprecationWarning: Parameter `average` has been replaced with `op` and will be removed in v0.21.0
[1,1]<stderr>: warnings.warn('Parameter `average` has been replaced with `op` and will be removed in v0.21.0',
[1,1]<stderr>:Error executing job with overrides: ['task=AllegroHandLSTM', 'headless=True', 'multi_gpu=True', 'train.params.config.mixed_precision=False']
[1,1]<stderr>:Traceback (most recent call last):
[1,1]<stderr>: File "train.py", line 137, in launch_rlg_hydra
[1,1]<stderr>: runner.run({
[1,1]<stderr>: File "/opt/conda/lib/python3.8/site-packages/rl_games/torch_runner.py", line 97, in run
[1,1]<stderr>: self.run_train(args)
[1,1]<stderr>: File "/opt/conda/lib/python3.8/site-packages/rl_games/torch_runner.py", line 78, in run_train
[1,1]<stderr>: agent.train()
[1,1]<stderr>: File "/opt/conda/lib/python3.8/site-packages/rl_games/common/a2c_common.py", line 1141, in train
[1,1]<stderr>: step_time, play_time, update_time, sum_time, a_losses, c_losses, b_losses, entropies, kls, last_lr, lr_mul = self.train_epoch()
[1,1]<stderr>: File "/opt/conda/lib/python3.8/site-packages/rl_games/common/a2c_common.py", line 1012, in train_epoch
[1,1]<stderr>: self.train_central_value()
[1,1]<stderr>: File "/opt/conda/lib/python3.8/site-packages/rl_games/common/a2c_common.py", line 516, in train_central_value
[1,1]<stderr>: return self.central_value_net.train_net()
[1,1]<stderr>: File "/opt/conda/lib/python3.8/site-packages/rl_games/algos_torch/central_value.py", line 194, in train_net
[1,1]<stderr>: self.update_lr(self.lr)
[1,1]<stderr>: File "/opt/conda/lib/python3.8/site-packages/rl_games/algos_torch/central_value.py", line 79, in update_lr
[1,1]<stderr>: self.hvd.broadcast_value(lr_tensor, 'cv_learning_rate')
[1,1]<stderr>: File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1185, in __getattr__
[1,1]<stderr>: raise AttributeError("'{}' object has no attribute '{}'".format(
[1,1]<stderr>:AttributeError: 'CentralValueTrain' object has no attribute 'hvd'
[1,1]<stderr>:
[1,1]<stderr>:Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
[1,0]<stderr>:/opt/conda/lib/python3.8/site-packages/horovod/torch/sync_batch_norm.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
[1,0]<stderr>: LooseVersion(torch.__version__) >= LooseVersion('1.5.0') and
[1,0]<stderr>:/opt/conda/lib/python3.8/site-packages/gym/spaces/box.py:84: UserWarning: WARN: Box bound precision lowered by casting to float32
[1,0]<stderr>: logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
[1,0]<stderr>:/workspace/isaacgymenvs/isaacgymenvs/tasks/allegro_hand.py:275: DeprecationWarning: an integer is required (got type isaacgym._bindings.linux-x86_64.gym_38.DofDriveMode). Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
[1,0]<stderr>: asset_options.default_dof_drive_mode = gymapi.DOF_MODE_POS
[1,0]<stderr>:/opt/conda/lib/python3.8/site-packages/horovod/common/util.py:227: DeprecationWarning: Parameter `average` has been replaced with `op` and will be removed in v0.21.0
[1,0]<stderr>: warnings.warn('Parameter `average` has been replaced with `op` and will be removed in v0.21.0',
[1,0]<stderr>:Error executing job with overrides: ['task=AllegroHandLSTM', 'headless=True', 'multi_gpu=True', 'train.params.config.mixed_precision=False']
[1,0]<stderr>:Traceback (most recent call last):
[1,0]<stderr>: File "train.py", line 137, in launch_rlg_hydra
[1,0]<stderr>: runner.run({
[1,0]<stderr>: File "/opt/conda/lib/python3.8/site-packages/rl_games/torch_runner.py", line 97, in run
[1,0]<stderr>: self.run_train(args)
[1,0]<stderr>: File "/opt/conda/lib/python3.8/site-packages/rl_games/torch_runner.py", line 78, in run_train
[1,0]<stderr>: agent.train()
[1,0]<stderr>: File "/opt/conda/lib/python3.8/site-packages/rl_games/common/a2c_common.py", line 1141, in train
[1,0]<stderr>: step_time, play_time, update_time, sum_time, a_losses, c_losses, b_losses, entropies, kls, last_lr, lr_mul = self.train_epoch()
[1,0]<stderr>: File "/opt/conda/lib/python3.8/site-packages/rl_games/common/a2c_common.py", line 1012, in train_epoch
[1,0]<stderr>: self.train_central_value()
[1,0]<stderr>: File "/opt/conda/lib/python3.8/site-packages/rl_games/common/a2c_common.py", line 516, in train_central_value
[1,0]<stderr>: return self.central_value_net.train_net()
[1,0]<stderr>: File "/opt/conda/lib/python3.8/site-packages/rl_games/algos_torch/central_value.py", line 194, in train_net
[1,0]<stderr>: self.update_lr(self.lr)
[1,0]<stderr>: File "/opt/conda/lib/python3.8/site-packages/rl_games/algos_torch/central_value.py", line 79, in update_lr
[1,0]<stderr>: self.hvd.broadcast_value(lr_tensor, 'cv_learning_rate')
[1,0]<stderr>: File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1185, in __getattr__
[1,0]<stderr>: raise AttributeError("'{}' object has no attribute '{}'".format(
[1,0]<stderr>:AttributeError: 'CentralValueTrain' object has no attribute 'hvd'
[1,0]<stderr>:
[1,0]<stderr>:Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
[1,0]<stdout>:[1,1]<stdout>:--------------------------------------------------------------------------
It seems that the central value module never creates or receives a Horovod wrapper object.
Hi there,
what does the value self.mb_rnn_states stand for? self.rnn_states is already set, but this value is updated when the horizon episode is finished.
Kind regards
Hi,
Had a couple of questions regarding extending the current functionality present in rl_games:
What would be the best way to extend one of the algos (say, A2C Continuous) to allow an external function (like a controller function) to be called after the NN forward pass? For example, normally the forward pass (at inference time) might look like model_forward() --> dist_from_output() --> sample_from_dist(), whereas I'm hoping to inject an external function after the forward pass so that the pipeline would look like model_forward() --> external_postprocessing() --> dist_from_output() --> sample_from_dist(), where external_postprocessing() would take in the model's output values and return the post-processed values (potentially of a different dimension, which would be the "final" action dimension used to generate the sampling distribution; e.g. conversion from eef commands into joint torques).
What would be the best way to include additional information to be stored in the replay buffer (to be used by the above external function)? Ideally, this would be a dict of tensors that is stored along with the normal (s, a, r, s') values for a given env step.
Working with @ViktorM on applications relevant to these features and he thought it might be best if I posted here. Thanks!
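For the first question, a rough sketch of the kind of hook described above (purely illustrative; PostprocessedPolicy and external_postprocessing are hypothetical names, not rl_games APIs):

```python
import torch
import torch.nn as nn

class PostprocessedPolicy(nn.Module):
    """Wraps a base policy network and applies an external post-processing
    function to its raw output before the action distribution is built."""

    def __init__(self, base_model: nn.Module, external_postprocessing):
        super().__init__()
        self.base_model = base_model
        self.external_postprocessing = external_postprocessing  # e.g. eef commands -> joint torques

    def forward(self, obs):
        raw_out = self.base_model(obs)                 # model_forward()
        return self.external_postprocessing(raw_out)   # then dist_from_output() / sample_from_dist()

# Example usage with placeholder dimensions.
base = nn.Linear(16, 7)
policy = PostprocessedPolicy(base, external_postprocessing=lambda x: torch.tanh(x))
actions_raw = policy(torch.zeros(1, 16))
```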
How can I get the rl_games v1.1.4 source code, which is required by IsaacGymEnvs? I want to add my own algorithms to rl_games, so I need the source code. Does anyone know how to do that?
Hi,
I manually controlled the ant robot from this example (https://github.com/NVIDIA-Omniverse/IsaacGymEnvs/blob/main/isaacgymenvs/tasks/ant.py) and recorded the corresponding joint angle values.
I would like to know whether there is a way of integrating this pre-existing knowledge into the agent training (such as the imitation learning from SB3: https://imitation.readthedocs.io/en/latest/algorithms/gail.html).
On the bottom left of the example gif for Isaac Gym in the root readme on the master branch, there is an in-hand manipulation result on the Allegro Hand. Where is the source for that env? Was it created for use in rl_games? I would like to recreate those results and add to that experiment if possible. Thanks!
(this is on line 1214 in the version that isaac gym is using):
rl_games/rl_games/common/a2c_common.py
Line 1207 in a33b6c4
The only changes I made in Isaac Gym are in this function in isaacgymenvs/tasks/cartpole.py:
```python
@torch.jit.script
def compute_cartpole_reward(pole_angle, pole_vel, cart_vel, cart_pos,
                            reset_dist, reset_buf, progress_buf, max_episode_length):
    # type: (Tensor, Tensor, Tensor, Tensor, float, Tensor, Tensor, float) -> Tuple[Tensor, Tensor]
    reward = 1 - torch.abs(pole_angle) - 0.01 * torch.abs(cart_vel)
    reset = reset_buf
    return reward, reset
```
error:
(rlgpu) stuart@hp:~/repos/IsaacGymEnvs/isaacgymenvs$ python train.py task=Cartpole
...
fps step: 229110.7 fps step and policy inference: 168550.7 fps total: 121436.5
Error executing job with overrides: ['task=Cartpole']
Traceback (most recent call last):
File "train.py", line 131, in <module>
launch_rlg_hydra()
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/main.py", line 52, in decorated_main
config_name=config_name,
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/_internal/utils.py", line 378, in _run_hydra
lambda: hydra.run(
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/_internal/utils.py", line 214, in run_and_report
raise ex
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/_internal/utils.py", line 381, in <lambda>
overrides=args.overrides,
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 111, in run
_ = ret.return_value
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/core/utils.py", line 233, in return_value
raise self._return_value
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/core/utils.py", line 160, in run_job
ret.return_value = task_function(task_cfg)
File "train.py", line 127, in launch_rlg_hydra
'play': cfg.test,
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/torch_runner.py", line 139, in run
self.run_train()
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/torch_runner.py", line 125, in run_train
agent.train()
File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/common/a2c_common.py", line 1214, in train
self.save(os.path.join(self.nn_dir, 'last_' + self.config['name'] + 'ep' + str(epoch_num) + 'rew' + str(mean_rewards)))
UnboundLocalError: local variable 'mean_rewards' referenced before assignment
(rlgpu) stuart@hp:~/repos/IsaacGymEnvs/isaacgymenvs$
Hope this is helpful. I'm new to this stuff.
Hi, @Denys88. I saw an apparent performance drop during training with the latest rl_games version; the reward plot is as follows (trained on the FrankaCabinet environment in IsaacGymEnvs):
The orange line is training with the latest version and the blue one is with the old version (v1.4.0). I found that in the latest code in a2c_common.py there is no self.schedule_type, and all scheduler updates happen the way they did when self.schedule_type == 'standard'. The latest code is as follows:
```python
for mini_ep in range(0, self.mini_epochs_num):
    ep_kls = []
    for i in range(len(self.dataset)):
        a_loss, c_loss, entropy, kl, last_lr, lr_mul, cmu, csigma, b_loss = self.train_actor_critic(self.dataset[i])
        a_losses.append(a_loss)
        c_losses.append(c_loss)
        ep_kls.append(kl)
        entropies.append(entropy)
        if self.bounds_loss_coef is not None:
            b_losses.append(b_loss)
        self.dataset.update_mu_sigma(cmu, csigma)

    av_kls = torch_ext.mean_list(ep_kls)
    if self.multi_gpu:
        dist.all_reduce(av_kls, op=dist.ReduceOp.SUM)
        av_kls /= self.rank_size
    self.last_lr, self.entropy_coef = self.scheduler.update(self.last_lr, self.entropy_coef, self.epoch_num, 0, av_kls.item())
    self.update_lr(self.last_lr)
```
When I changed the code as follows
```python
for mini_ep in range(0, self.mini_epochs_num):
    ep_kls = []
    for i in range(len(self.dataset)):
        a_loss, c_loss, entropy, kl, last_lr, lr_mul, cmu, csigma, b_loss = self.train_actor_critic(self.dataset[i])
        a_losses.append(a_loss)
        c_losses.append(c_loss)
        ep_kls.append(kl)
        entropies.append(entropy)
        if self.bounds_loss_coef is not None:
            b_losses.append(b_loss)
        self.dataset.update_mu_sigma(cmu, csigma)
        if self.multi_gpu:
            dist.all_reduce(av_kls, op=dist.ReduceOp.SUM)
            av_kls /= self.rank_size
        self.last_lr, self.entropy_coef = self.scheduler.update(self.last_lr, self.entropy_coef, self.epoch_num, 0, av_kls.item())
        self.update_lr(self.last_lr)

    av_kls = torch_ext.mean_list(ep_kls)
```
Then the performance is as before.
So, why did you choose to remove the selection of self.schedule_type? Would it be better to set the default self.schedule_type = 'legacy' as before?
test=True with a checkpoint. @ArthurAllshire has already done it, but I think it would be good to have that in the same wrapper. It should be pretty straightforward and will make our lives much easier.
It seems brax changed their API and brax_visualization.ipynb needs some quick fixes:
- In cell #3, line 5: config = runner.get_prebuilt_config() needs to be commented out/removed.
- In cell #3, line 8: env_config = config['env_config'] should change to env_config = runner.params['config']['env_config'].
- In cell #5, line 14: env.state.qp should change to env.env._state.qp.
- In cell #5, line 17: env.step(act.unsqueeze(0)) should change to env.step(act).
- In cell #7: display(visualize(env.env.sys, qps)) should change to display(visualize(env.env._env.sys, qps)).
Value bootstrap is calculated here:
rl_games/rl_games/common/a2c_common.py
Line 618 in 92525ce
Essentially, what the code does is:
a(t) = actor(obs(t))
v(t) = critic(obs(t))
obs(t+1), rew(t), is_timeout(t) = env.step(a(t))
rew(t) += gamma * v(t) * is_timeout(t)    (1)
where t is the index of the timestep in the episode according to which timestep we populate self.experience_buffer (I hope my notation is clear).
The idea here is that we should add the estimated return for the rest of the episode as if it was infinitely long.
So, ideally, we would use
rew(t) += gamma * v(t+1)    (2)
instead of rew(t) += gamma * v(t) as in (1). Using (1) is undesirable because v(t) already accounts for rew(t), so if the environment returns a large reward on the last step it will be counted twice.
The thing is that we can't really get v(t+1) = critic(obs(t+1)), because if is_timeout(t) is true, done(t) will also be true, which means obs(t+1) corresponds to the next episode.
We can't estimate v(t+1) using v(t) either, because v(t) = rew(t) + gamma * v(t+1) ==> gamma * v(t+1) = v(t) - rew(t). When we use this in equation (2) above we get:
rew(t) += v(t) - rew(t)    (3)
which just sets rew(t) to v(t) and entirely discards rew(t).
Basically this leaves us with just two options for value bootstrap:
- keep (1), i.e. bootstrap with gamma * v(t), which double-counts rew(t); or
- use (3), i.e. replace rew(t) with v(t), which discards rew(t) entirely.
I feel like both options are really hacky and I wonder if there's even a right way to do it. What do you think? Am I missing something here?
On the other hand, both of these options are viable as long as rew(t) on the last step of the episode is negligible. If it is not, i.e. if the environment returns some non-trivial reward when is_timeout(t) is true, both options lead to incorrect learning behavior.
Hey,
first of all, thank you for the great work! I encountered your repo due to the IsaacGymEnvs and was training some Trifinger agents.
However, when trying to load the trained weights I'm getting the following error:
RuntimeError: Error(s) in loading state_dict for Network: Unexpected key(s) in state_dict: "value_mean_std.running_mean", "value_mean_std.running_var", "value_mean_std.count"
I'm running basically the same repository and did not change any parameter in the config.
This is a great project, thanks!
The current output of rl_games to the terminal doesn't show progress.
I know you can use Tensorboard etc., but is there a way to customize the terminal output to include information such as Episode [4/500], similar to but less verbose than rsl_rl / legged_gym?
"deterministic" is currently misspelled as "determenistic":
rl_games/rl_games/common/player.py
Line 45 in 1a89097
A few of the cfgs in IsaacGymEnvs (ex.) and OmniIsaacGymEnvs (ex.) use deterministic: True, but this shouldn't affect them since the default value is True.
Might be a good idea to accept both "deterministic" and "determenistic" for backwards compatibility.
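A small sketch of the backwards-compatible lookup suggested above (the variable names are placeholders; player.py has its own config handling):

```python
config = {'determenistic': True}  # example config still using the old, misspelled key

# Prefer the corrected key, fall back to the misspelled one, then to the default.
is_deterministic = config.get('deterministic', config.get('determenistic', True))
```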
Hi, I am getting the error below while running the code:
Traceback (most recent call last):
File "tf14_runner.py", line 144, in <module>
runner.run(args)
File "tf14_runner.py", line 114, in run
self.run_train()
File "tf14_runner.py", line 98, in run_train
agent = self.algo_factory.create(self.algo_name, sess=self.sess, base_name='run', observation_space=obs_space, action_space=action_space, config=self.config)
File "/home/anujm/Documents/rl_games/rl_games/common/object_factory.py", line 12, in create
return builder(**kwargs)
File "tf14_runner.py", line 25, in <lambda>
self.algo_factory.register_builder('a2c_discrete', lambda **kwargs : a2c_discrete.A2CAgent(**kwargs))
File "/home/anujm/Documents/rl_games/rl_games/algos_tf14/a2c_discrete.py", line 45, in __init__
self.vec_env = vecenv.create_vec_env(self.env_name, self.num_actors, **self.env_config)
File "/home/anujm/Documents/rl_games/rl_games/common/vecenv.py", line 138, in create_vec_env
return RayVecSMACEnv(config_name, num_actors, **kwargs)
File "/home/anujm/Documents/rl_games/rl_games/common/vecenv.py", line 101, in __init__
self.num_agents = ray.get(res)
File "/home/anujm/anaconda3/envs/rlgames/lib/python3.7/site-packages/ray/worker.py", line 2193, in get
raise value
ray.exceptions.RayTaskError: ray_worker (pid=16737, host=anujm-X299-A)
File "/home/anujm/Documents/rl_games/rl_games/common/vecenv.py", line 58, in get_number_of_agents
return self.env.get_number_of_agents()
AttributeError: 'BatchedFrameStack' object has no attribute 'get_number_of_agents'
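One possible workaround sketch (my assumption, not the actual rl_games fix): have the frame-stack wrapper forward the agent-count query to the wrapped env so the vectorized SMAC env can call it:

```python
import gym

class BatchedFrameStackWithAgents(gym.Wrapper):
    """Hypothetical patch: expose get_number_of_agents on the wrapper by
    delegating to the underlying multi-agent env."""

    def get_number_of_agents(self):
        return self.env.get_number_of_agents()
```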
Hi, I just came across this repo. I'm quite surprised that you use envpool to achieve 2 min Pong and 20min Breakout, nice work!
I'm wondering if you'd like to open a pull request at EnvPool to link to your result (like the CleanRL ones), and whether it is possible for us to include your experiment result in our upcoming arXiv paper. Also, it would be great if you could produce more results based on the EnvPool MuJoCo tasks (which have been aligned with gym's implementation and also get a free speedup). Thanks!
BTW, isn't it a typo?
https://github.com/Denys88/rl_games/blame/master/docs/ATARI_ENVPOOL.md#L9
-* **Breakout-v3** 20 minutes training time to achieve 20+ score.
+* **Breakout-v3** 20 minutes training time to achieve 400+ score.
Hi, thanks for the amazing work!
I am wondering how important value normalization is. When I disable value normalization in some tasks, especially ShadowHand, the PPO agent doesn't work anymore. I looked at the code, and it seems to me that it normalizes the returns and the predicted (old) values before calculating the loss. However, the (new) value output by the model is not normalized (due to the unnorm function). So why does it work, or did I misunderstand something?
Also, if I want to test Isaac Gym with the SAC code, can I do that using rl_games?
How can I use multiple GPUs for simulation and training? I am enabling horovod but it seems that it can only use one device.
Hi, Nice work!
I noticed your work when I was looking at the Brax repository:)
In their paper, the Brax team mentioned that their PPO implementation didn't work well on humanoid, and this bug still exists now.
Previously I had suspected that there were some bugs with the Brax env.
But your performance on the humanoid seems to demonstrate that the problem may lie in their algorithm or hyperparameters.
I'd appreciate it if you could let me know if there's anything to note when you try humanoid with Brax.
Congratulations again on your excellent work.
Hi, I notice that we have functions get_env_state() and set_env_state() to save and load the info for the environment. Does it work in Isaac Gym?
I created an environment for a new robot in a repository derived from the IssacGymEnvs preview release (https://developer.nvidia.com/isaac-gym).
I would like to log different parts of the reward function of the environment to see what the neural network optimizes first. For this I would need to either create a new torch.utils.tensorboard.SummaryWriter, or use the existing one from the A2CBase. What is the best way to log scalar values from the environment?
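A minimal sketch of the first option mentioned above, a SummaryWriter owned by the environment; the directory, tag names, and reward-term keys are placeholders:

```python
from torch.utils.tensorboard import SummaryWriter

class RewardTermLogger:
    """Logs individual reward components from the environment to its own tensorboard run."""

    def __init__(self, log_dir: str = "runs/env_reward_terms"):
        self.writer = SummaryWriter(log_dir)
        self.step = 0

    def log(self, reward_terms: dict):
        # reward_terms maps component names (e.g. "upright", "heading") to mean values per step.
        for name, value in reward_terms.items():
            self.writer.add_scalar(f"env_rewards/{name}", value, self.step)
        self.step += 1
```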