skimo's People

Contributors: youngwoon
skimo's Issues

Can't run with MPI

Hello. I'm trying to use MPI to speed up pre-training, but the program crashes when syncing gradients.

Running without MPI (or with 1 process) is fine, but with more than 1 process (mpirun -np 2 python run.py --config-name skimo_maze run_prefix=test2 gpu=0 wandb=true) I get this traceback:

Error executing job with overrides: ['run_prefix=test2', 'gpu=0', 'wandb=true']
Traceback (most recent call last):
  File "/home/flyingwolfox/tcc-src-2/skimo/run.py", line 39, in main
    SkillRLRun(cfg).run()
  File "/home/flyingwolfox/tcc-src-2/skimo/rolf/rolf/main.py", line 56, in run
    trainer.train()
  File "/home/flyingwolfox/tcc-src-2/skimo/skill_trainer.py", line 41, in train
    self._pretrain()
  File "/home/flyingwolfox/tcc-src-2/skimo/skill_trainer.py", line 76, in _pretrain
    _train_info = self._agent.pretrain()
  File "/home/flyingwolfox/tcc-src-2/skimo/skimo_agent.py", line 713, in pretrain
    _train_info = self._pretrain(batch)
  File "/home/flyingwolfox/tcc-src-2/skimo/skimo_agent.py", line 847, in _pretrain
    joint_grad_norm = self.joint_optim.step(hl_loss + ll_loss)
  File "/home/flyingwolfox/tcc-src-2/skimo/rolf/rolf/utils/pytorch.py", line 466, in step
    sync_grad(self._model, self._device)
  File "/home/flyingwolfox/tcc-src-2/skimo/rolf/rolf/utils/pytorch.py", line 152, in sync_grad
    flat_grads, grads_shape = _get_flat_grads(network)
  File "/home/flyingwolfox/tcc-src-2/skimo/rolf/rolf/utils/pytorch.py", line 175, in _get_flat_grads
    for key_name, value in network.named_parameters():
AttributeError: 'list' object has no attribute 'named_parameters'

I tried wrapping the network list in torch.nn.Sequential before the _get_flat_grads() call, but that didn't work either: getting the gradients of the reward and critic modules fails (https://pastebin.com/Wu7fp0sP).

Is it possible to run with MPI? If so, how? Thanks.
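One plausible cause of the AttributeError above is that the optimizer wraps a *list* of modules while _get_flat_grads() expects a single nn.Module. A minimal workaround sketch (hypothetical, not the repo's actual code — and note the linked pastebin suggests the reward/critic modules fail for a separate reason) is to iterate the list when flattening gradients:

```python
# Sketch of a list-tolerant gradient flattener. get_flat_grads is a
# hypothetical stand-in for the repo's _get_flat_grads.
import torch
import torch.nn as nn

def get_flat_grads(network):
    # Accept either a single module or a list/tuple of modules.
    modules = network if isinstance(network, (list, tuple)) else [network]
    grads = []
    for module in modules:
        for name, param in module.named_parameters():
            if param.grad is not None:
                grads.append(param.grad.reshape(-1))
    return torch.cat(grads)

# Toy usage: two small modules standing in for the model list.
nets = [nn.Linear(3, 2), nn.Linear(2, 1)]
loss = nets[1](nets[0](torch.randn(4, 3))).sum()
loss.backward()
flat = get_flat_grads(nets)
print(flat.shape)  # torch.Size([11]) -- 8 params + 3 params flattened
```

The same treatment would be needed wherever sync_grad assumes a single module (e.g. when scattering the averaged gradients back).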

SPiRL model pre-trained on CALVIN or relevant dataloading, config files

Dear authors, thank you for such smooth-running code.

I would greatly appreciate it if you could provide the SPiRL model pre-trained on CALVIN. If that is unavailable, could you please share the SPiRL hyperparameters and the custom data loader you must have written for CALVIN? That would allow me to run the SPiRL + X baselines correctly.

I understand that this request concerns SPiRL more than Skimo, but a fair comparison of Skimo with the other baselines reported in the original paper would help my work a lot. I look forward to your response.

help

I would like to learn some details of the project, so I need its code. Could you send it to me?
Thank you.

Results in Maze navigation

Hello, I am very interested in this work. I have a question about the SPiRL baseline in the Maze navigation task (Figure 4, left).

In Figure 4 (left) of this paper, the success rate at 2M steps is only about 0.6. However, in the original SPiRL paper, the success rate is almost 100% at 1.4M steps (https://clvrai.github.io/spirl/).

The Kitchen results also differ from the SPiRL paper.

I checked this repo and you are basically using the same code as SPiRL, so do you know what causes this big difference? Or is it because you changed the original environment? Thanks.

Calvin run length is 500 instead of 360 in the dataset

I unzipped the CALVIN dataset and iterated through it, and was surprised to find that many of the 'obs' sequences have length 500. This is strange because CALVIN has a maximum episode length of 360, so what is this extra data doing there? Shouldn't the agent have been cut off?

I also don't see what information the 'dones' add if they are always length 500 and all 0.
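A quick sanity-check sketch for reproducing this observation (the episode dicts below are synthetic stand-ins; the real CALVIN loader and its on-disk format are not shown, and the 'obs'/'dones' keys are taken from the issue text):

```python
# Hypothetical dataset inspection: tally episode lengths and check
# whether 'dones' is ever nonzero.
from collections import Counter

def summarize(episodes):
    lengths = Counter(len(ep["obs"]) for ep in episodes)
    all_zero_dones = all(not any(ep["dones"]) for ep in episodes)
    return lengths, all_zero_dones

# Synthetic episodes mimicking what the issue reports:
episodes = [
    {"obs": [0] * 500, "dones": [0] * 500},
    {"obs": [0] * 360, "dones": [0] * 360},
    {"obs": [0] * 500, "dones": [0] * 500},
]
lengths, all_zero = summarize(episodes)
print(dict(lengths))  # {500: 2, 360: 1}
print(all_zero)       # True
```

On the real dataset, running something like this over every episode file would confirm whether the 500-step sequences dominate and whether 'dones' carries any signal at all.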

When training high-level policy, is it a bug to use the fixed observation(first one) while iterating in time?

Hi,

When training the high-level policy in skimo_agent.py, z_next_pred is initialized to the first observation (line 616) and is never updated after that.
Judging from the comment and the paper, it seems there should be a call to hl_agent.model.imagine_step to advance z_next_pred to the next imagined step, but there is no such call.
Is this a bug, or am I missing something?

Also, the code seems to use the encoded ground-truth state for the task policy when calculating skill_prior_loss, but the paper (Eq. 7) uses the imagined state. I would like to understand the reasoning: why use the imagined step for the actor loss but the ground-truth state for the prior loss?

Thank you!
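For clarity, the imagined rollout the issue expects looks roughly like the toy sketch below. Here imagine_step is a dummy stand-in for hl_agent.model.imagine_step (the real model's latent dynamics); the point is only that z must be advanced each timestep rather than left fixed at the first observation's encoding:

```python
# Toy imagined-rollout sketch. imagine_step is a hypothetical stand-in
# for the model's latent dynamics, not the repo's actual function.
def imagine_step(z, skill):
    # Dummy dynamics: the real model predicts the next latent state.
    return z + skill

def imagined_rollout(z0, skills):
    latents = []
    z = z0  # encoding of the first observation
    for skill in skills:
        latents.append(z)
        z = imagine_step(z, skill)  # advance the latent every step
    return latents

traj = imagined_rollout(0.0, [1.0, 2.0, 3.0])
print(traj)  # [0.0, 1.0, 3.0] -- the latent changes over time,
             # unlike a z_next_pred that is never updated
```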
