skimo's People

Contributors: youngwoon
skimo's Issues

Can't run with MPI

Hello. I'm trying to use MPI to speed up pre-training, but the program crashes when syncing gradients.

Running without MPI (or with 1 process) is fine, but with more than 1 process (mpirun -np 2 python run.py --config-name skimo_maze run_prefix=test2 gpu=0 wandb=true) I get this traceback:

Error executing job with overrides: ['run_prefix=test2', 'gpu=0', 'wandb=true']
Traceback (most recent call last):
  File "/home/flyingwolfox/tcc-src-2/skimo/run.py", line 39, in main
    SkillRLRun(cfg).run()
  File "/home/flyingwolfox/tcc-src-2/skimo/rolf/rolf/main.py", line 56, in run
    trainer.train()
  File "/home/flyingwolfox/tcc-src-2/skimo/skill_trainer.py", line 41, in train
    self._pretrain()
  File "/home/flyingwolfox/tcc-src-2/skimo/skill_trainer.py", line 76, in _pretrain
    _train_info = self._agent.pretrain()
  File "/home/flyingwolfox/tcc-src-2/skimo/skimo_agent.py", line 713, in pretrain
    _train_info = self._pretrain(batch)
  File "/home/flyingwolfox/tcc-src-2/skimo/skimo_agent.py", line 847, in _pretrain
    joint_grad_norm = self.joint_optim.step(hl_loss + ll_loss)
  File "/home/flyingwolfox/tcc-src-2/skimo/rolf/rolf/utils/pytorch.py", line 466, in step
    sync_grad(self._model, self._device)
  File "/home/flyingwolfox/tcc-src-2/skimo/rolf/rolf/utils/pytorch.py", line 152, in sync_grad
    flat_grads, grads_shape = _get_flat_grads(network)
  File "/home/flyingwolfox/tcc-src-2/skimo/rolf/rolf/utils/pytorch.py", line 175, in _get_flat_grads
    for key_name, value in network.named_parameters():
AttributeError: 'list' object has no attribute 'named_parameters'

I tried wrapping the network list in torch.nn.Sequential before the _get_flat_grads() call, but that didn't work either: getting the gradients of the reward and critic modules fails (https://pastebin.com/Wu7fp0sP).

Is it possible to run with MPI? If so, how? Thanks.
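One plausible cause of the AttributeError above is that the optimizer wraps a *list* of modules while _get_flat_grads() expects a single nn.Module. A minimal workaround sketch (hypothetical, not the repo's actual code — and note the linked pastebin suggests the reward/critic modules fail for a separate reason) is to iterate the list when flattening gradients:

```python
# Sketch of a list-tolerant gradient flattener. get_flat_grads is a
# hypothetical stand-in for the repo's _get_flat_grads.
import torch
import torch.nn as nn

def get_flat_grads(network):
    # Accept either a single module or a list/tuple of modules.
    modules = network if isinstance(network, (list, tuple)) else [network]
    grads = []
    for module in modules:
        for name, param in module.named_parameters():
            if param.grad is not None:
                grads.append(param.grad.reshape(-1))
    return torch.cat(grads)

# Toy usage: two small modules standing in for the model list.
nets = [nn.Linear(3, 2), nn.Linear(2, 1)]
loss = nets[1](nets[0](torch.randn(4, 3))).sum()
loss.backward()
flat = get_flat_grads(nets)
print(flat.shape)  # torch.Size([11]) -- 8 params + 3 params flattened
```

The same treatment would be needed wherever sync_grad assumes a single module (e.g. when scattering the averaged gradients back).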

SPiRL model pre-trained on CALVIN or relevant dataloading, config files

Dear authors, thank you for such smooth-running code.

I would greatly appreciate it if you could provide the SPiRL model pre-trained on CALVIN. If that is unavailable, could you please share the SPiRL hyperparameters and the custom data loader you must have written for CALVIN? That would allow me to run the SPiRL + X baselines correctly.

I understand that this request concerns SPiRL more than Skimo, but a fair comparison of Skimo with the other baselines reported in the original paper would help my work a lot. I look forward to your response.

help

I would like to learn some details of the project, so I need its code. Could you send it to me?
Thank you.

Results in Maze navigation

Hello, I am very interested in this work. I have a question about the SPiRL baseline in the Maze navigation task (Figure 4, left).

In Figure 4 (left) of this paper, the success rate at 2M steps is only about 0.6. However, in the original SPiRL paper, the success rate is almost 100% at 1.4M steps (https://clvrai.github.io/spirl/).

The Kitchen results also differ from the SPiRL paper.

I checked this repo and you are basically using the same code as SPiRL, so do you know what causes this big difference? Or is it because you changed the original environment? Thanks.

Calvin run length is 500 instead of 360 in the dataset

I unzipped the CALVIN dataset and iterated through it, and was surprised to find that many of the 'obs' sequences have length 500. This is strange because CALVIN has a maximum episode length of 360, so what is this extra data doing there? Shouldn't the agent have been cut off?

I also don't see what information the 'dones' add if they are always length 500 and all 0.
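A quick sanity-check sketch for reproducing this observation (the episode dicts below are synthetic stand-ins; the real CALVIN loader and its on-disk format are not shown, and the 'obs'/'dones' keys are taken from the issue text):

```python
# Hypothetical dataset inspection: tally episode lengths and check
# whether 'dones' is ever nonzero.
from collections import Counter

def summarize(episodes):
    lengths = Counter(len(ep["obs"]) for ep in episodes)
    all_zero_dones = all(not any(ep["dones"]) for ep in episodes)
    return lengths, all_zero_dones

# Synthetic episodes mimicking what the issue reports:
episodes = [
    {"obs": [0] * 500, "dones": [0] * 500},
    {"obs": [0] * 360, "dones": [0] * 360},
    {"obs": [0] * 500, "dones": [0] * 500},
]
lengths, all_zero = summarize(episodes)
print(dict(lengths))  # {500: 2, 360: 1}
print(all_zero)       # True
```

On the real dataset, running something like this over every episode file would confirm whether the 500-step sequences dominate and whether 'dones' carries any signal at all.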

When training high-level policy, is it a bug to use the fixed observation(first one) while iterating in time?

Hi,

When training the high-level policy in skimo_agent.py, z_next_pred is initialized to the first observation (line 616) and is never updated after that.
Judging from the comment and the paper, it seems there should be a call to hl_agent.model.imagine_step to advance z_next_pred to the next imagined step, but there is no such call.
Is this a bug, or am I missing something?

Also, the code seems to use the encoded ground-truth state for the task policy when calculating skill_prior_loss, but the paper (Eq. 7) uses the imagined state. I would like to understand the reasoning: why use the imagined step for the actor loss but the ground-truth state for the prior loss?

Thank you!
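For clarity, the imagined rollout the issue expects looks roughly like the toy sketch below. Here imagine_step is a dummy stand-in for hl_agent.model.imagine_step (the real model's latent dynamics); the point is only that z must be advanced each timestep rather than left fixed at the first observation's encoding:

```python
# Toy imagined-rollout sketch. imagine_step is a hypothetical stand-in
# for the model's latent dynamics, not the repo's actual function.
def imagine_step(z, skill):
    # Dummy dynamics: the real model predicts the next latent state.
    return z + skill

def imagined_rollout(z0, skills):
    latents = []
    z = z0  # encoding of the first observation
    for skill in skills:
        latents.append(z)
        z = imagine_step(z, skill)  # advance the latent every step
    return latents

traj = imagined_rollout(0.0, [1.0, 2.0, 3.0])
print(traj)  # [0.0, 1.0, 3.0] -- the latent changes over time,
             # unlike a z_next_pred that is never updated
```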
