spirl's People

Contributors

jesbu1, kpertsch, namsan96

spirl's Issues

regularization in the first stage

In Sec. 3.2, Equation 1 maximizes the following evidence lower bound (ELBO):
\mathbb{E}_q\big[\, \log p(a_i \mid z) - \beta \,(\log q(z \mid a_i) - \log p(z)) \,\big]

But in Sec. B, the equation minimizes the regularization loss

  • \beta \, D_{KL}\big( \mathcal{N}(\mu_z, \sigma_z) \,\|\, \mathcal{N}(0, 1) \big)

So does the algorithm want to make the KL divergence between the skill posterior q(z \mid a_i) and the fixed prior \mathcal{N}(0, 1) larger or smaller?
This question has bothered me for days; I would be very grateful for a reply!
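
For what it's worth, here is my own restatement of the sign convention (my derivation, not a quote from the paper), which suggests the two sections agree:

\max_\theta \;\; \mathbb{E}_q[\log p(a_i \mid z)] - \beta \, D_{KL}\big(q(z \mid a_i) \,\|\, p(z)\big)
\;\Longleftrightarrow\;
\min_\theta \;\; -\mathbb{E}_q[\log p(a_i \mid z)] + \beta \, D_{KL}\big(q(z \mid a_i) \,\|\, p(z)\big)

Maximizing the ELBO and minimizing the regularization loss are therefore the same objective: both push the KL divergence between the skill posterior and the fixed prior to be small, weighted by \beta.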

How to evaluate the learned embedding space and skill prior?

Hi,

Before we train the RL policy, how can we evaluate the skill embedding space Z and the learned skill prior? I know the training results can be visualized via TensorBoard, but are there other metrics to check performance, or how can we make sure the skill prior really works?
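
One generic sanity check I can think of (a sketch under assumed interfaces; decoder, posterior, and prior are hypothetical names, not SPiRL's actual API) is to measure held-out reconstruction error and how likely the prior finds samples drawn from the posterior:

import torch

# Hypothetical interfaces (assumptions, not SPiRL's classes):
#   posterior(actions)   -> torch.distributions.Normal over skills z
#   prior(first_state)   -> torch.distributions.Normal over skills z
#   decoder(z)           -> reconstructed action sequence
def heldout_metrics(decoder, posterior, prior, actions, first_state):
    q = posterior(actions)                                    # q(z | a_0..a_{H-1})
    z = q.rsample()
    recon_mse = torch.mean((decoder(z) - actions) ** 2)       # skill reconstruction error
    prior_ll = prior(first_state).log_prob(z).sum(-1).mean()  # prior coverage of posterior samples
    return recon_mse.item(), prior_ll.item()

Low reconstruction error together with a reasonable prior log-likelihood on a validation split would at least indicate that the embedding space is usable and that the prior covers it.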

The success rate definition of Maze Navigation Env

Hi, I am Ce Hao, and I am reproducing your code for the SPiRL paper.

In Figure 4 of the paper, the success rate of Maze Navigation reaches almost 1 after 1M steps.

However, there is no variable called 'success rate' in the wandb logger, so I presume this 'success rate' is a derived quantity.
My guess at the definition: at each epoch (50 episodes), an episode counts as successful if at least one reward > 1, meaning the agent reaches the target at least once; the success rate is the fraction of successful episodes, and the mean and standard deviation are computed over 3 seeds.
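
A minimal sketch of that reading of the metric (my interpretation, not the repo's logging code):

import numpy as np

def epoch_success_rate(episode_rewards):
    # Fraction of episodes with any per-step reward > 1, i.e. the target was reached.
    return float(np.mean([np.any(np.asarray(r) > 1) for r in episode_rewards]))

# Dummy data standing in for 3 seeds x 50 episodes x per-step rewards.
rewards_per_seed = [[np.random.rand(100) * 2 for _ in range(50)] for _ in range(3)]
per_seed = [epoch_success_rate(eps) for eps in rewards_per_seed]
print(np.mean(per_seed), np.std(per_seed))   # mean +/- std over seeds, as plotted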

However, the real experiments look different. As shown in Figure 5, SPiRL (Ours), the agent is still exploring many other places rather than converging to a direct path to the goal. My reproduction also shows that fewer than 20% of trajectories finally reach the target.

I want to develop a new algorithm on top of the SPiRL baseline, so could you please explain the definition of the success rate for Maze Navigation? Thanks!

Best,
Ce Hao

KeyError: 'completed_tasks'

I encounter an error like this:
Traceback (most recent call last):
File "spirl/rl/train.py", line 311, in
RLTrainer(args=get_args())
File "spirl/rl/train.py", line 76, in init
self.train(start_epoch)
File "spirl/rl/train.py", line 104, in train
self.warmup()
File "spirl/rl/train.py", line 190, in warmup
warmup_experience_batch, _ = self.sampler.sample_batch(batch_size=self._hp.n_warmup_steps)
File "/home/user/spirl/spirl/rl/components/sampler.py", line 154, in sample_batch
obs, reward, done, info = self._env.step(agent_output.action)
File "/home/user/spirl/spirl/rl/envs/kitchen.py", line 20, in step
return obs, np.float64(rew), done, self._postprocess_info(info) # casting reward to float64 is important for getting shape later
File "/home/user/spirl/spirl/rl/envs/kitchen.py", line 34, in _postprocess_info
completed_subtasks = info.pop("completed_tasks")
KeyError: 'completed_tasks'

It seems that there is no 'completed_tasks' in info.
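
Not an official answer, but a guess: the 'completed_tasks' key is presumably only added by the particular kitchen environment version this repo expects, so a stock d4rl install would not provide it. A temporary workaround sketch (an assumption on my side, not the maintainers' fix) is to tolerate the missing key:

# Example info dict from an env whose step() does not report completed subtasks.
info = {"score": 0.0}

# dict.pop with a default avoids the KeyError; note that any reward shaping or
# logging that relies on this count will then see zero completed subtasks.
completed_subtasks = info.pop("completed_tasks", [])
print(len(completed_subtasks))   # -> 0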

How to speed up the training process?

I found that GPU utilization is very low when running the training script python3 spirl/train.py --path=spirl/configs/skill_prior_learning/kitchen/hierarchical_cl --val_data_size=160. Could you provide some suggestions for making full use of the GPU to speed up the process? I have tried setting num_worker larger, but it doesn't seem to help, and when I set batch_size larger, I get errors like the following:

len val dataset 160
Running Testing
Traceback (most recent call last):
  File "spirl/spirl/train.py", line 390, in <module>
    ModelTrainer(args=get_args())
  File "spirl/spirl/train.py", line 76, in __init__
    self.train(start_epoch)
  File "spirl/spirl/train.py", line 105, in train
    self.val()
  File "spirl/spirl/train.py", line 199, in val
    self.evaluator.dump_results(self.global_step)
  File "/home/lyf/Videos/bin/skild/skild/spirl/spirl/components/evaluator.py", line 66, in dump_results
    self.dump_metrics(it)
  File "/home/lyf/Videos/bin/skild/skild/spirl/spirl/components/evaluator.py", line 72, in dump_metrics
    best_idxs = 0 if self._top_of_n == 1 else self._get_best_idxs(self.full_eval_buffer[self._top_comp_metric])
TypeError: 'NoneType' object is not subscriptable

Thank you very much!
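
Not an official answer, but two observations: the TypeError looks like the evaluation buffer was never filled, which could happen if the enlarged batch size no longer fits into the 160-sample validation set (a guess, not a confirmed diagnosis). For raw GPU utilization, the generic PyTorch DataLoader knobs are usually the first thing to try; a sketch with illustrative values, not SPiRL-specific settings:

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.zeros(1024, 3, 64, 64))    # stand-in for the real image dataset
loader = DataLoader(dataset, batch_size=128, shuffle=True,
                    num_workers=8,             # match the machine's core count
                    pin_memory=True,           # faster host-to-GPU copies
                    persistent_workers=True,   # keep workers alive between epochs
                    prefetch_factor=4)         # batches pre-loaded per worker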

parallel env

Obviously, it is very slow to run one env at a time. But parallel envs suffer from the problem of mismatched step counts in HRL, especially with a fixed high-level interval.
Do you have any ideas for solving this problem?

Logger Error for --mode=val

Hi, I was trying to replicate the results by running in validation mode and got an error for the logger:

Traceback (most recent call last):
  File "spirl/rl/train.py", line 311, in <module>
    RLTrainer(args=get_args())
  File "spirl/rl/train.py", line 78, in __init__
    self.val()
  File "spirl/rl/train.py", line 162, in val
    self.logger, log_images=True, step=self.global_step)
  File "/home-nfs/rteehan/spirl/spirl/rl/components/agent.py", line 259, in log_outputs
    super().log_outputs(logging_stats, rollout_storage, logger, log_images, step)
  File "/home-nfs/rteehan/spirl/spirl/rl/components/agent.py", line 74, in log_outputs
    logger.log_scalar_dict(logging_stats, prefix='train' if self._is_train else 'val', step=step)
AttributeError: 'NoneType' object has no attribute 'log_scalar_dict'

Looking at the RLTrainer code, self.setup_logging() only seems to set up a logger for --mode=train, not for val.
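
One generic way to sidestep the crash (my suggestion, not the repo's design) is to substitute a no-op logger whenever none was configured, e.g. for --mode=val:

class NoOpLogger:
    # Absorbs any logging call so validation can run without a real logger.
    def __getattr__(self, name):
        return lambda *args, **kwargs: None

logger = None                      # what setup_logging leaves behind for --mode=val
logger = logger or NoOpLogger()
logger.log_scalar_dict({"return": 0.0}, prefix="val", step=0)   # now a harmless no-op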

stacked_imgs only works for n_input_frames up to 2

Hi!
I'm developing my own learning framework based on SPiRL, using a custom dataset.
In the skill learning phase, when I set n_input_frames=4, I get the following error at _get_seq_enc() of ImageClSPiRLMdl:

Code:
stacked_imgs = torch.cat([inputs.images[:, t:t+inputs.actions.shape[1]]
for t in range(self._hp.n_input_frames)], dim=2)

Error Message:
RuntimeError: Sizes of tensors must match except in dimension 2. Got 13 and 12 in dimension 1 (The offending index is 2)

In this case, the shapes of the batch elements are as follows:
'actions'={Tensor: (128, 13, 7)}
'pad_mask'={Tensor: (128, 14)}
'states'={Tensor: (128, 14, 34)}
'images'={Tensor: (128, 14, 3, 128, 128)}
'observations'={Tensor: (128, 13, 7)}

It works well when n_input_frames is 2 or less, but raises this error once the value is greater than 2.
I think the action shape would have to be (128, 11, 7) for this code line to run correctly.

How can I solve this problem?
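
For what it's worth, the mismatch follows from the slice arithmetic in the quoted line (my own check on the numbers reported above, not a statement about the intended behavior):

# images has 14 frames, actions has 13 steps, n_input_frames = 4
T_img, T_act, n = 14, 13, 4
lengths = [len(range(t, min(t + T_act, T_img))) for t in range(n)]
print(lengths)   # [13, 13, 12, 11] -> torch.cat fails because dimension 1 disagrees
# Equal slice lengths would require T_act = T_img - n + 1 = 11, matching the (128, 11, 7) guess.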

Not able to replicate results mentioned

Hello @kpertsch and @youngwoon
I tried training the skill prior module on the block-stacking dataset with the given configuration. The overall loss went from 15 to 11. When I use this pretrained model as the skill prior for SAC, I am not able to get proper results.
I am not sure, but I think the skill prior module did not converge properly on the training dataset.
Could you please provide the pretrained models / event files you got during training? That would let me see which part of the model is not performing well. Also, do you have any other suggestions for replicating the results reported in the paper?
Thank You

Unable to download the data

Hello, I am trying to download the maze data using the command line
gdown https://drive.google.com/uc?id=1pXM-EDCwFrfgUjxITBsR48FqW9gMoXYZ

But after downloading about 4 GB, the terminal abruptly stopped the download, maybe because the file is too large and my Internet connection dropped. Is there a way to download only parts of the maze data, so that I can fetch it little by little?

Thanks.

How to get the 'kitchen-mixed-v0.py'?

Hello!

When I run this command,

python3 spirl/rl/train.py --path=spirl/configs/hrl/kitchen/spirl_cl --seed=0 --prefix=SPIRL_kitchen_seed0

I get this KeyError:

completed_subtasks = info.pop("completed_tasks")
KeyError: 'completed_tasks'

So I commented that line out and ran the code.
It went well up to 7% (Train Epoch: 0 [It 100001/1500000 (7%)]).
But then this error occurred:

ValueError: tile cannot extend outside image

That means I can't render from the environment.
The reason seems to be that there is nothing named 'kitchen-mixed-v0'.
How do I get 'kitchen-mixed-v0'?
I know there is code that downloads the offline env file, but I don't think it operates correctly.

Here is the directory tree of my project folder:
./spirlProject
├── d4rl
│   ├── AdditionalMaps_0.9.8
│   │   ├── CarlaUE4
│   │   └── Engine
│   ├── CARLA_0.9.8
│   │   ├── CarlaUE4
│   │   ├── Engine
│   │   ├── HDMaps
│   │   ├── Import
│   │   ├── PythonAPI
│   │   └── Tools
│   ├── d4rl
│   │   ├── carla
│   │   ├── carla__
│   │   ├── flow
│   │   ├── gym_bullet
│   │   ├── gym_minigrid
│   │   ├── gym_mujoco
│   │   ├── hand_manipulation_suite
│   │   ├── kitchen
│   │   ├── locomotion
│   │   ├── pointmaze
│   │   ├── pointmaze_bullet
│   │   ├── __pycache__
│   │   └── utils
│   ├── d4rl.egg-info
│   ├── flow
│   │   ├── benchmarks
│   │   ├── controllers
│   │   ├── core
│   │   ├── envs
│   │   ├── multiagent_envs
│   │   ├── networks
│   │   ├── __pycache__
│   │   ├── renderer
│   │   ├── scenarios
│   │   ├── utils
│   │   └── visualize
│   ├── flow222
│   │   ├── docs
│   │   ├── examples
│   │   ├── scripts
│   │   ├── tests
│   │   └── tutorials
│   └── scripts
│       ├── generation
│       └── reference_scores
├── data
├── docs
│   └── resources
│       ├── env_videos
│       └── policy_videos
├── experiments
│   ├── hrl
│   │   └── kitchen
│   └── skill_prior_learning
│       └── kitchen
├── spirl
│   ├── components
│   │   └── __pycache__
│   ├── configs
│   │   ├── data_collect
│   │   ├── default_data_configs
│   │   ├── hrl
│   │   ├── rl
│   │   └── skill_prior_learning
│   ├── data
│   │   ├── block_stacking
│   │   ├── kitchen
│   │   ├── maze
│   │   ├── office
│   │   └── __pycache__
│   ├── models
│   │   └── __pycache__
│   ├── modules
│   │   └── __pycache__
│   ├── __pycache__
│   ├── rl
│   │   ├── agents
│   │   ├── components
│   │   ├── envs
│   │   ├── policies
│   │   └── utils
│   └── utils
│       ├── __pycache__
│       └── scripts
└── venv
    ├── bin
    └── lib
        └── python3.8

Are the d4rl or flow paths wrong?
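
A quick diagnostic I would try (my suggestion, not part of the SPiRL codebase): 'kitchen-mixed-v0' is a gym environment id registered by d4rl rather than a file on disk, so checking that it resolves tells you whether the d4rl install itself is the problem:

import gym
import d4rl  # noqa: F401 -- importing d4rl registers the kitchen-* environment ids

env = gym.make('kitchen-mixed-v0')
print(env.observation_space, env.action_space)
# env.get_dataset() would additionally download/load the offline dataset for this env.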

mat1 and mat2 shapes cannot be multiplied

Hello @kpertsch
First of all, thanks a lot for this awesome work and documentation.
I am trying to set up the repo locally. When I try to train vanilla SAC on the block_stacking environment, I get the error below:
"RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x41 and 23x256)"

Resetting the environment returns an observation of dimension 1x41, while the first policy layer apparently expects a 23-dimensional input.
Could you please help in resolving this?

Thank You

completed_subtasks = info.pop("computed_tasks error")

Hello, I have an issue with SPiRL training.

After finishing skill prior learning on the kitchen environment,
I tried to train SPiRL_CL on top of the skill prior network.

But I get a KeyError when the code attempts to pop the key "completed_tasks" from the info variable:

[screenshot: KeyError traceback for info.pop("completed_tasks")]

I don't understand line 33 of _postprocess_info, because the "info" returned from the step function only contains the following 5 key-value pairs, as defined in "kitchen_multitask_v0.py":

[screenshot: the info dict construction in kitchen_multitask_v0.py]

I also could not find any other place where the "completed_tasks" key is added to the info variable.

Am I missing something?

Could not find calibration file

Thank you for sharing the code.

when I run " python3 spirl/train.py --path=spirl/configs/skill_prior_learning/kitchen/flat --val_data_size=160 "

I encounter error like this:
"
Could not find calibration file at: /d4rl/kitchen/adept_envs/franka/robot/franka_config.xml
"
How can I solve this problem?

RuntimeError during gradient computation

Hi, I added a new RL environment and ran training as described in the readme.md, but I hit this issue in self._perform_update(policy_loss, self.policy_opt, self.policy):

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Could you offer some help?
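
As the hint in the error message says, PyTorch's anomaly detection usually pinpoints the offending in-place operation; a minimal way to enable it (standard PyTorch debugging, not a SPiRL-specific fix):

import torch

# Enable before the training loop; the next backward() that fails will print a
# second traceback pointing at the forward op whose output was later modified
# in place (often a critic/target tensor updated between forward and backward).
torch.autograd.set_detect_anomaly(True)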

The performance of the SAC algorithm in this project is significantly worse than that of SAC in stable-baselines3

The performance of the SAC algorithm in this project is significantly worse than that of SAC in stable-baselines3. Training the slide-cabinet subtask in the kitchen environment with this project's SAC fails to converge, and the loss tends to explode exponentially. I have carefully examined this project's code and the SAC in stable-baselines3 and found no reason for this anomaly.
https://github.com/clvrai/spirl/blob/master/spirl/rl/agents/ac_agent.py
https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/sac/sac.py

Caught RuntimeError in replica 1 on device 1.

Hi.
I'm trying to learn a skill prior using my own dataset.
It works well on a single GPU, but I get the following error when using multiple GPUs (two RTX 3090s):

[screenshot: 'Caught RuntimeError in replica 1 on device 1' traceback]

How can I solve this?
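
Not a real fix, but if single-GPU training is acceptable while debugging, a common stopgap (my suggestion, an assumption rather than a solution to the underlying multi-GPU error) is to expose only one device to the process:

import os

# Must be set before CUDA is initialized, i.e. before the first CUDA call.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())   # -> 1: training now runs on a single GPU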

Kitchen environment observation is 60 instead of 30

Hi, I am wondering whether including the goal as an input is intentional. The SPiRL paper says that the kitchen environment has a 30-dimensional state, but running python3 spirl/train.py --path=spirl/configs/skill_prior_learning/kitchen/hierarchical_cl --val_data_size=160 includes the goal (the observation becomes 60-dimensional). I guess RL training will also see 60-dimensional observations, because it uses KitchenEnv instead of NoGoalKitchenEnv.
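
For concreteness, my reading of the 30-vs-60 discrepancy described above (an assumption based on the reported numbers, not a statement about the env code):

import numpy as np

state = np.zeros(30)                  # 30-dimensional kitchen state, as in the paper
goal = np.zeros(30)                   # goal state appended by the goal-conditioned env
obs = np.concatenate([state, goal])
print(obs.shape)                      # (60,) -- matches the observed observation size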

Is it possible to change the camera zoom?

Thank you for your great work.

I understand that the given dataset and generation process focus on the agent's current location, but
I want to create an image-based 2D maze dataset in which the entire map is visible (zoomed out).

Where should I start?
Thanks!

Not able to render the block_stacking environment

Hello @kpertsch
I am trying to render the block stacking environment in the MuJoCo viewer. After changing has_renderer to True in the MujocoEnv of the base.py file, I get a "Failed to initialize GLFW" error:
GLFW error (code %d): %s 65544 b'X11: Failed to open display :1'
GLFW error (code %d): %s 65544 b'X11: Failed to open display :1'
*** mujoco_py.cymj.GlfwError: Failed to initialize GLFW

The error is raised at the line
self.viewer = MujocoPyRenderer(self.sim)
in spirl/rl/utils/robosuite_utils.py(21).

I also tried saving the XML file and loading it separately. I am able to load it, but the positions of the blocks and the gripper are disturbed.

Could you please help in resolving this?

Segmentation fault (core dumped)

Hi! I cannot complete the environment configuration; I hit the following problem:
loading from the config file spirl/configs/skill_prior_learning/kitchen/hierarchical_cl/conf.py
Warning: Mujoco-based envs failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'mjrl'
Warning: Flow failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'flow'
MoTTY X11 proxy: Authorisation not recognised
/home/hehongcai/miniconda3/envs/SPiRL/lib/python3.7/site-packages/glfw/__init__.py:912: GLFWError: (65544) b'X11: Failed to open display localhost:12.0'
warnings.warn(message, GLFWError)
Segmentation fault (core dumped)

I didn't change anything except the wandb part.
I have tried many approaches but cannot solve this. Do you have any solutions?

Thank you very much!
