clvrai / spirl

Official implementation of "Accelerating Reinforcement Learning with Learned Skill Priors", Pertsch et al., CoRL 2020

Python 100.00%

robot-learning reinforcement-learning

spirl's Issues

regularization in the first stage

In Sec. 3.2, Equation 1 maximizes the following evidence lower bound (ELBO):
E_q[ log p(a_i|z) - \beta (log q(z|a_i) - log p(z)) ]

But in Sec. B, the corresponding equation minimizes the regularization loss

\beta D_KL( N(m_z, std_z) || N(0, 1) )

So does the algorithm want to make the KL divergence between the skill posterior q(z|a_i) and the fixed prior N(0, 1) larger or smaller?
This question has bothered me for days. If you can reply, I will be very grateful!
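
The two statements look consistent once the expectation is expanded. The short derivation below is a sketch, assuming that p(z) in Equation 1 is the fixed unit Gaussian prior N(0, 1) and that q(z|a_i) = N(m_z, std_z), as in the post above:

```latex
\begin{align}
\mathbb{E}_{q(z|a_i)}\Big[\log p(a_i|z) - \beta\big(\log q(z|a_i) - \log p(z)\big)\Big]
  &= \mathbb{E}_{q(z|a_i)}\big[\log p(a_i|z)\big]
     - \beta\, \mathbb{E}_{q(z|a_i)}\Big[\log \frac{q(z|a_i)}{p(z)}\Big] \\
  &= \mathbb{E}_{q(z|a_i)}\big[\log p(a_i|z)\big]
     - \beta\, D_{KL}\big(q(z|a_i) \,\|\, p(z)\big)
\end{align}
```

The KL term enters the ELBO with a negative sign, so maximizing the ELBO is the same as minimizing \beta D_KL( N(m_z, std_z) || N(0, 1) ) together with the reconstruction loss. Under these assumptions the KL divergence is pushed smaller, traded off against reconstruction quality through \beta.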

How to evaluate the learned embedding space and skill prior?

Hi,

Before we train the RL policy, how can we evaluate the skill embedding space Z and the learned skill prior? I know that the training results can be visualized via TensorBoard, but are there other metrics to check its performance, or ways to make sure the skill prior really works?

How can I simulate the trained policy for the kitchen environment with MuJoCo or another simulator?

Hi.

I trained the skill prior and the HRL policy with the example commands ('kitchen-mixed-v0').
How can I simulate the trained policy for the kitchen environment with MuJoCo (like this: https://clvrai.github.io/spirl/)?

The success rate definition of Maze Navigation Env

Hi, I am Ce Hao and I am reproducing the results of the SPiRL paper with your code.

In Figure 4 of the paper, the success rate of Maze Navigation reaches almost 1 after 1M steps.

However, the wandb logger has no variable called 'success rate', so I presume this 'success rate' is a derived quantity.
The definition I assume is: at each epoch (50 episodes), if at least one reward is > 1, meaning the agent reached the target at least once, the epoch counts as successful. The mean and standard deviation of the success rate are then computed over 3 seeds.

However, the actual experiments look different. As Figure 5 shows for SPiRL (Ours), the agent is still exploring many other places rather than converging to a direct path to the goal. In my reproduction, fewer than 20% of trajectories reach the target.

I want to develop a new algorithm on top of the SPiRL baseline, so could you please explain how the success rate of Maze Navigation is defined? Thanks!

Best,
Ce Hao
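
The two readings of "success rate" that the post contrasts can be written down side by side. This is only a sketch of the definitions as described above, not the metric used in the paper: the reward threshold of 1, the 50-episode epoch, and all function names are assumptions taken from or invented for this post.

```python
from statistics import mean, stdev

# Assumption from the post: a reward above 1 means the agent reached the goal.
SUCCESS_REWARD_THRESHOLD = 1.0


def epoch_reached_goal(episode_rewards):
    """Epoch-level reading: True if at least one episode reached the goal."""
    return any(r > SUCCESS_REWARD_THRESHOLD for r in episode_rewards)


def episode_success_rate(episode_rewards):
    """Episode-level reading: fraction of episodes that reached the goal."""
    if not episode_rewards:
        raise ValueError("need at least one episode")
    hits = sum(r > SUCCESS_REWARD_THRESHOLD for r in episode_rewards)
    return hits / len(episode_rewards)


def aggregate_over_seeds(per_seed_values):
    """Mean and sample standard deviation across seeds (needs >= 2 seeds)."""
    return mean(per_seed_values), stdev(per_seed_values)


# One epoch in which only 1 of 4 episodes reaches the goal.
rewards = [0.0, 0.0, 2.0, 0.0]
print(epoch_reached_goal(rewards))    # epoch-level reading counts this as a success
print(episode_success_rate(rewards))  # episode-level reading gives 0.25
```

The gap between the two readings would be enough to explain a curve near 1 alongside fewer than 20% of trajectories reaching the target, but which definition Figure 4 uses is exactly what the question asks.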

KeyError: 'completed_tasks'

I encountered the following error:
Traceback (most recent call last):
  File "spirl/rl/train.py", line 311, in <module>
    RLTrainer(args=get_args())
  File "spirl/rl/train.py", line 76, in __init__
    self.train(start_epoch)
  File "spirl/rl/train.py", line 104, in train
    self.warmup()
  File "spirl/rl/train.py", line 190, in warmup
    warmup_experience_batch, _ = self.sampler.sample_batch(batch_size=self._hp.n_warmup_steps)
  File "/home/user/spirl/spirl/rl/components/sampler.py", line 154, in sample_batch
    obs, reward, done, info = self._env.step(agent_output.action)
  File "/home/user/spirl/spirl/rl/envs/kitchen.py", line 20, in step
    return obs, np.float64(rew), done, self._postprocess_info(info) # casting reward to float64 is important for getting shape later
  File "/home/user/spirl/spirl/rl/envs/kitchen.py", line 34, in _postprocess_info
    completed_subtasks = info.pop("completed_tasks")
KeyError: 'completed_tasks'

It seems that there is no 'completed_tasks' in info.
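
The traceback shows dict.pop being called without a default, which raises KeyError whenever the key is absent. A minimal sketch of a tolerant version follows; pop_completed_tasks is a hypothetical helper, not a function from the repository, and it only avoids the crash. If the installed kitchen environment never reports "completed_tasks", any logging that depends on it will still be missing, so this does not address the root cause.

```python
def pop_completed_tasks(info):
    """Split an env info dict into (remaining_info, completed_tasks).

    Hypothetical helper: returns None for completed_tasks when the key is
    absent instead of raising KeyError.
    """
    remaining = dict(info)  # copy so the caller's dict is not mutated
    completed = remaining.pop("completed_tasks", None)
    return remaining, completed


# Key present: it is removed from the info dict and returned separately.
print(pop_completed_tasks({"score": 1.0, "completed_tasks": ["kettle"]}))
# Key absent: no exception, completed_tasks is None.
print(pop_completed_tasks({"score": 1.0}))
```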

How to speed up the training process?

I found that GPU utilization is very low when running the training script python3 spirl/train.py --path=spirl/configs/skill_prior_learning/kitchen/hierarchical_cl --val_data_size=160. Could you suggest how to make full use of the GPU to speed up training? I tried increasing num_worker, but it does not seem to help, and when I increase batch_size, I get the following error:

len val dataset 160
Running Testing
Traceback (most recent call last):
  File "spirl/spirl/train.py", line 390, in <module>
    ModelTrainer(args=get_args())
  File "spirl/spirl/train.py", line 76, in __init__
    self.train(start_epoch)
  File "spirl/spirl/train.py", line 105, in train
    self.val()
  File "spirl/spirl/train.py", line 199, in val
    self.evaluator.dump_results(self.global_step)
  File "/home/lyf/Videos/bin/skild/skild/spirl/spirl/components/evaluator.py", line 66, in dump_results
    self.dump_metrics(it)
  File "/home/lyf/Videos/bin/skild/skild/spirl/spirl/components/evaluator.py", line 72, in dump_metrics
    best_idxs = 0 if self._top_of_n == 1 else self._get_best_idxs(self.full_eval_buffer[self._top_comp_metric])
TypeError: 'NoneType' object is not subscriptable

Thank you very much!

parallel env

Obviously, it is very slow to run one env at a time. But parallel envs suffer from the problem of differing step counts in HRL, especially with a fixed interval.
Do you have any ideas for solving this problem?

Logger Error for --mode=val

Hi, I was trying to replicate the results by running in validation mode and got an error for the logger:

Traceback (most recent call last):
  File "spirl/rl/train.py", line 311, in <module>
    RLTrainer(args=get_args())
  File "spirl/rl/train.py", line 78, in __init__
    self.val()
  File "spirl/rl/train.py", line 162, in val
    self.logger, log_images=True, step=self.global_step)
  File "/home-nfs/rteehan/spirl/spirl/rl/components/agent.py", line 259, in log_outputs
    super().log_outputs(logging_stats, rollout_storage, logger, log_images, step)
  File "/home-nfs/rteehan/spirl/spirl/rl/components/agent.py", line 74, in log_outputs
    logger.log_scalar_dict(logging_stats, prefix='train' if self._is_train else 'val', step=step)
AttributeError: 'NoneType' object has no attribute 'log_scalar_dict'

Looking at the RLTrainer code, in the self.setup_logging() function it only seems to set a logger for --mode=train and not for val.

stacked_imgs are only available up to 2

Hi!
I'm developing my learning framework based on SPiRL, using my custom dataset.
In the skill learning phase, when I set "n_input_frames=4", I got the following error in _get_seq_enc() of ImageClSPiRLMdl:

Code:
stacked_imgs = torch.cat([inputs.images[:, t:t+inputs.actions.shape[1]]
                          for t in range(self._hp.n_input_frames)], dim=2)

Error Message:
RuntimeError: Sizes of tensors must match except in dimension 2. Got 13 and 12 in dimension 1 (The offending index is 2)

In this case, the shapes of the inputs are as follows:
'actions'={Tensor: (128, 13, 7)}
'pad_mask'={Tensor: (128, 14)}
'states'={Tensor: (128, 14, 34)}
'images'={Tensor: (128, 14, 3, 128, 128)}
'observations'={Tensor: (128, 13, 7)}

It works if n_input_frames is less than 3, but the error occurs if the value is greater than 2.
I think the shape of actions should be (128, 11, 7) for this line to run correctly.

How can I solve this problem?
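
The mismatch can be reproduced from the reported shapes alone. The sketch below uses plain Python slicing instead of torch, and both function names are made up for illustration. It computes the time-axis length of each slice images[:, t:t+actions.shape[1]] for 14 images and 13 actions.

```python
def stacked_slice_lengths(n_images, n_actions, n_input_frames):
    """Time-axis length of images[:, t:t+n_actions] for each frame offset t."""
    return [len(range(n_images)[t:t + n_actions]) for t in range(n_input_frames)]


def max_consistent_window(n_images, n_input_frames):
    """Longest window that still fits inside the sequence at every offset."""
    return n_images - n_input_frames + 1


print(stacked_slice_lengths(14, 13, 2))  # [13, 13]: equal lengths, torch.cat works
print(stacked_slice_lengths(14, 13, 4))  # [13, 13, 12, 11]: index 2 is the first mismatch
print(max_consistent_window(14, 4))      # 11
```

Slicing past the end of the sequence silently truncates, which is why the error only appears from the third slice on ("Got 13 and 12", offending index 2). With 14 images and n_input_frames=4, the slices only agree for a window of 11 steps, matching the guess of (128, 11, 7) above; alternatively the dataset would have to provide 13 + 4 - 1 = 16 images per sequence. Which setting controls this in the configs is not shown in the post.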

Running error "gym.error.UnregisteredEnv: No registered env with id: kitchen-mixed-v0"

Hello, when I run gym.make('kitchen-mixed-v0'), it throws the error "gym.error.UnregisteredEnv: No registered env with id: kitchen-mixed-v0", even though I have already installed d4rl and can import it correctly.
How can I solve this?
Thank you for your reply!

Not able to replicate results mentioned

Hello @kpertsch and @youngwoon,
I tried training the skill prior module on the block-stacking dataset with the given configuration. The overall loss went from 15 to 11. When I use this pretrained model as the skill prior module for SAC, I am not able to get proper results.
I am not sure, but I think the skill prior module did not converge properly on the training dataset.
Could you please provide the pretrained models or event files from your training runs? With them I could work out which part of the model is not performing well. Also, do you have any other suggestions for replicating the results reported in the paper?
Thank you

Unable to download the data

Hello, I am trying to download the maze data from the command line with
gdown https://drive.google.com/uc?id=1pXM-EDCwFrfgUjxITBsR48FqW9gMoXYZ

But after about 4 GB had been downloaded, the download stopped abruptly, maybe because the file is too large and my Internet connection dropped. Is there any way to download the maze data in parts, so that I can fetch it little by little?

Thanks.

How to get 'kitchen-mixed-v0.py'?

Hello!

When I run this command,

python3 spirl/rl/train.py --path=spirl/configs/hrl/kitchen/spirl_cl --seed=0 --prefix=SPIRL_kitchen_seed0

I get this KeyError:

completed_subtasks = info.pop("completed_tasks")
KeyError: 'completed_tasks'

So I commented that line out and ran the code again.
It ran fine up to 7% (Train Epoch: 0 [It 100001/1500000 (7%)]),
but then this error occurred:

ValueError: tile cannot extend outside image

That means I can't render from the environment.
The reason seems to be that there is no file named 'kitchen-mixed-v0'.
How can I get 'kitchen-mixed-v0'?
I know there is code that downloads the offline env file, but I don't think it works.

Here is my directory of the project folder.
./spirlProject
├── d4rl
│   ├── AdditionalMaps_0.9.8
│   │   ├── CarlaUE4
│   │   └── Engine
│   ├── CARLA_0.9.8
│   │   ├── CarlaUE4
│   │   ├── Engine
│   │   ├── HDMaps
│   │   ├── Import
│   │   ├── PythonAPI
│   │   └── Tools
│   ├── d4rl
│   │   ├── carla
│   │   ├── carla__
│   │   ├── flow
│   │   ├── gym_bullet
│   │   ├── gym_minigrid
│   │   ├── gym_mujoco
│   │   ├── hand_manipulation_suite
│   │   ├── kitchen
│   │   ├── locomotion
│   │   ├── pointmaze
│   │   ├── pointmaze_bullet
│   │   ├── __pycache__
│   │   └── utils
│   ├── d4rl.egg-info
│   ├── flow
│   │   ├── benchmarks
│   │   ├── controllers
│   │   ├── core
│   │   ├── envs
│   │   ├── multiagent_envs
│   │   ├── networks
│   │   ├── __pycache__
│   │   ├── renderer
│   │   ├── scenarios
│   │   ├── utils
│   │   └── visualize
│   ├── flow222
│   │   ├── docs
│   │   ├── examples
│   │   ├── scripts
│   │   ├── tests
│   │   └── tutorials
│   └── scripts
│       ├── generation
│       └── reference_scores
├── data
├── docs
│   └── resources
│       ├── env_videos
│       └── policy_videos
├── experiments
│   ├── hrl
│   │   └── kitchen
│   └── skill_prior_learning
│       └── kitchen
├── spirl
│   ├── components
│   │   └── __pycache__
│   ├── configs
│   │   ├── data_collect
│   │   ├── default_data_configs
│   │   ├── hrl
│   │   ├── rl
│   │   └── skill_prior_learning
│   ├── data
│   │   ├── block_stacking
│   │   ├── kitchen
│   │   ├── maze
│   │   ├── office
│   │   └── __pycache__
│   ├── models
│   │   └── __pycache__
│   ├── modules
│   │   └── __pycache__
│   ├── __pycache__
│   ├── rl
│   │   ├── agents
│   │   ├── components
│   │   ├── envs
│   │   ├── policies
│   │   └── utils
│   └── utils
│       ├── __pycache__
│       └── scripts
└── venv
    ├── bin
    └── lib
        └── python3.8

Is the d4rl or flow path wrong?

mat1 and mat2 shapes cannot be multiplied

Hello @kpertsch
First of all, thanks a lot for this awesome work and documentation.
I am trying to set up the repo locally. When I tried to train vanilla SAC on the block_stacking environment, I got the following error:
"RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x41 and 23x256)"

Resetting the environment returns an observation of dimension 1x41.
Could you please help resolve this?

Thank You
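
One way to read the error message: the first operand (1x41) is the batch of observations and the second (23x256) is the weight of the first layer, so the network was built for 23 input features while the environment returns 41. The helper below is a hypothetical illustration of that check, not code from the repository. Where the value 23 is configured is not shown in the post.

```python
def explain_matmul_mismatch(mat1_shape, mat2_shape):
    """Return None if (n, k) @ (k, m) is valid, else a readable explanation."""
    _, in_features = mat1_shape
    expected_features, _ = mat2_shape
    if in_features == expected_features:
        return None
    return (
        f"input has {in_features} features but the layer expects "
        f"{expected_features}"
    )


print(explain_matmul_mismatch((1, 41), (23, 256)))
# input has 41 features but the layer expects 23
```

Under this reading, either the configured input dimension would need to become 41 or the observation would need to be reduced to the 23 entries the model expects.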

completed_subtasks = info.pop("completed_tasks") error

Hello, I have an issue with SPiRL training.

After I finished skill prior learning on the kitchen environment,
I tried to train SPiRL_CL based on the skill prior network.

But I got a KeyError when the code attempts to pop the key "completed_tasks" from the info variable.

I don't understand line 33 of _postprocess_info, because the "info" returned from the step function only contains the following 5 key-value pairs, as described in "kitchen_multitask_v0.py"

Also, I could not find any other code that adds the "completed_tasks" key to the info variable.

Am I missing something?

Dataset for kitchen environment

Hi Kpertsch, thanks for providing the SPiRL code and the maze dataset.
I see that you updated the closed-loop model for the kitchen environment. How did you generate the dataset for it?

In D4RL, I found a script for generating kitchen data, but I am not sure whether it is the right one.
https://github.com/kpertsch/d4rl/blob/master/scripts/generate_kitchen_datasets.py

Thanks.
Best, Ce

Could not find calibration file

Thank you for sharing the code.

When I run "python3 spirl/train.py --path=spirl/configs/skill_prior_learning/kitchen/flat --val_data_size=160"

I encounter the following error:
"Could not find calibration file at: /d4rl/kitchen/adept_envs/franka/robot/franka_config.xml"
How can I solve this problem?

RuntimeError of gradient computation

Hi, I added a new RL environment and ran it as described in readme.md. But I hit this issue at self._perform_update(policy_loss, self.policy_opt, self.policy):

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Could you offer some help?

The performance of the SAC algorithm in this project is significantly worse than the performance of SAC in Stable-Baselines3.

The SAC algorithm in this project performs significantly worse than SAC in Stable-Baselines3. Training on the slide cabinet subtask in the kitchen environment with this project's SAC fails to converge, and the loss tends to explode exponentially. I have carefully compared this project's code with the SAC implementation in Stable-Baselines3 and found no reason for this anomaly.
https://github.com/clvrai/spirl/blob/master/spirl/rl/agents/ac_agent.py
https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/sac/sac.py

Caught RuntimeError in replica 1 on device 1.

Hi.
I'm trying to learn a skill prior using my personal dataset.
It works well on a single GPU, but I get the following error when using multiple GPUs (two RTX 3090s):

How can I solve this?

Kitchen environment observation is 60-dimensional instead of 30

Hi, I am wondering whether including the goal as an input is intentional. The SPiRL paper says that the kitchen environment has a 30-dimensional state, but running python3 spirl/train.py --path=spirl/configs/skill_prior_learning/kitchen/hierarchical_cl --val_data_size=160 includes the goal (the observation becomes 60-dimensional). I guess RL training will also use 60-dimensional observations because it uses KitchenEnv instead of NoGoalKitchenEnv.
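
If a 30-dimensional input is wanted, the goal part of the observation would have to be dropped before it reaches the model. The sketch below assumes that the goal is appended after the state, so the state occupies the first 30 entries; that layout is an assumption and should be checked against the environment before relying on it. strip_goal is a hypothetical helper, and whether NoGoalKitchenEnv does the same thing is not stated in the post.

```python
STATE_DIM = 30  # per the post: the paper describes a 30-dimensional state


def strip_goal(observation, state_dim=STATE_DIM):
    """Keep the first state_dim entries of a flat observation.

    Assumes the goal is appended after the state (layout not confirmed).
    """
    if len(observation) < state_dim:
        raise ValueError(
            f"observation has {len(observation)} entries, expected at least {state_dim}"
        )
    return list(observation[:state_dim])


full_obs = list(range(60))          # stand-in for a 60-dimensional observation
print(len(strip_goal(full_obs)))    # 30
```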

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation:

Hello @kpertsch and @jesbu1
I get the following error when I try to train SAC with skill priors.

Could you please help me with this?

How to regenerate kitchen dataset?

Hi,

Is there a way to regenerate new datasets for the kitchen environment?

Pre-trained models

Would it be possible to share/release pre-trained model weights?

Is it possible to change the camera zoom?

Thank you for your great work.

I understand that the provided dataset and generation process focus on the agent's current location, but
I want to create an image-based 2D maze dataset where the entire map is visible (zoomed out).

Where can I start with this?
Thanks!

Not able to render the block_stacking environment

Hello @kpertsch
I am trying to render the block stacking environment in the MuJoCo viewer. After changing has_renderer to True in the MujocoEnv of the base.py file, I get the "Failed to initialize GLFW" error:
GLFW error (code %d): %s 65544 b'X11: Failed to open display :1'
GLFW error (code %d): %s 65544 b'X11: Failed to open display :1'
*** mujoco_py.cymj.GlfwError: Failed to initialize GLFW

This error is shown for the line
self.viewer = MujocoPyRenderer(self.sim)
in spirl/rl/utils/robosuite_utils.py(21)

I also tried saving the XML file and loading it separately. I am able to load it, but the positions of the blocks and gripper are disturbed.

Could you please help resolve this?

RuntimeError: No filenames found in /home/user_name/rl_ws/spirl/data/block_stacking

Hi,

I downloaded the block_stacking.zip dataset and put it in the folder I created, i.e. ./data/block_stacking/blocking_stacking.zip. The data_dir is /home/user_name/rl_ws/spirl/data/block_stacking, but the files cannot be found. I also tried ./data/blocking_stacking.zip, with the same result. Did I miss something?

Thanks.
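
One possible cause is that the archive was placed in the folder but never extracted. The sketch below unpacks a zip into the data directory and lists what ends up there. It assumes the data loader searches data_dir for extracted files rather than reading the .zip itself, which the post does not confirm; extract_dataset is a hypothetical helper and the paths in the usage comment are the ones from the post.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, data_dir):
    """Extract zip_path into data_dir and return the extracted file paths."""
    zip_path, data_dir = Path(zip_path), Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(data_dir)
    # List everything except archives so the result shows what a loader could find.
    return sorted(
        p for p in data_dir.rglob("*") if p.is_file() and p.suffix != ".zip"
    )


# Usage with the paths from the post (not executed here):
# files = extract_dataset("./data/block_stacking/blocking_stacking.zip",
#                         "./data/block_stacking")
# print(len(files), files[:3])
```

If the archive itself contains a top-level block_stacking folder, extracting into ./data/block_stacking would nest the files one level deeper, so it is worth checking the returned paths against the data_dir in the error message.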

Segmentation fault (core dumped)

Hi! I cannot complete the environment setup. I run into the following problem:
loading from the config file spirl/configs/skill_prior_learning/kitchen/hierarchical_cl/conf.py
Warning: Mujoco-based envs failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'mjrl'
Warning: Flow failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'flow'
MoTTY X11 proxy: Authorisation not recognised
/home/hehongcai/miniconda3/envs/SPiRL/lib/python3.7/site-packages/glfw/__init__.py:912: GLFWError: (65544) b'X11: Failed to open display localhost:12.0'
warnings.warn(message, GLFWError)
Segmentation fault (core dumped)

I didn't change anything except the wandb part.
I have tried many approaches but cannot solve it. Do you have any solutions?

Thank you very much!
