clvrai / spirl
Official implementation of "Accelerating Reinforcement Learning with Learned Skill Priors", Pertsch et al., CoRL 2020
Sec. 3.2, Equation 1 maximizes the following evidence lower bound (ELBO):
E_q[ log p(a_i|z) - \beta (log q(z|a_i) - log p(z)) ],
but the equation in Sec. B minimizes the regularization loss.
So does the algorithm want to make the KL divergence between the skill posterior q(z|a_i) and the fixed prior N(0, 1) larger or smaller?
This question has bothered me for days; I would be very grateful for a reply!
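For reference, a short derivation using only the terms quoted in the question. It assumes the expectation is taken over z ~ q(z|a_i) and that p(z) is the fixed prior. Under those assumptions the bracketed difference of log-densities is exactly the KL divergence, so maximizing the ELBO and minimizing a KL regularization loss point in the same direction: the KL term is pushed smaller, weighted by \beta.

```latex
\begin{align}
\mathcal{L}_{\mathrm{ELBO}}
  &= \mathbb{E}_{q(z \mid a_i)}\big[\log p(a_i \mid z)
     - \beta\,\big(\log q(z \mid a_i) - \log p(z)\big)\big] \\
  &= \mathbb{E}_{q(z \mid a_i)}\big[\log p(a_i \mid z)\big]
     - \beta\, D_{\mathrm{KL}}\big(q(z \mid a_i)\,\|\,p(z)\big), \\
\text{because}\quad
D_{\mathrm{KL}}\big(q(z \mid a_i)\,\|\,p(z)\big)
  &= \mathbb{E}_{q(z \mid a_i)}\big[\log q(z \mid a_i) - \log p(z)\big]. \\
\text{Hence}\quad
\max\, \mathcal{L}_{\mathrm{ELBO}}
  \;&\Longleftrightarrow\;
  \min\, \Big( -\mathbb{E}_{q(z \mid a_i)}\big[\log p(a_i \mid z)\big]
     + \beta\, D_{\mathrm{KL}}\big(q(z \mid a_i)\,\|\,p(z)\big) \Big).
\end{align}
```

The KL term enters the maximized objective with a minus sign and the minimized loss with a plus sign, so the two statements are consistent.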
Hi,
Before we train the RL policy, how can we evaluate the skill embedding space Z and the learned skill prior? I know the training results can be visualized in TensorBoard, but are there other metrics to check their quality, or some way to make sure the skill prior really works?
Hi.
I trained the skill prior and HRL with the example commands ('kitchen-mixed-v0').
How can I simulate the trained policy for the kitchen environment with MuJoCo (like this: https://clvrai.github.io/spirl/)?
Hi, I am Ce Hao and I am reproducing the code for the SPiRL paper.
In Figure 4 of the paper, the success rate of Maze Navigation reaches almost 1 after 1M steps.
However, the wandb logger has no variable called 'success rate', so I presume this 'success rate' is a derived quantity.
The definition I assumed is: in each epoch (50 episodes), if at least one reward is > 1, meaning the agent reached the target at least once, the epoch counts as successful. The mean and standard deviation of this success rate are then computed over 3 seeds.
However, the actual experiments look different. As Figure 5 shows for SPiRL (Ours), the agent is still exploring many other places rather than converging to the direct path to the goal. In my reproduction, fewer than 20% of trajectories finally reach the target.
I want to develop a new algorithm on top of the SPiRL baseline, so could you please explain how the success rate of Maze Navigation is defined? Thanks!
Best,
Ce Hao
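To make the presumed definition above concrete, here is a minimal sketch of it in Python. It only encodes the poster's guess (an epoch is successful if any episode reward exceeds 1, aggregated over seeds); it is not confirmed to be the metric used in the paper. The function names, the nested-list layout, and the use of the sample standard deviation are assumptions made for illustration.

```python
from statistics import mean, stdev


def epoch_success(episode_rewards, threshold=1.0):
    """Return 1.0 if any episode in the epoch has reward above the threshold."""
    return 1.0 if any(r > threshold for r in episode_rewards) else 0.0


def success_curve(rewards_by_seed, threshold=1.0):
    """Per-epoch (mean, std) of the success flag across seeds.

    rewards_by_seed[s][e] is the list of episode rewards that seed s
    collected in epoch e (hypothetical layout, at least two seeds).
    """
    curve = []
    for epoch_across_seeds in zip(*rewards_by_seed):
        flags = [epoch_success(ep, threshold) for ep in epoch_across_seeds]
        curve.append((mean(flags), stdev(flags)))
    return curve


# Toy data: 3 seeds, 2 epochs, 2 episodes per epoch (a real epoch would have 50).
toy_rewards = [
    [[0.0, 0.0], [0.0, 2.0]],  # seed 0
    [[0.0, 2.0], [2.0, 0.0]],  # seed 1
    [[0.0, 0.0], [2.0, 2.0]],  # seed 2
]
curve = success_curve(toy_rewards)
```

Note that under this definition an epoch can count as successful even when most of its trajectories miss the goal, which may explain a gap between a reported success rate near 1 and a per-trajectory rate below 20%.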
I encountered an error like this:
Traceback (most recent call last):
File "spirl/rl/train.py", line 311, in <module>
RLTrainer(args=get_args())
File "spirl/rl/train.py", line 76, in __init__
self.train(start_epoch)
File "spirl/rl/train.py", line 104, in train
self.warmup()
File "spirl/rl/train.py", line 190, in warmup
warmup_experience_batch, _ = self.sampler.sample_batch(batch_size=self._hp.n_warmup_steps)
File "/home/user/spirl/spirl/rl/components/sampler.py", line 154, in sample_batch
obs, reward, done, info = self._env.step(agent_output.action)
File "/home/user/spirl/spirl/rl/envs/kitchen.py", line 20, in step
return obs, np.float64(rew), done, self._postprocess_info(info) # casting reward to float64 is important for getting shape later
File "/home/user/spirl/spirl/rl/envs/kitchen.py", line 34, in _postprocess_info
completed_subtasks = info.pop("completed_tasks")
KeyError: 'completed_tasks'
It seems that there is no 'completed_tasks' in info.
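The traceback shows `info.pop("completed_tasks")` raising because the key is absent from the dict returned by the environment. Below is a minimal sketch of a tolerant lookup. `pop_completed_tasks` is a hypothetical helper, not part of the repository, and it only avoids the crash: it does not restore per-subtask logging. Whether the key is supposed to come from a particular D4RL version or fork is not established in this thread and is worth checking first.

```python
import warnings


def pop_completed_tasks(info):
    """Remove and return 'completed_tasks' from a copy of info.

    Falls back to an empty list (with a warning) when the key is missing,
    instead of raising KeyError. Hypothetical helper for illustration.
    """
    info = dict(info)  # work on a copy so the caller's dict is untouched
    if "completed_tasks" not in info:
        warnings.warn(
            "env info has no 'completed_tasks' key; "
            "subtask completion will be logged as empty",
            stacklevel=2,
        )
    completed = info.pop("completed_tasks", [])
    return list(completed), info
```

Silencing the error this way means subtask statistics will be empty, so it is a stopgap for debugging rather than a fix for the missing key.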
I found that GPU utilization is very low when running the training script
python3 spirl/train.py --path=spirl/configs/skill_prior_learning/kitchen/hierarchical_cl --val_data_size=160
Could you suggest how to make full use of the GPU to speed up training? I tried increasing num_worker, but it does not seem to help, and when I increase batch_size I get an error like the following:
len val dataset 160
Running Testing
Traceback (most recent call last):
File "spirl/spirl/train.py", line 390, in <module>
ModelTrainer(args=get_args())
File "spirl/spirl/train.py", line 76, in __init__
self.train(start_epoch)
File "spirl/spirl/train.py", line 105, in train
self.val()
File "spirl/spirl/train.py", line 199, in val
self.evaluator.dump_results(self.global_step)
File "/home/lyf/Videos/bin/skild/skild/spirl/spirl/components/evaluator.py", line 66, in dump_results
self.dump_metrics(it)
File "/home/lyf/Videos/bin/skild/skild/spirl/spirl/components/evaluator.py", line 72, in dump_metrics
best_idxs = 0 if self._top_of_n == 1 else self._get_best_idxs(self.full_eval_buffer[self._top_comp_metric])
TypeError: 'NoneType' object is not subscriptable
Thank you very much!
Obviously, it is very slow to run one env at a time. But parallel envs suffer from the problem of differing step counts in HRL, especially with a fixed interval.
Do you have any ideas for solving this problem?
Hi, I was trying to replicate the results by running in validation mode and got an error for the logger:
Traceback (most recent call last):
File "spirl/rl/train.py", line 311, in <module>
RLTrainer(args=get_args())
File "spirl/rl/train.py", line 78, in __init__
self.val()
File "spirl/rl/train.py", line 162, in val
self.logger, log_images=True, step=self.global_step)
File "/home-nfs/rteehan/spirl/spirl/rl/components/agent.py", line 259, in log_outputs
super().log_outputs(logging_stats, rollout_storage, logger, log_images, step)
File "/home-nfs/rteehan/spirl/spirl/rl/components/agent.py", line 74, in log_outputs
logger.log_scalar_dict(logging_stats, prefix='train' if self._is_train else 'val', step=step)
AttributeError: 'NoneType' object has no attribute 'log_scalar_dict'
Looking at the RLTrainer code, the self.setup_logging() function only seems to set a logger for --mode=train and not for val.
Hi!
I'm developing my learning framework based on SPiRL, using my custom dataset.
In the skill learning phase, when I set "n_input_frames=4", I get the following error in _get_seq_enc() of ImageClSPiRLMdl:
Code:
stacked_imgs = torch.cat([inputs.images[:, t:t+inputs.actions.shape[1]]
                          for t in range(self._hp.n_input_frames)], dim=2)
Error Message:
RuntimeError: Sizes of tensors must match except in dimension 2. Got 13 and 12 in dimension 1 (The offending index is 2)
In this case, the shapes of the inputs are as follows:
'actions'={Tensor: (128, 13, 7)}
'pad_mask'={Tensor: (128, 14)}
'states'={Tensor: (128, 14, 34)}
'images'={Tensor: (128, 14, 3, 128, 128)}
'observations'={Tensor: (128, 13, 7)}
It works if n_input_frames is less than 3, but the error occurs if the value is greater than 2.
I think the shape of actions would need to be (128, 11, 7) for this line to run correctly.
How can I solve this problem?
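The shape mismatch can be reproduced from the slicing arithmetic alone. The sketch below mirrors the time-axis slice `images[:, t:t+actions.shape[1]]` with plain Python ranges, using the sizes reported above (14 image frames, 13 action steps). The helper names are hypothetical, and the sketch only illustrates why the slices stop matching; it does not say how the data loader or model should be changed.

```python
def window_lengths(n_images, window, n_input_frames):
    """Length along the time axis of images[:, t:t+window] for each offset t.

    Python slicing silently clips at the end of the sequence, which is
    what produces slices of unequal length.
    """
    return [len(range(n_images)[t:t + window]) for t in range(n_input_frames)]


def max_window(n_images, n_input_frames):
    """Largest window for which every offset t yields a full-length slice."""
    return n_images - n_input_frames + 1


# Sizes from the report: 14 image frames, 13 action steps.
lengths_4_frames = window_lengths(14, 13, 4)   # offsets 2 and 3 get clipped
lengths_2_frames = window_lengths(14, 13, 2)   # both slices are full length
needed_window = max_window(14, 4)
```

With 14 frames and 4 stacked inputs, the slice at offset 2 is the first to be clipped (12 instead of 13), which matches the "Got 13 and 12" message, and the largest window that fits is 11, in line with the (128, 11, 7) guess above.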
Hello, when I run gym.make('kitchen-mixed-v0'), it throws the error "gym.error.UnregisteredEnv: No registered env with id: kitchen-mixed-v0", even though I have already installed d4rl and can import it correctly.
How can I solve it?
Thank you for your reply!
Hello @kpertsch and @youngwoon
I tried training the skill prior module using the block-stacking dataset with the given configuration. The overall loss went from 15 to 11. When I use this pretrained model as the skill prior module for SAC, I am not able to get proper results.
I am not sure, but I think the skill prior module did not converge properly on the training dataset.
Can you please provide the pretrained models / event files you got during training? I would then be able to see which part of the model is not performing well. Also, do you have any other suggestions for replicating the results reported in the paper?
Thank You
Hello, I am trying to download the maze data with the command
gdown https://drive.google.com/uc?id=1pXM-EDCwFrfgUjxITBsR48FqW9gMoXYZ
But after about 4 GB had been downloaded, the download stopped abruptly, perhaps because the file is too large and my Internet connection dropped. Is there any way to download the maze data in parts, so that I can fetch it little by little?
Thanks.
Hello!
When I run this command,
python3 spirl/rl/train.py --path=spirl/configs/hrl/kitchen/spirl_cl --seed=0 --prefix=SPIRL_kitchen_seed0
I get this KeyError:
completed_subtasks = info.pop("completed_tasks")
KeyError: 'completed_tasks'
So I commented it out and ran the code.
It ran fine up to 7% (Train Epoch: 0 [It 100001/1500000 (7%)]),
but then this error occurred:
ValueError: tile cannot extend outside image
That means I can't render from the environment.
The reason is that there is no file named 'kitchen-mixed-v0'.
How do I get 'kitchen-mixed-v0'?
I know there is code that downloads the offline env file, but I think it doesn't work.
Here is the directory layout of my project folder:
./spirlProject
├── d4rl
│ ├── AdditionalMaps_0.9.8
│ │ ├── CarlaUE4
│ │ └── Engine
│ ├── CARLA_0.9.8
│ │ ├── CarlaUE4
│ │ ├── Engine
│ │ ├── HDMaps
│ │ ├── Import
│ │ ├── PythonAPI
│ │ └── Tools
│ ├── d4rl
│ │ ├── carla
│ │ ├── carla__
│ │ ├── flow
│ │ ├── gym_bullet
│ │ ├── gym_minigrid
│ │ ├── gym_mujoco
│ │ ├── hand_manipulation_suite
│ │ ├── kitchen
│ │ ├── locomotion
│ │ ├── pointmaze
│ │ ├── pointmaze_bullet
│ │ ├── pycache
│ │ └── utils
│ ├── d4rl.egg-info
│ ├── flow
│ │ ├── benchmarks
│ │ ├── controllers
│ │ ├── core
│ │ ├── envs
│ │ ├── multiagent_envs
│ │ ├── networks
│ │ ├── pycache
│ │ ├── renderer
│ │ ├── scenarios
│ │ ├── utils
│ │ └── visualize
│ ├── flow222
│ │ ├── docs
│ │ ├── examples
│ │ ├── scripts
│ │ ├── tests
│ │ └── tutorials
│ └── scripts
│ ├── generation
│ └── reference_scores
├── data
├── docs
│ └── resources
│ ├── env_videos
│ └── policy_videos
├── experiments
│ ├── hrl
│ │ └── kitchen
│ └── skill_prior_learning
│ └── kitchen
├── spirl
│ ├── components
│ │ └── pycache
│ ├── configs
│ │ ├── data_collect
│ │ ├── default_data_configs
│ │ ├── hrl
│ │ ├── rl
│ │ └── skill_prior_learning
│ ├── data
│ │ ├── block_stacking
│ │ ├── kitchen
│ │ ├── maze
│ │ ├── office
│ │ └── pycache
│ ├── models
│ │ └── pycache
│ ├── modules
│ │ └── pycache
│ ├── pycache
│ ├── rl
│ │ ├── agents
│ │ ├── components
│ │ ├── envs
│ │ ├── policies
│ │ └── utils
│ └── utils
│ ├── pycache
│ └── scripts
└── venv
├── bin
└── lib
└── python3.8
Is the d4rl or flow path wrong?
Hello @kpertsch
First of all, thanks a lot for this awesome work and documentation.
I am trying to set up the repo locally. When I tried to train vanilla SAC on the block_stacking environment, I got the error below:
"RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x41 and 23x256)"
Resetting the environment returns an observation of dimension 1x41.
Could you please help me resolve this?
Thank You
Hello, I have an issue with SPiRL training.
After I finished skill prior learning on the kitchen environment,
I tried to train SPiRL_CL based on the skill prior network.
But I got a KeyError when the code attempts to pop the key "completed_tasks" from the info variable.
I don't understand line 33 of _postprocess_info, because the "info" returned from the step function only contains the 5 key-value pairs described in "kitchen_multitask_v0.py",
and I could not find any other code that adds the "completed_tasks" key to the info variable.
Am I missing something?
Hi Kpertsch, thanks for providing the SPiRL code and Maze dataset.
I see you updated the closed-loop model for the kitchen environment. How did you generate the dataset for it?
In D4RL, I found a script to generate kitchen data, but I am not sure whether it is the right one.
https://github.com/kpertsch/d4rl/blob/master/scripts/generate_kitchen_datasets.py
Thanks.
Best, Ce
Thank you for sharing the code.
When I run "python3 spirl/train.py --path=spirl/configs/skill_prior_learning/kitchen/flat --val_data_size=160"
I encounter an error like this:
"
Could not find calibration file at: /d4rl/kitchen/adept_envs/franka/robot/franka_config.xml
"
How can I solve this problem?
Hi, I added a new RL environment and ran it as described in readme.md. But I hit this issue at self._perform_update(policy_loss, self.policy_opt, self.policy):
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Could you offer some help?
The SAC implementation in this project performs significantly worse than SAC in Stable Baselines3. Training on the slide cabinet subtask in the kitchen environment with this project's SAC fails to converge, and the loss tends to explode exponentially. I have carefully compared the project's code with SAC in Stable Baselines3 and found no reason for this anomaly.
https://github.com/clvrai/spirl/blob/master/spirl/rl/agents/ac_agent.py
https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/sac/sac.py
Hi, I am wondering whether including the goal as an input is intentional. The SPiRL paper says that the kitchen environment state has 30 dimensions, but running
python3 spirl/train.py --path=spirl/configs/skill_prior_learning/kitchen/hierarchical_cl --val_data_size=160
includes the goal (the observation becomes 60-dimensional). I guess RL will also have 60-dimensional observations because it uses KitchenEnv instead of NoGoalKitchenEnv.
Hi,
Is there a way to generate new datasets for the kitchen environment?
Would it be possible to share/release pre-trained model weights?
Thank you for your great work.
I understand that the given dataset and generation process focus on the agent's current location, but
I want to create an image-based 2D maze dataset where the entire map is given (zoomed out).
Where can I start?
Thanks!
Hello @kpertsch
I am trying to render block stacking in the MuJoCo viewer. After changing has_renderer to True in the MujocoEnv of the base.py file, I get a "Failed to initialize GLFW" error:
GLFW error (code %d): %s 65544 b'X11: Failed to open display :1'
GLFW error (code %d): %s 65544 b'X11: Failed to open display :1'
*** mujoco_py.cymj.GlfwError: Failed to initialize GLFW
This error is raised at the line
self.viewer = MujocoPyRenderer(self.sim)
in spirl/rl/utils/robosuite_utils.py(21)
I also tried saving the XML file and loading it separately. I am able to load it, but the positions of the blocks and gripper are disturbed.
Could you please help me resolve this?
Hi,
I downloaded the block_stacking.zip dataset and put it in the created folder, like ./data/block_stacking/blocking_stacking.zip; the data_dir is /home/user_name/rl_ws/spirl/data/block_stacking, but the file cannot be found. I also tried ./data/blocking_stacking.zip, with the same result. Did I miss something?
Thanks.
Hi! I cannot complete the environment setup; I run into the following problem:
loading from the config file spirl/configs/skill_prior_learning/kitchen/hierarchical_cl/conf.py
Warning: Mujoco-based envs failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'mjrl'
Warning: Flow failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'flow'
MoTTY X11 proxy: Authorisation not recognised
/home/hehongcai/miniconda3/envs/SPiRL/lib/python3.7/site-packages/glfw/__init__.py:912: GLFWError: (65544) b'X11: Failed to open display localhost:12.0'
warnings.warn(message, GLFWError)
Segmentation fault (core dumped)
I didn't change anything except the wandb part.
I have tried many approaches but cannot solve it. Do you have any solutions?
Thank you very much!