tianhongdai / hindsight-experience-replay
This is the PyTorch implementation of Hindsight Experience Replay (HER), with experiments on all Fetch robotic environments.
License: MIT License
Hi, thank you for sharing the implementations!
I ran into memory problems when using images (128*128) as observations in my custom environment. I think the image observations should first be encoded into latent features to reduce their dimensionality (for the replay buffer) before being fed into the agent model. Do you have any recommended repos or advice for implementing this? Should the encoder model also be updated during agent training?
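The memory argument above can be made concrete. The sketch below is a stand-in, not the repo's approach: a fixed random projection plays the role of a learned CNN encoder (which is what one would actually train), just to show how much replay-buffer memory a low-dimensional latent saves per observation. All names (`encode`, `latent_dim`, `W`) are illustrative.

```python
import numpy as np

# Hypothetical sketch: store low-dimensional latents in the replay buffer
# instead of raw 128x128 images. A learned CNN encoder would be used in
# practice; here a fixed random projection stands in for it.
rng = np.random.default_rng(0)
img_dim = 128 * 128          # flattened image observation
latent_dim = 64              # size of the encoded feature (illustrative)
W = rng.standard_normal((img_dim, latent_dim)).astype(np.float32)

def encode(obs_img):
    """Stand-in encoder: flatten the image and project it to latent_dim."""
    return obs_img.reshape(-1).astype(np.float32) @ W

obs = rng.random((128, 128))
z = encode(obs)

raw_bytes = obs.astype(np.float32).nbytes   # bytes per raw float32 image
latent_bytes = z.nbytes                     # bytes per stored latent
print(z.shape, raw_bytes // latent_bytes)   # (64,) 256
```

Whether the encoder should be updated during agent training is a design choice (end-to-end vs. a frozen pretrained encoder); a frozen encoder keeps the stored latents consistent over time, which matters for replay.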
Hello, I am a little confused about this equation:
self.future_p = 1 - (1. / (1 + replay_k))
I think replay_k means that we want to select k transitions per episode (50 transitions) for computing HER goals, but how does future_p correspond to this? Can you give some interpretation? Thank you!
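The arithmetic behind the equation in question: if k relabeled transitions are drawn per original transition, the relabeled fraction of a sampled batch is k / (k + 1), which is exactly `1 - 1 / (1 + replay_k)`.

```python
# future_p is the probability that a sampled transition gets its goal
# relabeled with a "future" achieved goal. With replay_k relabeled
# transitions per original one, the relabeled fraction is k / (k + 1).
replay_k = 4
future_p = 1 - (1. / (1 + replay_k))
print(future_p)   # 0.8 -> on average 4 relabeled per 1 original transition
```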
@TianhongDai Hi, Tianhong. Very glad to find your implementation of HER in Pytorch, I would like to use as a baseline to do my own research, however I find it hard to debug with Pycharm, some errors shows up. Could you give me some help? if possible, we can talk in wechat, I have send you my id to your email. Thanks a lot.
Hello, I would like to ask a question about clipping the target Q-values to negative numbers in:
https://github.com/TianhongDai/hindsight-experience-replay/blob/master/ddpg_agent.py#L216
Is it because the reward is always less than or equal to zero, so the Q-values should also be non-positive?
Thanks in advance!
Hello,
First of all, thank you for providing the DDPG+HER code; it has been a great help. However, I have some basic questions, as I am just starting to learn about reinforcement learning. After adapting your application to my custom environment, I noticed that during the initial stages of training the printed actor loss is very small, typically around 0.000-something, and the critic loss is usually about 0.0000-something. Is this normal, or is there a problem somewhere?
Hi,
Finding your code very helpful thanks!
Just to let you know, the pretrained model for FetchSlide only had a 40% success rate across 100 rollouts when I tested it.
In the Fetchxxx-Env, the env's distance_threshold defaults to 0.05 to determine whether a task is completed successfully. I tried to modify it with env.distance_threshold = 0.01 (or another value), because I think 0.05 m is too coarse for some tasks. But it doesn't work! (When the success_rate is 1.0, the distance between the achieved_goal and the desired_goal is 0.04x, which is larger than what I set.)
What should I do to change the condition that determines whether a task is complete?
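One likely cause (an assumption, since the exact setup isn't shown): gym wraps the raw env (e.g. in a TimeLimit wrapper), and assigning `env.distance_threshold` creates a new attribute on the wrapper instead of changing the inner env that computes the reward. The stand-in classes below mimic that shadowing; with real gym, setting `env.unwrapped.distance_threshold` reaches the raw env.

```python
# Stand-in classes mimicking gym's wrapper attribute shadowing;
# these are NOT the real gym classes.
class FetchEnv:                      # stands in for the raw Fetch env
    def __init__(self):
        self.distance_threshold = 0.05

class Wrapper:                       # stands in for gym's TimeLimit wrapper
    def __init__(self, env):
        self.env = env

    @property
    def unwrapped(self):
        return self.env

env = Wrapper(FetchEnv())
env.distance_threshold = 0.01               # only sets an attr on the wrapper
assert env.env.distance_threshold == 0.05   # inner env is unchanged

env.unwrapped.distance_threshold = 0.01     # this reaches the raw env
assert env.env.distance_threshold == 0.01
```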
The required versions of the different libs in the readme
are outdated; are newer versions known to work?
Thanks!
Hello, thank you for your work. Some of the code confuses me: why do you use threading.Lock? I don't see any multi-threading code in your project, so there seems to be no need for a lock, is there?
During training, is the experience data stored on the hard disk or in memory? If it is in memory, wouldn't it fill up quickly when the state space contains images?
--- a newcomer to RL asking for help
Hello, TianhongDai! Thank you very much for your code, it has helped me a lot! But I can't reproduce the results given in the paper in the 'FetchPickAndPlace-v1' environment. Can you give me some help?
Hi, I find that it is pretty fast to train the agent in the FetchPickAndPlace-v1 environment without using demonstrations. As we all know, this is a two-stage task with a quite sparse reward, which makes it the most difficult of the Fetch robot envs. But the time to train the agent in FetchPickAndPlace-v1 using your code is not much longer than training in the other environments (Reach, Slide and Push).
I see you mainly use code from OpenAI Baselines, but their results show that it takes a very long time to train the agent in the FetchPickAndPlace-v1 environment without demonstrations. How do you explain this? Thank you!
When I evaluate the performance with python demo.py --env-name='FetchReach-v1', I get the error below:
Creating window glfw
python: /builds/florianrhiem/pyGLFW/glfw-3.3.3/src/monitor.c:447: glfwGetVideoMode: Assertion `monitor != ((void *)0)' failed.
Aborted (core dumped)
I have tried to solve it, but I can't find a fix.
Could you give me some ideas? Thanks very much!
To plot the training results I can simply use the mean value logged in .log.
However, the standard deviation is not computed in _eval_agent. How did you get it?
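One plausible way to get the band (an assumption about the plotting workflow, not something stated in the repo): run the training several times with different seeds, collect each run's per-epoch success rate from its .log, and compute the mean and standard deviation across runs at every epoch. The numbers below are made up.

```python
import numpy as np

# Made-up success rates: 3 seeds x 4 epochs. In practice each row would
# be parsed from one run's .log file.
success = np.array([
    [0.1, 0.4, 0.8, 0.90],
    [0.2, 0.5, 0.7, 1.00],
    [0.0, 0.3, 0.9, 0.95],
])
mean = success.mean(axis=0)   # curve to plot
std = success.std(axis=0)     # shaded band: mean +/- std
print(mean.shape, std.shape)  # one value per epoch
```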
actor_loss = -self.critic_network(inputs_norm_tensor, actions_real).mean()
actor_loss += self.args.action_l2 * (actions_real / self.env_params['action_max']).pow(2).mean()
I think the output of critic_network alone would be enough for the actor_loss, so is the second term a regularizer or a trick?
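The second term is an L2 penalty on the (normalized) actions: it discourages the actor from saturating at the action bounds. A numpy sketch of the same computation, with a made-up batch of actions and critic outputs:

```python
import numpy as np

# Hypothetical batch; action_l2 and action_max mirror the repo's
# hyperparameter names, the values here are illustrative.
action_l2 = 1.0
action_max = 1.0
actions = np.array([[0.9, -0.2], [0.1, 0.5]])
q_values = np.array([-3.0, -5.0])                 # made-up critic outputs

actor_loss = -q_values.mean()                     # maximize Q
penalty = action_l2 * np.mean((actions / action_max) ** 2)
actor_loss += penalty                             # keep actions small
print(penalty, actor_loss)
```

Without the penalty, DDPG actors on bounded action spaces tend to push outputs to the tanh limits, which can slow exploration.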
It would be better for me to reply in Chinese.
Hello! I'm sorry to bother you again. I got a suboptimal policy when running 'mpirun -oversubscribe -np 16 python -u train.py --env-name='FetchPickAndPlace-v1' 2>&1 | tee pick.log'. Once the success rate reaches 0.9, it does not increase any further. Can you analyze the reason? Thank you very much!
The policy for the FetchReach task seems to converge much faster than in the original report, considering that the command given in the README should result in 10 * 2 * 50 = 1000 timesteps per epoch, while the HER OpenAI report has about 95k timesteps per epoch. Is there a reason why this implementation does so much better? Am I missing something?
@TianhongDai Thank you for writing this excellent HER code. I want to implement multi-process, multi-GPU or MPI training in my own code, but I have no idea how; maybe you can give me some hints. Can I communicate with you on WeChat or another app? I'd like to make a friend! Waiting for your reply.
I tried training from scratch; only FetchReach performs at a 1.0 success rate, while the others cannot go beyond 0.3 to 0.4.
Hi, thanks for sharing the code. I'm wondering if I can train the DDPG agent in the HandManipulate envs, since they are from the same robotics env group.
Hi, when I ran python demo.py --env-name='FetchReach-v1' on an Ubuntu server, I ran into the problem below.
Creating window glfw
ERROR: OpenGL version 1.5 or higher required
Press Enter to exit ...
Any suggestion?
Hi,
Thank you for sharing the code. I've tried to run the code as suggested in readme.
mpirun -np 8 python -u train.py --env-name='FetchPush-v1' 2>&1 | tee push.log
But the success rate is much lower compared to the plot in readme. I got a success rate of about 0.2 after running 50 epochs. Do you have any idea why this might happen?
Thanks a lot! This project works well with my own robotic environment. But I am confused about her.her_sampler.sample_her_transitions, because it's quite different from the "future" strategy as I understand it.
In the paper, k goals are sampled for every transition in the buffer, and the k new transitions are stored in the buffer, which looks like data augmentation. In the code, replay-k
is a replacement ratio, not a number of goals: as her.her_sampler.sample_her_transitions
shows, when updating the network, 256 transitions are chosen and part of their goals are replaced with achieved goals.
Does replacing goals proportionally equal the "future" strategy?
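The ratio-based scheme described above can be sketched as follows. This is an illustration of the idea, not the repo's exact code; array names (`ag`, `g`, `t_samples`) are illustrative. In expectation, relabeling a fraction k/(k+1) of each sampled batch with future achieved goals yields the same mix of original and relabeled transitions as storing k extra relabeled copies, without growing the buffer.

```python
import numpy as np

rng = np.random.default_rng(0)
T, batch_size, future_p = 50, 256, 0.8   # future_p = 1 - 1/(1 + replay_k)

# made-up episode data: achieved goals ag[t] and desired goals g[t]
ag = rng.random((T + 1, 3))
g = np.tile(np.array([9.0, 9.0, 9.0]), (T, 1))

t_samples = rng.integers(0, T, size=batch_size)   # sampled timesteps
goals = g[t_samples].copy()

her_mask = rng.random(batch_size) < future_p      # ~80% get relabeled
future_offset = (rng.random(batch_size) * (T - t_samples)).astype(int)
future_t = t_samples + 1 + future_offset          # a strictly later timestep
goals[her_mask] = ag[future_t[her_mask]]          # replace with future ag

print(her_mask.mean())   # close to future_p in expectation
```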
Hi,
thanks for the interesting work. I am not sure how you set the goal for the pick-and-place or pushing task, because the goal there is a position for the object. As I understand it, the goal of all the failed trials before the gripper touches the object is relabeled with the same object position, since the object doesn't move. That doesn't sound quite right to me. Is it set this way? Thank you very much
Hi, Tianhong, thanks for sharing the code. I've tried to run your code based on the guidance in readme
mpirun -np 8 python -u train.py --env-name='FetchPush-v1' 2>&1 | tee push.log
But surprisingly I find that running
mpirun -np 1 python -u train.py --env-name='FetchPush-v1' 2>&1 | tee push.log
does not work at all.
Do you happen to know the reason why it does not work?
Hello,
I noticed that the training time increases linearly at each epoch, in fact, it seems to scale up as the buffer size increases. After a bit of research, I think that the reason is the .copy()
in this line of the sampling function.
I might have missed something, but it seems that the copying is not needed; the original implementation in baselines also doesn't have this. I removed it and it seems to fix the issue.
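The scaling observation above can be demonstrated in isolation (a sketch of the general effect, not the repo's exact buffer code): copying the whole array before indexing is O(buffer size), while fancy-indexing a batch directly is O(batch size) and already returns a fresh array, so no extra copy is needed.

```python
import numpy as np

# .copy() before indexing touches every element of the buffer; direct
# fancy indexing copies only the sampled elements.
buffer = np.arange(1_000_000, dtype=np.float32)   # stand-in replay buffer
idxs = np.array([3, 500, 999_999])                # sampled batch indices

batch_via_copy = buffer.copy()[idxs]   # copies 1e6 elements first
batch_direct = buffer[idxs]            # copies only the 3 sampled elements

print(np.array_equal(batch_via_copy, batch_direct))
```

Because advanced indexing in numpy always returns a new array rather than a view, dropping the `.copy()` cannot introduce aliasing bugs here, which is consistent with the baselines implementation not having it.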
Hi, how do you obtain the plots showing the success rate? Do you load and plot from model.pt
, or do you use different software to obtain the plots shown in the readme?
Thanks!
Why do you sum rather than average the gradients in sync_grads? Won't this result in different learning rates when you run different number of processes?
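The effect asked about can be modeled without MPI (a toy sketch; the repo's actual sync_grads uses MPI collectives): taking a step with summed gradients is identical to taking a step with averaged gradients and a learning rate scaled by the number of workers, so summing does couple the effective step size to the process count.

```python
import numpy as np

# Toy model of gradient syncing across n workers (no MPI needed).
n_workers = 4
lr = 0.001
grads = [np.array([0.1, -0.2]) * (i + 1) for i in range(n_workers)]

g_sum = np.sum(grads, axis=0)
g_avg = g_sum / n_workers

# a step with summed grads equals an n-times-larger lr with averaged grads
step_sum = lr * g_sum
step_avg = (lr * n_workers) * g_avg
print(np.allclose(step_sum, step_avg))
```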
Hi, I have a doubt.
In this distributed RL setup, because of OS scheduling, will every process end up with nearly the same state and do the same thing?
So all processes update the network and then sync the gradients together, like the synchronous algorithm A2C?
Thanks very much.