tianhongdai / hindsight-experience-replay
This is the PyTorch implementation of Hindsight Experience Replay (HER), with experiments on all Fetch robotic environments.
License: MIT License
Hi, thank you for sharing the implementations!
I ran into memory problems when using images (128*128) as observations in my custom environment. I think the image observations should first be encoded into latent features to reduce their dimensionality (for the replay buffer) before being fed into the agent model. Do you have any recommended repos or advice for implementing this? Should the encoder model also be updated during agent training?
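The memory argument above can be made concrete. The sketch below is a stand-in, not the repo's approach: a fixed random projection plays the role of a learned CNN encoder (which is what one would actually train), just to show how much replay-buffer memory a low-dimensional latent saves per observation. All names (`encode`, `latent_dim`, `W`) are illustrative.

```python
import numpy as np

# Hypothetical sketch: store low-dimensional latents in the replay buffer
# instead of raw 128x128 images. A learned CNN encoder would be used in
# practice; here a fixed random projection stands in for it.
rng = np.random.default_rng(0)
img_dim = 128 * 128          # flattened image observation
latent_dim = 64              # size of the encoded feature (illustrative)
W = rng.standard_normal((img_dim, latent_dim)).astype(np.float32)

def encode(obs_img):
    """Stand-in encoder: flatten the image and project it to latent_dim."""
    return obs_img.reshape(-1).astype(np.float32) @ W

obs = rng.random((128, 128))
z = encode(obs)

raw_bytes = obs.astype(np.float32).nbytes   # bytes per raw float32 image
latent_bytes = z.nbytes                     # bytes per stored latent
print(z.shape, raw_bytes // latent_bytes)   # (64,) 256
```

Whether the encoder should be updated during agent training is a design choice (end-to-end vs. a frozen pretrained encoder); a frozen encoder keeps the stored latents consistent over time, which matters for replay.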
Hello, I am a little confused about this equation:
self.future_p = 1 - (1. / (1 + replay_k))
I think replay_k means that we want to select k transitions per episode (50 transitions) for computing HER goals, but how does future_p correspond to this? Can you give some interpretation? Thank you!
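The arithmetic behind the equation in question: if k relabeled transitions are drawn per original transition, the relabeled fraction of a sampled batch is k / (k + 1), which is exactly `1 - 1 / (1 + replay_k)`.

```python
# future_p is the probability that a sampled transition gets its goal
# relabeled with a "future" achieved goal. With replay_k relabeled
# transitions per original one, the relabeled fraction is k / (k + 1).
replay_k = 4
future_p = 1 - (1. / (1 + replay_k))
print(future_p)   # 0.8 -> on average 4 relabeled per 1 original transition
```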
@TianhongDai Hi, Tianhong. Very glad to find your implementation of HER in Pytorch, I would like to use as a baseline to do my own research, however I find it hard to debug with Pycharm, some errors shows up. Could you give me some help? if possible, we can talk in wechat, I have send you my id to your email. Thanks a lot.
Hello, I would like to ask a question about clipping the target Q-values to negative numbers in:
https://github.com/TianhongDai/hindsight-experience-replay/blob/master/ddpg_agent.py#L216
Is it because the reward is always less than or equal to zero, so the Q-values should also be non-positive?
Thanks in advance!
Hello,
First of all, thank you for providing the DDPG+HER code; it has been a great help. However, I have some basic questions, as I am just starting to learn about reinforcement learning. After adapting your application to my custom environment, I noticed that during the initial stages of training the printed actor loss is very small, typically around 0.000-something, and the critic loss is usually about 0.0000-something. Is this normal, or is there a problem somewhere?
Hi,
Finding your code very helpful thanks!
Just to let you know, the pretrained model for FetchSlide only had a 40% success rate across 100 rollouts when I tested it.
In the Fetchxxx-Env, the env's distance_threshold defaults to 0.05 to determine whether a task is completed successfully. I tried to modify it with env.distance_threshold = 0.01 (or another value), because I think 0.05 m is too coarse for some tasks. But it doesn't work! (When the success_rate is 1.0, the distance between the achieved_goal and the desired_goal is 0.04x, which is larger than what I set.)
What should I do to change the condition that determines whether a task is complete?
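One likely cause (an assumption, since the exact setup isn't shown): gym wraps the raw env (e.g. in a TimeLimit wrapper), and assigning `env.distance_threshold` creates a new attribute on the wrapper instead of changing the inner env that computes the reward. The stand-in classes below mimic that shadowing; with real gym, setting `env.unwrapped.distance_threshold` reaches the raw env.

```python
# Stand-in classes mimicking gym's wrapper attribute shadowing;
# these are NOT the real gym classes.
class FetchEnv:                      # stands in for the raw Fetch env
    def __init__(self):
        self.distance_threshold = 0.05

class Wrapper:                       # stands in for gym's TimeLimit wrapper
    def __init__(self, env):
        self.env = env

    @property
    def unwrapped(self):
        return self.env

env = Wrapper(FetchEnv())
env.distance_threshold = 0.01               # only sets an attr on the wrapper
assert env.env.distance_threshold == 0.05   # inner env is unchanged

env.unwrapped.distance_threshold = 0.01     # this reaches the raw env
assert env.env.distance_threshold == 0.01
```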
The required versions of the different libs in the readme
are outdated; are newer versions known to work?
Thanks!
Hello, thank you for your work. Some of the code confuses me: why do you use threading.Lock? I don't see any multi-threading code in your project, so there seems to be no need for a lock, is there?
During training, is the experience data stored on the hard disk or in memory? If it is in memory, wouldn't it fill up quickly when the state space contains images?
--- a newcomer to RL asking for help
Hello, TianhongDai! Thank you very much for your code, it has helped me a lot! But I can't reproduce the results given in the paper in the 'FetchPickAndPlace-v1' environment. Can you give me some help?
Hi, I find that it is pretty fast to train the agent in the FetchPickAndPlace-v1 environment without using demonstrations. As we all know, this is a two-stage task with a quite sparse reward, which makes it the most difficult of the Fetch robot envs. But the time to train the agent in FetchPickAndPlace-v1 using your code is not much longer than training in the other environments (Reach, Slide and Push).
I see you mainly use code from OpenAI Baselines, but their results show that it takes a very long time to train the agent in the FetchPickAndPlace-v1 environment without demonstrations. How do you explain this? Thank you!
When I evaluate the performance with python demo.py --env-name='FetchReach-v1', I get the error below:
Creating window glfw
python: /builds/florianrhiem/pyGLFW/glfw-3.3.3/src/monitor.c:447: glfwGetVideoMode: Assertion `monitor != ((void *)0)' failed.
Aborted (core dumped)
I have tried to solve it, but I can't find a fix.
Could you give me some ideas? Thanks very much!
To plot the training results I can simply use the mean value logged in .log.
However, the standard deviation is not computed in _eval_agent. How did you get it?
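One plausible way to get the band (an assumption about the plotting workflow, not something stated in the repo): run the training several times with different seeds, collect each run's per-epoch success rate from its .log, and compute the mean and standard deviation across runs at every epoch. The numbers below are made up.

```python
import numpy as np

# Made-up success rates: 3 seeds x 4 epochs. In practice each row would
# be parsed from one run's .log file.
success = np.array([
    [0.1, 0.4, 0.8, 0.90],
    [0.2, 0.5, 0.7, 1.00],
    [0.0, 0.3, 0.9, 0.95],
])
mean = success.mean(axis=0)   # curve to plot
std = success.std(axis=0)     # shaded band: mean +/- std
print(mean.shape, std.shape)  # one value per epoch
```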
actor_loss = -self.critic_network(inputs_norm_tensor, actions_real).mean()
actor_loss += self.args.action_l2 * (actions_real / self.env_params['action_max']).pow(2).mean()
I think the output of critic_network alone would be enough for the actor_loss, so is the second term a regularizer or a trick?
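The second term is an L2 penalty on the (normalized) actions: it discourages the actor from saturating at the action bounds. A numpy sketch of the same computation, with a made-up batch of actions and critic outputs:

```python
import numpy as np

# Hypothetical batch; action_l2 and action_max mirror the repo's
# hyperparameter names, the values here are illustrative.
action_l2 = 1.0
action_max = 1.0
actions = np.array([[0.9, -0.2], [0.1, 0.5]])
q_values = np.array([-3.0, -5.0])                 # made-up critic outputs

actor_loss = -q_values.mean()                     # maximize Q
penalty = action_l2 * np.mean((actions / action_max) ** 2)
actor_loss += penalty                             # keep actions small
print(penalty, actor_loss)
```

Without the penalty, DDPG actors on bounded action spaces tend to push outputs to the tanh limits, which can slow exploration.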
It would be better for me to reply in Chinese.
Hello! I'm sorry to bother you again. I got a suboptimal policy when running 'mpirun -oversubscribe -np 16 python -u train.py --env-name='FetchPickAndPlace-v1' 2>&1 | tee pick.log'. Once the success rate reaches 0.9, it does not increase any further. Can you analyze the reason? Thank you very much!
The policy for the FetchReach task seems to converge much faster than in the original report, considering that the command given in the README should result in 10 * 2 * 50 = 1000 timesteps per epoch, while the HER OpenAI report has about 95k timesteps per epoch. Is there a reason why this implementation does so much better? Am I missing something?
@TianhongDai Thank you for writing this excellent HER code. I want to implement multi-process, multi-GPU or MPI training in my own code, but I have no idea how; maybe you can give me some hints. Can I communicate with you on WeChat or another app? I'd like to make a friend! Waiting for your reply.
I tried training from scratch; only FetchReach performs at a 1.0 success rate, while the others cannot go beyond 0.3 to 0.4.
Hi, thanks for sharing the code. I'm wondering if I can train the DDPG agent in the HandManipulate envs, since they are from the same robotics env group.
Hi, when I ran python demo.py --env-name='FetchReach-v1' on an Ubuntu server, I ran into the problem below.
Creating window glfw
ERROR: OpenGL version 1.5 or higher required
Press Enter to exit ...
Any suggestion?
Hi,
Thank you for sharing the code. I've tried to run the code as suggested in readme.
mpirun -np 8 python -u train.py --env-name='FetchPush-v1' 2>&1 | tee push.log
But the success rate is much lower compared to the plot in readme. I got a success rate of about 0.2 after running 50 epochs. Do you have any idea why this might happen?
Thanks a lot! This project works well with my own robotic environment. But I am confused about her.her_sampler.sample_her_transitions, because it's quite different from the "future" strategy as I understand it.
In the paper, k goals are sampled for every transition in the buffer, and the k new transitions are stored in the buffer, which looks like data augmentation. In the code, replay-k
is a replacement ratio, not a number of goals: as her.her_sampler.sample_her_transitions
shows, when updating the network, 256 transitions are chosen and part of their goals are replaced with achieved goals.
Does replacing goals proportionally equal the "future" strategy?
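The ratio-based scheme described above can be sketched as follows. This is an illustration of the idea, not the repo's exact code; array names (`ag`, `g`, `t_samples`) are illustrative. In expectation, relabeling a fraction k/(k+1) of each sampled batch with future achieved goals yields the same mix of original and relabeled transitions as storing k extra relabeled copies, without growing the buffer.

```python
import numpy as np

rng = np.random.default_rng(0)
T, batch_size, future_p = 50, 256, 0.8   # future_p = 1 - 1/(1 + replay_k)

# made-up episode data: achieved goals ag[t] and desired goals g[t]
ag = rng.random((T + 1, 3))
g = np.tile(np.array([9.0, 9.0, 9.0]), (T, 1))

t_samples = rng.integers(0, T, size=batch_size)   # sampled timesteps
goals = g[t_samples].copy()

her_mask = rng.random(batch_size) < future_p      # ~80% get relabeled
future_offset = (rng.random(batch_size) * (T - t_samples)).astype(int)
future_t = t_samples + 1 + future_offset          # a strictly later timestep
goals[her_mask] = ag[future_t[her_mask]]          # replace with future ag

print(her_mask.mean())   # close to future_p in expectation
```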
Hi,
thanks for the interesting work. I am not sure how you set the goal for the pick-and-place or pushing task, because the goal there is a position for the object. As I understand it, the goal of all the failed trials before the gripper touches the object is relabeled with the same object position, since the object doesn't move. That doesn't sound quite right to me. Is it set this way? Thank you very much
Hi, Tianhong, thanks for sharing the code. I've tried to run your code based on the guidance in readme
mpirun -np 8 python -u train.py --env-name='FetchPush-v1' 2>&1 | tee push.log
But surprisingly I find that running
mpirun -np 1 python -u train.py --env-name='FetchPush-v1' 2>&1 | tee push.log
does not work at all.
Do you happen to know the reason why it does not work?
Hello,
I noticed that the training time increases linearly at each epoch, in fact, it seems to scale up as the buffer size increases. After a bit of research, I think that the reason is the .copy()
in this line of the sampling function.
I might have missed something, but it seems that the copying is not needed; the original implementation in baselines also doesn't have this. I removed it and it seems to fix the issue.
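The scaling observation above can be demonstrated in isolation (a sketch of the general effect, not the repo's exact buffer code): copying the whole array before indexing is O(buffer size), while fancy-indexing a batch directly is O(batch size) and already returns a fresh array, so no extra copy is needed.

```python
import numpy as np

# .copy() before indexing touches every element of the buffer; direct
# fancy indexing copies only the sampled elements.
buffer = np.arange(1_000_000, dtype=np.float32)   # stand-in replay buffer
idxs = np.array([3, 500, 999_999])                # sampled batch indices

batch_via_copy = buffer.copy()[idxs]   # copies 1e6 elements first
batch_direct = buffer[idxs]            # copies only the 3 sampled elements

print(np.array_equal(batch_via_copy, batch_direct))
```

Because advanced indexing in numpy always returns a new array rather than a view, dropping the `.copy()` cannot introduce aliasing bugs here, which is consistent with the baselines implementation not having it.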
Hi, how do you obtain the plots showing the success rate? Do you load and plot from model.pt
, or do you use different software to obtain the plots shown in the readme?
Thanks!
Why do you sum rather than average the gradients in sync_grads? Won't this result in different learning rates when you run different number of processes?
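The effect asked about can be modeled without MPI (a toy sketch; the repo's actual sync_grads uses MPI collectives): taking a step with summed gradients is identical to taking a step with averaged gradients and a learning rate scaled by the number of workers, so summing does couple the effective step size to the process count.

```python
import numpy as np

# Toy model of gradient syncing across n workers (no MPI needed).
n_workers = 4
lr = 0.001
grads = [np.array([0.1, -0.2]) * (i + 1) for i in range(n_workers)]

g_sum = np.sum(grads, axis=0)
g_avg = g_sum / n_workers

# a step with summed grads equals an n-times-larger lr with averaged grads
step_sum = lr * g_sum
step_avg = (lr * n_workers) * g_avg
print(np.allclose(step_sum, step_avg))
```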
Hi, I have a doubt.
In this distributed RL setup, because of OS scheduling, will every process end up with nearly the same state and do the same thing?
So all processes update the network and then sync the gradients together, like the synchronous algorithm A2C?
Thanks very much.