
hindsight-experience-replay's People

Contributors

tianhongdai

hindsight-experience-replay's Issues

With image observation

Hi, thank you for sharing the implementations!

I encountered memory problems when using images (128*128) as observations in my custom environment. I think the image observations should first be encoded into latent features to reduce their dimensionality (for the replay buffer) before being fed into the agent model. Do you have any recommended repos or advice on implementing this? Should the encoder model also be updated during agent training?
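
A minimal sketch of the idea described above, i.e. compressing each image into a small latent vector before it is written to the replay buffer. The encoder architecture and latent size here are illustrative assumptions, not part of this repository:

    import torch
    import torch.nn as nn

    class ImageEncoder(nn.Module):
        """Compress a 128x128 RGB observation into a small latent vector."""
        def __init__(self, latent_dim=32):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=4, stride=2), nn.ReLU(),   # 128 -> 63
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # 63 -> 30
                nn.Conv2d(64, 64, kernel_size=4, stride=2), nn.ReLU(),  # 30 -> 14
                nn.Flatten(),
            )
            self.fc = nn.Linear(64 * 14 * 14, latent_dim)

        def forward(self, img):
            # img: (batch, 3, 128, 128), float values in [0, 1]
            return self.fc(self.conv(img))

    # store the latent, not the raw image, in the replay buffer
    encoder = ImageEncoder()
    obs = torch.rand(1, 3, 128, 128)
    latent = encoder(obs).detach().numpy()   # shape (1, 32)

Whether the encoder is trained jointly with the agent or pre-trained and frozen (e.g. as an autoencoder) is a separate design choice; freezing it keeps the latents already stored in the buffer consistent with new ones.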

How does 'future_p' correspond to 'replay_k'?

Hello, I am a little confused about this equation:
self.future_p = 1 - (1. / (1 + replay_k))
I think replay_k means that we want to select k transitions in one episode (50 transitions) for computing HER goals, but how does future_p correspond to this? Can you give some interpretation? Thank you!
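
One way to read the formula (the editor's interpretation, not an answer from the author): future_p is the probability that a sampled transition gets its goal replaced by a future achieved goal, so with replay_k = 4 about 4 out of every 5 sampled transitions are relabelled, i.e. an expected ratio of k relabelled goals per original goal. A tiny sketch:

    import numpy as np

    replay_k = 4
    future_p = 1 - (1. / (1 + replay_k))          # 0.8

    rng = np.random.default_rng(0)
    relabel = rng.uniform(size=256) < future_p    # which transitions in a batch get a HER goal

    print(future_p)          # 0.8
    print(relabel.mean())    # roughly 0.8, i.e. ~4 relabelled transitions per original one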

How to debug your code in PyCharm?

@TianhongDai Hi, Tianhong. I'm very glad to find your PyTorch implementation of HER, and I would like to use it as a baseline for my own research. However, I find it hard to debug in PyCharm; some errors show up. Could you give me some help? If possible, we can talk on WeChat; I have sent my ID to your email. Thanks a lot.

Could you provide me with some advice?

Hello,

First of all, thank you for providing the DDPG+HER code; it has been a great help. However, I have some basic questions, as I am just starting to learn about reinforcement learning. After adapting your code to my custom environment, I noticed that during the initial stages of training the printed actor loss is very small, typically around 0.000-something, and the critic loss is usually around 0.0000-something. I am not sure whether this is normal or whether there is a problem somewhere.

how to set the distance_threshold

In the Fetchxxx-Env, the env's distance_threshold defaults to 0.05, which determines whether a task is completed successfully. I tried to modify it with env.distance_threshold = 0.01 (or another value) because I think 0.05 m is too coarse for some tasks. But it doesn't work! (When the success_rate is 1.0, the distance between the achieved_goal and the desired_goal is 0.04x, which is larger than the value I set.)
What should I do to change the condition for determining whether a task is complete?
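
A likely explanation, offered as an assumption about gym's wrapper behaviour rather than a verified answer for this repo: gym.make returns the Fetch environment wrapped (e.g. in TimeLimit), so assigning env.distance_threshold = 0.01 creates a new attribute on the wrapper while the success check keeps reading the underlying environment's value. Setting it on env.unwrapped should take effect:

    import gym

    env = gym.make('FetchReach-v1')           # requires mujoco-py, as in the readme
    print(type(env))                          # a wrapper (e.g. TimeLimit), not the raw Fetch env
    env.unwrapped.distance_threshold = 0.01   # set the threshold on the underlying env
    print(env.unwrapped.distance_threshold)   # 0.01, now used by the success check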

Lib version updates

The library versions required in the readme are outdated; does the code also work with newer versions?

Thanks!

why lock?

Hello, thank you for your work. Some of the code confuses me: why do you use threading.Lock? I don't find any multi-threading code in your project, so there is no need for the lock, is there?

about replay buffer

During training, is the experience data stored on the hard disk or in RAM? If it is kept in RAM, wouldn't the buffer fill up very quickly when the state space contains images?
--- a question from an RL beginner
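
For reference, a rough back-of-the-envelope sketch (editor's numbers, assuming the replay buffer pre-allocates NumPy arrays in RAM, which is how this style of buffer usually works) of why raw image observations exhaust memory while low-dimensional states do not:

    buffer_size = 1_000_000                  # transitions
    low_dim_obs = 25 * 4                     # e.g. 25 float32 features  -> 100 bytes each
    image_obs = 128 * 128 * 3                # 128x128 RGB as uint8      -> ~49 KB each

    print(buffer_size * low_dim_obs / 1e9)   # ~0.1 GB: easily fits in RAM
    print(buffer_size * image_obs / 1e9)     # ~49 GB: far too large for most machines

This is why the image-observation question earlier in this list suggests encoding frames into latent vectors before storage.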

FetchPickAndPlace-v1

Hello TianhongDai! Thank you very much for your code, it has helped me a lot! But I can't reproduce the results given in the paper in the 'FetchPickAndPlace-v1' environment. Can you give me some help?

How come the amazing result in FetchPickAndPlace-v1?

Hi, I find that it is pretty fast to train the agent in the FetchPickAndPlace-v1 environment without using demonstrations. As we all know, this is a two-stage task and the reward is quite sparse, which makes it the most difficult task among the Fetch robot envs. But training the agent in FetchPickAndPlace-v1 with your code does not take much longer than training in the other environments (Reach, Slide, and Push).

I find you are mainly using code from OpenAI Baselines, but their results show that we need a very long time to train the agent in the FetchPickAndPlace-v1 environment if we don't use demonstrations. How do you explain this? Thank you!

glfwGetVideoMode: Assertion `monitor != ((void *)0)' failed. Aborted (core dumped)

When I evaluate the performance with python demo.py --env-name='FetchReach-v1', there is a bug as below:

Creating window glfw
python: /builds/florianrhiem/pyGLFW/glfw-3.3.3/src/monitor.c:447: glfwGetVideoMode: Assertion `monitor != ((void *)0)' failed.
Aborted (core dumped)

I have tried to solve it, but I can't find a fix.
Could you give me some ideas? Thanks very much!
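
Not a confirmed fix for this repository, but the assertion usually means GLFW cannot find a display (e.g. on a headless server); a common workaround, assuming xvfb is installed, is to run the demo inside a virtual X server:

    xvfb-run -a python demo.py --env-name='FetchReach-v1'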

what is the definition of actor_loss in ddpg_agent.py?

 # DDPG policy objective: maximise Q(s, pi(s)), i.e. minimise its negative
 actor_loss = -self.critic_network(inputs_norm_tensor, actions_real).mean()
 # L2 penalty on the (normalised) actions, weighted by args.action_l2
 actor_loss += self.args.action_l2 * (actions_real / self.env_params['action_max']).pow(2).mean()

I think the output of critic_network alone is enough to serve as the actor_loss, so is the second term a regularizer or a trick?
It would be better for me if you could reply in Chinese.

FetchPickAndPlace-v1

Hello! I'm sorry to bother you again. I obtained a sub-optimal policy when running 'mpirun -oversubscribe -np 16 python -u train.py --env-name='FetchPickAndPlace-v1' 2>&1 | tee pick.log'. Once the success rate reaches 0.9, it does not increase further. Can you analyze the reason? Thank you very much!

[screenshot attached: 2021-09-30 10-21-47]

Performance on FetchReach-v0

The policy for the FetchReach task seems to converge much faster than in the original report, considering that the command given in the readme should result in 10 * 2 * 50 = 1000 timesteps per epoch, while the HER OpenAI report has about 95k timesteps in a single epoch. Is there a reason why this implementation does much better? Am I missing something?
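
For reference, a generic way to count environment steps per epoch in this kind of MPI setup (the variable names and values below simply mirror the figures quoted above; they are not claims about the repo's exact defaults):

    # steps per epoch = MPI workers x cycles x rollouts per worker per cycle x episode length
    n_workers, n_cycles, rollouts_per_worker, episode_len = 1, 10, 2, 50
    print(n_workers * n_cycles * rollouts_per_worker * episode_len)   # 1000; more workers scale this up linearly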

Some questions about distributed reinforcement learning

@TianhongDai Thank you for writing this excellent HER code. I want to implement multi-process, multi-GPU, or MPI training in my own code, but I have no idea how; maybe you can give me some hints. Can I communicate with you on WeChat or another messaging app? I'd like to make a friend! Waiting for your reply.

cannot replicate the graphs

I tried training from scratch; only FetchReach performs at a 1.0 success rate, while the others cannot go beyond 0.3 to 0.4.

ERROR: OpenGL version 1.5 or higher required

Hi, when I ran python demo.py --env-name='FetchReach-v1' on an Ubuntu server, I met a problem here.
Creating window glfw
ERROR: OpenGL version 1.5 or higher required

Press Enter to exit ...
Any suggestions?

Why is Push not performing as expected?

Hi,

Thank you for sharing the code. I've tried to run the code as suggested in the readme.

mpirun -np 8 python -u train.py --env-name='FetchPush-v1' 2>&1 | tee push.log

But the success rate is much lower than the plot in the readme: I got a success rate of about 0.2 after running 50 epochs. Do you have any idea why this might happen?

question about strategy for sampling goals for replay?

Thanks a lot! This project works well with my own robotic environment. But I am confused about her.her_sampler.sample_her_transitions, because it seems quite different from the 'future' strategy as I understand it.
[screenshot attached: 2022-03-30 17-16-02]
In the paper, k goals are sampled for every transition in the buffer, and then k new transitions are stored in the buffer, which looks like data augmentation. In the code, replay_k means the ratio of goals to replace, not the number of sampled goals: as her.her_sampler.sample_her_transitions shows, when updating the network, 256 transitions are chosen and part of their goals are replaced with achieved goals.
Is replacing goals proportionally equivalent to the 'future' strategy?
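
A condensed sketch of the "replace a fraction future_p of the sampled minibatch" relabelling described above (the editor's paraphrase of the future strategy with simplified array names, not a verbatim copy of her.py):

    import numpy as np

    def her_relabel(ag, g, t_samples, T, future_p, rng):
        """ag: achieved goals of the sampled episodes, shape (batch, T + 1, goal_dim);
        g: original goals, shape (batch, goal_dim); t_samples: sampled timesteps."""
        batch = g.shape[0]
        her_mask = rng.uniform(size=batch) < future_p                     # which transitions to relabel
        future_offset = (rng.uniform(size=batch) * (T - t_samples)).astype(int)
        future_t = t_samples + 1 + future_offset                          # a later step of the same episode
        g = g.copy()
        g[her_mask] = ag[np.arange(batch)[her_mask], future_t[her_mask]]
        return g

Sampling a goal achieved later in the same episode for a future_p fraction of the minibatch gives, in expectation, replay_k relabelled goals per original goal, so it reproduces the 'future' strategy without physically storing k extra transitions.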

How to reset goals when picking or pushing objects

Hi,
Thanks for the interesting work. I am not sure how you set the goal for the pick-up or pushing task, because the goal is the position of the object. As I understand it, the goal of all the failed trials before touching the object is set to the same object position, since it doesn't move. But that doesn't sound very correct to me. Is it set in this way? Thank you very much.

Why does single-process training on Push not work?

Hi Tianhong, thanks for sharing the code. I've tried to run your code based on the guidance in the readme

mpirun -np 8 python -u train.py --env-name='FetchPush-v1' 2>&1 | tee push.log

But surprisingly I find that running

mpirun -np 1 python -u train.py --env-name='FetchPush-v1' 2>&1 | tee push.log

does not work at all.

Do you happen to know the reason why it does not work?

Unnecessary copying makes sampling time explode

Hello,
I noticed that the training time increases linearly with each epoch; in fact, it seems to scale up as the buffer size increases. After a bit of research, I think the reason is the .copy() in this line of the sampling function.

I might have missed something, but it seems that the copying is not needed; the original implementation in baselines also doesn't have it. I removed it, and that seems to fix the issue.
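
For illustration, a sketch of the difference using a hypothetical buffer layout (not the repo's exact code): copying everything stored so far costs time proportional to current_size on every sample call, while fancy-indexing only the sampled episodes costs time proportional to the batch size and already returns a fresh array, so no extra copy is needed:

    import numpy as np

    buffers = {'obs': np.zeros((10_000, 51, 10), dtype=np.float32)}   # (capacity, T + 1, obs_dim)
    current_size, batch_size = 8_000, 256
    episode_idxs = np.random.default_rng(0).integers(0, current_size, size=batch_size)

    # slow: copies the whole filled buffer first, so the cost grows as training fills it
    sampled_slow = buffers['obs'][:current_size].copy()[episode_idxs]

    # fast: fancy indexing returns a new array containing only the sampled episodes
    sampled_fast = buffers['obs'][episode_idxs]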

How to obtain plots?

Hi, how do you obtain the plots showing success rate? Do you load and plot from model.pt, or do you use different software to obtain the plots shown in the readme?

Thanks!
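
One simple approach (an editor's sketch, assuming the success rate was captured in a log file such as push.log via the tee in the training commands quoted above; the regex is a guess at the log format and may need adjusting):

    import re
    import matplotlib.pyplot as plt

    rates = []
    with open('push.log') as f:
        for line in f:
            m = re.search(r'success rate.*?([0-9]*\.[0-9]+)', line)   # pull the float after "success rate"
            if m:
                rates.append(float(m.group(1)))

    plt.plot(rates)
    plt.xlabel('epoch')
    plt.ylabel('success rate')
    plt.savefig('success_rate.png')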

Why MPI.SUM in sync_grads (utils.py)?

Why do you sum rather than average the gradients in sync_grads? Won't this result in a different effective learning rate when you run a different number of processes?
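
For comparison, a minimal mpi4py sketch of averaging instead of summing (illustrative, not the repo's own utility code; it assumes CPU tensors with populated .grad fields): allreduce the flattened gradient and divide by the number of workers, so the effective step size no longer grows with the process count.

    import numpy as np
    import torch
    from mpi4py import MPI

    def sync_grads_mean(network):
        """Average gradients across MPI workers instead of summing them."""
        comm = MPI.COMM_WORLD
        flat = np.concatenate([p.grad.numpy().ravel() for p in network.parameters()])
        avg = np.zeros_like(flat)
        comm.Allreduce(flat, avg, op=MPI.SUM)
        avg /= comm.Get_size()                       # divide by worker count -> mean, not sum
        offset = 0
        for p in network.parameters():
            n = p.grad.numel()
            p.grad.copy_(torch.from_numpy(avg[offset:offset + n]).view_as(p.grad))
            offset += n

Summing is equivalent to averaging with the learning rate multiplied by the number of workers, which appears to be the concern raised above.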

Do all processes update the network and then sync the grads?

Hi, I have a doubt.
In this distributed RL setup, because of OS scheduling, will every process have nearly the same state and do the same thing?
So do all processes update the network and then sync the grads together, like the synchronous algorithm A2C?
Thanks very much.
