
awr's People

Contributors

tchlux, xbpeng


awr's Issues

Offline version of AWR

Hi, I am trying to modify AWR into an offline version (or a fully off-policy version). The paper states that one can simply treat the dataset as the replay buffer, with no other modifications needed. But I notice that if I remove the sampling call in rl_agent.train, line 105 in rl_agent.py:
train_return, train_path_count, new_sample_count = self._rollout_train(self._samples_per_iter)
then new_sample_count remains 0, so the number of update steps is also 0.
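For context, here is a minimal sketch of the kind of loop I have in mind: instead of deriving the number of update steps from new_sample_count (which stays 0 without rollouts), run a fixed number of gradient steps per iteration over the static dataset. The function and parameter names here are hypothetical placeholders, not the repo's actual API:

```python
import numpy as np

def offline_train_iter(buffer, update_fn, steps_per_iter=256, batch_size=256, rng=None):
    """Run a fixed number of update steps by sampling a static dataset.

    buffer: list of transitions (the fixed offline dataset).
    update_fn: placeholder for a combined critic/actor update on a batch;
               assumed to return a scalar loss.
    """
    rng = rng or np.random.default_rng(0)
    losses = []
    for _ in range(steps_per_iter):
        # sample uniformly with replacement from the fixed dataset
        idx = rng.integers(0, len(buffer), size=batch_size)
        batch = [buffer[i] for i in idx]
        losses.append(update_fn(batch))
    return float(np.mean(losses))
```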

Could you point out a proper way to modify the code to obtain offline AWR?

Train_Return vs Test_Return

Hi, thank you for sharing the repo!

I was wondering how Train_Return and Test_Return are calculated,
and what the difference between the two is.
I see that one uses norm_a_tf and the other uses sample_a_tf in the code.
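My guess (not confirmed from the code) is that the difference is stochastic vs. deterministic action selection: training rollouts sample from the policy distribution, while evaluation uses the mean action. A toy illustration of that distinction, with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical Gaussian policy output for one state
mean = np.array([0.1, -0.2])
std = np.array([0.3, 0.3])

# stochastic action, as typically used when collecting training returns
sampled_action = rng.normal(mean, std)
# deterministic (mean) action, as typically used for evaluation returns
mean_action = mean
```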

Why Normalization of vf

Hello,

thanks for the code. While re-implementing the program, I found that there is a step that normalizes the value function vf here. It is implemented as v_predict = v(s; \theta) * (1 - \gamma), and the critic update is implemented as
min_\theta [v(s; \theta) * (1 - \gamma) - v_estimate]^2.

Is there any reason to normalize the value function's output? I tried removing the normalization term and rescaling the learning rate (by 1 - \gamma), and there appears to be no problem on HalfCheetah-v2.

It achieves performance similar to the original version.
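For plain SGD on a squared loss, scaling the prediction by a constant w = 1 - \gamma multiplies the gradient by w^2 relative to fitting the unscaled target directly, so an exact learning-rate compensation would be a factor of (1 - \gamma)^2 rather than (1 - \gamma); with adaptive optimizers the effective factor differs, which may be why rescaling by 1 - \gamma works fine in practice. A scalar check of the gradient identity, using a toy linear "value network" v(s) = \theta * s with made-up numbers:

```python
import numpy as np

gamma = 0.99
w = 1.0 - gamma
theta = 0.5          # toy linear value function: v(s) = theta * s
s, y = 2.0, 0.3      # hypothetical state feature and return target

# gradient of the scaled loss (w * v(s) - y)^2 w.r.t. theta
g_scaled = 2.0 * w * (w * theta * s - y) * s
# gradient of the unscaled loss (v(s) - y / w)^2 w.r.t. theta
g_plain = 2.0 * (theta * s - y / w) * s

# the scaled-prediction gradient equals the plain gradient times w**2
assert np.isclose(g_scaled, g_plain * w**2)
```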

Best,

Parameters used for motion imitation

Hello,

I am trying to use this algorithm (rewritten in PyTorch with Gym vectorized envs) for motion imitation, starting with the PyBullet implementation of the DeepMimic environment. In the paper, section 5.3, there is a comparison of DeepMimic's modified off-policy PPO with AWR and RWR on some of DeepMimic's tasks, but no further information is given on which hyperparameters were used there.

The appendix gives some parameters which I think apply to the usual MuJoCo benchmarks, but I'm not sure whether they also apply to the DeepMimic tasks (for instance, the MLP hidden dimensions of (128, 64) don't seem right for DeepMimic, since the original paper uses (1024, 512)).

Beta (temperature) set to 1.0

In the paper it says the beta value is 0.05, but in the code here it is 1.0 for all environments. Can you provide an explanation for this, please?
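For context, beta enters the AWR actor update through the exponentiated-advantage weights w_i = exp(A_i / beta), so the two values behave very differently: a small beta makes the weights extremely peaked on high-advantage samples. A quick sketch with hypothetical advantage values (the clipping threshold of 20 is also just an illustrative choice):

```python
import numpy as np

beta = 0.05                              # temperature from the paper
advantages = np.array([-1.0, 0.0, 1.0])  # hypothetical advantage estimates

# exponentiated-advantage weights of the AWR actor update
weights = np.exp(advantages / beta)
# weights explode for small beta, so they are typically clipped in practice
weights = np.minimum(weights, 20.0)
```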

Thank you.
Trenton
