zhangchuheng123 / reinforcement-implementation
Implementation of benchmark RL algorithms
In the PPO implementation (https://github.com/zhangchuheng123/Reinforcement-Implementation/blob/master/code/ppo.py), total_loss is defined as
total_loss = loss_surr + args.loss_coeff_value * loss_value + args.loss_coeff_entropy * loss_entropy
Should the loss_entropy term have a negative sign, so that entropy is maximized?
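The sign question above can be illustrated with a small sketch. It assumes a discrete action distribution and a convention in which the entropy term is the entropy itself, so it is subtracted from the loss; the helper names `entropy` and `total_loss` and the coefficient values are hypothetical and are not taken from ppo.py.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def total_loss(loss_surr, loss_value, probs, coeff_value=0.5, coeff_entropy=0.01):
    # Assumed convention: the entropy bonus is subtracted, so that
    # minimizing the total loss pushes the policy entropy up.
    return loss_surr + coeff_value * loss_value - coeff_entropy * entropy(probs)

uniform = [0.25, 0.25, 0.25, 0.25]   # maximum-entropy policy over 4 actions
peaked = [0.97, 0.01, 0.01, 0.01]    # nearly deterministic policy

# With identical surrogate and value losses, the higher-entropy policy
# receives the lower total loss.
loss_uniform = total_loss(0.1, 0.2, uniform)
loss_peaked = total_loss(0.1, 0.2, peaked)
```

Note that the sign depends on how the term is defined: if `loss_entropy` is already computed as the negative entropy (for example, the mean of p * log p), then adding it with a positive coefficient also maximizes entropy.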
Hi Chuheng,
Hope you're having a great day and staying safe. Thank you for open sourcing your implementation of PPO, which helped me a lot during the initial stages of my research. However, after carefully inspecting the PPO implementation from openai/baselines, I found 32 implementation details, which you can find at https://costa.sh/blog-the-32-implementation-details-of-ppo.html, and your implementation appears to differ from them in some places. Namely:
- [...] 2**0.5 instead of just 1 that is presented in your implementation
- [...] 1e-5, which is different from the default 1e-8
- [...] Memory seems quite unnecessary.
I found your implementation to be especially helpful when I was getting into the field, and I feel quite strongly about giving you this feedback. I hope it will be helpful to you.
Best wishes,
Costa
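To make the 1e-5 versus 1e-8 point in the letter above concrete, here is a minimal sketch. It assumes that the value refers to the Adam optimizer's epsilon (openai/baselines passes 1e-5 in its PPO code, while 1e-8 is the usual library default); the function `adam_first_step` and the learning rate are hypothetical and written only for this illustration.

```python
import math

def adam_first_step(grad, lr=3e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Size of the very first Adam update for a scalar parameter."""
    m = (1 - beta1) * grad            # first-moment estimate after one step
    v = (1 - beta2) * grad * grad     # second-moment estimate after one step
    m_hat = m / (1 - beta1)           # bias correction at step t = 1
    v_hat = v / (1 - beta2)
    return lr * m_hat / (math.sqrt(v_hat) + eps)

tiny_grad = 1e-7
step_default = adam_first_step(tiny_grad, eps=1e-8)    # common default epsilon
step_larger = adam_first_step(tiny_grad, eps=1e-5)     # epsilon assumed from baselines
```

For very small gradients the larger epsilon dominates the denominator, so the update shrinks by roughly two orders of magnitude here; for gradients much larger than epsilon the two settings behave almost identically.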
In your TRPO implementation, the line search does not seem to use the KL constraint explicitly; the acceptance of a model update appears to depend only on the surrogate loss. The condition looks like:
if actual_improve > 0 and actual_improve > alpha * expected_improve:
    return True, xnew
Did I misunderstand your code? Could you please explain the implementation here? Thank you.
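For comparison, a backtracking line search that also checks the KL constraint explicitly could look like the sketch below. This is not the repository's code: the function signature, the callables `surrogate` and `kl`, and the toy problem are all assumptions made for illustration, and the expected improvement is scaled by the step fraction, which may differ from the condition quoted above.

```python
def line_search(x, full_step, surrogate, kl, expected_improve,
                max_kl=0.01, alpha=0.1, max_backtracks=10):
    """Backtracking line search checking both improvement and the KL bound.

    surrogate(x) is the objective to maximize; kl(x) is the KL divergence
    between the old policy and the policy at parameters x (both hypothetical).
    """
    base = surrogate(x)
    for k in range(max_backtracks):
        frac = 0.5 ** k
        xnew = [xi + frac * si for xi, si in zip(x, full_step)]
        actual_improve = surrogate(xnew) - base
        if (actual_improve > 0
                and actual_improve > alpha * frac * expected_improve
                and kl(xnew) <= max_kl):      # explicit trust-region check
            return True, xnew
    return False, x

# Toy problem: maximize -(x - 1)^2 starting from x = 0, with a quadratic
# stand-in for the KL divergence from the old parameters.
surrogate = lambda x: -(x[0] - 1.0) ** 2
kl = lambda x: 0.5 * x[0] ** 2
accepted, xnew = line_search([0.0], [1.0], surrogate, kl, expected_improve=2.0)
```

In this toy run the full step and the next two halvings improve the surrogate but violate the KL bound, so only the fourth candidate is accepted. A search that tests the surrogate alone would have accepted the full step.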