baby-a3c's People

Contributors

greydanus

baby-a3c's Issues

why override the default step()

baby-a3c/baby-a3c.py

Lines 72 to 78 in 85899d7

def step(self, closure=None):
    for group in self.param_groups:
        for p in group['params']:
            if p.grad is None: continue
            self.state[p]['shared_steps'] += 1
            self.state[p]['step'] = self.state[p]['shared_steps'][0] - 1 # a "step += 1" comes later
    super().step(closure)

Why did you override the default implementation of step(closure)? The default one computes the exponential moving averages. Your implementation doesn't calculate the step count because it always returns None. I looked over torch's documentation for step() but couldn't understand exactly why you chose to override the step function.
Kindly review the following PR: #9
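
One plausible reading of the override, not confirmed by the author: Adam's bias correction divides its moment estimates by 1 - beta^t, where t should count every update made to those moments. If the moments are shared by several worker processes but each worker keeps its own step count, t lags behind the real number of updates and the correction is too large. The pure-Python sketch below (the function name bias_corrected_mean is hypothetical) feeds four updates of a constant gradient into the first-moment average and applies the correction with either the shared count or a single worker's local count.

```python
def bias_corrected_mean(grads, beta1=0.9, t=None):
    """Adam's bias-corrected first moment, m / (1 - beta1**t).

    `grads` are all the gradients folded into the shared moment; `t` is the
    step count used for the correction. It defaults to len(grads), i.e. a
    counter shared by every worker (hypothetical helper for illustration).
    """
    m = 0.0
    for g in grads:
        m = beta1 * m + (1 - beta1) * g  # exponential moving average
    if t is None:
        t = len(grads)
    return m / (1 - beta1 ** t)


# Four workers each push one gradient of 1.0 into the shared moment.
shared = bias_corrected_mean([1.0] * 4)       # t = 4, the shared count
local = bias_corrected_mean([1.0] * 4, t=1)   # t = 1, one worker's own count
print(round(shared, 3), round(local, 3))       # 1.0 3.439
```

With the shared count the estimate recovers the true gradient of 1.0; with a per-worker count it overshoots by more than a factor of three.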

*** Error in `python': corrupted size vs. prev_size: 0x0000000000863430 ***

If I disable all GPUs, I get this error:
*** Error in `python': corrupted size vs. prev_size: 0x0000000000863430 ***

If I don't disable the GPUs, I get this error:

terminate called after throwing an instance of 'std::runtime_error'
what(): CUDA error (3): initialization error
terminate called after throwing an instance of 'std::runtime_error'
what(): CUDA error (3): initialization error
terminate called after throwing an instance of 'std::runtime_error'
what(): CUDA error (3): initialization error
terminate called after throwing an instance of 'std::runtime_error'
what(): CUDA error (3): initialization error

followed by:
*** Error in `python': corrupted size vs. prev_size: 0x00000000021dc430 ***

Any idea how to fix it?
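
One possible cause of the CUDA "initialization error", offered as an assumption rather than a diagnosis: PyTorch's multiprocessing documentation states that the CUDA runtime does not support the fork start method, so worker processes that use CUDA must be started with spawn or forkserver. The sketch below chooses a start method accordingly. make_worker_context is a hypothetical helper, and it uses the standard-library multiprocessing module, whose API torch.multiprocessing mirrors. It does not address the "corrupted size vs. prev_size" error seen with the GPUs disabled.

```python
import multiprocessing as mp


def make_worker_context(uses_cuda):
    """Return a multiprocessing context for training workers (hypothetical helper).

    CUDA cannot be re-initialized in a child created with 'fork' once the
    parent has initialized it, so GPU workers are started with 'spawn'.
    'fork' is used only for CPU-only workers on platforms that offer it.
    """
    if uses_cuda or "fork" not in mp.get_all_start_methods():
        return mp.get_context("spawn")
    return mp.get_context("fork")


if __name__ == "__main__":
    ctx = make_worker_context(uses_cuda=True)
    print(ctx.get_start_method())  # spawn
    # Workers would then be launched with ctx.Process(target=..., args=...),
    # and each child initializes CUDA on its own.
```

If the start method is instead set globally with set_start_method("spawn"), it should be called once, inside the if __name__ == "__main__": guard.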

Decreasing entropy factor in loss function

Hi @greydanus,

I'm wondering whether there is a specific reason why you added a term to the loss function that decreases the policy's entropy. In the other A3C implementations I've seen, a term that increases the entropy is added instead, with its coefficient reduced over time. My understanding is that preserving a small amount of entropy helps by encouraging exploration.

Many Thanks,
Akmal Bakar
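
For reference, the sign convention the question describes can be sketched as follows: the optimizer minimizes the total loss, so an entropy bonus is subtracted from it, and a lower loss then favors a higher-entropy, more exploratory policy. The function names and the coefficients 0.5 and 0.01 are illustrative assumptions, not values taken from baby-a3c.

```python
import math


def entropy(probs):
    """Shannon entropy H(pi) = -sum_a pi(a) * log pi(a), in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def a3c_loss(policy_loss, value_loss, probs, value_coef=0.5, entropy_coef=0.01):
    """Total loss to minimize (hypothetical helper). The entropy term is
    subtracted, so for equal policy and value losses a more uniform policy
    gets a lower total loss."""
    return policy_loss + value_coef * value_loss - entropy_coef * entropy(probs)


uniform = a3c_loss(1.0, 1.0, [0.25, 0.25, 0.25, 0.25])
peaked = a3c_loss(1.0, 1.0, [0.97, 0.01, 0.01, 0.01])
print(uniform < peaked)  # True
```

Flipping the minus sign to a plus would make minimizing the loss push the policy toward determinism, which is the behavior the question asks about. A decaying entropy_coef would be a separate schedule on top of this.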

sum vs. mean

Hi, Sam.

First of all, thanks so much for this code and for making it available. Having such a short implementation definitely helps in understanding the algorithm.

I was hoping you could give me some intuition behind your use of sum instead of mean. In my implementations of REINFORCE, A3C, GAE, and A2C, I use the mean and things work fine. Equations in online resources seem to suggest that the mean is the right approach, and other implementations also use the mean.

Now, your implementation works very well, too! I tested it myself with Atari games and other environments and got rock-solid results.

Can you share some insights on the use of sum instead of mean?

Again, thanks so much in advance!
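
One relevant fact when comparing the two, which may or may not be the author's reason: summing over a rollout of T steps instead of averaging multiplies the loss, and therefore every gradient, by T. Plain SGD would turn that into a T-times larger step, but Adam divides each update by a running estimate of the gradient's magnitude, so its updates are essentially unchanged (the Adam paper notes this invariance to gradient rescaling; only the small eps term breaks it). The sketch below checks this numerically with a scalar Adam; adam_updates, the gradient values, and T = 20 are made up for illustration.

```python
import math


def adam_updates(grads, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the sequence of Adam parameter updates for scalar gradients."""
    m = v = 0.0
    updates = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g          # first moment
        v = beta2 * v + (1 - beta2) * g * g      # second moment
        m_hat = m / (1 - beta1 ** t)             # bias corrections
        v_hat = v / (1 - beta2 ** t)
        updates.append(-lr * m_hat / (math.sqrt(v_hat) + eps))
    return updates


T = 20                                   # hypothetical rollout length
mean_grads = [0.3, -0.1, 0.25, 0.05]     # gradients of a mean-reduced loss
sum_grads = [T * g for g in mean_grads]  # the same loss, sum-reduced

gap = max(abs(a - b) for a, b in zip(adam_updates(mean_grads), adam_updates(sum_grads)))
print(gap < 1e-9)  # True
```

With plain SGD the two reductions would differ by a factor of T in step size, so a learning rate tuned for one would need rescaling for the other.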

license

Hello,
would you mind adding an open-source license to this project?

Loop should break when episode is done

Hi, I believe we should break out of the

for step in range(args.rnn_steps):

loop when done == True. Currently, when the environment indicates that the episode is done, the loop keeps going for a few more steps. I'm not sure how the Gym environment responds to that, but new values, rewards, etc. keep being appended to the lists, and that can't be good for training.
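
A minimal sketch of the suggested change, assuming a hypothetical env_step() callable that returns (reward, done) in place of the real Gym call. It stops collecting as soon as the episode ends, and then bootstraps the n-step returns from 0 for a finished episode instead of from a value estimate.

```python
def collect_rollout(env_step, rnn_steps):
    """Collect up to rnn_steps rewards, stopping early when the episode ends.

    env_step is a hypothetical stand-in for stepping the environment; it
    returns (reward, done).
    """
    rewards, done = [], False
    for _ in range(rnn_steps):
        reward, done = env_step()
        rewards.append(reward)
        if done:
            break  # nothing after the terminal step belongs to this episode
    return rewards, done


def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """n-step returns computed backwards from the bootstrap value."""
    R, returns = bootstrap_value, []
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    return returns[::-1]


# Toy episode that terminates on its third step.
transitions = iter([(1.0, False), (0.0, False), (5.0, True), (9.0, False)])
rewards, done = collect_rollout(lambda: next(transitions), rnn_steps=20)
value_estimate = 4.0  # hypothetical critic output for the last state
returns = discounted_returns(rewards, 0.0 if done else value_estimate, gamma=0.5)
print(rewards, returns)  # [1.0, 0.0, 5.0] [2.25, 2.5, 5.0]
```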

No saved model

Hi, I've run the script in training mode, and even after training was over, when I then run it in test or render mode I get the "no saved model" message. How do I make it save the model? And how do I render the saved, learned policy only once training is over? Thanks in advance for your time!
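
Independently of how the script itself handles checkpoints, a common pattern is to write the model to disk periodically during training and have test or render mode load the newest file. The sketch below uses pickle on a plain dict so that it runs without PyTorch; with a real model the same roles would be played by torch.save(model.state_dict(), path) and model.load_state_dict(torch.load(path)). The helper names and the model.<step>.pkl naming scheme are hypothetical.

```python
import glob
import os
import pickle
import tempfile


def save_checkpoint(state, save_dir, step):
    """Write `state` to save_dir/model.<step>.pkl (hypothetical naming scheme)."""
    os.makedirs(save_dir, exist_ok=True)
    path = os.path.join(save_dir, f"model.{step}.pkl")
    with open(path, "wb") as f:
        pickle.dump(state, f)
    return path


def load_latest(save_dir):
    """Return (step, state) for the newest checkpoint, or None if there is none."""
    paths = glob.glob(os.path.join(save_dir, "model.*.pkl"))
    if not paths:
        return None

    def step_of(path):
        return int(os.path.basename(path).split(".")[1])

    latest = max(paths, key=step_of)
    with open(latest, "rb") as f:
        return step_of(latest), pickle.load(f)


save_dir = tempfile.mkdtemp()
print(load_latest(save_dir))  # None
save_checkpoint({"w": 1}, save_dir, step=5)
save_checkpoint({"w": 2}, save_dir, step=40)
print(load_latest(save_dir))  # (40, {'w': 2})
```

Ordering by the parsed step number rather than by file name avoids "model.5" sorting after "model.40".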

TensorFlow 2 implementation

Thanks for your great implementation.
I'm currently trying to translate it into a TF2 implementation, but I find the SharedAdam part difficult to understand and don't know how to implement it in TF2.
Could you kindly give me some tips?
Thank you.
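
Stripped of PyTorch specifics, the idea behind a shared Adam optimizer is that all workers update one copy of the parameters and one copy of the optimizer statistics (both moments and the step count, like the 'shared_steps' entry in the step() quoted above), rather than each worker keeping private ones. The framework-free sketch below shows that idea with threads on a toy scalar problem; it is not TF2 code, and SharedScalarAdam, the toy loss (theta - 3)^2, and all hyperparameters are assumptions. It serializes updates with a lock for simplicity.

```python
import math
import threading


class SharedScalarAdam:
    """Hypothetical sketch: one parameter and one Adam state (moments and step
    count) that every worker updates, so bias correction sees the global
    number of updates."""

    def __init__(self, param, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
        self.param, self.lr, self.b1, self.b2, self.eps = param, lr, beta1, beta2, eps
        self.m = self.v = 0.0
        self.step_count = 0
        self._lock = threading.Lock()

    def apply(self, grad):
        with self._lock:  # one update to the shared state at a time
            self.step_count += 1
            t = self.step_count
            self.m = self.b1 * self.m + (1 - self.b1) * grad
            self.v = self.b2 * self.v + (1 - self.b2) * grad * grad
            m_hat = self.m / (1 - self.b1 ** t)
            v_hat = self.v / (1 - self.b2 ** t)
            self.param -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def worker(opt, n_updates):
    for _ in range(n_updates):
        grad = 2 * (opt.param - 3.0)  # gradient of the toy loss (param - 3)^2
        opt.apply(grad)


opt = SharedScalarAdam(param=0.0)
threads = [threading.Thread(target=worker, args=(opt, 200)) for _ in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(opt.step_count)  # 800
```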
