greydanus / baby-a3c

168.0 5.0 43.0 83.64 MB

A high-performance Atari A3C agent in 180 lines of PyTorch

License: Apache License 2.0

Python 100.00%

deep-reinforcement-learning a3c actor-critic pytorch pytorch-a3c atari pytorch-rl

baby-a3c's People

cosmmb lwneal mattolson93 kastnerkyle shubhampachori12110095 ellerylin mhwaliji vvanirudh luochao1024 jingweiz 174high xkzju landoufulxf dz9 flybirp gazlaws-dev renatolfc sarikayamehmet jwlee89 mynkpl1998 nicolas99-9 robot0102 stjordanis mimoralea roclark cfj1996123 5l1v3r1 luo-li banben fanbbbb guobaoyo zeta1999 blondemonk williammunch dpineo chengjiangchang ondrejbiza vdblm jaehyek tianbingsheng rphilipzhang knut0815 haydenk

baby-a3c's Issues

why override the default step()

baby-a3c/baby-a3c.py

Lines 72 to 78 in 85899d7

    
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None: continue
                self.state[p]['shared_steps'] += 1
                self.state[p]['step'] = self.state[p]['shared_steps'][0] - 1 # a "step += 1" comes later
        super.step(closure)

Why did you override the default implementation of step(closure)? The default one calculates the exponential moving averages. Your implementation doesn't calculate the step count because it always returns None. I looked over torch's documentation for step() but couldn't understand exactly why you chose to override the step function.
Kindly review the following PR: #9
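
For readers following this question, some background may help: Adam's bias correction depends on a step count, so when several workers update the same moment estimates, a per-worker counter would lag behind the true number of updates. The following is a minimal pure-Python sketch of that idea, not the repository's code. `SharedAdamState` and `ToyWorkerAdam` are hypothetical names, and a plain Python object stands in for tensors placed in shared memory.

```python
import math
from dataclasses import dataclass


@dataclass
class SharedAdamState:
    """Optimizer state that every worker reads and writes (hypothetical name)."""
    steps: int = 0
    exp_avg: float = 0.0
    exp_avg_sq: float = 0.0


class ToyWorkerAdam:
    """Scalar Adam for one worker; all state lives in the shared object."""

    def __init__(self, shared, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        self.shared, self.lr, self.betas, self.eps = shared, lr, betas, eps

    def step(self, param, grad):
        b1, b2 = self.betas
        s = self.shared
        s.steps += 1  # global count across workers, not a per-worker count
        s.exp_avg = b1 * s.exp_avg + (1 - b1) * grad
        s.exp_avg_sq = b2 * s.exp_avg_sq + (1 - b2) * grad * grad
        # Bias correction uses the shared step count.
        m_hat = s.exp_avg / (1 - b1 ** s.steps)
        v_hat = s.exp_avg_sq / (1 - b2 ** s.steps)
        return param - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


shared = SharedAdamState()
worker_a = ToyWorkerAdam(shared, lr=0.1)
worker_b = ToyWorkerAdam(shared, lr=0.1)
param = 1.0
param = worker_a.step(param, grad=2.0)
param = worker_b.step(param, grad=2.0)
```

Both workers advance the same counter, so the second update is bias-corrected as step 2 even though it is the first step taken by `worker_b`.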

bugs when episode is done

Dear Author,

I think hx should be reset when an episode is done. I am not sure whether this is a bug at
https://github.com/greydanus/baby-a3c/blob/master/baby-a3c.py#L144
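
The behaviour being proposed can be sketched in plain Python. This is an illustration of the suggestion, not the repository's training loop: `run_rollout` is a hypothetical helper, the list `hx` stands in for the recurrent hidden state, and the `+ 1.0` update is a placeholder for the real recurrent cell.

```python
def run_rollout(dones, hidden_size=4):
    """Walk through a sequence of done flags, resetting hx at episode ends.

    `dones` is a list of booleans standing in for what env.step() returns.
    Returns the hidden state seen at the start of each step.
    """
    hx = [0.0] * hidden_size
    seen = []
    for done in dones:
        seen.append(list(hx))
        hx = [h + 1.0 for h in hx]      # stand-in for the recurrent update
        if done:
            hx = [0.0] * hidden_size    # fresh episode, fresh memory
    return seen


history = run_rollout([False, False, True, False])
```

The step after the `True` flag starts from a zeroed hidden state instead of carrying memory over from the finished episode.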

*** Error in `python': corrupted size vs. prev_size: 0x0000000000863430 ***

If I disable all GPUs, I get this error:
"*** Error in `python': corrupted size vs. prev_size: 0x0000000000863430 *** "

If I don't disable the GPUs, I get this error:

terminate called after throwing an instance of 'std::runtime_error'
what(): CUDA error (3): initialization error
terminate called after throwing an instance of 'std::runtime_error'
what(): CUDA error (3): initialization error
terminate called after throwing an instance of 'std::runtime_error'
what(): CUDA error (3): initialization error
terminate called after throwing an instance of 'std::runtime_error'
what(): CUDA error (3): initialization error

Then
*** Error in `python': corrupted size vs. prev_size: 0x00000000021dc430 ***

Any idea how to fix it?

Decreasing entropy factor in loss function

Hi @greydanus,

I'm wondering whether there is a specific reason why you've added a term that decreases the entropy in the loss function. Other implementations of A3C I've seen add a term that increases the entropy instead, with its coefficient reduced over time. My understanding is that preserving a small amount of entropy helps by encouraging exploration.

Many Thanks,
Akmal Bakar
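
The sign convention here is easy to misread: a term written as `- coef * entropy` inside a loss that is minimized rewards higher entropy. The sketch below illustrates that with plain Python. The function names and the coefficients `0.5` and `0.01` are illustrative assumptions, not a statement of what the repository uses.

```python
import math


def entropy(probs):
    """Shannon entropy of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def total_loss(policy_loss, value_loss, probs, value_coef=0.5, entropy_coef=0.01):
    # Subtracting entropy means that minimizing the loss pushes entropy up.
    return policy_loss + value_coef * value_loss - entropy_coef * entropy(probs)


uniform = [0.25] * 4
peaked = [0.97, 0.01, 0.01, 0.01]
loss_uniform = total_loss(1.0, 1.0, uniform)
loss_peaked = total_loss(1.0, 1.0, peaked)
```

With the other terms held equal, the uniform policy gets the lower loss, which is the exploration bonus described above.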

Why sync grad only when grad is None

The shared_param.grad is synced only when it is None, here: https://github.com/greydanus/baby-a3c/blob/master/baby-a3c.py#L159 I am kind of confused. I think we have to sync it without the condition above, that is, whenever the local model calculates a grad. Is it auto-synced somewhere? Thank you for your time.
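
One possible reading of the `is None` guard, offered as an assumption and not as the author's explanation: the assignment binds the shared parameter's gradient to the same object as the local gradient, so it only has to happen once per process as long as later backward passes update that object in place. The toy below shows the aliasing with plain Python lists; `ToyParam` and `sync_grads` are hypothetical stand-ins, not PyTorch classes.

```python
class ToyParam:
    """Stand-in for a parameter with a `.grad` attribute (hypothetical)."""

    def __init__(self):
        self.grad = None


def sync_grads(local_params, shared_params):
    for local, shared in zip(local_params, shared_params):
        if shared.grad is None:
            shared.grad = local.grad  # bind by reference, no copy


local, shared = ToyParam(), ToyParam()
local.grad = [0.5]              # first "backward pass"
sync_grads([local], [shared])

local.grad[0] = -1.25           # a later pass writes in place
sync_grads([local], [shared])   # guard skips: the alias already exists
```

If the local gradient were replaced by a new object instead of being updated in place, the alias would go stale, and the unconditional sync proposed in the question would be needed.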

sum vs. mean

Hi, Sam.

First of all, thanks so much for this code and for making it available. Having such a short implementation definitely helps in understanding the algorithm.

I was hoping you could give me some intuition behind your use of sum instead of mean. In my implementations of REINFORCE, A3C, GAE, and A2C I use mean and things work fine. Equations in online resources seem to suggest mean is the right approach. Other implementations also use the mean.

Now, your implementation works very well, too! I tested it myself with ATARI games and other environments, and got rock solid results.

Can you share some insights on the use of sum instead of mean?

Again, thanks so much in advance!
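
As a small worked example of how the two reductions relate: for a rollout with a fixed number of steps n, the summed loss is exactly n times the mean loss, so the gradients differ only by that constant factor. The helper names below are hypothetical and the loss values are made up for illustration.

```python
def sum_loss(per_step_losses):
    return sum(per_step_losses)


def mean_loss(per_step_losses):
    return sum(per_step_losses) / len(per_step_losses)


losses = [0.5, 1.5, 1.0, 1.0]   # made-up per-step losses for a 4-step rollout
ratio = sum_loss(losses) / mean_loss(losses)
```

Here `ratio` equals the rollout length. With plain SGD that factor behaves like a larger learning rate. The two reductions weight things differently only when rollouts vary in length.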

license

Hello,
would you mind adding an open-source license to this project?

finished with exit code 0 quickly when in train mode (can't train)

Hi, everyone. When I run in train mode, the code finishes with exit code 0 quickly (within 10 s) without reporting any error. But it runs normally in test mode, with either a single process or multiple processes. Is anyone else facing the same problem?

Loop should break when episode is done

Hi, I believe we should break out of the

for step in range(args.rnn_steps):

loop when done == True. Currently, when the environment indicates that the episode is done, the loop continues to go on for a couple of steps. Not sure how the Gym environment responds to that, but new values, rewards etc keep being added to the lists and that can't be good for training.
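
The proposed change can be sketched as follows. This is a toy illustration of the suggestion, not a patch against the repository: `collect_rollout`, `make_toy_env`, and `step_fn` are hypothetical names, and the toy environment simply ends after a fixed number of steps.

```python
def collect_rollout(step_fn, rnn_steps):
    """Collect up to `rnn_steps` transitions, stopping at episode end.

    `step_fn` is a hypothetical callable returning (reward, done).
    """
    rewards = []
    for _ in range(rnn_steps):
        reward, done = step_fn()
        rewards.append(reward)
        if done:
            break  # do not keep stepping a finished episode
    return rewards


def make_toy_env(episode_length):
    """Return a step function whose episode ends after `episode_length` steps."""
    state = {"t": 0}

    def step_fn():
        state["t"] += 1
        return 1.0, state["t"] >= episode_length

    return step_fn


rewards = collect_rollout(make_toy_env(episode_length=3), rnn_steps=20)
```

The rollout holds only the three transitions that belong to the episode, so no values or rewards from after the terminal step enter the lists.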

No saved model

Hi, I've run the script in training mode, but even after training finished, running it in test or render mode gives me the "no saved model" message. How do I make it save the model? And how do I render the (saved and learned) policy only once the training is over? Thanks in advance for your time!

TensorFlow 2 implementation

Thanks for your great implementation.
I am currently trying to translate it to TF2, but I find the SharedAdam part difficult to understand and do not know how to implement it in TF2.
Could you kindly give me some tips?
Thank you.
