lweitkamp / option-critic-pytorch
PyTorch implementation of the Option-Critic framework, Harb et al. 2016
Does this code support the continuous action space environment?
You need to re-evaluate the features/state after the optimization step (`optim.step()`), because that step updates the feature layer and hence the features themselves.
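A minimal sketch of the point above, with hypothetical layer and variable names: features computed before `optim.step()` are produced by the old weights, so the same observation must be passed through the feature layer again after the step.

```python
import torch
import torch.nn as nn

# Hypothetical minimal setup: a shared feature layer whose weights change
# on optim.step(), so features computed before the step become stale.
torch.manual_seed(0)
features = nn.Linear(4, 8)
optim = torch.optim.SGD(features.parameters(), lr=0.1)

obs = torch.randn(1, 4)
old_state = features(obs)      # features under the pre-update weights

loss = old_state.pow(2).sum()
optim.zero_grad()
loss.backward()
optim.step()                   # updates the feature layer's parameters

new_state = features(obs)      # re-evaluate: same obs, new weights, new features
```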
The termination probability is calculated over the next state according to the original paper. So it should be using next_obs instead of obs.
option-critic-pytorch/option_critic.py
Line 238 in 0c57da7
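A small sketch of the distinction being reported, using assumed names (`termination`, `obs_features`, `next_obs_features`): in the paper, the termination probability β is evaluated on the next state s', not the current one.

```python
import torch
import torch.nn as nn

# Hypothetical termination head: per-option termination probabilities via sigmoid.
feature_dim, num_options = 64, 4
termination = nn.Sequential(nn.Linear(feature_dim, num_options), nn.Sigmoid())

obs_features = torch.randn(1, feature_dim)       # features of the current state
next_obs_features = torch.randn(1, feature_dim)  # features of the next state
option = 2

# As reported in the issue: evaluated on the current state's features
beta_current = termination(obs_features)[:, option]
# As the paper specifies: evaluated on the next state's features
beta_next = termination(next_obs_features)[:, option]
```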
I believe that when the batch size is reached, the action policy loss is computed from a single sample instead of a batch of size batch_size. This looks like a major difference from the original implementation, unless I am misunderstanding the code.
"python main.py --switch-goal True --env fourrooms"
Should "--num-options 4" be added?
For the fourrooms environment, the number of Option is 4.
Maybe it's something I didn't understand correctly, looking forward to and thank you for your answer.
Hi, I got an in-place operation error while running your code. It seems to be caused by a wrong detach in the loss calculation. I tried to find a similar issue but could not find anything. Could you have a look at it?
Traceback (most recent call last):
File "main.py", line 147, in <module>
run(args)
File "main.py", line 129, in run
loss.backward()
File "/home/mw/anaconda3/envs/HRL/lib/python3.6/site-packages/torch/_tensor.py", line 255, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/mw/anaconda3/envs/HRL/lib/python3.6/site-packages/torch/autograd/__init__.py", line 149, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 64]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
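Following the hint in the error message, a minimal sketch of enabling anomaly detection to locate the offending in-place operation (the tensors here are illustrative, not from the repo): with it enabled, the forward pass records stack traces so the failing backward op can be traced to the line that produced it.

```python
import torch

# Enable anomaly detection so a failing backward op reports the forward-pass
# source line that created it (at the cost of slower execution).
torch.autograd.set_detect_anomaly(True)

x = torch.randn(3, requires_grad=True)
y = x * 2
loss = y.sum()
loss.backward()  # an in-place modification of y before this would now be reported
```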
Hello, when I downloaded the code and ran it on my computer, I hit an error in loss.backward():
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [32, 64]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead
I didn't modify the code anywhere.
my dependencies
pytorch 1.3.0
python 3.6.13
tensorboard 2.0.2
gym 0.15.3
option-critic-pytorch/option_critic.py
Line 241 in 0c57da7
Thanks for providing the PyTorch version of Option-Critic. I want to ask why we don't clear the replay buffer after each episode for the on-policy policy-gradient update. Both Algorithm 1 in the paper and the derivation of the intra-option policy gradient theorem are done in the on-policy setting. If we do not clear the replay buffer, importance sampling should be implemented to account for the off-policy update, but I did not see any part of the code related to that. I tried to read the original Theano repo, and it seems they did the same thing. Do you have any comments on this?
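A hypothetical sketch of the on-policy variant the question suggests (the `OnPolicyBuffer` class and its methods are illustrative, not the repo's API): clearing the buffer at episode end ensures every gradient step uses only transitions collected under the current policy, so no importance-sampling correction is needed.

```python
import random

# Illustrative buffer: cleared after each episode to keep updates on-policy.
class OnPolicyBuffer:
    def __init__(self):
        self.transitions = []

    def push(self, obs, option, reward, next_obs, done):
        self.transitions.append((obs, option, reward, next_obs, done))

    def sample(self, batch_size):
        # Sample without replacement, capped at the number of stored transitions.
        return random.sample(self.transitions, min(batch_size, len(self.transitions)))

    def clear(self):
        self.transitions.clear()

buffer = OnPolicyBuffer()
for step in range(5):
    buffer.push(step, 0, 1.0, step + 1, step == 4)
batch = buffer.sample(4)
buffer.clear()  # discard stale transitions once the episode ends
```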