matthewja / inverse-reinforcement-learning
Implementations of selected inverse reinforcement learning algorithms.
License: MIT License
The GridWorld and ObjectWorld environments are both tabular: their states are discrete and finite, so we can write down the feature matrix simply by enumerating all possible states.
However, for more complicated non-tabular environments (such as a Super Mario game), it is impossible to represent the feature matrix by explicitly listing all possible states, since the states are continuous (e.g. any frame of the game at time t) and therefore infinite.
So how can inverse reinforcement learning be applied to a non-tabular environment like Super Mario? Does anyone have any ideas about this?
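One common approach, sketched here purely as an illustration (not something in this repo), is to replace the tabular feature matrix with a feature *function* that maps any continuous state to a fixed-length vector, e.g. radial basis functions or a learned network; MaxEnt-style IRL then fits reward weights on those features instead of one weight per state:

```python
import numpy as np

def rbf_features(state, centers, width):
    """Map a continuous state to a fixed-length feature vector via
    Gaussian radial basis functions centered on `centers`."""
    d = np.linalg.norm(centers - state, axis=1)
    return np.exp(-(d / width) ** 2)

# Four hypothetical centers in a 2-D continuous state space.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
phi = rbf_features(np.array([0.5, 0.5]), centers, width=0.5)
```

The reward is then modeled as a function of `phi` (linear or deep), so it is defined for every state without ever enumerating the state space.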
In the backward pass of MaxEnt (Algorithm 9.1 of Brian Ziebart's thesis), the soft value function V is updated with a softmax over actions, but maxent.py seems to call value_iteration.optimal_value, which computes the hard value function, i.e. it uses max instead of softmax. This looks like a bug.
The initialization also seems odd: at least for the gridworld setting, only the terminal state should be initialized to 0 while all others should be -infinity, but value_iteration.optimal_value seems to initialize everything to 0. Is there a reason for this discrepancy?
Code for reference: https://github.com/MatthewJA/Inverse-Reinforcement-Learning/blob/master/irl/value_iteration.py#L63
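For reference, the soft (log-sum-exp) backup from Algorithm 9.1 can be sketched in a few lines of numpy. This is a toy illustration under my reading of the thesis, not the repository's code:

```python
import numpy as np
from scipy.special import logsumexp

def soft_value_iteration(P, r, terminal, n_iter=50):
    """Backward pass of MaxEnt IRL: V(s) = softmax_a Q(s, a),
    with the terminal state pinned to 0 and all others starting at -inf."""
    n_states, n_actions, _ = P.shape
    V = np.full(n_states, -1e10)           # stand-in for -infinity
    V[terminal] = 0.0
    for _ in range(n_iter):
        Q = r[:, None] + np.einsum('sat,t->sa', P, V)
        V = logsumexp(Q, axis=1)           # softmax backup, not max
        V[terminal] = 0.0                  # keep the terminal state fixed
    return V

# Toy 2-state MDP: both actions in state 0 lead to terminal state 1.
P = np.zeros((2, 2, 2))
P[0, :, 1] = 1.0
P[1, :, 1] = 1.0
V = soft_value_iteration(P, np.zeros(2), terminal=1)
```

Here V[0] comes out as log(2): the softmax counts both equally good actions, whereas the hard max would give 0, which illustrates the difference the issue is pointing at.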
I am having a bit of trouble with this part of the code: φ = T.nnet.sigmoid(th.compile.ops.Rebroadcast((0, False), (1, True))(b) + W.dot(φs[-1])).
Traceback (most recent call last):
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\gof\vm.py", line 301, in call
thunk()
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\gof\op.py", line 892, in rval
r = p(n, [x[0] for x in i], o)
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\tensor\blas.py", line 1552, in perform
z[0] = np.asarray(np.dot(x, y))
ValueError: ('shapes (3,10) and (2,41) not aligned: 10 (dim 1) != 2 (dim 0)', (3, 10), (2, 41))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "ID_grouping.py", line 363, in
learning_rate,initialisation="normal",l1=0.1,l2=0.1))
File "ID_grouping.py", line 355, in irl
reward = train(reshaped_to_2d)
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\compile\function_module.py", line 903, in call
self.fn() if output_subset is None else
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\gof\vm.py", line 305, in call
link.raise_with_op(node, thunk)
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\gof\link.py", line 325, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\six.py", line 702, in reraise
raise value.with_traceback(tb)
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\gof\vm.py", line 301, in call
thunk()
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\gof\op.py", line 892, in rval
r = p(n, [x[0] for x in i], o)
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\tensor\blas.py", line 1552, in perform
z[0] = np.asarray(np.dot(x, y))
ValueError: ('shapes (3,10) and (2,41) not aligned: 10 (dim 1) != 2 (dim 0)', (3, 10), (2, 41))
Apply node that caused the error: Dot22(W, x.T)
Toposort index: 28
Inputs types: [TensorType(float64, matrix), TensorType(float64, matrix)]
Inputs shapes: [(3, 10), (2, 41)]
Inputs strides: [(80, 8), (8, 16)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Elemwise{Composite{scalar_sigmoid((i0 + i1))}}[(0, 1)](Rebroadcast{?,1}.0, Dot22.0)]]
Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
File "ID_grouping.py", line 363, in
learning_rate,initialisation="normal",l1=0.1,l2=0.1))
File "ID_grouping.py", line 307, in irl
φ = T.nnet.sigmoid(th.compile.ops.Rebroadcast((0, False), (1, True))(b) + W.dot(φs[-1]))
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
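For what it's worth, the shapes in the traceback can be decoded: Dot22(W, x.T) receives W with shape (3, 10), a hidden layer of 3 units expecting 10 input features, and x.T with shape (2, 41), i.e. a feature matrix of 41 states with only 2 features each. So the first entry of the network structure does not match the feature dimension of the matrix being fed in. A minimal numpy sketch of shapes that would line up (the names here are hypothetical, not the repo's):

```python
import numpy as np

n_states, n_features = 41, 2          # from the traceback: x.T has shape (2, 41)
feature_matrix = np.random.rand(n_states, n_features)

# A first hidden layer with 3 units: W must have n_features columns
# (not 10) for W.dot(x.T) to be defined.
W = np.random.rand(3, n_features)
hidden = W.dot(feature_matrix.T)      # shape (3, 41): one column per state
```

In other words, the layer-size argument that produces the (3, 10) weight matrix presumably needs its input dimension set to the actual number of features per state.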
Hello, thanks for sharing your code. I'm running examples/lp_gridworld.py
and seeing this reward estimate, which looks good:
However, when I change the body of gridworld.reward
to e.g.:
def reward(self, state_int):
    if state_int == 2:  # Goal state now in bottom right of 3x3, not top right
        return 1
    return 0
... then I see this reward estimate:
i.e. linear_irl.irl seems to assume that the goal state is in the top right. Have I got something wrong? How can I get linear IRL to work with different goal states? Thanks.
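For what it's worth, Ng & Russell's LP formulation recovers a reward consistent with a *given optimal policy*, so if the policy passed to linear_irl.irl still points at the old goal, the recovered reward will still peak there. A toy sketch (not the repo's API) of recomputing the greedy policy under the modified reward via value iteration before running IRL:

```python
import numpy as np

def optimal_policy(P, r, gamma=0.9, n_iter=200):
    """Greedy policy from value iteration; P is (S, A, S), r is (S,)."""
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(n_iter):
        Q = np.einsum('sat,t->sa', P, r + gamma * V)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

# Toy 2-state chain: action 0 moves from state 0 to the rewarding state 1.
P = np.zeros((2, 2, 2))
P[0, 0, 1] = 1.0   # action 0: go to state 1
P[0, 1, 0] = 1.0   # action 1: stay
P[1, :, 1] = 1.0   # state 1 absorbs
policy = optimal_policy(P, np.array([0.0, 1.0]))
```

Here `policy[0]` is action 0, i.e. the policy now heads for the new goal; feeding a policy derived this way into the LP should make the recovered reward follow the modified goal state.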
When I test the code, there is an error involving the super() function.
Is the code written for Python 3?
gw = gridworld.Gridworld(5, .3, .2)
gw.transition_probability[7,0,:].reshape(5,5)
outputs
array([
[ 0. , 0. , 0.075, 0. , 0. ],
[ 0. , 0.075, 0.075, 0.775, 0. ],
[ 0. , 0. , 0.075, 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ]])
But shouldn't this sum to 1?
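Yes, transition_probability[s, a, :] is the distribution p(s' | s, a), so each such slice should sum to 1; the values printed above sum to 1.075, which suggests either a transcription slip or a bug worth checking. A quick check one can run on any transition tensor (toy data here):

```python
import numpy as np

def rows_are_stochastic(P, tol=1e-8):
    """Check that P[s, a, :] sums to 1 for every state-action pair."""
    return np.allclose(P.sum(axis=-1), 1.0, atol=tol)

# Toy 2-state, 2-action transition tensor.
P = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.0, 1.0], [1.0, 0.0]]])
assert rows_are_stochastic(P)
```

Running the same check on the repo's gw.transition_probability would settle whether the printout or the tensor is at fault.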
Hello! I am a graduate student from China. In your demo the feature_matrix is not provided, so given a state space, how do I obtain the feature_matrix? I would be grateful if you could answer.
Apart from that, I have some questions about the details of inverse reinforcement learning. If you could provide your contact information, I would really appreciate it.
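If it helps, for these tabular demos the simplest feature matrix is the identity: each state gets a one-hot feature vector, so the learned weight vector is the per-state reward itself. A minimal sketch (my assumption about the intended usage, not code from the repo):

```python
import numpy as np

n_states = 25                       # e.g. a 5x5 gridworld
feature_matrix = np.eye(n_states)   # one-hot ("ident") feature per state
```

Any (n_states, n_features) matrix works in principle; the identity is just the fully expressive default.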
Hi Matthew!
This repo is just great: it works, and it's transparent and modular!
I only found two differences between Ziebart's thesis and your implementation.
Can you let me know if you were aware of them?
Here is your code:
And here is Eq 9.1:
Which uses
And here is your code:
You include a discount factor in Eq 9.2, and in 9.1 you convert a subtraction (
I am currently running the code and getting an error.
I reshaped the feature matrix, reshaped_to_2d_reshape, to (num_of_states, num_of_dimensions) and am still getting this error.
I am not sure how to debug it.
Traceback (most recent call last):
File "ID_grouping.py", line 409, in
learning_rate,initialisation="normal",l1=0.1,l2=0.1))
File "ID_grouping.py", line 398, in irl
reward = train(reshaped_to_2d_reshape[0])
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\compile\function_module.py", line 813, in call
allow_downcast=s.allow_downcast)
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\tensor\type.py", line 178, in filter
data.shape))
TypeError: Bad input argument to theano function with name "ID_grouping.py:388" at index 0 (0-based).
Backtrace when that variable is created:
File "ID_grouping.py", line 409, in
learning_rate,initialisation="normal",l1=0.1,l2=0.1))
File "ID_grouping.py", line 316, in irl
s_feature_matrix = T.matrix("x")
Wrong number of dimensions: expected 2, got 1 with shape (5,).
Which versions of Theano and theano.tensor do you use in your code?
This occurs with deep_maxent.py
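For what it's worth, T.matrix("x") declares a strictly 2-D input, while reshaped_to_2d_reshape[0] selects a single row, i.e. a 1-D vector of shape (5,), which is exactly what the error reports. A hypothetical fix is to keep the input 2-D:

```python
import numpy as np

x = np.arange(5, dtype=float)   # shape (5,): what the error reports
x2d = x.reshape(1, -1)          # shape (1, 5): a proper 2-D matrix
```

Equivalently, pass the whole (num_of_states, num_of_dimensions) matrix rather than one indexed row.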
Could you list some references that would help me understand how you formulated the block-matrix form of the linear program for solving the large-state-space problem from Ng's paper? In particular, I am missing how you treat the function p(x) = x if x > 0, 2x if x < 0, which is part of the objective function.
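If it helps, the standard LP trick for a concave piecewise-linear term like p(x) = min(x, 2x) is to introduce an auxiliary variable t with constraints t <= x and t <= 2x and maximize t; at the optimum, t equals min(x, 2x). A toy illustration with scipy (the repo uses cvxopt, but the linearization is the same):

```python
from scipy.optimize import linprog

# Maximize p(x) = min(x, 2x) for x in [-1, 1] via variables (x, t):
#   maximize t   subject to   t <= x,  t <= 2x.
c = [0.0, -1.0]                    # linprog minimizes, so minimize -t
A_ub = [[-1.0, 1.0],               # t - x  <= 0
        [-2.0, 1.0]]               # t - 2x <= 0
b_ub = [0.0, 0.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(-1, 1), (None, None)])
```

The optimum lands at x = 1, t = min(1, 2) = 1. Stacking one such (t, constraints) pair per state is presumably what produces the block-matrix form in the code.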
I need help to try to remove this warning:
WARNING (theano.configdefaults): g++ not available, if using conda: conda install m2w64-toolchain
C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\configdefaults.py:560: UserWarning: DeprecationWarning: there is no c++ compiler. This is deprecated and with Theano 0.11 a c++ compiler will be mandatory
warnings.warn("DeprecationWarning: there is no c++ compiler."
WARNING (theano.configdefaults): g++ not detected! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flag cxx to an empty string.
My code is working, but it takes a long time to produce output.
Can you give some pointers on how you designed the feature matrix? You kept it as 25x25 (in the case of Gridworld), where each state is represented separately. My understanding is that states should be grouped according to their characteristics (e.g. goal states, ground states, puddle states). Why did you characterize each state separately?
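For comparison, here is a hypothetical grouped-feature matrix of the kind described above, as an alternative to the identity choice: states sharing a class share a reward weight, which trades expressiveness for generalization (the class assignments below are made up):

```python
import numpy as np

n_states, n_classes = 25, 3
state_class = np.zeros(n_states, dtype=int)   # class 0: ordinary ground
state_class[[6, 7, 8]] = 1                    # class 1: hypothetical "puddle" states
state_class[24] = 2                           # class 2: the goal
grouped_features = np.eye(n_classes)[state_class]   # shape (25, 3)
```

With the 25x25 identity, IRL can assign every state its own reward; with the grouped (25, 3) matrix, it only learns three weights but generalizes across states of the same type.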
Hello,
The report link in README.md is broken; it would be great for me and future readers if it were updated so we can read how the algorithm works.
Thanks.
I was wondering why irl/maxent.py line 71 unpacks three things from a 2-D array. I think this is a small error?
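In the repo, transition_probability is (as far as I can tell) a 3-D (n_states, n_actions, n_states) array, so unpacking three values from its .shape works; a ValueError would only appear if a 2-D array reached that line. A quick illustration:

```python
import numpy as np

P = np.zeros((5, 4, 5))                    # (n_states, n_actions, n_states)
n_states, n_actions, _ = P.shape           # fine: three values from a 3-D shape

try:
    a, b, c = np.zeros((5, 4)).shape       # a 2-D shape has only two values
except ValueError:
    unpack_failed = True
```

So if you are hitting this, it may be worth checking the dimensionality of the array you are passing in.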
File "../irl/mdp/objectworld.py", line 56, in init
super().init(grid_size, wind, discount)
TypeError: super() takes at least 1 argument (0 given)
Could anyone give any help?
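For reference, zero-argument super() only exists in Python 3; under Python 2 it raises exactly this TypeError, so the repo is presumably being run on the wrong interpreter. A sketch of the backwards-compatible form, using simplified stand-in classes rather than the repo's real ones:

```python
class Gridworld(object):
    def __init__(self, grid_size, wind, discount):
        self.grid_size = grid_size
        self.wind = wind
        self.discount = discount

class Objectworld(Gridworld):
    def __init__(self, grid_size, wind, discount):
        # Zero-argument super() is Python-3-only; the explicit form
        # below works on both Python 2 and Python 3.
        super(Objectworld, self).__init__(grid_size, wind, discount)

ow = Objectworld(5, 0.3, 0.2)
```

Either run the code under Python 3 or rewrite the super() calls in this explicit style.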