inverse-reinforcement-learning's Issues

How to deal with non-tabular environments?

The GridWorld and ObjectWorld environments are both tabular: their states are discrete and finite, so we can easily write down the feature matrix by listing every possible state.
However, in more complicated non-tabular environments (such as a Super Mario game), it is impossible to represent the feature matrix by explicitly listing all possible states, since the states are continuous (e.g. any frame of the game at time t) and infinite.
So how can we apply inverse reinforcement learning to a non-tabular environment like Super Mario? Does anyone have any ideas?
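
A common workaround, sketched below with hypothetical names (this is not code from this repo): replace the enumerated feature matrix with a feature function φ(s) that maps any observation to a fixed-length vector, then learn a parametric reward over those features.

    import numpy as np

    def feature_fn(frame):
        # Hypothetical hand-crafted features for a continuous observation
        # (e.g. a game frame as a 2-D array): mean and standard deviation
        # of pixel intensities. Real applications would use richer
        # features or a convolutional network over the raw pixels.
        frame = np.asarray(frame, dtype=float)
        return np.array([frame.mean(), frame.std()])

    def reward(frame, theta):
        # Linear reward r(s) = theta . phi(s); deep IRL variants replace
        # this dot product with a neural network.
        return feature_fn(frame) @ theta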

MaxEnt Efficient State Frequency Calculation

According to Ziebart's paper, the equation that updates the state visit frequency is as follows:
$$D_{s_k, t+1} = \sum_{s_i} \sum_{a_{i,j}} D_{s_i, t}\, P(a_{i,j} \mid s_i)\, P(s_k \mid a_{i,j}, s_i)$$
So, I think the implementation should be:
    expected_svf[i, t] += (expected_svf[k, t-1] *
                           policy[i, j] *  # Stochastic policy
                           transition_probability[i, j, k])
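
For reference, here is a minimal NumPy sketch of that forward pass (my own code, with indices matching the equation, not the repo's exact implementation):

    import numpy as np

    def expected_svf(transition_probability, policy, p_start, T):
        # Propagate visitation mass D through the stochastic policy and
        # the dynamics, then sum over time.
        # transition_probability: (n_states, n_actions, n_states)
        # policy: (n_states, n_actions); p_start: (n_states,)
        n_states = p_start.shape[0]
        d = np.zeros((n_states, T))
        d[:, 0] = p_start
        for t in range(1, T):
            for k in range(n_states):
                d[k, t] = np.sum(d[:, t - 1, None] * policy
                                 * transition_probability[:, :, k])
        return d.sum(axis=1)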

maxent seems to be using max instead of softmax for V_soft?

In the backwards pass of MaxEnt (Algorithm 9.1 in Brian Ziebart's thesis), the soft value function V is updated with a softmax, but maxent.py seems to call value_iteration.optimal_value, which computes the hard value function, i.e. it uses max instead of softmax. This looks like a bug.

The initialization also seems odd: at least for the gridworld setting, only the final state should be initialized to 0 while all others should be -infinity, but value_iteration.optimal_value seems to initialize everything to 0. Is there a reason for this discrepancy?

Code for reference: https://github.com/MatthewJA/Inverse-Reinforcement-Learning/blob/master/irl/value_iteration.py#L63
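
For comparison, a minimal sketch of the soft backup described above (my own code, assuming a per-state reward; Ziebart's version additionally initialises non-terminal states to -infinity rather than 0):

    import numpy as np
    from scipy.special import logsumexp

    def soft_value_iteration(transition_probability, reward, discount,
                             iters=100):
        # Soft (MaxEnt) backup: V(s) = logsumexp_a Q(s, a), i.e. a softmax
        # in place of the hard max used by value_iteration.optimal_value.
        n_states, n_actions, _ = transition_probability.shape
        v = np.zeros(n_states)
        for _ in range(iters):
            q = reward[:, None] + discount * (
                transition_probability
                .reshape(n_states * n_actions, n_states)
                .dot(v)
                .reshape(n_states, n_actions))
            v = logsumexp(q, axis=1)
        return v, q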

Code help

I am running into an issue with this part of the code: φ = T.nnet.sigmoid(th.compile.ops.Rebroadcast((0, False), (1, True))(b) + W.dot(φs[-1])).
Traceback (most recent call last):
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\gof\vm.py", line 301, in call
thunk()
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\gof\op.py", line 892, in rval
r = p(n, [x[0] for x in i], o)
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\tensor\blas.py", line 1552, in perform
z[0] = np.asarray(np.dot(x, y))
ValueError: ('shapes (3,10) and (2,41) not aligned: 10 (dim 1) != 2 (dim 0)', (3, 10), (2, 41))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "ID_grouping.py", line 363, in
learning_rate,initialisation="normal",l1=0.1,l2=0.1))
File "ID_grouping.py", line 355, in irl
reward = train(reshaped_to_2d)
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\compile\function_module.py", line 903, in call
self.fn() if output_subset is None else
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\gof\vm.py", line 305, in call
link.raise_with_op(node, thunk)
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\gof\link.py", line 325, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\six.py", line 702, in reraise
raise value.with_traceback(tb)
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\gof\vm.py", line 301, in call
thunk()
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\gof\op.py", line 892, in rval
r = p(n, [x[0] for x in i], o)
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\tensor\blas.py", line 1552, in perform
z[0] = np.asarray(np.dot(x, y))
ValueError: ('shapes (3,10) and (2,41) not aligned: 10 (dim 1) != 2 (dim 0)', (3, 10), (2, 41))
Apply node that caused the error: Dot22(W, x.T)
Toposort index: 28
Inputs types: [TensorType(float64, matrix), TensorType(float64, matrix)]
Inputs shapes: [(3, 10), (2, 41)]
Inputs strides: [(80, 8), (8, 16)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Elemwise{Composite{scalar_sigmoid((i0 + i1))}}[(0, 1)](Rebroadcast{?,1}.0, Dot22.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
File "ID_grouping.py", line 363, in
learning_rate,initialisation="normal",l1=0.1,l2=0.1))
File "ID_grouping.py", line 307, in irl
φ = T.nnet.sigmoid(th.compile.ops.Rebroadcast((0, False), (1, True))(b) + W.dot(φs[-1]))

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
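
For what it's worth, the failing op is Dot22(W, x.T) with W of shape (3, 10) and x.T of shape (2, 41): W's second axis (10 features) has to match the first axis of the transposed feature matrix, so the matrix being passed in appears to have 2 columns where the network expects 10. A quick sanity check with the shapes implied by the traceback (hypothetical names):

    import numpy as np

    feature_matrix = np.zeros((41, 10))   # (n_states, n_features)
    W = np.zeros((3, 10))                 # (n_hidden, n_features)
    print(W.dot(feature_matrix.T).shape)  # (3, 41) -- dimensions align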

Unexpected reward estimate

Hello, thanks for sharing your code. I'm running examples/lp_gridworld.py and seeing this reward estimate, which looks good:

[Screenshot 2023-06-13 19:21: reward estimate from examples/lp_gridworld.py]

However, when I change the body of gridworld.reward to e.g.:

    def reward(self, state_int):
        if state_int == 2:  # Goal state now in bottom right of 3x3, not top right
            return 1
        return 0

... then I see this reward estimate:

[Screenshot 2023-06-13 19:26: reward estimate after changing the goal state]

i.e. linear_irl.irl seems to assume that the 'goal state' is in the top right. Have I got something wrong? How can I get linear IRL to work with different goal states? Thanks.
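
One thing worth checking (I am guessing at the cause here): if the example builds the expert policy from the gridworld's built-in optimal policy, which appears to be hard-coded for the original top-right goal, then editing reward() alone will not change the demonstrations the LP sees. Below is a sketch of recomputing the policy from the edited reward first, with find_policy and the irl signature assumed from the repo's examples (gw is the Gridworld instance from the script):

    import numpy as np
    from irl import linear_irl, value_iteration

    # Rebuild the expert policy from the *edited* reward so the
    # demonstrations reflect the new goal (signatures assumed).
    ground_r = np.array([gw.reward(s) for s in range(gw.n_states)])
    policy = value_iteration.find_policy(gw.n_states, gw.n_actions,
                                         gw.transition_probability,
                                         ground_r, gw.discount,
                                         stochastic=False)
    r = linear_irl.irl(gw.n_states, gw.n_actions,
                       gw.transition_probability, policy,
                       gw.discount, 1, 5)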

Py2 or Py3?

When I test the code, I get an error from the super() function.
Is the code implemented in Python 3?

sum of Gridworld.transition_probability is not 1

gw = gridworld.Gridworld(5, .3, .2)
gw.transition_probability[7,0,:].reshape(5,5)

outputs

array([
[ 0. , 0. , 0.075, 0. , 0. ],
[ 0. , 0.075, 0.075, 0.775, 0. ],
[ 0. , 0. , 0.075, 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ]])

But shouldn't this sum to 1?
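
A quick way to check normalisation over the whole matrix (assuming the (n_states, n_actions, n_states) layout used above):

    import numpy as np
    from irl.mdp import gridworld

    gw = gridworld.Gridworld(5, .3, .2)
    # Sum over destination states for every (state, action) pair; each
    # entry should be exactly 1 if the rows are proper distributions.
    sums = gw.transition_probability.sum(axis=2)
    print(np.allclose(sums, 1), sums[7, 0])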

About feature_matrix

Hello! I am a graduate student from China. In your demo the feature matrix is not provided, so given a state space, how does one obtain the feature matrix? I would be grateful if you could answer.
Apart from that, I have some questions about the details of inverse reinforcement learning. If you could provide your contact information, I would very much appreciate it.
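
On the first question: for a tabular environment, the simplest answer is one indicator feature per state, so the feature matrix is just an identity matrix; if I remember correctly, the repo's Gridworld.feature_matrix() defaults to exactly this ("ident") choice. A sketch:

    import numpy as np

    def feature_matrix(n_states):
        # One-hot ("ident") features: state i is represented by the i-th
        # standard basis vector, giving an n_states x n_states matrix.
        return np.eye(n_states)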

Are Ziebart's thesis, equation 9.2 and find_policy() function the same?

Hi Matthew!

This repo is just great: it works, and it's transparent and modular!

I only found two differences between Ziebart's thesis and your implementation.
Can you let me know if you were aware of them?

So here is Eq 9.2:
[Screenshot 2022-06-07: Eq. 9.2 from Ziebart's thesis]

Here is your code:
[Screenshot 2022-06-07: the corresponding find_policy() code]

And here is Eq 9.1:
[Screenshot 2022-06-07: Eq. 9.1 from Ziebart's thesis]
Which uses $V^{\text{soft}}$:
[Screenshot 2022-06-07: the definition of $V^{\text{soft}}$]

And here is your code:
[Screenshot 2022-06-07: the corresponding code]

You add a discount factor to Eq 9.2, and in Eq 9.1 you convert a subtraction ($Q^{\text{soft}} - V^{\text{soft}}$) into a fraction ($\frac{Q^{\text{soft}}}{V^{\text{soft}}}$), correct?
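
On the second point: the subtraction and the fraction agree once exponentiated, so if the code works with $\exp Q^{\text{soft}}$ and $\exp V^{\text{soft}}$ directly (rather than the log-space quantities), the two forms are equivalent:

$$\pi(a \mid s) = \exp\big(Q^{\text{soft}}(s,a) - V^{\text{soft}}(s)\big) = \frac{\exp Q^{\text{soft}}(s,a)}{\exp V^{\text{soft}}(s)}$$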

Theano package error

I am currently running the code and getting an error.
I reshaped the feature matrix, reshaped_to_2d_reshape, to (num_of_states, num_of_dimensions)
and am still getting this error.
I am not sure how to debug it.
Traceback (most recent call last):
File "ID_grouping.py", line 409, in
learning_rate,initialisation="normal",l1=0.1,l2=0.1))
File "ID_grouping.py", line 398, in irl
reward = train(reshaped_to_2d_reshape[0])
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\compile\function_module.py", line 813, in call
allow_downcast=s.allow_downcast)
File "C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\tensor\type.py", line 178, in filter
data.shape))
TypeError: Bad input argument to theano function with name "ID_grouping.py:388" at index 0 (0-based).
Backtrace when that variable is created:

File "ID_grouping.py", line 409, in
learning_rate,initialisation="normal",l1=0.1,l2=0.1))
File "ID_grouping.py", line 316, in irl
s_feature_matrix = T.matrix("x")
Wrong number of dimensions: expected 2, got 1 with shape (5,).
Which versions of theano and theano.tensor do you use in your code?
This occurs with deep_maxent.py
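
The traceback points at train(reshaped_to_2d_reshape[0]): indexing with [0] drops a dimension, so the compiled function, whose input was declared as T.matrix (2-D), receives a 1-D vector of shape (5,). A sketch of the fix, reusing the names from the traceback:

    # Pass the whole 2-D matrix...
    reward = train(reshaped_to_2d_reshape)        # shape (n_states, n_dims)
    # ...or keep two dimensions when slicing out one row:
    reward = train(reshaped_to_2d_reshape[0:1])   # shape (1, n_dims)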

IRL large state space

I would like to ask whether you can list some references to help me understand how you formulated the block-matrix form of the linear program for the large-state-space problem from Ng's paper. In particular, I am missing how you handle the function p(x) = x if x > 0, and p(x) = 2x if x < 0, that appears in the objective function.
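
For anyone else stuck here: since $p(x) = x$ for $x \ge 0$ and $p(x) = 2x$ for $x < 0$, we can write $p(x) = \min(x, 2x)$, and a minimum inside a maximisation linearises with one auxiliary variable per term; I assume this is what the block-matrix form encodes:

$$\max \sum_i p(x_i) \;\equiv\; \max \sum_i t_i \quad \text{s.t.} \quad t_i \le x_i,\; t_i \le 2x_i$$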

Theano package help

I need help removing this warning:

    WARNING (theano.configdefaults): g++ not available, if using conda: conda install m2w64-toolchain
    C:\Users\Sankalp Chauhan\AppData\Local\Programs\Python\Python37\lib\site-packages\theano\configdefaults.py:560: UserWarning: DeprecationWarning: there is no c++ compiler. This is deprecated and with Theano 0.11 a c++ compiler will be mandatory
    warnings.warn("DeprecationWarning: there is no c++ compiler."
    WARNING (theano.configdefaults): g++ not detected! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.

My code is working, but it takes a long time to produce output.

Feature matrix

Can you give some pointers on how you designed the feature matrix? You kept it as 25×25 (in the case of Gridworld), where each state is represented separately. According to my understanding, states should be grouped together according to their characteristics (e.g. goal states, ground states, puddle states). So why did you characterize each state separately?
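
For contrast, here is a sketch of a feature map that does group states, by position rather than identity; if I recall correctly, the repo's Gridworld.feature_matrix() offers similar alternatives through its feature_map argument:

    import numpy as np

    def coord_feature_matrix(grid_size):
        # Describe each state by its (x, y) coordinates, so nearby states
        # share similar features instead of each state being represented
        # separately as in the identity (one-hot) feature matrix.
        return np.array([(i % grid_size, i // grid_size)
                         for i in range(grid_size ** 2)], dtype=float)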

Broken Report Link

Hello,
The report link in README.md is broken; it would be great for me and future readers if it were updated so we can read how the algorithm works.

Thanks.

super() takes at least 1 argument (0 given)

File "../irl/mdp/objectworld.py", line 56, in init
super().init(grid_size, wind, discount)
TypeError: super() takes at least 1 argument (0 given)

Could anyone give any help?
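
The zero-argument super() call is Python-3-only (which also answers the earlier "Py2 or Py3?" question). On Python 2, the call in objectworld.py needs the explicit form; the class name here is assumed from the file name:

    # Python 2 compatible; on Python 3, super().__init__(...) also works.
    super(Objectworld, self).__init__(grid_size, wind, discount)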
