
rl's People

Contributors

shmuma

rl's Issues

ADI doesn't converge

Hi,

I am trying to write my own implementation as a learning exercise (I use Keras, not PyTorch).
As I understand it, the value should ideally represent a "distance" from the goal state, like:

  • for the solved cube, 0.
  • for states which are neighbours of the solved state, 1.
  • for states which are 2 distant from the solved state, 0.
  • for states which are 3 distant from the solved state, -1.
  • for states which are 4 distant from the solved state, -2.
    etc.
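
The values listed above can be reproduced on a toy problem. The sketch below is not the cube and not the repository's code: it assumes a simple chain where state d is exactly d moves from solved, a reward of +1 for a move that lands on the solved state and -1 for any other move, and a solved state pinned to 0. The function name `ideal_values` is made up for illustration.

```python
def ideal_values(max_distance: int) -> list[int]:
    """Value iteration on a toy chain; index d means 'd moves from solved'."""
    values = [0] * (max_distance + 1)
    # max_distance + 1 synchronous sweeps are enough for this chain to settle.
    for _ in range(max_distance + 1):
        new_values = [0]  # the solved state is always labelled 0
        for d in range(1, max_distance + 1):
            # A state on the chain can move one step closer or one step further.
            neighbours = [d - 1] + ([d + 1] if d < max_distance else [])
            # Assumed reward: +1 when the move reaches solved, -1 otherwise.
            new_values.append(
                max((1 if n == 0 else -1) + values[n] for n in neighbours)
            )
        values = new_values
    return values


print(ideal_values(4))  # [0, 1, 0, -1, -2]
```

Under these assumptions the value at distance d >= 1 comes out as 2 - d, which matches the list.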

In my case the value sometimes increases the further the state really is from the solved state:

  • for the solved cube, 0. (this is forced to be labeled as 0 at all times, as in your solution)
  • for neighbours, 1. (I manage to force this to be 1 + V(solved state))
  • for 2 distant states, 2. (or an arbitrary value greater than 1)
  • for 3 distant states, 3.
    etc.

I think the problem is that there will always be a state which is not in the training data, so the solution above really does have a low squared error.
Consider a chain of states S1, S2, S3, ..., Sn in the training set, where consecutive states are neighbours and the index is the real (unknown) distance, i.e. the minimal number of rotations needed to solve the state.
Then V(S0) = 0, V(S1) = 1 (here the states fall apart), V(S2) = V(S3) - 1, V(S3) = V(S4) - 1, ..., V(Sn) = V(Sn+1) - 1 is a good solution: Sn+1 is not in the training set, so V(Sn+1) could be anything (e.g. 10000).
Here V(S) is the value predicted by the neural net.
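
To make the concern concrete, here is a minimal sketch on the same kind of toy chain (not the real cube, where each state has many more neighbours). It assumes the training target for a state is the best "reward + V(neighbour)" over its neighbours, with reward +1 for reaching the solved state and -1 otherwise. `bellman_targets` and `training_error` are hypothetical names. The last entry of `values` stands for Sn+1, which is outside the training set, so its prediction is never penalised.

```python
def bellman_targets(values):
    """Targets for S0..Sn; the final entry of `values` is the unseen S(n+1)."""
    n = len(values) - 2
    targets = [0.0]  # the solved state is forced to 0
    for d in range(1, n + 1):
        targets.append(
            max((1.0 if m == 0 else -1.0) + values[m] for m in (d - 1, d + 1))
        )
    return targets


def training_error(values):
    """Sum of squared errors over the training states only (S(n+1) is excluded)."""
    targets = bellman_targets(values)
    return sum((v - t) ** 2 for v, t in zip(values, targets))


ideal = [0, 1, 0, -1, -2, -3]      # values fall with distance
increasing = [0, 1, 2, 3, 4, 5]    # values grow with distance

print(training_error(ideal))       # 0.0
print(training_error(increasing))  # 0.0
```

Both assignments are self-consistent on this chain, because each target is bootstrapped from the network's own predictions. In the increasing case the error stays at zero only because the unseen Sn+1 is free to take the value that keeps the chain consistent.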

What am I missing? What prevents training from converging to the solution above? It should be penalized somehow...

Thank you

Weights at loss function

In train.py, when config.weight_samples == True (line 74):
Instead of the 'simple' value_loss_t = value_loss_t.mean(), shouldn't we divide value_loss_t.sum() by the sum of the weights (i.e. take a weighted average)?
I think we don't want the loss to be lower just because the weights in the batch were lower, i.e. because we sampled 'farther' states.
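
A small plain-Python sketch of the difference, using made-up numbers. It assumes the per-sample losses have already been multiplied by their weights before the reduction; the helper names `plain_mean` and `weighted_mean` are hypothetical and are not taken from train.py.

```python
def plain_mean(losses, weights):
    """Mean of the weighted per-sample losses (what .mean() would compute)."""
    weighted = [w * l for w, l in zip(weights, losses)]
    return sum(weighted) / len(weighted)


def weighted_mean(losses, weights):
    """Weighted average: sum of weighted losses divided by the sum of weights."""
    weighted = [w * l for w, l in zip(weights, losses)]
    return sum(weighted) / sum(weights)


losses = [4.0, 4.0]            # identical per-sample errors in both batches
near_weights = [1.0, 0.5]      # batch of states close to solved
far_weights = [0.25, 0.125]    # batch of 'farther' states, smaller weights

print(plain_mean(losses, near_weights), plain_mean(losses, far_weights))        # 3.0 0.75
print(weighted_mean(losses, near_weights), weighted_mean(losses, far_weights))  # 4.0 4.0
```

With the plain mean, the batch of farther states reports a lower loss although the per-sample errors are the same. The weighted average gives the same number for both batches.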

License

Hello,

I'm doing my thesis on Rubik's cube solving with DL, mainly based on this paper.
Your article and code have been very valuable so far and definitely made my life easier. Though, just to make sure (and because this repo doesn't have a license listed), is it okay if I use your code and cite your article in my thesis?

Thanks!
-- Evelyn
