
Replicating-DeepMind

Reproducing the results of "Playing Atari with Deep Reinforcement Learning" by DeepMind. All the information is in our Wiki.

Progress: The system is up and running on a GPU cluster with cuda-convnet2. It can learn to play better than random, but not much better yet :) It is rather fast, but still about 2x slower than DeepMind's original system. RMSprop is not implemented at the moment; that is our next goal.
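For reference, the RMSprop update mentioned above can be sketched as follows. This is a minimal, generic form of the rule, not the project's code; the learning rate, decay, and epsilon values are illustrative:

```python
import numpy as np

def rmsprop_update(w, grad, cache, lr=0.0002, decay=0.9, eps=1e-6):
    """One RMSprop step: divide the gradient by a running RMS of past gradients.

    `cache` holds the exponential moving average of squared gradients and must
    be carried over between calls (hyperparameters here are illustrative).
    """
    cache = decay * cache + (1 - decay) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache
```

Each parameter array gets its own `cache` array of the same shape, initialized to zeros before training starts.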

Note 1: You can also check out a popular-science article we wrote about the system for Robohub.

Note 2: Nathan Sprague has an implementation based on Theano. It can do fairly well. See his GitHub for more details.

replicating-deepmind's People

Contributors

kristjankorjus · kuz · neurocsut-gpu · rdtm · taivop · tambetm


replicating-deepmind's Issues

Decouple ALE from MemoryD

The ALE class should encapsulate only the connection with ALE and should be decoupled from MemoryD. In particular, all references to memory should be removed and moved to main.py.

Also, I see no reason why ALE should be in a separate directory; why not in src with main.py?

Use named fields for minibatch

Minibatch components are addressed using indices; they should be addressed by name. Also rename the variables in NeuralNet.train() to prestate and poststate.
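A minimal way to do this with the standard library (the field names below are suggestions, not the repo's existing names):

```python
from collections import namedtuple
import numpy as np

# Hypothetical named container for minibatch components; the prestate/poststate
# naming follows the renaming suggested above.
Minibatch = namedtuple('Minibatch', ['prestates', 'actions', 'rewards', 'poststates'])

batch = Minibatch(
    prestates=np.zeros((32, 4, 84, 84)),
    actions=np.zeros(32, dtype=int),
    rewards=np.zeros(32),
    poststates=np.zeros((32, 4, 84, 84)),
)

# Components are now addressed by name instead of by magic index:
rewards = batch.rewards
```

Code that previously did `minibatch[2]` would then read `minibatch.rewards`, which is self-documenting and robust to reordering.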

Question about function φ

I am still not clear about the function φ in Algorithm 1. It is obvious from the paper that, by using the function φ, the input to the Q-network is preprocessed into an 84×84×4 image. But how does it do that?

In Algorithm 1 we found that

s_{t+1} = s_t, a_t, x_{t+1}

and

φ_{t+1} = φ(s_{t+1})

This confuses me. What exactly is s_{t+1}? Does that mean:

s1 = x1
s2 = s1,a1,x2 = x1,a1,x2
s3 = s2,a2,x3 = x1,a1,x2,a2,x3
s4 = s3,a3,x4 = x1,a1,x2,a2,x3,a3,x4
......

So how does φ process s3, for instance? φ3 should equal φ(s3) = φ(x1,a1,x2,a2,x3)? I find this hard to understand.

I would appreciate it if anyone could help.
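For what it is worth: in the paper, φ does not need the whole growing sequence s_t. It preprocesses each screen image to 84×84 and stacks only the last 4 of them, so φ(s_3) ignores the actions and the older frames. A minimal sketch of that reading, assuming frames are already preprocessed to 84×84 (the padding-by-repetition at the start of an episode is one common convention, not something the paper spells out):

```python
import numpy as np

def phi(frames, history=4):
    """Stack the last `history` preprocessed 84x84 frames into an
    84x84x`history` input; when the episode is younger than `history`
    frames, pad by repeating the earliest available frame."""
    recent = list(frames[-history:])
    while len(recent) < history:
        recent = [recent[0]] + recent
    return np.stack(recent, axis=-1)
```

So φ(s_3) = stack(x1, x1, x2, x3) under this convention, and for t ≥ 4 it is simply the last four screens.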

Memory overwrite

The memory has a fixed length, so when we reach 100000 transitions in memory we need to start overwriting the first transitions. This is not implemented yet.

Attention should be paid to how transitions are extracted for a minibatch once part of the transitions have been overwritten. For example: if we have overwritten transitions up to position 10 in memory and the minibatch asks for transition nr 11, then the 3 previous "images" in memory no longer correspond to what actually happened before transition 11. So we either:
1) give a repetition of the 11th image instead of images 10, 9 and 8, as we do when transitions are requested at the beginning of a new game. The downside is that such a transition (the same image for 4 frames followed by an action) never actually takes place;
or
2) forbid the minibatch from sampling transitions at that location. Considering we have another 1M transitions to choose from, forbidding the selection of 3 of them seems like no problem.
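Option 2 can be sketched with a circular buffer whose sampler rejects indices whose history has been overwritten. This is a stand-in for illustration, not the actual MemoryD code:

```python
import random

class CircularMemory:
    """Fixed-size transition memory that overwrites the oldest entries
    (illustrative stand-in for MemoryD)."""

    def __init__(self, size, history=4):
        self.size = size
        self.history = history
        self.data = [None] * size
        self.count = 0                          # transitions added so far

    def add(self, transition):
        self.data[self.count % self.size] = transition
        self.count += 1

    def get(self, i):
        return self.data[i % self.size]

    def sample_index(self):
        """Option 2 above: draw until the index and its `history - 1`
        predecessors are all still intact. Assumes at least `history`
        transitions have been added."""
        while True:
            lo = max(0, self.count - self.size)  # oldest surviving transition
            i = random.randrange(lo, self.count)
            if i - (self.history - 1) >= lo:     # predecessors not overwritten
                return i
```

Since at most `history - 1` indices per draw window are rejected, the resampling loop almost always terminates on the first try.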

Simplify NeuralNet interface

The NeuralNet class should have only the methods train() and predict(); everything else (in particular predict_best_action() and minibatch processing) should be moved to main.py. NeuralNet should be a simple wrapper around ConvNet; this would allow it to be used in other projects too.

Define ALE and cuda-convnet2 as submodules of DeepMind

ALE and cuda-convnet2 should be defined as submodules of DeepMind. This way you get the latest versions of both ALE and cuda-convnet2 when doing a checkout. Also, we can push our fixes directly to those projects.
http://stackoverflow.com/questions/5252450/using-someone-elses-repo-as-a-git-submodule-on-github

What to do with the cuda-convnet2 patches? We can leave them as manual work, prepare them as patch files to be applied automatically during make, or hope that they are included in the next release.
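A sketch of the submodule setup (the upstream URLs and target paths below are illustrative; substitute the forks that carry our patches):

```shell
# Register the two projects as submodules of this repository
git submodule add https://github.com/mgbellemare/Arcade-Learning-Environment ale
git submodule add https://github.com/akrizhevsky/cuda-convnet2 cuda-convnet2
git commit -m "Add ALE and cuda-convnet2 as submodules"

# A fresh clone then picks both up with:
git clone --recursive <repository-url>
# or, inside an existing checkout:
git submodule update --init
```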

Saving/loading network

We need to create a function in main.py that saves the learned network parameters to a file after a desired number of games has been played.

We need to add a constructor to NeuralNet that builds the neural net from given weight values.

AI

Wait a minute... are you saying you have copied DeepMind's functionality and written your own Atari game agent?

Increment frames_played

I downloaded this about a month ago and ran it on my GPU.

Is frames_played incremented anywhere? I was printing out the value of epsilon on every frame and it seemed to stay at 0.9.
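If frames_played is never incremented, a linearly annealed epsilon would indeed stay frozen at its starting value. For reference, the paper anneals epsilon linearly from 1.0 to 0.1 over the first million frames; a minimal sketch of that schedule (the function name is ours, and the counter must be incremented once per frame in the main loop):

```python
def epsilon(frames_played, start=1.0, end=0.1, anneal_frames=1_000_000):
    """Linearly anneal the exploration rate from `start` to `end` over
    `anneal_frames` frames, then hold it constant at `end`."""
    if frames_played >= anneal_frames:
        return end
    return start + (end - start) * frames_played / anneal_frames
```

Without a `frames_played += 1` somewhere in the per-frame loop, this function is always called with the same argument and epsilon never decays.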

Weight initialization

The weights should be initialized so that the initial expected-reward values (when feeding input to the initial network) are of the same order of magnitude as, or rather a few orders of magnitude smaller than, the reward we give when a tile is broken (reward=1). At the moment, the rewards from the randomly initialized network go as far as -200 or +200.
We need to decrease the weight values, because then adding a reward of 1 to a desired transition/state would really make us choose that same transition next time.

This should be done in the constructors of the individual layers (the way we initialize W and B).

Also, biases are all initialized to zero at the moment; we need to change that.
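A sketch of small-scale initialization for a single layer. The scale values are illustrative, not tuned; the point is only that tiny weights keep the initial Q-value outputs far below reward=1:

```python
import numpy as np

_rng = np.random.default_rng(0)

def init_layer(n_in, n_out, weight_scale=1e-4, bias_value=0.1):
    """Small Gaussian weights plus a small constant bias (both scales are
    illustrative assumptions, not the repo's chosen values)."""
    W = _rng.normal(0.0, weight_scale, size=(n_in, n_out))
    b = np.full(n_out, bias_value)   # small nonzero bias instead of all zeros
    return W, b
```

With weights of order 1e-4, the network's initial outputs stay orders of magnitude below 1, so a single observed reward can actually dominate the predicted value.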

Write tests for MemoryD

The MemoryD class contains some non-trivial logic. As it is a completely independent class, it should be easy to write tests for it. This would ensure that it works correctly. Maybe do some profiling as well.

Pre-processing too slow

For each frame, the loop in preprocessor.py must run 210×160 times, which is a bit inefficient:

Fill the PIL image object with the correct pixel values

    for i in range(len(image_string)/2):
        num_rows = i % width
        num_cols = i / width
        hex1 = int(image_string[i*2], 16)

        # Division by 2 because: http://en.wikipedia.org/wiki/List_of_video_game_console_palettes
        hex2 = int(image_string[i*2+1], 16)/2
        gray_val = int(arr[hex2, hex1])
        pixels[num_rows, num_cols] = (gray_val, gray_val, gray_val)

    # Crop and downscale image
    roi = (0, 33, 160, 193)  # region of interest is lines 33 to 193
    img = img.crop(roi)
    new_size = 84, 84
    img.thumbnail(new_size)
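The per-pixel loop can be replaced by whole-array NumPy operations. A hedged sketch, assuming the raw frame has already been parsed into a (210, 160) array of Atari palette bytes (hue in the first hex digit, luminance in the second, halved as in the loop above) and that `palette` is the same grayscale lookup table as `arr`:

```python
import numpy as np

def preprocess(frame, palette):
    """Vectorized stand-in for the per-pixel loop: palette lookup, crop to
    the region of interest, nearest-neighbour downscale to 84x84."""
    hue = frame >> 4                 # first hex digit, like hex1 in the loop
    lum = (frame & 0x0F) // 2        # second hex digit halved, like hex2
    gray = palette[lum, hue]         # one table lookup for all pixels at once

    gray = gray[33:193, :]           # region of interest: rows 33 to 192

    # Nearest-neighbour downscale from 160x160 to 84x84 via index arrays
    idx = (np.arange(84) * 160) // 84
    return gray[np.ix_(idx, idx)]
```

The palette lookup and crop match the loop; the downscaling is plain nearest-neighbour rather than PIL's antialiased thumbnail, so treat the last step as an approximation.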
