torch-twrl's Introduction

Join the chat at https://gitter.im/torch-twrl/Lobby

torch-twrl: Reinforcement Learning in Torch

torch-twrl is an RL framework built in Lua/Torch by Twitter.

Installation

Install torch

git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh

Install torch-twrl

git clone --recursive https://github.com/twitter/torch-twrl.git
cd torch-twrl
luarocks make

Want to play in the gym?

  1. Start a virtual environment (optional, but it helps keep your installation clean)

  2. Download and install OpenAI Gym, gym-http-api requirements, and ffmpeg

pip install virtualenv
virtualenv venv
source venv/bin/activate
pip install gym
pip install -r src/gym-http-api/requirements.txt
brew install ffmpeg

Works so far?

You should have everything you need:

  • Start your gym_http_server with
python src/gym-http-api/gym_http_server.py
  • In a new console window (or tab), run the example script (policy gradient agent in environment CartPole-v0)
cd examples
chmod u+x cartpole-pg.sh
./cartpole-pg.sh

This script sets the parameters for the experiment. In detail, here is what it calls:

th run.lua \
	-env 'CartPole-v0' \
	-policy categorical \
	-learningUpdate reinforce \
	-model mlp \
	-optimAlpha 0.9 \
	-timestepsPerBatch 1000 \
	-stepsizeStart 0.3 \
	-gamma 1 \
	-nHiddenLayerSize 10 \
	-gradClip 5 \
	-baselineType padTimeDepAvReturn \
	-beta 0.01 \
	-weightDecay 0 \
	-windowSize 10 \
	-nSteps 1000 \
	-nIterations 1000 \
	-video 100 \
	-optimType rmsprop \
	-verboseUpdate true \
	-uploadResults false \
	-renderAllSteps false
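
To make a couple of these flags more concrete, here is a minimal sketch, in Python rather than the Lua that torch-twrl itself uses, of how discounted returns (-gamma) and a padded, time-dependent average-return baseline could feed a REINFORCE update. Reading -baselineType padTimeDepAvReturn this way is an assumption based on the flag name alone, and the function names are hypothetical; none of this is torch-twrl's actual code.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def padded_time_baseline(episode_returns):
    """Average return at each timestep, padding shorter episodes with zeros."""
    horizon = max(len(r) for r in episode_returns)
    baseline = []
    for t in range(horizon):
        total = sum(r[t] if t < len(r) else 0.0 for r in episode_returns)
        baseline.append(total / len(episode_returns))
    return baseline


def advantages(batch_rewards, gamma=1.0):
    """Per-step advantage: discounted return minus the time-dependent baseline."""
    returns = [discounted_returns(r, gamma) for r in batch_rewards]
    baseline = padded_time_baseline(returns)
    return [[g - baseline[t] for t, g in enumerate(ep)] for ep in returns]


if __name__ == "__main__":
    # Two CartPole-like episodes with a reward of 1 per step.
    batch = [[1.0, 1.0, 1.0], [1.0]]
    print(advantages(batch, gamma=1.0))
```

With -gamma 1, as in the script, the return at each step is simply the number of remaining steps in a CartPole episode, so longer episodes get positive advantages and shorter ones negative.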

Your results should look something like our results on the OpenAI Gym leaderboard.

Doesn't work?

  1. Test the gym-http-api
cd src/gym-http-api/
nose2
  2. Start a Gym HTTP server in your virtual environment (from the torch-twrl root)
python src/gym-http-api/gym_http_server.py
  3. In a new console window (or tab), run torch-twrl tests
luarocks make; th test/test.lua

Dependencies

Testing RL algorithms during development is tricky: it requires well-established, unified baselines and a large community of active developers. The OpenAI Gym provides a great set of example environments for this purpose. Link: https://github.com/openai/gym

The OpenAI Gym is written in Python, and it expects the algorithms that interact with its environments to be written in Python as well. torch-twrl works with the OpenAI Gym through the Gym HTTP API from OpenAI; gym-http-api is a submodule of torch-twrl.
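
For a rough picture of what goes over the wire, the Python sketch below builds (but does not send) the kind of requests a client makes to the Gym HTTP server. The routes and JSON fields follow our understanding of the gym-http-api README, and the address is the assumed default for gym_http_server.py; treat all of them as assumptions to check against the submodule. The function names are hypothetical and are not the names used in the Lua binding.

```python
import json

BASE_URL = "http://127.0.0.1:5000"  # assumed default address of gym_http_server.py


def create_env_request(env_id):
    """Describe the request that asks the server for a new environment instance."""
    return ("POST", BASE_URL + "/v1/envs/", json.dumps({"env_id": env_id}))


def reset_request(instance_id):
    """Describe the request that resets an instance to its first observation."""
    return ("POST", "%s/v1/envs/%s/reset/" % (BASE_URL, instance_id), None)


def step_request(instance_id, action, render=False):
    """Describe the request that applies one action to an instance."""
    body = json.dumps({"action": action, "render": render})
    return ("POST", "%s/v1/envs/%s/step/" % (BASE_URL, instance_id), body)


if __name__ == "__main__":
    # "abc123" stands in for the instance id the server would return.
    for request in (create_env_request("CartPole-v0"),
                    reset_request("abc123"),
                    step_request("abc123", 0)):
        print(request)
```

Every call after the first depends on the instance id returned when the environment is created, which is why a failed create request shows up later as a nil instance id on the client side.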

All Lua dependencies should be installed on your first build.

Note: if you make changes, you will need to recompile with

luarocks make

Agents

torch-twrl implements several agents; they are located in src/agents. Each agent is defined by a model, a policy, and a learning update.

  • Random
    • model: noModel
    • policy: random
    • learningUpdate: noLearning
  • TD(Lambda)
    • model: qFunction
    • policy: egreedy
    • learningUpdate: tdLambda - implements temporal difference (Q-learning or SARSA) learning with eligibility traces (replacing or accumulating)
  • Policy Gradient (Williams, 1992):
    • model: mlp - multilayer perceptron; final layer: tanh for continuous, softmax for discrete
    • policy: stochasticModelPolicy - normal for continuous actions, categorical for discrete
    • learningUpdate: reinforce
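
As an illustration of the TD(Lambda) agent's ingredients, here is a minimal Python sketch of an epsilon-greedy policy and one SARSA(lambda) update with replacing or accumulating eligibility traces. It assumes a small tabular Q-function for readability, whereas the agent above pairs qFunction with tile-coded features; the function names and default step sizes are hypothetical, not torch-twrl's.

```python
import random


def egreedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_lambda_update(q, traces, s, a, reward, s_next, a_next, done,
                     alpha=0.1, gamma=1.0, lam=0.9, replacing=True):
    """One SARSA(lambda) step on a tabular q[state][action], updated in place."""
    target = reward if done else reward + gamma * q[s_next][a_next]
    delta = target - q[s][a]
    # Replacing traces reset the visited pair to 1; accumulating traces add 1.
    traces[s][a] = 1.0 if replacing else traces[s][a] + 1.0
    for state in range(len(q)):
        for action in range(len(q[state])):
            q[state][action] += alpha * delta * traces[state][action]
            traces[state][action] *= gamma * lam  # decay every trace
    return delta


if __name__ == "__main__":
    q = [[0.0, 0.0], [0.0, 0.0]]
    traces = [[0.0, 0.0], [0.0, 0.0]]
    td_lambda_update(q, traces, s=0, a=1, reward=1.0, s_next=1, a_next=0, done=False)
    print(q, traces)
```

Swapping the target for the maximum over next-state action values would give a Q-learning style update instead of SARSA.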

Important note about agent/environment compatibility:

The OpenAI Gym has many environments and is continuously growing. Some agents may be compatible with only a subset of environments. That is, an agent built for continuous action space environments may not work if the environment expects discrete action spaces.
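
A quick check along these lines can catch a mismatch before an experiment starts. The Python sketch below maps the two common Gym action-space types, Discrete and Box, to the policy families named in the agent list above; the helper names are hypothetical and other space types are deliberately rejected.

```python
def compatible_policy(action_space_name):
    """Map a Gym action-space type to the policy family named in the agent list."""
    policies = {"Discrete": "categorical", "Box": "normal"}
    if action_space_name not in policies:
        raise ValueError("unsupported action space: %s" % action_space_name)
    return policies[action_space_name]


def check_agent(policy, action_space_name):
    """Return True when the chosen policy matches the environment's action space."""
    return compatible_policy(action_space_name) == policy


if __name__ == "__main__":
    print(check_agent("categorical", "Discrete"))
    print(check_agent("normal", "Discrete"))
```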

Here is a useful table of the environments, with details on the different variables that may help to configure agents appropriately.

Testing details:

Continuous integration is accomplished by building with Travis. Testing is done with LUAJIT21, LUA51 and LUA52 with compilers gcc and clang.

Tests are defined in the test directory, with a basic unit test set and a separate Gym integration test set.

Known Issues:

  • libhash does not load under LUA52, so tilecoding examples fail in LUA52.

Future Work

References

  1. Boyan, J., & Moore, A. W. (1995). Generalization in reinforcement learning: Safely approximating the value function. Advances in neural information processing systems, 369-376.
  2. Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine learning, 3(1), 9-44.
  3. Singh, S. P., & Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine learning, 22(1-3), 123-158.
  4. Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, (5), 834-846.
  5. Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.
  6. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4), 229-256.

License

torch-twrl is released under the MIT License. Copyright (c) 2016 Twitter, Inc.

torch-twrl's People

Contributors

davemssavage, gitter-badger, korymath

torch-twrl's Issues

Torch Tensor Types

Should unify the Tensor types and then set the default type.

This should allow for easy transition to CUDA tensors, when and if the need arises.

Build failing. Dig in to make sure that it builds nice.

Looks like LUA52 doesn't build nicely because libhash is missing.

snip:

tilecodeConsistent
 Function call failed
error loading module 'libhash' from file '/home/travis/torch/install/lib/lua/5.2/libhash.so'

Need to make sure that LUA52 has a hashing function.

qFunction issue

Hi,
I got an issue by simply running the example script cartpole-td.sh:

Error: Experiment was not successfully run.
...rch/install/share/lua/5.1/twrl/agent/model/qFunction.lua:22: attempt to index local 'state' (a nil value)
stack traceback:
...drougard/torch/install/share/lua/5.1/twrl/experiment.lua:62: in function '__index'
...rch/install/share/lua/5.1/twrl/agent/model/qFunction.lua:22: in function 'getFeatures'
...rch/install/share/lua/5.1/twrl/agent/model/qFunction.lua:43: in function 'estimateQ'
...rch/install/share/lua/5.1/twrl/agent/model/qFunction.lua:56: in function 'estimateAllQ'
...orch/install/share/lua/5.1/twrl/agent/policy/egreedy.lua:20: in function 'selectAction'
...drougard/torch/install/share/lua/5.1/twrl/experiment.lua:37: in function <...drougard/torch/install/share/lua/5.1/twrl/experiment.lua:15>
[C]: in function 'xpcall'
...drougard/torch/install/share/lua/5.1/twrl/experiment.lua:66: in function <...drougard/torch/install/share/lua/5.1/twrl/experiment.lua:1>
run.lua:69: in main chunk
[C]: in function 'dofile'
...TWRL/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670

Strangely this error disappears when asking for more videos, e.g. -video 5.
It disappears also when adding enough prints in qFunction.lua (in order to track the error's origin).
Hence it seems to be related to a process (server?) unable to provide the state at the proper time (it becomes able to do it when slowing down all processes, e.g. by adding video or printing enough data).

The issue has been observed on two PCs using Ubuntu 14.04.

Misc notes before launch

oldPrint, test_api.lua:9 is a global

Getting two errors, one failing test on an anaconda-based torch install:

$ th -g test/test.lua
created global variable: oldPrint @ /Users/awiltschko/Code/torch-twrl/src/gym-http-api/binding-lua/test_api.lua:9
Running 14 tests
 1/14 testMujoco ........................................................ [ERROR]
 2/14 testPendulum ...................................................... [PASS]
 3/14 getSummary ........................................................ [PASS]
 4/14 addRewardTerminal ................................................. [PASS]
 5/14 torchTensor ....................................................... [PASS]
 6/14 testFrozenLake .................................................... [PASS]
 7/14 reset ............................................................. [PASS]
 8/14 badExperimentCall ................................................. [PASS]
 9/14 tilecodeConsistent ................................................ [PASS]
10/14 tilecodePredictable ............................................... [FAIL]
11/14 testCartPole ...................................................... [PASS]
12/14 randomNoLearningNoModel ........................................... [PASS]
13/14 addRewardNonTerminal .............................................. [PASS]
14/14 testAtari ......................................................... [ERROR]
Completed 12 asserts in 14 tests with 1 failure and 2 errors
--------------------------------------------------------------------------------
testMujoco
 Function call failed
...ch-twrl/src/gym-http-api/binding-lua/gym_http_client.lua:82: attempt to concatenate local 'instance_id' (a nil value)
stack traceback:
        ...ch-twrl/src/gym-http-api/binding-lua/gym_http_client.lua:82: in function 'env_action_space_info'
        ...ode/torch-twrl/src/gym-http-api/binding-lua/test_api.lua:26: in function 'runTest'
        test/test.lua:54: in function <test/test.lua:53>
        [C]: in function 'xpcall'
        /Users/awiltschko/anaconda/share/lua/5.2/torch/Tester.lua:477: in function '_pcall'
        /Users/awiltschko/anaconda/share/lua/5.2/torch/Tester.lua:436: in function '_run'
        /Users/awiltschko/anaconda/share/lua/5.2/torch/Tester.lua:355: in function 'run'
        test/test.lua:146: in main chunk
        [C]: in function 'dofile'
        ...wiltschko/anaconda/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: in ?

--------------------------------------------------------------------------------
tilecodePredictable
tiles and predictTables should be equal
At table location 1: EQ failed: 506 ~= 820
stack traceback:
        test/test.lua:87: in function <test/test.lua:74>
--------------------------------------------------------------------------------
testAtari
 Function call failed
...ch-twrl/src/gym-http-api/binding-lua/gym_http_client.lua:82: attempt to concatenate local 'instance_id' (a nil value)
stack traceback:
        ...ch-twrl/src/gym-http-api/binding-lua/gym_http_client.lua:82: in function 'env_action_space_info'
        ...ode/torch-twrl/src/gym-http-api/binding-lua/test_api.lua:26: in function 'runTest'
        test/test.lua:49: in function <test/test.lua:48>
        [C]: in function 'xpcall'
        /Users/awiltschko/anaconda/share/lua/5.2/torch/Tester.lua:477: in function '_pcall'
        /Users/awiltschko/anaconda/share/lua/5.2/torch/Tester.lua:436: in function '_run'
        /Users/awiltschko/anaconda/share/lua/5.2/torch/Tester.lua:355: in function 'run'
        test/test.lua:146: in main chunk
        [C]: in function 'dofile'
        ...wiltschko/anaconda/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: in ?

--------------------------------------------------------------------------------
/Users/awiltschko/anaconda/bin/lua: /Users/awiltschko/anaconda/share/lua/5.2/torch/Tester.lua:363: An error was found while running tests!
stack traceback:
        [C]: in function 'assert'
        /Users/awiltschko/anaconda/share/lua/5.2/torch/Tester.lua:363: in function 'run'
        test/test.lua:146: in main chunk
        [C]: in function 'dofile'
        ...wiltschko/anaconda/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: in ?

Silent Server Error: bad argument #1 to '?'

The client needs to see more information when this error occurs.

Reproduced by stopping the server during a run, or by a dropped packet from the server.

It would be nice if the experiment could do one of the following:

  • restart the whole run (useful for short runs)
  • save the agent's information (weights, traces, etc) before restarting the episode and resetting the agent with that information (useful for long runs)
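
The second option could look roughly like the Python sketch below: keep a copy of the agent's state after every clean episode and restore it when the server fails mid-episode. ServerError, run_with_checkpoints and the shape of agent_state are all hypothetical stand-ins; nothing here exists in torch-twrl today.

```python
import copy


class ServerError(Exception):
    """Hypothetical error raised when the Gym server stops answering."""


def run_with_checkpoints(agent_state, run_episode, n_episodes, max_retries=3):
    """Run episodes, restoring the last saved agent state after a server error."""
    checkpoint = copy.deepcopy(agent_state)
    episode = 0
    retries = 0
    while episode < n_episodes:
        try:
            run_episode(agent_state)
        except ServerError:
            retries += 1
            if retries > max_retries:
                raise  # give up and surface the error to the caller
            agent_state = copy.deepcopy(checkpoint)  # discard the partial episode
            continue
        checkpoint = copy.deepcopy(agent_state)  # episode finished cleanly
        retries = 0
        episode += 1
    return agent_state


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_episode(state):
        # Pretend the server drops out during the second episode attempt.
        calls["n"] += 1
        state["updates"] += 1
        if calls["n"] == 2:
            raise ServerError("connection dropped")

    print(run_with_checkpoints({"updates": 0}, flaky_episode, n_episodes=3))
```

Re-raising after max_retries also addresses the visibility concern: the client sees the underlying error instead of a silent stop.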

Inconsistent types with boolean variables

I've noticed strange behaviour with the -verboseUpdate param, since it defaults to false. The other boolean variables return the strings 'true' or 'false'; this one, however, returns an actual boolean. This is probably also the case for any variables that default to true!

I would suggest making all booleans default to false; however, there might be a better way to solve this.

This is also fixed by checking for both conditions, but it might be worth defaulting to false for uniformity!
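
One way to check for both conditions in a single place is a small normalizing helper applied to every boolean option right after parsing. The Python sketch below shows the idea; to_bool is a hypothetical name, and the real fix would live in the Lua option handling.

```python
def to_bool(value):
    """Normalize a real boolean or the strings 'true'/'false' to a boolean."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.strip().lower() in ("true", "false"):
        return value.strip().lower() == "true"
    raise ValueError("expected a boolean or 'true'/'false', got %r" % (value,))


if __name__ == "__main__":
    # Both spellings of the same option end up as the same type.
    print(to_bool("true"), to_bool(True), to_bool("false"), to_bool(False))
```

Rejecting anything else, rather than guessing, keeps a mistyped flag value from silently turning into false.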

Critical: Issue with tilecoding example

...orch/install/share/lua/5.1/rl/agent/model/tilecoding.lua:32: attempt to get length of local 's' (a nil value)
Error: Experiment was not successfully run.

gym_http_client reporting server error

When batch processing, some runs stop early due to the following error:

Error: Experiment was not successfully run.
...orymathewson/torch/install/share/lua/5.1/socket/http.lua:117: interrupted!
stack traceback:
...athewson/torch/install/share/lua/5.1/twrl/experiment.lua:65: in function <...athewson/torch/install/share/lua/5.1/twrl/experiment.lua:62>
[C]: in function 'request'
...ch/install/share/lua/5.1/httpclient/luasocket_driver.lua:136: in function 'post'
...stall/share/lua/5.1/twrl/binding-lua/gym_http_client.lua:45: in function 'post_request'
...stall/share/lua/5.1/twrl/binding-lua/gym_http_client.lua:73: in function 'env_step'
...athewson/torch/install/share/lua/5.1/twrl/experiment.lua:37: in function <...athewson/torch/install/share/lua/5.1/twrl/experiment.lua:16>
[C]: in function 'xpcall'
...athewson/torch/install/share/lua/5.1/twrl/experiment.lua:69: in function <...athewson/torch/install/share/lua/5.1/twrl/experiment.lua:1>
run.lua:69: in main chunk
[C]: in function 'dofile'
...wson/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x010f3a8cf0

Tilecoding scaling fails for unbounded OpenAI gym envs

print(floats) in tilecoding.lua

{
1 : -1.9983250206265
2 : -1.3611293865541e+39
3 : -1.9772034502097
4 : -1.3611293865541e+39
}

Running cartpole-td.sh will not learn as some of the floating features are hashed to huge dimensions.

Need to split out the scaling from the tile coding, and handle the tile coding of integers as well, within the tilecoder.
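
A possible shape for that split, sketched in Python: one function scales and clips each feature to [0, 1], falling back to a chosen range when the environment reports an effectively unbounded dimension, and a separate function turns the scaled features into integer tile coordinates. The UNBOUNDED threshold, the fallback range and both function names are assumptions for illustration; the fallback range in particular would need tuning per environment.

```python
UNBOUNDED = 1e30  # assumed threshold above which a bound counts as "no bound"


def scale_features(observation, low, high, fallback=(-5.0, 5.0)):
    """Clip each feature to its bounds and scale it to [0, 1]."""
    scaled = []
    for x, lo, hi in zip(observation, low, high):
        if abs(lo) > UNBOUNDED or abs(hi) > UNBOUNDED:
            lo, hi = fallback  # unbounded dimension: use the fallback range
        x = min(max(x, lo), hi)
        scaled.append((x - lo) / (hi - lo))
    return scaled


def tile_indices(scaled, n_tiles=8, n_tilings=4):
    """Integer tile coordinates in each offset tiling for features in [0, 1]."""
    active = []
    for tiling in range(n_tilings):
        offset = tiling / float(n_tilings * n_tiles)
        coords = tuple(int((v + offset) * n_tiles) for v in scaled)
        active.append((tiling,) + coords)
    return active


if __name__ == "__main__":
    low = [-2.4, -3.4e38]
    high = [2.4, 3.4e38]
    features = scale_features([0.0, 1e39], low, high)
    print(features)
    print(tile_indices(features))
```

Because the tile coordinates are small integers by construction, they can be hashed or indexed directly, and integer-valued observations can skip the scaling step.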

Humanoid-v1 Tensor Issue

inconsistent tensor size at /tmp/luarocks_torch-scm-1-519/torch7/lib/TH/generic/THTensorCopy.c:7
stack traceback:
...athewson/torch/install/share/lua/5.1/twrl/experiment.lua:64: in function <...athewson/torch/install/share/lua/5.1/twrl/experiment.lua:61>
[C]: at 0x0b87ff30
[C]: in function '__newindex'
...ll/share/lua/5.1/twrl/agent/learningUpdate/reinforce.lua:37: in function 'learn'
...son/torch/install/share/lua/5.1/twrl/agent/baseAgent.lua:112: in function 'reward'
...athewson/torch/install/share/lua/5.1/twrl/experiment.lua:40: in function <...athewson/torch/install/share/lua/5.1/twrl/experiment.lua:16>
[C]: in function 'xpcall'
...athewson/torch/install/share/lua/5.1/twrl/experiment.lua:68: in function <...athewson/torch/install/share/lua/5.1/twrl/experiment.lua:1>
test-run.lua:69: in main chunk
[C]: in function 'dofile'
...wson/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x010b716cf0
