rlworkgroup / garage
A toolkit for reproducible reinforcement learning research.
License: MIT License
Imported from ryanjulian/rllab#35
This is probably related to ryanjulian/rllab#89, since dm_control uses MuJoCo 1.5.0. We may need to combine setup_mujoco.sh and setup_.sh.
Imported from ryanjulian/rllab#90
The current logger interface is okay, but the implementation is a bit messy and the TensorBoard integration is kind of a bolt-on afterthought. It's also package-global which can induce bad implementation decisions.
I'd like to rewrite the logger to be properly encapsulated as a class(es). There can still be a global singleton instance for easy access.
Ideas:
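One sketch of what a properly encapsulated logger could look like (the class and method names here are hypothetical illustrations, not an existing garage API; TensorBoard support would be just another output adapter):

```python
import sys


class Logger:
    """Encapsulated logger: all state lives on the instance, not the module."""

    def __init__(self, outputs=None):
        # Each output is any object with a write(str) method (e.g. sys.stdout,
        # an open file, or a TensorBoard adapter).
        self._outputs = list(outputs) if outputs is not None else [sys.stdout]
        self._prefixes = []

    def push_prefix(self, prefix):
        self._prefixes.append(prefix)

    def pop_prefix(self):
        self._prefixes.pop()

    def log(self, message):
        line = "".join(self._prefixes) + message
        for out in self._outputs:
            out.write(line + "\n")
        return line


# A global singleton instance can still be provided for easy access.
logger = Logger()
```

Callers that want isolation (e.g. tests, or per-worker logs) construct their own instance instead of mutating package-global state.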
Imported from ryanjulian/rllab#80
conda makes it difficult to use rllab as a library.
We would like to transition to the standard Python package interface. This will require getting all the dependencies to install using pip, plus probably some custom setup scripts for setup.py.
Adding wheel compilation to the CI (e.g. appveyor) is also in-scope for this project.
Imported from ryanjulian/rllab#81
We are moving towards making the common parts of rllab agnostic of the NN library. TensorFlow should no longer be a second-class citizen.
This change would remove the TensorFlow sandbox and make the TensorFlow tree a first-class rllab citizen.
Imported from ryanjulian/rllab#84
Imported from ryanjulian/rllab#76
Imported from ryanjulian/rllab#74
Theano should no longer be first-class while TensorFlow is second-class. We are aiming for major parts of rllab to be NN-framework agnostic. This change should move Theano-specific components into garage.theano, while stripping Theano dependencies from common parts of the code.
Imported from https://github.com/ryanjulian/rllab/issues/83
Incorporate the TD3 algorithm in garage.
See https://github.com/PyCQA/flake8-import-order
This will allow us to put sandbox and contrib in separate groups below imports from garage.
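A hedged sketch of what the configuration might look like, using the `import-order-style` and `application-import-names` options documented in flake8-import-order's README (whether this exact grouping puts sandbox and contrib where we want them would need to be verified):

```ini
# Hypothetical setup.cfg / tox.ini fragment -- exact grouping to be verified.
[flake8]
import-order-style = google
application-import-names = garage, sandbox, contrib
```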
Imported from ryanjulian/rllab#94
Traceback (most recent call last):
File "tests/envs/test_envs.py", line 65, in <module>
envs = [cls() for cls in simple_env_classes]
File "tests/envs/test_envs.py", line 65, in <listcomp>
envs = [cls() for cls in simple_env_classes]
File "/home/rjulian/code/garage/rllab/envs/mujoco/point_env.py", line 21, in __init__
super(PointEnv, self).__init__(*args, **kwargs)
File "/home/rjulian/code/garage/rllab/envs/mujoco/mujoco_env.py", line 85, in __init__
self.reset()
File "/home/rjulian/code/garage/rllab/envs/mujoco/mujoco_env.py", line 131, in reset
return self.get_current_obs()
File "/home/rjulian/code/garage/rllab/envs/mujoco/mujoco_env.py", line 134, in get_current_obs
return self._get_full_obs()
File "/home/rjulian/code/garage/rllab/envs/mujoco/mujoco_env.py", line 138, in _get_full_obs
cdists = np.copy(self.sim.geom_margin).flat
AttributeError: 'mujoco_py.cymj.MjSim' object has no attribute 'geom_margin'
Imported from ryanjulian/rllab#34
Imported from ryanjulian/rllab#107
In order to support visuomotor control learning and other problems, we need to implement a way to use policies that consist of submodules which handle certain input modalities, such as images and vectors. OpenAI Gym already has support for a tuple_space that is a tuple of different spaces. The most common use-case of such multi-modal observation spaces is a combination of 2D images and vectors.
Exact specification needs to be done but for now the task items look as follows:
tuple_space as observation space, consisting of an image and a vector (e.g. reacher with a top-down view image and 2D end-effector position).
Your feedback on this issue is most welcome so that we can split up this feature into smaller tasks.
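To make the idea concrete, here is a minimal dependency-free sketch of such a combined observation space (the class names are illustrative stand-ins for Gym's Box and tuple_space, not the real implementations):

```python
import numpy as np


class BoxSpace:
    """A continuous space with a fixed shape (stand-in for gym's Box)."""

    def __init__(self, low, high, shape):
        self.low, self.high, self.shape = low, high, shape

    def sample(self):
        return np.random.uniform(self.low, self.high, size=self.shape)


class TupleSpace:
    """A tuple of component spaces (stand-in for gym's tuple_space)."""

    def __init__(self, *spaces):
        self.spaces = spaces

    def sample(self):
        return tuple(space.sample() for space in self.spaces)


# A reacher-style observation: a top-down camera image plus the 2D
# end-effector position.
obs_space = TupleSpace(
    BoxSpace(0, 255, shape=(64, 64, 3)),  # image modality
    BoxSpace(-1.0, 1.0, shape=(2,)),      # vector modality
)
```

A multi-modal policy would then route each component of `obs_space.sample()` to the appropriate submodule (e.g. a CNN for the image, an MLP for the vector).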
Imported from ryanjulian/rllab#108
Asynchronous plotting for TensorFlow works perfectly fine on Linux using threading.Thread, but Mac OS X will not display a window, even if threading.Thread is switched to multiprocessing.Process, which is how it was implemented using Theano.
Imported from ryanjulian/rllab#127
Imported from ryanjulian/rllab#77
Right now rllab uses parallelism in an ad-hoc manner through the multiprocessing library, mostly in the sampler. If we use a principled parallelism library (e.g. MPI, Ray, or others), we can probably clean up the code while avoiding tricky multiprocessing bugs in the future.
Imported from ryanjulian/rllab#82
The normalized env for gym envs was done in ryanjulian/rllab#125, but the refactoring of the gym env is not done yet (ryanjulian/rllab#129). Please refactor normalized_gym_env.py in our codebase accordingly.
Imported from: ryanjulian/rllab#131
This is very useful for real robots. Is it supported by gym.Env?
Imported from ryanjulian/rllab#70
I found an issue in mujoco_env. The MjSim of mujoco_py does not have a geom_margin attribute, which is used by _get_full_obs(). This was caused by a PR that refactored rllab.mujoco_py to mujoco_py.
There might be other issues introduced by this PR, so a regression test for the mujoco envs is desired as part of this issue.
This is largely superseded by tf.nn, and it adds another unnecessary layer of complexity; it would be easier to understand if primitives were just written in pure TF.
Imported from ryanjulian/rllab#42
https://blog.openai.com/generalizing-from-simulation/
We should probably not attempt this until #4 is done.
For now, only MuJoCo support is necessary. If MuJoCo is too burdensome, we can consider switching engines (e.g. Bullet)
Related issue openai/mujoco-py#148 suggests this may be nontrivial.
Imported from ryanjulian/rllab#61
Now that we are using OpenAI Gym directly in our RL algorithms, we still need to normalize the environment (actions, observations, rewards) as in https://github.com/ryanjulian/rllab/blob/integration/rllab/envs/normalized_env.py.
So implement NormalizedEnv for gym.Env.
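A minimal sketch of what such a wrapper could look like, shown here only for action rescaling and written against a stub rather than a real gym.Env (the rllab version also normalizes observations and rewards using running statistics):

```python
import numpy as np


class NormalizedEnv:
    """Wrap an env so the agent acts in [-1, 1] regardless of the true bounds."""

    def __init__(self, env):
        self._env = env

    def step(self, action):
        low, high = self._env.action_low, self._env.action_high
        # Map [-1, 1] -> [low, high], clipping out-of-range actions first.
        scaled = low + (np.clip(action, -1.0, 1.0) + 1.0) * 0.5 * (high - low)
        return self._env.step(scaled)

    def reset(self):
        return self._env.reset()


# Stub environment with asymmetric action bounds, for illustration only.
class StubEnv:
    action_low, action_high = np.array([0.0]), np.array([10.0])

    def step(self, action):
        return action  # echo the scaled action back as the "observation"

    def reset(self):
        return np.zeros(1)


env = NormalizedEnv(StubEnv())
```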
Imported from ryanjulian/rllab#64
We should get rid of this to make packaging easier, unless there's a very compelling reason why not to. dm_control makes it work fine.
Imported from ryanjulian/rllab#96
They have not been touched in a long time.
Tasks:
Imported from ryanjulian/rllab#79
Imported from ryanjulian/rllab#95
It should be possible to set the TensorFlow session options whenever a tf.Session is created for training, such as here (other places where a session is constructed might not need to use these options). It is sometimes necessary to limit the available memory to TF running on a GPU, etc.
A possible implementation could allow the user to specify the GPU options via a ConfigProto setting in config_personal.py (set to None by default).
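The plumbing could be as simple as a helper that prefers the user's setting and otherwise lets TensorFlow choose its defaults (a sketch; `tf_session_config` is a hypothetical name for the config_personal.py setting):

```python
def get_session_config(personal_config=None):
    """Return the object to pass as tf.Session(config=...).

    personal_config stands in for a tf_session_config value read from
    config_personal.py. When it is None, tf.Session falls back to its
    own defaults.
    """
    if personal_config is not None:
        return personal_config
    return None

# Hypothetical config_personal.py entry (assumes TF 1.x's ConfigProto API):
#     import tensorflow as tf
#     tf_session_config = tf.ConfigProto()
#     tf_session_config.gpu_options.per_process_gpu_memory_fraction = 0.4
```

Call sites would then do `tf.Session(config=get_session_config(...))` instead of constructing sessions with hard-coded options.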
Imported from ryanjulian/rllab#123
It's missing an absl dependency.
Imported from ryanjulian/rllab#89
Original paper:
https://arxiv.org/abs/1509.02971
Implementation in the Theano tree:
https://github.com/ryanjulian/rllab/blob/integration/rllab/algos/ddpg.py
OpenAI baselines implementation:
https://github.com/openai/baselines/tree/master/baselines/ddpg
Blog post (there are many other resources):
http://pemami4911.github.io/blog/2016/08/21/ddpg-rl.html
Sketch:
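One framework-agnostic ingredient any DDPG implementation needs is the soft (Polyak) target-network update from the paper, theta' <- tau*theta + (1 - tau)*theta'. A plain numpy sketch (illustrative, not tied to the Theano or baselines code):

```python
import numpy as np


def soft_update(target_params, source_params, tau=0.001):
    """Polyak-average source network weights into the target network."""
    return [(1.0 - tau) * t + tau * s
            for t, s in zip(target_params, source_params)]


# Illustration with a single weight "layer":
target = [np.zeros(3)]
source = [np.ones(3)]
target = soft_update(target, source, tau=0.1)
```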
Imported from ryanjulian/rllab#26
Imported from ryanjulian/rllab#6
It will be the base for garage.torch.
Presently asynchronous plotting/3D rendering is not supported in the part of rllab based on TensorFlow (sandbox.rocky.tf), but it is supported in the rllab code which uses Theano.
This means that when you turn plotting on for a Theano training session, the plot does not block the training process. The TensorFlow implementation runs the rendering loop directly in the training algorithm (rather than a worker), so it blocks. This makes training using TensorFlow much slower than Theano when plotting is turned on (they are about the same without plotting).
TensorFlow's notion of a session makes this tricky. I'm not 100% sure that there is a solution. If you figure out that it is impossible, or requires rewriting large parts of the repository, email me with what you tried and some explanations why.
Current behavior
3D plotting with MuJoCo in TensorFlow is synchronous, blocks the training process.
Desired behavior
3D plotting with MuJoCo in TensorFlow is asynchronous and does not block the training process (just as with Theano)
To run a basic training algorithm which has 3D plots using Theano (you will need to pass plot=True to TRPO):
https://github.com/ryanjulian/rllab/blob/master/examples/trpo_swimmer.py
To run a basic training algorithm with TensorFlow:
https://github.com/ryanjulian/rllab/blob/master/sandbox/rocky/tf/launchers/trpo_cartpole.py
Note: this does not use a MuJoCo environment/3D plotting, but you just need to change the environment to SwimmerEnv() and it will.
Asynchronous plotter API used by the Theano code
https://github.com/ryanjulian/rllab/tree/master/rllab/plotter
Theano and TF implementations of BatchPolopt (calls the plotter)
Theano: https://github.com/ryanjulian/rllab/blob/master/rllab/algos/batch_polopt.py
TF: https://github.com/ryanjulian/rllab/blob/master/sandbox/rocky/tf/algos/batch_polopt.py
rllab implementation of Serializable (used to transfer objects across multiprocessing queues): https://github.com/ryanjulian/rllab/blob/master/rllab/core/serializable.py
Fork this repository into your own GitHub account, and implement this feature in its own branch, based on the master branch. When you are done and would like a code review, send me an email with a link to your feature branch. DO NOT SUBMIT A PULL REQUEST TO THIS REPOSITORY.
Consider this a professional software engineering task, and provide a high-quality solution which does not break existing users, minimizes change, and is stable. Tests are welcome where appropriate. Please always use PEP8 style in your code, and format it using YAPF (with the PEP8 setting).
Make sure your implementation works with the run_experiment_lite wrapper with the parameter n_parallel greater than 1 (this triggers multiprocess operation).
The original (Theano) tree lives in rllab/. The tree sandbox/rocky/tf re-implements classes from the original tree using TensorFlow, and is backwards-compatible with the Theano tree. We are working towards using only one NN library soon, but for now your implementation needs to work in both trees.
Imported from ryanjulian/rllab#1
Original paper:
https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
OpenAI baselines implementation:
https://github.com/openai/baselines/tree/master/baselines/deepq
(There are many more resources on Google with DQN implementations in TF)
Sketch:
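A core piece any DQN implementation needs is experience replay. A minimal dependency-free sketch (illustrative only, not the baselines API):

```python
import random


class ReplayBuffer:
    """Fixed-capacity FIFO buffer with uniform random sampling."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._storage = []
        self._next_idx = 0

    def add(self, transition):
        # transition is e.g. (obs, action, reward, next_obs, done)
        if len(self._storage) < self._capacity:
            self._storage.append(transition)
        else:
            self._storage[self._next_idx] = transition  # overwrite the oldest
        self._next_idx = (self._next_idx + 1) % self._capacity

    def sample(self, batch_size):
        return random.sample(self._storage, batch_size)

    def __len__(self):
        return len(self._storage)
```

The training loop samples minibatches from the buffer to decorrelate updates, with Q-targets computed from a periodically-synced target network.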
Imported from ryanjulian/rllab#50
We would like to add support for the Bullet physics engine to rllab. Thankfully, the Bullet team has recently provided Python bindings in the form of pybullet, and even provides examples of how to implement the gym.Env interface (from OpenAI Gym) using pybullet.
This task is to add pybullet to the rllab conda environment, and implement a class (similar to GymEnv, e.g. BulletEnv) which allows any rllab algorithm to learn against pybullet environments. You will also need to implement the plot interface, if pybullet does not already, which shows the user a 3D animation of the environment. Essentially, you should duplicate the experience of running one of the MuJoCo-based examples (e.g. trpo_swimmer.py), but using a Bullet environment instead. You should include examples (in examples/ and sandbox/rocky/tf/launchers/) of launcher scripts which use an algorithm (suggestion: TRPO) to train the KukaGymEnv environment.
This is conceptually the same as GymEnv, which allows rllab users to import any OpenAI Gym environment and learn against them. In fact, pybullet environments implement the Gym interface, so in theory we should be done as soon as we can import pybullet. In practice, our constructor for Gym environments only takes the string name (e.g. "Humanoid-v1") of a Gym environment, not the class of a Gym environment. The pybullet environments do not have string shortcuts because they are not part of the official Gym repository. Furthermore, we'd like to use other unofficial Gym environments in rllab, but it is currently difficult for the same reason.
So you might structure this task as two pull requests: (1) adding pybullet to the conda environment, and (2) modifying GymEnv to support arbitrary environments which implement the gym.Env interface (attempted in ryanjulian/rllab#12).
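The second pull request could be shaped roughly like this (a sketch against a stub class; the real GymEnv has additional responsibilities such as plotting and spec handling, and the constructor logic here is an assumption):

```python
class GymEnv:
    """Wrap either a registered Gym environment name or a gym.Env class."""

    def __init__(self, env):
        if isinstance(env, str):
            # Registered environments keep the old string interface,
            # e.g. GymEnv("Humanoid-v1").
            import gym
            self.env = gym.make(env)
        elif isinstance(env, type):
            # Unregistered environments (e.g. pybullet's KukaGymEnv) can be
            # passed as a class and are instantiated directly.
            self.env = env()
        else:
            self.env = env  # an already-constructed gym.Env instance


# Illustration with a stub class standing in for a pybullet env:
class StubBulletEnv:
    def reset(self):
        return 0


wrapped = GymEnv(StubBulletEnv)
```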
Consider this a professional software engineering task, and provide a high-quality solution which does not break existing users, minimizes change, and is stable. Please always use PEP8 style in your code, and format it using YAPF (with the PEP8 setting). Submit your pull request against the integration branch of this repository.
Some notes:
Make sure your implementation works with the run_experiment_lite wrapper.
The original (Theano) tree lives in rllab/. The tree sandbox/rocky/tf re-implements classes from the original tree using TensorFlow, and is backwards-compatible with the Theano tree. We are working towards using only one NN library soon, but for now your implementation needs to work in both trees.
Imported from ryanjulian/rllab#5
Imported from ryanjulian/rllab#8
The community has settled on gym.Env as the de-facto standard environment interface. There's no reason to keep our own around.
The scope of this change is to remove the rllab.envs.Env base interface, and refactor implementing classes to instead implement gym.Env. Note that this explicitly does not mean that we are adopting the physics engine, registration system, benchmarks, etc of OpenAI Gym--just the gym.Env abstract interface.
Imported from ryanjulian/rllab#85
Imported from ryanjulian/rllab#31
Right now the temporary solution to this issue is to prepend python examples/xxx.py with LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so:/usr/lib/nvidia-384/libGL.so; note that you must replace nvidia-384 with the version installed on your machine (use nvidia-smi to determine the driver version currently in use). Relevant comments are here and here, and this link is a wrapper written as a temporary fix.
However, the more permanent solution would require pre-loading without the use of LD_PRELOAD
on the command line. See DeepMind's implementation as a starting point.
Imported from ryanjulian/rllab#117
Options:
Important rules (running list):
Imported from ryanjulian/rllab#100
To add some details: distributions are used by policies and other modules to add distribution functionality, such as computing the KL divergence between two distributions, given the parameters of a distribution (which are often output tensors of an NN). tf.distributions probably implements the same thing, so we should try to replace our code with the TF counterpart.
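As a concrete example of the functionality in question, here is the KL divergence between two diagonal Gaussians, computed directly from NN-style outputs (means and log standard deviations) in plain numpy; the TF counterpart would come from tf.distributions:

```python
import numpy as np


def diag_gaussian_kl(mean1, log_std1, mean2, log_std2):
    """KL(N1 || N2) for diagonal Gaussians parameterized as NN outputs."""
    var1 = np.exp(2.0 * log_std1)
    var2 = np.exp(2.0 * log_std2)
    # Per-dimension closed form, summed over the dimensions.
    kl = (log_std2 - log_std1
          + (var1 + (mean1 - mean2) ** 2) / (2.0 * var2)
          - 0.5)
    return np.sum(kl)
```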
Imported from ryanjulian/rllab#52
Imported from ryanjulian/rllab#97
The current implementation of async plotting uses multithreading. Using multiprocessing instead throws segmentation faults on Mac OS X machines, but we want to prioritize multiprocessing on Linux machines. Therefore, write code that supports both implementations.
Given a Distribution(y|x), where x = [x_1, x_2, x_3], I should be able to define conditional distributions Dist(y|x_1), Dist(y|[x_1, x_3]), Dist(y|x_2), etc.
Imported from ryanjulian/rllab#3
Make it cleaner and simpler.
Imported from ryanjulian/rllab#75
This is more principled than stuffing dependency logic into shell scripts.
Imported from ryanjulian/rllab#78
Currently this is proxied by Policy.recurrent, but there are loss functions for non-recurrent policies which need fixed-length input trajectories/valid variables (i.e. any time you want to differentiate through the loss function).