rlworkgroup / garage
A toolkit for reproducible reinforcement learning research.
License: MIT License
Imported from ryanjulian/rllab#35
This is probably related to ryanjulian/rllab#89, since dm_control uses MuJoCo 1.5.0. We may need to combine setup_mujoco.sh and setup_.sh.
Imported from ryanjulian/rllab#90
The current logger interface is okay, but the implementation is a bit messy and the TensorBoard integration is kind of a bolt-on afterthought. It's also package-global which can induce bad implementation decisions.
I'd like to rewrite the logger to be properly encapsulated as a class(es). There can still be a global singleton instance for easy access.
Ideas:
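One sketch of what a properly encapsulated logger could look like (the class and method names here are hypothetical illustrations, not an existing garage API; TensorBoard support would be just another output adapter):

```python
import sys


class Logger:
    """Encapsulated logger: all state lives on the instance, not the module."""

    def __init__(self, outputs=None):
        # Each output is any object with a write(str) method (e.g. sys.stdout,
        # an open file, or a TensorBoard adapter).
        self._outputs = list(outputs) if outputs is not None else [sys.stdout]
        self._prefixes = []

    def push_prefix(self, prefix):
        self._prefixes.append(prefix)

    def pop_prefix(self):
        self._prefixes.pop()

    def log(self, message):
        line = "".join(self._prefixes) + message
        for out in self._outputs:
            out.write(line + "\n")
        return line


# A global singleton instance can still be provided for easy access.
logger = Logger()
```

Callers that want isolation (e.g. tests, or per-worker logs) construct their own instance instead of mutating package-global state.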
Imported from ryanjulian/rllab#80
conda makes it difficult to use rllab as a library.
We would like to transition to the standard Python package interface. This will require getting all the dependencies to install using pip, plus probably some custom setup scripts for setup.py.
Adding wheel compilation to the CI (e.g. appveyor) is also in-scope for this project.
Imported from ryanjulian/rllab#81
We are moving towards making the common parts of rllab agnostic of the NN library. TensorFlow should no longer be a second-class citizen.
This change would remove the TensorFlow sandbox and make the TensorFlow tree a first-class rllab citizen.
Imported from ryanjulian/rllab#84
Imported from ryanjulian/rllab#76
Imported from ryanjulian/rllab#74
Theano should no longer be first-class while TensorFlow is second-class. We are aiming for major parts of rllab to be NN-framework agnostic. This change should move Theano-specific components into garage.theano, while stripping Theano dependencies from common parts of the code.
Imported from https://github.com/ryanjulian/rllab/issues/83
Incorporate the TD3 algorithm in garage.
See https://github.com/PyCQA/flake8-import-order
This will allow us to put sandbox and contrib in separate groups below imports from garage.
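A hedged sketch of what the configuration might look like, using the `import-order-style` and `application-import-names` options documented in flake8-import-order's README (whether this exact grouping puts sandbox and contrib where we want them would need to be verified):

```ini
# Hypothetical setup.cfg / tox.ini fragment -- exact grouping to be verified.
[flake8]
import-order-style = google
application-import-names = garage, sandbox, contrib
```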
Imported from ryanjulian/rllab#94
Traceback (most recent call last):
File "tests/envs/test_envs.py", line 65, in <module>
envs = [cls() for cls in simple_env_classes]
File "tests/envs/test_envs.py", line 65, in <listcomp>
envs = [cls() for cls in simple_env_classes]
File "/home/rjulian/code/garage/rllab/envs/mujoco/point_env.py", line 21, in __init__
super(PointEnv, self).__init__(*args, **kwargs)
File "/home/rjulian/code/garage/rllab/envs/mujoco/mujoco_env.py", line 85, in __init__
self.reset()
File "/home/rjulian/code/garage/rllab/envs/mujoco/mujoco_env.py", line 131, in reset
return self.get_current_obs()
File "/home/rjulian/code/garage/rllab/envs/mujoco/mujoco_env.py", line 134, in get_current_obs
return self._get_full_obs()
File "/home/rjulian/code/garage/rllab/envs/mujoco/mujoco_env.py", line 138, in _get_full_obs
cdists = np.copy(self.sim.geom_margin).flat
AttributeError: 'mujoco_py.cymj.MjSim' object has no attribute 'geom_margin'
Imported from ryanjulian/rllab#34
Imported from ryanjulian/rllab#107
In order to support visuomotor control learning and other problems, we need to implement a way to use policies that consist of submodules which handle certain input modalities, such as images and vectors. OpenAI Gym already has support for a tuple_space that is a tuple of different spaces. The most common use-case of such multi-modal observation spaces is a combination of 2D images and vectors.
Exact specification needs to be done but for now the task items look as follows:
tuple_space as observation space, consisting of an image and a vector (e.g. reacher with a top-down view image and 2D end-effector position).
Your feedback on this issue is most welcome so that we can split up this feature into smaller tasks.
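To make the idea concrete, here is a minimal dependency-free sketch of such a combined observation space (the class names are illustrative stand-ins for Gym's Box and tuple_space, not the real implementations):

```python
import numpy as np


class BoxSpace:
    """A continuous space with a fixed shape (stand-in for gym's Box)."""

    def __init__(self, low, high, shape):
        self.low, self.high, self.shape = low, high, shape

    def sample(self):
        return np.random.uniform(self.low, self.high, size=self.shape)


class TupleSpace:
    """A tuple of component spaces (stand-in for gym's tuple_space)."""

    def __init__(self, *spaces):
        self.spaces = spaces

    def sample(self):
        return tuple(space.sample() for space in self.spaces)


# A reacher-style observation: a top-down camera image plus the 2D
# end-effector position.
obs_space = TupleSpace(
    BoxSpace(0, 255, shape=(64, 64, 3)),  # image modality
    BoxSpace(-1.0, 1.0, shape=(2,)),      # vector modality
)
```

A multi-modal policy would then route each component of `obs_space.sample()` to the appropriate submodule (e.g. a CNN for the image, an MLP for the vector).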
Imported from ryanjulian/rllab#108
Asynchronous plotting for TensorFlow works perfectly fine on Linux using threading.Thread, but Mac OS X will not display a window, even if threading.Thread is switched to multiprocessing.Process, which is how it was implemented using Theano.
Imported from ryanjulian/rllab#127
Imported from ryanjulian/rllab#77
Right now rllab uses parallelism in an ad-hoc manner through the multiprocessing library, mostly in the sampler. If we use a principled parallelism library (e.g. MPI, Ray, or others), we can probably clean up the code while avoiding tricky multiprocessing bugs in the future.
Imported from ryanjulian/rllab#82
The normalized env for gym envs was done in ryanjulian/rllab#125, but the refactoring of the gym env is not done yet (ryanjulian/rllab#129). Please refactor normalized_gym_env.py in our codebase accordingly.
Imported from: ryanjulian/rllab#131
This is very useful for real robots. Is it supported by gym.Env?
Imported from ryanjulian/rllab#70
I found an issue in mujoco_env. The MjSim of mujoco_py does not have a geom_margin attribute, which is used by _get_full_obs(). This was caused by a PR that refactored rllab.mujoco_py to mujoco_py.
There might be other issues introduced by this PR, so a regression test for the mujoco envs is desired as part of this issue.
This is largely superseded by tf.nn, and it adds another unnecessary layer of complexity; it would be easier to understand if primitives were just written in pure TF.
Imported from ryanjulian/rllab#42
https://blog.openai.com/generalizing-from-simulation/
We should probably not attempt this until #4 is done.
For now, only MuJoCo support is necessary. If MuJoCo is too burdensome, we can consider switching engines (e.g. Bullet)
Related issue openai/mujoco-py#148 suggests this may be nontrivial.
Imported from ryanjulian/rllab#61
Now that we are using OpenAI Gym directly in our RL algorithms, we still need to normalize the environment (actions, observations, rewards) as in https://github.com/ryanjulian/rllab/blob/integration/rllab/envs/normalized_env.py.
So implement NormalizedEnv for gym.Env.
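A minimal sketch of what such a wrapper could look like, shown here only for action rescaling and written against a stub rather than a real gym.Env (the rllab version also normalizes observations and rewards using running statistics):

```python
import numpy as np


class NormalizedEnv:
    """Wrap an env so the agent acts in [-1, 1] regardless of the true bounds."""

    def __init__(self, env):
        self._env = env

    def step(self, action):
        low, high = self._env.action_low, self._env.action_high
        # Map [-1, 1] -> [low, high], clipping out-of-range actions first.
        scaled = low + (np.clip(action, -1.0, 1.0) + 1.0) * 0.5 * (high - low)
        return self._env.step(scaled)

    def reset(self):
        return self._env.reset()


# Stub environment with asymmetric action bounds, for illustration only.
class StubEnv:
    action_low, action_high = np.array([0.0]), np.array([10.0])

    def step(self, action):
        return action  # echo the scaled action back as the "observation"

    def reset(self):
        return np.zeros(1)


env = NormalizedEnv(StubEnv())
```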
Imported from ryanjulian/rllab#64
We should get rid of this to make packaging easier, unless there's a very compelling reason why not to. dm_control makes it work fine.
Imported from ryanjulian/rllab#96
They have not been touched in a long time.
Tasks:
Imported from ryanjulian/rllab#79
Imported from ryanjulian/rllab#95
It should be possible to set the TensorFlow session options whenever a tf.Session is created for training, such as here (other places where a session is constructed might not need to use these options). It is sometimes necessary to limit the available memory to TF running on a GPU, etc.
A possible implementation could allow the user to specify the GPU options via a ConfigProto setting in config_personal.py (set to None by default).
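The plumbing could be as simple as a helper that prefers the user's setting and otherwise lets TensorFlow choose its defaults (a sketch; `tf_session_config` is a hypothetical name for the config_personal.py setting):

```python
def get_session_config(personal_config=None):
    """Return the object to pass as tf.Session(config=...).

    personal_config stands in for a tf_session_config value read from
    config_personal.py. When it is None, tf.Session falls back to its
    own defaults.
    """
    if personal_config is not None:
        return personal_config
    return None

# Hypothetical config_personal.py entry (assumes TF 1.x's ConfigProto API):
#     import tensorflow as tf
#     tf_session_config = tf.ConfigProto()
#     tf_session_config.gpu_options.per_process_gpu_memory_fraction = 0.4
```

Call sites would then do `tf.Session(config=get_session_config(...))` instead of constructing sessions with hard-coded options.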
Imported from ryanjulian/rllab#123
It's missing an absl dependency.
Imported from ryanjulian/rllab#89
Original paper:
https://arxiv.org/abs/1509.02971
Implementation in the Theano tree:
https://github.com/ryanjulian/rllab/blob/integration/rllab/algos/ddpg.py
OpenAI baselines implementation:
https://github.com/openai/baselines/tree/master/baselines/ddpg
Blog post (there are many other resources):
http://pemami4911.github.io/blog/2016/08/21/ddpg-rl.html
Sketch:
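One framework-agnostic ingredient any DDPG implementation needs is the soft (Polyak) target-network update from the paper, theta' <- tau*theta + (1 - tau)*theta'. A plain numpy sketch (illustrative, not tied to the Theano or baselines code):

```python
import numpy as np


def soft_update(target_params, source_params, tau=0.001):
    """Polyak-average source network weights into the target network."""
    return [(1.0 - tau) * t + tau * s
            for t, s in zip(target_params, source_params)]


# Illustration with a single weight "layer":
target = [np.zeros(3)]
source = [np.ones(3)]
target = soft_update(target, source, tau=0.1)
```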
Imported from ryanjulian/rllab#26
Imported from ryanjulian/rllab#6
It will be the base for garage.torch.
Presently asynchronous plotting/3D rendering is not supported in the part of rllab based on TensorFlow (sandbox.rocky.tf), but it is supported in the rllab code which uses Theano.
This means that when you turn plotting on for a Theano training session, the plot does not block the training process. The TensorFlow implementation runs the rendering loop directly in the training algorithm (rather than a worker), so it blocks. This makes training using TensorFlow much slower than Theano when plotting is turned on (they are about the same without plotting).
TensorFlow's notion of a session makes this tricky. I'm not 100% sure that there is a solution. If you figure out that it is impossible, or requires rewriting large parts of the repository, email me with what you tried and some explanations why.
Current behavior
3D plotting with MuJoCo in TensorFlow is synchronous, blocks the training process.
Desired behavior
3D plotting with MuJoCo in TensorFlow is asynchronous and does not block the training process (just as with Theano)
To run a basic training algorithm which has 3D plots using Theano (you will need to pass plot=True to TRPO):
https://github.com/ryanjulian/rllab/blob/master/examples/trpo_swimmer.py
To run a basic training algorithm with TensorFlow:
https://github.com/ryanjulian/rllab/blob/master/sandbox/rocky/tf/launchers/trpo_cartpole.py
Note: this does not use a MuJoCo environment/3D plotting, but you just need to change the environment to SwimmerEnv() and it will.
Asynchronous plotter API used by the Theano code
https://github.com/ryanjulian/rllab/tree/master/rllab/plotter
Theano and TF implementations of BatchPolopt (calls the plotter)
Theano: https://github.com/ryanjulian/rllab/blob/master/rllab/algos/batch_polopt.py
TF: https://github.com/ryanjulian/rllab/blob/master/sandbox/rocky/tf/algos/batch_polopt.py
rllab implementation of Serializable (used to transfer objects across multiprocessing queues): https://github.com/ryanjulian/rllab/blob/master/rllab/core/serializable.py
Fork this repository into your own GitHub account, and implement this feature in its own branch, based on the master branch. When you are done and would like a code review, send me an email with a link to your feature branch. DO NOT SUBMIT A PULL REQUEST TO THIS REPOSITORY.
Consider this a professional software engineering task, and provide a high-quality solution which does not break existing users, minimizes change, and is stable. Tests are welcome where appropriate. Please always use PEP8 style in your code, and format it using YAPF (with the PEP8 setting).
Make sure your implementation works with the run_experiment_lite wrapper with the parameter n_parallel greater than 1 (this triggers multiprocess operation).
The original (Theano) tree lives in rllab/. The tree sandbox/rocky/tf re-implements classes from the original tree using TensorFlow, and is backwards-compatible with the Theano tree. We are working towards using only one NN library soon, but for now your implementation needs to work in both trees.
Imported from ryanjulian/rllab#1
Original paper:
https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
OpenAI baselines implementation:
https://github.com/openai/baselines/tree/master/baselines/deepq
(There are many more resources on Google with DQN implementations in TF)
Sketch:
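A core piece any DQN implementation needs is experience replay. A minimal dependency-free sketch (illustrative only, not the baselines API):

```python
import random


class ReplayBuffer:
    """Fixed-capacity FIFO buffer with uniform random sampling."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._storage = []
        self._next_idx = 0

    def add(self, transition):
        # transition is e.g. (obs, action, reward, next_obs, done)
        if len(self._storage) < self._capacity:
            self._storage.append(transition)
        else:
            self._storage[self._next_idx] = transition  # overwrite the oldest
        self._next_idx = (self._next_idx + 1) % self._capacity

    def sample(self, batch_size):
        return random.sample(self._storage, batch_size)

    def __len__(self):
        return len(self._storage)
```

The training loop samples minibatches from the buffer to decorrelate updates, with Q-targets computed from a periodically-synced target network.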
Imported from ryanjulian/rllab#50
We would like to add support for the Bullet physics engine to rllab. Thankfully, the Bullet team has recently provided Python bindings in the form of pybullet, and even provides examples of how to implement the gym.Env interface (from OpenAI Gym) using pybullet.
This task is to add pybullet to the rllab conda environment, and implement a class (similar to GymEnv, e.g. BulletEnv) which allows any rllab algorithm to learn against pybullet environments. You will also need to implement the plot interface, if pybullet does not already, which shows the user a 3D animation of the environment. Essentially, you should duplicate the experience of running one of the MuJoCo-based examples (e.g. trpo_swimmer.py), but using a Bullet environment instead. You should include examples (in examples/ and sandbox/rocky/tf/launchers/) of launcher scripts which use an algorithm (suggestion: TRPO) to train the KukaGymEnv environment.
This is conceptually the same as GymEnv, which allows rllab users to import any OpenAI Gym environment and learn against them. In fact, pybullet environments implement the Gym interface, so in theory we should be done as soon as we can import pybullet. In practice, our constructor for Gym environments only takes the string name (e.g. "Humanoid-v1") of a Gym environment, not the class of a Gym environment. The pybullet environments do not have string shortcuts because they are not part of the official Gym repository. Furthermore, we'd like to use other unofficial Gym environments in rllab, but it is currently difficult for the same reason.
So you might structure this task as two pull requests: (1) adding pybullet to the conda environment, and (2) modifying GymEnv to support arbitrary environments which implement the gym.Env interface (attempted in ryanjulian/rllab#12).
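The second pull request could be shaped roughly like this (a sketch against a stub class; the real GymEnv has additional responsibilities such as plotting and spec handling, and the constructor logic here is an assumption):

```python
class GymEnv:
    """Wrap either a registered Gym environment name or a gym.Env class."""

    def __init__(self, env):
        if isinstance(env, str):
            # Registered environments keep the old string interface,
            # e.g. GymEnv("Humanoid-v1").
            import gym
            self.env = gym.make(env)
        elif isinstance(env, type):
            # Unregistered environments (e.g. pybullet's KukaGymEnv) can be
            # passed as a class and are instantiated directly.
            self.env = env()
        else:
            self.env = env  # an already-constructed gym.Env instance


# Illustration with a stub class standing in for a pybullet env:
class StubBulletEnv:
    def reset(self):
        return 0


wrapped = GymEnv(StubBulletEnv)
```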
Consider this a professional software engineering task, and provide a high-quality solution which does not break existing users, minimizes change, and is stable. Please always use PEP8 style in your code, and format it using YAPF (with the PEP8 setting). Submit your pull request against the integration branch of this repository.
Some notes:
Make sure your implementation works with the run_experiment_lite wrapper.
The original (Theano) tree lives in rllab/. The tree sandbox/rocky/tf re-implements classes from the original tree using TensorFlow, and is backwards-compatible with the Theano tree. We are working towards using only one NN library soon, but for now your implementation needs to work in both trees.
Imported from ryanjulian/rllab#5
Imported from ryanjulian/rllab#8
The community has settled on gym.Env as the de-facto standard environment interface. There's no reason to keep our own around.
The scope of this change is to remove the rllab.envs.Env base interface, and refactor implementing classes to instead implement gym.Env. Note that this explicitly does not mean that we are adopting the physics engine, registration system, benchmarks, etc of OpenAI Gym--just the gym.Env abstract interface.
Imported from ryanjulian/rllab#85
Imported from ryanjulian/rllab#31
Right now the temporary solution to this issue is to prepend python examples/xxx.py with LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so:/usr/lib/nvidia-384/libGL.so; note that you must replace nvidia-384 with the version installed on your machine (use nvidia-smi to determine the driver version currently in use). Relevant comments are here and here, and this link is a wrapper written as a temporary fix.
However, the more permanent solution would require pre-loading without the use of LD_PRELOAD
on the command line. See DeepMind's implementation as a starting point.
Imported from ryanjulian/rllab#117
Options:
Important rules (running list):
Imported from ryanjulian/rllab#100
To add some details: distributions are used by policies and other modules to add distribution functionality, such as computing the KL divergence between two distributions, given the parameters of a distribution (which are often output tensors of an NN). tf.distributions probably implements the same thing, so we should try to replace our code with the TF counterpart.
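As a concrete example of the functionality in question, here is the KL divergence between two diagonal Gaussians, computed directly from NN-style outputs (means and log standard deviations) in plain numpy; the TF counterpart would come from tf.distributions:

```python
import numpy as np


def diag_gaussian_kl(mean1, log_std1, mean2, log_std2):
    """KL(N1 || N2) for diagonal Gaussians parameterized as NN outputs."""
    var1 = np.exp(2.0 * log_std1)
    var2 = np.exp(2.0 * log_std2)
    # Per-dimension closed form, summed over the dimensions.
    kl = (log_std2 - log_std1
          + (var1 + (mean1 - mean2) ** 2) / (2.0 * var2)
          - 0.5)
    return np.sum(kl)
```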
Imported from ryanjulian/rllab#52
Imported from ryanjulian/rllab#97
The current implementation of async plotting uses multithreading. Using multiprocessing instead throws segmentation faults on Mac OS X machines, but we want to prioritize multiprocessing on Linux machines. Therefore, write code that supports both implementations.
Given a Distribution(y|x), where x = [x_1, x_2, x_3], I should be able to define conditional distributions Dist(y|x_1), Dist(y|[x_1, x_3]), Dist(y|x_2), etc.
Imported from ryanjulian/rllab#3
Make it cleaner and simpler.
Imported from ryanjulian/rllab#75
This is more principled than stuffing dependency logic into shell scripts.
Imported from ryanjulian/rllab#78
Currently this is proxied by Policy.recurrent, but there are loss functions for non-recurrent policies which need fixed-length input trajectories/valid variables (i.e. any time you want to differentiate through the loss function).