
firedup's Introduction

Welcome to Fired Up in Deep RL!

Fired Up is a PyTorch port of OpenAI's Spinning Up. Spinning Up is an excellent educational resource produced by Josh Achiam, a research scientist at OpenAI, that makes it easier to learn about deep reinforcement learning (deep RL).

Installation

Fired Up requires Python 3, PyTorch, OpenAI Gym, and OpenMPI.

Fired Up is currently only supported on Linux and macOS. It may be possible to install it on Windows, but I haven't tested that platform.

Installing Python

We recommend installing Python through Anaconda. Anaconda is a Python distribution that includes many useful packages especially for scientific computing, as well as an environment manager called conda that makes package management simple.

Download and install Anaconda with Python 3.7 (at the time of writing, release 2018.12). Then create a conda environment for organizing the packages used by Fired Up:

conda create -n firedup python=3.7

To use Python from the environment you just created, activate the environment with:

source activate firedup

Alternatively, you can use virtualenv with your Python 3 installation. Install it via pip3, then create an environment:

virtualenv firedup

To activate this virtual environment, run:

source /path/to/firedup/bin/activate

Installing OpenMPI

Ubuntu

sudo apt update && sudo apt install libopenmpi-dev

Mac OS X

Installation of system packages on Mac requires Homebrew. With Homebrew installed, run the following:

brew install openmpi

Installing Fired Up

git clone https://github.com/kashif/firedup.git
cd firedup
pip install -e .

Fired Up defaults to installing everything in Gym except the MuJoCo environments.

Check Your Install

To see if you've successfully installed Fired Up, try running PPO in the LunarLander-v2 environment with:

python -m fireup.run ppo --hid "[32,32]" --env LunarLander-v2 --exp_name installtest --gamma 0.999

After it finishes training, watch a video of the trained policy with:

python -m fireup.run test_policy data/installtest/installtest_s0

And plot the results with:

python -m fireup.run plot data/installtest/installtest_s0

Algorithms

The following algorithms are implemented in the Fired Up package:

  • Vanilla Policy Gradient (VPG)
  • Trust Region Policy Optimization (TRPO)
  • Proximal Policy Optimization (PPO)
  • Deep Q-Network (DQN)
  • Deep Deterministic Policy Gradient (DDPG)
  • Twin Delayed DDPG (TD3)
  • Soft Actor-Critic (SAC)

They are all implemented with MLP (non-recurrent) actor-critics, making them suitable for fully-observed, non-image-based RL environments, e.g. the Gym MuJoCo environments.
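The shared actor-critic structure can be sketched roughly as below. This is an illustrative sketch only, not the package's actual classes (the names `mlp`, `MLPActorCritic`, and `step` are mine): a small MLP policy head and value head over a flat observation vector.

```python
import torch
import torch.nn as nn

def mlp(sizes, activation=nn.Tanh):
    # Build a feed-forward network from a list of layer widths.
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(activation())
    return nn.Sequential(*layers)

class MLPActorCritic(nn.Module):
    # Categorical policy plus value function over a flat observation.
    def __init__(self, obs_dim, n_actions, hidden_sizes=(32, 32)):
        super().__init__()
        self.pi = mlp([obs_dim, *hidden_sizes, n_actions])  # policy logits
        self.v = mlp([obs_dim, *hidden_sizes, 1])           # state value

    def step(self, obs):
        dist = torch.distributions.Categorical(logits=self.pi(obs))
        action = dist.sample()
        return action, dist.log_prob(action), self.v(obs).squeeze(-1)

# Shapes roughly matching LunarLander-v2 (8-dim obs, 4 actions):
ac = MLPActorCritic(obs_dim=8, n_actions=4)
obs = torch.randn(8)
action, logp, value = ac.step(obs)
```

Because both heads are plain MLPs over the raw observation, these implementations assume the environment's state is fully visible in a single flat vector, which is why image-based or partially observed tasks are out of scope.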

Citation

If you use Fired Up in your research please use the following BibTeX entry:

@misc{rasulfiredup,
  author =       {Kashif Rasul and Joshua Achiam},
  title =        {Fired Up},
  howpublished = {\url{https://github.com/kashif/firedup/}},
  year =         {2019}
}

firedup's People

Contributors

kashif


firedup's Issues

How can I easily modify the code?

Since this package has to be installed via pip install -e . before it works, modifying the code is inconvenient. Can I use it without installing?

ddpg torch error

Hi @kashif, thanks for making this available!

The DDPG implementation currently gives me this following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [300, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

I am on torch version

torch                             1.6.0
torchvision                       0.5.0
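This error usually means a backward() pass ran after an optimizer step had already modified, in place, parameters its graph still needs, a common ordering pitfall in DDPG-style updates on torch ≥ 1.5. A minimal sketch of the failing pattern, using stand-in networks (the names are illustrative, not firedup's):

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for an actor and critic (not firedup's classes).
actor = nn.Linear(3, 3)
critic = nn.Linear(3, 1)
q_opt = torch.optim.SGD(critic.parameters(), lr=0.1)

obs = torch.randn(5, 3)
target = torch.randn(5, 1)

pi_loss = -critic(actor(obs)).mean()            # actor loss flows through critic
q_loss = (critic(obs) - target).pow(2).mean()   # critic regression loss

q_loss.backward()
q_opt.step()                 # updates critic's weights in place ...
caught = False
try:
    pi_loss.backward()       # ... but this graph still needs their old values
except RuntimeError:
    caught = True            # "modified by an inplace operation"
```

The usual fix is to run every backward() before any optimizer step(), or to recompute the actor loss after the critic update.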

Performance Differences between Tensorflow and Pytorch

I cloned your repo, ran the VPG algorithm, and compared its performance with the TensorFlow version. Averaging over 5 runs to account for the random seed, I saw some interesting results:

TensorFlow: average episode return 81
PyTorch: average episode return 31

Why do you think this might be the case?

Disclaimer: I haven't read your code thoroughly, so there might be some very small mistake. But is the difference in performance of RL algorithms between TensorFlow and PyTorch really this substantial?

Are there any requirements for an env to work with this repo?

I tried this repo with a simple env:

import random

import gym
from gym import spaces


class SimpleEnv(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self):
        super(SimpleEnv, self).__init__()
        self.observation_space = spaces.Box(low=0, high=2, shape=(4, 4))
        self.action_space = spaces.Discrete(3)
        self.reset()

    def step(self, action):
        ob = self.observation_space.sample()
        reward = 1
        episode_over = random.random() <= 0.5
        return ob, reward, episode_over, {}

    def reset(self):
        ob = self.observation_space.sample()
        return ob

    def render(self, mode='human'):
        pass

and use it with the policy gradient agent as:

    env = SimpleEnv()
    env.seed(0)
    ac_kwargs = dict(hidden_sizes=(16,))
    agent = vpg(env, ac_kwargs=ac_kwargs)
    episode_count = 100
    reward = 0
    done = False

    for i in range(episode_count):
        ob = env.reset()
        while True:
            print(done)
            action = agent.act(ob, reward, done)
            ob, reward, done, _ = env.step(action)
            if done:
                break

But when I run this I get:

RuntimeError: size mismatch, m1: [1 x 16], m2: [4 x 16] at /opt/conda/conda-bld/pytorch-cpu_1549626403278/work/aten/src/TH/generic/THTensorMath.cpp:940

This seems to be caused by a mismatch between the observation space and the actor-critic network, but it works fine with envs provided by Gym. Did I miss something here?
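The MLP networks here are built for flat, 1-D observation vectors; the error above is consistent with the network's input width being read from observation_space.shape[0] (which is 4 for a (4, 4) Box) while the flattened observation has 16 elements. One workaround is to declare the space as shape=(16,) and flatten each observation, sketched here with plain numpy:

```python
import numpy as np

# An observation sampled from a Box(low=0, high=2, shape=(4, 4)) space,
# mimicked here with plain numpy:
ob = np.random.uniform(low=0, high=2, size=(4, 4)).astype(np.float32)

# Flattening yields the 1-D vector an MLP input layer expects; declaring
# the space as spaces.Box(low=0, high=2, shape=(16,)) keeps it consistent.
flat_ob = ob.reshape(-1)
print(flat_ob.shape)   # (16,)
```

With the space declared as shape=(16,) and observations returned flattened, the actor-critic's input layer and the sampled observations agree on a width of 16.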
