ElegantRL “小雅”: Scalable and Elastic Deep Reinforcement Learning


ElegantRL is developed for researchers and practitioners and offers the following advantages:

  • Lightweight: the core code has fewer than 1,000 lines (see elegantrl/tutorial) and uses PyTorch (training), OpenAI Gym (environments), NumPy, and Matplotlib (plots).

  • Efficient: in many test cases, we find it more efficient than Ray RLlib.

  • Stable: much more stable than Stable Baselines 3. Stable Baselines 3 can only use a single GPU, whereas ElegantRL can use 1~8 GPUs for stable training.

ElegantRL implements the following model-free deep reinforcement learning (DRL) algorithms:

  • DDPG, TD3, SAC, PPO, PPO (GAE) for continuous actions
  • DQN, DoubleDQN, D3QN for discrete actions

For the details of DRL algorithms, please check out the educational webpage OpenAI Spinning Up.

Framework

An agent (agent.py) with Actor-Critic networks (net.py) is trained (run.py) by interacting with an environment (env.py).

A high-level overview:

  • 1) Instantiate an environment in env.py, and an agent in agent.py with an Actor network and a Critic network in net.py;
  • 2) In each training step in run.py, the agent interacts with the environment, generating transitions that are stored in a Replay Buffer;
  • 3) The agent fetches a batch of transitions from the Replay Buffer to train its networks;
  • 4) After each update, an evaluator evaluates the agent's performance (e.g., fitness score or cumulative return) and saves the agent if the performance is good.
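
Steps 2 and 3 depend on a buffer that stores transitions and returns random batches. The following is a minimal sketch of that idea using only the Python standard library. `SimpleReplayBuffer` and its methods are hypothetical names chosen for illustration; this is not ElegantRL's own ReplayBuffer, which may be organized quite differently.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# A transition is (state, action, reward, next_state, done).
Transition = Tuple[List[float], int, float, List[float], bool]


class SimpleReplayBuffer:
    """Hypothetical fixed-capacity buffer; not ElegantRL's ReplayBuffer."""

    def __init__(self, max_len: int) -> None:
        # A deque with maxlen drops the oldest transition once it is full.
        self.memory: Deque[Transition] = deque(maxlen=max_len)

    def append(self, transition: Transition) -> None:
        self.memory.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement.
        return random.sample(list(self.memory), batch_size)

    def __len__(self) -> int:
        return len(self.memory)


# Store 150 fake transitions in a buffer that holds at most 100.
buffer = SimpleReplayBuffer(max_len=100)
for step in range(150):
    state = [float(step)]
    next_state = [float(step + 1)]
    buffer.append((state, 0, 1.0, next_state, False))

# Fetch a batch, as the agent would before a network update.
batch = buffer.sample(batch_size=32)
```

Because the capacity is fixed, only the 100 most recent transitions remain available for sampling in this sketch.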

Code Structure

Core Codes

  • elegantrl/net.py         # Neural networks.
    • Q-Net,
    • Actor network,
    • Critic network,
  • elegantrl/agent.py   # RL algorithms.
    • AgentBase,
  • elegantrl/run.py       # Runs Demos 1~4.
    • Parameter initialization,
    • Training loop,
    • Evaluator.

Utility Codes

  • elegantrl/envs/      # gym env or custom env, including FinanceStockEnv.
    • gym_utils.py: A PreprocessEnv class for gym-environment modification.
    • Stock_Trading_Env: A self-created stock trading environment as an example for user customization.
  • eRL_demo_BipedalWalker.ipynb        # BipedalWalker-v2 in a Jupyter notebook.
  • eRL_demos.ipynb      # Demos 1~4 in a Jupyter notebook. Shows how to use the tutorial version and the advanced version.
  • eRL_demo_SingleFilePPO.py      # Trains PPO from a single file; simpler than the tutorial version.
  • eRL_demo_StockTrading.py      # Stock trading application in Jupyter notebooks.

Start to Train

Initialization:

  • hyper-parameters args.
  • env = PreprocessEnv() : creates an environment (in the OpenAI gym format).
  • agent = agent.XXX() : creates an agent for a DRL algorithm.
  • buffer = ReplayBuffer() : stores the transitions.
  • evaluator = Evaluator() : evaluates and stores the trained model.

Training (a while-loop):

  • agent.explore_env(…): the agent explores the environment within target steps, generates transitions, and stores them into the ReplayBuffer.
  • agent.update_net(…): the agent uses a batch from the ReplayBuffer to update the network parameters.
  • evaluator.evaluate_save(…): evaluates the agent's performance and keeps the trained model with the highest score.

The while-loop terminates when a stopping condition is met, e.g., the agent reaches a target score, the maximum number of steps is reached, or the user breaks manually.
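
The control flow described above can be sketched as follows. `ToyAgent` and `ToyEvaluator` are hypothetical stand-ins with no neural networks and no real environment; only the method names `explore_env`, `update_net`, and `evaluate_save` are taken from the list above, and their signatures here are assumptions rather than ElegantRL's actual API.

```python
import random


class ToyAgent:
    """Hypothetical stand-in for a DRL agent."""

    def __init__(self) -> None:
        self.skill = 0.0  # stands in for the network parameters

    def explore_env(self, target_step: int) -> list:
        # Produce `target_step` fake transitions (just noisy rewards here).
        return [self.skill + random.uniform(-0.1, 0.1) for _ in range(target_step)]

    def update_net(self, buffer: list, batch_size: int) -> None:
        # Sample a batch and pretend that it improves the policy a little.
        batch = random.sample(buffer, min(batch_size, len(buffer)))
        self.skill += 0.01 * len(batch)


class ToyEvaluator:
    """Hypothetical evaluator that tracks the best score and decides when to stop."""

    def __init__(self, target_score: float, max_step: int) -> None:
        self.target_score = target_score
        self.max_step = max_step
        self.best_score = float("-inf")
        self.total_step = 0

    def evaluate_save(self, agent: ToyAgent, steps: int) -> bool:
        self.total_step += steps
        score = agent.skill
        if score > self.best_score:
            self.best_score = score  # a real evaluator would save the model here
        reached_target = self.best_score >= self.target_score
        out_of_steps = self.total_step >= self.max_step
        return reached_target or out_of_steps


def train(target_score=1.0, max_step=10_000, target_step=100, batch_size=32):
    agent = ToyAgent()
    buffer = []
    evaluator = ToyEvaluator(target_score, max_step)
    if_stop = False
    while not if_stop:
        buffer.extend(agent.explore_env(target_step))         # explore
        agent.update_net(buffer, batch_size)                   # update
        if_stop = evaluator.evaluate_save(agent, target_step)  # evaluate
    return evaluator


evaluator = train()
print(evaluator.best_score, evaluator.total_step)
```

Keeping the stopping decision inside the evaluator means the loop body stays the same whichever condition ends training.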

Experiment

Experiment 1 Comparisons with benchmark algorithms

Experiment 2 Self-comparisons

It is easy to run the

Experimental Demos

LunarLanderContinuous-v2

LunarLanderTwinDelay3

BipedalWalkerHardcore-v2

Note: BipedalWalkerHardcore is a difficult task in a continuous action space. Only a few RL implementations can reach the target reward. Check out an experiment video: Crack the BipedalWalkerHardcore-v2 with total reward 310 using IntelAC.

Requirements

Necessary:
| Python 3.6+     |
| PyTorch 1.6+    |

Not necessary:
| NumPy 1.18+     | For ReplayBuffer. NumPy will be installed along with PyTorch.
| gym 0.17.0      | For env. Gym provides tutorial envs for DRL training. (env.render() has a bug with gym==0.18 and pyglet==1.6; use gym==0.17.0 and pyglet==1.5 instead.)
| pybullet 2.7+   | For env. We use PyBullet (free) as an alternative to MuJoCo (not free).
| box2d-py 2.3.8  | For gym. Use pip install Box2D (instead of box2d-py).
| matplotlib 3.2  | For plots.

pip3 install gym==0.17.0 pybullet Box2D matplotlib
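
After installing, it can help to confirm which versions actually ended up in the environment. The snippet below is a small sketch that relies on `importlib.metadata`, which requires Python 3.8 or newer (later than the 3.6 minimum listed above). The helper name `installed_version` and the list of distribution names are assumptions made for illustration.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names assumed to match the requirements listed above.
packages = ["torch", "numpy", "gym", "pybullet", "Box2D", "matplotlib"]
report = {name: installed_version(name) for name in packages}

for name, version in report.items():
    print(f"{name:12s} {version or 'not installed'}")
```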

Citation:

To cite this repository:

@misc{erl,
  author = {Liu, Xiao-Yang and Li, Zechu and Wang, Zhaoran and Zheng, Jiahao},
  title = {{ElegantRL}: A Scalable and Elastic Deep Reinforcement Learning Library},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/AI4Finance-Foundation/ElegantRL}},
}
