
XingTian


Introduction

License: MIT

XingTian (刑天) is a componentized library for the development and verification of reinforcement learning algorithms. It supports multiple algorithms, including DQN, DDPG, PPO, and IMPALA, and can train agents in multiple environments, such as Gym, Atari, Torcs, StarCraft II, and so on. To meet users' requirements for quickly verifying and solving RL problems, four modules are abstracted: Algorithm, Model, Agent, and Environment. They combine in a way similar to Lego building blocks. For details about the architecture, please see the Architecture introduction.
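To illustrate how the four modules compose, here is a minimal sketch. All class and method names below are hypothetical stand-ins, not XingTian's actual API; see the Architecture introduction for the real interfaces.

```python
# Hypothetical sketch of the Algorithm/Model/Agent/Environment split.
# None of these names come from XingTian itself.
import random


class Environment:
    """Toy 1-D environment standing in for Gym/Atari/Torcs/etc."""

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += action
        reward = 1 if self.state > 0 else 0
        done = abs(self.state) >= 5  # episode ends at the boundary
        return self.state, reward, done


class Model:
    """Policy parameters; a real Model would wrap a TF/PyTorch network."""

    def predict(self, state):
        return random.choice([-1, 1])  # random policy for illustration


class Agent:
    """Glues a Model to an Environment to collect experience."""

    def __init__(self, model, env):
        self.model, self.env = model, env

    def run_episode(self):
        state, done, total_reward = self.env.reset(), False, 0
        while not done:
            state, reward, done = self.env.step(self.model.predict(state))
            total_reward += reward
        return total_reward


episode_return = Agent(Model(), Environment()).run_episode()
```

An Algorithm module would sit alongside these, consuming the collected experience to update the Model; swapping any one block (say, the Environment) leaves the others untouched, which is the point of the Lego-style design.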

Dependencies

# ubuntu 18.04
sudo apt-get install python3-pip libopencv-dev -y
pip3 install opencv-python

# run with tensorflow 1.15.0 or tensorflow 2.3.1
pip3 install zmq h5py gym[atari] tqdm imageio matplotlib==3.0.3 Ipython pyyaml tensorflow==1.15.0 pyarrow lz4 fabric2 absl-py psutil tensorboardX setproctitle

Alternatively, run `pip3 install -r requirements.txt`.

If you want to use PyTorch as the backend, please install it yourself. See the PyTorch documentation.

Installation

# cd PATH/TO/XingTian 
pip3 install -e .

After installation, you can run `import xt; print(xt.__version__)` to check whether the installation was successful.

In [1]: import xt

In [2]: xt.__version__
Out[2]: '0.3.0'

Quick Start


Setup configuration

The following configuration shows a minimal example with the CartPole environment. A more detailed description of the agent, algorithm, and environment parameters can be found in the User Guide.

alg_para:
  alg_name: PPO
  alg_config:
    process_num: 1
    save_model: True  # default False
    save_interval: 100

env_para:
  env_name: GymEnv
  env_info:
    name: CartPole-v0
    vision: False

agent_para:
  agent_name: PPO
  agent_num: 1
  agent_config:
    max_steps: 200
    complete_step: 1000000
    complete_episode: 3550

model_para:
  actor:
    model_name: PpoMlp
    state_dim: [4]
    action_dim: 2
    input_dtype: float32
    model_config:
      BATCH_SIZE: 200
      CRITIC_LOSS_COEF: 1.0
      ENTROPY_LOSS: 0.01
      LR: 0.0003
      LOSS_CLIPPING: 0.2
      MAX_GRAD_NORM: 5.0
      NUM_SGD_ITER: 8
      SUMMARY: False
      VF_SHARE_LAYERS: False
      activation: tanh
      hidden_sizes: [64, 64]

env_num: 10

In addition, you can find more configuration sets in the examples directory.
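Because the configuration is plain YAML, it can be loaded and sanity-checked with PyYAML (which is among the listed dependencies) before launching a run. A small sketch, with an inline string mirroring part of the CartPole example above; in practice you would read examples/cartpole_ppo.yaml from disk:

```python
# Load and sanity-check a training configuration with PyYAML.
import yaml

config_text = """
alg_para:
  alg_name: PPO
  alg_config:
    process_num: 1
env_para:
  env_name: GymEnv
  env_info:
    name: CartPole-v0
env_num: 10
"""

config = yaml.safe_load(config_text)
assert config["alg_para"]["alg_name"] == "PPO"
assert config["env_para"]["env_info"]["name"] == "CartPole-v0"
print(config["env_num"])  # → 10
```

Catching a typo in a key name here is much cheaper than discovering it mid-training.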

Start training task

python3 xt/main.py -f examples/cartpole_ppo.yaml -t train

img

Evaluate local trained model

Set `benchmark.eval.model_path` in YOUR_CONFIG_FILE.yaml for evaluation:

benchmark:
  eval:
    model_path: /YOUR/PATH/TO/EVAL/models
    gap: 10           # index gap of eval model
    evaluator_num: 1  # the number of evaluator instance

# run command
python3 xt/main.py -f examples/cartpole_ppo.yaml -t evaluate

NOTE: XingTian starts with `-t train` by default.

Run with CLI

# You can replace `python3 xt/main.py` with the `xt_main` command!
xt_main -f examples/cartpole_ppo.yaml -t train

# train with evaluate
xt_main -f examples/cartpole_ppo.yaml -t train_with_evaluate

Develop with Custom case

  1. Write your custom module and register it. More detailed guidance on custom modules can be found in the Developer Guide.
  2. Add YOUR-CUSTOM-MODULE's name to your_train_configure.yaml.
  3. Start training with xt_main -f path/to/your_train_configure.yaml :)
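The register-then-reference flow in steps 1 and 2 can be illustrated with a minimal registry. This is a generic sketch of the pattern; XingTian's actual decorator and class names are documented in the Developer Guide.

```python
# Generic register-then-lookup pattern: a decorator records a class under
# the name used in the YAML config, and the framework resolves that name
# at startup. All names here are illustrative, not XingTian's real API.
MODULE_REGISTRY = {}


def register(name):
    """Decorator that records a class under its config-file name."""
    def wrapper(cls):
        MODULE_REGISTRY[name] = cls
        return cls
    return wrapper


@register("MyCustomEnv")
class MyCustomEnv:
    """A stand-in for a user-defined environment module."""

    def reset(self):
        return 0


# At startup, the framework would look up the name that appears in
# your_train_configure.yaml and instantiate the registered class:
env_cls = MODULE_REGISTRY["MyCustomEnv"]
env = env_cls()
print(env.reset())  # → 0
```

This is why step 2 only needs a string in the YAML file: registration makes the class discoverable by name, so no import changes are required in the framework itself.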

Reference Results

Episode Reward Average

  1. DQN reward after 10M time-steps (40M frames).

     | env | XingTian Basic DQN | RLlib Basic DQN | Hessel et al. DQN |
     | --- | ---: | ---: | ---: |
     | BeamRider | 6706 | 2869 | ~2000 |
     | Breakout | 352 | 287 | ~150 |
     | QBert | 14087 | 3921 | ~4000 |
     | SpaceInvaders | 947 | 650 | ~500 |

  2. PPO reward after 10M time-steps (40M frames).

     | env | XingTian PPO | RLlib PPO | Baselines PPO |
     | --- | ---: | ---: | ---: |
     | BeamRider | 4877 | 2807 | ~1800 |
     | Breakout | 341 | 104 | ~250 |
     | QBert | 14771 | 11085 | ~14000 |
     | SpaceInvaders | 1025 | 671 | ~800 |

  3. IMPALA reward after 10M time-steps (40M frames).

     | env | XingTian IMPALA | RLlib IMPALA |
     | --- | ---: | ---: |
     | BeamRider | 2313 | 2071 |
     | Breakout | 334 | 385 |
     | QBert | 12205 | 4068 |
     | SpaceInvaders | 742 | 719 |

Throughput

  1. DQN

     | env | XingTian Basic DQN | RLlib Basic DQN |
     | --- | ---: | ---: |
     | BeamRider | 129 | 109 |
     | Breakout | 117 | 113 |
     | QBert | 111 | 90 |
     | SpaceInvaders | 115 | 100 |

  2. PPO

     | env | XingTian PPO | RLlib PPO |
     | --- | ---: | ---: |
     | BeamRider | 2422 | 1618 |
     | Breakout | 2497 | 1535 |
     | QBert | 2436 | 1617 |
     | SpaceInvaders | 2438 | 1608 |

  3. IMPALA

     | env | XingTian IMPALA | RLlib IMPALA |
     | --- | ---: | ---: |
     | BeamRider | 8756 | 3637 |
     | Breakout | 8814 | 3525 |
     | QBert | 8249 | 3471 |
     | SpaceInvaders | 8463 | 3555 |

Experiment setup: 72-core Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz with a single Tesla V100.

Ray's reward data come from https://github.com/ray-project/rl-experiments, and throughput was measured with Ray 0.8.6 under the same machine conditions.

Acknowledgement

XingTian refers to the following projects: DeepMind/scalable_agent, baselines, ray.

License

The MIT License(MIT)

XingTian's People

Contributors

hustqj, kevinlu0123, lijiahui08, tianqi-777, tongxialiang, zgfhill


XingTian's Issues

Several questions about the simulator

Hello, I have a few questions I hope you can answer.
1. Can XingTian run on Windows?
2. The paper "A Hierarchical Reinforcement Learning Based Optimization Framework for Large-scale Dynamic Pickup and Delivery Problems" mentions a DPDP simulator. Where can I find the hierarchical agent algorithm described in that paper?

One question about the simulator of `DPDP Competition`.

What is the function of the file output_route.json?

It looks like output_destination.json alone could support finishing all the plans, since it contains the order information (the items, the pickup and delivery factories) and, of course, the vehicle itself.

Or is the vehicle's destination automatically updated along the planned route once it reaches its current destination and completes the pickup?

On customizing models via YAML

1. When customizing a model through YAML parameters, wouldn't usability be better if the number of convolutional or fully-connected layers could also be customized?
2. An LSTM module does not seem to be implemented.

What features do you want to add to XingTian?

We would like to know what you want to add to XingTian; we will evaluate your suggestions and make a plan. Please let us know if you are interested.

  • add detailed user-defined module guidance
  • add support for distributions over continuous & discrete action spaces
  • add support for model-based algorithms, e.g. MuZero
  • add support for multi-agent algorithms, e.g. QMIX
  • add support for evolutionary algorithms, e.g. PBT
  • add support for DaVinci
  • add support for training with multiple GPUs
  • add support for calling XingTian from user Python code

Enabling multiprocessing has no effect

Following the user.cn.md file, I set the parameters:
alg_config:
  process_num: 1  # whether to enable multiprocessing for training (work in progress)
I tried changing process_num while training CartPole, but performance did not improve. Is this feature still incomplete?

Throughput does not increase with more explorers

Hello:
I reproduced XingTian's DQN algorithm on Qbert, CartPole, and LunarLander, running the default YAML configurations and changing only env_num (the number of parallel explorer instances on a single node). I found that throughput did not grow as the number of explorers increased. My guess is a communication-efficiency problem: each explorer sends data after every sampled step, and multiple explorers contending for a lock may keep throughput from growing as expected.

Problem encountered: increasing the number of explorers on a single node does not raise throughput.

Experimental data: 1 explorer, throughput 259; 2 explorers, 355; 4 explorers, 300; 8 explorers, 311.

How should I configure things to get a near-linear (or at least good) throughput speedup? Looking forward to your reply, thank you!

Use of custom environment and agents

I am interested in using XingTian for multi-agent training with the PPO algorithm in the SMARTS environment. An example of using the SMARTS environment is available here.

Could you provide detailed step-by-step instructions and an example of how to use XingTian with our own custom environment for multi-agent training?
