
Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform

Cleanba is a CleanRL-style implementation of DeepMind's Sebulba distributed training platform, with a few different design choices that make distributed RL more reproducible and transparent to use.

Warning: This repo is intended for archival purposes. Once the codebase is stable, we will move it to CleanRL for future maintenance.

Highlights

  1. Strong performance: Cleanba's IMPALA and PPO achieve about 165% median HNS in Atari with sticky actions, matching monobeast IMPALA's 165% median HNS and outperforming moolib IMPALA's 140% median HNS.
  2. Short training time: Under the 1 GPU 10 CPU setting, Cleanba's IMPALA is 6.8x faster than monobeast's IMPALA and 1.2x faster than moolib's IMPALA. Under a max-specification setting, Cleanba's IMPALA (8 GPU and 40 CPU) is 2x faster than moolib's IMPALA (8 GPU and 80 CPU) and 5x faster than monobeast's IMPALA (1 GPU and 80 CPU).
  3. Highly reproducible: Cleanba shows predictable and reproducible learning curves across 1 and 8 GPU settings given the same set of hyperparameters, whereas moolib's learning curves can be considerably different, even if hyperparameters are controlled to be the same.

  4. Understandable: We adopt the single-file implementation philosophy used in CleanRL, making our core codebase succinct and easy to understand. For example, our cleanba/cleanba_ppo.py is ~800 lines of code.

Get started

Prerequisites: Python and Poetry (the installation commands below assume Poetry is available); the GPU install of JAX below targets CUDA 11 and cuDNN 8.2.

Installation:

poetry install
poetry run pip install --upgrade "jax[cuda11_cudnn82]==0.4.8" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
poetry run python cleanba/cleanba_ppo.py
poetry run python cleanba/cleanba_ppo.py --help
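
After installing, it may help to confirm that the CUDA-enabled JAX is actually active before launching a run. A minimal sanity check (run inside the same poetry environment) could look like this:

```python
# Quick sanity check that the CUDA-enabled jaxlib is active.
import jax

print(jax.__version__)   # expect 0.4.8 with the pinned install above
print(jax.devices())     # should list GPU devices, not just CPU
```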

Experiments:

Let us use a0-l1,2,3-d1 to denote our setups, where a0 means the actor runs on GPU 0, l1,2,3 means the learner runs on GPUs 1, 2, and 3, and d1 means the computation is distributed 1 time. Here are some common setups. You can also run the commands with --track to track the experiments with Weights & Biases.

# a0-l0-d1: single GPU
python cleanba/cleanba_ppo.py --actor-device-ids 0 --learner-device-ids 0 --local-num-envs 60 --track
# a0-l0,1-d1: two GPUs
python cleanba/cleanba_ppo.py --actor-device-ids 0 --learner-device-ids 0 1 --local-num-envs 60
# a0-l1,2-d1: three GPUs
python cleanba/cleanba_ppo.py --actor-device-ids 0 --learner-device-ids 1 2 --local-num-envs 60
# a0-l1,2,3-d1: four GPUs
python cleanba/cleanba_ppo.py --actor-device-ids 0 --learner-device-ids 1 2 3
# a0-l1,2,3,4-d1: five GPUs
python cleanba/cleanba_ppo.py --actor-device-ids 0 --learner-device-ids 1 2 3 4 --local-num-envs 60
# a0-l1,2,3,4,5,6-d1: seven GPUs
python cleanba/cleanba_ppo.py --actor-device-ids 0 --learner-device-ids 1 2 3 4 5 6 --local-num-envs 60

# a0-l1,2,3-d2: 8 GPUs (the 4-GPU setup above, distributed 2 times)
# execute them in separate terminals; here we assume all 8 GPUs are on the same machine
# however, it is possible to scale to hundreds of GPUs via `jax.distributed`
CUDA_VISIBLE_DEVICES="0,1,2,3" SLURM_JOB_ID=26017 SLURM_STEP_NODELIST=localhost SLURM_NTASKS=2 SLURM_PROCID=0 SLURM_LOCALID=0 SLURM_STEP_NUM_NODES=2 python cleanba/cleanba_ppo.py --distributed --actor-device-ids 0 --learner-device-ids 1 2 3 --local-num-envs 60
CUDA_VISIBLE_DEVICES="4,5,6,7" SLURM_JOB_ID=26017 SLURM_STEP_NODELIST=localhost SLURM_NTASKS=2 SLURM_PROCID=1 SLURM_LOCALID=0 SLURM_STEP_NUM_NODES=2 python cleanba/cleanba_ppo.py --distributed --actor-device-ids 0 --learner-device-ids 1 2 3 --local-num-envs 60

# if you have Slurm, it's possible to run the following
python -m cleanrl_utils.benchmark \
    --env-ids Breakout-v5 \
    --command "poetry run python cleanrl/cleanba_ppo.py --distributed --learner-device-ids 1 2 3 --local-num-envs 60 --track --save-model --upload-model" \
    --num-seeds 1 \
    --workers 1 \
    --slurm-gpus-per-task 4 \
    --slurm-ntasks 2 \
    --slurm-nodes 1 \
    --slurm-template-path cleanba.slurm_template
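
The multi-process runs above rely on `jax.distributed`, which the script initializes from SLURM-style environment variables (or equivalent arguments). As a rough sketch of what that initialization amounts to for the two-process, 8-GPU example above (the coordinator address and exact call sites here are illustrative, not cleanba's actual code):

```python
# Illustrative sketch of multi-process JAX initialization (not cleanba's exact code).
import jax

# Each process calls initialize() with the same coordinator address; when the
# SLURM variables shown above are set, jax.distributed can also infer these values.
jax.distributed.initialize(
    coordinator_address="localhost:6379",  # placeholder address/port
    num_processes=2,
    process_id=0,  # 0 or 1, depending on which terminal launched the process
)

print(jax.process_index())   # which of the 2 processes this is
print(jax.local_devices())   # the 4 GPUs visible to this process
print(jax.devices())         # all 8 GPUs across both processes
```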

Reproduction of all of our results.

Please see benchmark.sh for the commands to reproduce all of our results.

The commands to reproduce the TPU experiments can be found in tpu.sh. Here is a video demonstrating the orchestration of TPU experiments.

Screen.Recording.2023-03-23.at.9.31.58.PM.mov

Here are some runtime numbers, obtained with an earlier version of the codebase, for different hardware settings (GPUs and TPUs).

| Setup (hardware) | Runtime (minutes) in Breakout-v5 |
|---|---|
| baseline (8 A100) | 30.4671 |
| a0_l0_d1 (1 A100) | 154.079 |
| a0_l0_d2 (2 A100) | 93.3155 |
| a0_l1_d1 (2 A100) | 121.107 |
| a0_l0,1_d1 (2 A100) | 101.63 |
| a0_l1,2_d1 (3 A100) | 70.2993 |
| a0_l1,2,3_d1 (4 A100) | 52.5321 |
| a0_l0_d4 (4 A100) | 58.4344 |
| a0_l1,2,3,4_d1 (5 A100) | 44.8671 |
| a0_l1,2,3,4,5,6_d1 (7 A100) | 38.4216 |
| a0_l1,2,3,4,5,6_d1 (7 TPUv3-8 cores) | 124.397 |
| a0_l1,2_d1 (6 TPUv4 cores) | 44.4206 |
| a0_l1_d1 (4 TPUv4 cores) | 54.6161 |
| a0_l1_d2 (8 TPUv4 cores) | 33.1134 |

Detailed performance

The complete learning curves can be found in the static/cleanba folder. static/cleanba/plot.sh contains the plotting script.

Acknowledgements

We thank

Citation

@misc{huang2023cleanba,
      title={Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform}, 
      author={Shengyi Huang and Jiayi Weng and Rujikorn Charakorn and Min Lin and Zhongwen Xu and Santiago Ontañón},
      year={2023},
      eprint={2310.00036},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Contributors

51616, chufansuki, frasermince, vwxyzjn


cleanba's Issues

Segfault

python -m cleanrl_utils.benchmark \
    --env-ids Breakout-v5 \
    --command "poetry run python cleanba/cleanba_ppo_envpool_impala_atari_wrapper.py --exp-name cleanba_ppo_envpool_impala_atari_wrapper_a0_l1+2+3_d32 --distributed --total-timesteps 100000000 --anneal-lr False --learner-device-ids 1 2 3 --track --wandb-project-name cleanba" \
    --num-seeds 1 \
    --workers 3 \
    --slurm-gpus-per-task 4 \
    --slurm-ntasks 32 \
    --slurm-total-cpus 960 \
    --slurm-template-path cleanba.slurm_template

Produces the following error, but it's not always reproducible.

Running task 0 with env_id: Breakout-v5 and seed: 1
wandb: Currently logged in as: costa-huang (openrlbenchmark). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.13.10
wandb: Run data is saved locally in /admin/home-costa/cleanba/wandb/run-20230307_160455-ir6iswuf
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run Breakout-v5__cleanba_ppo_envpool_impala_atari_wrapper_a0_l1+2+3_d32__1__455adc4b-68f2-49b5-b08b-85cead7657d8
wandb: ⭐️ View project at https://wandb.ai/openrlbenchmark/cleanba
wandb: 🚀 View run at https://wandb.ai/openrlbenchmark/cleanba/runs/ir6iswuf
srun: error: ip-26-0-141-178: task 15: Segmentation fault
srun: error: ip-26-0-141-217: task 16: Segmentation fault
srun: error: ip-26-0-141-217: task 17: Segmentation fault
2023-03-07 16:06:30.684022: E external/org_tensorflow/tensorflow/tsl/distributed_runtime/coordination/coordination_service.cc:956] /job:jax_worker/replica:0/task:15 has been set to ERROR in coordination service: UNAVAILABLE: Task /job:jax_worker/replica:0/task:15 heartbeat timeout. This indicates that the remote task has failed, got preempted, or crashed unexpectedly. [type.googleapis.com/tensorflow.CoordinationServiceError='']
2023-03-07 16:06:30.684105: E external/org_tensorflow/tensorflow/tsl/distributed_runtime/coordination/coordination_service.cc:956] /job:jax_worker/replica:0/task:16 has been set to ERROR in coordination service: UNAVAILABLE: Task /job:jax_worker/replica:0/task:16 heartbeat timeout. This indicates that the remote task has failed, got preempted, or crashed unexpectedly. [type.googleapis.com/tensorflow.CoordinationServiceError='']
2023-03-07 16:06:30.684126: E external/org_tensorflow/tensorflow/tsl/distributed_runtime/coordination/coordination_service.cc:956] /job:jax_worker/replica:0/task:17 has been set to ERROR in coordination service: UNAVAILABLE: Task /job:jax_worker/replica:0/task:17 heartbeat timeout. This indicates that the remote task has failed, got preempted, or crashed unexpectedly. [type.googleapis.com/tensorflow.CoordinationServiceError='']
2023-03-07 16:06:30.684140: E external/org_tensorflow/tensorflow/tsl/distributed_runtime/coordination/coordination_service.cc:411] Stopping coordination service as heartbeat has timed out for /job:jax_worker/replica:0/task:15 and there is no service-to-client connection
2023-03-07 16:07:19.683537: E external/org_tensorflow/tensorflow/tsl/distributed_runtime/coordination/coordination_service_agent.cc:711] Coordination agent is in ERROR: INVALID_ARGUMENT: Unexpected task request with task_name=/job:jax_worker/replica:0/task:0
Additional GRPC error information from remote target unknown_target_for_coordination_leader:
:{"created":"@1678205239.683296695","description":"Error received from peer ipv4:26.0.141.128:61939","file":"external/com_github_grpc_grpc/src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Unexpected task request with task_name=/job:jax_worker/replica:0/task:0","grpc_status":3} [type.googleapis.com/tensorflow.CoordinationServiceError='']
2023-03-07 16:07:19.683623: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/distributed/client.cc:452] Coordination service agent in error status: INVALID_ARGUMENT: Unexpected task request with task_name=/job:jax_worker/replica:0/task:0
Additional GRPC error information from remote target unknown_target_for_coordination_leader:
:{"created":"@1678205239.683296695","description":"Error received from peer ipv4:26.0.141.128:61939","file":"external/com_github_grpc_grpc/src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Unexpected task request with task_name=/job:jax_worker/replica:0/task:0","grpc_status":3} [type.googleapis.com/tensorflow.CoordinationServiceError='']
2023-03-07 16:07:19.683670: F external/org_tensorflow/tensorflow/compiler/xla/pjrt/distributed/client.h:75] Terminating process because the coordinator detected missing heartbeats. This most likely indicates that another task died; see the other task logs for more details. Status: INVALID_ARGUMENT: Unexpected task request with task_name=/job:jax_worker/replica:0/task:0
Additional GRPC error information from remote target unknown_target_for_coordination_leader:
:{"created":"@1678205239.683296695","description":"Error received from peer ipv4:26.0.141.128:61939","file":"external/com_github_grpc_grpc/src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Unexpected task request with task_name=/job:jax_worker/replica:0/task:0","grpc_status":3} [type.googleapis.com/tensorflow.CoordinationServiceError='']
2023-03-07 16:07:20.194646: E external/org_tensorflow/tensorflow/tsl/distributed_runtime/coordination/coordination_service_agent.cc:711] Coordination agent is in ERROR: UNAVAILABLE: failed to connect to all addresses
Additional GRPC error information from remote target unknown_target_for_coordination_leader:
:{"created":"@1678205240.194404984","description":"Failed to pick subchannel","file":"external/com_github_grpc_grpc/src/core/ext/filters/client_channel/client_channel.cc","file_line":3940,"referenced_errors":[{"created":"@1678205240.190710821","description":"failed to connect to all addresses","file":"external/com_github_grpc_grpc/src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":392,"grpc_status":14}]}
2023-03-07 16:07:20.194713: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/distributed/client.cc:452] Coordination service agent in error status: UNAVAILABLE: failed to connect to all addresses
2023-03-07 16:07:20.193225: E external/org_tensorflow/tensorflow/tsl/distributed_runtime/coordination/coordination_service_agent.cc:711] Coordination agent is in ERROR: UNAVAILABLE: failed to connect to all addresses
Additional GRPC error information from remote target unknown_target_for_coordination_leader:
:{"created":"@1678205240.193003968","description":"Failed to pick subchannel","file":"external/com_github_grpc_grpc/src/core/ext/filters/client_channel/client_channel.cc","file_line":3940,"referenced_errors":[{"created":"@1678205240.189222724","description":"failed to connect to all addresses","file":"external/com_github_grpc_grpc/src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":392,"grpc_status":14}]}
2023-03-07 16:07:20.193273: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/distributed/client.cc:452] Coordination service agent in error status: UNAVAILABLE: failed to connect to all addresses
srun: error: ip-26-0-141-247: task 21: Aborted
srun: error: ip-26-0-142-13: task 24: Aborted
srun: error: ip-26-0-141-178: task 14: Aborted
srun: error: ip-26-0-141-146: task 6: Aborted
srun: error: ip-26-0-142-24: task 29: Aborted
srun: error: ip-26-0-142-29: task 30: Aborted
srun: error: ip-26-0-141-132: task 3: Aborted
srun: error: ip-26-0-141-166: task 13: Aborted
srun: error: ip-26-0-141-140: task 5: Aborted
srun: error: ip-26-0-142-3: task 22: Aborted
srun: error: ip-26-0-142-21: task 27: Aborted
srun: error: ip-26-0-141-247: task 20: Aborted
srun: error: ip-26-0-141-161: task 10: Aborted
srun: error: ip-26-0-142-29: task 31: Aborted
srun: error: ip-26-0-141-157: task 9: Aborted
srun: error: ip-26-0-141-166: task 12: Aborted
srun: error: ip-26-0-142-13: task 25: Aborted
srun: error: ip-26-0-142-24: task 28: Aborted
srun: error: ip-26-0-141-146: task 7: Aborted
srun: error: ip-26-0-141-132: task 2: Aborted
srun: error: ip-26-0-141-140: task 4: Aborted
srun: error: ip-26-0-142-3: task 23: Aborted
srun: error: ip-26-0-141-228: task 19: Aborted
srun: error: ip-26-0-142-21: task 26: Aborted
srun: error: ip-26-0-141-161: task 11: Aborted
srun: error: ip-26-0-141-157: task 8: Aborted
srun: error: ip-26-0-141-228: task 18: Aborted
srun: error: ip-26-0-141-128: task 1: Aborted

Missing clipped value loss in PPO implementation

Hi @vwxyzjn ,

This codebase is great, thanks for the hard work! I've been using it to run baseline experiments in Procgen, and I've noticed that your implementation of PPO does not use value loss clipping. However, it is enabled by default in the PyTorch implementation that is most often encountered in papers testing agents in Procgen.

Is there a reason why it was left out? I'm not super familiar with ALE, perhaps it is not as common there?

As part of my project I've created scripts to train and evaluate PPO in Procgen* and I've implemented the DAAC agent (https://arxiv.org/abs/2102.10330). Would you like me to make a PR to include them in cleanba?

*On top of re-implementing value loss clipping in PPO, I found minor differences between the Atari and Procgen environments, such as the info dict returned by envpool.step() being slightly different, and the videos in the eval script supporting grayscale images only.
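
For reference, a clipped value loss (as in the common PyTorch PPO implementations mentioned above) can be written roughly as follows in JAX; the function and variable names here are illustrative and are not cleanba's actual code:

```python
# Illustrative sketch of PPO value loss clipping in JAX (not cleanba's code).
import jax.numpy as jnp

def clipped_value_loss(new_values, old_values, returns, clip_coef=0.2):
    # Unclipped squared error against the returns.
    v_loss_unclipped = (new_values - returns) ** 2
    # Keep the new value prediction within clip_coef of the old prediction.
    v_clipped = old_values + jnp.clip(new_values - old_values, -clip_coef, clip_coef)
    v_loss_clipped = (v_clipped - returns) ** 2
    # Element-wise maximum of the two losses, then the mean.
    return 0.5 * jnp.maximum(v_loss_unclipped, v_loss_clipped).mean()
```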

Install jax with CUDA

Running poetry install doesn't install JAX with CUDA support for me. I tested with the pyproject.toml below, which does install JAX with CUDA support.

[tool.poetry]
name = "test"
version = "0.1.0"
description = ""
authors = ["Chufan Chen <[email protected]>"] 
readme = "README.md"

[tool.poetry.dependencies]
python = "^3.8"
jaxlib = {version =  "0.3.25+cuda11.cudnn82", source = "jax"}
jax = "0.3.25"


[[tool.poetry.source]]
name = "jax"
url = "https://storage.googleapis.com/jax-releases/jax_cuda_releases.html"
default = false
secondary = false

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

The pyproject.toml in cleanrl may also need modification.
