
rls's Introduction

Hi 👋, I'm Keavnn (stepneverstop)

Tech stack: python, csharp, pytorch, tensorflow, unity, docker, kubernetes, bash, git, linux


rls's People

Contributors

bluefisher, dragon-wang, kasimte, kmakeev, stepneverstop


rls's Issues

Optimize the save directory and logic for models and logs

  • Option to either save the whole model to one file or save each Module to its own file
  • Configure how many model files may be kept in the save folder at once, with FIFO rotation: a newly saved model replaces the oldest one (see the rotation sketch after this list)
  • Save the model based on the number of training steps
  • Add wandb f3ebb82

Under Consideration:

  • Save the model based on training time
  • Save the model based on the agent's score/performance
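A minimal sketch of the FIFO rotation idea above, assuming checkpoints are plain files named ckpt-<train_step> inside one directory; the file naming, the serialized-bytes argument, and the max_to_keep parameter are illustrative, not the repository's actual implementation:

import os

def save_with_rotation(model_bytes: bytes, save_dir: str, train_step: int, max_to_keep: int = 5):
    """Write a new checkpoint, then drop the oldest ones beyond max_to_keep (FIFO)."""
    os.makedirs(save_dir, exist_ok=True)
    with open(os.path.join(save_dir, f"ckpt-{train_step}"), "wb") as f:
        f.write(model_bytes)
    # Sort existing checkpoints by training step and delete the oldest surplus ones.
    ckpts = sorted(
        (p for p in os.listdir(save_dir) if p.startswith("ckpt-")),
        key=lambda p: int(p.split("-")[1]),
    )
    for old in ckpts[:-max_to_keep]:
        os.remove(os.path.join(save_dir, old))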

Error when running python run.py -p unity -a ppo -n run_with_unity

Hello, I ran python run.py -p unity -a ppo -n run_with_unity and, after launching Unity, got this error: name '_outs' is not defined. It persists even after reinstalling several times. How should I solve this? I would appreciate your guidance, thanks!

Broken pipe

During a run, the program sometimes raises BrokenPipeError: [Errno 32] Broken pipe, and the timing of the error is unpredictable.

Thanks

Thanks for sharing; it is nice work. I wonder how to use HIRO in your package.

Automatic formatting

  • Use autopep8 to auto-format a single file
  • Use autopep8 to auto-format every file under a directory
  • Use isort to automatically organize a file's import order (a scripted sketch of these steps follows this list)
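A small sketch of how these steps could be scripted from Python, assuming the autopep8 and isort (>= 5) packages are installed; the function names and paths are illustrative:

import pathlib

import autopep8
import isort

def format_file(path: str) -> None:
    """Apply isort (import ordering) and autopep8 (PEP 8 fixes) to one file, in place."""
    p = pathlib.Path(path)
    source = p.read_text(encoding="utf-8")
    source = isort.code(source)          # sort and group imports
    source = autopep8.fix_code(source)   # PEP 8 auto-formatting
    p.write_text(source, encoding="utf-8")

def format_tree(root: str) -> None:
    """Format every .py file under a directory tree."""
    for p in pathlib.Path(root).rglob("*.py"):
        format_file(str(p))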

Train a custom gym env

I would like to understand how I can use my own custom environment that extends gym.Env to train my model, instead of using the default Gym cases.
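Not an answer from the repository itself, but a minimal sketch of what a registered custom Gym environment can look like; the class name, id string, spaces, and reward logic below are all illustrative:

import gym
import numpy as np
from gym import spaces
from gym.envs.registration import register

class MyCustomEnv(gym.Env):
    """Toy environment: 1-D float observation, two discrete actions, 100-step episodes."""
    def __init__(self):
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)
        self._t = 0

    def reset(self):
        self._t = 0
        return np.zeros(1, dtype=np.float32)

    def step(self, action):
        self._t += 1
        obs = np.random.uniform(-1.0, 1.0, size=(1,)).astype(np.float32)
        reward = 1.0 if action == 1 else 0.0
        done = self._t >= 100
        return obs, reward, done, {}

# Register an id so the env can be created with gym.make("MyCustomEnv-v0")
register(id="MyCustomEnv-v0", entry_point=MyCustomEnv)

Once registered (and imported before the trainer starts), such an env could in principle be referenced by its id via the --gym-env flag used elsewhere in these issues, though that depends on how the project discovers environments.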

masac

Has MASAC been trained in the multi-agent particle envs? Can it converge?

fix function update_config in run.py

Must be (assuming sys and the project's sth config helper are already imported in run.py):

def update_config(config, file):
    _config = sth.load_config(file)
    try:
        for key in _config:
            config[key] = _config[key]
    except Exception as e:
        print(e)
        sys.exit()
    return config

Value-function related

  • Generic n-step return computation (see the sketch after this list)
  • TD($\lambda$)

Taking PPO as an example, implement several trace computation methods:

  • Retrace
  • V-trace
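A minimal NumPy sketch of the generic n-step return item above, assuming a single trajectory with per-step rewards, bootstrap values next_values[k] = V(s_{k+1}), and done flags; the names and bootstrap convention are illustrative:

import numpy as np

def n_step_returns(rewards, next_values, dones, gamma=0.99, n=5):
    """G_t = r_t + gamma*r_{t+1} + ... + gamma^{n-1}*r_{t+n-1} + gamma^n * V(s_{t+n}),
    truncated at episode boundaries (dones[k] == True)."""
    T = len(rewards)
    returns = np.zeros(T, dtype=np.float64)
    for t in range(T):
        g, discount = 0.0, 1.0
        for k in range(t, min(t + n, T)):
            g += discount * rewards[k]
            discount *= gamma
            if dones[k]:
                break  # episode ended inside the window: no bootstrap
        else:
            # window ran its full length without a terminal: bootstrap from V(s_{k+1})
            g += discount * next_values[k]
        returns[t] = g
    return returns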

tutorial

Is there a tutorial on how to use this with ML-Agents and Unity?

MADDPG training problem

I used MADDPG to train a Unity application with several brains configured, and every output action is continuous, but an error is raised as soon as the run starts: list has no attribute is_continuous.

Change to PyTorch

  • TensorFlow's tf.function is far too inflexible: support for custom data types is weak, many features are restricted, and a lot of time gets wasted on debugging
  • Loop-style operations are hard to debug inside tf.function, and the cost of learning TF's data types is also fairly high

Applied to multiple agents?

Hi, I have tested your project in a multi-agent environment, but I am not sure whether it is suitable for that setting.

My environment includes 8 agents with discrete actions. However, I got this error:

initialize model SUCCUESS.
save config to /RLData/sac_no_v/test/GridWorldLearning/config
There was a mismatch between the provided action and the environment's expectation: The brain GridWorldLearning expected 8 discrete action(s), but was provided: [-0.050590384751558304, -0.665206789970398, -0.0410725474357605, -0.23551416397094727, 0.010302126407623291, 0.2644920349121094, -1.0, -0.10047897696495056, -1.0, 0.03841760754585266, -1.0, -1.0, -0.33658552169799805, 0.7163478136062622, -0.1180223822593689, 0.31758153438568115, -1.0, -0.18739420175552368, -0.15177105367183685, -0.2588164806365967, 0.11979779601097107, -0.5222678184509277, -0.6121081113815308, -1.0, -0.08478996157646179, -0.6589073538780212, -1.0, 0.32313454151153564, -0.3325958251953125, -0.9373922348022461, 0.4225391149520874, -0.18213623762130737, 0.7108762264251709, 0.1738891303539276, -0.6963950395584106, 0.41238147020339966, -1.0, 0.451471209526062, -0.6678181886672974, -0.8575950860977173]
unsupported operand type(s) for +: 'NoneType' and 'str'

Could you please help with it? Thanks!

Exception during running MountainCar-v0 case with ppo

Thanks for your development, it seems to be an inspiring project!
However, when I tried to launch a command from the Examples:
python run.py --gym -a ppo -n train_using_gym --gym-env MountainCar-v0 --render-episode 1000 --gym-agents 4
I got this error:
render() missing 1 required positional argument: 'record'

Part of the log before the exception:

INFO:common.agent:| Model-0 |no op step 2496
INFO:common.agent:| Model-0 |no op step 2497
INFO:common.agent:| Model-0 |no op step 2498
INFO:common.agent:| Model-0 |no op step 2499
WARNING:tensorflow:Layer a_c_v_discrete is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2.  The layer has dtype float32 because it's dtype defaults to floatx.

If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.

To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

INFO:common.agent:| Model-0 |Pass time(h:m:s) 00:00:10 |----------------------------------------
INFO:common.agent:| Model-0 |Episode: 100 | step: 2000 | last_done_step  200 | rewards: -200.0, -200.0, -200.0, -200.0
Save checkpoint success. Episode: 100
render() missing 1 required positional argument: 'record'

Could you explain how to fix/overcome this error?

PS. Just before this I tried to launch the same env and model with the command:

python run.py --gym -a ppo -n train_using_gym --gym-env MountainCar-v0 --render-episode 100 --gym-agents 1

It ran a little longer but made no progress in improving the reward (always -200), and it finally finished with the same exception.

Error when checking the length of shape in TF 2.0

tf.__version__
'2.0.0'
tfp.__version__
'0.8.0'

params --gym -a sac_no_v -n train_using_gym -g --gym-env CarRacing-v0 --render-episode 10 --gym-agents 4

Error:
in converted code:
relative to C:\Python34\RLs\Nn:

tf2nn.py:144 call  *
    features = self.share(super().call(vector_input, visual_input))
tf2nn.py:86 call  *
    features = self.conv1(visual_input)

AttributeError: 'actor_continuous' object has no attribute 'conv1'

In tf2nn.py, class ImageNet(tf.keras.Model).__init__():
len(visual_dim) is 4, so conv1 and the other conv layers are never added to the model, because of the check 'if len(visual_dim) == 5:'.
In 'def call(self, vector_input, visual_input):' the shape is (None, 1, 96, 96, 3), whose len is 5,
and here we get the error:

if visual_input is None or len(visual_input.shape) != 5:
    pass
else:
    features = self.conv1(visual_input)

Agent state related

  • Concatenate the previous action onto the state (SARL & MARL)
  • Concatenate a one-hot agent ID onto the state (MARL)
  • Running (moving) normalization/standardization of the state (see the sketch after this list)
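A minimal sketch of the running state-normalization item above, keeping an online mean/variance and merging batches with the parallel-variance (Chan et al.) update; the class and method names are illustrative, not the repository's implementation:

import numpy as np

class RunningNormalizer:
    """Online mean/std estimate used to standardize states as they stream in."""
    def __init__(self, shape, eps=1e-8):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = eps

    def update(self, batch):
        # batch has shape (batch_size, *state_shape)
        batch = np.asarray(batch, dtype=np.float64)
        b_mean, b_var, b_count = batch.mean(axis=0), batch.var(axis=0), batch.shape[0]
        delta = b_mean - self.mean
        total = self.count + b_count
        self.mean += delta * b_count / total
        # merge the two variance estimates
        m_a = self.var * self.count
        m_b = b_var * b_count
        self.var = (m_a + m_b + delta ** 2 * self.count * b_count / total) / total
        self.count = total

    def normalize(self, x, clip=5.0):
        return np.clip((x - self.mean) / np.sqrt(self.var + 1e-8), -clip, clip)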

About using gumbel_distribution to handle a discrete action space

In the code you provided, the DDPG algorithm supports both continuous and discrete action spaces by using the Gumbel distribution. MADDPG is a DDPG-based extension; is it also suitable for discrete action spaces via the Gumbel distribution? When I employ Gumbel in MADDPG, I cannot obtain appropriate results. The TensorFlow version I use is 1.14 and I don't use the tensorflow_probability module. Could you give me some code examples of Gumbel in TF, or some instructions? Sorry to bother you.
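Not the repository's implementation, but a small NumPy sketch of the Gumbel-softmax (Concrete) relaxation this question refers to: sample by adding Gumbel noise to the logits and applying a temperature-controlled softmax. The same arithmetic can be transcribed into plain TF 1.14 ops (uniform sampling, log, softmax).

import numpy as np

def gumbel_softmax_sample(logits, temperature=1.0, eps=1e-20):
    """Draw a 'soft' one-hot sample from categorical logits.
    Lower temperature -> closer to a hard one-hot argmax sample."""
    u = np.random.uniform(size=np.shape(logits))
    gumbel_noise = -np.log(-np.log(u + eps) + eps)      # Gumbel(0, 1) samples
    y = (logits + gumbel_noise) / temperature
    y = np.exp(y - np.max(y, axis=-1, keepdims=True))   # numerically stable softmax
    return y / np.sum(y, axis=-1, keepdims=True)

# Example: sample a relaxed action over 4 discrete choices
probs = gumbel_softmax_sample(np.array([0.2, 1.5, -0.3, 0.0]), temperature=0.5)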

Cite Activations

Hi.
Awesome work. I am relatively new to RL in general, but this is a really good resource. I would really appreciate it if you could cite my activation function "Mish", and the same for Swish. That would help users backtrack to the original papers for those activations.
The original repository for Mish - https://github.com/digantamisra98/Mish
The Readme contains the link to the arXiv paper as well.
If you'd like to cite the paper instead, the link is - https://arxiv.org/abs/1908.08681
Thank You!

Error applying gradient for some algorithms

OS - Ubuntu 19.04

print(tfp.__version__)
0.9.0-dev20191113
print(tf.__version__)
2.1.0-dev20191111

In the pg, ac, and a2c algorithms, I get this error:

in converted code:
/home/konstantin/IdeaProjects/RLs/Algorithms/tf2algos/a2c.py:158 train *
self.optimizer_actor.apply_gradients(
/home/konstantin/anaconda3/envs/tsfl2/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py:434 apply_gradients
self._create_slots(var_list)
/home/konstantin/anaconda3/envs/tsfl2/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/adam.py:149 _create_slots
self.add_slot(var, 'm')
/home/konstantin/anaconda3/envs/tsfl2/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py:574 add_slot
var_key = _var_key(var)
/home/konstantin/anaconda3/envs/tsfl2/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py:1065 _var_key
if var._in_graph_mode:

AttributeError: 'ListWrapper' object has no attribute '_in_graph_mode'

But with the algorithms dqn, ddqn, dddqn, dpg... everything works.

After writing this code, the error disappears:

self.optimizer_actor.apply_gradients(
    zip(actor_grads, self.actor_net.trainable_variables + self.log_std)
)

Unable to run algorithms due to gym environment issue

I was able to install RLs successfully. While running the command python run.py -p gym -a dqn -e CartPole-v0 -c 12 -n dqn_cartpole,
I get this error:

load config from rls/configs/gym/CartPole-v0.yaml failed, cannot find file.

Refactor the Unity Wrapper

  1. Have the Python side automatically send variables such as "number of parallel environments, agent radar-detection density, whether to force-reset the environment" when it connects to Unity
  2. Have the Python side specify, when initializing the training environment, whether stacked state inputs are needed, so that a separate StackWrapper no longer has to be written
  3. ...

s_dim dimension.

Hi, I don't know if this is a bug or just something this library doesn't handle.

But first of all, thanks for this library..
..it's a really good kick-starter for the RL learning process and has helped me a lot :)

My question is about s_dim (env.observation_space) having more than one dimension, for example a shape of Box(4, 8); in that case I always get an error.

So do I have to handle it myself, or can the library handle it?

Thanks.
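One common workaround when a trainer only accepts flat state vectors is to flatten the multi-dimensional Box observation yourself with a small Gym wrapper; this is only a hedged sketch of that idea, not something the repository provides (newer Gym versions also ship gym.wrappers.FlattenObservation):

import gym
import numpy as np

class FlattenObs(gym.ObservationWrapper):
    """Flatten a multi-dimensional Box observation (e.g. Box(4, 8)) into a 1-D vector."""
    def __init__(self, env):
        super().__init__(env)
        low = np.asarray(env.observation_space.low, dtype=np.float32).ravel()
        high = np.asarray(env.observation_space.high, dtype=np.float32).ravel()
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def observation(self, obs):
        return np.asarray(obs, dtype=np.float32).ravel()

# usage: env = FlattenObs(gym.make("YourEnv-v0"))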

Improve the "Getting Started" part of the README

  1. The README's explanation of how to use this project is not detailed enough; it needs further elaboration and worked examples
  2. Add an example showing how to build a new algorithm of your own on top of this repository

About the loss in actor-critic (AC) algorithms?

For algorithms in the AC framework, can the loss be used as the criterion for how well the networks are trained? In the general case, do actor_loss and critic_loss have a definite convergence trend, for example actor_loss rising while critic_loss falls?
