
rls's Introduction

Hi 👋, I'm Keavnn (stepneverstop)

Tech stack: python, csharp, pytorch, tensorflow, unity, docker, kubernetes, bash, git, linux


rls's People

Contributors

bluefisher, dragon-wang, kasimte, kmakeev, stepneverstop


rls's Issues

Optimize the save directory and logic for models and logs

  • Option to either save the whole model to one file or save each Module to its own file
  • Configure how many model files may be kept in the save folder at once, with FIFO rotation: a newly saved model replaces the oldest one (see the rotation sketch after this list)
  • Save the model based on the number of training steps
  • Add wandb f3ebb82

Under Consideration:

  • Save the model based on training time
  • Save the model based on the agent's score/performance
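A minimal sketch of the FIFO rotation idea above, assuming checkpoints are plain files named ckpt-<train_step> inside one directory; the file naming, the serialized-bytes argument, and the max_to_keep parameter are illustrative, not the repository's actual implementation:

import os

def save_with_rotation(model_bytes: bytes, save_dir: str, train_step: int, max_to_keep: int = 5):
    """Write a new checkpoint, then drop the oldest ones beyond max_to_keep (FIFO)."""
    os.makedirs(save_dir, exist_ok=True)
    with open(os.path.join(save_dir, f"ckpt-{train_step}"), "wb") as f:
        f.write(model_bytes)
    # Sort existing checkpoints by training step and delete the oldest surplus ones.
    ckpts = sorted(
        (p for p in os.listdir(save_dir) if p.startswith("ckpt-")),
        key=lambda p: int(p.split("-")[1]),
    )
    for old in ckpts[:-max_to_keep]:
        os.remove(os.path.join(save_dir, old))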

Error when running python run.py -p unity -a ppo -n run_with_unity

Hello, I ran python run.py -p unity -a ppo -n run_with_unity and, after launching Unity, got this error: name '_outs' is not defined. It persists even after reinstalling several times. How should I solve this? I would appreciate your guidance, thanks!

Broken pipe

During a run, the program sometimes raises BrokenPipeError: [Errno 32] Broken pipe, and the timing of the error is unpredictable.

Thanks

Thanks for sharing; it is nice work. I wonder how to use HIRO in your package.

Automatic formatting

  • Use autopep8 to auto-format a single file
  • Use autopep8 to auto-format every file under a directory
  • Use isort to automatically organize a file's import order (a scripted sketch of these steps follows this list)
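A small sketch of how these steps could be scripted from Python, assuming the autopep8 and isort (>= 5) packages are installed; the function names and paths are illustrative:

import pathlib

import autopep8
import isort

def format_file(path: str) -> None:
    """Apply isort (import ordering) and autopep8 (PEP 8 fixes) to one file, in place."""
    p = pathlib.Path(path)
    source = p.read_text(encoding="utf-8")
    source = isort.code(source)          # sort and group imports
    source = autopep8.fix_code(source)   # PEP 8 auto-formatting
    p.write_text(source, encoding="utf-8")

def format_tree(root: str) -> None:
    """Format every .py file under a directory tree."""
    for p in pathlib.Path(root).rglob("*.py"):
        format_file(str(p))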

Train a custom gym env

I would like to understand how I can use my own custom environment that extends gym.Env to train my model, instead of using the default Gym cases.
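Not an answer from the repository itself, but a minimal sketch of what a registered custom Gym environment can look like; the class name, id string, spaces, and reward logic below are all illustrative:

import gym
import numpy as np
from gym import spaces
from gym.envs.registration import register

class MyCustomEnv(gym.Env):
    """Toy environment: 1-D float observation, two discrete actions, 100-step episodes."""
    def __init__(self):
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)
        self._t = 0

    def reset(self):
        self._t = 0
        return np.zeros(1, dtype=np.float32)

    def step(self, action):
        self._t += 1
        obs = np.random.uniform(-1.0, 1.0, size=(1,)).astype(np.float32)
        reward = 1.0 if action == 1 else 0.0
        done = self._t >= 100
        return obs, reward, done, {}

# Register an id so the env can be created with gym.make("MyCustomEnv-v0")
register(id="MyCustomEnv-v0", entry_point=MyCustomEnv)

Once registered (and imported before the trainer starts), such an env could in principle be referenced by its id via the --gym-env flag used elsewhere in these issues, though that depends on how the project discovers environments.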

masac

Has MASAC been trained in the multi-agent particle envs? Can it converge?

fix function update_config in run.py

Must be (assuming sys and the project's sth config helper are already imported in run.py):

def update_config(config, file):
    _config = sth.load_config(file)
    try:
        for key in _config:
            config[key] = _config[key]
    except Exception as e:
        print(e)
        sys.exit()
    return config

Value-function related

  • Generic n-step return computation (see the sketch after this list)
  • TD($\lambda$)

Taking PPO as an example, implement several trace computation methods:

  • Retrace
  • V-trace
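A minimal NumPy sketch of the generic n-step return item above, assuming a single trajectory with per-step rewards, bootstrap values next_values[k] = V(s_{k+1}), and done flags; the names and bootstrap convention are illustrative:

import numpy as np

def n_step_returns(rewards, next_values, dones, gamma=0.99, n=5):
    """G_t = r_t + gamma*r_{t+1} + ... + gamma^{n-1}*r_{t+n-1} + gamma^n * V(s_{t+n}),
    truncated at episode boundaries (dones[k] == True)."""
    T = len(rewards)
    returns = np.zeros(T, dtype=np.float64)
    for t in range(T):
        g, discount = 0.0, 1.0
        for k in range(t, min(t + n, T)):
            g += discount * rewards[k]
            discount *= gamma
            if dones[k]:
                break  # episode ended inside the window: no bootstrap
        else:
            # window ran its full length without a terminal: bootstrap from V(s_{k+1})
            g += discount * next_values[k]
        returns[t] = g
    return returns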

tutorial

Is there a tutorial on how to use this with ML-Agents and Unity?

MADDPG training problem

I used MADDPG to train a Unity application with several brains configured, and every output action is continuous, but an error is raised as soon as the run starts: list has no attribute is_continuous.

Change to PyTorch

  • TensorFlow's tf.function is far too inflexible: support for custom data types is weak, many features are restricted, and a lot of time gets wasted on debugging
  • Loop-style operations are hard to debug inside tf.function, and the cost of learning TF's data types is also fairly high

Applied to multiple agents?

Hi, I have tested your project in a multi-agent environment, but I am not sure whether it is suitable for that setting.

My environment includes 8 agents with discrete actions. However, I got this error:

initialize model SUCCUESS.
save config to /RLData/sac_no_v/test/GridWorldLearning/config
There was a mismatch between the provided action and the environment's expectation: The brain GridWorldLearning expected 8 discrete action(s), but was provided: [-0.050590384751558304, -0.665206789970398, -0.0410725474357605, -0.23551416397094727, 0.010302126407623291, 0.2644920349121094, -1.0, -0.10047897696495056, -1.0, 0.03841760754585266, -1.0, -1.0, -0.33658552169799805, 0.7163478136062622, -0.1180223822593689, 0.31758153438568115, -1.0, -0.18739420175552368, -0.15177105367183685, -0.2588164806365967, 0.11979779601097107, -0.5222678184509277, -0.6121081113815308, -1.0, -0.08478996157646179, -0.6589073538780212, -1.0, 0.32313454151153564, -0.3325958251953125, -0.9373922348022461, 0.4225391149520874, -0.18213623762130737, 0.7108762264251709, 0.1738891303539276, -0.6963950395584106, 0.41238147020339966, -1.0, 0.451471209526062, -0.6678181886672974, -0.8575950860977173]
unsupported operand type(s) for +: 'NoneType' and 'str'

Could you please help with it? Thanks!

Exception during running MountainCar-v0 case with ppo

Thanks for your development, it seems to be an inspiring project!
However, when I tried to launch a command from the Examples:
python run.py --gym -a ppo -n train_using_gym --gym-env MountainCar-v0 --render-episode 1000 --gym-agents 4
I got this error:
render() missing 1 required positional argument: 'record'

Part of the log before the exception:

INFO:common.agent:| Model-0 |no op step 2496
INFO:common.agent:| Model-0 |no op step 2497
INFO:common.agent:| Model-0 |no op step 2498
INFO:common.agent:| Model-0 |no op step 2499
WARNING:tensorflow:Layer a_c_v_discrete is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2.  The layer has dtype float32 because it's dtype defaults to floatx.

If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.

To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

INFO:common.agent:| Model-0 |Pass time(h:m:s) 00:00:10 |----------------------------------------
INFO:common.agent:| Model-0 |Episode: 100 | step: 2000 | last_done_step  200 | rewards: -200.0, -200.0, -200.0, -200.0
Save checkpoint success. Episode: 100
render() missing 1 required positional argument: 'record'

Could you explain how to fix/overcome this error?

PS. Just before this I tried to launch the same env and model with the command:

python run.py --gym -a ppo -n train_using_gym --gym-env MountainCar-v0 --render-episode 100 --gym-agents 1

It ran a little longer but made no progress in improving the reward (always -200), and it finally finished with the same exception.

Error when checking the length of shape in TF 2.0

tf.__version__
'2.0.0'
tfp.__version__
'0.8.0'

params --gym -a sac_no_v -n train_using_gym -g --gym-env CarRacing-v0 --render-episode 10 --gym-agents 4

Error:
in converted code:
relative to C:\Python34\RLs\Nn:

tf2nn.py:144 call  *
    features = self.share(super().call(vector_input, visual_input))
tf2nn.py:86 call  *
    features = self.conv1(visual_input)

AttributeError: 'actor_continuous' object has no attribute 'conv1'

In tf2nn.py, class ImageNet(tf.keras.Model).__init__():
len(visual_dim) is 4, so conv1 and the other conv layers are never added to the model, because of the check 'if len(visual_dim) == 5:'.
In 'def call(self, vector_input, visual_input):' the shape is (None, 1, 96, 96, 3), whose len is 5,
and here we get the error:

if visual_input is None or len(visual_input.shape) != 5:
    pass
else:
    features = self.conv1(visual_input)

Agent state related

  • Concatenate the previous action onto the state (SARL & MARL)
  • Concatenate a one-hot agent ID onto the state (MARL)
  • Running (moving) normalization/standardization of the state (see the sketch after this list)
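A minimal sketch of the running state-normalization item above, keeping an online mean/variance and merging batches with the parallel-variance (Chan et al.) update; the class and method names are illustrative, not the repository's implementation:

import numpy as np

class RunningNormalizer:
    """Online mean/std estimate used to standardize states as they stream in."""
    def __init__(self, shape, eps=1e-8):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = eps

    def update(self, batch):
        # batch has shape (batch_size, *state_shape)
        batch = np.asarray(batch, dtype=np.float64)
        b_mean, b_var, b_count = batch.mean(axis=0), batch.var(axis=0), batch.shape[0]
        delta = b_mean - self.mean
        total = self.count + b_count
        self.mean += delta * b_count / total
        # merge the two variance estimates
        m_a = self.var * self.count
        m_b = b_var * b_count
        self.var = (m_a + m_b + delta ** 2 * self.count * b_count / total) / total
        self.count = total

    def normalize(self, x, clip=5.0):
        return np.clip((x - self.mean) / np.sqrt(self.var + 1e-8), -clip, clip)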

About using gumbel_distribution to handle a discrete action space

In the code you provided, the DDPG algorithm supports both continuous and discrete action spaces by using the Gumbel distribution. MADDPG is a DDPG-based extension; is it also suitable for discrete action spaces via the Gumbel distribution? When I employ Gumbel in MADDPG, I cannot obtain appropriate results. The TensorFlow version I use is 1.14 and I don't use the tensorflow_probability module. Could you give me some code examples of Gumbel in TF, or some instructions? Sorry to bother you.
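Not the repository's implementation, but a small NumPy sketch of the Gumbel-softmax (Concrete) relaxation this question refers to: sample by adding Gumbel noise to the logits and applying a temperature-controlled softmax. The same arithmetic can be transcribed into plain TF 1.14 ops (uniform sampling, log, softmax).

import numpy as np

def gumbel_softmax_sample(logits, temperature=1.0, eps=1e-20):
    """Draw a 'soft' one-hot sample from categorical logits.
    Lower temperature -> closer to a hard one-hot argmax sample."""
    u = np.random.uniform(size=np.shape(logits))
    gumbel_noise = -np.log(-np.log(u + eps) + eps)      # Gumbel(0, 1) samples
    y = (logits + gumbel_noise) / temperature
    y = np.exp(y - np.max(y, axis=-1, keepdims=True))   # numerically stable softmax
    return y / np.sum(y, axis=-1, keepdims=True)

# Example: sample a relaxed action over 4 discrete choices
probs = gumbel_softmax_sample(np.array([0.2, 1.5, -0.3, 0.0]), temperature=0.5)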

Cite Activations

Hi.
Awesome work. I am relatively new to RL in general, but this is a really good resource. I would really appreciate it if you could cite my activation function "Mish", and the same for Swish. That would help users backtrack to the original papers for those activations.
The original repository for Mish - https://github.com/digantamisra98/Mish
The Readme contains the link to the arXiv paper as well.
If you'd like to cite the paper instead, the link is - https://arxiv.org/abs/1908.08681
Thank You!

Error applying gradient for some algorithms

OS - Ubuntu 19.04

print(tfp.__version__)
0.9.0-dev20191113
print(tf.__version__)
2.1.0-dev20191111

In the pg, ac, and a2c algorithms, I get this error:

in converted code:
/home/konstantin/IdeaProjects/RLs/Algorithms/tf2algos/a2c.py:158 train *
self.optimizer_actor.apply_gradients(
/home/konstantin/anaconda3/envs/tsfl2/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py:434 apply_gradients
self._create_slots(var_list)
/home/konstantin/anaconda3/envs/tsfl2/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/adam.py:149 _create_slots
self.add_slot(var, 'm')
/home/konstantin/anaconda3/envs/tsfl2/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py:574 add_slot
var_key = _var_key(var)
/home/konstantin/anaconda3/envs/tsfl2/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py:1065 _var_key
if var._in_graph_mode:

AttributeError: 'ListWrapper' object has no attribute '_in_graph_mode'

But with the algorithms dqn, ddqn, dddqn, dpg... everything works.

After writing this code, the error disappears:

self.optimizer_actor.apply_gradients(
    zip(actor_grads, self.actor_net.trainable_variables + self.log_std)
)

Unable to run algorithms due to gym environment issue

I was able to install RLs successfully. While running the command python run.py -p gym -a dqn -e CartPole-v0 -c 12 -n dqn_cartpole,
I get this error:

load config from rls/configs/gym/CartPole-v0.yaml failed, cannot find file.

Refactor the Unity Wrapper

  1. Have the Python side automatically send variables such as "number of parallel environments, agent radar-detection density, whether to force-reset the environment" when it connects to Unity
  2. Have the Python side specify, when initializing the training environment, whether stacked state inputs are needed, so that a separate StackWrapper no longer has to be written
  3. ...

s_dim dimension.

Hi, I don't know if this is a bug or just something this library doesn't handle.

But first of all, thanks for this library..
..it's a really good kick-starter for the RL learning process and has helped me a lot :)

My question is about s_dim (env.observation_space) having more than one dimension, for example a shape of Box(4, 8); in that case I always get an error.

So do I have to handle it myself, or can the library handle it?

Thanks.
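One common workaround when a trainer only accepts flat state vectors is to flatten the multi-dimensional Box observation yourself with a small Gym wrapper; this is only a hedged sketch of that idea, not something the repository provides (newer Gym versions also ship gym.wrappers.FlattenObservation):

import gym
import numpy as np

class FlattenObs(gym.ObservationWrapper):
    """Flatten a multi-dimensional Box observation (e.g. Box(4, 8)) into a 1-D vector."""
    def __init__(self, env):
        super().__init__(env)
        low = np.asarray(env.observation_space.low, dtype=np.float32).ravel()
        high = np.asarray(env.observation_space.high, dtype=np.float32).ravel()
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def observation(self, obs):
        return np.asarray(obs, dtype=np.float32).ravel()

# usage: env = FlattenObs(gym.make("YourEnv-v0"))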

Improve the "Getting Started" part of the README

  1. The README's explanation of how to use this project is not detailed enough; it needs further elaboration and worked examples
  2. Add an example showing how to build a new algorithm of your own on top of this repository

About the loss in actor-critic (AC) algorithms?

For algorithms in the AC framework, can the loss be used as the criterion for how well the networks are trained? In the general case, do actor_loss and critic_loss have a definite convergence trend, for example actor_loss rising while critic_loss falls?
