
tianhongdai / reinforcement-learning-algorithms


This repository contains PyTorch implementations of classic deep reinforcement learning algorithms, including DQN, DDQN, Dueling Network, DDPG, SAC, A2C, PPO, and TRPO. (More algorithms are still in progress.)

Python 100.00%
deep-reinforcement-learning ddpg ppo proximal-policy-optimization deep-learning actor-critic algorithm dqn flappy-bird trpo

reinforcement-learning-algorithms's People

Contributors

tianhongdai


reinforcement-learning-algorithms's Issues

Bug using SAC with torch version 1.8.0a0+963f762

I ran the SAC code with torch (a compiled-from-source version) and encountered the error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

How can I fix it?

The full error is listed below:

Logging to logs//HalfCheetah-v2/
Initial exploration has been finished!
Traceback (most recent call last):
  File "train.py", line 14, in <module>
    sac_trainer.learn()
  File "/home/reinforcement-learning-algorithms/rl_algorithms/sac/sac_agent.py", line 97, in learn
    qf1_loss, qf2_loss, actor_loss, alpha, alpha_loss = self._update_newtork()
  File "/home/reinforcement-learning-algorithms/rl_algorithms/sac/sac_agent.py", line 189, in _update_newtork
    actor_loss.backward()
  File "/home/admin/anaconda3/envs/pytorch_build/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/admin/anaconda3/envs/pytorch_build/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(

This is how I run the code:

python train.py --env-name HalfCheetah-v2 --cuda --seed 1
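
A common cause of this error on newer PyTorch versions is that an optimizer step updates the critic parameters in place before actor_loss.backward() runs, while the actor-loss graph still refers to those parameters. Below is a hedged sketch of the usual reordering workaround; the attribute names (self.qf1, self.actor_optim, self.alpha, etc.) are assumptions, not the repository's exact code, and this only illustrates the ordering inside a method like _update_newtork().

# Hedged sketch (hypothetical names): update the critics first, then rebuild the
# actor loss with a fresh forward pass, so its graph never refers to parameters
# that have already been modified in place by optimizer.step().

# 1. critic update
self.qf1_optim.zero_grad()
self.qf2_optim.zero_grad()
(qf1_loss + qf2_loss).backward()
self.qf1_optim.step()
self.qf2_optim.step()

# 2. freeze the critics so the actor step does not accumulate gradients in them
for param in list(self.qf1.parameters()) + list(self.qf2.parameters()):
    param.requires_grad = False

# 3. actor loss computed *after* the critic step (fresh forward pass)
actions, log_pi = self.actor_net(obs)
q_pi = torch.min(self.qf1(obs, actions), self.qf2(obs, actions))
actor_loss = (self.alpha * log_pi - q_pi).mean()
self.actor_optim.zero_grad()
actor_loss.backward()
self.actor_optim.step()

# 4. unfreeze the critics for the next update
for param in list(self.qf1.parameters()) + list(self.qf2.parameters()):
    param.requires_grad = True

As the error message suggests, running once with torch.autograd.set_detect_anomaly(True) will confirm which in-place operation (here most likely the critic optimizer step) breaks the graph.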

Retraining the saved model

I'm trying to retrain the saved model, but it behaves very strangely:

  1. it does not seem to start from the same behaviour that was saved
  2. it repeats only one type of action after resuming training

I guess this is a PyTorch issue, but maybe you have succeeded in retraining and know how it should be done?

Saving:

def save_model_for_training(self, episode, filepath):
    checkpoint = {
        'episode': episode,
        'state_dict': self.net.state_dict(),
        'optimizer': self.optimizer.state_dict()
    }
    torch.save(checkpoint, filepath)

self.save_model_for_training(episode, filepath=self.model_path + 'model.pt')

Loading saved model:

checkpoint = torch.load(self.model_path + 'model.pt')
self.start_episode = checkpoint['episode']
self.net.load_state_dict(checkpoint['state_dict'])
self.optimizer.load_state_dict(checkpoint['optimizer'])

Thanks a lot in advance
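
For reference, a fuller resume sketch, under the assumption that the odd behaviour comes from one of the usual suspects: the optimizer state being left on the wrong device, the network staying in eval mode, or the exploration schedule restarting from scratch. self.net, self.optimizer, self.start_episode and self.model_path match the snippets above; self.device is hypothetical.

# Hedged sketch of a resume routine (not the repository's code).
import torch

checkpoint = torch.load(self.model_path + 'model.pt', map_location='cpu')
self.start_episode = checkpoint['episode']
self.net.load_state_dict(checkpoint['state_dict'])
self.net.to(self.device)      # move parameters before touching the optimizer
self.net.train()              # make sure the network is not stuck in eval mode

self.optimizer.load_state_dict(checkpoint['optimizer'])
# the restored optimizer state keeps the device it was saved from,
# so move its tensors explicitly to the training device
for state in self.optimizer.state.values():
    for k, v in state.items():
        if torch.is_tensor(v):
            state[k] = v.to(self.device)

# if the agent uses an epsilon-greedy or noise schedule, it also needs to be
# saved and restored; otherwise the resumed agent explores as if starting fresh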

How to visualize reward-epoch?

How can I visualize reward vs. epoch?
Does it need extra post-processing code (parsing the printed information), or is there a tool for visualization? I could not find such code in the GitHub project.
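
One approach, assuming the training script only prints progress to stdout (and to the logs/ directory): save the printed lines to a file and parse them with a small post-processing script. The log format matched below is an assumption and would need to be adapted to the actual print statements.

# Hedged sketch: parse saved training output and plot mean reward against updates.
import re
import matplotlib.pyplot as plt

updates, rewards = [], []
with open('train_log.txt') as f:   # hypothetical file containing the saved stdout
    for line in f:
        match = re.search(r'Update:\s*(\d+).*Rewards?:\s*(-?\d+\.?\d*)', line)
        if match:
            updates.append(int(match.group(1)))
            rewards.append(float(match.group(2)))

plt.plot(updates, rewards)
plt.xlabel('update / epoch')
plt.ylabel('mean reward')
plt.savefig('reward_curve.png')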

A3C in description

Hi, just wanted to point out that it says "A3C" in the repo description instead of "A2C" (which is actually implemented).

the same code?

Thanks for your DRL code. But I find that dueling_agent.py for Dueling DQN is the same as ddqn_agent.py for Double DQN. So where is the Dueling DQN algorithm actually implemented in the code?
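
For context, the dueling architecture usually differs only in the network head (separate value and advantage streams), not in the agent or training loop, so identical agent files would still leave room for the difference to live in the model definition. A minimal, generic sketch of a dueling head (not this repository's models.py):

# Hedged sketch of a dueling Q-network head (generic illustration).
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    def __init__(self, feature_dim, num_actions, hidden=256):
        super().__init__()
        # separate value and advantage streams
        self.value = nn.Sequential(nn.Linear(feature_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))
        self.advantage = nn.Sequential(nn.Linear(feature_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, num_actions))

    def forward(self, features):
        v = self.value(features)          # (batch, 1)
        a = self.advantage(features)      # (batch, num_actions)
        # mean-subtracted advantage keeps the decomposition identifiable
        return v + a - a.mean(dim=1, keepdim=True)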

SAC agent still fails with "tuple index out of range" when using "MountainCar-v0"

I tried to use the "MountainCar-v0" env with the SAC agent, but it still fails with "tuple index out of range".
Could you tell me how to fix it?
Thanks

Traceback (most recent call last):
  File "D:\RL_library\sac\train.py", line 14, in <module>
    sac_trainer = sac_agent(env, args)
  File "D:\RL_library\sac\sac_agent.py", line 30, in __init__
    self.qf1 = flatten_mlp(self.env.observation_space.shape[0], self.args.hidden_size, self.env.action_space.shape[0])
IndexError: tuple index out of range
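
A likely explanation (an assumption based on the traceback): MountainCar-v0 has a discrete action space, whose shape is an empty tuple, so action_space.shape[0] raises IndexError, while this SAC implementation expects a continuous (Box) action space. A quick check, using the continuous variant of the environment:

# Hedged sketch: inspect the action space before building a continuous-control agent.
import gym

env = gym.make('MountainCar-v0')
print(env.action_space, env.action_space.shape)    # Discrete(3), shape () -> shape[0] fails

env = gym.make('MountainCarContinuous-v0')         # continuous-action variant
print(env.action_space, env.action_space.shape)    # Box(...), shape (1,) -> suitable for SAC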

BrokenPipeError: [Errno 32] Broken pipe

Hi, after installing with pip install -e ., I tried to run:

python train.py --env-name PongNoFrameskip-v4 --cuda 

in reinforcement-learning-algorithms/rl_algorithms/a2c, and I get the following error on Google Colab:

Logging to logs//PongNoFrameskip-v4/
(the line above is printed 16 times)
Traceback (most recent call last):
  File "train.py", line 20, in <module>
    a2c_trainer.learn()
  File "/content/reinforcement-learning-algorithms/rl_algorithms/a2c/a2c_agent.py", line 42, in learn
    _, pi = self.net(input_tensor)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/reinforcement-learning-algorithms/rl_algorithms/a2c/models.py", line 47, in forward
    x = self.cnn_layer(inputs / 255.0)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/reinforcement-learning-algorithms/rl_algorithms/a2c/models.py", line 28, in forward
    x = x.view(-1, 32 * 7 * 7)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
Exception ignored in: <bound method SubprocVecEnv.__del__ of <rl_utils.env_wrapper.multi_envs_wrapper.SubprocVecEnv object at 0x7f772325ca58>>
Traceback (most recent call last):
  File "/content/reinforcement-learning-algorithms/rl_utils/env_wrapper/multi_envs_wrapper.py", line 105, in __del__
  File "/content/reinforcement-learning-algorithms/rl_utils/env_wrapper/__init__.py", line 96, in close
  File "/content/reinforcement-learning-algorithms/rl_utils/env_wrapper/multi_envs_wrapper.py", line 89, in close_extras
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 206, in send
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 368, in _send
BrokenPipeError: [Errno 32] Broken pipe
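
The broken pipe is only a secondary symptom of the worker processes shutting down; the primary error is the RuntimeError raised in models.py. A minimal fix, following the hint in the error message (the original line is quoted from the traceback, the replacement is a suggestion):

# models.py, forward(): the activation tensor is not contiguous at this point,
# so view() cannot reinterpret its memory. Either make it contiguous first:
x = x.contiguous().view(-1, 32 * 7 * 7)
# or use reshape(), which copies only when necessary:
x = x.reshape(-1, 32 * 7 * 7)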

Add prioritized experience replay

Thanks for the excellent implementations of multiple classic RL agents; I have tried some of them and they worked very well.
Just curious: do you plan to add prioritized experience replay to the DQN? I have tried to embed the OpenAI Baselines PER files (https://github.com/openai/baselines/blob/master/baselines/deepq/replay_buffer.py) into the current DQN, which should be a really easy job, but sadly it doesn't work very well; there seem to be no obvious improvements over the vanilla DQN.
If you have any insights on this, or have tried a similar implementation, please let me know. I really appreciate it.
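
For what it's worth, the two details that most often make a bolted-on PER look no better than uniform replay are dropping the importance-sampling weights from the loss and never writing the new TD errors back as priorities. Below is a hedged sketch of how the Baselines buffer is usually wired into a DQN update; q_net, target_net and the optimizer are assumptions, not this repository's code.

# Hedged sketch: DQN update with OpenAI Baselines' PrioritizedReplayBuffer.
import numpy as np
import torch
from baselines.deepq.replay_buffer import PrioritizedReplayBuffer

buffer = PrioritizedReplayBuffer(int(1e5), alpha=0.6)
# during rollouts: buffer.add(obs, action, reward, next_obs, float(done))

def update(q_net, target_net, optimizer, batch_size=32, gamma=0.99, beta=0.4):
    # beta is usually annealed toward 1 over the course of training
    obs, actions, rewards, next_obs, dones, weights, idxes = buffer.sample(batch_size, beta)
    obs = torch.as_tensor(obs, dtype=torch.float32)
    next_obs = torch.as_tensor(next_obs, dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    dones = torch.as_tensor(dones, dtype=torch.float32)
    weights = torch.as_tensor(weights, dtype=torch.float32)

    q_values = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * (1 - dones) * target_net(next_obs).max(1)[0]

    td_error = q_values - target
    # importance-sampling weights correct the bias introduced by prioritized sampling
    loss = (weights * td_error.pow(2)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # write the new priorities back (small epsilon keeps them strictly positive)
    buffer.update_priorities(idxes, np.abs(td_error.detach().numpy()) + 1e-6)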

Plotted Reward Scale

Hi, I am Oscar, and I really appreciate these source codes integrating various algorithms.

I have tried to run the nature DQN with the default settings on the Pong and BeamRider environments, and found that the reward scale is not as large as the one posted on the main page.
For the Pong environment,
I manually set clip_rewards = False and got a final mean around 27.430, which is far from the max level (around 300) posted.

Is it due to a different hyper-parameter setting, or maybe due to some plotting technique?
BTW, I would really appreciate it if you could update the plotting code. Thank you!

[Screenshot from 2021-10-27 13-15-38]

How to use the results?

Could you please tell me how to use the results?
[image]
I want the data as something like: mean_reward & time_step
