ghliu / pytorch-ddpg
Implementation of the Deep Deterministic Policy Gradient (DDPG) using PyTorch
License: Apache License 2.0
Traceback (most recent call last):
  File "D:\Master\Codes\pytorch-ddpg\main.py", line 156, in <module>
    train(args.train_iter, agent, env, evaluate,
  File "D:\Master\Codes\pytorch-ddpg\main.py", line 44, in train
    observation2, reward, done, info = env.step(action)
  File "D:\AI\Software\Conda\Miniconda\envs\torch\lib\site-packages\gym\core.py", line 349, in step
    return self.env.step(self.action(action))
  File "D:\AI\Software\Conda\Miniconda\envs\torch\lib\site-packages\gym\core.py", line 353, in action
    raise NotImplementedError
NotImplementedError
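This traceback typically appears with newer gym versions: older gym.ActionWrapper subclasses overrode the _action/_reverse_action hooks, while newer gym dispatches to action/reverse_action and raises NotImplementedError when those are missing. A minimal sketch of the rename that usually fixes it, assuming the wrapper being hit is the action-rescaling one in normalized_env.py:

import gym

class NormalizedEnv(gym.ActionWrapper):
    """Rescale actions from the actor's [-1, 1] range to the env's bounds."""

    # Newer gym calls `action`, not the old `_action` hook; defining only
    # `_action` is what triggers the NotImplementedError above.
    def action(self, action):
        low, high = self.action_space.low, self.action_space.high
        return low + (action + 1.0) * 0.5 * (high - low)

    def reverse_action(self, action):
        low, high = self.action_space.low, self.action_space.high
        return 2.0 * (action - low) / (high - low) - 1.0

# Usage: env = NormalizedEnv(gym.make("Pendulum-v1"))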
Hi Guan-Horng,
Thanks for your great implementation! I am wondering why we append an additional (s, a, r) pair to the replay buffer after an episode is done. The reward in that pair is zero, and I don't think this step is mentioned in the original paper.
Line 64 in e9db328
Thank you!
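A plausible explanation (this depends on the memory implementation, so treat it as a guess): the buffer stores flat (s, a, r, done) entries and reconstructs s' from the following entry, so a terminal step needs one padding entry after it to supply that next state; the padded reward is never used because the terminal flag zeroes the bootstrap term. An illustrative sketch of that storage scheme, not the repo's exact SequentialMemory code:

buffer = []

def append(state, action, reward, done):
    # Transitions are stored flat; s' for entry i is the state of entry i + 1.
    buffer.append((state, action, reward, done))

def transition(i):
    s, a, r, done = buffer[i]
    s_next = buffer[i + 1][0]   # why a padding entry must follow a terminal step
    return s, a, r, s_next, done

append([0.0], [0.1], 1.0, False)
append([0.5], [0.2], 2.0, True)    # last real step of the episode
append([0.9], [0.0], 0.0, False)   # padding entry: its reward is unused
print(transition(1))               # the terminal transition, with its s'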
At line 127 of ddpg.py, I think we should squeeze index 0 instead of index 1. Because both example games have 1-dimensional action spaces, the bug doesn't show up there.
I tried it on a high-dimensional action space, and it only works when I change the index to 0.
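For anyone hitting this, a quick numpy demonstration of why the wrong squeeze axis stays hidden on 1-dimensional action spaces (the shapes are illustrative):

import numpy as np

action = np.zeros((1, 3))                       # batch of one 3-dim action
print(action.squeeze(axis=0).shape)             # (3,) -- the intended result

print(np.zeros((1, 1)).squeeze(axis=1).shape)   # (1,) -- axis 1 happens to work
# np.zeros((1, 3)).squeeze(axis=1)              # ValueError on multi-dim actions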
I have a question about the following line in the training logic:
Line 75 in e9db328
In the computation of the target Q-values, shouldn't the multiplication be done with

(1 - to_tensor(terminal_batch.astype(np.float)))

since we want the next-state Q-values to be zeroed when the state was terminal? Otherwise the next state may not belong to the same episode as the current state, so evaluating the target network on it is invalid.
Apologies if I'm missing something trivial.
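For reference, the standard implementation of the DDPG target, with d_i as the terminal indicator, masks the bootstrap term exactly as suggested:

y_i = r_i + \gamma \, (1 - d_i) \, Q'\big(s_{i+1}, \mu'(s_{i+1})\big)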
What is needed to run this implementation on a GPU?
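A minimal sketch of the usual PyTorch pattern (the Linear modules below are stand-ins for the repo's actor and critic; the .to(device) calls are the point):

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

actor = nn.Linear(3, 1).to(device)    # stand-in for the actor network
critic = nn.Linear(4, 1).to(device)   # stand-in for the critic network

# Every batch sampled from the replay buffer must be moved to the same
# device as the networks before the forward pass.
state_batch = torch.randn(32, 3).to(device)
action = actor(state_batch)
q = critic(torch.cat([state_batch, action], dim=1))
print(q.device)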
target_q_batch = to_tensor(reward_batch) + \
    self.discount * to_tensor(terminal_batch.astype(np.float)) * next_q_values

I think it should be

target_q_batch = to_tensor(reward_batch) + \
    self.discount * to_tensor(1.0 - terminal_batch.astype(np.float)) * next_q_values
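Agreed that this matches the Bellman target above. A sketch of the corrected computation, assuming terminal_batch holds 1.0 for terminal transitions (torch.as_tensor stands in for the repo's to_tensor helper, and np.float32 replaces the deprecated np.float):

import numpy as np
import torch

def compute_target_q(reward_batch, terminal_batch, next_q_values, discount=0.99):
    # y = r + gamma * (1 - done) * Q'(s', mu'(s')): the (1 - done) mask
    # zeroes the bootstrap term exactly for terminal transitions.
    reward = torch.as_tensor(reward_batch.astype(np.float32))
    not_done = 1.0 - torch.as_tensor(terminal_batch.astype(np.float32))
    return reward + discount * not_done * next_q_values

reward_batch = np.array([1.0, 2.0])
terminal_batch = np.array([0.0, 1.0])   # second transition ends an episode
next_q = torch.tensor([5.0, 5.0])
print(compute_target_q(reward_batch, terminal_batch, next_q))  # tensor([5.9500, 2.0000])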
Hi, I'm not sure whether this would calculate the gradient of the action-value with respect to the actions:

policy_loss = -self.critic([
    to_tensor(state_batch),
    self.actor(to_tensor(state_batch))
])
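For what it's worth, autograd does differentiate through the actor's output here: the critic is evaluated on self.actor(...), so backpropagating policy_loss gives the actor parameters the chain-rule product dQ/da · da/dθ, which is the deterministic policy gradient. A toy check with stand-in modules (the repo's critic takes a [state, action] list; concatenation below is just for the stand-in):

import torch
import torch.nn as nn

actor = nn.Linear(3, 1)    # stand-in networks, shapes are illustrative
critic = nn.Linear(4, 1)

state = torch.randn(8, 3)
action = actor(state)      # differentiable w.r.t. the actor's parameters
policy_loss = -critic(torch.cat([state, action], dim=1)).mean()
policy_loss.backward()

print(actor.weight.grad)   # nonzero: dQ/da flowed back through the actor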
In the definition of the train function (https://github.com/ghliu/pytorch-ddpg/blob/master/main.py#L16), the second argument is named gent instead of agent, but it is referred to as agent inside the function.
Hi, thank you for this great implementation!!
However, I'm not sure what normalized_env.py actually does. When I remove it, the results get worse than when I keep it. What is its effect?
Look forward to your reply!
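The wrapper's usual job (see the NormalizedEnv sketch above) is to rescale the actor's tanh-bounded output in [-1, 1] to the environment's real action bounds, so removing it makes the agent act with mis-scaled actions. A quick numeric check, using Pendulum's [-2, 2] torque range as an example:

low, high = -2.0, 2.0                      # Pendulum-v1 torque bounds
for a in (-1.0, 0.0, 1.0):                 # tanh-bounded actor outputs
    print(a, "->", low + (a + 1.0) * 0.5 * (high - low))   # -2.0, 0.0, 2.0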
I tried with the same settings, and the final stable average reward is close to 0 instead of 100. Has anyone who tried this implementation gotten the expected values?