
Comments (7)

takuseno commented on September 24, 2024

@SuryaPratapSingh37 Hi, thanks for the issue. Generally speaking, there usually isn't convergence in deep RL training because of its nonstationary nature. So it is what it is. Also, the TD error is not a really good metric; people don't usually use it to measure training progress.
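To make the point concrete, here is a minimal stand-alone sketch (not d3rlpy code; the tabular Q-function and names are made up for illustration) of what the one-step TD error actually measures. Because both the target and the estimate move as Q is updated, and each sampled transition gives a different error, its average over a replay buffer is a noisy signal rather than a clean "training loss":

```python
# Toy tabular Q-function: Q[state][action] (hypothetical values).
Q = {
    "s0": {"left": 0.2, "right": 0.5},
    "s1": {"left": 0.1, "right": 0.9},
}

def td_error(q, state, action, reward, next_state, gamma=0.99):
    """One-step TD error: r + gamma * max_a' Q(s', a') - Q(s, a)."""
    target = reward + gamma * max(q[next_state].values())
    return target - q[state][action]

# Even with Q frozen, the error depends on which transition was sampled,
# so its buffer-average shifts as the data distribution and Q both change.
delta = td_error(Q, "s0", "right", 1.0, "s1")
print(round(delta, 3))
```

A low TD error only means Q is self-consistent on the data it has seen, not that the greedy policy is good, which is why it is a poor progress metric.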


spsingh37 commented on September 24, 2024

@takuseno Thanks for your reply. Could you please guide me on exactly how I should change the above code to make it converge? I felt CartPole is a pretty simple environment, and at the least the loss should have been decreasing (if not overfitting). Secondly, if not the TD error, what else should I use here to examine the training loss (to find out whether it's overfitting or not)?


takuseno commented on September 24, 2024

Sadly, particularly in offline deep RL, it's very difficult to prevent divergence, so my recommendation is to give up on convergence. Also, in offline RL there are no good metrics yet for measuring policy performance. I'd direct you to this documentation and a paper:

Offline deep RL still needs a lot of inventions to make it practical 😓


spsingh37 commented on September 24, 2024

Ohh... do you know whether the transitions are sampled randomly from the replay buffer during training (and if not, how do I randomly shuffle the transitions)?


takuseno commented on September 24, 2024

Yes, the mini-batch is uniformly sampled from the buffer.
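A minimal sketch of what uniform mini-batch sampling looks like (illustrative only; d3rlpy handles this internally, and the buffer layout here is made up):

```python
import random

# Hypothetical replay buffer of (state, action, reward, next_state) tuples.
buffer = [("s%d" % i, "a", 0.0, "s%d" % (i + 1)) for i in range(1000)]

def sample_minibatch(buf, batch_size):
    # random.sample draws uniformly without replacement, so every
    # transition is equally likely; no manual shuffling is needed.
    return random.sample(buf, batch_size)

batch = sample_minibatch(buffer, 32)
print(len(batch))  # 32
```

Because each batch is drawn uniformly at random, pre-shuffling the dataset would have no effect on training.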


takuseno commented on September 24, 2024

Another suggestion to prevent the divergence is to use an offline RL algorithm. For now, it looks like you're using DQN, which is designed for online training. If you use DiscreteCQL instead, you might get better results in the offline setting.
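The reason CQL behaves better offline is its conservative penalty, which pushes Q-values down for actions the dataset never took, so the policy can't exploit overestimated out-of-distribution actions. A toy sketch of that extra loss term (simplified from the CQL paper; this is not d3rlpy's actual implementation, and the Q-values are made up):

```python
import math

def cql_penalty(q_values, data_action):
    """Conservative term: logsumexp over all actions minus the Q-value of
    the action actually taken in the dataset. Minimizing it lowers Q for
    out-of-dataset actions relative to in-dataset ones."""
    logsumexp = math.log(sum(math.exp(q) for q in q_values.values()))
    return logsumexp - q_values[data_action]

# Q-estimates for one state; suppose only "right" appears in the data.
q = {"left": 2.0, "right": 1.0}
print(round(cql_penalty(q, "right"), 3))
```

In d3rlpy itself the change is simply selecting DiscreteCQL as the algorithm in place of DQN; the penalty is added to the ordinary TD loss under the hood.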


takuseno commented on September 24, 2024

Please let me close this issue since this is simply the nature of offline RL.

