Comments (7)
@SuryaPratapSingh37 Hi, thanks for the issue. Generally speaking, deep RL training does not usually converge because the learning problem is nonstationary, so this behavior is expected. Also, the TD error is not a good metric; people don't usually use it to measure training progress.
from d3rlpy.
@takuseno Thanks for your reply. Could you please explain exactly how I should change the above code to make it converge? I thought CartPole is a pretty simple environment and that the loss should at least have been decreasing (if not overfitting). Secondly, if not the TD error, what else should I use here to examine the training loss (to find out whether it's overfitting or not)?
Sadly, particularly in offline deep RL, it's very difficult to prevent divergence, so my recommendation is to give up on convergence. Also, in offline RL, there are no good metrics for measuring policy performance yet. I'd direct you to this documentation and this paper:
- https://d3rlpy.readthedocs.io/en/v2.3.0/tutorials/offline_policy_selection.html
- https://arxiv.org/abs/2007.09055
Offline deep RL still needs a lot of inventions to make it practical 😓
Oh, do you know whether the transitions are sampled randomly from the replay buffer during training? If not, how can I randomly shuffle the transitions?
Yes, the mini-batch is uniformly sampled from the buffer.
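To make "uniformly sampled" concrete, here is a minimal standard-library sketch of drawing a mini-batch from a buffer. The function name `sample_minibatch`, the toy tuple layout, and the choice to draw indices with replacement are assumptions made for illustration; this is not d3rlpy's actual buffer code.

```python
import random


def sample_minibatch(buffer, batch_size, rng=None):
    """Draw batch_size transitions uniformly at random.

    Assumption: indices are drawn independently (with replacement),
    so the same transition may appear more than once in a batch.
    """
    rng = rng or random.Random()
    # Each index is equally likely, regardless of its position in the buffer.
    indices = [rng.randrange(len(buffer)) for _ in range(batch_size)]
    return [buffer[i] for i in indices]


# Hypothetical toy buffer of
# (observation, action, reward, next_observation, terminal) tuples.
buffer = [(t, t % 2, 1.0, t + 1, False) for t in range(100)]
batch = sample_minibatch(buffer, batch_size=32, rng=random.Random(0))
```

Because every draw ignores the order in which transitions were stored, no separate shuffling step is needed with this kind of sampling.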
Another suggestion for preventing divergence is to use an offline RL algorithm. For now, it looks like you're using DQN, which is designed for online training. If you use DiscreteCQL instead, you might get better results in the offline setting.
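As a rough illustration of why a conservative method can behave differently from plain DQN on a fixed dataset, the sketch below computes the conservative term described in the CQL paper for a single state: the log-sum-exp of all Q-values minus the Q-value of the action found in the dataset. The function names, the single-transition form, and the `alpha` weight are assumptions for illustration; this is not d3rlpy's DiscreteCQL implementation.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_values, dataset_action):
    """Conservative term: large when actions outside the dataset have high Q."""
    return logsumexp(q_values) - q_values[dataset_action]


def conservative_loss(q_values, dataset_action, td_target, alpha=1.0):
    """Squared TD error plus the weighted conservative term (one transition)."""
    td_error = td_target - q_values[dataset_action]
    return td_error ** 2 + alpha * cql_penalty(q_values, dataset_action)


# Two-action example: the dataset only contains action 0.
balanced = cql_penalty([1.0, 1.0], dataset_action=0)
inflated = cql_penalty([1.0, 5.0], dataset_action=0)
```

In this toy case the penalty grows when the Q-value of the action that never appears in the dataset is inflated, which is the kind of overestimation a purely TD-based loss does not discourage.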
Please let me close this issue since this is simply the nature of offline RL.