Comments (10)
Which dataset are you using? Suppose the maximum capacity of the replay buffer is 1M and the dataset size is below 1M. In that case, there is still room to store new experiences in the replay buffer, so nothing is evicted yet. If n_steps of fit_online is very large, such as 1M, do you still see the same result?
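The capacity argument above can be checked with a little arithmetic. The sketch below is a generic model of a FIFO buffer, not d3rlpy's implementation. The helper names and the 50,000-transition dataset size are illustrative assumptions, and it assumes one transition is stored per online step.

```python
def steps_before_eviction(maxlen: int, initial_size: int) -> int:
    """Number of new items a FIFO buffer can absorb before dropping its oldest one."""
    if maxlen <= 0:
        raise ValueError("maxlen must be positive")
    # A buffer that starts at or above capacity has no free slots.
    return max(maxlen - initial_size, 0)


def oldest_item_replaced(maxlen: int, initial_size: int, n_steps: int) -> bool:
    """True if the item at index 0 is evicted after n_steps appends."""
    return n_steps > steps_before_eviction(maxlen, initial_size)


# Hypothetical sizes: a 50,000-transition dataset in a 1M buffer.
print(steps_before_eviction(1_000_000, 50_000))
print(oldest_item_replaced(1_000_000, 50_000, 250_000))
# A buffer that starts full evicts on the very first append.
print(oldest_item_replaced(20, 20, 1))
```

Under these assumptions, 250,000 online steps are not enough to push out the first transition of a 1M buffer, while a buffer that starts full loses it on the first step.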
from d3rlpy.
I couldn't actually reproduce your problem on my end. Here is my test code:
import d3rlpy
dqn = d3rlpy.algos.DQN(batch_size=10)
dataset, env = d3rlpy.datasets.get_cartpole()
buffer = d3rlpy.online.buffers.ReplayBuffer(maxlen=100, env=env, episodes=dataset.episodes)
print(buffer.transitions[0].observation)
dqn.fit_online(env, buffer, n_steps=100)
print(buffer.transitions[0].observation)
Note that I'm using gym==0.17.0 for compatibility reasons, but this does not affect the behavior of ReplayBuffer. The result is:
[0.37582362 0.5620217 0.00999125 0.24562645]
[-0.00132633 -0.599253 -0.00662717 0.88046914]
Can you share a complete example that I can use to reproduce your problem?
from d3rlpy.
Thanks for the instruction and replies.
I have created a simplified example as a snippet, and now it works.
There must have been something wrong in my previous code.
import d3rlpy
dataset, env = d3rlpy.datasets.get_dataset("hopper-expert-v2")
policy = d3rlpy.algos.CQL(use_gpu=True)
policy.fit(dataset, n_steps = 100, n_steps_per_epoch=50, show_progress=False)
policy.save_model(f'CQL.pt')
pretrained_model_name = f'CQL.pt'
policy.build_with_env(env)
policy.load_model(pretrained_model_name)
buffer = d3rlpy.online.buffers.ReplayBuffer(maxlen=20, env=env, episodes=dataset.episodes)
for _ in range(2):
    policy.fit_online(env, buffer, n_steps=100, n_steps_per_epoch=50, show_progress=False)
    t0 = buffer.transitions[0]
    print(f'first obs.:{t0.observation}')
first obs.:[ 1.2099096 -0.04010323 -0.07108203 -0.003647 0.16283353 0.3001447
-0.03584122 -0.22062637 -0.29892066 -0.403349 -0.8289119 ]
first obs.:[ 1.2481172 -0.01057371 -0.00299744 -0.01796114 -0.02033593 0.07686285
-0.2977234 0.9424554 0.8758408 0.48696142 -0.93829495]
from d3rlpy.
@HYDesmondLiu Hi, thanks for the issue. Please check this IQL reproduction script:
d3rlpy/reproductions/finetuning/iql.py (line 61 in a11b15d)
from d3rlpy.
Thanks for the prompt reply.
Let me put it this way. Since the buffer is FIFO, I filled the buffer with the dataset's transitions.
With CQL.fit_online, the first transition is unchanged after finetuning through interaction with the environment.
My workaround is to enlarge the maxlen of the buffer. I believe there is a bug in the append code block.
from d3rlpy.
Can you confirm that you set episodes=dataset.episodes as in the example above?
from d3rlpy.
I have set it, and it still does not work.
Since it is a queue, the first element should be removed when a new element is added to a full buffer.
However, when I use buffer.transitions[0] to print the first observation in the buffer, it is always the same.
Basically, I ran the training as in the following snippet. Please let me know if anything is set incorrectly in a way that would prevent the buffer from being updated during online training.
import d3rlpy

# env_name and 'xxx.pt' are placeholders for the dataset name and the pretrained model file.
dataset, env = d3rlpy.datasets.get_dataset(env_name)
policy = d3rlpy.algos.CQL(use_gpu=True)
pretrained_model_name = 'xxx.pt'
policy.build_with_env(env)
policy.load_model(pretrained_model_name)
buffer = d3rlpy.online.buffers.ReplayBuffer(maxlen=1_000_000, env=env, episodes=dataset.episodes)
policy.fit_online(env, buffer, n_steps=250_000)
t0 = buffer.transitions[0]
tl = buffer.transitions[-1]
print(f'first obs.:{t0.observation}/last obs.:{tl.observation}')
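For reference, the FIFO behavior expected here can be illustrated with Python's collections.deque as a stand-in. This is only a model of a bounded queue, not d3rlpy's ReplayBuffer, and the integer items are placeholders for transitions.

```python
from collections import deque

# A full buffer: capacity 5, already holding 5 items.
full = deque(range(5), maxlen=5)
first_before = full[0]
full.append(5)  # appending to a full deque drops the oldest item
first_after = full[0]
print(first_before, first_after)  # 0 1

# A buffer with spare capacity: index 0 stays put until the buffer fills.
roomy = deque(range(5), maxlen=10)
roomy.append(5)
print(roomy[0], len(roomy))  # 0 6
```

In the second case the first element is unchanged even though a new item was stored, which is the same symptom as printing buffer.transitions[0] on a buffer that is not yet full.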
from d3rlpy.
I am using MuJoCo; however, I deliberately downsample the dataset.
In real-world offline-to-online finetuning problems, there might not be access to the whole offline dataset.
So I actually downsample the buffer to only 5% of the original amount, and the buffers fill up really quickly.
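One way such a 5% downsampling could be done is sketched below. The thread does not show the actual method, so the helper name downsample, the choice to sample whole episodes uniformly at random, and the fixed seed are all assumptions; plain integers stand in for episodes.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def downsample(items: Sequence[T], fraction: float, seed: int = 0) -> List[T]:
    """Keep a random subset of items, preserving their original order."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if len(items) == 0:
        return []
    # Keep at least one item so the result is never empty for non-empty input.
    k = max(1, int(len(items) * fraction))
    rng = random.Random(seed)  # local generator keeps the selection reproducible
    keep = sorted(rng.sample(range(len(items)), k))
    return [items[i] for i in keep]


episodes = list(range(200))  # placeholder for a list of episodes
subset = downsample(episodes, 0.05)
print(len(subset))
```

The resulting list could then be passed as the episodes argument of the buffer, in place of dataset.episodes.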
from d3rlpy.
I see. Then, I believe that the replay buffer still has space to store new experiences. You can test this by decreasing the maxlen of the replay buffer to a very small value to see if it works correctly.
from d3rlpy.
Thanks for the quick response. Actually, that is what I did. The code snippet I provided was modified. The maxlen I used was only 5% of the original size of the buffer. I printed the first element of the observations after every fit_online call finished, and it was always the same one.
from d3rlpy.