
Comments (10)

takuseno commented on June 10, 2024

Which dataset are you using? Because let's say the maximum capacity of the replay buffer is 1M and the dataset size is below 1M. In that case, there is still room to store new experiences in the replay buffer. If n_steps of fit_online is very large such as 1M, do you still see the same result?
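
The gist in plain Python (using collections.deque as a stand-in for the replay buffer, so this is only an illustration, not d3rlpy's implementation): the first element is only evicted once the queue is actually full.

from collections import deque

# Toy FIFO queue standing in for the replay buffer (illustration only).
buffer = deque(maxlen=5)

# Pre-fill with 3 "offline" transitions -- the buffer is not full yet.
for i in range(3):
    buffer.append(f"offline-{i}")

# Add 2 "online" transitions: there is still room, so nothing is evicted
# and the first element stays the same.
for i in range(2):
    buffer.append(f"online-{i}")
print(buffer[0])  # offline-0

# Only once the buffer is full does a new append evict the oldest element.
buffer.append("online-2")
print(buffer[0])  # offline-1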


takuseno commented on June 10, 2024

I couldn't actually reproduce your problem on my end. Here is my test code:

import d3rlpy

dqn = d3rlpy.algos.DQN(batch_size=10)
dataset, env = d3rlpy.datasets.get_cartpole()
# Small buffer pre-filled with the offline episodes, so eviction starts right away.
buffer = d3rlpy.online.buffers.ReplayBuffer(maxlen=100, env=env, episodes=dataset.episodes)
print(buffer.transitions[0].observation)
dqn.fit_online(env, buffer, n_steps=100)
# After online training, the first stored observation has been replaced.
print(buffer.transitions[0].observation)

Note that I'm using gym==0.17.0 due to compatibility issues, but this does not affect the behavior of ReplayBuffer. The result is:

[0.37582362 0.5620217  0.00999125 0.24562645]
[-0.00132633 -0.599253   -0.00662717  0.88046914]

Can you share a complete example so that I can reproduce your problem?


HYDesmondLiu commented on June 10, 2024

Thanks for the instructions and replies.
I have created a simplified example snippet, and now it works.
There must have been something wrong in my previous code.

import d3rlpy

# Offline pretraining, then save and restore the policy.
dataset, env = d3rlpy.datasets.get_dataset("hopper-expert-v2")
policy = d3rlpy.algos.CQL(use_gpu=True)
policy.fit(dataset, n_steps=100, n_steps_per_epoch=50, show_progress=False)
policy.save_model('CQL.pt')

pretrained_model_name = 'CQL.pt'
policy.build_with_env(env)
policy.load_model(pretrained_model_name)

# Tiny buffer pre-filled with the dataset's episodes, so eviction starts immediately.
buffer = d3rlpy.online.buffers.ReplayBuffer(maxlen=20, env=env, episodes=dataset.episodes)

for _ in range(2):
    policy.fit_online(env, buffer, n_steps=100, n_steps_per_epoch=50, show_progress=False)
    t0 = buffer.transitions[0]
    print(f'first obs.:{t0.observation}')


first obs.:[ 1.2099096  -0.04010323 -0.07108203 -0.003647    0.16283353  0.3001447
 -0.03584122 -0.22062637 -0.29892066 -0.403349   -0.8289119 ]
first obs.:[ 1.2481172  -0.01057371 -0.00299744 -0.01796114 -0.02033593  0.07686285
 -0.2977234   0.9424554   0.8758408   0.48696142 -0.93829495]  


takuseno commented on June 10, 2024

@HYDesmondLiu Hi, thanks for the issue. Please check this IQL reproduction script.

buffer = d3rlpy.online.buffers.ReplayBuffer(maxlen=1000000,


HYDesmondLiu commented on June 10, 2024

Thanks for the prompt reply.
Let me ask this way: since the buffer is FIFO, I have tried to fill it with the dataset's transitions.
With CQL.fit_online, the first transition is not changed after fine-tuning via interaction with the environment.

My workaround is to enlarge the maxlen of the buffer; I believe there is a bug in the append code block.


takuseno commented on June 10, 2024

Can you make sure you set episodes=dataset.episodes like in the example above?


HYDesmondLiu commented on June 10, 2024

I have set it, and it does not work either.
Since it is a queue, the first element should be removed when we try to add a new element to a full buffer.
However, when I use buffer.transitions[0] to print out the first observation in the buffer, it is always the same.

Basically, I was doing the training as in the following snippet. Please let me know if anything is set incorrectly that
leads to the buffer not being updated after online training.

dataset, env = d3rlpy.datasets.get_dataset(env_name)
policy = d3rlpy.algos.CQL(use_gpu=True)
pretrained_model_name = 'xxx.pt'
policy.build_with_env(env)
policy.load_model(pretrained_model_name)
# Pre-fill the buffer with the dataset's episodes before online fine-tuning.
buffer = d3rlpy.online.buffers.ReplayBuffer(maxlen=1_000_000, env=env, episodes=dataset.episodes)
policy.fit_online(env, buffer, n_steps=250_000)
t0 = buffer.transitions[0]
tl = buffer.transitions[-1]
print(f'first obs.:{t0.observation}/last obs.:{tl.observation}')


HYDesmondLiu commented on June 10, 2024

I am using MuJoCo; however, I deliberately downsample the dataset.

In real-world offline-to-online fine-tuning problems there might not be access to the whole offline dataset, so I downsample the buffer to only 5% of the original amount. As a result, the buffers fill up really quickly.
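
Roughly what I mean by downsampling, as a sketch (the episode-level 5% cut here is only illustrative; my actual sampling scheme may differ):

import d3rlpy

dataset, env = d3rlpy.datasets.get_dataset("hopper-expert-v2")

# Keep roughly 5% of the episodes before pre-filling the buffer
# (illustrative cut; the actual sampling may differ).
n_keep = max(1, len(dataset.episodes) // 20)
small_episodes = dataset.episodes[:n_keep]

buffer = d3rlpy.online.buffers.ReplayBuffer(
    maxlen=1_000_000, env=env, episodes=small_episodes)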


takuseno commented on June 10, 2024

I see. Then I believe that the replay buffer still has space to store new experiences. You can test this by decreasing maxlen of the replay buffer to a very small value and seeing whether it works correctly.
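
A minimal way to run that check, following the snippets already posted in this thread (the dataset and step counts are only for illustration):

import d3rlpy

dataset, env = d3rlpy.datasets.get_dataset("hopper-expert-v2")
policy = d3rlpy.algos.CQL(use_gpu=True)

# Very small buffer, so FIFO eviction has to kick in almost immediately.
buffer = d3rlpy.online.buffers.ReplayBuffer(maxlen=20, env=env, episodes=dataset.episodes)

before = buffer.transitions[0].observation
policy.fit_online(env, buffer, n_steps=100, n_steps_per_epoch=50, show_progress=False)
after = buffer.transitions[0].observation

# If eviction works, the first stored observation should have changed.
print(before)
print(after)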


HYDesmondLiu commented on June 10, 2024

Thanks for the quick response. That is actually what I did; the code snippet I provided is a modified version. The maxlen I used was only 5% of the original size of the buffer, and I printed the first observation after every fit_online call. It was always the same one.

