When attempting to create an MDPDataset in d3rlpy with data shaped as for example (100

[QUESTION] multidimensional states and actions about d3rlpy HOT 8 CLOSED

bzeni1 commented on June 23, 2024

[QUESTION] multidimensional states and actions

takuseno commented on June 23, 2024

@bzeni1 Hi, could you share a minimal example that I can use to reproduce your issue? It sounds like your code is simply incorrect.

btw, when you instantiate algorithms, you need to do as follows:

ddpg = d3rlpy.algos.DDPGConfig().create()

bzeni1 commented on June 23, 2024

@takuseno Hi, find my code below. What could be the problem? Thank you in advance for your assistance on this matter.

processed_data = race_data.copy()
settings_columns = [
    # 22 selected columns from my dataset
]
processed_data['reward'] = processed_data['ACCELERATION_m_s2']

for col in settings_columns:
    processed_data[f'state_{col}'] = processed_data[col]
    

for col in settings_columns:
    processed_data[f'action_{col}'] = processed_data[col].diff().fillna(0)


for col in settings_columns:
    processed_data[f'next_{col}'] = processed_data[col].shift(-1)

#end of an episode (race)
processed_data['done'] = processed_data['race_num'].diff(-1) != 0

processed_data = processed_data[processed_data['done'] == False]

print("After filtering rows:", processed_data.shape)

print("States shape:", processed_data[settings_columns].shape)
print("Actions shape:", processed_data[[f'action_{col}' for col in settings_columns]].shape)
print("Rewards shape:", processed_data['reward'].shape)
print("Next states shape:", processed_data[[f'next_{col}' for col in settings_columns]].shape)
print("Dones shape:", processed_data['done'].shape)

**Output**
States shape: (17642, 22)
Actions shape: (17642, 22)
Rewards shape: (17642,)
Next states shape: (17642, 22)
Dones shape: (17642,)

states = processed_data[[f'state_{col}' for col in settings_columns if f'state_{col}' in processed_data.columns]].to_numpy()
actions = processed_data[[col for col in processed_data.columns if col.startswith('action_')]].to_numpy()
rewards = processed_data['reward'].to_numpy()
next_states = processed_data[[col for col in processed_data.columns if col.startswith('next_')]].to_numpy()
dones = processed_data['done'].to_numpy()

#next step:

dataset = MDPDataset(states, actions, rewards, next_states, dones)

#ValueError: operands could not be broadcast together with shapes (388124,) (17642,)

takuseno commented on June 23, 2024

Thanks for sharing your code. It looks like next_states is unnecessary. It needs to be as follows:

dataset = MDPDataset(states, actions, rewards, dones)
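
The shapes in the error message are consistent with the positional arguments being shifted by one: 388124 equals 17642 * 22, the size of `next_states` once flattened. The sketch below reproduces that mismatch with NumPy alone. It assumes, without confirmation in this thread, that d3rlpy reads the fourth and fifth positional arguments as `terminals` and `timeouts`, flattens both, and combines them element-wise. The zero-filled arrays are placeholders for the real data.

```python
import numpy as np

n_steps, n_features = 17642, 22  # shapes printed in the question

# Placeholder arrays with the same shapes as next_states and dones.
next_states = np.zeros((n_steps, n_features), dtype=np.float32)
dones = np.zeros(n_steps, dtype=bool)

# Assumption: the fourth argument is flattened as if it were terminals.
flat_terminals = next_states.reshape(-1)
print(flat_terminals.shape)

# Assumption: the fifth argument (here dones) is combined element-wise with it.
try:
    np.logical_and(flat_terminals, dones)
    message = ""
except ValueError as exc:
    message = str(exc)
print(message)
```

Under these assumptions, dropping `next_states` from the call puts `dones` back in the fourth position, which is what the call above does.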

bzeni1 commented on June 23, 2024

Thanks for your advice. After removing next_states, I am encountering a new error:

ValueError: Either episodes or env must be provided to determine signatures. Or specify signatures directly.

Although I have already marked the episode boundaries with the 'done' flags, I still don't know how to define the episodes. What do you think?

takuseno commented on June 23, 2024

My guess is that dones is all zeros, so no episodes could be found. You need to set up dones correctly.
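
One way this can happen, judging from the code posted above: `done` is set on the last row of each race, and then every row with `done == True` is filtered out, so no terminal flag survives. Below is a minimal NumPy sketch of keeping those rows instead. `race_num` is a small made-up array standing in for the `race_num` column; the pandas expression `processed_data['race_num'].diff(-1) != 0` from the question yields the same flags.

```python
import numpy as np

# Hypothetical race ids, one per time step (three races).
race_num = np.array([0, 0, 0, 1, 1, 2, 2, 2])

# A step is terminal when the next step belongs to a different race;
# the very last step always closes its episode.
terminals = np.append(race_num[:-1] != race_num[1:], True)
print(terminals.astype(int))  # [0 0 1 0 1 0 0 1]

# Dropping the terminal rows, as the question's filter does, leaves no
# episode boundary for the dataset to find.
filtered = terminals[~terminals]
print(int(filtered.sum()))  # 0
```

With the terminal rows kept, the observation, action, and reward arrays keep the same length as `terminals`, and `terminals` can be passed in place of `dones`.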

rohanblueboybaijal commented on June 23, 2024

I think I am running into a similar issue. I have 2 datasets. For both of them, all the dimensions are the same:
observations: (5000, 4), actions: (5000, 2), rewards: (5000,), terminals: (5000,)

But with one of the datasets, the fit function for IQL fails, although I am getting a different error. I can see that both datasets have some terminals = 1.
Any suggestions for where an error like this might come up?

File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/base.py", line 409, in fit
    results = list(
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/base.py", line 543, in fitter
    loss = self.update(batch)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/base.py", line 863, in update
    loss = self._impl.update(torch_batch, self._grad_step)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/torch_utility.py", line 365, in wrapper
    return f(self, *args, **kwargs)  # type: ignore
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/base.py", line 70, in update
    return self.inner_update(batch, grad_step)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 118, in inner_update
    metrics.update(self.update_critic(batch))
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 84, in update_critic
    loss = self.compute_critic_loss(batch, q_tpn)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/torch/iql_impl.py", line 73, in compute_critic_loss
    q_loss = self._q_func_forwarder.compute_error(
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/q_functions/ensemble_q_function.py", line 256, in compute_error
    return compute_ensemble_q_function_error(
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/q_functions/ensemble_q_function.py", line 96, in compute_ensemble_q_function_error
    loss = forwarder.compute_error( 
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/q_functions/mean_q_function.py", line 130, in compute_error
    value = self._q_func(observations, actions).q_value
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/q_functions/base.py", line 35, in __call__
    return super().__call__(x, action)  # type: ignore
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/q_functions/mean_q_function.py", line 99, in forward
    q_value=self._fc(self._encoder(x, action)),
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/encoders.py", line 41, in __call__
    return super().__call__(x, action)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/encoders.py", line 284, in forward
    return self._layers(x)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (256x6 and 5x256)
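
Reading only the last line of the traceback: the first linear layer was built for 5 input features, while each batch row carries 6, which matches 4 observation values plus 2 action values. Why the network was sized for 5 is not something this thread establishes. The NumPy sketch below only reproduces the shape arithmetic, under the assumption that the Q-function encoder receives the observation and action concatenated; the arrays are zero-filled placeholders and the variable names are illustrative.

```python
import numpy as np

batch_size = 256
observations = np.zeros((batch_size, 4), dtype=np.float32)
actions = np.zeros((batch_size, 2), dtype=np.float32)

# Assumption: the encoder input is the observation and action side by side.
q_input = np.concatenate([observations, actions], axis=1)
print(q_input.shape)  # (256, 6)

# A weight matrix sized for 5 input features, stored as (out, in).
weight = np.zeros((256, 5), dtype=np.float32)

try:
    q_input @ weight.T  # (256, 6) times (5, 256) cannot be multiplied
    mismatch = False
except ValueError:
    mismatch = True
print(mismatch)  # True
```

Comparing the observation and action shapes of the dataset the model was built from against those of the dataset passed to `fit` would be a reasonable first check.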

takuseno commented on June 23, 2024

@rohanblueboybaijal Sorry for the late response. Could you share a minimal example that I can use to reproduce your error?

takuseno commented on June 23, 2024

Let me close this issue since the initial question should be resolved. Feel free to open a new issue to follow up.
