Hi, I'm confused... In rnns.py, there is a function as follows:

Why are observation_embed and action at the same “t” in the rollout_representation function? (dreamer-pytorch, open issue)

juliusfrost commented on August 9, 2024

Why are observation_embed and action at the same “t” in the rollout_representation function?

from dreamer-pytorch.

Comments (7)

seolhokim commented on August 9, 2024

It is already considered in here.

dreamer-pytorch/dreamer/algos/dreamer_algo.py

Lines 194 to 196 in 47bd509

    
    observation = samples.all_observation[:-1]  # [t, t+batch_length+1] -> [t, t+batch_length]
    action = samples.all_action[1:]  # [t-1, t+batch_length] -> [t, t+batch_length]
    reward = samples.all_reward[1:]  # [t-1, t+batch_length] -> [t, t+batch_length]

gunnxx commented on August 9, 2024

Hi, shouldn't it be

observation = samples.all_observation[:-1]  # [t, t+batch_length+1] -> [t, t+batch_length] 
action = samples.all_action[:-1]            # [t-1, t+batch_length] -> [t-1, t+batch_length-1] 
reward = samples.all_reward[1:]             # [t-1, t+batch_length] -> [t, t+batch_length]

so that

self.representation_model(obs_embed[t], action[t], prev_state)

will be $p(s_t | s_{t-1}, a_{t-1})$ for the prior and $p(s_t | s_{t-1}, a_{t-1}, o_t)$ for the posterior.

Current code is computing $p(s_t | s_{t-1}, a_t)$ for the prior and $p(s_t | s_{t-1}, a_t, o_t)$ for the posterior. Did I miss something?
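The two slicings can be compared on a toy buffer. This is only a sketch: the timestep labels are taken from the code comments (read here as inclusive ranges), the values `t = 5` and `batch_length = 3` are arbitrary, and the lists are hypothetical stand-ins for `samples.all_observation` and `samples.all_action`. Whether the real replay buffer stores actions with these labels is exactly the open point.

```python
# Toy stand-ins for the replay-buffer fields. Labels follow the code comments
# (assumed to be inclusive ranges), not the real buffer contents.
t, batch_length = 5, 3

all_observation = [f"o_{k}" for k in range(t, t + batch_length + 2)]  # [t, t+batch_length+1]
all_action = [f"a_{k}" for k in range(t - 1, t + batch_length + 1)]   # [t-1, t+batch_length]

observation = all_observation[:-1]   # drop the last observation
action_current = all_action[1:]      # slicing used in dreamer_algo.py
action_proposed = all_action[:-1]    # slicing proposed in this comment

# Pair up what the representation model would see at each step.
current_pairs = list(zip(observation, action_current))
proposed_pairs = list(zip(observation, action_proposed))

print(current_pairs)
print(proposed_pairs)
```

Under this labeling, the current slicing pairs each observation with the action carrying the same index, and the proposed slicing pairs it with the preceding action. If the buffer instead stores the action that led to each observation under that observation's index, the labels shift by one and the conclusion flips.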

seolhokim commented on August 9, 2024

all_observation holds observations, not states. Check the comments on those lines :)

gunnxx commented on August 9, 2024

Hi, sorry, maybe I was not clear. My question was about how the action is indexed. The code is

        observation = samples.all_observation[:-1]  # [t, t+batch_length+1] -> [t, t+batch_length]
        action = samples.all_action[1:]  # [t-1, t+batch_length] -> [t, t+batch_length]
        reward = samples.all_reward[1:]  # [t-1, t+batch_length] -> [t, t+batch_length]
        reward = reward.unsqueeze(2)
        done = samples.done
        done = done.unsqueeze(2)

        # Extract tensors from the Samples object
        # They all have the batch_t dimension first, but we'll put the batch_b dimension first.
        # Also, we convert all tensors to floats so they can be fed into our models.

        lead_dim, batch_t, batch_b, img_shape = infer_leading_dims(observation, 3)
        # squeeze batch sizes to single batch dimension for imagination roll-out
        batch_size = batch_t * batch_b

        # normalize image
        observation = observation.type(self.type) / 255.0 - 0.5
        # embed the image
        embed = model.observation_encoder(observation)

        prev_state = model.representation.initial_state(batch_b, device=action.device, dtype=action.dtype)
        # Rollout model by taking the same series of actions as the real model
        prior, post = model.rollout.rollout_representation(batch_t, embed, action, prev_state)

which means embed is $o_{t:t+K}$ and action is $a_{t:t+K}$ (judging by the comment in the code). Don't we need $a_{t-1:t+K-1}$ instead?
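Why the alignment matters can be seen in a stripped-down rollout loop. This is a hypothetical stand-in, not the repository's implementation: it assumes only that step `i` calls the representation model with `obs_embed[i]`, `action[i]`, and the previous state, as in the call quoted earlier in the thread. The extra `representation_model` argument and the string-building `toy_model` are illustrative inventions.

```python
def rollout_representation(steps, obs_embed, action, prev_state, representation_model):
    """Hypothetical rollout loop: step i consumes obs_embed[i] and action[i]."""
    priors, posts = [], []
    for i in range(steps):
        prior, post = representation_model(obs_embed[i], action[i], prev_state)
        priors.append(prior)
        posts.append(post)
        prev_state = post  # the posterior state feeds the next step
    return priors, posts


def toy_model(embed, act, prev):
    # Records which inputs were combined instead of computing distributions.
    prior = f"prior({prev}, {act})"
    post = f"post({prev}, {act}, {embed})"
    return prior, post


priors, posts = rollout_representation(
    2, ["e_t", "e_t+1"], ["a_0", "a_1"], "s_init", toy_model
)
print(priors[0])
print(posts[0])
```

Because the loop uses the same index for both sequences, whichever action sits at position `i` after slicing is the one conditioned on together with the observation at position `i`. The slicing upstream therefore fully determines whether the model sees the previous or the current action.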

seolhokim commented on August 9, 2024

No. action is $a_{t-1 : t+K-1}$. The observation sequence covers timesteps [t, t+batch_length+1] and is cut with [:-1], while the action sequence covers timesteps [t-1, t+batch_length] and is cut with [1:].

gunnxx commented on August 9, 2024

Because the comment for observation is

# [t, t+batch_length+1] -> [t, t+batch_length]

and for action is

# [t-1, t+batch_length] -> [t, t+batch_length]

So that's why I thought it was wrong because both are $o_{t:t+K}$ and $a_{t:t+K}$. I said I was not really sure as well because I was not sure about the replay buffer sampling. Thanks for the confirmation!

seolhokim commented on August 9, 2024

Okay. All of the code is fine. Depending on where you cut the array, you get data starting from t-1 or data starting from t. Thanks.
