zalandoresearch / pytorch-vq-vae

PyTorch implementation of VQ-VAE by Aäron van den Oord et al.

License: MIT License
Hi and thanks for providing a nice and clean implementation of VQ-VAEs :)
While playing around with your code, I noticed that in VectorQuantizerEMA you first perform the EMA update of the codebook counts and embeddings, and then use the updated codebook embeddings as the quantized vectors (and for computing e_latent_loss).
In particular, the order in which you perform the operations is:

1. EMA update of the codebook counts
2. EMA update of the codebook embeddings
3. Quantization of the encoder outputs
4. e_latent_loss computation

Is there a reason why you do the EMA updates before steps 3 and 4? My intuition says that the order should be:

1. Quantization of the encoder outputs
2. e_latent_loss computation
3. EMA update of the codebook counts
4. EMA update of the codebook embeddings

Looking forward to hearing your thoughts!
Many thanks,
Stefanos
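For concreteness, here is a minimal self-contained sketch of the EMA codebook update in the order proposed above (all tensor names and sizes are illustrative, following the update rules from the paper's appendix, not the notebook's exact code):

```python
import torch

N, K, D, decay, eps = 1024, 8, 16, 0.99, 1e-5
flat_input = torch.randn(N, D)          # encoder outputs, flattened
embedding = torch.randn(K, D)           # codebook
ema_cluster_size = torch.zeros(K)
ema_w = embedding.clone()

# 1.-2. quantize and compute e_latent_loss against the *old* codebook
distances = torch.cdist(flat_input, embedding)
encodings = torch.nn.functional.one_hot(distances.argmin(1), K).float()
quantized = encodings @ embedding
e_latent_loss = torch.mean((quantized - flat_input) ** 2)

# 3.-4. EMA updates of the counts and embedding sums, then normalize
ema_cluster_size = ema_cluster_size * decay + (1 - decay) * encodings.sum(0)
ema_w = ema_w * decay + (1 - decay) * (encodings.t() @ flat_input)
n = ema_cluster_size.sum()
cluster_size = (ema_cluster_size + eps) / (n + K * eps) * n  # Laplace smoothing
embedding = ema_w / cluster_size.unsqueeze(1)
```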
Thanks for your implementation of VQ-VAE, but I have a question. In the original paper, the gradients at the input of the decoder are copied to the encoder's output, because the index-selection op is non-differentiable, but I couldn't find the corresponding implementation in your code. I'm new to PyTorch and not familiar with the autograd system, so a little explanation of this would be appreciated. Thanks!
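For what it's worth, the standard PyTorch idiom for this gradient copy is the straight-through estimator, a one-liner built from `.detach()`. A minimal self-contained sketch (the quantizer here is a stand-in, not the repo's code):

```python
import torch

inputs = torch.randn(4, 8, requires_grad=True)  # stand-in for the encoder output
quantized = torch.round(inputs)                 # stand-in for the non-differentiable lookup

# Straight-through estimator: the forward value equals `quantized`, but
# `.detach()` blocks gradients through the quantization, so the backward
# pass copies the decoder's gradient directly onto `inputs`.
quantized_st = inputs + (quantized - inputs).detach()
quantized_st.sum().backward()
print(inputs.grad)  # all ones: the gradient passed straight through
```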
The list passed to nn.ModuleList in the ResidualStack class constructor in vae.ipynb#L324 duplicates a reference to a single Residual object instance. Was this done intentionally?
```python
self._layers = nn.ModuleList(
    [Residual(in_channels, num_hiddens, num_residual_hiddens)] * self._num_residual_layers)
```
To create a new object for each layer, the code could be changed to:
```python
self._layers = nn.ModuleList([Residual(in_channels, num_hiddens, num_residual_hiddens)
                              for _ in range(self._num_residual_layers)])
```
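For illustration, a small self-contained sketch (module names are arbitrary) of why this matters: list multiplication registers the same module object several times, so all entries share one set of parameters:

```python
import torch.nn as nn

layer = nn.Linear(4, 4)
shared = nn.ModuleList([layer] * 3)                          # three references to ONE module
fresh = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])   # three distinct modules

print(len(list(shared.parameters())))  # 2: one weight and one bias, shared by all entries
print(len(list(fresh.parameters())))   # 6: each layer has its own weight and bias
```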
By updating all embeddings regardless of whether they are being used, you are decaying them towards 0. Is this intended?
I tried removing the decay, but doing so seems to decrease perplexity.
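A small numerical sketch of that effect (values are illustrative): under the EMA update, the counts of codes that are never selected shrink geometrically toward zero, while used codes converge to their batch usage:

```python
import torch

decay = 0.99
ema_cluster_size = torch.full((8,), 10.0)  # hypothetical counts for 8 codes
usage = torch.zeros(8)
usage[0] = 64.0                            # only code 0 is ever selected

for _ in range(500):
    ema_cluster_size = ema_cluster_size * decay + (1 - decay) * usage

print(ema_cluster_size)  # code 0 approaches 64; the unused codes decay toward 0
```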
Hi! Thank you for the great upload. How exactly can I extract the latent code of an image? By that I mean the discrete code of size [1, 8, 8], not the [128, 8, 8] feature map.
Thanks!
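In case it helps, a hedged sketch of one way to do this, assuming the notebook's model exposes `_encoder`, `_pre_vq_conv`, and a quantizer `_vq_vae` with an `_embedding` table (attribute names may differ), and that `model` and `x` are the trained model and an input image:

```python
import torch

with torch.no_grad():
    z = model._pre_vq_conv(model._encoder(x))          # [1, D, 8, 8]
    B, D, H, W = z.shape
    flat = z.permute(0, 2, 3, 1).reshape(-1, D)        # [B*H*W, D]
    codebook = model._vq_vae._embedding.weight         # [K, D]
    indices = torch.cdist(flat, codebook).argmin(1)    # nearest code per position
    latent_code = indices.view(B, H, W)                # discrete code, [1, 8, 8]
```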
Hi,
thanks for your clean implementation! I was wondering, have you ever tried to calculate the bits/dimension metric (as in the original paper)? I have tried to do so using the provided code, and I am still quite far from the results reported in the paper. I was hoping you might have some insight to share as to why that is.
Thanks!
Lucas
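For reference, a minimal sketch of the usual bits/dimension conversion, assuming you already have the model's negative log-likelihood per image in nats (how exactly the discrete prior enters the VQ-VAE likelihood is a separate question, which may explain part of the gap):

```python
import math

def bits_per_dim(nll_nats: float, num_dims: int) -> float:
    """Convert a per-image negative log-likelihood in nats to bits per dimension."""
    return nll_nats / (num_dims * math.log(2))

# e.g. for CIFAR-10 images: num_dims = 32 * 32 * 3 (numbers here are illustrative)
print(bits_per_dim(nll_nats=9500.0, num_dims=32 * 32 * 3))
```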
Hi, I can't figure out why we need to change from BCHW to BHWC before we flatten.
I would be happy if you could explain this step.
Thank you!
```python
inputs = inputs.permute(0, 2, 3, 1).contiguous()
input_shape = inputs.shape

# Flatten input
flat_input = inputs.view(-1, self._embedding_dim)
```
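One way to see it: the channel dimension C is the embedding dimension, and `view(-1, self._embedding_dim)` groups contiguous memory into rows, so the channel axis has to come last. A small shape sketch (sizes are illustrative):

```python
import torch

B, C, H, W = 2, 64, 8, 8  # C plays the role of embedding_dim
x = torch.randn(B, C, H, W)

flat = x.permute(0, 2, 3, 1).contiguous().view(-1, C)  # [B*H*W, C]
# Each row is now the C-dimensional vector at one spatial position, ready to
# be matched against the codebook. Flattening BCHW directly would mix values
# from different spatial positions into each row.
print(flat.shape)  # torch.Size([128, 64])
```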
Hi everybody,
looking at the VectorQuantizerEMA nn.Module in the code, I was not able to understand how the codebook vectors are updated after initialization. Is there a way to force the use of the whole codebook?
Lastly, how should I interpret the perplexity value?
Thank you!
Giorgio
```python
def forward(self, inputs):
    # convert inputs from BCHW -> BHWC
    inputs = inputs.permute(0, 2, 3, 1).contiguous()
    input_shape = inputs.shape

    # Flatten input
    flat_input = inputs.view(-1, self._embedding_dim)
```
My understanding is that the dimension of flat_input should be [B*H*W*C, embedding_dim], so one dimension seems to be missing? Or are you saying that the number of channels is equal to embedding_dim?
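Regarding the perplexity question above: perplexity measures how many codebook entries are effectively in use. It is the exponential of the entropy of the average code-assignment distribution, so it ranges from 1 (a single code used) to K (all K codes used uniformly). A minimal sketch, assuming `encodings` is the one-hot assignment matrix produced during quantization:

```python
import torch

def perplexity(encodings: torch.Tensor) -> torch.Tensor:
    # encodings: [N, K] one-hot rows; average them to get code usage probabilities
    avg_probs = encodings.float().mean(dim=0)
    return torch.exp(-torch.sum(avg_probs * torch.log(avg_probs + 1e-10)))

one_hot = torch.eye(8)[torch.randint(0, 8, (1024,))]  # random assignments over K=8 codes
print(perplexity(one_hot))  # close to 8 when usage is roughly uniform
```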