pytorch-vq-vae's People

Contributors

kashif


pytorch-vq-vae's Issues

EMA update before quantization

Hi and thanks for providing a nice and clean implementation of VQ-VAEs :)

While playing around with your code, I noticed that in VectorQuantizerEMA you first perform the EMA update of the codebook counts and embeddings, and then use the updated codebook embeddings as the quantized vectors (and for computing e_latent_loss).

In particular, the order in which you perform operations is:

  1. Nearest neighbour search
  2. EMA updates
  3. Quantization
  4. e_latent_loss computation

Is there a reason why you do the EMA updates before steps 3 and 4? My intuition says that the order should be:

  1. Nearest neighbour search
  2. Quantization
  3. e_latent_loss computation
  4. EMA updates
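For concreteness, the proposed ordering could be sketched roughly like this (a NumPy stand-in rather than the notebook's torch code; the names and the eps-stabilized normalization are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def vq_ema_step(flat_input, codebook, ema_cluster_size, ema_w,
                decay=0.99, eps=1e-5):
    K = codebook.shape[0]
    # 1. Nearest-neighbour search over squared L2 distances.
    d = ((flat_input[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    onehot = np.eye(K)[d.argmin(axis=1)]
    # 2. Quantize against the current, pre-update codebook.
    quantized = onehot @ codebook
    # 3. e_latent_loss, also against the pre-update codebook.
    e_latent_loss = ((quantized - flat_input) ** 2).mean()
    # 4. EMA updates come last under this ordering.
    ema_cluster_size = decay * ema_cluster_size + (1 - decay) * onehot.sum(0)
    ema_w = decay * ema_w + (1 - decay) * (onehot.T @ flat_input)
    n = ema_cluster_size.sum()
    stabilized = (ema_cluster_size + eps) / (n + K * eps) * n
    codebook = ema_w / stabilized[:, None]
    return quantized, e_latent_loss, codebook, ema_cluster_size, ema_w

flat = rng.normal(size=(16, 4))
codebook = rng.normal(size=(8, 4))
quantized, loss, new_codebook, _, _ = vq_ema_step(
    flat, codebook, np.ones(8), codebook.copy())
```

With this ordering, the quantized vectors and the loss are always computed against the codebook the encoder was actually matched to.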

Looking forward to hearing your thoughts!

Many thanks,
Stefanos

How does the reconstruction loss update the encoder?

Thanks for your implementation of VQ-VAE, but I've got a question. In the original paper, the gradients at the decoder's input are copied to the encoder's output, because the index-selection op is non-differentiable, but I couldn't find the corresponding implementation in your code. I'm new to PyTorch and not familiar with the autograd system, so a little explanation about this would be appreciated. Thanks!
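For anyone else wondering: the mechanism is the straight-through estimator, which in PyTorch is usually the one-liner `quantized = inputs + (quantized - inputs).detach()` — the forward value is the quantized vector, but autograd sees the identity, so gradients flow to the encoder unchanged. A toy illustration with a hand-rolled forward-mode "dual number" (purely illustrative, no torch):

```python
class Dual:
    """Carries a value and its derivative through + and -."""
    def __init__(self, val, grad=0.0):
        self.val, self.grad = val, grad
    def __add__(self, other):
        return Dual(self.val + other.val, self.grad + other.grad)
    def __sub__(self, other):
        return Dual(self.val - other.val, self.grad - other.grad)

def detach(x):
    # stop-gradient: keep the value, drop the derivative
    return Dual(x.val, 0.0)

def quantize(x):
    # stand-in for the nearest-neighbour lookup: round to an integer.
    # Non-differentiable, so its "gradient" is zero almost everywhere.
    return Dual(round(x.val), 0.0)

z_e = Dual(1.7, 1.0)             # encoder output, seeded with dz_e/dz_e = 1
z_q = quantize(z_e)              # gradient is lost here...
z_st = z_e + detach(z_q - z_e)   # ...straight-through recovers it

print(z_st.val)   # 2.0  (the quantized value)
print(z_st.grad)  # 1.0  (gradient reaches the encoder unchanged)
```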

Intentional weight sharing in ResidualStack?

The list passed to nn.ModuleList in the ResidualStack constructor in vae.ipynb#L324 duplicates a reference to a single Residual instance. Was this done intentionally?

    self._layers = nn.ModuleList(
        [Residual(in_channels, num_hiddens, num_residual_hiddens)] * self._num_residual_layers)

To create a new object for each layer, the code could be changed to:

self._layers = nn.ModuleList([Residual(in_channels, num_hiddens, num_residual_hiddens)
                             for _ in range(self._num_residual_layers)])
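The difference is easy to confirm in plain Python: list multiplication copies the reference, not the object, so all the residual layers would share one set of parameters (stand-in class below, not nn.Module):

```python
class Layer:
    def __init__(self):
        self.weight = [0.0]   # stand-in for a parameter tensor

shared = [Layer()] * 3                 # one object, three references
assert shared[0] is shared[1] is shared[2]

shared[0].weight[0] = 1.0              # "training" layer 0...
print(shared[2].weight[0])             # 1.0 -> layers 1 and 2 moved too

fresh = [Layer() for _ in range(3)]    # three independent objects
assert fresh[0] is not fresh[1]
```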

Unintentional decay of embeddings towards 0?

By updating all embeddings regardless of whether they are being used, you are decaying them towards 0. Is this intended?

I tried removing the decay but it seems to decrease perplexity.
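A quick numeric illustration of the concern: a code that is never selected contributes nothing to the EMA numerator, so its cluster size (and with it the embedding's scale) shrinks geometrically (decay value illustrative):

```python
decay = 0.99
ema_cluster_size = 1.0            # a code that is never selected again
for _ in range(500):
    # full update would be decay * ema + (1 - decay) * assignments,
    # but assignments = 0 on every step for an unused code
    ema_cluster_size = decay * ema_cluster_size
print(ema_cluster_size)           # ~0.0066 after 500 steps
```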

How to extract only the latent codes?

Hi! Thank you for the great upload. How exactly can I extract the latent code of an image? By that I mean the code of size e.g. [1, 8, 8], not [128, 8, 8].
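In case it helps, the discrete code is just the argmin index per spatial position, reshaped back to the feature-map grid; a NumPy sketch with purely illustrative shapes:

```python
import numpy as np

# Illustrative shapes only: B=1, H=W=8, embedding_dim=64, K=512 codes.
B, H, W, D, K = 1, 8, 8, 64, 512
z_e = np.random.randn(B, H, W, D)          # encoder output, BHWC
codebook = np.random.randn(K, D)

flat = z_e.reshape(-1, D)                  # (B*H*W, D)
d = ((flat[:, None] - codebook[None]) ** 2).sum(-1)
indices = d.argmin(axis=1)                 # one integer code per position
codes = indices.reshape(B, H, W)           # the [1, 8, 8] discrete latent
print(codes.shape)  # (1, 8, 8)
```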

Thanks!

Replicating the bits / dim results in the original VQ-VAE paper

Hi,

thanks for your clean implementation! I was wondering: have you ever tried to calculate the bits/dimension metric (as in the original paper)? I've tried to do so using the provided code, and I'm still quite far from the results in the paper. I was hoping you might have some insight to share as to why that is.
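For reference, the conversion I've been using is the standard nats-to-bits one (sketch; the num_dims value is the CIFAR-10 example):

```python
import math

def bits_per_dim(nll_nats_per_image, num_dims):
    # Standard conversion: divide the per-image NLL in nats by ln 2
    # (nats -> bits), then average over dimensions;
    # num_dims = 3 * 32 * 32 = 3072 for CIFAR-10.
    return nll_nats_per_image / (num_dims * math.log(2))

# Sanity check: an NLL of num_dims * ln(2) nats is exactly 1 bit/dim.
print(bits_per_dim(3072 * math.log(2), 3072))  # 1.0
```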

Thanks!
Lucas

Hi, why do we need to swap axes before we flatten?

Hi, I can't figure out why we need to change from BCHW to BHWC before flattening.
I'd be happy if you could explain this.
Thank you!

    # convert inputs from BCHW -> BHWC
    inputs = inputs.permute(0, 2, 3, 1).contiguous()
    input_shape = inputs.shape

    # Flatten input
    flat_input = inputs.view(-1, self._embedding_dim)
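Edit: I think I see it now. The channel axis is the embedding dimension, and permuting makes it the last (fastest-varying) axis, so view(-1, embedding_dim) yields one embedding vector per spatial position. A small NumPy mock-up (made-up shapes):

```python
import numpy as np

B, C, H, W = 1, 2, 2, 2          # C plays the role of embedding_dim
x = np.arange(B * C * H * W).reshape(B, C, H, W)

# BCHW -> BHWC, then flatten: each row is one position's C-vector.
flat = x.transpose(0, 2, 3, 1).reshape(-1, C)
print(flat.tolist())   # [[0, 4], [1, 5], [2, 6], [3, 7]]

# Flattening BCHW directly would mix values from different positions:
wrong = x.reshape(-1, C)
print((flat == wrong).all())  # False
```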

Codebook update

Hi everybody,

looking at the VectorQuantizerEMA nn.Module in the code, I was not able to understand how the codebook vectors are updated after initialization. Is there a way to force the use of the entire codebook?
Lastly, how should I interpret the perplexity value?
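My current understanding of the perplexity term, for what it's worth: it is the exponential of the entropy of the average code usage, so it ranges from 1 (codebook collapse onto a single code) up to K (all K codes used uniformly). A small illustrative check:

```python
import numpy as np

def perplexity(avg_probs):
    # exp of the entropy of average codebook usage; the small epsilon
    # guards against log(0) for unused codes.
    return np.exp(-np.sum(avg_probs * np.log(avg_probs + 1e-10)))

K = 4
print(perplexity(np.ones(K) / K))   # ~4.0: all codes used equally

one_hot = np.zeros(K)
one_hot[0] = 1.0
print(perplexity(one_hot))          # ~1.0: codebook collapse
```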

Thank you!

Giorgio

Dimension issue

    def forward(self, inputs):
        # convert inputs from BCHW -> BHWC
        inputs = inputs.permute(0, 2, 3, 1).contiguous()
        input_shape = inputs.shape

        # Flatten input
        flat_input = inputs.view(-1, self._embedding_dim)
My understanding is that flat_input should have an extra factor of embedding_dim in its shape — one dimension seems to be missing? Or are you saying the number of channels is equal to embedding_dim?
