
devise-zero-shot-classification's People

Contributors

fg91


devise-zero-shot-classification's Issues

Loss Function

I see that in the notebook you used a cosine loss function, but in the paper a hinge-based ranking loss is used. Why did you choose the cosine loss?
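For reference, the two objectives can be contrasted in a minimal NumPy sketch (function names, toy vectors, and the margin value here are illustrative, not taken from the notebook or the paper; DeViSE also learns a linear transformation that this sketch omits):

```python
import numpy as np

def cosine_loss(pred, target):
    """Notebook-style objective: 1 - cosine similarity."""
    pred = pred / np.linalg.norm(pred)
    target = target / np.linalg.norm(target)
    return 1.0 - float(pred @ target)

def hinge_rank_loss(pred, label_vec, negative_vecs, margin=0.1):
    """DeViSE-style hinge rank loss: penalise every wrong label whose
    dot product comes within `margin` of the true label's score."""
    pos = float(pred @ label_vec)
    return sum(max(0.0, margin - pos + float(pred @ neg))
               for neg in negative_vecs)

# A prediction that already points at the true label incurs no hinge loss.
pred = np.array([1.0, 0.0])
print(hinge_rank_loss(pred, np.array([1.0, 0.0]), [np.array([0.0, 1.0])]))  # 0.0
```

The practical difference: the cosine loss only pulls the prediction toward the true label's vector, while the ranking loss additionally pushes it away from competing labels.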

Latest Paper related to DeViSE

Hi Fabio.
I read your article on Medium. For some reason I am not able to post a response there. I enjoyed reading your explanation of the paper. Can you point me to recent advancements in this space? I see the paper was published in 2013, but it still looks relevant. Simple and powerful.

Possible improvement?!

Hey Fabio, I read the full article; it is maybe the most interesting post I have read on Medium. I didn't know something like this existed! Some decades down the line, I can see novels being converted into movies by NNs.

Here's what I felt about DeViSE:
I think there is room for improvement here. You are doing two things at once: first, finding a vector space to represent your image, and second, mapping it to the vector space of word vectors.

There is no reason this can't be done separately. As you know, Variational Autoencoders (or whatever their latest improvement is) are better suited to finding a continuous vector space from which the image can be reconstructed. Since the space is continuous, it has properties similar to word vectors, like (man + glasses -> man with glasses, you get the idea). Word vectors also have this property, like (king - man + woman -> queen).
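The arithmetic property mentioned above can be demonstrated with toy embeddings (these 3-d vectors are made up purely for illustration; real word vectors such as word2vec or GloVe have hundreds of dimensions):

```python
import numpy as np

# Hypothetical 3-d "word vectors", hand-crafted so the analogy works.
vecs = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def nearest(query, exclude=()):
    """Return the vocabulary word whose vector is most cosine-similar to query."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vecs if w not in exclude),
               key=lambda w: cos(vecs[w], query))

analogy = vecs["king"] - vecs["man"] + vecs["woman"]
print(nearest(analogy, exclude={"king", "man", "woman"}))  # queen
```

Excluding the query's own words from the neighbour search is standard practice, since the inputs themselves are often the closest vectors.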

Here is what I suggest: what if you try to map these two continuous vector spaces to each other using a NN? You may be able to generate a large amount of training data. For example, if you have "black cat" as a class, you can derive a word vector representation for it, and a latent space representation for an image that may represent the same idea (black + cat, if the individual words were present as classes for images). You get the idea: you should be able to generate a huge number of combinations (more data, better NN!) that you can use to train the NN and find better transformations between the two continuous vector spaces.
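Generating those composed training pairs could be sketched like this (the class names and 2-d vectors are hypothetical; a real setup would pull them from a trained embedding):

```python
import itertools
import numpy as np

# Hypothetical word vectors for attribute and object classes.
word_vecs = {
    "black": np.array([1.0, 0.0]),
    "white": np.array([0.0, 1.0]),
    "cat":   np.array([1.0, 1.0]),
    "horse": np.array([2.0, 0.0]),
}

def composed_pairs(attrs, objs, word_vecs):
    """Yield (label, composed target vector) pairs such as
    ('black cat', v('black') + v('cat')) to enlarge the training set."""
    for a, o in itertools.product(attrs, objs):
        yield f"{a} {o}", word_vecs[a] + word_vecs[o]

pairs = dict(composed_pairs(["black", "white"], ["cat", "horse"], word_vecs))
print(sorted(pairs))  # ['black cat', 'black horse', 'white cat', 'white horse']
```

Two attributes and two objects already yield four labelled targets, which is the combinatorial growth the proposal relies on.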

When testing on an image, you can use the encoder part of the autoencoder to generate the encoding and then use your newly trained NN to find the word vector representation for it. What's interesting is that you can do the opposite as well if you want: generate a latent space representation from any word vector and then use the decoder part of the autoencoder to generate an image.

Two autoencoders should also be able to do all of this. The first VAE finds a latent representation for the image. After the first VAE has been thoroughly trained, you can train a second VAE, which finds the word representation for the latent space representation of the image. You should then be able to map word vectors to images and images to word vectors.

The loss for the second VAE could be a combination of how well it reconstructs the latent space and how well it generates an encoding similar to some word vector. The data used to train the second VAE would be generated from the class combinations for which we can also generate a word vector representation; both the source and the target vector representations are generated as in the examples above.

You may say that we could use one VAE instead of two for this, but then your first VAE won't train properly, since it loses too much information at the layer where it needs to represent a state with the same size as a word vector. Then how will the second VAE train properly? It won't, fully, but we should get a good enough encoder and decoder (better than DeViSE?); the advantage of decoupling is a better latent space representation for the images.

Imagine a future where your NN is generating an image for a complex word vector like "woman riding on a white horse". Maybe you will need a Transformer for that one.

I haven't thought about whether we can get GANs to work with this, since we can't access the internal space representation for images. But since GANs are good at mapping even noise to a continuous space that is not accessible to us, maybe the latent space for an image, found by decoding a word vector with the VAE, could be transformed by a GAN to generate a realistic image. Unfortunately, I am on the poor side of the globe; you have the passion as well as a GPU. I hope you will entertain this idea or build upon its flaws. I am open to your thoughts on this.
