#Deep Learning for Visual Question Answering

This project has an accompanying blog post.

This project uses Keras to train a variety of Feedforward and Recurrent Neural Networks for the task of Visual Question Answering. It is designed to work with the VQA dataset.

Models Implemented:

  1. BOW+CNN Model
  2. LSTM+CNN Model

##Requirements

  1. Keras 0.20
  2. spaCy 0.94
  3. scikit-learn 0.16
  4. progressbar
  5. Nvidia CUDA 7.5 (optional, for GPU acceleration)

Tested with Python 2.7 on Ubuntu 14.04 and CentOS 7.1.

###Notes:

  1. Keras needs the latest Theano, which in turn needs NumPy/SciPy.
  2. spaCy is currently used only for converting questions to a vector (or a sequence of vectors). This dependency can easily be removed if you want to.
  3. spaCy uses Goldberg and Levy's word vectors by default, but I found the performance to be much better with Stanford's GloVe word vectors.
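
To make the second note concrete, here is a minimal sketch of the two question representations it mentions: a single vector and a sequence of vectors. It does not use spaCy. The tiny `TOY_VECTORS` table, the `tokenize` helper, and the choice of averaging token vectors are all assumptions made for illustration, not the project's actual code.

```python
# Hypothetical toy embeddings (3 dimensions). Real word vectors such as GloVe
# have hundreds of dimensions and cover a large vocabulary.
TOY_VECTORS = {
    "what": [0.1, 0.3, 0.5],
    "color": [0.7, 0.1, 0.2],
    "is": [0.0, 0.2, 0.1],
    "the": [0.2, 0.0, 0.0],
    "cat": [0.5, 0.9, 0.4],
}
DIM = 3


def tokenize(question):
    """Lowercase, drop trailing question marks, and split on whitespace."""
    return question.lower().rstrip("?").split()


def question_to_sequence(question, vectors=TOY_VECTORS):
    """Return one vector per token; unknown words map to a zero vector."""
    return [list(vectors.get(tok, [0.0] * DIM)) for tok in tokenize(question)]


def question_to_vector(question, vectors=TOY_VECTORS):
    """Return one fixed-size vector: the element-wise mean of the token vectors."""
    seq = question_to_sequence(question, vectors)
    if not seq:
        return [0.0] * DIM
    return [sum(column) / len(seq) for column in zip(*seq)]


sequence = question_to_sequence("What color is the cat?")  # input for a recurrent model
vector = question_to_vector("What color is the cat?")      # input for a bag-of-words model
```

The fixed-size vector suits a feedforward (bag-of-words) model, while the per-token sequence is what a recurrent model such as an LSTM would consume.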

##The numbers

Performance on the validation set of the VQA Challenge:

| Model | Accuracy |
| --- | --- |
| BOW+CNN | 44.30% |
| LSTM-Language only | 42.51% |
| LSTM+CNN | 47.80% |

There is a lot of scope for hyperparameter tuning here. Experiments were done for 100 epochs.

| Model | Training Time on GTX 760 |
| --- | --- |
| BOW+CNN | 160 seconds/epoch |
| LSTM+CNN | 200 seconds/epoch |
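
As a rough back-of-the-envelope check, the per-epoch timings can be turned into an estimated total training time. This sketch assumes the time per epoch stays constant and that each model runs for the full 100 epochs mentioned above. The helper name `total_training_hours` is made up for this example.

```python
EPOCHS = 100  # number of epochs reported for the experiments

# Per-epoch timings on a GTX 760, taken from the timing figures above.
SECONDS_PER_EPOCH = {"BOW+CNN": 160, "LSTM+CNN": 200}


def total_training_hours(seconds_per_epoch, epochs=EPOCHS):
    """Estimate total training time in hours, assuming constant epoch time."""
    return seconds_per_epoch * epochs / 3600.0


for model, seconds in sorted(SECONDS_PER_EPOCH.items()):
    print("%s: about %.1f hours" % (model, total_training_hours(seconds)))
```

Under these assumptions, BOW+CNN takes roughly 4.4 hours and LSTM+CNN roughly 5.6 hours on that GPU.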

##Get Started

Have a look at the `get_started.sh` script in the `scripts` folder. Also, have a look at the readme present in each of the folders.

##Feedback

All kinds of feedback (code style, bugs, comments, etc.) are welcome. Please open an issue on this repo instead of mailing me, since it helps me keep track of things better.

##License

MIT
