Visual Question Answering through Modal Dialogue

We’re already seeing incredible applications of object detection in our daily lives. One such interesting application is Visual Question Answering (VQA), a new and upcoming problem in Computer Vision where the data consists of open-ended questions about images. To answer these questions, an effective system needs an understanding of “vision, language and common-sense.”

Before proceeding further, I would highly encourage you to quickly read the full VQA post here.

Try it now on FloydHub

Run

Click this button to open a Workspace on FloydHub that will train this model.

Do remember to execute run_me_first_floyd.sh inside a terminal every time you restart your workspace to install the relevant dependencies.


This post will first dig into the basic theory behind the Visual Question Answering task. Then, we’ll discuss and build two approaches to VQA: the “bag-of-words” and the “recurrent” model. Finally, we’ll provide a tutorial workflow for training your own models and setting up a REST API on FloydHub so you can start answering questions about your own images. The project code is in Python (Keras + TensorFlow). You can view my experiments directly on FloydHub, as well as the code (along with the weight files and data) on GitHub.
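To make the two approaches concrete before we dive in, here is a minimal Keras sketch of both architectures. The layer sizes, sequence length, and the 1000-answer output vocabulary below are illustrative assumptions, not the exact configuration used in this repository; the real models live in the project code.

     # A minimal sketch (assumed shapes & layer sizes, not the repo's exact code):
     # image features (e.g. 4096-d CNN features), 300-d word vectors, 1000 answer classes.
     from keras.models import Model
     from keras.layers import Input, Dense, Dropout, LSTM, concatenate

     IMG_DIM, WORD_DIM, SEQ_LEN, NUM_ANSWERS = 4096, 300, 30, 1000

     def build_bow_model():
         """Bag-of-words: the question is a single averaged word vector."""
         img_in = Input(shape=(IMG_DIM,))
         q_in = Input(shape=(WORD_DIM,))            # mean of the question's word vectors
         x = concatenate([img_in, q_in])
         x = Dropout(0.5)(Dense(1024, activation='relu')(x))
         x = Dropout(0.5)(Dense(1024, activation='relu')(x))
         out = Dense(NUM_ANSWERS, activation='softmax')(x)
         return Model([img_in, q_in], out)

     def build_recurrent_model():
         """Recurrent: the question is a sequence of word vectors fed through an LSTM."""
         img_in = Input(shape=(IMG_DIM,))
         q_in = Input(shape=(SEQ_LEN, WORD_DIM))    # padded sequence of word vectors
         q_enc = LSTM(512)(q_in)
         x = concatenate([img_in, q_enc])
         x = Dropout(0.5)(Dense(1024, activation='relu')(x))
         out = Dense(NUM_ANSWERS, activation='softmax')(x)
         return Model([img_in, q_in], out)

     model = build_bow_model()
     model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

Both variants fuse the image and question representations and classify over the most frequent answers; the recurrent variant differs only in how the question is encoded.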

Since I've already preprocessed the data & stored everything in a FloydHub dataset, here's what we're going to do:

  • Check out the preprocessed data from the VQA Dataset.
  • Build & train two VQA models using Keras & TensorFlow.
  • Assess the models on the VQA validation sets.
  • Run the models to generate some really cool predictions (a minimal prediction sketch follows this list).
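As a taste of the last two steps, here is a hedged sketch of what a single prediction could look like once a model is trained. The feature arrays and the answer label encoder are hypothetical placeholders for whatever the repo's preprocessing scripts actually produce.

     import numpy as np

     def answer_question(model, label_encoder, image_features, question_encoding):
         """Return the top-5 answers for one (image, question) pair.

         image_features:    1-D array of CNN features for the image
         question_encoding: 1-D (bag-of-words) or 2-D (recurrent) question encoding
         label_encoder:     hypothetical sklearn-style encoder mapping indices to answer strings
         """
         probs = model.predict([image_features[np.newaxis, ...],
                                question_encoding[np.newaxis, ...]])[0]
         top = np.argsort(probs)[::-1][:5]
         return [(label_encoder.classes_[i], float(probs[i])) for i in top]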

Serving Models on FloydHub

I've created a separate repository here for serving models, since it avoids the overhead of pushing the entire code/data in the training repo to FloydHub over & over again.
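FloydHub serving expects a small Flask app (typically an app.py) that loads the model once and answers HTTP requests. The sketch below is only an assumption of what such an endpoint could look like, not the code in the serving repository; the route name, request fields, and the preprocess helper are placeholders.

     # app.py -- hypothetical minimal serving endpoint (not the serving repo's actual code)
     from flask import Flask, request, jsonify
     from keras.models import load_model

     app = Flask(__name__)
     model = load_model('vqa_model.h5')      # assumed path to a trained model file

     def preprocess(image_data, question):
         """Placeholder: turn raw inputs into the model's two input arrays."""
         raise NotImplementedError("mirror the training-time feature extraction here")

     @app.route('/predict', methods=['POST'])
     def predict():
         payload = request.get_json(force=True)
         features = preprocess(payload['image'], payload['question'])
         probs = model.predict(features)[0]
         return jsonify({'answer_index': int(probs.argmax()),
                         'confidence': float(probs.max())})

     if __name__ == '__main__':
         app.run(host='0.0.0.0', port=5000)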


For Offline Execution

The following instructions must be followed in order to execute different (or all) sections of this project locally. You will need an NVIDIA GPU to train these models.

  1. Clone the project, replacing VQAMD with the name of the directory you are creating:

     $ git clone https://github.com/sominwadhwa/vqa_floyd.git VQAMD
     $ cd VQAMD
    
  2. Make sure you have Python 3.5.x running on your local system. If you do, skip this step. In case you don't, head here.

  3. virtualenv is a tool for creating isolated 'virtual' Python environments. It is advisable to create one here as well (to avoid installing the prerequisites into the system root). Do the following within the project directory:

     $ [sudo] pip install virtualenv
     $ virtualenv --system-site-packages VQAMD
     $ source VQAMD/bin/activate
    

To deactivate later, once you're done with the project, just type deactivate.

  4. Install the prerequisites from requirements.txt & run tests/init.py to check that all the required packages were correctly installed:

     $ pip install -r requirements.txt
     $ bash run_me_first_on_floyd.sh
    

Contributing to VQA

I welcome contributions to this little project. If you have any new ideas or approaches that you'd like to incorporate here, feel free to open up an issue.

Please refer to the project's style guidelines and the guidelines for submitting patches and additions. In general, we follow the "fork-and-pull" Git workflow.

  1. Fork the repo VQAMD on GitHub
  2. Clone the project to your own machine
  3. Commit changes to your own branch
  4. Push your work back up to your fork
  5. Submit a Pull request so that we can review your changes

NOTE: Be sure to merge the latest from "upstream" before making a pull request!

Issues

Feel free to submit issues and enhancement requests.
