Semantic Compositional Networks

The Theano code for the CVPR 2017 paper “Semantic Compositional Networks for Visual Captioning”

Model architecture and illustration of semantic composition.

Dependencies

This code is written in Python. To use it you will need:

  • Python 2.7 (do not use Python 3.0)
  • Theano 0.7 (you can also use the most recent version)
  • A recent version of NumPy and SciPy
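
A quick environment check can catch a wrong interpreter before training starts. The sketch below is not part of this repo: the helper names `python_is_supported` and `installed_version` are hypothetical, and the only requirement it encodes is the Python 2.7 constraint listed above. For the libraries it just reports whatever version is installed, if any.

```python
import importlib
import sys


def python_is_supported(version_info):
    """Return True only for Python 2.7, the version this code targets."""
    return tuple(version_info[:2]) == (2, 7)


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Any import-time failure is treated as "not usable".
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    print("Python 2.7: %s" % python_is_supported(sys.version_info))
    for name in ("theano", "numpy", "scipy"):
        print("%s: %s" % (name, installed_version(name)))
```

A result of `None` for any of the three libraries means it still needs to be installed.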

Getting started

We provide code for training SCN for image captioning on the COCO dataset.

  • In order to start, please first download the ResNet features and tag features we used in the experiments. Put the coco folder inside the ./data folder.

  • We also provide our pre-trained model on COCO. Put the pretrained_model folder into the current directory.

  • In order to evaluate the model, please download the standard coco-caption evaluation code. Copy the folder pycocoevalcap into the current directory.

  • Now, everything is ready.
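
Before running anything, it may help to confirm that the three folders from the steps above are where the scripts expect them. This is a minimal sketch, assuming the scripts are run from the repo root; `missing_dirs` is a hypothetical helper, not something shipped with this code.

```python
import os

# Folders named in the setup steps above, relative to the repo root.
EXPECTED_DIRS = (
    os.path.join("data", "coco"),
    "pretrained_model",
    "pycocoevalcap",
)


def missing_dirs(root="."):
    """Return the expected folders that do not exist under root."""
    return [d for d in EXPECTED_DIRS if not os.path.isdir(os.path.join(root, d))]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing folders: " + ", ".join(missing))
    else:
        print("All expected folders are in place.")
```

The check only looks for the folders themselves; it does not verify that the downloaded feature files inside them are complete.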

How to use the code

  1. Run SCN_training.py to start training. On a modern GPU, training takes about one night.

     THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python SCN_training.py

  2. Using our pre-trained model, run SCN_decode.py to generate captions on the COCO small 5k test set. The generated captions are also provided in coco_scn_5k_test.txt.

  3. Run SCN_evaluation.py to evaluate the model. The code will output:

     CIDEr: 1.043, Bleu-4: 0.341, Bleu-3: 0.446, Bleu-2: 0.582, Bleu-1: 0.743, ROUGE_L: 0.550, METEOR: 0.261.

  4. The ./data/coco folder also contains the features for the official COCO validation and test sets. Run SCN_for_test_server.py to generate captions for the official test set and prepare the .json file for submission.
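
The training, decoding and evaluation steps can also be chained from a small driver script. The sketch below is only an illustration: `build_env` and `run_pipeline` are hypothetical helpers, and applying the same THEANO_FLAGS to decoding and evaluation is an assumption, since the README only shows those flags for training.

```python
import os
import subprocess

# Flags copied from the training command in this README.
THEANO_FLAGS = "mode=FAST_RUN,device=gpu,floatX=float32"

# Scripts in the order the steps run them.
PIPELINE = ("SCN_training.py", "SCN_decode.py", "SCN_evaluation.py")


def build_env(base_env=None):
    """Return a copy of the environment with THEANO_FLAGS set."""
    env = dict(os.environ if base_env is None else base_env)
    env["THEANO_FLAGS"] = THEANO_FLAGS
    return env


def run_pipeline(scripts=PIPELINE, python="python"):
    """Run each script in turn; check_call raises on the first failure."""
    env = build_env()
    for script in scripts:
        # Assumption: the same flags are used for every script.
        subprocess.check_call([python, script], env=env)
```

Calling `run_pipeline(("SCN_decode.py", "SCN_evaluation.py"))` would skip training and rely on the pre-trained model instead.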

Video Captioning

To keep things simple, we provide a separate repo that reproduces our results on video captioning, using the Youtube2Text dataset.

Citing SCN

Please cite our CVPR paper in your publications if it helps your research:

@inproceedings{SCN_CVPR2017,
  Author = {Gan, Zhe and Gan, Chuang and He, Xiaodong and Pu, Yunchen and Tran, Kenneth and Gao, Jianfeng and Carin, Lawrence and Deng, Li},
  Title = {Semantic Compositional Networks for Visual Captioning},
  booktitle={CVPR},
  Year  = {2017}
}
