
Semantic Alignment for Hierarchical Image Captioning

Abstract

Inspired by recent progress in hierarchical reinforcement learning and adversarial text generation, we introduce a hierarchical, adversarial, attention-based model that generates natural language descriptions of images. During caption generation, the model automatically learns to align the attention over images and the subgoal vectors. We describe how to train, use and understand the model, and report its performance on Flickr8k. We also visualize the subgoal vectors and the attention over images during generation.
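To make "attention over images" concrete, the sketch below shows generic dot-product soft attention: a query vector scores each image region feature, a softmax turns the scores into weights, and the weighted sum of region features becomes the context vector. This is an illustrative assumption, not the authors' architecture; treating a subgoal vector as the query, and the names `attend` and `softmax`, are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions: List[Vector], query: Vector) -> Tuple[List[float], Vector]:
    """Generic dot-product soft attention over image region features.

    regions: one feature vector per spatial location (e.g. a CNN feature grid).
    query:   a hypothetical subgoal / decoder-state vector of the same size.
    Returns (attention weights, attended context vector).
    """
    # One relevance score per region: dot product with the query.
    scores = [sum(r_i * q_i for r_i, q_i in zip(r, query)) for r in regions]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the region features.
    dim = len(query)
    context = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    subgoal = [2.0, 0.0]  # toy query that favours the first feature dimension
    weights, context = attend(regions, subgoal)
    print([round(w, 3) for w in weights])
    print([round(c, 3) for c in context])
```

In a real captioner the region features and query would be learned tensors and the scoring function may be additive rather than a plain dot product; the weights are what an attention visualization would overlay on the image.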

Authors

Sidi Lu, Zhiyong Fang, Peiyao Sheng

Demo


Code

We provide source code on [GitHub](https://github.com/zhiyong1997/Semantic-Alignment-for-Hierarchical-Image-Captioning), including:
1. Train/test code.
2. A visualization tool for the attention mechanism.

Sample Usage

Our model can handle the COCO, Flickr8k and Flickr30k datasets. For simplicity, we only present Flickr8k here.

1. Create folder ./code/dataset

2. Download the processed Flickr8k image captioning dataset from here (extraction key: sh4u)

3. Unzip the downloaded file into ./code/dataset/

4. Download the resnet50 model file into ./code/saved_model/ from here (extraction key: h712)

5. Run ./code/main.py with python3
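The setup steps above can be scripted roughly as follows. This is a minimal sketch under stated assumptions: the download links and archive names are not given here, so the archive path is whatever you downloaded; the dataset is assumed to be a .zip file; and `main.py` is assumed to be launched from the repository root. The helper names (`prepare_layout`, `extract_dataset`, `is_ready`, `run_main`) are hypothetical and not part of the repository.

```python
import subprocess
import sys
import zipfile
from pathlib import Path


def prepare_layout(root: Path) -> None:
    """Create the dataset and saved-model folders named in the steps above."""
    (root / "code" / "dataset").mkdir(parents=True, exist_ok=True)
    (root / "code" / "saved_model").mkdir(parents=True, exist_ok=True)


def extract_dataset(archive: Path, root: Path) -> None:
    """Unzip the downloaded Flickr8k archive into ./code/dataset/.

    Assumption: the download is a .zip file; its name is not specified.
    """
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(root / "code" / "dataset")


def is_ready(root: Path) -> bool:
    """Rough check that both folders exist and are non-empty."""
    dataset = root / "code" / "dataset"
    models = root / "code" / "saved_model"
    return (dataset.is_dir() and any(dataset.iterdir())
            and models.is_dir() and any(models.iterdir()))


def run_main(root: Path) -> int:
    """Run ./code/main.py with the current Python 3 interpreter.

    Assumption: main.py resolves its paths relative to the repository root;
    adjust `cwd` if it expects to be run from ./code instead.
    """
    result = subprocess.run([sys.executable, str(root / "code" / "main.py")], cwd=root)
    return result.returncode
```

Typical use from the repository root would be `prepare_layout(Path("."))`, then `extract_dataset(Path("<downloaded archive>"), Path("."))`, copying the resnet50 file into ./code/saved_model/ by hand, and finally calling `run_main(Path("."))` once `is_ready` returns True.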

Paper

Our paper is available here

Bibtex

@misc{Lu2018SemanticAlignment,
  title={Semantic Alignment for Hierarchical Image Captioning},
  author={Lu, Sidi and Fang, Zhiyong and Sheng, Peiyao},
  year={2018},
  howpublished={\url{https://github.com/zhiyong1997/Semantic-Alignment-for-Hierarchical-Image-Captioning}}
}

