
MIA (NeurIPS 2019)

Implementation of "Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations" by Fenglin Liu, Yuanxin Liu, Xuancheng Ren, Xiaodong He, and Xu Sun. The paper can be found at [arxiv], [pdf].


Semantic-Grounded Image Representations (Based on the Bottom-up features)

Coming Soon!

Usage

Requirements

This code is written in Python 2.7 and requires PyTorch >= 0.4.1.

You can take a look at https://github.com/s-gupta/visual-concepts to see how to extract the textual concepts of an image yourself.

Dataset Preparation

Download MSCOCO images and preprocess them

  • Download

Download the MSCOCO images from link. You need the 2014 training images and the 2014 validation images. Put train2014/ and val2014/ in the ./data/images/ directory.

Note: We also provide a bash script to download the MSCOCO images:

cd data/images/original && bash download_mscoco_images.sh
  • Preprocess

Now run resize_images.py to resize all the images (in both the train and val folders) to 256 x 256. You can change the input and output locations inside the script.

python resize_images.py
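
Below is a minimal sketch of what such a resize step typically looks like with Pillow; the actual resize_images.py in this repository may organize its paths and options differently, and the directories shown are illustrative.

import os
from PIL import Image

def resize_folder(src_dir, dst_dir, size=(256, 256)):
    # Create the output directory if it does not exist (Python 2.7-friendly).
    if not os.path.exists(dst_dir):
        os.makedirs(dst_dir)
    for name in os.listdir(src_dir):
        img = Image.open(os.path.join(src_dir, name)).convert('RGB')
        # Image.ANTIALIAS is called Image.LANCZOS in newer Pillow releases.
        img.resize(size, Image.ANTIALIAS).save(os.path.join(dst_dir, name))

# Illustrative locations; adjust to wherever train2014/ and val2014/ were placed.
for split in ('train2014', 'val2014'):
    resize_folder('./data/images/original/' + split, './data/images/resized/' + split)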

Download MSCOCO captions and preprocess them

  • Download

You can download the MSCOCO captions from the official website or use the bash script we provide.

cd data && bash download_mscoco_captions.sh
  • Preprocess

Afterwards, create the Karpathy split for training, validation and testing.

python KarpathySplit.py
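
For reference, the Karpathy split partitions the COCO captioning images into 113,287 training, 5,000 validation and 5,000 test images (with the "restval" portion folded into training). The snippet below is a hedged sketch of that partitioning; the file name dataset_coco.json and its fields are assumptions about the standard split file, and KarpathySplit.py may read or write different files.

import json

# dataset_coco.json is the commonly used Karpathy split file; this name is an assumption.
with open('./data/dataset_coco.json') as f:
    dataset = json.load(f)

splits = {'train': [], 'val': [], 'test': []}
for img in dataset['images']:
    # The "restval" images are conventionally merged into the training set.
    split = 'train' if img['split'] in ('train', 'restval') else img['split']
    splits[split].append(img['filename'])

for name in ('train', 'val', 'test'):
    print('%s: %d images' % (name, len(splits[name])))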

Then build the vocabulary by running the command below. (Note: you need nltk_data to build the vocabulary.)

unzip nltk_data.zip && python build_vocab.py
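
A minimal sketch of such a vocabulary builder is given below, assuming a plain list of caption strings; the thresholds, special tokens and caption loading in build_vocab.py may differ.

from collections import Counter
import nltk

def build_vocab(captions, threshold=4):
    # Count word frequencies over all captions (uses the nltk 'punkt' tokenizer data).
    counter = Counter()
    for caption in captions:
        counter.update(nltk.tokenize.word_tokenize(caption.lower()))
    # Keep frequent words and reserve indices for the usual special tokens.
    vocab = {'<pad>': 0, '<start>': 1, '<end>': 2, '<unk>': 3}
    for word, count in counter.items():
        if count >= threshold:
            vocab[word] = len(vocab)
    return vocab

vocab = build_vocab(['a man riding a horse', 'a horse standing in a field'], threshold=1)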

Download image concepts

Download the Textual Concepts (Google Drive) and put it in the ./data/ directory.

mv image_concepts.json ./data
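
Once the file is in place, it can be inspected as ordinary JSON. The snippet below is only an illustration: the exact structure of image_concepts.json (here assumed to map an image identifier to a list of concept words) should be confirmed by looking at the file itself.

import json

with open('./data/image_concepts.json') as f:
    image_concepts = json.load(f)

# Assumed structure: {image_id_or_filename: [concept_1, concept_2, ...], ...}
print(len(image_concepts))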

Start Training

Now you can train the baseline models and the baseline w/ MIA models with the commands below; a short sketch of the corresponding command-line flags follows the command list.

Visual Attention

  • Baseline
CUDA_VISIBLE_DEVICES=0,1 python Train.py --basic_model=VisualAttention 
  • Baseline w/ MIA
CUDA_VISIBLE_DEVICES=0,1 python Train.py --basic_model=VisualAttention --use_MIA=True --iteration_times=2

Concept Attention

  • Baseline
CUDA_VISIBLE_DEVICES=0,1 python Train.py --basic_model=ConceptAttention
  • Baseline w/ MIA
CUDA_VISIBLE_DEVICES=0,1 python Train.py --basic_model=ConceptAttention --use_MIA=True --iteration_times=2

Visual Condition

  • Baseline
CUDA_VISIBLE_DEVICES=0,1 python Train.py --basic_model=VisualCondition
  • Baseline w/ MIA
CUDA_VISIBLE_DEVICES=0,1 python Train.py --basic_model=VisualCondition --use_MIA=True --iteration_times=2

Concept Condition

  • Baseline
CUDA_VISIBLE_DEVICES=0,1 python Train.py --basic_model=ConceptCondition
  • Baseline w/ MIA
CUDA_VISIBLE_DEVICES=0,1 python Train.py --basic_model=ConceptCondition --use_MIA=True --iteration_times=2

Visual Regional Attention (Coming Soon!)

  • Baseline
CUDA_VISIBLE_DEVICES=0,1 python Train.py --basic_model=VisualRegionalAttention
  • Baseline w/ MIA
CUDA_VISIBLE_DEVICES=0,1 python Train.py --basic_model=VisualRegionalAttention --use_MIA=True --iteration_times=2
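
The commands above differ only in the --basic_model, --use_MIA and --iteration_times flags. The snippet below is a hedged sketch of how such flags could be declared with argparse; the actual argument handling in Train.py may use different defaults or additional options.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--basic_model', type=str, default='VisualAttention',
                    help='VisualAttention | ConceptAttention | VisualCondition | '
                         'ConceptCondition | VisualRegionalAttention')
# argparse's type=bool treats any non-empty string as True, so parse the flag explicitly.
parser.add_argument('--use_MIA', type=lambda s: s.lower() == 'true', default=False,
                    help='refine the inputs with the MIA module before decoding')
parser.add_argument('--iteration_times', type=int, default=2,
                    help='number of MIA refinement iterations')
args = parser.parse_args()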

Testing

You can test the trained model with Test.py, but don't forget to download the coco-caption code from link1 or link2 into the coco/ directory.

  • Baseline
CUDA_VISIBLE_DEVICES=0 python Test.py  --basic_model=basic_model_name

Note: basic_model_name = (VisualAttention, ConceptAttention, VisualCondition, ConceptCondition, VisualRegionalAttention)

  • Baseline w/ MIA
CUDA_VISIBLE_DEVICES=0 python Test.py  --basic_model=basic_model_name --use_MIA=True --iteration_times=2
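
For reference, the coco-caption toolkit scores a JSON file of generated captions against the ground-truth annotations. The sketch below shows the usual evaluation calls; the annotation and result file paths are illustrative, and Test.py may already perform this step internally.

import sys
sys.path.append('./coco')  # directory holding the downloaded coco-caption code
from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

coco = COCO('./data/annotations/captions_val2014.json')       # ground-truth captions (illustrative path)
coco_res = coco.loadRes('./results/generated_captions.json')  # generated captions (illustrative path)
coco_eval = COCOEvalCap(coco, coco_res)
coco_eval.params['image_id'] = coco_res.getImgIds()           # score only the images we generated captions for
coco_eval.evaluate()
for metric, score in coco_eval.eval.items():
    print('%s: %.3f' % (metric, score))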

Reference

If you use this code or our extracted image concepts as part of any published research, please acknowledge the following paper:

@inproceedings{Liu2019MIA,
  author    = {Fenglin Liu and
               Yuanxin Liu and
               Xuancheng Ren and
               Xiaodong He and
               Xu Sun},
  title     = {Aligning Visual Regions and Textual Concepts for Semantic-Grounded
               Image Representations},
  booktitle = {NeurIPS},
  pages     = {6847--6857},
  year      = {2019}
}

Acknowledgements

Thanks to the PyTorch team for PyTorch, the COCO team for the dataset, Tsung-Yi Lin for the MS COCO caption evaluation code, Yufeng Ma for his open-source repositories, and Torchvision for the ResNet implementation.

Note

If you have any questions about the code or our paper, please send an email to [email protected]
