warren-ding / image-text-embedding Goto Github PK

View Code? Open in Web Editor NEW

This project forked from layumi/image-text-embedding

0.0 0.0 0.0 6.13 MB

Dual-Path Convolutional Image-Text Embedding https://arxiv.org/abs/1711.05535

License: MIT License

C++ 22.30% C 1.62% MATLAB 42.41% M 0.05% Cuda 32.94% Shell 0.67%

image-text-embedding's Introduction

Dual-Path Convolutional Image-Text Embedding

[Paper] [Slide]

This repository contains the code for our paper Dual-Path Convolutional Image-Text Embedding. Thank you for your kindly attention.

What's New: We updated the paper to the second version, adding more illustration about the mechanism of the proposed instance loss.

Install Matconvnet

I have included my Matconvnet in this repo, so you do not need to download it again.You just need to uncomment and modify some lines in gpu_compile.m and run it in Matlab. Try it~ (The code does not support cudnn 6.0. You may just turn off the Enablecudnn or try cudnn5.1)

If you fail in compilation, you may refer to http://www.vlfeat.org/matconvnet/install/

Prepocess Datasets

Extract wrod2vec weights. Follow the instruction in ./word2vector_matlab;
Prepocess the dataset. Follow the instruction in ./dataset. You can choose one dataset to run. Three datasets need different prepocessing. I write the instruction for Flickr30k, MSCOCO and CUHK-PEDES.
Download the model pre-trained on ImageNet. And put the model into './data'.

(bash) wget http://www.vlfeat.org/matconvnet/models/imagenet-resnet-50-dag.mat

Alternatively, you may try VGG16 or VGG19.

You may have a different split with me. (Sorry, this is my fault. I used a random split.) Just for a backup, this is the dictionary archive used in the paper.

Trained Model

You may download the three trained models from GoogleDrive.

Train

For Flickr30k, run train_flickr_word2_1_pool.m for Stage I training.

Run train_flickr_word_Rankloss_shift_hard for Stage II training.

For MSCOCO, run train_coco_word2_1_pool.m for Stage I training.

Run train_coco_Rankloss_shift_hard.m for Stage II training.

For CUHK-PEDES, run train_cuhk_word2_1_pool.m for Stage I training.

Run train_cuhk_word_Rankloss_shift for Stage II training.

Test

Select one model and have fun!

For Flickr30k, run test/extract_pic_feature_word2_plus_52.m and to extract the feature from image and text. Note that you need to change the model path in the code.
For MSCOCO, run test_coco/extract_pic_feature_word2_plus.m and to extract the feature from image and text. Note that you need to change the model path in the code.
For CUHK-PEDES, run test_cuhk/extract_pic_feature_word2_plus_52.m and to extract the feature from image and text. Note that you need to change the model path in the code.

CheckList

Recommend Projects

warren-ding / image-text-embedding Goto Github PK

image-text-embedding's Introduction

Dual-Path Convolutional Image-Text Embedding

Install Matconvnet

Prepocess Datasets

Trained Model

Train

Test

CheckList

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent