
caffe-dnnh's Introduction

This is a Caffe implementation of a hash network (DNNH/NINH) for similarity-based visual search.

The hash network is based on this paper: Hanjiang Lai, Yan Pan, Ye Liu, and Shuicheng Yan. Simultaneous feature learning and hash coding with deep neural networks, CVPR 2015.

For more details about the motivation, approach, implementation, results & analysis, and further improvements, please read my post. Any feedback is welcome!

My work

  • Deploy: Given the definition of the loss layer, deploy the deep hashing pipeline on Linux.
  • Train: Write prototxt files to define DNNH and bash scripts to run training on the preprocessed triplet CIFAR-10 dataset.
  • Test/Evaluate: Write prototxt files to encode images and bash scripts to run image retrieval. Implement the mean average precision (mAP) metric for evaluation.
  • Analysis: Plot performance for 12-bit, 24-bit and 48-bit hash codes and analyze the results.
  • Presentation: Prepare slides to present this work.

How to run

Dataset

Hash training needs triplet data as input. Here I use the triplet CIFAR-10 dataset. To obtain it:

  • You can directly download the archive cifar_hash_dataset.7z from BaiduYun or OneDrive and extract it into caffe-dnnh/runtime/cifar_hash_dataset (see the example after this list).
  • Or you can process the data yourself. Scripts are provided for reference in caffe-dnnh/runtime/cifar_hash_dataset_process_scripts/.
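
For example, with p7zip installed, the extraction might look like this (adjust the output path if the archive already contains a cifar_hash_dataset folder):

# extract the triplet CIFAR-10 archive into the location expected by the prototxt files
7z x cifar_hash_dataset.7z -ocaffe-dnnh/runtime/cifar_hash_dataset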

Deploy

You may directly download my caffe-dnnh zip and deploy it (you may need to fix errors caused by a different environment or Caffe version). Or you can follow the instructions below to add the files/contents to the newest Caffe release. Here CAFFE-ROOT refers to your Caffe root directory and caffe-dnnh to mine.

  1. Add file caffe-dnnh/src/caffe/layers/triplet_ranking_hinge_loss_layer.cpp to path CAFFE-ROOT/src/caffe/layers and file caffe-dnnh/include/caffe/layers/triplet_ranking_hinge_loss_layer.hpp to path CAFFE-ROOT/include/caffe/layers.
  2. Modify file CAFFE-ROOT/src/caffe/proto/caffe.proto:
    • Add the following code directly:
// Message that stores parameters used by TripletRankingHingeLossLayer
message TripletRankingHingeLossParameter {
  // Dimension for computing
  optional int32 dim = 1 [default = 10];
  // Margin
  optional float margin = 2 [default = 1];
}
    • Find message LayerParameter and add optional TripletRankingHingeLossParameter triplet_ranking_hinge_loss_param = 151; inside it (a sketch follows this list).
    • Find message V1LayerParameter and add optional TripletRankingHingeLossParameter triplet_ranking_hinge_loss_param = 43; inside it.
    • Find enum LayerType in message V1LayerParameter and add TRIPLET_RANKING_HINGE_LOSS=40; inside it.

Attention: the numbers above, such as 151 and 43, are IDs and must not conflict with existing ones. Search for "next available" in caffe.proto and you will find comments like // SolverParameter next available ID: 42 (last added: layer_wise_reduce) and // LayerParameter next available layer-specific ID: 147 (last added: recurrent_param). Use the next available ID and update the comment.

  3. Add folder caffe-dnnh/runtime to path CAFFE-ROOT/.
  4. Modify file CAFFE-ROOT/tools/caffe.cpp with reference to caffe-dnnh/tools/caffe.cpp: search for ++++++++++ in caffe-dnnh/tools/caffe.cpp and you will find what I added.
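
For reference only, a minimal sketch of the LayerParameter edit might look like the following (surrounding fields elided; 151 is only valid if it is still free in your caffe.proto, otherwise use your next available ID and update the comment accordingly):

// LayerParameter next available layer-specific ID: 152 (last added: triplet_ranking_hinge_loss_param)
message LayerParameter {
  // ... existing fields ...
  optional TripletRankingHingeLossParameter triplet_ranking_hinge_loss_param = 151;
}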

Attention: for the CPU/GPU mode switch:

  1. Check CPU_ONLY := 1 in CAFFE-ROOT/Makefile.config (uncomment it for a CPU-only build; leave it commented out for GPU).
  2. In folder CAFFE-ROOT/runtime/: check solver_mode: GPU in all solver.prototxt files (e.g. CAFFE-ROOT/runtime/12bit/train12_solver.prototxt) and check -gpu=0 in all run_test.sh files (e.g. CAFFE-ROOT/runtime/12bit/run_test.sh).

Then follow the official Installation instructions to compile. Good luck!
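
For reference, a typical Makefile-based Caffe build looks roughly like this; your environment may need different Makefile.config settings (BLAS, CUDA paths, CPU_ONLY) or a CMake build instead:

# standard Caffe build (run inside CAFFE-ROOT)
cp Makefile.config.example Makefile.config   # then edit it for your environment
make all -j"$(nproc)"
make test -j"$(nproc)"
make runtest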

Train

cd caffe-dnnh/runtime/12bit # or: 24bit, 48bit
sh ./run_train.sh # or: sh ./resume_train.sh

run_train.sh trains the deep hash network defined in the prototxt files; the resulting models are stored in caffe-dnnh/runtime/model. You can modify parameters such as the maximum iteration count and the snapshot interval in the solver prototxt. Also note that tens of thousands of iterations take time, so you are recommended to train in GPU mode in the background, e.g. nohup sh ./run_train.sh &, and check the output with tail -100 nohup.out. Read the corresponding files for more details.
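
As an illustration, the kind of solver settings you might tune look like this; the field names are standard Caffe SolverParameter fields, but the values and paths below are placeholders rather than the ones shipped in this repository (see e.g. runtime/12bit/train12_solver.prototxt for the real settings):

net: "train12.prototxt"                 # training network definition (placeholder path)
max_iter: 100000                        # total number of training iterations
snapshot: 10000                         # write a model snapshot every 10000 iterations
snapshot_prefix: "../model/dnnh_12bit"  # snapshot location (placeholder)
solver_mode: GPU                        # switch to CPU for CPU-only builds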

Test

cd caffe-dnnh/runtime/12bit # or: 24bit, 48bit
sh ./run_test.sh

run_test.sh uses the forward pass of DNNH, as defined in test12_query.prototxt and test12_pool.prototxt, to encode the query images and the pool-set images, then compiles and runs CAFFE-ROOT/runtime/evaluate_map.cpp for image-retrieval evaluation. You can modify parameters (e.g. ITER in run_test.sh and top_neighbor_num in evaluate_map.cpp). Read the corresponding files for more details.
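
For reference, the metric follows the standard definition of mean average precision over the top N returned neighbors (N corresponding to top_neighbor_num); the exact normalization used in evaluate_map.cpp may differ slightly:

AP(q) = \frac{1}{R_q} \sum_{k=1}^{N} P_q(k)\,\mathrm{rel}_q(k), \qquad mAP = \frac{1}{|Q|} \sum_{q \in Q} AP(q)

where P_q(k) is the precision of the top-k results for query q, rel_q(k) is 1 if the k-th result shares q's label and 0 otherwise, and R_q is the number of relevant results for q within the top N.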

Credits

I really appreciate their work!

  1. Dr. Tao Mei drew an outline of this research for me.
  2. The triplet ranking hinge loss layer is implemented by @FuchenUSTC in his caffe repository.
  3. The preprocessed triplet CIFAR-10 dataset and related scripts are shared by @FuchenUSTC. Read my post#dataset for more details about its structure, which helps in understanding the DNNH structure defined in the prototxt files.
  4. The network structure and parameters refer to codes_triplet_hashing1.zip, provided by the first author, Hanjiang Lai.

caffe-dnnh's Issues

details of partition of NUS-WIDE

Hello, I'm asking about a detail of the partition of the NUS-WIDE dataset: how do you randomly select 100 samples from each class for the query set (and likewise for the training set)?
I mean, since it is a multi-label dataset, the very same sample might be selected several times during the per-class sampling.
Thanks.

Can't find query_hashcode.txt

I run "run_test12.sh". I find a file "pool_hashcode.txt" but miss "query_hashcode.txt".

This causes a problem when running evaluate_map.cpp.
Please help.

cifar_hash_dataset.7z downloading

Please share this dataset on a more accessible source such as Google Drive or something similar. I just cannot create a Baidu account to download your dataset. I would much appreciate your help.

use keras

Hi, I used Keras to implement this paper, but I do not know whether it has problems or not.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Jul 5 10:26:49 2018

@author: dolphin
"""

from __future__ import absolute_import
from __future__ import print_function
import keras
from keras import backend as K
import tensorflow as tf

# define slice function
def slice_f(x, c1, c2):
    return x[c1:c2, :]

# define a slice layer using a Lambda layer
def slice_hash_layer(inputX, arguments):
    slice_s = keras.layers.Lambda(slice_f, arguments=arguments)(inputX)
    hash_s = keras.layers.Dense(1, activation='sigmoid',
                                kernel_initializer=keras.initializers.TruncatedNormal(stddev=0.1))(slice_s)
    return hash_s

def deep_hash_model_():
    inputX = keras.Input(shape=(224, 224, 3))
    # conv1
    conv1_1 = keras.layers.Conv2D(96, kernel_size=(11, 11), strides=(4, 4), activation='relu',
                                  kernel_initializer=keras.initializers.TruncatedNormal(stddev=0.01))(inputX)
    conv1_2 = keras.layers.Conv2D(96, kernel_size=(1, 1), strides=(1, 1), activation='relu',
                                  kernel_initializer=keras.initializers.TruncatedNormal(stddev=0.05))(conv1_1)
    pool1 = keras.layers.MaxPool2D(pool_size=(3, 3), strides=(2, 2), padding='valid')(conv1_2)
    pool1 = keras.layers.Dropout(0.5)(pool1)

    # conv2
    conv2_1 = keras.layers.Conv2D(256, kernel_size=(5, 5), strides=(1, 1), activation='relu',
                                  kernel_initializer=keras.initializers.TruncatedNormal(stddev=0.01),
                                  padding='same')(pool1)
    conv2_2 = keras.layers.Conv2D(256, kernel_size=(1, 1), strides=(1, 1), activation='relu',
                                  kernel_initializer=keras.initializers.TruncatedNormal(stddev=0.05),
                                  padding='same')(conv2_1)
    pool2 = keras.layers.MaxPool2D(pool_size=(3, 3), strides=(2, 2), padding='valid')(conv2_2)

    # conv3
    conv3_1 = keras.layers.Conv2D(384, kernel_size=(3, 3), strides=(1, 1), activation='relu',
                                  kernel_initializer=keras.initializers.TruncatedNormal(stddev=0.01),
                                  padding='valid')(pool2)
    conv3_2 = keras.layers.Conv2D(384, kernel_size=(1, 1), strides=(1, 1), activation='relu',
                                  kernel_initializer=keras.initializers.TruncatedNormal(stddev=0.05),
                                  padding='valid')(conv3_1)
    pool3 = keras.layers.MaxPool2D(pool_size=(3, 3), strides=(2, 2), padding='valid')(conv3_2)

    # conv4
    conv4_1 = keras.layers.Conv2D(1024, kernel_size=(3, 3), strides=(1, 1), activation='relu',
                                  kernel_initializer=keras.initializers.TruncatedNormal(stddev=0.01),
                                  padding='valid')(pool3)
    conv4_2 = keras.layers.Conv2D(1200, kernel_size=(1, 1), strides=(1, 1), activation='relu',
                                  kernel_initializer=keras.initializers.TruncatedNormal(stddev=0.05),
                                  padding='valid')(conv4_1)
    pool4 = keras.layers.AvgPool2D(pool_size=(6, 6), strides=(1, 1), padding='valid')(conv4_2)

    # divide-and-encode module
    # use 24 hash-code bits
    slice_hash = []
    for i in range(24):
        arguments = {'c1': i * 50, 'c2': (i + 1) * 50}
        slice_hash.append(slice_hash_layer(pool4, arguments))
    merge_one = keras.layers.concatenate(slice_hash)
    final_model = keras.models.Model(inputs=inputX, outputs=merge_one)
    return final_model

deep_hash_model = deep_hash_model_()

# triplet loss function
# read a line from a text file; each line contains a triplet of three images:
# query_image, positive_image, negative_image
batch_size = 24
def triplt_loss(y_true, y_pred):
    loss = tf.convert_to_tensor(0, dtype=tf.float32)
    total_loss = tf.convert_to_tensor(0, dtype=tf.float32)
    g = tf.constant(1.0, shape=[1], dtype=tf.float32)
    zero = tf.constant(0.0, shape=[1], dtype=tf.float32)
    for i in range(0, batch_size, 3):
        q_embedding = y_pred[i]
        p_embedding = y_pred[i + 1]
        n_embedding = y_pred[i + 2]
        D_q_p = K.sqrt(K.sum((q_embedding - p_embedding) ** 2))
        D_q_n = K.sqrt(K.sum((q_embedding - n_embedding) ** 2))
        loss = tf.maximum(g + D_q_p - D_q_n, zero)
        total_loss = total_loss + loss
    total_loss = total_loss / (batch_size / 3)
    return total_loss

Which one is the anchor of the triplet input when organizing the dataset?

Thanks for reading this issue.
I am trying to organize ImageNet into triplet formation using the provided scripts, but I am a little confused about how to write the relevant LISTFILE. Among "Pic1.jpg Pic2.jpg Pic3.jpg", which one is the anchor image and which one is the positive sample?

Thank you!

test problem

Hello
Thanks for your hard work.
After downloading your source code and dataset and deploying the dataset in runtime/cifar_hash_dataset, I ran ./run_train.sh in runtime/12bit, and after several hours I got

.......
I0913 18:39:05.471935 10028 solver.cpp:311] Iteration 100000, loss = 0.000143271
I0913 18:39:05.471941 10028 solver.cpp:316] Optimization Done.

But when I ran ./run_test.sh after training, I got

......
I0913 19:03:20.272930 12613 caffe.cpp:319] Running for iteration#589...
I0913 19:03:20.296061 12613 caffe.cpp:354] Loss: 0
Reading images's hashcode from files...
Starting querying images in 58999 images' pool...
Finish querying 1 images in 58999 pool images set within the top 10000 returned neighbors!
Mean Average Precision(MAP): 0
MAP of label#1: 0
MAP of label#2: 0
MAP of label#3: 0
MAP of label#4: 0
MAP of label#5: 0
MAP of label#6: 0
MAP of label#7: 0
MAP of label#8: 0
MAP of label#9: 0
MAP of label#10: 0

I have not modified any code and don't know what I did wrong. Could you please give me some advice?
Thank you.

io.h: No such file or directory

Hello, I am compiling "convert_triplet_imageset.cpp", and the error is io.h: No such file or directory. Even though there is an "io.h" in "/usr/include/x86_64-linux-gnu/sys", it does not solve my problem. When I copy that file to "/usr/include", the result is the same as when I simply comment out the "#include <io.h>" in "io.cpp", i.e. "error: '_open' was not declared in this scope", "error: '_close' was not declared in this scope", and so on. So what should I do to compile "convert_triplet_imageset.cpp"?
