nanne / pytorch-netvlad

Pytorch implementation of NetVlad including training on Pittsburgh.

Python 100.00%

pytorch-netvlad's Introduction

pytorch-NetVlad

Implementation of NetVlad in PyTorch, including code for training the model on the Pittsburgh dataset.

Reproducing the paper

Below are the results, compared with those in the third row of the right column of Table 1 of the paper:

	R@1	R@5	R@10
NetVlad paper	84.1	94.6	95.5
pytorch-NetVlad(alexnet)	68.6	84.6	89.3
pytorch-NetVlad(vgg16)	85.2	94.8	97.0

Running main.py with train mode and default settings should give similar scores to the ones shown above. Additionally, the model state for the above run is available here: https://drive.google.com/open?id=17luTjZFCX639guSVy00OUtzfTQo4AMF2

Using this checkpoint and the following command you can obtain the results shown above:

python main.py --mode=test --split=val --resume=vgg16_netvlad_checkpoint/

Setup

Dependencies

PyTorch (at least v0.4.0)
Faiss
scipy
numpy
sklearn
h5py
tensorboardX

Data

Running this code requires a copy of the Pittsburgh 250k dataset (available here), and the dataset specifications for the Pittsburgh dataset (available here). pittsburgh.py contains a hardcoded path to a directory, where the code expects directories 000 to 010 with the various Pittsburgh database images, a directory queries_real with subdirectories 000 to 010 with the query images, and a directory datasets with the dataset specifications (.mat files).
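The layout described above can be checked before a long run is started. The following is a minimal sketch, not part of the repository: the helper name check_pittsburgh_layout and its root argument are hypothetical, and the expected directory names are taken from the paragraph above rather than from pittsburgh.py itself.

```python
from pathlib import Path


def check_pittsburgh_layout(root):
    """Return the expected sub-directories that are missing under `root`."""
    root = Path(root)
    numbered = [f"{i:03d}" for i in range(11)]  # 000 .. 010
    expected = (
        [root / name for name in numbered]                     # database images
        + [root / "queries_real" / name for name in numbered]  # query images
        + [root / "datasets"]                                  # .mat specifications
    )
    return [str(p.relative_to(root)) for p in expected if not p.is_dir()]
```

Calling check_pittsburgh_layout on the directory that pittsburgh.py points to returns an empty list when every expected folder is present, and otherwise lists the folders that still need to be created or downloaded.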

Usage

main.py contains the majority of the code, and has three different modes (train, test, cluster) which we'll discuss in more detail below.

Train

In order to initialise the NetVlad layer it is necessary to first run main.py with the correct settings and --mode=cluster. After that, a model can be trained using the following default flags:

python main.py --mode=train --arch=vgg16 --pooling=netvlad --num_clusters=64

The commandline args, the tensorboard data, and the model state will all be saved to opt.runsPath, which can subsequently be used for testing or to resume training.

For more information on all commandline arguments run:

python main.py --help

Test

To test a previously trained model on the Pittsburgh 30k testset (replace directory with correct dir for your case):

python main.py --mode=test --resume=runsPath/Nov19_12-00-00_vgg16_netvlad --split=test

The commandline arguments for training were saved, so we shouldn't need to specify them for testing. Additionally, to obtain the 'off the shelf' performance we can also omit the resume directory:

python main.py --mode=test

Cluster

In order to initialise the NetVlad layer we need to first sample from the data and obtain opt.num_clusters centroids. This step is necessary for each configuration of the network and for each dataset. To cluster simply run

python main.py --mode=cluster --arch=vgg16 --pooling=netvlad --num_clusters=64

with the correct values for any additional commandline arguments.

pytorch-netvlad's Issues

The loss didn't decrease when training

Can you share some experience on how to set the learning rate, margin and threads? I used a totally different dataset, and the loss stays steady during training. I have tried adjusting the learning rate and the margin, but it made no difference.

Where can I download the Pittsburgh 250k dataset

Where can I download the Pittsburgh 250k dataset? Can you give me the URL? Thank you!

what does utmDb, utmQ, posDistThr, posDistSqThr and nonTrivPosDistSqThr mean

Hi, I am an undergraduate student who has just started to learn Machine learning. I really want to reproduce the project on my custom dataset.
First, thank you so much for your remarkable work.
However, I am confused about what utmDb, utmQ, posDistThr, posDistSqThr and nonTrivPosDistSqThr stand for in tokyo247.py. I guess numDb and numQ are the number of database images and query images, is that right?

Training on custom dataset

I have a custom dataset with image pose. Is there any template on how I can train on a custom dataset?

terminology clarification

I wonder if you could kindly clarify differences between the following terms:

batchSize vs cacheBatchSize vs cacheRefreshRate

self.nNegSample vs self.nNeg

self.nontrivial_positives vs self.potential_positives

self.potential_negatives , self.negCache and self.negCache

For Pittsburgh 30k, for instance, I can print these info for two classes WholeDatasetFromStruct and QueryDatasetFromStruct:

----------------------------------------------------------------------------------------------------
                                 Loading pittsburgh in train mode
>> Defining whole_train_set...
>> whole_train_set [17416]: 
WholeDatasetFromStruct
        dataset: pitts30k mode: train
        IMGs (db: 10000 qu: 7416) onlyDB: False => |IMGs|: 17416
        positives: None
        Transforms (if any): Compose(
                          ToTensor()
                          Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
                      )

>> Defining whole_training_data_loader given whole_train_set using torch.utils.data.DataLoader...
>> ok!
>> Defining train_set for queries with 0.1 margin...
>> train_set [7320]: 
QueryDatasetFromStruct
        Dataset: pitts30k mode: train margin: 0.1
        nontrivial (+) th: 10.0 m       potential (+) th: 25 m
        Negs: 10 Neg samples: 1000 potential Negs (> 25 m): 7416
        nontrivial pos: 7416 potential pos: 7416
        IMGs (db: 10000 qu: 7416)
        All queries without nontrivial positives: 7320
        negative Cache: 7416
        Transforms (if any): Compose(
                          ToTensor()
                          Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
                      )

>> Defining whole_test_set...
>> whole_test_set [17608]: 
WholeDatasetFromStruct
        dataset: pitts30k mode: val
        IMGs (db: 10000 qu: 7608) onlyDB: False => |IMGs|: 17608
        positives: None
        Transforms (if any): Compose(
                          ToTensor()
                          Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
                      )

>> Evaluating on val set, query count: 7608

                                              Done
----------------------------------------------------------------------------------------------------

I have already read, #33, #26, #4, #9!
I am trying to adapt NetVLAD to another dataset with GPS info and I am confused about how to modify the code accordingly.

About cluster centroid

Hi, thanks for great work.
I want to ask you why cluster centroids are fixed during training.
As the network gets updated, then the descriptor statistics will change and then the centroids also should be updated, but it's fixed.
Do you have any insight behind this? I cannot find any reason for this implementation in the paper

The tokyo247 dataset I downloaded contains only CSV and JPG files, but the running code prompts FileNotFoundError: [Errno 2] No such file or directory:'/home/user05/lai/pytorch-NetVlad/datasets/nt/NTh-9hEkoFYb5pUx8ghXRw/_200912 /NTh-9hEkoFYb5pUx8ghXRw__200912_35.650399_139.691617_090_012.jpg' , How do I set this up? Thank you

how to get the pittsburgh250k dataset? #19

I can't find the pittsburgh dataset anywhere.

"Data: Available on request."

confused with ground truth

I read the code, thank you for the excellent work. I am confused about the ground truth, and I will illustrate the situation with an example.
I found that the database can be divided into smaller groups: the 24 images in each group share the same latitude and longitude. When the ground truth is searched using latitude and longitude, each query therefore has the same distance to all images in the same database group, so we get a top-24 ground truth with identical distances. A simplified version of the situation is shown below:
utmDb (3,2) (3,2) (3,2) (3,2) ... (3,2) / (5,4) (5,4) ... (5,4)
utmQ (1,1) ...
Then the ground truth index for utmQ[0] is [0, 1, 2, 3, ..., 23]. During the NetVLAD test, the VLAD vector is used to estimate similarity. If a query belongs to place 1 and its top-10 matched indices are [23, 22, 21, 20, 19, 18, 17, 16, 15, 14], then the top-10 recognition percentage will be zero, because the top-10 ground truth is [0, 1, 2, ..., 9] but the top-10 VLAD matches are [23, 22, ..., 14].
This seems unreasonable. How should this situation be handled to get a better evaluation?
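One common way to compute recall@N for place recognition avoids the problem described here: a query counts as correct at N if any of its top-N retrieved images belongs to the set of ground-truth positives, regardless of the order in which the positives are listed. The sketch below illustrates that convention under the assumption that this is how the evaluation is meant to work; the function name recall_at_n and its arguments are illustrative and not taken from main.py.

```python
import numpy as np


def recall_at_n(predictions, ground_truth, n_values=(1, 5, 10)):
    """predictions[q]: ranked database indices for query q (best first).
    ground_truth[q]: indices of all database images counted as positives.
    n_values must be sorted in ascending order."""
    hits = np.zeros(len(n_values))
    for pred, gt in zip(predictions, ground_truth):
        for i, n in enumerate(n_values):
            # A query is a hit at n if ANY of its top-n predictions is a positive.
            if np.isin(pred[:n], gt).any():
                hits[i:] += 1  # a hit at n is also a hit at every larger n
                break
    return hits / len(predictions)


# The example from the question: 24 positives share one location.
gt = [np.arange(24)]
pred = [np.array([23, 22, 21, 20, 19, 18, 17, 16, 15, 14])]
recalls = recall_at_n(pred, gt)
```

With this set-membership convention the query in the example is a hit at every N, because index 23 is one of the 24 positives even though it is listed last in the ground truth.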

image normalization

Hi @Nanne
Thanks for the excellent work!
I feel a little bit confused about the image pre-processing in the file pittsburgh.py. The function input_transform normalizes images with mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225].
However, the official code of NetVLAD (in the trainWeakly.py file) conducts the normalization as follows:
ims(:,:,1,:)= ims(:,:,1,:) - net.meta.normalization.averageImage(1,1,1);
ims(:,:,2,:)= ims(:,:,2,:) - net.meta.normalization.averageImage(1,1,2);
ims(:,:,3,:)= ims(:,:,3,:) - net.meta.normalization.averageImage(1,1,3);
where net.meta.normalization.averageImage=[123.6800,116.7790,103.9390].

Could you please give me some hints on how 'mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]' was derived from 'mean=[123.6800,116.7790,103.9390]'? Thanks a lot
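The two sets of numbers look like the same kind of per-channel ImageNet statistics expressed on different pixel scales: the values in input_transform apply to images scaled to [0, 1], while the MATLAB averageImage applies to pixel values in [0, 255]. The short check below rescales one to the other. It is only an arithmetic illustration and does not claim that the two implementations are numerically identical; note also that the quoted MATLAB snippet subtracts the mean without dividing by a standard deviation.

```python
mean_01 = [0.485, 0.456, 0.406]  # values used in input_transform
std_01 = [0.229, 0.224, 0.225]

# Rescale from the [0, 1] range to the [0, 255] range.
mean_255 = [m * 255 for m in mean_01]
std_255 = [s * 255 for s in std_01]

print([round(m, 3) for m in mean_255])
print([round(s, 3) for s in std_255])
```

The rescaled means come out at roughly 123.7, 116.3 and 103.5, which is close to, though not identical to, the averageImage values quoted above.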

What does vladv2 indicate?

I really appreciate your effort in advance :)

When I first started with NetVlad, I set vladv2 to True, but it did not perform better than with vladv2 set to False.

So, I'm wondering what the vladv2 flag indicates (I believe it is likely related to normalization or something similar).

I'm looking forward to hearing from you.

Question about 'self.cache = None'

In line 208 of pittsburgh.py:
self.cache = None

This confuses me, because in line 214 the h5 file cannot be loaded when no path is set. Does it mean we don't need to load the h5 file here, or is the path simply not set?

Questions about alpha in netvlad.py

Hello, Nanne!
Could you please tell me the reference for the method of calculating self.alpha in netvlad.py? I cannot find any clues about this in the original paper.
Thanks!

Question about the error “TypeError: Caught TypeError in DataLoader worker process 0.”

Thanks for your great work!

I get errors after running the command:
python main.py --mode=train --arch=vgg16 --pooling=netvlad --num_clusters=64

The errors are as follows:
`====> Building Cache
Allocated: 60039168
Cached: 9596567552
/home/sqw/anaconda3/envs/pytorch1.4.0/lib/python3.7/site-packages/sklearn/neighbors/_base.py:622: UserWarning: Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1
n_jobs = effective_n_jobs(self.n_jobs)
/home/sqw/anaconda3/envs/pytorch1.4.0/lib/python3.7/site-packages/sklearn/neighbors/_base.py:622: UserWarning: Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1
n_jobs = effective_n_jobs(self.n_jobs)
/home/sqw/anaconda3/envs/pytorch1.4.0/lib/python3.7/site-packages/sklearn/neighbors/_base.py:622: UserWarning: Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1
n_jobs = effective_n_jobs(self.n_jobs)
/home/sqw/anaconda3/envs/pytorch1.4.0/lib/python3.7/site-packages/sklearn/neighbors/_base.py:622: UserWarning: Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1
n_jobs = effective_n_jobs(self.n_jobs)
/home/sqw/anaconda3/envs/pytorch1.4.0/lib/python3.7/site-packages/sklearn/neighbors/_base.py:622: UserWarning: Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1
n_jobs = effective_n_jobs(self.n_jobs)
/home/sqw/anaconda3/envs/pytorch1.4.0/lib/python3.7/site-packages/sklearn/neighbors/_base.py:622: UserWarning: Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1
n_jobs = effective_n_jobs(self.n_jobs)
/home/sqw/anaconda3/envs/pytorch1.4.0/lib/python3.7/site-packages/sklearn/neighbors/_base.py:622: UserWarning: Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1
n_jobs = effective_n_jobs(self.n_jobs)
/home/sqw/anaconda3/envs/pytorch1.4.0/lib/python3.7/site-packages/sklearn/neighbors/_base.py:622: UserWarning: Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1
n_jobs = effective_n_jobs(self.n_jobs)
Traceback (most recent call last):
File "main.py", line 515, in <module>
train(epoch)
File "main.py", line 116, in train
negCounts, indices) in enumerate(training_data_loader, startIter):
File "/home/sqw/anaconda3/envs/pytorch1.4.0/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/sqw/anaconda3/envs/pytorch1.4.0/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
return self._process_data(data)
File "/home/sqw/anaconda3/envs/pytorch1.4.0/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
data.reraise()
File "/home/sqw/anaconda3/envs/pytorch1.4.0/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/sqw/anaconda3/envs/pytorch1.4.0/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/sqw/anaconda3/envs/pytorch1.4.0/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/sqw/anaconda3/envs/pytorch1.4.0/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/sqw/anaconda3/envs/pytorch1.4.0/lib/python3.7/site-packages/torch/utils/data/dataset.py", line 257, in __getitem__
return self.dataset[self.indices[idx]]
File "/home/sqw/Desktop/pytorch-NetVlad-master/pittsburgh.py", line 230, in __getitem__
negFeat = h5feat[negSample.tolist()]
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/home/sqw/anaconda3/envs/pytorch1.4.0/lib/python3.7/site-packages/h5py/_hl/dataset.py", line 777, in __getitem__
selection = sel.select(self.shape, args, dataset=self)
File "/home/sqw/anaconda3/envs/pytorch1.4.0/lib/python3.7/site-packages/h5py/_hl/selections.py", line 82, in select
return selector.make_selection(args)
File "h5py/_selector.pyx", line 272, in h5py._selector.Selector.make_selection
File "h5py/_selector.pyx", line 183, in h5py._selector.Selector.apply_args
TypeError: Indexing arrays must have integer dtypes

/home/sqw/anaconda3/envs/pytorch1.4.0/lib/python3.7/site-packages/sklearn/neighbors/_base.py:622: UserWarning: Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1
n_jobs = effective_n_jobs(self.n_jobs)
/home/sqw/anaconda3/envs/pytorch1.4.0/lib/python3.7/site-packages/sklearn/neighbors/_base.py:622: UserWarning: Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1
n_jobs = effective_n_jobs(self.n_jobs)
`

I suspect that the Pitts dataset is misplaced.
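One plausible cause of the "Indexing arrays must have integer dtypes" error, offered here as an assumption rather than a confirmed diagnosis, is a silent dtype promotion: if integer sample indices are concatenated with an empty array created by np.empty((0,)), which is float64 by default, the result is a float array, and h5py rejects float indices. The variable names below are illustrative and are not copied from pittsburgh.py.

```python
import numpy as np

neg_cache = np.empty((0,))             # float64 by default
neg_sample = np.array([12, 3, 40, 3])  # integer indices

merged = np.unique(np.concatenate([neg_cache, neg_sample]))
print(merged.dtype)                    # float64: not a valid index dtype

# Casting back to an integer dtype before indexing avoids the error.
merged_int = merged.astype(np.int64)
print(merged_int.tolist())
```

If this is indeed what happens, casting the index array to an integer type immediately before it is used to index the HDF5 dataset should be enough; the dataset location itself would not be the culprit.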

About experiment results

I have some questions about the experiment results you got, e.g. pytorch-NetVlad(vgg16) | 85.2 | 94.8 | 97.0. Was this trained on Pittsburgh 250k or Pittsburgh 30k? I ran main.py in train mode with default settings on Pittsburgh 30k, but my test result is 82.29, 91.46, 93.66.

Error with Faiss!

My Python version is 3.8, and I installed faiss-gpu=1.6.1. Something goes wrong when I run main.py:
"kmeans = faiss.Kmeans(encoder_dim, opt.num_clusters, niter, verbose=False)
TypeError: __init__() takes 3 positional arguments but 4 were given"

Could you please tell me the solution!

Building Cache taking too long

When I started to train the model after clustering the code starts building the cache and continues to do so for a very long time. I waited for the code to finish building the cache for about 3 hours but yet there is no progress. For the model training, I am utilizing a GCP storage bucket on Google Colab. I also tried the same on a GCP VM instance and got the same issue.

I was wondering if you know if this is something specific to my hardware or is it a bug in the code? For reference, I am attaching the screenshots of specifications of the colab machine and the code block continuously running.

Is the PCA whitening implemented in the codes?

Hi @Nanne , thanks for the remarkable work. It inspires a lot.
It seems that NetVLAD performs dimensionality reduction, i.e., it uses PCA with whitening followed by L2 normalization to reduce the features to 4096-D. If I understand correctly, the results reported in the readme were obtained using the 32k-D image representations for testing (on Pitts30k), with no PCA performed. Is this right?
If so, is PCA whitening implemented in the code?
Looking forward to your reply.
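For readers who want to experiment with the reduction step discussed here, the following is a generic sketch of PCA whitening followed by L2 normalization. It is a textbook formulation written for small toy inputs, not the procedure used by the paper or by this repository; the function names and the eps value are assumptions, and a full covariance eigendecomposition would be impractical for 32k-D descriptors without further tricks.

```python
import numpy as np


def fit_pca_whitening(feats, out_dim, eps=1e-8):
    """Learn the mean and a whitening projection from feats (n_samples, dim)."""
    mean = feats.mean(axis=0)
    cov = np.cov(feats - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:out_dim]  # keep the largest ones
    proj = eigvecs[:, order] / np.sqrt(eigvals[order] + eps)
    return mean, proj


def apply_pca_whitening(feats, mean, proj):
    """Project, whiten, and L2-normalise each row."""
    reduced = (feats - mean) @ proj
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    return reduced / np.maximum(norms, 1e-12)
```

The projection would be fitted once on training descriptors and then applied unchanged to both database and query descriptors at test time.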

How to replace utm coordinates with GPS coordinates in my dataset?

Can I use the GPS coordinates in my dataset in place of the UTM coordinates when generating the .mat files? Do I need to modify the code that is used to find radius_neighbors?
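The distance thresholds printed elsewhere on this page are given in metres (10 m and 25 m), so raw latitude and longitude in degrees would not be directly comparable with them. The sketch below converts GPS coordinates to approximate local metric offsets with an equirectangular projection. This is a simple stand-in and not a true UTM conversion; it is only reasonable over small areas, and a dedicated projection library would be more accurate. The function name and the reference-point arguments are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an approximation


def latlon_to_local_xy(lat, lon, ref_lat, ref_lon):
    """Approximate east/north offsets in metres from a reference point."""
    x = math.radians(lon - ref_lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return x, y


# Two points 0.001 degrees of latitude apart are about 111 m apart.
x, y = latlon_to_local_xy(40.4416, -79.9959, 40.4406, -79.9959)
```

Once every database and query position is expressed in metres relative to the same reference point, a Euclidean radius search should behave as it does with UTM coordinates.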

Questions about alpha in netvlad.py

Hi!
I have a question about alpha in netvlad.py.
In the paper, alpha is defined as follows:
α is computed so that the ratio of the largest and the second largest soft assignment weight a_k(x_i) is on average equal to 100.

What does this mean?
In your netvlad.py, you compute alpha like this:

knn = NearestNeighbors(n_jobs=-1) #TODO faiss?
knn.fit(traindescs)
del traindescs
dsSq = np.square(knn.kneighbors(clsts, 2)[1])
del knn
self.alpha = (-np.log(0.01) / np.mean(dsSq[:,1] - dsSq[:,0])).item()

But as far as I know, knn.kneighbors returns [neighbor distance array, neighbor index array],
so is dsSq the neighbor index array from that return value?
I think dsSq should be computed from the distances, not the indices.
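The definition quoted from the paper can be unpacked as follows. If the soft assignment weight of a descriptor to cluster k is proportional to exp(-alpha * d_k^2), where d_k^2 is the squared distance to centroid k, then the ratio between the largest and second largest weight is exp(alpha * (d_2^2 - d_1^2)), with d_1 and d_2 the distances to the nearest and second nearest centroid. Setting that ratio to 100 gives alpha = ln(100) / (d_2^2 - d_1^2), and -log(0.01) equals ln(100). scikit-learn's kneighbors does return a (distances, indices) pair, so the distances are element [0]. The sketch below implements this reading with plain NumPy; it uses the mean gap as an approximation of "on average", and the function name and the brute-force distance computation are assumptions rather than the repository's code.

```python
import numpy as np


def estimate_alpha(descs, centroids, target_ratio=100.0):
    """Estimate alpha from descriptors (n, dim) and centroids (k, dim)."""
    # Squared distances from every descriptor to every centroid: shape (n, k).
    d2 = ((descs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    d2.sort(axis=1)
    # Gap between the second nearest and the nearest centroid.
    gap = d2[:, 1] - d2[:, 0]
    return float(np.log(target_ratio) / gap.mean())


descs = np.array([[0.0, 0.0], [1.0, 0.0]])
centroids = np.array([[0.0, 0.0], [2.0, 0.0]])
alpha = estimate_alpha(descs, centroids)
```

In the toy example the gaps are 4 and 0, so the mean gap is 2 and alpha is ln(100) / 2.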

Questions about the training.

Hi, Nanne. You have done a great job reproducing NetVlad! I have a small problem training this network with your code. I used almost all the default parameters as you wrote in Readme.md,
python main.py --mode=train --arch=vgg16 --pooling=netvlad --num_clusters=64 --batchSize=2
However, I found that the network does not seem to learn anything. Here are my training results,

Am I doing something wrong? And how did you set the training parameters to get the performance reported in Readme.md?

Where is the cache path in pittsburgh?

Line 214 in pittsburgh.py

self.cache = None

but line 220 will read it:

with h5py.File(self.cache, mode='r') as h5:

so, where is the cache path?

thanks
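Another issue on this page quotes main.py assigning train_set.cache = join(opt.cachePath, train_set.whichSet + '_feat_cache.hdf5'), which suggests that None is only a placeholder and that the training script fills in the path before the data loader reads from it. The sketch below shows that pattern in isolation. The class is a hypothetical stand-in, not the real dataset class, and the cache_file helper and the directory name are invented for illustration.

```python
from os.path import join


class QueryDataset:
    """Stand-in for the dataset class; only the cache handling is shown."""

    def __init__(self):
        self.cache = None  # placeholder, filled in by the training script

    def cache_file(self):
        if self.cache is None:
            raise RuntimeError("cache path not set; build the feature cache first")
        return self.cache


train_set = QueryDataset()
# The training script sets the path before the data loader reads from it.
train_set.cache = join("cache_dir", "train" + "_feat_cache.hdf5")
```

Under this reading, indexing the dataset before the training loop has built the cache would fail, which matches the observation that there is no path at construction time.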

Discussion on Nontrival Positives of TokyoTM dataset

Hi, thanks for your great work on pytorch-NetVLAD.
I found an issue when training NetVLAD on TokyoTM. In the tokyotm_train.mat file, the query images also appear in the database section, with the result that training cannot start (because the query image itself is taken as the positive image, and its distance (=0) is naturally smaller than that of all the negatives).
I also found that in the original MATLAB code, the positives are chosen with an additional condition (dSq>1):

    function posIDs= nontrivialPosDb(db, iDb)
        [posIDs, dSq]= db.cp.getPosDbIDs(iDb);
        posIDs= posIDs(dSq>1 & dSq<=db.nonTrivPosDistSqThr & db.dbTimeStamp(iDb)~=db.dbTimeStamp(posIDs) );
    end

So I think this condition should be added to your code.
Furthermore, I cannot understand why the query images also appear in the database.

What does nonTrivPosDistSqThr mean in pittsburgh.py?

What does nonTrivPosDistSqThr mean in pittsburgh.py? I found it in the .mat file, but I don't understand its meaning. For example, in pitts30k_test.mat, nonTrivPosDistSqThr=100.
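The training log printed earlier on this page reports "nontrivial (+) th: 10.0 m" and "potential (+) th: 25 m", and 100 is the square of 10, so nonTrivPosDistSqThr reads as a squared distance threshold in metres. The sketch below shows how such thresholds could split database images into groups for one query. It is an interpretation under that assumption, with utmQ and utmDb taken to hold metric coordinates of queries and database images; the function name and argument names are illustrative.

```python
import numpy as np


def split_by_distance(utm_q, utm_db, non_triv_pos_dist_sq_thr=100.0, pos_dist_thr=25.0):
    """Group database indices by their distance in metres to a single query."""
    d = np.linalg.norm(utm_db - utm_q, axis=1)
    # Close enough to serve as a training positive (threshold is squared).
    nontrivial_pos = np.where(d <= np.sqrt(non_triv_pos_dist_sq_thr))[0]
    # Close enough that the image must not be used as a negative.
    potential_pos = np.where(d <= pos_dist_thr)[0]
    # Everything farther away is a candidate negative.
    potential_neg = np.where(d > pos_dist_thr)[0]
    return nontrivial_pos, potential_pos, potential_neg


utm_db = np.array([[0.0, 0.0], [8.0, 0.0], [20.0, 0.0], [100.0, 0.0]])
groups = split_by_distance(np.array([0.0, 0.0]), utm_db)
```

In the toy example the image 20 m away is neither a nontrivial positive nor a negative, which illustrates why two separate thresholds are useful.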

What is the function of the h5feat?

What is the purpose of the following code?

train_set.cache = join(opt.cachePath, train_set.whichSet + '_feat_cache.hdf5')
        with h5py.File(train_set.cache, mode='w') as h5: 
            pool_size = encoder_dim
            if opt.pooling.lower() == 'netvlad': pool_size *= opt.num_clusters
            h5feat = h5.create_dataset("features", 
                    [len(whole_train_set), pool_size], 
                    dtype=np.float32)
            with torch.no_grad():
                for iteration, (input, indices) in enumerate(whole_training_data_loader, 1):
                    input = input.to(device)
                    image_encoding = model.encoder(input)
                    vlad_encoding = model.pool(image_encoding) 
                    h5feat[indices.detach().numpy(), :] = vlad_encoding.detach().cpu().numpy()
                    del input, image_encoding, vlad_encoding

The h5feat is defined by the code above, but it does not seem to be used by the code that follows.
Why is it defined?

Question about the checkpoint given

When I load the checkpoint given by you, it will show this error message:

Missing key(s) in state_dict: "pool.conv.bias".

Does this mean you used vlad instead of vladv2 here? And please tell me how I can deal with it, so that I can train my model starting from this checkpoint.
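A "Missing key(s)" message means that the model being constructed has a parameter that the checkpoint does not contain, which usually points to the model having been built with different options than the ones used when the checkpoint was saved. Whether the vladv2 flag is what controls the presence of pool.conv.bias is an assumption here. Comparing the two sets of keys directly can make the mismatch easier to see; the helper below is a hypothetical, framework-free sketch that works on plain lists of key names.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Report keys the model expects but lacks, and keys it does not know."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        "missing": sorted(model_keys - checkpoint_keys),
        "unexpected": sorted(checkpoint_keys - model_keys),
    }


report = diff_state_dict_keys(
    ["pool.centroids", "pool.conv.weight", "pool.conv.bias"],  # model
    ["pool.centroids", "pool.conv.weight"],                    # checkpoint
)
```

In practice the two lists would come from the keys of the model's state dict and of the 'state_dict' entry in the loaded checkpoint. The same comparison also explains "Unexpected key(s)" errors, where the checkpoint holds parameters that the constructed model lacks.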

Windows version without Faiss Library

Hey,
This is amazing work and I really want to try it out, but I am using Windows and FAISS hasn't released a Windows version. Is there any way you could release a Windows version of this code that does not require the FAISS library?

Thanks.

About test on other datasets

First, thanks for the remarkable work. Your work has inspired me a lot.
Now, I want to know whether the model trained with this network can be used to test on the Oxford Buildings dataset or the Paris Buildings dataset? Or can you give me some hints? Looking forward to your reply :)

throw TypeError: 'TMPDIR' while running

Userwarning: loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1

After running:
$ python main.py --mode=train --arch=vgg16 --pooling=netvlad --num_clusters=64

and during training, I get this annoying warning:

/home/alijani/.conda/envs/py3_gpu/lib/python3.7/site-packages/sklearn/neighbors/_base.py:620: UserWarning: Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1

I tracked the warning, and it seems to happen in main.py in def train(epoch), somewhere inside for subIter in range(subsetN):

Is there any way I could get rid of this?
My system info:


Mon Feb  8 14:42:12 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:86:00.0 Off |                    0 |
| N/A   34C    P0    23W / 250W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

_CudaDeviceProperties(name='Tesla V100-PCIE-16GB', major=7, minor=0, total_memory=16160MB, multi_processor_count=80)
                                      Python-Pytorch Platforms                                      
------------------------------------------------------------------------------------------
python    filename                 last modified                 torch     torchvision
------------------------------------------------------------------------------------------
3.7.9     main.py                  Mon Feb  8 14:41:50 2021      1.4.0     0.5.0     
####################################################################################################

Cheers
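If the warning is only noise, one option is to filter it with the standard library's warnings module. The sketch below is a generic approach under that assumption and has not been tested against this repository; the function name is made up. Depending on how DataLoader worker processes are started, the filter may need to be installed inside each worker as well. Constructing the nearest-neighbour search with n_jobs=1 instead of n_jobs=-1 may also avoid triggering the warning in the first place.

```python
import warnings


def silence_loky_warning():
    """Ignore the joblib/loky n_jobs warning; other warnings are unaffected."""
    # `message` is a regular expression matched against the start of the text.
    warnings.filterwarnings(
        "ignore",
        message="Loky-backed parallel loops cannot be called",
        category=UserWarning,
    )
```

Calling silence_loky_warning() once near the top of the training script, before the data loaders are created, keeps unrelated warnings visible.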

Some questions about calculating cluster

First, thanks for the remarkable work.
My question is this: before training the model, I have to compute the clusters and save them. But I checked your code and found that clustering uses a model that has not been trained on this task yet, which means the NetVLAD part of the model is still uninitialised. I think the descriptors produced by this network are not reliable.

tokyo 24/7 performance

Thanks for the remarkable work.
Have you run the experiment on the Tokyo dataset?
It seems that the performance is much lower than the results in the paper.

Tokyo 24 / 7

Hi @Nanne and @saibr

I'm a bit lost with regard to evaluating on the Tokyo 24 / 7 dataset. It seems that the dataloader provided in this repo loads Tokyo TimeMachine instead of Tokyo 24 / 7? When I try to load the tokyo247.mat file provided at https://www.di.ens.fr/willow/research/netvlad/data/netvlad_v100_datasets.tar.gz I think I understand the first elements, as seen below:

import scipy.io
mat = scipy.io.loadmat('/tokyo247.mat')

matStruct = mat['dbStruct'].item()
whichSet = matStruct[0].item()
dbImage = [f[0].item() for f in matStruct[1]]
utmDb = matStruct[2].T

However, I don't really understand the remaining part of the dbStruct. E.g. which are my query images? And what is the ground truth for each query image?

Help would really be appreciated. Thank you!

Unexpected key(s) in state_dict: "pool.centroids", "pool.conv.weight".

When I run the cluster mode, I get the following problem:

Traceback (most recent call last):
File "main.py", line 486, in <module>
model.load_state_dict(checkpoint['state_dict'])
File "/home/wwu/anaconda3/envs/UWSOD/lib/python3.7/site-packages/torch/nn/modules/module.py", line 847, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Module:
Unexpected key(s) in state_dict: "pool.centroids", "pool.conv.weight".

I need help

About Tokyo dataset

Unlike for pitts30k, in the Tokyo dataset the query images are included in the gallery (in train and val set). To solve this issue, in the original matlab implementation the authors add a constraint to choose the positives for each query, which is that the positive should have a different date with respect to the query.
function posIDs= nontrivialPosDb(db, iDb)
[posIDs, dSq]= db.cp.getPosDbIDs(iDb);
posIDs= posIDs(dSq>1 & dSq<=db.nonTrivPosDistSqThr & db.dbTimeStamp(iDb)~=db.dbTimeStamp(posIDs) );
end

To solve this it is necessary to filter the nontrivial_positives in QueryDatasetFromStruct.__init__, so that positives with the same date as the query are removed. In the evaluation part, predictions with the same date as the query also need to be filtered out.
Hope this helps with the Tokyo issues
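The MATLAB condition quoted above can be translated to NumPy fairly directly. The sketch below is one possible translation, not code from the repository; the function and argument names are invented, and it assumes that squared distances and timestamps are available as arrays.

```python
import numpy as np


def nontrivial_positives(pos_ids, d_sq, query_timestamp, db_timestamps,
                         non_triv_pos_dist_sq_thr):
    """Keep positives that are not the query itself, lie within the
    nontrivial threshold, and were taken at a different time."""
    pos_ids = np.asarray(pos_ids)
    d_sq = np.asarray(d_sq)
    keep = (
        (d_sq > 1)                                    # excludes the query image itself
        & (d_sq <= non_triv_pos_dist_sq_thr)          # close enough to be a positive
        & (db_timestamps[pos_ids] != query_timestamp) # taken at a different time
    )
    return pos_ids[keep]


db_timestamps = np.array([5, 5, 6, 7])
kept = nontrivial_positives([0, 1, 2, 3], [0, 4, 50, 400], 5, db_timestamps, 100)
```

In the toy example only index 2 survives: index 0 is the query itself, index 1 shares the query's timestamp, and index 3 is too far away.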

Can I train this on my custom dataset?

how to get the pittsburgh250k dataset?

I achieved the performance of
Recall Scores:                                                                                           
  top-1          86.1%                                                                                     
  top-5          93.0%                                                                             
  top-10         95.0%
by training from conv3 of vgg16 with learning rate of 0.0001 and applying PCA+whitening followed by L2 normalization (as the original paper introduced) in the inference.

training and testing on the pitts-30k dataset.

Originally posted by @yxgeee in #8 (comment)

License

Hi @Nanne, what is the license of your code? Many thanks!

Why need 7 days to train on a single GPU

Caching the negative samples every 1000 iterations takes too much time. I don't know what the problem is on my side. Could you please share your training details with me, such as how long your training took?

Need I get a new cluster centroids when testing?

            if not opt.resume: 
                if opt.mode.lower() == 'train':
                    initcache = join(opt.dataPath, 'centroids', opt.arch + '_' + train_set.dataset + '_' + str(opt.num_clusters) +'_desc_cen.hdf5')
                else:
                    initcache = join(opt.dataPath, 'centroids', opt.arch + '_' + whole_test_set.dataset + '_' + str(opt.num_clusters) +'_desc_cen.hdf5')

From this part of the code, I can confirm that a specific desc_cen file is needed when testing, while the code only computes clusters for whole_train_set.

    elif opt.mode.lower() == 'cluster':
        print('===> Calculating descriptors and clusters')
        get_clusters(whole_train_set)

Does this mean that when I want the desc_cen for the test set, I need to change whole_train_set here to whole_test_set?

Questions about the main.py code.

When the code is in clustering mode, a layer called L2Norm() is appended to the existing vgg-16 or alexnet, and clustering is then run to obtain the cluster centroids. However, training mode does not seem to attach this layer. Within the NetVLAD layer, the forward() function starts with the following:

if self.normalize_input:
    x = F.normalize(x, p=2, dim=1) # across descriptor dim

Does this take over that role? And what is the purpose of this step?
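For reference, the quoted F.normalize call rescales every local descriptor (the channel vector at each spatial position) to unit L2 length. The NumPy sketch below reproduces that operation for a toy feature map so the effect can be inspected without PyTorch; the function name is illustrative and the tensor shape (N, C, H, W) is an assumption about the encoder output.

```python
import numpy as np


def l2_normalize(x, axis=1, eps=1e-12):
    """Divide by the L2 norm along `axis`, guarding against zero vectors."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


feats = np.random.default_rng(0).normal(size=(2, 8, 3, 3))  # (N, C, H, W)
normed = l2_normalize(feats, axis=1)
```

After this step every descriptor has length one, so distances to the centroids depend only on direction. That would be consistent with applying the same normalization to the descriptors that are clustered.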

Error with faiss kmeans argument

Running:
$ python main.py --mode=cluster --arch=vgg16 --pooling=netvlad --num_clusters=64
returns the error:

Traceback (most recent call last):
  File "main.py", line 478, in <module>
    get_clusters(whole_train_set)
  File "main.py", line 251, in get_clusters
    kmeans = faiss.Kmeans(encoder_dim, opt.num_clusters, niter, verbose=False) # ERROR!!!!!!!!!!!
TypeError: __init__() takes 3 positional arguments but 4 were given

In main.py, inside def get_clusters(cluster_set), one can replace
kmeans = faiss.Kmeans(encoder_dim, opt.num_clusters, niter, verbose=False)
with
kmeans = faiss.Kmeans(encoder_dim, opt.num_clusters, niter=niter, verbose=False)
to get rid of that error!
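The error message is what Python produces when a constructor accepts only two positional parameters besides self and takes everything else as keyword arguments. The stand-in class below mimics a signature of that form to show why passing niter by keyword resolves it. It is not the Faiss class, and its default values are arbitrary placeholders.

```python
class KmeansLike:
    """Stand-in with a signature of the form (d, k, **kwargs); not the Faiss class."""

    def __init__(self, d, k, **kwargs):
        self.d, self.k = d, k
        self.niter = kwargs.get("niter", 25)        # placeholder default
        self.verbose = kwargs.get("verbose", False)


try:
    KmeansLike(512, 64, 100, verbose=False)  # niter passed positionally
except TypeError as err:
    positional_error = str(err)

kmeans = KmeansLike(512, 64, niter=100, verbose=False)  # niter as keyword
```

The first call fails with a "takes 3 positional arguments but 4 were given" style message, while the second succeeds.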

Pretrained recalls

I ran some tests (without performing any training) just by using the pretrained nets + max pool, and the recalls are always 5 to 10 points lower compared to off-the-shelf results on the netvlad paper. Do you think it is only because of the pretrained net, or do you know of any other difference with the original matlab code from the authors?

When I train the model, the following is printed out.

I ran the code below on Google colab.

python main.py --mode=train --arch=vgg16 --pooling=netvlad --num_clusters=64

After a while, the following output appeared.

The first is

/usr/local/lib/python3.6/dist-packages/sklearn/neighbors/_base.py:621: UserWarning: Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1
  n_jobs = effective_n_jobs(self.n_jobs)

The second is
[ True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True True]

The second output is not always exactly like this, but it always has a similar form.

I wonder whether this output, printed during training, is a symptom of something that leads to wrong results.

What are the versions of these packages?

What remarkable work! But when I run the code, it always fails with an "Out of Memory" error.

RuntimeError: CUDA error: out of memory

I guess it is a version problem. It also shows the following warning.

UserWarning: Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1
  n_jobs = effective_n_jobs(self.n_jobs)

Can you tell me which versions of the following packages you used, or give me some advice on how to solve the problem?

PyTorch
Faiss
scipy
numpy
sklearn
h5py
tensorboardX

Sorry for another question!

The Pittsburgh dataset I got contains directories 000 to 010, but queries_real contains only the 9 subdirectories 000 to 008.
I think that is what causes the error.
Is it possible to run without the data in folders 009 and 010?
Could you please give me some hints about the error?
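
A quick way to confirm which folders are missing before running main.py is a check like the one below. It is only a diagnostic sketch: missing_subdirs is a hypothetical helper, and the demonstration builds a temporary directory that mimics a queries_real folder with only 000 to 008 instead of touching the real dataset path.

```python
import tempfile
from pathlib import Path

def missing_subdirs(root, first=0, last=10):
    """Return the zero-padded subdirectory names (e.g. '009') that are absent under root."""
    expected = [f"{i:03d}" for i in range(first, last + 1)]
    return [name for name in expected if not (Path(root) / name).is_dir()]

# Build a throwaway queries_real folder containing only 000..008.
with tempfile.TemporaryDirectory() as tmp:
    queries_root = Path(tmp) / "queries_real"
    for i in range(9):
        (queries_root / f"{i:03d}").mkdir(parents=True)
    missing = missing_subdirs(queries_root)
    print(missing)
```

Pointing the same helper at the real queries_real directory would show whether 009 and 010 are the only gaps.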

Question about the input normalization parameters (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

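import torchvision.transforms as transforms

def input_transform():
    # Convert the image to a tensor, then normalize each channel with fixed statistics.
    return transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])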

I see that the normalization in this transform is the same for both the Pittsburgh and Tokyo datasets. Why can different datasets use the same normalization parameters here?

If I use grey-scale images (converted to 3-channel RGB by copying the single channel), how should I choose or calculate the normalization parameters?
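
These mean and std values match the ImageNet statistics commonly used with ImageNet-pretrained backbones, which would explain why they do not change per dataset. Assuming the pretrained weights are kept, one option (not a recommendation confirmed by the author) is to copy the grey channel three times and reuse the same statistics. The plain-Python sketch below shows the per-channel arithmetic, (x - mean) / std, for one pixel; grey_to_normalized_rgb is a hypothetical helper.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def grey_to_normalized_rgb(grey_value):
    """Replicate one grey pixel (in the range 0..1) into 3 channels,
    then apply (x - mean) / std to each channel."""
    rgb = [grey_value] * 3
    return [(x - m) / s for x, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]

# A mid-grey pixel ends up with slightly different values per channel,
# because each channel is shifted and scaled by its own statistics.
pixel = grey_to_normalized_rgb(0.5)
print(pixel)
```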

Question about checkpoint

Is the checkpoint you trained yourself close to 'vd16_pitts30k_conv5_3_vlad_preL2_intra_white.mat', which was published by the authors of NetVLAD?

If not, do you think I can use your checkpoint as a starting point for training a model on my own dataset, which consists of grey-scale images?

Thanks for replying every time.

Could you release the Pittsburgh dataset?

How to use multiple GPUs

Do you have any advice on how to use multiple GPUs?

I tried --nGPU=8 (>1), but then the pooling layer fails because its input is missing:

Traceback (most recent call last):
  File "main.py", line 537, in <module>
    train(epoch)
  File "main.py", line 138, in train
    vlad_encoding = model.pool(image_encoding)
  File "lib64/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "lib64/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'x'

The error occurs in the middle of training.
I have observed that --batchSize=1 always triggers it.
--batchSize=2 or --batchSize=8 works fine at first.
However, after some epochs, the number of violating negatives (at most 10) drops,
and that seems to trigger the error.

One interesting point is that the error does not occur in the base model (e.g., vgg16).