kaiyangzhou / deep-person-reid
Torchreid: Deep learning person re-identification in PyTorch.
Home Page: https://kaiyangzhou.github.io/deep-person-reid/
License: MIT License
==> Test
Extracted features for query set, obtained 3368-by-2048 matrix
Extracted features for gallery set, obtained 15913-by-2048 matrix
==> BatchTime(s)/BatchSize(img): 0.014/64
Segmentation fault (core dumped)
This problem occurs during testing. Thanks.
In Table 1 of the paper, we can see the structure of the three parts of layers. There are some differences between your code and the paper. In Multi-scale-A, streams 3 and 4 seem to be wrong in the paper, because the size of the last layer's output does not match the size of the next layer's input. What do you think about it?
@luzai @KaiyangZhou Hi, I took the default DukeMTMC-VideoReID dataset and trained on it, but where does the model get saved in the end? Should I create a directory called save-model and pass it as an argument? Can you please share the command/process to view the saved model file?
Thanks for your repo!
Ref. commit 0adb4f2. I did not try multiprocess.Pool, but I tried Cython, and it speeds things up: the time for the function eval_market1501 on the Market1501 dataset drops from 163.857s to 7.393s.
The modifications are mainly: eval_market1501_wrap in eval.pyx, and setup.py to compile it.
eval.pyx may seem lengthy, but I ran some basic tests, and it computes approximately the same mAP and CMC as the original eval_market1501 function. I am sorry that it still seems to have some precision problems; I will try to fix them as soon as possible.
# cython: boundscheck=False, wraparound=False, nonecheck=False, cdivision=True
import numpy as np
cimport cython

cpdef eval_market1501_wrap(distmat,
                           q_pids,
                           g_pids,
                           q_camids,
                           g_camids,
                           max_rank):
    distmat = np.asarray(distmat, dtype=np.float32)
    q_pids = np.asarray(q_pids, dtype=np.int64)
    g_pids = np.asarray(g_pids, dtype=np.int64)
    q_camids = np.asarray(q_camids, dtype=np.int64)
    g_camids = np.asarray(g_camids, dtype=np.int64)
    return eval_market1501(distmat, q_pids, g_pids, q_camids, g_camids, max_rank)

cpdef eval_market1501(
        float[:,:] distmat,
        long[:] q_pids,
        long[:] g_pids,
        long[:] q_camids,
        long[:] g_camids,
        long max_rank,
):
    cdef:
        long num_q = distmat.shape[0], num_g = distmat.shape[1]
    if num_g < max_rank:
        max_rank = num_g
        print("Note: number of gallery samples is quite small, got {}".format(num_g))
    cdef:
        long[:,:] indices = np.argsort(distmat, axis=1)
        long[:,:] matches = (np.asarray(g_pids)[np.asarray(indices)] == np.asarray(q_pids)[:, np.newaxis]).astype(np.int64)
        float[:,:] all_cmc = np.zeros((num_q, max_rank), dtype=np.float32)
        float[:] all_AP = np.zeros(num_q, dtype=np.float32)
        long q_pid, q_camid
        long[:] order = np.zeros(num_g, dtype=np.int64), keep = np.zeros(num_g, dtype=np.int64)
        long num_valid_q = 0, q_idx, idx
        float[:] orig_cmc = np.zeros(num_g, dtype=np.float32)
        float[:] cmc = np.zeros(num_g, dtype=np.float32), tmp_cmc = np.zeros(num_g, dtype=np.float32)
        long num_orig_cmc = 0  # valid size of orig_cmc, cmc and tmp_cmc
        float num_rel = 0.
        float tmp_cmc_sum = 0.
        unsigned int orig_cmc_flag = 0

    for q_idx in range(num_q):
        # get query pid and camid
        q_pid = q_pids[q_idx]
        q_camid = q_camids[q_idx]

        # remove gallery samples that have the same pid and camid as the query
        order = indices[q_idx]
        for idx in range(num_g):
            keep[idx] = (g_pids[order[idx]] != q_pid) | (g_camids[order[idx]] != q_camid)

        # compute cmc curve
        num_orig_cmc = 0
        orig_cmc_flag = 0
        for idx in range(num_g):
            if keep[idx]:
                orig_cmc[num_orig_cmc] = matches[q_idx, idx]
                num_orig_cmc += 1
                orig_cmc_flag = 1
        if not orig_cmc_flag:
            # this condition is true when the query identity does not appear in the gallery
            continue
        my_cusum(orig_cmc, cmc, num_orig_cmc)
        for idx in range(num_orig_cmc):
            if cmc[idx] > 1:
                cmc[idx] = 1
        all_cmc[q_idx] = cmc[:max_rank]
        num_valid_q += 1

        # compute average precision
        # reference: https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Average_precision
        num_rel = 0.
        for idx in range(num_orig_cmc):
            num_rel += orig_cmc[idx]
        my_cusum(orig_cmc, tmp_cmc, num_orig_cmc)
        for idx in range(num_orig_cmc):
            tmp_cmc[idx] = tmp_cmc[idx] / (idx + 1.) * orig_cmc[idx]
        tmp_cmc_sum = my_sum(tmp_cmc, num_orig_cmc)
        if num_rel < 1e-32:
            all_AP[q_idx] = 0
        else:
            all_AP[q_idx] = tmp_cmc_sum / num_rel

    assert num_valid_q > 0, "Error: all query identities do not appear in gallery"
    return np.mean(all_AP), np.asarray(all_cmc).astype(np.float32).sum(axis=0) / num_valid_q

cpdef void my_cusum(
        cython.numeric[:] src,
        cython.numeric[:] dst,
        long size
) nogil:
    cdef:
        long idx
    for idx in range(size):
        if idx == 0:
            dst[idx] = src[idx]
        else:
            dst[idx] = src[idx] + dst[idx - 1]

cpdef cython.numeric my_sum(
        cython.numeric[:] src,
        long size
) nogil:
    cdef:
        long idx
        cython.numeric ttl = 0
    for idx in range(size):
        ttl += src[idx]
    return ttl
import numpy as np
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

numpy_include = np.get_include()

ext_modules = [
    Extension("cython_eval",
              ["eval.pyx"],
              libraries=["m"],
              include_dirs=[numpy_include],
              extra_compile_args=["-ffast-math", "-Wno-cpp", "-Wno-unused-function"]),
]

setup(
    name='lib',
    cmdclass={"build_ext": build_ext},
    ext_modules=ext_modules)
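As a sanity check on the precision issue mentioned above, the per-query AP loop in eval.pyx can be compared against a plain NumPy reference. This is a hypothetical helper, not part of the repo; orig_cmc stands for the binary match vector over the kept gallery entries:

```python
import numpy as np

# Vectorized reference for the per-query average precision computed in eval.pyx:
# precision at each rank, masked to the relevant positions, averaged over the
# number of relevant gallery items.
def average_precision(orig_cmc):
    orig_cmc = np.asarray(orig_cmc, dtype=np.float64)
    num_rel = orig_cmc.sum()
    if num_rel < 1e-32:
        return 0.0
    tmp_cmc = orig_cmc.cumsum() / np.arange(1, len(orig_cmc) + 1) * orig_cmc
    return float(tmp_cmc.sum() / num_rel)
```

Comparing this against the Cython output on a handful of queries should localize where the precision drift comes from (e.g. float32 accumulation in the typed buffers).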
Hi! Thanks for the evaluation baseline.
Can we use cuhk03_eval to evaluate the VIPeR dataset?
Thanks.
Currently using GPU 0
Initializing dataset prid
Initializing model: resnet50
Model size: 23.69039M
Loading checkpoint from 'saved-models/resnet50_xent_prid.pth.tar'
Evaluate only
Extracted features for query set, obtained 89-by-2048 matrix
Traceback (most recent call last):
File "train_vid_model_xent.py", line 269, in <module>
main()
File "train_vid_model_xent.py", line 150, in main
test(model, queryloader, galleryloader, args.pool, use_gpu)
File "train_vid_model_xent.py", line 232, in test
imgs = imgs.view(b*s, c, h, w)
RuntimeError: invalid argument 2: size '[15 x 3 x 256 x 128]' is invalid for input with 2949120 elements at /pytorch/torch/lib/TH/THStorage.c:41
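The numbers in the error are self-consistent, which points at the cause: the flattened batch holds more frames than the view() call accounts for. A small diagnostic sketch (the 15 in the message is b*s, i.e. the batch-size/seq-len split, which evidently does not match the loaded clip length):

```python
# The tensor has 2,949,120 values; one 3-channel 256x128 frame has 98,304,
# so the data actually contains 30 frames while view() asked for only 15.
elements = 2949120
frame_numel = 3 * 256 * 128            # values per frame
actual_frames = elements // frame_numel  # frames really present in the batch
requested = 15 * frame_numel             # what imgs.view(b*s, c, h, w) asked for
```

So the batch/seq_len product passed to view() is half the number of frames actually loaded; checking how b and s are computed for this dataset should reveal the mismatch.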
Thank you for providing the code; I learned a lot from it. However, I ran into some problems when training HACNN with xent loss. I set 120 epochs, and the parameters are:
--height 160 --width 64 --max-epoch 120 --train-batch 32 --test-batch 32 --stepsize 20 --eval-step 20
However, the Rank-1 result on Market-1501 is only 82.1% with 61.2% mAP, which is worse than your reported results. When I tried HACNN with xent+htri, the results at epoch 150 were only 57.2% Rank-1 with 35.6% mAP. I don't know whether the hyperparameters are set correctly. Could you tell me how you set the parameters when training HACNN?
Thanks!
When I try to run the example as specified in the README, like this:
python train_imgreid_xent.py -d market1501 -a resnet50 --evaluate --resume saved-models/resnet50_xent_market1501.pth.tar --save-dir log/resnet50-xent-market1501 --test-batch 100 --gpu-devices 0
I get the following error:
Warning: Cython evaluation is UNAVAILABLE
==========
Args:Namespace(arch='resnet50', cuhk03_classic_split=False, cuhk03_labeled=False, dataset='market1501', eval_step=-1, evaluate=True, fixbase_epoch=0, fixbase_lr=0.0003, freeze_bn=False, gamma=0.1, gpu_devices='0', height=256, load_weights='', lr=0.0003, max_epoch=60, optim='adam', print_freq=10, resume='saved-models/resnet50_xent_market1501.pth.tar', root='data', save_dir='log/resnet50-xent-market1501', seed=1, split_id=0, start_epoch=0, start_eval=0, stepsize=[20, 40], test_batch=100, train_batch=32, use_cpu=False, use_lmdb=False, use_metric_cuhk03=False, vis_ranked_res=False, weight_decay=0.0005, width=128, workers=4)
==========
Currently using GPU 0
Initializing dataset market1501
=> Market1501 loaded
Dataset statistics:
------------------------------
subset | # ids | # images
------------------------------
train | 751 | 12936
query | 750 | 3368
gallery | 751 | 15913
------------------------------
total | 1501 | 32217
------------------------------
Initializing model: resnet50
Model size: 23.508 M
Traceback (most recent call last):
File "train_imgreid_xent.py", line 379, in <module>
main()
File "train_imgreid_xent.py", line 196, in main
checkpoint = torch.load(args.resume)
File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 303, in load
return _load(f, map_location, pickle_module)
File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 469, in _load
result = unpickler.load()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa9 in position 1: ordinal not in range(128)
It appears that something is wrong with the reading of the saved model.
A Google search yields this SO answer; however, it is only valid for Python 2, not the Python 3 I am using. (I tried it, but it did not work.)
Is it possible that the code is not Python 3 compatible? Or is there something else?
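The checkpoint itself is fine; it was pickled under Python 2, whose str objects are raw bytes, and Python 3's unpickler decodes them as ASCII by default. A minimal sketch of the failure and the usual workaround, using a hand-built Python 2-style pickle (in recent PyTorch versions the same keyword can be forwarded as torch.load(path, encoding='latin1'); whether your torch build accepts it is version-dependent):

```python
import pickle

# A protocol-2 pickle of a Python-2 str containing the bytes 0xa9 0xaa,
# i.e. the same kind of payload that triggers the UnicodeDecodeError above.
PY2_PICKLE = b"\x80\x02U\x02\xa9\xaaq\x00."

def load_py2_pickle(data):
    # encoding='latin1' maps every Python-2 str byte to a character one-to-one,
    # so old byte-string payloads unpickle without an ASCII decode error.
    return pickle.loads(data, encoding="latin1")
```

If your PyTorch version does not forward the encoding argument, re-saving the checkpoint from a Python 2 environment as a plain state_dict also sidesteps the issue.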
Hello, following your environment (PyTorch 0.4.0, torchvision 0.2.1, Python 2) and method, I reproduced the cross entropy loss + triplet loss re-id on the market1501 dataset, but the final results are only:
Single GPU: Rank-1/Rank-5/Rank-10: 81.0/92.6/95.4, mAP: 63.1
Multi (4) GPU: Rank-1/Rank-5/Rank-10: 43.2/66.7/75.8, mAP: 24.2
Reproducing cross entropy loss alone reaches your results normally; the situation above only appears after adding triplet loss. Testing with the pretrained model you provide also reaches the normal Rank-1 of 87.
Where might the problem be?
Thanks
Hello, I set the number of epochs to 30, but it stops after running 20 epochs (on the market1501 dataset as well). Is there any problem with the following commands?
The error hints are as follows:
Epoch: [20][3970/3983] Time 0.359 (0.362) Data 0.009 (0.008) Loss 1.0619 (1.0747)
Epoch: [20][3980/3983] Time 0.341 (0.362) Data 0.008 (0.008) Loss 1.0882 (1.0747)
==> Test
Extracted features for query set, obtained 1980-by-2048 matrix
Extracted features for gallery set, obtained 9330-by-2048 matrix
==> BatchTime(s)/BatchSize(img): 0.109/960
[3]+ Killed
The commands are as follows:
python train_img_model_xent.py -d market1501 -a resnet50 --max-epoch 30 --train-batch 128 --test-batch 64 --stepsize 20 --eval-step 20 --save-dir log/resnet50-xent-market1501 --gpu-devices 3
python train_vid_model_xent.py -d mars -a resnet50 --max-epoch 30 --train-batch 128 --test-batch 64 --stepsize 20 --eval-step 20 --save-dir log/resnet50-xent-mars --gpu-devices 0
Dear author, thanks a lot for your excellent work. If I may, can I ask you to provide the details of your hyperparameter settings? For example, I am training MobileNetV2 on Market1501: when I use the default settings, such as lr=0.0003, training only reaches 11.9% accuracy. With lr=0.01, decayed by 0.1 every 20 epochs, training accuracy hits 98%, but the test performance is 68.1%/85.5%/90.4% (Rank-1/5/10) with 44.6% mAP, which is much lower than your results. Can you point out what's wrong? FYI: I didn't use cross entropy with label smoothing; I used standard cross entropy. Will this have an impact?
When I run HACNN on Duke MTMC using
python train_vidreid_xent.py -d dukemtmcvidreid -a hacnn --evaluate --resume saved-models/hacnn_xent_dukemtmcreid.pth.tar --save-dir log/resnet50-xent-dukemtmc --test-batch 2 --gpu-devices 0
I get the following error:
Warning: Cython evaluation is UNAVAILABLE
==========
Args:Namespace(arch='hacnn', dataset='dukemtmcvidreid', eval_step=-1, evaluate=True, fixbase_epoch=0, fixbase_lr=0.0003, freeze_bn=False, gamma=0.1, gpu_devices='0', height=256, label_smooth=False, load_weights='', lr=0.0003, max_epoch=15, optim='adam', pool='avg', print_freq=10, resume='saved-models/hacnn_xent_dukemtmcreid.pth.tar', root='data', save_dir='log/resnet50-xent-dukemtmc', seed=1, seq_len=15, start_epoch=0, start_eval=0, stepsize=[20, 40], test_batch=2, train_batch=32, use_cpu=False, vis_ranked_res=False, weight_decay=0.0005, width=128, workers=4)
==========
Currently using GPU 0
Initializing dataset dukemtmcvidreid
This dataset has been downloaded.
Note: if root path is changed, the previously generated json files need to be re-generated (so delete them first)
=> Automatically generating split (might take a while for the first time, have a coffe)
Processing data/dukemtmc-vidreid/DukeMTMC-VideoReID/train with 702 person identities
Saving split to data/dukemtmc-vidreid/split_train.json
=> Automatically generating split (might take a while for the first time, have a coffe)
Processing data/dukemtmc-vidreid/DukeMTMC-VideoReID/query with 702 person identities
Saving split to data/dukemtmc-vidreid/split_query.json
=> Automatically generating split (might take a while for the first time, have a coffe)
Processing data/dukemtmc-vidreid/DukeMTMC-VideoReID/gallery with 1110 person identities
Warn: index name F0001 in data/dukemtmc-vidreid/DukeMTMC-VideoReID/gallery/0002/2197 is missing, jump to next
Saving split to data/dukemtmc-vidreid/split_gallery.json
=> DukeMTMC-VideoReID loaded
Dataset statistics:
------------------------------
subset | # ids | # tracklets
------------------------------
train | 702 | 2196
query | 702 | 702
gallery | 1110 | 2636
------------------------------
total | 1404 | 5534
number of images per tracklet: 1 ~ 9324, average 167.6
------------------------------
Initializing model: hacnn
Model size: 3.649 M
Loaded checkpoint from 'saved-models/hacnn_xent_dukemtmcreid.pth.tar'
- start_epoch: 299
- rank1: 0.8070017695426941
Evaluate only
Traceback (most recent call last):
File "train_vidreid_xent.py", line 395, in <module>
main()
File "train_vidreid_xent.py", line 216, in main
distmat = test(model, queryloader, galleryloader, args.pool, use_gpu, return_distmat=True)
File "train_vidreid_xent.py", line 328, in test
features = model(imgs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 112, in forward
return self.module(*inputs[0], **kwargs[0])
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/data/rooijenalv/Projects/gitlab-dv/projects/fietsreid/deployment/external/deep-person-reid/torchreid/models/hacnn.py", line 283, in forward
"Input size does not match, expected (160, 64) but got ({}, {})".format(x.size(2), x.size(3))
AssertionError: Input size does not match, expected (160, 64) but got (256, 128)
How can I fix this?
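The checkpoint was trained at 160x64 while the evaluation command loads data at the 256x128 default, so the fix is to evaluate at the model's input size (the --height 160 --width 64 flags appear in an earlier issue above). A minimal sketch mirroring the assert in torchreid/models/hacnn.py:

```python
# HACNN is hard-wired for 160x64 inputs; this reproduces the check that fires
# in hacnn.py when the data loader feeds it 256x128 images.
def check_hacnn_input(height, width):
    assert (height, width) == (160, 64), \
        "Input size does not match, expected (160, 64) but got ({}, {})".format(height, width)
    return True
```

Re-running the evaluation command with --height 160 --width 64 should therefore make the forward pass succeed.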
@luzai @KaiyangZhou Hi, I just wanted to know how to test the trained model on a video. Since the trained model has the extension "pth.tar", is there any other method we can use to test it?
Please share the command. Thank you.
I don't understand how you generate triplet images to train the model. It seems that you rewrote the sampler class. But why did you set num_instances to 4? Thanks for your answer.
Is there a way to prepare our own dataset and test the models on it? Thanks
I've met python error:
==> Start training
Traceback (most recent call last):
File "train_img_model_xent.py", line 299, in <module>
main()
File "train_img_model_xent.py", line 167, in main
train(epoch, model, criterion, optimizer, trainloader, use_gpu)
File "train_img_model_xent.py", line 212, in train
outputs = model(imgs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 357, in call
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 69, in forward
inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 80, in scatter
return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/scatter_gather.py", line 38, in scatter_kwargs
inputs = scatter(inputs, target_gpus, dim) if inputs else []
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/scatter_gather.py", line 31, in scatter
return scatter_map(inputs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/scatter_gather.py", line 18, in scatter_map
return list(zip(*map(scatter_map, obj)))
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/scatter_gather.py", line 16, in scatter_map
assert not torch.is_tensor(obj), "Tensors not supported in scatter."
AssertionError: Tensors not supported in scatter.
I encounter the following error:
TypeError: test() missing 1 required positional argument: 'use_gpu'
when running:
python train_vidreid_xent.py -d dukemtmcvidreid -a hacnn --evaluate --resume saved-models/hacnn_xent_dukemtmcreid.pth.tar --save-dir log/resnet50-xent-dukemtmc --test-batch 2 --gpu-devices 0
Changing line 214 in file train_vidreid_xent.py from:
distmat = test(model, queryloader, galleryloader, use_gpu, return_distmat=True)
to
distmat = test(model, queryloader, galleryloader, 'avg', use_gpu, return_distmat=True)
or to
distmat = test(model, queryloader, galleryloader, 'max', use_gpu, return_distmat=True)
Seems to solve it.
Perhaps the pool param was forgotten?
The HACNN download link is unavailable on the benchmark page.
deep-person-reid/eval_metrics.py
Line 25 in b19c5a3
Traceback (most recent call last):
File "eval_metrics.py", line 25, in eval_cuhk03
matches = (g_pids[indices] == q_pids[:, np.newaxis]).astype(np.int32)
IndexError: index 2 is out of bounds for axis 0 with size 1
Kindly check this
Hi, my understanding of samplers.py is that you are trying to use every image in data_source, only limiting the identities in each batch.
So I was wondering whether there is a problem in train_vid_xent_htri.py when it defines RandomIdentitySampler for the trainloader:
RandomIdentitySampler(dataset.train ...
Should it be:
RandomIdentitySampler(new_train, ...
Thanks very much for your excellent work! It helps me a lot!
I tried to run the test on the MARS dataset with the provided trained model densenet121_xent_htri_mars.pth.tar, using this command:
python train_vid_model_xent_htri.py -d mars -a densenet121 --evaluate --resume saved-models/densenet121_xent_htri_mars.pth.tar --save-dir log/densenet121-xent-htri-mars --test-batch 2
And I got the following error info:
RuntimeError: Error(s) in loading state_dict for DenseNet121:
Missing key(s) in state_dict: "base.denseblock1.denselayer1.norm1.running_var",
...
(for detailed info, please refer to the console log)
1) Although I had already put the provided model file in the directory deep-person-reid/saved-models/, the console log shows that PyTorch still automatically downloaded a pre-trained model from "https://download.pytorch.org/models/densenet121-a639ec97.pth" to /home/user/.torch/models/densenet121-a639ec97.pth. After the download completed, PyTorch loaded the provided model densenet121_xent_htri_mars.pth.tar (see the log for details).
It seems that the provided model densenet121_xent_htri_mars.pth.tar, which contains the model's parameters ONLY, is not consistent with the model auto-downloaded from download.pytorch.org.
2) Why did PyTorch auto-download a pre-trained model from download.pytorch.org before loading densenet121_xent_htri_mars.pth.tar? Because in the __init__ function of DenseNet.py, "pretrained" is set to True, on this line:
densenet121 = torchvision.models.densenet121(pretrained=True)
3) I tried to run the test on the MARS dataset with the ResNet model (resnet50_xent_mars.pth.tar). PyTorch again automatically downloaded a pre-trained model from download.pytorch.org, but this time NO runtime error occurred. Command:
python train_vid_model_xent.py -d mars -a resnet50 --evaluate --resume saved-models/resnet50_xent_mars.pth.tar --save-dir log/resnet50-xent-mars --test-batch 2
I have found a similar issue (#4), but it doesn't solve this problem.
Does anyone know how to solve it? Thanks in advance.
May 30, 2018
File Name : densenet121_xent_htri_mars.pth.tar
URL : http://www.eecs.qmul.ac.uk/~kz303/deep-person-reid/model-zoo/video-models/densenet121_xent_htri_mars.pth.tar
MD5 : 544FFC7520B5719B2F63CAA44F412F49
Ubuntu 14.04 x64
Anaconda 2 4.4.0 x86_64
Python 2.7.13
PyTorch 0.4.0
torchvision-cpu 0.2.1
use CPU only (no CUDA installed)
Because the result of my training is always about 3% worse than the result you provided, I guess it's down to the training parameters. My training parameters are as follows:
python train_vid_model_xent_htri.py -d mars -a resnet50m --max-epoch 500 --train-batch 128 --test-batch 32 --stepsize 200 --eval-step 20 --save-dir log/resnet50m-xent-htri-mars --gpu-devices 0
Hello, why is the gap in len(trainloader) between xent and xent+htri so large? The former is 3983 while the latter is only 19, and train_batch is 128 for both. Is that normal? If so, can you explain why?
Dear author.
Can you publish the hyperparameters you used to get 90.7 | 97.0 | 98.2 | 76.8 (Rank-1, Rank-5, Rank-10, mAP) on market1501? I have tried the default configuration and only got 81.4 | 92.5 | 95.4 | 62.7; it gets stuck at that point.
Thanks a lot for the great work! :)
Thank you for your great work on person re-identification. The table makes it clear which model can achieve what performance. I want to ask: have you kept timing logs for these models? Could you share them (for example, forward time per batch size)?
Thank you again for your great work.
Yours,
Hao
densenet121_xent_htri_mars.pth.tar was not found on this server
The link for "densenet121_xent_htri_mars.pth.tar" is broken.
Hi!
I trained your hacnn architecture and ended up with:
Rank-1 : 86.1%
Rank-5 : 94.7%
Rank-10 : 96.6%
Rank-20 : 97.8%
mAP: 67.3%
which differ from the original results reported in the paper: 91.2 for Rank-1 and 75.7 for mAP.
Please, I want to make sure that this is the expected result and that you are getting the same.
Thanks
Just wondering, has anyone tried? Do you think it would be useful to try it or the results will be awful?
Is there a validation set used for choosing the best model before testing the accuracy on the test set?
From what I see from the code, the model with Best Rank 1 is chosen based on test set result. Won't this mean that the Best Rank 1 result is overfitting on the test set?
Hello @KaiyangZhou,
Thank you for this repo. I get an error while trying to evaluate some of the pretrained models. Here are the details:
This is my first time trying this repo. In a clean conda python2.7 environment, I installed PyTorch and torchvision, and the installations seem fine. I prepared the Market1501 dataset and downloaded some pretrained model weights as described in the readme. Then I tried to evaluate the ResNet50M xent+htri pretrained model for Market1501. I used the command as:
python train_img_model_xent_htri.py -d market1501 -a resnet50m --evaluate --resume saved-models/resnet50m_xent_htri_market1501.pth.tar --save-dir log/resnet50m-xenthtri-market1501 --test-batch 32
I got the following error logs:
==========
Args:Namespace(arch='resnet50m', cuhk03_classic_split=False, cuhk03_labeled=False, dataset='market1501', eval_step=-1, evaluate=True, gamma=0.1, gpu_devices='0', height=256, htri_only=False, lr=0.0003, margin=0.3, max_epoch=180, num_instances=4, optim='adam', print_freq=10, resume='saved-models/resnet50m_xent_htri_market1501.pth.tar', root='data', save_dir='log/resnet50m-xenthtri-market1501', seed=1, split_id=0, start_epoch=0, stepsize=60, test_batch=32, train_batch=32, use_cpu=False, use_metric_cuhk03=False, weight_decay=0.0005, width=128, workers=4)
==========
Currently using GPU 0
Initializing dataset market1501
=> Market1501 loaded
Dataset statistics:
------------------------------
subset | # ids | # images
------------------------------
train | 751 | 12936
query | 750 | 3368
gallery | 751 | 15913
------------------------------
total | 1501 | 32217
------------------------------
Traceback (most recent call last):
File "train_img_model_xent_htri.py", line 295, in <module>
main()
File "train_img_model_xent_htri.py", line 115, in main
T.Resize((args.height, args.width)),
AttributeError: 'module' object has no attribute 'Resize'
I thought this might be a problem with torchvision, but the latest version of it (0.2.1) is installed. What am I missing here? Can you give some guidance?
I am trying to reproduce the HACNN result. I downloaded the CUHK03 dataset and processed it as described in the README. However, when I apply the HACNN train command described in the README, I get a Python error.
I am a beginner in ReID. I am running the repo on my personal AWS K80 GPU Ubuntu 16.04 LTS server, with Python 3.6.4 in Anaconda3, torch 0.4.0, and torchvision 0.2.1.
Where do you think the problem is? Can you give me some hints, based on the capture image above?
Thanks.
Hi, I used the default settings when training resnet50 with cross-entropy-label-smooth loss on Market1501, i.e. --max-epoch 60 --stepsize 20 40, and got mAP 67.3, Rank-1 85.2. I then tried --max-epoch 180 --stepsize 60, but my results still seem to be much worse than yours.
Thank you very much.
@luzai @KaiyangZhou Hi, what is the command to perform incremental training? Let's say I trained my net for 20 epochs and got the weights; now I want to train it for another 30 epochs, resuming from the 20th epoch. How can we do that?
Hi,
I was able to run the testing command without any errors, but how do I visualise the results? How can I see the performance on our own videos or real-time video streams? Where can I find the test results for market1501 tested with the resnet50 model?
Any help would be appreciated !!
Thanks.
Thanks for providing the elegant code.
When I trained densenet121 with xent+htri loss, I set 80 epochs.
I trained it three times but did not get good results:
batch size = 32, epoch = 80: Rank-1 = 60.6%
batch size = 16, epoch = 80: Rank-1 = 61.2%
batch size = 48, epoch = 60: Rank-1 = 58.4%
I don't know why my results are not as good as yours. Can you tell me how you set the parameters when training densenet121?
Thanks
Hello, thank you for your code; it has helped a beginner like me a lot.
May I ask what method is generally used to visualize feature maps in papers such as HACNN?
Hi, I'd like to use your pre-trained models to finetune a Re-ID model, but I can't extract the *.tar files you uploaded. Is anything wrong? Looking forward to your reply, thanks.
Hi, I noticed that all the models on the main page are gone. Did you move them to another server? Are they available? Thank you.
I want to train other nets with weights pretrained on ImageNet; how can I get them?
@luzai @KaiyangZhou Hi, first of all, thanks for the wonderful code. I have a few queries about training on the DukeMTMC-VideoReID dataset:
1. When I download the dataset from the source, I only have train, gallery, and video folders; there is no .json file. Can you share how to generate the .json file? It gives me this error:
camid = int(img_name[5]) - 1 # index-0
ValueError: invalid literal for int() with base 10: 'C'
Can you please help me solve this issue?
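The error shows the fixed-position parse int(img_name[5]) landing on the letter 'C' rather than the camera digit, i.e. this download's filenames are laid out differently from what the loader expects. A hedged workaround sketch (parse_camid is a hypothetical helper, and the exact filename layout is an assumption to verify against your files):

```python
import re

# Parse the camera id by searching for the 'C<digits>' token instead of a
# fixed character position, so small filename-layout differences don't break.
def parse_camid(img_name):
    m = re.search(r'C(\d+)', img_name)
    if m is None:
        raise ValueError("no camera id found in {}".format(img_name))
    return int(m.group(1)) - 1  # index-0, matching the repo's convention
```

Printing one offending img_name first will confirm where the 'C' token actually sits before patching the dataset loader.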
I ran this code (train_img_model_xent_htri.py) on Ubuntu 16.04 with Python 3.5 and got this problem:
Loading checkpoint from 'saved-models/resnet50_xent_htri_market1501.pth.tar'
Traceback (most recent call last):
File "train_img_model_xent_htri.py", line 290, in <module>
main()
File "train_img_model_xent_htri.py", line 155, in main
model.load_state_dict(checkpoint['state_dict'])
KeyError: 'state_dict'
What does this mean? I downloaded resnet50-19c8e357.pth, renamed it resnet50_xent_htri_market1501.pth.tar, and put it into the saved-models folder.
I don't know what to do.
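The KeyError comes from the file's contents, not the rename: the repo's --resume checkpoints are dicts with a 'state_dict' key (plus metadata like the epoch), while resnet50-19c8e357.pth is torchvision's bare ImageNet state_dict, so checkpoint['state_dict'] cannot succeed — and note that ImageNet weights are not a trained re-id model in any case. A tolerant-load sketch with a hypothetical helper:

```python
# The repo's saved checkpoints look like {'state_dict': ..., 'epoch': ...};
# a raw torchvision weight file *is* the state_dict itself. Accept both.
def extract_state_dict(checkpoint):
    if isinstance(checkpoint, dict) and 'state_dict' in checkpoint:
        return checkpoint['state_dict']
    return checkpoint  # assume the object is already a bare state_dict
```

For evaluation you still need the actual re-id checkpoint from the model zoo; the helper only makes the failure mode explicit.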
Hello @KaiyangZhou , thank you for make this project.
I tried to run the test on the prid2011 dataset using the ResNet50M pretrained model you provide, but got an error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x9b in position 1: ordinal not in range(128)
I think this error appears because the pretrained model resnet50m_xent_prid.pth.tar can only be opened on Microsoft Windows (I don't know whether that is true or not), because when I type file resnet50m_xent_prid.pth.tar I get this message:
resnet50m_xent_prid.pth.tar: 8086 relocatable (Microsoft)
How can I use that pretrained model on Linux, given that PyTorch can't be installed on Windows?
Thank you :)
When I run the hacnn model, I only get Rank-1 84.0, mAP 62.7, which is far below Rank-1 88.7, mAP 71.2.
Do you know what might cause this?
This is a minor issue with the RandomIdentitySampler: the implementation does not guarantee that the N identities in a batch are unique. For example, ID 28 is sampled twice in this batch.
tensor([ 554, 554, 554, 554, 195, 195, 195, 195, 399, 399,
399, 399, 527, 527, 527, 527, 28, 28, 28, 28,
501, 501, 501, 501, 252, 252, 252, 252, 136, 136,
136, 136, 700, 700, 700, 700, 125, 125, 125, 125,
120, 120, 120, 120, 68, 68, 68, 68, 577, 577,
577, 577, 455, 455, 455, 455, 28, 28, 28, 28,
9, 9, 9, 9, 387, 387, 387, 387, 564, 564,
564, 564], device='cuda:0')
Can be reproduced by calling any of the htri demo examples.
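A sketch of a sampler that enforces the uniqueness described above (a hypothetical alternative, not the repo's RandomIdentitySampler) could partition a shuffled identity list into batches first, then draw instances per identity:

```python
import random
from collections import defaultdict

# Build batches of num_pids_per_batch *distinct* identities with num_instances
# images each, so an identity can never appear twice in one batch.
# data_source items are assumed to be (img_path, pid, camid) tuples.
def unique_identity_batches(data_source, num_instances=4, num_pids_per_batch=8):
    index_dic = defaultdict(list)
    for index, (_, pid, _) in enumerate(data_source):
        index_dic[pid].append(index)
    pids = list(index_dic.keys())
    random.shuffle(pids)
    batches = []
    for i in range(0, len(pids) - num_pids_per_batch + 1, num_pids_per_batch):
        batch = []
        for pid in pids[i:i + num_pids_per_batch]:
            idxs = index_dic[pid]
            if len(idxs) >= num_instances:
                batch.extend(random.sample(idxs, num_instances))
            else:  # identity has too few images: sample with replacement
                batch.extend(random.choice(idxs) for _ in range(num_instances))
        batches.append(batch)
    return batches
```

Because each batch draws from a disjoint slice of the shuffled identity list, duplicates like the repeated ID 28 above cannot occur by construction.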
In transforms.py, the probability variable p works the opposite way: increasing p decreases the probability of performing the transformation. Line 31 can be changed to: if self.p < random.random():
By the way, have you tried training a model with a transformation probability other than p=0.5? In the triplet loss paper (In Defense of the Triplet Loss for Person Re-Identification), this probability seems to be 1; it is stated that the transformation is applied to all images. Can we expect higher scores with higher transformation probabilities?
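The corrected convention can be stated in a few lines; maybe_transform is a hypothetical stand-in for the transform's __call__, not code from transforms.py:

```python
import random

# With the fix, the transform fires with probability p, so increasing p
# increases how often it is applied (p=1 applies it to every image,
# matching the triplet-loss paper's setting).
def maybe_transform(img, p, transform):
    if random.random() < p:
        return transform(img)
    return img
```

Equivalently, the suggested "if self.p < random.random(): return img" makes the early return (the skip path) happen with probability 1 - p.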
Code Version
Aug 17, 2018
Command
python train_vidreid_xent.py -d ilidsvid -a resnet50 \
--save-dir log/ilidsvid_test-resnet50-xent --gpu-devices 6,7 \
--evaluate --resume log/ilidsvid_train-resnet50-xent/best_model.pth.tar
Error Info
Traceback (most recent call last):
File "train_vidreid_xent.py", line 379, in <module>
main()
File "train_vidreid_xent.py", line 200, in main
distmat = test(model, queryloader, galleryloader, use_gpu, return_distmat=True)
TypeError: test() takes at least 5 arguments (5 given)
Solution
It seems that at train_vidreid_xent.py/line 200, argument "pool" is missing when invoking the test function:
distmat = test(model, queryloader, galleryloader, use_gpu, return_distmat=True)
fix:
distmat = test(model, queryloader, galleryloader,args.pool, use_gpu, return_distmat=True)
Hi, would you please add ResNeXt as baseline?
Hi. Great code.
I have one question about the scheduler. Initially scheduler.last_epoch = -1, and in Python 3, -1 // 60 == -1, which means the actual learning rate of the first epoch is 10 times args.lr. I think the correct code sequence is:
for epoch in range(args.max_epoch):
    scheduler.step()
    train(...)
    validate(...)
When I try to evaluate market1501, I get the following error:
File "/home/konstantinou/virtualenvs/pytorch_python2/local/lib/python2.7/site-packages/torch/nn/modules/linear.py", line 49, in reset_parameters
stdv = 1. / math.sqrt(self.weight.size(1))
RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 1)
The command I use is the following:
python train_img_model_xent.py -d market1501 -a resnet50 --evaluate --resume saved-models/resnet50_xent_market1501.pth.tar --save-dir log/resnet50m_xent_market1501 --test-batch 32