
detect-track's Introduction

===============================================================================

Detect to Track and Track to Detect

This repository contains the code for our ICCV 2017 paper:

Christoph Feichtenhofer, Axel Pinz, Andrew Zisserman
"Detect to Track and Track to Detect"
in Proc. ICCV 2017
  • This repository also contains results for ResNeXt-101 and Inception-v4 backbone networks, which perform slightly better (81.6% and 82.1% mAP on ImageNet VID val) than the ResNet-101 backbone (80.0% mAP) used in the conference version of the paper.

  • This code builds on the original Matlab version of R-FCN

  • We are preparing a Python version of D&T that will support end-to-end training and inference of the RPN, Detector & Tracker.

If you find the code useful for your research, please cite our paper:

    @inproceedings{feichtenhofer2017detect,
      title={Detect to Track and Track to Detect},
      author={Feichtenhofer, Christoph and Pinz, Axel and Zisserman, Andrew},
      booktitle={International Conference on Computer Vision (ICCV)},
      year={2017}
    }

Requirements

The code was tested on Ubuntu 14.04, 16.04 and Windows 10 using NVIDIA Titan X or Z GPUs.

If you have questions regarding the implementation please contact:

Christoph Feichtenhofer <feichtenhofer AT tugraz.at>

================================================================================

Setup

  1. Download the code: git clone --recursive https://github.com/feichtenhofer/detect-track
  • This will also download a modified version of the Caffe deep learning framework. In case of any issues, please follow the installation instructions in the corresponding README as well as on the Caffe website.
  2. Compile the code by running rfcn_build.m.

  3. Edit the file get_root_path.m to adjust the models and data paths.

    • Download the ImageNet VID dataset from http://image-net.org/download-images
    • Download the pretrained model files and the RPN proposals linked below, and unpack them into your models/data directories.
    • If the models are not present, the function check_dl_model will attempt to download them to the respective directories.
    • If the RPN proposal files are not present, the function download_proposals will attempt to download & extract them to the respective directories.
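
For orientation, the steps above can be driven from a MATLAB prompt roughly as follows. This is a hedged sketch, not a verified command sequence; the clone location is arbitrary, and rfcn_build.m and get_root_path.m are the scripts named in steps 2 and 3.

    % Hedged sketch of the setup steps above.
    % In a shell first: git clone --recursive https://github.com/feichtenhofer/detect-track
    cd('detect-track');
    rfcn_build();              % step 2: compile the MEX / Caffe bindings
    edit('get_root_path.m');   % step 3: point the model and data paths at your setup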

Training

  • You can train your own models on ImageNet VID as follows:
    • script_Detect_ILSVRC_vid_ResNet_OHEM_rpn(); to train the image-based Detection network.
    • script_DetectTrack_ILSVRC_vid_ResNet_OHEM_rpn(); to train the video-based Detection & Tracking network.

Testing

  • The scripts above have subroutines that test the learned models after training. You can also test our trained, final models, available for download below. We provide three testing functions that work with different numbers of frames at a time (i.e. processed by one GPU during the forward pass):
    1. rfcn_test(); to test the image-based Detection network.
    2. rfcn_test_vid(); to test the video-based Detection & Tracking network with 2 frames at a time.
    3. rfcn_test_vid_multiframe(); to test the video-based Detection & Tracking network with 3 frames at a time.
  • Moreover, we provide multiple testing network definitions that can be used for interesting experiments, for example:
    • test_track.prototxt is the simplest form of D&T testing.
    • test_track_reg.prototxt is a D&T version that additionally regresses the tracking boxes before performing the ROI tracking. Therefore, this procedure produces tracks that tightly encompass the underlying objects, whereas the above function tracks the proposal region (and therefore also the background area).
    • test_track_regcls.prototxt is a D&T version that additionally classifies the tracked region and computes the detection confidence as the mean of the detection score from the current frame and the detection score of the tracked region in the next frame. Therefore, this method produces better results, especially as the temporal distance between the frames becomes larger and more complementary information can be integrated from the tracked region.
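
As a rough illustration of how these functions are invoked (several issues below ask about this): the signature rfcn_test(conf, imdb, roidb, varargin) and the helper imdb_from_ilsvrc15vid are mentioned elsewhere on this page, but the config helper, the varargin keys, and the return value of get_root_path in this sketch are assumptions modeled on the R-FCN Matlab conventions this code builds on.

    % Hedged sketch only; names and keys marked below are assumptions,
    % not verified against the repository.
    root_path = get_root_path();                      % return value assumed
    conf  = rfcn_config_ohem();                       % hypothetical config helper
    imdb  = imdb_from_ilsvrc15vid(root_path, 'val');  % helper named in the issues; arguments assumed
    roidb = imdb.roidb_func(imdb);                    % assumed roidb accessor
    rfcn_test(conf, imdb, roidb, ...
        'net_def_file', 'models/rfcn_prototxts/.../test_track.prototxt', ...  % path abbreviated
        'net_file',     'path/to/final.caffemodel');  % a trained model from the downloads below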

Results on ImageNet VID

  • The networks are trained as described in the paper, i.e. on an intersection of the ImageNet object detection from video (VID) dataset, which contains 30 classes in 3862 training videos, and the ImageNet object detection (DET) dataset (only using the data from the 30 VID classes). Validation results on the 555 videos of the ImageNet VID validation set are shown below.
Method            test structure               ResNet-50   ResNet-101   ResNeXt-101   Inception-v4
Detect            test.prototxt                72.1        74.1         75.9          77.9
Detect & Track    test_track.prototxt          76.5        79.8         81.4          82.0
Detect & Track    test_track_regcls.prototxt   76.7        80.0         81.6          82.1
  • We show different testing network definitions in the rows and backbone networks in columns. The reported performance is mAP (in %), averaged over all videos and classes in the ImageNet VID validation subset.

Trained models

Data

Our models were trained using region proposals extracted using a Region Proposal Network that is trained on the same data as D&T. We use the RPN from craftGBD and provide the extracted proposals for training and testing on ImageNet VID and the DET subsets below.

Pre-computed object proposals for


detect-track's Issues

It'd be nice to have a demo video

Hi!
I think the title is self-explanatory. If you could add a demo that runs directly after compiling, it would be very nice to have.
Nice work so far!

Regarding the testing.

Hello,
Has anyone succeeded in testing the model using the pre-trained model on the CPU? If yes, can you please share the process?
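
For reference, the MatCaffe interface that this code uses does expose a CPU mode, though whether D&T's custom layers have CPU implementations is not confirmed here; see the correlation_layer.cpp crash reported further below.

    % Standard MatCaffe call for CPU mode. Note the "Not Implemented
    % Yet" crash in correlation_layer.cpp reported below, which
    % suggests the custom layers may lack a CPU path.
    caffe.set_mode_cpu();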

Where is meta_vid.mat?

In the imdb function, there is `meta_det = load(fullfile(devkit_path, 'data', 'meta_vid.mat'));`. Where can I find `meta_vid.mat`? Thanks.

Snow

Hi, I found there is no `mean_image` file in the directory models/pre_trained_models/ResNet-101L. Where can I find it? Thanks

File structure and missing directory

Hi,
when we used your code to train, we encountered two problems as follows:

  1. Is it possible for you to share the file structure of the root folder (/data/ILSVRC/), including Annotations, Data, devkit, ImageSets and imdb? For example, since we trained on both VID and DET, in root_path/Data should we put VID_val and VID_train into the VID folder, or just leave DET, VID, VID_train and VID_val in parallel?
  2. We have succeeded in constructing "imdb_ilsvrc15_train_unflip.mat", but every time it is about to construct the corresponding roidb, we get the warning "GT(xml) file empty/broken: ILSVRC2013_train_extra0/ILSVRC2013_train_00000001". But there are no corresponding annotations for the 2013 extra data. We cannot find this directory in any ILSVRC dataset, so could you please share where we can get it?
     Thanks so much in advance!!

Regarding the implementation of correlation_layer.cpp

F0719 13:18:41.276878 6182 correlation_layer.cpp:89] Not Implemented Yet
*** Check failure stack trace: ***


     Illegal instruction detected at Thu Jul 19 13:18:43 2018 +0530

Configuration:
Crash Decoding : Disabled - No sandbox or build area path
Crash Mode : continue (default)
Default Encoding : UTF-8
Deployed : false
Desktop Environment : Unity
GNU C Library : 2.23 stable
Graphics Driver : Unknown hardware
Java Version : Java 1.8.0_144-b01 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
MATLAB Architecture : glnxa64
MATLAB Entitlement ID : 5221413
MATLAB Root : /usr/local/MATLAB/R2018a
MATLAB Version : 9.4.0.813654 (R2018a)
OpenGL : hardware
Operating System : Ubuntu 16.04.4 LTS
Process ID : 6122
Processor ID : x86 Family 6 Model 142 Stepping 9, GenuineIntel
Session Key : 6d05ce2d-fa08-443f-9dd7-94ebfc528a13
Static TLS mitigation : Enabled: Full
Window System : The X.Org Foundation (11906000), display :0

Fault Count: 1

Abnormal termination

Register State (from fault):
RAX = 0000000000000001 RBX = 0000000022e43660
RCX = 00007f7edd85d788 RDX = 00007f7e891fe6a0
RSP = 00007f7edd85d770 RBP = 0000000014161c00
RSI = 0000000000000000 RDI = 0000000000000000

R8 = 0000000000000081 R9 = 0000000000000000
R10 = 00007f7f0035d650 R11 = 00007f7e88fdc15b
R12 = 000000000000017b R13 = 0000000000002388
R14 = 00007f7edd85d800 R15 = 0000000009198b88

RIP = 00007f7effb6dedc EFL = 0000000000010246

CS = 0033 FS = 0000 GS = 0000

Stack Trace (from fault):
[ 0] 0x00007f7effb6dedc /lib/x86_64-linux-gnu/libpthread.so.0+00052956 pthread_rwlock_unlock+00000044
[ 1] 0x00007f7e88fe3789 /usr/lib/x86_64-linux-gnu/libglog.so.0+00075657 ZN24glog_internal_namespace_5Mutex12ReaderUnlockEv+00000025
[ 2] 0x00007f7e88fdc360 /usr/lib/x86_64-linux-gnu/libglog.so.0+00045920 ZN6google10LogMessage5FlushEv+00000704
[ 3] 0x00007f7e88fdee1e /usr/lib/x86_64-linux-gnu/libglog.so.0+00056862 ZN6google15LogMessageFatalD2Ev+00000014
[ 4] 0x00007f7e893bd400 /home/narendrachintala/git/caffe-rfcn/matlab/+caffe/private/caffe
.mexa64+01823744
[ 5] 0x00007f7e893da2b2 /home/narendrachintala/git/caffe-rfcn/matlab/+caffe/private/caffe
.mexa64+01942194
[ 6] 0x00007f7e893da4f6 /home/narendrachintala/git/caffe-rfcn/matlab/+caffe/private/caffe
.mexa64+01942774
[ 7] 0x00007f7e89241c5f /home/narendrachintala/git/caffe-rfcn/matlab/+caffe/private/caffe_.mexa64+00269407
[ 8] 0x00007f7e8924293f /home/narendrachintala/git/caffe-rfcn/matlab/+caffe/private/caffe_.mexa64+00272703 mexFunction+00000163
[ 9] 0x00007f7eeb090080 bin/glnxa64/libmex.so+00413824
[ 10] 0x00007f7eeb090447 bin/glnxa64/libmex.so+00414791
[ 11] 0x00007f7eeb090f2b bin/glnxa64/libmex.so+00417579
[ 12] 0x00007f7eeb07b30c bin/glnxa64/libmex.so+00328460
[ 13] 0x00007f7eece842ad bin/glnxa64/libmwm_dispatcher.so+00979629 ZN8Mfh_file16dispatch_fh_implEMS_FviPP11mxArray_tagiS2_EiS2_iS2+00000829
[ 14] 0x00007f7eece84bae bin/glnxa64/libmwm_dispatcher.so+00981934 ZN8Mfh_file11dispatch_fhEiPP11mxArray_tagiS2+00000030
[ 15] 0x00007f7ee922cda1 bin/glnxa64/libmwm_lxe.so+12619169
[ 16] 0x00007f7ee922d982 bin/glnxa64/libmwm_lxe.so+12622210
[ 17] 0x00007f7ee9315fc9 bin/glnxa64/libmwm_lxe.so+13574089
[ 18] 0x00007f7ee92b7431 bin/glnxa64/libmwm_lxe.so+13186097
[ 19] 0x00007f7ee8abd5a8 bin/glnxa64/libmwm_lxe.so+04822440
[ 20] 0x00007f7ee8abfcbc bin/glnxa64/libmwm_lxe.so+04832444
[ 21] 0x00007f7ee8abc01d bin/glnxa64/libmwm_lxe.so+04816925
[ 22] 0x00007f7ee8ab5ba1 bin/glnxa64/libmwm_lxe.so+04791201
[ 23] 0x00007f7ee8ab5dd9 bin/glnxa64/libmwm_lxe.so+04791769
[ 24] 0x00007f7ee8abb846 bin/glnxa64/libmwm_lxe.so+04814918
[ 25] 0x00007f7ee8abb92f bin/glnxa64/libmwm_lxe.so+04815151
[ 26] 0x00007f7ee8bea503 bin/glnxa64/libmwm_lxe.so+06055171
[ 27] 0x00007f7ee8bedcf3 bin/glnxa64/libmwm_lxe.so+06069491
[ 28] 0x00007f7ee90fdf6d bin/glnxa64/libmwm_lxe.so+11378541
[ 29] 0x00007f7ee9219fa1 bin/glnxa64/libmwm_lxe.so+12541857
[ 30] 0x00007f7eece842ad bin/glnxa64/libmwm_dispatcher.so+00979629 ZN8Mfh_file16dispatch_fh_implEMS_FviPP11mxArray_tagiS2_EiS2_iS2+00000829
[ 31] 0x00007f7eece84bae bin/glnxa64/libmwm_dispatcher.so+00981934 ZN8Mfh_file11dispatch_fhEiPP11mxArray_tagiS2+00000030
[ 32] 0x00007f7ee922cda1 bin/glnxa64/libmwm_lxe.so+12619169
[ 33] 0x00007f7ee922d982 bin/glnxa64/libmwm_lxe.so+12622210
[ 34] 0x00007f7ee9315fc9 bin/glnxa64/libmwm_lxe.so+13574089
[ 35] 0x00007f7ee92b7431 bin/glnxa64/libmwm_lxe.so+13186097
[ 36] 0x00007f7ee8abd5a8 bin/glnxa64/libmwm_lxe.so+04822440
[ 37] 0x00007f7ee8abfcbc bin/glnxa64/libmwm_lxe.so+04832444
[ 38] 0x00007f7ee8abc01d bin/glnxa64/libmwm_lxe.so+04816925
[ 39] 0x00007f7ee8ab5ba1 bin/glnxa64/libmwm_lxe.so+04791201
[ 40] 0x00007f7ee8ab5dd9 bin/glnxa64/libmwm_lxe.so+04791769
[ 41] 0x00007f7ee8abb846 bin/glnxa64/libmwm_lxe.so+04814918
[ 42] 0x00007f7ee8abb92f bin/glnxa64/libmwm_lxe.so+04815151
[ 43] 0x00007f7ee8bea503 bin/glnxa64/libmwm_lxe.so+06055171
[ 44] 0x00007f7ee8bedcf3 bin/glnxa64/libmwm_lxe.so+06069491
[ 45] 0x00007f7ee90fdf6d bin/glnxa64/libmwm_lxe.so+11378541
[ 46] 0x00007f7ee90ab60c bin/glnxa64/libmwm_lxe.so+11040268
[ 47] 0x00007f7ee90b2448 bin/glnxa64/libmwm_lxe.so+11068488
[ 48] 0x00007f7ee90b3e22 bin/glnxa64/libmwm_lxe.so+11075106
[ 49] 0x00007f7ee9141807 bin/glnxa64/libmwm_lxe.so+11655175
[ 50] 0x00007f7ee9141aea bin/glnxa64/libmwm_lxe.so+11655914
[ 51] 0x00007f7eeb2f591a bin/glnxa64/libmwbridge.so+00207130 _Z8mnParserv+00000874
[ 52] 0x00007f7eed36ebb8 bin/glnxa64/libmwmcr.so+00641976
[ 53] 0x00007f7efd570e9f bin/glnxa64/libmwmlutil.so+06524575 _ZNSt13__future_base13_State_baseV29_M_do_setEPSt8functionIFSt10unique_ptrINS_12_Result_baseENS3_8_DeleterEEvEEPb+00000031
[ 54] 0x00007f7effb6fa99 /lib/x86_64-linux-gnu/libpthread.so.0+00060057
[ 55] 0x00007f7efd571126 bin/glnxa64/libmwmlutil.so+06525222 ZSt9call_onceIMNSt13__future_base13_State_baseV2EFvPSt8functionIFSt10unique_ptrINS0_12_Result_baseENS4_8_DeleterEEvEEPbEJPS1_S9_SA_EEvRSt9once_flagOT_DpOT0+00000102
[ 56] 0x00007f7eed36e9d3 bin/glnxa64/libmwmcr.so+00641491
[ 57] 0x00007f7f01cec1a2 bin/glnxa64/libmwmvm.so+03367330 ZN14cmddistributor15PackagedTaskIIP10invokeFuncIN7mwboost8functionIFvvEEEEENS2_10shared_ptrINS2_13unique_futureIDTclfp_EEEEEERKT+00000082
[ 58] 0x00007f7f01cec4e8 bin/glnxa64/libmwmvm.so+03368168 _ZNSt17_Function_handlerIFN7mwboost3anyEvEZN14cmddistributor15PackagedTaskIIP10createFuncINS0_8functionIFvvEEEEESt8functionIS2_ET_EUlvE_E9_M_invokeERKSt9_Any_data+00000024
[ 59] 0x00007f7eed978e6c bin/glnxa64/libmwiqm.so+00867948 _ZN7mwboost6detail8function21function_obj_invoker0ISt8functionIFNS_3anyEvEES4_E6invokeERNS1_15function_bufferE+00000028
[ 60] 0x00007f7eed97897f bin/glnxa64/libmwiqm.so+00866687 _ZN3iqm18PackagedTaskPlugin7executeEP15inWorkSpace_tagRN7mwboost10shared_ptrIN14cmddistributor17IIPCompletedEventEEE+00000447
[ 61] 0x00007f7eed956ab1 bin/glnxa64/libmwiqm.so+00727729
[ 62] 0x00007f7eed939ac8 bin/glnxa64/libmwiqm.so+00608968
[ 63] 0x00007f7eed9348bf bin/glnxa64/libmwiqm.so+00587967
[ 64] 0x00007f7f00e1ea05 bin/glnxa64/libmwservices.so+03262981
[ 65] 0x00007f7f00e1fff2 bin/glnxa64/libmwservices.so+03268594
[ 66] 0x00007f7f00e208fb bin/glnxa64/libmwservices.so+03270907 _Z25svWS_ProcessPendingEventsiib+00000187
[ 67] 0x00007f7eed36ffc3 bin/glnxa64/libmwmcr.so+00647107
[ 68] 0x00007f7eed3706a4 bin/glnxa64/libmwmcr.so+00648868
[ 69] 0x00007f7eed3693f1 bin/glnxa64/libmwmcr.so+00619505
[ 70] 0x00007f7effb686ba /lib/x86_64-linux-gnu/libpthread.so.0+00030394
[ 71] 0x00007f7effe8541d /lib/x86_64-linux-gnu/libc.so.6+01078301 clone+00000109
[ 72] 0x0000000000000000 +00000000

This error was detected while a MEX-file was running. If the MEX-file
is not an official MathWorks function, please examine its source code
for errors. Please consult the External Interfaces Guide for information
on debugging MEX-files.
** This crash report has been saved to disk as /home/narendrachintala/matlab_crash_dump.6122-1 **

Caught MathWorks::System::FatalException

I am getting this exception that correlation_layer.cpp is not implemented yet. Is anyone aware of this issue?

correlation_layer

Can anybody explain the contents of correlation_layer.cu? I cannot work out how the correlation works.
Thanks for your kind help~
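
Not an explanation of the repository's CUDA kernel, but a hedged sketch of the operation itself: for two same-sized feature maps, a correlation layer computes, at every position, the dot product between the frame-t feature and the frame-t+tau features in a (2d+1) x (2d+1) neighborhood. The value of d and the wrap-around shift below are simplifications.

    % Minimal sketch of a correlation layer (not the repo's code).
    % fA, fB: H x W x C feature maps of frames t and t+tau.
    d = 8;                               % max displacement (assumed value)
    [H, W, C] = size(fA);
    corr = zeros(H, W, (2*d+1)^2);
    k = 0;
    for dy = -d:d
        for dx = -d:d
            k = k + 1;
            fBs = circshift(fB, [dy, dx, 0]);        % shifted frame-t+tau map
            corr(:,:,k) = sum(fA .* fBs, 3) / C;     % per-position dot product
        end
    end
    % NB: circshift wraps around at the borders; a real layer would
    % zero-pad instead.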

Results about tau

Hi~ I recently ran your code and have a question about the results. I get 79.8% when tau = 1, but when I change to tau = 10 the result is very bad, only 8.1%. This confuses me; thank you for your reply~~

Missing proposals and wrong .caffemodel?

Hi, I was trying to test this model on ImageNet VID (no modification to the code) and I used the trained models linked on the repo homepage (like this one or this one). The problem is that using those models I get completely random predictions: for each of the 30 classes I get the same ~0.03 score, so the model doesn't detect anything and seems to act as if it had been randomly initialized (I quadruple-checked that Caffe gets the right .caffemodel file as input).

For this reason I tried to train the model, but I constantly get missing-proposal-file errors. I checked, and it seems that, for example, for the DET train set there are only ~53k proposal files, while the ImageNet DET train set has ~456k images. What am I missing here?

How can I run the rfcn_test()?

I need help. I don't know how to get the input parameters of rfcn_test() and so on.
What should I do to set the parameters?

which train prototxt did you use?

There are so many training prototxts in the 'models/rfcn_prototxts/ResNet-101L_ILSVRCvid_corr' directory. What is the difference between them, and which one did you use? Thanks!

meta_vid.mat

Hello~ How can I get the meta_vid.mat that is used by imdb_from_ilsvrc15vid.m?

Clarification of N_tra in the tracking objective

In Sec. 3.3 of the paper, it's mentioned that the tracking loss is active for N_tra ground truth RoIs which have a track correspondence across the two frames (t, t+tau). I'm interpreting this as meaning that only predicted RoIs assigned to ground-truth RoIs (using an IoU>0.5) with a correspondence between the two frames are used in the tracking loss. Is this the idea?

For example, if the RoI batch size is 256, and each of these 256 RoIs are assigned to a ground truth box in frame t having a correspondence in the next frame t+tau, would I use N_tra=256 and all of their RoI-tracking deltas with the ground-truth delta in the regression of Equation (1) of the paper?
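
Under that interpretation (stated here as a sketch, not as the authors' confirmed implementation), the selection could look like this, with all variables hypothetical:

    % Hypothetical sketch of the N_tra selection described above.
    % max_iou(i): best IoU of RoI i with any GT box in frame t;
    % assigned_gt(i): index of that GT box;
    % gt_has_corr(j): true if GT box j has a track correspondence
    % in frame t+tau.
    tracked = (max_iou > 0.5) & gt_has_corr(assigned_gt);
    N_tra   = sum(tracked);
    % smooth_l1 is a hypothetical elementwise smooth-L1 helper.
    L_tra   = sum(sum(smooth_l1(delta_pred(tracked,:) - delta_gt(tracked,:)))) / N_tra;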

Could you provide example testing command?

Could you provide an example command to get the results on the VID validation set? How can I reproduce the 79.8 mAP from the paper using rfcn_test_vid()? I'm not sure what the inputs conf, imdb, roidb, etc. correspond to. Thank you!

Evaluation devkit: Errors during evaluation

@feichtenhofer Great work! Thanks for sharing. I'm attempting to run inference on a single VID video snippet and am running into errors when trying to call functions within the original ImageNet devkit (taken from here). Are you using a custom/modified devkit?

The first error is an IO error from devkit/evaluation/eval_vid_detection at line 117. I noticed the number of columns had changed in the input file predict_file, and I could fix this by changing:

[img_ids obj_labels obj_confs xmin ymin xmax ymax] = ...
        textread(predict_file,'%d %d %f %f %f %f %f');

to

[img_ids obj_labels unk obj_confs xmin ymin xmax ymax] = ...
        textread(predict_file,'%d %d %d %f %f %f %f %f');

After modifying the line above, I ran into another error in the evaluation:

Undefined function or variable 'eval_vid_tracking'.

This function is called within imdb/imdb_eval_ilsvrc14.m here. I noticed the file eval_vid_tracking.m doesn't exist and it doesn't seem to be in this repo. If you are using a modified devkit, can you please push it?

regarding the region proposals.

In the code it is looking for .mat files of the region proposals, but in the repository you have only included the directory. How do I use the proposals in the code if I want to do validation only?

Demo code

Can you please provide demo code which, given video frames, outputs boxes?
There should be no need to run the test script on the ImageNet VID dataset if the code is to be used off the shelf for tracking purposes.

How could the tracking RoI pooling layer work?

Thanks for your excellent work, but I am confused about some details in your paper.
In your paper, the tracking RoI pooling layer operates on the stack of {Xcorr, Xreg-t, Xreg-t+1}.
As far as I can see:
both the Xreg-t and Xreg-t+1 layers have a shape of k*k*4;
Xcorr consists of the correlation outputs of conv3, 4 and 5 respectively, and each correlation output should have a shape like H*W*(2d+1)*(2d+1).
So:
1. How do you concatenate different layers together, like Xcorr and Xreg-t?
2. How do you pool on the stacked feature map?
Thank you.
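
Not an authoritative answer, but a sketch of what the concatenation step implies: the maps must first be brought to a common H x W grid (e.g. by striding or resizing the correlation outputs), after which stacking is a channel-wise concatenation. All variables below are hypothetical.

    % Hedged sketch: stack correlation and regression feature maps
    % along the channel dimension before RoI pooling (all assumed
    % to share the same H x W grid here).
    stacked = cat(3, Xcorr, Xreg_t, Xreg_t1);
    % RoI pooling then uses the frame-t proposal box on "stacked";
    % the position-sensitive variant produces a fixed k x k output
    % per RoI.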

Is the tracking regression output redundant with the future-frame detection output?

Hi, I'm reading the paper and I am confused about the tracking regression output.
The paper says that the tracking regression output is the offset between objects in two frames.
But the objects have actually already been detected by the RFCN network in both frames,
so what does the tracking regression output add?

I would appreciate any answer.

the trained model

I can't open the link to download the trained model. I am in China; does anybody have the same problem?

How to implement RoI Tracking?

In the track regression, RoI pooling operates on the concatenation of the bounding-box regression features and the correlation features. I want to know the details of this RoI pooling, such as which frame's RoIs should be used.

This error was detected while a MEX-file was running. If the MEX-file is not an official MathWorks function, please examine its source code for errors. Please consult the External Interfaces Guide for information on debugging MEX-files

When I do evaluation, there is an error like:
'This error was detected while a MEX-file was running. If the MEX-file
is not an official MathWorks function, please examine its source code
for errors. Please consult the External Interfaces Guide for information
on debugging MEX-files.'
Someone said it was caused by using a different number of GPUs than the setting in the code. I only have 2 GPUs. Does anyone know where I can change the number of GPUs in the code? Or how can I set it to use only the CPU? Thanks
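
Not a confirmed fix, but for reference these are the standard MatCaffe device-selection calls; where this repository reads its GPU count and IDs from is not confirmed on this page.

    % Standard MatCaffe device-selection calls.
    caffe.set_device(0);    % zero-based GPU index
    caffe.set_mode_gpu();   % or caffe.set_mode_cpu(); for CPU-only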

Training end-to-end from scratch

Has anyone had any luck training this model end-to-end without external object proposals or pretraining the RFCN network on ImageNet DET? I've been trying to train the D (& T loss) model in PyTorch and have only reached a frame mean AP of ~64% on the full ImageNet VID validation set. Some implementation notes:

  • I'm only training/testing on Imagenet VID (I am not using anything from Imagenet DET).
  • As in the paper, I'm sampling 10 frames from each video snippet in the training set. These frames are sampled at regular intervals across the duration of the snippet.
  • I'm using resnet-101 with pretrained imagenet weights and am randomly initializing the RPN and RCNN.
  • I'm using correlation features on conv3, conv4, and conv5 and am regressing on the ground truth boxes in frame t --> t+tau.
  • I am using an L1 smooth loss for the tracking loss.
  • I am not linking detections across frames at the moment.
  • I am using a batch size of 2 (2 images per video, 2 videos = 4 frames total)
  • My initial lr is 5e-4

some doubts about the results

I have run the test code successfully, but I still have some doubts.

1. Are the "Detect" results obtained with the "ImageNet CLS models" or the "Detect models"?

2. What is the difference between the D and D&T models? I find their prototxts are the same. Does the D model mean D (& T loss)?

Thank you ~

rfcn_test parameters

Hi, I am wondering how to set conf, imdb, roidb, and varargin in rfcn_test.m. Thank you.

How to run the D&T test code

Hi guys, I'm new here. I've downloaded this code, but I do not know how to run it for testing as the author described. Could anyone explain the testing pipeline explicitly? Thank you very much~

Training break down

When I fine-tuned the RoI tracking part on top of the trained RFCN detector, the training broke down. The RoIs I used belong to frame t, both for the correlation features and for the pair of frames' feature maps. The initial learning rate I set is 0.0001. How can I solve this?
