
pytorch-detect-to-track's Introduction

A pytorch implementation of the paper Detect to Track and Track to Detect.

Introduction

This project is a pytorch implementation of detect to track and track to detect. This repository is influenced by the following implementations:

During our implementation, we referred to the above implementations, especially jwyang/faster-rcnn.pytorch. Like that implementation, this repository has the following qualities:

  • It is pure Pytorch code. We convert all the numpy implementations to pytorch!

  • It supports multi-image batch training. We revise all the layers, including dataloader, rpn, roi-pooling, etc., to support multiple images in each minibatch.

  • It supports multi-GPU training. We use a multi-GPU wrapper (nn.DataParallel here), which, together with the two features above, makes it easy to train on one or more GPUs.

Furthermore, since the Detect to Track and Track to Detect implementation originally used an R-FCN siamese network and correlation layer, we've added/modified the following:

  • Supports multiple images per roidb entry. By default, we use 2 images from contiguous frames to define a roidb entry, to facilitate a forward pass through a two-legged siamese network.

  • It is memory efficient. We limit the aspect ratio of the images in each roidb entry and group images with similar aspect ratios into a minibatch. As a result, we can train ResNet-101 with batch size = 2 (4 images) on 2 Titan X GPUs (12 GB each).

  • Supports 4 pooling methods: RoI pooling, RoI align, RoI cropping, and position-sensitive RoI pooling. More importantly, we modified all of them to support multi-image batch training.

  • Supports correlation layer. We adopt the correlation layer from NVIDIA's flownet2 implementation.
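To make the correlation layer concrete, here is a naive pure-PyTorch sketch of a FlowNet-style correlation between two feature maps with kernel size 1 and stride 1. It is not the CUDA kernel adopted from flownet2: `naive_correlation` and its `max_disp` default are illustrative assumptions, and averaging the dot product over channels is assumed to mirror the CUDA layer's normalization. For every displacement (dy, dx) within ±max_disp, each position of the first map is compared with the shifted position of the second map, giving (2·max_disp + 1)² output channels ordered as (dy + d)(2d + 1) + (dx + d).

```python
import torch
import torch.nn.functional as F


def naive_correlation(feat_t, feat_tau, max_disp=8):
    """Local correlation between two (N, C, H, W) feature maps.

    Returns (N, (2*max_disp+1)**2, H, W). Channel k holds the channel-averaged
    dot product between feat_t at (i, j) and feat_tau at (i + dy, j + dx)
    for the k-th displacement, with dy as the outer loop and dx as the inner.
    """
    n, c, h, w = feat_t.shape
    d = max_disp
    # Zero-pad the second map so every displacement stays in bounds.
    padded = F.pad(feat_tau, (d, d, d, d))
    outputs = []
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            # Window of the padded map aligned with feat_t shifted by (dy, dx).
            shifted = padded[:, :, d + dy:d + dy + h, d + dx:d + dx + w]
            # Mean over channels of the elementwise product.
            outputs.append((feat_t * shifted).mean(dim=1, keepdim=True))
    return torch.cat(outputs, dim=1)
```

The double loop over displacements is slow and memory-hungry compared with a dedicated kernel, which is the usual reason for compiling a correlation layer instead.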

Other Resources

Benchmarking

WORK IN PROGRESS

This project is a work in progress, and PRs are welcome. The current implementation is benchmarked against the Imagenet VID dataset.

For training, we adopt the common heuristic of alternating samples from VID and DET (e.g. iteration 1 is from VID, iteration 2 is from DET, and so on). Additionally, 10 frames are sampled per video snippet during training, which avoids biasing training towards longer snippets. Validation performance, however, is evaluated on every frame of every snippet in VAL. Please refer to the D&T paper for more details.
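The sampling scheme described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the repository's data loader: `sample_snippet_frames` and `alternate_sources` are hypothetical helpers, and drawing the 10 frames uniformly at random (rather than, say, evenly spaced) is an assumption.

```python
import random


def sample_snippet_frames(num_frames, frames_per_snippet=10, rng=random):
    """Pick up to `frames_per_snippet` distinct frame indices from one snippet.

    A fixed per-snippet budget keeps long snippets from dominating training.
    """
    if num_frames <= frames_per_snippet:
        return list(range(num_frames))
    return sorted(rng.sample(range(num_frames), frames_per_snippet))


def alternate_sources(vid_iter, det_iter, num_iters):
    """Yield (source, sample) pairs: even iterations from VID, odd from DET."""
    for it in range(num_iters):
        if it % 2 == 0:
            yield "VID", next(vid_iter)
        else:
            yield "DET", next(det_iter)
```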

1) Baseline single-frame RFCN (see this repo). The trained model can be accessed here under the name rfcn_detect.pth.

Imagenet VID+DET (Train/Test: imagenet_vid_train+imagenet_det_train/imagenet_vid_val, scale=600, PS ROI Pooling).

model     #GPUs  batch size  lr    lr_decay  max_epoch  time/epoch  mem/GPU  mAP
Res-101   2      2           1e-3  5         11         --          8021MiB  70.3

2) D (&T loss): Imagenet VID+DET (Train/Test: imagenet_vid_train+imagenet_det_train/imagenet_vid_val, scale=600, PS ROI Pooling). This network is initialized with the weights from the single-frame RFCN baseline above. The trained model can be accessed here under the name rfcn_detect_track_1_7_32941.pth.

Currently, mAP drops by 1.6 percentage points relative to the single-frame baseline (from 70.3 to 68.7). The cause is not yet known. Again, PRs are welcome.

model     #GPUs  batch size  lr    lr_decay  max_epoch  time/epoch  mem/GPU  mAP
Res-101   2      2           1e-4  5         7          --          8021MiB  68.7

TODO: Result using Viterbi algorithm as linking post-processing step.

  • Unless otherwise mentioned, the GPU used is an NVIDIA Titan X Pascal (12 GB).

Prerequisites

  • Python 2.7
  • Pytorch 0.3.0 (0.4.0+ may work, but hasn't been tested; some minor tweaks are probably required.)
  • CUDA 8.0 or higher

TODO:

  • Update to Pytorch 0.4.0+
  • Make Python 3 compatible

Build

As pointed out by ruotianluo/pytorch-faster-rcnn, choose the right -arch flag when compiling the CUDA code:

GPU model                    Architecture
TitanX (Maxwell)             sm_52
TitanX (Pascal)              sm_61
GTX 960M                     sm_50
GTX 1080 (Ti)                sm_61
Grid K520 (AWS g2.2xlarge)   sm_30
Tesla K80 (AWS p2.xlarge)    sm_37
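If you are unsure which -arch value your card needs, the compute capability can be queried from PyTorch and turned into an sm_XY string. A small sketch, assuming a PyTorch build that provides `torch.cuda.get_device_capability` (recent releases do; availability in 0.3.0 is not verified here); `arch_flag` is a hypothetical helper.

```python
def arch_flag(major, minor):
    """Turn a CUDA compute capability such as (6, 1) into an nvcc -arch value."""
    return "sm_{}{}".format(major, minor)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        # Print one suggested -arch value per visible GPU.
        for idx in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(idx)
            print("{}: -arch={}".format(torch.cuda.get_device_name(idx),
                                        arch_flag(major, minor)))
    else:
        print("No CUDA device visible to PyTorch.")
```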

More details about setting the architecture can be found here or here

Install all the python dependencies using pip:

pip install -r requirements.txt

If you would like to use TensorBoard, install the CPU version of TensorFlow and then install tensorboardX.

Compile the CUDA dependencies using the following commands:

cd lib
sh make.sh

This compiles all the modules you need, including NMS, PSROI_POOLING, ROI_Pooling, ROI_Align and ROI_Crop. The default version is compiled with Python 2.7; if you are using a different Python version, please compile the modules yourself.

As pointed out in this issue, if you encounter an error during compilation, you may have forgotten to export the CUDA paths to your environment.
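A quick way to check the usual suspects before re-running make.sh is the Python sketch here. The variables it inspects (CUDA_HOME, CUDA_PATH, PATH, LD_LIBRARY_PATH) are common conventions and an assumption about what your setup needs; `cuda_env_report` and `find_executable` are hypothetical helpers, written without shutil.which so they also run under Python 2.7.

```python
import os


def find_executable(name, path_dirs):
    """Return the first executable called `name` in path_dirs, or None."""
    for d in path_dirs:
        if not d:  # skip empty PATH entries instead of searching the cwd
            continue
        candidate = os.path.join(d, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def cuda_env_report(environ=None):
    """Summarize the CUDA-related environment a build typically relies on."""
    env = os.environ if environ is None else environ
    path_dirs = env.get("PATH", "").split(os.pathsep)
    lib_dirs = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    return {
        "CUDA_HOME": env.get("CUDA_HOME") or env.get("CUDA_PATH"),
        "nvcc": find_executable("nvcc", path_dirs),
        # Heuristic only: any LD_LIBRARY_PATH entry whose name mentions "cuda".
        "cuda_lib_on_ld_path": any("cuda" in d.lower() for d in lib_dirs if d),
    }


if __name__ == "__main__":
    for key, value in sorted(cuda_env_report().items()):
        print("{}: {}".format(key, value))
```

On a typical Linux install, setting CUDA_HOME=/usr/local/cuda and prepending $CUDA_HOME/bin to PATH and $CUDA_HOME/lib64 to LD_LIBRARY_PATH is a common fix; adjust the paths to your own installation.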

Training

After building, create a data directory:

cd pytorch-detect-and-track
mkdir data

Download the ILSVRC VID and DET datasets (the train/val/test lists can be found here; the ILSVRC2015 images can be downloaded from here).

Untar the file:

tar xf ILSVRC2015.tar.gz

We'll refer to this directory as $DATAPATH. Make sure the directory structure looks something like:

|--ILSVRC2015
|----Annotations
|------DET
|--------train
|--------val
|------VID
|--------train
|--------val
|----Data
|------DET
|--------train
|--------val
|------VID
|--------train
|--------val
|----ImageSets
|------DET
|------VID
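Before creating the soft link, a short script can confirm that the expected subdirectories exist. This is a minimal sketch: `EXPECTED_DIRS` covers only the directories in the tree above (your copy may contain more), and `missing_dirs` is a hypothetical helper.

```python
import os

# Subdirectories from the tree above, relative to the ILSVRC2015 root.
EXPECTED_DIRS = [
    os.path.join(top, split, part)
    for top in ("Annotations", "Data")
    for split in ("DET", "VID")
    for part in ("train", "val")
] + [os.path.join("ImageSets", split) for split in ("DET", "VID")]


def missing_dirs(ilsvrc_root):
    """Return the expected subdirectories that do not exist under ilsvrc_root."""
    return [d for d in EXPECTED_DIRS
            if not os.path.isdir(os.path.join(ilsvrc_root, d))]
```

For example, `missing_dirs(os.path.join(os.environ["DATAPATH"], "ILSVRC2015"))` should return an empty list when the layout matches.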

Create a soft link under pytorch-detect-and-track/data:

ln -s $DATAPATH/ILSVRC2015 ./ILSVRC

Create a directory called pytorch-detect-and-track/data/pretrained_model, and place the pretrained models into this directory.

Before training, set the correct directory to save and load the trained models. The default is ./output/models. Change the arguments "save_dir" and "load_dir" in trainval_net.py and test_net.py to adapt to your environment.

To train an RFCN D&T model with resnet-101 on Imagenet VID+DET, simply run:

CUDA_VISIBLE_DEVICES=0,1 python trainval_net.py \
    --cuda \
    --mGPUs \
    --nw 12 \
    --dataset imagenet_vid+imagenet_det \
    --cag \
    --lr 1e-4 \
    --bs 2 \
    --lr_decay_gamma=0.1 \
    --lr_decay_step 3 \
    --epochs 10 \
    --use_tfboard True

where --bs is the batch size, --cag is a flag for class-agnostic bbox regression, and --lr, --lr_decay_gamma, and --lr_decay_step are the learning rate, the factor by which the learning rate is decreased, and the number of epochs before each decay, respectively. Set --bs, --nw (the number of data-loading workers; check the available cores with the Linux nproc command), and --mGPUs according to the number of GPUs you wish to train on and their memory size. On 2 Titan Xp GPUs with 12 GB of memory each, the batch size can be up to 2 (4 images, 2 per GPU).
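To see what these flags imply, here is a sketch of plain step decay together with the images-per-batch arithmetic. It assumes the rate is multiplied by lr_decay_gamma once every lr_decay_step epochs; the exact epoch at which trainval_net.py applies the decay may differ. PyTorch's built-in torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma) implements the same kind of schedule. `step_decay_lr` and `images_per_batch` are hypothetical helpers.

```python
def step_decay_lr(base_lr, epoch, decay_step, gamma):
    """Learning rate for a 1-indexed epoch under plain step decay.

    Epochs 1..decay_step use base_lr, the next decay_step epochs use
    base_lr * gamma, and so on.
    """
    return base_lr * gamma ** ((epoch - 1) // decay_step)


def images_per_batch(batch_size, frames_per_entry=2):
    """Each roidb entry holds `frames_per_entry` contiguous frames."""
    return batch_size * frames_per_entry
```

Under this assumption, the command above (--lr 1e-4, --lr_decay_step 3, --lr_decay_gamma=0.1, --epochs 10) would run epochs 1 to 3 at 1e-4, epochs 4 to 6 at 1e-5, epochs 7 to 9 at 1e-6 and epoch 10 at 1e-7, and --bs 2 corresponds to 4 images per iteration.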

Authorship

Contributions to this project have been made by Thomas Balestri and Jugal Sheth.

pytorch-detect-to-track's People

Contributors

feynman27


pytorch-detect-to-track's Issues

indentation problem

Hello, thank you for this work!!
When reading this code, I found indentation problems in some files, such as resnet.py and faster_rcnn.py. In VS Code or Notepad++ they display like this:
[Screenshot omitted: QQ screenshot 20190717154135]
This seems to be caused by the indentation characters used in the code: it displays correctly on GitHub but not in my editor. No errors seem to be reported when the program runs, but I'm not sure what else it may affect; I just wanted to mention it.
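One plausible cause, offered as an assumption rather than a diagnosis, is a mix of tabs and spaces. Python 2 expands a tab to the next multiple of 8 columns, so such a file can run without errors while looking misaligned in an editor that shows tabs 4 columns wide; Python 3 raises TabError when the meaning of the indentation depends on the tab width. The standard library's `python -m tabnanny <file>` reports ambiguous indentation, and this sketch (`tab_indented_lines` is a hypothetical helper) simply lists lines whose leading whitespace contains a tab.

```python
import io
import sys


def tab_indented_lines(path):
    """Return the 1-based numbers of lines whose leading whitespace contains a tab."""
    hits = []
    with io.open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            # Leading whitespace = everything before the first non-space/tab char.
            indent = line[:len(line) - len(line.lstrip(" \t"))]
            if "\t" in indent:
                hits.append(lineno)
    return hits


if __name__ == "__main__":
    for path in sys.argv[1:]:
        lines = tab_indented_lines(path)
        if lines:
            print("{}: tab indentation on lines {}".format(path, lines))
```

Save it under any name (for example a hypothetical find_tabs.py) and pass the suspect source files as arguments.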

Unable to run demo.py?

Running on a Titan X (12 GB VRAM). After processing 4 frames, it throws a CUDA runtime error.
Do I need to update something in the config settings? Is there any other way to process/test video files?

OR

How do I run demo.py with a custom list of videos?

Thanks in advance.

pred_trk_boxes

Hello @Feynman27 ,

Thank you for sharing your work. I would like to ask: why don't you use the information given by pred_trk_boxes for the IoU in online_tubes.py?

Something about "_smooth_l1_loss"

@Feynman27

When I run trainval_net.py:

Traceback (most recent call last):
  File "/media/hp208/4t/zhaoxingjie/graduation_project/d2t/pytorch-detect-to-track/trainval_net.py", line 367, in
    rois_label, tracking_loss_bbox = RFCN(im_data, im_info, gt_boxes, num_boxes)
  File "/media/hp208/4t/soft/anaconda3/envs/D2T2D/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/hp208/4t/zhaoxingjie/graduation_project/d2t/pytorch-detect-to-track/lib/model/faster_rcnn/rfcn.py", line 205, in forward
    tracking_rois_outside_ws)
  File "/media/hp208/4t/zhaoxingjie/graduation_project/d2t/pytorch-detect-to-track/lib/model/utils/net_utils.py", line 76, in _smooth_l1_loss
    box_diff = bbox_pred - bbox_targets
RuntimeError: The size of tensor a (124) must match the size of tensor b (4) at non-singleton dimension 1

The dimension of tracking_pred is 30*124, but tracking_rois_target is 30*4, in function _smooth_l1_loss:

    import torch


    def _smooth_l1_loss(bbox_pred, bbox_targets, bbox_inside_weights, bbox_outside_weights, sigma=1.0, dim=[1]):
        sigma_2 = sigma ** 2
        # Elementwise difference: bbox_pred and bbox_targets must have matching shapes
        box_diff = bbox_pred - bbox_targets
        in_box_diff = bbox_inside_weights * box_diff
        abs_in_box_diff = torch.abs(in_box_diff)
        # 1 in the quadratic region (|x| < 1/sigma^2), 0 in the linear region
        smoothL1_sign = (abs_in_box_diff < 1. / sigma_2).detach().float()
        in_loss_box = torch.pow(in_box_diff, 2) * (sigma_2 / 2.) * smoothL1_sign \
                      + (abs_in_box_diff - (0.5 / sigma_2)) * (1. - smoothL1_sign)
        out_loss_box = bbox_outside_weights * in_loss_box
        loss_box = out_loss_box
        # Sum over the listed dimensions (highest first), then average the rest
        for i in sorted(dim, reverse=True):
            loss_box = loss_box.sum(i)
        loss_box = loss_box.mean()
        return loss_box

In box_diff = bbox_pred - bbox_targets, the two shapes do not match.
Can you explain this? Thanks a lot!
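For reference, the numbers in this report are consistent with class-specific regression: 124 = 31 * 4, i.e. 4 deltas for each of the 30 ImageNet VID classes plus background, while the targets are class-agnostic (30 * 4). Assuming that reading, the shapes could be reconciled either by predicting class-agnostic deltas (the --cag flag controls class-agnostic bbox regression) or by selecting the 4 deltas of each RoI's ground-truth class before calling _smooth_l1_loss. This sketch shows the second option with torch.gather, following the pattern jwyang/faster-rcnn.pytorch uses for class-specific box regression; `select_class_deltas` is a hypothetical helper, not a function in this repository.

```python
import torch


def select_class_deltas(bbox_pred, labels):
    """Pick the 4 regression deltas belonging to each RoI's class.

    bbox_pred: (R, 4 * num_classes) class-specific deltas.
    labels:    (R,) long tensor of class indices (0 = background).
    Returns:   (R, 4), the same shape as class-agnostic targets.
    """
    num_rois = bbox_pred.size(0)
    # View as (R, num_classes, 4) so each class owns one row of 4 deltas.
    per_class = bbox_pred.view(num_rois, bbox_pred.size(1) // 4, 4)
    # Index tensor (R, 1, 4) repeating each RoI's label across the 4 deltas.
    index = labels.view(num_rois, 1, 1).expand(num_rois, 1, 4)
    return torch.gather(per_class, 1, index).squeeze(1)
```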

Loss when training with custom dataset (2 classes) dives to nan

Hello @Feynman27 @cclauss @jwyang @jiasenlu @albanie @alex-birch.
I tried to run the code on a grayscale (infrared) dataset with 2 classes (background and positive). After some simple modifications to the number of classes and the width/height, I ran the code, but from the first iterations I got huge loss values, followed by NaNs. I also noticed that the bbox predictions are negative. I use a pre-trained ResNet-101 and train the RFCN from scratch. Any advice would be highly appreciated.

Classfication and Box Regression based on RoI Pooling and RoI tracking ?

Hi @Feynman27 ,
Thanks for the effort of translating Detect to Track to pytorch!

I was wondering whether, to predict the classification and box regression for frame t, you use only the RoI pooling output from frame t. In the paper, it appears that they combine the RoI pooling outputs from both time t and t+tau, as well as the RoI tracking output. However, in the RFCN code (Line 108) it seems that you compute the classification and regression from the output of just one leg of the network.

Have I understood something wrong? Thanks!

input vid_list format

Hello! I wonder what the input argument --vid_list looks like in demo.py. I'd appreciate it if you could give me an example. Also, how can I get the imdb file of the ILSVRC training data for training? Looking forward to your reply!
Best wishes!

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use Python 3.4+ and the 'spawn' start method

Did I miss some configs? In the README you said the project requires Python 2.7.
The complete error is here:
Traceback (most recent call last):
  File "trainval_net.py", line 342, in <module>
    data = next(vid_data_iter)
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 281, in __next__
    return self._process_next_batch(batch)
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 55, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/pytorch-detect-to-track/lib/roi_data_layer/roibatchLoader.py", line 218, in __getitem__
    num_boxes.append(torch.LongTensor([min(gt_boxes[ientry].size(0), self.max_num_box)]).cuda())
  File "/root/anaconda2/lib/python2.7/site-packages/torch/_utils.py", line 69, in _cuda
    return new_type(self.size()).copy_(self, async)
  File "/root/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 384, in _lazy_new
    _lazy_init()
  File "/root/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 140, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use Python 3.4+ and the 'spawn' start method
Thank you.
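The traceback points at a `.cuda()` call inside `roibatchLoader.__getitem__`, which runs in forked DataLoader worker processes; CUDA cannot be re-initialized in a forked child once the parent process has initialized it. A common workaround, sketched here under that assumption, is to return CPU tensors from the dataset and move each collated batch to the GPU in the main process. Alternatively, under Python 3.4+ the 'spawn' start method named in the error can be selected with torch.multiprocessing.set_start_method('spawn'). `ToyRoiDataset` and `batches_to_device` are hypothetical stand-ins, not the repository's classes.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyRoiDataset(Dataset):
    """Hypothetical stand-in for roibatchLoader that returns CPU tensors only."""

    def __init__(self, num_items=8, max_num_box=20):
        self.num_items = num_items
        self.max_num_box = max_num_box

    def __len__(self):
        return self.num_items

    def __getitem__(self, idx):
        gt_boxes = torch.zeros(idx + 1, 5)  # fake (x1, y1, x2, y2, cls) rows
        # No .cuda() here: this method may run inside forked worker processes.
        return torch.LongTensor([min(gt_boxes.size(0), self.max_num_box)])


def batches_to_device(loader, use_cuda):
    """Yield collated batches, moving them to the GPU in the main process."""
    for num_boxes in loader:
        if use_cuda:
            num_boxes = num_boxes.cuda()
        yield num_boxes
```

In a training loop this would be used as `for batch in batches_to_device(DataLoader(ToyRoiDataset(), batch_size=2, num_workers=2), use_cuda=True): ...`, so the workers only ever touch CPU memory.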
