
License: CC BY-NC-SA 4.0 · Python 3.6

STEP: Spatio-Temporal Progressive Learning for Video Action Detection

[Paper] [Supp] [YouTube] [Poster]

STEP: Spatio-Temporal Progressive Learning for Video Action Detection, CVPR 2019 (Oral)
Xitong Yang, Xiaodong Yang, Ming-Yu Liu, Fanyi Xiao, Larry Davis, Jan Kautz

STEP is a fully end-to-end action detector that performs detection directly from a handful of initial proposals, without relying on an extra person detector.

Table of contents

  • Getting Started
      • Installation
      • (Optional) Demo
  • Training on AVA Dataset
      • Dataset Preparation
      • Testing
      • Training
      • Tips
  • Citation
  • Related Work
  • License

Getting Started

Installation

  • Prerequisites: Python 3.6, NumPy, OpenCV
  • Install PyTorch (>= 1.1.0) and torchvision (>= 0.2.1)
  • (Optional) Install APEX for half-precision (fp16) training; you can skip this step if you do not need fp16:
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext
  • Clone this repo:
git clone https://github.com/NVlabs/STEP.git
cd STEP/
  • Install external packages (for RoI pooling/align and NMS):
python setup.py build develop

(Optional) Demo

Try STEP on your own video data! Our model pre-trained on the AVA dataset can effectively detect common actions (e.g., stand, sit, walk, run, talk to, etc.) in general videos.

First, extract frames of your own videos and organize them in datasets/demo/frames/ as follows:

|-- frames/
|   |-- <video_id1>/
|       |-- frame0000.jpg
|       |-- frame0001.jpg
|       |-- ...
|   |-- <video_id2>/
|   |-- ...
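
If you have not extracted frames yet, a minimal ffmpeg sketch is shown below (assuming an input video my_video.mp4 and the zero-padded JPEG naming from the layout above; adjust the output pattern if your filenames differ):

mkdir -p datasets/demo/frames/my_video
ffmpeg -i my_video.mp4 -qscale:v 2 -start_number 0 datasets/demo/frames/my_video/frame%04d.jpg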

Second, modify the file demo.py (a hypothetical sketch of these settings follows the list):

  • checkpoint_path: the path to the trained STEP model. You can use a model you trained yourself (see Training) or our trained model, downloadable from Google Drive or Baidu Disk.
  • args.data_root: the path to your video frames (default: datasets/demo/frames/)
  • source_fps: the frame rate of your own videos
  • (optional) conf_thresh and global_thresh: thresholds for confidence scores and global NMS; tune these for better visualization
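
For reference, a hypothetical configuration inside demo.py might look like the following (the variable names come from the list above; the values are placeholders to adapt to your setup):

checkpoint_path = 'pretrained/ava_step.pth'   # path to the trained STEP model
args.data_root = 'datasets/demo/frames/'      # root folder of your extracted frames
source_fps = 30                               # frame rate of your own videos
conf_thresh = 0.4                             # (optional) confidence-score threshold for visualization
global_thresh = 0.8                           # (optional) threshold used by the global NMS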

Finally, run the script for action detection:

python demo.py

The detection results and visualization will be saved in datasets/demo/results/ by default.

Training on AVA Dataset

Dataset Preparation

Download AVA. Note that our code uses AVA v2.1.

Put all the annotation-related files into the folder datasets/ava/label/. Convert the original CSV annotation files to pickle files:

python scripts/generate_label.py <path_to_train_csv>
python scripts/generate_label.py <path_to_val_csv>
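
For example, assuming the standard AVA v2.1 annotation filenames and that they were placed in datasets/ava/label/ as described above:

python scripts/generate_label.py datasets/ava/label/ava_train_v2.1.csv
python scripts/generate_label.py datasets/ava/label/ava_val_v2.1.csv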

Extract frames from the downloaded videos and store them in datasets/ava/frames/. You can check out the code scripts/extract_clips.py for the process (ffmpeg is required).

The extracted frames are organized as follows:

|-- frames/
|   |-- <video_id>/
|       |-- <timestamp>/ 
|           |-- <frame_id>

Each folder <timestamp>/ contains the frames within the 1-second interval starting at that timestamp (for example, the first frame 00000.jpg in the folder 01000/ corresponds to the frame exactly at timestamp 1000). This organization allows precise alignment with the AVA annotations: the annotation at a certain timestamp corresponds to the first frame in the folder of that timestamp. Since the annotations are provided at timestamps 902–1798 inclusive, it is sufficient to extract frames only for timestamps 900 to 1800.
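
If you prefer not to use scripts/extract_clips.py, the layout above can also be reproduced with a plain ffmpeg loop. The following is only a rough sketch under stated assumptions (bash, a video at datasets/ava/videos/<video_id>.mp4, extraction at the native frame rate); check scripts/extract_clips.py for the exact settings used in our pipeline:

video_id=<video_id>
video=datasets/ava/videos/${video_id}.mp4
for t in $(seq 900 1799); do
    out=datasets/ava/frames/${video_id}/$(printf '%05d' "$t")
    mkdir -p "$out"
    # extract the 1-second interval starting at timestamp t;
    # the first frame written (00000.jpg) is the frame at exactly t
    ffmpeg -ss "$t" -i "$video" -t 1 -qscale:v 2 -start_number 0 "$out/%05d.jpg"
done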

You can store the dataset and annotations in other directories; if so, you need to modify the default paths in the training scripts, as described in the next section.

Testing

We provide our trained models to reproduce the results reported in our paper. You can download the weights from Google Drive or Baidu Disk and put them in pretrained/.

Run the following command for testing and evaluation on the validation set of AVA:

python test.py

The output will be stored in datasets/ava/cache/STEP-max3-i3d-two_branch/.

STEP achieves 20.2% mAP on AVA v2.1 with this implementation (as updated in the arXiv version).

Training

As the classification task on the AVA dataset is challenging, we perform classification pre-training on AVA using the ground-truth annotations before training the detection models. Our classification pre-trained weights (mAP = 26.4%) can be downloaded from Google Drive or Baidu Disk; put them in pretrained/.

Now we are ready to train STEP, using the following script:

cd scripts
bash train_step.sh

Note that you need to modify data_root, save_root and pretrain_path if your data or weights are stored elsewhere.

You can train STEP in half precision (fp16) by adding the --fp16 flag at the end of the command in the script file scripts/train_step.sh (APEX is required for fp16 training); a hypothetical sketch of the resulting command follows.
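
The exact contents of scripts/train_step.sh are not reproduced here; purely as a hypothetical illustration (the flag names below are assumptions based on the options mentioned above and are not verified against train.py), the command inside the script might end up looking like:

python train.py \
    --data_root <path_to_ava_frames> \
    --save_root <path_to_output_dir> \
    --pretrain_path <path_to_classification_pretrained_weights> \
    --fp16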

You can also train your own pre-trained model using the following script:

cd scripts
bash train_cls.sh

In that case, you need the Kinetics-pretrained weights for the I3D network, which can be downloaded from Google Drive or Baidu Disk and put in pretrained/.

Tips

GPU memory requirement for the default setting (3 steps, 34 initial proposals, batch size 8):

  • fp32, 4 GPUs: >= 15 GB
  • fp16, 4 GPUs: >= 10 GB

Citation

Please cite this paper if it helps your research:

@inproceedings{cvpr2019step,
   title={STEP: Spatio-Temporal Progressive Learning for Video Action Detection},
   author={Yang, Xitong and Yang, Xiaodong and Liu, Ming-Yu and Xiao, Fanyi and Davis, Larry S and Kautz, Jan},
   booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
   year={2019}
}

Related Work

In the folder external/, we modify the code from ActivityNet for parsing annotation files and evaluation, and the code from maskrcnn-benchmark for RoI pooling/align and NMS. Please follow the corresponding license to use the code.

License

Copyright (C) 2019 NVIDIA Corporation. All rights reserved. Licensed under the CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike 4.0 International). The code is released for academic research use only. For commercial use, please contact [email protected].


Issues

Runtime Error: Segmentation fault

I used my own image data with demo.py and got this error, with no other information displayed.

I have located the position causing the error: it is the RoI layer call. However, when I tested ROIAlign_cuda.cu using raw float* pointers instead of PyTorch tensors as parameters, no errors were raised.

My gcc version is 4.8.5. Is the gcc version critical? Any advice? Thanks.

AVA Evaluation

Thanks for your nice GitHub repo!
I want to run the evaluation for the AVA dataset, but I don't know the format of result.csv.
Could you give an example of the result.csv that your code expects?
I just want to know the format of the prediction result.csv, e.g., video name, scores, label, and so on.
Thank you!

CUDA error: an illegal memory access was encountered

Hi, thanks for your great work. When I add the --fp16 flag, I get a "CUDA error: an illegal memory access was encountered" error. However, when I remove the --fp16 flag, it works well. I have two GPUs: a GTX 1070 and a 980 Ti.
PyTorch version 1.1.0
CUDA 9.0
Thanks again.

The demo is not working

I prepared the dataset folder according to the information in the Demo section, but the demo is not working. This is what I got:

python demo.py 
Warning: If you want to use fp16, please apex with cuda support (https://github.com/NVIDIA/apex) and update pytorch to 1.0
Warning: If you want to use fp16, please apex with cuda support (https://github.com/NVIDIA/apex) and update pytorch to 1.0
Warning: If you want to use fp16, please apex with cuda support (https://github.com/NVIDIA/apex) and update pytorch to 1.0
Loading pretrain model from pretrained/ava_step.pth
Building I3D model...
Building I3D head for global branch...
Building I3D head for global branch...
Building I3D head for global branch...
Building I3D head for context branch...
Datalist len:  1439
Traceback (most recent call last):
  File "/home/bilel/anaconda3/envs/env_pytorch/lib/python3.6/queue.py", line 173, in get
    self.not_empty.wait(remaining)
  File "/home/bilel/anaconda3/envs/env_pytorch/lib/python3.6/threading.py", line 304, in wait
    self._acquire_restore(saved_state)
  File "/home/bilel/anaconda3/envs/env_pytorch/lib/python3.6/threading.py", line 251, in _acquire_restore
    def _acquire_restore(self, x):
  File "/home/bilel/anaconda3/envs/env_pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 12383) is killed by signal: Killed. 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/bilel/anaconda3/envs/env_pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 724, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/home/bilel/anaconda3/envs/env_pytorch/lib/python3.6/queue.py", line 176, in get
    return item
  File "/home/bilel/anaconda3/envs/env_pytorch/lib/python3.6/threading.py", line 243, in __exit__
    return self._lock.__exit__(*args)
RuntimeError: release unlocked lock

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "demo.py", line 228, in <module>
    main()
  File "demo.py", line 110, in main
    for _, (images, tubes, infos) in enumerate(dataloader):
  File "/home/bilel/anaconda3/envs/env_pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 804, in __next__
    idx, data = self._get_data()
  File "/home/bilel/anaconda3/envs/env_pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 761, in _get_data
    success, data = self._try_get_data()
  File "/home/bilel/anaconda3/envs/env_pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 737, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 12383) exited unexpectedly

Demo is not working, help required

Hi there,
First of all, I would like to congratulate you on this great work and thank you for sharing this awesome content publicly.
I am trying to run demo.py after following all the steps described in the Demo section, but unfortunately I am encountering problems: no visual results are generated in the results folder except an empty results.txt file. Below is the terminal output I am getting. The "Datalist len" is 0 and I don't know why; I provided a total of 778 frames but am not getting any output. Is there a minimum number of frames required for detection?
Please help, thanks.

python3 demo.py
Warning: If you want to use fp16, please apex with cuda support (https://github.com/NVIDIA/apex) and update pytorch to 1.0
Warning: If you want to use fp16, please apex with cuda support (https://github.com/NVIDIA/apex) and update pytorch to 1.0
Warning: If you want to use fp16, please apex with cuda support (https://github.com/NVIDIA/apex) and update pytorch to 1.0
Loading pretrain model from pretrained/ava_step.pth
Building I3D model...
Building I3D head for global branch...
Building I3D head for global branch...
Building I3D head for global branch...
Building I3D head for context branch...
Datalist len: 0

CUDA error: an illegal memory access was encountered

Hi, thanks for your great work.
I am training on my own dataset, which has ten classes, fps = 1, and I do not add the --fp16 flag.
max_iter=2
batch_size=2

But when training starts, the error occurs during the third iteration; the first and second iterations are fine, i.e., the model can run forward, backward and optimizer.step without problems. When the third iteration starts, the following error is thrown:
Traceback (most recent call last):
File "train.py", line 602, in
main()
File "train.py", line 235, in main
train(args, nets, optimizer, scheduler, train_dataloader, val_dataloader, log_file)
File "train.py", line 362, in train
optimizer.step()
File "/usr/local/lib/python3.6/dist-packages/torch/optim/lr_scheduler.py", line 51, in wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/optim/adam.py", line 103, in step
denom = (exp_avg_sq.sqrt() / math.sqrt(bias_correction2)).add_(group['eps'])
RuntimeError: CUDA error: an illegal memory access was encountered

what is the meaning of “chunks”?

Thank you for sharing! I am trying to understand the meaning of "chunks". Is it related to "T"? Looking forward to your reply!
for i in range(1, exec_iter+1):    # index from 1
    # adaptively get the start chunk
    chunks = args.NUM_CHUNKS[i]
    T_start = int((args.NUM_CHUNKS[args.max_iter] - chunks) / 2) * args.T
    T_length = chunks * args.T
    chunk_idx = [j*args.T + int(args.T/2) for j in range(chunks)]    # used to index the middle frame of each chunk
    half_T = int(args.T/2)

Running the STEP model on a CPU

Hello,
I'm currently attempting to run the demo script of STEP on a CPU instead of a GPU. I made some modifications to the code accordingly, but unfortunately I encountered an error. After replacing the GPU instructions, I received the following error message:
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for BaseNet:
Missing key(s) in state_dict: "base_model.0.conv3d.weight", "base_model.0.batch3d.weight", "base_model.0.batch3d.bias", "base_model.0.batch3d.running_mean", ....
I would greatly appreciate any assistance in resolving this issue. If anyone could help me understand and correct the problem, it would be very helpful.

Thank you in advance.

pre-defined proposals

Hello, I want to know how to generate the pre-defined proposals.
I cannot find any code for generating the pre-defined proposals in the whole project. So are the pre-defined proposals generated independently and then fed into the two-branch network?

UCF dataset

Can you tell me which code is used for the UCF dataset?

Something goes wrong when I try to train on my own dataset

When I train on my own dataset, it always reports errors like:
ValueError: Image path /STEP/datasets/ava/frames/D01/00922/ not Found
Each video in my dataset has only 20 clips, from 00902 to 00921, and fps equals 8.
I guess this error is caused by the following code:
if self.input_type == "rgb": images = read_images(self.imgpath_rgb, videoname, fid, num=TEM_REDUCE*self.T*self.chunks, fps=self.fps)
I want to know the meaning of num=TEM_REDUCE*self.T*self.chunks. Is num related to fps? Do you have any suggestions for my training? Thanks!

Training on Custom Dataset

Thanks for sharing such nice work. I want to train a model using your code. Is it necessary to convert my dataset into the AVA dataset format? If so, what tools or steps can be used, and what pre-processing do I need to produce the AVA format? Is there any other way to train on my dataset?

I have dataset now as follow

Dataset/
├── Class-A
│ └── insp.mp4
├── Class-B
│ └── kep.mp4
└── Class-C
└── tak.mp4

Demo works the first time but not the second

I ran the demo once in Google Colab and it worked, but when I try a second time the demo fails. This is what I got:

Warning: If you want to use fp16, please apex with cuda support (https://github.com/NVIDIA/apex) and update pytorch to 1.0
Warning: If you want to use fp16, please apex with cuda support (https://github.com/NVIDIA/apex) and update pytorch to 1.0
Warning: If you want to use fp16, please apex with cuda support (https://github.com/NVIDIA/apex) and update pytorch to 1.0
Loading pretrain model from /content/drive/MyDrive/STEP/pretrained/ava_step.pth
Building I3D model...
Building I3D head for global branch...
Building I3D head for global branch...
Building I3D head for global branch...
Building I3D head for context branch...
Datalist len: 978
THCudaCheck FAIL file=/content/drive/My Drive/STEP/external/maskrcnn_benchmark/csrc/cuda/ROIAlign_cuda.cu line=321 error=209 : no kernel image is available for execution on the device
Traceback (most recent call last):
  File "demo.py", line 229, in <module>
    main()
  File "demo.py", line 122, in main
    history, _ = inference(args, conv_feat, context_feat, nets, args.max_iter, tubes)
  File "/content/drive/My Drive/STEP/utils/utils.py", line 48, in inference
    pooled_feat = nets['roi_net'](conv_feat[:, T_start:T_start+T_length].contiguous(), flat_tubes)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/STEP/models/networks.py", line 45, in forward
    tubes.view(-1, 5).detach())
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/STEP/external/maskrcnn_benchmark/roi_layers/roi_align.py", line 92, in forward
    input, rois, self.output_size, self.spatial_scale, self.sampling_ratio
  File "/content/drive/My Drive/STEP/external/maskrcnn_benchmark/roi_layers/roi_align.py", line 53, in forward
    output = _C.roi_align_forward(input, roi, spatial_scale, output_size[0], output_size[1], sampling_ratio)
RuntimeError: cuda runtime error (209) : no kernel image is available for execution on the device at /content/drive/My Drive/STEP/external/maskrcnn_benchmark/csrc/cuda/ROIAlign_cuda.cu:321

Baidu Disk link is not working

Hi, thank you for your work on action detection. I want to try this code, but the pre-trained model cannot be found on Baidu Disk. If you have time, could you upload it again?

Looking forward to your reply.

Mr. Zhang

TypeError: '<' not supported between instances of 'NoneType' and 'int'

Thank you for sharing the code of your wonderful work.
After setting up the environment, I tried demo.py on my own video clip of 64 frames.
I am getting the above-mentioned error on the following line:
for _, (images, tubes, infos) in enumerate(dataloader):
Have you ever encountered this error? What could be a possible solution?
Thanks.

A problem when training

RuntimeError: A tensor was not on the same device as the first tensor

I run into this error when running train.py.
