
actiondetection-afsd's Introduction

AFSD: Learning Salient Boundary Feature for Anchor-free Temporal Action Localization

This is an official implementation in PyTorch of AFSD. Our paper is available at https://arxiv.org/abs/2103.13137

Updates

  • (May, 2021) Released training and inference code for ActivityNet v1.3: [ANET_README]
  • (May, 2021) Released AFSD training and inference code for the THUMOS14 dataset.
  • (February, 2021) AFSD was accepted to CVPR 2021.

Abstract

Temporal action localization is an important yet challenging task in video understanding. Typically, such a task aims at inferring both the action category and the localization of the start and end frames for each action instance in a long, untrimmed video. While most current models achieve good results by using pre-defined anchors and numerous actionness scores, such methods are burdened with both a large number of outputs and heavy tuning of the locations and sizes corresponding to different anchors. Anchor-free methods, by contrast, are lighter and free of redundant hyper-parameters, but have received little attention. In this paper, we propose the first purely anchor-free temporal localization method, which is both efficient and effective. Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module that gathers more valuable boundary features for each proposal with a novel boundary pooling, and (iii) several consistency constraints that ensure our model can find the accurate boundary given arbitrary proposals. Extensive experiments show that our method beats all anchor-based and actionness-guided methods by a remarkable margin on THUMOS14, achieving state-of-the-art results, and obtains comparable results on ActivityNet v1.3.

Summary

  • First purely anchor-free framework for the temporal action detection task.
  • Fully end-to-end method using frames as input rather than features.
  • Saliency-based refinement module to gather more valuable boundary features.
  • Boundary consistency learning to make sure the model can find accurate boundaries.

Performance

Getting Started

Environment

  • Python 3.7
  • PyTorch == 1.4.0 (please make sure your PyTorch version is 1.4)
  • NVIDIA GPU

Setup

pip3 install -r requirements.txt
python3 setup.py develop

Data Preparation

  • THUMOS14 RGB data:
  1. Download pre-processed RGB npy data (13.7GB): [Weiyun]
  2. Unzip the RGB npy data to ./datasets/thumos14/validation_npy/ and ./datasets/thumos14/test_npy/
  • THUMOS14 flow data:
  1. Because generating flow data for THUMOS14 takes considerable time, we provide the pre-processed flow data in Google Drive and Weiyun (3.4GB) to make it easy to run the flow model: [Google Drive], [Weiyun]
  2. Unzip the flow npy data to ./datasets/thumos14/validation_flow_npy/ and ./datasets/thumos14/test_flow_npy/ (a quick sanity check is sketched below)
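
After unzipping, a quick way to confirm the files landed where the configs expect them is to load one of them. This is a minimal sanity-check sketch, not part of the repo, and the exact array shape depends on the preprocessing:

import glob
import numpy as np

# List the unzipped RGB npy files and load one to check it is readable.
files = sorted(glob.glob('./datasets/thumos14/validation_npy/*.npy'))
print(len(files), 'validation npy files found')
sample = np.load(files[0])
print(files[0], sample.shape, sample.dtype)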

If you want to generate npy data by yourself, please refer to the following guidelines:

  • Manual RGB data generation:
  1. To construct THUMOS14 RGB npy inputs, please download the THUMOS14 training and testing videos.
    Training videos: https://storage.googleapis.com/thumos14_files/TH14_validation_set_mp4.zip
    Testing videos: https://storage.googleapis.com/thumos14_files/TH14_Test_set_mp4.zip
    (unzip password is THUMOS14_REGISTERED)
  2. Move the training videos to ./datasets/thumos14/validation/ and the testing videos to ./datasets/thumos14/test/
  3. Run the data processing script: python3 AFSD/common/video2npy.py configs/thumos14.yaml
  • Manual flow data generation:
  1. If you need to generate flow data manually, first install denseflow.
  2. Prepare the pre-processed RGB data.
  3. Check and run the script: python3 AFSD/common/gen_denseflow_npy.py configs/thumos14_flow.yaml

Inference

We provide pretrained models, including the I3D backbone and the final RGB and flow models for the THUMOS14 dataset: [Google Drive], [Weiyun]

# run RGB model
python3 AFSD/thumos14/test.py configs/thumos14.yaml --checkpoint_path=models/thumos14/checkpoint-15.ckpt --output_json=thumos14_rgb.json

# run flow model
python3 AFSD/thumos14/test.py configs/thumos14_flow.yaml --checkpoint_path=models/thumos14_flow/checkpoint-16.ckpt --output_json=thumos14_flow.json

# run fusion (RGB + flow) model
python3 AFSD/thumos14/test.py configs/thumos14.yaml --fusion --output_json=thumos14_fusion.json

Evaluation

The output JSON results of the pretrained models can be downloaded from: [Google Drive], [Weiyun]

# evaluate THUMOS14 fusion result as example
python3 AFSD/thumos14/eval.py output/thumos14_fusion.json

mAP at tIoU 0.3 is 0.6728296149479254
mAP at tIoU 0.4 is 0.6242590551201842
mAP at tIoU 0.5 is 0.5546668739091394
mAP at tIoU 0.6 is 0.4374840824921885
mAP at tIoU 0.7 is 0.3110112542745055
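
For reference, the commonly reported Avg mAP over tIoU thresholds 0.3-0.7 is just the mean of the five numbers printed above; a two-line check (plain arithmetic, not a repo script):

maps = [0.6728, 0.6243, 0.5547, 0.4375, 0.3110]
print(sum(maps) / len(maps))  # ~0.520, i.e. roughly 52.0% average mAP for the fusion result above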

Training

# train the RGB model
python3 AFSD/thumos14/train.py configs/thumos14.yaml --lw=10 --cw=1 --piou=0.5

# train the flow model
python3 AFSD/thumos14/train.py configs/thumos14_flow.yaml --lw=10 --cw=1 --piou=0.5

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@InProceedings{Lin_2021_CVPR,
    author    = {Lin, Chuming and Xu, Chengming and Luo, Donghao and Wang, Yabiao and Tai, Ying and Wang, Chengjie and Li, Jilin and Huang, Feiyue and Fu, Yanwei},
    title     = {Learning Salient Boundary Feature for Anchor-free Temporal Action Localization},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {3320-3329}
}

actiondetection-afsd's People

Contributors

linchuming

actiondetection-afsd's Issues

custom data

Hello, when applying this framework to custom data, how should the data format be constructed? For example, when each video contains a single action from start to end.

long video training

In ActivityNet, each video contains only one action. If I want to train & test on a long video that contains multiple action instances, how should I do it?

about feature extraction

Hi, have you tried using pre-extracted I3D features? Since this method involves fine-tuning the I3D model,
it may result in an unfair comparison with other methods.

confidence score for long videos

Hi,
I checked your code on very long videos and got a lot of false positives. What would be the best values for conf_thresh and top_k?

support for multi-GPU

While reproducing the code I found that this repo does not support multi-GPU training, so I am posting my own workaround here; I hope the author can release an updated multi-GPU version.
Training uses 4 V100 GPUs; the modification is in:
train.py -> def forward_one_epoch(net, clips, targets, scores=None, training=True, ssl=True):

if training:
    if ssl:
        tar = targets[0]
        # stack the single target so each of the 4 GPUs receives its own proposal tensor
        pro = torch.stack([tar, tar, tar, tar], dim=0)
        output_dict = net(clips, proposals=pro, ssl=ssl)
    else:
        output_dict = net(clips, ssl=False)
        # keep only the first 126 priors (the issue author's workaround for the per-GPU outputs being concatenated)
        output_dict['priors'] = output_dict['priors'][0:126]
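
For reference, the standard PyTorch way to split a batch across several GPUs is nn.DataParallel; this is a hedged, generic suggestion rather than something this repo supports out of the box, since it depends on how the model returns priors and proposals:

import torch

# Hypothetical multi-GPU wrapping; 'net' is the model built in train.py.
net = torch.nn.DataParallel(net, device_ids=[0, 1, 2, 3]).cuda()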

setup.py in 3090

Hello, on my own machine (CUDA 11.2) I can run setup.py and the subsequent programs, but when training on an RTX 3090 server (CUDA 11.1 / 11.4), boundary_max_pooling_cuda always raises: cuda runtime error (209): no kernel image is available for execution on the device.
I have tried many combinations of torch and CUDA versions, but it does not seem to be a version mismatch.
Could you help me? Thank you.

Why we use the function ScaleExp() in BDNet.py?

I checked the code carefully against formulas (1) and (3) in the paper, but I could not understand why ScaleExp() is used. In the code, "l_segment = new_priors - segments[:, :, :1]". Do we divide both sides of formula (3) by 2^l? Thank you!
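
For context, ScaleExp-style layers in anchor-free detectors (e.g. FCOS) usually apply a learnable per-level scale followed by exp, so that unbounded regression outputs map to positive offsets and each pyramid level can learn its own range. A generic sketch of that pattern, not necessarily identical to the module in BDNet.py:

import torch
import torch.nn as nn

class ScaleExp(nn.Module):
    # Learnable scalar multiplier followed by exp (the common FCOS-style pattern).
    def __init__(self, init_value=1.0):
        super().__init__()
        self.scale = nn.Parameter(torch.tensor(init_value))

    def forward(self, x):
        return torch.exp(x * self.scale)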

Architecture of Pyramid Feature Network

Hello,

First, I'd like to thank you very much for publishing the code, and congratulations on the CVPR'21 paper.

I would like to have an overview of the architecture of the pyramid feature module in your pipeline; it is noted to be included in the supplementary material, but unfortunately I cannot get access to it.
Could you please share the pdf file of supplementary?

CUDA_runtime error (98)

Thank you very much for open-sourcing this work! When running the code I get CUDA runtime error (98).
The error is raised at: AFSD/prop_pooling/boundary_max_pooling_kernel.cu:110
I suspect the CUDA extension is the problem.
My environment:
pytorch 1.4.0
torchvision 0.5.0
cuda: 10.0
Also, is there a CPU version of boundary_max_pooling_kernel? Thank you very much!

about activitynet1.3

@linchuming Hello, when I run python3 AFSD/anet_data/video2npy.py THREAD_NUM to generate the RGB npy input data, I hit a problem: when the total duration of a sampled video exceeds 1 minute, ret, frame = cap.read() returns ret = False, while count = cap.get(cv2.CAP_PROP_FRAME_COUNT) is 770. But for videos with the same count of 770 whose total duration is under 1 minute, ret is True. I don't know what the problem is; could you help me? Another strange phenomenon: when I download the videos that cannot be read correctly to my local laptop, they can all be read there.

rescale flow to [-1,1]

In gen_denseflow_npy.py:
"Following I3D data preprocessing, for the flow stream, we convert the videos to grayscale, and pixel values are truncated to the range [-20, 20], then rescaled between -1 and 1. We only use the first two output dimensions, and apply the same cropping as for RGB."
But I don't see the operation "rescale flow from [-20, 20] to [-1, 1]".
Thanks for your work.
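
For illustration, the truncate-and-rescale step described in that quote would look roughly like the sketch below; this is a hedged guess at where the operation belongs, since gen_denseflow_npy.py may apply the normalization elsewhere in the pipeline (e.g. when the npy data is loaded):

import numpy as np

def normalize_flow(flow):
    # flow: raw optical flow values (only the first two channels are used)
    flow = np.clip(flow, -20.0, 20.0)  # truncate to [-20, 20]
    return flow / 20.0                 # rescale to [-1, 1]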

Has anyone reproduced the results of rgb model on THUMOS14 dataset?

I have trained the AFSD rgb model on THUMOS14 dataset as described in Implementation Details, and the experiment results are as follows:

0.3 | 0.4 | 0.5 | 0.6 | 0.7 | Avg.
57.7 | 52.5 | 44.6 | 35.1 | 23.4 | 42.6

However, the results are still about 1.0 points lower than the values reported in the paper.
Could you help me figure out this problem?
Thanks a lot.

ActivityNet v1.3 data preprocessing & inference

I downloaded all the sampled video data (32.4G); the total number of these videos is 14950. But the total number of npy files I get after running step 3 is only 11171. When I run the RGB model inference I also get some FileNotFoundError like "No such file or directory: 'datasets/activitynet/train_val_npy_112/v_JDg--pjY5gg.npy'". I could use some help.

Could you provide the baseline code which only uses the "Basic Prediction Module" for your work?

I modified your code to reproduce the baseline results. I deleted all the network structure and loss functions after the "Basic Prediction Module" and got very poor results, but you report baseline results of 43.1 / 31.0 / 19.0 in table (a) of your paper. I only modified the following three .py files: BDNet.py, multisegment_loss.py and train.py.

I only kept the two loss functions called "loss_loc_val" and "loss_conf_val" and deleted the others, and I got 0.05981531185934582, 0.029753291292292032 and 0.008829597885094938.

I don't know how you achieved the baseline results in your paper.

Data download links

Hi,

Could you please provide the download links for the THUMOS14 RGB data numpy files instead of the Weiyun link provided here? I am not able to access the link https://share.weiyun.com/bP62lmHj.

Something either on GDrive or a link with wget access could work.

Thank you for your help!

The issue of Multi-GPU Training

Thanks for adding the code for multi-GPU training. However, changing the value of ngpu in config.py does not seem to work. For example, if ngpu=4, the program still trains only on GPU 0 instead of on GPUs 0,1,2,3.

Error during training

Thank you very much for your great work. I am getting this error while training on the THUMOS14 dataset. Can you help me?
100% 200/200 [01:02<00:00, 3.20it/s]
0% 0/7842 [01:36<?, ?it/s]
Traceback (most recent call last):
  File "AFSD/thumos14/train.py", line 279, in <module>
    run_one_epoch(i, net, optimizer, train_data_loader, len(train_dataset) // batch_size)
  File "AFSD/thumos14/train.py", line 170, in run_one_epoch
    for n_iter, (clips, targets, scores, ssl_clips, ssl_targets, flags) in enumerate(pbar):
  File "D:\anaconda\envs\yyf\lib\site-packages\tqdm\std.py", line 1195, in __iter__
    for obj in iterable:
  File "D:\anaconda\envs\yyf\lib\site-packages\torch\utils\data\dataloader.py", line 355, in __iter__
    return self._get_iterator()
  File "D:\anaconda\envs\yyf\lib\site-packages\torch\utils\data\dataloader.py", line 301, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "D:\anaconda\envs\yyf\lib\site-packages\torch\utils\data\dataloader.py", line 914, in __init__
    w.start()
  File "D:\anaconda\envs\yyf\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "D:\anaconda\envs\yyf\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\anaconda\envs\yyf\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "D:\anaconda\envs\yyf\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\anaconda\envs\yyf\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
MemoryError
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\anaconda\envs\yyf\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "D:\anaconda\envs\yyf\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

Missing default.yaml for video2npy.py

Hello! I'm new here, and I found that a file named default.yaml is missing when I try to convert videos to npy by myself. Looking forward to your reply, thanks!

Actual number of classes and class indices

Hello,

Thank you for your great work,

I found that the number of classes in the config file for the THUMOS14 dataset is the actual number of classes + 1: THUMOS14 has 20 classes while the config file is set to 21. I also tried this on my custom dataset and found that the number of classes in the config file should be set to the number of actual classes + 1; otherwise, it gives an error. So, what is that extra class? How can I recover the original class indices after the action detection is complete?
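
A common convention in detection code is that the extra class is a background/no-action class at index 0, so the original labels map to indices 1..num_classes. The sketch below assumes that convention and uses a hypothetical class-name list, so verify it against the label mapping this repo actually builds from the THUMOS14 annotations:

# Assumption: index 0 is background; the real class names come from the annotation files.
class_names = ['BaseballPitch', 'BasketballDunk']  # placeholder list for illustration
idx_to_class = {i + 1: name for i, name in enumerate(class_names)}

def decode_label(pred_idx):
    return 'background' if pred_idx == 0 else idx_to_class[pred_idx]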

An error and a warning when run setup.py

UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified.
warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe' failed with exit status 2
@linchuming

Query regarding the Input Video processing

Hi, I observed that the videos for the ANet dataset are trimmed to 768 frames, most likely to fit into GPU memory. My question is: when feeding the data to the I3D backbone, is it sent as a (batch, channel = 3, temporal = 768, height, width) tensor? Or do you break it up into windows of 16 frames and feed them in repeatedly?

ActivityNet files?

Hi @linchuming ,

 Thanks for sharing your code. I wonder whether you could also upload the npy files of the ActivityNet dataset?
 Thank you.

Error during training

Thank you so much for your great work. I receive this error when I train on my custom dataset based on THUMOS14. I followed all of your templates for data annotations. Would you please help me?

0% 0/18218 [00:00<?, ?it/s]/home/nomad/anaconda3/envs/AFSD/lib/python3.8/site-packages/torch/nn/functional.py:3103: UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
warnings.warn("The default behavior for interpolate/upsample with float scale_factor changed "
0% 17/18218 [00:11<2:03:32, 2.46it/s, loss=58.30155]/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [31,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
(the same assertion is repeated for many other threads)
0% 17/18218 [00:12<3:36:36, 1.40it/s, loss=58.30155]
Traceback (most recent call last):
  File "AFSD/thumos14/train.py", line 281, in <module>
    run_one_epoch(i, net, optimizer, train_data_loader, len(train_dataset) // batch_size)
  File "AFSD/thumos14/train.py", line 174, in run_one_epoch
    loss_ct, loss_start, loss_end = forward_one_epoch(
  File "AFSD/thumos14/train.py", line 137, in forward_one_epoch
    loss_l, loss_c, loss_prop_l, loss_prop_c, loss_ct = CPD_Loss(
  File "/home/matthew/anaconda3/envs/AFSD/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/matthew/ActionDetection-AFSD/AFSD/thumos14/multisegment_loss.py", line 254, in forward
    N = max(pos.sum(), 1)
RuntimeError: CUDA error: device-side assert triggered

Data Pre-processing for untrimmed videos on non-standard data

Hi,
Congratulations on such a nice work! Also, thank you for open-sourcing the code!
We are trying to use this code on our raw untrimmed videos and want to use this framework for temporal action localization.

We have our own non-standard data: videos of about 15 minutes on average at 30 fps and a higher resolution (~500x900). We also have multiple actions per video.

For ActivityNet, I see that the maximum number of frames is specified as 768.

Could you please suggest whether we need to split each video into clips, and what the length of each clip should be? Do we need to sample 256/768 frames uniformly (see the sketch after this message)? Or should we split clips based on the actions? Could you please point us to any starter code we could refer to?

Thanks.
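
For illustration only, uniformly sampling a fixed number of frame indices from a long clip might look like the sketch below; this is a hypothetical helper, not the repo's preprocessing code, and the actual frame count and sampling are defined by AFSD's configs:

import numpy as np

def uniform_frame_indices(total_frames, num_samples=768):
    # Pick num_samples indices evenly spread over [0, total_frames).
    return np.linspace(0, total_frames - 1, num_samples).astype(int)

idx = uniform_frame_indices(27000, 768)  # e.g. a 15-minute video at 30 fps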

thumos14_gt.json

I noticed that the number of videos in the thumos14_gt.json file is 410; it seems that 3 videos are missing from the test part. For example, 'video_test_0000270' is not in thumos14_gt.json. Does this affect the evaluation result?

about the training on ActivityNet v1.3

Based on the paper, should I use the command "python3 AFSD/anet/train_init.py configs/anet.yaml --lw=1 --cw=1 --piou=0.5" to train the network? Is lw=1 correct? Why does my loss increase when I train?

about AFSD in activitynet1.3

@linchuming Hello, you use the cuhk_val_simp_share.json file when AFSD predicts the class of proposals on the ActivityNet v1.3 dataset. Did the model that produces this json file use the temporal boundary annotations of the ActivityNet v1.3 training set when training the video classifier? Is the video classification score file predicted by the UntrimmedNet network?

about thumos14

Hello, do you remove the background data provided by the THUMOS14 dataset during training and testing?

A question about fusion

I have trained the flow model and rgb model by myself, and the results are better than the original results separately. But when I use the fusion method to test the model, the final results are even worse. How should this be explained?

Usage of UntrimmedNet Result used during post-processing of ActivityNet

Hi,

Congrats for your awesome work.

I just want to know why the UntrimmedNet result is used during post-processing. After reading your paper, it is evident that this work is a localization network (classification + proposals), so why is UntrimmedNet involved here? Isn't this network supposed to give you the action classification as well?

Thanks in advance

video2npy.py for activitynet cannot read video frames.

I am trying to extract RGB frames by following ActivityNet Readme.

However, when I run video2npy.py, it cannot read frames for some videos.
In detail, VideoCapture.read() returns False while get(cv2.CAP_PROP_FRAME_COUNT) returns 770 frames.

The videos are not scaled to 112x112. (The videos were also generated by transform_videos.py.)
One of the width and height is 112, but the other is different.
It seems that the original aspect ratio is kept during resizing.

Is that a problem? Then, how could I fix this?

what is the flow model?

Hi, you didn't mention anything about "flow" in your paper. I want to know whether you use an optical flow model or not.

What version of opencv-python are you using?

Thanks for your sharing! When I attempt to convert an mp4 file to a .npy file, some mp4 files cannot be read. I guess it's a cv2 version problem. So could you tell us what version of opencv-python you are using?
