open-mmlab / mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

Home Page: https://mmaction2.readthedocs.io

License: Apache License 2.0

Python 83.68% Dockerfile 0.09% Shell 1.27% Jupyter Notebook 14.96%

action-recognition temporal-action-localization pytorch video-understanding tsn i3d slowfast ava spatial-temporal-action-detection benchmark

mmaction2's Issues

Inconsistent variable type in generate_labels

In generate_labels

    def generate_labels(self, gt_bbox):
        """Generate training labels."""
        match_score_confidence_list = []
        match_score_start_list = []
        match_score_end_list = []
        for every_gt_bbox in gt_bbox:
            gt_iou_map = []
            for start, end in every_gt_bbox:
                start = start.numpy()
                end = end.numpy()
         ......

The variables start and end are of type numpy.float64 rather than tensors, so they have no numpy() method, which raises an error.
A simple fix is to comment out those last two lines (though that may conflict with your design).
Another fix, found in your training log, is to add gt_bbox to the keys argument of ToTensor:

train_pipeline = [
    dict(type='LoadLocalizationFeature'),
    dict(type='GenerateLocalizationLabels'),
    dict(
        type='Collect',
        keys=['raw_feature', 'gt_bbox'],
        meta_name='video_meta',
        meta_keys=['video_name']),
    # dict(type='ToTensor', keys=['raw_feature']),
    dict(type='ToTensor', keys=['raw_feature', 'gt_bbox']),
    dict(type='ToDataContainer', fields=[dict(key='gt_bbox', stack=False)])
]

Maybe it was just a simple mistake when uploading the config file :)
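
The type mismatch described above can also be guarded against at the point of conversion. The following is a minimal sketch, not code from mmaction2: `as_float` and `FakeTensor` are hypothetical names introduced only for illustration. It accepts either a tensor-like object that exposes `.numpy()` or a plain NumPy/Python scalar.

```python
def as_float(value):
    """Return a Python float from a tensor-like object or a plain scalar.

    Hypothetical helper, not part of mmaction2. Objects exposing ``.numpy()``
    (for example a 0-dim ``torch.Tensor``) are converted through it; anything
    else (``numpy.float64``, ``float``, ``int``) goes to ``float`` directly.
    """
    if hasattr(value, "numpy"):
        value = value.numpy()
    return float(value)


class FakeTensor:
    """Stand-in for a 0-dim tensor so the example runs without torch."""

    def __init__(self, v):
        self._v = v

    def numpy(self):
        return self._v


start = as_float(0.25)              # plain scalar: no .numpy() needed
end = as_float(FakeTensor(0.75))    # tensor-like: converted via .numpy()
print(start, end)
```

Whether to convert defensively like this or to make the pipeline always emit tensors (the ToTensor change above) is a design choice for the maintainers.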

use TSN to train HMDB51, then errors happen

Hi mmaction2, first of all, thank you for your contribution. I want to train on the HMDB51 dataset using TSN. Following the tutorials, I did the following:

Data processing: following preparing_hmdb51.md

$ cd mmaction2/tools/data/hmdb51

# process data and annotation
$ bash download_annotations.sh
$ bash download_videos.sh

# extract data
$ bash extract_rgb_frames.sh

# generate label
$ bash  generate_rawframes_filelist.sh
$ bash generate_videos_filelist.sh

While doing this, the log suggests that some .avi files are not handled correctly:

...
...
rgb 6568 talk/The_Matrix_3_talk_h_nm_np1_fr_goo_13.avi None done       <------------- here
"../../data/hmdb51/videos/talk/Hamlet_(1996)_Fencing_Scenes_talk_u_cm_np1_fr_med_0.avi", frames ≈ 96
extracted frames of video "../../data/hmdb51/videos/talk/Hamlet_(1996)_Fencing_Scenes_talk_u_cm_np1_fr_med_0.avi", 95 frames
1 videos (95 frames, 0 tvl1 flows) processed, using 0.276s, decoding speed 344.203fps, flow speed 0fps
rgb 6569 talk/Hamlet_(1996)_Fencing_Scenes_talk_u_cm_np1_fr_med_0.avi None done              <-------------------here
"../../data/hmdb51/videos/talk/Fellowship_6_talk_h_cm_np1_fr_goo_14.avi", frames ≈ 59
extracted frames of video "../../data/hmdb51/videos/talk/Fellowship_6_talk_h_cm_np1_fr_goo_14.avi", 58 frames
1 videos (58 frames, 0 tvl1 flows) processed, using 0.327s, decoding speed 177.37fps, flow speed 0fps
rgb 6570 talk/Fellowship_6_talk_h_cm_np1_fr_goo_14.avi None done
"../../data/hmdb51/videos/talk/Fellowship_6_talk_h_cm_np1_fr_goo_13.avi", frames ≈ 73
extracted frames of video "../../data/hmdb51/videos/talk/Fellowship_6_talk_h_cm_np1_fr_goo_13.avi", 72 frames
1 videos (72 frames, 0 tvl1 flows) processed, using 0.502s, decoding speed 143.426fps, flow speed 0fps
rgb 6571 talk/Fellowship_6_talk_h_cm_np1_fr_goo_13.avi None done
Genearte raw frames (RGB only)

Prepare config

I copied mmaction2/configs/recognition/tsn/tsn_r50_1x1x3_80e_ucf101_rgb.py to tsn_hmdb51_config.py and modified a few places:

# model settings
model = dict(
    type='Recognizer2D',
    backbone=dict(
        type='ResNet',
        pretrained='torchvision://resnet50',
        depth=50,
        norm_eval=False),
    cls_head=dict(
        type='TSNHead',
        # num_classes=101,
        # change the number of classes
        num_classes=51,                 <--------------- here
        in_channels=2048,
        spatial_type='avg',
        consensus=dict(type='AvgConsensus', dim=1),
        dropout_ratio=0.8,
        init_std=0.001))
# model training and testing settings
train_cfg = None
test_cfg = dict(average_clips=None)
# dataset settings
dataset_type = 'RawframeDataset'
data_root = 'data/hmdb51/rawframes/'                                    <--- here
data_root_val = 'data/hmdb51/rawframes/'
ann_file_train = 'data/hmdb51/hmdb51_train_split_{1,2,3}_rawframes.txt'
ann_file_val = 'data/hmdb51/hmdb51_val_split_{1,2,3}_rawframes.txt'
ann_file_test = 'data/hmdb51/hmdb51_val_split_{1,2,3}_rawframes.txt'       <----- here
img_norm_cfg = dict(mean=[104, 117, 128], std=[1, 1, 1], to_bgr=False)
train_pipeline = [
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3),
    dict(type='FrameSelector'),
    dict(type='Resize', scale=(-1, 256)),
    dict(
        type='MultiScaleCrop',
        input_size=224,
        scales=(1, 0.875, 0.75, 0.66),
        random_crop=False,
        max_wh_scale_gap=1),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=3,
        test_mode=True),
    dict(type='FrameSelector'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=25,
        test_mode=True),
    dict(type='FrameSelector'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='TenCrop', crop_size=224),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=32,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=data_root_val,
        pipeline=test_pipeline))
# optimizer
optimizer = dict(
    type='SGD', lr=0.001, momentum=0.9,
    weight_decay=0.0005)  # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[30, 60])
total_epochs = 80
checkpoint_config = dict(interval=5)
evaluation = dict(
    interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5))
log_config = dict(
    interval=20,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook'),
    ])
# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/tsn_r50_1x1x3_80e_hmdb51_rgb/'
# use the pretrained model
load_from = './checkpoints/tsn_r50_1x1x3_80e_ucf101_rgb_20200613-d6ad9c48.pth'           <- here
# load_from = None
resume_from = None
workflow = [('train', 1)]

Begin training, using the following command:

$ CUDA_VISIBLE_DEVICES=3 python tools/train.py configs/recognition/tsn/tsn_hmdb51_config.py 
2020-08-11 15:19:47,841 - mmaction - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.7 (default, Mar 26 2020, 15:48:22) [GCC 7.3.0]
CUDA available: True
...
...
...
# load_from = None
resume_from = None
workflow = [('train', 1)]

2020-08-11 15:19:49,517 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.weight', 'fc.bias'}
Traceback (most recent call last):
  File "tools/train.py", line 146, in <module>
    main()
  File "tools/train.py", line 125, in main
    datasets = [build_dataset(cfg.data.train)]
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/builder.py", line 39, in build_dataset
    dataset = build_from_cfg(cfg, DATASETS, default_args)
  File "/home/zj/zhonglian/mmcv/mmcv/utils/registry.py", line 167, in build_from_cfg
    return obj_cls(**args)
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/rawframe_dataset.py", line 92, in __init__
    multi_class, num_classes, modality)
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/base.py", line 57, in __init__
    self.video_infos = self.load_annotations()
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/rawframe_dataset.py", line 97, in load_annotations
    with open(self.ann_file, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'data/hmdb51/hmdb51_train_split_{1,2,3}_rawframes.txt'
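
The `{1,2,3}` in the path above reads as a placeholder for the split number; Python's `open()` does not expand it, so the literal string is looked up on disk. A minimal sketch of expanding the placeholder and picking one split (the helper name `expand_split_placeholder` is made up for this example):

```python
import re


def expand_split_placeholder(path):
    """Expand a single ``{a,b,c}`` group into one concrete path per option."""
    match = re.search(r"\{([^{}]*)\}", path)
    if match is None:
        return [path]  # nothing to expand
    options = match.group(1).split(",")
    return [path[:match.start()] + opt + path[match.end():] for opt in options]


template = "data/hmdb51/hmdb51_train_split_{1,2,3}_rawframes.txt"
candidates = expand_split_placeholder(template)
ann_file_train = candidates[0]  # pick split 1
print(ann_file_train)
```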

OK, I changed the path to ann_file_train = 'data/hmdb51/hmdb51_train_split_1_rawframes.txt', and then another error occurred:

...
...
2020-08-11 15:36:29,645 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}
2020-08-11 15:36:41,862 - mmaction - INFO - load checkpoint from ./checkpoints/tsn_r50_1x1x3_80e_ucf101_rgb_20200613-d6ad9c48.pth
2020-08-11 15:36:44,408 - mmaction - WARNING - The model and loaded state dict do not match exactly

size mismatch for cls_head.fc_cls.weight: copying a param with shape torch.Size([101, 2048]) from checkpoint, the shape in current model is torch.Size([51, 2048]).
size mismatch for cls_head.fc_cls.bias: copying a param with shape torch.Size([101]) from checkpoint, the shape in current model is torch.Size([51]).
2020-08-11 15:36:44,409 - mmaction - INFO - Start running, host: zj@user-SYS-7049GP-TRT, work_dir: /home/zj/zhonglian/mmaction2/work_dirs/tsn_r50_1x1x3_80e_hmdb51_rgb
2020-08-11 15:36:44,409 - mmaction - INFO - workflow: [('train', 1)], max: 80 epochs
Traceback (most recent call last):
  File "tools/train.py", line 146, in <module>
    main()
  File "tools/train.py", line 142, in main
    meta=meta)
  File "/home/zj/zhonglian/mmaction2/mmaction/apis/train.py", line 111, in train_model
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/zj/zhonglian/mmcv/mmcv/runner/epoch_based_runner.py", line 122, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/zj/zhonglian/mmcv/mmcv/runner/epoch_based_runner.py", line 27, in train
    for i, data_batch in enumerate(data_loader):
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 838, in _next_data
    return self._process_data(data)
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/base.py", line 103, in __getitem__
    return self.prepare_train_frames(idx)
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/rawframe_dataset.py", line 137, in prepare_train_frames
    return self.pipeline(results)
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/pipelines/compose.py", line 41, in __call__
    data = t(data)
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/pipelines/loading.py", line 848, in __call__
    img_bytes = self.file_client.get(filepath)
  File "/home/zj/zhonglian/mmcv/mmcv/fileio/file_client.py", line 294, in get
    return self.client.get(filepath)
  File "/home/zj/zhonglian/mmcv/mmcv/fileio/file_client.py", line 185, in get
    with open(filepath, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/zj/zhonglian/mmaction2/data/hmdb51/rawframes/shake_hands/Concourse_d_elegance_Sofia_2009___PRICE_GIVING_CEREMONY_shake_hands_f_cm_np2_le_med_0/img_00066.jpg'

I checked the image directory:

$ ls
img_00000.jpg  img_00006.jpg  img_00012.jpg  img_00018.jpg  img_00024.jpg  img_00030.jpg  img_00036.jpg  img_00042.jpg  img_00048.jpg  img_00054.jpg  img_00060.jpg
img_00001.jpg  img_00007.jpg  img_00013.jpg  img_00019.jpg  img_00025.jpg  img_00031.jpg  img_00037.jpg  img_00043.jpg  img_00049.jpg  img_00055.jpg  img_00061.jpg
img_00002.jpg  img_00008.jpg  img_00014.jpg  img_00020.jpg  img_00026.jpg  img_00032.jpg  img_00038.jpg  img_00044.jpg  img_00050.jpg  img_00056.jpg  img_00062.jpg
img_00003.jpg  img_00009.jpg  img_00015.jpg  img_00021.jpg  img_00027.jpg  img_00033.jpg  img_00039.jpg  img_00045.jpg  img_00051.jpg  img_00057.jpg  img_00063.jpg
img_00004.jpg  img_00010.jpg  img_00016.jpg  img_00022.jpg  img_00028.jpg  img_00034.jpg  img_00040.jpg  img_00046.jpg  img_00052.jpg  img_00058.jpg  img_00064.jpg
img_00005.jpg  img_00011.jpg  img_00017.jpg  img_00023.jpg  img_00029.jpg  img_00035.jpg  img_00041.jpg  img_00047.jpg  img_00053.jpg  img_00059.jpg  img_00065.jpg

There is no img_00066.jpg. Why does this happen, and how can I solve it? Looking forward to your help.
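
One way to narrow this down is to compare the frame count recorded in the annotation list with what is actually on disk. The sketch below assumes each annotation line reads `<frame_dir> <total_frames> <label>` and that frames are stored as `img_*.jpg`; both are assumptions about the generated file list, and `find_frame_count_mismatches` is a hypothetical helper. If the counts agree, the missing file is more likely an indexing offset (0-based file names read with a 1-based index) than a failed extraction.

```python
import os


def find_frame_count_mismatches(ann_file, data_prefix):
    """Return (frame_dir, expected, on_disk) for every line whose count differs.

    Assumes lines of the form ``<frame_dir> <total_frames> <label>``.
    """
    mismatches = []
    with open(ann_file, "r") as fin:
        for line in fin:
            parts = line.split()
            if len(parts) < 3:
                continue  # skip blank or malformed lines
            frame_dir, total_frames = parts[0], int(parts[1])
            full_dir = os.path.join(data_prefix, frame_dir)
            on_disk = 0
            if os.path.isdir(full_dir):
                on_disk = sum(
                    1 for name in os.listdir(full_dir)
                    if name.startswith("img_") and name.endswith(".jpg"))
            if on_disk != total_frames:
                mismatches.append((frame_dir, total_frames, on_disk))
    return mismatches
```

Calling it with the training list and `data/hmdb51/rawframes/` as `data_prefix` would list only the directories whose counts disagree.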

No such file or directory: 'openmm/mmaction2/data/ucf101/rawframes/Skiing/v_Skiing_g06_c04/img_00300.jpg'

For example, the images in the folder openmm/mmaction2/data/ucf101/rawframes/Skiing/v_Skiing_g06_c04/ are img_00000.jpg to img_00299.jpg.

mmaction2/mmaction/datasets/rawframe_dataset.py

Line 115 in 6307050

video_info['total_frames'] = int(line_split[idx])

I changed this line to

video_info['total_frames'] = int(line_split[idx]) - 1

and training started. Is this correct?
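
The arithmetic behind the missing file can be sketched as follows: 300 frames named from img_00000.jpg end at img_00299.jpg, while a loader that counts from 1 asks for img_00001.jpg through img_00300.jpg. The helper `frame_filenames` is hypothetical and only illustrates the offset.

```python
def frame_filenames(total_frames, start_index, tmpl="img_{:05d}.jpg"):
    """List the file names a loader would request for the given start index."""
    return [tmpl.format(i) for i in range(start_index, start_index + total_frames)]


names_on_disk = frame_filenames(300, start_index=0)    # img_00000 .. img_00299
names_requested = frame_filenames(300, start_index=1)  # img_00001 .. img_00300
missing = sorted(set(names_requested) - set(names_on_disk))
print(missing)  # ['img_00300.jpg']
```

Subtracting 1 from total_frames keeps every requested index in range, but img_00000.jpg is then never sampled; making the start index match the file naming would be the other option, if the installed version allows it.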

about predict different labels in a video

I have a long video that contains many labels, and I want to run a 300-frame detection window over it. I would like to change demo.py to do this, but demo.py seems to need the path of a short video clip and returns a single prediction. My temporary workaround is to save every 300 frames as a temporary video file, call demo.py on it, and then delete the file. This is clumsy and inefficient. How can I read a long video and get continuous predictions? Please help.
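
The windowing part of this request can be sketched without temporary files. The code below is a minimal sketch under stated assumptions: `frames` is any iterable of decoded frames, and `predict` is a hypothetical callable that takes a list of frames and returns a prediction; neither is an mmaction2 API.

```python
from collections import deque


def sliding_window_predictions(frames, predict, window=300, stride=300):
    """Run ``predict`` on every full window of ``window`` frames.

    Returns a list of (start_frame_index, prediction) pairs. ``predict`` is a
    placeholder for whatever model call is used on a list of frames.
    """
    buffer = deque(maxlen=window)  # keeps only the most recent frames
    results = []
    for index, frame in enumerate(frames):
        buffer.append(frame)
        start = index + 1 - window
        if len(buffer) == window and start % stride == 0:
            results.append((start, predict(list(buffer))))
    return results


# Toy usage: integers stand in for frames, sum() stands in for the model.
print(sliding_window_predictions(range(7), sum, window=3, stride=2))
```

A stride smaller than the window gives overlapping windows and therefore denser predictions, at the cost of more model calls.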

Test picture

I successfully used demo.py to test a video, but how can I test a picture from my own data?
If possible, I want to compare the results for the video and for its pictures, taken in order from the same data.
I have seen dataset_type = 'RawframeDataset' and dataset_type = 'VideoDataset' in the config.

Log file doesn't correspond to the performance you report!

https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/slowfast/slowfast_r50_256p_4x16x1_256e_kinetics400_rgb/20200728_022505.log.json#
It only reaches 30% top-1 accuracy.

Method of RGB frame extraction

Thanks for your elegant implementation of this toolbox.

The doc says denseflow installation is unnecessary for RGB frame extraction, but I find this script still uses denseflow for both RGB and flow extraction. I am wondering which one should be trusted.

Train custom data

I wrote a slowfast_custom_config.py. It reads:

model = dict(
type='Recognizer3D',
backbone=dict(
type='ResNet3dSlowFast',
pretrained=None,
resample_rate=8, # tau
speed_ratio=8, # alpha
channel_ratio=8, # beta_inv
slow_pathway=dict(
type='resnet3d',
depth=50,
pretrained=None,
lateral=True,
conv1_kernel=(1, 7, 7),
dilations=(1, 1, 1, 1),
conv1_stride_t=1,
pool1_stride_t=1,
inflate=(0, 0, 1, 1),
norm_eval=False),
fast_pathway=dict(
type='resnet3d',
depth=50,
pretrained=None,
lateral=False,
base_channels=8,
conv1_kernel=(5, 7, 7),
conv1_stride_t=1,
pool1_stride_t=1,
norm_eval=False)),
cls_head=dict(
type='SlowFastHead',
in_channels=2304, # 2048+256
num_classes=400,
spatial_type='avg',
dropout_ratio=0.5))
train_cfg = None
test_cfg = dict(average_clips=None)
dataset_type = 'VideoDataset'
data_root = '/dat01/liuzhixiong/zt/mmaction2/data/fortest/videos_train'
data_root_val = '/dat01/liuzhixiong/zt/mmaction2/data/fortest/videos_val'
ann_file_train = '/dat01/liuzhixiong/zt/mmaction2/data/fortest/train_list_videos.txt'
ann_file_val = '/dat01/liuzhixiong/zt/mmaction2/data/fortest/val_list_videos.txt'
ann_file_test = '/dat01/liuzhixiong/zt/mmaction2/data/fortest/val_list_videos.txt'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
dict(type='DecordInit'),
dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
dict(type='DecordDecode'),
dict(type='Resize', scale=(-1, 256)),
dict(type='RandomResizedCrop'),
dict(type='Resize', scale=(224, 224), keep_ratio=False),
dict(type='Flip', flip_ratio=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
dict(type='DecordInit'),
dict(
type='SampleFrames',
clip_len=32,
frame_interval=2,
num_clips=1,
test_mode=True),
dict(type='DecordDecode'),
dict(type='Resize', scale=(-1, 256)),
dict(type='CenterCrop', crop_size=224),
dict(type='Flip', flip_ratio=0),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
dict(type='DecordInit'),
dict(
type='SampleFrames',
clip_len=32,
frame_interval=2,
num_clips=10,
test_mode=True),
dict(type='DecordDecode'),
dict(type='Resize', scale=(-1, 256)),
dict(type='ThreeCrop', crop_size=256),
dict(type='Flip', flip_ratio=0),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs'])
]
data = dict(
videos_per_gpu=4,
workers_per_gpu=4,
train=dict(
type=dataset_type,
ann_file=ann_file_train,
data_prefix=data_root,
pipeline=train_pipeline),
val=dict(
type=dataset_type,
ann_file=ann_file_val,
data_prefix=data_root_val,
pipeline=val_pipeline),
test=dict(
type=dataset_type,
ann_file=ann_file_test,
data_prefix=data_root_val,
pipeline=test_pipeline))

optimizer = dict(
type='SGD', lr=0.1, momentum=0.9,
weight_decay=0.0001) # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))

lr_config = dict(
policy='CosineAnnealing',
min_lr=0,
warmup='linear',
warmup_by_epoch=True,
warmup_iters=34)
total_epochs = 256
checkpoint_config = dict(interval=4)
workflow = [('train', 1)]
evaluation = dict(
interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5))
log_config = dict(
interval=20,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook'),
])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/slowfast_r50_video_3d_4x16x1_256e_fortest_rgb'
load_from = None
resume_from = None
find_unused_parameters = False

The train_list_videos.txt follows the tips. It reads:

data/fortest/videos_train/01_trian.mp4 1
data/fortest/videos_train/02_trian.mp4 1
data/fortest/videos_train/03_trian.mp4 1
data/fortest/videos_train/04_trian.mp4 2
data/fortest/videos_train/05_trian.mp4 3
......

But when I use

    python tools/train.py configs/recognition/slowfast/slowfast_custom_config.py \
        --work-dir work_dirs/slowfast_r50_4x16x1_256e_fortest_rgb \
        --validate --seed 0 --deterministic

to submit my job with sbatch, it reports:

Traceback (most recent call last):
  File "tools/train.py", line 146, in <module>
    main()
  File "tools/train.py", line 125, in main
    datasets = [build_dataset(cfg.data.train)]
  File "/dat01/liuzhixiong/zt/mmaction2/mmaction/datasets/builder.py", line 39, in build_dataset
    dataset = build_from_cfg(cfg, DATASETS, default_args)
  File "/dat01/liuzhixiong/anaconda3/envs/mmaction/lib/python3.6/site-packages/mmcv/utils/registry.py", line 167, in build_from_cfg
    return obj_cls(**args)
  File "/dat01/liuzhixiong/zt/mmaction2/mmaction/datasets/video_dataset.py", line 43, in __init__
    super().__init__(ann_file, pipeline, start_index=start_index, **kwargs)
  File "/dat01/liuzhixiong/zt/mmaction2/mmaction/datasets/base.py", line 63, in __init__
    self.video_infos = self.load_annotations()
  File "/dat01/liuzhixiong/zt/mmaction2/mmaction/datasets/video_dataset.py", line 58, in load_annotations
    filename, label = line_split
ValueError: not enough values to unpack (expected 2, got 0)

I don't know why it can't read my list.txt.
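
"expected 2, got 0" means that splitting one line of the list produced no fields at all, which is what a blank (or whitespace-only) line gives. A minimal sketch of a more tolerant reader follows; `load_video_list` is a hypothetical helper, not the loader in video_dataset.py.

```python
def load_video_list(lines):
    """Parse ``<filename> <label>`` lines, skipping blanks and collecting bad ones.

    Returns (video_infos, problems), where problems holds (line_number, text)
    for every non-blank line that does not have exactly two fields.
    """
    video_infos, problems = [], []
    for lineno, line in enumerate(lines, start=1):
        parts = line.strip().split()
        if not parts:
            continue  # a blank line is what yields "expected 2, got 0"
        if len(parts) != 2:
            problems.append((lineno, line.rstrip("\n")))
            continue
        filename, label = parts
        video_infos.append((filename, int(label)))
    return video_infos, problems


sample = [
    "data/fortest/videos_train/01_trian.mp4 1\n",
    "\n",                                   # trailing blank line in the file
    "data/fortest/videos_train/04_trian.mp4 2\n",
]
infos, bad = load_video_list(sample)
print(infos, bad)
```

Removing empty lines from train_list_videos.txt (often a trailing newline at the end of the file) should have the same effect without changing any code.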

Test video from URL

There are lots of videos on the web, and it is common to test a video directly from its URL rather than downloading it to disk as a temporary file and then running the testing pipeline.

Would you consider implementing such testing pipeline?

Thanks.

Could you provide the classification component for Temporal Action Localization task to get the mAP?

Thanks for your awesome work.
You now provide the code of BSN and BMN for Temporal Action Localization, but it only contains the Temporal Proposal Generation part. I note that many works use UntrimmedNet (CUHK & ETHZ & SIAT Submission to ActivityNet Challenge) to get the classification results, but I have not found the classification results file or an easy way to obtain those results.
Do you plan to provide the code for classifying the proposals to get the final mAP metric?

decord SampleFrames start_index=1 bug

I tried to train on Kinetics with this config, but got an index-out-of-range error.
After some debugging, I found that this bug is caused by the default setting start_index = 1 in SampleFrames.
I think start_index should be 0 for decord.

Demo.py

When I try to run demo.py to test my own video, I use:
python demo/demo.py configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py demo/checkpoints/slowfast_r50_4x16x1_256e_kinetics400_rgb_20200618-9a124260.pth demo/test1.mp4 demo/label_map.txt

but it fails with:
Traceback (most recent call last):
  File "demo/demo.py", line 35, in <module>
    main()
  File "demo/demo.py", line 27, in main
    results = inference_recognizer(model, args.video, args.label)
  File "/dat01/wangbo2/ZT/mmaction2/mmaction/apis/inference.py", line 63, in inference_recognizer
    data = test_pipeline(data)
  File "/dat01/wangbo2/ZT/mmaction2/mmaction/datasets/pipelines/compose.py", line 41, in __call__
    data = t(data)
  File "/dat01/wangbo2/ZT/mmaction2/mmaction/datasets/pipelines/loading.py", line 582, in __call__
    directory = results['frame_dir']
KeyError: 'frame_dir'

My environment:
Python 3.6
PyTorch 1.3
Everything else follows the requirements.

I need your help!

mmaction vs mmaction2

Hello! Thanks for the new repo! I just want to ask: why mmaction2? Why not reorganize the mmaction codebase? What's the difference between mmaction and mmaction2?

the results of the experiment

Hi, I did not find the results of the experiment. How does SlowFast perform on the UCF101 dataset in your experiments? And have you achieved the accuracy reported in the original paper?

how to set custom lr updater?

Hi,
I want to use 'StepLrUpdaterHook', but I do not want it to decrease to 0.1 * lr at the steps I specify. What I want is base_lr = 0.1, with the following decreased learning rates: 0.5 * base_lr, 0.1 * base_lr, 0.05 * base_lr, 0.001 * base_lr.
How can I do it?

Thanks in advance!
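
The schedule asked for above is a table of (epoch, factor) pairs rather than a single repeated decay factor. A minimal sketch of that lookup in plain Python follows; the milestone epochs are made up for illustration, and `stepped_lr` is a hypothetical function, not an mmcv API.

```python
def stepped_lr(epoch, base_lr, milestones):
    """Return base_lr scaled by the factor of the last milestone reached.

    ``milestones`` is a list of (start_epoch, factor) pairs sorted by epoch;
    before the first milestone the factor is 1.0.
    """
    factor = 1.0
    for start_epoch, step_factor in milestones:
        if epoch >= start_epoch:
            factor = step_factor
    return base_lr * factor


# Factors from the question; the epochs 20/40/60/80 are assumptions.
schedule = [(20, 0.5), (40, 0.1), (60, 0.05), (80, 0.001)]
for epoch in (0, 20, 45, 60, 90):
    print(epoch, stepped_lr(epoch, base_lr=0.1, milestones=schedule))
```

Wiring this into training would presumably need a custom learning-rate hook that returns this value for the current epoch; that part is not shown here and should be checked against the installed mmcv version.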

MEVA or Virat dataset

This is probably a silly question. I am interested in aerial and surveillance action recognition. For example, the recently completed ActivityNet challenge at CVPR included a surveillance challenge with MEVA/VIRAT data. As the actions there are not as nuanced as in Kinetics/AVA, and some are also taken from an aerial perspective, can we still apply mmaction2 to that type of data?

When will you add "Multigrid training"?

Multigrid training.

Typo in decord init

A copy typo
https://github.com/open-mmlab/mmaction2/blob/master/mmaction/datasets/pipelines/loading.py#L682
PyAV Init -> Decord Init

How to add NECK module

I want to reimplement TPN in mmaction2. TPN registers the 'TPN' NECK module in the original mmaction; how can I implement this in mmaction2?

sth v1 preparation

The original something-something v1 dataset already contains extracted frames, so the preparation process probably needs refactoring. All that is needed is to rename the extracted frames to follow the naming convention "img_%05d.jpg".
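
The renaming step could be sketched as below. It assumes the shipped frames are named by number only, such as `00001.jpg`; that naming is an assumption and should be checked against the downloaded data. `rename_frames` is a hypothetical helper, not a script from the repository.

```python
import os
import re


def rename_frames(frame_dir, tmpl="img_{:05d}.jpg"):
    """Rename ``<number>.jpg`` files in ``frame_dir`` to ``img_%05d.jpg`` style.

    The frame number is kept as-is; files that do not match are left alone.
    Returns the list of (old_name, new_name) pairs that were renamed.
    """
    renamed = []
    for name in sorted(os.listdir(frame_dir)):
        match = re.fullmatch(r"(\d+)\.jpg", name)
        if match is None:
            continue  # already renamed or not a frame
        new_name = tmpl.format(int(match.group(1)))
        os.rename(os.path.join(frame_dir, name),
                  os.path.join(frame_dir, new_name))
        renamed.append((name, new_name))
    return renamed
```

Because the frame number is preserved, frames that start at 1 on disk still start at 1 after renaming, so the loader's start index has to match.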

test.py

I want to know whether test.py can output the predicted values (with 3 classes, would it output 0, 1, or 2?) or labels.
I followed your guide and added '--out result.json', but I can't understand what the values in result.json mean.

Also, I want to ask a question about the model:
As a 3D model, can SlowFast predict with only one picture as input?
I have tried this, but it doesn't work; maybe my rawframes dataset is wrong.
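
On reading result.json: if the file holds one list of per-class scores for each test video (an assumption about the output format, not something confirmed here), the predicted class index is the position of the largest score. A minimal sketch, with `predicted_labels` as a hypothetical helper:

```python
import json


def predicted_labels(scores_per_video):
    """Return the index of the highest score for each video."""
    return [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in scores_per_video
    ]


# Stand-in for json.load(open("result.json")) with 3 classes and 2 videos.
example = json.loads("[[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]")
print(predicted_labels(example))  # [1, 0]
```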

About the supporting of FineGym datasets

Will this codebase add FineGym to the data_preparation?

Could you share your Kinetics400 dataset?

I cannot download the Kinetics400 dataset. When I train your TSN model, it's hard to reproduce your released accuracy, and I don't know what the problem is. Could you please share the Kinetics400 dataset you used?

real-time recognition

Can it be used to recognize real-time video from a webcam or another source?

The test command in BMN README cannot run.

The test command in the BMN README cannot be run. Please fix it, and also check the other README files for temporal proposal generation.

"workers_per_gpu" settings do not work

My CPU is AMD ThreadRipper 2990wx and GPU is Titan RTX.

No matter what value I set workers_per_gpu to, the code only uses one CPU thread and cannot use all 64 threads.

Can anyone help me? Thanks!

Tables in the docs of TIN are not correctly displayed

Thanks for your error report and we appreciate it a lot.

Checklist

I have searched related issues but cannot get the expected help.
The bug has not been fixed in the latest version.

Describe the bug
In the document of TIN under the Modelzoo section, some tables are not correctly displayed. [Link]
But it seems fine in the README of TIN, therefore a re-compilation of the document may be required.

re:error

tsn_r50_1x1x3_80e_ucf101_rgb.py
Traceback (most recent call last):
  File "D:\Anaconda\envs\mmaction2-master\lib\sre_parse.py", line 1015, in parse_template
    this = chr(ESCAPES[this][1])
KeyError: '\U'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/11987/Desktop/小论文素材/model/mmaction2-master/tools/train.py", line 167, in <module>
    main()
  File "C:/Users/11987/Desktop/小论文素材/model/mmaction2-master/tools/train.py", line 83, in main
    cfg = Config.fromfile(args.config)
  File "D:\Anaconda\envs\mmaction2-master\lib\site-packages\mmcv-1.0.2-py3.7-win-amd64.egg\mmcv\utils\config.py", line 204, in fromfile
    use_predefined_variables)
  File "D:\Anaconda\envs\mmaction2-master\lib\site-packages\mmcv-1.0.2-py3.7-win-amd64.egg\mmcv\utils\config.py", line 127, in _file2dict
    temp_config_file.name)
  File "D:\Anaconda\envs\mmaction2-master\lib\site-packages\mmcv-1.0.2-py3.7-win-amd64.egg\mmcv\utils\config.py", line 108, in _substitute_predefined_vars
    config_file = re.sub(regexp, value, config_file)
  File "D:\Anaconda\envs\mmaction2-master\lib\re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "D:\Anaconda\envs\mmaction2-master\lib\re.py", line 309, in _subx
    template = _compile_repl(template, pattern)
  File "D:\Anaconda\envs\mmaction2-master\lib\re.py", line 300, in _compile_repl
    return sre_parse.parse_template(repl, pattern)
  File "D:\Anaconda\envs\mmaction2-master\lib\sre_parse.py", line 1018, in parse_template
    raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \U at position 2

When I run train.py, I encounter this problem. I tried changing re to regex, but then got:

regex._regex_core.error: incomplete escape \U at position 5

Can anyone help me? Thanks!
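
The traceback ends in `re.sub`, where a Windows path appears to be used as the replacement string; in a replacement string, backslash sequences such as `\U` are parsed as escapes. The sketch below reproduces that with the standard library only. The pattern and the path are illustrative, not copied from mmcv.

```python
import re

config_text = "work_dir = '{{ fileDirname }}/work_dirs'"
pattern = r"\{\{\s*fileDirname\s*\}\}"          # illustrative pattern
windows_dir = r"C:\Users\me\mmaction2\configs"  # illustrative path

# A raw Windows path as the replacement string is parsed for escapes.
try:
    re.sub(pattern, windows_dir, config_text)
    failed = False
except re.error as exc:
    failed = True
    print("re.sub with a raw path failed:", exc)

# Option 1: a function replacement is inserted literally, with no escape parsing.
safe_1 = re.sub(pattern, lambda _match: windows_dir, config_text)

# Option 2: use forward slashes, which Windows also accepts in most APIs.
safe_2 = re.sub(pattern, windows_dir.replace("\\", "/"), config_text)

print(safe_1)
print(safe_2)
```

Until the library handles this, running from a path written with forward slashes, or patching the substitution locally along the lines of option 1, may work around the error.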

fail to run webcam demo with r2plus1d

Hi, and thanks for providing this awesome tool.

I trained on my own dataset, and it works in the webcam demo with TSN.

I tried to run the webcam demo with r2plus1d, but it failed.

Here are the error messages:

Traceback (most recent call last):
  File "demo/webcam_demo.py", line 161, in <module>
    main()
  File "demo/webcam_demo.py", line 157, in main
    predict_webcam_video()
  File "demo/webcam_demo.py", line 83, in predict_webcam_video
    cur_data = test_pipeline(cur_data)
  File "/home/ubuntu/Desktop/YHWang/mmaction2/mmaction/datasets/pipelines/compose.py", line 41, in __call__
    data = t(data)
  File "/home/ubuntu/Desktop/YHWang/mmaction2/mmaction/datasets/pipelines/formating.py", line 248, in __call__
    num_clips = results['num_clips']
KeyError: 'num_clips'

The only config change I made is num_classes (in r2plus1d_r34_video_inference_8x8x1_180e_kinetics400_rgb.py), which I changed from 400 to 12 (the number of classes in my dataset).
After a little testing, I found that it fails to get clip_len and num_clips from the test_pipeline results.
I tried commenting out some code in formating.py:

    if self.input_format == 'NCTHW':
        # num_clips = results['num_clips']
        # clip_len = results['clip_len']

        imgs = imgs.reshape((-1, num_clips, clip_len) + imgs.shape[1:])

and replaced num_clips and clip_len with fixed numbers, after which it runs.
However, the predicted label does not change over time, so the result may be wrong.

Sorry for my poor English.
Could you give me some ideas? Thanks for your help!

reproducing TSM_R50_1x1x16_50e_sthv2 issue

Notice

There are several common situations in the reimplementation issues as below

Reimplement a model in the model zoo using the provided configs

Checklist

I have searched related issues but cannot get the expected help.

Describe the issue

When I tested tsm_r50_1x1x16_50e_sthv2_rgb with this checkpoint, the result was lower than the reported accuracy (57.68/83.65).

I used the sthv2 dataset in its original webm video format.

Reproduction

What command or script did you run?

 bash tools/dist_test.sh configs/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb.py work_dirs/tsm_r50_1x1x16_50e_sthv2_rgb_20200621-60ff441a.pth 8 --eval top_k_accuracy mean_class_accuracy

What config dir you run?

configs/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb.py

Did you make any modifications on the code or config? Did you understand what you have modified?

To use the original Something-Something-V2 videos, I only created the sthv2_{train, val}_list_videos.txt files.

I also modified the config file to use this video format.

# model settings
model = dict(
    type='Recognizer2D',
    backbone=dict(
        type='ResNetTSM',
        pretrained='torchvision://resnet50',
        depth=50,
        norm_eval=False,
        shift_div=8),
    cls_head=dict(
        type='TSMHead',
        num_classes=339,
        in_channels=2048,
        spatial_type='avg',
        consensus=dict(type='AvgConsensus', dim=1),
        dropout_ratio=0.5,
        init_std=0.001,
        is_shift=True))
# model training and testing settings
train_cfg = None
test_cfg = dict(average_clips=None)
# dataset settings
# dataset_type = 'RawframeDataset'
# data_root = 'data/sthv2/rawframes'
# data_root_val = 'data/sthv2/rawframes'
# ann_file_train = 'data/sthv2/sthv2_train_list_rawframes.txt'
# ann_file_val = 'data/sthv2/sthv2_val_list_rawframes.txt'
# ann_file_test = 'data/sthv2/sthv2_val_list_rawframes.txt'
dataset_type = 'VideoDataset'
data_root = 'data/sthv2/videos'
data_root_val = 'data/sthv2/videos'
ann_file_train = 'data/sthv2/sthv2_train_list_videos.txt'
ann_file_val = 'data/sthv2/sthv2_val_list_videos.txt'
ann_file_test = 'data/sthv2/sthv2_val_list_videos.txt'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
    dict(type='DecordInit'),
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=16),
    # dict(type='RawFrameDecode'),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(
        type='MultiScaleCrop',
        input_size=224,
        scales=(1, 0.875, 0.75, 0.66),
        random_crop=False,
        max_wh_scale_gap=1,
        num_fixed_crops=13),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=16,
        test_mode=True),
    # dict(type='RawFrameDecode'),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(type='DecordInit'),    
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=16,
        test_mode=True),
    # dict(type='RawFrameDecode'),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=6,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=data_root_val,
        pipeline=test_pipeline))
# optimizer
optimizer = dict(
    type='SGD',
    constructor='TSMOptimizerConstructor',
    paramwise_cfg=dict(fc_lr5=True),
    lr=0.0075,  # this lr is used for 8 gpus
    momentum=0.9,
    weight_decay=0.0005)
optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[20, 40])
total_epochs = 50
checkpoint_config = dict(interval=1)
evaluation = dict(
    interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5))
log_config = dict(
    interval=20,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook'),
    ])
# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/tsm_r50_1x1x16_50e_sthv2_rgb/'
load_from = None
resume_from = None
workflow = [('train', 1)]

What dataset did you use?

--> Something-Something-V2

Environment

Please run PYTHONPATH=${PWD}:$PYTHONPATH python mmaction/utils/collect_env.py to collect necessary environment information and paste it here.

sys.platform: linux
Python: 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: TITAN Xp
CUDA_HOME: /usr/local/cuda
NVCC:
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:

GCC 7.3

C++ Version: 201402

Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications

Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)

OpenMP 201511 (a.k.a. OpenMP 4.5)

NNPACK is enabled

CPU capability usage: AVX2

CUDA Runtime 10.2

NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37

CuDNN 7.6.5

Magma 2.5.2

Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.7.0
OpenCV: 4.4.0
MMCV: 1.1.2
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.2
MMAction2: 0.6.0+7dc58b3

You may add additional information that may be helpful for locating the problem, such as
- How you installed PyTorch [e.g., pip, conda, source]
  --> by conda

Results

If applicable, paste the related results here, e.g., what you expect and what you get.

Evaluating top_k_accuracy...

top1_acc        0.4162
top5_acc        0.7047

Evaluating mean_class_accuracy...

mean_acc        0.3648
top1_acc: 0.4162
top5_acc: 0.7047
mean_class_accuracy: 0.3648

Issue fix

If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

AttributeError: 'EpochBasedRunner' object has no attribute 'data_loader'

python tools/train.py configs/recognition/slowfast/slowfast_r50_4x8x1_256e_jester_rgb.py --validate

error info:
Traceback (most recent call last):
  File "/export/mmaction2/tools/train.py", line 146, in <module>
    main()
  File "/export/mmaction2/tools/train.py", line 142, in main
    meta=meta)
  File "/export/mmaction2/mmaction/apis/train.py", line 111, in train_model
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/zhanglu/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 103, in run
    self.call_hook('before_run')
  File "/home/zhanglu/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 298, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/zhanglu/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/hooks/lr_updater.py", line 114, in before_run
    epoch_len = len(runner.data_loader)
AttributeError: 'EpochBasedRunner' object has no attribute 'data_loader'

The same error will occur with the csn model.

will bmn support classification with start time point and end time point?

Hi,

Thanks for the great repo!

I know that BMN supports predicting the start and end time points of a video segment. But will it also support classifying the video snippet between the start and end points? If not, how can the classification be done in an end-to-end way? Any suggestions?

Will this feature be added to the repo in the near future?

Thanks in advance!

Unexpected keyword 'use_frames'

When following the demo.py example in the documentation, I got this error:

TypeError: init_recognizer() got an unexpected keyword argument 'use_frames'

Has anything changed in the recognizer?

Refactor CI (add PyTorch 1.6 and CPU)

Ref: https://github.com/open-mmlab/mmdetection/blob/master/.github/workflows/build.yml

mmcv error when extracting frames

Describe the bug

When extracting RGB frames using tools/data/sthv2/extract_rgb_frames_opencv.sh, an OpenCV resize error occurred.

Judging from the error traceback, it may be caused by mmcv.

Reproduction

What command or script did you run?

sh extract_rgb_frames_opencv.sh in tools/data/sthv2

Did you make any modifications on the code or config? Did you understand what you have modified?
--> No
What dataset did you use?
--> sthv2

Environment

sys.platform: linux
Python: 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: TITAN Xp
CUDA_HOME: /usr/local/cuda
NVCC:
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:

GCC 7.3

C++ Version: 201402

Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications

Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)

OpenMP 201511 (a.k.a. OpenMP 4.5)

NNPACK is enabled

CPU capability usage: AVX2

CUDA Runtime 10.2

NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37

CuDNN 7.6.5

Magma 2.5.2

Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.7.0
OpenCV: 4.4.0
MMCV: 1.1.2
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.2
MMAction2: 0.6.0+7dc58b3

You may add additional information that may be helpful for locating the problem, such as
--> PyTorch installed by conda

Error traceback
If applicable, paste the error traceback here.


Traceback (most recent call last):
  File "build_rawframes.py", line 226, in <module>
    len(vid_list) * [args.task]))
  File "/home/lsrock1/anaconda3/envs/pytorch1.6/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/lsrock1/anaconda3/envs/pytorch1.6/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
cv2.error: OpenCV(4.4.0) /tmp/pip-req-build-nzyrw1vf/opencv/modules/imgproc/src/resize.cpp:3932: error: (-215:Assertion failed) inv_scale_x > 0 in function 'resize'

Genearte raw frames (RGB only)

TypeError: Object of type ndarray is not JSON serializable

Hi MMAction2, I trained a model on UCF101 using the config file configs/recognition/tsn/tsn_r50_1x1x3_80e_ucf101_rgb.py.

Now I want to evaluate it with the following command:

$ CUDA_VISIBLE_DEVICES=1 python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_80e_ucf101_rgb.py work_dirs/tsn_r50_1x1x3_80e_ucf101_rgb/latest.pth --eval top_k_accuracy mean_class_accuracy     --out result.json

The evaluation itself works, but an error occurs when the results are saved to a JSON file:

$ CUDA_VISIBLE_DEVICES=1 python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_80e_ucf101_rgb.py work_dirs/tsn_r50_1x1x3_80e_ucf101_rgb/latest.pth --eval top_k_accuracy mean_class_accuracy     --out result.json
2020-08-12 14:40:58,082 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4/4, 1.6 task/s, elapsed: 3s, ETA:     0s
writing results to result.json
Traceback (most recent call last):
  File "tools/test.py", line 139, in <module>
    main()
  File "tools/test.py", line 131, in main
    dataset.dump_results(outputs, **output_config)
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/base.py", line 86, in dump_results
    return mmcv.dump(results, out)
  File "/home/zj/zhonglian/mmcv/mmcv/fileio/io.py", line 80, in dump
    handler.dump_to_path(obj, file, **kwargs)
  File "/home/zj/zhonglian/mmcv/mmcv/fileio/handlers/base.py", line 25, in dump_to_path
    self.dump_to_fileobj(obj, f, **kwargs)
  File "/home/zj/zhonglian/mmcv/mmcv/fileio/handlers/json_handler.py", line 13, in dump_to_fileobj
    json.dump(obj, file, **kwargs)
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/json/encoder.py", line 429, in _iterencode
    yield from _iterencode_list(o, _current_indent_level)
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ndarray is not JSON serializable

In tools/test.py, the relevant code is:

    if rank == 0:
        if output_config:
            out = output_config['out']
            print(f'\nwriting results to {out}')
            dataset.dump_results(outputs, **output_config)

I printed output_config and the outputs; the values are:

{'out': 'result.json'}

[array([ 1.9529229 ,  2.923142  ,  0.26469332, -0.12839134, -2.1167572 ,
       -0.7340729 , -1.3261667 ,  0.4541236 ,  0.94828093,  1.155941  ,
        0.23628543, -0.78831387,  2.2801087 ,  1.0793906 ,  0.31419927,
        0.30997226,  0.5425246 , -0.70942116, -1.1134925 ,  2.236816  ,
        3.9390984 , -2.1505275 , -1.085769  , -2.8008654 , -1.3788043 ,
       -0.35550973,  0.6128084 ,  0.97523236, -1.4105709 , -1.2038826 ,
       -2.1797624 , -1.4052689 , -0.67973197,  1.7024329 , -0.7162529 ,
       -1.2531643 , -1.405829  , -1.7755532 , -0.9127121 , -0.52495575,
        0.5702051 , -0.54499656,  0.9248879 , -1.0198474 ,  1.8331637 ,
       -0.5963148 , -1.2978854 ,  1.1907437 , -1.4260625 ,  0.20374985,
        1.7188393 ,  0.9811421 , -1.6228783 ,  0.58338284, -0.7557665 ,
       -1.0928499 , -0.7617161 , -0.65688896,  4.0263968 , -0.09345046,
        0.07987386,  0.73330057, 13.416785  , -0.31503808,  3.6180706 ,
        1.4577851 ,  1.4350643 ,  0.21168658, -0.19559935, -1.103691  ,
        0.7532946 ,  1.5955294 , -1.1590674 , -1.2700799 , -0.32934734,
       -0.52962774, -0.747167  ,  0.18337195, -0.2666077 ,  0.717041  ,
       -0.6293016 , -0.6326269 , -0.17059498, -2.4983056 ,  0.0488462 ,
       -1.161425  ,  0.13799725, -1.8053738 , -1.6930958 ,  1.2327036 ,
       -1.2348598 , -0.18195666,  1.3208578 , -3.0858784 ,  1.1431783 ,
       -0.9411551 , -0.7087368 , -1.15071   , -3.0066304 , -1.8325434 ,
        2.1851883 ], dtype=float32), array([-3.1554081 , -1.8086557 , -1.2189873 , -1.3863541 , -3.4624038 ,
       -3.7378008 ,  7.6724052 ,  5.97018   , -4.2795534 , -2.5561104 ,
        2.2037597 ,  1.2032832 , -5.6521015 , -3.200562  ,  0.06564808,
       -3.1106699 , -0.22693926, -4.557994  , -0.9784015 , -2.8301358 ,
       -0.26256648, -1.9581242 , -0.75423837,  3.251859  , -3.7698638 ,
        2.8235092 , -2.9476943 ,  0.75258267, 10.651768  ,  1.6277269 ,
       -0.08898169, -1.1676219 ,  4.3143296 , -4.4079895 , -4.0753226 ,
        2.1783433 , -4.154809  , -1.7371117 , -2.4756253 ,  6.97458   ,
       -1.4465613 ,  3.5330255 , -1.9635652 , -1.0765982 ,  3.4709496 ,
       -0.44178772, -0.5041221 ,  2.493868  , -0.25774002, -2.910048  ,
        1.3306173 ,  3.3166916 , -1.9219271 , -1.5394036 , -1.2261659 ,
       -1.2541034 , -0.9439164 , -0.20131937, -2.7909422 , -1.7844346 ,
       -0.31215718, -2.2882266 , -1.4200875 , -2.3059387 , -1.2107593 ,
       -2.174218  , -3.193241  ,  2.251296  , -2.9217339 ,  2.1830683 ,
        0.09082523,  0.70335275, -3.5495253 , -5.4326572 , -2.9788358 ,
        0.7502857 , -2.0108578 , -3.704027  ,  2.679557  , -0.8924122 ,
        0.39617965,  2.2738085 , -3.2832923 ,  7.1167126 ,  3.3312867 ,
       -0.20836425, -3.8255863 , -0.7380201 ,  2.5008836 ,  5.836446  ,
        3.9049966 , 16.540073  ,  9.489449  ,  6.8317823 , -2.6105278 ,
        0.0635196 , -0.18466364,  2.4365137 , -0.29589617, -0.49789888,
        2.5412517 ], dtype=float32), array([-3.40932107e+00, -1.06936395e-02, -1.73499656e+00, -1.59915805e+00,
        5.71720302e-02, -1.26235354e+00,  1.75313354e+00,  1.82909936e-01,
       -2.73504066e+00, -8.32203209e-01,  1.33741820e+00,  1.22894943e+00,
       -3.33747673e+00, -2.82331657e+00, -6.27151072e-01, -5.35833001e-01,
        7.28152394e-02, -3.50825024e+00,  2.36635065e+00,  1.20436706e-01,
        1.99636745e+00,  1.94954121e+00,  1.54881507e-01,  3.04111511e-01,
       -2.20299864e+00,  4.68201256e+00, -3.32769918e+00,  1.58799827e+00,
        2.00522804e+00,  4.28090960e-01,  1.21267533e+00, -3.45705330e-01,
        2.38831758e+00, -2.96614265e+00, -1.35263073e+00,  1.28939712e+00,
       -1.74022067e+00, -1.94155240e+00, -3.36226821e+00,  7.63379526e+00,
        4.00016403e+00,  4.05345821e+00, -4.05784190e-01,  1.22065210e+00,
        3.96605849e-01, -3.39757466e+00,  1.67164028e+00,  6.65977716e-01,
        3.89114916e-01, -1.13685560e+00,  1.78429723e+00,  1.66959250e+00,
        8.51574957e-01, -1.33695388e+00, -3.62328577e+00, -2.20936608e+00,
       -4.98263955e-01, -1.52075148e+00, -1.68073058e+00, -3.47000551e+00,
       -4.68902290e-03,  9.44112360e-01, -2.32742310e+00, -7.69852519e-01,
       -2.74959385e-01, -1.03926265e+00, -1.83813047e+00,  3.34748793e+00,
       -3.22042465e-01, -4.92838115e-01,  2.63888419e-01,  3.05683446e+00,
        1.63758367e-01, -4.02872753e+00, -2.33594084e+00,  1.09016666e+01,
       -2.16153765e+00, -2.93059349e+00,  3.17019510e+00,  1.59995222e+00,
       -7.56023049e-01,  7.05853367e+00, -1.75534749e+00, -9.27645862e-02,
       -7.87818313e-01, -1.31494510e+00, -5.49836457e-02,  7.27982521e-01,
       -9.21023250e-01,  2.67443925e-01,  1.25793505e+00,  1.52883315e+00,
        2.56475949e+00,  9.29922283e-01, -1.78127527e+00, -6.23938262e-01,
       -6.67548358e-01,  1.15025485e+00, -2.27030230e+00,  2.42970988e-01,
       -1.11846581e-01], dtype=float32), array([ 9.7881667e-02,  6.3227153e-01, -1.8561482e+00, -2.1571205e+00,
        1.4059830e+01,  4.8399657e-01, -1.8275721e+00, -2.1536226e+00,
        2.0527697e+00, -2.3162837e+00, -3.0728564e+00,  4.5147705e-01,
       -2.2566085e+00,  9.0172809e-01,  9.2773736e-01,  3.4005036e+00,
       -2.4779036e+00, -1.9556541e+00, -4.0643939e-01, -1.2113328e+00,
        1.0615828e+00,  1.8980796e+00,  8.0910289e-01, -3.4260190e+00,
        1.6985834e-02,  1.8681365e+00, -1.6745995e+00,  3.1297741e+00,
        4.9533206e-01,  7.7088308e+00, -9.4858694e-01,  1.6952250e+00,
       -3.3255212e+00, -9.6397811e-01,  2.0618695e-01,  3.0011529e-01,
        1.3867394e+00,  2.7509351e+00, -1.8679692e+00,  1.8175439e+00,
       -1.7074220e+00, -3.3053722e+00,  4.2096773e-01,  3.0590990e+00,
       -3.0134280e+00, -4.1446114e+00,  1.4162828e+00, -1.3907127e+00,
       -2.8771629e+00,  9.5357203e-01,  1.0698979e+00, -3.5089359e+00,
       -4.6066377e-01, -2.0315270e+00, -2.4641752e+00, -1.7112375e+00,
        7.7639780e+00, -7.3515660e-01, -1.6210897e+00, -1.6490629e+00,
       -1.4550496e+00,  8.2967222e-01, -2.4997182e+00, -3.0694556e-01,
        7.3129952e-01, -7.7849364e-01, -8.0653977e-01, -2.7814975e-01,
        6.9563894e+00, -2.2368103e-02,  1.2655897e+00,  1.0192424e-02,
        1.6345310e+00, -2.7512756e-01, -1.4516522e+00, -1.3889271e-01,
       -7.7020127e-01, -1.5020751e+00,  1.9333646e+00, -4.9428000e+00,
       -1.9338553e+00, -2.5300448e+00,  4.6418971e-01, -5.1236825e+00,
        2.4116956e-01,  7.5193768e+00,  5.8947573e+00, -5.9647286e-01,
       -3.0245688e+00,  1.1701695e+00, -2.1766311e-01, -2.4784267e+00,
       -2.7892220e+00,  1.5604091e-01,  2.2785933e+00,  7.8045473e+00,
       -1.1207641e+00, -2.6828754e+00,  1.1542189e+00,  3.2799768e-01,
       -9.7450703e-01], dtype=float32)]

I hope you can help me solve this problem.
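
The standard json module cannot encode numpy.ndarray objects, which matches the TypeError above. One possible workaround, sketched here under the assumption that the results are a list of per-sample score arrays like the printed output, is to convert each array to a nested Python list before dumping. The helper name to_serializable is hypothetical.

```python
import json

import numpy as np


def to_serializable(results):
    """Convert a list of per-sample score arrays into nested Python lists."""
    return [np.asarray(scores).tolist() for scores in results]


# Small stand-in for the per-video score arrays shown above.
outputs = [np.array([1.5, -0.25], dtype=np.float32),
           np.array([0.0, 2.0], dtype=np.float32)]
text = json.dumps(to_serializable(outputs))
print(text)  # [[1.5, -0.25], [0.0, 2.0]]
```

Writing to a pickle file instead is another option, since pickle can store ndarray objects directly.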

Support multi_class in TSM-Head

Describe the feature


You should just add one line in tsm_head.py to support multi_class.

video classification

Can I use mmaction2 to classify videos? I have only just started using this framework and am not very familiar with it. Which model in the model zoo would be good at video classification?

TSM temporal_pool=True bug

Thanks for your awesome codebase.
I'm trying to train TSM with temporal_pool=True (set in both TSMHead and ResNetTSM), but I get errors.
After some debugging, I think ResNetTSM omits the actual temporal pooling between layer1 and layer2.
That is, the feature map shape before layer2 should be (N * num_segments / 2, C, H, W) instead of (N * num_segments, C, H, W).
In the original TSM codebase, when temporal_pool=True, a max_pool3d performs the temporal pooling before layer2; this step is missing in mmaction2.
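
The shape change described above can be illustrated with a small NumPy sketch. It assumes a temporal window of 3 frames with stride 2 and padding 1; the issue only names max_pool3d, so these window settings are an assumption, and temporal_max_pool is a hypothetical helper rather than code from either repository.

```python
import numpy as np


def temporal_max_pool(x, num_segments):
    """Halve the temporal length of float features shaped (N * T, C, H, W).

    Takes the max over windows of 3 frames with stride 2 and padding 1
    along the time axis (assumed settings). T must be even.
    """
    nt, c, h, w = x.shape
    n = nt // num_segments
    x = x.reshape(n, num_segments, c, h, w)
    # Pad one frame of -inf on each side of the time axis.
    pad = np.full((n, 1, c, h, w), -np.inf, dtype=x.dtype)
    x = np.concatenate([pad, x, pad], axis=1)
    out_t = num_segments // 2
    windows = [x[:, 2 * t:2 * t + 3].max(axis=1) for t in range(out_t)]
    out = np.stack(windows, axis=1)  # (N, T / 2, C, H, W)
    return out.reshape(n * out_t, c, h, w)


feats = np.arange(8, dtype=np.float32).reshape(8, 1, 1, 1)  # N=1, T=8
pooled = temporal_max_pool(feats, num_segments=8)
print(pooled.shape)    # (4, 1, 1, 1)
print(pooled.ravel())  # [1. 3. 5. 7.]
```

In a PyTorch model the same step would more naturally use torch.nn.functional.max_pool3d on a tensor reshaped to (N, C, T, H, W).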

misleading settings in README

When trying to use the slowfast configs, I found this suggestion in the README: "The gpus indicates the number of gpu we used to get the checkpoint. It is noteworthy that the configs we provide are used for 8 gpus as default. According to the Linear Scaling Rule, you may set the learning rate proportional to the batch size if you use different GPUs or videos per GPU, e.g., lr=0.01 for 4 GPUs * 2 video/gpu and lr=0.08 for 16 GPUs * 4 video/gpu."

Yet the lr and videos_per_gpu in these config files differ from those in the README. For example, in https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py, the 'lr' is 0.1 and the 'videos_per_gpu' is 8.

So, which one is the correct setting to reproduce the performance mentioned in README?
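
The Linear Scaling Rule quoted above amounts to one line of arithmetic. A minimal sketch, taking the README's lr=0.01 at 4 GPUs * 2 videos/gpu (total batch size 8) as the reference point; scale_lr is a hypothetical helper:

```python
def scale_lr(base_lr, base_batch_size, num_gpus, videos_per_gpu):
    """Scale the learning rate linearly with the total batch size."""
    total_batch_size = num_gpus * videos_per_gpu
    return base_lr * total_batch_size / base_batch_size


# README reference point: lr=0.01 for 4 GPUs x 2 videos/gpu (batch size 8).
print(round(scale_lr(0.01, 8, 16, 4), 6))  # 0.08
print(round(scale_lr(0.01, 8, 8, 8), 6))   # 0.08
```

Applied to 8 GPUs * 8 videos/gpu, the README's reference point would give 0.08, whereas the config linked above uses 0.1. That gap is the discrepancy this issue asks about.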

Roadmap of MMAction2

We keep this issue open to collect feature requests from users and hear your voice. Our monthly release plan is also available here.

You can either:

Suggest a new feature by leaving a comment.
Vote for a feature request with 👍 or against it with 👎. (Remember that developers are busy and cannot respond to all feature requests, so vote for your favorite one!)
Tell us that you would like to help implement one of the features in the list or review the PRs. (This is the greatest thing to hear about!)

BSN README problem

The training command in the BSN README

python tools/train.py configs/localization/bsn/bsn_400x100_1x16_20e_activitynet_feature.py

cannot be run (the config filename was not updated).

Add Dockerfile and use PyTorch 1.6

Switch the modelzoo URL to https://download.openmmlab.com

Useless forward Test code

mmaction\models\recognizers\recognizer3d.py def forward_test(self, imgs):

Neither the loss nor the accuracy is calculated there, so what is it for? I recommend printing the accuracy after the evaluation.

Feature extraction of BMN using TSN

In the BMN Model Zoo there are results for features extracted by MMAction, but I found no details in Data Preparation about how to extract the features using TSN.

After referring to the BMN paper and some issues, I am still confused about the details.

Assume the video has 16,000 frames.

Divide all frames into 1000 contiguous, non-overlapping snippets of 16 frames each. Decode the video to raw frames and calculate optical flow.
Select the 8th RGB frame and the 6th to 10th optical flow frames in each snippet to represent that snippet.
For one snippet:

RGB: initialize the TSN network with the ActivityNet RGB config and checkpoint from the TSN Model Zoo. Input one RGB frame (the 8th), simply resized to 224x224 without any crop; the cls_score returned by tsn_head will then be a tensor of shape [1, 200].
Flow: initialize the TSN network with the flow config and checkpoint and input the five optical flow frames; the consensus module will "average" them, so cls_score will also be a tensor of shape [1, 200].
Concatenate the two tensors above to get the feature of this snippet.

Apply the same process to all 1000 snippets, so the feature shape of a video is [1000, 400]; then use this script to rescale it to [100, 400].

Are the steps above correct? Or could you add your feature extraction script to this repo?

Thank you!
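
The bookkeeping in the steps above can be sketched in NumPy as follows. This only illustrates the procedure as described in the question and does not confirm how the model zoo features were produced. The random arrays are placeholders for the TSN scores, linear interpolation is an assumed rescaling method, and both helper names are hypothetical.

```python
import numpy as np


def center_frame_indices(total_frames, snippet_len=16):
    """Return the 0-based index of the 8th frame of each full snippet."""
    num_snippets = total_frames // snippet_len
    return [i * snippet_len + snippet_len // 2 - 1 for i in range(num_snippets)]


def rescale_temporal(features, target_len=100):
    """Linearly interpolate (T, D) features to (target_len, D)."""
    src_len, dim = features.shape
    src_pos = np.linspace(0.0, 1.0, src_len)
    dst_pos = np.linspace(0.0, 1.0, target_len)
    columns = [np.interp(dst_pos, src_pos, features[:, d]) for d in range(dim)]
    return np.stack(columns, axis=1)


indices = center_frame_indices(16000)
rgb_scores = np.random.rand(len(indices), 200)   # placeholder for TSN RGB scores
flow_scores = np.random.rand(len(indices), 200)  # placeholder for TSN flow scores
video_feature = np.concatenate([rgb_scores, flow_scores], axis=1)
print(video_feature.shape)                    # (1000, 400)
print(rescale_temporal(video_feature).shape)  # (100, 400)
```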

Give training dataset resolution info in modelzoo

Currently, some commonly used resolutions for action recognition include:

340x256 (I guess it is a legacy of UCF101)
short-side 256
height 256
short-side 320
height 320
height 331

Different resolutions may or may not influence the accuracy, so it would be good to state the resolution of the training data.
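
To make the difference between "short-side 256" and "height 256" concrete, here is a small sketch of the two resizing conventions. Both function names are hypothetical and the source sizes are only example values.

```python
def resize_short_side(width, height, short_side=256):
    """Scale so the shorter edge equals short_side, keeping aspect ratio."""
    scale = short_side / min(width, height)
    return round(width * scale), round(height * scale)


def resize_height(width, height, target_height=256):
    """Scale so the height equals target_height, keeping aspect ratio."""
    scale = target_height / height
    return round(width * scale), target_height


# A landscape and a portrait source video.
for w, h in [(1280, 720), (720, 1280)]:
    print((w, h), resize_short_side(w, h), resize_height(w, h))
# (1280, 720) (455, 256) (455, 256)
# (720, 1280) (256, 455) (144, 256)
```

The two conventions coincide for landscape videos and differ for portrait ones, which is one reason the exact convention is worth recording next to each checkpoint.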

Edit:

Forgot about the video format.

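Do you plan to add person detection?

Hello. Thank you very much for your implementation.
Do you plan to support independent action recognition inference for each of the different people appearing in a video? For example, the SlowFast implementation can also train and run inference on the AVA dataset.
We are currently working on action recognition for surveillance cameras, where more than one person appears, so we would like to be able to run inference for each person independently.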

FileNotFoundError: [Errno 2] No such file or directory: 'data/ucf101/ucf101_train_split_{1,2,3}_rawframes.txt'

Hi MMAction2, I have hit this problem several times when training on the UCF101 dataset with the default config file:

$ CUDA_VISIBLE_DEVICES=1 python tools/train.py configs/recognition/tsn/tsn_r50_1x1x3_80e_ucf101_rgb.py

Following #101, I added start_index=0 to the data dict,

but then there was a problem:

2020-08-12 09:20:54,393 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}
Traceback (most recent call last):
  File "tools/train.py", line 146, in <module>
    main()
  File "tools/train.py", line 125, in main
    datasets = [build_dataset(cfg.data.train)]
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/builder.py", line 39, in build_dataset
    dataset = build_from_cfg(cfg, DATASETS, default_args)
  File "/home/zj/zhonglian/mmcv/mmcv/utils/registry.py", line 167, in build_from_cfg
    return obj_cls(**args)
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/rawframe_dataset.py", line 93, in __init__
    multi_class, num_classes, start_index, modality)
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/base.py", line 63, in __init__
    self.video_infos = self.load_annotations()
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/rawframe_dataset.py", line 98, in load_annotations
    with open(self.ann_file, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'data/ucf101/ucf101_train_split_{1,2,3}_rawframes.txt'

When I modified this file path in the config file:

# ann_file_train = 'data/ucf101/ucf101_train_split_{1,2,3}_rawframes.txt'
ann_file_train = 'data/ucf101/ucf101_train_split_1_rawframes.txt'

everything works. I wonder whether the path needs to be changed every time, for example to split 2 to continue training after split 1, because the config also contains:

work_dir = './work_dirs/tsn_r50_1x1x3_80e_ucf101_rgb/'
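
Python's open() does not expand brace patterns such as {1,2,3}, which is why the literal path in the traceback is not found. One way to avoid editing the paths by hand for each split is to derive them from a single variable, as in this sketch in plain Python. The per-split work_dir suffix is an assumption for illustration and not a naming convention from the repository.

```python
# Choose one of the three UCF101 splits.
split = 1

ann_file_train = f'data/ucf101/ucf101_train_split_{split}_rawframes.txt'
# Assumption: a separate work_dir per split keeps checkpoints apart.
work_dir = f'./work_dirs/tsn_r50_1x1x3_80e_ucf101_rgb_split{split}/'

print(ann_file_train)  # data/ucf101/ucf101_train_split_1_rawframes.txt
print(work_dir)        # ./work_dirs/tsn_r50_1x1x3_80e_ucf101_rgb_split1/
```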

experiment results on UCF-101 and HMDB-51 for R(2+1)D and I3D backbone.

Hi,

For the experiments using the R(2+1)D and I3D backbones
(https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/r2plus1d/README.md),
(https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/i3d/README.md),
do you have experiment results on UCF-101 and HMDB-51? If so, would you mind sharing your experimental results and giving me more information about the model initialization (random init or ImageNet pre-trained)?

Thanks!
