open-mmlab / mmaction2
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
Home Page: https://mmaction2.readthedocs.io
License: Apache License 2.0
In generate_labels:
def generate_labels(self, gt_bbox):
    """Generate training labels."""
    match_score_confidence_list = []
    match_score_start_list = []
    match_score_end_list = []
    for every_gt_bbox in gt_bbox:
        gt_iou_map = []
        for start, end in every_gt_bbox:
            start = start.numpy()
            end = end.numpy()
            ......
The variables start and end are of type numpy.float64 rather than tensor, so they have no numpy() method, which leads to an error.
A simple solution would be to comment out these last two lines (though that may conflict with your design).
Another solution, found in your training log, is to add gt_bbox to the keys argument of ToTensor:
train_pipeline = [
    dict(type='LoadLocalizationFeature'),
    dict(type='GenerateLocalizationLabels'),
    dict(
        type='Collect',
        keys=['raw_feature', 'gt_bbox'],
        meta_name='video_meta',
        meta_keys=['video_name']),
    # dict(type='ToTensor', keys=['raw_feature']),
    dict(type='ToTensor', keys=['raw_feature', 'gt_bbox']),
    dict(type='ToDataContainer', fields=[dict(key='gt_bbox', stack=False)])
]
Maybe it's just a simple mistake when uploading the config file :)
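To make the failure mode concrete, here is a minimal sketch of a conversion that tolerates both cases. It does not reflect how mmaction2 handles this: `as_float` and `unpack_bbox` are hypothetical helpers, and `FakeTensor` only stands in for a tensor so the example has no dependencies.

```python
class FakeTensor:
    """Stand-in for a tensor (hypothetical), exposing only .numpy()."""

    def __init__(self, value):
        self._value = value

    def numpy(self):
        return self._value


def as_float(value):
    """Return a plain float from a tensor-like object or a scalar.

    Scalars such as numpy.float64 or float have no .numpy() method,
    so the call is made only when the attribute exists.
    """
    if hasattr(value, "numpy"):
        value = value.numpy()
    return float(value)


def unpack_bbox(bbox):
    """Convert one (start, end) pair to plain floats."""
    start, end = bbox
    return as_float(start), as_float(end)


print(unpack_bbox((FakeTensor(0.25), 0.75)))
```

Whether such a guard or the ToTensor change is preferable depends on what the rest of the pipeline expects to receive.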
Hi mmaction2, first of all, thank you for your contribution. I want to train TSN on the hmdb51 dataset. Following the tutorials, I did the following:
preparing_hmdb51.md
$ cd mmaction2/tools/data/hmdb51
# process data and annotation
$ bash download_annotations.sh
$ bash download_videos.sh
# extract data
$ bash extract_rgb_frames.sh
# generate label
$ bash generate_rawframes_filelist.sh
$ bash generate_videos_filelist.sh
While doing this, the log suggests that some .avi files do not work:
...
...
rgb 6568 talk/The_Matrix_3_talk_h_nm_np1_fr_goo_13.avi None done <------------- here
"../../data/hmdb51/videos/talk/Hamlet_(1996)_Fencing_Scenes_talk_u_cm_np1_fr_med_0.avi", frames ≈ 96
extracted frames of video "../../data/hmdb51/videos/talk/Hamlet_(1996)_Fencing_Scenes_talk_u_cm_np1_fr_med_0.avi", 95 frames
1 videos (95 frames, 0 tvl1 flows) processed, using 0.276s, decoding speed 344.203fps, flow speed 0fps
rgb 6569 talk/Hamlet_(1996)_Fencing_Scenes_talk_u_cm_np1_fr_med_0.avi None done <-------------------here
"../../data/hmdb51/videos/talk/Fellowship_6_talk_h_cm_np1_fr_goo_14.avi", frames ≈ 59
extracted frames of video "../../data/hmdb51/videos/talk/Fellowship_6_talk_h_cm_np1_fr_goo_14.avi", 58 frames
1 videos (58 frames, 0 tvl1 flows) processed, using 0.327s, decoding speed 177.37fps, flow speed 0fps
rgb 6570 talk/Fellowship_6_talk_h_cm_np1_fr_goo_14.avi None done
"../../data/hmdb51/videos/talk/Fellowship_6_talk_h_cm_np1_fr_goo_13.avi", frames ≈ 73
extracted frames of video "../../data/hmdb51/videos/talk/Fellowship_6_talk_h_cm_np1_fr_goo_13.avi", 72 frames
1 videos (72 frames, 0 tvl1 flows) processed, using 0.502s, decoding speed 143.426fps, flow speed 0fps
rgb 6571 talk/Fellowship_6_talk_h_cm_np1_fr_goo_13.avi None done
Generate raw frames (RGB only)
I copied mmaction2/configs/recognition/tsn/tsn_r50_1x1x3_80e_ucf101_rgb.py to tsn_hmdb51_config.py and modified a few places:
# model settings
model = dict(
type='Recognizer2D',
backbone=dict(
type='ResNet',
pretrained='torchvision://resnet50',
depth=50,
norm_eval=False),
cls_head=dict(
type='TSNHead',
# num_classes=101,
# change the number of classes
num_classes=51, <--------------- here
in_channels=2048,
spatial_type='avg',
consensus=dict(type='AvgConsensus', dim=1),
dropout_ratio=0.8,
init_std=0.001))
# model training and testing settings
train_cfg = None
test_cfg = dict(average_clips=None)
# dataset settings
dataset_type = 'RawframeDataset'
data_root = 'data/hmdb51/rawframes/' <--- here
data_root_val = 'data/hmdb51/rawframes/'
ann_file_train = 'data/hmdb51/hmdb51_train_split_{1,2,3}_rawframes.txt'
ann_file_val = 'data/hmdb51/hmdb51_val_split_{1,2,3}_rawframes.txt'
ann_file_test = 'data/hmdb51/hmdb51_val_split_{1,2,3}_rawframes.txt' <----- here
img_norm_cfg = dict(mean=[104, 117, 128], std=[1, 1, 1], to_bgr=False)
train_pipeline = [
dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3),
dict(type='FrameSelector'),
dict(type='Resize', scale=(-1, 256)),
dict(
type='MultiScaleCrop',
input_size=224,
scales=(1, 0.875, 0.75, 0.66),
random_crop=False,
max_wh_scale_gap=1),
dict(type='Resize', scale=(224, 224), keep_ratio=False),
dict(type='Flip', flip_ratio=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
dict(
type='SampleFrames',
clip_len=1,
frame_interval=1,
num_clips=3,
test_mode=True),
dict(type='FrameSelector'),
dict(type='Resize', scale=(-1, 256)),
dict(type='CenterCrop', crop_size=224),
dict(type='Flip', flip_ratio=0),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
dict(
type='SampleFrames',
clip_len=1,
frame_interval=1,
num_clips=25,
test_mode=True),
dict(type='FrameSelector'),
dict(type='Resize', scale=(-1, 256)),
dict(type='TenCrop', crop_size=224),
dict(type='Flip', flip_ratio=0),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs'])
]
data = dict(
videos_per_gpu=32,
workers_per_gpu=4,
train=dict(
type=dataset_type,
ann_file=ann_file_train,
data_prefix=data_root,
pipeline=train_pipeline),
val=dict(
type=dataset_type,
ann_file=ann_file_val,
data_prefix=data_root_val,
pipeline=val_pipeline),
test=dict(
type=dataset_type,
ann_file=ann_file_test,
data_prefix=data_root_val,
pipeline=test_pipeline))
# optimizer
optimizer = dict(
type='SGD', lr=0.001, momentum=0.9,
weight_decay=0.0005) # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[30, 60])
total_epochs = 80
checkpoint_config = dict(interval=5)
evaluation = dict(
interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5))
log_config = dict(
interval=20,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook'),
])
# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/tsn_r50_1x1x3_80e_hmdb51_rgb/'
# use a pretrained model
load_from = './checkpoints/tsn_r50_1x1x3_80e_ucf101_rgb_20200613-d6ad9c48.pth' <- here
# load_from = None
resume_from = None
workflow = [('train', 1)]
$ CUDA_VISIBLE_DEVICES=3 python tools/train.py configs/recognition/tsn/tsn_hmdb51_config.py
2020-08-11 15:19:47,841 - mmaction - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.7 (default, Mar 26 2020, 15:48:22) [GCC 7.3.0]
CUDA available: True
...
...
...
# load_from = None
resume_from = None
workflow = [('train', 1)]
2020-08-11 15:19:49,517 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.weight', 'fc.bias'}
Traceback (most recent call last):
File "tools/train.py", line 146, in <module>
main()
File "tools/train.py", line 125, in main
datasets = [build_dataset(cfg.data.train)]
File "/home/zj/zhonglian/mmaction2/mmaction/datasets/builder.py", line 39, in build_dataset
dataset = build_from_cfg(cfg, DATASETS, default_args)
File "/home/zj/zhonglian/mmcv/mmcv/utils/registry.py", line 167, in build_from_cfg
return obj_cls(**args)
File "/home/zj/zhonglian/mmaction2/mmaction/datasets/rawframe_dataset.py", line 92, in __init__
multi_class, num_classes, modality)
File "/home/zj/zhonglian/mmaction2/mmaction/datasets/base.py", line 57, in __init__
self.video_infos = self.load_annotations()
File "/home/zj/zhonglian/mmaction2/mmaction/datasets/rawframe_dataset.py", line 97, in load_annotations
with open(self.ann_file, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'data/hmdb51/hmdb51_train_split_{1,2,3}_rawframes.txt'
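The path in this traceback still contains the literal text `{1,2,3}`. That is shell-style shorthand for "one of the three splits", and Python's `open` does not expand it. A minimal config sketch that selects one split explicitly follows; the `split` variable is an assumption introduced here for illustration and is not part of the original config.

```python
# Hypothetical helper variable: pick one of the three HMDB51 splits.
split = 1

# Build concrete annotation paths instead of the literal '{1,2,3}' placeholder.
ann_file_train = f'data/hmdb51/hmdb51_train_split_{split}_rawframes.txt'
ann_file_val = f'data/hmdb51/hmdb51_val_split_{split}_rawframes.txt'
ann_file_test = ann_file_val

print(ann_file_train)
```

Changing `split` then switches all three paths together.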
OK, I changed the path to ann_file_train = 'data/hmdb51/hmdb51_train_split_1_rawframes.txt', and then another error occurred:
...
...
2020-08-11 15:36:29,645 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}
2020-08-11 15:36:41,862 - mmaction - INFO - load checkpoint from ./checkpoints/tsn_r50_1x1x3_80e_ucf101_rgb_20200613-d6ad9c48.pth
2020-08-11 15:36:44,408 - mmaction - WARNING - The model and loaded state dict do not match exactly
size mismatch for cls_head.fc_cls.weight: copying a param with shape torch.Size([101, 2048]) from checkpoint, the shape in current model is torch.Size([51, 2048]).
size mismatch for cls_head.fc_cls.bias: copying a param with shape torch.Size([101]) from checkpoint, the shape in current model is torch.Size([51]).
2020-08-11 15:36:44,409 - mmaction - INFO - Start running, host: zj@user-SYS-7049GP-TRT, work_dir: /home/zj/zhonglian/mmaction2/work_dirs/tsn_r50_1x1x3_80e_hmdb51_rgb
2020-08-11 15:36:44,409 - mmaction - INFO - workflow: [('train', 1)], max: 80 epochs
Traceback (most recent call last):
File "tools/train.py", line 146, in <module>
main()
File "tools/train.py", line 142, in main
meta=meta)
File "/home/zj/zhonglian/mmaction2/mmaction/apis/train.py", line 111, in train_model
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/home/zj/zhonglian/mmcv/mmcv/runner/epoch_based_runner.py", line 122, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/zj/zhonglian/mmcv/mmcv/runner/epoch_based_runner.py", line 27, in train
for i, data_batch in enumerate(data_loader):
File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 838, in _next_data
return self._process_data(data)
File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
data.reraise()
File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
raise self.exc_type(msg)
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 1.
Original Traceback (most recent call last):
File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/zj/zhonglian/mmaction2/mmaction/datasets/base.py", line 103, in __getitem__
return self.prepare_train_frames(idx)
File "/home/zj/zhonglian/mmaction2/mmaction/datasets/rawframe_dataset.py", line 137, in prepare_train_frames
return self.pipeline(results)
File "/home/zj/zhonglian/mmaction2/mmaction/datasets/pipelines/compose.py", line 41, in __call__
data = t(data)
File "/home/zj/zhonglian/mmaction2/mmaction/datasets/pipelines/loading.py", line 848, in __call__
img_bytes = self.file_client.get(filepath)
File "/home/zj/zhonglian/mmcv/mmcv/fileio/file_client.py", line 294, in get
return self.client.get(filepath)
File "/home/zj/zhonglian/mmcv/mmcv/fileio/file_client.py", line 185, in get
with open(filepath, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/zj/zhonglian/mmaction2/data/hmdb51/rawframes/shake_hands/Concourse_d_elegance_Sofia_2009___PRICE_GIVING_CEREMONY_shake_hands_f_cm_np2_le_med_0/img_00066.jpg'
I checked the image directory:
$ ls
img_00000.jpg img_00006.jpg img_00012.jpg img_00018.jpg img_00024.jpg img_00030.jpg img_00036.jpg img_00042.jpg img_00048.jpg img_00054.jpg img_00060.jpg
img_00001.jpg img_00007.jpg img_00013.jpg img_00019.jpg img_00025.jpg img_00031.jpg img_00037.jpg img_00043.jpg img_00049.jpg img_00055.jpg img_00061.jpg
img_00002.jpg img_00008.jpg img_00014.jpg img_00020.jpg img_00026.jpg img_00032.jpg img_00038.jpg img_00044.jpg img_00050.jpg img_00056.jpg img_00062.jpg
img_00003.jpg img_00009.jpg img_00015.jpg img_00021.jpg img_00027.jpg img_00033.jpg img_00039.jpg img_00045.jpg img_00051.jpg img_00057.jpg img_00063.jpg
img_00004.jpg img_00010.jpg img_00016.jpg img_00022.jpg img_00028.jpg img_00034.jpg img_00040.jpg img_00046.jpg img_00052.jpg img_00058.jpg img_00064.jpg
img_00005.jpg img_00011.jpg img_00017.jpg img_00023.jpg img_00029.jpg img_00035.jpg img_00041.jpg img_00047.jpg img_00053.jpg img_00059.jpg img_00065.jpg
There is no img_00066.jpg. Why does this happen, and how can I solve it? Looking forward to your help.
For example, the images in the folder openmm/mmaction2/data/ucf101/rawframes/Skiing/v_Skiing_g06_c04/ were img_00000.jpg~img_00299.jpg.
I modified this line to
video_info['total_frames'] = int(line_split[idx]) - 1
and it started training. Is that right?
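The listing above has 66 files numbered from 0 (img_00000.jpg to img_00065.jpg), while the loader asked for img_00066.jpg, which is what one would expect if frames were being counted from 1. The sketch below checks which indices a loader would request but not find on disk. It assumes the loader may ask for any index in `[start_index, start_index + total_frames - 1]`; `missing_indices` is a hypothetical helper, not an mmaction2 function.

```python
import re

FRAME_RE = re.compile(r'img_(\d{5})\.jpg$')


def missing_indices(filenames, total_frames, start_index):
    """Return requested frame indices that have no img_%05d.jpg file."""
    present = set()
    for name in filenames:
        match = FRAME_RE.search(name)
        if match:
            present.add(int(match.group(1)))
    requested = range(start_index, start_index + total_frames)
    return [index for index in requested if index not in present]


# 66 frames on disk, numbered 0..65 as in the listing above.
names = ['img_%05d.jpg' % i for i in range(66)]
print(missing_indices(names, total_frames=66, start_index=1))  # 1-based request
print(missing_indices(names, total_frames=66, start_index=0))  # 0-based request
```

Subtracting 1 from total_frames avoids the missing file but also never reads one frame at the other end of the range; whether that is acceptable is not settled in this thread.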
I have a long video that contains many labels, and I want to run detection over a 300-frame window. I would like to change demo.py to do this, but demo.py seems to need the path of a short video clip and returns a single prediction. My temporary workaround is: once I have 300 frames, I save them as a temporary video file, call demo.py, and then delete the temporary file. This is clumsy and inefficient. How can I read a long video and get continuous predictions? Please help me.
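One way to avoid temporary files is to keep the most recent frames in memory and emit a window whenever enough new frames have arrived. The following is a minimal sketch of that buffering step only. `sliding_windows` is a hypothetical helper, the frames can be any objects (for example decoded images), and passing each window to a recognizer is left out because it depends on the model's pipeline.

```python
from collections import deque


def sliding_windows(frames, window_size=300, stride=300):
    """Yield lists of `window_size` consecutive frames from any iterable.

    A new window starts every `stride` frames; with stride < window_size
    the windows overlap.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError('window_size and stride must be positive')
    buffer = deque(maxlen=window_size)
    for index, frame in enumerate(frames):
        buffer.append(frame)
        start = index - window_size + 1  # index of the oldest buffered frame
        if start >= 0 and start % stride == 0:
            yield list(buffer)


# Integers stand in for frames here.
for window in sliding_windows(range(10), window_size=4, stride=2):
    print(window)
```

Because the function is a generator, only one window of frames is held in memory at a time, regardless of the length of the video.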
I successfully used demo.py to test a video, but how can I test a picture from my own data?
If that is possible, I want to compare the results between a video and its pictures, taken in order from the same data.
I have seen dataset_type = 'RawframeDataset' and dataset_type = 'VideoDataset' in the config.
Thanks for your elegant implementation of this toolbox.
The doc says denseflow installation is unnecessary for RGB frame extraction, but I find this script still uses denseflow for both RGB and flow extraction. I am wondering which one should be trusted.
I wrote a slowfast_custom_config.py. It reads:
model = dict(
type='Recognizer3D',
backbone=dict(
type='ResNet3dSlowFast',
pretrained=None,
resample_rate=8, # tau
speed_ratio=8, # alpha
channel_ratio=8, # beta_inv
slow_pathway=dict(
type='resnet3d',
depth=50,
pretrained=None,
lateral=True,
conv1_kernel=(1, 7, 7),
dilations=(1, 1, 1, 1),
conv1_stride_t=1,
pool1_stride_t=1,
inflate=(0, 0, 1, 1),
norm_eval=False),
fast_pathway=dict(
type='resnet3d',
depth=50,
pretrained=None,
lateral=False,
base_channels=8,
conv1_kernel=(5, 7, 7),
conv1_stride_t=1,
pool1_stride_t=1,
norm_eval=False)),
cls_head=dict(
type='SlowFastHead',
in_channels=2304, # 2048+256
num_classes=400,
spatial_type='avg',
dropout_ratio=0.5))
train_cfg = None
test_cfg = dict(average_clips=None)
dataset_type = 'VideoDataset'
data_root = '/dat01/liuzhixiong/zt/mmaction2/data/fortest/videos_train'
data_root_val = '/dat01/liuzhixiong/zt/mmaction2/data/fortest/videos_val'
ann_file_train = '/dat01/liuzhixiong/zt/mmaction2/data/fortest/train_list_videos.txt'
ann_file_val = '/dat01/liuzhixiong/zt/mmaction2/data/fortest/val_list_videos.txt'
ann_file_test = '/dat01/liuzhixiong/zt/mmaction2/data/fortest/val_list_videos.txt'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
dict(type='DecordInit'),
dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
dict(type='DecordDecode'),
dict(type='Resize', scale=(-1, 256)),
dict(type='RandomResizedCrop'),
dict(type='Resize', scale=(224, 224), keep_ratio=False),
dict(type='Flip', flip_ratio=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
dict(type='DecordInit'),
dict(
type='SampleFrames',
clip_len=32,
frame_interval=2,
num_clips=1,
test_mode=True),
dict(type='DecordDecode'),
dict(type='Resize', scale=(-1, 256)),
dict(type='CenterCrop', crop_size=224),
dict(type='Flip', flip_ratio=0),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
dict(type='DecordInit'),
dict(
type='SampleFrames',
clip_len=32,
frame_interval=2,
num_clips=10,
test_mode=True),
dict(type='DecordDecode'),
dict(type='Resize', scale=(-1, 256)),
dict(type='ThreeCrop', crop_size=256),
dict(type='Flip', flip_ratio=0),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs'])
]
data = dict(
videos_per_gpu=4,
workers_per_gpu=4,
train=dict(
type=dataset_type,
ann_file=ann_file_train,
data_prefix=data_root,
pipeline=train_pipeline),
val=dict(
type=dataset_type,
ann_file=ann_file_val,
data_prefix=data_root_val,
pipeline=val_pipeline),
test=dict(
type=dataset_type,
ann_file=ann_file_test,
data_prefix=data_root_val,
pipeline=test_pipeline))
optimizer = dict(
type='SGD', lr=0.1, momentum=0.9,
weight_decay=0.0001) # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
lr_config = dict(
policy='CosineAnnealing',
min_lr=0,
warmup='linear',
warmup_by_epoch=True,
warmup_iters=34)
total_epochs = 256
checkpoint_config = dict(interval=4)
workflow = [('train', 1)]
evaluation = dict(
interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5))
log_config = dict(
interval=20,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook'),
])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/slowfast_r50_video_3d_4x16x1_256e_fortest_rgb'
load_from = None
resume_from = None
find_unused_parameters = False
The train_list_videos.txt follows the tips. It reads:
data/fortest/videos_train/01_trian.mp4 1
data/fortest/videos_train/02_trian.mp4 1
data/fortest/videos_train/03_trian.mp4 1
data/fortest/videos_train/04_trian.mp4 2
data/fortest/videos_train/05_trian.mp4 3
......
But when I use:
python tools/train.py configs/recognition/slowfast/slowfast_custom_config.py
--work-dir work_dirs/slowfast_r50_4x16x1_256e_fortest_rgb
--validate --seed 0 --deterministic
to submit my job with sbatch, it reports:
Traceback (most recent call last):
File "tools/train.py", line 146, in <module>
main()
File "tools/train.py", line 125, in main
datasets = [build_dataset(cfg.data.train)]
File "/dat01/liuzhixiong/zt/mmaction2/mmaction/datasets/builder.py", line 39, in build_dataset
dataset = build_from_cfg(cfg, DATASETS, default_args)
File "/dat01/liuzhixiong/anaconda3/envs/mmaction/lib/python3.6/site-packages/mmcv/utils/registry.py", line 167, in build_from_cfg
return obj_cls(**args)
File "/dat01/liuzhixiong/zt/mmaction2/mmaction/datasets/video_dataset.py", line 43, in __init__
super().__init__(ann_file, pipeline, start_index=start_index, **kwargs)
File "/dat01/liuzhixiong/zt/mmaction2/mmaction/datasets/base.py", line 63, in __init__
self.video_infos = self.load_annotations()
File "/dat01/liuzhixiong/zt/mmaction2/mmaction/datasets/video_dataset.py", line 58, in load_annotations
filename, label = line_split
ValueError: not enough values to unpack (expected 2, got 0)
I don't know why it can't read my list.txt.
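The message "expected 2, got 0" means that splitting one line of the list produced no fields at all, which is what a blank or whitespace-only line (for example a trailing empty line) would produce. That is only a likely cause, not a confirmed one. A minimal sketch for checking a list before training follows; `parse_video_list` is a hypothetical helper and assumes the "path label" format shown above.

```python
def parse_video_list(lines):
    """Parse 'path label' lines and report the ones that do not fit.

    Returns (entries, problems), where problems holds
    (line_number, text) pairs for lines that could not be parsed.
    """
    entries, problems = [], []
    for number, raw in enumerate(lines, start=1):
        parts = raw.strip().split()
        if len(parts) != 2:
            problems.append((number, raw.rstrip('\n')))
            continue
        filename, label = parts
        try:
            entries.append((filename, int(label)))
        except ValueError:
            problems.append((number, raw.rstrip('\n')))
    return entries, problems


sample = ['a.mp4 1\n', '\n', 'b.mp4 2\n', 'c d.mp4 3\n']
entries, problems = parse_video_list(sample)
print(entries)
print(problems)
```

To check a real file, the lines can be read with `open(path).readlines()` and passed in unchanged. A path containing spaces is reported as a problem too, since it splits into more than two fields.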
There are lots of videos on the web, and it is common to test a video directly from its URL rather than downloading it to disk as a temporary file and then running the testing pipeline.
Would you consider implementing such a testing pipeline?
Thanks.
Thanks for your awesome work.
You now provide the code of BSN and BMN for Temporal Action Localization, but it only covers the Temporal Proposal Generation part. I note that many works apply UntrimmedNet (CUHK & ETHZ & SIAT Submission to ActivityNet Challenge) to get the classification results, but I have not found the classification results file or an easy way to obtain those results.
Do you plan to provide code for classifying the proposals to get the final mAP metric?
I tried to train on Kinetics with this config, but got an index out of range error.
After some debugging, I found that this bug is caused by the default setting start_index = 1 in SampleFrames.
I think start_index should be 0 for decord.
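The arithmetic behind this report can be sketched in a few lines. The example assumes that sampled indices are computed 0-based and then shifted by `start_index`, and that the decoder serves indices `0 .. num_frames - 1`; both helpers are hypothetical and do not reproduce the SampleFrames implementation.

```python
def shift_indices(frame_inds, start_index):
    """Apply the start_index offset to 0-based sampled indices."""
    return [index + start_index for index in frame_inds]


def out_of_range(frame_inds, num_frames):
    """Return indices a 0-based decoder with num_frames frames cannot serve."""
    return [index for index in frame_inds if not 0 <= index < num_frames]


sampled = [0, 4, 9]  # valid 0-based indices for a 10-frame video
print(out_of_range(shift_indices(sampled, start_index=1), num_frames=10))
print(out_of_range(shift_indices(sampled, start_index=0), num_frames=10))
```

With an offset of 1, the last frame of the video maps to an index one past the end, which matches the reported index out of range error.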
When I try to run demo.py to test my own video, I use:
python demo/demo.py configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py demo/checkpoints/slowfast_r50_4x16x1_256e_kinetics400_rgb_20200618-9a124260.pth demo/test1.mp4 demo/label_map.txt
but it fails with:
Traceback (most recent call last):
File "demo/demo.py", line 35, in <module>
main()
File "demo/demo.py", line 27, in main
results = inference_recognizer(model, args.video, args.label)
File "/dat01/wangbo2/ZT/mmaction2/mmaction/apis/inference.py", line 63, in inference_recognizer
data = test_pipeline(data)
File "/dat01/wangbo2/ZT/mmaction2/mmaction/datasets/pipelines/compose.py", line 41, in __call__
data = t(data)
File "/dat01/wangbo2/ZT/mmaction2/mmaction/datasets/pipelines/loading.py", line 582, in __call__
directory = results['frame_dir']
KeyError: 'frame_dir'
my environment:
python 3.6
pytorch 1.3
others followed the requirements
need your help!
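The KeyError on 'frame_dir' suggests that the test pipeline in this config expects a directory of extracted frames, while the demo was given a video file. One possible direction, offered only as a sketch, is a test pipeline that decodes the video directly, borrowing the DecordInit and DecordDecode steps that appear in the video-based configs quoted elsewhere on this page. Whether demo.py in this version accepts such a pipeline unchanged is an assumption.

```python
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)

# Video-based test pipeline: the first three steps read frames from a
# video file instead of looking up results['frame_dir'].
test_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=32,
        frame_interval=2,
        num_clips=10,
        test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='ThreeCrop', crop_size=256),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
```

The sampling values here are copied from the SlowFast video config on this page and may need adjusting to match the checkpoint being tested.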
Hello! Thanks for the new repo! I just want to ask: why mmaction2? Why not reorganize the mmaction codebase? What's the difference between mmaction and mmaction2?
Hi, I did not find the results of this experiment. How does SlowFast perform on the UCF101 dataset in your experiments? And have you reached the accuracy reported in the original paper?
Hi,
I want to use 'StepLrUpdaterHook', but I do not want the lr to drop to 0.1 * lr at each step I specify. What I want is base_lr = 0.1, with the following decreased lr values: 0.5 * base_lr, 0.1 * base_lr, 0.05 * base_lr, 0.001 * base_lr.
How can I do it?
Thanks in advance!
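The requested schedule can be written as a lookup from epoch to a multiplier of base_lr. The sketch below shows only that arithmetic. `stepped_lr` is a hypothetical function, the step epochs `[10, 20, 30, 40]` are made up for illustration, and wiring this into a custom lr updater hook is not covered here.

```python
from bisect import bisect_right


def stepped_lr(epoch, base_lr, steps, factors):
    """Return base_lr scaled by the factor of the last step reached.

    steps:   increasing epoch numbers at which the lr changes.
    factors: multiplier relative to base_lr in effect from each step on.
    """
    if len(steps) != len(factors):
        raise ValueError('steps and factors must have the same length')
    reached = bisect_right(steps, epoch)  # number of steps already passed
    if reached == 0:
        return base_lr
    return base_lr * factors[reached - 1]


steps = [10, 20, 30, 40]             # hypothetical step epochs
factors = [0.5, 0.1, 0.05, 0.001]    # multipliers asked for above
for epoch in (0, 10, 25, 45):
    print(epoch, stepped_lr(epoch, 0.1, steps, factors))
```

Each factor is applied to base_lr directly rather than to the previous lr, so the multipliers do not compound.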
This is probably a silly question. I am interested in aerial and surveillance action recognition. For example, the recently completed ActivityNet workshop at CVPR had a surveillance challenge with MEVA/VIRAT data. Since the actions there are not as nuanced as in Kinetics/AVA, and some are taken from an aerial perspective, can we still apply mmaction2 to that type of data?
Multigrid training.
A copy typo
https://github.com/open-mmlab/mmaction2/blob/master/mmaction/datasets/pipelines/loading.py#L682
PyAV Init -> Decord Init
I want to reimplement TPN in mmaction2. TPN registers the 'TPN' NECK module in the original mmaction. How can I implement this in mmaction2?
The original Something-Something V1 dataset already contains extracted frames, so the preparation process probably needs refactoring. All that is needed is renaming the extracted frames to follow the naming convention "img_%05d.jpg".
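A renaming step of this kind could look like the sketch below. It assumes the existing frame names are zero-padded so that sorting them gives temporal order (for example 00001.jpg, 00002.jpg) and that numbering should start at 1; both are assumptions, and `rename_plan` and `apply_plan` are hypothetical helpers rather than part of the repository's scripts.

```python
import os


def rename_plan(filenames, start=1):
    """Map existing frame names to img_%05d.jpg, keeping sorted order."""
    plan = []
    for offset, name in enumerate(sorted(filenames)):
        plan.append((name, 'img_%05d.jpg' % (start + offset)))
    return plan


def apply_plan(directory, plan):
    """Rename the files inside `directory` according to the plan."""
    for old, new in plan:
        os.rename(os.path.join(directory, old), os.path.join(directory, new))


print(rename_plan(['00002.jpg', '00001.jpg', '00003.jpg']))
```

Building the plan separately from applying it makes it possible to inspect the mapping for one video before touching any files.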
I want to know whether test.py can output the predicted values (with 3 classes, would it output 0, 1 or 2?) or labels.
I followed your guide and added '--out result.json', but I can't understand what the values in result.json mean.
Also, I want to ask a question about the model:
As a 3D model, can SlowFast predict with only one picture as input?
I have tried this idea, but it doesn't work; maybe my rawframes dataset is wrong.
Will this codebase add FineGym to the data_preparation?
I cannot download the Kinetics400 dataset. When I train your TSN model, it is hard to reproduce your released accuracy, and I don't know what the problem is. Could you please share the Kinetics400 dataset you used?
Can it be used to recognize real-time video from a webcam or another source?
The test command in BMN README can not run. Please fix and also check other README files for temporal proposal generation.
My CPU is AMD ThreadRipper 2990wx and GPU is Titan RTX.
No matter what value I set workers_per_gpu to, the code only uses one CPU thread and cannot use all 64 threads.
Can anyone help me? Thanks!
Describe the bug
In the document of TIN under the Modelzoo section, some tables are not correctly displayed. [Link]
But it seems fine in the README of TIN, therefore a re-compilation of the document may be required.
tsn_r50_1x1x3_80e_ucf101_rgb.py
Traceback (most recent call last):
File "D:\Anaconda\envs\mmaction2-master\lib\sre_parse.py", line 1015, in parse_template
this = chr(ESCAPES[this][1])
KeyError: '\U'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/11987/Desktop/小论文素材/model/mmaction2-master/tools/train.py", line 167, in <module>
main()
File "C:/Users/11987/Desktop/小论文素材/model/mmaction2-master/tools/train.py", line 83, in main
cfg = Config.fromfile(args.config)
File "D:\Anaconda\envs\mmaction2-master\lib\site-packages\mmcv-1.0.2-py3.7-win-amd64.egg\mmcv\utils\config.py", line 204, in fromfile
use_predefined_variables)
File "D:\Anaconda\envs\mmaction2-master\lib\site-packages\mmcv-1.0.2-py3.7-win-amd64.egg\mmcv\utils\config.py", line 127, in _file2dict
temp_config_file.name)
File "D:\Anaconda\envs\mmaction2-master\lib\site-packages\mmcv-1.0.2-py3.7-win-amd64.egg\mmcv\utils\config.py", line 108, in _substitute_predefined_vars
config_file = re.sub(regexp, value, config_file)
File "D:\Anaconda\envs\mmaction2-master\lib\re.py", line 192, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "D:\Anaconda\envs\mmaction2-master\lib\re.py", line 309, in _subx
template = _compile_repl(template, pattern)
File "D:\Anaconda\envs\mmaction2-master\lib\re.py", line 300, in _compile_repl
return sre_parse.parse_template(repl, pattern)
File "D:\Anaconda\envs\mmaction2-master\lib\sre_parse.py", line 1018, in parse_template
raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \U at position 2
When I ran train.py, I encountered this problem. I tried changing re to regex and then got this instead:
regex._regex_core.error: incomplete escape \U at position 5
Can anyone help me? Thanks!
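The traceback ends in `re.sub(regexp, value, config_file)` with "bad escape \U at position 2". That is what Python's `re` module reports when the replacement string itself contains a backslash sequence such as `\U`, as in a Windows path like `C:\Users\...`. A minimal reproduction and one way around it are sketched below. The `{{ fileDirname }}` placeholder and the example path are illustrative assumptions, and `substitute_literal` is a hypothetical helper, not an mmcv function.

```python
import re


def substitute_literal(pattern, value, text):
    """Replace matches of `pattern` with `value`, taken as literal text.

    A function replacement is inserted as-is, so backslashes in `value`
    are not interpreted as escape sequences.
    """
    return re.sub(pattern, lambda match: value, text)


pattern = r'\{\{\s*fileDirname\s*\}\}'
template = "work_dir = '{{ fileDirname }}/out'"
windows_dir = r'C:\Users\me\configs'

try:
    re.sub(pattern, windows_dir, template)
except re.error as exc:
    print('plain re.sub fails:', exc)

print(substitute_literal(pattern, windows_dir, template))
```

Converting the path to forward slashes before substitution would avoid the error as well.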
Hi, first of all, thanks for providing this awesome tool.
I trained on my own dataset, and it works in the webcam demo with TSN.
I tried to run the webcam demo with r2plus1d, but it failed.
Here are the error messages:
Traceback (most recent call last):
File "demo/webcam_demo.py", line 161, in <module>
main()
File "demo/webcam_demo.py", line 157, in main
predict_webcam_video()
File "demo/webcam_demo.py", line 83, in predict_webcam_video
cur_data = test_pipeline(cur_data)
File "/home/ubuntu/Desktop/YHWang/mmaction2/mmaction/datasets/pipelines/compose.py", line 41, in __call__
data = t(data)
File "/home/ubuntu/Desktop/YHWang/mmaction2/mmaction/datasets/pipelines/formating.py", line 248, in __call__
num_clips = results['num_clips']
KeyError: 'num_clips'
The only thing I modified in the config (r2plus1d_r34_video_inference_8x8x1_180e_kinetics400_rgb.py) is num_classes, which I changed from 400 to 12 (the number of classes in my dataset).
After a little testing, I found that it fails to get clip_len and num_clips in the test_pipeline dict.
I tried commenting out some code in formating.py:
"
if self.input_format == 'NCTHW':
    #num_clips = results['num_clips']
    #clip_len = results['clip_len']
    imgs = imgs.reshape((-1, num_clips, clip_len) + imgs.shape[1:])
"
and I set num_clips and clip_len to fixed numbers; then it runs.
But the predicted label doesn't change over time, so the result may be wrong.
Sorry for my poor English.
Could you give me some ideas? Thanks for your help!
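The reshape quoted above groups the stacked frames as (crops, num_clips, clip_len, ...), so hard-coded values only make sense when num_clips * clip_len divides the number of frames actually buffered. The sketch below computes the resulting shape without any array library. `ncthw_view_shape` is a hypothetical helper, and the 8-frame, 224x224 buffer is an illustrative assumption.

```python
def ncthw_view_shape(stacked_shape, num_clips, clip_len):
    """Shape produced by reshape((-1, num_clips, clip_len) + shape[1:])."""
    total = stacked_shape[0]
    per_crop = num_clips * clip_len
    if total % per_crop != 0:
        raise ValueError(
            '%d frames cannot be split into %d clips of %d frames'
            % (total, num_clips, clip_len))
    return (total // per_crop, num_clips, clip_len) + tuple(stacked_shape[1:])


# 8 buffered frames of 224x224x3, treated as one clip of 8 frames.
print(ncthw_view_shape((8, 224, 224, 3), num_clips=1, clip_len=8))
```

If the chosen numbers do not match the buffer, the reshape either fails or regroups the frames in a way the model was not trained on, which could be one reason for predictions that never change. That is a guess, not a diagnosis.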
Describe the issue
When I tested tsm_r50_1x1x16_50e_sthv2_rgb with this checkpoint, the result was lower than the reported accuracy (57.68/83.65).
I used sthv2 dataset in original webm video format.
Reproduction
bash tools/dist_test.sh configs/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb.py work_dirs/tsm_r50_1x1x16_50e_sthv2_rgb_20200621-60ff441a.pth 8 --eval top_k_accuracy mean_class_accuracy
configs/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb.py
To use the original Something-Something V2 video dataset, I just created the sthv2_{train, val}_list_videos.txt files.
I also modified the config file to use this video format.
# model settings
model = dict(
type='Recognizer2D',
backbone=dict(
type='ResNetTSM',
pretrained='torchvision://resnet50',
depth=50,
norm_eval=False,
shift_div=8),
cls_head=dict(
type='TSMHead',
num_classes=339,
in_channels=2048,
spatial_type='avg',
consensus=dict(type='AvgConsensus', dim=1),
dropout_ratio=0.5,
init_std=0.001,
is_shift=True))
# model training and testing settings
train_cfg = None
test_cfg = dict(average_clips=None)
# dataset settings
# dataset_type = 'RawframeDataset'
# data_root = 'data/sthv2/rawframes'
# data_root_val = 'data/sthv2/rawframes'
# ann_file_train = 'data/sthv2/sthv2_train_list_rawframes.txt'
# ann_file_val = 'data/sthv2/sthv2_val_list_rawframes.txt'
# ann_file_test = 'data/sthv2/sthv2_val_list_rawframes.txt'
dataset_type = 'VideoDataset'
data_root = 'data/sthv2/videos'
data_root_val = 'data/sthv2/videos'
ann_file_train = 'data/sthv2/sthv2_train_list_videos.txt'
ann_file_val = 'data/sthv2/sthv2_val_list_videos.txt'
ann_file_test = 'data/sthv2/sthv2_val_list_videos.txt'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
dict(type='DecordInit'),
dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=16),
# dict(type='RawFrameDecode'),
dict(type='DecordDecode'),
dict(type='Resize', scale=(-1, 256)),
dict(
type='MultiScaleCrop',
input_size=224,
scales=(1, 0.875, 0.75, 0.66),
random_crop=False,
max_wh_scale_gap=1,
num_fixed_crops=13),
dict(type='Resize', scale=(224, 224), keep_ratio=False),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
dict(type='DecordInit'),
dict(
type='SampleFrames',
clip_len=1,
frame_interval=1,
num_clips=16,
test_mode=True),
# dict(type='RawFrameDecode'),
dict(type='DecordDecode'),
dict(type='Resize', scale=(-1, 256)),
dict(type='CenterCrop', crop_size=224),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
dict(type='DecordInit'),
dict(
type='SampleFrames',
clip_len=1,
frame_interval=1,
num_clips=16,
test_mode=True),
# dict(type='RawFrameDecode'),
dict(type='DecordDecode'),
dict(type='Resize', scale=(-1, 256)),
dict(type='CenterCrop', crop_size=224),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs'])
]
data = dict(
videos_per_gpu=6,
workers_per_gpu=4,
train=dict(
type=dataset_type,
ann_file=ann_file_train,
data_prefix=data_root,
pipeline=train_pipeline),
val=dict(
type=dataset_type,
ann_file=ann_file_val,
data_prefix=data_root_val,
pipeline=val_pipeline),
test=dict(
type=dataset_type,
ann_file=ann_file_test,
data_prefix=data_root_val,
pipeline=test_pipeline))
# optimizer
optimizer = dict(
type='SGD',
constructor='TSMOptimizerConstructor',
paramwise_cfg=dict(fc_lr5=True),
lr=0.0075, # this lr is used for 8 gpus
momentum=0.9,
weight_decay=0.0005)
optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[20, 40])
total_epochs = 50
checkpoint_config = dict(interval=1)
evaluation = dict(
interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5))
log_config = dict(
interval=20,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook'),
])
# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/tsm_r50_1x1x16_50e_sthv2_rgb/'
load_from = None
resume_from = None
workflow = [('train', 1)]
--> Something-Something-V2
Environment
sys.platform: linux
Python: 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: TITAN Xp
CUDA_HOME: /usr/local/cuda
NVCC:
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.2
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.5
- Magma 2.5.2
- Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
TorchVision: 0.7.0
OpenCV: 4.4.0
MMCV: 1.1.2
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.2
MMAction2: 0.6.0+7dc58b3
Results
Evaluating top_k_accuracy...
top1_acc 0.4162
top5_acc 0.7047
Evaluating mean_class_accuracy...
mean_acc 0.3648
top1_acc: 0.4162
top5_acc: 0.7047
mean_class_accuracy: 0.3648
python tools/train.py configs/recognition/slowfast/slowfast_r50_4x8x1_256e_jester_rgb.py --validate
error info:
Traceback (most recent call last):
File "/export/mmaction2/tools/train.py", line 146, in <module>
main()
File "/export/mmaction2/tools/train.py", line 142, in main
meta=meta)
File "/export/mmaction2/mmaction/apis/train.py", line 111, in train_model
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/home/zhanglu/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 103, in run
self.call_hook('before_run')
File "/home/zhanglu/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 298, in call_hook
getattr(hook, fn_name)(self)
File "/home/zhanglu/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/hooks/lr_updater.py", line 114, in before_run
epoch_len = len(runner.data_loader)
AttributeError: 'EpochBasedRunner' object has no attribute 'data_loader'
The same error will occur with the csn model.
Hi,
Thanks for the great repo!
I know that BMN supports predicting the start and end time points of an action in a video. Does it also support classifying the video snippet between the start and end points? If not, how can the classification be done in an end-to-end way? Any suggestions?
Will this feature be added to the repo in the near future?
Thanks in advance!
When following the demo.py example in the documentation, I got an error like this:
TypeError: init_recognizer() got an unexpected keyword argument 'use_frames'
Has anything changed in the recognizer?
Describe the bug
When I extract RGB frames using tools/data/sthv2/extract_rgb_frames_opencv.sh, an OpenCV resize error occurs.
Judging from the error traceback, it may be caused by mmcv.
Reproduction
sh extract_rgb_frames_opencv.sh
in tools/data/sthv2
Did you make any modifications on the code or config? Did you understand what you have modified?
--> No
What dataset did you use?
--> sthv2
Environment
sys.platform: linux
Python: 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: TITAN Xp
CUDA_HOME: /usr/local/cuda
NVCC:
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.2
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.5
- Magma 2.5.2
- Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
TorchVision: 0.7.0
OpenCV: 4.4.0
MMCV: 1.1.2
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.2
MMAction2: 0.6.0+7dc58b3
Error traceback
Traceback (most recent call last):
File "build_rawframes.py", line 226, in <module>
len(vid_list) * [args.task]))
File "/home/lsrock1/anaconda3/envs/pytorch1.6/lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/lsrock1/anaconda3/envs/pytorch1.6/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
cv2.error: OpenCV(4.4.0) /tmp/pip-req-build-nzyrw1vf/opencv/modules/imgproc/src/resize.cpp:3932: error: (-215:Assertion failed) inv_scale_x > 0 in function 'resize'
Genearte raw frames (RGB only)
Hi mmaction2, I trained a model for UCF101 using the config file configs/recognition/tsn/tsn_r50_1x1x3_80e_ucf101_rgb.py
Now I want to test it, using the following command:
$ CUDA_VISIBLE_DEVICES=1 python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_80e_ucf101_rgb.py work_dirs/tsn_r50_1x1x3_80e_ucf101_rgb/latest.pth --eval top_k_accuracy mean_class_accuracy --out result.json
Testing itself works fine, but an error happens when the results are saved to the JSON file:
$ CUDA_VISIBLE_DEVICES=1 python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_80e_ucf101_rgb.py work_dirs/tsn_r50_1x1x3_80e_ucf101_rgb/latest.pth --eval top_k_accuracy mean_class_accuracy --out result.json
2020-08-12 14:40:58,082 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4/4, 1.6 task/s, elapsed: 3s, ETA: 0s
writing results to result.json
Traceback (most recent call last):
File "tools/test.py", line 139, in <module>
main()
File "tools/test.py", line 131, in main
dataset.dump_results(outputs, **output_config)
File "/home/zj/zhonglian/mmaction2/mmaction/datasets/base.py", line 86, in dump_results
return mmcv.dump(results, out)
File "/home/zj/zhonglian/mmcv/mmcv/fileio/io.py", line 80, in dump
handler.dump_to_path(obj, file, **kwargs)
File "/home/zj/zhonglian/mmcv/mmcv/fileio/handlers/base.py", line 25, in dump_to_path
self.dump_to_fileobj(obj, f, **kwargs)
File "/home/zj/zhonglian/mmcv/mmcv/fileio/handlers/json_handler.py", line 13, in dump_to_fileobj
json.dump(obj, file, **kwargs)
File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/json/__init__.py", line 179, in dump
for chunk in iterable:
File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/json/encoder.py", line 429, in _iterencode
yield from _iterencode_list(o, _current_indent_level)
File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/json/encoder.py", line 325, in _iterencode_list
yield from chunks
File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/json/encoder.py", line 438, in _iterencode
o = _default(o)
File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ndarray is not JSON serializable
In tools/test.py, the relevant code is:
if rank == 0:
    if output_config:
        out = output_config['out']
        print(f'\nwriting results to {out}')
        dataset.dump_results(outputs, **output_config)
I printed output_config and outputs; the values are:
{'out': 'result.json'}
[array([ 1.9529229 , 2.923142 , 0.26469332, -0.12839134, -2.1167572 ,
-0.7340729 , -1.3261667 , 0.4541236 , 0.94828093, 1.155941 ,
0.23628543, -0.78831387, 2.2801087 , 1.0793906 , 0.31419927,
0.30997226, 0.5425246 , -0.70942116, -1.1134925 , 2.236816 ,
3.9390984 , -2.1505275 , -1.085769 , -2.8008654 , -1.3788043 ,
-0.35550973, 0.6128084 , 0.97523236, -1.4105709 , -1.2038826 ,
-2.1797624 , -1.4052689 , -0.67973197, 1.7024329 , -0.7162529 ,
-1.2531643 , -1.405829 , -1.7755532 , -0.9127121 , -0.52495575,
0.5702051 , -0.54499656, 0.9248879 , -1.0198474 , 1.8331637 ,
-0.5963148 , -1.2978854 , 1.1907437 , -1.4260625 , 0.20374985,
1.7188393 , 0.9811421 , -1.6228783 , 0.58338284, -0.7557665 ,
-1.0928499 , -0.7617161 , -0.65688896, 4.0263968 , -0.09345046,
0.07987386, 0.73330057, 13.416785 , -0.31503808, 3.6180706 ,
1.4577851 , 1.4350643 , 0.21168658, -0.19559935, -1.103691 ,
0.7532946 , 1.5955294 , -1.1590674 , -1.2700799 , -0.32934734,
-0.52962774, -0.747167 , 0.18337195, -0.2666077 , 0.717041 ,
-0.6293016 , -0.6326269 , -0.17059498, -2.4983056 , 0.0488462 ,
-1.161425 , 0.13799725, -1.8053738 , -1.6930958 , 1.2327036 ,
-1.2348598 , -0.18195666, 1.3208578 , -3.0858784 , 1.1431783 ,
-0.9411551 , -0.7087368 , -1.15071 , -3.0066304 , -1.8325434 ,
2.1851883 ], dtype=float32), array([-3.1554081 , -1.8086557 , -1.2189873 , -1.3863541 , -3.4624038 ,
-3.7378008 , 7.6724052 , 5.97018 , -4.2795534 , -2.5561104 ,
2.2037597 , 1.2032832 , -5.6521015 , -3.200562 , 0.06564808,
-3.1106699 , -0.22693926, -4.557994 , -0.9784015 , -2.8301358 ,
-0.26256648, -1.9581242 , -0.75423837, 3.251859 , -3.7698638 ,
2.8235092 , -2.9476943 , 0.75258267, 10.651768 , 1.6277269 ,
-0.08898169, -1.1676219 , 4.3143296 , -4.4079895 , -4.0753226 ,
2.1783433 , -4.154809 , -1.7371117 , -2.4756253 , 6.97458 ,
-1.4465613 , 3.5330255 , -1.9635652 , -1.0765982 , 3.4709496 ,
-0.44178772, -0.5041221 , 2.493868 , -0.25774002, -2.910048 ,
1.3306173 , 3.3166916 , -1.9219271 , -1.5394036 , -1.2261659 ,
-1.2541034 , -0.9439164 , -0.20131937, -2.7909422 , -1.7844346 ,
-0.31215718, -2.2882266 , -1.4200875 , -2.3059387 , -1.2107593 ,
-2.174218 , -3.193241 , 2.251296 , -2.9217339 , 2.1830683 ,
0.09082523, 0.70335275, -3.5495253 , -5.4326572 , -2.9788358 ,
0.7502857 , -2.0108578 , -3.704027 , 2.679557 , -0.8924122 ,
0.39617965, 2.2738085 , -3.2832923 , 7.1167126 , 3.3312867 ,
-0.20836425, -3.8255863 , -0.7380201 , 2.5008836 , 5.836446 ,
3.9049966 , 16.540073 , 9.489449 , 6.8317823 , -2.6105278 ,
0.0635196 , -0.18466364, 2.4365137 , -0.29589617, -0.49789888,
2.5412517 ], dtype=float32), array([-3.40932107e+00, -1.06936395e-02, -1.73499656e+00, -1.59915805e+00,
5.71720302e-02, -1.26235354e+00, 1.75313354e+00, 1.82909936e-01,
-2.73504066e+00, -8.32203209e-01, 1.33741820e+00, 1.22894943e+00,
-3.33747673e+00, -2.82331657e+00, -6.27151072e-01, -5.35833001e-01,
7.28152394e-02, -3.50825024e+00, 2.36635065e+00, 1.20436706e-01,
1.99636745e+00, 1.94954121e+00, 1.54881507e-01, 3.04111511e-01,
-2.20299864e+00, 4.68201256e+00, -3.32769918e+00, 1.58799827e+00,
2.00522804e+00, 4.28090960e-01, 1.21267533e+00, -3.45705330e-01,
2.38831758e+00, -2.96614265e+00, -1.35263073e+00, 1.28939712e+00,
-1.74022067e+00, -1.94155240e+00, -3.36226821e+00, 7.63379526e+00,
4.00016403e+00, 4.05345821e+00, -4.05784190e-01, 1.22065210e+00,
3.96605849e-01, -3.39757466e+00, 1.67164028e+00, 6.65977716e-01,
3.89114916e-01, -1.13685560e+00, 1.78429723e+00, 1.66959250e+00,
8.51574957e-01, -1.33695388e+00, -3.62328577e+00, -2.20936608e+00,
-4.98263955e-01, -1.52075148e+00, -1.68073058e+00, -3.47000551e+00,
-4.68902290e-03, 9.44112360e-01, -2.32742310e+00, -7.69852519e-01,
-2.74959385e-01, -1.03926265e+00, -1.83813047e+00, 3.34748793e+00,
-3.22042465e-01, -4.92838115e-01, 2.63888419e-01, 3.05683446e+00,
1.63758367e-01, -4.02872753e+00, -2.33594084e+00, 1.09016666e+01,
-2.16153765e+00, -2.93059349e+00, 3.17019510e+00, 1.59995222e+00,
-7.56023049e-01, 7.05853367e+00, -1.75534749e+00, -9.27645862e-02,
-7.87818313e-01, -1.31494510e+00, -5.49836457e-02, 7.27982521e-01,
-9.21023250e-01, 2.67443925e-01, 1.25793505e+00, 1.52883315e+00,
2.56475949e+00, 9.29922283e-01, -1.78127527e+00, -6.23938262e-01,
-6.67548358e-01, 1.15025485e+00, -2.27030230e+00, 2.42970988e-01,
-1.11846581e-01], dtype=float32), array([ 9.7881667e-02, 6.3227153e-01, -1.8561482e+00, -2.1571205e+00,
1.4059830e+01, 4.8399657e-01, -1.8275721e+00, -2.1536226e+00,
2.0527697e+00, -2.3162837e+00, -3.0728564e+00, 4.5147705e-01,
-2.2566085e+00, 9.0172809e-01, 9.2773736e-01, 3.4005036e+00,
-2.4779036e+00, -1.9556541e+00, -4.0643939e-01, -1.2113328e+00,
1.0615828e+00, 1.8980796e+00, 8.0910289e-01, -3.4260190e+00,
1.6985834e-02, 1.8681365e+00, -1.6745995e+00, 3.1297741e+00,
4.9533206e-01, 7.7088308e+00, -9.4858694e-01, 1.6952250e+00,
-3.3255212e+00, -9.6397811e-01, 2.0618695e-01, 3.0011529e-01,
1.3867394e+00, 2.7509351e+00, -1.8679692e+00, 1.8175439e+00,
-1.7074220e+00, -3.3053722e+00, 4.2096773e-01, 3.0590990e+00,
-3.0134280e+00, -4.1446114e+00, 1.4162828e+00, -1.3907127e+00,
-2.8771629e+00, 9.5357203e-01, 1.0698979e+00, -3.5089359e+00,
-4.6066377e-01, -2.0315270e+00, -2.4641752e+00, -1.7112375e+00,
7.7639780e+00, -7.3515660e-01, -1.6210897e+00, -1.6490629e+00,
-1.4550496e+00, 8.2967222e-01, -2.4997182e+00, -3.0694556e-01,
7.3129952e-01, -7.7849364e-01, -8.0653977e-01, -2.7814975e-01,
6.9563894e+00, -2.2368103e-02, 1.2655897e+00, 1.0192424e-02,
1.6345310e+00, -2.7512756e-01, -1.4516522e+00, -1.3889271e-01,
-7.7020127e-01, -1.5020751e+00, 1.9333646e+00, -4.9428000e+00,
-1.9338553e+00, -2.5300448e+00, 4.6418971e-01, -5.1236825e+00,
2.4116956e-01, 7.5193768e+00, 5.8947573e+00, -5.9647286e-01,
-3.0245688e+00, 1.1701695e+00, -2.1766311e-01, -2.4784267e+00,
-2.7892220e+00, 1.5604091e-01, 2.2785933e+00, 7.8045473e+00,
-1.1207641e+00, -2.6828754e+00, 1.1542189e+00, 3.2799768e-01,
-9.7450703e-01], dtype=float32)]
I hope you can help me solve this problem
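The traceback above shows that json.dump fails because the results are a list of ndarray objects. One possible workaround, sketched here under assumptions, is to convert each array to a plain list before serializing. ArrayEncoder and dumps_results are hypothetical names for illustration and are not part of mmaction2 or mmcv; the sketch relies only on the fact that array-like objects such as numpy.ndarray expose a tolist() method.

```python
import json


class ArrayEncoder(json.JSONEncoder):
    """JSON encoder that falls back to .tolist() for array-like objects.

    Hypothetical helper: numpy arrays and numpy scalars both provide
    tolist(), so no numpy import is needed here.
    """

    def default(self, o):
        tolist = getattr(o, 'tolist', None)
        if callable(tolist):
            # Convert the array (or scalar) to plain Python lists/numbers.
            return tolist()
        # Anything else still raises TypeError, as in the stock encoder.
        return super().default(o)


def dumps_results(results):
    """Serialize a list of per-video score arrays to a JSON string."""
    return json.dumps(results, cls=ArrayEncoder)
```

With this sketch, something like dumps_results(outputs) would produce a JSON list of score lists. Whether the conversion belongs in the dataset's dump_results or in the calling script is a design choice the issue leaves open.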
Describe the feature
You should just add one line in tsm_head.py to support multi_class.
Can I use mmaction2 to classify videos? I've only just started using this framework and I'm not very familiar with it. Could you tell me which model in the model zoo would be good at video classification?
Thanks for your awesome codebase.
I'm trying to train TSM with temporal_pool=True (adding temporal_pool=True in both TSMHead and ResNetTSM) but get some errors.
After some debugging, I think ResNetTSM forgets to do the actual temporal pooling between layer1 and layer2.
That is, the feature map shape before layer2 should be N * num_segments/2, C, H, W instead of N * num_segments, C, H, W.
In the original TSM codebase, when temporal_pool=True, there is a max_pool3d that does the actual temporal pooling before layer2, which is missing in mmaction2.
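To illustrate why the temporal dimension should halve, here is a minimal pure-Python sketch of max pooling along the time axis. It is not mmaction2 or TSM code: temporal_max_pool is a hypothetical helper, the list stands in for the num_segments axis of one video, and the kernel size 3, stride 2, and padding 1 are assumptions chosen for illustration rather than values confirmed by the issue.

```python
def temporal_max_pool(frames, kernel=3, stride=2, padding=1):
    """Max-pool a list of per-segment values along the time axis.

    Positions outside the clip are skipped, which is equivalent to
    padding with negative infinity.
    """
    num_segments = len(frames)
    # Standard pooling output-length formula.
    out_len = (num_segments + 2 * padding - kernel) // stride + 1
    pooled = []
    for t in range(out_len):
        start = t * stride - padding
        window = [
            frames[i]
            for i in range(start, start + kernel)
            if 0 <= i < num_segments
        ]
        pooled.append(max(window))
    return pooled


# 8 segments in, 4 segments out: the batch dimension N * num_segments
# would become N * num_segments / 2 after such a pooling step.
example = temporal_max_pool([3, 1, 4, 1, 5, 9, 2, 6])
```

Under these assumed settings an even number of segments is always halved, which matches the shape N * num_segments/2, C, H, W expected before layer2.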
I found the following suggestion when trying to use the SlowFast configs: "The gpus indicates the number of gpu we used to get the checkpoint. It is noteworthy that the configs we provide are used for 8 gpus as default. According to the Linear Scaling Rule, you may set the learning rate proportional to the batch size if you use different GPUs or videos per GPU, e.g., lr=0.01 for 4 GPUs * 2 video/gpu and lr=0.08 for 16 GPUs * 4 video/gpu."
However, the lr and videos_per_gpu in these config files differ from those in the README. For example, in https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py, lr is 0.1 and videos_per_gpu is 8.
So, which setting is the correct one to reproduce the performance mentioned in the README?
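The Linear Scaling Rule quoted above can be written as a small calculation. This is only a sketch of the arithmetic: scale_lr is a hypothetical helper, not an mmaction2 function, and it uses the README's own example (lr=0.01 for 4 GPUs * 2 video/gpu) as the reference point.

```python
def scale_lr(ref_lr, ref_gpus, ref_videos_per_gpu, gpus, videos_per_gpu):
    """Scale a learning rate proportionally to the total batch size."""
    ref_batch = ref_gpus * ref_videos_per_gpu
    batch = gpus * videos_per_gpu
    return ref_lr * batch / ref_batch


# README example: 4 GPUs * 2 video/gpu at lr=0.01, scaled to 16 GPUs * 4 video/gpu.
lr_16x4 = scale_lr(0.01, 4, 2, 16, 4)

# Same reference, scaled to 8 GPUs * 8 video/gpu (assuming the default of 8 GPUs).
lr_8x8 = scale_lr(0.01, 4, 2, 8, 8)
```

With that reference point, 16 GPUs * 4 video/gpu gives approximately 0.08, matching the quote, and 8 GPUs * 8 video/gpu also gives approximately 0.08, which is close to but not the same as the 0.1 in the config file mentioned above.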
We keep this issue open to collect feature requests from users and hear your voice. Our monthly release plan is also available here.
You can either:
The training command in the BSN README
python tools/train.py configs/localization/bsn/bsn_400x100_1x16_20e_activitynet_feature.py
cannot run (the filename has not been updated).
In mmaction\models\recognizers\recognizer3d.py, def forward_test(self, imgs):
Neither the loss nor the accuracy is calculated. So why would I use it? I recommend printing the accuracy after the evaluation.
In the BMN Model Zoo there are results for features extracted by MMAction, but I found no details in Data Preparation about how to extract the features using TSN.
After referring to the BMN paper and some issues, I am still confused about the details.
Assume the video has 16,000 frames.
Divide all frames into 1000 continuous, non-overlapping snippets of 16 frames each. Decode the video to raw frames and calculate optical flow.
Select the 8th RGB frame and the 6th to 10th optical flow frames in each snippet to represent that snippet.
For one snippet:
RGB: initialize the TSN network with the corresponding ActivityNet RGB config and ckpt from the TSN Model Zoo. Input one RGB frame (the 8th), simply resized to 224x224 without any crop; the cls_score returned by tsn_head will then be a tensor with shape [1, 200].
Flow: initialize the TSN network with the flow config and ckpt and input the five optical flow frames; the consensus module will "average" them, so cls_score will also be a tensor with shape [1, 200].
Concatenate the two tensors above to get the feature of this snippet.
Over all snippets this gives [1000, 400], which is then rescaled to [100, 400] using this script.
Are the steps above correct? Or could you add your feature extraction script to this repo?
Thank you!
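The frame selection described in the steps above can be sketched as follows. This only restates the question's own numbers (16-frame snippets, the 8th RGB frame, the 6th to 10th flow frames); snippet_frame_indices is a hypothetical helper, frame numbers are assumed to be 1-based, and nothing here confirms that this is how the published features were actually extracted.

```python
def snippet_frame_indices(total_frames, snippet_len=16, rgb_pos=8,
                          flow_pos=(6, 7, 8, 9, 10)):
    """Return one (rgb_frame, flow_frames) pair per snippet.

    Frame numbers are 1-based and relative to the whole video.
    Trailing frames that do not fill a whole snippet are dropped.
    """
    num_snippets = total_frames // snippet_len
    plan = []
    for s in range(num_snippets):
        offset = s * snippet_len
        rgb_frame = offset + rgb_pos
        flow_frames = [offset + p for p in flow_pos]
        plan.append((rgb_frame, flow_frames))
    return plan


# A 16,000-frame video yields 1000 snippets.
plan = snippet_frame_indices(16000)
```

Each entry of plan would correspond to one row of the [1000, 400] feature matrix once the RGB and flow scores for that snippet are concatenated.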
Currently, some typically used resolutions for action recognition include:
Different resolutions may or may not influence the accuracy, so it would be good to mark the resolution of the training data.
Edit:
Forgot about the video format.
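Hello. Thank you very much for your implementation.
Do you plan to support independent action recognition inference for each person appearing in a video? SlowFast's implementation, for example, can also train and run inference on the AVA dataset.
We are currently working on action recognition for surveillance cameras, where more than one person appears in the footage, so we hope per-person inference can be supported.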
Hi mmaction2, I have run into this problem several times when training on the UCF101 dataset with the default config file:
$ CUDA_VISIBLE_DEVICES=1 python tools/train.py configs/recognition/tsn/tsn_r50_1x1x3_80e_ucf101_rgb.py
Following #101, I added start_index=0 to the data dict, but there was still a problem:
2020-08-12 09:20:54,393 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}
Traceback (most recent call last):
File "tools/train.py", line 146, in <module>
main()
File "tools/train.py", line 125, in main
datasets = [build_dataset(cfg.data.train)]
File "/home/zj/zhonglian/mmaction2/mmaction/datasets/builder.py", line 39, in build_dataset
dataset = build_from_cfg(cfg, DATASETS, default_args)
File "/home/zj/zhonglian/mmcv/mmcv/utils/registry.py", line 167, in build_from_cfg
return obj_cls(**args)
File "/home/zj/zhonglian/mmaction2/mmaction/datasets/rawframe_dataset.py", line 93, in __init__
multi_class, num_classes, start_index, modality)
File "/home/zj/zhonglian/mmaction2/mmaction/datasets/base.py", line 63, in __init__
self.video_infos = self.load_annotations()
File "/home/zj/zhonglian/mmaction2/mmaction/datasets/rawframe_dataset.py", line 98, in load_annotations
with open(self.ann_file, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'data/ucf101/ucf101_train_split_{1,2,3}_rawframes.txt'
After I modified this file path in the config file:
# ann_file_train = 'data/ucf101/ucf101_train_split_{1,2,3}_rawframes.txt'
ann_file_train = 'data/ucf101/ucf101_train_split_1_rawframes.txt'
everything is fine. I wonder whether the path needs to be changed every time, for example to split 2 to continue training after split 1, because the config has a single fixed work directory:
work_dir = './work_dirs/tsn_r50_1x1x3_80e_ucf101_rgb/'
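One way to avoid editing the path by hand for each split is to derive both settings from a single split number. The following is a minimal sketch, not part of the shipped config: ucf101_split_settings is a hypothetical helper, and the per-split work_dir suffix is an assumption added so that runs on different splits do not overwrite each other.

```python
def ucf101_split_settings(split):
    """Build the annotation path and work directory for one UCF101 split."""
    if split not in (1, 2, 3):
        raise ValueError(f'split must be 1, 2 or 3, got {split!r}')
    return dict(
        # Same file naming as in the config, with the split filled in.
        ann_file_train=f'data/ucf101/ucf101_train_split_{split}_rawframes.txt',
        # Hypothetical per-split directory name (not the shipped default).
        work_dir=f'./work_dirs/tsn_r50_1x1x3_80e_ucf101_rgb_split{split}/',
    )


settings = ucf101_split_settings(1)
```

The returned values could then be copied into the config, or into one config file per split, so that each split keeps its own checkpoints and logs.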
Hi,
For experiments using R(2+1)D and I3D backbone
(https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/r2plus1d/README.md),
(https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/i3d/README.md),
do you have experimental results on UCF-101 and HMDB-51? If so, would you mind sharing them with me, along with more information about the model initialization (random init or ImageNet pre-trained)?
Thanks!