zhaoyue-zephyrus / avion
Code release for "Training a Large Video Model on a Single Machine in a Day"
Home Page: http://arxiv.org/abs/2309.16669
License: MIT License
I want to run the script main_lavila_finetune_cls.py, but I get an error everywhere old_args.model is used.
It looks like the model name is not defined.
How can I solve this issue?
Dear Yue,
Thank you very much for your great codebase. I am trying to finetune the classification model on EK100. The training seems to go smoothly except for one weird message from FFMPEG. I wonder whether it would affect the model's performance. Have you seen this message as well?
I am looking forward to your reply!
Hi, thanks for sharing the nice work!
I'm planning to run the 'AVION/scripts/main_videomae_pretrain.py' and it needs the '--train-metadata' argument. I checked the rest of the code and it seems it expects something different than the 'EPIC_100_train.csv' file in the original EK100 annotation repo here.
I can guess the data format and create the expected file for the original videos, but I'm wondering what it should look like for your resized and chunked dataset version.
Could you please help me with this?
When I tried evaluation only, I got an error:
Traceback (most recent call last):
File "/media/f/AVION/scripts/main_lavila_finetune_cls.py", line 622, in <module>
main(args)
File "/media/f/AVION/scripts/main_lavila_finetune_cls.py", line 145, in main
print("=> creating model: {}".format(old_args.model))
AttributeError: 'Namespace' object has no attribute 'model'
My run script is as follows:
EXP_PATH=.
export PYTHONPATH=.:third_party/decord/python/
python scripts/main_lavila_finetune_cls.py \
--root /root/h/DataSet/Kitchen/avion_dataset/video_320p_15sec/ \
--train-metadata /root/h/DataSet/Kitchen/avion_dataset/epic-kitchens-100-annotations/EPIC_100_train.csv \
--val-metadata /root/h/DataSet/Kitchen/avion_dataset/epic-kitchens-100-annotations/EPIC_100_validation.csv \
--video-chunk-length 15 \
--use-flash-attn \
--grad-checkpointing \
--use-fast-conv1 \
--batch-size 64 \
--fused-decode-crop \
--use-multi-epochs-loader \
--pretrain-model /root/linux/AVION/pretrainmodels/avion_finetune_cls_lavila_vitb_best.pt \
--output-dir $EXP_PATH 2>&1 | tee $EXP_PATH/log.txt
Based on the error, I printed all the attributes of old_args:
old_args = ckpt['args']
print("ckpt\n", ckpt.keys())
print("old args \n", pprint.pformat(vars(old_args)))
print("=> creating model: {}".format(old_args.model))
The output is as follows:
/root/miniconda3/envs/avion/lib/python3.10/site-packages/torchvision/transforms/_functional_video.py:6: UserWarning: The 'torchvision.transforms._functional_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms.functional' module instead.
warnings.warn(
/root/miniconda3/envs/avion/lib/python3.10/site-packages/torchvision/transforms/_transforms_video.py:22: UserWarning: The 'torchvision.transforms._transforms_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms' module instead.
warnings.warn(
Not using distributed mode
ckpt
dict_keys(['epoch', 'state_dict', 'optimizer', 'scaler', 'best_acc1', 'args'])
old args
{'actions': verb noun
0 0 0
1 0 1
2 0 10
3 0 100
4 0 101
... ... ...
3801 9 93
3802 9 94
3803 9 95
3804 9 98
3805 9 99
[3806 rows x 2 columns],
'batch_size': 64,
'betas': (0.9, 0.999),
'clip_length': 16,
'clip_stride': 2,
'cutmix': 1.0,
'cutmix_minmax': None,
'dataset': 'ek100_cls',
'decode_threads': 1,
'disable_amp': False,
'dist_backend': 'nccl',
'dist_url': 'env://',
'distributed': True,
'drop_path_rate': 0.1,
'dropout_rate': 0.5,
'epochs': 100,
'eps': 1e-08,
'eval_freq': 5,
'evaluate': False,
'fused_decode_crop': True,
'gpu': 0,
'grad_clip_norm': None,
'local_rank': 0,
'lr': 0.012,
'lr_end': 4e-05,
'lr_start': 4e-06,
'mapping_act2n': {0: 0,
1: 1,
2: 10,
3: 100,
4: 101,
5: 102,
6: 103,
7: 104,
8: 105,
9: 106,
10: 107,
11: 108,
12: 109,
13: 11,
14: 110,
......,
3797: 9,
3798: 9,
3799: 9,
3800: 9,
3801: 9,
3802: 9,
3803: 9,
3804: 9,
3805: 9},
'mixup': 0.8,
'mixup_mode': 'batch',
'mixup_prob': 1.0,
'mixup_switch_prob': 0.5,
'norm_style': 'openai',
'num_classes': 3806,
'num_clips': 1,
'num_crops': 1,
'optimizer': 'sgd',
'output_dir': 'experiments/finetune_cls_lavila_vitb/',
'patch_dropout': 0.0,
'pickle_filename': '',
'pretrain_model': './experiments/pretrain_lavila_vitb/checkpoint_best.pt',
'print_freq': 10,
'rank': 0,
'resume': '',
'root': '/storage/Datasets/EPIC-KITCHENS-100/EK100_320p_15sec_30fps_libx264/',
'seed': 0,
'smoothing': 0.1,
'start_epoch': 0,
'train_metadata': 'datasets/EK100/epic-kitchens-100-annotations/EPIC_100_train.csv',
'update_freq': 1,
'use_fast_conv1': True,
'use_flash_attn': True,
'use_grad_checkpointing': True,
'use_multi_epochs_loader': True,
'use_zero': False,
'val_metadata': 'datasets/EK100/epic-kitchens-100-annotations/EPIC_100_validation.csv',
'video_chunk_length': 15,
'warmup_epochs': 2,
'wd': 4e-05,
'workers': 8,
'world_size': 8}
Traceback (most recent call last):
File "/media/f/AVION/scripts/main_lavila_finetune_cls.py", line 621, in <module>
main(args)
File "/media/f/AVION/scripts/main_lavila_finetune_cls.py", line 145, in main
print("=> creating model: {}".format(old_args.model))
AttributeError: 'Namespace' object has no attribute 'model'
So the args saved in this checkpoint indeed have no 'model' entry.
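One possible workaround, sketched here under assumptions rather than taken from the repo: read the attribute with a fallback instead of accessing old_args.model directly. The helper resolve_model_name and the placeholder "MY_MODEL_NAME" are hypothetical; the right model name for this checkpoint has to come from the repo's documentation or from the maintainer.

```python
import argparse


def resolve_model_name(old_args, fallback):
    """Return old_args.model if it exists, otherwise the caller-supplied fallback.

    Hypothetical helper: it only avoids the AttributeError; it does not know
    which architecture the checkpoint was actually trained with.
    """
    return getattr(old_args, "model", fallback)


# Stand-in for ckpt['args'] from a checkpoint saved without a 'model' entry.
old_args = argparse.Namespace(dataset="ek100_cls", num_classes=3806)

# "MY_MODEL_NAME" is a placeholder, not a real model identifier.
model_name = resolve_model_name(old_args, fallback="MY_MODEL_NAME")
print("=> creating model: {}".format(model_name))
```

This removes the crash at the print statement, but any later code that builds the model from old_args.model would need the same fallback.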
I noticed that RandAugment is only used in the Kinetics dataloader if fast_rrc is off. Does this mean that RandAugment was not used for pre-training or finetuning? I also noticed even if RandAugment is moved to the GPU along with other transforms, data loading speed is quite a bit slower. Have you seen this issue before?
Dear Yue,
Thanks for this amazing repo, which is well documented and easy to try! However, when I followed your guidance to fine-tune the action recognition model on EK-100, I got very bad results (Acc@1 0.021, Acc@5 0.089 for test, and "train_Acc@1": 2.62106393129771, "train_Acc@5": 9.759661259541986 even for training). Do you have any idea why?
To include more details for your reference, I used the resized, chunked version of EK-100 that you shared here. I used the pre-trained model you provided here. And the following is the command I used for fine-tuning.
PYTHONPATH=.:third_party/decord/python/ \
CUDA_VISIBLE_DEVICES="0,1" \
torchrun \
--nproc_per_node=2 scripts/main_lavila_finetune_cls.py \
--root datasets/EK100/EK100_320p_15sec_30fps_libx264/ \
--video-chunk-length 15 --use-flash-attn \
--grad-checkpointing \
--use-fast-conv1 \
--batch-size 256 \
--fused-decode-crop \
--use-multi-epochs-loader \
--pretrain-model experiments/pretrain_lavila_vitb/avion_pretrain_lavila_vitb_best.pt \
--output-dir experiments/EK100_test 2>&1 | tee experiments/EK100_test/train_log.txt
And I also attached the generated train_log.txt for your reference.
Thank you in advance for your help!
Hi, thank you for sharing this very helpful repo :)
I downloaded the resized/chunked version of the EK100 dataset from this link and tried to unzip it, but unzip EK100_320p_15sec_30fps_libx264.zip did not work for me. Below is the error message.
Archive: EK100_320p_15sec_30fps_libx264.zip
warning [EK100_320p_15sec_30fps_libx264.zip]: 35339199913 extra bytes at beginning or within zipfile
(attempting to process anyway)
error [EK100_320p_15sec_30fps_libx264.zip]: start of central directory not found;
zipfile corrupt.
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
It seems that the uploaded zip file is corrupted. Could you please take a look at this problem?
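Before concluding that the upload itself is broken, it may be worth checking the local copy, since an interrupted download can produce the same "start of central directory not found" message. A minimal sketch using Python's standard zipfile module follows; the function name check_zip is hypothetical, and the full CRC pass can take a long time on an archive of this size.

```python
import os
import zipfile


def check_zip(path):
    """Return (ok, message) describing whether `path` is a readable zip archive."""
    if not os.path.isfile(path):
        return False, "file not found"
    # is_zipfile looks for the end-of-central-directory record near the file end;
    # a truncated download usually fails here.
    if not zipfile.is_zipfile(path):
        return False, "central directory not found (download may be truncated)"
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip reads every member and returns the first one with a bad CRC.
            bad = zf.testzip()
    except zipfile.BadZipFile as exc:
        return False, "bad zip file: {}".format(exc)
    if bad is not None:
        return False, "corrupt member: {}".format(bad)
    return True, "ok"
```

Calling check_zip("EK100_320p_15sec_30fps_libx264.zip") and comparing the local file size against the size shown on the download page would help tell a truncated download apart from a corrupt upload.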
Thank you :)
Thanks for all this awesome work.
I see scripts and instructions to do the EK100 MIR task, but not the Action Recognition task. Could you please tell me if I'm missing something? I would like to be able to replicate the Action Recognition performance of 54.4% on the validation split.
Cheers!