avion's Issues

[mov,mp4,m4a,3gp,3g2,mj2 @ 0x84289600] Referenced QT chapter track not found

Dear Yue,

Thank you very much for your great codebase. I am trying to fine-tune the classification model on EK100. Training seems to go smoothly except for one weird message from FFmpeg. I wonder whether it could affect the model's performance. Have you seen this message as well?
[screenshot of the FFmpeg warning attached]

I am looking forward to your reply!
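
In case it is useful context: one way to check whether this message actually affects training is to decode a few frames from an affected chunk directly and confirm they come back intact. A minimal sketch, assuming decord (or the fork under third_party/decord) is importable and using a placeholder chunk path:

import decord

# placeholder path to one of the resized 15-second chunks that triggered the warning
vr = decord.VideoReader('datasets/EK100/EK100_320p_15sec_30fps_libx264/P01/P01_01/0.MP4')
print('decoded frames:', len(vr))
frame = vr[0].asnumpy()              # decode the first frame
print('frame shape:', frame.shape)   # if this works for the affected files, the warning is likely only metadata noise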

EK100 Train Metadata videoMAE

Hi, thanks for sharing the nice work!

I'm planning to run 'AVION/scripts/main_videomae_pretrain.py', which needs the '--train-metadata' argument. I checked the rest of the code and it seems to expect something different from the 'EPIC_100_train.csv' file in the original EK100 annotation repo here.
I can guess what the data format should be and could create the expected file for the original videos, but I'm wondering how it should look for your resized and chunked dataset version.
Could you please help me with this?
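
In case it helps the discussion, one way to compare the official annotation schema against whatever the loader expects is to inspect the CSV directly; a minimal sketch with pandas (the expected pretraining format is only guessed at here):

import pandas as pd

# the official annotation file from the epic-kitchens-100-annotations repo
df = pd.read_csv('epic-kitchens-100-annotations/EPIC_100_train.csv')
print(df.columns.tolist())   # e.g. video_id, start_timestamp, stop_timestamp, narration, verb_class, noun_class, ...
print(df.head())
# the file passed via --train-metadata could then be derived from these columns
# once the schema the videoMAE pretraining loader expects is confirmed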

Epic Kitchens evaluation: AttributeError: 'Namespace' object has no attribute 'model'

When I ran evaluation only, I got an error:

Traceback (most recent call last):
  File "/media/f/AVION/scripts/main_lavila_finetune_cls.py", line 622, in <module>
    main(args)
  File "/media/f/AVION/scripts/main_lavila_finetune_cls.py", line 145, in main
    print("=> creating model: {}".format(old_args.model))
AttributeError: 'Namespace' object has no attribute 'model'

My run script is as follows:

EXP_PATH=.

export PYTHONPATH=.:third_party/decord/python/

python scripts/main_lavila_finetune_cls.py \
  --root /root/h/DataSet/Kitchen/avion_dataset/video_320p_15sec/ \
  --train-metadata /root/h/DataSet/Kitchen/avion_dataset/epic-kitchens-100-annotations/EPIC_100_train.csv \
  --val-metadata /root/h/DataSet/Kitchen/avion_dataset/epic-kitchens-100-annotations/EPIC_100_validation.csv \
  --video-chunk-length 15 \
  --use-flash-attn \
  --grad-checkpointing \
  --use-fast-conv1 \
  --batch-size 64 \
  --fused-decode-crop \
  --use-multi-epochs-loader \
  --pretrain-model /root/linux/AVION/pretrainmodels/avion_finetune_cls_lavila_vitb_best.pt \
  --output-dir $EXP_PATH 2>&1 | tee $EXP_PATH/log.txt

To investigate the error, I printed all the arguments of old_args:

old_args = ckpt['args']
print("ckpt\n", ckpt.keys())
print("old args \n", pprint.pformat(vars(old_args)))
print("=> creating model: {}".format(old_args.model))

The output is as follows:

/root/miniconda3/envs/avion/lib/python3.10/site-packages/torchvision/transforms/_functional_video.py:6: UserWarning: The 'torchvision.transforms._functional_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms.functional' module instead.
  warnings.warn(
/root/miniconda3/envs/avion/lib/python3.10/site-packages/torchvision/transforms/_transforms_video.py:22: UserWarning: The 'torchvision.transforms._transforms_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms' module instead.
  warnings.warn(
Not using distributed mode
ckpt
 dict_keys(['epoch', 'state_dict', 'optimizer', 'scaler', 'best_acc1', 'args'])
old args 
 {'actions':       verb  noun
0        0     0
1        0     1
2        0    10
3        0   100
4        0   101
...    ...   ...
3801     9    93
3802     9    94
3803     9    95
3804     9    98
3805     9    99

[3806 rows x 2 columns],
 'batch_size': 64,
 'betas': (0.9, 0.999),
 'clip_length': 16,
 'clip_stride': 2,
 'cutmix': 1.0,
 'cutmix_minmax': None,
 'dataset': 'ek100_cls',
 'decode_threads': 1,
 'disable_amp': False,
 'dist_backend': 'nccl',
 'dist_url': 'env://',
 'distributed': True,
 'drop_path_rate': 0.1,
 'dropout_rate': 0.5,
 'epochs': 100,
 'eps': 1e-08,
 'eval_freq': 5,
 'evaluate': False,
 'fused_decode_crop': True,
 'gpu': 0,
 'grad_clip_norm': None,
 'local_rank': 0,
 'lr': 0.012,
 'lr_end': 4e-05,
 'lr_start': 4e-06,
 'mapping_act2n': {0: 0,
                   1: 1,
                   2: 10,
                   3: 100,
                   4: 101,
                   5: 102,
                   6: 103,
                   7: 104,
                   8: 105,
                   9: 106,
                   10: 107,
                   11: 108,
                   12: 109,
                   13: 11,
                   14: 110,
                   ......,
                   3797: 9,
                   3798: 9,
                   3799: 9,
                   3800: 9,
                   3801: 9,
                   3802: 9,
                   3803: 9,
                   3804: 9,
                   3805: 9},
 'mixup': 0.8,
 'mixup_mode': 'batch',
 'mixup_prob': 1.0,
 'mixup_switch_prob': 0.5,
 'norm_style': 'openai',
 'num_classes': 3806,
 'num_clips': 1,
 'num_crops': 1,
 'optimizer': 'sgd',
 'output_dir': 'experiments/finetune_cls_lavila_vitb/',
 'patch_dropout': 0.0,
 'pickle_filename': '',
 'pretrain_model': './experiments/pretrain_lavila_vitb/checkpoint_best.pt',
 'print_freq': 10,
 'rank': 0,
 'resume': '',
 'root': '/storage/Datasets/EPIC-KITCHENS-100/EK100_320p_15sec_30fps_libx264/',
 'seed': 0,
 'smoothing': 0.1,
 'start_epoch': 0,
 'train_metadata': 'datasets/EK100/epic-kitchens-100-annotations/EPIC_100_train.csv',
 'update_freq': 1,
 'use_fast_conv1': True,
 'use_flash_attn': True,
 'use_grad_checkpointing': True,
 'use_multi_epochs_loader': True,
 'use_zero': False,
 'val_metadata': 'datasets/EK100/epic-kitchens-100-annotations/EPIC_100_validation.csv',
 'video_chunk_length': 15,
 'warmup_epochs': 2,
 'wd': 4e-05,
 'workers': 8,
 'world_size': 8}
Traceback (most recent call last):
  File "/media/f/AVION/scripts/main_lavila_finetune_cls.py", line 621, in <module>
    main(args)
  File "/media/f/AVION/scripts/main_lavila_finetune_cls.py", line 145, in main
    print("=> creating model: {}".format(old_args.model))
AttributeError: 'Namespace' object has no attribute 'model'

It turns out that the checkpoint's saved args indeed lack the "model" attribute.
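
A possible workaround until the checkpoint is fixed is to fall back to an explicit architecture name when the saved args lack the attribute; a minimal sketch (the fallback string is a placeholder, not something read from the checkpoint):

# in main(), replace the direct old_args.model access with a guarded lookup;
# 'VIT_BASE' is a placeholder and must be set to the architecture this checkpoint was actually trained with
model_name = getattr(old_args, 'model', 'VIT_BASE')
print("=> creating model: {}".format(model_name))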

RandAugment usage

I noticed that RandAugment is only used in the Kinetics dataloader when fast_rrc is off. Does this mean that RandAugment was not used for pre-training or fine-tuning? I also noticed that even if RandAugment is moved to the GPU along with the other transforms, data loading is quite a bit slower. Have you seen this issue before?
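
To put a number on the slowdown, one option is to time the loader with and without the GPU-side RandAugment; a rough sketch with a generic PyTorch DataLoader (the loader objects are placeholders):

import time
import torch

def measure_throughput(loader, num_batches=50):
    # pull a fixed number of batches and report samples per second
    it = iter(loader)
    next(it)                              # warm-up batch (worker start-up, first GPU kernels)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.time()
    seen = 0
    for _ in range(num_batches):
        clips = next(it)[0]               # assumes batches of (video, label, ...)
        seen += clips.shape[0]
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return seen / (time.time() - start)

# usage sketch: print(measure_throughput(loader_with_randaug), measure_throughput(loader_without_randaug))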

Bad performance for EK-100 Action Recognition finetuning model

Dear Yue,

Thanks for this amazing repo, which is well documented and easy to try out! However, when I followed your guidance to fine-tune the action recognition model on EK-100, I got very bad results (Acc@1 0.021 / Acc@5 0.089 on the validation set, and train_Acc@1 2.62 / train_Acc@5 9.76 even on the training set). Do you have any idea what could be causing this?

To include more details for your reference: I used the resized, chunked version of EK-100 that you shared here, and the pre-trained model you provided here. The following is the command I used for fine-tuning.

PYTHONPATH=.:third_party/decord/python/ \
CUDA_VISIBLE_DEVICES="0,1" \
torchrun \
    --nproc_per_node=2 scripts/main_lavila_finetune_cls.py \
    --root datasets/EK100/EK100_320p_15sec_30fps_libx264/ \
    --video-chunk-length 15 --use-flash-attn \
    --grad-checkpointing \
    --use-fast-conv1 \
    --batch-size 256 \
    --fused-decode-crop \
    --use-multi-epochs-loader \
    --pretrain-model experiments/pretrain_lavila_vitb/avion_pretrain_lavila_vitb_best.pt \
    --output-dir experiments/EK100_test 2>&1 | tee experiments/EK100_test/train_log.txt

And I also attached the generated train_log.txt for your reference.

Thank you in advance for your help!
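
One sanity check that may be worth running while debugging this: confirm that the pretrained weights are actually being loaded, since near-chance accuracy often points to a backbone initialized from scratch. A minimal sketch (the key layout under 'state_dict' is an assumption):

import torch

ckpt = torch.load('experiments/pretrain_lavila_vitb/avion_pretrain_lavila_vitb_best.pt', map_location='cpu')
print(ckpt.keys())                          # expected to contain something like 'state_dict', 'args', ...
state_dict = ckpt.get('state_dict', ckpt)   # assumption: the weights live under 'state_dict'
print(len(state_dict), 'tensors')
print(list(state_dict.keys())[:5])          # spot-check a few names against the fine-tuning model's keys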

Resized, chunked version for EK100 seems corrupted

Hi, thank you for sharing this very helpful repo :)

I downloaded the resized/chunked version of the EK100 dataset from this link and tried to extract it, but running unzip EK100_320p_15sec_30fps_libx264.zip did not work for me. Below is the error message.

Archive:  EK100_320p_15sec_30fps_libx264.zip
warning [EK100_320p_15sec_30fps_libx264.zip]:  35339199913 extra bytes at beginning or within zipfile
  (attempting to process anyway)
error [EK100_320p_15sec_30fps_libx264.zip]:  start of central directory not found;
  zipfile corrupt.
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)

It seems that the uploaded zip file is corrupted. Could you please have a look at this problem?

Thank you :)
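
For what it is worth, this kind of "extra bytes / start of central directory not found" message often comes from an incomplete download or from an unzip build that cannot handle very large (zip64) archives, so it may be worth verifying the file before assuming the upload itself is broken. A minimal check with Python's zipfile module, which does support zip64:

import os
import zipfile

path = 'EK100_320p_15sec_30fps_libx264.zip'
print('size on disk:', os.path.getsize(path))            # compare with the size listed at the download link
print('valid zip signature:', zipfile.is_zipfile(path))  # False usually means a truncated download

# ZipFile raises BadZipFile if the central directory really is missing (re-download in that case)
with zipfile.ZipFile(path) as zf:
    print('entries:', len(zf.namelist()))
    print('first bad entry:', zf.testzip())               # slow on an archive this large, but checks every member's CRC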

How can I replicate your EK100 Action Recognition results?

Thanks for all this awesome work.

I see scripts and instructions to do the EK100 MIR task, but not the Action Recognition task. Could you please tell me if I'm missing something? I would like to be able to replicate the Action Recognition performance of 54.4% on the validation split.

Cheers!

Load the ckpts for LaViLa

Hi, thanks for releasing the pre-trained ckpts. I would like to know whether it is possible to load the checkpoint here into the model in LaViLa.
I checked the 'state_dict', but it seems that the two do not correspond. Could you please provide a simple script to implement this loading?
Thanks!
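
Until an official conversion script exists, a generic starting point is to load the AVION state_dict, inspect how its key names differ from LaViLa's, and remap the prefixes; a minimal sketch (the 'module.' stripping below is purely illustrative, and the real AVION-to-LaViLa mapping may differ):

import torch

ckpt = torch.load('avion_pretrain_lavila_vitb_best.pt', map_location='cpu')
avion_sd = ckpt['state_dict']
print(list(avion_sd.keys())[:10])            # inspect AVION's naming scheme

# illustrative remapping: strip a DDP wrapper prefix; further renames would follow the same pattern
remapped = {k.replace('module.', '', 1): v for k, v in avion_sd.items()}

# then, on the LaViLa side (model construction not shown here):
#   missing, unexpected = lavila_model.load_state_dict(remapped, strict=False)
#   print(missing, unexpected)               # remaining mismatches show which keys still need renaming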
