avion's Issues

[mov,mp4,m4a,3gp,3g2,mj2 @ 0x84289600] Referenced QT chapter track not found

Dear Yue,

Thank you very much for your great codebase. I am trying to fine-tune the classification model on EK100. Training seems to go smoothly except for one weird message from FFmpeg. I wonder whether it could affect the model's performance. Have you seen this message as well?
[screenshot of the FFmpeg warning attached]

I am looking forward to your reply!
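
In case it is useful context: one way to check whether this message actually affects training is to decode a few frames from an affected chunk directly and confirm they come back intact. A minimal sketch, assuming decord (or the fork under third_party/decord) is importable and using a placeholder chunk path:

import decord

# placeholder path to one of the resized 15-second chunks that triggered the warning
vr = decord.VideoReader('datasets/EK100/EK100_320p_15sec_30fps_libx264/P01/P01_01/0.MP4')
print('decoded frames:', len(vr))
frame = vr[0].asnumpy()              # decode the first frame
print('frame shape:', frame.shape)   # if this works for the affected files, the warning is likely only metadata noise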

EK100 Train Metadata videoMAE

Hi, thanks for sharing the nice work!

I'm planning to run 'AVION/scripts/main_videomae_pretrain.py', which needs the '--train-metadata' argument. I checked the rest of the code and it seems to expect something different from the 'EPIC_100_train.csv' file in the original EK100 annotation repo here.
I can guess what the data format should be and could create the expected file for the original videos, but I'm wondering how it should look for your resized and chunked dataset version.
Could you please help me with this?
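
In case it helps the discussion, one way to compare the official annotation schema against whatever the loader expects is to inspect the CSV directly; a minimal sketch with pandas (the expected pretraining format is only guessed at here):

import pandas as pd

# the official annotation file from the epic-kitchens-100-annotations repo
df = pd.read_csv('epic-kitchens-100-annotations/EPIC_100_train.csv')
print(df.columns.tolist())   # e.g. video_id, start_timestamp, stop_timestamp, narration, verb_class, noun_class, ...
print(df.head())
# the file passed via --train-metadata could then be derived from these columns
# once the schema the videoMAE pretraining loader expects is confirmed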

Epic Kitchens evaluation: AttributeError: 'Namespace' object has no attribute 'model'

When I ran evaluation only, I got an error:

Traceback (most recent call last):
  File "/media/f/AVION/scripts/main_lavila_finetune_cls.py", line 622, in <module>
    main(args)
  File "/media/f/AVION/scripts/main_lavila_finetune_cls.py", line 145, in main
    print("=> creating model: {}".format(old_args.model))
AttributeError: 'Namespace' object has no attribute 'model'

My run script is as follows:

EXP_PATH=.

export PYTHONPATH=.:third_party/decord/python/

python scripts/main_lavila_finetune_cls.py \
  --root /root/h/DataSet/Kitchen/avion_dataset/video_320p_15sec/ \
  --train-metadata /root/h/DataSet/Kitchen/avion_dataset/epic-kitchens-100-annotations/EPIC_100_train.csv \
  --val-metadata /root/h/DataSet/Kitchen/avion_dataset/epic-kitchens-100-annotations/EPIC_100_validation.csv \
  --video-chunk-length 15 \
  --use-flash-attn \
  --grad-checkpointing \
  --use-fast-conv1 \
  --batch-size 64 \
  --fused-decode-crop \
  --use-multi-epochs-loader \
  --pretrain-model /root/linux/AVION/pretrainmodels/avion_finetune_cls_lavila_vitb_best.pt \
  --output-dir $EXP_PATH 2>&1 | tee $EXP_PATH/log.txt

To investigate the error, I printed all the arguments of old_args:

old_args = ckpt['args']
print("ckpt\n", ckpt.keys())
print("old args \n", pprint.pformat(vars(old_args)))
print("=> creating model: {}".format(old_args.model))

The output is as follows:

/root/miniconda3/envs/avion/lib/python3.10/site-packages/torchvision/transforms/_functional_video.py:6: UserWarning: The 'torchvision.transforms._functional_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms.functional' module instead.
  warnings.warn(
/root/miniconda3/envs/avion/lib/python3.10/site-packages/torchvision/transforms/_transforms_video.py:22: UserWarning: The 'torchvision.transforms._transforms_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms' module instead.
  warnings.warn(
Not using distributed mode
ckpt
 dict_keys(['epoch', 'state_dict', 'optimizer', 'scaler', 'best_acc1', 'args'])
old args 
 {'actions':       verb  noun
0        0     0
1        0     1
2        0    10
3        0   100
4        0   101
...    ...   ...
3801     9    93
3802     9    94
3803     9    95
3804     9    98
3805     9    99

[3806 rows x 2 columns],
 'batch_size': 64,
 'betas': (0.9, 0.999),
 'clip_length': 16,
 'clip_stride': 2,
 'cutmix': 1.0,
 'cutmix_minmax': None,
 'dataset': 'ek100_cls',
 'decode_threads': 1,
 'disable_amp': False,
 'dist_backend': 'nccl',
 'dist_url': 'env://',
 'distributed': True,
 'drop_path_rate': 0.1,
 'dropout_rate': 0.5,
 'epochs': 100,
 'eps': 1e-08,
 'eval_freq': 5,
 'evaluate': False,
 'fused_decode_crop': True,
 'gpu': 0,
 'grad_clip_norm': None,
 'local_rank': 0,
 'lr': 0.012,
 'lr_end': 4e-05,
 'lr_start': 4e-06,
 'mapping_act2n': {0: 0,
                   1: 1,
                   2: 10,
                   3: 100,
                   4: 101,
                   5: 102,
                   6: 103,
                   7: 104,
                   8: 105,
                   9: 106,
                   10: 107,
                   11: 108,
                   12: 109,
                   13: 11,
                   14: 110,
                   ......,
                   3797: 9,
                   3798: 9,
                   3799: 9,
                   3800: 9,
                   3801: 9,
                   3802: 9,
                   3803: 9,
                   3804: 9,
                   3805: 9},
 'mixup': 0.8,
 'mixup_mode': 'batch',
 'mixup_prob': 1.0,
 'mixup_switch_prob': 0.5,
 'norm_style': 'openai',
 'num_classes': 3806,
 'num_clips': 1,
 'num_crops': 1,
 'optimizer': 'sgd',
 'output_dir': 'experiments/finetune_cls_lavila_vitb/',
 'patch_dropout': 0.0,
 'pickle_filename': '',
 'pretrain_model': './experiments/pretrain_lavila_vitb/checkpoint_best.pt',
 'print_freq': 10,
 'rank': 0,
 'resume': '',
 'root': '/storage/Datasets/EPIC-KITCHENS-100/EK100_320p_15sec_30fps_libx264/',
 'seed': 0,
 'smoothing': 0.1,
 'start_epoch': 0,
 'train_metadata': 'datasets/EK100/epic-kitchens-100-annotations/EPIC_100_train.csv',
 'update_freq': 1,
 'use_fast_conv1': True,
 'use_flash_attn': True,
 'use_grad_checkpointing': True,
 'use_multi_epochs_loader': True,
 'use_zero': False,
 'val_metadata': 'datasets/EK100/epic-kitchens-100-annotations/EPIC_100_validation.csv',
 'video_chunk_length': 15,
 'warmup_epochs': 2,
 'wd': 4e-05,
 'workers': 8,
 'world_size': 8}
Traceback (most recent call last):
  File "/media/f/AVION/scripts/main_lavila_finetune_cls.py", line 621, in <module>
    main(args)
  File "/media/f/AVION/scripts/main_lavila_finetune_cls.py", line 145, in main
    print("=> creating model: {}".format(old_args.model))
AttributeError: 'Namespace' object has no attribute 'model'

It turns out that the checkpoint's saved args indeed lack the "model" attribute.
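
A possible workaround until the checkpoint is fixed is to fall back to an explicit architecture name when the saved args lack the attribute; a minimal sketch (the fallback string is a placeholder, not something read from the checkpoint):

# in main(), replace the direct old_args.model access with a guarded lookup;
# 'VIT_BASE' is a placeholder and must be set to the architecture this checkpoint was actually trained with
model_name = getattr(old_args, 'model', 'VIT_BASE')
print("=> creating model: {}".format(model_name))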

RandAugment usage

I noticed that RandAugment is only used in the Kinetics dataloader when fast_rrc is off. Does this mean that RandAugment was not used for pre-training or fine-tuning? I also noticed that even if RandAugment is moved to the GPU along with the other transforms, data loading is quite a bit slower. Have you seen this issue before?
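
To put a number on the slowdown, one option is to time the loader with and without the GPU-side RandAugment; a rough sketch with a generic PyTorch DataLoader (the loader objects are placeholders):

import time
import torch

def measure_throughput(loader, num_batches=50):
    # pull a fixed number of batches and report samples per second
    it = iter(loader)
    next(it)                              # warm-up batch (worker start-up, first GPU kernels)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.time()
    seen = 0
    for _ in range(num_batches):
        clips = next(it)[0]               # assumes batches of (video, label, ...)
        seen += clips.shape[0]
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return seen / (time.time() - start)

# usage sketch: print(measure_throughput(loader_with_randaug), measure_throughput(loader_without_randaug))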

Bad performance for EK-100 Action Recognition finetuning model

Dear Yue,

Thanks for this amazing repo, which is well documented and easy to try out! However, when I followed your guidance to fine-tune the action recognition model on EK-100, I got very bad results (Acc@1 0.021 / Acc@5 0.089 on the validation set, and train_Acc@1 2.62 / train_Acc@5 9.76 even on the training set). Do you have any idea what could be causing this?

To include more details for your reference: I used the resized, chunked version of EK-100 that you shared here, and the pre-trained model you provided here. The following is the command I used for fine-tuning.

PYTHONPATH=.:third_party/decord/python/ \
CUDA_VISIBLE_DEVICES="0,1" \
torchrun \
    --nproc_per_node=2 scripts/main_lavila_finetune_cls.py \
    --root datasets/EK100/EK100_320p_15sec_30fps_libx264/ \
    --video-chunk-length 15 --use-flash-attn \
    --grad-checkpointing \
    --use-fast-conv1 \
    --batch-size 256 \
    --fused-decode-crop \
    --use-multi-epochs-loader \
    --pretrain-model experiments/pretrain_lavila_vitb/avion_pretrain_lavila_vitb_best.pt \
    --output-dir experiments/EK100_test 2>&1 | tee experiments/EK100_test/train_log.txt

And I also attached the generated train_log.txt for your reference.

Thank you in advance for your help!
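
One sanity check that may be worth running while debugging this: confirm that the pretrained weights are actually being loaded, since near-chance accuracy often points to a backbone initialized from scratch. A minimal sketch (the key layout under 'state_dict' is an assumption):

import torch

ckpt = torch.load('experiments/pretrain_lavila_vitb/avion_pretrain_lavila_vitb_best.pt', map_location='cpu')
print(ckpt.keys())                          # expected to contain something like 'state_dict', 'args', ...
state_dict = ckpt.get('state_dict', ckpt)   # assumption: the weights live under 'state_dict'
print(len(state_dict), 'tensors')
print(list(state_dict.keys())[:5])          # spot-check a few names against the fine-tuning model's keys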

Resized, chunked version for EK100 seems corrupted

Hi, thank you for sharing this very helpful repo :)

I downloaded the resized/chunked version of the EK100 dataset from this link and tried to extract it, but running unzip EK100_320p_15sec_30fps_libx264.zip did not work for me. Below is the error message.

Archive:  EK100_320p_15sec_30fps_libx264.zip
warning [EK100_320p_15sec_30fps_libx264.zip]:  35339199913 extra bytes at beginning or within zipfile
  (attempting to process anyway)
error [EK100_320p_15sec_30fps_libx264.zip]:  start of central directory not found;
  zipfile corrupt.
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)

It seems that the uploaded zip file is corrupted. Could you please have a look at this problem?

Thank you :)
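
For what it is worth, this kind of "extra bytes / start of central directory not found" message often comes from an incomplete download or from an unzip build that cannot handle very large (zip64) archives, so it may be worth verifying the file before assuming the upload itself is broken. A minimal check with Python's zipfile module, which does support zip64:

import os
import zipfile

path = 'EK100_320p_15sec_30fps_libx264.zip'
print('size on disk:', os.path.getsize(path))            # compare with the size listed at the download link
print('valid zip signature:', zipfile.is_zipfile(path))  # False usually means a truncated download

# ZipFile raises BadZipFile if the central directory really is missing (re-download in that case)
with zipfile.ZipFile(path) as zf:
    print('entries:', len(zf.namelist()))
    print('first bad entry:', zf.testzip())               # slow on an archive this large, but checks every member's CRC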

How can I replicate your EK100 Action Recognition results?

Thanks for all this awesome work.

I see scripts and instructions to do the EK100 MIR task, but not the Action Recognition task. Could you please tell me if I'm missing something? I would like to be able to replicate the Action Recognition performance of 54.4% on the validation split.

Cheers!

Load the ckpts for LaViLa

Hi, thanks for releasing the pre-trained ckpts. I would like to know whether it is possible to load the checkpoint here into the model in LaViLa.
I checked the 'state_dict', but it seems that the two do not correspond. Could you please provide a simple script to implement this loading?
Thanks!
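
Until an official conversion script exists, a generic starting point is to load the AVION state_dict, inspect how its key names differ from LaViLa's, and remap the prefixes; a minimal sketch (the 'module.' stripping below is purely illustrative, and the real AVION-to-LaViLa mapping may differ):

import torch

ckpt = torch.load('avion_pretrain_lavila_vitb_best.pt', map_location='cpu')
avion_sd = ckpt['state_dict']
print(list(avion_sd.keys())[:10])            # inspect AVION's naming scheme

# illustrative remapping: strip a DDP wrapper prefix; further renames would follow the same pattern
remapped = {k.replace('module.', '', 1): v for k, v in avion_sd.items()}

# then, on the LaViLa side (model construction not shown here):
#   missing, unexpected = lavila_model.load_state_dict(remapped, strict=False)
#   print(missing, unexpected)               # remaining mismatches show which keys still need renaming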
