opengvlab / egovideo
[CVPR 2024 Champions] Solutions for EgoVis Challenges in CVPR 2024
Hi!
Could you let me know which pretrained model you are using for STA training? The ego4d_sta_train.sh script contains the line:
MODEL_PATH='/mnt/petrelfs/share_data/chenguo/ego_forecasting/pretrained_models/vitl_v_f.pt'
Could you tell me which model this refers to?
Thanks!
Hi,
Thank you for releasing the code, checkpoints, and features. I was unable to find the checkpoints corresponding to the NLQ and MQ verb/noun features, and I'm interested in extracting features for videos outside the NLQ/MQ dataset. Could you share these checkpoints? It would also be great if you could share the script used to extract features from videos. Thanks in advance.
The README mentions that the code and checkpoints of pretraining for the FHP task are released; could you point me to where I can find them?
Interesting work! I would like to cite your work on the STA task, but I couldn't find a pretrained model for it.
Could you release the pretrained checkpoint (ViT-L) for STA?
It should be the model corresponding to "/mnt/petrelfs/share_data/chenguo/ego_forecasting/pretrained_models/vitl_v.pt"
at https://github.com/OpenGVLab/ego4d-eccv2022-solutions/blob/main/forecasting_eval/configs/Ego4dShortTermAnticipation/VIT3D.yaml#L5
Thank you in advance!
Accuracy of the network on the 167745 test videos: Top-1: 16.60%, Top-5: 49.78%
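The Top-1/Top-5 numbers above are standard classification metrics over the test videos. As a minimal sketch (the function and variable names here are illustrative, not from the repo's evaluation code), they can be computed from per-class scores like this:

```python
# Minimal sketch of Top-1 / Top-5 accuracy from per-video class scores.
# Names are illustrative; the repo's own evaluation code may differ.

def topk_accuracy(scores, labels, ks=(1, 5)):
    """scores: one list of per-class scores per video; labels: ground-truth class indices."""
    hits = {k: 0 for k in ks}
    for row, label in zip(scores, labels):
        # Rank class indices by descending score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        for k in ks:
            if label in ranked[:k]:
                hits[k] += 1
    total = len(labels)
    return {k: 100.0 * hits[k] / total for k in ks}

# Toy example: 4 "videos", 6 classes.
scores = [
    [0.1, 0.7, 0.2, 0.0, 0.0, 0.0],    # argmax = class 1
    [0.3, 0.1, 0.4, 0.1, 0.05, 0.05],  # argmax = class 2
    [0.2, 0.2, 0.2, 0.2, 0.1, 0.1],    # label 5 not in top 5
    [0.05, 0.05, 0.1, 0.1, 0.6, 0.1],  # argmax = class 4
]
labels = [1, 0, 5, 4]
acc = topk_accuracy(scores, labels)
print(acc[1], acc[5])  # 50.0 75.0
```

The same logic scales to the 167745 test videos; in practice the scores would come from averaging the `test_num_segment` x `test_num_crop` views produced by the script below.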
export LC_ALL="en_US.UTF-8"
OUTPUT_DIR='./workdir/ego4d_verb_pretrain_vitl_k700'
DATA_PATH='/home/yangninghua/data/1/PerformDutiesDataset/ActionRecognition/Ego4d/v2/full_scale'
MODEL_PATH='/home/yangninghua/data/1/PerformDutiesDataset/ActionRecognition/Ego4d/videomae_cls/checkpoint/ego4d_verb_pretrain_vitl_k700.pt'
GPUS=4
NNODES=${NNODES:-1}
NODE_RANK=${NODE_RANK:-0}
PORT=${PORT:-39500}
MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}
# batch_size can be adjusted according to the graphics card
# vit_large_patch16_224_ego4d: batch=16 uses 12260 MiB of GPU memory
# batch=42 uses 23022 MiB; with 4 GPUs that is 999 iterations over 167745 samples
OMP_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=$GPUS \
--master_port $PORT --nnodes=$NNODES \
--node_rank=$NODE_RANK --master_addr=$MASTER_ADDR \
run_ego4d_cls_pretrain.py \
--model vit_large_patch16_224_ego4d \
--nb_noun_classes 0 \
--nb_verb_classes 118 \
--data_set ego4d_verb \
--data_path ${DATA_PATH} \
--finetune ${MODEL_PATH} \
--log_dir ${OUTPUT_DIR} \
--output_dir ${OUTPUT_DIR} \
--batch_size 42 \
--num_sample 1 \
--warmup_epochs 1 \
--input_size 224 \
--short_side_size 224 \
--save_ckpt_freq 1 \
--num_frames 16 \
--opt adamw \
--lr 5e-4 \
--opt_betas 0.9 0.999 \
--weight_decay 0.05 \
--epochs 10 \
--dist_eval \
--test_num_segment 2 \
--test_num_crop 3 \
--enable_deepspeed \
--eval
I am running the evaluation code for the SCOD task and saved the outputs to a pickle file. Looking at the output of the model, it seems to return an array of size 100x5 for each validation image. I am a little confused about how to translate this output into actual bounding boxes. Any advice on how I can do this? Thank you!
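A 100x5 array per image is a common detector output shape: one row per candidate detection. Assuming each row is [x1, y1, x2, y2, score] (a frequent convention, but please verify against the repo's post-processing; coordinates may also be normalized and need rescaling to the original image size), a minimal sketch of decoding it would be:

```python
# Hypothetical sketch: decode an Nx5 detection array into thresholded boxes.
# The [x1, y1, x2, y2, score] row layout is an assumption, not confirmed by the repo.

def decode_detections(rows, score_thresh=0.5):
    """rows: iterable of [x1, y1, x2, y2, score] lists (assumed layout).
    Returns the boxes whose confidence passes the threshold."""
    boxes = []
    for x1, y1, x2, y2, score in rows:
        if score >= score_thresh:
            boxes.append({"box": (x1, y1, x2, y2), "score": score})
    return boxes

# Toy 3x5 output standing in for the model's 100x5 array.
preds = [
    [10.0, 20.0, 110.0, 220.0, 0.92],
    [15.0, 25.0, 100.0, 210.0, 0.40],  # below threshold, dropped
    [200.0, 50.0, 260.0, 120.0, 0.77],
]
kept = decode_detections(preds)
print(len(kept))  # 2
```

If the boxes overlap heavily, the 100 candidates may also need non-maximum suppression before visualization; the repo's evaluation script is the authoritative reference for the exact format.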
In what data format, and in which directories, should I place my validation set to run the SCOD benchmark? Thank you!
Thanks!