opengvlab / egovideo
[CVPR 2024 Champions] Solutions for EgoVis Challenges in CVPR 2024
Hi!
Could you let me know which pretrained model you are using for STA training? The ego4d_sta_train.sh script contains the line:
MODEL_PATH='/mnt/petrelfs/share_data/chenguo/ego_forecasting/pretrained_models/vitl_v_f.pt'
Could you tell me which model this refers to?
Thanks!
Hi,
Thank you for releasing the code, checkpoints, and features. I was unable to find the checkpoints corresponding to the NLQ and MQ verb/noun features, and I'm interested in extracting features for videos outside the NLQ/MQ dataset. Could you share these checkpoints? It would also be great if you could share the script used to extract features from videos. Thanks in advance.
The README mentions that the code and checkpoints of pretraining for the FHP task are released; could you point me to where I can find them?
Interesting work! I would like to cite your work on the STA task, but I couldn't find a pretrained model for it.
Could you release the pretrained checkpoint (ViT-L) for STA?
It should be the model corresponding to "/mnt/petrelfs/share_data/chenguo/ego_forecasting/pretrained_models/vitl_v.pt"
at https://github.com/OpenGVLab/ego4d-eccv2022-solutions/blob/main/forecasting_eval/configs/Ego4dShortTermAnticipation/VIT3D.yaml#L5
Thank you in advance!
Accuracy of the network on the 167745 test videos: Top-1: 16.60%, Top-5: 49.78%
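The Top-1/Top-5 numbers above are standard classification metrics over the test videos. As a minimal sketch (the function and variable names here are illustrative, not from the repo's evaluation code), they can be computed from per-class scores like this:

```python
# Minimal sketch of Top-1 / Top-5 accuracy from per-video class scores.
# Names are illustrative; the repo's own evaluation code may differ.

def topk_accuracy(scores, labels, ks=(1, 5)):
    """scores: one list of per-class scores per video; labels: ground-truth class indices."""
    hits = {k: 0 for k in ks}
    for row, label in zip(scores, labels):
        # Rank class indices by descending score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        for k in ks:
            if label in ranked[:k]:
                hits[k] += 1
    total = len(labels)
    return {k: 100.0 * hits[k] / total for k in ks}

# Toy example: 4 "videos", 6 classes.
scores = [
    [0.1, 0.7, 0.2, 0.0, 0.0, 0.0],    # argmax = class 1
    [0.3, 0.1, 0.4, 0.1, 0.05, 0.05],  # argmax = class 2
    [0.2, 0.2, 0.2, 0.2, 0.1, 0.1],    # label 5 not in top 5
    [0.05, 0.05, 0.1, 0.1, 0.6, 0.1],  # argmax = class 4
]
labels = [1, 0, 5, 4]
acc = topk_accuracy(scores, labels)
print(acc[1], acc[5])  # 50.0 75.0
```

The same logic scales to the 167745 test videos; in practice the scores would come from averaging the `test_num_segment` x `test_num_crop` views produced by the script below.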
export LC_ALL="en_US.UTF-8"
OUTPUT_DIR='./workdir/ego4d_verb_pretrain_vitl_k700'
DATA_PATH='/home/yangninghua/data/1/PerformDutiesDataset/ActionRecognition/Ego4d/v2/full_scale'
MODEL_PATH='/home/yangninghua/data/1/PerformDutiesDataset/ActionRecognition/Ego4d/videomae_cls/checkpoint/ego4d_verb_pretrain_vitl_k700.pt'
GPUS=4
NNODES=${NNODES:-1}
NODE_RANK=${NODE_RANK:-0}
PORT=${PORT:-39500}
MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}
# batch_size can be adjusted according to the graphics card
# vit_large_patch16_224_ego4d: batch=16 uses 12260 MiB of GPU memory
# batch=42 uses 23022 MiB; with 4 GPUs that is 999 iterations over 167745 samples
OMP_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=$GPUS \
--master_port $PORT --nnodes=$NNODES \
--node_rank=$NODE_RANK --master_addr=$MASTER_ADDR \
run_ego4d_cls_pretrain.py \
--model vit_large_patch16_224_ego4d \
--nb_noun_classes 0 \
--nb_verb_classes 118 \
--data_set ego4d_verb \
--data_path ${DATA_PATH} \
--finetune ${MODEL_PATH} \
--log_dir ${OUTPUT_DIR} \
--output_dir ${OUTPUT_DIR} \
--batch_size 42 \
--num_sample 1 \
--warmup_epochs 1 \
--input_size 224 \
--short_side_size 224 \
--save_ckpt_freq 1 \
--num_frames 16 \
--opt adamw \
--lr 5e-4 \
--opt_betas 0.9 0.999 \
--weight_decay 0.05 \
--epochs 10 \
--dist_eval \
--test_num_segment 2 \
--test_num_crop 3 \
--enable_deepspeed \
--eval
I am running the evaluation code for the SCOD task and saved the outputs to a pickle file. Looking at the output of the model, it seems to return an array of size 100x5 for each validation image. I am a little confused about how to translate this output into actual bounding boxes. Any advice on how I can do this? Thank you!
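A 100x5 array per image is a common detector output shape: one row per candidate detection. Assuming each row is [x1, y1, x2, y2, score] (a frequent convention, but please verify against the repo's post-processing; coordinates may also be normalized and need rescaling to the original image size), a minimal sketch of decoding it would be:

```python
# Hypothetical sketch: decode an Nx5 detection array into thresholded boxes.
# The [x1, y1, x2, y2, score] row layout is an assumption, not confirmed by the repo.

def decode_detections(rows, score_thresh=0.5):
    """rows: iterable of [x1, y1, x2, y2, score] lists (assumed layout).
    Returns the boxes whose confidence passes the threshold."""
    boxes = []
    for x1, y1, x2, y2, score in rows:
        if score >= score_thresh:
            boxes.append({"box": (x1, y1, x2, y2), "score": score})
    return boxes

# Toy 3x5 output standing in for the model's 100x5 array.
preds = [
    [10.0, 20.0, 110.0, 220.0, 0.92],
    [15.0, 25.0, 100.0, 210.0, 0.40],  # below threshold, dropped
    [200.0, 50.0, 260.0, 120.0, 0.77],
]
kept = decode_detections(preds)
print(len(kept))  # 2
```

If the boxes overlap heavily, the 100 candidates may also need non-maximum suppression before visualization; the repo's evaluation script is the authoritative reference for the exact format.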
In what data format, and in which directories, should I place my validation set to run the SCOD benchmark? Thank you!
Thanks!