
Comments (4)

xwen99 commented on June 3, 2024

Hi y2sman, thank you for letting us know your concerns.

  1. Losing ~80 videos from the training set shouldn't affect performance, given the large size of the VCDB dataset.
  2. As mentioned in the paper, we sample 997,090 frames from the VCDB dataset, i.e., 10 frames per video, so this is correct.
  3. The dropout rate is not crucial; please follow the paper.
  4. To better locate the problem, may I ask why the cosine similarity isn't working, how the first table was obtained (the performance seems good), and what the difference is between evaluation.py and evaluation_org.py?

from temporal_context_aggregation.

y2sman commented on June 3, 2024


Thanks for the reply. Before I start, it's good to hear that the first table's performance looks right.

The difference between evaluation.py and evaluation_org.py is not large; I just wrote my own cosine similarity code for evaluation because the original code didn't work.

I've attached the error message from running evaluation.py to calculate cosine similarity:

python3 evaluation.py --dataset FIVR-5K --pca_components 1024 --num_clusters 256 --num_layers 1 --output_dim 1024 --padding_size 64 --metric cosine --model_path models/model_v5_with_all_bg.pth --feature_path pre_processing/fivr_imac_pca1024.hdf5 --random_sampling 
Comparator is ...  False
loading features...
...features loaded
100%|██████████| 50/50 [00:00<00:00, 156.28it/s]
  0%|          | 0/5000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "evaluation.py", line 331, in <module>
    main()
  File "evaluation.py", line 327, in main
    eval_function(model, dataset, args)
  File "evaluation.py", line 258, in query_vs_database
    sims = calculate_similarities(queries, embedding, qr_video_dict, args.metric, comparator)
  File "evaluation.py", line 50, in calculate_similarities
    cdist(query_features, target_feature, metric='cosine'))
  File "/usr/local/envs/etri/lib/python3.7/site-packages/scipy/spatial/distance.py", line 2717, in cdist
    raise ValueError('XA must be a 2-dimensional array.')
ValueError: XA must be a 2-dimensional array.
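The error above comes from SciPy itself: cdist requires both inputs to be 2-D arrays of shape (n_samples, dim). A minimal sketch of the failure and the fix, assuming a single 1024-d video-level query vector (the shapes here are illustrative, not taken from the repository):

```python
import numpy as np
from scipy.spatial.distance import cdist

query_1d = np.random.rand(1024)      # 1-D vector -> cdist rejects this
database = np.random.rand(5, 1024)   # 2-D (n_videos, dim) -> fine

try:
    cdist(query_1d, database, metric='cosine')
except ValueError as e:
    print(e)  # XA must be a 2-dimensional array.

# Reshaping the query to (1, dim) satisfies cdist's 2-D requirement;
# cosine *similarity* is 1 minus the cosine distance cdist returns.
sims = 1.0 - cdist(query_1d.reshape(1, -1), database, metric='cosine')
print(sims.shape)  # (1, 5)
```

So the error usually means the query (or target) feature arrived as a 1-D vector or a 3-D batch rather than an (n, dim) matrix.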

The other metrics (euclidean, chamfer, sym_chamfer) work fine. I also loaded ViSiL's pretrained weights for the video comparator, but it stopped partway through the calculation:

python3 evaluation.py --dataset FIVR-5K --pca_components 1024 --num_clusters 256 --num_layers 1 --output_dim 1024 --padding_size 64 --metric chamfer --model_path models/model_v5_with_all_bg.pth --feature_path pre_processing/fivr_imac_pca1024.hdf5 --random_sampling --use_comparator
Comparator is ...  True
loading features...
...features loaded
100%|██████████| 50/50 [00:00<00:00, 505.95it/s]
 48%|████▊     | 2383/5000 [01:45<01:56, 22.49it/s]
Traceback (most recent call last):
  File "evaluation.py", line 331, in <module>
    main()
  File "evaluation.py", line 327, in main
    eval_function(model, dataset, args)
  File "evaluation.py", line 258, in query_vs_database
    sims = calculate_similarities(queries, embedding, qr_video_dict, args.metric, comparator)
  File "evaluation.py", line 56, in calculate_similarities
    sim = chamfer(query, target_feature, comparator)
  File "evaluation.py", line 71, in chamfer
    simmatrix = comparator(simmatrix).detach()
  File "/usr/local/envs/etri/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/kjlee/workspace/temporal_context_aggregation/model.py", line 620, in forward
    sim = self.mpool2(sim)
  File "/usr/local/envs/etri/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/envs/etri/lib/python3.7/site-packages/torch/nn/modules/pooling.py", line 164, in forward
    self.return_indices)
  File "/usr/local/envs/etri/lib/python3.7/site-packages/torch/_jit_internal.py", line 405, in fn
    return if_false(*args, **kwargs)
  File "/usr/local/envs/etri/lib/python3.7/site-packages/torch/nn/functional.py", line 718, in _max_pool2d
    return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: Given input size: (64x22x1). Calculated output size: (64x11x0). Output size is too small
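The arithmetic behind this RuntimeError can be checked by hand. A pure-Python sketch, assuming 2x2 max-pools with stride 2 (which is what the reported sizes 22 -> 11 and 1 -> 0 imply):

```python
# Output size of a pooling layer without padding:
# out = (in - kernel) // stride + 1
def pool_out(n, kernel=2, stride=2):
    return (n - kernel) // stride + 1

# The failing input was (64 x 22 x 1): the width-1 axis collapses to 0.
print(pool_out(22))  # 11
print(pool_out(1))   # 0 -> "Output size is too small"

# With two such pools stacked, each video needs at least 4 frames
# for its axis of the similarity matrix to survive both poolings:
for frames in (3, 4):
    print(frames, pool_out(pool_out(frames)) >= 1)
```

In other words, a similarity matrix with a dimension of 1 (a one-frame video, or a video whose features collapsed to a single row) cannot pass through the comparator's pooling stages.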

These are the two problems I currently have. I believe my setup follows the paper, and, as I said above, the remaining differences are in parts that shouldn't affect performance. Given this situation, could you provide a pretrained model or the exact parameter values? And please check whether the cosine similarity calculation code works.


xwen99 commented on June 3, 2024

Hi @y2sman,

Just noticed that you are trying to match our FIVR-5K results reported in the ablation study section. For one thing, this subset is rather small and may produce unstable results. For another, since that table is only for the ablation study, we only ensure that one hyper-parameter is ablated per subtable, and not all hyper-parameters are perfectly aligned with our final run on FIVR-200K (so it is fine to get results that differ from the ablation section). I therefore recommend experimenting with FIVR-200K, or running FIVR-5K multiple times for stable results.

About your questions: I just use SciPy to calculate the cosine similarities, so please check their documentation for that error message; it seems your tensor shape is not suitable. As for the ViSiL video comparator, note that it requires each video to have at least 4 frames, so your error message may indicate a too-short video. They recently released their official PyTorch code, which may be helpful for you: https://github.com/MKLab-ITI/visil/tree/pytorch
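Following the 4-frame requirement mentioned above, one simple workaround is to check frame counts before invoking the comparator. A sketch only, with a hypothetical helper name and frame features assumed to be (num_frames, dim) arrays:

```python
import numpy as np

MIN_FRAMES = 4  # the comparator's two 2x2 max-pools halve each dim twice

def can_use_comparator(query_feats, target_feats, min_frames=MIN_FRAMES):
    """Hypothetical guard: True only when both videos have enough frames
    for a ViSiL-style comparator's pooling stages."""
    return (query_feats.shape[0] >= min_frames
            and target_feats.shape[0] >= min_frames)

q = np.random.rand(10, 1024)
t = np.random.rand(1, 1024)   # a 1-frame video, like the failing case
print(can_use_comparator(q, t))  # False -> fall back to plain chamfer
```

Pairs that fail the check can be scored with the plain (comparator-free) chamfer similarity instead of crashing the whole evaluation run.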

BTW, I wonder whether the problem occurs only with the cosine similarity metric; are the other metrics fine?


zcgeqian commented on June 3, 2024


I think you evaluated frame-level features with cosine similarity, which is intended for video-level features according to Section 4.2 (Similarity Measure) of the paper.
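To make the distinction concrete: cosine similarity compares one vector per video, so frame-level features must first be aggregated. A sketch only, using simple mean pooling as the frame-to-video aggregation (an assumption for illustration, not necessarily the paper's aggregation):

```python
import numpy as np
from scipy.spatial.distance import cdist

def to_video_level(frame_feats):
    """Aggregate frame-level features (num_frames, dim) into a single
    L2-normalized video-level vector via mean pooling."""
    v = frame_feats.mean(axis=0)
    return v / np.linalg.norm(v)

queries = np.stack([to_video_level(np.random.rand(10, 1024)) for _ in range(3)])
targets = np.stack([to_video_level(np.random.rand(10, 1024)) for _ in range(5)])

# Both inputs are now 2-D (n_videos, dim), so cosine via cdist works:
sims = 1.0 - cdist(queries, targets, metric='cosine')
print(sims.shape)  # (3, 5)
```

Passing the raw per-frame tensors straight into cdist is what produces the "XA must be a 2-dimensional array" error shown earlier in the thread.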

