
Comments (4)

xwen99 commented on June 3, 2024

Hi y2sman, thank you for letting us know your concerns.

  1. Losing ~80 videos from the training set shouldn't affect performance, given the large size of the VCDB dataset.
  2. As mentioned in the paper, we sample 997,090 frames from the VCDB dataset, i.e., 10 frames per video, so this is correct.
  3. The dropout rate is not crucial; please follow the paper.
  4. To better locate the problem, may I ask why the cosine similarity isn't working, how the first table was obtained (the performance seems good), and what the difference is between evaluation.py and evaluation_org.py?

from temporal_context_aggregation.

y2sman commented on June 3, 2024


Thanks for the reply. Before I start, it's good to hear that the first table's performance looks right.

The difference between evaluation.py and evaluation_org.py is not large; I just wrote my own cosine similarity code for evaluation because the original code didn't work.

I've attached the error message from running evaluation.py to calculate cosine similarity:

python3 evaluation.py --dataset FIVR-5K --pca_components 1024 --num_clusters 256 --num_layers 1 --output_dim 1024 --padding_size 64 --metric cosine --model_path models/model_v5_with_all_bg.pth --feature_path pre_processing/fivr_imac_pca1024.hdf5 --random_sampling 
Comparator is ...  False
loading features...
...features loaded
100%|██████████| 50/50 [00:00<00:00, 156.28it/s]
  0%|          | 0/5000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "evaluation.py", line 331, in <module>
    main()
  File "evaluation.py", line 327, in main
    eval_function(model, dataset, args)
  File "evaluation.py", line 258, in query_vs_database
    sims = calculate_similarities(queries, embedding, qr_video_dict, args.metric, comparator)
  File "evaluation.py", line 50, in calculate_similarities
    cdist(query_features, target_feature, metric='cosine'))
  File "/usr/local/envs/etri/lib/python3.7/site-packages/scipy/spatial/distance.py", line 2717, in cdist
    raise ValueError('XA must be a 2-dimensional array.')
ValueError: XA must be a 2-dimensional array.
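The error above comes from SciPy itself: cdist requires both inputs to be 2-D arrays of shape (n_samples, dim). A minimal sketch of the failure and the fix, assuming a single 1024-d video-level query vector (the shapes here are illustrative, not taken from the repository):

```python
import numpy as np
from scipy.spatial.distance import cdist

query_1d = np.random.rand(1024)      # 1-D vector -> cdist rejects this
database = np.random.rand(5, 1024)   # 2-D (n_videos, dim) -> fine

try:
    cdist(query_1d, database, metric='cosine')
except ValueError as e:
    print(e)  # XA must be a 2-dimensional array.

# Reshaping the query to (1, dim) satisfies cdist's 2-D requirement;
# cosine *similarity* is 1 minus the cosine distance cdist returns.
sims = 1.0 - cdist(query_1d.reshape(1, -1), database, metric='cosine')
print(sims.shape)  # (1, 5)
```

So the error usually means the query (or target) feature arrived as a 1-D vector or a 3-D batch rather than an (n, dim) matrix.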

The other metrics (euclidean, chamfer, sym_chamfer) work fine. I also loaded ViSiL's pretrained weights for the video comparator, but it stopped partway through the calculation:

python3 evaluation.py --dataset FIVR-5K --pca_components 1024 --num_clusters 256 --num_layers 1 --output_dim 1024 --padding_size 64 --metric chamfer --model_path models/model_v5_with_all_bg.pth --feature_path pre_processing/fivr_imac_pca1024.hdf5 --random_sampling --use_comparator
Comparator is ...  True
loading features...
...features loaded
100%|██████████| 50/50 [00:00<00:00, 505.95it/s]
 48%|████▊     | 2383/5000 [01:45<01:56, 22.49it/s]
Traceback (most recent call last):
  File "evaluation.py", line 331, in <module>
    main()
  File "evaluation.py", line 327, in main
    eval_function(model, dataset, args)
  File "evaluation.py", line 258, in query_vs_database
    sims = calculate_similarities(queries, embedding, qr_video_dict, args.metric, comparator)
  File "evaluation.py", line 56, in calculate_similarities
    sim = chamfer(query, target_feature, comparator)
  File "evaluation.py", line 71, in chamfer
    simmatrix = comparator(simmatrix).detach()
  File "/usr/local/envs/etri/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/kjlee/workspace/temporal_context_aggregation/model.py", line 620, in forward
    sim = self.mpool2(sim)
  File "/usr/local/envs/etri/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/envs/etri/lib/python3.7/site-packages/torch/nn/modules/pooling.py", line 164, in forward
    self.return_indices)
  File "/usr/local/envs/etri/lib/python3.7/site-packages/torch/_jit_internal.py", line 405, in fn
    return if_false(*args, **kwargs)
  File "/usr/local/envs/etri/lib/python3.7/site-packages/torch/nn/functional.py", line 718, in _max_pool2d
    return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: Given input size: (64x22x1). Calculated output size: (64x11x0). Output size is too small
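The arithmetic behind this RuntimeError can be checked by hand. A pure-Python sketch, assuming 2x2 max-pools with stride 2 (which is what the reported sizes 22 -> 11 and 1 -> 0 imply):

```python
# Output size of a pooling layer without padding:
# out = (in - kernel) // stride + 1
def pool_out(n, kernel=2, stride=2):
    return (n - kernel) // stride + 1

# The failing input was (64 x 22 x 1): the width-1 axis collapses to 0.
print(pool_out(22))  # 11
print(pool_out(1))   # 0 -> "Output size is too small"

# With two such pools stacked, each video needs at least 4 frames
# for its axis of the similarity matrix to survive both poolings:
for frames in (3, 4):
    print(frames, pool_out(pool_out(frames)) >= 1)
```

In other words, a similarity matrix with a dimension of 1 (a one-frame video, or a video whose features collapsed to a single row) cannot pass through the comparator's pooling stages.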

These are the two problems I currently have. I believe my setup follows the paper, and, as I said above, the remaining differences are in parts that shouldn't affect performance. Given this situation, could you provide a pretrained model or the exact parameter values? And please check whether the cosine similarity calculation code works.


xwen99 commented on June 3, 2024

Hi @y2sman,

Just noticed that you are trying to match our FIVR-5K results reported in the ablation study section. For one thing, this subset is rather small and may produce unstable results. For another, since that table is only for the ablation study, we only ensure that one hyper-parameter is ablated per subtable, and not all hyper-parameters are perfectly aligned with our final run on FIVR-200K (so it is fine to get results that differ from the ablation section). I therefore recommend experimenting with FIVR-200K, or running FIVR-5K multiple times for stable results.

About your questions: I just use SciPy to calculate the cosine similarities, so please check their documentation for that error message; it seems your tensor shape is not suitable. As for the ViSiL video comparator, note that it requires each video to have at least 4 frames, so your error message may indicate a too-short video. They recently released their official PyTorch code, which may be helpful for you: https://github.com/MKLab-ITI/visil/tree/pytorch
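Following the 4-frame requirement mentioned above, one simple workaround is to check frame counts before invoking the comparator. A sketch only, with a hypothetical helper name and frame features assumed to be (num_frames, dim) arrays:

```python
import numpy as np

MIN_FRAMES = 4  # the comparator's two 2x2 max-pools halve each dim twice

def can_use_comparator(query_feats, target_feats, min_frames=MIN_FRAMES):
    """Hypothetical guard: True only when both videos have enough frames
    for a ViSiL-style comparator's pooling stages."""
    return (query_feats.shape[0] >= min_frames
            and target_feats.shape[0] >= min_frames)

q = np.random.rand(10, 1024)
t = np.random.rand(1, 1024)   # a 1-frame video, like the failing case
print(can_use_comparator(q, t))  # False -> fall back to plain chamfer
```

Pairs that fail the check can be scored with the plain (comparator-free) chamfer similarity instead of crashing the whole evaluation run.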

BTW, I wonder whether the problem occurs only with the cosine similarity metric; are the other metrics fine?


zcgeqian commented on June 3, 2024


I think you evaluated frame-level features with cosine similarity, which is intended for video-level features according to Section 4.2 (Similarity Measure) of the paper.
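To make the distinction concrete: cosine similarity compares one vector per video, so frame-level features must first be aggregated. A sketch only, using simple mean pooling as the frame-to-video aggregation (an assumption for illustration, not necessarily the paper's aggregation):

```python
import numpy as np
from scipy.spatial.distance import cdist

def to_video_level(frame_feats):
    """Aggregate frame-level features (num_frames, dim) into a single
    L2-normalized video-level vector via mean pooling."""
    v = frame_feats.mean(axis=0)
    return v / np.linalg.norm(v)

queries = np.stack([to_video_level(np.random.rand(10, 1024)) for _ in range(3)])
targets = np.stack([to_video_level(np.random.rand(10, 1024)) for _ in range(5)])

# Both inputs are now 2-D (n_videos, dim), so cosine via cdist works:
sims = 1.0 - cdist(queries, targets, metric='cosine')
print(sims.shape)  # (3, 5)
```

Passing the raw per-frame tensors straight into cdist is what produces the "XA must be a 2-dimensional array" error shown earlier in the thread.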

