mever-team / distill-and-select

Authors' official PyTorch implementation of "DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval" [IJCV 2022]

License: Apache License 2.0

Python 100.00%

video-retrieval knowledge-distillation video-search video-similarity-search video-similarity-learning duplicate-videos near-duplicate-video-retrieval fivr ndvr

distill-and-select's Introduction

DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval

This repository contains the PyTorch implementation of the paper DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval. It provides code for the knowledge distillation training of coarse- and fine-grained student networks based on similarities calculated from a teacher and the selector network. Also, the scripts for the training of the selector network are included. Finally, to facilitate the reproduction of the paper's results, the evaluation code, the extracted features for the employed video datasets, and pre-trained networks for the various students and selectors are available.

Prerequisites

Python 3
PyTorch >= 1.1
Torchvision >= 0.4

Preparation

Installation

Clone this repo:

git clone https://github.com/mever-team/distill-and-select
cd distill-and-select

You can install all the dependencies with pip:

pip install -r requirements.txt

or with conda:

conda install --file requirements.txt

Feature files

We provide our extracted features for all datasets to facilitate reproducibility for future research.
Download the feature files of the dataset you want:
- DnS-100K (219 GB)
- FIVR-200K (406 GB), FIVR-5K (8.7 GB)
- CC_WEB_VIDEO (31 GB)
- SVD (150 GB)
- EVVE (9 GB)
- VCDB (118 GB)
All feature files are in HDF5 format.

Distillation

We provide the code for training and evaluation of our student models.

Student training

To train a fine-grained student, run train_student.py with fine-grained as the value of the --student_type argument, as in the following command:

python train_student.py --student_type fine-grained --experiment_path experiments/DnS_students --trainset_hdf5 /path/to/dns_100k.hdf5

You can train a fine-grained attention or binarization student by setting the --attention or the --binarization flag to true, respectively.

For fine-grained attention students:

python train_student.py --student_type fine-grained --binarization false --attention true --experiment_path /path/to/experiment/ --trainset_hdf5 /path/to/dns_100k.hdf5

For fine-grained binarization students:

python train_student.py --student_type fine-grained --binarization true --attention false --experiment_path /path/to/experiment/ --trainset_hdf5 /path/to/dns_100k.hdf5

To train a coarse-grained student, pass coarse-grained to the --student_type argument:

python train_student.py --student_type coarse-grained --experiment_path /path/to/experiment/ --trainset_hdf5 /path/to/dns_100k.hdf5 --attention true --learning_rate 1e-5

Pass one of teacher, fg_att_student_iter1, or fg_att_student_iter2 to the --teacher argument in order to train a student with a different teacher:

python train_student.py --teacher fg_att_student_iter2 --experiment_path /path/to/experiment/ --trainset_hdf5 /path/to/dns_100k.hdf5
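To compare teachers, the same command can be generated once per value accepted by --teacher. The sketch below only builds and prints command lines from the flags shown above; the helper name build_train_command and the per-teacher experiment sub-directories are illustrative assumptions, not part of the repository.

```python
import shlex

# Teacher names listed above as valid values for --teacher.
TEACHERS = ["teacher", "fg_att_student_iter1", "fg_att_student_iter2"]


def build_train_command(teacher, experiment_root, trainset_hdf5):
    """Return the argument list for one train_student.py run (hypothetical helper)."""
    return [
        "python", "train_student.py",
        "--teacher", teacher,
        # Assumption: one sub-directory per teacher keeps the runs apart.
        "--experiment_path", f"{experiment_root}/{teacher}",
        "--trainset_hdf5", trainset_hdf5,
    ]


commands = [
    build_train_command(t, "/path/to/experiment", "/path/to/dns_100k.hdf5")
    for t in TEACHERS
]
for cmd in commands:
    # shlex.join quotes arguments so each line can be pasted into a shell.
    print(shlex.join(cmd))
```

Nothing is executed here; each printed line can be reviewed and then run by hand or handed to a job scheduler.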

You can optionally validate on FIVR-5K by passing its HDF5 file to the --val_hdf5 argument and choosing one of the DSVR, CSVR, or ISVR sets with the --val_set argument:

python train_student.py --student_type coarse-grained --val_hdf5 /path/to/fivr_5k.hdf5 --val_set ISVR --experiment_path /path/to/experiment/ --trainset_hdf5 /path/to/dns_100k.hdf5 --learning_rate 1e-5

Student Evaluation

Choose one of the FIVR-5K, FIVR-200K, CC_WEB_VIDEO, SVD, or EVVE datasets to evaluate your models.
To evaluate a student, run the evaluation_student.py script and pass the path of the .pth model to the --student_path argument, as in the following command:

python evaluation_student.py --student_path experiments/DnS_students/model_fg_att_student.pth --dataset FIVR-5K --dataset_hdf5 /path/to/fivr_200k.hdf5

If you don't pass a value to --student_path, a pretrained model is selected:

python evaluation_student.py --student_type fine-grained --attention true --dataset FIVR-5K --dataset_hdf5 /path/to/fivr_200k.hdf5
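The evaluation scripts report mean average precision (mAP). For readers unfamiliar with the metric, here is a minimal standalone sketch of how average precision is commonly computed from a ranked list. It is not the repository's evaluation code; the function names and the toy video IDs are illustrative, and relevant videos that never appear in the ranking are assumed to contribute zero.

```python
def average_precision(ranked_ids, relevant_ids):
    """AP of one query: mean of the precision values at each relevant hit."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, video_id in enumerate(ranked_ids, start=1):
        if video_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(rankings, ground_truth):
    """mAP: average of the per-query AP values."""
    aps = [average_precision(rankings[q], ground_truth[q]) for q in rankings]
    return sum(aps) / len(aps)


# Toy example: similarities of four database videos to one query.
similarities = {"v1": 0.9, "v2": 0.2, "v3": 0.7, "v4": 0.4}
# Rank database videos by descending similarity.
ranking = sorted(similarities, key=similarities.get, reverse=True)
ap = average_precision(ranking, relevant_ids={"v1", "v4"})
print(f"AP = {ap:.4f}")
```

In this toy case the relevant videos land at ranks 1 and 3, so the AP is the mean of 1/1 and 2/3.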

Selection

We also provide the code for training the selector network and for evaluating the overall DnS framework.

Selector training

To train a selector network, run train_selector.py as in the following command:

python train_selector.py --experiment_path experiments/DnS_students --trainset_hdf5 /path/to/dns_100k.hdf5

Pass different values to the --threshold argument to train the selector network with different label functions.
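How a threshold can define a label function is easiest to see on toy numbers. The sketch below assumes that a video pair is labelled positive when the coarse- and fine-grained similarities differ by more than the threshold; this reading, the function name, and the numbers are assumptions for illustration, and train_selector.py remains the reference for the actual label function.

```python
def selector_labels(coarse_sims, fine_sims, threshold):
    """Label a pair 1 when the two students disagree by more than `threshold`."""
    return [
        1 if abs(fine - coarse) > threshold else 0
        for coarse, fine in zip(coarse_sims, fine_sims)
    ]


# Toy similarities for four video pairs (made-up values).
coarse = [0.80, 0.10, 0.55, 0.30]
fine = [0.82, 0.60, 0.50, 0.75]

labels_low_t = selector_labels(coarse, fine, threshold=0.2)
labels_high_t = selector_labels(coarse, fine, threshold=0.48)
print(labels_low_t, labels_high_t)
```

Under this assumption, a higher threshold marks fewer pairs as needing the fine-grained student.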

DnS Evaluation

To evaluate the DnS framework, run the evaluation_dns.py script and pass the path of each .pth model to the corresponding network argument, as in the following command:

python evaluation_dns.py --selector_network_path experiments/DnS_students/model_selector_network.pth --dataset FIVR-5K --dataset_hdf5 /path/to/fivr_200k.hdf5

If you don't pass a value to a network path argument, the pretrained model is selected. For example, to evaluate DnS with the Fine-grained Attention Student:

python evaluation_dns.py --attention true --dataset FIVR-5K --dataset_hdf5 /path/to/fivr_200k.hdf5

Pass different values to the --percentage argument to send a different number of video pairs to the Fine-grained student for reranking. With the value all, the script runs the evaluation for all dataset percentages.
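The role of --percentage can be illustrated with a small standalone sketch. It assumes that the pairs with the highest selector scores are the ones sent to the fine-grained student, and that their coarse similarity is then replaced by the fine-grained one. The function name, the rounding rule, and the toy numbers are assumptions for illustration, not the logic of evaluation_dns.py.

```python
def rerank_with_selector(coarse_sims, fine_sims, selector_scores, percentage):
    """Replace the coarse similarity of the top `percentage` of pairs
    (ranked by selector score) with the fine-grained similarity."""
    n_pairs = len(coarse_sims)
    # Assumption: round to the nearest whole number of pairs.
    n_rerank = round(n_pairs * percentage / 100)
    # Indices of pairs, highest selector score first.
    order = sorted(range(n_pairs), key=lambda i: selector_scores[i], reverse=True)
    final = list(coarse_sims)
    for i in order[:n_rerank]:
        final[i] = fine_sims[i]
    return final


# Toy values for four video pairs (made-up numbers).
coarse = [0.80, 0.10, 0.55, 0.30]
fine = [0.82, 0.60, 0.50, 0.75]
scores = [0.05, 0.90, 0.20, 0.70]

final_50 = rerank_with_selector(coarse, fine, scores, percentage=50)
final_all = rerank_with_selector(coarse, fine, scores, percentage=100)
print(final_50)
print(final_all)
```

For simplicity the sketch takes a fine-grained similarity for every pair; in a real pipeline the fine-grained student would only be run on the selected pairs, which is where the savings come from.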

Use our pretrained models

We also provide our pretrained models trained with the fg_att_student_iter2 teacher.

Load our pretrained models as follows:

import torch
from model.feature_extractor import FeatureExtractor
from model.students import FineGrainedStudent, CoarseGrainedStudent
from model.selector import SelectorNetwork

# The feature extraction network used in our experiments
feature_extractor = FeatureExtractor(dims=512).eval()

# Our Fine-grained Students
fg_att_student = FineGrainedStudent(pretrained=True, attention=True).eval()
fg_bin_student = FineGrainedStudent(pretrained=True, binarization=True).eval()

# Our Coarse-grained Students
cg_student = CoarseGrainedStudent(pretrained=True).eval()

# Our Selector Networks
selector_att = SelectorNetwork(pretrained=True, attention=True).eval()
selector_bin = SelectorNetwork(pretrained=True, binarization=True).eval()

First, extract video features by passing a video tensor to the feature extractor (similar to here):

video_features = feature_extractor(video_tensor)

Pass the video features to the index_video() function to extract video representations for the student and selector networks:

fg_features = fg_att_student.index_video(video_features)
cg_features = cg_student.index_video(video_features)
sn_features = selector_att.index_video(video_features)

Pass the query and target features to the calculate_video_similarity() function to calculate similarity with the student networks:

fine_similarity = fg_att_student.calculate_video_similarity(query_fg_features, target_fg_features)
coarse_similarity = cg_student.calculate_video_similarity(query_cg_features, target_cg_features)

To calculate the selector's score for a video pair, call the selector network with the features extracted for each video and their coarse similarity:

selector_features = torch.cat([query_sn_features, target_sn_features, coarse_similarity], 1)
selector_scores = selector_att(selector_features)

Citation

If you use this code for your research, please consider citing our papers:

@article{kordopatis2022dns,
  title={{DnS}: {Distill-and-Select} for Efficient and Accurate Video Indexing and Retrieval},
  author={Kordopatis-Zilos, Giorgos and Tzelepis, Christos and Papadopoulos, Symeon and Kompatsiaris, Ioannis and Patras, Ioannis},
  journal={International Journal of Computer Vision},
  year={2022}
}

@inproceedings{kordopatis2019visil,
  title={{ViSiL}: Fine-grained Spatio-Temporal Video Similarity Learning},
  author={Kordopatis-Zilos, Giorgos and Papadopoulos, Symeon and Patras, Ioannis and Kompatsiaris, Ioannis},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2019}
}

Related Projects

ViSiL - here you can find our teacher model

FIVR-200K - download our FIVR-200K dataset

Acknowledgements

This work has been supported by the projects WeVerify and MediaVerse, partially funded by the European Commission under contract numbers 825297 and 957252, respectively, and DECSTER, funded by EPSRC under contract number EP/R025290/1.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Contact for further details about the project

Giorgos Kordopatis-Zilos ([email protected])

distill-and-select's Issues

How can I train the student on other datasets?

Thanks for your great work!
If I want to use my own dataset to train the student, how can I generate trainset_similarities_teacher.pickle like you did?
Many thanks!

About training

Thanks for your great work!
When I use the vcdb.hdf5 file to train the fine-grained student network, an error occurs. The error log is as follows:

Start training
epoch 0: 0%| | 0/688 [00:00<?, ?iter/s]
Traceback (most recent call last):
File "train_student.py", line 166, in <module>
main(args)
File "train_student.py", line 67, in main
for pairs in pbar:
File "/home/yuzhenhuang/anaconda3/envs/DnS/lib/python3.8/site-packages/tqdm/std.py", line 1180, in __iter__
for obj in iterable:
File "/home/yuzhenhuang/anaconda3/envs/DnS/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/home/yuzhenhuang/anaconda3/envs/DnS/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/home/yuzhenhuang/anaconda3/envs/DnS/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/home/yuzhenhuang/anaconda3/envs/DnS/lib/python3.8/site-packages/torch/_utils.py", line 434, in reraise
raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/yuzhenhuang/anaconda3/envs/DnS/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/home/yuzhenhuang/anaconda3/envs/DnS/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/yuzhenhuang/anaconda3/envs/DnS/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/yuzhenhuang/文档/pycharm_project/distill-and-select-main/datasets/generators.py", line 94, in __getitem__
anchor = self.load_video(pairs[0])
File "/home/yuzhenhuang/文档/pycharm_project/distill-and-select-main/datasets/generators.py", line 69, in load_video
video_tensor = self.feature_file[str(self.index[video])][:]
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/home/yuzhenhuang/anaconda3/envs/DnS/lib/python3.8/site-packages/h5py/_hl/group.py", line 305, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'yzzkTm4USWk' doesn't exist)"

If you can help me solve this problem, I'll be very grateful!
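A KeyError like the one above means that a requested video ID has no entry in the HDF5 file. One possible cause, stated here as an assumption, is that the training pairs refer to videos of a different dataset than the feature file passed to --trainset_hdf5. A quick membership check can confirm which IDs are affected. The sketch accepts any container that supports `in`, which includes an open h5py.File; the helper name and all IDs other than 'yzzkTm4USWk' are illustrative.

```python
def find_missing_ids(feature_file, video_ids):
    """Return the IDs that have no entry in `feature_file`.

    `feature_file` can be any container that supports `in`, for example an
    h5py.File opened in read mode or, as in this demo, a plain dict.
    """
    return [vid for vid in video_ids if str(vid) not in feature_file]


# Demo with a dict standing in for the HDF5 file.
fake_feature_file = {"video_a": [0.1, 0.2], "video_b": [0.3, 0.4]}
requested = ["video_a", "yzzkTm4USWk", "video_b"]

missing = find_missing_ids(fake_feature_file, requested)
print(f"{len(missing)} of {len(requested)} requested videos are missing: {missing}")
```

With a real feature file, the same function can be called inside `with h5py.File(path, "r") as f:` using the IDs that the training pairs refer to.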

CoarseGrainedStudent error

cg_student = CoarseGrainedStudent(pretrained=True)
video_features = feature_extractor(video_tensor)
cg_features = cg_student.index_video(video_features)

RuntimeError Traceback (most recent call last)
in
1 fg_features = fg_att_student.index_video(query_features)
----> 2 cg_features = cg_student.index_video(query_features)
3 sn_features = selector_att.index_video(query_features)

~/Desktop/distill-and-select/model/students.py in index_video(self, x, mask)
64
65 if hasattr(self, 'transformer'):
---> 66 x = x.permute(1, 0, 2)
67 x = self.transformer(x, src_key_padding_mask=
68 (1 - mask).bool() if mask is not None else None)

RuntimeError: number of dims don't match in permute

For extracting the provided features, did you resize the frames to a particular size?

I tried running the feature_extractor, but the output size of region_vectors from layer3 is incompatible with region_vectors from the other layers. My input has size [64, 3, 240, 320]; region_vectors of layer1 has size [64, 256, 3, 4], layer2 [64, 512, 3, 4], layer3 [64, 1024, 4, 5], and layer4 [64, 2048, 3, 4]. Could you provide more information on how the extracted frames are preprocessed before extracting the ResNet features? Thank you!

about dimension

After reading the code of ViSiL and DnS, I noticed that you have reduced D = t*9*3840 to t*9*512. I am a little curious why you did not also reduce the 9, since it is also a spatial dimension.

[Questions about code evaluation results]

Hi, thanks for your work!

The results I obtained differ from the evaluation results in the paper, and I don't know which part I'm missing.

Also, could you share the original videos of FIVR-200K and DnS-100K with me?

My training command is as follows.

python train_student.py --student_type coarse-grained --experiment_path experiments/dns_students --trainset_hdf5 /mldisk/nfs_shared_/dh/datasets/dns_100k.hdf5

My evaluation command is as follows.

python evaluation_student.py --student_path /workspace/distill-and-select/experiments/dns_students/model_cg_student.pth --dataset FIVR-5K --dataset_hdf5 /mldisk/nfs_shared_/dh/datasets/fivr_200k.hdf5

For the training and evaluation parameters, I used the provided code unchanged.

The performance of the coarse-grained student model in the paper is:

===== FIVR-5K Dataset =====
Queries: 50 videos
Database: 5000 videos
----------------
DSVR mAP: 0.634
CSVR mAP: 0.647
ISVR mAP: 0.608

The performance of my coarse-grained student model is:

===== FIVR-5K Dataset =====
Queries: 50 videos
Database: 5000 videos
----------------
DSVR mAP: 0.5735
CSVR mAP: 0.5920
ISVR mAP: 0.5579

The text below is the output of my runs.

python train_student.py --student_type coarse-grained --experiment_path experiments/dns_students --trainset_hdf5 /mldisk/nfs_shared_/dh/datasets/dns_100k.hdf5

epoch 19: 100%|█| 688/688 [05:20<00:00,  2.14iter/s, total_loss=0.090 (0.064), distillat
epoch 20: 100%|█| 688/688 [05:22<00:00,  2.14iter/s, total_loss=0.067 (0.064), distillat
epoch 21: 100%|█| 688/688 [05:20<00:00,  2.14iter/s, total_loss=0.070 (0.064), distillat
epoch 22: 100%|█| 688/688 [05:22<00:00,  2.14iter/s, total_loss=0.066 (0.063), distillat
epoch 23: 100%|█| 688/688 [05:23<00:00,  2.13iter/s, total_loss=0.066 (0.063), distillat
epoch 24: 100%|█| 688/688 [05:20<00:00,  2.14iter/s, total_loss=0.050 (0.062), distillat
epoch 25: 100%|█| 688/688 [05:22<00:00,  2.13iter/s, total_loss=0.058 (0.061), distillat
epoch 26: 100%|█| 688/688 [05:23<00:00,  2.13iter/s, total_loss=0.067 (0.061), distillat
epoch 27: 100%|█| 688/688 [05:23<00:00,  2.13iter/s, total_loss=0.054 (0.061), distillat
epoch 28: 100%|█| 688/688 [05:23<00:00,  2.13iter/s, total_loss=0.062 (0.061), distillat
epoch 29: 100%|█| 688/688 [05:24<00:00,  2.12iter/s, total_loss=0.059 (0.060), distillat
epoch 30: 100%|█| 688/688 [05:23<00:00,  2.12iter/s, total_loss=0.086 (0.060), distillat
epoch 31: 100%|█| 688/688 [05:19<00:00,  2.15iter/s, total_loss=0.056 (0.060), distillat
epoch 32: 100%|█| 688/688 [05:21<00:00,  2.14iter/s, total_loss=0.056 (0.059), distillat
epoch 33: 100%|█| 688/688 [05:22<00:00,  2.13iter/s, total_loss=0.058 (0.059), distillat
epoch 34: 100%|█| 688/688 [05:24<00:00,  2.12iter/s, total_loss=0.075 (0.059), distillat
epoch 35: 100%|█| 688/688 [05:24<00:00,  2.12iter/s, total_loss=0.063 (0.059), distillat

...

epoch 250: 100%|█| 688/688 [05:20<00:00,  2.15iter/s, total_loss=0.037 (0.044), distill
epoch 251: 100%|█| 688/688 [05:19<00:00,  2.15iter/s, total_loss=0.045 (0.044), distill
epoch 252: 100%|█| 688/688 [05:21<00:00,  2.14iter/s, total_loss=0.038 (0.044), distill
epoch 253: 100%|█| 688/688 [05:23<00:00,  2.13iter/s, total_loss=0.038 (0.044), distill
epoch 254: 100%|█| 688/688 [05:23<00:00,  2.13iter/s, total_loss=0.069 (0.044), distill
epoch 255: 100%|█| 688/688 [05:19<00:00,  2.15iter/s, total_loss=0.052 (0.044), distill
epoch 256: 100%|█| 688/688 [05:22<00:00,  2.14iter/s, total_loss=0.041 (0.044), distill
epoch 257: 100%|█| 688/688 [05:22<00:00,  2.13iter/s, total_loss=0.043 (0.044), distill
epoch 258: 100%|█| 688/688 [05:20<00:00,  2.14iter/s, total_loss=0.071 (0.044), distill
epoch 259: 100%|█| 688/688 [05:18<00:00,  2.16iter/s, total_loss=0.055 (0.044), distill
epoch 260: 100%|█| 688/688 [05:21<00:00,  2.14iter/s, total_loss=0.049 (0.044), distill
epoch 261: 100%|█| 688/688 [05:21<00:00,  2.14iter/s, total_loss=0.037 (0.043), distill
epoch 262: 100%|█| 688/688 [05:20<00:00,  2.15iter/s, total_loss=0.047 (0.044), distill
epoch 263: 100%|█| 688/688 [05:20<00:00,  2.15iter/s, total_loss=0.055 (0.044), distill
epoch 264: 100%|█| 688/688 [05:21<00:00,  2.14iter/s, total_loss=0.034 (0.043), distill
epoch 265: 100%|█| 688/688 [05:18<00:00,  2.16iter/s, total_loss=0.042 (0.044), distill
epoch 266: 100%|█| 688/688 [05:19<00:00,  2.16iter/s, total_loss=0.035 (0.044), distill
epoch 267: 100%|█| 688/688 [05:20<00:00,  2.15iter/s, total_loss=0.041 (0.043), distill
epoch 268: 100%|█| 688/688 [05:20<00:00,  2.15iter/s, total_loss=0.040 (0.044), distill
epoch 269: 100%|█| 688/688 [05:20<00:00,  2.14iter/s, total_loss=0.051 (0.044), distill
epoch 270: 100%|█| 688/688 [05:21<00:00,  2.14iter/s, total_loss=0.050 (0.044), distill
epoch 271: 100%|█| 688/688 [05:17<00:00,  2.17iter/s, total_loss=0.039 (0.044), distill
epoch 272: 100%|█| 688/688 [05:18<00:00,  2.16iter/s, total_loss=0.038 (0.043), distill
epoch 273: 100%|█| 688/688 [05:16<00:00,  2.17iter/s, total_loss=0.041 (0.043), distill
epoch 274: 100%|█| 688/688 [05:17<00:00,  2.16iter/s, total_loss=0.034 (0.043), distill
epoch 275: 100%|█| 688/688 [05:16<00:00,  2.18iter/s, total_loss=0.056 (0.043), distill
epoch 276: 100%|█| 688/688 [05:17<00:00,  2.17iter/s, total_loss=0.040 (0.043), distill
epoch 277: 100%|█| 688/688 [05:17<00:00,  2.17iter/s, total_loss=0.051 (0.043), distill
epoch 278: 100%|█| 688/688 [05:19<00:00,  2.15iter/s, total_loss=0.038 (0.043), distill
epoch 279: 100%|█| 688/688 [05:19<00:00,  2.15iter/s, total_loss=0.028 (0.043), distill
epoch 280: 100%|█| 688/688 [05:19<00:00,  2.15iter/s, total_loss=0.039 (0.043), distill
epoch 281: 100%|█| 688/688 [05:21<00:00,  2.14iter/s, total_loss=0.035 (0.043), distill
epoch 282: 100%|█| 688/688 [05:21<00:00,  2.14iter/s, total_loss=0.033 (0.043), distill
epoch 283: 100%|█| 688/688 [05:21<00:00,  2.14iter/s, total_loss=0.039 (0.043), distill
epoch 284: 100%|█| 688/688 [05:22<00:00,  2.14iter/s, total_loss=0.034 (0.043), distill
epoch 285: 100%|█| 688/688 [05:18<00:00,  2.16iter/s, total_loss=0.041 (0.043), distill
epoch 286: 100%|█| 688/688 [05:19<00:00,  2.15iter/s, total_loss=0.058 (0.043), distill
epoch 287: 100%|█| 688/688 [05:20<00:00,  2.15iter/s, total_loss=0.038 (0.043), distill
epoch 288: 100%|█| 688/688 [05:21<00:00,  2.14iter/s, total_loss=0.036 (0.043), distill
epoch 289: 100%|█| 688/688 [05:17<00:00,  2.17iter/s, total_loss=0.061 (0.043), distill
epoch 290: 100%|█| 688/688 [05:19<00:00,  2.16iter/s, total_loss=0.037 (0.043), distill
epoch 291: 100%|█| 688/688 [05:20<00:00,  2.15iter/s, total_loss=0.041 (0.043), distill
epoch 292: 100%|█| 688/688 [05:20<00:00,  2.15iter/s, total_loss=0.039 (0.043), distill
epoch 293: 100%|█| 688/688 [05:20<00:00,  2.15iter/s, total_loss=0.048 (0.043), distill
epoch 294: 100%|█| 688/688 [05:18<00:00,  2.16iter/s, total_loss=0.025 (0.043), distill
epoch 295: 100%|█| 688/688 [05:19<00:00,  2.15iter/s, total_loss=0.043 (0.043), distill
epoch 296: 100%|█| 688/688 [05:18<00:00,  2.16iter/s, total_loss=0.039 (0.043), distill
epoch 297: 100%|█| 688/688 [05:20<00:00,  2.15iter/s, total_loss=0.056 (0.043), distill
epoch 298: 100%|█| 688/688 [05:20<00:00,  2.15iter/s, total_loss=0.036 (0.043), distill
epoch 299: 100%|█| 688/688 [05:21<00:00,  2.14iter/s, total_loss=0.042 (0.043), distill


root@5b101930459a:/workspace/distill-and-select# python evaluation_student.py --student_path experiments/DnS_students/model_fg_att_student.pth --dataset FIVR-5K --dataset_hdf5 /mldisk/nfs_shared_/dh/datasets/fivr_200k.hdf5


> Loading network
CoarseGrainedStudent(
  (transformer): TransformerEncoder(
    (layers): ModuleList(
      (0): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): _LinearWithBias(in_features=512, out_features=512, bias=True)
        )
        (linear1): Linear(in_features=512, out_features=2048, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
        (linear2): Linear(in_features=2048, out_features=512, bias=True)
        (norm1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (norm2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (dropout1): Dropout(p=0.1, inplace=False)
        (dropout2): Dropout(p=0.1, inplace=False)
      )
    )
    (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
  )
  (netvlad): NetVLAD(
    (conv): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (reduction_layer): Linear(in_features=32768, out_features=1024, bias=False)
    (norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
)

> Extract features of the query videos
100%|████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.34s/it]

> Extract features of the target videos
100%|████████████████████████████████████████████████| 157/157 [01:57<00:00,  1.34it/s]

> Calculate query-target similarities

> Evaluation on FIVR
===== FIVR-5K Dataset =====
Queries: 50 videos
Database: 5000 videos
----------------
DSVR mAP: 0.5735
CSVR mAP: 0.5920
ISVR mAP: 0.5579

h5py threw an error in `dataloader`.

Great work!
Initializing h5py in the __init__ method of the data generator results in KeyError: 'Unable to open object (wrong B-tree signature)', but initializing it in the __getitem__ method runs successfully. This could be due to a version issue with PyTorch; my version is 1.12.
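A common explanation for this behaviour, offered here as an assumption rather than a confirmed diagnosis, is that a file handle opened in __init__ ends up shared by the DataLoader worker processes, whereas opening it on first access gives each worker its own handle. The sketch below shows that lazy-open pattern in isolation. LazyFeatureStore and its `opener` parameter are hypothetical names; with h5py one would pass something like `lambda p: h5py.File(p, "r")`.

```python
class LazyFeatureStore:
    """Open the feature file on first access instead of in __init__."""

    def __init__(self, path, opener):
        self.path = path
        self.opener = opener  # callable that takes a path and returns a mapping
        self._file = None     # nothing is opened yet

    def _get_file(self):
        # Each process that reaches this point opens its own handle.
        if self._file is None:
            self._file = self.opener(self.path)
        return self._file

    def __getitem__(self, video_id):
        return self._get_file()[str(video_id)]


# Demo with a fake opener that records how often it is called.
open_calls = []


def fake_opener(path):
    open_calls.append(path)
    return {"video_a": [0.1, 0.2]}


store = LazyFeatureStore("/path/to/features.hdf5", fake_opener)
calls_after_init = len(open_calls)  # the file has not been opened yet
features = store["video_a"]         # first access opens the file
again = store["video_a"]            # later accesses reuse the handle
print(calls_after_init, len(open_calls))
```

The same idea carries over to a Dataset class: keep only the path in __init__ and create the handle inside __getitem__ when it is still None.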

coarse_similarity = cg_student.calculate_video_similarity(query_cg_features, target_cg_features)
print(coarse_similarity.shape)
torch.Size([15, 15])
Is it possible to get the similarity as a single value?

How did you generate the dataset features?

If I use other datasets, how can I generate features like you did?