
s2vs's Issues

Test on a custom dataset

Hi, and thank you for this amazing work.

I was trying to test your code on a custom toy dataset with a couple of "original" videos and a couple of "altered" videos derived from them.

Any suggestions on how to structure the dataset? I downloaded the VCDB dataset and, looking at your vcdb.py, I suppose you serialized the dataset into a pickle file. Can you provide some additional info about this process?

I thought I would organize my custom dataset similarly to VCDB, and from your code I see that you enforce the following structure:

self.queries = dataset['queries']
self.positives = dataset['positives']
self.dataset = dataset['dataset']

I suppose that 'queries' are the altered videos and 'dataset' the original videos, but:

  1. What does 'positives' stand for? If it is the ground truth, maybe I can skip it, since in this real-world scenario I don't have GT.
  2. In which format should I encode the videos under these keys? Are they simple lists containing the paths of the videos (my guess is sketched below)? For instance:
    dataset['queries'] = ['.../mydataset/video1.mp4', '.../mydataset/video2.mp4',...] ?
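
To make this concrete, this is how I would serialize my toy dataset, just mirroring the three keys above; the path-style ids, the set-valued 'positives', and the pickle file name are my own assumptions, not something I found in your code:

import pickle

# Hypothetical layout for a custom dataset, mirroring the 'queries' /
# 'positives' / 'dataset' keys that vcdb.py reads. The video ids below
# are assumed to be paths; adjust to whatever your loader expects.
dataset = {
    # the altered videos, used as queries
    'queries': ['.../mydataset/altered/video1.mp4', '.../mydataset/altered/video2.mp4'],
    # my guess at the ground truth: for each query, the set of matching originals
    'positives': {
        '.../mydataset/altered/video1.mp4': {'.../mydataset/original/video1.mp4'},
        '.../mydataset/altered/video2.mp4': {'.../mydataset/original/video2.mp4'},
    },
    # the pool of original videos to search against
    'dataset': ['.../mydataset/original/video1.mp4', '.../mydataset/original/video2.mp4'],
}

with open('my_dataset.pickle', 'wb') as f:
    pickle.dump(dataset, f)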

Thank you.

Toy example with two videos

Hi,
I tested your pretrained model using the two videos inside data/examples.

Starting from the suggestions you provided, I wrote the following code:

import torch
from utils import load_video
import evaluation  # your evaluation.py module

# Load the feature extractor and the pretrained similarity models from torch.hub
feat_extractor = torch.hub.load('gkordo/s2vs:main', 'resnet50_LiMAC')
s2vs_dns = torch.hub.load('gkordo/s2vs:main', 's2vs_dns')
s2vs_vcdb = torch.hub.load('gkordo/s2vs:main', 's2vs_vcdb')

# Load the two videos from the video files
query_video = torch.from_numpy(load_video('./data/examples/video1/'))
target_video = torch.from_numpy(load_video('./data/examples/video2/'))

# Use the pretrained ViSiL model returned by torch.hub as the similarity network
model = s2vs_dns.to('cuda')
model.eval()

# Extract features of the two videos
query_features = evaluation.extract_features(feat_extractor.to('cuda'), query_video.to('cuda'))
target_features = evaluation.extract_features(feat_extractor.to('cuda'), target_video.to('cuda'))

# Calculate similarity between the two videos
similarity = model.calculate_video_similarity(query_features, target_features)
print(similarity)

The results I got are:

  • similarity = 0.9781 when comparing video1 to video1 itself
  • similarity = 0.7917 when comparing video1 to video2

Since video1 and video2 are completely different, I would have expected a lower similarity score.
I'm mainly interested in the copy-detection task, and I wonder whether 0.79 can actually be considered a "low" value, such that I can argue that the two videos are not potential copies.
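
In case it helps frame the question, one calibration I've been toying with (purely my own heuristic, not something from your code) is to report the pairwise score relative to the query's self-similarity:

# Heuristic sketch: normalize the pairwise score by the query's
# self-similarity, so that 1.0 means "as similar as the video is to itself".
self_similarity = model.calculate_video_similarity(query_features, query_features)
relative_score = similarity / self_similarity
print(relative_score)  # 0.7917 / 0.9781 ~= 0.81 in the case above

But I don't know whether that is a sound way to threshold for copy detection, hence the question.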

Maybe I'm missing something or my code is wrong.

Any help would be really appreciated.

Thank you again for this work.

How to deal with large videos

Hi Giorgos,
thanks again for this work and for your support in my previous issues.

After some experiments on custom videos, I'm struggling to calculate the similarity between long videos (>7 min), due to the limited amount of memory on my GPU.

In fact, when I try to process such videos, I get a "CUDA out of memory" error.

I managed to overcome this issue in the feature-extraction part by setting fps=1 and splitting the query and target videos into N chunks, computing the features for each chunk, and then concatenating all N feature tensors into a single tensor (does that make sense?).
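
Concretely, my extraction loop looks roughly like this (chunk_size is an arbitrary value tuned to my GPU; the other names are from my snippet in the previous issue):

import torch
# query_video, feat_extractor and the evaluation module as in the toy example
chunk_size = 256  # frames per chunk; arbitrary choice
features = []
with torch.no_grad():
    # split along the frame axis, extract per chunk on GPU, collect on CPU
    for chunk in torch.split(query_video, chunk_size):
        f = evaluation.extract_features(feat_extractor.to('cuda'), chunk.to('cuda'))
        features.append(f.cpu())
query_features = torch.cat(features)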

But when it comes to the similarity part, specifically the calculate_video_similarity function, I get the above error.

Do you have any suggestions on how to optimize the similarity computation for such videos?

I guess that splitting the query and target videos into several chunks and computing the similarity between chunks would not result in a meaningful similarity check, but maybe I'm wrong.

Thanks a lot.

EDIT: After further investigation, it seems that what causes the error is the torch.einsum operation inside the frame_to_frame_similarity function.
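
For reference, the tiling workaround I'm experimenting with looks like this; it is purely my own sketch, assuming ViSiL-style region-level features of shape [frames, regions, dim] and a Chamfer aggregation, which may not match your actual frame_to_frame_similarity:

import torch

def tiled_frame_similarity(query_features, target_features, tile=64):
    # Compute the frame-to-frame similarity matrix in tiles over the query
    # axis, so the full [Tq, Tt, Rq, Rt] einsum output is never materialized
    # at once.
    rows = []
    for i in range(0, query_features.shape[0], tile):
        q = query_features[i:i + tile]
        # region-to-region similarities for this tile: [tile, Tt, Rq, Rt]
        sims = torch.einsum('qrd,tsd->qtrs', q, target_features)
        # Chamfer: best-matching target region per query region, then mean
        rows.append(sims.max(dim=3).values.mean(dim=2))
    return torch.cat(rows)  # [Tq, Tt] frame-to-frame similarity matrix

Does something like this make sense, or would it break the subsequent video-level computation?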
