Describe the bug: Hash collision occurs with videos of the same le

Hash Collision about videohash (open, 6 comments)

akamhy commented on May 29, 2024

Hash Collision

Comments (6)

Demmenie commented on May 29, 2024

I have noticed this too, any idea what's causing it?

dale-wahl commented on May 29, 2024

I'm just learning this library myself, but if you check out the collages (located at v1.collage_path and v2.collage_path), you can see that the two collage images are virtually identical. The scenes are so short that, as far as the video hash is concerned, each collage consists of white pixels at the same two points and black everywhere else. My guess is that this tool will not be effective on short videos such as the two in the example. I have been trying to find recommendations on minimum scene lengths.
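To illustrate why near-empty collages collide, here is a minimal sketch that uses a simple average hash (aHash) as a stand-in. videohash computes its hash differently, so this only demonstrates the general effect: two hypothetical 4x4 grayscale "collages" that are black except for white pixels in the same two positions produce identical hashes, even though their dark backgrounds differ slightly.

```python
def average_hash(pixels):
    """Return an average-hash bit string for a 2-D grid of grayscale values (0-255).

    Each bit is 1 if the pixel is brighter than the grid's mean, else 0.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if value > mean else "0" for value in flat)


# Two hypothetical 4x4 "collages": mostly black, with white pixels at the
# same two positions, but slightly different dark background noise.
collage_a = [
    [0, 0, 0, 0],
    [0, 255, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 255, 0],
]
collage_b = [
    [3, 1, 0, 2],
    [0, 250, 4, 0],
    [1, 0, 2, 0],
    [0, 5, 252, 1],
]

print(average_hash(collage_a))
print(average_hash(collage_a) == average_hash(collage_b))  # True
```

Only the two bright pixels rise above the mean, so almost all of the hash is determined by those two positions; any pair of short clips with a similar bright spot pattern will hash alike.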

Just did some testing, and you can increase the number of frames per second. Check out the results of this:

```python
from videohash import VideoHash

# Sample more frames per second than the default by passing frame_interval.
v1 = VideoHash(url='https://user-images.githubusercontent.com/47534140/185008752-da1f09c7-a177-4a46-9c64-230744e998c1.mp4', frame_interval=5)
v2 = VideoHash(url='https://user-images.githubusercontent.com/47534140/185008748-b8922142-37cc-48a0-bad9-1385ba016587.mov', frame_interval=5)
print(v1 == v2)

# Compare these collages with the ones created without frame_interval.
print(v1.collage_path)
print(v2.collage_path)
```
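As a rough illustration of why a higher sampling rate helps with short clips, the sketch below counts how many frames a fixed-rate sampler would take from a clip of a given duration. It assumes, following the comment above, that frame_interval behaves like frames per second; sampled_timestamps is a hypothetical helper, not part of videohash.

```python
import math


def sampled_timestamps(duration_s, frames_per_second):
    """Hypothetical helper: timestamps (in seconds) at which a fixed-rate
    sampler would grab frames from a clip of the given duration."""
    count = math.floor(duration_s * frames_per_second)
    return [i / frames_per_second for i in range(count)]


# A hypothetical 2-second clip:
print(len(sampled_timestamps(2.0, 1)))  # 2 frames at 1 frame per second
print(len(sampled_timestamps(2.0, 5)))  # 10 frames at 5 frames per second
```

With only two frames, whatever collage is built from them carries very little information, which is consistent with the near-identical collages described above.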

Demmenie commented on May 29, 2024

I'll have to look at that example later. I've also had the opposite problem, where the same video produces different hashes, not to mention that it always takes a few seconds to run, which is quite long for real-world applications these days.

I think I'll either have to fork this and see if I can improve it, or switch to something else. I'd also like to see if I can add partial fingerprinting, where a video that is part of another one can be recognised as such.
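One possible approach to the partial-fingerprint idea, sketched under the assumption that each sampled frame has already been reduced to a fixed-length integer hash (how to compute those is out of scope here): slide the shorter clip's sequence of frame hashes over the longer video's sequence and accept the first offset where every aligned pair is within a small Hamming distance. The names find_subclip and max_bits are hypothetical, and this naive scan costs O(n*m) for sequences of length n and m.

```python
def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def find_subclip(clip, video, max_bits=2):
    """Return the first frame offset at which `clip` appears inside `video`,
    or None. Both arguments are lists of per-frame integer hashes; aligned
    frames match if their Hamming distance is at most `max_bits`."""
    for offset in range(len(video) - len(clip) + 1):
        window = video[offset:offset + len(clip)]
        if all(hamming(c, v) <= max_bits for c, v in zip(clip, window)):
            return offset
    return None


# Hypothetical 8-bit per-frame hashes for a full video and an excerpt of it.
video = [0b10110000, 0b01101100, 0b11110000, 0b00001111, 0b10101010]
clip = [0b11110001, 0b00001111]  # one bit differs in the first frame

print(find_subclip(clip, video))  # 2
```

The per-frame tolerance is what lets a re-encoded excerpt still match; a real implementation would need a faster index than a linear scan for large libraries.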

MikPisula commented on May 29, 2024

The issue of collages for short videos being almost entirely black seems to stem from the fact that the width of the collage is set to 1024px regardless of the number of frames. I tried editing collagemaker.py so that it calculates the collage width from the already-existing variable self.images_per_row_in_collage instead, and it produced much nicer collages, although I have not tested it extensively. From my limited testing, it produces the same hash for a video when:
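The idea of deriving the collage width from the grid size, rather than fixing it at 1024px, can be sketched as follows. This is not the code from the linked commit: the roughly-square grid rule and the 144px frame size are assumptions made for illustration, and collage_size is a hypothetical helper.

```python
import math

FIXED_WIDTH = 1024  # the fixed collage width described above


def collage_size(num_frames, frame_px=144):
    """Hypothetical helper: (width, height) of a grid collage whose width
    grows with the number of images per row instead of being fixed."""
    # Assumed rule: a roughly square grid of frames.
    images_per_row = max(1, round(math.sqrt(num_frames)))
    rows = math.ceil(num_frames / images_per_row)
    return images_per_row * frame_px, rows * frame_px


for n in (2, 10, 60):
    width, height = collage_size(n)
    # Fraction of a fixed 1024px-wide canvas that frames of this size would fill.
    print(n, (width, height), f"{width / FIXED_WIDTH:.2f}")
```

Assuming frames are pasted at a fixed size, a two-frame collage would fill only a narrow strip of a 1024px-wide canvas, leaving the rest black; sizing the canvas to the grid avoids that.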

- it is converted to a different format (tested with .mov)
- it is compressed
- it is downscaled (by 50%)

And, more importantly, it produces different hashes for the two videos I uploaded in the original issue.
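A common way to turn "same hash" into a tolerance-based check, which would also help with the same video occasionally producing slightly different hashes, is to compare the Hamming distance between two bit strings against a threshold. The sketch below assumes hashes are binary strings with a "0b" prefix (check the exact format against the videohash documentation); hash_distance, looks_similar, and the threshold of 4 are illustrative choices.

```python
def hash_distance(hash_a, hash_b):
    """Hamming distance between two equal-length bit strings such as '0b1011'."""
    bits_a = hash_a.removeprefix("0b")  # requires Python 3.9+
    bits_b = hash_b.removeprefix("0b")
    if len(bits_a) != len(bits_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(bits_a, bits_b))


def looks_similar(hash_a, hash_b, threshold=4):
    """Treat two videos as near-duplicates if at most `threshold` bits differ."""
    return hash_distance(hash_a, hash_b) <= threshold


original = "0b" + "1010" * 16                # hypothetical 64-bit hash
recompressed = "0b" + "1010" * 15 + "1011"   # one bit flipped
unrelated = "0b" + "0101" * 16               # every bit flipped

print(hash_distance(original, recompressed))  # 1
print(looks_similar(original, recompressed))  # True
print(looks_similar(original, unrelated))     # False
```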

Link: MikPisula@b4b8f32

MikPisula commented on May 29, 2024

When it comes to performance, perhaps Python's multiprocessing library could be used to speed up the image-manipulation part?
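A minimal sketch of that suggestion with multiprocessing.Pool, assuming the per-frame work is CPU-bound and independent for each frame; frame_brightness is a hypothetical stand-in for the real image manipulation. Pool.map returns results in input order, so they line up with the frames.

```python
from multiprocessing import Pool


def frame_brightness(pixels):
    """Hypothetical per-frame work: mean brightness of a flattened grayscale frame."""
    return sum(pixels) / len(pixels)


def process_frames(frames, workers=None):
    """Apply frame_brightness to every frame in parallel (workers=None uses all CPUs)."""
    with Pool(processes=workers) as pool:
        return pool.map(frame_brightness, frames)


# The __main__ guard is required on platforms that start workers with "spawn"
# (the default on Windows and macOS), which matters for working across devices.
if __name__ == "__main__":
    frames = [[0, 0, 255, 255], [10, 20, 30, 40]]
    print(process_frames(frames, workers=2))  # [127.5, 25.0]
```

Process start-up and pickling overhead can outweigh the gains for small frames, so this only pays off when the per-frame work is substantial.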

Demmenie commented on May 29, 2024

It could, but it has to be done in a way that works across devices. I think an algorithm with decent time complexity would be best. I'm also thinking it might be better to start over than to fork. I'd like to see whether video fingerprinting is possible.

Edit: I just found this: https://pypi.org/project/videofingerprint/
Looks like @akamhy was working on it, but the repo doesn't exist anymore.
(Gonna start a separate issue for speed)
