akamhy / videohash

Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.

Home Page: https://pypi.org/project/videohash

License: MIT License

Python 100.00%

near-duplicate-video-clip-detection near-duplicate-video duplicate-videos find-similar-videos-by-content duplicate-detection video python ffmpeg video-deduplication duplicate-video-finder

videohash's Issues

assets host issue

text.mp4

English translation of the Hindi text (written in the Latin/Roman alphabet):
The third wave of covid-19 is here
Again* I will pass the tests without studying**.

* (exams were canceled in the past year)
** (because of online tests and cheating)

Write an FFmpeg installer for Windows in Python 3 (worth a try if you are good at writing Windows installers)

If you are good at Python, please write a script that downloads the latest FFmpeg build from https://www.gyan.dev/ffmpeg/builds/ffmpeg-git-full.7z and then:

- Uncompresses the archive.
- Copies the bin directory from the decompressed folder into C:\Program Files\ffmpeg.
- Adds C:\Program Files\ffmpeg\bin\ to the PATH environment variable.

See https://github.com/akamhy/videohash/wiki/Install-FFmpeg,-but-how%3F#install-ffmpeg-on-windows.

I have written a script for testing on Windows that you may find useful: https://github.com/akamhy/videohash/blob/main/assets/windows_ffmpeg_downloader_at_cwd.py
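
The copy and PATH steps of the requested installer could look roughly like the sketch below. It is only an illustration: the function names are hypothetical, the download and the 7z extraction are left out (Python's standard library cannot unpack 7z archives, so a third-party tool would be needed), and the PATH value is only computed, not written to the Windows settings.

```python
import os
import shutil
import tempfile
from pathlib import Path


def find_bin_dir(extracted_root):
    """Return the first 'bin' directory found under the extracted archive."""
    for candidate in sorted(Path(extracted_root).rglob("bin")):
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"no bin directory under {extracted_root}")


def install_bin(extracted_root, install_dir):
    """Copy the bin directory into install_dir and return the new bin path."""
    target = Path(install_dir) / "bin"
    shutil.copytree(find_bin_dir(extracted_root), target, dirs_exist_ok=True)
    return target


def path_with(bin_dir, current_path=None):
    """Return a PATH value that includes bin_dir exactly once."""
    current = os.environ.get("PATH", "") if current_path is None else current_path
    entries = [entry for entry in current.split(os.pathsep) if entry]
    if str(bin_dir) not in entries:
        entries.append(str(bin_dir))
    return os.pathsep.join(entries)


# Demonstration against a throwaway directory tree standing in for the archive.
with tempfile.TemporaryDirectory() as scratch:
    fake_bin = Path(scratch) / "extracted" / "ffmpeg-build" / "bin"
    fake_bin.mkdir(parents=True)
    (fake_bin / "ffmpeg.exe").write_text("stub")
    installed = install_bin(Path(scratch) / "extracted", Path(scratch) / "ffmpeg")
    copied_names = sorted(p.name for p in installed.iterdir())
    new_path = path_with(installed, current_path="")
```

On a real machine install_dir would be C:\Program Files\ffmpeg, which normally requires administrator rights to write to.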

CVE-2020-10378 (Medium) detected in Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

CVE-2020-10378 - Medium Severity Vulnerability

Vulnerable Library - Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

Python Imaging Library (Fork)

Library home page: https://files.pythonhosted.org/packages/12/ad/61f8dfba88c4e56196bf6d056cdbba64dc9c5dfdfbc97d02e6472feed913/Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

Path to dependency file: videohash/requirements.txt

Path to vulnerable library: videohash/requirements.txt

Dependency Hierarchy:

❌ Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl (Vulnerable Library)

Found in HEAD commit: 775d08735341d5bb435fcc1501160e1d150f31e0

Found in base branch: main

Vulnerability Details

In libImaging/PcxDecode.c in Pillow before 7.1.0, an out-of-bounds read can occur when reading PCX files where state->shuffle is instructed to read beyond state->buffer.

Publish Date: 2020-06-25

URL: CVE-2020-10378

CVSS 3 Score Details (5.5)

Base Score Metrics:

Exploitability Metrics:
- Attack Vector: Local
- Attack Complexity: Low
- Privileges Required: None
- User Interaction: Required
- Scope: Unchanged
Impact Metrics:
- Confidentiality Impact: High
- Integrity Impact: None
- Availability Impact: None

Suggested Fix

Type: Upgrade version

Origin: python-pillow/Pillow@41b554b

Release Date: 2020-06-25

Fix Resolution: 7.1.0

[Feature Request] Command-line interface.

Hello, everyone. Thank you for an excellent library!

Would it be possible to add an official command-line interface? Something like:

videohash_cmdline.bash video1 video0
99%
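
As a rough idea of what such a command-line front end might involve, here is a minimal argparse sketch. The program name, argument names, and the percentage formula are assumptions made for illustration; nothing here is an existing videohash interface.

```python
import argparse


def build_parser():
    """Create a parser that accepts exactly two video locations."""
    parser = argparse.ArgumentParser(
        prog="videohash", description="Compare two videos by perceptual hash."
    )
    parser.add_argument("first", help="path or URL of the first video")
    parser.add_argument("second", help="path or URL of the second video")
    return parser


def format_similarity(distance, bits=64):
    """Turn a Hamming distance into a percentage of matching bits."""
    return f"{round(100 * (bits - distance) / bits)}%"


# Parse the arguments from the example invocation in the request.
args = build_parser().parse_args(["video1", "video0"])
```

The actual hashing would sit between parsing and formatting; it is omitted so the sketch stays independent of the library.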

ERROR: [generic] None: Unable to download webpage: (caused by URLError('unknown url type: c'))

To Reproduce
Generate the videohash of a video directly from its path:
from videohash import VideoHash

url1 = r"C:\Users\PCNAME\Documents\myapp\static_media\vid1.mp4"
videohash1 = VideoHash(url=url1)

url2 = r"C:\Users\PCNAME\Documents\myapp\static_media\vid2.mp4"
videohash2 = VideoHash(url=url2)

Expected behavior
I want to generate the videohashes of these videos directly from their paths and compare them:
videohash1.is_similar(videohash2)
False
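
The "unknown url type: c" message is consistent with the drive letter being read as a URL scheme, since a Windows path was passed through the url argument. A small, hypothetical helper can decide which keyword to use; it assumes that VideoHash also accepts a path keyword for local files, which should be checked against the installed version.

```python
from urllib.parse import urlparse


def is_remote_url(location):
    """Return True only for http(s) URLs; drive-letter paths return False."""
    return urlparse(location).scheme.lower() in ("http", "https")


def videohash_kwargs(location):
    """Pick the keyword argument to pass for a given location string."""
    return {"url": location} if is_remote_url(location) else {"path": location}


# A Windows path parses with scheme "c", so it is routed to the path keyword.
local_kwargs = videohash_kwargs(r"C:\Users\PCNAME\Documents\myapp\static_media\vid1.mp4")
```

With this in place a call would read VideoHash(**videohash_kwargs(location)).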

Please complete the following information:

Operating system: Windows 8
Python Version: 3.9
VideoHash version: 3.0.1

It appears that OpenCV is faster for grabbing the frames

Benchmark CV and FFmpeg frame generation.

Python subprocess inherits stdin by default, which causes ffmpeg to fail

When running videohash as part of a program that has also used subprocess it seems to inherit the stdin and that can result in various failures for ffmpeg.

I have been documenting it here:
digitalmethodsinitiative/4cat#303 (comment)
Essentially, I can use videohash alone, but not with additional subprocesses unless I edit it and provide it with stdin=subprocess.DEVNULL since the default stdin is in use.

Sending a PR shortly with needed edit.
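
The workaround described above, giving the child process the null device as its stdin, can be sketched as follows. The helper name is hypothetical, and a Python child process stands in for ffmpeg so that the example runs anywhere.

```python
import subprocess
import sys


def run_detached_from_stdin(command):
    """Run a command whose stdin is the null device instead of the parent's."""
    return subprocess.run(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        check=False,
    )


# The child reads its stdin; with DEVNULL it sees end-of-file immediately.
child = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
result = run_detached_from_stdin(child)
```

Because the child never shares the parent's stdin, it cannot consume or block on input that belongs to another subprocess.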

BUG REPORT

Describe the bug
Hash collision for some videos of the same length

To Reproduce

from videohash import VideoHash

v1 = VideoHash(url="https://canvaz.scdn.co/upload/artist/3PhoLpVuITZKcymswpck5b/video/5e966e9c01f147cdae93a02c61a4bf7c.cnvs.mp4")
v2 = VideoHash(url="https://canvaz.scdn.co/upload/licensor/7JGwF0zhX9oItt9901OvB5/video/dc047df48f774d1590b61fd38bc082e4.cnvs.mp4")
print(v1 == v2)

Expected behavior
The hashes should be different, but they are the same.

Screenshots
NA

Please complete the following information:

Operating system: NA
Python Version: NA
VideoHash version: NA

Additional context
The issue can probably be resolved by extracting more features, such as brightness levels or the most dominant colors of frames extracted at a specific FPS, and by increasing the number of hash bits to accommodate the extra data.

Why not use colorhash + whash and change the bit size to 128 (twice the current size of 64)? They generate hashes in very different ways, so collisions should be highly unlikely.
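
To make the 128-bit idea concrete, here is a sketch that concatenates two 64-bit integer hashes and compares the results by Hamming distance. The two inputs stand in for the outputs of two different hashing methods; how those are computed is outside the sketch, and the function names are hypothetical.

```python
def combine_hashes(first, second, bits=64):
    """Concatenate two `bits`-wide integer hashes into one 2*bits-wide hash."""
    mask = (1 << bits) - 1
    return ((first & mask) << bits) | (second & mask)


def hamming_distance(a, b):
    """Count the bit positions where two integer hashes differ."""
    return bin(a ^ b).count("1")


# Two videos that collide on the first hash but differ on the second one.
video_a = combine_hashes(0b0101, 0b0000)
video_b = combine_hashes(0b0101, 0b0011)
```

The combined values only collide when both component hashes collide, which is the property the suggestion relies on.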

add conda-forge

see https://github.com/conda-forge/videohash-feedstock

Method to delete the temp junk

Add a method on the VideoHash object that deletes the temporary files created by the instance.

Temp folder not freeing up

The temp folder (or cache folder on Mac) keeps growing when processing a lot of videos at the same time.

It SHOULD clean up after itself and not leave the working/temp files there.

Currently I have to do it manually when processing ~20k files.
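
One general pattern for this kind of cleanup is an object that owns its temporary directory and removes it either on request or when a with block ends. The class below is a hypothetical sketch of that pattern, not the library's own storage handling.

```python
import shutil
import tempfile
from pathlib import Path


class ScratchSpace:
    """Own a temporary directory and remove it when the work is done."""

    def __init__(self):
        self.path = Path(tempfile.mkdtemp(prefix="videohash_sketch_"))

    def delete(self):
        """Remove the directory and everything in it; safe to call twice."""
        shutil.rmtree(self.path, ignore_errors=True)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, traceback):
        self.delete()


# Files written inside the block disappear together with the directory.
with ScratchSpace() as space:
    (space.path / "frame_0000001.jpeg").write_text("placeholder")
    existed_inside = space.path.exists()
exists_after = space.path.exists()
```

When processing thousands of files, wrapping each video in its own block keeps the disk usage bounded by the videos currently in flight.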

Hash Collision

Describe the bug
Hash collision occurs with videos of the same length and with similar colour schemes.

To Reproduce

v1 = VideoHash(url='https://user-images.githubusercontent.com/47534140/185008752-da1f09c7-a177-4a46-9c64-230744e998c1.mp4')
v2 = VideoHash(url='https://user-images.githubusercontent.com/47534140/185008748-b8922142-37cc-48a0-bad9-1385ba016587.mov')
print (v1 == v2)

Expected behavior
The hashes of the videos should be different.

Screenshots
NA

Please complete the following information:

Operating system: NA
Python Version: 3.10.5
VideoHash version: 3.0.1

Additional context

Improve grammar and fix typo

I am a non-native speaker and I suck at formal grammar. If you are a native speaker or just good at writing doc strings/comments/copy editing please open a pull request. Language must be formal, don't add any jokes or slang.

Thank you!

Long videos might fail in make_tile due to the JPEG format

Problem:
When I try to get the hash of a long video (about 90 minutes), making the tile fails with these errors:
"encoder error -2 when writing image file" and
"Maximum supported image dimension is 65500 pixels".

My current solution:
Instead of just changing frame_interval to shorten the width, I found that this might be a limit of the JPEG format.
So the solution I am using is to change the default output of the function "make_tile" in "tilemaker.py" to PNG format:
save_tiles(tiles, prefix="tile", directory=tiles_dir, file_format="png")

Suggestion:
Now there is no error. However, I am not sure whether this affects the hash value (compared with the one generated from JPEG), or whether it conflicts with any other part of this library.
Since the default value of the "file_format" parameter of the function "save_tiles" is already png, I am confused about why the JPEG format is explicitly given in the function "make_tile".
If there is no other problem, maybe using PNG as the default in "make_tile" would be better, considering some long videos?
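
A middle ground between the two formats would be to fall back to PNG only when the tile is too large for JPEG. The sketch below assumes the 65500-pixel figure from the error message is the only constraint; the function name is hypothetical.

```python
JPEG_MAX_DIMENSION = 65500  # limit quoted in the error message above


def pick_tile_format(width, height):
    """Use JPEG when the image fits its size limit, otherwise fall back to PNG."""
    if max(width, height) <= JPEG_MAX_DIMENSION:
        return "jpeg"
    return "png"


# A wide strip of 5400 frames at 144 pixels each exceeds the JPEG limit.
long_video_format = pick_tile_format(144 * 5400, 144)
```

One caveat: JPEG is lossy and PNG is not, so mixing formats could plausibly produce slightly different hashes for the same content. That would need to be measured before adopting this approach.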

Thank you!

exceptions

https://github.com/akamhy/videohash/blob/main/videohash/vhash.py#L98 is nonsense. We actually could not find the downloaded video.

CVE-2020-35653 (High) detected in Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

CVE-2020-35653 - High Severity Vulnerability

Vulnerable Library - Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

Python Imaging Library (Fork)

Library home page: https://files.pythonhosted.org/packages/12/ad/61f8dfba88c4e56196bf6d056cdbba64dc9c5dfdfbc97d02e6472feed913/Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

Path to dependency file: videohash/requirements.txt

Path to vulnerable library: videohash/requirements.txt

Dependency Hierarchy:

❌ Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl (Vulnerable Library)

Found in HEAD commit: 775d08735341d5bb435fcc1501160e1d150f31e0

Found in base branch: main

Vulnerability Details

In Pillow before 8.1.0, PcxDecode has a buffer over-read when decoding a crafted PCX file because the user-supplied stride value is trusted for buffer calculations.

Publish Date: 2021-01-12

URL: CVE-2020-35653

CVSS 3 Score Details (7.1)

Base Score Metrics:

Exploitability Metrics:
- Attack Vector: Network
- Attack Complexity: Low
- Privileges Required: None
- User Interaction: Required
- Scope: Unchanged
Impact Metrics:
- Confidentiality Impact: Low
- Integrity Impact: None
- Availability Impact: High

Suggested Fix

Type: Upgrade version

Origin: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-35653

Release Date: 2021-01-12

Fix Resolution: 8.1.0

CVE-2020-10177 (Medium) detected in Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

CVE-2020-10177 - Medium Severity Vulnerability

Vulnerable Library - Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

Python Imaging Library (Fork)

Library home page: https://files.pythonhosted.org/packages/12/ad/61f8dfba88c4e56196bf6d056cdbba64dc9c5dfdfbc97d02e6472feed913/Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

Path to dependency file: videohash/requirements.txt

Path to vulnerable library: videohash/requirements.txt

Dependency Hierarchy:

❌ Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl (Vulnerable Library)

Found in HEAD commit: 775d08735341d5bb435fcc1501160e1d150f31e0

Found in base branch: main

Vulnerability Details

Pillow before 7.1.0 has multiple out-of-bounds reads in libImaging/FliDecode.c.

Publish Date: 2020-06-25

URL: CVE-2020-10177

CVSS 3 Score Details (5.5)

Base Score Metrics:

Exploitability Metrics:
- Attack Vector: Local
- Attack Complexity: Low
- Privileges Required: None
- User Interaction: Required
- Scope: Unchanged
Impact Metrics:
- Confidentiality Impact: High
- Integrity Impact: None
- Availability Impact: None

Suggested Fix

Type: Upgrade version

Origin: python-pillow/Pillow@41b554b

Release Date: 2020-06-25

Fix Resolution: 7.1.0

CVE-2020-10994 (Medium) detected in Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

CVE-2020-10994 - Medium Severity Vulnerability

Vulnerable Library - Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

Python Imaging Library (Fork)

Library home page: https://files.pythonhosted.org/packages/12/ad/61f8dfba88c4e56196bf6d056cdbba64dc9c5dfdfbc97d02e6472feed913/Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

Path to dependency file: videohash/requirements.txt

Path to vulnerable library: videohash/requirements.txt

Dependency Hierarchy:

❌ Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl (Vulnerable Library)

Found in HEAD commit: 775d08735341d5bb435fcc1501160e1d150f31e0

Found in base branch: main

Vulnerability Details

In libImaging/Jpeg2KDecode.c in Pillow before 7.1.0, there are multiple out-of-bounds reads via a crafted JP2 file.

Publish Date: 2020-06-25

URL: CVE-2020-10994

CVSS 3 Score Details (5.5)

Base Score Metrics:

Exploitability Metrics:
- Attack Vector: Local
- Attack Complexity: Low
- Privileges Required: None
- User Interaction: Required
- Scope: Unchanged
Impact Metrics:
- Confidentiality Impact: None
- Integrity Impact: None
- Availability Impact: High

Suggested Fix

Type: Upgrade version

Origin: python-pillow/Pillow@41b554b

Release Date: 2020-06-25

Fix Resolution: 7.1.0

CVE-2020-35654 (High) detected in Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

CVE-2020-35654 - High Severity Vulnerability

Vulnerable Library - Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

Python Imaging Library (Fork)

Library home page: https://files.pythonhosted.org/packages/12/ad/61f8dfba88c4e56196bf6d056cdbba64dc9c5dfdfbc97d02e6472feed913/Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

Path to dependency file: videohash/requirements.txt

Path to vulnerable library: videohash/requirements.txt

Dependency Hierarchy:

❌ Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl (Vulnerable Library)

Found in HEAD commit: 775d08735341d5bb435fcc1501160e1d150f31e0

Found in base branch: main

Vulnerability Details

In Pillow before 8.1.0, TiffDecode has a heap-based buffer overflow when decoding crafted YCbCr files because of certain interpretation conflicts with LibTIFF in RGBA mode.

Publish Date: 2021-01-12

URL: CVE-2020-35654

CVSS 3 Score Details (8.8)

Base Score Metrics:

Exploitability Metrics:
- Attack Vector: Network
- Attack Complexity: Low
- Privileges Required: None
- User Interaction: Required
- Scope: Unchanged
Impact Metrics:
- Confidentiality Impact: High
- Integrity Impact: High
- Availability Impact: High

Suggested Fix

Type: Upgrade version

Origin: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-35654

Release Date: 2021-01-12

Fix Resolution: 8.1.0

BUG REPORT: AttributeError due to PIL.Image v10+ dropping ANTIALIAS

Describe the bug

VideoHash results in:

AttributeError: module 'PIL.Image' has no attribute 'ANTIALIAS'

This is because Pillow 10 removed the deprecated ANTIALIAS constant; the fix appears to be switching to LANCZOS or pinning Pillow below 10:

https://pillow.readthedocs.io/en/stable/releasenotes/10.0.0.html#constants

To Reproduce

Install via pip with PIL and pillow from brew (currently 10.0.0)

VideoHash("path/to.mp4")

Expected behavior

Object with .hash and no AttributeError

Screenshots

N/A

Please complete the following information:

Operating system: macOS Ventura 13.5
Python Version: 3.11.4
VideoHash version: 3.0.1

Additional context

I think this would be a good first issue for another contributor. Should I attempt a PR?
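
A version-tolerant lookup along the lines suggested above might look like this sketch. The helper name is hypothetical, and plain namespace objects stand in for the PIL.Image module so the example does not depend on which Pillow version is installed.

```python
from types import SimpleNamespace


def lanczos_filter(image_module):
    """Return the Lanczos resampling constant under whichever name exists."""
    for name in ("LANCZOS", "ANTIALIAS"):
        if hasattr(image_module, name):
            return getattr(image_module, name)
    raise AttributeError("no Lanczos-style resampling filter found")


# Stand-ins for PIL.Image before and after the removal (values are placeholders).
old_style = SimpleNamespace(ANTIALIAS=1)
new_style = SimpleNamespace(LANCZOS=1)
```

In real code the argument would be the imported PIL.Image module, and the result would be passed as the resampling filter when resizing.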

[WARNING] False Positive Issues

Currently, we are experiencing a high number of false positives when utilizing this library. In our scenario, approximately 70% of the results are false positives, which significantly impacts the accuracy of our application.

To address this issue, I suggest the following prechecks before using the library:

- Preprocessing based on video length: Consider adding a preprocessing step that filters out videos shorter than 1 minute. This criterion can help eliminate short videos, which often contribute to false positive matches.
- Similarity threshold adjustment: Make the similarity threshold used by the library more stringent, so that only videos with a higher degree of similarity are treated as matches. This adjustment can significantly improve the precision of the matching process.
- Comparison of video durations: Add a check on how close the video durations are when assessing similarity, so that two videos are not considered similar if their durations differ significantly. This reduces false positives caused by videos of vastly different lengths.

Still, thanks to the author for providing this library for low-cost comparison. If you are using it in a serious scenario, I suggest using it like a Bloom filter: run a more intensive algorithm after a positive result.
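
The three prechecks can be combined into a single cheap filter, sketched below. The dictionary layout, the tolerance, the minimum length, and the maximum Hamming distance are all illustrative assumptions; a True result is meant to trigger a slower confirmation step, not to be final.

```python
def durations_close(first_seconds, second_seconds, tolerance=0.1):
    """True when the durations differ by at most `tolerance` of the longer one."""
    longer = max(first_seconds, second_seconds)
    if longer == 0:
        return True
    return abs(first_seconds - second_seconds) / longer <= tolerance


def is_candidate_match(first, second, min_seconds=60.0, max_distance=5):
    """Cheap pre-filter; a True result still needs a slower confirmation step."""
    if min(first["duration"], second["duration"]) < min_seconds:
        return False
    if not durations_close(first["duration"], second["duration"]):
        return False
    distance = bin(first["hash"] ^ second["hash"]).count("1")
    return distance <= max_distance


# Hypothetical records: duration in seconds plus an integer hash.
original = {"duration": 120.0, "hash": 0b1010}
reupload = {"duration": 125.0, "hash": 0b1011}
short_clip = {"duration": 30.0, "hash": 0b1010}
```

Lowering max_distance makes the filter stricter, at the cost of missing more heavily edited copies.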

Video hashes of vastly different videos make is_similar() return True

Would modifying similar_percentage help? If so, which direction should I go?

Alphanumeric string sorter

We need to sort the calculated hashes in order to support a deep scan. This can be used to match videos even if they have been edited significantly.

support windows, use join

currently broken on Windows

in operator to test if another image or video is part of the hash object

We want to support comparing images and videos against the hash. We can store the hashes of all the frames in memory and check for the lowest score or a user-defined score.

CVE-2020-35655 (Medium) detected in Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

CVE-2020-35655 - Medium Severity Vulnerability

Vulnerable Library - Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

Python Imaging Library (Fork)

Library home page: https://files.pythonhosted.org/packages/12/ad/61f8dfba88c4e56196bf6d056cdbba64dc9c5dfdfbc97d02e6472feed913/Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

Path to dependency file: videohash/requirements.txt

Path to vulnerable library: videohash/requirements.txt

Dependency Hierarchy:

❌ Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl (Vulnerable Library)

Found in HEAD commit: 775d08735341d5bb435fcc1501160e1d150f31e0

Found in base branch: main

Vulnerability Details

In Pillow before 8.1.0, SGIRleDecode has a 4-byte buffer over-read when decoding crafted SGI RLE image files because offsets and length tables are mishandled.

Publish Date: 2021-01-12

URL: CVE-2020-35655

CVSS 3 Score Details (5.4)

Base Score Metrics:

Exploitability Metrics:
- Attack Vector: Network
- Attack Complexity: Low
- Privileges Required: None
- User Interaction: Required
- Scope: Unchanged
Impact Metrics:
- Confidentiality Impact: Low
- Integrity Impact: None
- Availability Impact: Low

Suggested Fix

Type: Upgrade version

Origin: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-35655

Release Date: 2021-01-12

Fix Resolution: 8.1.0

download video quality flag

Add an option to choose whether the "worst" quality flag is set (a youtube-dl/yt-dlp feature).

Design a vector (SVG) logo for this project

Please design a vector logo for this project.
The logo must contain the project name 'videohash'.

You must release the logo under the MIT License (the same license as the project).

I want the output in SVG and PNG formats. The logo must have a transparent background and should look professional.

Examples of logos of some real projects that I like. The new logo can be of a similar design but MUST NOT be exactly copied.

Both the PNG and SVG formats should be inside the assets directory in the pull request.

~~This issue is not gonna get assigned before you open a pull request but the one I like the most is gonna be selected. Please open a pull request only if you are good at making vector logos.~~ Thank you!

BUG REPORT - MAKE the -f worst optional

Describe the bug
The download fails on reddit.

To Reproduce
less than or equal to v2.1.7

Python 3.9.0 (default, Oct 21 2021, 15:27:22) 
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> url1 = "https://www.reddit.com/r/IndianDankMemes/comments/rn2yxa/ha_bhai_normi_hu_mai/"
>>> from videohash import VideoHash
>>> url1 = "https://www.reddit.com/r/IndianDankMemes/comments/rn2yxa/ha_bhai_normi_hu_mai/"
>>> url2 = "https://www.reddit.com/r/IndianDankMemes/comments/rmw1o9/i_am_happy_i_am_happy_i_am_happi_today/"
>>> videohash1 = VideoHash(url=url1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/akamhy/projects/benchmark_videohash/venv/lib/python3.9/site-packages/videohash/videohash.py", line 85, in __init__
    self._copy_video_to_video_dir()
  File "/home/akamhy/projects/benchmark_videohash/venv/lib/python3.9/site-packages/videohash/videohash.py", line 288, in _copy_video_to_video_dir
    Download(
  File "/home/akamhy/projects/benchmark_videohash/venv/lib/python3.9/site-packages/videohash/downloader.py", line 51, in __init__
    self.download_video()
  File "/home/akamhy/projects/benchmark_videohash/venv/lib/python3.9/site-packages/videohash/downloader.py", line 85, in download_video
    raise DownloadFailed(
videohash.exceptions.DownloadFailed: '/home/akamhy/projects/benchmark_videohash/venv/bin/yt-dlp' failed to download the video at 'https://www.reddit.com/r/IndianDankMemes/comments/rn2yxa/ha_bhai_normi_hu_mai/'.
[Reddit] rn2yxa: Downloading JSON metadata
[Reddit] rn2yxa: Downloading m3u8 information
[Reddit] rn2yxa: Downloading MPD manifest

ERROR: [Reddit] k4nqp99cdc781: Requested format is not available

>>> videohash1 = VideoHash(url=url1, download_worst=False)
>>> videohash2 = VideoHash(url=url2, download_worst=False)
>>> videohash1 - videohash2
4
>>>

Expected behavior
Download the video without any extra arguments.

Please complete the following information:

Operating system: NA
Python Version: NA
VideoHash version: NA

Additional context
I don't use Reddit but a friend of mine was using videohash to search posts by templates.
Both the URLs use the same template.

Add an option to choose any hash the user prefers from imagehash

Crop black bars from video

Should probably use [ffmpeg.git] / libavfilter /vf_cropdetect.c or just use python to crop the frames post extraction.

The black bars aren't an issue if they occupy less than 15% of the area but they are quite problematic if the area occupied is more than 15%. The issue seems fixable.

add a duration attribute to the videohash objects

- The unit is seconds.
- The return type is always float.
- Add tests for the rocket.mkv video.

Video paths containing spaces break ffmpeg calls

When using the from_path method of hashing a video, if the path to the video contains any spaces, it breaks the ffmpeg commands given to subprocess.Popen. This is because:

- The paths within the command are not enclosed in quotation marks, so ffmpeg interprets only the part of the path before the first whitespace as the target path, and the rest of the path as additional, invalid arguments.
- The command given to subprocess.Popen is split on spaces before being interpreted, again forcing the system to interpret different parts of the path as new arguments.

This is easily fixed by inserting escaped quotation marks around any paths in the ffmpeg commands, dropping the .split() operation on the command, and setting shell=True in Popen.

I've taken the liberty of including the updated functions here:

from os.path import join
from subprocess import DEVNULL, STDOUT, Popen


def frames(input_file, output_prefix):
    """Extract the frames of the video.
    Export frames as images at output_prefix as a 7 digit padded jpeg file.
    """
    command = "ffmpeg -i \"{input_file}\" -r 1 \"{output_prefix}_%07d.jpeg\"".format(
        input_file=input_file, output_prefix=output_prefix
    )
    process = Popen(command, shell=True, stdout=DEVNULL, stderr=STDOUT)
    output, error = process.communicate()


def compressor(input_file, task_dir, task_uid):
    # APPLY : ffmpeg -i input.webm -s 64x64 -r 30  output.mp4

    output_file = join(task_dir, task_uid + "compressed.mp4")
    command = "ffmpeg -i \"{input_file}\" -s 64x64 -r 30 \"{output_file}\"".format(
        input_file=input_file, output_file=output_file
    )
    process = Popen(command, shell=True, stdout=DEVNULL, stderr=STDOUT)
    output, error = process.communicate()

    return output_file

Hope you find this useful! Thanks for the great module!
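
An alternative to quoting paths for a shell is to pass the command as a list of arguments, in which case no shell is involved and spaces need no escaping. This is a sketch of that variation rather than the fix proposed above; the helper names are hypothetical, while the ffmpeg arguments are the ones used in the issue.

```python
import subprocess


def frames_command(input_file, output_prefix):
    """Build the frame-extraction command as a list; no quoting is needed."""
    return [
        "ffmpeg",
        "-i", input_file,
        "-r", "1",
        f"{output_prefix}_%07d.jpeg",
    ]


def run_quietly(command):
    """Run the command without a shell, discarding its console output."""
    return subprocess.run(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.STDOUT,
        check=False,
    )


# Each path stays a single argument even though it contains spaces.
command = frames_command("/tmp/my videos/a b.mp4", "/tmp/out dir/frame")
```

Avoiding shell=True also means that unusual characters in file names are never interpreted by a shell.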

CVE-2020-11538 (High) detected in Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

CVE-2020-11538 - High Severity Vulnerability

Vulnerable Library - Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

Python Imaging Library (Fork)

Library home page: https://files.pythonhosted.org/packages/12/ad/61f8dfba88c4e56196bf6d056cdbba64dc9c5dfdfbc97d02e6472feed913/Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

Path to dependency file: videohash/requirements.txt

Path to vulnerable library: videohash/requirements.txt

Dependency Hierarchy:

❌ Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl (Vulnerable Library)

Found in HEAD commit: 775d08735341d5bb435fcc1501160e1d150f31e0

Found in base branch: main

Vulnerability Details

In libImaging/SgiRleDecode.c in Pillow through 7.0.0, a number of out-of-bounds reads exist in the parsing of SGI image files, a different issue than CVE-2020-5311.

Publish Date: 2020-06-25

URL: CVE-2020-11538

CVSS 3 Score Details (8.1)

Base Score Metrics:

Exploitability Metrics:
- Attack Vector: Network
- Attack Complexity: High
- Privileges Required: None
- User Interaction: None
- Scope: Unchanged
Impact Metrics:
- Confidentiality Impact: High
- Integrity Impact: High
- Availability Impact: High

Suggested Fix

Type: Upgrade version

Origin: python-pillow/Pillow@41b554b

Release Date: 2020-06-25

Fix Resolution: 7.1.0

Don't call youtube-dl via the CLI; import it instead.

On shared OS with limited access, we might not be able to access the CLI if it's in a venv.
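
An in-process download could be sketched as follows, assuming the yt-dlp package is installed and used through its YoutubeDL class. The function names and the download_worst parameter are illustrative; the option keys (outtmpl, quiet, format) are standard yt-dlp options.

```python
def build_download_options(output_template, download_worst=False):
    """Build an options dict for yt_dlp.YoutubeDL (format only set on request)."""
    options = {"outtmpl": output_template, "quiet": True}
    if download_worst:
        options["format"] = "worst"
    return options


def download(url, output_template, download_worst=False):
    """Download in-process instead of spawning the yt-dlp executable."""
    import yt_dlp  # third-party; imported lazily so the sketch loads without it

    options = build_download_options(output_template, download_worst)
    with yt_dlp.YoutubeDL(options) as downloader:
        return downloader.download([url])


# Only the option building is exercised here; no network access is needed.
default_options = build_download_options("video.%(ext)s")
worst_options = build_download_options("video.%(ext)s", download_worst=True)
```

Because nothing is looked up on PATH, this works even when the executable inside a virtual environment is not reachable from the shell.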

[Feature Request] Hash based on limited number of frames

My use case for this software only requires comparing the hash of the first few seconds of video for hundreds of files of varying lengths. This is part of a classification task, i.e. I have a lot of files and want to classify them based on the contents of the first few seconds.

I could create a script which trims the videos all to 2-3 seconds long then use videohash on those clips, then relate those results back to their original clip but it would be great if videohash could handle all of this for me.

What I imagine is something like a max_frames parameter added to VideoHash.

eg. videohash.VideoHash(..., frame_interval=0.2, max_frames=10) would provide me a hash based on 10 frames from the first ~2 seconds of video.

I could also see perhaps setting a time range being handy instead, eg. start_time: '2:00', end_time: '2:30' would hash only that 30 second clip from the video. This would solve my use case but also be a more general solution for other use cases, though I think it may be a little more nuanced to implement vs. the first proposal.

I am interested to hear the maintainers' thoughts on this, as I might be able to tackle a solution if there's interest.
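
For the time-range variant, the first step would be turning strings such as '2:00' and '2:30' into a start offset and a duration. The sketch below shows only that conversion; start_time and end_time are the hypothetical parameters from the proposal, and the function names are made up.

```python
def parse_timestamp(text):
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def clip_range(start_time, end_time):
    """Return (start, duration) in seconds for the requested time window."""
    start, end = parse_timestamp(start_time), parse_timestamp(end_time)
    if end <= start:
        raise ValueError("end_time must be later than start_time")
    return start, end - start


# The example from the proposal: a 30 second clip starting at two minutes.
start, duration = clip_range("2:00", "2:30")
```

The resulting pair maps naturally onto ffmpeg's -ss (start position) and -t (duration) options when extracting frames.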

performance

for same length video it's great, but we should probably make it length agnostic

Convert files to a consistent container/codec

Doing this just works; maybe FFmpeg is not consistent in frame extraction across different containers and codecs.

Change Frame Interval

It would be great to expose access to the frame interval. I quite often work with very long or very short videos, and it would be great to be able to specify the frame interval to check. Alternatively, it would help to be able to select a total number of frames and have that number of frames selected at random from across the video.

Feature Request: serialize and deserialize hash result

I'm trying to hash some videos and save the hashes to a database, then check a new video against them.

I hope there can be a Hash object representing a video hash that can be serialized to and deserialized from a string or bytes, unlike the current VideoHash, which can only be calculated from a video source.
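
If the hash is available as a 64-bit integer, storing and restoring it needs very little code. The sketch below shows hex-string and byte encodings with hypothetical function names; obtaining the integer from a VideoHash object is outside its scope.

```python
def hash_to_hex(value, bits=64):
    """Serialize an integer hash as a fixed-width lowercase hex string."""
    return format(value, f"0{bits // 4}x")


def hash_from_hex(text):
    """Deserialize a hex string produced by hash_to_hex."""
    return int(text, 16)


def hash_to_bytes(value, bits=64):
    """Serialize an integer hash as big-endian bytes."""
    return value.to_bytes(bits // 8, "big")


def hash_from_bytes(data):
    """Deserialize bytes produced by hash_to_bytes."""
    return int.from_bytes(data, "big")


# Round trip of an example value with leading zero bits.
example = 0x00F0A1B2C3D4E5F6
stored_text = hash_to_hex(example)
stored_bytes = hash_to_bytes(example)
```

Stored hashes can then be compared with a new video's hash by XOR and a bit count, without reprocessing the original videos.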

Hashing speed issue.

Describe the bug
It takes quite a while to hash a video.

To Reproduce

from videohash import VideoHash
import time

start = time.time()

url = 'https://user-images.githubusercontent.com/47534140/185008752-da1f09c7-a177-4a46-9c64-230744e998c1.mp4'
v1 = VideoHash(url=url, frame_interval=12)

print(f"Finished in {time.time() - start} secs")

Expected behavior
It should realistically be doable in under a second

Please complete the following information:

Operating system: Windows 10
Python Version: 3.10.2
VideoHash version: 2.1.9

Additional context
It currently takes about 3 to 4 seconds.

CVE-2020-10379 (High) detected in Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

CVE-2020-10379 - High Severity Vulnerability

Vulnerable Library - Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

Python Imaging Library (Fork)

Library home page: https://files.pythonhosted.org/packages/12/ad/61f8dfba88c4e56196bf6d056cdbba64dc9c5dfdfbc97d02e6472feed913/Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl

Path to dependency file: videohash/requirements.txt

Path to vulnerable library: videohash/requirements.txt

Dependency Hierarchy:

❌ Pillow-6.2.2-cp27-cp27mu-manylinux1_x86_64.whl (Vulnerable Library)

Found in HEAD commit: 775d08735341d5bb435fcc1501160e1d150f31e0

Found in base branch: main

Vulnerability Details

In Pillow before 7.1.0, there are two Buffer Overflows in libImaging/TiffDecode.c.

Publish Date: 2020-06-25

URL: CVE-2020-10379

CVSS 3 Score Details (7.8)

Base Score Metrics:

Exploitability Metrics:
- Attack Vector: Local
- Attack Complexity: Low
- Privileges Required: None
- User Interaction: Required
- Scope: Unchanged
Impact Metrics:
- Confidentiality Impact: High
- Integrity Impact: High
- Availability Impact: High

Suggested Fix

Type: Upgrade version

Origin: python-pillow/Pillow@41b554b

Release Date: 2020-06-25

Fix Resolution: 7.1.0

Videohash 'is_similar' function returns True for different videos

BUG REPORT
The videohash function is_similar returns True even when the videos are different.

NOTES:
One thing I noticed is that the is_similar function does not seem to be implemented correctly: it sometimes returns True even though the videos are completely different. I expect is_similar to return True only for videos that share at least some common characteristics, such as video length.

How can two videos be equal if their lengths are not even similar? I would add a check before generating the hash value.

So, the videohash code should also take video length into consideration when comparing two videos for similarity.
