At present difPy evaluates every file in folders it's pointed to, ascertaining whether

Enhancement: ignore files where file extension is not of a known image type (duplicate-image-finder, 5 comments, closed)

elisemercury commented on June 18, 2024

Enhancement: ignore files where file extension is not of a known image type

from duplicate-image-finder.

Comments (5)

elisemercury commented on June 18, 2024

Hi @audiomuze,

Thanks a lot for the suggestion! Indeed I think this could speed up the process a bit and make difPy more efficient. This will be considered for the next update!

Again, thank you and best,
Elise

elisemercury commented on June 18, 2024

@UplandsDynamic, thanks for looking into this issue!

I looked into it as well, as part of the update for v3.0.9, but came to the preliminary conclusion that the current implementation might actually be the most accurate one. Let me explain why:

Pillow has no way (at least I didn't find any, maybe you will?) of programmatically accessing which filetypes are supported. Therefore, we would need to manually define which filetypes difPy should attempt to decode and which it should skip. That list would have to be updated by hand, and if Pillow added or removed support for a format without difPy being updated, files that Pillow could actually have decoded successfully might be skipped.

I'm happy to hear if you have another idea or approach that could be implemented here. I think it could be very useful; we just need to find the most efficient way to do it. 🙂

Thanks and best,
Elise
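Regarding the question above of querying Pillow programmatically: Pillow does expose `PIL.Image.registered_extensions()`, which returns a dict mapping each registered file extension (e.g. `".png"`) to a format name. Some registered formats are save-only, so the minimal sketch below keeps only extensions whose format also appears in `Image.OPEN`, the module-level dict of formats Pillow can open. `Image.OPEN` is an internal attribute rather than documented API, and both helper names are hypothetical, so treat this as a sketch that may need adjusting across Pillow versions, not as difPy's implementation.

```python
def filter_openable(ext_to_format: dict[str, str], openable: set[str]) -> set[str]:
    """Keep only extensions whose format Pillow can open, not just save."""
    return {ext.lower() for ext, fmt in ext_to_format.items() if fmt in openable}


def pillow_readable_extensions() -> set[str]:
    """Return lowercase extensions (e.g. '.png') that the installed Pillow can open.

    Hypothetical helper. Assumption: Image.OPEN (internal) is fully populated
    once registered_extensions() has loaded all plugins.
    """
    # Imported lazily so filter_openable() can be used without Pillow installed.
    from PIL import Image

    ext_to_format = Image.registered_extensions()  # e.g. {'.png': 'PNG', ...}
    return filter_openable(ext_to_format, set(Image.OPEN))
```

Because the list is built from whatever plugins the installed Pillow registers, it would track Pillow upgrades without a hand-maintained tuple; the exact set of extensions still depends on the Pillow version in use.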

UplandsDynamic commented on June 18, 2024

@elisemercury Ah I see your thinking, yes that could be a problem.

I've almost finished implementing that since my last message (it didn't take long) and was about to test it. I hardcoded a tuple of common extensions, which would necessarily need to be updated as and when Pillow changes, unless, as you say, Pillow could be programmatically queried for that information (which would be very useful!).

In my implementation, the user may choose to filter out 'invalid' extensions using an optional input argument, like the other features. If that boolean is set to True, files that do not have a valid extension are excluded during the directory scanning process. I defined a filter method in the _help class that returns lists of 'valid' and skipped files; the skipped files are then added to the output stats (either just a count, or a list of the skipped files if verbose logging is on).

I'll go ahead and finish testing and submit a pull request with what I've already done, then have a think about the best way to address the point you raised. As I say, it didn't take long to code, so if you decide not to merge it, no worries! You may well have a better approach to the implementation in any case, but my effort will be there if you want it.

Cheers,
Dan
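The approach Dan describes can be sketched roughly as follows. This is not the code from his pull request: the names `VALID_EXTENSIONS`, `filter_by_extension`, `skipped_stats` and the `filter_extensions` flag are hypothetical, and the extension tuple is only an illustrative guess. It shows an opt-in filter that splits scanned files into valid and skipped lists, and reports skips as a count or, with verbose logging, as a list.

```python
from pathlib import Path

# Hypothetical hardcoded allow-list; it would need manual updates whenever
# Pillow's supported formats change.
VALID_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tif", ".tiff", ".webp")


def filter_by_extension(files, filter_extensions=False):
    """Split file paths into (valid, skipped) lists.

    filter_extensions is a hypothetical opt-in flag: when False, every file
    is kept, so all files are still handed to the decoder as before.
    """
    if not filter_extensions:
        return list(files), []
    valid, skipped = [], []
    for f in files:
        # Compare suffixes case-insensitively so 'IMG.JPG' is accepted.
        if Path(f).suffix.lower() in VALID_EXTENSIONS:
            valid.append(f)
        else:
            skipped.append(f)
    return valid, skipped


def skipped_stats(skipped, verbose=False):
    """Report skipped files as a count, or as the full list when verbose."""
    return {"skipped_files": list(skipped) if verbose else len(skipped)}
```

For example, `filter_by_extension(["a.JPG", "b.txt"], filter_extensions=True)` returns `(["a.JPG"], ["b.txt"])`, while the default call keeps both files. Keeping the flag off by default preserves the decode-everything behaviour Elise argued for.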

audiomuze commented on June 18, 2024

@UplandsDynamic I think your approach is 100% appropriate. Having the ability to choose whether or not to process every file (*) gives one the best of both worlds: analyze everything, or analyze only known file extensions. I know which switch I'd be choosing.

elisemercury commented on June 18, 2024

Hi @audiomuze,

Thanks for your feedback - this is very helpful. The feature has been implemented in v3.0.10. Thanks again @UplandsDynamic for your contributions.

Best,
Elise
