
Comments (5)

elisemercury commented on June 18, 2024

Hi @audiomuze,

Thanks a lot for the suggestion! Indeed I think this could speed up the process a bit and make difPy more efficient. This will be considered for the next update!

Again thank you and best,
Elise

from duplicate-image-finder.

elisemercury commented on June 18, 2024

@UplandsDynamic, thanks for looking into this issue!

I looked into it as well as part of the update for v3.0.9, but came to the preliminary conclusion that the current implementation might actually be the most accurate one. Let me explain why:

As far as I can tell, Pillow has no way of programmatically reporting which file types it supports (at least I didn't find one, maybe you will?). We would therefore need to define manually which file types difPy attempts to decode and which it skips. That list would have to be maintained by hand, and if Pillow added or removed a format without difPy being updated, difPy might skip files that Pillow could actually have decoded successfully.

I'm happy to hear if you have another idea or approach that could be implemented here. I think it can be very useful, we just need to find what is the most efficient way to do so. 🙂

Thanks and best,
Elise


UplandsDynamic commented on June 18, 2024

@elisemercury Ah I see your thinking, yes that could be a problem.

I've almost finished implementing that since my last message (it didn't take long), and was about to test. I hardcoded a tuple of common extensions, which would need to be updated as and when Pillow changes, unless, as you say, Pillow could be programmatically queried for that information (which would be very useful!).
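On the question of querying Pillow: a minimal sketch, assuming a reasonably recent Pillow, of reading the extensions it has registered through `PIL.Image.registered_extensions()`. The helper name `supported_extensions` and the fallback set are hypothetical and not part of difPy. Note that a registered extension is not a guarantee that Pillow can decode the file, since some plugins only write or only identify a format.

```python
# Hypothetical fallback, used only when Pillow is not installed.
FALLBACK_EXTENSIONS = frozenset(
    {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tiff", ".webp"}
)


def supported_extensions():
    """Return the lowercase file extensions Pillow has registered.

    Falls back to a hard-coded set if Pillow cannot be imported.
    """
    try:
        from PIL import Image
    except ImportError:
        return FALLBACK_EXTENSIONS
    # registered_extensions() loads all plugins and returns a dict
    # mapping extension (e.g. ".png") to format name (e.g. "PNG").
    return frozenset(ext.lower() for ext in Image.registered_extensions())
```

If this holds up in testing, the hard-coded tuple could serve purely as a fallback rather than as the primary list.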

In my implementation, the user may select the option to filter out 'invalid' extensions, using an optional input arg, as per the other features. If that boolean is set to True, the files that do not have a valid extension are removed from play during the directory scanning process. I defined a filter method in the _help class, that returns a list of 'valid' and skipped files; the skipped are then added to the output stats (either just a count, or a list of skipped files, if verbose logging is on).
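A rough sketch of the kind of filter described above, not the actual code from the pull request. The names `VALID_EXTENSIONS` and `filter_by_extension` are assumptions made for illustration, as is the exact list of extensions.

```python
from pathlib import Path

# Hypothetical hard-coded tuple of common image extensions.
VALID_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tiff", ".webp")


def filter_by_extension(paths, valid_extensions=VALID_EXTENSIONS):
    """Split paths into (valid, skipped) based on their file extension.

    The comparison is case-insensitive, so "photo.JPG" counts as valid.
    """
    allowed = {ext.lower() for ext in valid_extensions}
    valid, skipped = [], []
    for path in paths:
        if Path(path).suffix.lower() in allowed:
            valid.append(path)
        else:
            skipped.append(path)
    return valid, skipped


# The skipped list can feed the output stats: a count by default,
# or the full list of file names when verbose logging is enabled.
valid, skipped = filter_by_extension(["a.JPG", "notes.txt", "c.png"])
print(len(skipped), skipped)
```

Returning both lists keeps the scan a single pass and leaves it to the caller to decide how much detail to report.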

I'll go ahead and finish testing and submit a pull request with what I've already done, then have a think about the best way to address the point you raised. As I say, it didn't take long to code, so if you decide not to merge, then no worries! You may well have a better approach to implementation in any case, but my effort will be there if you want it.

Cheers,
Dan


audiomuze commented on June 18, 2024

@UplandsDynamic I think your approach is 100% appropriate. Having the ability to select whether or not to process * gives one the best of both worlds - analyze everything or analyze only known file extensions. I know which switch I'd be choosing.


elisemercury commented on June 18, 2024

Hi @audiomuze,

Thanks for your feedback - this is very helpful. The feature has been implemented in v3.0.10. Thanks again @UplandsDynamic for your contributions.

Best,
Elise

