Hi @audiomuze,
Thanks a lot for the suggestion! Indeed, I think this could speed up the process a bit and make difPy more efficient. This will be considered for the next update!
Again thank you and best,
Elise
from duplicate-image-finder.
@UplandsDynamic, thanks for looking into this issue!
I also looked into it as part of the update for v3.0.9, but came to the preliminary conclusion that the current implementation might actually be the most accurate one. Let me explain why:
Pillow has no way (at least I didn't find one; maybe you will?) of programmatically accessing which file types are supported. Therefore, we would need to manually define which file types difPy should attempt to decode and which it should skip. This would mean maintaining the list of supported file types by hand, and should Pillow add or remove some without difPy being updated, files that Pillow could actually have decoded successfully might be skipped.
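For reference, here is a sketch of querying Pillow at runtime, assuming a Pillow version that provides the documented `PIL.Image.registered_extensions()` function. That function returns a dict mapping each extension registered by a loaded plugin (such as `'.jpg'`) to a format name. The function name `supported_extensions` and the fallback tuple are hypothetical. The mapping can also include formats Pillow can only save (PDF, for example), so a decode can still fail and a try/except around the actual decode would still be needed.

```python
# Hypothetical fallback list, only used when Pillow cannot be imported.
FALLBACK_EXTENSIONS = (".bmp", ".gif", ".jpeg", ".jpg", ".png", ".tif", ".tiff", ".webp")


def supported_extensions() -> frozenset:
    """Return the lowercase extensions (e.g. '.jpg') known to Pillow's plugins.

    Uses PIL.Image.registered_extensions(), which maps every extension
    registered by a loaded plugin to its format name. This may include
    save-only formats, so decoding can still fail for some files.
    Falls back to a hardcoded tuple if Pillow is not installed.
    """
    try:
        from PIL import Image
    except ImportError:
        return frozenset(FALLBACK_EXTENSIONS)
    return frozenset(ext.lower() for ext in Image.registered_extensions())
```

Because the set is built from Pillow's own registry, it follows whatever plugins the installed Pillow version ships, instead of a list that has to be kept in sync by hand.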
I'm happy to hear if you have another idea or approach that could be implemented here. I think it can be very useful, we just need to find what is the most efficient way to do so. 🙂
Thanks and best,
Elise
@elisemercury Ah I see your thinking, yes that could be a problem.
I've almost finished implementing it since my last message (it didn't take long) and was about to test it. I hardcoded a tuple of common extensions, which would necessarily need to be updated as and when Pillow changes - unless, as you say, Pillow could be programmatically queried for that information (which would be very useful!).
In my implementation, the user can choose to filter out 'invalid' extensions via an optional input argument, as with the other features. If that boolean is set to True, files that do not have a valid extension are excluded during the directory scan. I defined a filter method in the _help class that returns lists of 'valid' and skipped files; the skipped files are then added to the output stats (either just a count or, if verbose logging is on, a list of the skipped files).
I'll go ahead and finish testing and submit a pull request with what I've already done, then have a think about the best way to address the point you raised. As I say, it didn't take long to code, so if you decide not to merge, then no worries! You may well have a better approach to the implementation in any case, but my effort will be there if you want it.
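The filtering step described above might look roughly like this. It is a minimal sketch, not the code from the pull request: `filter_extensions`, `scan`, the `filter_by_extension` flag and the extension tuple are all hypothetical names. It compares only the final suffix via `pathlib.Path.suffix`, so a name like `holiday.2023.JPG` is judged by `.JPG` rather than by whatever follows the first dot.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Hypothetical hardcoded tuple of "valid" extensions.
VALID_EXTENSIONS = (".bmp", ".gif", ".jpeg", ".jpg", ".png", ".tif", ".tiff", ".webp")


def filter_extensions(
    paths: Iterable[str], valid_extensions: Tuple[str, ...] = VALID_EXTENSIONS
) -> Tuple[List[str], List[str]]:
    """Split paths into (valid, skipped) lists by their final extension."""
    valid_set = {ext.lower() for ext in valid_extensions}
    valid, skipped = [], []
    for p in paths:
        # Path.suffix is only the last dot-separated part ('a.2023.JPG' -> '.JPG');
        # lower() makes the comparison case-insensitive.
        if Path(p).suffix.lower() in valid_set:
            valid.append(p)
        else:
            skipped.append(p)
    return valid, skipped


def scan(paths: Iterable[str], filter_by_extension: bool = False, verbose: bool = False) -> dict:
    """Return the files to process plus skip stats (a count, or the list if verbose)."""
    paths = list(paths)
    if not filter_by_extension:
        # Analyze everything: nothing is skipped up front.
        return {"files": paths, "skipped": [] if verbose else 0}
    valid, skipped = filter_extensions(paths)
    return {"files": valid, "skipped": skipped if verbose else len(skipped)}
```

With the flag left at False every file is passed on to be decoded, which keeps the "analyze everything" behaviour; setting it to True keeps only files with a known extension.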
Cheers,
Dan
@UplandsDynamic I think your approach is 100% appropriate. Having the ability to select whether or not to process * gives one the best of both worlds - analyze everything or analyze only known file extensions. I know which switch I'd be choosing.
Hi @audiomuze,
Thanks for your feedback - this is very helpful. The feature has been implemented in v3.0.10. Thanks again @UplandsDynamic for your contributions.
Best,
Elise