Comments (9)

dsoprea commented on May 5, 2024

I would just use a perceptual hash (phash). It produces a hash that should withstand scaling and minor amounts of filtering. At best, you eliminate the duplicates. At worst, you still have them. Either way, you'll eliminate files resulting from simple copies or renames on disk.
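To illustrate the general idea, here is a minimal sketch of an average hash in Go. It is not the blockhash algorithm and not what go-perceptualhash does; it is a simpler technique swapped in for illustration. The names `averageHash`, `hammingDistance`, and `halfImage` are made up for this example, and the nearest-neighbor sampling is a simplification (a real implementation would average over each cell).

```go
package main

import (
	"fmt"
	"image"
	"image/color"
	"math/bits"
)

// averageHash samples the image on an 8x8 grid, converts each sample to
// gray, and sets one bit per sample that is at least as bright as the mean
// of all 64 samples. Hypothetical helper, not part of any library.
func averageHash(img image.Image) uint64 {
	b := img.Bounds()
	var gray [64]uint32
	var sum uint32
	for y := 0; y < 8; y++ {
		for x := 0; x < 8; x++ {
			// Sample the center of each grid cell (nearest neighbor).
			px := b.Min.X + (2*x+1)*b.Dx()/16
			py := b.Min.Y + (2*y+1)*b.Dy()/16
			g := color.GrayModel.Convert(img.At(px, py)).(color.Gray).Y
			gray[y*8+x] = uint32(g)
			sum += uint32(g)
		}
	}
	mean := sum / 64
	var hash uint64
	for i, g := range gray {
		if g >= mean {
			hash |= uint64(1) << uint(i)
		}
	}
	return hash
}

// hammingDistance counts the bits that differ between two hashes.
// A small distance suggests visually similar images.
func hammingDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

// halfImage builds a square test image: one half black, one half white.
func halfImage(size int, invert bool) image.Image {
	img := image.NewGray(image.Rect(0, 0, size, size))
	for y := 0; y < size; y++ {
		for x := 0; x < size; x++ {
			white := x >= size/2
			if invert {
				white = !white
			}
			if white {
				img.SetGray(x, y, color.Gray{Y: 255})
			}
		}
	}
	return img
}

func main() {
	small := averageHash(halfImage(64, false))
	large := averageHash(halfImage(128, false))
	flipped := averageHash(halfImage(64, true))
	fmt.Println("same picture, different size:", hammingDistance(small, large))
	fmt.Println("inverted picture:", hammingDistance(small, flipped))
}
```

Two files would then be treated as likely duplicates when the distance between their hashes falls under some threshold, which you would have to tune on your own photos.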

I wrote a Go implementation a couple of years ago that produces results identical to the other implementations at http://blockhash.io. They never updated the website, but it works all the same.

https://github.com/dsoprea/go-perceptualhash

It's not perfect, but it's as close as you're gonna get without some research guys.

from photoprism.

lastzero commented on May 5, 2024

I was also going to look into the SHA-1 fingerprints we compute for all files... my feeling is that they add significant overhead; maybe we can find a faster solution that is also reliable. The fingerprint is used to determine whether a file needs to be indexed again and whether new thumbs are needed.

dsoprea commented on May 5, 2024

That makes more sense. You're not (at this stage) concerned with associating all permutations.

lastzero commented on May 5, 2024

The light and color maps we have for all photos are some kind of perceptual hash... we want to use that later to easily find similar images.

dsoprea commented on May 5, 2024

That's interesting. Presumably, you'd have to layer them or otherwise marry them together first.

lastzero commented on May 5, 2024

See https://github.com/photoprism/photoprism/blob/develop/internal/photoprism/colors.go

lastzero commented on May 5, 2024

We now extract Document and Instance ID via exiftool and use it for grouping related files with different names.

dsoprea commented on May 5, 2024

What if those aren't present (since they're nonstandard)?

lastzero commented on May 5, 2024

Exif may contain an ImageID, typically a hash or UUID. Not extremely common though. Document and Instance ID can be found in XMP metadata.
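As a rough sketch of the grouping idea, the Go snippet below parses JSON of the shape I would expect from something like `exiftool -j -DocumentID -InstanceID <files>` and groups files by Document ID. This is not PhotoPrism's actual implementation: `fileMeta` and `groupByDocumentID` are hypothetical names, the sample JSON is made up, and the fallback of putting files without a Document ID into their own group is an assumption for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fileMeta holds the fields of interest from the JSON output.
// Field names are assumed to match the tag names exiftool prints.
type fileMeta struct {
	SourceFile string `json:"SourceFile"`
	DocumentID string `json:"DocumentID"`
	InstanceID string `json:"InstanceID"`
}

// groupByDocumentID maps each Document ID to the files that carry it.
// Files without a Document ID fall back to a per-file key, so they are
// never merged with unrelated files (assumption for this sketch).
func groupByDocumentID(exiftoolJSON []byte) (map[string][]string, error) {
	var metas []fileMeta
	if err := json.Unmarshal(exiftoolJSON, &metas); err != nil {
		return nil, err
	}
	groups := make(map[string][]string)
	for _, m := range metas {
		key := m.DocumentID
		if key == "" {
			key = "file:" + m.SourceFile
		}
		groups[key] = append(groups[key], m.SourceFile)
	}
	return groups, nil
}

func main() {
	// Made-up sample data for illustration only.
	sample := []byte(`[
		{"SourceFile": "a.jpg", "DocumentID": "xmp.did:1234", "InstanceID": "xmp.iid:1"},
		{"SourceFile": "a-edit.jpg", "DocumentID": "xmp.did:1234", "InstanceID": "xmp.iid:2"},
		{"SourceFile": "b.jpg"}
	]`)
	groups, err := groupByDocumentID(sample)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	for key, files := range groups {
		fmt.Println(key, files)
	}
}
```

When the IDs are missing, some other signal (file hash, perceptual hash, or file name) would still be needed to relate the files.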
