

oniony commented on May 18, 2024

I've implemented the algorithm for generating the fingerprints without any problem. However, using the fingerprints in a repair is more problematic: identifying where a directory has moved to would require fingerprinting every directory in the specified search paths, which could be very expensive.

You might think this would already be a problem when repairing files, but TMSU is able to build a shortlist of candidates by considering only files of identical size, since a file cannot be identical to another if its size differs. The size check (a stat) is relatively cheap compared to calculating a fingerprint.

An analogue of the file-size shortcut might be to consider the number of items within the directory. This would be cheap for a small directory but could be more expensive than the fingerprint calculation if the directory has millions of items. I might have to cap the number of items counted, just as the directory fingerprint algorithm stops if the directory has too many items.


mildred commented on May 18, 2024

If the idea is just to notice when a directory moves somewhere, perhaps you could add a file named .tmsuid to that directory containing a unique ID plus the device and inode numbers of that file. This would serve as the directory identifier.

When the directory is moved elsewhere on the same filesystem, the file keeps its device and inode numbers, so the move can be detected.

When the directory is copied, this can also be detected: the .tmsuid file in the copy has a different device and inode number, so it must be a copy.

If you don't need to distinguish copies from renames, you don't need to keep track of the device and inode numbers.


oniony commented on May 18, 2024

@mildred Yes, that is one possible solution; however, it is not very user-friendly: one would have to remember to add this metadata to each directory ahead of time. I would prefer a solution that transparently detects directory moves and renames.

I think the best solution (as in most transparent and requiring no up-front participation from the user) would be to shortlist candidate directories based upon the number of directory entries or their aggregate size. This should be relatively cheap to calculate as it would only require a (perhaps recursive) directory enumeration.


mildred commented on May 18, 2024

Well, I was suggesting that TMSU itself would create this file. This might be a little invasive, but it would record a directory's identity better than the list of its files, which may change.

Or perhaps just record the directory's inode number as a hint.


oniony commented on May 18, 2024

I wouldn't want to use inodes, as not every type of filesystem has them.

With respect to the list of a directory's files changing: I would consider this no different than the contents of a file changing after the fingerprint has been calculated, i.e. it could be repaired in the same way using the repair subcommand.
