
Comments (5)

albertony commented on June 2, 2024

> I have a slight concern about sftp startup times. Lots of people use sftp without a config file which means that it probes for shells/supported hashes each time it is used. Perhaps we should delay hash support probing until it is asked for?

Good point, I agree we need to look into that if/when an additional hash is added to the sftp backend.

from rclone.

albertony commented on June 2, 2024

Do you specifically want sftp to support executing b3sum, the same way as md5sum and sha1sum, because you intend to use it for supporting checksums on copy etc. on this specific backend? Or do you suggest support for blake3 more in general? I think adding support in the rclone hashsum command would probably be relevant, as it can already be used with the sftp backend on the remote end as the checksum command, i.e. if neither md5sum nor sha1sum is available but having the rclone binary on the server is allowed.
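To illustrate what "executing b3sum the same way as md5sum and sha1sum" could involve on the client side, here is a minimal sketch of parsing the output of such a command. It assumes the remote tool prints the common `<hex digest>  <filename>` line format that md5sum and sha1sum use; `parseSumOutput` is a hypothetical helper for illustration, not rclone's actual code.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// parseSumOutput extracts the digest from one line of md5sum-style output,
// i.e. "<hex digest>  <filename>". Hypothetical helper, not taken from rclone.
func parseSumOutput(out string) (string, error) {
	fields := strings.Fields(out)
	if len(fields) == 0 {
		return "", errors.New("empty checksum output")
	}
	digest := strings.ToLower(fields[0])
	// Reject anything that is not valid hex (this also rejects the
	// backslash-prefixed lines some tools emit for unusual filenames).
	if _, err := hex.DecodeString(digest); err != nil {
		return "", fmt.Errorf("not a hex digest: %q", fields[0])
	}
	return digest, nil
}

func main() {
	// Sample line as md5sum would print it for an empty file.
	digest, err := parseSumOutput("d41d8cd98f00b204e9800998ecf8427e  empty.txt\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(digest)
}
```

Because the parser only looks at the first whitespace-separated field, the same function would work for any tool that follows this line format, whatever the digest length.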

I've seen the same as you. A while ago I played around with using this as a hash for the local filesystem backend in rclone, but did not get consistently better performance results that led me to finalize a PR for it. The IO contribution, caching etc. seemed to affect the results far more than the actual hash calculation. There might be niche cases where it could be relevant, I just didn't spend more time on it.

When speaking of hash performance, xxHash (XXH3) is also often part of the discussion, and is normally even faster, probably the fastest around currently? In contrast to blake3 it is not a cryptographic hash, so it is in a different league, but cryptographic strength may not be a requirement for file checksumming.
Edit: It was also briefly discussed in the forum 3 years ago: https://forum.rclone.org/t/faster-non-cryptographic-hashing-algorithm-for-faster-file-comparison/23601

As a curiosity, some do even use a combination of both:

> Ccache uses BLAKE3, a very fast cryptographic hash algorithm, for the hashing. On a cache hit, ccache is able to supply all of the correct compiler outputs (including all warnings, dependency file, etc) from the cache. Data stored in the cache is checksummed with XXH3, an extremely fast non-cryptographic algorithm, to detect corruption.

(https://ccache.dev/manual/4.9.html#_how_ccache_works)


albertony commented on June 2, 2024

I just updated my previous experimental implementation, and pushed a draft #7767, which will create beta builds at https://beta.rclone.org/branch/add-xxh-blake-hash/ in case anyone feels like testing it out.


ncw commented on June 2, 2024

Having a tree-based hash is a very interesting idea, and one which, for example, the dropboxhash emulates in a simplistic way. The rclone internals aren't currently optimized for tree-based hashes though; they expect sequential hashes. I'm not sure the Go interface supports non-sequential hashes.
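As a rough illustration of the difference, here is a sketch of a two-level block hash in the style of the Dropbox content hash: each fixed-size block is hashed with SHA-256, and the concatenated block digests are hashed again. Because the block digests are independent, they can be computed out of order or in parallel, which a purely sequential writer-style hash cannot do. The function names and the tiny block size in `main` are assumptions for the example (Dropbox itself uses 4 MiB blocks), and this is not rclone's implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// blockHash hashes each block in order, then hashes the concatenated digests.
// blockSize must be greater than zero.
func blockHash(data []byte, blockSize int) string {
	var digests []byte
	for start := 0; start < len(data); start += blockSize {
		end := start + blockSize
		if end > len(data) {
			end = len(data)
		}
		sum := sha256.Sum256(data[start:end])
		digests = append(digests, sum[:]...)
	}
	final := sha256.Sum256(digests)
	return hex.EncodeToString(final[:])
}

// blockHashParallel computes the same value, but hashes every block in its
// own goroutine; each goroutine writes to a distinct slot of digests.
func blockHashParallel(data []byte, blockSize int) string {
	n := (len(data) + blockSize - 1) / blockSize
	digests := make([]byte, n*sha256.Size)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			start := i * blockSize
			end := start + blockSize
			if end > len(data) {
				end = len(data)
			}
			sum := sha256.Sum256(data[start:end])
			copy(digests[i*sha256.Size:], sum[:])
		}(i)
	}
	wg.Wait()
	final := sha256.Sum256(digests)
	return hex.EncodeToString(final[:])
}

func main() {
	data := []byte("the quick brown fox jumps over the lazy dog")
	// An 8-byte block size keeps the demo small.
	fmt.Println(blockHash(data, 8) == blockHashParallel(data, 8))
}
```

The result depends on the block size, so both sides of a comparison have to agree on it, just as they have to agree on the hash algorithm.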

However getting sftp to support b3sum will work well in conjunction.

I have a slight concern about sftp startup times. Lots of people use sftp without a config file which means that it probes for shells/supported hashes each time it is used. Perhaps we should delay hash support probing until it is asked for?
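One way to delay probing until it is asked for is to wrap the expensive check in `sync.Once`, so it runs on the first request for the hash list rather than at startup. The sketch below is only an illustration of that pattern: `lazyRemote`, its `probe` field, and the returned hash names are hypothetical and do not reflect how the sftp backend is actually structured.

```go
package main

import (
	"fmt"
	"sync"
)

// lazyRemote is a hypothetical stand-in for a backend whose hash support
// can only be discovered by an expensive remote probe.
type lazyRemote struct {
	probe     func() []string // hypothetical: runs the remote check
	once      sync.Once
	supported []string
}

// Hashes runs the probe the first time it is called and caches the result;
// constructing a lazyRemote never triggers the probe.
func (r *lazyRemote) Hashes() []string {
	r.once.Do(func() {
		r.supported = r.probe()
	})
	return r.supported
}

func main() {
	r := &lazyRemote{probe: func() []string {
		fmt.Println("probing remote for hash commands...")
		return []string{"md5", "sha1"}
	}}
	// No probe has happened yet; it runs on the first call only.
	fmt.Println(r.Hashes())
	fmt.Println(r.Hashes())
}
```

Operations that never ask for hashes, such as a plain listing, would then skip the probe entirely.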


albertony commented on June 2, 2024

On second thought... I assumed you meant it probes hashes on each NewFs or similar, but I don't think it does? I don't have an sftp server to test against right now. Edit: based on reading the code, and quick testing against rclone serve sftp, I think it probes for shell type, but not hashes.

