
Comments (5)

nemequ commented on July 17, 2024

This is a bug in the doboz library. All squash_get_max_compressed_size is doing is returning doboz::Compressor::getMaxCompressedSize. See squash_doboz_get_max_compressed_size in plugins/doboz/squash-doboz.cpp. Upstream bug (including test case) at https://bitbucket.org/attila_afra/doboz/issue/1/doboz-compressor-getmaxcompressedsize-can

I'll add a unit test for random data to Squash. If this doesn't get fixed upstream (the upstream code hasn't been touched in a while, project may be dead) I'll remove the doboz plugin.

from squash.

Intensity commented on July 17, 2024

Understood - not all upstream libraries will be robust and free of bugs. The random data unit test should help narrow this down and show whether any other compressors have incorrect estimates of the maximum compressed size, even if only by a few bytes, since random data is likely to exercise the worst-case scenario.
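As a rough illustration of such a random-data check, here is a minimal sketch in C. It does not use the real Squash or doboz APIs: `toy_compress`, `toy_max_compressed_size`, and `check_bound_on_random_data` are hypothetical names, and the codec is a toy run-length encoder standing in for a real plugin. The check compresses pseudo-random input, compares the bytes written against the claimed bound, and places a few canary bytes after the bound to notice small overruns.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy codec: run-length encoding as (count, byte) pairs.
 * Worst case (no repeated bytes) is two output bytes per input byte. */
size_t toy_max_compressed_size(size_t n) {
  return 2 * n;
}

size_t toy_compress(uint8_t *dst, const uint8_t *src, size_t n) {
  size_t out = 0, i = 0;
  while (i < n) {
    uint8_t b = src[i];
    size_t run = 1;
    while (i + run < n && src[i + run] == b && run < 255)
      run++;
    dst[out++] = (uint8_t) run;
    dst[out++] = b;
    i += run;
  }
  return out;
}

#define CANARY_SIZE 16
#define CANARY_BYTE 0xA5

/* Returns 1 if the codec stayed within its claimed bound on n bytes of
 * pseudo-random input and left the canary untouched, 0 otherwise. */
int check_bound_on_random_data(size_t n, unsigned int seed) {
  size_t bound = toy_max_compressed_size(n);
  uint8_t *src = malloc(n ? n : 1);
  uint8_t *dst = malloc(bound + CANARY_SIZE);
  int ok = 0;

  if (src != NULL && dst != NULL) {
    srand(seed);
    for (size_t i = 0; i < n; i++)
      src[i] = (uint8_t) (rand() & 0xFF);

    /* Canary bytes sit directly after the claimed maximum. */
    memset(dst + bound, CANARY_BYTE, CANARY_SIZE);

    size_t written = toy_compress(dst, src, n);
    ok = written <= bound;
    for (size_t i = 0; i < CANARY_SIZE; i++)
      if (dst[bound + i] != CANARY_BYTE)
        ok = 0;
  }

  free(src);
  free(dst);
  return ok;
}
```

A test runner could call `check_bound_on_random_data` over a range of sizes and seeds. The canary only catches overruns of up to `CANARY_SIZE` bytes, so a memory-checking tool would still be needed for larger ones.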

As far as the set of plugins offered in squash goes, I would find value in having doboz kept around. It's a very nice asymmetric algorithm with great performance characteristics (quite fast in decompression, and yet a very reasonable compression ratio). Also, it may not be maintained as much as other algorithms because of its relative lack of popularity, but this doesn't seem to me to be due to an inherent fault in the algorithm itself (since it performs quite well). Having it included as an option here would provide additional flexibility, and the algorithm could eventually gain in popularity as an alternative (or ideas could be borrowed from its source).

Also, when it comes to an algorithm that doesn't work completely well or may seem to have some shortcomings, I'd still prefer to see that it is kept around. That's because in my view, even a slightly buggy compressor can be of use, because of the asymmetric case where one tries a few algorithms and parameters before writing the final compressed format. One can test whether the compressed data verifies correctly (as part of the process of testing decompression speed for that set). If so, given a determinism and platform-independence assumption, that algorithm can be used with confidence for that particular data. If not, maybe something else can be tried, but where the majority of the data still uses that algorithm. Also, as more people rely on squash for their individual use cases, they may not be too happy with a plugin disappearing, as they may value further upstream changes to squash too. Having algorithms easily accessible could also motivate someone to maintain it or at least report a bug, since they otherwise might go with a lower barrier of entry (which would be to use standard zlib, bzip2, lzma).
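The try-and-verify workflow described above could look roughly like the following sketch. Everything here is hypothetical and independent of the Squash API: `candidate_codec`, `store_copy`, `buggy_compress`, and `pick_verified_codec` are illustrative names, and the fixed scratch size is an assumption made to keep the example short. Each candidate compresses the input, the result is decompressed and compared with the original, and only candidates that round-trip correctly are considered.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCRATCH_SIZE 256 /* assumed upper limit on input size for this sketch */

/* Both directions share one signature; a return value of 0 means failure. */
typedef size_t (*codec_fn)(uint8_t *dst, size_t dst_cap,
                           const uint8_t *src, size_t src_len);

typedef struct {
  const char *name;
  codec_fn compress;
  codec_fn decompress;
} candidate_codec;

/* Toy "store" codec: copies the input verbatim. */
size_t store_copy(uint8_t *dst, size_t dst_cap,
                  const uint8_t *src, size_t src_len) {
  if (src_len == 0 || src_len > dst_cap)
    return 0;
  memcpy(dst, src, src_len);
  return src_len;
}

/* Deliberately buggy compressor: silently drops the last byte. */
size_t buggy_compress(uint8_t *dst, size_t dst_cap,
                      const uint8_t *src, size_t src_len) {
  if (src_len < 2 || src_len - 1 > dst_cap)
    return 0;
  memcpy(dst, src, src_len - 1);
  return src_len - 1;
}

/* Returns the index of the candidate with the smallest output that also
 * round-trips to the original data, or -1 if none verifies. */
int pick_verified_codec(const candidate_codec *codecs, int count,
                        const uint8_t *src, size_t src_len) {
  uint8_t packed[SCRATCH_SIZE], unpacked[SCRATCH_SIZE];
  int best = -1;
  size_t best_size = 0;

  for (int i = 0; i < count; i++) {
    size_t c = codecs[i].compress(packed, sizeof packed, src, src_len);
    if (c == 0)
      continue; /* compression failed */
    size_t d = codecs[i].decompress(unpacked, sizeof unpacked, packed, c);
    if (d != src_len || memcmp(unpacked, src, src_len) != 0)
      continue; /* did not verify: skip this codec for this data */
    if (best < 0 || c < best_size) {
      best = i;
      best_size = c;
    }
  }
  return best;
}
```

In this sketch the buggy candidate produces the smallest output but is rejected because its round trip does not reproduce the input, so the selection falls back to the next verified candidate.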


nemequ commented on July 17, 2024

Perhaps instead of removing it altogether I could simply disable it by default (so it will not build unless you pass --enable-whatever to configure). I have some pretty serious reservations about enabling unmaintained code, even more so when it is known to be buggy. I think those reservations will also hold true for most Squash consumers… remember, Squash isn't just about choosing a codec, it's also intended for use in a production environment.

I'm much more concerned about the fact that Doboz writes past the end of the buffer it was given than that getMaxCompressedSize returns an incorrect value. We can always just add a few bytes to the max compressed size in the Squash plugin to be safe, but if the codec writes to memory it's not supposed to there isn't much Squash can do (at least not without crippling performance).
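The padding workaround mentioned above might look something like this sketch. It is not the actual plugin code: `upstream_max_compressed_size` is a hypothetical stand-in for the library's own estimate, and the padding of 8 bytes is an assumed value, since the thread only says "a few bytes".

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed safety margin; the real value would depend on how far the
 * upstream estimate can be off. */
#define MAX_COMPRESSED_SIZE_PADDING 8

/* Hypothetical stand-in for an upstream bound that may underestimate. */
size_t upstream_max_compressed_size(size_t uncompressed_size) {
  return uncompressed_size + 1;
}

/* Wrapper a plugin could expose instead of the raw upstream value. */
size_t padded_max_compressed_size(size_t uncompressed_size) {
  size_t bound = upstream_max_compressed_size(uncompressed_size);

  /* Saturate rather than wrap around on overflow. */
  if (bound > SIZE_MAX - MAX_COMPRESSED_SIZE_PADDING)
    return SIZE_MAX;
  return bound + MAX_COMPRESSED_SIZE_PADDING;
}
```

Padding only compensates for an underestimated bound. It does not help if the codec writes past the end of a buffer whose size it was told, which is the more serious concern raised above.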

Except for a few specific cases which Squash consumers should be able to test for (using functions like squash_codec_knows_uncompressed_size) everything should work in all codecs. Otherwise you're back to writing separate code for each codec, which pretty much negates the purpose of Squash. Two things which should be testable but aren't (I'm glad we're having this conversation, since it's encouraging me to create issues instead of keeping it in my head) are issues #41 and #43.

> Also, as more people rely on squash for their individual use cases, they may not be too happy with a plugin disappearing, as they may value further upstream changes to squash too.

Once Squash is API stable I plan on being very conservative about removing plugins. Meanwhile, I'd like to significantly expand the unit tests so we can be confident that a codec is reliable. If we're going to treat the list of plugins available as part of the default configuration as API (which I think we have to), we need to be confident that either there aren't any critical bugs or that any critical bugs will be fixable, either by us or upstream.

> Having algorithms easily accessible could also motivate someone to maintain it or at least report a bug, since they otherwise might go with a lower barrier of entry (which would be to use standard zlib, bzip2, lzma).

I think robust unit testing goes a long way towards resolving this. If we can locate bugs like this in the unit tests then we can notify the appropriate people of the bugs and, if/when they are resolved, enable the plugin in Squash. I really want to expand the unit tests to be as brutal as they can be.


nemequ commented on July 17, 2024

Closing this, as the squash plugin now adds a couple of extra bytes until the bug in doboz is resolved.


nemequ commented on July 17, 2024

Fixed upstream, imported into squash with e0a6248
