Comments (5)

nemequ commented on July 17, 2024

I'm assuming you're talking about something to help you write your own benchmarks, not just the benchmarking tool distributed with Squash. That does give you CPU time + wall clock time (although it's presented as compression/decompression rate in the HTML output, and only CPU time is used).

I've been thinking about benchmarks lately, and looking into how to do them right. Long story short, it's hard. I suggest you take a look at Aleksey Shipilev's JVMLS 2013 talk, "JVM Benchmarking". It's about Java, but the first half is pretty generic, and quite interesting. So, my current thinking is that Squash (and everyone else) should use a benchmarking library which does all that stuff for us. The only thing I've found so far is cbenchsuite, which I haven't yet had time to look at closely, so I don't know whether or not it would be suitable.

Measuring memory usage is tricky. Most libraries don't let us supply a custom malloc/free function, so AFAIK the only way to do that would be to override malloc using LD_PRELOAD. Even with overriding malloc the numbers may not be accurate. I know mcm is planning to use mmap + MAP_ANONYMOUS instead of malloc (perhaps it already does, haven't looked at the code since Mathieu Chartier mentioned that was his plan), and SHARC will emit an error if memory isn't aligned to something (IIRC 8-byte, but I'm not positive), so I would be surprised if its internal buffers didn't use posix_memalign, and I don't know if overriding malloc would catch that or not. Just asking the OS to tell you how much memory is in use before and after isn't reliable, either, because most mallocs just use sbrk, so they can't necessarily return all the memory they've allocated.
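
For what it's worth, the LD_PRELOAD route would look roughly like the sketch below. This is only an illustration, not something Squash ships: it's glibc-specific (RTLD_NEXT, malloc_usable_size), ignores calloc/realloc/posix_memalign, isn't thread-safe, and glosses over the fact that dlsym itself can allocate.

```c
/* Hypothetical LD_PRELOAD allocation tracker (illustration only).
 * Build: gcc -shared -fPIC -o alloc_trace.so alloc_trace.c -ldl
 * Run:   LD_PRELOAD=./alloc_trace.so ./your-benchmark */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <malloc.h>   /* malloc_usable_size() (glibc) */
#include <stdio.h>
#include <stdlib.h>

static void* (*real_malloc) (size_t) = NULL;
static void  (*real_free)   (void*)  = NULL;
static size_t current_bytes = 0;
static size_t peak_bytes    = 0;

static void resolve_real (void) {
  real_malloc = (void* (*) (size_t)) dlsym (RTLD_NEXT, "malloc");
  real_free   = (void  (*) (void*))  dlsym (RTLD_NEXT, "free");
}

void* malloc (size_t size) {
  if (real_malloc == NULL)
    resolve_real ();
  void* ptr = real_malloc (size);
  if (ptr != NULL) {
    current_bytes += malloc_usable_size (ptr);
    if (current_bytes > peak_bytes)
      peak_bytes = current_bytes;
  }
  return ptr;
}

void free (void* ptr) {
  if (real_free == NULL)
    resolve_real ();
  if (ptr != NULL)
    current_bytes -= malloc_usable_size (ptr);
  real_free (ptr);
}

__attribute__((destructor))
static void report (void) {
  fprintf (stderr, "peak tracked heap: %zu bytes\n", peak_bytes);
}
```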


Intensity commented on July 17, 2024

I am interested in being able to request (by way of options) to receive basic timing information from an individual compressor invocation, even if it's an in-memory request. I'd find value in getting that information on an individual invocation of the compression, not just as part of a separately run benchmark. It seems that in the benchmarking plugin, clock_gettime () is preferred, but more generally other approaches can be used on other platforms, so this is open to further tweaking. The request is also a placeholder to express interest in measuring information about other performance bottlenecks - for example, deducing how much memory latency or processor cache size got in the way. Compiler choice, the processor type, and available resources would all affect the resultant performance, so maybe there is a tool which tries to deduce what the bottlenecks are. Thanks for the reference video on benchmarking. It seems cbenchsuite is pretty comprehensive, so a subset of it could be used.
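
Just to make the timing half of that concrete, a per-invocation measurement with clock_gettime () could be as simple as the sketch below; do_work () is only a stand-in for whatever compression call gets timed, not an actual Squash function.

```c
/* Per-invocation wall-clock + CPU time via clock_gettime() (sketch only). */
#include <stdio.h>
#include <time.h>

static void do_work (void) {
  /* stand-in for the real work, e.g. a single compression call */
  volatile unsigned long sum = 0;
  for (unsigned long i = 0; i < 50000000UL; i++)
    sum += i;
}

static double elapsed (const struct timespec* start, const struct timespec* end) {
  return (end->tv_sec - start->tv_sec) + (end->tv_nsec - start->tv_nsec) / 1e9;
}

int main (void) {
  struct timespec wall_start, wall_end, cpu_start, cpu_end;

  clock_gettime (CLOCK_MONOTONIC, &wall_start);
  clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &cpu_start);

  do_work ();

  clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &cpu_end);
  clock_gettime (CLOCK_MONOTONIC, &wall_end);

  printf ("wall: %.6f s, cpu: %.6f s\n",
          elapsed (&wall_start, &wall_end),
          elapsed (&cpu_start, &cpu_end));
  return 0;
}
```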

As for memory usage, it may not be easy or possible to get information reliably for all plugins, but a best effort can be of use even if it's a subset. Alternative measures can be suggested in time, too. One option is a conditional preprocessor define that overrides the malloc () instances in the code to invoke a version that keeps track of statistics on allocated memory. And for the other approaches like mmap or the aligned memory requirement, a different override can be used, or some plugins might just have to resort to less reliable memory usage estimates.
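
As a rough illustration of the preprocessor-define idea (every name below is hypothetical, nothing here comes from Squash):

```c
/* Compile-time malloc override that counts requested bytes (sketch only). */
#include <stdio.h>
#include <stdlib.h>

static size_t tracked_bytes = 0;

static void* tracking_malloc (size_t size) {
  tracked_bytes += size;   /* counts requested size, not usable size */
  return malloc (size);
}

/* In real use this would live in a header included by plugin sources and
 * be guarded by a build option, e.g. -DTRACK_ALLOCATIONS. */
#define malloc(size) tracking_malloc (size)

int main (void) {
  void* p = malloc (1024);   /* expands to tracking_malloc (1024) */
  free (p);
  printf ("tracked: %zu bytes\n", tracked_bytes);
  return 0;
}
```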


nemequ commented on July 17, 2024

It seems that in the benchmarking plugin, clock_gettime () is preferred, but more generally other approaches can be used on other platforms, so this is open to further tweaking.

Indeed. I had to tweak it recently for FreeBSD, and I'm guessing Windows does something completely different. I'm willing to try to add support for other systems as people need it.

The request is also a placeholder to express interest in measuring information about other performance bottlenecks - for example, deducing how much memory latency or processor cache size got in the way.

That would definitely be great information to have, but I have no idea how to detect it—I suspect that if there is a way it would be pretty OS-specific (and/or hardware-specific), which means a big and/or complex code base would probably be required to support it, which (at least to me) implies a separate library.

I am interested in being able to request (by way of options) to receive basic timing information from an individual compressor invocation, even if it's an in-memory request. I'd find value in getting that information on an individual invocation of the compression, not just as part of a separately run benchmark.

How do you envision that working? Options are just string key/value pairs and are intended to be shareable across multiple, possibly parallel, compression/decompression operations. With that in mind, I don't see how they can be leveraged to provide this functionality, and I don't think this is a sufficient reason to reconsider that design. Think about using Squash to compress/decompress rows in a database… being able to create a single SquashOptions at initialization and simply reuse it whenever you need to compress or decompress a chunk of data is a big deal.

The only way I see to be helpful here (other than creating another version of every API call) would be to create something like GTimer (only I'd want to support CPU time in addition to wall clock) and include it in Squash. I'd probably be okay with something like that if we committed to stopping there, but if we don't then where does it end? Do we add a way to handle warm-up elegantly? Restrict turbo boost? Scheduling? Memory profiling?

Like I said, I think that stuff belongs in a separate library. I think something similar to a unit testing API (I like glib's, but I'm open to alternatives) would be ideal. I don't really know how cbenchsuite works, but if it fits the bill I'd love to use it instead of reinventing the wheel.

As for memory usage, it may not be easy or possible to get information reliably for all plugins, but a best effort can be of use even if it's a subset. Alternative measures can be suggested in time, too. One option is a conditional preprocessor define that overrides the malloc () instances in the code to invoke a version that keeps track of statistics on allocated memory. And for the other approaches like mmap or the aligned memory requirement, a different override can be used, or some plugins might just have to resort to less reliable memory usage estimates.

This seems to be making my point about a separate library. This is all very cool stuff but it's a lot of bloat to include in Squash, and it's not really compression-specific, so why not a separate library?


nemequ commented on July 17, 2024

I've split the timer code out of benchmark/benchmark.c into benchmark/timer.c and benchmark/timer.h. This would be the GTimer-like API I was talking about.

It's still not part of Squash itself, but it might be at some point.
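
For anyone curious, the general shape is something like the sketch below; this is just an illustration of the idea, not the actual contents of benchmark/timer.h.

```c
/* GTimer-like helper tracking both wall-clock and CPU time (sketch only). */
#include <stdio.h>
#include <time.h>

typedef struct {
  struct timespec wall_start, wall_end;
  struct timespec cpu_start,  cpu_end;
} BenchTimer;

static void bench_timer_start (BenchTimer* timer) {
  clock_gettime (CLOCK_MONOTONIC, &timer->wall_start);
  clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &timer->cpu_start);
}

static void bench_timer_stop (BenchTimer* timer) {
  clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &timer->cpu_end);
  clock_gettime (CLOCK_MONOTONIC, &timer->wall_end);
}

static double bench_timer_seconds (const struct timespec* start, const struct timespec* end) {
  return (end->tv_sec - start->tv_sec) + (end->tv_nsec - start->tv_nsec) / 1e9;
}

int main (void) {
  BenchTimer timer;

  bench_timer_start (&timer);
  /* ... code to measure goes here ... */
  bench_timer_stop (&timer);

  printf ("wall: %.6f s, cpu: %.6f s\n",
          bench_timer_seconds (&timer.wall_start, &timer.wall_end),
          bench_timer_seconds (&timer.cpu_start, &timer.cpu_end));
  return 0;
}
```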


nemequ commented on July 17, 2024

I just pushed a commit to the benchmark program which could make memory reporting in the benchmarks feasible. Before each compress/decompress attempt, we now fork(). The main purpose of this is so we can recover (I hope) from the OOM killer terminating the process which is taking up all the memory, since that was happening on my Raspberry Pi with LZMA at higher levels. However, it may be possible to grab the maximum memory usage before we do anything and then after compression/decompression, and use the difference as the memory usage. We would have to add another fork() so we can separate compression and decompression, but I think it could work.
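
Roughly what I have in mind, as a sketch only (run_one_pass () is a placeholder, and ru_maxrss granularity plus page-cache effects make this approximate at best):

```c
/* fork() per attempt: the child runs one pass and reports its own peak RSS
 * delta, and the parent survives even if the child gets OOM-killed. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

static void run_one_pass (void) {
  /* stand-in for a single compression or decompression attempt */
  size_t size = 64 * 1024 * 1024;
  void* buffer = malloc (size);
  if (buffer != NULL) {
    memset (buffer, 0xa5, size);   /* touch the pages so they count toward RSS */
    free (buffer);
  }
}

int main (void) {
  pid_t child = fork ();

  if (child == 0) {
    struct rusage before, after;
    getrusage (RUSAGE_SELF, &before);
    run_one_pass ();
    getrusage (RUSAGE_SELF, &after);
    fprintf (stderr, "peak RSS delta: %ld KiB\n",
             (long) (after.ru_maxrss - before.ru_maxrss));
    _exit (EXIT_SUCCESS);
  }

  int status;
  waitpid (child, &status, 0);
  if (WIFSIGNALED (status))
    fprintf (stderr, "child killed by signal %d (OOM killer?)\n", WTERMSIG (status));
  return 0;
}
```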

If we do this, it may be wise to revisit the squash_compress_file* functions, which currently prefer to mmap the input/output and use the buffer-to-buffer API whenever a plugin implements buffer-to-buffer functions, rather than only when the stream functions aren't available. This was done because it can provide a performance boost for some codecs (since a memcpy is less likely), but if the mapped files get counted in the memory usage it wouldn't really be fair.
