
squash-benchmark's People

Contributors

nemequ, stefanoberhumer, t-mat


squash-benchmark's Issues

./benchmark runtime issues

Two small issues noticed while trying ./benchmark with squash 0.6:

  • Not specifying -o results in a segfault
  • Read-only files cannot be opened

Disagreement on hyphen in squash library name

With a locally built and installed version of libsquash 0.8, the benchmark doesn't build due to a disagreement about whether the library name is libsquash-0.8 or libsquash0.8.

The benchmark Makefile uses pkg-config --libs --cflags squash-0.8 to link libsquash, which returns:

-I/usr/local/include -I/usr/local/include/squash-0.8 -L/usr/local/lib -lsquash-0.8

However, the actual libraries installed into /usr/local/lib don't have the hyphen:

lrwxrwxrwx 1 root root       21 Jan  4 17:11 libsquash0.8.so -> libsquash0.8.so.0.8.0
-rw-r--r-- 1 root root   383072 Jan  3 18:01 libsquash0.8.so.0.8
lrwxrwxrwx 1 root root       19 Jan  4 17:11 libsquash0.8.so.0.8.0 -> libsquash0.8.so.0.8

So the benchmark build fails with:

cc -Wall -Wextra -g -o benchmark benchmark.c timer.c `pkg-config --libs --cflags squash-0.8`
/usr/bin/ld: cannot find -lsquash-0.8

Allow people to import results from their computers

It would be great to allow people to either upload a result file or enter a URL to grab one from somewhere else instead of relying solely on my computers. Doesn't have to be saved on the server or anything, basically just a way to use the web site as a viewer for their own results.

Add ability to turn codecs on and off

The copy codec can be a useful point of reference, but it can also mess up the scales because it can be a huge outlier. It would be nice to be able to turn the codec on and off at will.

This functionality could/should also be extended to all codecs, and perhaps even each level. That would allow people to, for example, only compare a subset of codecs (e.g., people probably aren't interested in ZPAQ and density).

Make it easier to compare machines

It should be possible to make the machine the variable we compare (i.e., choose a codec and dataset, and it tells you how that codec and dataset behave on different machines).

Add a transmission graph

(Originally posted to quixdb/squash#82 by @gpnuma):

Here's an issue to make things easy to track for you :)
A transmission speed graph would be very informative, in addition to the other graphs, which are already very valuable.
Same principle as this: http://fastcompression.blogspot.fr/p/compression-benchmark.html.

Also, a graph of round-trip speed on x versus ratio on y (a derivative of the previous one, but with unlimited transmission speed) would be interesting as a standalone, since it shows the general qualities of each algorithm across files and platforms.
I think comparing only compression speed/ratio on one side and decompression speed/ratio on the other is a bit limited, because you could easily create an algorithm that compresses very slowly and decompresses ultra quickly, as well as (although probably harder) one that compresses extremely quickly and decompresses very slowly.
Having assets in both areas (like all well-known algorithms) is much more difficult, hence the idea of this round-trip graph.
Thanks!

feature request: single-level benchmark

It's currently possible to select a compression algorithm to bench using -c algo.
It works fine, but all defined compression levels are then tested by default.

It would be great if it were also possible to optionally select a single compression level, for finer tests.

For example: -c algo,level (idea from fsbench)

Content overflows on mobile devices

Take a look at the site with the device simulator in your browser's developer tools to see what I mean. The page is really, really messed up.

Unlikely results

From lz4/lz4#109:

Also, it's still too soon to link to current results, as they can, at times, prove inconsistent. For example these ones, where a few fast compressors including LZ4 score an abysmal decompression speed of < 20 MB/s, which can't be correct by a few orders of magnitude. Once again, it's no reason to worry yet. As said, it's normal for a first version to have a few issues to sort out. It could be a fluke, a minor change, an intermediate bug, whatever. So let's wait for the next version, and we'll start building on that future one.

Best guess right now is that it's because a background service or cron job kicked off there, which I should make more of an effort to prevent.

It would be useful to run the benchmark multiple times on the same machine, then load the data into a spreadsheet and compare the runs… hopefully that will let us spot any anomalies and figure out a way to improve the methodology to avoid them happening again.

Add box plots for summary information

We could use box plots to summarize results for a single codec + level across different datasets. One each for compression speed, decompression speed, and maybe ratio.

Add ability to switch to logarithmic scale for graphs in result tables

The graphs in the result table (for compression ratio, compression speed, and decompression speed) are linear, and currently can't be switched to logarithmic. It would be nice to add an interface to toggle between the two.

It would be nice if this were also consistent with the solution for #20.

Link to the .csv files of the results

Would it be possible to have a link on the website to download the .csv file of the benchmark results? For people who might want to make their own graphs or rankings.

is this repo still up to date with squash master?

Hi -

The Makefile specifies squash-0.8 as the expected libsquash version:
https://github.com/psteinb/squash-benchmark/blob/master/Makefile#L51

So I tried with master and the last release 0.7.0, but in both cases I get:

In file included from /home/steinbac/software/squash/0.7.0/include/squash-0.7/squash/squash.h:65:0,
                 from benchmark.c:38:
/home/steinbac/software/squash/0.7.0/include/squash-0.7/squash/options.h:121:27: note: expected ‘const char *’ but argument is of type ‘SquashCodec * {aka struct _SquashCodec *}’
 SQUASH_API int            squash_options_get_int       (SquashOptions* options, const char* key);
                           ^
benchmark.c:93:21: error: too many arguments to function ‘squash_options_get_int’
   const int level = squash_options_get_int (opts, codec, "level");

As squash_options_get_int is undocumented, I am unclear how to fix this.

Display the version or revision of libraries used

Sometimes there are noticeable differences between two versions of the same library.
I don't have a specific example, but it's easy to imagine that after a big update the numbers could turn out completely different (not necessarily better or worse; for example, some types of files might benefit from an improvement and some might not).
Displaying the version would make the benchmark results even more accurate.

Add new corpus

As discussed on encode.ru, there are some pretty bad limitations with our current corpus. See my initial post in that thread for details. I think that thread has gotten a bit out of hand (feels like design by committee), but I still plan on putting together a new corpus, which should obviously be added to the squash benchmark.

Speed result

Minor feature request: get speed displayed in console results.

When using the current version of the benchmark, results are provided using, I suspect, time in seconds. Some minor issues:

  • The scale is not explicitly mentioned
  • The number of significant digits can sometimes be too small, especially when the file benchmarked is very small and the algorithm selected is very fast

Example :

./benchmark -c lz4 sum
Using sum:
  lz4:lz4
    level 1: compressed (0.0007 CPU, 0.0007 wall, 32004 bytes)... decompressed (0.000828 CPU, 0.000829 wall).
    level 2: compressed (0.0007 CPU, 0.0007 wall, 33343 bytes)... decompressed (0.001052 CPU, 0.001054 wall).
    level 3: compressed (0.0008 CPU, 0.0008 wall, 29358 bytes)... decompressed (0.000849 CPU, 0.000851 wall).
    level 4: compressed (0.0008 CPU, 0.0008 wall, 24915 bytes)... decompressed (0.000873 CPU, 0.000875 wall).

The number of significant digits is fine for decompression, but not for compression, where a slight difference in rounding can make a difference of more than 10%.

Anyway, while I'm fine with this format (possibly with a few more digits) for -o jsonFile, it's less agreeable for direct on-screen results.
One can always grab a calculator and manually compute size / time to obtain speed, but that's less usable than having the speed calculated and displayed directly.

json and msgpack datasets

JSON and msgpack are fairly trendy these days; seeing results for these would be great! It'd be even better if all the serialization formats (protobuf, XML, JSON, msgpack) originated from the same dataset.

Add memory usage information

The obvious route for heap usage (fork() and wait3()) also has some issues when considering things like preexisting freelists in malloc implementations, fragmentation, and malloc requesting more memory than it needs (e.g., next highest power of two, a multiple of the page size, etc.).

I think the only way to do this accurately would be to override malloc/realloc/free/new/delete/mmap, but I still need to find a reliable solution for measuring the stack size.

Please consider "heatshrink"

heatshrink is a compression library that is optimized for small embedded systems. It has quite small memory footprint and CPU requirements.

I would be interested to see it compared to regular desktop compression algorithms.

Being designed for embedded use, it is portable. And the source includes simple command-line compression and decompression tools.

https://github.com/atomicobject/heatshrink

read-only samples

The benchmark seems unable to use a sample file set to read-only mode (chmod 444).

Changing settings re-draws charts

Changing settings (including switching between logarithmic and linear scales) forces a complete redraw instead of animating a transition.

Can't find squash/squash.h

When building squash-benchmark, after building and installing squash 0.8, it fails as follows:

cc -Wall -Wextra -g -o benchmark benchmark.c timer.c `pkg-config --libs --cflags squash-0.8`
benchmark.c:47:27: fatal error: squash/squash.h: No such file or directory

Indeed, benchmark.c does an #include <squash/squash.h>, but the squash headers themselves are installed into a version-specific squash directory:

/usr/local/include/squash-0.8/squash.h

That directory does get onto the command line thanks to pkg-config, which resolves to:

-I/usr/local/include -I/usr/local/include/squash-0.8 -L/usr/local/lib -lsquash-0.8

... but squash.h is a direct child of squash-0.8, so #include <squash/squash.h> will fail. There is a squash directory under there, but it contains only the finer-grained APIs, not squash.h.

My guess is that users of the installed library are simply meant to include <squash.h>, so I'll submit a pull request to that effect.

Run copy codec first to get stuff cached

The first codec to run may have a small disadvantage, since the kernel is unlikely to have the file cached yet. It's probably quite a small disadvantage; the operation is usually run multiple times, and most of the time will not be counted as CPU time anyway, but it would be easy to just run the copy codec once first for each file being benchmarked.

Make it easier to compare datasets

It should be possible to make the dataset the variable we compare (i.e., choose a codec and machine, and it tells you how that codec and machine behave on different datasets).
