Comments (14)

g1mv commented on August 14, 2024

Hey @FrancescAlted

Thanks for your issue!
A lot of things appear strange to me at first sight in these results:

  • the compression levels you are referring to are blosc compression levels, right? And you only use density's Chameleon otherwise?
  • the higher the compression ratio, the faster the speed... that's really odd
  • even the lz4 results look odd, because lz4 is heavily asymmetric and usually 10x faster at decompressing than at compressing.

Do you have any idea about these?
Otherwise, thanks for the links; I'll give c-blosc a try with static libraries to check whether anything's wrong.
BTW, I just released the final 0.12.5 beta seconds ago (it's the current master branch); it might already fix some problems.

g1mv commented on August 14, 2024

This is what I get on OS X with the latest dev version:

Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
List of supported compressors in this build: blosclz,lz4,lz4hc,snappy,zlib,density
Supported compression libraries:
  BloscLZ: 1.0.5
  LZ4: 1.7.0
  Snappy: unknown
  Zlib: 1.2.5
  DENSITY: 0.12.6
Using compressor: density
Running suite: single
--> 4, 2097152, 8, 19, density
********************** Run info ******************************
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
Using synthetic data with 19 significant bits (out of 32)
Dataset size: 2097152 bytes Type size: 8 bytes
Working set: 256.0 MB       Number of threads: 4
********************** Running benchmarks *********************
memcpy(write):        595.1 us, 3360.7 MB/s
memcpy(read):         218.6 us, 9149.1 MB/s
Compression level: 0
comp(write):      331.4 us, 6034.9 MB/s   Final bytes: 2097168  Ratio: 1.00
decomp(read):     214.7 us, 9313.6 MB/s   OK
Compression level: 1
comp(write):     2216.0 us, 902.5 MB/s    Final bytes: 1204240  Ratio: 1.74
decomp(read):     537.3 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 2
comp(write):     2206.0 us, 906.6 MB/s    Final bytes: 1204240  Ratio: 1.74
decomp(read):     699.4 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 3
comp(write):     2218.4 us, 901.5 MB/s    Final bytes: 1204240  Ratio: 1.74
decomp(read):     737.3 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 4
comp(write):     1621.4 us, 1233.5 MB/s   Final bytes: 1159184  Ratio: 1.81
decomp(read):    1165.2 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 5
comp(write):     1390.6 us, 1438.2 MB/s   Final bytes: 1159184  Ratio: 1.81
decomp(read):    1189.5 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 6
comp(write):      949.2 us, 2106.9 MB/s   Final bytes: 1136656  Ratio: 1.85
decomp(read):    1355.1 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 7
comp(write):      743.6 us, 2689.6 MB/s   Final bytes: 1125520  Ratio: 1.86
decomp(read):    1497.2 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 8
comp(write):      761.6 us, 2626.1 MB/s   Final bytes: 1125520  Ratio: 1.86
decomp(read):    1562.7 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 9
comp(write):      785.8 us, 2545.2 MB/s   Final bytes: 1119824  Ratio: 1.87
decomp(read):    1980.3 us, -0.0 MB/s     FAILED.  Error code: -1
OK

Round-trip compr/decompr on 7.5 GB
Elapsed time:       9.6 s, 1751.7 MB/s

So it's very similar to your results. I'll need to check what's going on.
Can you give me a quick heads-up about the way blosc operates (the logic)? That would save me digging through your source code and help verify there's nothing incompatible with how density works.

FrancescAlted commented on August 14, 2024

Thanks for the speedy response. Yes, what blosc does is basically split the data to be compressed into small blocks (in order to use the L1 cache as efficiently as possible, but also to leverage multi-threading). It then applies a shuffle filter (which does not compress as such, but helps compressors achieve better compression ratios in many binary-data scenarios) and then passes the shuffled data to the compressor. There is more info about how it works in the first 10 minutes of this presentation: https://www.youtube.com/watch?v=E9q33wbPCGU

Regarding the size of the blocks (I suppose this is important for density), they typically range from 8 KB up to around 1 MB, depending on the compression level, the data type size and the compressor that is going to be used. See the algorithm that computes block sizes here: https://github.com/FrancescAlted/c-blosc/blob/density/blosc/blosc.c#L918
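
For illustration, here is a minimal sketch of that split/shuffle/compress flow (this is not the actual c-blosc code; shuffle_filter and backend_compress are hypothetical placeholder names):

/* A minimal sketch (not the actual c-blosc code) of the flow described above:
   split the chunk into blocks, shuffle each block, then hand it to the codec. */
#include <stddef.h>
#include <stdint.h>

extern void shuffle_filter(const uint8_t *src, uint8_t *dst,
                           size_t nbytes, size_t typesize);
extern size_t backend_compress(const uint8_t *in, size_t in_len,
                               uint8_t *out, size_t out_cap);

size_t compress_chunk(const uint8_t *src, size_t nbytes, size_t typesize,
                      size_t blocksize,        /* typically 8 KB .. ~1 MB */
                      uint8_t *tmp, uint8_t *dst, size_t dst_cap)
{
    size_t written = 0;
    for (size_t off = 0; off < nbytes; off += blocksize) {
        size_t len = (nbytes - off < blocksize) ? nbytes - off : blocksize;
        shuffle_filter(src + off, tmp, len, typesize);     /* filter step       */
        written += backend_compress(tmp, len,              /* density, lz4, ... */
                                    dst + written, dst_cap - written);
    }
    return written;  /* in blosc itself, blocks are also dispatched to threads */
}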

Please tell me if you need more clarification. I am eager to use DENSITY inside Blosc because I think it is a good fit, but I am trying to understand it first (then I will need to figure out how to use C89 and C99 code in the same project ;)

FrancescAlted commented on August 14, 2024

Oh, and regarding the question of using just Chameleon is because I am trying. If everything goes well, the idea is to use Chameleon for low compression levels and Cheetah for higher ones. Then, depending on how slow compression is, I might decide to use Lion for the highest compression level. I suppose I can use density_buffer_decompress() for decompressing any of these, right?
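
A minimal sketch of that level-to-algorithm mapping (the enum names and level cut-offs below are only assumptions for illustration, not actual DENSITY identifiers):

/* Hypothetical mapping from Blosc compression level to DENSITY algorithm. */
typedef enum {
    ALGO_CHAMELEON,   /* fastest, lowest ratio  */
    ALGO_CHEETAH,     /* middle ground          */
    ALGO_LION         /* slowest, highest ratio */
} density_algo_choice;

static density_algo_choice algo_for_clevel(int clevel)
{
    if (clevel <= 3) return ALGO_CHAMELEON;   /* low levels           */
    if (clevel <= 6) return ALGO_CHEETAH;     /* mid levels           */
    return ALGO_LION;                         /* highest levels (7-9) */
}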

g1mv commented on August 14, 2024

OK, I got everything to work properly using the following patch applied to your density tree: https://gist.github.com/gpnuma/e159fb6b505ef9b11e00

Here is a test run:

Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
List of supported compressors in this build: blosclz,lz4,lz4hc,snappy,zlib,density
Supported compression libraries:
  BloscLZ: 1.0.5
  LZ4: 1.7.0
  Snappy: unknown
  Zlib: 1.2.5
  DENSITY: 0.12.6
Using compressor: density
Running suite: single
--> 4, 8388608, 8, 32, density
********************** Run info ******************************
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
Using synthetic data with 32 significant bits (out of 32)
Dataset size: 8388608 bytes Type size: 8 bytes
Working set: 256.0 MB       Number of threads: 4
********************** Running benchmarks *********************
memcpy(write):       2526.0 us, 3167.0 MB/s
memcpy(read):        1291.0 us, 6196.7 MB/s
Compression level: 0
comp(write):     1101.3 us, 7264.3 MB/s   Final bytes: 8388624  Ratio: 1.00
decomp(read):    1313.1 us, 6092.6 MB/s   OK
Compression level: 1
comp(write):     2871.6 us, 2785.9 MB/s   Final bytes: 4566672  Ratio: 1.84
decomp(read):    2388.5 us, 3349.3 MB/s   OK
Compression level: 2
comp(write):     2750.1 us, 2909.0 MB/s   Final bytes: 4566672  Ratio: 1.84
decomp(read):    2395.7 us, 3339.3 MB/s   OK
Compression level: 3
comp(write):     2749.2 us, 2910.0 MB/s   Final bytes: 4566672  Ratio: 1.84
decomp(read):    2407.5 us, 3323.0 MB/s   OK
Compression level: 4
comp(write):     2977.3 us, 2687.0 MB/s   Final bytes: 4511568  Ratio: 1.86
decomp(read):    2269.7 us, 3524.7 MB/s   OK
Compression level: 5
comp(write):     3043.9 us, 2628.2 MB/s   Final bytes: 4511568  Ratio: 1.86
decomp(read):    2270.0 us, 3524.2 MB/s   OK
Compression level: 6
comp(write):     4438.5 us, 1802.4 MB/s   Final bytes: 3622608  Ratio: 2.32
decomp(read):    4439.0 us, 1802.2 MB/s   OK
Compression level: 7
comp(write):     4256.3 us, 1879.6 MB/s   Final bytes: 3601120  Ratio: 2.33
decomp(read):    4279.2 us, 1869.5 MB/s   OK
Compression level: 8
comp(write):     4248.0 us, 1883.2 MB/s   Final bytes: 3601120  Ratio: 2.33
decomp(read):    4408.4 us, 1814.7 MB/s   OK
Compression level: 9
comp(write):     11095.0 us, 721.0 MB/s   Final bytes: 1887328  Ratio: 4.44
decomp(read):    12044.7 us, 664.2 MB/s   OK

Round-trip compr/decompr on 7.5 GB
Elapsed time:       7.9 s, 2141.1 MB/s

I set the significant bits to 32, otherwise the data to compress isn't very interesting (it's like processing a file full of zeroes).
Compression ratios are more contained than in the lz4 run (they never go below 1.84); I saw you're using the accel parameter for lz4_fast, which can lead to near-zero compression but much greater speed.

Here is a sample run with snappy, which exhibits a similar - although lower (1.60) - containment in compression ratio:

Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
List of supported compressors in this build: blosclz,lz4,lz4hc,snappy,zlib,density
Supported compression libraries:
  BloscLZ: 1.0.5
  LZ4: 1.7.0
  Snappy: unknown
  Zlib: 1.2.5
  DENSITY: 0.12.6
Using compressor: snappy
Running suite: single
--> 4, 8388608, 8, 32, snappy
********************** Run info ******************************
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
Using synthetic data with 32 significant bits (out of 32)
Dataset size: 8388608 bytes Type size: 8 bytes
Working set: 256.0 MB       Number of threads: 4
********************** Running benchmarks *********************
memcpy(write):       2402.9 us, 3329.3 MB/s
memcpy(read):        1203.4 us, 6648.0 MB/s
Compression level: 0
comp(write):     1345.3 us, 5946.4 MB/s   Final bytes: 8388624  Ratio: 1.00
decomp(read):    1285.3 us, 6224.3 MB/s   OK
Compression level: 1
comp(write):     6389.5 us, 1252.1 MB/s   Final bytes: 5232684  Ratio: 1.60
decomp(read):    2433.4 us, 3287.5 MB/s   OK
Compression level: 2
comp(write):     4867.7 us, 1643.5 MB/s   Final bytes: 5232684  Ratio: 1.60
decomp(read):    2394.4 us, 3341.1 MB/s   OK
Compression level: 3
comp(write):     4901.1 us, 1632.3 MB/s   Final bytes: 5232684  Ratio: 1.60
decomp(read):    2389.7 us, 3347.6 MB/s   OK
Compression level: 4
comp(write):     5716.6 us, 1399.4 MB/s   Final bytes: 3990010  Ratio: 2.10
decomp(read):    2806.1 us, 2850.9 MB/s   OK
Compression level: 5
comp(write):     5746.6 us, 1392.1 MB/s   Final bytes: 3990010  Ratio: 2.10
decomp(read):    2786.3 us, 2871.2 MB/s   OK
Compression level: 6
comp(write):     6050.9 us, 1322.1 MB/s   Final bytes: 3339270  Ratio: 2.51
decomp(read):    2944.6 us, 2716.8 MB/s   OK
Compression level: 7
comp(write):     6181.5 us, 1294.2 MB/s   Final bytes: 3012514  Ratio: 2.78
decomp(read):    3119.4 us, 2564.6 MB/s   OK
Compression level: 8
comp(write):     6235.0 us, 1283.1 MB/s   Final bytes: 3012514  Ratio: 2.78
decomp(read):    3143.5 us, 2544.9 MB/s   OK
Compression level: 9
comp(write):     5757.8 us, 1389.4 MB/s   Final bytes: 2558737  Ratio: 3.28
decomp(read):    3115.5 us, 2567.8 MB/s   OK

Round-trip compr/decompr on 7.5 GB
Elapsed time:       8.1 s, 2097.7 MB/s

The output buffer size workaround I used in the aforementioned patch will no longer be needed in 0.12.6, as a set of functions which precisely define the minimum output buffer size for compression/decompression will appear.
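
In the meantime the caller has to over-allocate the output buffer, along these lines (the padding formula is only an assumed worst-case bound for illustration, not an official DENSITY guarantee):

#include <stddef.h>

/* Hypothetical conservative bound: incompressible input can expand slightly,
   so pad with a fraction of the input size plus room for headers/footers. */
static size_t density_output_bound_guess(size_t input_size)
{
    return input_size + (input_size / 8) + 1024;
}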

g1mv commented on August 14, 2024

Oh yeah, I forgot to mention: this was compiled and run against the latest dev branch version.

Overall, if I may add, I think you should test blosc against a real file instead of synthetic data. Your current method has the advantage of creating very precise entropy levels, but its drawback is that it does not represent anything real.

FrancescAlted commented on August 14, 2024

Hmm, something is going wrong on my machine (Ubuntu 14.10 / clang 3.5):

$ bench/bench density single 4 8388608 8 32
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
List of supported compressors in this build: blosclz,lz4,lz4hc,snappy,zlib,density
Supported compression libraries:
  BloscLZ: 1.0.5
  LZ4: 1.7.0
  Snappy: 1.1.1
  Zlib: 1.2.8
  DENSITY: 0.12.6
Using compressor: density
Running suite: single
--> 4, 8388608, 8, 32, density
********************** Run info ******************************
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
Using synthetic data with 32 significant bits (out of 32)
Dataset size: 8388608 bytes     Type size: 8 bytes
Working set: 256.0 MB           Number of threads: 4
********************** Running benchmarks *********************
memcpy(write):           1875.6 us, 4265.2 MB/s
memcpy(read):            1351.2 us, 5920.8 MB/s
Compression level: 0
comp(write):     1312.5 us, 6095.1 MB/s   Final bytes: 8388624  Ratio: 1.00
decomp(read):    1438.6 us, 5561.0 MB/s   OK
Compression level: 1
comp(write):     55510.6 us, 144.1 MB/s   Final bytes: 5334032  Ratio: 1.57
decomp(read):     177.3 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 2
comp(write):     40168.2 us, 199.2 MB/s   Final bytes: 5334032  Ratio: 1.57
decomp(read):     170.1 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 3
comp(write):     39445.4 us, 202.8 MB/s   Final bytes: 5334032  Ratio: 1.57
decomp(read):     167.2 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 4
comp(write):     28557.8 us, 280.1 MB/s   Final bytes: 4895248  Ratio: 1.71
decomp(read):     157.2 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 5
comp(write):     21233.2 us, 376.8 MB/s   Final bytes: 4895248  Ratio: 1.71
decomp(read):     173.6 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 6
comp(write):     12465.3 us, 641.8 MB/s   Final bytes: 4675856  Ratio: 1.79
decomp(read):     177.4 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 7
comp(write):     8179.7 us, 978.0 MB/s    Final bytes: 4566672  Ratio: 1.84
decomp(read):     191.6 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 8
comp(write):     8064.7 us, 992.0 MB/s    Final bytes: 4566672  Ratio: 1.84
decomp(read):     166.2 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 9
comp(write):     6451.6 us, 1240.0 MB/s   Final bytes: 4511568  Ratio: 1.86
decomp(read):     205.7 us, -0.0 MB/s     FAILED.  Error code: -1
OK

Round-trip compr/decompr on 7.5 GB
Elapsed time:      21.9 s, 772.1 MB/s
faltet@francesc-Latitude-E6430:~/blosc/c-blosc-francesc/build$ ldd bench/bench 
        linux-vdso.so.1 =>  (0x00007ffe2ea55000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f0dfefe6000)
        libblosc.so.1 => /home/faltet/blosc/c-blosc-francesc/build/blosc/libblosc.so.1 (0x00007f0dfedc2000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f0dfeba4000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f0dfe98b000)
        libdensity.so => /usr/local/lib/libdensity.so (0x00007f0dfe46b000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0dfe0a7000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f0dfdd98000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0dfda91000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f0dfd87a000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f0dff21b000)
        libspookyhash.so => /usr/local/lib/libspookyhash.so (0x00007f0dfd676000)

The above is with the dev branch. With master:

$ bench/bench density single 4 8388608 8 32
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
List of supported compressors in this build: blosclz,lz4,lz4hc,snappy,zlib,density
Supported compression libraries:
  BloscLZ: 1.0.5
  LZ4: 1.7.0
  Snappy: 1.1.1
  Zlib: 1.2.8
  DENSITY: 0.12.5
Using compressor: density
Running suite: single
--> 4, 8388608, 8, 32, density
********************** Run info ******************************
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
Using synthetic data with 32 significant bits (out of 32)
Dataset size: 8388608 bytes     Type size: 8 bytes
Working set: 256.0 MB           Number of threads: 4
********************** Running benchmarks *********************
memcpy(write):           1786.6 us, 4477.7 MB/s
memcpy(read):            1331.7 us, 6007.2 MB/s
Compression level: 0
comp(write):     1306.5 us, 6123.3 MB/s   Final bytes: 8388624  Ratio: 1.00
decomp(read):    1400.4 us, 5712.8 MB/s   OK
Compression level: 1
comp(write):     54855.8 us, 145.8 MB/s   Final bytes: 5334032  Ratio: 1.57
decomp(read):     180.6 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 2
comp(write):     39616.5 us, 201.9 MB/s   Final bytes: 5334032  Ratio: 1.57
decomp(read):     150.2 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 3
comp(write):     41280.2 us, 193.8 MB/s   Final bytes: 5334032  Ratio: 1.57
decomp(read):     146.5 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 4
comp(write):     28674.1 us, 279.0 MB/s   Final bytes: 4895248  Ratio: 1.71
decomp(read):     160.3 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 5
comp(write):     21312.7 us, 375.4 MB/s   Final bytes: 4895248  Ratio: 1.71
decomp(read):     163.8 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 6
comp(write):     12716.5 us, 629.1 MB/s   Final bytes: 4675856  Ratio: 1.79
decomp(read):     183.4 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 7
comp(write):     8138.4 us, 983.0 MB/s    Final bytes: 4566672  Ratio: 1.84
decomp(read):     187.6 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 8
comp(write):     8028.2 us, 996.5 MB/s    Final bytes: 4566672  Ratio: 1.84
decomp(read):     188.3 us, -0.0 MB/s     FAILED.  Error code: -1
OK
Compression level: 9
comp(write):     6376.3 us, 1254.7 MB/s   Final bytes: 4511568  Ratio: 1.86
decomp(read):     183.5 us, -0.0 MB/s     FAILED.  Error code: -1
OK

Round-trip compr/decompr on 7.5 GB
Elapsed time:      22.0 s, 769.6 MB/s

So that's not any better.

g1mv commented on August 14, 2024

Did you try to apply the patch I provided to c-blosc?
I multiplied the blocksize by 8, added a bogus size for the output buffer, and did a switch/case to select the various algorithms.

FrancescAlted commented on August 14, 2024

Ah, nope. I applied (part of) it here: FrancescAlted/c-blosc@f505fd8. With this, I am not getting segfaults anymore:

$ bench/bench density single 4
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
List of supported compressors in this build: blosclz,lz4,lz4hc,snappy,zlib,density
Supported compression libraries:
  BloscLZ: 1.0.5
  LZ4: 1.7.0
  Snappy: 1.1.1
  Zlib: 1.2.8
  DENSITY: 0.12.5
Using compressor: density
Running suite: single
--> 4, 2097152, 8, 19, density
********************** Run info ******************************
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
Using synthetic data with 19 significant bits (out of 32)
Dataset size: 2097152 bytes     Type size: 8 bytes
Working set: 256.0 MB           Number of threads: 4
********************** Running benchmarks *********************
memcpy(write):            504.7 us, 3962.4 MB/s
memcpy(read):             242.4 us, 8250.7 MB/s
Compression level: 0
comp(write):      277.7 us, 7203.2 MB/s   Final bytes: 2097168  Ratio: 1.00
decomp(read):     217.5 us, 9194.7 MB/s   OK
Compression level: 1
comp(write):     2857.1 us, 700.0 MB/s    Final bytes: 1125520  Ratio: 1.86
decomp(read):    2587.3 us, 773.0 MB/s    OK
Compression level: 2
comp(write):     2846.0 us, 702.7 MB/s    Final bytes: 1125520  Ratio: 1.86
decomp(read):    2657.4 us, 752.6 MB/s    OK
Compression level: 3
comp(write):     2844.4 us, 703.1 MB/s    Final bytes: 1125520  Ratio: 1.86
decomp(read):    2668.1 us, 749.6 MB/s    OK
Compression level: 4
comp(write):     2073.3 us, 964.6 MB/s    Final bytes: 1119824  Ratio: 1.87
decomp(read):    1901.5 us, 1051.8 MB/s   OK
Compression level: 5
comp(write):     2081.0 us, 961.1 MB/s    Final bytes: 1119824  Ratio: 1.87
decomp(read):    1905.1 us, 1049.8 MB/s   OK
Compression level: 6
comp(write):     3007.8 us, 664.9 MB/s    Final bytes: 508336  Ratio: 4.13
decomp(read):    3583.5 us, 558.1 MB/s    OK
Compression level: 7
comp(write):     2442.5 us, 818.8 MB/s    Final bytes: 506016  Ratio: 4.14
decomp(read):    2812.5 us, 711.1 MB/s    OK
Compression level: 8
comp(write):     2366.7 us, 845.0 MB/s    Final bytes: 506016  Ratio: 4.14
decomp(read):    2819.0 us, 709.5 MB/s    OK
Compression level: 9
comp(write):     4928.5 us, 405.8 MB/s    Final bytes: 207086  Ratio: 10.13
decomp(read):    5828.6 us, 343.1 MB/s    OK

Round-trip compr/decompr on 7.5 GB
Elapsed time:      20.5 s, 822.2 MB/s

BTW, I am not changing the block size in the benchmark because the current one (2 MB) is already a bit large for chunked datasets (for a hint on why small data chunks are important to us, see http://bcolz.blosc.org/).

Curiously enough, density works best without threading:

$ bench/bench density single 1   # use a single thread
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
List of supported compressors in this build: blosclz,lz4,lz4hc,snappy,zlib,density
Supported compression libraries:
  BloscLZ: 1.0.5
  LZ4: 1.7.0
  Snappy: 1.1.1
  Zlib: 1.2.8
  DENSITY: 0.12.5
Using compressor: density
Running suite: single
--> 1, 2097152, 8, 19, density
********************** Run info ******************************
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
Using synthetic data with 19 significant bits (out of 32)
Dataset size: 2097152 bytes     Type size: 8 bytes
Working set: 256.0 MB           Number of threads: 1
********************** Running benchmarks *********************
memcpy(write):            513.8 us, 3892.3 MB/s
memcpy(read):             251.5 us, 7953.0 MB/s
Compression level: 0
comp(write):      292.3 us, 6841.5 MB/s   Final bytes: 2097168  Ratio: 1.00
decomp(read):     267.0 us, 7491.4 MB/s   OK
Compression level: 1
comp(write):     1974.5 us, 1012.9 MB/s   Final bytes: 1125520  Ratio: 1.86
decomp(read):    1492.9 us, 1339.7 MB/s   OK
Compression level: 2
comp(write):     1902.8 us, 1051.1 MB/s   Final bytes: 1125520  Ratio: 1.86
decomp(read):    1507.6 us, 1326.6 MB/s   OK
Compression level: 3
comp(write):     1918.5 us, 1042.5 MB/s   Final bytes: 1125520  Ratio: 1.86
decomp(read):    1483.9 us, 1347.8 MB/s   OK
Compression level: 4
comp(write):     1709.0 us, 1170.2 MB/s   Final bytes: 1119824  Ratio: 1.87
decomp(read):    1265.1 us, 1580.9 MB/s   OK
Compression level: 5
comp(write):     1706.0 us, 1172.3 MB/s   Final bytes: 1119824  Ratio: 1.87
decomp(read):    1271.0 us, 1573.6 MB/s   OK
Compression level: 6
comp(write):     2314.7 us, 864.0 MB/s    Final bytes: 508336  Ratio: 4.13
decomp(read):    2700.3 us, 740.7 MB/s    OK
Compression level: 7
comp(write):     2402.9 us, 832.3 MB/s    Final bytes: 506016  Ratio: 4.14
decomp(read):    2859.0 us, 699.5 MB/s    OK
Compression level: 8
comp(write):     2443.8 us, 818.4 MB/s    Final bytes: 506016  Ratio: 4.14
decomp(read):    2844.4 us, 703.1 MB/s    OK
Compression level: 9
comp(write):     4945.3 us, 404.4 MB/s    Final bytes: 207086  Ratio: 10.13
decomp(read):    5818.2 us, 343.8 MB/s    OK

Round-trip compr/decompr on 7.5 GB
Elapsed time:      16.9 s, 1001.4 MB/s

Not sure exactly why.

FrancescAlted commented on August 14, 2024

Regarding your suggestion of testing Blosc on actual data: well, the gist of it is that Blosc is meant to work as a compressor for binary data, where zero bytes are by far the most common. Also, the whole point of using the shuffle filter is to increase the probability of finding runs of zeroed bytes in the buffers.

The fact is that Blosc works pretty well in practice, as you can see for example in: https://www.youtube.com/watch?v=TZdqeEd7iTM or https://www.youtube.com/watch?v=kLP83HZvbfQ

g1mv commented on August 14, 2024

That is very strange with regard to threading. On my test platform (Core i7, OS X), here is what I get:

1 thread

$ bench/bench density single 1
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
List of supported compressors in this build: blosclz,lz4,lz4hc,snappy,zlib,density
Supported compression libraries:
  BloscLZ: 1.0.5
  LZ4: 1.7.0
  Snappy: unknown
  Zlib: 1.2.5
  DENSITY: 0.12.6
Using compressor: density
Running suite: single
--> 1, 8388608, 8, 32, density
********************** Run info ******************************
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
Using synthetic data with 32 significant bits (out of 32)
Dataset size: 8388608 bytes Type size: 8 bytes
Working set: 256.0 MB       Number of threads: 1
********************** Running benchmarks *********************
memcpy(write):       2366.5 us, 3380.5 MB/s
memcpy(read):        1228.9 us, 6509.6 MB/s
Compression level: 0
comp(write):     1268.7 us, 6305.8 MB/s   Final bytes: 8388624  Ratio: 1.00
decomp(read):    1374.2 us, 5821.7 MB/s   OK
Compression level: 1
comp(write):     8289.4 us, 965.1 MB/s    Final bytes: 4566672  Ratio: 1.84
decomp(read):    6334.8 us, 1262.9 MB/s   OK
Compression level: 2
comp(write):     8155.4 us, 980.9 MB/s    Final bytes: 4566672  Ratio: 1.84
decomp(read):    6509.8 us, 1228.9 MB/s   OK
Compression level: 3
comp(write):     8433.1 us, 948.6 MB/s    Final bytes: 4566672  Ratio: 1.84
decomp(read):    6459.7 us, 1238.4 MB/s   OK
Compression level: 4
comp(write):     6900.0 us, 1159.4 MB/s   Final bytes: 4511568  Ratio: 1.86
decomp(read):    4903.2 us, 1631.6 MB/s   OK
Compression level: 5
comp(write):     6945.7 us, 1151.8 MB/s   Final bytes: 4511568  Ratio: 1.86
decomp(read):    4941.9 us, 1618.8 MB/s   OK
Compression level: 6
comp(write):     8646.8 us, 925.2 MB/s    Final bytes: 3622608  Ratio: 2.32
decomp(read):    9722.9 us, 822.8 MB/s    OK
Compression level: 7
comp(write):     7820.2 us, 1023.0 MB/s   Final bytes: 3601120  Ratio: 2.33
decomp(read):    8835.1 us, 905.5 MB/s    OK
Compression level: 8
comp(write):     7845.3 us, 1019.7 MB/s   Final bytes: 3601120  Ratio: 2.33
decomp(read):    8817.7 us, 907.3 MB/s    OK
Compression level: 9
comp(write):     21697.2 us, 368.7 MB/s   Final bytes: 1887328  Ratio: 4.44
decomp(read):    23950.2 us, 334.0 MB/s   OK

Round-trip compr/decompr on 7.5 GB
Elapsed time:      16.5 s, 1022.6 MB/s

2 threads

$ bench/bench density single 2
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
List of supported compressors in this build: blosclz,lz4,lz4hc,snappy,zlib,density
Supported compression libraries:
  BloscLZ: 1.0.5
  LZ4: 1.7.0
  Snappy: unknown
  Zlib: 1.2.5
  DENSITY: 0.12.6
Using compressor: density
Running suite: single
--> 2, 8388608, 8, 32, density
********************** Run info ******************************
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
Using synthetic data with 32 significant bits (out of 32)
Dataset size: 8388608 bytes Type size: 8 bytes
Working set: 256.0 MB       Number of threads: 2
********************** Running benchmarks *********************
memcpy(write):       2292.8 us, 3489.3 MB/s
memcpy(read):        1232.9 us, 6488.8 MB/s
Compression level: 0
comp(write):     1088.8 us, 7347.3 MB/s   Final bytes: 8388624  Ratio: 1.00
decomp(read):    1307.0 us, 6120.7 MB/s   OK
Compression level: 1
comp(write):     4619.7 us, 1731.7 MB/s   Final bytes: 4566672  Ratio: 1.84
decomp(read):    3784.3 us, 2114.0 MB/s   OK
Compression level: 2
comp(write):     4642.2 us, 1723.3 MB/s   Final bytes: 4566672  Ratio: 1.84
decomp(read):    3688.3 us, 2169.0 MB/s   OK
Compression level: 3
comp(write):     4585.2 us, 1744.7 MB/s   Final bytes: 4566672  Ratio: 1.84
decomp(read):    3743.4 us, 2137.1 MB/s   OK
Compression level: 4
comp(write):     3968.9 us, 2015.7 MB/s   Final bytes: 4511568  Ratio: 1.86
decomp(read):    2929.8 us, 2730.5 MB/s   OK
Compression level: 5
comp(write):     3946.0 us, 2027.4 MB/s   Final bytes: 4511568  Ratio: 1.86
decomp(read):    2964.6 us, 2698.5 MB/s   OK
Compression level: 6
comp(write):     5236.9 us, 1527.6 MB/s   Final bytes: 3622608  Ratio: 2.32
decomp(read):    5659.9 us, 1413.5 MB/s   OK
Compression level: 7
comp(write):     6199.0 us, 1290.5 MB/s   Final bytes: 3601120  Ratio: 2.33
decomp(read):    6393.8 us, 1251.2 MB/s   OK
Compression level: 8
comp(write):     6170.7 us, 1296.4 MB/s   Final bytes: 3601120  Ratio: 2.33
decomp(read):    6286.6 us, 1272.5 MB/s   OK
Compression level: 9
comp(write):     10581.0 us, 756.1 MB/s   Final bytes: 1887328  Ratio: 4.44
decomp(read):    11585.6 us, 690.5 MB/s   OK

Round-trip compr/decompr on 7.5 GB
Elapsed time:       9.9 s, 1699.6 MB/s

4 threads

$ bench/bench density single 4
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
List of supported compressors in this build: blosclz,lz4,lz4hc,snappy,zlib,density
Supported compression libraries:
  BloscLZ: 1.0.5
  LZ4: 1.7.0
  Snappy: unknown
  Zlib: 1.2.5
  DENSITY: 0.12.6
Using compressor: density
Running suite: single
--> 4, 8388608, 8, 32, density
********************** Run info ******************************
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
Using synthetic data with 32 significant bits (out of 32)
Dataset size: 8388608 bytes Type size: 8 bytes
Working set: 256.0 MB       Number of threads: 4
********************** Running benchmarks *********************
memcpy(write):       2379.6 us, 3362.0 MB/s
memcpy(read):        1199.0 us, 6672.4 MB/s
Compression level: 0
comp(write):     1090.6 us, 7335.2 MB/s   Final bytes: 8388624  Ratio: 1.00
decomp(read):    1305.6 us, 6127.5 MB/s   OK
Compression level: 1
comp(write):     2906.1 us, 2752.9 MB/s   Final bytes: 4566672  Ratio: 1.84
decomp(read):    2453.8 us, 3260.3 MB/s   OK
Compression level: 2
comp(write):     2772.4 us, 2885.6 MB/s   Final bytes: 4566672  Ratio: 1.84
decomp(read):    2427.6 us, 3295.4 MB/s   OK
Compression level: 3
comp(write):     2786.6 us, 2870.9 MB/s   Final bytes: 4566672  Ratio: 1.84
decomp(read):    2404.4 us, 3327.3 MB/s   OK
Compression level: 4
comp(write):     2714.1 us, 2947.5 MB/s   Final bytes: 4511568  Ratio: 1.86
decomp(read):    2168.6 us, 3689.0 MB/s   OK
Compression level: 5
comp(write):     2717.3 us, 2944.1 MB/s   Final bytes: 4511568  Ratio: 1.86
decomp(read):    2152.0 us, 3717.5 MB/s   OK
Compression level: 6
comp(write):     4490.2 us, 1781.7 MB/s   Final bytes: 3622608  Ratio: 2.32
decomp(read):    4443.0 us, 1800.6 MB/s   OK
Compression level: 7
comp(write):     4247.7 us, 1883.4 MB/s   Final bytes: 3601120  Ratio: 2.33
decomp(read):    4253.4 us, 1880.9 MB/s   OK
Compression level: 8
comp(write):     4250.4 us, 1882.2 MB/s   Final bytes: 3601120  Ratio: 2.33
decomp(read):    4271.5 us, 1872.9 MB/s   OK
Compression level: 9
comp(write):     11015.6 us, 726.2 MB/s   Final bytes: 1887328  Ratio: 4.44
decomp(read):    12085.9 us, 661.9 MB/s   OK

Round-trip compr/decompr on 7.5 GB
Elapsed time:       7.8 s, 2166.7 MB/s

So threading is visibly improving things, except maybe for Lion with 4 threads versus 2 threads.

g1mv commented on August 14, 2024

But after further comparisons, yes, you're right: it seems that snappy, for example, scales better with multithreading (it goes from 25.5 s with 1 thread to 8.2 s with 4 threads, which is about 3 times faster).

BTW, there is a slight overhead in setting up a buffer in density, as buffer initialization involves some mallocs; that's why I had increased the blocksize, and maybe that's the reason heavy multithreading is not helping much with small block sizes (the small overhead in setting up compression is probably what actually limits the scalability).
I'll look into this further when I get more time; there might be a way to avoid all that overhead by slightly modifying the API, and I think it could be worth it in use cases like yours, so thanks for pointing it out 😄

Regarding blosc and binary data, yes, I understand what you are trying to do! The only problem with random data is that it denies any obvious "patterns" in the non-zero data, which inevitably appear when manipulating "human" data.
Since the function you're using is perfectly random, on one hand you'll get a predictable number of zeroes in an unpredictable order, but on the other hand all non-zero data will essentially be pattern-free, which is not very realistic.
What I mean is that blosc could be very good with this synthetic data - I'm sure that's the case - but perform less well with real data, as it might "break" some non-zero patterns by "splitting" them, which could lead to a compression ratio downgrade.
For example, let's say you want to compress:
ABCDEABCDEABCDEABCDEABCD (24 symbols)
A good compression algorithm will spot a pattern and go:
ABCDEABCDEABCDEABCDEABCD => that's 4 x ABCDE and 1 x ABCD => easy and efficient compression.
However, if you split it into 3 blocks of 8 (blosc processing) you get: ABCDEABC DEABCDEA BCDEABCD
Now, each individual block doesn't exhibit any obvious pattern, and the same compression algorithm will actually produce very poor results.

FrancescAlted commented on August 14, 2024

Yes, the malloc call inside density could be the root of the poor threading scalability. Thanks for being willing to tackle this.

Blosc does not shuffle using 8-byte blocks by default, but rather uses the size of the datatype that you are compressing (2 for short int, 4 for int and float32, 8 for long int and float64, and other sizes for structs too). Using this datatype size is critical for the reasons explained in the talks above.
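
For illustration, a minimal scalar sketch of such a byte shuffle keyed on the datatype size (the real c-blosc filter is vectorized; this only shows the idea):

#include <stddef.h>
#include <stdint.h>

/* Byte shuffle keyed on the datatype size: byte 0 of every element is grouped
   first, then byte 1, and so on.  On typical numerical data this tends to
   create long runs of identical (often zero) bytes that compress well. */
static void byte_shuffle(const uint8_t *src, uint8_t *dst,
                         size_t nbytes, size_t typesize)
{
    size_t nelems = nbytes / typesize;
    for (size_t i = 0; i < nelems; i++)
        for (size_t j = 0; j < typesize; j++)
            dst[j * nelems + i] = src[i * typesize + j];
}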

Regarding real data, you may want to have a look at this notebook:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb

where real data is being used and where you can see that the compression ratio can reach 20x in this case. Also, it can be seen that some operations take less time (on a decent modern computer) on compressed datasets than on uncompressed ones.

g1mv commented on August 14, 2024

Needs retesting with 0.14.0
