
C-Blosc2

A fast, compressed and persistent data store library for C

Author: Blosc Development Team
Contact: [email protected]
URL: https://www.blosc.org
Gitter: Join the chat at https://gitter.im/Blosc/c-blosc
Code of Conduct: Contributor Covenant

What is it?

Blosc is a high performance compressor optimized for binary data (i.e. floating point numbers, integers and booleans, although it can handle string data too). It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a plain memcpy() call. Blosc's main goal is not just to reduce the size of large datasets on-disk or in-memory, but also to accelerate memory-bound computations.

C-Blosc2 is the new major version of C-Blosc, and is backward compatible with both the C-Blosc1 API and its in-memory format. However, the reverse is generally not true for the format: buffers generated with C-Blosc2 are not format-compatible with C-Blosc1 (i.e. forward compatibility is not supported). If you want to ensure full compatibility with the C-Blosc1 API, define the BLOSC1_COMPAT symbol.

See a 3-minute introductory video to Blosc2.

Blosc2 NDim: an N-Dimensional store

One of the latest and most exciting additions to C-Blosc2 is the Blosc2 NDim layer (or b2nd for short), which allows creating and reading n-dimensional datasets extremely efficiently thanks to an n-dimensional two-level partitioning that makes it possible to slice and dice arbitrarily large, compressed data in a fine-grained way:

https://github.com/Blosc/c-blosc2/blob/main/images/b2nd-2level-parts.png?raw=true

To whet your appetite, here is how the NDArray object in the Python wrapper performs when getting slices orthogonal to the different axes of a 4-dim dataset:

https://github.com/Blosc/c-blosc2/blob/main/images/Read-Partial-Slices-B2ND.png?raw=true

We have blogged about this: https://www.blosc.org/posts/blosc2-ndim-intro

We also have a ~2 min explanatory video on why pineapple-style slicing (aka double partitioning) is useful:

Slicing a dataset in pineapple-style

New features in C-Blosc2

  • 64-bit containers: the first-class container in C-Blosc2 is the super-chunk or, for brevity, schunk, which is made up of smaller chunks that are essentially C-Blosc1 32-bit containers. The super-chunk may optionally be backed by another container called a frame (see below).
  • NDim containers (b2nd): allow storing n-dimensional data and efficiently reading datasets in slices that can be n-dimensional too. To achieve this, an n-dimensional two-level partitioning has been implemented. These capabilities were formerly part of Caterva, and are now included in C-Blosc2 for convenience. Caterva is now deprecated.
  • More filters: besides shuffle and bitshuffle, already present in C-Blosc1, C-Blosc2 implements:
    • bytedelta: calculates the difference between bytes in a block that has already been shuffled. We have blogged about bytedelta.
    • delta: the stored blocks inside a chunk are diff'ed with respect to the first block in the chunk. The idea is that, in some situations, the diff will have more zeros than the original data, leading to better compression.
    • trunc_prec: zeroes the least significant bits of the mantissa of float32 and float64 types. When combined with the shuffle or bitshuffle filter, this leads to more contiguous zeros, which compress better.
  • A filter pipeline: the different filters can be pipelined so that the output of one becomes the input of the next. Possible examples are delta followed by shuffle, or, as described above, trunc_prec followed by bitshuffle.
  • Prefilters: allow applying user-defined C callbacks prior to the filter pipeline during compression. See test_prefilter.c for an example of use.
  • Postfilters: allow applying user-defined C callbacks after the filter pipeline during decompression. The combination of prefilters and postfilters could be interesting for supporting e.g. encryption (via prefilters) and decryption (via postfilters). Also, a postfilter alone can be used to produce on-the-fly computation based on existing data (or other metadata, like e.g. coordinates). See test_postfilter.c for an example of use.
  • SIMD support for ARM (NEON): this allows for faster operation on ARM architectures. Only shuffle is supported right now, but the idea is to implement bitshuffle for NEON too. Thanks to Lucian Marc.
  • SIMD support for PowerPC (ALTIVEC): this allows for faster operation on PowerPC architectures. Both shuffle and bitshuffle are supported; however, this has been done via a transparent mapping from SSE2 into ALTIVEC emulation in GCC 8, so performance could be better (but still, it is already a nice improvement over native C code; see PR #59 for details). Thanks to Jerome Kieffer and ESRF for sponsoring the Blosc team in helping him in this task.
  • Dictionaries: when a block is going to be compressed, C-Blosc2 can use a previously made dictionary (stored in the header of the super-chunk) for compressing all the blocks that are part of the chunks. This usually improves the compression ratio, as well as the decompression speed, at the expense of a (small) overhead in compression speed. Currently, it is only supported in the zstd codec, but it would be nice to extend it to lz4 and blosclz at least.
  • Contiguous frames: allow storing super-chunks contiguously, either on-disk or in-memory. When a super-chunk is backed by a frame, instead of storing all the chunks sparsely in-memory, they are serialized inside the frame container. The frame can be stored on-disk too, meaning that persistence of super-chunks is supported.
  • Sparse frames: each chunk in a super-chunk is stored in a separate file or a different memory area, as is the metadata. This allows for more efficient updates/deletes than in contiguous frames (i.e. avoiding 'holes' in monolithic files). The drawback is that it consumes more inodes when on-disk. Thanks to Marta Iborra for this contribution.
  • Partial chunk reads: there is support for reading just part of a chunk, avoiding reading the whole thing and then discarding the unnecessary data.
  • Parallel chunk reads: when several blocks of a chunk are to be read, this is done in parallel by the decompression machinery. That means that every thread is responsible for reading, post-filtering and decompressing a block by itself, leading to an efficient overlap of I/O and CPU usage that optimizes reads to a maximum.
  • Meta-layers: optionally, the user can add meta-data for different uses and in different layers. For example, one may think of providing a meta-layer for NumPy so that most of its meta-data is stored in a meta-layer; then, one can place another meta-layer on top of the latter for adding more high-level info if desired (e.g. geo-spatial, meteorological...).
  • Variable-length meta-layers: the user may want to add variable-length meta information that can potentially be very large (up to 2 GB). The regular meta-layer described above is very quick to read, but is meant to store fixed-length and relatively small meta information. Variable-length meta-layers are stored in the trailer of a frame, whereas regular meta-layers are in the header.
  • Efficient support for special values: large sequences of repeated values can be represented with an efficient, simple and fast run-length representation, without the need to use regular codecs. With that, chunks or super-chunks with values that are the same (zeros, NaNs or any value in general) can be built in constant time, regardless of the size. This can be useful in situations where a lot of zeros (or NaNs) need to be stored (e.g. sparse matrices).
  • Nice markup for documentation: we are currently using a combination of Sphinx + Doxygen + Breathe for documenting the C-API. See https://www.blosc.org/c-blosc2/c-blosc2.html. Thanks to Alberto Sabater and Aleix Alcacer for contributing the support for this.
  • Plugin capabilities for filters and codecs: we have a plugin registration capability in place so that the info about new filters and codecs can be persisted and transmitted to different machines. See https://github.com/Blosc/c-blosc2/blob/main/examples/urfilters.c for a self-contained example. Thanks to the NumFOCUS foundation for providing a grant for doing this, and Oscar Griñón and Aleix Alcacer for the implementation.
  • Pluggable tuning capabilities: this will allow users with different needs to define an interface so as to better tune different parameters like the codec, the compression level, the filters to use, the blocksize or the shuffle size. Thanks to ironArray for sponsoring us in doing this.
  • Support for I/O plugins: so that users can extend the I/O capabilities beyond the current filesystem support. Things like the use of databases or S3 interfaces should be possible by implementing these interfaces. Thanks to ironArray for sponsoring us in doing this.
  • Python wrapper: we have a preliminary wrapper in the works. You can have a look at our ongoing efforts in the python-blosc2 repo. Thanks to the Python Software Foundation for providing a grant for doing this.
  • Security: we are actively using the OSS-Fuzz and ClusterFuzz tools for uncovering programming errors in C-Blosc2. Thanks to Google for sponsoring us in doing this, and to Nathan Moinvaziri for most of the work here.

More info about the improved capabilities of C-Blosc2 can be found in this talk.

The C-Blosc2 API and format have been frozen, which means there is a guarantee that your programs will continue to work with future versions of the library, and that future releases will be able to read persistent storage generated by previous releases (as of 2.0.0).

Open format

The Blosc2 format is open and documented in the following documents:

All these documents take less than 1000 lines of text, so they should be easy to read and understand. In our opinion, this is very important for the long-term success of the library, as it allows for third-party implementations of the format, and also for the users to understand what is going on under the hood.

Python wrapper

We are officially supporting (thanks to the Python Software Foundation) a Python wrapper for Blosc2. It supports all the features of its predecessor, the python-blosc package, plus most of the bells and whistles of C-Blosc2, like 64-bit and multidimensional containers. As a bonus, the python-blosc2 package comes with wheels and binary versions of the C-Blosc2 libraries, so anyone, even non-Python users, can install C-Blosc2 binaries easily with:

pip install blosc2

Compiling the C-Blosc2 library with CMake

Blosc can be built, tested and installed using CMake. The following procedure describes a typical CMake build.

Create the build directory inside the sources and move into it:

git clone https://github.com/Blosc/c-blosc2
cd c-blosc2
mkdir build
cd build

Now run CMake configuration and optionally specify the installation directory (e.g. '/usr' or '/usr/local'):

cmake -DCMAKE_INSTALL_PREFIX=your_install_prefix_directory ..

CMake allows configuring Blosc in many different ways, like preferring internal or external sources for compressors, or enabling/disabling them. Please note that configuration can also be performed using the UI tools provided by CMake (ccmake or cmake-gui):

ccmake ..      # run a curses-based interface
cmake-gui ..   # run a graphical interface

Build, test and install Blosc:

cmake --build .
ctest
cmake --build . --target install

The static and dynamic versions of the Blosc library, together with the header files, will be installed into the specified CMAKE_INSTALL_PREFIX.

Once you have compiled your Blosc library, you can easily link your apps with it as shown in the examples/ directory.

Handling support for codecs (LZ4, LZ4HC, Zstd, Zlib)

C-Blosc2 comes with full sources for LZ4, LZ4HC, Zstd, and Zlib. In general, you should not worry about not having (or CMake not finding) these libraries on your system: by default the included sources are automatically compiled into the C-Blosc2 library, so you can count on complete support for all these codecs in all the official Blosc deployments. Of course, if you consider this too bloated, you can exclude support for some of them.

For example, let's suppose that you want to disable support for Zstd:

cmake -DDEACTIVATE_ZSTD=ON ..

Or, you may want to use a codec in an external library already in the system:

cmake -DPREFER_EXTERNAL_LZ4=ON ..

Supported platforms

C-Blosc2 is meant to support all platforms where a C99 compliant C compiler can be found. The ones that are mostly tested are Intel (Linux, Mac OSX and Windows), ARM (Linux, Mac), and PowerPC (Linux). More on ARM support in README_ARM.rst.

For Windows, you will need at least VS2015 or higher on x86 and x64 targets (i.e. ARM is not supported on Windows).

For Mac OSX, make sure that you have the command line developer tools available. You can always install them with:

xcode-select --install

For Mac OSX on arm64 architecture, you may want to compile it like this:

CC="clang -arch arm64" cmake ..

Display error messages

By default, error messages are disabled. To display them, activate the Blosc tracing machinery by setting the BLOSC_TRACE environment variable.
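For example, from a shell this might look as follows (BLOSC_TRACE is the documented switch; the application name is just a placeholder):

```shell
# Enable Blosc tracing for subsequent commands in this shell.
export BLOSC_TRACE=1
# ./my_blosc2_app   # hypothetical application linked against libblosc2
echo "$BLOSC_TRACE"
```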

Contributing

If you want to collaborate in this development you are welcome. We need help in the different areas listed at the ROADMAP; also, be sure to read our DEVELOPING-GUIDE and our Code of Conduct. Blosc is distributed using the BSD license.

Twitter feed

Follow @Blosc2 to stay informed about the latest developments.

Citing Blosc

You can cite our work on the different libraries under the Blosc umbrella as:

@ONLINE{blosc,
  author = {{Blosc Development Team}},
  title = "{A fast, compressed and persistent data store library}",
  year = {2009-2023},
  note = {https://blosc.org}
}

Acknowledgments

See THANKS document.


-- Blosc Development Team. We make compression better.


c-blosc2's Issues

Make author more comprehensive

There are many contributions to the Blosc codebase, so we must replace "Francesc Alted" with "Blosc Development Team" throughout the different Blosc projects.

Memory overwrite in test_maxout

This is what valgrind reports:

$ valgrind tests/test_maxout
==363== Memcheck, a memory error detector
==363== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==363== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==363== Command: tests/test_maxout
==363== 
==363== Invalid write of size 8
==363==    at 0x4C326CB: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==363==    by 0x4E52427: memcpy (string3.h:53)
==363==    by 0x4E52427: blosc_compress_context (blosc.c:1524)
==363==    by 0x4E541F5: blosc_compress (blosc.c:1784)
==363==    by 0x400FE9: test_maxout_equal (test_maxout.c:41)
==363==    by 0x400FE9: all_tests (test_maxout.c:70)
==363==    by 0x400FE9: main (test_maxout.c:100)
==363==  Address 0x56df458 is 0 bytes after a block of size 1,016 alloc'd
==363==    at 0x4C2FFC6: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==363==    by 0x4C300D1: posix_memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==363==    by 0x401277: blosc_test_malloc.constprop.1 (test_common.h:73)
==363==    by 0x400D4D: main (test_maxout.c:91)

Wrong data returned from getitem (DeepState fuzzing)

This is a minimal test case; making any value smaller causes it to pass.

TRACE: TestCBlosc2.cpp(37): Performing 1 round trips.
TRACE: TestCBlosc2.cpp(40): *******************************   Starting run #0   *******************************
TRACE: TestCBlosc2.cpp(55): Compression task: type_size: 131 ; num_elements: 1 ; buffer_alignment: 32; compression_level: 6; do_shuffle: 0; compressor: blosclz; delta: 1; buffer_size: 131
TRACE: TestCBlosc2.cpp(69): original[0] = 0
TRACE: TestCBlosc2.cpp(69): original[1] ... original[127] = 0   (127 identical lines elided)
TRACE: TestCBlosc2.cpp(69): original[128] = 1
TRACE: TestCBlosc2.cpp(69): original[129] = 0
TRACE: TestCBlosc2.cpp(69): original[130] = 0
TRACE: TestCBlosc2.cpp(81): # uncompressed bytes = 131
TRACE: TestCBlosc2.cpp(83): # compressed bytes = 34
TRACE: TestCBlosc2.cpp(84): block size = 131
TRACE: TestCBlosc2.cpp(90): typesize: 131; DOSHUFFLE: 0; DOBITSHUFFLE: 0; DODELTA: 8; MEMCPYED: 0
TRACE: TestCBlosc2.cpp(98): compressor: BloscLZ
TRACE: TestCBlosc2.cpp(111): Performing 1 non-buffer-changing actions.
TRACE: TestCBlosc2.cpp(128): Getting 1 from 0
TRACE: TestCBlosc2.cpp(139): [0][0]: original: 0; items: 0
TRACE: TestCBlosc2.cpp(139): [0][1] ... [0][127]: original: 0; items: 0   (127 identical lines elided)
TRACE: TestCBlosc2.cpp(139): [0][128]: original: 1; items: 1
TRACE: TestCBlosc2.cpp(137): !!! MISMATCH !!!:
TRACE: TestCBlosc2.cpp(139): [0][129]: original: 0; items: 1
TRACE: TestCBlosc2.cpp(139): [0][130]: original: 0; items: 0
CRITICAL: TestCBlosc2.cpp(141): Get returned wrong data for item 0 (getting 1 from 0)
ERROR: Failed: CBlosc2_RoundTrip
ERROR: Test case wrongdata.fail failed

The library and header need to be 'blosc2', not just 'blosc'

It turns out that many distributions carry versions of libblosc.so and blosc.h from the Blosc1 library, and even though Blosc2 is compatible with these, for new Blosc2 deployments it is nearly impossible to override all the possible places where the Blosc1 header and library may live. We definitely need new names like libblosc2.so and blosc2.h.

blosc_getitem return value seems inconsistent (DeepState fuzzing)

Whether I assert that the return value is the #bytes or the #items, some tests seem to violate the claim. Is this known/expected?

	      unsigned start_item = DeepState_UIntInRange(0, num_elements-1);
	      unsigned num_items = DeepState_UIntInRange(0, num_elements-start_item);
	      LOG(TRACE) << "Getting " << num_items << " from " << start_item;
	      int get_result = blosc_getitem(intermediate, start_item, num_items, items);
	      //ASSERT_EQ(get_result, num_items * type_size) <<
	      //"Getting " << num_items << " from " << start_item << " expected: " << num_items * type_size << ": got " << get_result;

If I allow either:

             ASSERT((get_result == num_items) || (get_result == (num_items * type_size))) <<
                "Getting " << num_items << " from " << start_item << " with size " << type_size << ": got " << get_result;

it never fails, but that's not a consistent API. I can minimize a failing example, if you can clarify which is the expected return value.

Add support for post-filters

This is a user-provided function that would mimic the current pre-filter implementation. This will be interesting for supporting e.g. encryption (via pre-filters) and decryption (via post-filters), or on-the-fly computation based on data in the same chunk, or generic inputs that can be populated by the user in the postfilter struct that will be passed to the post-filter.

Support for thread-safe operation

Currently, all the operations in super-chunks make use of blosc_compress() and blosc_decompress(). For applications that are multithreaded, one must rather use blosc_compress_ctx() and blosc_decompress_ctx().

An idea is to add a new field to the blosc2_sparams struct, tentatively called thread_safety. The default could be 1 (i.e. thread safe, so the _ctx() API is used), but the user could select 0 for speed.

Out-of-order appends

Right now, the order in which chunks are appended to a super-chunk sets the order in which they are read. It would be nice to have a new:

blosc2_schunk_set_buffer(blosc2_schunk *schunk, void *src, size_t nbytes, int pos)

call, with the pos parameter specifying the reading order. A pos=-1 would mean 'at the end', that is, equivalent to:

blosc2_schunk_append_buffer(blosc2_schunk *schunk, void *src, size_t nbytes) .

This would imply creating a new metalayer with this new order information.

Use msgpack for the chunks section in frame format

Packing the chunks section of the frame format would bring more encapsulation and, in the end, better manageability. With that, fields like uncompressed_size, compressed_size and blocksize can be moved into the new chunks section.

Suggestion: add a chunks_size field to the new chunks section so that computing the size of a chunk for offsets would be easier and, especially, faster (e.g. remove the need for get_trailer_offsets() in frame_get_chunk()).

After doing this, I am not sure whether we should add another msgpack field at the front of the header specifying that three other msgpack sections come later (header, chunks and trailer). Probably this is not a good idea, because then utilities like msgpack2json may want to read the whole file, and this can be too much for large files.

Changing the number of threads during appends blocks.

The program below shows the problem. When it starts with a number of threads larger than 1, it cannot go back to another number of threads larger than 1. However, it can cycle correctly between the original number of threads and 1, and back.

/*
  Copyright (C) 2015 Francesc Alted
  http://blosc.org
  License: BSD (see LICENSE.txt)

  Example program demonstrating use of the delta filter from C code.

  To compile this program:

  $ gcc -O delta_schunk_ex.c -o delta_schunk_ex -lblosc

  To run:

  $ ./delta_schunk_ex
  Blosc version info: 2.0.0a4.dev ($Date:: 2016-08-04 #$)
  Compression ratio: 762.9 MB -> 7.6 MB (100.7x)
  Compression time: 0.222 s, 3437.4 MB/s
  Decompression time: 0.162 s, 4714.4 MB/s
  Successful roundtrip!
*/

#include <stdio.h>
#include <assert.h>
#include <context.h>
#include "blosc.h"

#define KB 1024.
#define MB (1024 * KB)
#define GB (1024 * MB)

#define CHUNKSIZE (200 * 1000)
#define NCHUNKS 500
#define NTHREADS 4

int main() {
  static int64_t data[CHUNKSIZE];
  static int64_t data_dest[CHUNKSIZE];
  const size_t isize = CHUNKSIZE * sizeof(int64_t);
  int dsize = 0;
  int64_t nbytes, cbytes;
  blosc2_cparams cparams = BLOSC_CPARAMS_DEFAULTS;
  blosc2_dparams dparams = BLOSC_DPARAMS_DEFAULTS;
  blosc2_schunk* schunk;
  int i;
  int j = 0;
  int nchunk;
  size_t nchunks;
  blosc_timestamp_t last, current;
  double ttotal;

  printf("Blosc version info: %s (%s)\n",
         BLOSC_VERSION_STRING, BLOSC_VERSION_DATE);

  /* Initialize the Blosc compressor */
  blosc_init();

  /* Create a super-chunk container */
  cparams.typesize = 8;
  cparams.filters[0] = BLOSC_DELTA;
  cparams.compcode = BLOSC_BLOSCLZ;
  cparams.clevel = 9;
  cparams.nthreads = NTHREADS;
  dparams.nthreads = NTHREADS;
  schunk = blosc2_new_schunk(cparams, dparams);

  struct blosc2_context_s *cctx = schunk->cctx;
  blosc_set_timestamp(&last);
  for (nchunk = 1; nchunk <= NCHUNKS; nchunk++) {
    for (i = 0; i < CHUNKSIZE; i++) {
      data[i] = i * (int64_t)nchunk;
    }
    if (j % 2 != 0) {
      cctx->nthreads = 1;  // if different from 1 the program blocks
    } else {
      cctx->nthreads = NTHREADS;
    }
    nchunks = blosc2_append_buffer(schunk, isize, data);
    printf("Compressed");
    j++;
    assert(nchunks == nchunk);
  }
  /* Gather some info */
  nbytes = schunk->nbytes;
  cbytes = schunk->cbytes;
  blosc_set_timestamp(&current);
  ttotal = blosc_elapsed_secs(last, current);
  printf("Compression ratio: %.1f MB -> %.1f MB (%.1fx)\n",
         nbytes / MB, cbytes / MB, (1. * nbytes) / cbytes);
  printf("Compression time: %.3g s, %.1f MB/s\n",
         ttotal, nbytes / (ttotal * MB));

  /* Retrieve and decompress the chunks (0-based count) */
  blosc_set_timestamp(&last);
  for (nchunk = NCHUNKS - 1; nchunk >= 0; nchunk--) {
    dsize = blosc2_decompress_chunk(schunk, (size_t)nchunk,
                                    (void *)data_dest, isize);
  }
  if (dsize < 0) {
    printf("Decompression error. Error code: %d\n", dsize);
    return dsize;
  }
  blosc_set_timestamp(&current);
  ttotal = blosc_elapsed_secs(last, current);
  printf("Decompression time: %.3g s, %.1f MB/s\n",
         ttotal, nbytes / (ttotal * MB));

  /* Check integrity of the first chunk */
  for (i = 0; i < CHUNKSIZE; i++) {
    if (data_dest[i] != (int64_t)i) {
      printf("Decompressed data differs from original %d, %zd!\n",
             i, data_dest[i]);
      return -1;
    }
  }

  printf("Successful roundtrip!\n");

  /* Free resources */
  blosc2_destroy_schunk(schunk);

  return 0;
}

Add lz4 to the minimum set of codecs

Right now, only the blosclz codec is mandatory, and all the other codecs are optional. It would be nice to add at least lz4 (and lz4hc, because they come in the same library) to the list of mandatory codecs.

Rename `nspace` to `metalayer`

The intent of the current name-spaces is to add different layers for storing meta-information, so we would need to do a global renaming of nspace -> metalayer.

blosc2_decompress_ctx can overrun src

blosc2_decompress_ctx does not take the size of the source buffer as an argument. Instead, it assumes that the buffer is complete and correctly formed. That can result in array overruns for buffers that are truncated, malicious or otherwise corrupt. It should take the source buffer length as an argument, and never read beyond that.

Support ALTIVEC vector instruction on IBM POWER.

This issue is the continuation of:
Blosc/c-blosc#267

As stated there, SSE2 instructions from Intel can be transparently mapped to ALTIVEC starting with GCC 8. I made a quick demonstrator which is 40 to 50% faster (bitshuffle-lz4) compared to master on a POWER9. The roadmap is the following:

  • Add the -DNO_WARN_X86_INTRINSICS compiler flag
  • Check the compiler version and processor version
  • Update the tests to validate this path like all the others

Add blosc2_decompress_safe()

Following the discussion in #40, it could be handy to have a new call for safer decompression that might have this signature:

int blosc2_decompress_safe(blosc2_context* context, const void* src, size_t *srcsize, void* dest, size_t *destsize);

On input, srcsize and destsize will have maximum number of bytes for reading (srcsize) or writing (destsize), whereas on output, these will contain the actual number of bytes read (srcsize) and written (destsize).

Essentially, the implementation would be something like:

int blosc2_decompress_safe(blosc2_context* context, const void* src, size_t* srcsize, void* dest, size_t* destsize) {
  int result;
  size_t nbytes, cbytes, blocksize;

  blosc_cbuffer_sizes(src, &nbytes,  &cbytes, &blocksize);
  if (*srcsize < cbytes) {
    *srcsize = BLOSC_EXTENDED_HEADER_LENGTH;
    *destsize = 0;
    return -11;  // means that srcsize is too short
   }
  if (*destsize < nbytes) {
    *srcsize = BLOSC_EXTENDED_HEADER_LENGTH;
    *destsize = 0;
    return -12;  // means that destsize is too short
   }

  result = blosc2_decompress_ctx(context, src, dest, *destsize);
  if (result > 0) {
    *srcsize = cbytes;
    *destsize = nbytes;
  }
  return result;
}

Error using metalayers

@FrancescAlted, when I tried to use the metalayers in Blosc I got an error.

The example below consists of:

  • A schunk based on a frame is created.
  • A metalayer is added.
  • The schunk is filled using blosc2_schunk_append_buffer.

If the metalayer lines are commented out, it works well. But if the metalayer is added, Blosc fails when it tries to append a buffer.

#include <stdio.h>
#include <stdlib.h>
#include <blosc2.h>

int main() {

    blosc2_frame *frame = blosc2_new_frame("metalayer.blosc2");

    blosc2_schunk *sc = blosc2_new_schunk(BLOSC_CPARAMS_DEFAULTS, BLOSC_DPARAMS_DEFAULTS, frame);

    char name[] = "metalayer";
    uint64_t content = 1234;
    uint32_t content_size = sizeof(uint64_t);
    blosc2_frame_add_metalayer(sc->frame, name, (uint8_t *) &content, content_size);

    uint64_t *content2;
    uint32_t content_size2;
    blosc2_frame_get_metalayer(sc->frame, name, (uint8_t **) &content2, &content_size2);

    printf("Content %llu\n", *content2);

    size_t part_size = 100 * 100 * sc->typesize;

    uint8_t *part = malloc(part_size);

    for (int i = 0; i < 2; ++i) {
        int err = blosc2_schunk_append_buffer(sc, part, part_size);
        printf("Error %d\n", err);
    }

    free(part);

    return 0;
}

DOSHUFFLE bit in metainfo seems wrong (DeepState fuzzing)

DOSHUFFLE metainfo bit seems wrong:

TRACE: Initialized test input buffer with data from `doshuffle.fail`
TRACE: Running: CBlosc2_RoundTrip from TestCBlosc2.cpp(34)
TRACE: TestCBlosc2.cpp(45): Performing 1 round trips.
TRACE: TestCBlosc2.cpp(48): *******************************   Starting run #0   *******************************
TRACE: TestCBlosc2.cpp(50): Type size = 1
TRACE: TestCBlosc2.cpp(52): Number of elements = 1
TRACE: TestCBlosc2.cpp(54): Buffer alignment = 32
TRACE: TestCBlosc2.cpp(56): Compression level = 0
TRACE: TestCBlosc2.cpp(58): Do shuffle = 1
TRACE: TestCBlosc2.cpp(61): Setting compressor to blosclz
TRACE: TestCBlosc2.cpp(65): Setting delta to 0
TRACE: TestCBlosc2.cpp(69): Buffer size = 1
TRACE: TestCBlosc2.cpp(89): # uncompressed bytes = 1
TRACE: TestCBlosc2.cpp(91): # compressed bytes = 33
TRACE: TestCBlosc2.cpp(92): block size = 1
TRACE: TestCBlosc2.cpp(97): type size = 1
TRACE: TestCBlosc2.cpp(99): DOSHUFFLE: 0
CRITICAL: TestCBlosc2.cpp(100): do shuffle = 0 but set to 1
ERROR: Failed: CBlosc2_RoundTrip
ERROR: Test case doshuffle.fail failed

Support for endianness

So far (c-blosc2 beta.3), the data stored in chunks carries no information about its endianness.

PR #84 allows storing this information, but making use of it for restoring the correct endianness (e.g. when writing a frame on a big-endian machine and reading it on a little-endian one) has yet to be implemented. In fact, this is not trivial, as the Blosc2 layer has no info about the types (is it an int of 4 bytes or a str of 4 bytes?), so endianness restoring should probably be implemented in a different layer that has info about the types.

Add a system-meta layer

Right now, the only way to store variable-length metadata in a Blosc2 container is to use the usermeta section, but the user has full access to it, so they can mess with that information. It would be nice to add another section (let's call it system-meta) that applications on top of Blosc2 can use to store critical information that should be protected (i.e. the user cannot delete/modify it).

C-Blosc Build Gives memcpy() errors

When trying to build c-blosc in a VS2015 Developer cmd prompt, I get memcpy errors. (NOTE: I get the same error with c-blosc and c-blosc2.)

cmake .. works fine, but cmake --build . gives (at end):

"C:\blosc\build\ALL_BUILD.vcxproj" (default target) (1) ->
"C:\blosc\build\bench\bench.vcxproj" (default target) (3) ->
"C:\blosc\build\blosc\blosc_shared.vcxproj" (default target) (4) ->
(ClCompile target) ->
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include\vcruntime_stri
ng.h(44): error C2040: 'memcpy': 'void *(void *,const void *,std::size_t)' diff
ers in levels of indirection from 'void *(void *,const void *,std::size_t)' [C:
\blosc\build\blosc\blosc_shared.vcxproj]

"C:\blosc\build\ALL_BUILD.vcxproj" (default target) (1) ->
"C:\blosc\build\blosc\blosc_shared_testing.vcxproj" (default target) (5) ->
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include\vcruntime_stri
ng.h(44): error C2040: 'memcpy': 'void *(void *,const void *,std::size_t)' diff
ers in levels of indirection from 'void *(void *,const void *,std::size_t)' [C:
\blosc\build\blosc\blosc_shared_testing.vcxproj]

"C:\blosc\build\ALL_BUILD.vcxproj" (default target) (1) ->
"C:\blosc\build\blosc\blosc_static.vcxproj" (default target) (6) ->
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include\vcruntime_stri
ng.h(44): error C2040: 'memcpy': 'void *(void *,const void *,std::size_t)' diff
ers in levels of indirection from 'void *(void *,const void *,std::size_t)' [C:
\blosc\build\blosc\blosc_static.vcxproj]

132 Warning(s)
3 Error(s)

Time Elapsed 00:00:30.01

Compress/Decompress routines should return Errnos

When an error happens, the API leaves the user wondering why. I suggest that blosc_compress, blosc2_compress_ctx, blosc_decompress, blosc2_decompress_ctx, blosc_getitem, and blosc2_getitem_ctx should all return -1 on error and set errno. Alternatively, Blosc could define its own error codes with negative values.

This would be especially useful for the decompress routines, which can return an error either because the destination buffer is too small, the data is corrupted, or the decompression library is unavailable.

Support for `file` command line utility for disk-based frames

It would be nice to provide support for the file command line utility on Unix platforms. As an example, it is cool to see the amount of metainfo that is provided for .jpg files:

$ file ~/WWW.YTS.AG.jpg
/home/francesc/WWW.YTS.AG.jpg: JPEG image data, JFIF standard 1.01, resolution (DPI), density 96x96, segment length 16, Exif Standard: [TIFF image data, big-endian, direntries=4, xresolution=62, yresolution=70, resolutionunit=2, software=paint.net 4.0.6], baseline, precision 8, 350x500, frames 3

Add `blosc_compress_ctx()` and `blosc_decompress_ctx()` for Blosc1 compatibility

The initial plan was to deprecate blosc_compress_ctx() and blosc_decompress_ctx() in favor of the newer blosc2_compress_ctx() and blosc2_decompress_ctx(). However, as there is no deprecation warning in the docs for the original Blosc1 blosc_compress_ctx() and blosc_decompress_ctx(), we should provide an implementation for them in Blosc2 (probably as thin wrappers around blosc2_compress_ctx() and blosc2_decompress_ctx()).

Allow the metalayers to be in-memory frames

The metalayers are currently backed by plain chunks, which means a 31-bit limit on the size of a metalayer. By using an in-memory frame we would also get a contiguous buffer to be stored (similar to a chunk), and we would get rid of 1) the 31-bit limit and 2) the need to keep large data chunks in memory prior to compressing them into a chunk.

Add support for buildkite.com

Adding support for Buildkite as another CI would be handy because it allows using on-premises machines, potentially speeding up build times, but also setting up pipelines with more complex dependencies and analyzers.

Add a blosc2_schunk_getitem()

This will allow retrieving different items from inside a super-chunk. The blosc2_schunk_getitem() API is not written in stone.

Metainfo type size not correct? (DeepState fuzzing)

I just started fuzzing C-Blosc2 with DeepState. The harness is here: https://github.com/agroce/deepstate-c-blosc2

A few odd things are coming up immediately. No round-trip failures yet, but oddities, maybe API misunderstandings.

    size_t typesize;
    int flags;
    blosc_cbuffer_metainfo(intermediate, &typesize, &flags);
    LOG(TRACE) << "type size = " << typesize;
    //ASSERT_EQ(typesize, type_size) << "type size = " << type_size << " but meta claims " << typesize;

The returned typesize always seems to be 1, whatever it was when the call to compress was made.

Problem while building cblosc2 on ARM64

Hi,
I am testing blosc2 on ARM64, actually the Pine64. Not a super powerful board, but it is 64-bit and has NEON SIMD, and that is what I am interested in.
https://www.pine64.org/devices/single-board-computers/pine-a64/
The operating system is debian10 hence kernel 4.19 and gcc8.3

Here is the error message; it is likely that the expectations for following the standard are slightly different from one architecture to another... but I am not that fluent in C:

kieffer@pine64:/data/workspace/c-blosc2$ cd build/
kieffer@pine64:/data/workspace/c-blosc2/build$ rm -rf *    
kieffer@pine64:/data/workspace/c-blosc2/build$ cmake ..
-- The C compiler identification is GNU 8.3.0
-- The CXX compiler identification is GNU 8.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
Configuring for Blosc version: 2.0.0-beta.4.dev
-- Using LZ4 internal sources.
-- Using LIZARD internal sources.
-- Using MINIZ internal sources for ZLIB support.
-- Using ZSTD internal sources.
-- No IPP libraries found.
-- Not using IPP accelerated compression.
-- No build type specified. Defaulting to 'Release'.
-- Building for system processor aarch64
-- Adding run-time support for NEON
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
Skipping test_shuffle_roundtrip_altivec on non-ALTIVEC builds
Skipping test_shuffle_roundtrip_avx2 on non-AVX2 builds
Skipping test_shuffle_roundtrip_sse2 on non-SSE2 builds
-- Configuring done
CMake Warning (dev) at blosc/CMakeLists.txt:258 (add_library):
  Policy CMP0063 is not set: Honor visibility properties for all target
  types.  Run "cmake --help-policy CMP0063" for policy details.  Use the
  cmake_policy command to set the policy and suppress this warning.

  Target "blosc2_static" of type "STATIC_LIBRARY" has the following
  visibility properties set for C:

    C_VISIBILITY_PRESET

  For compatibility CMake is not honoring them for this target.
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Generating done
-- Build files have been written to: /data/workspace/c-blosc2/build
kieffer@pine64:/data/workspace/c-blosc2/build$ make
Scanning dependencies of target blosc_shared_testing
[  1%] Building C object blosc/CMakeFiles/blosc_shared_testing.dir/blosc2.c.o
In file included from /data/workspace/c-blosc2/blosc/blosc2.c:18:
/data/workspace/c-blosc2/blosc/blosc2.h:1162:34: warning: 'struct timespec' declared inside parameter list will not be visible outside of this definition or declaration
 #define blosc_timestamp_t struct timespec
                                  ^~~~~~~~
/data/workspace/c-blosc2/blosc/blosc2.h:1168:39: note: in expansion of macro 'blosc_timestamp_t'
 BLOSC_EXPORT void blosc_set_timestamp(blosc_timestamp_t* timestamp);
                                       ^~~~~~~~~~~~~~~~~
/data/workspace/c-blosc2/blosc/blosc2.h:1162:34: warning: 'struct timespec' declared inside parameter list will not be visible outside of this definition or declaration
 #define blosc_timestamp_t struct timespec
                                  ^~~~~~~~
/data/workspace/c-blosc2/blosc/blosc2.h:1173:41: note: in expansion of macro 'blosc_timestamp_t'
 BLOSC_EXPORT double blosc_elapsed_nsecs(blosc_timestamp_t start_time,
                                         ^~~~~~~~~~~~~~~~~
/data/workspace/c-blosc2/blosc/blosc2.h:1162:34: warning: 'struct timespec' declared inside parameter list will not be visible outside of this definition or declaration
 #define blosc_timestamp_t struct timespec
                                  ^~~~~~~~
/data/workspace/c-blosc2/blosc/blosc2.h:1179:40: note: in expansion of macro 'blosc_timestamp_t'
 BLOSC_EXPORT double blosc_elapsed_secs(blosc_timestamp_t start_time,
                                        ^~~~~~~~~~~~~~~~~
In file included from /data/workspace/c-blosc2/blosc/blosc2.c:51:
/data/workspace/c-blosc2/internal-complibs/miniz-2.0.8/miniz.c:3031:9: note: #pragma message: Using fopen, ftello, fseeko, stat() etc. path for file I/O - this path may not support large files.
 #pragma message("Using fopen, ftello, fseeko, stat() etc. path for file I/O - this path may not support large files.")
         ^~~~~~~
/data/workspace/c-blosc2/blosc/blosc2.c: In function 'blosc_compress_context':
/data/workspace/c-blosc2/blosc/blosc2.c:1592:23: warning: passing argument 1 of 'blosc_set_timestamp' from incompatible pointer type [-Wincompatible-pointer-types]
   blosc_set_timestamp(&last);
                       ^~~~~
In file included from /data/workspace/c-blosc2/blosc/blosc2.c:18:
/data/workspace/c-blosc2/blosc/blosc2.h:1168:58: note: expected 'struct timespec *' but argument is of type 'struct timespec *'
 BLOSC_EXPORT void blosc_set_timestamp(blosc_timestamp_t* timestamp);
                                                          ^
/data/workspace/c-blosc2/blosc/blosc2.c:1644:25: warning: passing argument 1 of 'blosc_set_timestamp' from incompatible pointer type [-Wincompatible-pointer-types]
     blosc_set_timestamp(&current);
                         ^~~~~~~~
In file included from /data/workspace/c-blosc2/blosc/blosc2.c:18:
/data/workspace/c-blosc2/blosc/blosc2.h:1168:58: note: expected 'struct timespec *' but argument is of type 'struct timespec *'
 BLOSC_EXPORT void blosc_set_timestamp(blosc_timestamp_t* timestamp);
                                                          ^
/data/workspace/c-blosc2/blosc/blosc2.c:1645:39: error: type of formal parameter 1 is incomplete
     double ctime = blosc_elapsed_secs(last, current);
                                       ^~~~
/data/workspace/c-blosc2/blosc/blosc2.c:1645:45: error: type of formal parameter 2 is incomplete
     double ctime = blosc_elapsed_secs(last, current);
                                             ^~~~~~~
/data/workspace/c-blosc2/blosc/blosc2.c: In function 'blosc_get_complib_info':
/data/workspace/c-blosc2/blosc/blosc2.c:2592:14: warning: implicit declaration of function 'strdup'; did you mean 'strcmp'? [-Wimplicit-function-declaration]
   *complib = strdup(clibname);
              ^~~~~~
              strcmp
/data/workspace/c-blosc2/blosc/blosc2.c:2592:12: warning: assignment to 'char *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
   *complib = strdup(clibname);
            ^
/data/workspace/c-blosc2/blosc/blosc2.c:2593:12: warning: assignment to 'char *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
   *version = strdup(clibversion);
            ^
make[2]: *** [blosc/CMakeFiles/blosc_shared_testing.dir/build.make:63: blosc/CMakeFiles/blosc_shared_testing.dir/blosc2.c.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:127: blosc/CMakeFiles/blosc_shared_testing.dir/all] Error 2
make: *** [Makefile:163: all] Error 2
