
python-blosc's Introduction

Python-Blosc

A Python wrapper for the extremely fast Blosc compression library

Author

The Blosc development team

Contact

[email protected]

Github

https://github.com/Blosc/python-blosc

URL

https://www.blosc.org/python-blosc/python-blosc.html


Code of Conduct

Contributor Covenant

What it is

Blosc (https://blosc.org) is a high-performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() call.

Blosc works well for compressing numerical arrays that contain data with relatively low entropy, like sparse data, time series, grids with regularly spaced values, etc.

python-blosc is a Python package that wraps Blosc. python-blosc supports Python 3.8 and higher.

Installing

Blosc now offers Python wheels for the major operating systems (Windows, macOS and Linux) and platforms. You can install binary packages from PyPI using pip:

$ pip install blosc

Documentation

The Sphinx-based documentation is here:

https://blosc.org/python-blosc/python-blosc.html

Also, some examples are available on the python-blosc wiki page:

https://github.com/blosc/python-blosc/wiki

Lastly, here are the recording and the slides from the talk "Compress me stupid" at EuroPython 2014.

Building

If you need more control, there are different ways to compile python-blosc, depending on whether you want to link against an already installed Blosc library or not.

Installing via setuptools

python-blosc ships with the Blosc sources and can be built with:

$ python -m pip install -r requirements-dev.txt
$ python setup.py build_ext --inplace

Any codec can be enabled (=1) or disabled (=0) on this build path with the appropriate environment variables INCLUDE_LZ4, INCLUDE_SNAPPY, INCLUDE_ZLIB, and INCLUDE_ZSTD. By default, all the codecs in Blosc are enabled except Snappy (due to some issues compiling its C++ with the gcc toolchain).

Compiler-specific optimisations are enabled automatically by inspecting the CPU flags when building Blosc. They can be disabled manually by setting the environment variables DISABLE_BLOSC_SSE2 and DISABLE_BLOSC_AVX2.
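
For example, a from-source build that drops the Zstd codec and skips the AVX2 code paths could look like this (a sketch only; the variable names are the ones documented above, and the values chosen are illustrative):

```shell
# disable one codec and one SIMD optimisation, then rebuild in place
export INCLUDE_ZSTD=0
export DISABLE_BLOSC_AVX2=1
python -m pip install -r requirements-dev.txt
python setup.py build_ext --inplace
```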

setuptools is limited to using the compiler specified in the environment variable CC, which on POSIX systems is usually gcc. This often causes trouble with the Snappy codec, which is written in C++, and as a result Snappy is no longer compiled by default. This problem is not known to affect MSVC or clang. Snappy is considered optional in Blosc, as its compression performance is below that of the other codecs.

That's all. You can now proceed to the Testing section.

Compiling with an installed Blosc library

This approach uses pre-built, fully optimized versions of Blosc built via CMake.

Go to https://github.com/Blosc/c-blosc/releases and download and install the C-Blosc library. Then you can tell python-blosc where the C-Blosc library is in a couple of ways:

Using an environment variable:

$ export USE_SYSTEM_BLOSC=1                 # or "set USE_SYSTEM_BLOSC=1" on Windows
$ export Blosc_ROOT=/usr/local/customprefix # If you installed Blosc into a custom location
$ python setup.py build_ext --inplace

Using flags:

$ python setup.py build_ext --inplace -DUSE_SYSTEM_BLOSC:BOOL=YES -DBlosc_ROOT:PATH=/usr/local/customprefix

Testing

After compiling, you can quickly check that the package is sane by running the doctests in blosc/test.py:

$ python -m blosc.test  (add -v for verbose mode)

Once installed, you can re-run the tests at any time with:

$ python -c "import blosc; blosc.test()"

Benchmarking

If curious, you may want to run a small benchmark that compares a plain NumPy array copy against compression through different compressors in your Blosc build:

$ PYTHONPATH=. python bench/compress_ptr.py

Just to whet your appetite, here are the results for an Intel Xeon E5-2695 v3 @ 2.30GHz, running Python 3.5, CentOS 7, but YMMV (and will vary!):

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
python-blosc version: 1.5.1.dev0
Blosc version: 1.11.2 ($Date:: 2017-01-27 #$)
Compressors available: ['blosclz', 'lz4', 'lz4hc', 'snappy', 'zlib', 'zstd']
Compressor library versions:
  BloscLZ: 1.0.5
  LZ4: 1.7.5
  Snappy: 1.1.1
  Zlib: 1.2.7
  Zstd: 1.1.2
Python version: 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
Platform: Linux-3.10.0-327.18.2.el7.x86_64-x86_64 (#1 SMP Thu May 12 11:03:55 UTC 2016)
Linux dist: CentOS Linux 7.2.1511
Processor: x86_64
Byte-ordering: little
Detected cores: 56
Number of threads to use by default: 4
  -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Creating NumPy arrays with 10**8 int64/float64 elements:
  *** ctypes.memmove() *** Time for memcpy(): 0.276 s (2.70 GB/s)

Times for compressing/decompressing with clevel=5 and 24 threads

*** the arange linear distribution ***
  *** blosclz , noshuffle  ***  0.382 s (1.95 GB/s) / 0.300 s (2.48 GB/s) Compr. ratio:   1.0x
  *** blosclz , shuffle    ***  0.042 s (17.77 GB/s) / 0.027 s (27.18 GB/s)   Compr. ratio:  57.1x
  *** blosclz , bitshuffle ***  0.094 s (7.94 GB/s) / 0.041 s (18.28 GB/s)    Compr. ratio:  74.0x
  *** lz4     , noshuffle  ***  0.156 s (4.79 GB/s) / 0.052 s (14.30 GB/s)    Compr. ratio:   2.0x
  *** lz4     , shuffle    ***  0.033 s (22.58 GB/s) / 0.034 s (22.03 GB/s)   Compr. ratio:  68.6x
  *** lz4     , bitshuffle ***  0.059 s (12.63 GB/s) / 0.053 s (14.18 GB/s)   Compr. ratio:  33.1x
  *** lz4hc   , noshuffle  ***  0.443 s (1.68 GB/s) / 0.070 s (10.62 GB/s)    Compr. ratio:   2.0x
  *** lz4hc   , shuffle    ***  0.102 s (7.31 GB/s) / 0.029 s (25.42 GB/s)    Compr. ratio:  97.5x
  *** lz4hc   , bitshuffle ***  0.206 s (3.62 GB/s) / 0.038 s (19.85 GB/s)    Compr. ratio: 180.5x
  *** snappy  , noshuffle  ***  0.154 s (4.84 GB/s) / 0.056 s (13.28 GB/s)    Compr. ratio:   2.0x
  *** snappy  , shuffle    ***  0.044 s (16.89 GB/s) / 0.047 s (15.95 GB/s)   Compr. ratio:  17.4x
  *** snappy  , bitshuffle ***  0.064 s (11.58 GB/s) / 0.061 s (12.26 GB/s)   Compr. ratio:  18.2x
  *** zlib    , noshuffle  ***  1.172 s (0.64 GB/s) / 0.135 s (5.50 GB/s) Compr. ratio:   5.3x
  *** zlib    , shuffle    ***  0.260 s (2.86 GB/s) / 0.086 s (8.67 GB/s) Compr. ratio: 120.8x
  *** zlib    , bitshuffle ***  0.262 s (2.84 GB/s) / 0.094 s (7.96 GB/s) Compr. ratio: 260.1x
  *** zstd    , noshuffle  ***  0.973 s (0.77 GB/s) / 0.093 s (8.00 GB/s) Compr. ratio:   7.8x
  *** zstd    , shuffle    ***  0.093 s (7.97 GB/s) / 0.023 s (32.71 GB/s)    Compr. ratio: 156.7x
  *** zstd    , bitshuffle ***  0.115 s (6.46 GB/s) / 0.029 s (25.60 GB/s)    Compr. ratio: 320.6x

*** the linspace linear distribution ***
  *** blosclz , noshuffle  ***  0.341 s (2.19 GB/s) / 0.291 s (2.56 GB/s) Compr. ratio:   1.0x
  *** blosclz , shuffle    ***  0.132 s (5.65 GB/s) / 0.023 s (33.10 GB/s)    Compr. ratio:   2.0x
  *** blosclz , bitshuffle ***  0.166 s (4.50 GB/s) / 0.036 s (20.89 GB/s)    Compr. ratio:   2.8x
  *** lz4     , noshuffle  ***  0.142 s (5.26 GB/s) / 0.028 s (27.07 GB/s)    Compr. ratio:   1.0x
  *** lz4     , shuffle    ***  0.093 s (8.01 GB/s) / 0.030 s (24.87 GB/s)    Compr. ratio:   3.4x
  *** lz4     , bitshuffle ***  0.102 s (7.31 GB/s) / 0.039 s (19.13 GB/s)    Compr. ratio:   5.3x
  *** lz4hc   , noshuffle  ***  0.700 s (1.06 GB/s) / 0.044 s (16.77 GB/s)    Compr. ratio:   1.1x
  *** lz4hc   , shuffle    ***  0.203 s (3.67 GB/s) / 0.021 s (36.22 GB/s)    Compr. ratio:   8.6x
  *** lz4hc   , bitshuffle ***  0.342 s (2.18 GB/s) / 0.028 s (26.50 GB/s)    Compr. ratio:  14.2x
  *** snappy  , noshuffle  ***  0.271 s (2.75 GB/s) / 0.274 s (2.72 GB/s) Compr. ratio:   1.0x
  *** snappy  , shuffle    ***  0.099 s (7.54 GB/s) / 0.042 s (17.55 GB/s)    Compr. ratio:   4.2x
  *** snappy  , bitshuffle ***  0.127 s (5.86 GB/s) / 0.043 s (17.20 GB/s)    Compr. ratio:   6.1x
  *** zlib    , noshuffle  ***  1.525 s (0.49 GB/s) / 0.158 s (4.70 GB/s) Compr. ratio:   1.6x
  *** zlib    , shuffle    ***  0.346 s (2.15 GB/s) / 0.098 s (7.59 GB/s) Compr. ratio:  10.7x
  *** zlib    , bitshuffle ***  0.420 s (1.78 GB/s) / 0.104 s (7.20 GB/s) Compr. ratio:  18.0x
  *** zstd    , noshuffle  ***  1.061 s (0.70 GB/s) / 0.096 s (7.79 GB/s) Compr. ratio:   1.9x
  *** zstd    , shuffle    ***  0.203 s (3.68 GB/s) / 0.052 s (14.21 GB/s)    Compr. ratio:  14.2x
  *** zstd    , bitshuffle ***  0.251 s (2.97 GB/s) / 0.047 s (15.84 GB/s)    Compr. ratio:  22.2x

*** the random distribution ***
  *** blosclz , noshuffle  ***  0.340 s (2.19 GB/s) / 0.285 s (2.61 GB/s) Compr. ratio:   1.0x
  *** blosclz , shuffle    ***  0.091 s (8.21 GB/s) / 0.017 s (44.29 GB/s)    Compr. ratio:   3.9x
  *** blosclz , bitshuffle ***  0.080 s (9.27 GB/s) / 0.029 s (26.12 GB/s)    Compr. ratio:   6.1x
  *** lz4     , noshuffle  ***  0.150 s (4.95 GB/s) / 0.027 s (28.05 GB/s)    Compr. ratio:   2.4x
  *** lz4     , shuffle    ***  0.068 s (11.02 GB/s) / 0.029 s (26.03 GB/s)   Compr. ratio:   4.5x
  *** lz4     , bitshuffle ***  0.063 s (11.87 GB/s) / 0.054 s (13.70 GB/s)   Compr. ratio:   6.2x
  *** lz4hc   , noshuffle  ***  0.645 s (1.15 GB/s) / 0.019 s (39.22 GB/s)    Compr. ratio:   3.5x
  *** lz4hc   , shuffle    ***  0.257 s (2.90 GB/s) / 0.022 s (34.62 GB/s)    Compr. ratio:   5.1x
  *** lz4hc   , bitshuffle ***  0.128 s (5.80 GB/s) / 0.029 s (25.52 GB/s)    Compr. ratio:   6.2x
  *** snappy  , noshuffle  ***  0.164 s (4.54 GB/s) / 0.048 s (15.46 GB/s)    Compr. ratio:   2.2x
  *** snappy  , shuffle    ***  0.082 s (9.09 GB/s) / 0.043 s (17.39 GB/s)    Compr. ratio:   4.3x
  *** snappy  , bitshuffle ***  0.071 s (10.48 GB/s) / 0.046 s (16.08 GB/s)   Compr. ratio:   5.0x
  *** zlib    , noshuffle  ***  1.223 s (0.61 GB/s) / 0.093 s (7.97 GB/s) Compr. ratio:   4.0x
  *** zlib    , shuffle    ***  0.636 s (1.17 GB/s) / 0.126 s (5.89 GB/s) Compr. ratio:   5.5x
  *** zlib    , bitshuffle ***  0.327 s (2.28 GB/s) / 0.109 s (6.81 GB/s) Compr. ratio:   6.2x
  *** zstd    , noshuffle  ***  1.432 s (0.52 GB/s) / 0.103 s (7.27 GB/s) Compr. ratio:   4.2x
  *** zstd    , shuffle    ***  0.388 s (1.92 GB/s) / 0.031 s (23.71 GB/s)    Compr. ratio:   5.9x
  *** zstd    , bitshuffle ***  0.127 s (5.86 GB/s) / 0.033 s (22.77 GB/s)    Compr. ratio:   6.4x

Also, Blosc works quite well on ARM processors (even without NEON support yet):

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
python-blosc version: 1.4.4
Blosc version: 1.11.2 ($Date:: 2017-01-27 #$)
Compressors available: ['blosclz', 'lz4', 'lz4hc', 'snappy', 'zlib', 'zstd']
Compressor library versions:
  BloscLZ: 1.0.5
  LZ4: 1.7.5
  Snappy: 1.1.1
  Zlib: 1.2.8
  Zstd: 1.1.2
Python version: 3.6.0 (default, Dec 31 2016, 21:20:16)
[GCC 4.9.2]
Platform: Linux-3.4.113-sun8i-armv7l (#50 SMP PREEMPT Mon Nov 14 08:41:55 CET 2016)
Linux dist: debian 9.0
Processor: not recognized
Byte-ordering: little
Detected cores: 4
Number of threads to use by default: 4
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
  *** ctypes.memmove() *** Time for memcpy():   0.015 s (93.57 MB/s)

Times for compressing/decompressing with clevel=5 and 4 threads

*** user input ***
  *** blosclz , noshuffle  ***  0.015 s (89.93 MB/s) / 0.010 s (138.32 MB/s)    Compr. ratio:   2.7x
  *** blosclz , shuffle    ***  0.023 s (60.25 MB/s) / 0.012 s (112.71 MB/s)    Compr. ratio:   2.3x
  *** blosclz , bitshuffle ***  0.018 s (77.63 MB/s) / 0.021 s (66.76 MB/s)     Compr. ratio:   7.3x
  *** lz4     , noshuffle  ***  0.008 s (177.14 MB/s) / 0.009 s (159.00 MB/s)   Compr. ratio:   3.6x
  *** lz4     , shuffle    ***  0.010 s (131.29 MB/s) / 0.012 s (117.69 MB/s)   Compr. ratio:   3.5x
  *** lz4     , bitshuffle ***  0.015 s (89.97 MB/s) / 0.022 s (63.62 MB/s)     Compr. ratio:   8.4x
  *** lz4hc   , noshuffle  ***  0.071 s (19.30 MB/s) / 0.007 s (186.64 MB/s)    Compr. ratio:   8.6x
  *** lz4hc   , shuffle    ***  0.079 s (17.30 MB/s) / 0.014 s (95.99 MB/s)     Compr. ratio:   6.2x
  *** lz4hc   , bitshuffle ***  0.062 s (22.23 MB/s) / 0.027 s (51.53 MB/s)     Compr. ratio:   9.7x
  *** snappy  , noshuffle  ***  0.008 s (173.87 MB/s) / 0.009 s (148.77 MB/s)   Compr. ratio:   4.4x
  *** snappy  , shuffle    ***  0.011 s (123.22 MB/s) / 0.016 s (85.16 MB/s)    Compr. ratio:   4.4x
  *** snappy  , bitshuffle ***  0.015 s (89.02 MB/s) / 0.021 s (64.87 MB/s)     Compr. ratio:   6.2x
  *** zlib    , noshuffle  ***  0.047 s (29.26 MB/s) / 0.011 s (121.83 MB/s)    Compr. ratio:  14.7x
  *** zlib    , shuffle    ***  0.080 s (17.20 MB/s) / 0.022 s (63.61 MB/s)     Compr. ratio:   9.4x
  *** zlib    , bitshuffle ***  0.059 s (23.50 MB/s) / 0.033 s (41.10 MB/s)     Compr. ratio:  10.5x
  *** zstd    , noshuffle  ***  0.113 s (12.21 MB/s) / 0.011 s (124.64 MB/s)    Compr. ratio:  15.6x
  *** zstd    , shuffle    ***  0.154 s (8.92 MB/s) / 0.026 s (52.56 MB/s)      Compr. ratio:   9.9x
  *** zstd    , bitshuffle ***  0.116 s (11.86 MB/s) / 0.036 s (38.40 MB/s)     Compr. ratio:  11.4x

For details on the ARM benchmark see: #105

In case you find your own results interesting, please report them back to the authors!

License

The software is licensed under a 3-Clause BSD license. A copy of the python-blosc license can be found in LICENSE.txt.

Mailing list

Discussion about this module is welcome in the Blosc list:

[email protected]

https://groups.google.com/g/blosc


Enjoy data!

python-blosc's People

Contributors

albertosm27, bnavigator, cgohlke, datapythonista, dimitripapadopoulos, esc, francescalted, francescelies, gcmalloc, graingert, hmaarrfk, itdaniher, juanmaree, keszybz, lbolla, lgarrison, manuel-castro, martaiborra, ndevenish, newt0311, odidev, oscargm98, piskvorky, robbmcleod, sdvillal, simnyatsanga, small-mallet, tacaswell, thewtex, tirkarthi


python-blosc's Issues

add docs for set_blocksize for release

Hi, great stuff you have done in blosc, thanks! :)

I am trying it for https://attic-backup.org/ and, due to the way attic works, it usually processes ~64 KB chunks of data. I first had a bit of trouble getting blosc to run in parallel until I found set_blocksize.

So I'd like to suggest that you add some docs for it and not just write "experts only" - maxing out speed is your whole point, right? So blosc should not just use one thread.

I currently have set the blocksize to 8192, assuming that a 64 KB chunk size divided by maybe 8 cores would give 8 KB blocks. Is this correct?

Also, I'd like to suggest doing a new release on PyPI - having "dev" packages as a dependency is a bit ugly.

support aarch64?

pip install blosc on aarch64 / python3.5.2 errors out due to cpuinfo.py not having a match clause for aarch64.

This is solvable by adding the following three lines:

+       elif re.match('^aarch64$', raw_arch_string):
+               arch = 'AARCH_64'
+               bits = 64

Cheers!
Ian

binary wheels

We are stumbling upon the missing Python.h issue again and again. Maybe it is time to put some effort into binary wheels. Though I must confess I have absolutely no idea about these.

segfault when typesize is zero.

In [1]: import blosc
In [2]: s = 'abc'
In [3]: blosc.compress(s, typesize=0)
[3]    12228 floating point exception (core dumped)  ipython

blosc_extension.error: Error -1 while compressing data (Issue probably coming from Shuffle filter)

When using this code (with blosc-1.2.1.win-amd64-py2.7.exe, Python 2.7 64-bit, Windows 7 x64):

import numpy as np
import blosc

with open('blah.bin','rb') as f:
    w = np.fromstring(f.read(), dtype=np.int16)
print w    # w is a regular, standard, numpy array

z = blosc.pack_array(w, cname='lz4')   # here we have a bug

then we get this crash:
(Please download the blah.bin file here : https://dl.dropboxusercontent.com/u/83031018/blah.bin)

[   0    0    2 ..., -872 -258 -599]
Traceback (most recent call last):
  File "D:\Documents\projects\coding\python\vrac\compression\blosc_bug.py", line 7, in <module>
    z = blosc.pack_array(w, cname='lz4')
  File "C:\Python27-64\lib\site-packages\blosc\toplevel.py", line 579, in pack_array
    packed_array = compress(pickled_array, itemsize, clevel, shuffle, cname)
  File "C:\Python27-64\lib\site-packages\blosc\toplevel.py", line 309, in compress
    return _ext.compress(bytesobj, typesize, clevel, shuffle, cname)
blosc_extension.error: Error -1 while compressing data

Important note: when shuffle=False, there is no error anymore. So the issue probably comes from the shuffle filter.

Compress does not work as it should under Mac OSX

The test suite reveals that some dataset configurations do not work well on Mac OSX. Here is an example:

import array
import blosc
a = array.array('i', xrange(1000)).tostring()
print blosc.set_nthreads(1)  # disable multithreading, just in case
ac = blosc.compress(a, 4, 9, True)
print len(ac)

Here is the result on Mac OSX (incorrect):

2
4016

And here on Linux (which is correct):

6
365

nbytes in PyBlosc_compress should be size_t

nbytes in PyBlosc_compress should be of type size_t (or perhaps Py_ssize_t better) and not int so as to match the signature of PyString_FromStringAndSize() and blosc_compress. PyBlosc_decompress should be consistent with this too.

can't build master on aarch64

see pr #135

ubuntu 17.10, python 3.6

Linux rock64 4.4.77-rk3328 #29 SMP Mon Nov 20 03:26:28 CET 2017 aarch64 aarch64 aarch64 GNU/Linux

note some tests fail, I'll open another issue with details.

refactor type and value checking

The compress* and decompress* functions do a fair bit of type and value checking, and some of the code is copied and pasted. We should refactor this into functions (or decorators) and then just call those, to avoid the duplication.
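
Something along these lines, say (a hypothetical sketch with a made-up bound, not the actual python-blosc helpers):

```python
import functools

MAX_TYPESIZE = 255  # hypothetical bound, for illustration only

def validate_typesize(func):
    """Centralize the typesize check shared by the compress*/decompress* family."""
    @functools.wraps(func)
    def wrapper(data, typesize, *args, **kwargs):
        if not 1 <= typesize <= MAX_TYPESIZE:
            raise ValueError("typesize must be between 1 and %d" % MAX_TYPESIZE)
        return func(data, typesize, *args, **kwargs)
    return wrapper

@validate_typesize
def compress(data, typesize):
    return data  # stand-in for the real compressor
```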

Maintain conda packages on conda forge

Many people in the community have switched to building packages automatically on conda-forge. The getting started procedure is relatively straightforward if you already have a conda recipe.

Having up-to-date conda packages in a community maintained channel would make it easier for downstream libraries (like Dask) to rely on Blosc more heavily.

This would be a nice task for anyone who wants to help out Blosc and get involved in community package management. A deep knowledge of the blosc codebase is not necessary.

Is it possible to remove the MAX_BUFFERSIZE restriction?

I just ran into this issue:

C:\Miniconda3\lib\site-packages\blosc\toplevel.py in compress(bytesobj, typesize, clevel, shuffle, cname)
    359     """
    360 
--> 361     _check_input_length('bytesobj', len(bytesobj))
    362     _check_typesize(typesize)
    363     _check_shuffle(shuffle)

C:\Miniconda3\lib\site-packages\blosc\toplevel.py in _check_input_length(input_name, input_len)
    299     if input_len > blosc.MAX_BUFFERSIZE:
    300         raise ValueError("%s cannot be larger than %d bytes" %
--> 301                          (input_name, blosc.MAX_BUFFERSIZE))
    302 
    303 

ValueError: bytesobj cannot be larger than 2147483631 bytes

...so it looks like you're currently restricted to sizeof(int32) or ~2GB. Is there any way around this restriction?

Error when building 1.2.7 on python 3.5.1 (Anaconda 2.5, Windows)

Hello,

See this for more info: Blosc/bloscpack#46

@esc Suggested I post a bug here instead...

I have Anaconda 2.5 (Python 3.5.1) with Visual Studio community edition. I can compile and install 1.2.8 (or 1.2.9dev0 from github) with no problems. With 1.2.7, pip install blosc==1.2.7 gives me all this nastiness....

Both 1.2.8 and 1.2.9dev0 seem to build just fine. (I'm using these while playing with castra).

uild\temp.win-amd64-3.5\Release\c-blosc/internal-complibs\zlib-1.2.8\inflate.obj
build\temp.win-amd64-3.5\Release\c-blosc/internal-complibs\zlib-1.2.8\inftrees.
obj build\temp.win-amd64-3.5\Release\c-blosc/internal-complibs\zlib-1.2.8\trees.
obj build\temp.win-amd64-3.5\Release\c-blosc/internal-complibs\zlib-1.2.8\uncomp
r.obj build\temp.win-amd64-3.5\Release\c-blosc/internal-complibs\zlib-1.2.8\zuti
l.obj /OUT:build\lib.win-amd64-3.5\blosc\blosc_extension.cp35-win_amd64.pyd /IMP
LIB:build\temp.win-amd64-3.5\Release\blosc\blosc_extension.cp35-win_amd64.lib
blosc_extension.obj : warning LNK4197: export 'PyInit_blosc_extension' speci
fied multiple times; using first specification
Creating library build\temp.win-amd64-3.5\Release\blosc\blosc_extension.c
p35-win_amd64.lib and object build\temp.win-amd64-3.5\Release\blosc\blosc_extens
ion.cp35-win_amd64.exp
shuffle.obj : error LNK2001: unresolved external symbol blosc_get_cpu_featur
es
build\lib.win-amd64-3.5\blosc\blosc_extension.cp35-win_amd64.pyd : fatal err
or LNK1120: 1 unresolved externals
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\B
IN\amd64\link.exe' failed with exit status 1120

----------------------------------------

Command "C:\Anaconda3\python.exe -u -c "import setuptools, tokenize;file='C:
\Users\PQUACK1\AppData\Local\Temp\pip-build-7u774njl\blosc\setup.py';ex
ec(compile(getattr(tokenize, 'open', open)(file).read().replace('\r\n', '\n'
), file, 'exec'))" install --record C:\Users\QUACK
1\AppData\Local\Temp\pip
-rbdprlu1-record\install-record.txt --single-version-externally-managed --compil
e" failed with error code 1 in C:\Users\QUACK~1\AppData\Local\Temp\pip-build-7u
774njl\blosc\

[Anaconda3] C:\Users\pquackenbush\git>

Error encountered when updating Blosc 1.2.7

Hi,

I would like to request a Python wheel distribution or compiled binaries for this package, instead of installing from source. The following log was generated when I tried to update blosc:

c:\Python\Scripts>pip install -U blosc
Collecting blosc
  Downloading blosc-1.2.7.tar.gz (239kB)
    100% |################################| 241kB 292kB/s
Installing collected packages: blosc
  Found existing installation: blosc 1.2.5
    Uninstalling blosc-1.2.5:
      Successfully uninstalled blosc-1.2.5
  Running setup.py install for blosc
    Complete output from command c:\Python\python.exe -c "import setuptools, tokenize;__file__='c:\\users\\eep1\\appdata
\\local\\temp\\pip-build-h08iyx\\blosc\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace(
'\r\n', '\n'), __file__, 'exec'))" install --record c:\users\eep1\appdata\local\temp\pip-kup53o-record\install-record.tx
t --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build\lib.win32-2.7
    creating build\lib.win32-2.7\blosc
    copying blosc\test.py -> build\lib.win32-2.7\blosc
    copying blosc\toplevel.py -> build\lib.win32-2.7\blosc
    copying blosc\version.py -> build\lib.win32-2.7\blosc
    copying blosc\__init__.py -> build\lib.win32-2.7\blosc
    running build_ext
    building 'blosc.blosc_extension' extension
    creating build\temp.win32-2.7
    creating build\temp.win32-2.7\Release
    creating build\temp.win32-2.7\Release\blosc
    creating build\temp.win32-2.7\Release\c-blosc
    creating build\temp.win32-2.7\Release\c-blosc\blosc
    creating build\temp.win32-2.7\Release\c-blosc\internal-complibs
    creating build\temp.win32-2.7\Release\c-blosc\internal-complibs\lz4-1.6.0
    creating build\temp.win32-2.7\Release\c-blosc\internal-complibs\snappy-1.1.1
    creating build\temp.win32-2.7\Release\c-blosc\internal-complibs\zlib-1.2.8
    C:\Python\MinGW\bin\gcc.exe -mdll -O -Wall -DHAVE_LZ4=1 -DHAVE_SNAPPY=1 -DHAVE_ZLIB=1 -Ic-blosc\blosc -Ic-blosc/inte
rnal-complibs\lz4-1.6.0 -Ic-blosc/internal-complibs\snappy-1.1.1 -Ic-blosc/internal-complibs\zlib-1.2.8 -Ic:\Python\incl
ude -Ic:\Python\PC -c blosc/blosc_extension.c -o build\temp.win32-2.7\Release\blosc\blosc_extension.o
    C:\Python\MinGW\bin\gcc.exe -mdll -O -Wall -DHAVE_LZ4=1 -DHAVE_SNAPPY=1 -DHAVE_ZLIB=1 -Ic-blosc\blosc -Ic-blosc/inte
rnal-complibs\lz4-1.6.0 -Ic-blosc/internal-complibs\snappy-1.1.1 -Ic-blosc/internal-complibs\zlib-1.2.8 -Ic:\Python\incl
ude -Ic:\Python\PC -c c-blosc/blosc\blosc.c -o build\temp.win32-2.7\Release\c-blosc\blosc\blosc.o
    c-blosc/blosc\blosc.c:56:23: fatal error: pthread.h: No such file or directory
       #include <pthread.h>
                           ^
    compilation terminated.
    error: command 'C:\\Python\\MinGW\\bin\\gcc.exe' failed with exit status 1

    ----------------------------------------
    Rolling back uninstall of blosc
    Command "c:\Python\python.exe -c "import setuptools, tokenize;__file__='c:\\users\\eep1\\appdata\\local\\temp\\pip-b
uild-h08iyx\\blosc\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __fil
e__, 'exec'))" install --record c:\users\eep1\appdata\local\temp\pip-kup53o-record\install-record.txt --single-version-e
xternally-managed --compile" failed with error code 1 in c:\users\eep1\appdata\local\temp\pip-build-h08iyx\blosc

Kind regards

more extensive tests

These days I tend to do most of my testing with nosetests for discovery and test functions provided by nose.tools.

How would you prefer to add more tests? I would propose using a file test_blosc.py with simple test functions.

floating point exception while operating on empty container

e.g. when doing something like this (the second f.read() on the same open file returns an empty string):

 $> python -c 'import blosc; f=file("/home/yoh/.emacs", "rb"); print 1; blosc.compress(f.read(), typesize=8); print 2; blosc.compress(f.read(), typesize=8); print 3;'
1
2
zsh: floating point exception  python -c 

MacOS installation failure

Installing on MacOS 10.12.6 with Anaconda 5.3.0 Python3.6.1, the following error occurs:

    blosc/blosc_extension.c:594:27: error: use of undeclared identifier 'BLOSC_NOSHUFFLE'
      PyModule_AddIntMacro(m, BLOSC_NOSHUFFLE);
                              ^
    blosc/blosc_extension.c:595:27: error: use of undeclared identifier 'BLOSC_SHUFFLE'
      PyModule_AddIntMacro(m, BLOSC_SHUFFLE);
                              ^
    blosc/blosc_extension.c:596:27: error: use of undeclared identifier 'BLOSC_BITSHUFFLE'
      PyModule_AddIntMacro(m, BLOSC_BITSHUFFLE);
                              ^
    22 warnings and 3 errors generated.
    error: command 'gcc' failed with exit status 1

Release archive

Hi Francesc,

Could you please also provide tarballs somewhere for python-blosc with an exact name/version (such as python-blosc-1.1.tar.gz)?
That would be helpful for packagers. The PyPI archive is useful, but doesn't seem to match this repository exactly (it contains no README/Release-notes AFAIR).

Thanks!

support memoryviews even on bytearray?

Seems like attic (which is py3-only) would want to pass a memoryview on a bytearray (called data) to blosc's compress, but it doesn't get accepted there, so I would have to use bytes(data) to convert to an acceptable type first - and this would make a full new copy of it, right?

I tried to loosen the first type check in blosc to accept memoryviews too, but then it just fails a little later with:

    return _ext.compress(bytesobj, typesize, clevel, shuffle, cname)
TypeError: must be read-only pinned buffer, not memoryview

Seems like code like that (see HAS_NEW_BUFFER) might be useful (but please check, I am not familiar with low-level stuff):

https://github.com/dlitz/pycrypto/pull/81/files#diff-d29a5dec14d8ca1fc5d169320636fc52R631

Memory leak in blosc.decompress in current master

If I run this, I quickly get into memory trouble:

import psutil
x = 'aj lkajfldkfjaoiur 0983 5t93h308 ajlkf n fsfhahtey8 haiuoyajkah ' * 100000
counter = 0
freemem = psutil.avail_phymem() / 1024. ** 3
while freemem > 1:
    cx = blosc.compress(x, typesize=1, clevel=1, shuffle=False, cname='blosclz')
    blosc.decompress(cx)
    freemem = psutil.avail_phymem() / 1024. ** 3
    counter += 1
    if counter % 10000 == 0:
        print('%.2f GB' % freemem)

#10.56 GB
#8.66 GB
#6.75 GB
#4.84 GB
#2.91 GB
#1.01 GB

But if I run this, I don't:

import psutil
x = 'aj lkajfldkfjaoiur 0983 5t93h308 ajlkf n fsfhahtey8 haiuoyajkah ' * 100000
cx = blosc.compress(x, typesize=1, clevel=1, shuffle=False, cname='blosclz')
counter = 0
freemem = psutil.avail_phymem() / 1024. ** 3
while freemem > 1:
    blosc.decompress(cx)
    freemem = psutil.avail_phymem() / 1024. ** 3
    counter += 1
    if counter % 10000 == 0:
        print('%.2f GB' % freemem)

#12.44 GB
#12.44 GB
#12.44 GB
# ...

That makes me think that decompress is not freeing the memory used by the compressed data, and that the bug was introduced in some recent commit to python-blosc.

I get the leak with python-blosc from master, spiced with c-blosc 1.7.0 (although I think the particular version of c-blosc is not relevant here). If I run python-blosc 1.2.7 with c-blosc 1.7.0, everything works like a charm - a winning combination, because with older versions of lz4hc I get segfaults, which is why I got into all this trouble in the first place.

Corrupted data when using bitshuffle?

Perhaps I don't quite understand the typesize parameter, but would doing the following be incorrect when using blosc.compress?

I'm running this on Mac with Python 3.6.2 using the default python-blosc package from conda-forge. Here it seems like the decompress does not give me back the original string value:

$ python
Python 3.6.2 | packaged by conda-forge | (default, Jul 23 2017, 23:01:38)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> x = numpy.ones(27266, dtype='uint8')
>>> xx = x.tobytes()
>>> import blosc
>>> blosc.decompress(blosc.compress(xx, clevel=5, cname='zstd', shuffle=blosc.BITSHUFFLE))[-3:]
b'\x01  '
>>> blosc.decompress(blosc.compress(xx, clevel=5, cname='zstd', shuffle=blosc.SHUFFLE))[-3:]
b'\x01\x01\x01'
>>> blosc.decompress(blosc.compress(xx, clevel=5, cname='zstd'))[-3:]
b'\x01\x01\x01'

As you can see, with the bitshuffle filter the value from decompress seems corrupted at the very end (the last two bytes come back as something other than \x01). Do you have any idea why?

Here is my environment.yml:

name: test_blosc
channels:
- conda-forge
- defaults
dependencies:
- blas=1.1=openblas
- blosc=1.12.0=0
- ca-certificates=2017.7.27.1=0
- certifi=2017.7.27.1=py36_0
- libgfortran=3.0.0=0
- ncurses=5.9=10
- numpy=1.13.3=py36_blas_openblas_200
- openblas=0.2.19=2
- openssl=1.0.2l=0
- pip=9.0.1=py36_0
- python=3.6.2=0
- python-blosc=1.4.4=py36_0
- readline=6.2=0
- setuptools=36.6.0=py36_1
- snappy=1.1.7=1
- sqlite=3.13.0=1
- tk=8.5.19=2
- wheel=0.30.0=py_1
- xz=5.2.3=0
- zlib=1.2.11=0
prefix: /Users/wlee/miniconda3/envs/test_blosc

install issue?

I am trying to use python-blosc for hyperspy/hyperspy#1716 and after installing blosc using pip, I get the following error when importing blosc.

>>> import blosc
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/lib/python3.6/site-packages/blosc/__init__.py", line 13, in <module>
    from blosc.blosc_extension import (
ImportError: /opt/anaconda3/lib/python3.6/site-packages/blosc/blosc_extension.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_appendEPKcm

Missing a package? Failed Install

I just installed anaconda from a clean install and installed blosc using pip install blosc. I am now having this import error:

In [1]: import blosc
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
in ()
----> 1 import blosc

/usr/local/anaconda/lib/python2.7/site-packages/blosc/__init__.py in ()
11
12 # Blosc C symbols that we want to export
---> 13 from blosc.blosc_extension import (
14 BLOSC_VERSION_STRING as VERSION_STRING,
15 BLOSC_VERSION_DATE as VERSION_DATE,

ImportError: /usr/local/anaconda/lib/python2.7/site-packages/blosc/blosc_extension.so: undefined symbol: _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_appendEPKcm

I am running Linux Mint 18 Sarah 64-bit Kernel Linux 4.4.0-21-generic x86_64 MATE 1.14.1
Am I missing a package when installing from pip? Thank you.

Alternative way of managing Blosc sources

Currently, the Blosc sources in the c-blosc subdirectory are managed by manually copying them in whenever a new version of Blosc is released.

I will shortly present two PRs to demo using subtrees or submodules to manage Blosc sources.

Expose underlying compression libraries directly

Sometimes I want to use some parts of blosc but not others. In particular I use blosc to gain access to other compression libraries (like snappy) but don't want the added features of blocked parallel compression (I handle parallelism on my own.) What are your thoughts on exposing some of the internal building blocks out to the Python layer?

I would love functions like blosc.shuffle, blosc.lzo.compress, blosc.snappy.compress, etc. I would use blosc in several more applications if these were available to me directly. It would be especially useful if these did not hold on to the GIL.

building docs breaks on TypeError

Hi, recently on Debian we've got this error from building the docs:

# Sphinx version: 1.5.6
# Python version: 2.7.13 (CPython)
# Docutils version: 0.13.1 release
# Jinja2 version: 2.9.6
# Last messages:
#   loading intersphinx inventory from http://docs.python.org/objects.inv...
#   WARNING: intersphinx inventory 'http://docs.python.org/objects.inv' not fetchable due to <class 'requests.exceptions.ProxyError'>: HTTPConnectionPool(host='127.0.0.1', port=9): Max retries exceeded with url: http://docs.python.org/objects.inv (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f4cc6e66750>: Failed to establish a new connection: [Errno 111] Connection refused',)))
#   building [mo]: targets for 0 po files that are out of date
#   building [html]: targets for 5 source files that are out of date
#   updating environment:
#   5 added, 0 changed, 0 removed
#   reading sources... [ 20%] index
#   reading sources... [ 40%] install
#   reading sources... [ 60%] intro
#   reading sources... [ 80%] reference
# Loaded extensions:
#   sphinx.ext.coverage (1.5.6) from /usr/lib/python2.7/dist-packages/sphinx/ext/coverage.pyc
#   sphinx.ext.todo (1.5.6) from /usr/lib/python2.7/dist-packages/sphinx/ext/todo.pyc
#   sphinx.ext.autodoc (1.5.6) from /usr/lib/python2.7/dist-packages/sphinx/ext/autodoc.pyc
#   sphinx.ext.intersphinx (1.5.6) from /usr/lib/python2.7/dist-packages/sphinx/ext/intersphinx.pyc
#   sphinx.ext.doctest (1.5.6) from /usr/lib/python2.7/dist-packages/sphinx/ext/doctest.pyc
#   alabaster (0.7.8) from /usr/lib/python2.7/dist-packages/alabaster/__init__.pyc
#   numpydoc (unknown version) from /usr/lib/python2.7/dist-packages/numpydoc/__init__.pyc
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/sphinx/cmdline.py", line 296, in main
    app.build(opts.force_all, filenames)
  File "/usr/lib/python2.7/dist-packages/sphinx/application.py", line 333, in build
    self.builder.build_update()
  File "/usr/lib/python2.7/dist-packages/sphinx/builders/__init__.py", line 251, in build_update
    'out of date' % len(to_build))
  File "/usr/lib/python2.7/dist-packages/sphinx/builders/__init__.py", line 265, in build
    self.doctreedir, self.app))
  File "/usr/lib/python2.7/dist-packages/sphinx/environment/__init__.py", line 556, in update
    self._read_serial(docnames, app)
  File "/usr/lib/python2.7/dist-packages/sphinx/environment/__init__.py", line 576, in _read_serial
    self.read_doc(docname, app)
  File "/usr/lib/python2.7/dist-packages/sphinx/environment/__init__.py", line 684, in read_doc
    pub.publish()
  File "/usr/lib/python2.7/dist-packages/docutils/core.py", line 217, in publish
    self.settings)
  File "/usr/lib/python2.7/dist-packages/sphinx/io.py", line 55, in read
    self.parse()
  File "/usr/lib/python2.7/dist-packages/docutils/readers/__init__.py", line 78, in parse
    self.parser.parse(self.input, document)
  File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/__init__.py", line 185, in parse
    self.statemachine.run(inputlines, document, inliner=self.inliner)
  File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 171, in run
    input_source=document['source'])
  File "/usr/lib/python2.7/dist-packages/docutils/statemachine.py", line 239, in run
    context, state, transitions)
  File "/usr/lib/python2.7/dist-packages/docutils/statemachine.py", line 460, in check_line
    return method(match, context, next_state)
  File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 2983, in text
    self.section(title.lstrip(), source, style, lineno + 1, messages)
  File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 327, in section
    self.new_subsection(title, lineno, messages)
  File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 395, in new_subsection
    node=section_node, match_titles=True)
  File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 282, in nested_parse
    node=node, match_titles=match_titles)
  File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 196, in run
    results = StateMachineWS.run(self, input_lines, input_offset)
  File "/usr/lib/python2.7/dist-packages/docutils/statemachine.py", line 239, in run
    context, state, transitions)
  File "/usr/lib/python2.7/dist-packages/docutils/statemachine.py", line 460, in check_line
    return method(match, context, next_state)
  File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 2748, in underline
    self.section(title, source, style, lineno - 1, messages)
  File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 327, in section
    self.new_subsection(title, lineno, messages)
  File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 395, in new_subsection
    node=section_node, match_titles=True)
  File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 282, in nested_parse
    node=node, match_titles=match_titles)
  File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 196, in run
    results = StateMachineWS.run(self, input_lines, input_offset)
  File "/usr/lib/python2.7/dist-packages/docutils/statemachine.py", line 239, in run
    context, state, transitions)
  File "/usr/lib/python2.7/dist-packages/docutils/statemachine.py", line 460, in check_line
    return method(match, context, next_state)
  File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 2321, in explicit_markup
    nodelist, blank_finish = self.explicit_construct(match)
  File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 2333, in explicit_construct
    return method(self, expmatch)
  File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 2076, in directive
    directive_class, match, type_name, option_presets)
  File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 2125, in run_directive
    result = directive_instance.run()
  File "/usr/lib/python2.7/dist-packages/sphinx/ext/autodoc.py", line 1668, in run
    documenter.generate(more_content=self.content)
  File "/usr/lib/python2.7/dist-packages/sphinx/ext/autodoc.py", line 1000, in generate
    sig = self.format_signature()
  File "/usr/lib/python2.7/dist-packages/sphinx/ext/autodoc.py", line 654, in format_signature
    self.object, self.options, args, retann)
  File "/usr/lib/python2.7/dist-packages/sphinx/application.py", line 593, in emit_firstresult
    for result in self.emit(event, *args):
  File "/usr/lib/python2.7/dist-packages/sphinx/application.py", line 589, in emit
    results.append(callback(self, *args))
  File "/usr/lib/python2.7/dist-packages/numpydoc/numpydoc.py", line 119, in mangle_signature
    sig = re.sub(sixu("^[^(]*"), sixu(""), sig)
  File "/usr/lib/python2.7/re.py", line 155, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer

numpydoc is 0.7.0. Intersphinx/net connection during build is blocked by policy.
Thanks,
DS

Build failure on raspberry pi running arch

Log:

    [user@scene_pi python-blosc]$ sudo python setup.py install
    running install
    running bdist_egg
    running egg_info
    writing dependency_links to blosc.egg-info/dependency_links.txt
    writing top-level names to blosc.egg-info/top_level.txt
    writing blosc.egg-info/PKG-INFO
    reading manifest file 'blosc.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    warning: no files found matching '*.txt'
    warning: no files found matching '*.cpp' under directory 'c-blosc'
    warning: no files found matching '*.hpp' under directory 'c-blosc'
    writing manifest file 'blosc.egg-info/SOURCES.txt'
    installing library code to build/bdist.linux-armv7l/egg
    running install_lib
    running build_py
    creating build
    creating build/lib.linux-armv7l-3.4
    creating build/lib.linux-armv7l-3.4/blosc
    copying blosc/test.py -> build/lib.linux-armv7l-3.4/blosc
    copying blosc/__init__.py -> build/lib.linux-armv7l-3.4/blosc
    copying blosc/version.py -> build/lib.linux-armv7l-3.4/blosc
    copying blosc/toplevel.py -> build/lib.linux-armv7l-3.4/blosc
    running build_ext
    building 'blosc.blosc_extension' extension
    creating build/temp.linux-armv7l-3.4
    creating build/temp.linux-armv7l-3.4/blosc
    creating build/temp.linux-armv7l-3.4/c-blosc
    creating build/temp.linux-armv7l-3.4/c-blosc/blosc
    creating build/temp.linux-armv7l-3.4/c-blosc/internal-complibs
    creating build/temp.linux-armv7l-3.4/c-blosc/internal-complibs/lz4-1.7.0
    creating build/temp.linux-armv7l-3.4/c-blosc/internal-complibs/snappy-1.1.1
    creating build/temp.linux-armv7l-3.4/c-blosc/internal-complibs/zlib-1.2.8
    gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -fPIC -DHAVE_LZ4=1 -DHAVE_SNAPPY=1 -DHAVE_ZLIB=1 -Ic-blosc/blosc -Ic-blosc/internal-complibs/lz4-1.7.0 -Ic-blosc/internal-complibs/snappy-1.1.1 -Ic-blosc/internal-complibs/zlib-1.2.8 -I/usr/include/python3.4m -c blosc/blosc_extension.c -o build/temp.linux-armv7l-3.4/blosc/blosc_extension.o
    gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -fPIC -DHAVE_LZ4=1 -DHAVE_SNAPPY=1 -DHAVE_ZLIB=1 -Ic-blosc/blosc -Ic-blosc/internal-complibs/lz4-1.7.0 -Ic-blosc/internal-complibs/snappy-1.1.1 -Ic-blosc/internal-complibs/zlib-1.2.8 -I/usr/include/python3.4m -c c-blosc/blosc/blosc.c -o build/temp.linux-armv7l-3.4/c-blosc/blosc/blosc.o
    c-blosc/blosc/blosc.c: In function 'blosc_getitem':
    c-blosc/blosc/blosc.c:1275:7: warning: unused variable 'tmp_init' [-Wunused-variable]
       int tmp_init = 0;
           ^
    gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -fPIC -DHAVE_LZ4=1 -DHAVE_SNAPPY=1 -DHAVE_ZLIB=1 -Ic-blosc/blosc -Ic-blosc/internal-complibs/lz4-1.7.0 -Ic-blosc/internal-complibs/snappy-1.1.1 -Ic-blosc/internal-complibs/zlib-1.2.8 -I/usr/include/python3.4m -c c-blosc/blosc/shuffle-sse2.c -o build/temp.linux-armv7l-3.4/c-blosc/blosc/shuffle-sse2.o
    c-blosc/blosc/shuffle-sse2.c:14:4: error: #error SSE2 is not supported by the target architecture/platform and/or this compiler.
       #error SSE2 is not supported by the target architecture/platform and/or this compiler.
        ^
    c-blosc/blosc/shuffle-sse2.c:17:23: fatal error: emmintrin.h: No such file or directory
    compilation terminated.
    error: command 'gcc' failed with exit status 1

Use memoryview.itemsize if no default provided

If I give blosc.compress a memoryview object it would be possible to learn the typesize automatically.

In [1]: import numpy as np

In [2]: x = np.ones(5, dtype='i4')

In [3]: x.data
Out[3]: <memory at 0x7fe710bf0408>

In [4]: x.data.itemsize
Out[4]: 4

In [5]: import blosc

In [6]: blosc.compress(x.data, typesize=x.data.itemsize)
Out[6]: b'\x02\x01\x13\x04\x14\x00\x00\x00\x14\x00\x00\x00$\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00'

Sporadic segfaults with GCC on Ubuntu Linux

When using GCC (tested with 4.9.3 and 5.2.1) on an Ubuntu 15.10 box, one can sporadically but consistently get segfaults when exercising the test suite enough times:

$ for i in {1..10}; do nosetests --with-doctest blosc; done
........................
----------------------------------------------------------------------
Ran 24 tests in 5.054s

OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.368s

OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.122s

OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.184s

OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.123s

OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.753s

OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.343s

OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.133s

OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.487s

OK
Segmentation fault (core dumped)

I cannot get any segfault when using clang (tested with 3.6 and 3.7). Testing on a Mac OSX box does not show any problem either (this is expected, since Xcode ships clang/LLVM).

A detailed investigation using valgrind does not show anything too evident, except things like:

test_no_leaks (blosc.test.TestCodec) ... ==5330== Invalid read of size 4
==5330==    at 0x4ECEF73: PyObject_Free (obmalloc.c:1013)
==5330==    by 0x4EE2A72: tupledealloc (tupleobject.c:235)
==5330==    by 0x4F327C6: ext_do_call (ceval.c:4665)
==5330==    by 0x4F327C6: PyEval_EvalFrameEx (ceval.c:3026)
==5330==    by 0x4F35A2D: PyEval_EvalCodeEx (ceval.c:3582)
==5330==    by 0x4F34A54: fast_function (ceval.c:4446)
==5330==    by 0x4F34A54: call_function (ceval.c:4371)
==5330==    by 0x4F34A54: PyEval_EvalFrameEx (ceval.c:2987)
==5330==    by 0x4F35A2D: PyEval_EvalCodeEx (ceval.c:3582)
==5330==    by 0x4EB14A7: function_call (funcobject.c:526)
==5330==    by 0x4E81D22: PyObject_Call (abstract.c:2546)
==5330==    by 0x4F32796: ext_do_call (ceval.c:4663)
==5330==    by 0x4F32796: PyEval_EvalFrameEx (ceval.c:3026)
==5330==    by 0x4F35A2D: PyEval_EvalCodeEx (ceval.c:3582)
==5330==    by 0x4EB13A0: function_call (funcobject.c:526)
==5330==    by 0x4E81D22: PyObject_Call (abstract.c:2546)
==5330==  Address 0x428b9020 is 32 bytes before a block of size 80,002,976 in arena "client"

so perhaps there is a problem with reference counting but I am not sure if this is a red herring.

Anyway, as GCC is a very important compiler this ticket has high priority.

blosc_compress_ctx accepts compressor while blosc_compress does not

While having a look at a test failing on bloscpack, I saw the following:

blosc_compress_ctx accepts a compressor string passed as a parameter, while blosc_compress has no such parameter, so I believe the compressor is ignored there and the value from g_global_context is used instead.

On one hand, when releasing the GIL the compressor string is passed, but on the other hand blosc_compress uses the global context (see the relevant lines in blosc_extension.c). Is this intentional, or am I missing something?

fix prefix in doctests

All doctests should use the blosc prefix:

>>> blosc.compress(...)
>>> blosc.decompress(...)

Release GIL

I would like to use single-threaded blosc.compress/decompress in many threads in parallel. My understanding is that this is possible in the C layer by creating many contexts but not currently possible in the Python layer.

Ambiguities created by the new test runner.

The heuristics of nosetests cause the test() function, which should only run the tests, to be misdetected as a real test. Hence, when using nosetests the tests are actually run twice:

1 zsh» PYTHONPATH=. nosetests 
........test_basic_codec (blosc.test.TestCodec) ... ok
test_compress_exceptions (blosc.test.TestCodec) ... ok
test_compress_ptr_exceptions (blosc.test.TestCodec) ... ok
test_decompress_exceptions (blosc.test.TestCodec) ... ok
test_decompress_ptr_exceptions (blosc.test.TestCodec) ... ok
test_pack_array_exceptions (blosc.test.TestCodec) ... ok
test_set_nthreads_exceptions (blosc.test.TestCodec) ... ok
test_unpack_array_exceptions (blosc.test.TestCodec) ... ok
compress (blosc.toplevel)
Doctest: blosc.toplevel.compress ... ok
compress_ptr (blosc.toplevel)
Doctest: blosc.toplevel.compress_ptr ... ok
decompress (blosc.toplevel)
Doctest: blosc.toplevel.decompress ... ok
decompress_ptr (blosc.toplevel)
Doctest: blosc.toplevel.decompress_ptr ... ok
free_resources (blosc.toplevel)
Doctest: blosc.toplevel.free_resources ... ok
pack_array (blosc.toplevel)
Doctest: blosc.toplevel.pack_array ... ok
set_nthreads (blosc.toplevel)
Doctest: blosc.toplevel.set_nthreads ... ok
unpack_array (blosc.toplevel)
Doctest: blosc.toplevel.unpack_array ... ok

----------------------------------------------------------------------
Ran 16 tests in 5.389s

OK
.
----------------------------------------------------------------------
Ran 9 tests in 9.674s

OK
PYTHONPATH=. nosetests  5.09s user 4.66s system 94% cpu 10.267 total
