blosc / python-blosc

A Python wrapper for the extremely fast Blosc compression library

Home Page: https://www.blosc.org/python-blosc/python-blosc.html
License: Other
The subtree merge script is broken right now
Hi,
the title of this repo links to http://blosc.pydata.org, but this just returns a 404. I guess it should link to http://www.blosc.org/ instead.
Cheers,
Arne
The compress* and decompress* functions do a fair bit of type and value checking, and some of that code is copied and pasted. We should refactor it into shared functions (or decorators) and simply call those to avoid the duplication.
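One way the duplicated checks could be factored out, sketched here with hypothetical names (check_typesize, validated, and the placeholder compress body are inventions for illustration; the real signatures differ):

```python
import functools

def check_typesize(typesize):
    # shared value check, written once instead of copy-pasted per function
    if not 0 < typesize <= 255:
        raise ValueError("typesize must be in 1..255, got %r" % (typesize,))

def validated(func):
    # decorator running the shared type/value checks before the real work
    @functools.wraps(func)
    def wrapper(data, typesize, *args, **kwargs):
        if not isinstance(data, bytes):
            raise TypeError("a bytes object is required")
        check_typesize(typesize)
        return func(data, typesize, *args, **kwargs)
    return wrapper

@validated
def compress(data, typesize):
    # placeholder standing in for the call into the C extension
    return data
```

The decompress* functions could then reuse the same decorator instead of repeating the checks.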
If I run this, I quickly get into memory trouble:

import blosc
import psutil

x = 'aj lkajfldkfjaoiur 0983 5t93h308 ajlkf n fsfhahtey8 haiuoyajkah ' * 100000
counter = 0
freemem = psutil.avail_phymem() / 1024. ** 3
while freemem > 1:
    cx = blosc.compress(x, typesize=1, clevel=1, shuffle=False, cname='blosclz')
    blosc.decompress(cx)
    freemem = psutil.avail_phymem() / 1024. ** 3
    counter += 1
    if counter % 10000 == 0:
        print('%.2f GB' % freemem)
#10.56 GB
#8.66 GB
#6.75 GB
#4.84 GB
#2.91 GB
#1.01 GB
But if I run this, I don't:

import blosc
import psutil

x = 'aj lkajfldkfjaoiur 0983 5t93h308 ajlkf n fsfhahtey8 haiuoyajkah ' * 100000
cx = blosc.compress(x, typesize=1, clevel=1, shuffle=False, cname='blosclz')
counter = 0
freemem = psutil.avail_phymem() / 1024. ** 3
while freemem > 1:
    blosc.decompress(cx)
    freemem = psutil.avail_phymem() / 1024. ** 3
    counter += 1
    if counter % 10000 == 0:
        print('%.2f GB' % freemem)
#12.44 GB
#12.44 GB
#12.44 GB
# ...
That makes me think that decompress is not freeing the memory used by the compressed data, and that the bug was introduced in some recent commit to python-blosc.
I get the leak with python-blosc from master combined with c-blosc 1.7.0 (although I think the particular c-blosc version is not relevant here). If I run python-blosc 1.2.7 with c-blosc 1.7.0, everything works like a charm; a winning combination, because with older versions of lz4hc I get segfaults, which is how I got into all this trouble in the first place.
In order to detect AVX2 support and activate it, it may be a good idea to use
https://github.com/FlightDataServices/PyCPUID
to detect whether the AVX2 flags can be used when compiling c-blosc directly from setuptools.
The only 'problem' I can see is that PyCPUID is LGPL-licensed, which can somewhat collide with the BSD license used for distribution purposes.
Check the types used in decompression and fix them akin to compression. See: FrancescAlted@2a852c3
pip install blosc on aarch64 / Python 3.5.2 errors out because cpuinfo.py has no match clause for aarch64. This is solvable by adding the following three lines:
+ elif re.match('^aarch64$', raw_arch_string):
+ arch = 'AARCH_64'
+ bits = 64
Cheers!
Ian
While having a look at a test failing in bloscpack, I saw the following: blosc_compress_ctx accepts a compressor string passed as a parameter, while blosc_compress has no compressor parameter, so I believe it is ignored and the value from g_global_context is used instead. On the one hand, when releasing the GIL the compressor string is passed; on the other hand, blosc_compress uses the global context (see the lines of interest in blosc_extension.c). Is this intentional, or what am I missing?
When looking at
https://github.com/FrancescAlted/python-blosc/blob/master/blosc/toplevel.py#L116
I can see that the parameters are incorrectly indented. This happens in other places too.
The heuristics of nosetests cause the test() function, which should only run the tests, to be misdetected as a real test. Hence, when using nosetests, the tests are actually run twice:
1 zsh» PYTHONPATH=. nosetests
........test_basic_codec (blosc.test.TestCodec) ... ok
test_compress_exceptions (blosc.test.TestCodec) ... ok
test_compress_ptr_exceptions (blosc.test.TestCodec) ... ok
test_decompress_exceptions (blosc.test.TestCodec) ... ok
test_decompress_ptr_exceptions (blosc.test.TestCodec) ... ok
test_pack_array_exceptions (blosc.test.TestCodec) ... ok
test_set_nthreads_exceptions (blosc.test.TestCodec) ... ok
test_unpack_array_exceptions (blosc.test.TestCodec) ... ok
compress (blosc.toplevel)
Doctest: blosc.toplevel.compress ... ok
compress_ptr (blosc.toplevel)
Doctest: blosc.toplevel.compress_ptr ... ok
decompress (blosc.toplevel)
Doctest: blosc.toplevel.decompress ... ok
decompress_ptr (blosc.toplevel)
Doctest: blosc.toplevel.decompress_ptr ... ok
free_resources (blosc.toplevel)
Doctest: blosc.toplevel.free_resources ... ok
pack_array (blosc.toplevel)
Doctest: blosc.toplevel.pack_array ... ok
set_nthreads (blosc.toplevel)
Doctest: blosc.toplevel.set_nthreads ... ok
unpack_array (blosc.toplevel)
Doctest: blosc.toplevel.unpack_array ... ok
----------------------------------------------------------------------
Ran 16 tests in 5.389s
OK
.
----------------------------------------------------------------------
Ran 9 tests in 9.674s
OK
PYTHONPATH=. nosetests 5.09s user 4.66s system 94% cpu 10.267 total
See the comments in FrancescAlted@2a852c3 for details.
In [1]: import blosc
In [2]: s = 'abc'
In [3]: blosc.compress(s, typesize=0)
[3] 12228 floating point exception (core dumped) ipython
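A Python-level guard could reject typesize=0 before the call reaches the C extension, where it apparently triggers a division by zero. A minimal sketch (safe_compress and the _ext_compress stand-in are hypothetical, not the library's API):

```python
def safe_compress(data, typesize, _ext_compress=lambda d, t: d):
    # reject a non-positive typesize up front instead of letting the C
    # code divide by it; _ext_compress stands in for the extension call
    if typesize < 1:
        raise ValueError("typesize must be >= 1, got %d" % typesize)
    return _ext_compress(data, typesize)
```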
We are stumbling on the missing Python.h issue again and again. Maybe it is time to put some effort into binary wheels, though I must confess I have absolutely no idea about these.
I would like to submit a pull request to support the c-blosc candidate branch lizard in python-blosc, found here:
https://github.com/robbmcleod/python-blosc/tree/lizard
but I need a corresponding branch to pull against.
Hello,
See this for more info: Blosc/bloscpack#46
@esc Suggested I post a bug here instead...
I have Anaconda 2.5 (Python 3.5.1) with Visual Studio Community edition. I can compile and install 1.2.8 (or 1.2.9dev0 from GitHub) with no problems. With 1.2.7, pip install blosc==1.2.7 gives me all this nastiness:
Both 1.2.8 and 1.2.9dev0 seem to build just fine. (I'm using these while playing with castra).
build\temp.win-amd64-3.5\Release\c-blosc/internal-complibs\zlib-1.2.8\inflate.obj build\temp.win-amd64-3.5\Release\c-blosc/internal-complibs\zlib-1.2.8\inftrees.obj build\temp.win-amd64-3.5\Release\c-blosc/internal-complibs\zlib-1.2.8\trees.obj build\temp.win-amd64-3.5\Release\c-blosc/internal-complibs\zlib-1.2.8\uncompr.obj build\temp.win-amd64-3.5\Release\c-blosc/internal-complibs\zlib-1.2.8\zutil.obj /OUT:build\lib.win-amd64-3.5\blosc\blosc_extension.cp35-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.5\Release\blosc\blosc_extension.cp35-win_amd64.lib
blosc_extension.obj : warning LNK4197: export 'PyInit_blosc_extension' specified multiple times; using first specification
Creating library build\temp.win-amd64-3.5\Release\blosc\blosc_extension.cp35-win_amd64.lib and object build\temp.win-amd64-3.5\Release\blosc\blosc_extension.cp35-win_amd64.exp
shuffle.obj : error LNK2001: unresolved external symbol blosc_get_cpu_features
build\lib.win-amd64-3.5\blosc\blosc_extension.cp35-win_amd64.pyd : fatal error LNK1120: 1 unresolved externals
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\amd64\link.exe' failed with exit status 1120
----------------------------------------
Command "C:\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\PQUACK1\AppData\Local\Temp\pip-build-7u774njl\blosc\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record C:\Users\QUACK-rbdprlu1-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\QUACK~1\AppData\Local\Temp\pip-build-7u774njl\blosc\
[Anaconda3] C:\Users\pquackenbush\git>
I am trying to use python-blosc for hyperspy/hyperspy#1716 and after installing blosc using pip, I get the following error when importing blosc.
>>> import blosc
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/anaconda3/lib/python3.6/site-packages/blosc/__init__.py", line 13, in <module>
from blosc.blosc_extension import (
ImportError: /opt/anaconda3/lib/python3.6/site-packages/blosc/blosc_extension.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_appendEPKcm
For example, in
https://github.com/FrancescAlted/python-blosc/blob/master/blosc/blosc_extension.c#L77
the original error should not be masked by another error, as that can be misleading when debugging.
I would like to use single-threaded blosc.compress/decompress in many threads in parallel. My understanding is that this is possible in the C layer by creating many contexts, but not currently possible in the Python layer.
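The intended usage pattern might look like the sketch below, with zlib standing in for a hypothetical per-call context API (compress_ctx here is not a real python-blosc function; it only illustrates the parallelism I'm after):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_ctx(chunk):
    # each call is fully independent, as it would be with a per-call
    # blosc context; zlib is only a stand-in compressor for the sketch
    return zlib.compress(chunk, 1)

chunks = [bytes([i]) * 65536 for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    compressed = list(pool.map(compress_ctx, chunks))
assert [zlib.decompress(c) for c in compressed] == chunks
```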
Hi, recently on Debian we've been getting this error when building the docs:
# Sphinx version: 1.5.6
# Python version: 2.7.13 (CPython)
# Docutils version: 0.13.1 release
# Jinja2 version: 2.9.6
# Last messages:
# loading intersphinx inventory from http://docs.python.org/objects.inv...
# WARNING: intersphinx inventory 'http://docs.python.org/objects.inv' not fetchable due to <class 'requests.exceptions.ProxyError'>: HTTPConnectionPool(host='127.0.0.1', port=9): Max retries exceeded with url: http://docs.python.org/objects.inv (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f4cc6e66750>: Failed to establish a new connection: [Errno 111] Connection refused',)))
# building [mo]: targets for 0 po files that are out of date
# building [html]: targets for 5 source files that are out of date
# updating environment:
# 5 added, 0 changed, 0 removed
# reading sources... [ 20%] index
# reading sources... [ 40%] install
# reading sources... [ 60%] intro
# reading sources... [ 80%] reference
# Loaded extensions:
# sphinx.ext.coverage (1.5.6) from /usr/lib/python2.7/dist-packages/sphinx/ext/coverage.pyc
# sphinx.ext.todo (1.5.6) from /usr/lib/python2.7/dist-packages/sphinx/ext/todo.pyc
# sphinx.ext.autodoc (1.5.6) from /usr/lib/python2.7/dist-packages/sphinx/ext/autodoc.pyc
# sphinx.ext.intersphinx (1.5.6) from /usr/lib/python2.7/dist-packages/sphinx/ext/intersphinx.pyc
# sphinx.ext.doctest (1.5.6) from /usr/lib/python2.7/dist-packages/sphinx/ext/doctest.pyc
# alabaster (0.7.8) from /usr/lib/python2.7/dist-packages/alabaster/__init__.pyc
# numpydoc (unknown version) from /usr/lib/python2.7/dist-packages/numpydoc/__init__.pyc
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/sphinx/cmdline.py", line 296, in main
app.build(opts.force_all, filenames)
File "/usr/lib/python2.7/dist-packages/sphinx/application.py", line 333, in build
self.builder.build_update()
File "/usr/lib/python2.7/dist-packages/sphinx/builders/__init__.py", line 251, in build_update
'out of date' % len(to_build))
File "/usr/lib/python2.7/dist-packages/sphinx/builders/__init__.py", line 265, in build
self.doctreedir, self.app))
File "/usr/lib/python2.7/dist-packages/sphinx/environment/__init__.py", line 556, in update
self._read_serial(docnames, app)
File "/usr/lib/python2.7/dist-packages/sphinx/environment/__init__.py", line 576, in _read_serial
self.read_doc(docname, app)
File "/usr/lib/python2.7/dist-packages/sphinx/environment/__init__.py", line 684, in read_doc
pub.publish()
File "/usr/lib/python2.7/dist-packages/docutils/core.py", line 217, in publish
self.settings)
File "/usr/lib/python2.7/dist-packages/sphinx/io.py", line 55, in read
self.parse()
File "/usr/lib/python2.7/dist-packages/docutils/readers/__init__.py", line 78, in parse
self.parser.parse(self.input, document)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/__init__.py", line 185, in parse
self.statemachine.run(inputlines, document, inliner=self.inliner)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 171, in run
input_source=document['source'])
File "/usr/lib/python2.7/dist-packages/docutils/statemachine.py", line 239, in run
context, state, transitions)
File "/usr/lib/python2.7/dist-packages/docutils/statemachine.py", line 460, in check_line
return method(match, context, next_state)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 2983, in text
self.section(title.lstrip(), source, style, lineno + 1, messages)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 327, in section
self.new_subsection(title, lineno, messages)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 395, in new_subsection
node=section_node, match_titles=True)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 282, in nested_parse
node=node, match_titles=match_titles)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 196, in run
results = StateMachineWS.run(self, input_lines, input_offset)
File "/usr/lib/python2.7/dist-packages/docutils/statemachine.py", line 239, in run
context, state, transitions)
File "/usr/lib/python2.7/dist-packages/docutils/statemachine.py", line 460, in check_line
return method(match, context, next_state)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 2748, in underline
self.section(title, source, style, lineno - 1, messages)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 327, in section
self.new_subsection(title, lineno, messages)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 395, in new_subsection
node=section_node, match_titles=True)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 282, in nested_parse
node=node, match_titles=match_titles)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 196, in run
results = StateMachineWS.run(self, input_lines, input_offset)
File "/usr/lib/python2.7/dist-packages/docutils/statemachine.py", line 239, in run
context, state, transitions)
File "/usr/lib/python2.7/dist-packages/docutils/statemachine.py", line 460, in check_line
return method(match, context, next_state)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 2321, in explicit_markup
nodelist, blank_finish = self.explicit_construct(match)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 2333, in explicit_construct
return method(self, expmatch)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 2076, in directive
directive_class, match, type_name, option_presets)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 2125, in run_directive
result = directive_instance.run()
File "/usr/lib/python2.7/dist-packages/sphinx/ext/autodoc.py", line 1668, in run
documenter.generate(more_content=self.content)
File "/usr/lib/python2.7/dist-packages/sphinx/ext/autodoc.py", line 1000, in generate
sig = self.format_signature()
File "/usr/lib/python2.7/dist-packages/sphinx/ext/autodoc.py", line 654, in format_signature
self.object, self.options, args, retann)
File "/usr/lib/python2.7/dist-packages/sphinx/application.py", line 593, in emit_firstresult
for result in self.emit(event, *args):
File "/usr/lib/python2.7/dist-packages/sphinx/application.py", line 589, in emit
results.append(callback(self, *args))
File "/usr/lib/python2.7/dist-packages/numpydoc/numpydoc.py", line 119, in mangle_signature
sig = re.sub(sixu("^[^(]*"), sixu(""), sig)
File "/usr/lib/python2.7/re.py", line 155, in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer
numpydoc is 0.7.0. Network connections during the build (including intersphinx) are blocked by policy.
Thanks,
DS
As suggested by @brouberol during #ep14
Specifically, in blosc.toplevel, decompress_ptr() returns with an error. This is called from Pandas v0.18.1 pd.read_msgpack().
Sphinx has become almost a standard for documenting Python libraries. Let's do that.
decompress_ptr should return the number of bytes written into memory rather than None.
The test suite reveals that some dataset configurations do not work well on Mac OSX. Here is an example:
import array
import blosc
a = array.array('i', xrange(1000)).tostring()
print blosc.set_nthreads(1) # disable multithreading, just in case
ac = blosc.compress(a, 4, 9, True)
print len(ac)
Here is the result on Mac OSX (incorrect):
2
4016
And here on Linux (which is correct):
6
365
I just ran into this issue:
C:\Miniconda3\lib\site-packages\blosc\toplevel.py in compress(bytesobj, typesize, clevel, shuffle, cname)
359 """
360
--> 361 _check_input_length('bytesobj', len(bytesobj))
362 _check_typesize(typesize)
363 _check_shuffle(shuffle)
C:\Miniconda3\lib\site-packages\blosc\toplevel.py in _check_input_length(input_name, input_len)
299 if input_len > blosc.MAX_BUFFERSIZE:
300 raise ValueError("%s cannot be larger than %d bytes" %
--> 301 (input_name, blosc.MAX_BUFFERSIZE))
302
303
ValueError: bytesobj cannot be larger than 2147483631 bytes
...so it looks like you're currently restricted to sizeof(int32), i.e. ~2 GB. Is there any way around this restriction?
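One possible workaround is to split the buffer into chunks below the limit and compress each one independently. A sketch, with zlib standing in for blosc.compress so it stays self-contained (the chunking logic, not the compressor, is the point):

```python
import zlib

MAX_CHUNK = 2**30  # stay well below the ~2 GB per-call limit

def compress_chunks(buf, max_chunk=MAX_CHUNK):
    # compress max_chunk-sized slices independently
    return [zlib.compress(buf[i:i + max_chunk])
            for i in range(0, len(buf), max_chunk)]

def decompress_chunks(parts):
    # each part decompresses on its own; concatenate to rebuild the buffer
    return b"".join(zlib.decompress(p) for p in parts)

# tiny demonstration with an artificially small chunk size
data = b"spam" * 1000
parts = compress_chunks(data, max_chunk=512)
assert decompress_chunks(parts) == data
```

The chunk boundaries have to be stored alongside the compressed parts if random access is needed.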
When using GCC (tested with 4.9.3 and 5.2.1) on an Ubuntu 15.10 box, one can get sporadic but reproducible segfaults when exercising the test suite enough times:
$ for i in {1..10}; do nosetests --with-doctest blosc; done
........................
----------------------------------------------------------------------
Ran 24 tests in 5.054s
OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.368s
OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.122s
OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.184s
OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.123s
OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.753s
OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.343s
OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.133s
OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.487s
OK
Segmentation fault (core dumped)
I cannot get any segfault when using clang (tested with 3.6 and 3.7). Testing on a Mac OSX box does not show any problem either (which is expected, because Xcode ships clang/LLVM).
A detailed investigation using valgrind does not show anything too evident, except things like:
test_no_leaks (blosc.test.TestCodec) ... ==5330== Invalid read of size 4
==5330== at 0x4ECEF73: PyObject_Free (obmalloc.c:1013)
==5330== by 0x4EE2A72: tupledealloc (tupleobject.c:235)
==5330== by 0x4F327C6: ext_do_call (ceval.c:4665)
==5330== by 0x4F327C6: PyEval_EvalFrameEx (ceval.c:3026)
==5330== by 0x4F35A2D: PyEval_EvalCodeEx (ceval.c:3582)
==5330== by 0x4F34A54: fast_function (ceval.c:4446)
==5330== by 0x4F34A54: call_function (ceval.c:4371)
==5330== by 0x4F34A54: PyEval_EvalFrameEx (ceval.c:2987)
==5330== by 0x4F35A2D: PyEval_EvalCodeEx (ceval.c:3582)
==5330== by 0x4EB14A7: function_call (funcobject.c:526)
==5330== by 0x4E81D22: PyObject_Call (abstract.c:2546)
==5330== by 0x4F32796: ext_do_call (ceval.c:4663)
==5330== by 0x4F32796: PyEval_EvalFrameEx (ceval.c:3026)
==5330== by 0x4F35A2D: PyEval_EvalCodeEx (ceval.c:3582)
==5330== by 0x4EB13A0: function_call (funcobject.c:526)
==5330== by 0x4E81D22: PyObject_Call (abstract.c:2546)
==5330== Address 0x428b9020 is 32 bytes before a block of size 80,002,976 in arena "client"
so perhaps there is a problem with reference counting, but I am not sure whether this is a red herring.
Anyway, as GCC is a very important compiler, this ticket has high priority.
See PR #135.
ubuntu 17.10, python 3.6
Linux rock64 4.4.77-rk3328 #29 SMP Mon Nov 20 03:26:28 CET 2017 aarch64 aarch64 aarch64 GNU/Linux
Note: some tests fail; I'll open another issue with details.
When using this code (with blosc-1.2.1.win-amd64-py2.7.exe, Python 2.7 64-bit, Windows 7 x64):
import numpy as np
import blosc
with open('blah.bin', 'rb') as f:
    w = np.fromstring(f.read(), dtype=np.int16)
print w  # w is a regular, standard numpy array
z = blosc.pack_array(w, cname='lz4')  # here we have a bug
then we have this crash (please download the blah.bin file here: https://dl.dropboxusercontent.com/u/83031018/blah.bin):
[ 0 0 2 ..., -872 -258 -599]
Traceback (most recent call last):
File "D:\Documents\projects\coding\python\vrac\compression\blosc_bug.py", line 7, in <module>
z = blosc.pack_array(w, cname='lz4')
File "C:\Python27-64\lib\site-packages\blosc\toplevel.py", line 579, in pack_array
packed_array = compress(pickled_array, itemsize, clevel, shuffle, cname)
File "C:\Python27-64\lib\site-packages\blosc\toplevel.py", line 309, in compress
return _ext.compress(bytesobj, typesize, clevel, shuffle, cname)
blosc_extension.error: Error -1 while compressing data
Important note: when shuffle=False, there is no more error. So the issue probably comes from the shuffle filter.
Hi,
I want to request a Python wheel distribution of this package, or a compiled binary, instead of installing from source. The following log was generated when I tried to update blosc:
c:\Python\Scripts>pip install -U blosc
Collecting blosc
Downloading blosc-1.2.7.tar.gz (239kB)
100% |################################| 241kB 292kB/s
Installing collected packages: blosc
Found existing installation: blosc 1.2.5
Uninstalling blosc-1.2.5:
Successfully uninstalled blosc-1.2.5
Running setup.py install for blosc
Complete output from command c:\Python\python.exe -c "import setuptools, tokenize;__file__='c:\\users\\eep1\\appdata\\local\\temp\\pip-build-h08iyx\\blosc\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record c:\users\eep1\appdata\local\temp\pip-kup53o-record\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build\lib.win32-2.7
creating build\lib.win32-2.7\blosc
copying blosc\test.py -> build\lib.win32-2.7\blosc
copying blosc\toplevel.py -> build\lib.win32-2.7\blosc
copying blosc\version.py -> build\lib.win32-2.7\blosc
copying blosc\__init__.py -> build\lib.win32-2.7\blosc
running build_ext
building 'blosc.blosc_extension' extension
creating build\temp.win32-2.7
creating build\temp.win32-2.7\Release
creating build\temp.win32-2.7\Release\blosc
creating build\temp.win32-2.7\Release\c-blosc
creating build\temp.win32-2.7\Release\c-blosc\blosc
creating build\temp.win32-2.7\Release\c-blosc\internal-complibs
creating build\temp.win32-2.7\Release\c-blosc\internal-complibs\lz4-1.6.0
creating build\temp.win32-2.7\Release\c-blosc\internal-complibs\snappy-1.1.1
creating build\temp.win32-2.7\Release\c-blosc\internal-complibs\zlib-1.2.8
C:\Python\MinGW\bin\gcc.exe -mdll -O -Wall -DHAVE_LZ4=1 -DHAVE_SNAPPY=1 -DHAVE_ZLIB=1 -Ic-blosc\blosc -Ic-blosc/internal-complibs\lz4-1.6.0 -Ic-blosc/internal-complibs\snappy-1.1.1 -Ic-blosc/internal-complibs\zlib-1.2.8 -Ic:\Python\include -Ic:\Python\PC -c blosc/blosc_extension.c -o build\temp.win32-2.7\Release\blosc\blosc_extension.o
C:\Python\MinGW\bin\gcc.exe -mdll -O -Wall -DHAVE_LZ4=1 -DHAVE_SNAPPY=1 -DHAVE_ZLIB=1 -Ic-blosc\blosc -Ic-blosc/internal-complibs\lz4-1.6.0 -Ic-blosc/internal-complibs\snappy-1.1.1 -Ic-blosc/internal-complibs\zlib-1.2.8 -Ic:\Python\include -Ic:\Python\PC -c c-blosc/blosc\blosc.c -o build\temp.win32-2.7\Release\c-blosc\blosc\blosc.o
c-blosc/blosc\blosc.c:56:23: fatal error: pthread.h: No such file or directory
#include <pthread.h>
^
compilation terminated.
error: command 'C:\\Python\\MinGW\\bin\\gcc.exe' failed with exit status 1
----------------------------------------
Rolling back uninstall of blosc
Command "c:\Python\python.exe -c "import setuptools, tokenize;__file__='c:\\users\\eep1\\appdata\\local\\temp\\pip-build-h08iyx\\blosc\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record c:\users\eep1\appdata\local\temp\pip-kup53o-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in c:\users\eep1\appdata\local\temp\pip-build-h08iyx\blosc
Kind regards
It seems like attic (which is py3-only) wants to hand a memoryview over a bytearray (called data) to blosc's compress, but it is not accepted there, so I would have to convert it to an acceptable type with bytes(data) first - and that would make a full new copy of it, right?
I tried to loosen the first type check in blosc to accept memoryviews as well, but then it just fails a little later with:
return _ext.compress(bytesobj, typesize, clevel, shuffle, cname)
TypeError: must be read-only pinned buffer, not memoryview
It seems like code like this (see HAS_NEW_BUFFER) might be useful (but please check, I am not familiar with this low-level stuff):
https://github.com/dlitz/pycrypto/pull/81/files#diff-d29a5dec14d8ca1fc5d169320636fc52R631
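For reference, this is the copy being complained about: converting a memoryview over a bytearray to bytes duplicates the underlying data, while a compress that speaks the buffer protocol could read the view in place. A tiny illustration:

```python
data = bytearray(b"x" * 1024)  # attic's chunk buffer
view = memoryview(data)        # zero-copy view over the buffer
as_bytes = bytes(view)         # full extra copy: what we want to avoid
assert as_bytes == data
# a buffer-protocol-aware compress could consume `view` directly,
# without this intermediate copy
```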
Installing on MacOS 10.12.6 with Anaconda 5.3.0 Python3.6.1, the following error occurs:
blosc/blosc_extension.c:594:27: error: use of undeclared identifier 'BLOSC_NOSHUFFLE'
PyModule_AddIntMacro(m, BLOSC_NOSHUFFLE);
^
blosc/blosc_extension.c:595:27: error: use of undeclared identifier 'BLOSC_SHUFFLE'
PyModule_AddIntMacro(m, BLOSC_SHUFFLE);
^
blosc/blosc_extension.c:596:27: error: use of undeclared identifier 'BLOSC_BITSHUFFLE'
PyModule_AddIntMacro(m, BLOSC_BITSHUFFLE);
^
22 warnings and 3 errors generated.
error: command 'gcc' failed with exit status 1
Perhaps I don't quite understand the typesize parameter, but would doing the following be incorrect when using blosc.compress?
I'm running this on a Mac with Python 3.6.2 using the default python-blosc package from conda-forge. It seems that decompress does not give me back the original value:
$ python
Python 3.6.2 | packaged by conda-forge | (default, Jul 23 2017, 23:01:38)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> x = numpy.ones(27266, dtype='uint8')
>>> xx = x.tobytes()
>>> import blosc
>>> blosc.decompress(blosc.compress(xx, clevel=5, cname='zstd', shuffle=blosc.BITSHUFFLE))[-3:]
b'\x01 '
>>> blosc.decompress(blosc.compress(xx, clevel=5, cname='zstd', shuffle=blosc.SHUFFLE))[-3:]
b'\x01\x01\x01'
>>> blosc.decompress(blosc.compress(xx, clevel=5, cname='zstd'))[-3:]
b'\x01\x01\x01'
As you can see, with the bitshuffle filter, the values from decompress seem corrupted at the very end (with two \x00 instead of \x01). Do you have any idea why?
Here is my environment.yml:
name: test_blosc
channels:
- conda-forge
- defaults
dependencies:
- blas=1.1=openblas
- blosc=1.12.0=0
- ca-certificates=2017.7.27.1=0
- certifi=2017.7.27.1=py36_0
- libgfortran=3.0.0=0
- ncurses=5.9=10
- numpy=1.13.3=py36_blas_openblas_200
- openblas=0.2.19=2
- openssl=1.0.2l=0
- pip=9.0.1=py36_0
- python=3.6.2=0
- python-blosc=1.4.4=py36_0
- readline=6.2=0
- setuptools=36.6.0=py36_1
- snappy=1.1.7=1
- sqlite=3.13.0=1
- tk=8.5.19=2
- wheel=0.30.0=py_1
- xz=5.2.3=0
- zlib=1.2.11=0
prefix: /Users/wlee/miniconda3/envs/test_blosc
All doctests should use the blosc prefix:
>>> blosc.compress(...)
>>> blosc.decompress(...)
The python-blosc feedstock says the package is Apache 2. Is this license correct?
https://github.com/conda-forge/python-blosc-feedstock/blob/master/recipe/meta.yaml#L32-L35
Hi, great stuff you have done in blosc, thanks! :)
I am trying it for https://attic-backup.org/ and, due to the way attic works, it usually processes chunks of data ~64 KB in size. I first had a bit of trouble getting blosc to run in parallel until I found set_blocksize.
So I'd like to suggest that you add a bit of documentation for it rather than just writing "experts only" - maxing out speed is your whole point, right? So blosc should not just use 1 thread.
I currently have set the blocksize to 8192, assuming that a 64 KB chunk size divided by maybe 8 cores would give 8 KB blocks. Is this correct?
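The arithmetic I'm assuming is just this (whether the one-block-per-core heuristic matches blosc's internal scheduling is exactly my question; blocksize_for is my own name):

```python
def blocksize_for(chunk_size, n_cores):
    # one block per core, e.g. 64 KB chunks on 8 cores -> 8 KB blocks
    return max(chunk_size // n_cores, 1)

assert blocksize_for(64 * 1024, 8) == 8 * 1024
```

The result would then be passed to blosc.set_blocksize().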
Also, I'd like to suggest doing a new release on PyPI; having "dev" packages as a dependency is a bit ugly.
Sometimes I want to use some parts of blosc but not others. In particular I use blosc to gain access to other compression libraries (like snappy) but don't want the added features of blocked parallel compression (I handle parallelism on my own.) What are your thoughts on exposing some of the internal building blocks out to the Python layer?
I would love functions like blosc.shuffle, blosc.lzo.compress, blosc.snappy.compress, etc. I would use blosc in several more applications if these were available to me directly. It would be especially useful if these did not hold the GIL.
This is handy for debugging and reporting purposes.
These days I tend to do most of my testing with nosetests for discovery and the test functions provided by nose.tools.
How would you prefer to add more tests? I would propose a file test_blosc.py with simple test functions.
nbytes in PyBlosc_compress should be of type size_t (or, perhaps better, Py_ssize_t) and not int, so as to match the signatures of PyString_FromStringAndSize() and blosc_compress. PyBlosc_decompress should be consistent with this too.
If I give blosc.compress a memoryview object, it would be possible to learn the typesize automatically.
In [1]: import numpy as np
In [2]: x = np.ones(5, dtype='i4')
In [3]: x.data
Out[3]: <memory at 0x7fe710bf0408>
In [4]: x.data.itemsize
Out[4]: 4
In [5]: import blosc
In [6]: blosc.compress(x.data, typesize=x.data.itemsize)
Out[6]: b'\x02\x01\x13\x04\x14\x00\x00\x00\x14\x00\x00\x00$\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00'
e.g. when doing something like this (a second read of the same opened file):
$> python -c 'import blosc; f=file("/home/yoh/.emacs", "rb"); print 1; blosc.compress(f.read(), typesize=8); print 2; blosc.compress(f.read(), typesize=8); print 3;'
1
2
zsh: floating point exception python -c
Log:
[user@scene_pi python-blosc]$ sudo python setup.py install
running install
running bdist_egg
running egg_info
writing dependency_links to blosc.egg-info/dependency_links.txt
writing top-level names to blosc.egg-info/top_level.txt
writing blosc.egg-info/PKG-INFO
reading manifest file 'blosc.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.txt'
warning: no files found matching '*.cpp' under directory 'c-blosc'
warning: no files found matching '*.hpp' under directory 'c-blosc'
writing manifest file 'blosc.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-armv7l/egg
running install_lib
running build_py
creating build
creating build/lib.linux-armv7l-3.4
creating build/lib.linux-armv7l-3.4/blosc
copying blosc/test.py -> build/lib.linux-armv7l-3.4/blosc
copying blosc/__init__.py -> build/lib.linux-armv7l-3.4/blosc
copying blosc/version.py -> build/lib.linux-armv7l-3.4/blosc
copying blosc/toplevel.py -> build/lib.linux-armv7l-3.4/blosc
running build_ext
building 'blosc.blosc_extension' extension
creating build/temp.linux-armv7l-3.4
creating build/temp.linux-armv7l-3.4/blosc
creating build/temp.linux-armv7l-3.4/c-blosc
creating build/temp.linux-armv7l-3.4/c-blosc/blosc
creating build/temp.linux-armv7l-3.4/c-blosc/internal-complibs
creating build/temp.linux-armv7l-3.4/c-blosc/internal-complibs/lz4-1.7.0
creating build/temp.linux-armv7l-3.4/c-blosc/internal-complibs/snappy-1.1.1
creating build/temp.linux-armv7l-3.4/c-blosc/internal-complibs/zlib-1.2.8
gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -fPIC -DHAVE_LZ4=1 -DHAVE_SNAPPY=1 -DHAVE_ZLIB=1 -Ic-blosc/blosc -Ic-blosc/internal-complibs/lz4-1.7.0 -Ic-blosc/internal-complibs/snappy-1.1.1 -Ic-blosc/internal-complibs/zlib-1.2.8 -I/usr/include/python3.4m -c blosc/blosc_extension.c -o build/temp.linux-armv7l-3.4/blosc/blosc_extension.o
gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -fPIC -DHAVE_LZ4=1 -DHAVE_SNAPPY=1 -DHAVE_ZLIB=1 -Ic-blosc/blosc -Ic-blosc/internal-complibs/lz4-1.7.0 -Ic-blosc/internal-complibs/snappy-1.1.1 -Ic-blosc/internal-complibs/zlib-1.2.8 -I/usr/include/python3.4m -c c-blosc/blosc/blosc.c -o build/temp.linux-armv7l-3.4/c-blosc/blosc/blosc.o
c-blosc/blosc/blosc.c: In function 'blosc_getitem':
c-blosc/blosc/blosc.c:1275:7: warning: unused variable 'tmp_init' [-Wunused-variable]
int tmp_init = 0;
^
gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -fPIC -DHAVE_LZ4=1 -DHAVE_SNAPPY=1 -DHAVE_ZLIB=1 -Ic-blosc/blosc -Ic-blosc/internal-complibs/lz4-1.7.0 -Ic-blosc/internal-complibs/snappy-1.1.1 -Ic-blosc/internal-complibs/zlib-1.2.8 -I/usr/include/python3.4m -c c-blosc/blosc/shuffle-sse2.c -o build/temp.linux-armv7l-3.4/c-blosc/blosc/shuffle-sse2.o
c-blosc/blosc/shuffle-sse2.c:14:4: error: #error SSE2 is not supported by the target architecture/platform and/or this compiler.
#error SSE2 is not supported by the target architecture/platform and/or this compiler.
^
c-blosc/blosc/shuffle-sse2.c:17:23: fatal error: emmintrin.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1
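The failure above happens because `shuffle-sse2.c` is compiled unconditionally, even on armv7 where the SSE2 intrinsics header `emmintrin.h` does not exist. A minimal sketch of the kind of platform gate a `setup.py` could apply before adding the SSE2 source file — the helper name `supports_sse2` is hypothetical, not part of python-blosc:

```python
import platform

def supports_sse2(machine=None):
    """Rough guess at whether the build target has SSE2 (hypothetical helper)."""
    if machine is None:
        machine = platform.machine()
    # SSE2 is guaranteed on x86-64 and present on all modern 32-bit x86 CPUs;
    # ARM (armv7l, aarch64) has NEON instead and cannot compile SSE2 intrinsics.
    return machine.lower() in ("x86_64", "amd64", "i386", "i486", "i586", "i686")

# Only hand the SSE2 shuffle source to the compiler when it can actually build
sources = ["c-blosc/blosc/blosc.c", "c-blosc/blosc/shuffle.c"]
if supports_sse2():
    sources.append("c-blosc/blosc/shuffle-sse2.c")
```

With a gate like this, the armv7 build above would simply skip the SSE2 shuffle and fall back to the generic implementation.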
Hi Francesc,
Could you please also provide tarballs somewhere for python-blosc with an exact name/version (such as python-blosc-1.1.tar.gz)?
That would be helpful for packagers. The PyPI archive is useful, but it doesn't seem to match this repository exactly (it contains no README/Release-notes AFAIR).
Thanks!
Many people in the community have switched to building packages automatically on conda-forge. The getting started procedure is relatively straightforward if you already have a conda recipe.
Having up-to-date conda packages in a community maintained channel would make it easier for downstream libraries (like Dask) to rely on Blosc more heavily.
This would be a nice task for anyone who wants to help out Blosc and get involved in community package management. A deep knowledge of the blosc codebase is not necessary.
I just installed Anaconda from a clean install and then installed blosc using pip install blosc. I am now getting this import error:
In [1]: import blosc
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
in <module>()
----> 1 import blosc
/usr/local/anaconda/lib/python2.7/site-packages/blosc/__init__.py in <module>()
11
12 # Blosc C symbols that we want to export
---> 13 from blosc.blosc_extension import (
14 BLOSC_VERSION_STRING as VERSION_STRING,
15 BLOSC_VERSION_DATE as VERSION_DATE,
ImportError: /usr/local/anaconda/lib/python2.7/site-packages/blosc/blosc_extension.so: undefined symbol: _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_appendEPKcm
I am running Linux Mint 18 Sarah 64-bit Kernel Linux 4.4.0-21-generic x86_64 MATE 1.14.1
Am I missing a package when installing from pip? Thank you.
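The undefined symbol carries the `__cxx11` inline-namespace tag, which is the signature of the GCC 5 dual libstdc++ ABI: the wheel's C++ code (Snappy) was built against a newer libstdc++ than the one Anaconda ships. One common workaround is to rebuild the extension locally, e.g. `pip install --no-binary :all: blosc`. A tiny illustrative check for this symptom — the helper function is hypothetical, purely for demonstration:

```python
def looks_like_cxx11_abi_mismatch(symbol):
    """Heuristic: GCC 5+ mangles std::string symbols under the
    std::__cxx11 inline namespace, so an undefined symbol starting
    with this prefix usually means a libstdc++ dual-ABI mismatch."""
    return symbol.startswith("_ZNSt7__cxx11")

# The undefined symbol from the traceback above
sym = "_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_appendEPKcm"
print(looks_like_cxx11_abi_mismatch(sym))  # → True
```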
Isn't being installed automatically or checked for.
Currently, the Blosc sources in the c-blosc
subdirectory are managed by manually copying them in whenever a new version of Blosc is released.
I will shortly present two PRs to demo using subtrees or submodules to manage Blosc sources.