blosc / python-blosc

A Python wrapper for the extremely fast Blosc compression library

Home Page: https://www.blosc.org/python-blosc/python-blosc.html
License: Other
The subtree merge script is broken right now
Hi,
the title of this repo links to http://blosc.pydata.org, but this just returns a 404. I guess it should link to http://www.blosc.org/ instead.
Cheers,
Arne
The compress* and decompress* functions do a fair bit of type and value checking, and some of that code is copied and pasted. We should refactor it into shared functions (or decorators) and simply call those to avoid the duplication.
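One way the duplicated checks could be factored out, sketched here with hypothetical names (check_typesize, validated, and the placeholder compress body are inventions for illustration; the real signatures differ):

```python
import functools

def check_typesize(typesize):
    # shared value check, written once instead of copy-pasted per function
    if not 0 < typesize <= 255:
        raise ValueError("typesize must be in 1..255, got %r" % (typesize,))

def validated(func):
    # decorator running the shared type/value checks before the real work
    @functools.wraps(func)
    def wrapper(data, typesize, *args, **kwargs):
        if not isinstance(data, bytes):
            raise TypeError("a bytes object is required")
        check_typesize(typesize)
        return func(data, typesize, *args, **kwargs)
    return wrapper

@validated
def compress(data, typesize):
    # placeholder standing in for the call into the C extension
    return data
```

The decompress* functions could then reuse the same decorator instead of repeating the checks.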
If I run this, I quickly get into memory trouble:

import blosc
import psutil

x = 'aj lkajfldkfjaoiur 0983 5t93h308 ajlkf n fsfhahtey8 haiuoyajkah ' * 100000
counter = 0
freemem = psutil.avail_phymem() / 1024. ** 3
while freemem > 1:
    cx = blosc.compress(x, typesize=1, clevel=1, shuffle=False, cname='blosclz')
    blosc.decompress(cx)
    freemem = psutil.avail_phymem() / 1024. ** 3
    counter += 1
    if counter % 10000 == 0:
        print('%.2f GB' % freemem)
#10.56 GB
#8.66 GB
#6.75 GB
#4.84 GB
#2.91 GB
#1.01 GB
But if I run this, I don't:

import blosc
import psutil

x = 'aj lkajfldkfjaoiur 0983 5t93h308 ajlkf n fsfhahtey8 haiuoyajkah ' * 100000
cx = blosc.compress(x, typesize=1, clevel=1, shuffle=False, cname='blosclz')
counter = 0
freemem = psutil.avail_phymem() / 1024. ** 3
while freemem > 1:
    blosc.decompress(cx)
    freemem = psutil.avail_phymem() / 1024. ** 3
    counter += 1
    if counter % 10000 == 0:
        print('%.2f GB' % freemem)
#12.44 GB
#12.44 GB
#12.44 GB
# ...
That makes me think that decompress is not freeing the memory used by the compressed data, and that the bug was introduced in some recent commit to python-blosc.
I get the leak with python-blosc from master combined with c-blosc 1.7.0 (although I think the particular c-blosc version is not relevant here). If I run python-blosc 1.2.7 with c-blosc 1.7.0, everything works like a charm; a winning combination, because with older versions of lz4hc I get segfaults, which is how I got into all this trouble in the first place.
In order to detect AVX2 support and activate it, it may be a good idea to use
https://github.com/FlightDataServices/PyCPUID
to detect whether the AVX2 flags can be used when compiling c-blosc directly from setuptools.
The only 'problem' I can see is that PyCPUID is LGPL-licensed, which can somewhat collide with the BSD license used for distribution purposes.
Check the types used in decompression and fix them akin to compression. See: FrancescAlted@2a852c3
pip install blosc on aarch64 / Python 3.5.2 errors out because cpuinfo.py has no match clause for aarch64. This is solvable by adding the following three lines:
+ elif re.match('^aarch64$', raw_arch_string):
+ arch = 'AARCH_64'
+ bits = 64
Cheers!
Ian
While having a look at a test failing in bloscpack, I saw the following: blosc_compress_ctx accepts a compressor string passed as a parameter, while blosc_compress has no compressor parameter, so I believe it is ignored and the value from g_global_context is used instead. On the one hand, when releasing the GIL the compressor string is passed; on the other hand, blosc_compress uses the global context (see the lines of interest in blosc_extension.c). Is this intentional, or what am I missing?
When looking at
https://github.com/FrancescAlted/python-blosc/blob/master/blosc/toplevel.py#L116
I can see that the parameters are incorrectly indented. This happens in other places too.
The heuristics of nosetests cause the test() function, which should only run the tests, to be misdetected as a real test. Hence, when using nosetests, the tests are actually run twice:
1 zsh» PYTHONPATH=. nosetests
........test_basic_codec (blosc.test.TestCodec) ... ok
test_compress_exceptions (blosc.test.TestCodec) ... ok
test_compress_ptr_exceptions (blosc.test.TestCodec) ... ok
test_decompress_exceptions (blosc.test.TestCodec) ... ok
test_decompress_ptr_exceptions (blosc.test.TestCodec) ... ok
test_pack_array_exceptions (blosc.test.TestCodec) ... ok
test_set_nthreads_exceptions (blosc.test.TestCodec) ... ok
test_unpack_array_exceptions (blosc.test.TestCodec) ... ok
compress (blosc.toplevel)
Doctest: blosc.toplevel.compress ... ok
compress_ptr (blosc.toplevel)
Doctest: blosc.toplevel.compress_ptr ... ok
decompress (blosc.toplevel)
Doctest: blosc.toplevel.decompress ... ok
decompress_ptr (blosc.toplevel)
Doctest: blosc.toplevel.decompress_ptr ... ok
free_resources (blosc.toplevel)
Doctest: blosc.toplevel.free_resources ... ok
pack_array (blosc.toplevel)
Doctest: blosc.toplevel.pack_array ... ok
set_nthreads (blosc.toplevel)
Doctest: blosc.toplevel.set_nthreads ... ok
unpack_array (blosc.toplevel)
Doctest: blosc.toplevel.unpack_array ... ok
----------------------------------------------------------------------
Ran 16 tests in 5.389s
OK
.
----------------------------------------------------------------------
Ran 9 tests in 9.674s
OK
PYTHONPATH=. nosetests 5.09s user 4.66s system 94% cpu 10.267 total
See the comments in FrancescAlted@2a852c3 for details.
In [1]: import blosc
In [2]: s = 'abc'
In [3]: blosc.compress(s, typesize=0)
[3] 12228 floating point exception (core dumped) ipython
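A Python-level guard could reject typesize=0 before the call reaches the C extension, where it apparently triggers a division by zero. A minimal sketch (safe_compress and the _ext_compress stand-in are hypothetical, not the library's API):

```python
def safe_compress(data, typesize, _ext_compress=lambda d, t: d):
    # reject a non-positive typesize up front instead of letting the C
    # code divide by it; _ext_compress stands in for the extension call
    if typesize < 1:
        raise ValueError("typesize must be >= 1, got %d" % typesize)
    return _ext_compress(data, typesize)
```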
We are stumbling on the missing Python.h issue again and again. Maybe it is time to put some effort into binary wheels, though I must confess I have absolutely no idea about these.
I would like to submit a pull request to support the c-blosc candidate branch lizard in python-blosc, found here:
https://github.com/robbmcleod/python-blosc/tree/lizard
but I need a corresponding branch to pull against.
Hello,
See this for more info: Blosc/bloscpack#46
@esc Suggested I post a bug here instead...
I have Anaconda 2.5 (Python 3.5.1) with Visual Studio Community edition. I can compile and install 1.2.8 (or 1.2.9dev0 from GitHub) with no problems. With 1.2.7, pip install blosc==1.2.7 gives me all this nastiness:
Both 1.2.8 and 1.2.9dev0 seem to build just fine. (I'm using these while playing with castra).
build\temp.win-amd64-3.5\Release\c-blosc/internal-complibs\zlib-1.2.8\inflate.obj build\temp.win-amd64-3.5\Release\c-blosc/internal-complibs\zlib-1.2.8\inftrees.obj build\temp.win-amd64-3.5\Release\c-blosc/internal-complibs\zlib-1.2.8\trees.obj build\temp.win-amd64-3.5\Release\c-blosc/internal-complibs\zlib-1.2.8\uncompr.obj build\temp.win-amd64-3.5\Release\c-blosc/internal-complibs\zlib-1.2.8\zutil.obj /OUT:build\lib.win-amd64-3.5\blosc\blosc_extension.cp35-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.5\Release\blosc\blosc_extension.cp35-win_amd64.lib
blosc_extension.obj : warning LNK4197: export 'PyInit_blosc_extension' specified multiple times; using first specification
Creating library build\temp.win-amd64-3.5\Release\blosc\blosc_extension.cp35-win_amd64.lib and object build\temp.win-amd64-3.5\Release\blosc\blosc_extension.cp35-win_amd64.exp
shuffle.obj : error LNK2001: unresolved external symbol blosc_get_cpu_features
build\lib.win-amd64-3.5\blosc\blosc_extension.cp35-win_amd64.pyd : fatal error LNK1120: 1 unresolved externals
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\amd64\link.exe' failed with exit status 1120
----------------------------------------
Command "C:\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\PQUACK1\AppData\Local\Temp\pip-build-7u774njl\blosc\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record C:\Users\QUACK-rbdprlu1-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\QUACK~1\AppData\Local\Temp\pip-build-7u774njl\blosc\
[Anaconda3] C:\Users\pquackenbush\git>
I am trying to use python-blosc for hyperspy/hyperspy#1716 and after installing blosc using pip, I get the following error when importing blosc.
>>> import blosc
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/anaconda3/lib/python3.6/site-packages/blosc/__init__.py", line 13, in <module>
from blosc.blosc_extension import (
ImportError: /opt/anaconda3/lib/python3.6/site-packages/blosc/blosc_extension.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_appendEPKcm
For example, in
https://github.com/FrancescAlted/python-blosc/blob/master/blosc/blosc_extension.c#L77
the original error should not be masked by another error, as that can be misleading when debugging.
I would like to use single-threaded blosc.compress/decompress in many threads in parallel. My understanding is that this is possible in the C layer by creating many contexts, but not currently possible in the Python layer.
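The intended usage pattern might look like the sketch below, with zlib standing in for a hypothetical per-call context API (compress_ctx here is not a real python-blosc function; it only illustrates the parallelism I'm after):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_ctx(chunk):
    # each call is fully independent, as it would be with a per-call
    # blosc context; zlib is only a stand-in compressor for the sketch
    return zlib.compress(chunk, 1)

chunks = [bytes([i]) * 65536 for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    compressed = list(pool.map(compress_ctx, chunks))
assert [zlib.decompress(c) for c in compressed] == chunks
```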
Hi, recently on Debian we've been getting this error when building the docs:
# Sphinx version: 1.5.6
# Python version: 2.7.13 (CPython)
# Docutils version: 0.13.1 release
# Jinja2 version: 2.9.6
# Last messages:
# loading intersphinx inventory from http://docs.python.org/objects.inv...
# WARNING: intersphinx inventory 'http://docs.python.org/objects.inv' not fetchable due to <class 'requests.exceptions.ProxyError'>: HTTPConnectionPool(host='127.0.0.1', port=9): Max retries exceeded with url: http://docs.python.org/objects.inv (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f4cc6e66750>: Failed to establish a new connection: [Errno 111] Connection refused',)))
# building [mo]: targets for 0 po files that are out of date
# building [html]: targets for 5 source files that are out of date
# updating environment:
# 5 added, 0 changed, 0 removed
# reading sources... [ 20%] index
# reading sources... [ 40%] install
# reading sources... [ 60%] intro
# reading sources... [ 80%] reference
# Loaded extensions:
# sphinx.ext.coverage (1.5.6) from /usr/lib/python2.7/dist-packages/sphinx/ext/coverage.pyc
# sphinx.ext.todo (1.5.6) from /usr/lib/python2.7/dist-packages/sphinx/ext/todo.pyc
# sphinx.ext.autodoc (1.5.6) from /usr/lib/python2.7/dist-packages/sphinx/ext/autodoc.pyc
# sphinx.ext.intersphinx (1.5.6) from /usr/lib/python2.7/dist-packages/sphinx/ext/intersphinx.pyc
# sphinx.ext.doctest (1.5.6) from /usr/lib/python2.7/dist-packages/sphinx/ext/doctest.pyc
# alabaster (0.7.8) from /usr/lib/python2.7/dist-packages/alabaster/__init__.pyc
# numpydoc (unknown version) from /usr/lib/python2.7/dist-packages/numpydoc/__init__.pyc
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/sphinx/cmdline.py", line 296, in main
app.build(opts.force_all, filenames)
File "/usr/lib/python2.7/dist-packages/sphinx/application.py", line 333, in build
self.builder.build_update()
File "/usr/lib/python2.7/dist-packages/sphinx/builders/__init__.py", line 251, in build_update
'out of date' % len(to_build))
File "/usr/lib/python2.7/dist-packages/sphinx/builders/__init__.py", line 265, in build
self.doctreedir, self.app))
File "/usr/lib/python2.7/dist-packages/sphinx/environment/__init__.py", line 556, in update
self._read_serial(docnames, app)
File "/usr/lib/python2.7/dist-packages/sphinx/environment/__init__.py", line 576, in _read_serial
self.read_doc(docname, app)
File "/usr/lib/python2.7/dist-packages/sphinx/environment/__init__.py", line 684, in read_doc
pub.publish()
File "/usr/lib/python2.7/dist-packages/docutils/core.py", line 217, in publish
self.settings)
File "/usr/lib/python2.7/dist-packages/sphinx/io.py", line 55, in read
self.parse()
File "/usr/lib/python2.7/dist-packages/docutils/readers/__init__.py", line 78, in parse
self.parser.parse(self.input, document)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/__init__.py", line 185, in parse
self.statemachine.run(inputlines, document, inliner=self.inliner)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 171, in run
input_source=document['source'])
File "/usr/lib/python2.7/dist-packages/docutils/statemachine.py", line 239, in run
context, state, transitions)
File "/usr/lib/python2.7/dist-packages/docutils/statemachine.py", line 460, in check_line
return method(match, context, next_state)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 2983, in text
self.section(title.lstrip(), source, style, lineno + 1, messages)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 327, in section
self.new_subsection(title, lineno, messages)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 395, in new_subsection
node=section_node, match_titles=True)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 282, in nested_parse
node=node, match_titles=match_titles)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 196, in run
results = StateMachineWS.run(self, input_lines, input_offset)
File "/usr/lib/python2.7/dist-packages/docutils/statemachine.py", line 239, in run
context, state, transitions)
File "/usr/lib/python2.7/dist-packages/docutils/statemachine.py", line 460, in check_line
return method(match, context, next_state)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 2748, in underline
self.section(title, source, style, lineno - 1, messages)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 327, in section
self.new_subsection(title, lineno, messages)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 395, in new_subsection
node=section_node, match_titles=True)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 282, in nested_parse
node=node, match_titles=match_titles)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 196, in run
results = StateMachineWS.run(self, input_lines, input_offset)
File "/usr/lib/python2.7/dist-packages/docutils/statemachine.py", line 239, in run
context, state, transitions)
File "/usr/lib/python2.7/dist-packages/docutils/statemachine.py", line 460, in check_line
return method(match, context, next_state)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 2321, in explicit_markup
nodelist, blank_finish = self.explicit_construct(match)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 2333, in explicit_construct
return method(self, expmatch)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 2076, in directive
directive_class, match, type_name, option_presets)
File "/usr/lib/python2.7/dist-packages/docutils/parsers/rst/states.py", line 2125, in run_directive
result = directive_instance.run()
File "/usr/lib/python2.7/dist-packages/sphinx/ext/autodoc.py", line 1668, in run
documenter.generate(more_content=self.content)
File "/usr/lib/python2.7/dist-packages/sphinx/ext/autodoc.py", line 1000, in generate
sig = self.format_signature()
File "/usr/lib/python2.7/dist-packages/sphinx/ext/autodoc.py", line 654, in format_signature
self.object, self.options, args, retann)
File "/usr/lib/python2.7/dist-packages/sphinx/application.py", line 593, in emit_firstresult
for result in self.emit(event, *args):
File "/usr/lib/python2.7/dist-packages/sphinx/application.py", line 589, in emit
results.append(callback(self, *args))
File "/usr/lib/python2.7/dist-packages/numpydoc/numpydoc.py", line 119, in mangle_signature
sig = re.sub(sixu("^[^(]*"), sixu(""), sig)
File "/usr/lib/python2.7/re.py", line 155, in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer
numpydoc is 0.7.0. Network connections during the build (including intersphinx) are blocked by policy.
Thanks,
DS
As suggested by @brouberol during #ep14
Specifically, in blosc.toplevel, decompress_ptr() returns with an error. This is called from Pandas v0.18.1 pd.read_msgpack().
Sphinx has become almost a standard for documenting Python libraries. Let's do that.
decompress_ptr should return the number of bytes written into memory rather than None.
The test suite reveals that some dataset configurations do not work well on Mac OSX. Here is an example:
import array
import blosc
a = array.array('i', xrange(1000)).tostring()
print blosc.set_nthreads(1) # disable multithreading, just in case
ac = blosc.compress(a, 4, 9, True)
print len(ac)
Here is the result on Mac OSX (incorrect):
2
4016
And here on Linux (which is correct):
6
365
I just ran into this issue:
C:\Miniconda3\lib\site-packages\blosc\toplevel.py in compress(bytesobj, typesize, clevel, shuffle, cname)
359 """
360
--> 361 _check_input_length('bytesobj', len(bytesobj))
362 _check_typesize(typesize)
363 _check_shuffle(shuffle)
C:\Miniconda3\lib\site-packages\blosc\toplevel.py in _check_input_length(input_name, input_len)
299 if input_len > blosc.MAX_BUFFERSIZE:
300 raise ValueError("%s cannot be larger than %d bytes" %
--> 301 (input_name, blosc.MAX_BUFFERSIZE))
302
303
ValueError: bytesobj cannot be larger than 2147483631 bytes
...so it looks like you're currently restricted to sizeof(int32), i.e. ~2 GB. Is there any way around this restriction?
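One possible workaround is to split the buffer into chunks below the limit and compress each one independently. A sketch, with zlib standing in for blosc.compress so it stays self-contained (the chunking logic, not the compressor, is the point):

```python
import zlib

MAX_CHUNK = 2**30  # stay well below the ~2 GB per-call limit

def compress_chunks(buf, max_chunk=MAX_CHUNK):
    # compress max_chunk-sized slices independently
    return [zlib.compress(buf[i:i + max_chunk])
            for i in range(0, len(buf), max_chunk)]

def decompress_chunks(parts):
    # each part decompresses on its own; concatenate to rebuild the buffer
    return b"".join(zlib.decompress(p) for p in parts)

# tiny demonstration with an artificially small chunk size
data = b"spam" * 1000
parts = compress_chunks(data, max_chunk=512)
assert decompress_chunks(parts) == data
```

The chunk boundaries have to be stored alongside the compressed parts if random access is needed.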
When using GCC (tested with 4.9.3 and 5.2.1) on an Ubuntu 15.10 box, one can get sporadic but reproducible segfaults when exercising the test suite enough times:
$ for i in {1..10}; do nosetests --with-doctest blosc; done
........................
----------------------------------------------------------------------
Ran 24 tests in 5.054s
OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.368s
OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.122s
OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.184s
OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.123s
OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.753s
OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.343s
OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.133s
OK
........................
----------------------------------------------------------------------
Ran 24 tests in 5.487s
OK
Segmentation fault (core dumped)
I cannot get any segfault when using clang (tested with 3.6 and 3.7). Testing on a Mac OSX box does not show any problem either (which is expected, because Xcode ships clang/LLVM).
A detailed investigation using valgrind does not show anything too evident, except things like:
test_no_leaks (blosc.test.TestCodec) ... ==5330== Invalid read of size 4
==5330== at 0x4ECEF73: PyObject_Free (obmalloc.c:1013)
==5330== by 0x4EE2A72: tupledealloc (tupleobject.c:235)
==5330== by 0x4F327C6: ext_do_call (ceval.c:4665)
==5330== by 0x4F327C6: PyEval_EvalFrameEx (ceval.c:3026)
==5330== by 0x4F35A2D: PyEval_EvalCodeEx (ceval.c:3582)
==5330== by 0x4F34A54: fast_function (ceval.c:4446)
==5330== by 0x4F34A54: call_function (ceval.c:4371)
==5330== by 0x4F34A54: PyEval_EvalFrameEx (ceval.c:2987)
==5330== by 0x4F35A2D: PyEval_EvalCodeEx (ceval.c:3582)
==5330== by 0x4EB14A7: function_call (funcobject.c:526)
==5330== by 0x4E81D22: PyObject_Call (abstract.c:2546)
==5330== by 0x4F32796: ext_do_call (ceval.c:4663)
==5330== by 0x4F32796: PyEval_EvalFrameEx (ceval.c:3026)
==5330== by 0x4F35A2D: PyEval_EvalCodeEx (ceval.c:3582)
==5330== by 0x4EB13A0: function_call (funcobject.c:526)
==5330== by 0x4E81D22: PyObject_Call (abstract.c:2546)
==5330== Address 0x428b9020 is 32 bytes before a block of size 80,002,976 in arena "client"
so perhaps there is a problem with reference counting, but I am not sure whether this is a red herring.
Anyway, as GCC is a very important compiler, this ticket has high priority.
See PR #135.
ubuntu 17.10, python 3.6
Linux rock64 4.4.77-rk3328 #29 SMP Mon Nov 20 03:26:28 CET 2017 aarch64 aarch64 aarch64 GNU/Linux
Note: some tests fail; I'll open another issue with details.
When using this code (with blosc-1.2.1.win-amd64-py2.7.exe, Python 2.7 64-bit, Windows 7 x64):
import numpy as np
import blosc
with open('blah.bin', 'rb') as f:
    w = np.fromstring(f.read(), dtype=np.int16)
print w  # w is a regular, standard numpy array
z = blosc.pack_array(w, cname='lz4')  # here we have a bug
then we have this crash (please download the blah.bin file here: https://dl.dropboxusercontent.com/u/83031018/blah.bin):
[ 0 0 2 ..., -872 -258 -599]
Traceback (most recent call last):
File "D:\Documents\projects\coding\python\vrac\compression\blosc_bug.py", line 7, in <module>
z = blosc.pack_array(w, cname='lz4')
File "C:\Python27-64\lib\site-packages\blosc\toplevel.py", line 579, in pack_array
packed_array = compress(pickled_array, itemsize, clevel, shuffle, cname)
File "C:\Python27-64\lib\site-packages\blosc\toplevel.py", line 309, in compress
return _ext.compress(bytesobj, typesize, clevel, shuffle, cname)
blosc_extension.error: Error -1 while compressing data
Important note: when shuffle=False, there is no more error. So the issue probably comes from the shuffle filter.
Hi,
I want to request a Python wheel distribution of this package, or a compiled binary, instead of installing from source. The following log was generated when I tried to update blosc:
c:\Python\Scripts>pip install -U blosc
Collecting blosc
Downloading blosc-1.2.7.tar.gz (239kB)
100% |################################| 241kB 292kB/s
Installing collected packages: blosc
Found existing installation: blosc 1.2.5
Uninstalling blosc-1.2.5:
Successfully uninstalled blosc-1.2.5
Running setup.py install for blosc
Complete output from command c:\Python\python.exe -c "import setuptools, tokenize;__file__='c:\\users\\eep1\\appdata\\local\\temp\\pip-build-h08iyx\\blosc\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record c:\users\eep1\appdata\local\temp\pip-kup53o-record\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build\lib.win32-2.7
creating build\lib.win32-2.7\blosc
copying blosc\test.py -> build\lib.win32-2.7\blosc
copying blosc\toplevel.py -> build\lib.win32-2.7\blosc
copying blosc\version.py -> build\lib.win32-2.7\blosc
copying blosc\__init__.py -> build\lib.win32-2.7\blosc
running build_ext
building 'blosc.blosc_extension' extension
creating build\temp.win32-2.7
creating build\temp.win32-2.7\Release
creating build\temp.win32-2.7\Release\blosc
creating build\temp.win32-2.7\Release\c-blosc
creating build\temp.win32-2.7\Release\c-blosc\blosc
creating build\temp.win32-2.7\Release\c-blosc\internal-complibs
creating build\temp.win32-2.7\Release\c-blosc\internal-complibs\lz4-1.6.0
creating build\temp.win32-2.7\Release\c-blosc\internal-complibs\snappy-1.1.1
creating build\temp.win32-2.7\Release\c-blosc\internal-complibs\zlib-1.2.8
C:\Python\MinGW\bin\gcc.exe -mdll -O -Wall -DHAVE_LZ4=1 -DHAVE_SNAPPY=1 -DHAVE_ZLIB=1 -Ic-blosc\blosc -Ic-blosc/internal-complibs\lz4-1.6.0 -Ic-blosc/internal-complibs\snappy-1.1.1 -Ic-blosc/internal-complibs\zlib-1.2.8 -Ic:\Python\include -Ic:\Python\PC -c blosc/blosc_extension.c -o build\temp.win32-2.7\Release\blosc\blosc_extension.o
C:\Python\MinGW\bin\gcc.exe -mdll -O -Wall -DHAVE_LZ4=1 -DHAVE_SNAPPY=1 -DHAVE_ZLIB=1 -Ic-blosc\blosc -Ic-blosc/internal-complibs\lz4-1.6.0 -Ic-blosc/internal-complibs\snappy-1.1.1 -Ic-blosc/internal-complibs\zlib-1.2.8 -Ic:\Python\include -Ic:\Python\PC -c c-blosc/blosc\blosc.c -o build\temp.win32-2.7\Release\c-blosc\blosc\blosc.o
c-blosc/blosc\blosc.c:56:23: fatal error: pthread.h: No such file or directory
#include <pthread.h>
^
compilation terminated.
error: command 'C:\\Python\\MinGW\\bin\\gcc.exe' failed with exit status 1
----------------------------------------
Rolling back uninstall of blosc
Command "c:\Python\python.exe -c "import setuptools, tokenize;__file__='c:\\users\\eep1\\appdata\\local\\temp\\pip-build-h08iyx\\blosc\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record c:\users\eep1\appdata\local\temp\pip-kup53o-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in c:\users\eep1\appdata\local\temp\pip-build-h08iyx\blosc
Kind regards
It seems like attic (which is py3-only) wants to hand a memoryview over a bytearray (called data) to blosc's compress, but it is not accepted there, so I would have to convert it to an acceptable type with bytes(data) first - and that would make a full new copy of it, right?
I tried to loosen the first type check in blosc to accept memoryviews as well, but then it just fails a little later with:
return _ext.compress(bytesobj, typesize, clevel, shuffle, cname)
TypeError: must be read-only pinned buffer, not memoryview
It seems like code like this (see HAS_NEW_BUFFER) might be useful (but please check, I am not familiar with this low-level stuff):
https://github.com/dlitz/pycrypto/pull/81/files#diff-d29a5dec14d8ca1fc5d169320636fc52R631
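For reference, this is the copy being complained about: converting a memoryview over a bytearray to bytes duplicates the underlying data, while a compress that speaks the buffer protocol could read the view in place. A tiny illustration:

```python
data = bytearray(b"x" * 1024)  # attic's chunk buffer
view = memoryview(data)        # zero-copy view over the buffer
as_bytes = bytes(view)         # full extra copy: what we want to avoid
assert as_bytes == data
# a buffer-protocol-aware compress could consume `view` directly,
# without this intermediate copy
```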
Installing on MacOS 10.12.6 with Anaconda 5.3.0 Python3.6.1, the following error occurs:
blosc/blosc_extension.c:594:27: error: use of undeclared identifier 'BLOSC_NOSHUFFLE'
PyModule_AddIntMacro(m, BLOSC_NOSHUFFLE);
^
blosc/blosc_extension.c:595:27: error: use of undeclared identifier 'BLOSC_SHUFFLE'
PyModule_AddIntMacro(m, BLOSC_SHUFFLE);
^
blosc/blosc_extension.c:596:27: error: use of undeclared identifier 'BLOSC_BITSHUFFLE'
PyModule_AddIntMacro(m, BLOSC_BITSHUFFLE);
^
22 warnings and 3 errors generated.
error: command 'gcc' failed with exit status 1
Perhaps I don't quite understand the typesize parameter, but would doing the following be incorrect when using blosc.compress?
I'm running this on a Mac with Python 3.6.2 using the default python-blosc package from conda-forge. It seems that decompress does not give me back the original value:
$ python
Python 3.6.2 | packaged by conda-forge | (default, Jul 23 2017, 23:01:38)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> x = numpy.ones(27266, dtype='uint8')
>>> xx = x.tobytes()
>>> import blosc
>>> blosc.decompress(blosc.compress(xx, clevel=5, cname='zstd', shuffle=blosc.BITSHUFFLE))[-3:]
b'\x01 '
>>> blosc.decompress(blosc.compress(xx, clevel=5, cname='zstd', shuffle=blosc.SHUFFLE))[-3:]
b'\x01\x01\x01'
>>> blosc.decompress(blosc.compress(xx, clevel=5, cname='zstd'))[-3:]
b'\x01\x01\x01'
As you can see, with the bitshuffle filter, the values from decompress seem corrupted at the very end (with two \x00 instead of \x01). Do you have any idea why?
Here is my environment.yml:
name: test_blosc
channels:
- conda-forge
- defaults
dependencies:
- blas=1.1=openblas
- blosc=1.12.0=0
- ca-certificates=2017.7.27.1=0
- certifi=2017.7.27.1=py36_0
- libgfortran=3.0.0=0
- ncurses=5.9=10
- numpy=1.13.3=py36_blas_openblas_200
- openblas=0.2.19=2
- openssl=1.0.2l=0
- pip=9.0.1=py36_0
- python=3.6.2=0
- python-blosc=1.4.4=py36_0
- readline=6.2=0
- setuptools=36.6.0=py36_1
- snappy=1.1.7=1
- sqlite=3.13.0=1
- tk=8.5.19=2
- wheel=0.30.0=py_1
- xz=5.2.3=0
- zlib=1.2.11=0
prefix: /Users/wlee/miniconda3/envs/test_blosc
All doctests should use the blosc prefix:
>>> blosc.compress(...)
>>> blosc.decompress(...)
The python-blosc feedstock says the package is Apache 2. Is this license correct?
https://github.com/conda-forge/python-blosc-feedstock/blob/master/recipe/meta.yaml#L32-L35
Hi, great stuff you have done in blosc, thanks! :)
I am trying it for https://attic-backup.org/ and, due to the way attic works, it usually processes chunks of data ~64 KB in size. I first had a bit of trouble getting blosc to run in parallel until I found set_blocksize.
So I'd like to suggest that you add a bit of documentation for it rather than just writing "experts only" - maxing out speed is your whole point, right? So blosc should not just use 1 thread.
I currently have set the blocksize to 8192, assuming that a 64 KB chunk size divided by maybe 8 cores would give 8 KB blocks. Is this correct?
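The arithmetic I'm assuming is just this (whether the one-block-per-core heuristic matches blosc's internal scheduling is exactly my question; blocksize_for is my own name):

```python
def blocksize_for(chunk_size, n_cores):
    # one block per core, e.g. 64 KB chunks on 8 cores -> 8 KB blocks
    return max(chunk_size // n_cores, 1)

assert blocksize_for(64 * 1024, 8) == 8 * 1024
```

The result would then be passed to blosc.set_blocksize().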
Also, I'd like to suggest doing a new release on PyPI; having "dev" packages as a dependency is a bit ugly.
Sometimes I want to use some parts of blosc but not others. In particular I use blosc to gain access to other compression libraries (like snappy) but don't want the added features of blocked parallel compression (I handle parallelism on my own.) What are your thoughts on exposing some of the internal building blocks out to the Python layer?
I would love functions like blosc.shuffle, blosc.lzo.compress, blosc.snappy.compress, etc. I would use blosc in several more applications if these were available to me directly. It would be especially useful if these did not hold the GIL.
This is handy for debugging and reporting purposes.
These days I tend to do most of my testing with nosetests for discovery and the test functions provided by nose.tools.
How would you prefer to add more tests? I would propose a file test_blosc.py with simple test functions.
nbytes in PyBlosc_compress should be of type size_t (or, perhaps better, Py_ssize_t) and not int, so as to match the signatures of PyString_FromStringAndSize() and blosc_compress. PyBlosc_decompress should be consistent with this too.
If I give blosc.compress a memoryview object, it would be possible to learn the typesize automatically.
In [1]: import numpy as np
In [2]: x = np.ones(5, dtype='i4')
In [3]: x.data
Out[3]: <memory at 0x7fe710bf0408>
In [4]: x.data.itemsize
Out[4]: 4
In [5]: import blosc
In [6]: blosc.compress(x.data, typesize=x.data.itemsize)
Out[6]: b'\x02\x01\x13\x04\x14\x00\x00\x00\x14\x00\x00\x00$\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00'
e.g. when doing something like this (a second read of the same opened file):
$> python -c 'import blosc; f=file("/home/yoh/.emacs", "rb"); print 1; blosc.compress(f.read(), typesize=8); print 2; blosc.compress(f.read(), typesize=8); print 3;'
1
2
zsh: floating point exception python -c
Log:
[user@scene_pi python-blosc]$ sudo python setup.py install
running install
running bdist_egg
running egg_info
writing dependency_links to blosc.egg-info/dependency_links.txt
writing top-level names to blosc.egg-info/top_level.txt
writing blosc.egg-info/PKG-INFO
reading manifest file 'blosc.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.txt'
warning: no files found matching '*.cpp' under directory 'c-blosc'
warning: no files found matching '*.hpp' under directory 'c-blosc'
writing manifest file 'blosc.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-armv7l/egg
running install_lib
running build_py
creating build
creating build/lib.linux-armv7l-3.4
creating build/lib.linux-armv7l-3.4/blosc
copying blosc/test.py -> build/lib.linux-armv7l-3.4/blosc
copying blosc/__init__.py -> build/lib.linux-armv7l-3.4/blosc
copying blosc/version.py -> build/lib.linux-armv7l-3.4/blosc
copying blosc/toplevel.py -> build/lib.linux-armv7l-3.4/blosc
running build_ext
building 'blosc.blosc_extension' extension
creating build/temp.linux-armv7l-3.4
creating build/temp.linux-armv7l-3.4/blosc
creating build/temp.linux-armv7l-3.4/c-blosc
creating build/temp.linux-armv7l-3.4/c-blosc/blosc
creating build/temp.linux-armv7l-3.4/c-blosc/internal-complibs
creating build/temp.linux-armv7l-3.4/c-blosc/internal-complibs/lz4-1.7.0
creating build/temp.linux-armv7l-3.4/c-blosc/internal-complibs/snappy-1.1.1
creating build/temp.linux-armv7l-3.4/c-blosc/internal-complibs/zlib-1.2.8
gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -fPIC -DHAVE_LZ4=1 -DHAVE_SNAPPY=1 -DHAVE_ZLIB=1 -Ic-blosc/blosc -Ic-blosc/internal-complibs/lz4-1.7.0 -Ic-blosc/internal-complibs/snappy-1.1.1 -Ic-blosc/internal-complibs/zlib-1.2.8 -I/usr/include/python3.4m -c blosc/blosc_extension.c -o build/temp.linux-armv7l-3.4/blosc/blosc_extension.o
gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -fPIC -DHAVE_LZ4=1 -DHAVE_SNAPPY=1 -DHAVE_ZLIB=1 -Ic-blosc/blosc -Ic-blosc/internal-complibs/lz4-1.7.0 -Ic-blosc/internal-complibs/snappy-1.1.1 -Ic-blosc/internal-complibs/zlib-1.2.8 -I/usr/include/python3.4m -c c-blosc/blosc/blosc.c -o build/temp.linux-armv7l-3.4/c-blosc/blosc/blosc.o
c-blosc/blosc/blosc.c: In function 'blosc_getitem':
c-blosc/blosc/blosc.c:1275:7: warning: unused variable 'tmp_init' [-Wunused-variable]
int tmp_init = 0;
^
gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -fPIC -DHAVE_LZ4=1 -DHAVE_SNAPPY=1 -DHAVE_ZLIB=1 -Ic-blosc/blosc -Ic-blosc/internal-complibs/lz4-1.7.0 -Ic-blosc/internal-complibs/snappy-1.1.1 -Ic-blosc/internal-complibs/zlib-1.2.8 -I/usr/include/python3.4m -c c-blosc/blosc/shuffle-sse2.c -o build/temp.linux-armv7l-3.4/c-blosc/blosc/shuffle-sse2.o
c-blosc/blosc/shuffle-sse2.c:14:4: error: #error SSE2 is not supported by the target architecture/platform and/or this compiler.
#error SSE2 is not supported by the target architecture/platform and/or this compiler.
^
c-blosc/blosc/shuffle-sse2.c:17:23: fatal error: emmintrin.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1
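The failure above happens because `shuffle-sse2.c` is compiled unconditionally, even on armv7 where the SSE2 intrinsics header `emmintrin.h` does not exist. A minimal sketch of the kind of platform gate a `setup.py` could apply before adding the SSE2 source file — the helper name `supports_sse2` is hypothetical, not part of python-blosc:

```python
import platform

def supports_sse2(machine=None):
    """Rough guess at whether the build target has SSE2 (hypothetical helper)."""
    if machine is None:
        machine = platform.machine()
    # SSE2 is guaranteed on x86-64 and present on all modern 32-bit x86 CPUs;
    # ARM (armv7l, aarch64) has NEON instead and cannot compile SSE2 intrinsics.
    return machine.lower() in ("x86_64", "amd64", "i386", "i486", "i586", "i686")

# Only hand the SSE2 shuffle source to the compiler when it can actually build
sources = ["c-blosc/blosc/blosc.c", "c-blosc/blosc/shuffle.c"]
if supports_sse2():
    sources.append("c-blosc/blosc/shuffle-sse2.c")
```

With a gate like this, the armv7 build above would simply skip the SSE2 shuffle and fall back to the generic implementation.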
Hi Francesc,
Could you please also provide tarballs somewhere for python-blosc with an exact name/version (such as python-blosc-1.1.tar.gz)?
That would be helpful for packagers. The PyPI archive is useful, but it doesn't seem to match this repository exactly (it contains no README/Release-notes AFAIR).
Thanks!
Many people in the community have switched to building packages automatically on conda-forge. The getting started procedure is relatively straightforward if you already have a conda recipe.
Having up-to-date conda packages in a community maintained channel would make it easier for downstream libraries (like Dask) to rely on Blosc more heavily.
This would be a nice task for anyone who wants to help out Blosc and get involved in community package management. A deep knowledge of the blosc codebase is not necessary.
I just installed Anaconda from a clean install and then installed blosc using pip install blosc. I am now getting this import error:
In [1]: import blosc
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
in <module>()
----> 1 import blosc
/usr/local/anaconda/lib/python2.7/site-packages/blosc/__init__.py in <module>()
11
12 # Blosc C symbols that we want to export
---> 13 from blosc.blosc_extension import (
14 BLOSC_VERSION_STRING as VERSION_STRING,
15 BLOSC_VERSION_DATE as VERSION_DATE,
ImportError: /usr/local/anaconda/lib/python2.7/site-packages/blosc/blosc_extension.so: undefined symbol: _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_appendEPKcm
I am running Linux Mint 18 Sarah 64-bit Kernel Linux 4.4.0-21-generic x86_64 MATE 1.14.1
Am I missing a package when installing from pip? Thank you.
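The undefined symbol carries the `__cxx11` inline-namespace tag, which is the signature of the GCC 5 dual libstdc++ ABI: the wheel's C++ code (Snappy) was built against a newer libstdc++ than the one Anaconda ships. One common workaround is to rebuild the extension locally, e.g. `pip install --no-binary :all: blosc`. A tiny illustrative check for this symptom — the helper function is hypothetical, purely for demonstration:

```python
def looks_like_cxx11_abi_mismatch(symbol):
    """Heuristic: GCC 5+ mangles std::string symbols under the
    std::__cxx11 inline namespace, so an undefined symbol starting
    with this prefix usually means a libstdc++ dual-ABI mismatch."""
    return symbol.startswith("_ZNSt7__cxx11")

# The undefined symbol from the traceback above
sym = "_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_appendEPKcm"
print(looks_like_cxx11_abi_mismatch(sym))  # → True
```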
Isn't being installed automatically or checked for.
Currently, the Blosc sources in the c-blosc
subdirectory are managed by manually copying them in whenever a new version of Blosc is released.
I will shortly present two PRs to demo using subtrees or submodules to manage Blosc sources.