I ended up making a minimal example that crashes:
from __future__ import print_function
import numpy
import blosc
print("Blosc version info:", blosc.blosclib_version)
# Setting the number of threads to 3 makes the segfaults occur more often
blosc.set_nthreads(3)
a = numpy.arange(1e6)
parray = blosc.compress(a, clevel=9, shuffle=blosc.SHUFFLE, cname="blosclz")
ratio = len(a) * a.itemsize * 1. / len(parray)
print("Compression: %s -> %s (%4.1fx)" % (
    len(a) * a.itemsize, len(parray), ratio))
With that, it is quite easy to make the python-blosc wrapper crash:
$ time for i in {1..100}; do PYTHONPATH=. python segfault.py>p ; done
Segmentation fault (core dumped)
real 0m9.803s
user 0m8.416s
sys 0m1.380s
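The shell loop above only shows that at least one run crashed. As a rough sketch, the same check can be driven from Python to count how many runs die from SIGSEGV. The helper name `count_segfaults` and the default of 100 runs are assumptions for illustration, not part of python-blosc; the sketch relies on POSIX behaviour, where `subprocess` reports a child killed by a signal as a negative return code.

```python
import signal
import subprocess


def count_segfaults(cmd, runs=100):
    """Run `cmd` `runs` times and return how many runs were killed by SIGSEGV.

    Hypothetical helper, POSIX only.
    """
    crashes = 0
    for _ in range(runs):
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # On POSIX, a negative return code -N means "terminated by signal N".
        if proc.returncode == -signal.SIGSEGV:
            crashes += 1
    return crashes
```

Calling `count_segfaults([sys.executable, "segfault.py"])` with `PYTHONPATH=.` set would then give a crash rate that can be compared across compilers, optimization levels and thread counts.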
Then, during my investigation, I found the following:
1. The crashes only happen when you combine Python + GCC + a high compiler optimization level (-O2 or higher) + threading. I have verified this on both Ubuntu 15.10 and Gentoo 2.2.
2. The crashes do not happen when you replace GCC with Clang, or you don't use multi-threading, or you use a low optimization level (-O1 or lower).
3. The main C-Blosc library does not seem to be affected by this. See this equivalent example in pure C.
4. When compiling python-blosc against an external C-Blosc library with this:
$ python setup.py build_ext --inplace --blosc=/my_c-blosc_lib_path
everything is fine, even in case 1) above.
So, that's a funny situation, and after thinking about it for a good while, I propose to approach this issue as follows:
- In case the C-Blosc library is not found, print a visible warning saying that, for maximum performance, the user should install the C-Blosc library separately.
- In case the vendored library is to be compiled inside the extension, force the use of -O1 in setup.py for Linux platforms (Mac OSX is not affected that much because Clang/LLVM is probably used there, and Windows/MSVC is definitely not an issue here).
- Add information about this issue early in the README file. If people are using Blosc, it is probably for speed reasons, so making this as apparent as possible seems reasonable.
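To make the first two points of the proposal a bit more concrete, here is a minimal sketch of the decision logic that setup.py could use. The function name `vendored_compile_args` and its parameters are hypothetical and not taken from the actual setup.py. It also assumes the returned flags end up after the default ones on the compiler command line (as `extra_compile_args` does), since GCC honours the last -O option it sees.

```python
import sys
import warnings


def vendored_compile_args(blosc_dir=None, platform=sys.platform):
    """Return extra compiler flags for the extension (hypothetical helper).

    blosc_dir: path passed via --blosc=..., or None when the vendored
    C-Blosc sources are compiled inside the extension.
    """
    if blosc_dir is not None:
        # External C-Blosc library: no workaround needed.
        return []
    warnings.warn(
        "C-Blosc library not found; compiling the vendored copy. "
        "For maximum performance, install the C-Blosc library separately."
    )
    if platform.startswith("linux"):
        # Appending -O1 lowers the optimization level of the vendored build.
        return ["-O1"]
    # Mac OSX (Clang) and Windows (MSVC) keep their default flags.
    return []
```

The result would be passed on as the extension's `extra_compile_args`, so the external-library path stays untouched and only the vendored Linux build is slowed down.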
Addendum: Here is what you can expect from using python-blosc with an external C-Blosc library:
$ PYTHONPATH=. python bench/compress_ptr.py
Creating NumPy arrays with 10**8 int64/float64 elements:
*** ctypes.memmove() *** Time for memcpy(): 0.295 s (2.53 GB/s)
Times for compressing/decompressing with clevel=5 and 8 threads
*** the arange linear distribution ***
*** blosclz , noshuffle *** 0.455 s (1.64 GB/s) / 0.087 s (8.58 GB/s) Compr. ratio: 1.0x
*** blosclz , shuffle *** 0.108 s (6.93 GB/s) / 0.075 s (10.00 GB/s) Compr. ratio: 57.1x
*** blosclz , bitshuffle *** 0.120 s (6.19 GB/s) / 0.107 s (6.97 GB/s) Compr. ratio: 74.0x
*** lz4 , noshuffle *** 0.342 s (2.18 GB/s) / 0.212 s (3.52 GB/s) Compr. ratio: 2.0x
*** lz4 , shuffle *** 0.078 s (9.54 GB/s) / 0.093 s (8.02 GB/s) Compr. ratio: 58.6x
*** lz4 , bitshuffle *** 0.116 s (6.41 GB/s) / 0.135 s (5.53 GB/s) Compr. ratio: 52.5x
*** lz4hc , noshuffle *** 8.142 s (0.09 GB/s) / 0.212 s (3.52 GB/s) Compr. ratio: 2.0x
*** lz4hc , shuffle *** 0.140 s (5.33 GB/s) / 0.092 s (8.06 GB/s) Compr. ratio: 137.2x
*** lz4hc , bitshuffle *** 1.572 s (0.47 GB/s) / 0.142 s (5.25 GB/s) Compr. ratio: 208.9x
*** snappy , noshuffle *** 0.381 s (1.95 GB/s) / 0.244 s (3.06 GB/s) Compr. ratio: 2.0x
*** snappy , shuffle *** 0.073 s (10.25 GB/s) / 0.136 s (5.48 GB/s) Compr. ratio: 17.4x
*** snappy , bitshuffle *** 0.126 s (5.92 GB/s) / 0.177 s (4.22 GB/s) Compr. ratio: 18.2x
*** zlib , noshuffle *** 5.298 s (0.14 GB/s) / 0.401 s (1.86 GB/s) Compr. ratio: 5.3x
*** zlib , shuffle *** 0.974 s (0.76 GB/s) / 0.393 s (1.90 GB/s) Compr. ratio: 237.3x
*** zlib , bitshuffle *** 1.026 s (0.73 GB/s) / 0.444 s (1.68 GB/s) Compr. ratio: 305.4x
*** the linspace linear distribution ***
*** blosclz , noshuffle *** 0.434 s (1.72 GB/s) / 0.088 s (8.45 GB/s) Compr. ratio: 1.0x
*** blosclz , shuffle *** 0.298 s (2.50 GB/s) / 0.090 s (8.32 GB/s) Compr. ratio: 2.0x
*** blosclz , bitshuffle *** 0.476 s (1.56 GB/s) / 0.166 s (4.50 GB/s) Compr. ratio: 2.8x
*** lz4 , noshuffle *** 0.219 s (3.41 GB/s) / 0.088 s (8.45 GB/s) Compr. ratio: 1.0x
*** lz4 , shuffle *** 0.190 s (3.92 GB/s) / 0.112 s (6.63 GB/s) Compr. ratio: 3.2x
*** lz4 , bitshuffle *** 0.248 s (3.00 GB/s) / 0.149 s (5.00 GB/s) Compr. ratio: 4.9x
*** lz4hc , noshuffle *** 2.797 s (0.27 GB/s) / 0.211 s (3.53 GB/s) Compr. ratio: 1.2x
*** lz4hc , shuffle *** 0.528 s (1.41 GB/s) / 0.085 s (8.78 GB/s) Compr. ratio: 24.1x
*** lz4hc , bitshuffle *** 2.918 s (0.26 GB/s) / 0.131 s (5.71 GB/s) Compr. ratio: 35.0x
*** snappy , noshuffle *** 0.088 s (8.49 GB/s) / 0.087 s (8.61 GB/s) Compr. ratio: 1.0x
*** snappy , shuffle *** 0.235 s (3.16 GB/s) / 0.176 s (4.24 GB/s) Compr. ratio: 4.2x
*** snappy , bitshuffle *** 0.317 s (2.35 GB/s) / 0.198 s (3.76 GB/s) Compr. ratio: 6.1x
*** zlib , noshuffle *** 6.569 s (0.11 GB/s) / 0.718 s (1.04 GB/s) Compr. ratio: 1.6x
*** zlib , shuffle *** 1.313 s (0.57 GB/s) / 0.339 s (2.20 GB/s) Compr. ratio: 27.0x
*** zlib , bitshuffle *** 1.348 s (0.55 GB/s) / 0.380 s (1.96 GB/s) Compr. ratio: 35.2x
*** the random distribution ***
*** blosclz , noshuffle *** 0.517 s (1.44 GB/s) / 0.087 s (8.60 GB/s) Compr. ratio: 1.0x
*** blosclz , shuffle *** 0.212 s (3.52 GB/s) / 0.070 s (10.62 GB/s) Compr. ratio: 3.9x
*** blosclz , bitshuffle *** 0.181 s (4.13 GB/s) / 0.104 s (7.16 GB/s) Compr. ratio: 6.1x
*** lz4 , noshuffle *** 0.373 s (2.00 GB/s) / 0.149 s (5.00 GB/s) Compr. ratio: 2.1x
*** lz4 , shuffle *** 0.135 s (5.52 GB/s) / 0.101 s (7.36 GB/s) Compr. ratio: 4.5x
*** lz4 , bitshuffle *** 0.129 s (5.77 GB/s) / 0.138 s (5.39 GB/s) Compr. ratio: 6.1x
*** lz4hc , noshuffle *** 4.684 s (0.16 GB/s) / 0.101 s (7.36 GB/s) Compr. ratio: 3.2x
*** lz4hc , shuffle *** 3.223 s (0.23 GB/s) / 0.101 s (7.37 GB/s) Compr. ratio: 5.4x
*** lz4hc , bitshuffle *** 0.429 s (1.74 GB/s) / 0.139 s (5.36 GB/s) Compr. ratio: 6.2x
*** snappy , noshuffle *** 0.461 s (1.62 GB/s) / 0.257 s (2.90 GB/s) Compr. ratio: 2.2x
*** snappy , shuffle *** 0.166 s (4.49 GB/s) / 0.160 s (4.66 GB/s) Compr. ratio: 4.3x
*** snappy , bitshuffle *** 0.136 s (5.48 GB/s) / 0.167 s (4.45 GB/s) Compr. ratio: 5.0x
*** zlib , noshuffle *** 5.383 s (0.14 GB/s) / 0.499 s (1.49 GB/s) Compr. ratio: 3.9x
*** zlib , shuffle *** 2.903 s (0.26 GB/s) / 0.408 s (1.83 GB/s) Compr. ratio: 6.1x
*** zlib , bitshuffle *** 1.403 s (0.53 GB/s) / 0.433 s (1.72 GB/s) Compr. ratio: 6.3x
The above also has the advantage that the C-Blosc CMake infrastructure can detect the compiler's AVX2 support much more easily. Anyway, here is the output with the python-blosc extension compiled with the -O1 flag:
$ PYTHONPATH=. python bench/compress_ptr.py
Creating NumPy arrays with 10**8 int64/float64 elements:
*** ctypes.memmove() *** Time for memcpy(): 0.295 s (2.52 GB/s)
Times for compressing/decompressing with clevel=5 and 8 threads
*** the arange linear distribution ***
*** blosclz , noshuffle *** 0.517 s (1.44 GB/s) / 0.086 s (8.67 GB/s) Compr. ratio: 1.0x
*** blosclz , shuffle *** 0.115 s (6.50 GB/s) / 0.083 s (8.95 GB/s) Compr. ratio: 57.1x
*** blosclz , bitshuffle *** 0.167 s (4.46 GB/s) / 0.172 s (4.33 GB/s) Compr. ratio: 74.0x
*** lz4 , noshuffle *** 0.917 s (0.81 GB/s) / 0.246 s (3.03 GB/s) Compr. ratio: 2.0x
*** lz4 , shuffle *** 0.109 s (6.84 GB/s) / 0.145 s (5.12 GB/s) Compr. ratio: 58.6x
*** lz4 , bitshuffle *** 0.198 s (3.77 GB/s) / 0.267 s (2.79 GB/s) Compr. ratio: 52.5x
*** lz4hc , noshuffle *** 8.224 s (0.09 GB/s) / 0.245 s (3.04 GB/s) Compr. ratio: 2.0x
*** lz4hc , shuffle *** 0.193 s (3.86 GB/s) / 0.144 s (5.19 GB/s) Compr. ratio: 137.2x
*** lz4hc , bitshuffle *** 1.800 s (0.41 GB/s) / 0.206 s (3.62 GB/s) Compr. ratio: 208.9x
*** snappy , noshuffle *** 0.404 s (1.84 GB/s) / 0.251 s (2.97 GB/s) Compr. ratio: 2.0x
*** snappy , shuffle *** 0.110 s (6.78 GB/s) / 0.196 s (3.80 GB/s) Compr. ratio: 17.4x
*** snappy , bitshuffle *** 0.191 s (3.90 GB/s) / 0.306 s (2.43 GB/s) Compr. ratio: 18.2x
*** zlib , noshuffle *** 5.167 s (0.14 GB/s) / 0.410 s (1.82 GB/s) Compr. ratio: 5.3x
*** zlib , shuffle *** 1.046 s (0.71 GB/s) / 0.523 s (1.42 GB/s) Compr. ratio: 237.3x
*** zlib , bitshuffle *** 1.338 s (0.56 GB/s) / 0.721 s (1.03 GB/s) Compr. ratio: 305.4x
*** the linspace linear distribution ***
*** blosclz , noshuffle *** 0.540 s (1.38 GB/s) / 0.088 s (8.44 GB/s) Compr. ratio: 1.0x
*** blosclz , shuffle *** 0.324 s (2.30 GB/s) / 0.103 s (7.25 GB/s) Compr. ratio: 2.0x
*** blosclz , bitshuffle *** 0.533 s (1.40 GB/s) / 0.234 s (3.19 GB/s) Compr. ratio: 2.8x
*** lz4 , noshuffle *** 0.359 s (2.08 GB/s) / 0.088 s (8.43 GB/s) Compr. ratio: 1.0x
*** lz4 , shuffle *** 0.351 s (2.13 GB/s) / 0.142 s (5.26 GB/s) Compr. ratio: 3.2x
*** lz4 , bitshuffle *** 0.396 s (1.88 GB/s) / 0.221 s (3.37 GB/s) Compr. ratio: 4.9x
*** lz4hc , noshuffle *** 3.223 s (0.23 GB/s) / 0.239 s (3.12 GB/s) Compr. ratio: 1.2x
*** lz4hc , shuffle *** 0.572 s (1.30 GB/s) / 0.104 s (7.18 GB/s) Compr. ratio: 24.1x
*** lz4hc , bitshuffle *** 2.920 s (0.26 GB/s) / 0.203 s (3.67 GB/s) Compr. ratio: 35.0x
*** snappy , noshuffle *** 0.088 s (8.51 GB/s) / 0.088 s (8.49 GB/s) Compr. ratio: 1.0x
*** snappy , shuffle *** 0.262 s (2.85 GB/s) / 0.190 s (3.92 GB/s) Compr. ratio: 4.2x
*** snappy , bitshuffle *** 0.418 s (1.78 GB/s) / 0.256 s (2.91 GB/s) Compr. ratio: 6.1x
*** zlib , noshuffle *** 6.463 s (0.12 GB/s) / 0.753 s (0.99 GB/s) Compr. ratio: 1.6x
*** zlib , shuffle *** 1.431 s (0.52 GB/s) / 0.351 s (2.12 GB/s) Compr. ratio: 27.0x
*** zlib , bitshuffle *** 1.433 s (0.52 GB/s) / 0.451 s (1.65 GB/s) Compr. ratio: 35.2x
*** the random distribution ***
*** blosclz , noshuffle *** 0.538 s (1.38 GB/s) / 0.088 s (8.47 GB/s) Compr. ratio: 1.0x
*** blosclz , shuffle *** 0.232 s (3.21 GB/s) / 0.081 s (9.18 GB/s) Compr. ratio: 3.9x
*** blosclz , bitshuffle *** 0.221 s (3.37 GB/s) / 0.160 s (4.67 GB/s) Compr. ratio: 6.1x
*** lz4 , noshuffle *** 0.857 s (0.87 GB/s) / 0.218 s (3.42 GB/s) Compr. ratio: 2.1x
*** lz4 , shuffle *** 0.278 s (2.68 GB/s) / 0.176 s (4.24 GB/s) Compr. ratio: 4.5x
*** lz4 , bitshuffle *** 0.232 s (3.22 GB/s) / 0.268 s (2.78 GB/s) Compr. ratio: 6.1x
*** lz4hc , noshuffle *** 5.000 s (0.15 GB/s) / 0.151 s (4.92 GB/s) Compr. ratio: 3.2x
*** lz4hc , shuffle *** 3.526 s (0.21 GB/s) / 0.124 s (6.02 GB/s) Compr. ratio: 5.4x
*** lz4hc , bitshuffle *** 0.541 s (1.38 GB/s) / 0.206 s (3.61 GB/s) Compr. ratio: 6.2x
*** snappy , noshuffle *** 0.621 s (1.20 GB/s) / 0.260 s (2.86 GB/s) Compr. ratio: 2.2x
*** snappy , shuffle *** 0.196 s (3.80 GB/s) / 0.172 s (4.32 GB/s) Compr. ratio: 4.3x
*** snappy , bitshuffle *** 0.174 s (4.29 GB/s) / 0.224 s (3.32 GB/s) Compr. ratio: 5.0x
*** zlib , noshuffle *** 5.319 s (0.14 GB/s) / 0.505 s (1.48 GB/s) Compr. ratio: 3.9x
*** zlib , shuffle *** 2.910 s (0.26 GB/s) / 0.415 s (1.80 GB/s) Compr. ratio: 6.1x
*** zlib , bitshuffle *** 1.548 s (0.48 GB/s) / 0.492 s (1.52 GB/s) Compr. ratio: 6.3x
So, although the -O1 case still performs very well, the external library can be more than 2 GB/s faster in some cases (especially with the bitshuffle filter, which benefits quite a bit from AVX2).
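To put numbers on that, here is a small sketch that subtracts a few of the throughput figures from the two runs above (arange distribution, bitshuffle filter). The figures are copied from the benchmark output; the dictionary layout and the `speed_gap` name are just for illustration.

```python
# (compression GB/s, decompression GB/s), arange distribution, bitshuffle.
external = {"blosclz": (6.19, 6.97), "lz4": (6.41, 5.53), "snappy": (5.92, 4.22)}
o1_build = {"blosclz": (4.46, 4.33), "lz4": (3.77, 2.79), "snappy": (3.90, 2.43)}


def speed_gap(codec):
    """Return how much faster the external library is, in GB/s,
    as a (compression, decompression) pair."""
    ext_c, ext_d = external[codec]
    o1_c, o1_d = o1_build[codec]
    return round(ext_c - o1_c, 2), round(ext_d - o1_d, 2)


for codec in sorted(external):
    print("%-8s compress +%.2f GB/s, decompress +%.2f GB/s"
          % ((codec,) + speed_gap(codec)))
```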
Thoughts?