FrancescAlted commented on July 22, 2024

I ended up with a minimal example that crashes:

from __future__ import print_function
import numpy
import blosc

print("Blosc version info:", blosc.blosclib_version)
# Setting the number of threads to 3 makes the segfaults occur more often
blosc.set_nthreads(3)

a = numpy.arange(1e6)
parray = blosc.compress(a, clevel=9, shuffle=blosc.SHUFFLE, cname="blosclz")
ratio = len(a) * a.itemsize * 1. / len(parray)
print("Compression: %s -> %s (%4.1fx)" % (
    len(a) * a.itemsize, len(parray), ratio))

With that, it is quite easy to make the python-blosc wrapper crash:

$ time for i in {1..100}; do PYTHONPATH=. python segfault.py>p ; done
Segmentation fault (core dumped)

real    0m9.803s
user    0m8.416s
sys     0m1.380s
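
The shell loop above only shows that at least one run crashed. To count how many runs actually segfault, something along these lines could be used. It is just a sketch: `count_segfaults` is a name made up for this illustration, and it relies on the POSIX convention that a child killed by signal N is reported with return code -N.

```python
import signal
import subprocess


def count_segfaults(cmd, runs=100):
    """Run `cmd` `runs` times and count the runs that died from SIGSEGV."""
    crashes = 0
    for _ in range(runs):
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # On POSIX, a return code of -N means the child was killed by signal N.
        if proc.returncode == -signal.SIGSEGV:
            crashes += 1
    return crashes
```

For the reproducer above, this would be called as `count_segfaults([sys.executable, "segfault.py"], runs=100)` (after `import sys`), with PYTHONPATH set in the environment the same way as in the shell loop.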

Then, during my investigations I found this:

  1. The crashes only happen when you combine Python + GCC + a high compiler optimization level (-O2 or higher) + threading. I have verified this on both Ubuntu 15.10 and Gentoo 2.2.

  2. The crashes do not happen when you replace GCC with Clang, or you don't use multi-threading, or you use a low optimization level (-O1 or less).

  3. The main C-Blosc library does not seem to be affected by this. See this equivalent example in pure C.

  4. When compiling python-blosc against an external C-Blosc library with this:

 $ python setup.py build_ext --inplace --blosc=/my_c-blosc_lib_path

everything is fine, even in case 1 above.

So, that's a funny situation, and after thinking about this for a good amount of time, I propose to approach this issue as follows:

  1. In case the C-Blosc library is not found, print a visible warning saying that, for maximum performance, the user should install the C-Blosc library separately.

  2. In case the vendored library is to be compiled inside the extension, force the use of -O1 in setup.py for Linux platforms (Mac OSX is not affected that much because CLANG/LLVM is probably used there, and Windows/MSVC is definitely not an issue here).

  3. Add information about this issue early in the README file. If people are using Blosc it is probably for speed reasons, so making this as apparent as possible seems reasonable.
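
Points 1 and 2 of the proposal could look roughly like this in setup.py. This is only a sketch of the decision logic, not the actual python-blosc build code: `vendored_compile_args` and its parameters are hypothetical names, and how the returned flags reach the compiler (for example through the `extra_compile_args` of the extension) is left out.

```python
import sys
import warnings


def vendored_compile_args(platform=sys.platform, external_blosc=None):
    """Return extra compiler flags for the extension build (hypothetical helper).

    `external_blosc` stands for the path given with --blosc=..., or None
    when the bundled C-Blosc sources are compiled into the extension.
    """
    if external_blosc:
        # Linking against an external C-Blosc: nothing to work around.
        return []
    # Point 1: make the performance trade-off visible to the user.
    warnings.warn(
        "C-Blosc library not found; building the bundled copy. "
        "For maximum performance, install C-Blosc separately."
    )
    if platform.startswith("linux"):
        # Point 2: the crashes were only seen with GCC at -O2 or higher,
        # so cap the optimization level for the bundled sources on Linux.
        return ["-O1"]
    # Mac OSX (Clang) and Windows (MSVC) keep their default flags.
    return []
```

With GCC, when several -O options are given the last one is the effective one, so the -O1 needs to end up after any default -O2/-O3 on the command line.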

Addendum: Here is what you can expect from using python-blosc with an external C-Blosc library:

$ PYTHONPATH=. python bench/compress_ptr.py 
Creating NumPy arrays with 10**8 int64/float64 elements:
  *** ctypes.memmove() *** Time for memcpy():   0.295 s (2.53 GB/s)

Times for compressing/decompressing with clevel=5 and 8 threads

*** the arange linear distribution ***
  *** blosclz , noshuffle  ***  0.455 s (1.64 GB/s) / 0.087 s (8.58 GB/s)       Compr. ratio:   1.0x
  *** blosclz , shuffle    ***  0.108 s (6.93 GB/s) / 0.075 s (10.00 GB/s)      Compr. ratio:  57.1x
  *** blosclz , bitshuffle ***  0.120 s (6.19 GB/s) / 0.107 s (6.97 GB/s)       Compr. ratio:  74.0x
  *** lz4     , noshuffle  ***  0.342 s (2.18 GB/s) / 0.212 s (3.52 GB/s)       Compr. ratio:   2.0x
  *** lz4     , shuffle    ***  0.078 s (9.54 GB/s) / 0.093 s (8.02 GB/s)       Compr. ratio:  58.6x
  *** lz4     , bitshuffle ***  0.116 s (6.41 GB/s) / 0.135 s (5.53 GB/s)       Compr. ratio:  52.5x
  *** lz4hc   , noshuffle  ***  8.142 s (0.09 GB/s) / 0.212 s (3.52 GB/s)       Compr. ratio:   2.0x
  *** lz4hc   , shuffle    ***  0.140 s (5.33 GB/s) / 0.092 s (8.06 GB/s)       Compr. ratio: 137.2x
  *** lz4hc   , bitshuffle ***  1.572 s (0.47 GB/s) / 0.142 s (5.25 GB/s)       Compr. ratio: 208.9x
  *** snappy  , noshuffle  ***  0.381 s (1.95 GB/s) / 0.244 s (3.06 GB/s)       Compr. ratio:   2.0x
  *** snappy  , shuffle    ***  0.073 s (10.25 GB/s) / 0.136 s (5.48 GB/s)      Compr. ratio:  17.4x
  *** snappy  , bitshuffle ***  0.126 s (5.92 GB/s) / 0.177 s (4.22 GB/s)       Compr. ratio:  18.2x
  *** zlib    , noshuffle  ***  5.298 s (0.14 GB/s) / 0.401 s (1.86 GB/s)       Compr. ratio:   5.3x
  *** zlib    , shuffle    ***  0.974 s (0.76 GB/s) / 0.393 s (1.90 GB/s)       Compr. ratio: 237.3x
  *** zlib    , bitshuffle ***  1.026 s (0.73 GB/s) / 0.444 s (1.68 GB/s)       Compr. ratio: 305.4x

*** the linspace linear distribution ***
  *** blosclz , noshuffle  ***  0.434 s (1.72 GB/s) / 0.088 s (8.45 GB/s)       Compr. ratio:   1.0x
  *** blosclz , shuffle    ***  0.298 s (2.50 GB/s) / 0.090 s (8.32 GB/s)       Compr. ratio:   2.0x
  *** blosclz , bitshuffle ***  0.476 s (1.56 GB/s) / 0.166 s (4.50 GB/s)       Compr. ratio:   2.8x
  *** lz4     , noshuffle  ***  0.219 s (3.41 GB/s) / 0.088 s (8.45 GB/s)       Compr. ratio:   1.0x
  *** lz4     , shuffle    ***  0.190 s (3.92 GB/s) / 0.112 s (6.63 GB/s)       Compr. ratio:   3.2x
  *** lz4     , bitshuffle ***  0.248 s (3.00 GB/s) / 0.149 s (5.00 GB/s)       Compr. ratio:   4.9x
  *** lz4hc   , noshuffle  ***  2.797 s (0.27 GB/s) / 0.211 s (3.53 GB/s)       Compr. ratio:   1.2x
  *** lz4hc   , shuffle    ***  0.528 s (1.41 GB/s) / 0.085 s (8.78 GB/s)       Compr. ratio:  24.1x
  *** lz4hc   , bitshuffle ***  2.918 s (0.26 GB/s) / 0.131 s (5.71 GB/s)       Compr. ratio:  35.0x
  *** snappy  , noshuffle  ***  0.088 s (8.49 GB/s) / 0.087 s (8.61 GB/s)       Compr. ratio:   1.0x
  *** snappy  , shuffle    ***  0.235 s (3.16 GB/s) / 0.176 s (4.24 GB/s)       Compr. ratio:   4.2x
  *** snappy  , bitshuffle ***  0.317 s (2.35 GB/s) / 0.198 s (3.76 GB/s)       Compr. ratio:   6.1x
  *** zlib    , noshuffle  ***  6.569 s (0.11 GB/s) / 0.718 s (1.04 GB/s)       Compr. ratio:   1.6x
  *** zlib    , shuffle    ***  1.313 s (0.57 GB/s) / 0.339 s (2.20 GB/s)       Compr. ratio:  27.0x
  *** zlib    , bitshuffle ***  1.348 s (0.55 GB/s) / 0.380 s (1.96 GB/s)       Compr. ratio:  35.2x

*** the random distribution ***
  *** blosclz , noshuffle  ***  0.517 s (1.44 GB/s) / 0.087 s (8.60 GB/s)       Compr. ratio:   1.0x
  *** blosclz , shuffle    ***  0.212 s (3.52 GB/s) / 0.070 s (10.62 GB/s)      Compr. ratio:   3.9x
  *** blosclz , bitshuffle ***  0.181 s (4.13 GB/s) / 0.104 s (7.16 GB/s)       Compr. ratio:   6.1x
  *** lz4     , noshuffle  ***  0.373 s (2.00 GB/s) / 0.149 s (5.00 GB/s)       Compr. ratio:   2.1x
  *** lz4     , shuffle    ***  0.135 s (5.52 GB/s) / 0.101 s (7.36 GB/s)       Compr. ratio:   4.5x
  *** lz4     , bitshuffle ***  0.129 s (5.77 GB/s) / 0.138 s (5.39 GB/s)       Compr. ratio:   6.1x
  *** lz4hc   , noshuffle  ***  4.684 s (0.16 GB/s) / 0.101 s (7.36 GB/s)       Compr. ratio:   3.2x
  *** lz4hc   , shuffle    ***  3.223 s (0.23 GB/s) / 0.101 s (7.37 GB/s)       Compr. ratio:   5.4x
  *** lz4hc   , bitshuffle ***  0.429 s (1.74 GB/s) / 0.139 s (5.36 GB/s)       Compr. ratio:   6.2x
  *** snappy  , noshuffle  ***  0.461 s (1.62 GB/s) / 0.257 s (2.90 GB/s)       Compr. ratio:   2.2x
  *** snappy  , shuffle    ***  0.166 s (4.49 GB/s) / 0.160 s (4.66 GB/s)       Compr. ratio:   4.3x
  *** snappy  , bitshuffle ***  0.136 s (5.48 GB/s) / 0.167 s (4.45 GB/s)       Compr. ratio:   5.0x
  *** zlib    , noshuffle  ***  5.383 s (0.14 GB/s) / 0.499 s (1.49 GB/s)       Compr. ratio:   3.9x
  *** zlib    , shuffle    ***  2.903 s (0.26 GB/s) / 0.408 s (1.83 GB/s)       Compr. ratio:   6.1x
  *** zlib    , bitshuffle ***  1.403 s (0.53 GB/s) / 0.433 s (1.72 GB/s)       Compr. ratio:   6.3x

The above also has the advantage that the C-Blosc CMake infrastructure can detect the compiler's AVX2 support much more easily. Anyway, here is the output with the python-blosc extensions compiled with the -O1 flag:

$ PYTHONPATH=. python bench/compress_ptr.py 
Creating NumPy arrays with 10**8 int64/float64 elements:
  *** ctypes.memmove() *** Time for memcpy():   0.295 s (2.52 GB/s)

Times for compressing/decompressing with clevel=5 and 8 threads

*** the arange linear distribution ***
  *** blosclz , noshuffle  ***  0.517 s (1.44 GB/s) / 0.086 s (8.67 GB/s)       Compr. ratio:   1.0x
  *** blosclz , shuffle    ***  0.115 s (6.50 GB/s) / 0.083 s (8.95 GB/s)       Compr. ratio:  57.1x
  *** blosclz , bitshuffle ***  0.167 s (4.46 GB/s) / 0.172 s (4.33 GB/s)       Compr. ratio:  74.0x
  *** lz4     , noshuffle  ***  0.917 s (0.81 GB/s) / 0.246 s (3.03 GB/s)       Compr. ratio:   2.0x
  *** lz4     , shuffle    ***  0.109 s (6.84 GB/s) / 0.145 s (5.12 GB/s)       Compr. ratio:  58.6x
  *** lz4     , bitshuffle ***  0.198 s (3.77 GB/s) / 0.267 s (2.79 GB/s)       Compr. ratio:  52.5x
  *** lz4hc   , noshuffle  ***  8.224 s (0.09 GB/s) / 0.245 s (3.04 GB/s)       Compr. ratio:   2.0x
  *** lz4hc   , shuffle    ***  0.193 s (3.86 GB/s) / 0.144 s (5.19 GB/s)       Compr. ratio: 137.2x
  *** lz4hc   , bitshuffle ***  1.800 s (0.41 GB/s) / 0.206 s (3.62 GB/s)       Compr. ratio: 208.9x
  *** snappy  , noshuffle  ***  0.404 s (1.84 GB/s) / 0.251 s (2.97 GB/s)       Compr. ratio:   2.0x
  *** snappy  , shuffle    ***  0.110 s (6.78 GB/s) / 0.196 s (3.80 GB/s)       Compr. ratio:  17.4x
  *** snappy  , bitshuffle ***  0.191 s (3.90 GB/s) / 0.306 s (2.43 GB/s)       Compr. ratio:  18.2x
  *** zlib    , noshuffle  ***  5.167 s (0.14 GB/s) / 0.410 s (1.82 GB/s)       Compr. ratio:   5.3x
  *** zlib    , shuffle    ***  1.046 s (0.71 GB/s) / 0.523 s (1.42 GB/s)       Compr. ratio: 237.3x
  *** zlib    , bitshuffle ***  1.338 s (0.56 GB/s) / 0.721 s (1.03 GB/s)       Compr. ratio: 305.4x

*** the linspace linear distribution ***
  *** blosclz , noshuffle  ***  0.540 s (1.38 GB/s) / 0.088 s (8.44 GB/s)       Compr. ratio:   1.0x
  *** blosclz , shuffle    ***  0.324 s (2.30 GB/s) / 0.103 s (7.25 GB/s)       Compr. ratio:   2.0x
  *** blosclz , bitshuffle ***  0.533 s (1.40 GB/s) / 0.234 s (3.19 GB/s)       Compr. ratio:   2.8x
  *** lz4     , noshuffle  ***  0.359 s (2.08 GB/s) / 0.088 s (8.43 GB/s)       Compr. ratio:   1.0x
  *** lz4     , shuffle    ***  0.351 s (2.13 GB/s) / 0.142 s (5.26 GB/s)       Compr. ratio:   3.2x
  *** lz4     , bitshuffle ***  0.396 s (1.88 GB/s) / 0.221 s (3.37 GB/s)       Compr. ratio:   4.9x
  *** lz4hc   , noshuffle  ***  3.223 s (0.23 GB/s) / 0.239 s (3.12 GB/s)       Compr. ratio:   1.2x
  *** lz4hc   , shuffle    ***  0.572 s (1.30 GB/s) / 0.104 s (7.18 GB/s)       Compr. ratio:  24.1x
  *** lz4hc   , bitshuffle ***  2.920 s (0.26 GB/s) / 0.203 s (3.67 GB/s)       Compr. ratio:  35.0x
  *** snappy  , noshuffle  ***  0.088 s (8.51 GB/s) / 0.088 s (8.49 GB/s)       Compr. ratio:   1.0x
  *** snappy  , shuffle    ***  0.262 s (2.85 GB/s) / 0.190 s (3.92 GB/s)       Compr. ratio:   4.2x
  *** snappy  , bitshuffle ***  0.418 s (1.78 GB/s) / 0.256 s (2.91 GB/s)       Compr. ratio:   6.1x
  *** zlib    , noshuffle  ***  6.463 s (0.12 GB/s) / 0.753 s (0.99 GB/s)       Compr. ratio:   1.6x
  *** zlib    , shuffle    ***  1.431 s (0.52 GB/s) / 0.351 s (2.12 GB/s)       Compr. ratio:  27.0x
  *** zlib    , bitshuffle ***  1.433 s (0.52 GB/s) / 0.451 s (1.65 GB/s)       Compr. ratio:  35.2x

*** the random distribution ***
  *** blosclz , noshuffle  ***  0.538 s (1.38 GB/s) / 0.088 s (8.47 GB/s)       Compr. ratio:   1.0x
  *** blosclz , shuffle    ***  0.232 s (3.21 GB/s) / 0.081 s (9.18 GB/s)       Compr. ratio:   3.9x
  *** blosclz , bitshuffle ***  0.221 s (3.37 GB/s) / 0.160 s (4.67 GB/s)       Compr. ratio:   6.1x
  *** lz4     , noshuffle  ***  0.857 s (0.87 GB/s) / 0.218 s (3.42 GB/s)       Compr. ratio:   2.1x
  *** lz4     , shuffle    ***  0.278 s (2.68 GB/s) / 0.176 s (4.24 GB/s)       Compr. ratio:   4.5x
  *** lz4     , bitshuffle ***  0.232 s (3.22 GB/s) / 0.268 s (2.78 GB/s)       Compr. ratio:   6.1x
  *** lz4hc   , noshuffle  ***  5.000 s (0.15 GB/s) / 0.151 s (4.92 GB/s)       Compr. ratio:   3.2x
  *** lz4hc   , shuffle    ***  3.526 s (0.21 GB/s) / 0.124 s (6.02 GB/s)       Compr. ratio:   5.4x
  *** lz4hc   , bitshuffle ***  0.541 s (1.38 GB/s) / 0.206 s (3.61 GB/s)       Compr. ratio:   6.2x
  *** snappy  , noshuffle  ***  0.621 s (1.20 GB/s) / 0.260 s (2.86 GB/s)       Compr. ratio:   2.2x
  *** snappy  , shuffle    ***  0.196 s (3.80 GB/s) / 0.172 s (4.32 GB/s)       Compr. ratio:   4.3x
  *** snappy  , bitshuffle ***  0.174 s (4.29 GB/s) / 0.224 s (3.32 GB/s)       Compr. ratio:   5.0x
  *** zlib    , noshuffle  ***  5.319 s (0.14 GB/s) / 0.505 s (1.48 GB/s)       Compr. ratio:   3.9x
  *** zlib    , shuffle    ***  2.910 s (0.26 GB/s) / 0.415 s (1.80 GB/s)       Compr. ratio:   6.1x
  *** zlib    , bitshuffle ***  1.548 s (0.48 GB/s) / 0.492 s (1.52 GB/s)       Compr. ratio:   6.3x

So, although the -O1 case still performs very well, the external library can be more than 2 GB/s faster in some cases (especially with the bitshuffle filter, which benefits quite a bit from AVX2).
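
To make the "more than 2 GB/s" figure concrete, here is a small sketch that subtracts the -O1 speeds from the external-library speeds for a few rows of the arange distribution. The numbers are copied from the two outputs above; the choice of rows and the helper name `speed_gap` are mine.

```python
# (compression GB/s, decompression GB/s) for the arange distribution,
# copied from the two benchmark outputs above.
external = {
    ("blosclz", "bitshuffle"): (6.19, 6.97),
    ("lz4", "shuffle"): (9.54, 8.02),
    ("lz4", "bitshuffle"): (6.41, 5.53),
    ("snappy", "shuffle"): (10.25, 5.48),
}
o1_build = {
    ("blosclz", "bitshuffle"): (4.46, 4.33),
    ("lz4", "shuffle"): (6.84, 5.12),
    ("lz4", "bitshuffle"): (3.77, 2.79),
    ("snappy", "shuffle"): (6.78, 3.80),
}


def speed_gap(ext, o1):
    """Per-row speed advantage (GB/s) of the external library over -O1."""
    return {
        key: (round(ext[key][0] - o1[key][0], 2),
              round(ext[key][1] - o1[key][1], 2))
        for key in ext
    }


gaps = speed_gap(external, o1_build)
for (codec, filt), (comp, decomp) in gaps.items():
    print("%-8s %-10s compress %+.2f GB/s, decompress %+.2f GB/s"
          % (codec, filt, comp, decomp))
```

For these rows the gap ranges from about 1.7 GB/s to about 3.5 GB/s.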

Thoughts?

from python-blosc.
