Comments (11)
any hunches?
from python-blosc.
Apparently this line:
https://github.com/FrancescAlted/blosc/blob/master/blosc/blosc.c#L657
evaluates to true, and hence typesize is set to 1; the shuffle then does not trigger, and so the data cannot be compressed.
I still need some more research on why the heck this does not work properly on Mac OSX (using Lion here) :-/
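To make the failure path concrete, here is a minimal sketch of the behaviour described above. It is not the actual Blosc source: the helper names are hypothetical, MAX_TYPESIZE stands in for BLOSC_MAX_TYPESIZE with an assumed value of 255, and the shuffle condition is simplified to "typesize greater than 1".

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for BLOSC_MAX_TYPESIZE; the value 255 is an assumption. */
#define MAX_TYPESIZE 255

static_assert(MAX_TYPESIZE > 1, "the sketch needs room for multi-byte types");

/* Hypothetical helper: an oversized typesize is clamped to 1. */
static size_t effective_typesize(size_t typesize)
{
    if (typesize > MAX_TYPESIZE) {
        typesize = 1;
    }
    return typesize;
}

/* Hypothetical helper: with a typesize of 1 there is nothing to shuffle. */
static int shuffle_applies(size_t typesize)
{
    return effective_typesize(typesize) > 1;
}
```

Under these assumptions, shuffle_applies(4) is true, while any value that wrongly compares as larger than the maximum silently disables the shuffle.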
from python-blosc.
Potentially an issue with signed / unsigned? Potentially also issues with casting?
from python-blosc.
It's funny: I have just discovered that with gcc the comparison works properly, but with clang it fails. The problem is that clang is the default compiler on Mac OSX. Anyway, here are the details for the faulty clang:
$ clang -v
Apple clang version 3.0 (tags/Apple/clang-211.12) (based on LLVM 3.0svn)
Target: x86_64-apple-darwin11.4.2
Thread model: posix
For the record, my current clang works properly if the condition is written as:
if ((int)typesize > BLOSC_MAX_TYPESIZE) {
instead of the original:
if (typesize > BLOSC_MAX_TYPESIZE) {
Maybe this is a problem with a pretty old clang. I'm going to try updating it, although that might affect more people like me. Hmm...
from python-blosc.
Just updated Xcode (which includes clang) on my box, but the issue is not resolved. Here is the version of the new clang:
$ clang -v
Apple LLVM version 4.2 (clang-425.0.28) (based on LLVM 3.2svn)
Target: x86_64-apple-darwin11.4.2
Thread model: posix
The funny thing is that, after upgrading Xcode, gcc also shows the same (bad) behavior. Here is the gcc version:
$ gcc -v
Using built-in specs.
Target: i686-apple-darwin11
Configured with: /private/var/tmp/llvmgcc42/llvmgcc42-2336.11~182/src/configure --disable-checking --enable-werror --prefix=/Applications/Xcode.app/Contents/Developer/usr/llvm-gcc-4.2 --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-prefix=llvm- --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib --build=i686-apple-darwin11 --enable-llvm=/private/var/tmp/llvmgcc42/llvmgcc42-2336.11~182/dst-llvmCore/Developer/usr/local --program-prefix=i686-apple-darwin11- --host=x86_64-apple-darwin11 --target=i686-apple-darwin11 --with-gxx-include-dir=/usr/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)
Anyway, using:
if ((int)typesize > BLOSC_MAX_TYPESIZE) {
seems to fix the issue in both compilers. The problem is that I'm not sure why the cast is needed (to me it looks like a clang/gcc problem on Mac OSX). Anyway, I'm afraid that the only way to solve this is to add the cast in Blosc directly. Hmm, time for a new Blosc release...
from python-blosc.
Can you reproduce this with pure blosc?
from python-blosc.
Maybe the following line:
https://github.com/FrancescAlted/python-blosc/blob/master/blosc/blosc_extension.c#L125
Should be using 'n' instead of 'i'? But that is a wild guess.
from python-blosc.
Your wild guess turned out to be correct. I'm very surprised that was the problem. In fact, Blosc itself does not have the problem (which is another indicator that the problem was the format code). That means that we should be very cautious when passing arguments to C extensions. You were very inspired, man. Thanks!
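Why the format code matters can be sketched in plain C without the Python headers. The mock_parse function below is a hypothetical stand-in for a format-driven argument parser, not the real Python C API: it only imitates the idea that 'i' writes an int-sized value while 'n' writes a pointer-sized one (size_t is used here for simplicity). The sketch assumes a little-endian 64-bit platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mock of a format-driven parser: the format code, not the
 * destination variable, decides how many bytes get written. */
static void mock_parse(char fmt, long value, void *out)
{
    assert(fmt == 'i' || fmt == 'n');
    if (fmt == 'i') {
        int v = (int)value;
        memcpy(out, &v, sizeof v);   /* writes only sizeof(int) bytes */
    } else {
        size_t v = (size_t)value;
        memcpy(out, &v, sizeof v);   /* fills the whole size_t */
    }
}

/* Parse the value 4 into a size_t that starts out holding `stale`,
 * which plays the role of an uninitialized stack slot. */
static size_t parse_typesize(char fmt, size_t stale)
{
    size_t typesize = stale;
    mock_parse(fmt, 4, &typesize);
    return typesize;
}
```

With a zeroed slot both format codes yield 4, which may be why the mismatch can go unnoticed. With stale contents, only the 'n' path yields 4.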
from python-blosc.
I haven't fully grokked the whole size_t, ssize_t and Py_ssize_t business yet; it seems to be quite important. Do you have any hypothesis about what exactly went wrong? I don't see how an integer value of 4 could be misinterpreted...
from python-blosc.
Frankly, I don't understand either what was happening, and why this affected just the Mac platform (Linux and Win seemed to be perfectly happy without the patch above).
Anyway, I have just committed another patch (rev dc1525c) to make the treatment of size_t uniform. It seems to pass all tests on my Mac box (I still need to test that on Win and Linux, but I don't expect surprises there).
from python-blosc.
I think I know what was happening here. On 64-bit platforms, size_t is 64 bits wide, so when asking for an 'i' conversion, only the lower 32 bits are set. The higher 32 bits are left untouched and may hold dirty values (i.e. they are not necessarily zeroed). Because of this, the comparison:
if (typesize > BLOSC_MAX_TYPESIZE)
could fail depending on the dirty values in the higher 32 bits. Of course:
if ((int)typesize > BLOSC_MAX_TYPESIZE)
worked because the cast forces only the lower 32 bits to be considered in the comparison.
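The two comparisons can be contrasted in a small sketch. The helper names are hypothetical, MAX_TYPESIZE stands in for BLOSC_MAX_TYPESIZE with an assumed value of 255, and the code assumes a 64-bit size_t and a compiler that truncates on the size_t to int conversion (as gcc and clang do).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for BLOSC_MAX_TYPESIZE; the value 255 is an assumption. */
#define MAX_TYPESIZE 255

static_assert(sizeof(size_t) > sizeof(int), "sketch assumes size_t is wider than int");

/* Build a size_t whose low 32 bits hold `low` and whose high 32 bits
 * are all set, imitating leftover garbage above a 32-bit write. */
static size_t with_dirty_high_bits(int low)
{
    return ((size_t)UINT32_MAX << 32) | (uint32_t)low;
}

/* The original comparison: looks at all 64 bits. */
static int raw_check(size_t typesize)
{
    return typesize > MAX_TYPESIZE;
}

/* The workaround: the cast discards the high 32 bits first. */
static int cast_check(size_t typesize)
{
    return (int)typesize > MAX_TYPESIZE;
}
```

For a typesize of 4 with dirty high bits, raw_check reports "too large" while cast_check does not. The cast hides the symptom, whereas matching the format code to the variable's width removes the garbage in the first place.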
from python-blosc.