hajimes / mmh3
Python extension for MurmurHash (MurmurHash3), a set of fast and robust hash functions.
Home Page: https://pypi.org/project/mmh3/
License: MIT License
mmh3 is currently not hashlib-compliant, which makes it hard to use as a drop-in replacement for md5 or the other hashlib hashes. A wrapper could be built to make this module hashlib-compliant, so that it can be used in the same way as hashlib.md5, with the following methods:
update() -- update the current digest with an additional string
digest() -- return the current digest value
hexdigest() -- return the current digest as a string of hexadecimal digits
intdigest() -- return the current digest as an integer
copy() -- return a copy of the current mmh3 object
reset() -- reset state
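A minimal sketch of the interface listed above, assuming a wrapper that simply buffers the input and calls a one-shot hash function when a digest is requested. The class name BufferedHash, the hash_fn parameter, and the little-endian byte order of digest() are illustrative assumptions, not part of mmh3. zlib.crc32 is used as a stand-in so the sketch runs without mmh3 installed; with mmh3 available, one might pass something like lambda b: mmh3.hash(b, signed=False) instead.

```python
import zlib


class BufferedHash:
    """Hypothetical hashlib-style wrapper around a one-shot hash function."""

    def __init__(self, hash_fn, data=b"", digest_size=4):
        self._hash_fn = hash_fn          # callable: bytes -> unsigned int
        self.digest_size = digest_size   # digest width in bytes
        self._buf = bytearray(data)      # everything fed in so far

    def update(self, data):
        """Append more bytes to the message."""
        self._buf += data

    def intdigest(self):
        """Return the current digest as an integer."""
        return self._hash_fn(bytes(self._buf))

    def digest(self):
        """Return the current digest as bytes (little-endian is an assumption)."""
        return self.intdigest().to_bytes(self.digest_size, "little")

    def hexdigest(self):
        """Return the current digest as a string of hexadecimal digits."""
        return self.digest().hex()

    def copy(self):
        """Return an independent copy of the current state."""
        return BufferedHash(self._hash_fn, self._buf, self.digest_size)

    def reset(self):
        """Forget all data fed in so far."""
        self._buf.clear()


h = BufferedHash(zlib.crc32)  # stand-in 32-bit hash function
h.update(b"foo")
h.update(b"bar")
print(h.hexdigest())
```

Buffering keeps the sketch short, but it holds the whole message in memory; a real hashlib-style object would keep only the running hash state.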
The original C++ code of MurmurHash3 by Austin Appleby is endian-sensitive: it reads blocks in the platform's native byte order, so the same input can hash to different values on little-endian and big-endian machines. The advantage of this style is, first and foremost, performance.
However, inconsistency between platforms may cause problems in various fields, e.g., NLP (cf. explosion/murmurhash#26).
In addition, several IoT search engines (including Shodan) use the little-endian variant of the mmh3 value as the fingerprint of a favicon.
To guarantee portability and consistency across platforms, mmh3 will use little-endian variant values on all architectures from version 4.0.0, even though this will make the hash functions slower on big-endian architectures.
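To illustrate the difference, here is a small sketch of reading a 32-bit block with a fixed little-endian layout versus the host's native layout. The helper names read_block_le and read_block_native are made up for this example and do not come from mmh3.

```python
import struct
import sys


def read_block_le(data, i):
    """Return the i-th 32-bit block, always decoded as little-endian."""
    return struct.unpack_from("<I", data, 4 * i)[0]


def read_block_native(data, i):
    """Decode the same block in the host's byte order, like a raw pointer read."""
    return struct.unpack_from("=I", data, 4 * i)[0]


data = bytes([0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08])

# The little-endian read gives the same value on every platform.
print(hex(read_block_le(data, 0)))  # 0x4030201

# The native read depends on the machine running the code.
print(sys.byteorder, hex(read_block_native(data, 0)))
```

On a little-endian host both reads agree; on a big-endian host the native read returns 0x1020304, which is where platform-dependent hash values come from.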
mmh3 fails to compile when installed with pip, whether from the package or from git.
This was raised once before (#7) and fixed, but the issue has reappeared.
Specs of my system:
gcc: 7.3.0
os: ubuntu 16 / ubuntu 18
pip: 9.0.1
git: 2.17.1
python: 2.7.15rc1
Error is exactly the same:
building 'mmh3' extension
creating build
creating build/temp.linux-x86_64-2.7
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c
mmh3module.cpp -o build/temp.linux-x86_64-2.7/mmh3module.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c
MurmurHash3.cpp -o build/temp.linux-x86_64-2.7/MurmurHash3.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
#define FORCE_INLINE attribute((always_inline))
^
MurmurHash3.cpp:60:1: note: in expansion of macro ‘FORCE_INLINE’
FORCE_INLINE uint32_t getblock ( const uint32_t * p, int i )
^
MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
#define FORCE_INLINE attribute((always_inline))
^
MurmurHash3.cpp:65:1: note: in expansion of macro ‘FORCE_INLINE’
FORCE_INLINE uint64_t getblock ( const uint64_t * p, int i )
^
MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
#define FORCE_INLINE attribute((always_inline))
^
MurmurHash3.cpp:73:1: note: in expansion of macro ‘FORCE_INLINE’
FORCE_INLINE uint32_t fmix ( uint32_t h )
^
MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
#define FORCE_INLINE attribute((always_inline))
^
MurmurHash3.cpp:86:1: note: in expansion of macro ‘FORCE_INLINE’
FORCE_INLINE uint64_t fmix ( uint64_t k )
^
MurmurHash3.cpp: In function ‘void MurmurHash3_x86_32(const void*, int, uint32_t, void*)’:
MurmurHash3.cpp:117:36: error: ‘getblock’ was not declared in this scope
uint32_t k1 = getblock(blocks,i);
^
MurmurHash3.cpp:148:15: error: ‘fmix’ was not declared in this scope
h1 = fmix(h1);
^
MurmurHash3.cpp: In function ‘void MurmurHash3_x86_128(const void*, int, uint32_t, void*)’:
MurmurHash3.cpp:178:40: error: ‘getblock’ was not declared in this scope
uint32_t k1 = getblock(blocks,i*4+0);
^
MurmurHash3.cpp:244:15: error: ‘fmix’ was not declared in this scope
h1 = fmix(h1);
^
MurmurHash3.cpp: In function ‘void MurmurHash3_x64_128(const void*, int, uint32_t, void*)’:
MurmurHash3.cpp:279:40: error: ‘getblock’ was not declared in this scope
uint64_t k1 = getblock(blocks,i*2+0);
^
MurmurHash3.cpp:329:15: error: ‘fmix’ was not declared in this scope
h1 = fmix(h1);
^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
----------------------------------------
Failed building wheel for mmh3
As a user, it is very confusing that mmh3.hash64 returns two 64-bit values, whereas hash and hash128 do not. Are we supposed to just pick one of them? Is there a recommended way to combine them?
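One possible reading, sketched under the assumption that the two values are the two 64-bit halves of a single 128-bit hash and that the first element is the low half (neither point is confirmed in this thread). The helper names combine_halves and split_halves are hypothetical.

```python
MASK64 = (1 << 64) - 1


def combine_halves(low, high):
    """Join two 64-bit halves (signed or unsigned) into one unsigned 128-bit int."""
    return ((high & MASK64) << 64) | (low & MASK64)


def split_halves(value):
    """Split an unsigned 128-bit int back into (low, high) unsigned 64-bit halves."""
    return value & MASK64, (value >> 64) & MASK64


# Example with made-up halves; -1 stands for a signed half with all bits set.
combined = combine_halves(-1, 2)
print(hex(combined))  # 0x2ffffffffffffffff
print(split_halves(combined))
```

If only 64 bits are needed, taking either half alone is a common choice for MurmurHash3-style hashes, but that is a design decision rather than something this thread settles.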
Please provide official packaging for 3.10 and 3.11.
mmh3 fails to compile when installed with pip, whether from the package or from git.
I want to use it for the Bloom filter implemented here:
http://www.maxburstein.com/blog/creating-a-simple-bloom-filter/
Specs of my system:
gcc: 5.1.0
os: archlinux
pip: 7.0.3
git: 2.4.2
python: 3.4.3 / 2.7.9
Output from pip
> sudo pip install mmh3
Collecting mmh3
Using cached mmh3-2.3.tar.gz
Installing collected packages: mmh3
Running setup.py install for mmh3
Complete output from command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-ia20ohxq/mmh3/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-dmuabtur-record/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_ext
building 'mmh3' extension
creating build
creating build/temp.linux-x86_64-3.4
gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong --param=ssp-buffer-size=4 -fPIC -I/usr/include/python3.4m -c mmh3module.cpp -o build/temp.linux-x86_64-3.4/mmh3module.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong --param=ssp-buffer-size=4 -fPIC -I/usr/include/python3.4m -c MurmurHash3.cpp -o build/temp.linux-x86_64-3.4/MurmurHash3.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
#define FORCE_INLINE attribute((always_inline))
^
MurmurHash3.cpp:60:1: note: in expansion of macro ‘FORCE_INLINE’
FORCE_INLINE uint32_t getblock ( const uint32_t * p, int i )
^
MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
#define FORCE_INLINE attribute((always_inline))
^
MurmurHash3.cpp:65:1: note: in expansion of macro ‘FORCE_INLINE’
FORCE_INLINE uint64_t getblock ( const uint64_t * p, int i )
^
MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
#define FORCE_INLINE attribute((always_inline))
^
MurmurHash3.cpp:73:1: note: in expansion of macro ‘FORCE_INLINE’
FORCE_INLINE uint32_t fmix ( uint32_t h )
^
MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
#define FORCE_INLINE attribute((always_inline))
^
MurmurHash3.cpp:86:1: note: in expansion of macro ‘FORCE_INLINE’
FORCE_INLINE uint64_t fmix ( uint64_t k )
^
MurmurHash3.cpp: In function ‘void MurmurHash3_x86_32(const void*, int, uint32_t, void*)’:
MurmurHash3.cpp:117:36: error: ‘getblock’ was not declared in this scope
uint32_t k1 = getblock(blocks,i);
^
MurmurHash3.cpp:148:15: error: ‘fmix’ was not declared in this scope
h1 = fmix(h1);
^
MurmurHash3.cpp: In function ‘void MurmurHash3_x86_128(const void*, int, uint32_t, void*)’:
MurmurHash3.cpp:178:40: error: ‘getblock’ was not declared in this scope
uint32_t k1 = getblock(blocks,i*4+0);
^
MurmurHash3.cpp:244:15: error: ‘fmix’ was not declared in this scope
h1 = fmix(h1);
^
MurmurHash3.cpp: In function ‘void MurmurHash3_x64_128(const void*, int, uint32_t, void*)’:
MurmurHash3.cpp:279:40: error: ‘getblock’ was not declared in this scope
uint64_t k1 = getblock(blocks,i*2+0);
^
MurmurHash3.cpp:329:15: error: ‘fmix’ was not declared in this scope
h1 = fmix(h1);
^
error: command 'gcc' failed with exit status 1
----------------------------------------
Command "/usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-ia20ohxq/mmh3/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-dmuabtur-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-ia20ohxq/mmh3
C:\Users\Saurabh bhandari\AppData\Local\Programs\Python\Python38>pip install mmh3
Collecting mmh3
Using cached mmh3-2.5.1.tar.gz (9.8 kB)
Using legacy 'setup.py install' for mmh3, since package 'wheel' is not installed.
Installing collected packages: mmh3
Running setup.py install for mmh3 ... error
ERROR: Command errored out with exit status 1:
command: 'c:\users\saurabh bhandari\appdata\local\programs\python\python38\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-install-ukiszg0q\mmh3\setup.py'"'"'; file='"'"'C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-install-ukiszg0q\mmh3\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-record-iye5vtng\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\saurabh bhandari\appdata\local\programs\python\python38\Include\mmh3'
cwd: C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-install-ukiszg0q\mmh3
Complete output (11 lines):
running install
running build
running build_ext
building 'mmh3' extension
creating build
creating build\temp.win-amd64-3.8
creating build\temp.win-amd64-3.8\Release
C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD "-Ic:\users\saurabh bhandari\appdata\local\programs\python\python38\include" "-Ic:\users\saurabh bhandari\appdata\local\programs\python\python38\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.26.28801\include" /EHsc /Tpmmh3module.cpp /Fobuild\temp.win-amd64-3.8\Release\mmh3module.obj
mmh3module.cpp
mmh3module.cpp(12): fatal error C1083: Cannot open include file: 'stdio.h': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\cl.exe' failed with exit status 2
----------------------------------------
ERROR: Command errored out with exit status 1: 'c:\users\saurabh bhandari\appdata\local\programs\python\python38\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-install-ukiszg0q\mmh3\setup.py'"'"'; file='"'"'C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-install-ukiszg0q\mmh3\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-record-iye5vtng\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\saurabh bhandari\appdata\local\programs\python\python38\Include\mmh3' Check the logs for full command output.
C++ (compiled with the code at https://github.com/hajimes/mmh3/blob/master/MurmurHash3.h):
#include <cstdint>
#include <iostream>
#include "MurmurHash3.h"
uint32_t MurmurHash32(const void* key, size_t len) {
    uint32_t hash;
    MurmurHash3_x86_32(key, (int)len, 0, &hash);
    // Print the key as text (a bare const void* would print an address) and the computed hash.
    std::cout << "site= " << static_cast<const char*>(key) << ", code= " << hash << std::endl;
    return hash;
}
// output:
site= www.taobao.com, code= 4076543410
Python:
import mmh3
domain = "www.taobao.com"
res = mmh3.hash(domain, signed=False)
print('site= %s, code= %s' % (domain, res))
# output:
site= www.taobao.com, code= 3707551990
Why does the same site "www.taobao.com" hash to 4076543410 in C++ but 3707551990 in Python?
Hello @hajimes thank you so much for providing this murmur3 implementation in Python and for all the work you do in open source; we really appreciate being able to use your library!
I was recently investigating some compatibility issues between the output produced by mmh3 and a Go library we were using internally. I asked the question on Stack Overflow and got a response back:
https://stackoverflow.com/questions/75921577/murmur3-hash-compatibility-between-go-and-python
It looks like the order of the two uint64s returned by the 128-bit algorithm is reversed between the two libraries, but it's simple enough to modify the returned results in either Go or Python to produce compatible hashes.
Would you like me to open a PR to update the README with the compatibility information? Are there any other docs I should update in the PR?
Additionally, if there is any way to reverse the order of the uint64s returned by murmur3 (e.g. with an argument to hash128 or hash_bytes), I'd be happy to open a PR for that as well. Let me know how you'd like to proceed!
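A sketch of the kind of post-processing described here: swapping the two 64-bit halves of a 128-bit result, both for an integer and for a 16-byte digest. The function names are hypothetical, and which library needs the swap depends on the byte layout each one uses, which this sketch does not decide.

```python
MASK64 = (1 << 64) - 1


def swap_halves_int(h128):
    """Swap the high and low 64-bit halves of an unsigned 128-bit integer."""
    low = h128 & MASK64
    high = (h128 >> 64) & MASK64
    return (low << 64) | high


def swap_halves_bytes(digest):
    """Swap the first and last 8 bytes of a 16-byte digest."""
    if len(digest) != 16:
        raise ValueError("expected a 16-byte digest")
    return digest[8:] + digest[:8]


value = 0x0011223344556677_8899AABBCCDDEEFF
print(hex(swap_halves_int(value)))  # 0x8899aabbccddeeff0011223344556677
print(swap_halves_bytes(bytes(range(16))).hex())
```

Swapping twice returns the original value, so the same helper converts in both directions.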
I also ran into a problem installing mmh3 and have been struggling to find a solution. The errors I see here are not the same as mine. Can you help me?
----------------------------------------
1 warning generated.
creating build/lib.macosx-10.6-intel-3.6
/usr/bin/clang++ -bundle -undefined dynamic_lookup -arch i386 -arch x86_64 -g -L/usr/local/opt/openssl/lib -I/usr/local/opt/openssl/include build/temp.macosx-10.6-intel-3.6/mmh3module.o build/temp.macosx-10.6-intel-3.6/MurmurHash3.o -o build/lib.macosx-10.6-intel-3.6/mmh3.cpython-36m-darwin.so
clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]
clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]
ld: library not found for -lstdc++
clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: command '/usr/bin/clang++' failed with exit status 1
It looks like this change broke the hash_bytes function in 2.5:
3bf1e5a - Add a keyword argument signed (2 days ago) <Hajime Senuma>
diff --git a/mmh3module.cpp b/mmh3module.cpp
index ef49083..ba771ee 100644
--- a/mmh3module.cpp
+++ b/mmh3module.cpp
@@ -143,7 +155,7 @@ mmh3_hash_bytes(PyObject *self, PyObject *args, PyObject *keywds)
static char *kwlist[] = {(char *)"key", (char *)"seed",
(char *)"x64arch", NULL};
- if (!PyArg_ParseTupleAndKeywords(args, keywds, "s#|IB", kwlist,
+ if (!PyArg_ParseTupleAndKeywords(args, keywds, "s#|IBB", kwlist,
&target_str, &target_str_len, &seed, &x64arch)) {
return NULL;
There is no additional is_signed kwarg. I'm not sure if there should be. The result is this error every time the hash_bytes function is used:
RuntimeError: more argument specifiers than keyword list entries (remaining format:'B')
I plan to switch the license of this project from CC0 to MIT in the very near future.
The adoption of CC0 was an homage to Austin Appleby, the inventor of the MurmurHash3 algorithm, who released the code into the public domain.
However, CC0 is not recognized as an OSI-approved license, as it was withdrawn from the review process in 2012. Besides, in 2022, the Fedora community said they planned to demote the status of CC0 from "good" to "allowed-content only".
Considering these issues, I have decided to adopt the MIT License, a simple license that is also one of the most popular OSI-approved permissive licenses.
Consider: mmh3.hash("Hello World").
Expected behavior: returns 427197390
Actual behavior: raises exception AttributeError: module 'mmh3' has no attribute 'hash'
Regression: This works in version 4.0.0. The error is triggered in version 4.0.1.
Environment: Curiously, this seems to happen when running the test through Bazel, not when installing into a virtual environment. I'm not sure if the bug is on the mmh3 side or the Bazel side, but something changed between 4.0.0 and 4.0.1. Can you help me figure out what?
To reproduce, get the gist from https://gist.github.com/vonschultz/18b4e58a697d56c8cc421528e0a4ef13 and run
bazelisk test --test_output=streamed //...
Get bazelisk from https://github.com/bazelbuild/bazelisk/releases if you don't already have it.
I'm running Ubuntu 20.04.
I was using mmh3 as part of a project and was getting invalid values when I tried to rescale the hashes into the range [0, 1]. It turns out that mmh3.hash128 was returning unsigned integers, not signed integers as the documentation suggests.
I was using Python 3.6.2 and mmh3 2.4.
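A sketch of a rescaling step that does not depend on whether the hash arrives signed or unsigned: reduce the value to its unsigned form first, then divide by 2**bits. The helper names and the bits parameter are assumptions for illustration.

```python
def to_unsigned(value, bits=128):
    """Map a signed or unsigned hash value to its unsigned representation."""
    return value & ((1 << bits) - 1)


def to_signed(value, bits=128):
    """Map a hash value to its two's-complement signed representation."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def to_unit_interval(value, bits=128):
    """Rescale a hash value into [0, 1].

    Values extremely close to the top of the range may round to 1.0
    because of floating-point precision.
    """
    return to_unsigned(value, bits) / (1 << bits)


print(to_unit_interval(0))             # 0.0
print(to_unit_interval(1 << 127))      # 0.5
print(to_unit_interval(-(1 << 127)))   # 0.5, same bits read as signed
```

Normalizing to unsigned before dividing means the same bit pattern always maps to the same point in the interval, whichever convention the library version uses.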
C:\Users\10324\Downloads\mmh3-master\mmh3-master>python setup.py build
running build
running build_ext
building 'mmh3' extension
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\10324\AppData\Local\Programs\Python\Python36\include -IC:\Users\10324\AppData\Local\Programs\Python\Python36\include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\winrt" /EHsc /Tpmmh3module.cpp /Fobuild\temp.win-amd64-3.6\Release\mmh3module.obj
mmh3module.cpp
c:\users\10324\downloads\mmh3-master\mmh3-master\MurmurHash3.h(16): error C2371: 'uint32_t': redefinition; different basic types
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\stdint.h(23): note: see declaration of 'uint32_t'
mmh3module.cpp(14): error C2371: 'int32_t': redefinition; different basic types
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\stdint.h(19): note: see declaration of 'int32_t'
mmh3module.cpp(17): error C2371: 'uint32_t': redefinition; different basic types
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\stdint.h(23): note: see declaration of 'uint32_t'
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe' failed with exit status 2
This was run on Windows 10, 64-bit.
Just a friendly check, if there is still anyone here maintaining this project or if anyone knows more about the project status.
While the code itself could be considered "done", the wish for prebuilt wheels will continue.
What is the recommended way forward for users of mmh3? Build your own wheels? Alternative modules? Alternative distributions?
Does anyone know more?
Hi, I got this error under python 3.10
Python 3.10.10 (main, Mar 21 2023, 18:45:11) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mmh3
>>> a=mmh3.hash("abc", 1234)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
>>>
Is there a fix or a workaround?
The next release of mmh3 will support type hints.
FORCE_INLINE uint64_t getblock ( const uint64_t * p, int i ) uses a 32-bit integer i for the block index, so hashing huge blocks fails. Change it to FORCE_INLINE uint64_t getblock ( const uint64_t * p, Py_ssize_t i ).
Here is the pull request: #34
$ pip install mmh3
Downloading/unpacking mmh3
Running setup.py egg_info for package mmh3
Installing collected packages: mmh3
Running setup.py install for mmh3
building 'mmh3' extension
cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c mmh3module.cpp -o build/temp.macosx-10.9-intel-2.7/mmh3module.o
clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future]
clang: note: this will be a hard error (cannot be downgraded to a warning) in the future
error: command 'cc' failed with exit status 1
Complete output from command /Users/andre/work/penv/discosite/bin/python -c "import setuptools;__file__='/Users/andre/work/penv/discosite/build/mmh3/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /var/folders/xq/yt9mr8t52cj9dqjsn8v0x5z40000gn/T/pip-kAdgDV-record/install-record.txt --install-headers /Users/andre/work/penv/discosite/include/site/python2.7:
running install
running build
running build_ext
building 'mmh3' extension
cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c mmh3module.cpp -o build/temp.macosx-10.9-intel-2.7/mmh3module.o
clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future]
clang: note: this will be a hard error (cannot be downgraded to a warning) in the future
error: command 'cc' failed with exit status 1
----------------------------------------
Command /Users/andre/work/penv/discosite/bin/python -c "import setuptools;__file__='/Users/andre/work/penv/discosite/build/mmh3/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /var/folders/xq/yt9mr8t52cj9dqjsn8v0x5z40000gn/T/pip-kAdgDV-record/install-record.txt --install-headers /Users/andre/work/penv/discosite/include/site/python2.7 failed with error code 1 in /Users/andre/work/penv/discosite/build/mmh3
Storing complete log in /Users/andre/.pip/pip.log
Getting the murmur3 hash of a text file is trivial, and I can get the murmur2 hash of binary files (see https://github.com/milahu/murmurhash-cli-python).
How can I get the murmur3 hash of a binary file?
There is https://pypi.org/project/mmh3-binary/, but it's an "empty fork".
Expected API:
#!/usr/bin/env python3
import mmh3
fd = open('/bin/sh', 'rb')
hash = mmh3.hash_from_buffer(fd)
Here fd is an io.BufferedReader.
Ideally, this would avoid passing a bytes array. It should support "a million gigabyte" files in theory, so the bytes should be "streamed" or "piped" into the mmh3 function.
Currently, mmh3 says:
mmh3.hash_from_buffer(fd)
TypeError: a bytes-like object is required, not '_io.BufferedReader'
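One possible workaround, sketched under assumptions: memory-map the file so that the hash function receives a bytes-like object without the whole file being read into a Python bytes value. The helper hash_file and its hash_fn parameter are hypothetical. zlib.crc32 stands in for the hash function so the sketch runs without mmh3; passing mmh3.hash_from_buffer instead is the intended use, but whether a given mmh3 version accepts mmap objects or inputs of 2**31 bytes or more is not verified here.

```python
import mmap
import os
import tempfile
import zlib


def hash_file(path, hash_fn):
    """Hash a file's contents by handing hash_fn a memory-mapped view of it."""
    with open(path, "rb") as fd:
        # mmap cannot map an empty file, so handle that case separately.
        if os.fstat(fd.fileno()).st_size == 0:
            return hash_fn(b"")
        with mmap.mmap(fd.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return hash_fn(view)


# Demo with a temporary binary file and a stand-in hash function.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\x01binary\xff" * 1000)
try:
    print(hash_file(tmp.name, zlib.crc32))
finally:
    os.unlink(tmp.name)
```

The operating system pages the mapped file in on demand, so memory use stays bounded, although this is not true streaming: the hash function still sees the file as one contiguous buffer.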
Currently I'm working on refactoring the library to decouple the files that largely trace back to the original C++ code (specifically, murmurhash3.c and murmurhash3.h) from this repository.
The update is to address the pre-review process of the Journal of Open Source Software (JOSS), whose managing EiC (Daniel S. Katz) thoughtfully pointed out that it was not clear which parts of this library were my (and other contributors') own contributions.
openjournals/joss-reviews#5487
I proposed to use a git submodule to refer to Appleby's repository, and then write a script that converts the original C++ files to more portable C code at compile time.
It turns out, however, that readability may be degraded to some extent in my current ad hoc implementation, which may also affect ease of extension. Solving these issues will be left for future updates. On the other hand, this update will clarify the extent of the authorship of the code and solve the license issue #45.
mmh3 cannot hash data larger than 2**31 bytes:
>>> import mmh3
>>> import numpy as np
>>> a = np.zeros(2**30, dtype=np.int8)
>>> mmh3.hash_bytes(a)
b"O\xc5\xf1\xf2\x80';s\x1b\xddc\xa1E\x8d\xe3r"
>>> a = np.zeros(2**32, dtype=np.int8)
>>> mmh3.hash_bytes(a)
Traceback (most recent call last):
File "<ipython-input-9-918a38167947>", line 1, in <module>
mmh3.hash_bytes(a)
OverflowError: size does not fit in an int
The solution is to either use the s* code instead of s# in PyArg_ParseTuple(), or define the PY_SSIZE_T_CLEAN macro and change size fields from int to Py_ssize_t. See https://docs.python.org/2.7/c-api/arg.html . I can also make a PR if you want.
Also, there's no test suite?
Collecting mmh3
Using cached mmh3-2.5.1.tar.gz (9.8 kB)
Building wheels for collected packages: mmh3
Building wheel for mmh3 (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /Users/aeaeaeae/venv/dslearn/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/setup.py'"'"'; __file__='"'"'/private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-wheel-ks30qw3g
cwd: /private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/
Complete output (12 lines):
running bdist_wheel
running build
running build_ext
building 'mmh3' extension
creating build
creating build/temp.macosx-10.9-x86_64-3.7
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch x86_64 -g -I/Library/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c mmh3module.cpp -o build/temp.macosx-10.9-x86_64-3.7/mmh3module.o
mmh3module.cpp:12:19: fatal error: stdio.h: No such file or directory
#include <stdio.h>
^
compilation terminated.
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Failed building wheel for mmh3
Running setup.py clean for mmh3
Failed to build mmh3
Installing collected packages: mmh3
Running setup.py install for mmh3 ... error
ERROR: Command errored out with exit status 1:
command: /Users/aeaeaeae/venv/dslearn/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/setup.py'"'"'; __file__='"'"'/private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-record-t_fjan2e/install-record.txt --single-version-externally-managed --compile --install-headers /Users/aeaeaeae/venv/dslearn/bin/../include/site/python3.7/mmh3
cwd: /private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/
Complete output (12 lines):
running install
running build
running build_ext
building 'mmh3' extension
creating build
creating build/temp.macosx-10.9-x86_64-3.7
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch x86_64 -g -I/Library/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c mmh3module.cpp -o build/temp.macosx-10.9-x86_64-3.7/mmh3module.o
mmh3module.cpp:12:19: fatal error: stdio.h: No such file or directory
#include <stdio.h>
^
compilation terminated.
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /Users/aeaeaeae/venv/dslearn/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/setup.py'"'"'; __file__='"'"'/private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-record-t_fjan2e/install-record.txt --single-version-externally-managed --compile --install-headers /Users/aeaeaeae/venv/dslearn/bin/../include/site/python3.7/mmh3 Check the logs for full command output.
WARNING: You are using pip version 20.1.1; however, version 20.2.2 is available.
You should consider upgrading via the '/Users/aeaeaeae/venv/dslearn/bin/python3.7 -m pip install --upgrade pip' command.
Hello Hajime,
I guess I am doing something wrong, so this is probably not a real issue.
I am trying to hash big files and am reading them in chunks for obvious reasons.
As a test I ran the following:
>>> mmh3.hash128('foobar', 0, signed = True)
155033341411922636178181560508455868997
>>> mmh3.hash128('bar',mmh3.hash128('foo', 0,signed = True), signed = True)
144772797738558108830387305245635675932
I expected the hash to be the same in both cases.
Am I misinterpreting the seed value, or is there another way of chaining hashes in murmur in general?
Thanks & Regards,
Martin
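The seed only sets the initial state; the final digest also mixes in the total length and a finalization step, so feeding a digest back in as a seed does not continue the hash. Chunked hashing needs an object that keeps the running state between calls. Below is a pure-Python sketch of that idea for the 32-bit x86 variant of MurmurHash3 (chosen for brevity; the question above uses hash128). The class Murmur3x86_32 is hypothetical and is not an mmh3 API.

```python
_C1 = 0xCC9E2D51
_C2 = 0x1B873593
_MASK = 0xFFFFFFFF


def _rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & _MASK


def _mix_k(k):
    """Scramble one 32-bit block before it is merged into the state."""
    k = (k * _C1) & _MASK
    k = _rotl32(k, 15)
    return (k * _C2) & _MASK


class Murmur3x86_32:
    """Hypothetical incremental MurmurHash3 x86_32 hasher (little-endian blocks)."""

    def __init__(self, seed=0):
        self._h = seed & _MASK
        self._tail = b""     # bytes not yet forming a full 4-byte block
        self._length = 0     # total number of bytes seen

    def update(self, data):
        data = bytes(data)
        self._length += len(data)
        buf = self._tail + data
        full = len(buf) - len(buf) % 4
        for off in range(0, full, 4):
            k = int.from_bytes(buf[off:off + 4], "little")
            h = _rotl32(self._h ^ _mix_k(k), 13)
            self._h = (h * 5 + 0xE6546B64) & _MASK
        self._tail = buf[full:]

    def intdigest(self):
        """Finalize a copy of the state; more data may still be added afterwards."""
        h = self._h
        if self._tail:
            h ^= _mix_k(int.from_bytes(self._tail, "little"))
        h ^= self._length & _MASK
        h ^= h >> 16
        h = (h * 0x85EBCA6B) & _MASK
        h ^= h >> 13
        h = (h * 0xC2B2AE35) & _MASK
        h ^= h >> 16
        return h


chunked = Murmur3x86_32()
chunked.update(b"foo")
chunked.update(b"bar")
whole = Murmur3x86_32()
whole.update(b"foobar")
print(chunked.intdigest() == whole.intdigest())  # True
```

Because the unprocessed tail and the total length are carried between update() calls, the chunk boundaries do not affect the result.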
The metadata for version 2.4 was uploaded to PyPI as can be seen here, but the actual wheel and/or tarball was not.
It would be great if mmh3 provided prebuilt wheels for Linux. See https://github.com/pypa/manylinux
Hi, would it be possible to add wheels or build support for Python 3.10? I ran into this problem when trying to build from source:
#19 51.73 Running setup.py install for mmh3: started
#19 52.09 Running setup.py install for mmh3: finished with status 'error'
#19 52.09 ERROR: Command errored out with exit status 1:
#19 52.09 command: /usr/local/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-29v4jtx8/mmh3_e21cf43dc5144a5ca51d99e23e0f7752/setup.py'"'"'; __file__='"'"'/tmp/pip-install-29v4jtx8/mmh3_e21cf43dc5144a5ca51d99e23e0f7752/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-vjnuvpe7/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.10/mmh3
#19 52.09 cwd: /tmp/pip-install-29v4jtx8/mmh3_e21cf43dc5144a5ca51d99e23e0f7752/
#19 52.09 Complete output (12 lines):
#19 52.09 running install
#19 52.09 /usr/local/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
#19 52.09 warnings.warn(
#19 52.09 running build
#19 52.09 running build_ext
#19 52.09 building 'mmh3' extension
#19 52.09 creating build
#19 52.09 creating build/temp.linux-x86_64-3.10
#19 52.09 gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.10 -c MurmurHash3.cpp -o build/temp.linux-x86_64-3.10/MurmurHash3.o
#19 52.09 gcc: fatal error: cannot execute ‘cc1plus’: execvp: No such file or directory
#19 52.09 compilation terminated.
#19 52.09 error: command '/usr/bin/gcc' failed with exit code 1
#19 52.09 ----------------------------------------
#19 52.09 ERROR: Command errored out with exit status 1: /usr/local/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-29v4jtx8/mmh3_e21cf43dc5144a5ca51d99e23e0f7752/setup.py'"'"'; __file__='"'"'/tmp/pip-install-29v4jtx8/mmh3_e21cf43dc5144a5ca51d99e23e0f7752/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-vjnuvpe7/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.10/mmh3 Check the logs for full command output.
OS X Version: 10.14.2
$ pip install mmh3
Collecting mmh3
Using cached https://files.pythonhosted.org/packages/fa/7e/3ddcab0a9fcea034212c02eb411433db9330e34d626360b97333368b4052/mmh3-2.5.1.tar.gz
Building wheels for collected packages: mmh3
Running setup.py bdist_wheel for mmh3 ... error
Complete output from command /Users/pranjal/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-install-qi111dnd/mmh3/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-wheel-7q_m8vvr --python-tag cp36:
running bdist_wheel
running build
running build_ext
building 'mmh3' extension
creating build
creating build/temp.macosx-10.7-x86_64-3.6
gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Qunused-arguments -Qunused-arguments -I/Users/pranjal/anaconda3/include/python3.6m -c mmh3module.cpp -o build/temp.macosx-10.7-x86_64-3.6/mmh3module.o
warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
1 warning generated.
gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Qunused-arguments -Qunused-arguments -I/Users/pranjal/anaconda3/include/python3.6m -c MurmurHash3.cpp -o build/temp.macosx-10.7-x86_64-3.6/MurmurHash3.o
warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
1 warning generated.
creating build/lib.macosx-10.7-x86_64-3.6
g++ -bundle -undefined dynamic_lookup -L/Users/pranjal/anaconda3/lib -arch x86_64 -L/Users/pranjal/anaconda3/lib -arch x86_64 -Qunused-arguments -Qunused-arguments -arch x86_64 build/temp.macosx-10.7-x86_64-3.6/mmh3module.o build/temp.macosx-10.7-x86_64-3.6/MurmurHash3.o -o build/lib.macosx-10.7-x86_64-3.6/mmh3.cpython-36m-darwin.so
clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]
ld: library not found for -lstdc++
clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: command 'g++' failed with exit status 1
----------------------------------------
Failed building wheel for mmh3
Running setup.py clean for mmh3
Failed to build mmh3
Installing collected packages: mmh3
Running setup.py install for mmh3 ... error
Complete output from command /Users/pranjal/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-install-qi111dnd/mmh3/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-record-38bwkbu3/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_ext
building 'mmh3' extension
creating build
creating build/temp.macosx-10.7-x86_64-3.6
gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Qunused-arguments -Qunused-arguments -I/Users/pranjal/anaconda3/include/python3.6m -c mmh3module.cpp -o build/temp.macosx-10.7-x86_64-3.6/mmh3module.o
warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
1 warning generated.
gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Qunused-arguments -Qunused-arguments -I/Users/pranjal/anaconda3/include/python3.6m -c MurmurHash3.cpp -o build/temp.macosx-10.7-x86_64-3.6/MurmurHash3.o
warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
1 warning generated.
creating build/lib.macosx-10.7-x86_64-3.6
g++ -bundle -undefined dynamic_lookup -L/Users/pranjal/anaconda3/lib -arch x86_64 -L/Users/pranjal/anaconda3/lib -arch x86_64 -Qunused-arguments -Qunused-arguments -arch x86_64 build/temp.macosx-10.7-x86_64-3.6/mmh3module.o build/temp.macosx-10.7-x86_64-3.6/MurmurHash3.o -o build/lib.macosx-10.7-x86_64-3.6/mmh3.cpython-36m-darwin.so
clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]
ld: library not found for -lstdc++
clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: command 'g++' failed with exit status 1
----------------------------------------
Command "/Users/pranjal/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-install-qi111dnd/mmh3/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-record-38bwkbu3/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-install-qi111dnd/mmh3/
Hi there,
I just installed mmh3 through Anaconda, and the mmh3.hash function raises the following error when I run "element = mmh3.hash(element, signed=False)".
TypeError: 'signed' is an invalid keyword argument for this function
Is this an issue with the Anaconda version or something else?
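For builds that do not accept the signed keyword, one possible workaround is to convert the result by hand. The sketch below assumes the hash value is a signed 32-bit integer; the helper name to_unsigned32 is hypothetical and not part of mmh3.

```python
def to_unsigned32(value: int) -> int:
    """Map a signed 32-bit integer to its unsigned two's-complement equivalent."""
    # Masking with 0xFFFFFFFF keeps the low 32 bits, which turns a negative
    # value into the matching unsigned one and leaves non-negative values alone.
    return value & 0xFFFFFFFF


# A negative signed value and a non-negative one, for illustration.
print(to_unsigned32(-1))
print(to_unsigned32(5))
```

With mmh3 installed, this would be applied as to_unsigned32(mmh3.hash(element)).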
In the course of implementing hashlib-compliant interfaces (#39), I plan to port the main code of MurmurHash3 from C++ (as originally written by Austin Appleby) to C for portability.
This can actually be done with little hassle, thanks to the PEP 7 updates for Python >= 3.6; versions before 3.6 had to conform to C89 and did not officially support <stdint.h> or <inttypes.h>.
In addition, I will relicense this code from the public domain to MIT. The intent is purely to resolve issues related to the public domain and its kin licenses (#43). The text of the original public domain notice will be kept for attribution and recognition.
The code for mmh3_hash64 in mmh3module.c has an x64arch argument, but the typing file __init__.pyi does not declare it as an argument to hash64. As a result, mypy reports an error whenever code passes the x64arch keyword argument.
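A stub entry that declares the missing argument might look like the sketch below. The parameter types and default values shown here are assumptions for illustration and have not been checked against mmh3module.c.

```python
from typing import Tuple, Union


# Hypothetical stub entry for __init__.pyi. The types and defaults are
# assumptions; only the presence of x64arch is taken from the report above.
def hash64(
    key: Union[bytes, str],
    seed: int = 0,
    x64arch: bool = True,
    signed: bool = True,
) -> Tuple[int, int]: ...
```

With a declaration along these lines, a call such as hash64(b"foo", x64arch=False) should no longer be flagged by mypy.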
I have recently run into the same issue as one that was already closed.
Please see:
C:\Users\Luca>python -m pip install murmurhash3
Collecting murmurhash3
Using cached https://files.pythonhosted.org/packages/b5/f4/1f9c4851667a2541bd151b8d9efef707495816274fada365fa6a31085a32/murmurhash3-2.3.5.tar.gz
Building wheels for collected packages: murmurhash3
Running setup.py bdist_wheel for murmurhash3 ... error
Complete output from command C:\Users\Luca\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\Luca\\AppData\\Local\\Temp\\pip-install-0ftrk0aa\\murmurhash3\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d C:\Users\Luca\AppData\Local\Temp\pip-wheel-6_gzb5c8 --python-tag cp37:
running bdist_wheel
running build
running build_ext
building 'mmh3' extension
creating build
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\Luca\Anaconda3\include -IC:\Users\Luca\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /EHsc /Tpmmh3module.cpp /Fobuild\temp.win-amd64-3.7\Release\mmh3module.obj
mmh3module.cpp
c:\users\luca\appdata\local\temp\pip-install-0ftrk0aa\murmurhash3\murmur_hash_3.hpp(5): error C2371: 'uint32_t': redefinition; different basic types
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\stdint.h(23): note: see declaration of 'uint32_t'
mmh3module.cpp(9): error C2371: 'int32_t': redefinition; different basic types
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\stdint.h(19): note: see declaration of 'int32_t'
mmh3module.cpp(12): error C2371: 'uint32_t': redefinition; different basic types
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\stdint.h(23): note: see declaration of 'uint32_t'
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.16.27023\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2
----------------------------------------
Failed building wheel for murmurhash3
Running setup.py clean for murmurhash3
Failed to build murmurhash3
Installing collected packages: murmurhash3
Running setup.py install for murmurhash3 ... error
Complete output from command C:\Users\Luca\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\Luca\\AppData\\Local\\Temp\\pip-install-0ftrk0aa\\murmurhash3\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Luca\AppData\Local\Temp\pip-record-j4aoi9ln\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_ext
building 'mmh3' extension
creating build
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\Luca\Anaconda3\include -IC:\Users\Luca\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /EHsc /Tpmmh3module.cpp /Fobuild\temp.win-amd64-3.7\Release\mmh3module.obj
mmh3module.cpp
c:\users\luca\appdata\local\temp\pip-install-0ftrk0aa\murmurhash3\murmur_hash_3.hpp(5): error C2371: 'uint32_t': redefinition; different basic types
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\stdint.h(23): note: see declaration of 'uint32_t'
mmh3module.cpp(9): error C2371: 'int32_t': redefinition; different basic types
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\stdint.h(19): note: see declaration of 'int32_t'
mmh3module.cpp(12): error C2371: 'uint32_t': redefinition; different basic types
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\stdint.h(23): note: see declaration of 'uint32_t'
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.16.27023\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2
I need mmh3 for h2o4gpu. I opened a question on Stack Exchange about this issue.
Thanks
My colleague uses this function to generate an int64. Could you please add a demo showing how Go code can call it to get the same number? Thank you in advance.
Hi, thanks so much for this very useful library! I'm using it to randomize keys for billions of objects to create condensed files that contain groups of thousands to millions of objects at a time.
It's not critical, but there is a bottleneck step in the front of my Python processing pipeline where the hash is applied to all object labels at once to figure out how to assign them for further processing. The hash function is dominating this calculation.
Function: murmur at line 25
Time per Hit in microseconds
Line # Hits Time Per Hit % Time Line Contents
==============================================================
25 @profile
26 def murmur(x):
27 69734 74332.0 1.1 1.6 y = uint64(x).tobytes()
28 69734 4381932.0 62.8 97.2 y = mmh3.hash64(y, x64arch=False)
29 69733 52731.0 0.8 1.2 return uint64(y[0])
Total time: 5.44635 s
File: REDACTED
Function: compute_shard_location at line 145
Time per Hit in microseconds
Line # Hits Time Per Hit % Time Line Contents
==============================================================
145 @profile
146 def compute_shard_location(self, key):
147 69734 99805.0 1.4 1.8 chunkid = uint64(key) >> uint64(self.preshift_bits)
148 69734 4703010.0 67.4 86.4 chunkid = self.hashfn(chunkid)
149 69733 60072.0 0.9 1.1 minishard_number = uint64(chunkid & self.minishard_mask)
150 69733 97034.0 1.4 1.8 shard_number = uint64((chunkid & self.shard_mask) >> uint64(self.minishard_bits))
151 69733 312626.0 4.5 5.7 shard_number = format(shard_number, 'x').zfill(int(np.ceil(self.shard_bits / 4.0)))
152 69733 102740.0 1.5 1.9 remainder = chunkid >> uint64(self.minishard_bits + self.shard_bits)
153
154 69733 71060.0 1.0 1.3 return ShardLocation(shard_number, minishard_number, remainder)
It could be possible to thread this processing, but Python has the GIL. Multiprocessing could work, though the pickle/unpickle step will also take some time. I was thinking that a neat way to increase throughput would be to process multiple hashes at once in C, that is, accept either a scalar or an iterable as input to the function. This would allow the compiler to autovectorize and would also avoid Python/C call overhead. I'm currently getting ~66.5k hashes/sec on Apple Silicon M1 ARM64.
I'm thinking of an interface similar to this. The second form should return some buffer that is easy to read into numpy.
(lower, upper) = mmh3.hash64(some_bytes, x64arch=False)
[l1,h1,l2,h2] = mmh3.hash64(iterable_containing_bytes, x64arch=False)
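The semantics of the second form can be sketched in pure Python as below. This only illustrates the proposed flat output layout; it still loops in Python and so does not deliver the speedup, which would need the loop to live in C. The names hash64_many and hashfn are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def hash64_many(
    keys: Iterable[bytes],
    hashfn: Callable[[bytes], Tuple[int, int]],
) -> List[int]:
    """Hash every key and return a flat [l1, h1, l2, h2, ...] list."""
    flat: List[int] = []
    for key in keys:
        # hashfn is expected to return a (lower, upper) pair per key.
        lower, upper = hashfn(key)
        flat.extend((lower, upper))
    return flat
```

With mmh3 installed, hashfn could be lambda b: mmh3.hash64(b, x64arch=False), and the flat list could then be handed to numpy and reshaped to two columns.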
Thank you for your consideration and for all the effort you've put into this library!
Hi,
It seems that the PyPI version (2.0) has a bug. Isn't it time to upgrade it?
I plan to remove the line
PyModule_AddStringConstant(module, "__version__", "3.1.0");
in mmh3module.cpp in the next non-trivial update.
The __version__ constant was introduced in 2.1 (2013-02-25), since it was in vogue back then and the only (iirc) way to get the version number of a module from within a Python script.
However, Python 3.8 officially introduced importlib.metadata, which is no longer provisional as of Python 3.10. Python 3.7 still needs pip install importlib-metadata, but it will reach EOL soon. Therefore, there is no need to keep the __version__ constant anymore. Plus, keeping the same information in multiple files is a bad (very bad indeed) engineering practice.
See also
https://stackoverflow.com/a/72168209
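For callers that currently read __version__, the importlib.metadata lookup mentioned above can be sketched as follows (Python >= 3.8). The helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


# Prints the version string if mmh3 is installed, otherwise None.
print(installed_version("mmh3"))
```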
I will bump up the version to 4.0.0 then, because the removal breaks backward compatibility, however minor it is.
Test cases for the functions hash64/hash128/hash_bytes fail on s390x when the arg x64arch is False (https://github.com/hajimes/mmh3/actions/runs/7396371609), likely because the architecture is big-endian.
> assert mmh3.hash64("foo", signed=False, x64arch=False) == (
6968798590592097061,
6968798590746895717,
)
E assert (630394342576...8590746895717) == (696879859059...8590746895717)
E At index 0 diff: 6303943425762141541 != 6968798590592097061
E Use -v to get more diff
The result should be the same as the value in little-endian environments (feature from 4.0.0).
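As a general illustration of why byte order matters here (this is not the mmh3 code path itself), the same four bytes decode to different 32-bit integers depending on the byte order used to read them:

```python
import struct
import sys

raw = bytes([0x01, 0x02, 0x03, 0x04])

# "<I" reads an unsigned 32-bit integer as little-endian, ">I" as big-endian.
little = struct.unpack("<I", raw)[0]
big = struct.unpack(">I", raw)[0]

print(sys.byteorder)  # byte order of the machine running this script
print(hex(little))    # 0x4030201
print(hex(big))       # 0x1020304
```

Code that reads input blocks in native byte order therefore sees different block values on s390x than on x86-64 unless the blocks are byte-swapped to a fixed order first.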