mmh3's People

Contributors

arieleizenberg, doozr, dshein-alt, hajimes, honnibal, n-dusan, pik, wbolster

mmh3's Issues

Make the module hashlib-compliant

mmh3 is currently not hashlib-compliant, which makes it hard to use as a replacement for md5 or other cryptographic hashes. A wrapper could be built to make this module hashlib-compliant, so that it can be used in the same way as hashlib.md5. The wrapper would provide:

update() -- update the current digest with an additional string
digest() -- return the current digest value
hexdigest() -- return the current digest as a string of hexadecimal digits
intdigest() -- return the current digest as an integer
copy() -- return a copy of the current mmh3 object
reset() -- reset state
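
A minimal sketch of what such a wrapper could look like, written in pure Python for the 32-bit variant only. The class name Murmur3_32 and the choice to return the digest as 4 little-endian bytes are assumptions made for illustration; this is not the module's actual API.

```python
class Murmur3_32:
    """Hypothetical hashlib-style streaming wrapper (MurmurHash3 x86 32-bit)."""

    _C1, _C2, _MASK = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF

    def __init__(self, data=b"", seed=0):
        self._seed = seed & self._MASK
        self.reset()
        self.update(data)

    @staticmethod
    def _rotl(x, r):
        # Rotate a 32-bit value left by r bits.
        return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF

    def _mix_k(self, k):
        k = (k * self._C1) & self._MASK
        k = self._rotl(k, 15)
        return (k * self._C2) & self._MASK

    def reset(self):
        self._h = self._seed
        self._tail = b""      # bytes that do not yet form a full 4-byte block
        self._length = 0      # total number of bytes seen so far

    def update(self, data):
        data = bytes(data)
        buf = self._tail + data
        self._length += len(data)
        full = len(buf) - len(buf) % 4
        for i in range(0, full, 4):
            k = int.from_bytes(buf[i:i + 4], "little")
            self._h ^= self._mix_k(k)
            self._h = self._rotl(self._h, 13)
            self._h = (self._h * 5 + 0xE6546B64) & self._MASK
        self._tail = buf[full:]

    def intdigest(self):
        # Finalize on a local copy so that update() can still be called later.
        h = self._h
        if self._tail:
            h ^= self._mix_k(int.from_bytes(self._tail, "little"))
        h ^= self._length & self._MASK
        h ^= h >> 16
        h = (h * 0x85EBCA6B) & self._MASK
        h ^= h >> 13
        h = (h * 0xC2B2AE35) & self._MASK
        h ^= h >> 16
        return h

    def digest(self):
        return self.intdigest().to_bytes(4, "little")  # byte order is an assumption

    def hexdigest(self):
        return self.digest().hex()

    def copy(self):
        clone = Murmur3_32(seed=self._seed)
        clone._h, clone._tail, clone._length = self._h, self._tail, self._length
        return clone


# Feeding the data in pieces gives the same result as feeding it at once.
h = Murmur3_32()
h.update(b"foo")
h.update(b"bar")
assert h.intdigest() == Murmur3_32(b"foobar").intdigest()
```

Because the unfinished tail is buffered between calls, update() can be called with chunks of any size.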

Make hash functions endian-neutral

The original C++ code of MurmurHash3 by Austin Appleby is endian-sensitive. The advantage of this style is, first and foremost, performance.

However, inconsistency between platforms may cause problems in various fields, e.g., NLP (cf. explosion/murmurhash#26).

In addition, several IoT search engines (including Shodan) use the little-endian variant of the mmh3 value as the fingerprint of a favicon.

To guarantee portability and consistency across platforms, mmh3 will use the little-endian variant values on all architectures from version 4.0.0 onward, even though this will make the hash functions slower on big-endian architectures.
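
To illustrate why byte order matters here, the following sketch reads the same four input bytes as one 32-bit block in native, little-endian, and big-endian order. It only shows the block-reading step; it is not taken from the mmh3 source.

```python
import struct
import sys

block = b"\x01\x02\x03\x04"  # four input bytes forming one 32-bit block

native = struct.unpack("=I", block)[0]  # depends on the host CPU
little = struct.unpack("<I", block)[0]  # same value on every platform
big = struct.unpack(">I", block)[0]     # same value on every platform

print(sys.byteorder, hex(native), hex(little), hex(big))
```

An endian-sensitive implementation effectively uses the native read, so the block value fed into the mixing steps (and therefore the final hash) can differ between machines. Reading explicitly in little-endian order removes that dependency.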

mmh3 not building anymore

mmh3 does not compile when installed through pip, whether from the package or from git.

This was raised once in the past (#7) and fixed, but the issue has appeared again.

Specs of my system:
gcc: 7.3.0
os: ubuntu 16 / ubuntu 18
pip: 9.0.1
git: 2.17.1
python: 2.7.15rc1

Error is exactly the same:

building 'mmh3' extension
  creating build
  creating build/temp.linux-x86_64-2.7
  x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c mmh3module.cpp -o build/temp.linux-x86_64-2.7/mmh3module.o
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c MurmurHash3.cpp -o build/temp.linux-x86_64-2.7/MurmurHash3.o
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
   #define FORCE_INLINE attribute((always_inline))
                                 ^
  MurmurHash3.cpp:60:1: note: in expansion of macro ‘FORCE_INLINE’
   FORCE_INLINE uint32_t getblock ( const uint32_t * p, int i )
   ^
  MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
   #define FORCE_INLINE attribute((always_inline))
                                 ^
  MurmurHash3.cpp:65:1: note: in expansion of macro ‘FORCE_INLINE’
   FORCE_INLINE uint64_t getblock ( const uint64_t * p, int i )
   ^
  MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
   #define FORCE_INLINE attribute((always_inline))
                                 ^
  MurmurHash3.cpp:73:1: note: in expansion of macro ‘FORCE_INLINE’
   FORCE_INLINE uint32_t fmix ( uint32_t h )
   ^
  MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
   #define FORCE_INLINE attribute((always_inline))
                                 ^
  MurmurHash3.cpp:86:1: note: in expansion of macro ‘FORCE_INLINE’
   FORCE_INLINE uint64_t fmix ( uint64_t k )
   ^
  MurmurHash3.cpp: In function ‘void MurmurHash3_x86_32(const void*, int, uint32_t, void*)’:
  MurmurHash3.cpp:117:36: error: ‘getblock’ was not declared in this scope
       uint32_t k1 = getblock(blocks,i);
                                      ^
  MurmurHash3.cpp:148:15: error: ‘fmix’ was not declared in this scope
     h1 = fmix(h1);
                 ^
  MurmurHash3.cpp: In function ‘void MurmurHash3_x86_128(const void*, int, uint32_t, void*)’:
  MurmurHash3.cpp:178:40: error: ‘getblock’ was not declared in this scope
       uint32_t k1 = getblock(blocks,i*4+0);
                                          ^
  MurmurHash3.cpp:244:15: error: ‘fmix’ was not declared in this scope
     h1 = fmix(h1);
                 ^
  MurmurHash3.cpp: In function ‘void MurmurHash3_x64_128(const void*, int, uint32_t, void*)’:
  MurmurHash3.cpp:279:40: error: ‘getblock’ was not declared in this scope
       uint64_t k1 = getblock(blocks,i*2+0);
                                          ^
  MurmurHash3.cpp:329:15: error: ‘fmix’ was not declared in this scope
     h1 = fmix(h1);
                 ^
  error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for mmh3

Documentation of why hash64 returns two values

As a user it is very confusing why mmh3.hash64 returns two 64-bit values, whereas hash and hash128 do not. Are we supposed to just pick one of them? Is there a recommended way to combine them?
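
One way to picture the relationship, assuming that hash64 returns the two signed 64-bit halves of the same 128-bit result that hash128 returns as a single integer, with the first value holding the low 64 bits. That layout is an assumption and is not confirmed in this issue. The helper names below are hypothetical.

```python
MASK64 = (1 << 64) - 1


def combine_halves(first, second):
    """Join two signed 64-bit halves into one unsigned 128-bit integer.

    Assumes `first` holds the low 64 bits and `second` the high 64 bits.
    """
    return ((second & MASK64) << 64) | (first & MASK64)


def split_halves(value):
    """Split an unsigned 128-bit integer back into two signed 64-bit halves."""
    def to_signed64(x):
        return x - (1 << 64) if x >> 63 else x

    return to_signed64(value & MASK64), to_signed64((value >> 64) & MASK64)


print(split_halves(combine_halves(-5, 7)))
```

Under that assumption, the two values are simply a different view of one 128-bit hash rather than two independent hashes.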

mmh3 not building anymore?

mmh3 does not compile when installed through pip, whether from the package or from git.
I want to use it for the Bloom filter implemented here:
http://www.maxburstein.com/blog/creating-a-simple-bloom-filter/

Specs of my system:
gcc: 5.1.0
os: archlinux
pip: 7.0.3
git: 2.4.2
python: 3.4.3 / 2.7.9

Output from pip

> sudo pip install mmh3
Collecting mmh3
  Using cached mmh3-2.3.tar.gz
Installing collected packages: mmh3
  Running setup.py install for mmh3
    Complete output from command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-ia20ohxq/mmh3/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-dmuabtur-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_ext
    building 'mmh3' extension
    creating build
    creating build/temp.linux-x86_64-3.4
    gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong --param=ssp-buffer-size=4 -fPIC -I/usr/include/python3.4m -c mmh3module.cpp -o build/temp.linux-x86_64-3.4/mmh3module.o
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong --param=ssp-buffer-size=4 -fPIC -I/usr/include/python3.4m -c MurmurHash3.cpp -o build/temp.linux-x86_64-3.4/MurmurHash3.o
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
     #define FORCE_INLINE attribute((always_inline))
                                   ^
    MurmurHash3.cpp:60:1: note: in expansion of macro ‘FORCE_INLINE’
     FORCE_INLINE uint32_t getblock ( const uint32_t * p, int i )
     ^
    MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
     #define FORCE_INLINE attribute((always_inline))
                                   ^
    MurmurHash3.cpp:65:1: note: in expansion of macro ‘FORCE_INLINE’
     FORCE_INLINE uint64_t getblock ( const uint64_t * p, int i )
     ^
    MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
     #define FORCE_INLINE attribute((always_inline))
                                   ^
    MurmurHash3.cpp:73:1: note: in expansion of macro ‘FORCE_INLINE’
     FORCE_INLINE uint32_t fmix ( uint32_t h )
     ^
    MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
     #define FORCE_INLINE attribute((always_inline))
                                   ^
    MurmurHash3.cpp:86:1: note: in expansion of macro ‘FORCE_INLINE’
     FORCE_INLINE uint64_t fmix ( uint64_t k )
     ^
    MurmurHash3.cpp: In function ‘void MurmurHash3_x86_32(const void*, int, uint32_t, void*)’:
    MurmurHash3.cpp:117:36: error: ‘getblock’ was not declared in this scope
         uint32_t k1 = getblock(blocks,i);
                                        ^
    MurmurHash3.cpp:148:15: error: ‘fmix’ was not declared in this scope
       h1 = fmix(h1);
                   ^
    MurmurHash3.cpp: In function ‘void MurmurHash3_x86_128(const void*, int, uint32_t, void*)’:
    MurmurHash3.cpp:178:40: error: ‘getblock’ was not declared in this scope
         uint32_t k1 = getblock(blocks,i*4+0);
                                            ^
    MurmurHash3.cpp:244:15: error: ‘fmix’ was not declared in this scope
       h1 = fmix(h1);
                   ^
    MurmurHash3.cpp: In function ‘void MurmurHash3_x64_128(const void*, int, uint32_t, void*)’:
    MurmurHash3.cpp:279:40: error: ‘getblock’ was not declared in this scope
         uint64_t k1 = getblock(blocks,i*2+0);
                                            ^
    MurmurHash3.cpp:329:15: error: ‘fmix’ was not declared in this scope
       h1 = fmix(h1);
                   ^
    error: command 'gcc' failed with exit status 1

    ----------------------------------------
Command "/usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-ia20ohxq/mmh3/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-dmuabtur-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-ia20ohxq/mmh3

Not able to install it; how to fix it?

C:\Users\Saurabh bhandari\AppData\Local\Programs\Python\Python38>pip install mmh3
Collecting mmh3
Using cached mmh3-2.5.1.tar.gz (9.8 kB)
Using legacy 'setup.py install' for mmh3, since package 'wheel' is not installed.
Installing collected packages: mmh3
Running setup.py install for mmh3 ... error
ERROR: Command errored out with exit status 1:
command: 'c:\users\saurabh bhandari\appdata\local\programs\python\python38\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-install-ukiszg0q\mmh3\setup.py'"'"'; __file__='"'"'C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-install-ukiszg0q\mmh3\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-record-iye5vtng\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\saurabh bhandari\appdata\local\programs\python\python38\Include\mmh3'
cwd: C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-install-ukiszg0q\mmh3
Complete output (11 lines):
running install
running build
running build_ext
building 'mmh3' extension
creating build
creating build\temp.win-amd64-3.8
creating build\temp.win-amd64-3.8\Release
C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD "-Ic:\users\saurabh bhandari\appdata\local\programs\python\python38\include" "-Ic:\users\saurabh bhandari\appdata\local\programs\python\python38\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.26.28801\include" /EHsc /Tpmmh3module.cpp /Fobuild\temp.win-amd64-3.8\Release\mmh3module.obj
mmh3module.cpp
mmh3module.cpp(12): fatal error C1083: Cannot open include file: 'stdio.h': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\cl.exe' failed with exit status 2
----------------------------------------
ERROR: Command errored out with exit status 1: 'c:\users\saurabh bhandari\appdata\local\programs\python\python38\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-install-ukiszg0q\mmh3\setup.py'"'"'; __file__='"'"'C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-install-ukiszg0q\mmh3\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-record-iye5vtng\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\saurabh bhandari\appdata\local\programs\python\python38\Include\mmh3' Check the logs for full command output.

return different hash between c++ and python

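cpp: (compiled with the code at https://github.com/hajimes/mmh3/blob/master/MurmurHash3.h)
#include <iostream>
#include <string>
#include "MurmurHash3.h"

uint32_t MurmurHash32(const void* key, size_t len) {
    uint32_t hash;
    MurmurHash3_x86_32(key, (int)len, 0, &hash);
    // Print the key as text and the computed hash value.
    std::cout << "site= " << std::string((const char*)key, len) << ", code= " << hash << std::endl;
    return hash;
}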

// output: (import mmh3)
site= www.taobao.com, code= 4076543410

python:
import mmh3
res = mmh3.hash(domain, signed=False)
print 'site= %s, code= %s' % (domain, res)

// output:
site= www.taobao.com, code= 3707551990

Why does the same site "www.taobao.com" get 4076543410 in C++ but 3707551990 in Python?

Golang Compatibility

Hello @hajimes thank you so much for providing this murmur3 implementation in Python and for all the work you do in open source; we really appreciate being able to use your library!

I was recently investigating some compatibility issues with the output produced by mmh3 and a Go library we were using internally. I asked the question on Stackoverflow and got a response back:

https://stackoverflow.com/questions/75921577/murmur3-hash-compatibility-between-go-and-python

It looks like the order of the two uint64s returned by the 128-bit algorithm is reversed between the two libraries; but it's simple enough to modify the returned results in either Go or Python to produce compatible hashes.

I was wondering: would you like me to open a PR to update the README with the compatibility information? Are there any other docs I should update in the PR?

Additionally, if there is any way to reverse the order of the uint64s returned by murmur3 (e.g. with an argument to hash128 or hash_bytes) I'd be happy to open a PR for that as well. Let me know how you'd like to proceed!
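
A sketch of the kind of adjustment described above: swapping the two 64-bit halves of a 128-bit result, either as an integer or as a 16-byte digest. The function names are hypothetical, and whether the byte order inside each half also needs converting depends on the libraries involved, which this sketch does not cover.

```python
MASK64 = (1 << 64) - 1


def swap_uint64_halves(value128):
    """Exchange the low and high 64-bit halves of an unsigned 128-bit integer."""
    low = value128 & MASK64
    high = (value128 >> 64) & MASK64
    return (low << 64) | high


def swap_digest_halves(digest):
    """Exchange the first and last 8 bytes of a 16-byte digest."""
    if len(digest) != 16:
        raise ValueError("expected a 16-byte digest")
    return digest[8:] + digest[:8]


value = (1 << 64) | 2
print(hex(swap_uint64_halves(value)))
```

Applying either function twice returns the original value, so the same helper works in both directions.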

mmh3

I also ran into a problem with mmh3 and have been struggling to find a solution. The errors I see here are not the same as mine. Can you help me?
——————————————————————————————————————————
1 warning generated.
creating build/lib.macosx-10.6-intel-3.6
/usr/bin/clang++ -bundle -undefined dynamic_lookup -arch i386 -arch x86_64 -g -L/usr/local/opt/openssl/lib -I/usr/local/opt/openssl/include build/temp.macosx-10.6-intel-3.6/mmh3module.o build/temp.macosx-10.6-intel-3.6/MurmurHash3.o -o build/lib.macosx-10.6-intel-3.6/mmh3.cpython-36m-darwin.so
clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]
clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]
ld: library not found for -lstdc++
clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: command '/usr/bin/clang++' failed with exit status 1

hash_bytes broken in 2.5

It looks like this change broke the hash_bytes function in 2.5:

3bf1e5a - Add a keyword argument signed (2 days ago) <Hajime Senuma> 
diff --git a/mmh3module.cpp b/mmh3module.cpp
index ef49083..ba771ee 100644
--- a/mmh3module.cpp
+++ b/mmh3module.cpp
@@ -143,7 +155,7 @@ mmh3_hash_bytes(PyObject *self, PyObject *args, PyObject *keywds)
     static char *kwlist[] = {(char *)"key", (char *)"seed",
       (char *)"x64arch", NULL};
 
-    if (!PyArg_ParseTupleAndKeywords(args, keywds, "s#|IB", kwlist,
+    if (!PyArg_ParseTupleAndKeywords(args, keywds, "s#|IBB", kwlist,
         &target_str, &target_str_len, &seed, &x64arch)) {
         return NULL;

There is no additional is_signed kwarg. I'm not sure if there should be. The result is this error every time the hash_bytes function is used:

RuntimeError: more argument specifiers than keyword list entries (remaining format:'B')

Switch license from CC0 to MIT

I plan to switch the license of this project from CC0 to MIT in the very near future.

The adoption of CC0 was an homage to Austin Appleby, the inventor of the MurmurHash3 algorithm, who published the code under the public domain.

However, CC0 is not recognized as an OSI-approved license, as it was withdrawn from the review process in 2012. In addition, in 2022 the Fedora community said they planned to demote the status of CC0 from "good" to "allowed-content only".

Considering these issues, I have decided to adopt the MIT License, which is simple and also one of the most popular OSI-approved permissive licenses.

(Bazel) AttributeError: module 'mmh3' has no attribute 'hash'

Consider: mmh3.hash("Hello World").

Expected behavior: returns 427197390

Actual behavior: raises exception AttributeError: module 'mmh3' has no attribute 'hash'

Regression: This works in version 4.0.0. The error is triggered in version 4.0.1.

Environment: Curiously, this seems to happen when running the test through Bazel, not when installing into a virtual environment. Not sure if the bug is on the mmh3 side or the Bazel side, but something changed between 4.0.0 and 4.0.1. Can you help me figure out what?

To reproduce, get the gist from https://gist.github.com/vonschultz/18b4e58a697d56c8cc421528e0a4ef13 and run

bazelisk test --test_output=streamed //...

Get bazelisk from https://github.com/bazelbuild/bazelisk/releases if you don't already have it.

I'm running Ubuntu 20.04.

hash128 returns unsigned int

I was using mmh3 as part of a project, and was getting invalid values when I tried to rescale the hashes into the range [0, 1]. It turns out that mmh3.hash128 was returning unsigned integers, not signed integers as the documentation suggests.

I was using Python 3.6.2 and mmh3 2.4.
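
For reference, a small sketch of how a 128-bit hash value can be converted between its signed and unsigned interpretations and rescaled into [0, 1] regardless of which one the library returns. The helper names are hypothetical.

```python
BITS = 128


def to_unsigned(value, bits=BITS):
    """Map a signed or unsigned integer onto the range [0, 2**bits)."""
    return value & ((1 << bits) - 1)


def to_signed(value, bits=BITS):
    """Reinterpret an unsigned integer as a two's-complement signed one."""
    value = to_unsigned(value, bits)
    return value - (1 << bits) if value >> (bits - 1) else value


def to_unit_interval(value, bits=BITS):
    """Rescale a hash value into [0, 1].

    Float rounding can yield exactly 1.0 for values very close to 2**bits.
    """
    return to_unsigned(value, bits) / (1 << bits)


print(to_unit_interval(1 << 127))  # prints 0.5
```

Normalizing to the unsigned form first means the rescaling does not depend on the sign convention of the input.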

Can not compile in windows

C:\Users\10324\Downloads\mmh3-master\mmh3-master>python setup.py build
running build
running build_ext
building 'mmh3' extension
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\10324\AppData\Local\Programs\Python\Python36\include -IC:\Users\10324\AppData\Local\Programs\Python\Python36\include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\winrt" /EHsc /Tpmmh3module.cpp /Fobuild\temp.win-amd64-3.6\Release\mmh3module.obj
mmh3module.cpp
c:\users\10324\downloads\mmh3-master\mmh3-master\MurmurHash3.h(16): error C2371: 'uint32_t': redefinition; different basic types
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\stdint.h(23): note: see declaration of 'uint32_t'
mmh3module.cpp(14): error C2371: 'int32_t': redefinition; different basic types
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\stdint.h(19): note: see declaration of 'int32_t'
mmh3module.cpp(17): error C2371: 'uint32_t': redefinition; different basic types
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\stdint.h(23): note: see declaration of 'uint32_t'
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe' failed with exit status 2

Running on 64-bit Windows 10.

Status of project?

Just a friendly check, if there is still anyone here maintaining this project or if anyone knows more about the project status.

  • Last commit to project was almost 2 years ago.
  • Some issues have had no response for a year, and some one-line PRs that should be easy to merge have been open for more than 6 months. One example is #35; leaving it unmerged now makes Python version upgrades more difficult if you are not building native modules yourself.
  • Hajime, the project owner, has not had any Github activity at all for over one year.
  • I sent an email to Hajime asking about the future plans, one week ago, and have not received any response. (Please others, let that email and this issue be the only channel of such reminders to avoid nagging the owner.)

While the code itself could be considered "done", the wish for prebuilt wheels will continue.
What is the recommended way forward for users of mmh3? Build your own wheels? Alternative modules? Alternative distributions?

Does anyone know more?

SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats

Hi, I got this error under python 3.10

Python 3.10.10 (main, Mar 21 2023, 18:45:11) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mmh3
>>> a=mmh3.hash("abc", 1234)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
>>>

Is there any solution or workaround?

Support for hashing data > 16GB

FORCE_INLINE uint64_t getblock ( const uint64_t * p, int i ) uses a 32-bit integer i for the block index, which fails when hashing huge blocks of data. Change it to FORCE_INLINE uint64_t getblock ( const uint64_t * p, Py_ssize_t i ).

Here is the pull request:
#34
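
The 16GB figure in the title follows from simple arithmetic, sketched below under the assumption that int is a 32-bit signed type (as on most current platforms) and that each block read by getblock is one 8-byte uint64_t. The function name is hypothetical.

```python
def addressable_bytes(index_bits=32, block_bytes=8):
    """Bytes reachable when a signed index of `index_bits` bits selects blocks."""
    max_blocks = 2 ** (index_bits - 1)  # non-negative values of a signed index
    return max_blocks * block_bytes


limit = addressable_bytes()
print(limit / 2**30, "GiB")  # prints 16.0 GiB
```

With a 64-bit index type such as Py_ssize_t on 64-bit platforms, the same calculation gives a limit far beyond any realistic input size.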

Does not compile in OSX Mavericks

$ pip install mmh3
Downloading/unpacking mmh3
  Running setup.py egg_info for package mmh3

Installing collected packages: mmh3
  Running setup.py install for mmh3
    building 'mmh3' extension
    cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c mmh3module.cpp -o build/temp.macosx-10.9-intel-2.7/mmh3module.o
    clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future]
    clang: note: this will be a hard error (cannot be downgraded to a warning) in the future
    error: command 'cc' failed with exit status 1
    Complete output from command /Users/andre/work/penv/discosite/bin/python -c "import setuptools;__file__='/Users/andre/work/penv/discosite/build/mmh3/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /var/folders/xq/yt9mr8t52cj9dqjsn8v0x5z40000gn/T/pip-kAdgDV-record/install-record.txt --install-headers /Users/andre/work/penv/discosite/include/site/python2.7:
    running install

running build

running build_ext

building 'mmh3' extension

cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c mmh3module.cpp -o build/temp.macosx-10.9-intel-2.7/mmh3module.o

clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future]

clang: note: this will be a hard error (cannot be downgraded to a warning) in the future

error: command 'cc' failed with exit status 1

----------------------------------------
Command /Users/andre/work/penv/discosite/bin/python -c "import setuptools;__file__='/Users/andre/work/penv/discosite/build/mmh3/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /var/folders/xq/yt9mr8t52cj9dqjsn8v0x5z40000gn/T/pip-kAdgDV-record/install-record.txt --install-headers /Users/andre/work/penv/discosite/include/site/python2.7 failed with error code 1 in /Users/andre/work/penv/discosite/build/mmh3
Storing complete log in /Users/andre/.pip/pip.log

get murmurhash3 of binary file in python

getting the murmur3 hash of a text file is trivial,
and i can get the murmur2 hash of binary files,
see https://github.com/milahu/murmurhash-cli-python

how to get the murmur3 hash of a binary file?

there is https://pypi.org/project/mmh3-binary/ but it's an "empty fork"

expected API

#!/usr/bin/env python3

import mmh3

fd = open('/bin/sh', 'rb')
hash = mmh3.hash_from_buffer(fd)

fd is an io.BufferedReader

ideally, avoid passing a bytes array ... this should support "a million gigabyte" files in theory,
so the bytes should be "streamed" or "piped" into the mmh3 function

currently, mmh3 says

mmh3.hash_from_buffer(fd)
TypeError: a bytes-like object is required, not '_io.BufferedReader'
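
A sketch of the streaming pattern being asked for. The helper hash_stream is hypothetical, and it assumes a hasher object with a hashlib-style update() method; nothing in this issue confirms that mmh3 provides such an object. hashlib.sha256 is used below purely as a stand-in so the example runs with the standard library alone.

```python
import hashlib
import io


def hash_stream(fileobj, hasher, chunk_size=1 << 20):
    """Feed a binary file object to a hashlib-style hasher in fixed-size chunks."""
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher


# Stand-in demo: an in-memory "file" hashed in 1 MiB chunks.
payload = b"x" * 3_000_000
streamed = hash_stream(io.BytesIO(payload), hashlib.sha256()).hexdigest()
print(streamed == hashlib.sha256(payload).hexdigest())  # prints True
```

Only one chunk is held in memory at a time, so the file size is limited by the hasher rather than by available RAM. The same loop works with a file opened via open(path, 'rb').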

Decouple the original code from this repository

Currently I'm refactoring the library to decouple the files that largely trace back to the original C++ code (specifically, murmurhash3.c and murmurhash3.h) from this repository.

The update is meant to address the pre-review process of the Journal of Open Source Software (JOSS), whose managing EiC (Daniel S. Katz) thoughtfully pointed out that it was not clear which parts of this library were my (and other contributors') own contributions.
openjournals/joss-reviews#5487

I proposed to use a git submodule to refer to Appleby's repository, and then write a script that converts the original C++ files to more portable C code at compile time.

It turns out, however, that readability may be degraded to some extent in my current ad hoc implementation, which may also make the code harder to extend. Solving these issues will be left for future updates. On the other hand, this update will clarify the extent of the authorship of the code and solve the license issue #45.

mmh3 not 64-bit ready

mmh3 cannot hash data larger than 2**31 bytes:

>>> import mmh3
>>> import numpy as np
>>> a = np.zeros(2**30, dtype=np.int8)
>>> mmh3.hash_bytes(a)
b"O\xc5\xf1\xf2\x80';s\x1b\xddc\xa1E\x8d\xe3r"
>>> a = np.zeros(2**32, dtype=np.int8)
>>> mmh3.hash_bytes(a)
Traceback (most recent call last):
  File "<ipython-input-9-918a38167947>", line 1, in <module>
    mmh3.hash_bytes(a)
OverflowError: size does not fit in an int

The solution is either to use the s* code instead of s# in PyArg_ParseTuple(), or to define the PY_SSIZE_T_CLEAN macro and change size fields from int to Py_ssize_t. See https://docs.python.org/2.7/c-api/arg.html . I can also make a PR if you want.

Also, there's no test suite?

mmh3 fails to install (via pip) on macOS Mojave

Collecting mmh3
  Using cached mmh3-2.5.1.tar.gz (9.8 kB)
Building wheels for collected packages: mmh3
  Building wheel for mmh3 (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /Users/aeaeaeae/venv/dslearn/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/setup.py'"'"'; __file__='"'"'/private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-wheel-ks30qw3g
       cwd: /private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/
  Complete output (12 lines):
  running bdist_wheel
  running build
  running build_ext
  building 'mmh3' extension
  creating build
  creating build/temp.macosx-10.9-x86_64-3.7
  gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch x86_64 -g -I/Library/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c mmh3module.cpp -o build/temp.macosx-10.9-x86_64-3.7/mmh3module.o
  mmh3module.cpp:12:19: fatal error: stdio.h: No such file or directory
   #include <stdio.h>
                     ^
  compilation terminated.
  error: command 'gcc' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for mmh3
  Running setup.py clean for mmh3
Failed to build mmh3
Installing collected packages: mmh3
    Running setup.py install for mmh3 ... error
    ERROR: Command errored out with exit status 1:
     command: /Users/aeaeaeae/venv/dslearn/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/setup.py'"'"'; __file__='"'"'/private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-record-t_fjan2e/install-record.txt --single-version-externally-managed --compile --install-headers /Users/aeaeaeae/venv/dslearn/bin/../include/site/python3.7/mmh3
         cwd: /private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/
    Complete output (12 lines):
    running install
    running build
    running build_ext
    building 'mmh3' extension
    creating build
    creating build/temp.macosx-10.9-x86_64-3.7
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch x86_64 -g -I/Library/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c mmh3module.cpp -o build/temp.macosx-10.9-x86_64-3.7/mmh3module.o
    mmh3module.cpp:12:19: fatal error: stdio.h: No such file or directory
     #include <stdio.h>
                       ^
    compilation terminated.
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /Users/aeaeaeae/venv/dslearn/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/setup.py'"'"'; __file__='"'"'/private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-record-t_fjan2e/install-record.txt --single-version-externally-managed --compile --install-headers /Users/aeaeaeae/venv/dslearn/bin/../include/site/python3.7/mmh3 Check the logs for full command output.
WARNING: You are using pip version 20.1.1; however, version 20.2.2 is available.
You should consider upgrading via the '/Users/aeaeaeae/venv/dslearn/bin/python3.7 -m pip install --upgrade pip' command.

Chained hashing not working as expected

Hello Hajime,

I guess I am doing something wrong, so this is probably not a real issue.
I am trying to hash big files and am reading them in chunks for obvious reasons.
As a test I ran the following:

>>> mmh3.hash128('foobar', 0, signed = True)
155033341411922636178181560508455868997
>>> mmh3.hash128('bar',mmh3.hash128('foo', 0,signed = True), signed = True)
144772797738558108830387305245635675932

I expected the hash to be the same in both cases.
Am I misinterpreting the seed value, or is there another way of chaining hashes in murmur in general?

Thanks & Regards,

Martin
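
For readers hitting the same question: the seed only sets the initial state, while the finished hash also folds in the total length and a final mixing step, so feeding one hash back in as the next seed does not reproduce the hash of the concatenated input. The sketch below is a pure-Python illustration using the 32-bit variant of MurmurHash3 (not the 128-bit function from the report, and not mmh3's own code). The class name `StreamingMurmur3_32` and its methods are hypothetical. It carries the internal state and any leftover bytes across chunks, which is what chunked hashing actually needs.

```python
MASK32 = 0xFFFFFFFF
C1 = 0xCC9E2D51
C2 = 0x1B873593


def _rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_k(k):
    """Scramble one 4-byte block (or the zero-padded tail)."""
    k = (k * C1) & MASK32
    k = _rotl32(k, 15)
    return (k * C2) & MASK32


class StreamingMurmur3_32:
    """Hypothetical incremental MurmurHash3 (x86, 32-bit) for illustration."""

    def __init__(self, seed=0):
        self._h = seed & MASK32  # running state, starts at the seed
        self._buf = b""          # 0 to 3 bytes not yet forming a full block
        self._length = 0         # total bytes seen so far

    def update(self, data):
        self._length += len(data)
        data = self._buf + data
        full = len(data) - len(data) % 4
        for i in range(0, full, 4):
            # Blocks are always read as little-endian 32-bit integers.
            k = int.from_bytes(data[i:i + 4], "little")
            h = self._h ^ _mix_k(k)
            h = _rotl32(h, 13)
            self._h = (h * 5 + 0xE6546B64) & MASK32
        self._buf = data[full:]

    def intdigest(self):
        """Return the unsigned 32-bit hash of everything fed so far."""
        h = self._h
        if self._buf:
            h ^= _mix_k(int.from_bytes(self._buf, "little"))
        h ^= self._length & MASK32
        # Final avalanche step.
        h ^= h >> 16
        h = (h * 0x85EBCA6B) & MASK32
        h ^= h >> 13
        h = (h * 0xC2B2AE35) & MASK32
        h ^= h >> 16
        return h


def murmur3_32(data, seed=0):
    """One-shot helper built on the streaming class."""
    hasher = StreamingMurmur3_32(seed)
    hasher.update(data)
    return hasher.intdigest()


chunked = StreamingMurmur3_32()
chunked.update(b"foo")
chunked.update(b"bar")
same_as_one_shot = chunked.intdigest() == murmur3_32(b"foobar")
seed_chained = murmur3_32(b"bar", seed=murmur3_32(b"foo"))
```

Here `same_as_one_shot` is true because the state is carried between chunks, while `seed_chained` is a different value from `murmur3_32(b"foobar")`, which mirrors the mismatch shown in the report.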

Python 3.10 support?

Hi, would it be possible to add wheels or build support for Python 3.10? I ran into this problem when trying to build from source:

#19 51.73     Running setup.py install for mmh3: started
#19 52.09     Running setup.py install for mmh3: finished with status 'error'
#19 52.09     ERROR: Command errored out with exit status 1:
#19 52.09      command: /usr/local/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-29v4jtx8/mmh3_e21cf43dc5144a5ca51d99e23e0f7752/setup.py'"'"'; __file__='"'"'/tmp/pip-install-29v4jtx8/mmh3_e21cf43dc5144a5ca51d99e23e0f7752/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-vjnuvpe7/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.10/mmh3
#19 52.09          cwd: /tmp/pip-install-29v4jtx8/mmh3_e21cf43dc5144a5ca51d99e23e0f7752/
#19 52.09     Complete output (12 lines):
#19 52.09     running install
#19 52.09     /usr/local/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
#19 52.09       warnings.warn(
#19 52.09     running build
#19 52.09     running build_ext
#19 52.09     building 'mmh3' extension
#19 52.09     creating build
#19 52.09     creating build/temp.linux-x86_64-3.10
#19 52.09     gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.10 -c MurmurHash3.cpp -o build/temp.linux-x86_64-3.10/MurmurHash3.o
#19 52.09     gcc: fatal error: cannot execute ‘cc1plus’: execvp: No such file or directory
#19 52.09     compilation terminated.
#19 52.09     error: command '/usr/bin/gcc' failed with exit code 1
#19 52.09     ----------------------------------------
#19 52.09 ERROR: Command errored out with exit status 1: /usr/local/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-29v4jtx8/mmh3_e21cf43dc5144a5ca51d99e23e0f7752/setup.py'"'"'; __file__='"'"'/tmp/pip-install-29v4jtx8/mmh3_e21cf43dc5144a5ca51d99e23e0f7752/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-vjnuvpe7/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.10/mmh3 Check the logs for full command output.

Unable to build mmh3 on macOS Mojave

OS X Version: 10.14.2

$ pip install mmh3
Collecting mmh3
  Using cached https://files.pythonhosted.org/packages/fa/7e/3ddcab0a9fcea034212c02eb411433db9330e34d626360b97333368b4052/mmh3-2.5.1.tar.gz
Building wheels for collected packages: mmh3
  Running setup.py bdist_wheel for mmh3 ... error
  Complete output from command /Users/pranjal/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-install-qi111dnd/mmh3/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-wheel-7q_m8vvr --python-tag cp36:
  running bdist_wheel
  running build
  running build_ext
  building 'mmh3' extension
  creating build
  creating build/temp.macosx-10.7-x86_64-3.6
  gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Qunused-arguments -Qunused-arguments -I/Users/pranjal/anaconda3/include/python3.6m -c mmh3module.cpp -o build/temp.macosx-10.7-x86_64-3.6/mmh3module.o
  warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
  1 warning generated.
  gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Qunused-arguments -Qunused-arguments -I/Users/pranjal/anaconda3/include/python3.6m -c MurmurHash3.cpp -o build/temp.macosx-10.7-x86_64-3.6/MurmurHash3.o
  warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
  1 warning generated.
  creating build/lib.macosx-10.7-x86_64-3.6
  g++ -bundle -undefined dynamic_lookup -L/Users/pranjal/anaconda3/lib -arch x86_64 -L/Users/pranjal/anaconda3/lib -arch x86_64 -Qunused-arguments -Qunused-arguments -arch x86_64 build/temp.macosx-10.7-x86_64-3.6/mmh3module.o build/temp.macosx-10.7-x86_64-3.6/MurmurHash3.o -o build/lib.macosx-10.7-x86_64-3.6/mmh3.cpython-36m-darwin.so
  clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]
  ld: library not found for -lstdc++
  clang: error: linker command failed with exit code 1 (use -v to see invocation)
  error: command 'g++' failed with exit status 1

  ----------------------------------------
  Failed building wheel for mmh3
  Running setup.py clean for mmh3
Failed to build mmh3
Installing collected packages: mmh3
  Running setup.py install for mmh3 ... error
    Complete output from command /Users/pranjal/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-install-qi111dnd/mmh3/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-record-38bwkbu3/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_ext
    building 'mmh3' extension
    creating build
    creating build/temp.macosx-10.7-x86_64-3.6
    gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Qunused-arguments -Qunused-arguments -I/Users/pranjal/anaconda3/include/python3.6m -c mmh3module.cpp -o build/temp.macosx-10.7-x86_64-3.6/mmh3module.o
    warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
    1 warning generated.
    gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Qunused-arguments -Qunused-arguments -I/Users/pranjal/anaconda3/include/python3.6m -c MurmurHash3.cpp -o build/temp.macosx-10.7-x86_64-3.6/MurmurHash3.o
    warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
    1 warning generated.
    creating build/lib.macosx-10.7-x86_64-3.6
    g++ -bundle -undefined dynamic_lookup -L/Users/pranjal/anaconda3/lib -arch x86_64 -L/Users/pranjal/anaconda3/lib -arch x86_64 -Qunused-arguments -Qunused-arguments -arch x86_64 build/temp.macosx-10.7-x86_64-3.6/mmh3module.o build/temp.macosx-10.7-x86_64-3.6/MurmurHash3.o -o build/lib.macosx-10.7-x86_64-3.6/mmh3.cpython-36m-darwin.so
    clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]
    ld: library not found for -lstdc++
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    error: command 'g++' failed with exit status 1

    ----------------------------------------
Command "/Users/pranjal/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-install-qi111dnd/mmh3/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-record-38bwkbu3/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-install-qi111dnd/mmh3/

Hash function doesn't recognize signed as a keyword argument

Hi there,

I freshly installed mmh3 through Anaconda, and the mmh3.hash function raises the following error when I run "element = mmh3.hash(element, signed=False)".

TypeError: 'signed' is an invalid keyword argument for this function

Is this an issue with the Anaconda version or something else?
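
As a possible workaround when the installed build does not accept the `signed` keyword, the signed 32-bit result can be converted to its unsigned form in plain Python. This is a minimal sketch; the helper names `to_unsigned32` and `to_signed32` are made up for illustration and are not part of mmh3.

```python
def to_unsigned32(value):
    """Map a signed 32-bit integer to its unsigned equivalent."""
    return value & 0xFFFFFFFF


def to_signed32(value):
    """Map an unsigned 32-bit integer back to the signed range."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# Example value only: a negative 32-bit hash and its unsigned counterpart.
signed_hash = -156908512
unsigned_hash = to_unsigned32(signed_hash)
```

With a build that lacks the keyword, the call would then look like `to_unsigned32(mmh3.hash(element))`, assuming `mmh3.hash` returns a signed 32-bit integer by default.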

Port MurmurHash3 from C++ to C

In the course of implementing hashlib-compliant interfaces (#39), I plan to port the main code of MurmurHash3 from C++ (as originally written by Austin Appleby) to C for portability.

This actually can be done with few hassles, thanks to PEP 7 updates for Python >= 3.6; versions before 3.6 had to conform to C89 and did not officially support <stdint.h> or <inttypes.h>.

In addition, I will relicense this code from the public domain to MIT. The intent is purely to resolve issues related to the public domain and its kin licenses (#43). The text of the original public domain notice will be left for attribution and recognition.

Typing: hash64 missing x64arch argument

The code for mmh3_hash64 in mmh3module.c has a x64arch argument, but the typing file __init__.pyi does not declare this as an argument to hash64. The result is a mypy error if code uses the x64arch keyword argument.
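
A stub entry along the following lines would let mypy accept the keyword. This is only a sketch: the parameter names `seed`, `x64arch`, and `signed` appear in calls elsewhere on this page, but the annotations and default values shown here are assumptions and should be checked against mmh3module.c.

```python
from typing import Tuple, Union


# Sketch of a possible __init__.pyi entry; types and defaults are assumed.
def hash64(
    key: Union[bytes, str],
    seed: int = 0,
    x64arch: bool = True,
    signed: bool = True,
) -> Tuple[int, int]: ...
```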

Cannot install mmh3 via pip command windows

I have recently run into the same issue as the one reported here, which was closed.

Please see:

C:\Users\Luca>python -m pip install murmurhash3
Collecting murmurhash3
  Using cached https://files.pythonhosted.org/packages/b5/f4/1f9c4851667a2541bd151b8d9efef707495816274fada365fa6a31085a32/murmurhash3-2.3.5.tar.gz
Building wheels for collected packages: murmurhash3
  Running setup.py bdist_wheel for murmurhash3 ... error
  Complete output from command C:\Users\Luca\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\Luca\\AppData\\Local\\Temp\\pip-install-0ftrk0aa\\murmurhash3\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d C:\Users\Luca\AppData\Local\Temp\pip-wheel-6_gzb5c8 --python-tag cp37:
  running bdist_wheel
  running build
  running build_ext
  building 'mmh3' extension
  creating build
  creating build\temp.win-amd64-3.7
  creating build\temp.win-amd64-3.7\Release
  C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\Luca\Anaconda3\include -IC:\Users\Luca\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /EHsc /Tpmmh3module.cpp /Fobuild\temp.win-amd64-3.7\Release\mmh3module.obj
  mmh3module.cpp
  c:\users\luca\appdata\local\temp\pip-install-0ftrk0aa\murmurhash3\murmur_hash_3.hpp(5): error C2371: 'uint32_t': redefinition; different basic types
  C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\stdint.h(23): note: see declaration of 'uint32_t'
  mmh3module.cpp(9): error C2371: 'int32_t': redefinition; different basic types
  C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\stdint.h(19): note: see declaration of 'int32_t'
  mmh3module.cpp(12): error C2371: 'uint32_t': redefinition; different basic types
  C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\stdint.h(23): note: see declaration of 'uint32_t'
  error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.16.27023\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2

  ----------------------------------------
  Failed building wheel for murmurhash3
  Running setup.py clean for murmurhash3
Failed to build murmurhash3
Installing collected packages: murmurhash3
  Running setup.py install for murmurhash3 ... error
    Complete output from command C:\Users\Luca\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\Luca\\AppData\\Local\\Temp\\pip-install-0ftrk0aa\\murmurhash3\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Luca\AppData\Local\Temp\pip-record-j4aoi9ln\install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_ext
    building 'mmh3' extension
    creating build
    creating build\temp.win-amd64-3.7
    creating build\temp.win-amd64-3.7\Release
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\Luca\Anaconda3\include -IC:\Users\Luca\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /EHsc /Tpmmh3module.cpp /Fobuild\temp.win-amd64-3.7\Release\mmh3module.obj
    mmh3module.cpp
    c:\users\luca\appdata\local\temp\pip-install-0ftrk0aa\murmurhash3\murmur_hash_3.hpp(5): error C2371: 'uint32_t': redefinition; different basic types
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\stdint.h(23): note: see declaration of 'uint32_t'
    mmh3module.cpp(9): error C2371: 'int32_t': redefinition; different basic types
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\stdint.h(19): note: see declaration of 'int32_t'
    mmh3module.cpp(12): error C2371: 'uint32_t': redefinition; different basic types
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\stdint.h(23): note: see declaration of 'uint32_t'
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.16.27023\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2

I need mmh3 for h2o4gpu. I opened a question on Stack Exchange relating to this issue.

Thanks

Unable to build on mmh

It looks like a previous issue resurfaced. Downloading through pip, I'm unable to build a wheel.

[Screenshot: selection_001]

How can Golang call this?

My colleague uses this function to generate an int64. Could you please add a demo showing how Go can call this function to get the same number? Thank you in advance.

Accept Iterable for Performance

Hi, thanks so much for this very useful library! I'm using it to randomize keys for billions of objects to create condensed files that contain groups of thousands to millions of objects at a time.

https://github.com/google/neuroglancer/blob/056a3548abffc3c76c93c7a906f1603ce02b5fa3/src/neuroglancer/datasource/precomputed/sharded.md

It's not critical, but there is a bottleneck step in the front of my Python processing pipeline where the hash is applied to all object labels at once to figure out how to assign them for further processing. The hash function is dominating this calculation.

Function: murmur at line 25
Time per Hit in microseconds
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    25                                           @profile
    26                                           def murmur(x):
    27     69734      74332.0      1.1      1.6    y = uint64(x).tobytes()
    28     69734    4381932.0     62.8     97.2    y = mmh3.hash64(y, x64arch=False)
    29     69733      52731.0      0.8      1.2    return uint64(y[0])

Total time: 5.44635 s
File: REDACTED
Function: compute_shard_location at line 145
Time per Hit in microseconds
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   145                                             @profile
   146                                             def compute_shard_location(self, key):
   147     69734      99805.0      1.4      1.8      chunkid = uint64(key) >> uint64(self.preshift_bits)
   148     69734    4703010.0     67.4     86.4      chunkid = self.hashfn(chunkid)
   149     69733      60072.0      0.9      1.1      minishard_number = uint64(chunkid & self.minishard_mask)
   150     69733      97034.0      1.4      1.8      shard_number = uint64((chunkid & self.shard_mask) >> uint64(self.minishard_bits))
   151     69733     312626.0      4.5      5.7      shard_number = format(shard_number, 'x').zfill(int(np.ceil(self.shard_bits / 4.0)))
   152     69733     102740.0      1.5      1.9      remainder = chunkid >> uint64(self.minishard_bits + self.shard_bits)
   153                                           
   154     69733      71060.0      1.0      1.3      return ShardLocation(shard_number, minishard_number, remainder)

It could be possible to thread this processing, but Python has the GIL. Multiprocessing could work, though pickling and unpickling will also take some time. I was thinking that a neat way to increase throughput would be to process multiple hashes at once in C, that is, accept either a scalar or an iterable as input to the function. This would allow the compiler to autovectorize and would also avoid Python/C call overhead. I'm currently getting ~66.5k hashes/sec on Apple Silicon M1 (ARM64).

I'm thinking of an interface similar to this. The second form should return some buffer that is easy to read into numpy.

(lower, upper) = mmh3.hash64(some_bytes, x64arch=False)
[l1,h1,l2,h2] = mmh3.hash64(iterable_containing_bytes, x64arch=False)

Thank you for your consideration and for all the effort you've put into this library!
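
To pin down the output layout being requested, here is a rough pure-Python sketch of the batched form. The name `hash64_many` and the injected `hashfn` parameter are hypothetical, and this loop still pays the per-call overhead the request wants to avoid. It only illustrates the flat `[l1, h1, l2, h2, ...]` ordering, not the proposed C implementation.

```python
from typing import Callable, Iterable, List, Tuple


def hash64_many(
    keys: Iterable[bytes],
    hashfn: Callable[[bytes], Tuple[int, int]],
) -> List[int]:
    """Hash each key and return the (lower, upper) pairs flattened in order."""
    flat: List[int] = []
    for key in keys:
        lower, upper = hashfn(key)
        flat.extend((lower, upper))
    return flat


# With mmh3 installed, hashfn could be, for example:
#   lambda b: mmh3.hash64(b, x64arch=False)
# A stand-in function is used here so the sketch runs without mmh3.
def fake_hash64(key: bytes) -> Tuple[int, int]:
    return (len(key), len(key) + 1)


flat_result = hash64_many([b"a", b"bcd"], fake_hash64)
```

A flat list like this can be handed to `numpy.array(...).reshape(-1, 2)` to recover one row per key.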

Remove __version__ in mmh3module.cpp

I plan to remove the line

PyModule_AddStringConstant(module, "__version__", "3.1.0");

in mmh3module.cpp in the next non-trivial update.

The __version__ constant was introduced in 2.1 (2013-02-25), since it was in vogue back then and the only (iirc) way to get the version number of a module from within a python script.

However, Python 3.8 officially introduced importlib.metadata, which is no longer provisional as of Python 3.10. Python 3.7 still needs to pip install importlib-metadata, but it will reach EOL soon. Therefore, there is no need to keep the __version__ constant anymore. Plus, keeping the same information in multiple files is bad (very bad indeed) engineering practice.

See also
https://stackoverflow.com/a/72168209

I will bump up the version to 4.0.0 then, because the removal breaks backward compatibility, however minor it is.
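
For callers that currently read `mmh3.__version__`, a replacement based on importlib.metadata could look like the sketch below (Python 3.8 or later). The helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Returns a version string if mmh3 is installed, otherwise None.
mmh3_version = installed_version("mmh3")
```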

hash64/hash128/hash_bytes(..., x64arch = False) fail on s390x

Test cases for the functions hash64/hash128/hash_bytes fail on s390x when the arg x64arch is False (https://github.com/hajimes/mmh3/actions/runs/7396371609), likely because the architecture is big-endian.

  >       assert mmh3.hash64("foo", signed=False, x64arch=False) == (
              6968798590592097061,
              6968798590746895717,
          )
  E       assert (630394342576...8590746895717) == (696879859059...8590746895717)
  E         At index 0 diff: 6303943425762141541 != 6968798590592097061
  E         Use -v to get more diff

The result should be the same as the value in little-endian environments (feature from 4.0.0).
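
The kind of discrepancy described above can be illustrated in Python: reading a 4-byte block in native byte order gives different integers on little-endian and big-endian hosts, while an explicit little-endian read gives the same value everywhere. This is a sketch of the general idea, not the actual code path in mmh3, and the function names are hypothetical.

```python
import struct
import sys


def read_block_le(data, offset=0):
    """Read 4 bytes as a little-endian unsigned integer on any platform."""
    return int.from_bytes(data[offset:offset + 4], "little")


def read_block_native(data, offset=0):
    """Read 4 bytes in the host's native byte order (platform dependent)."""
    return struct.unpack_from("=I", data, offset)[0]


block = b"\x01\x02\x03\x04"
portable = read_block_le(block)
native = read_block_native(block)
matches_on_this_host = portable == native
```

On a little-endian machine `matches_on_this_host` is true. On a big-endian machine such as s390x it is false, which is why an endian-neutral implementation has to fix the byte order explicitly.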
