sourc7 / smhasher Goto Github PK
View Code? Open in Web Editor NEWAutomatically exported from code.google.com/p/smhasher
Automatically exported from code.google.com/p/smhasher
string "str_hash_mmh"
should give mmh3_128 value
EBC36FB5EAEF57B81E12217B6E10C8B1 (x64 Linux/Solaris/Windows/etc)
while for big-endian machines it gives
B857EFEAB56FC3EBB1C8106E7B21121E (SPARC Solaris)
Original issue reported on code.google.com by alexandr.ustinov
on 5 Dec 2014 at 4:35
Is there a class similar to CMurmurHash2A, using the MurmurHash3 algorithm, or
can you explain breifly how it might be produced? I imagine it needs 3 state
fields, namely h1, c1, and c2. I previously converted CMurmurHash2A to a struct
in C#, and plan to do the same for the new algorithm.
Thank you, Frank Hileman
Original issue reported on code.google.com by [email protected]
on 8 Jan 2011 at 5:46
as title..
because boost is a good lib to use..
QQ
Original issue reported on code.google.com by [email protected]
on 19 Aug 2012 at 6:11
It's slow in my machine. A Slackware 13.1 box with a Core i5 650 3.2 GHz.
It's only 500 MB/s, where OpenSSL MD5 is 600 MB/s.
The reason is the (broken) optimizer of gcc 4.4.4 that does things like:
g++ -g -c -O2 MurmurHash3.cpp -o MurmurHash3.o
-> .cpp
k1 *= c1;
-> .asm
152: 8d 14 80 lea (%eax,%eax,4),%edx
155: 8d 14 90 lea (%eax,%edx,4),%edx
158: c1 e2 03 shl $0x3,%edx
15b: 29 c2 sub %eax,%edx
15d: 8d 14 d2 lea (%edx,%edx,8),%edx
160: 8d 14 90 lea (%eax,%edx,4),%edx
163: 8d 14 d0 lea (%eax,%edx,8),%edx
166: 8d 14 90 lea (%eax,%edx,4),%edx
169: 8d 14 50 lea (%eax,%edx,2),%edx
16c: 8d 14 90 lea (%eax,%edx,4),%edx
16f: 8d 14 92 lea (%edx,%edx,4),%edx
172: 8d 14 50 lea (%eax,%edx,2),%edx
175: 8d 04 d0 lea (%eax,%edx,8),%eax
178: 8d 14 c5 00 00 00 00 lea 0x0(,%eax,8),%edx
17f: 29 d0 sub %edx,%eax
changing the c1,c2,c3,c4 constants as global variables, fixes that, and I get
3000 MB/s.
Original issue reported on code.google.com by [email protected]
on 12 Apr 2011 at 6:31
A change in revision 145 caused MurmurHash3 to start failing the "keyset
'Window' tests" with:
Keyset 'Windowed' - 64-bit key, 20-bit window - 64 tests, 1048576 keys per test
Window at 0 - Testing collisions - Expected 128.00, actual 1048575.00 (8192.00x) !!!!!
Window at 1 - Testing collisions - Expected 128.00, actual 1048575.00 (8192.00x) !!!!!
... etc ...
I applies the r145 changes one by one to r144 and found that the removal of the
following line from CMakeLists.txt is the cause.
set(CMAKE_CXX_FLAGS "-g -fno-strict-aliasing -Wall")
After about an hour I've come to the conclusion that strict-aliasing is complex
:). GCC has __attribute__((__may_alias__)) that I believe should allow
strict-aliasing to be enabled project wide but disabled in the places that
cause the problem, but I can't get it to work.
Adding the line back into CMakeLists.txt fixes the problem, but the logs say it
was removed to stop Visual Studio from complaining. Perhaps that problem can be
solved by some other means.
(I'm using Debian-6 with GCC 4.4.5)
Shane Day
Original issue reported on code.google.com by [email protected]
on 16 Jul 2012 at 4:33
SMHasher is reporting that Murmur3A is failing on CentOS 6 x86_64.
The standard development environment was used
(gcc 4.4.6, cmake 2.6-patch 4, GNU make 3.81)
Expected output is successful window tests, similar to
http://code.google.com/p/smhasher/wiki/MurmurHash3_x86_32
Instead, 'SMHasher Murmur3A' is reporting that window tests are failing.
Keyset 'Windowed' - 64-bit key, 20-bit window - 64 tests, 1048576 keys per
test
Window at 0 - Testing collisions - Expected 128.00, actual 1048575.00
(8192.00x) !!!!!
Window at 1 - Testing collisions - Expected 128.00, actual 1048575.00
(8192.00x) !!!!!
Window at 2 - Testing collisions - Expected 128.00, actual 1048575.00
(8192.00x) !!!!!
[...]
Window at 64 - Testing collisions - Expected 128.00, actual 1048575.00
(8192.00x) !!!!!
*********FAIL*********
Repeatable on multiple systems, using SVN version 145, CentOS 6.2,
2.6.32-220.13.1.el6.x86_64
SMHasher results attached.
Original issue reported on code.google.com by [email protected]
on 2 May 2012 at 3:23
Attachments:
What steps will reproduce the problem?
1. copied all the files in Linux-m/c(x86_64),Gcc 4.5.0
2. Compile command g++ -D__USE_XOPEN2K8 main.cpp AvalancheTest.cpp Bitslice.cpp
Bitvec.cpp City.cpp CityTest.cpp DifferentialTest.cpp Hashes.cpp KeysetTest.cpp
MurmurHash1.cpp MurmurHash2.cpp MurmurHash3.cpp Platform.cpp Random.cpp
SpeedTest.cpp Spooky.cpp Types.cpp SuperFastHash.cpp Stats.cpp -o main
3. When run it produces segment fault
What is the expected output? What do you see instead?
Expect to run murmur hash test
What version of the product are you using? On what operating system?
Linux
Please provide any additional information below.
-------------------------------------------------------------------------------
--- Testing Murmur2 (MurmurHash2 for x86, 32-bit)
[[[ Sanity Tests ]]]
Program received signal SIGSEGV, Segmentation fault.
0x000000000065c800 in memcmp@@GLIBC_2.2.5 ()
Original issue reported on code.google.com by [email protected]
on 9 May 2015 at 8:58
buffer1 and buffer2 are not freed (lines 54 and 55).
r136
Original issue reported on code.google.com by [email protected]
on 2 Jul 2011 at 2:54
Trying to port C++ implementation to C# I found a funny bug that confused me
when I compared the hashes between both implementations.
The hard coded value for the MurMurHash3_x86_32 (end probably for the rest of
methods) is not correct because of the bug in KeysetTest.cpp.
The problem is in the following piece of code (lines 28-33):
for(int i = 0; i < 256; i++)
{
key[i] = (uint8_t)i;
hash(key,i,256-i,&hashes[i*hashbytes]);
}
At first iteration the first element of key array is set to "0" and the length
of the key array is 1, but as far as i is sent to hash function as len
parameter, it is set to 0.
The correct code should look like:
hash(key,i+1,256-i,&hashes[i*hashbytes]);
and the resulting hash is 0xDBFB92BE.
Original issue reported on code.google.com by [email protected]
on 28 Feb 2013 at 4:09
If of any interest I wrote a 32bit constexpr MurmurHash3 implementation that
calculates the hash value at compile time:
https://gist.github.com/mattyclarkson/5318077
Original issue reported on code.google.com by [email protected]
on 20 May 2013 at 9:21
Hello,
I am raising a understanding query on verification code of hash function, using
the issues tab as I did not find any other appropriate place to connect with
the forum.
=====
I am trying to understand Verification Tests run on every hashing algorithm.
I see during initialization of hash function, a static verification code is
mentioned.
How did we arise at this verification code? Any pointers would be of great help
to understand it better.
e.g.
{ crc32, 32, 0x3719DB20, "crc32", "CRC-32" },
i.e. crc32 has a verification code "0x3719DB20". How did we come to this
verification code to begin with?
Thank you.
=====
Original issue reported on code.google.com by [email protected]
on 3 Jul 2015 at 6:23
Replace inline by FORCE_INLINE in rotl*
Original issue reported on code.google.com by [email protected]
on 23 Apr 2012 at 4:52
Attachments:
In City.cpp, change #include "city.h" to #include "City.h"
Otherwise it doesn't compile in *nix.
Original issue reported on code.google.com by [email protected]
on 12 Apr 2011 at 6:14
What steps will reproduce the problem?
1. Supply a negative seed value when constructing an instance of MurmurHash3
2. Compute a hash
3. Get a different hash value than the reference implementation provides for
the same input and seed
Please provide any additional information below.
Below is the current constructor for when a seed is supplied.
public Murmur3F(int seed)
{
this.seed = seed & (0xffffffffL); // unsigned
h1 = seed;
h2 = seed;
}
Here is the code from the reference implementation:
void MurmurHash3_x64_128 ( const void * key, const int len,
const uint32_t seed, void * out )
{
const uint8_t * data = (const uint8_t*)key;
const int nblocks = len / 16;
uint64_t h1 = seed;
uint64_t h2 = seed;
Note that h1 and h2 are unsigned 64-bit integers. Because they are unsigned,
they will not inherit the sign of the supplied seed. However, in the
corresponding java code, h1 and h2 will become negative if the seed is negative
(e.g. a value greater than Integer.MAX_VALUE).
A minor modification to the constructor resolves the issue, and makes the
output congrument with the reference implementation for negative seed values:
public Murmur3F(int seed)
{
this.seed = seed & (0xffffffffL); // unsigned
h1 = this.seed;
h2 = this.seed;
}
Original issue reported on code.google.com by [email protected]
on 31 Dec 2014 at 3:39
The murmurhash code is open-source, but lacking a proper license.
"Public domain is not a license", as they say.
<http://www.linuxjournal.com/article/6225>
Original issue reported on code.google.com by [email protected]
on 14 Mar 2014 at 6:21
On Solaris, with native cc, _BIG_ENDIAN or _LITTLE_ENDIAN is #define'd, but
just #define'd; there is no numeric value. This causes tests like
(_BIG_ENDIAN==1) (PMurHash.c:96) to fail with a "missing operand".
Perhaps choose BIG if _BIG_ENDIAN is defined and _LITTLE_ENDIAN is not defined,
and vice versa. Using (_BIG_ENDIAN+0==1) should get around the syntax problem
for that part of the test.
Original issue reported on code.google.com by [email protected]
on 15 Oct 2014 at 8:48
Would it be possible to have some basic test vectors for testing alternate
implementations ?
Even a simple single one, like the hash of "The quick brown fox jumps over the
lazy dog" for all the algorithm flavor would be enough.
I found a lot of wrong implementations of Murmur3 and a such simple test vector
from the official site will help a lot the developers.
Original issue reported on code.google.com by [email protected]
on 30 Apr 2011 at 2:06
Perhaps a bug in my code. I'm not able to create a sanity test that will
generate the SHA1 result from the SHA1 in SMHasher using the sha1 implemented
by Node.js. I'll create a test, but offhand I'm wondering if you're aware of
any discrepancies between your SHA1 and the OpenSSL SHA1?
Original issue reported on code.google.com by [email protected]
on 16 Aug 2013 at 3:47
More recent versions of gcc (such as 4.7.0) have started warning that functions
using the FORCE_INLINE attribute could not always be inlined.
In fact, __attribute__((always_inline)) is supposed to be used together with
`inline' instead of replace it.
The attached patch (verified to work with gcc 4.8.0 and clang 3.2) adds the
missing keyword.
Original issue reported on code.google.com by [email protected]
on 18 Apr 2013 at 12:05
Attachments:
I wanted to make sure this is intentional and not a mistake. The mixing logic
for the tail (last 1 to 3 bytes) is different from the body mixing, and I was
not sure why. Here is the tail mix:
k1 *= c1;
k1 = ROTL32(k1,16);
k1 *= c2;
h1 ^= k1;
Original issue reported on code.google.com by [email protected]
on 13 May 2011 at 10:18
City.h shouldn't include Platform.h, as that can lead to compile errors when
City.h is included in other third-party software (such as Chromium:
https://code.google.com/p/chromium/issues/detail?id=353157).
The attached patch should fix the issue, in a way analogous to what was done in
r151 for the MurmurHash functions. If this looks good to you could you please
apply it?
Original issue reported on code.google.com by [email protected]
on 24 Mar 2014 at 11:23
Attachments:
Adds:
* One-At-A-Time
* Lookup2
* Python's hash function
Also renames FNV to FNV-1a because the implemented FNV hash is the 1a variant.
Adds FNV-1 so it can be compared to the 1a variant.
Original issue reported on code.google.com by [email protected]
on 6 Jul 2014 at 6:41
Attachments:
Compiling MurmurHash3.cpp as C code instead of C++ outputs some errors that are
trivially fixed.
The attached diff makes two changes.
Added the suffixes 32 and 64 to getblock and fmix since C does not do function
overloading. Updated function calls sites to use the correct one.
Changed "uint64_t(tail[14])" style casts to "((uint64_t)tail[14])" in the
function MurmurHash3_x64_128.
Original issue reported on code.google.com by [email protected]
on 9 Jul 2012 at 2:53
Attachments:
It would be nice if a license file would be included, for MurmurHash3 and Co,
that we can package with our binary distribution of our software. I intend to
use the Hash function, which seems to be ok, given the MIT license.
Original issue reported on code.google.com by [email protected]
on 26 Jan 2011 at 12:54
What steps will reproduce the problem?
1. Rename Murmurhash3.cpp to Murmurhash3.c
2. Import Murmurhash3.h and Murmurhash3.c into Xcode
3. Attempt to compile it into a project.
What is the expected output? What do you see instead?
Should work, but to errors:
Undefined symbols for architecture i386:
"_rotl32", referenced from:
_MurmurHash3_x86_32 in MurmurHash3.o
_MurmurHash3_x86_128 in MurmurHash3.o
"_rotl64", referenced from:
_MurmurHash3_x64_128 in MurmurHash3.o
ld: symbol(s) not found for architecture i386
clang: error: linker command failed with exit code 1 (use -v to see invocation)
What version of the product are you using? On what operating system?
Copied from Subversion today (2014-05-05). Xcode 5.1.1.
Please provide any additional information below.
Was able to get it to work by making those functions static:
Index: MurmurHash3.cpp
===================================================================
--- MurmurHash3.cpp (revision 152)
+++ MurmurHash3.cpp (working copy)
@@ -31,12 +31,12 @@
#define FORCE_INLINE inline __attribute__((always_inline))
-inline uint32_t rotl32 ( uint32_t x, int8_t r )
+static inline uint32_t rotl32 ( uint32_t x, int8_t r )
{
return (x << r) | (x >> (32 - r));
}
-inline uint64_t rotl64 ( uint64_t x, int8_t r )
+static inline uint64_t rotl64 ( uint64_t x, int8_t r )
{
return (x << r) | (x >> (64 - r));
}
Original issue reported on code.google.com by [email protected]
on 5 May 2014 at 10:22
Fix description for City64
Original issue reported on code.google.com by [email protected]
on 21 Aug 2011 at 9:16
Attachments:
I was interested in using a hash for integer arrays with few numbers in them.
So I searched for any collisions in a very sample case: unique triplets of
numbers of the form (i,j,k) , where 1<= i< j < k <=255. The hash is calculated
from the simple uint32_t array made with those 3 numbers. There are dozens of
repetitions!
What steps will reproduce the problem?
-Run the enclosed C++ file
See the attached "sample output" for identified collisions.
Am I doing something wrong?
Original issue reported on code.google.com by [email protected]
on 10 Mar 2012 at 11:10
Attachments:
o it calculates the buffer size in quadwords, rounded down
nblocks = len/16;
o then the body loop iterates i up to nblocks-1:
for(int i = 0; i < nblocks; i++)
o Inside the loop, these accesses happen:
uint64_t k1 = getblock(blocks,i*2+0);
uint64_t k2 = getblock(blocks,i*2+1);
o "blocks" is a qword pointer and the getblock calls are equivalent to
blocks[i*2] and blocks[i*2+1]
o So these qword accesses will occur up to qword offset:
(nblocks-1)*2+1
...almost twice the caller's buffer size.
Original issue reported on code.google.com by [email protected]
on 2 Apr 2013 at 6:02
Your MurmurHash3 implementation uses the following tail code:
switch(len & 3)
{
case 3: k1 ^= tail[2] << 16;
case 2: k1 ^= tail[1] << 8;
case 1: k1 ^= tail[0];
k1 *= c1; k1 = ROTL32(k1,15); k1 *= c2; h1 ^= k1;
};
But this does match the (otherwise identical) MurmurHash algorithm in Wikipedia:
with any remainingBytesInKey
remainingBytes <- SwapEndianOrderOf(remainingBytesInKey)
// Note: Endian swapping is only necessary on big-endian machines.
// The purpose is to place the meaningful digits towards the low end of the value,
// so that these digits have the greatest potential to affect the low range digits
// in the subsequent multiplication. Consider that locating the meaningful digits
// in the high range would produce a greater effect upon the high digits of the
// multiplication, and notably, that such high digits are likely to be discarded
// by the modulo arithmetic under overflow. We don't want that.
remainingBytes <- remainingBytes * c1
remainingBytes <- (remainingBytes << r1) OR (remainingBytes >> (32 - r1))
remainingBytes <- remainingBytes * c2
hash <- hash XOR remainingBytes
To make the two versions match, you'd have to change the bitwise-xor in your
implementation to bitwise-or, as follows:
switch(len & 3)
{
case 3: k1 |= tail[2] << 16;
case 2: k1 |= tail[1] << 8;
case 1: k1 |= tail[0];
k1 *= c1; k1 = ROTL32(k1,15); k1 *= c2; h1 ^= k1;
};
I'm just wondering which implementation is correct. I can't figure out where
the canonical implementation of the algorithm is.
Original issue reported on code.google.com by [email protected]
on 17 Aug 2013 at 7:07
Select tests to run from command line.
Move test definitions from main.cpp to Tester.cpp
Original issue reported on code.google.com by [email protected]
on 2 Feb 2012 at 10:06
Attachments:
Compilation of MurmurHash3.cpp under MSVC2010 failed with error "redefinition
of uint32_t...".
I changed line "#if defined(_MSC_VER)" to "#if defined(_MSC_VER) && (_MSC_VER
<= 1500)" and it worked for me.
Thank you!
Original issue reported on code.google.com by [email protected]
on 20 Dec 2012 at 10:03
What steps will reproduce the problem?
Read the code.
What is the expected output? What do you see instead?
In MurmurHash3.h there are all 6 variations of MurmurHash3. However in
MurmurHash3.cpp, there are only 3 implementations.
What version of the product are you using? On what operating system?
SVN repository by 4/6/2011. Revision 125.
Please provide any additional information below.
In project homepage, update on 4/2/2011 states that "MurmurHash3, all versions,
is final." Is this suggesting that the other 3 variations are on the way? Will
be the package that contains all 6 variation released for downloading?
Original issue reported on code.google.com by [email protected]
on 8 Apr 2011 at 3:34
I've attached a patch to fix some warnings generated for a 64-bit Windows
target. It's just two explicit casts, and I verified that the types are in
range.
Original issue reported on code.google.com by [email protected]
on 21 Dec 2012 at 1:03
Attachments:
Hi,
The benchmarks: http://code.google.com/p/smhasher/wiki/MurmurHash3 show a
MurmurHash2_x64_64 which is fast but not recommended. But i wonder, is there a
MurmurHash3_x64_64? I'm guessing such a function would be about the same as
MurmurHash3_x64_128 only faster since it just needs 1 64 bit int.
Kind regards,
Mark
Original issue reported on code.google.com by [email protected]
on 16 Feb 2012 at 3:12
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.