Code Monkey home page Code Monkey logo

smhasher's People

Watchers

 avatar  avatar

smhasher's Issues

SHA1 Sanity does not agree with Node.js SHA1.

Perhaps a bug in my code. I'm not able to create a sanity test that will 
generate the SHA1 result from the SHA1 in SMHasher using the sha1 implemented 
by Node.js. I'll create a test, but offhand I'm wondering if you're aware of 
any discrepancies between your SHA1 and the OpenSSL SHA1?


Original issue reported on code.google.com by [email protected] on 16 Aug 2013 at 3:47

SmHasher breaks without -fno-strict-aliasing

A change in revision 145 caused MurmurHash3 to start failing the "keyset 
'Window' tests" with:

  Keyset 'Windowed' -  64-bit key,  20-bit window - 64 tests, 1048576 keys per test
  Window at   0 - Testing collisions   - Expected   128.00, actual 1048575.00 (8192.00x) !!!!!
  Window at   1 - Testing collisions   - Expected   128.00, actual 1048575.00 (8192.00x) !!!!!
  ... etc ...

I applies the r145 changes one by one to r144 and found that the removal of the 
following line from CMakeLists.txt is the cause.

  set(CMAKE_CXX_FLAGS "-g -fno-strict-aliasing -Wall")

After about an hour I've come to the conclusion that strict-aliasing is complex 
:). GCC has __attribute__((__may_alias__)) that I believe should allow 
strict-aliasing to be enabled project wide but disabled in the places that 
cause the problem, but I can't get it to work.

Adding the line back into CMakeLists.txt fixes the problem, but the logs say it 
was removed to stop Visual Studio from complaining. Perhaps that problem can be 
solved by some other means.

(I'm using Debian-6 with GCC 4.4.5)

Shane Day

Original issue reported on code.google.com by [email protected] on 16 Jul 2012 at 4:33

Lots of collisions with triplets of integers

I was interested in using a hash for integer arrays with few numbers in them.
So I searched for any collisions in a very sample case: unique triplets of 
numbers of the form (i,j,k) , where 1<= i< j < k <=255. The hash is calculated 
from the simple uint32_t array made with those 3 numbers. There are dozens of 
repetitions!

What steps will reproduce the problem?
-Run the enclosed C++ file
See the attached "sample output" for identified collisions.

Am I doing something wrong?

Original issue reported on code.google.com by [email protected] on 10 Mar 2012 at 11:10

Attachments:

MurmurHash3_x86_32 Algorithm Does Not match the one in Wikipedia

Your MurmurHash3 implementation uses the following tail code:

  switch(len & 3)
  {
  case 3: k1 ^= tail[2] << 16;
  case 2: k1 ^= tail[1] << 8;
  case 1: k1 ^= tail[0];
          k1 *= c1; k1 = ROTL32(k1,15); k1 *= c2; h1 ^= k1;
  };


But this does match the (otherwise identical) MurmurHash algorithm in Wikipedia:


    with any remainingBytesInKey
        remainingBytes <- SwapEndianOrderOf(remainingBytesInKey)
        // Note: Endian swapping is only necessary on big-endian machines.
        //       The purpose is to place the meaningful digits towards the low end of the value,
        //       so that these digits have the greatest potential to affect the low range digits
        //       in the subsequent multiplication.  Consider that locating the meaningful digits
        //       in the high range would produce a greater effect upon the high digits of the
        //       multiplication, and notably, that such high digits are likely to be discarded
        //       by the modulo arithmetic under overflow.  We don't want that.

        remainingBytes <- remainingBytes * c1
        remainingBytes <- (remainingBytes << r1) OR (remainingBytes >> (32 - r1))
        remainingBytes <- remainingBytes * c2

        hash <- hash XOR remainingBytes


To make the two versions match, you'd have to change the bitwise-xor in your 
implementation to bitwise-or, as follows:

  switch(len & 3)
  {
  case 3: k1 |= tail[2] << 16;
  case 2: k1 |= tail[1] << 8;
  case 1: k1 |= tail[0];
          k1 *= c1; k1 = ROTL32(k1,15); k1 *= c2; h1 ^= k1;
  };



I'm just wondering which implementation is correct.  I can't figure out where 
the canonical implementation of the algorithm is.


Original issue reported on code.google.com by [email protected] on 17 Aug 2013 at 7:07

More hashes

Adds:
* One-At-A-Time
* Lookup2
* Python's hash function

Also renames FNV to FNV-1a because the implemented FNV hash is the 1a variant.  
Adds FNV-1 so it can be compared to the 1a variant.

Original issue reported on code.google.com by [email protected] on 6 Jul 2014 at 6:41

Attachments:

MSVC2010 compilation.

Compilation of MurmurHash3.cpp under MSVC2010 failed with error "redefinition 
of uint32_t...".
I changed line "#if defined(_MSC_VER)" to "#if defined(_MSC_VER) && (_MSC_VER 
<= 1500)" and it worked for me.
Thank you!

Original issue reported on code.google.com by [email protected] on 20 Dec 2012 at 10:03

Make inline Functions static

What steps will reproduce the problem?
1. Rename Murmurhash3.cpp to Murmurhash3.c
2. Import Murmurhash3.h and Murmurhash3.c into Xcode
3. Attempt to compile it into a project.

What is the expected output? What do you see instead?

Should work, but to errors:

Undefined symbols for architecture i386:
  "_rotl32", referenced from:
      _MurmurHash3_x86_32 in MurmurHash3.o
      _MurmurHash3_x86_128 in MurmurHash3.o
  "_rotl64", referenced from:
      _MurmurHash3_x64_128 in MurmurHash3.o
ld: symbol(s) not found for architecture i386
clang: error: linker command failed with exit code 1 (use -v to see invocation)

What version of the product are you using? On what operating system?

Copied from Subversion today (2014-05-05). Xcode 5.1.1.

Please provide any additional information below.

Was able to get it to work by making those functions static:

Index: MurmurHash3.cpp
===================================================================
--- MurmurHash3.cpp (revision 152)
+++ MurmurHash3.cpp (working copy)
@@ -31,12 +31,12 @@

 #define    FORCE_INLINE inline __attribute__((always_inline))

-inline uint32_t rotl32 ( uint32_t x, int8_t r )
+static inline uint32_t rotl32 ( uint32_t x, int8_t r )
 {
   return (x << r) | (x >> (32 - r));
 }

-inline uint64_t rotl64 ( uint64_t x, int8_t r )
+static inline uint64_t rotl64 ( uint64_t x, int8_t r )
 {
   return (x << r) | (x >> (64 - r));
 }

Original issue reported on code.google.com by [email protected] on 5 May 2014 at 10:22

Very slow with gcc 4.4.4

It's slow in my machine. A Slackware 13.1 box with a Core i5 650 3.2 GHz.

It's only 500 MB/s, where OpenSSL MD5 is 600 MB/s.

The reason is the (broken) optimizer of gcc 4.4.4 that does things like:

g++ -g -c -O2 MurmurHash3.cpp -o MurmurHash3.o

-> .cpp
    k1 *= c1;
-> .asm
 152:   8d 14 80                lea    (%eax,%eax,4),%edx
 155:   8d 14 90                lea    (%eax,%edx,4),%edx
 158:   c1 e2 03                shl    $0x3,%edx
 15b:   29 c2                   sub    %eax,%edx
 15d:   8d 14 d2                lea    (%edx,%edx,8),%edx
 160:   8d 14 90                lea    (%eax,%edx,4),%edx
 163:   8d 14 d0                lea    (%eax,%edx,8),%edx
 166:   8d 14 90                lea    (%eax,%edx,4),%edx
 169:   8d 14 50                lea    (%eax,%edx,2),%edx
 16c:   8d 14 90                lea    (%eax,%edx,4),%edx
 16f:   8d 14 92                lea    (%edx,%edx,4),%edx
 172:   8d 14 50                lea    (%eax,%edx,2),%edx
 175:   8d 04 d0                lea    (%eax,%edx,8),%eax
 178:   8d 14 c5 00 00 00 00    lea    0x0(,%eax,8),%edx
 17f:   29 d0                   sub    %edx,%eax

changing the c1,c2,c3,c4 constants as global variables, fixes that, and I get 
3000 MB/s.



Original issue reported on code.google.com by [email protected] on 12 Apr 2011 at 6:31

Is there a MurmurHash3_x64_64?

Hi,

The benchmarks: http://code.google.com/p/smhasher/wiki/MurmurHash3 show a 
MurmurHash2_x64_64 which is fast but not recommended. But i wonder, is there a 
MurmurHash3_x64_64? I'm guessing such a function would be about the same as 
MurmurHash3_x64_128 only faster since it just needs 1 64 bit int.

Kind regards,
Mark

Original issue reported on code.google.com by [email protected] on 16 Feb 2012 at 3:12

Murmur3A on CentOS 6 x86_64 fails Window Tests

SMHasher is reporting that Murmur3A is failing on CentOS 6 x86_64.

The standard development environment was used
(gcc 4.4.6, cmake 2.6-patch 4, GNU make 3.81)

Expected output is successful window tests, similar to 
http://code.google.com/p/smhasher/wiki/MurmurHash3_x86_32 

Instead, 'SMHasher Murmur3A' is reporting that window tests are failing.
Keyset 'Windowed' -  64-bit key,  20-bit window - 64 tests, 1048576 keys per 
test
Window at   0 - Testing collisions   - Expected   128.00, actual 1048575.00 
(8192.00x) !!!!! 
Window at   1 - Testing collisions   - Expected   128.00, actual 1048575.00 
(8192.00x) !!!!! 
Window at   2 - Testing collisions   - Expected   128.00, actual 1048575.00 
(8192.00x) !!!!! 
[...]
Window at  64 - Testing collisions   - Expected   128.00, actual 1048575.00 
(8192.00x) !!!!! 
*********FAIL*********

Repeatable on multiple systems, using SVN version 145, CentOS 6.2, 
2.6.32-220.13.1.el6.x86_64

SMHasher results attached.

Original issue reported on code.google.com by [email protected] on 2 May 2012 at 3:23

Attachments:

Improper handling of negative seeds

What steps will reproduce the problem?
1. Supply a negative seed value when constructing an instance of MurmurHash3
2. Compute a hash
3. Get a different hash value than the reference implementation provides for 
the same input and seed



Please provide any additional information below.

Below is the current constructor for when a seed is supplied.

   public Murmur3F(int seed)
   {
      this.seed = seed & (0xffffffffL); // unsigned
      h1 = seed;
      h2 = seed;
   }

Here is the code from the reference implementation:

void MurmurHash3_x64_128 ( const void * key, const int len,
                           const uint32_t seed, void * out )
{
  const uint8_t * data = (const uint8_t*)key;
  const int nblocks = len / 16;

  uint64_t h1 = seed;
  uint64_t h2 = seed;


Note that h1 and h2 are unsigned 64-bit integers. Because they are unsigned, 
they will not inherit the sign of the supplied seed. However, in the 
corresponding java code, h1 and h2 will become negative if the seed is negative 
(e.g. a value greater than Integer.MAX_VALUE).

A minor modification to the constructor resolves the issue, and makes the 
output congrument with the reference implementation for negative seed values:

   public Murmur3F(int seed)
   {
      this.seed = seed & (0xffffffffL); // unsigned
      h1 = this.seed;
      h2 = this.seed;
   }

Original issue reported on code.google.com by [email protected] on 31 Dec 2014 at 3:39

[PATCH] Add missing `inline' keyword to FORCE_INLINE

More recent versions of gcc (such as 4.7.0) have started warning that functions 
using the FORCE_INLINE attribute could not always be inlined.

In fact, __attribute__((always_inline)) is supposed to be used together with 
`inline' instead of replace it.

The attached patch (verified to work with gcc 4.8.0 and clang 3.2) adds the 
missing keyword.

Original issue reported on code.google.com by [email protected] on 18 Apr 2013 at 12:05

Attachments:

License File Missing ;)

It would be nice if a license file would be included, for MurmurHash3 and Co, 
that we can package with our binary distribution of our software. I intend to 
use the Hash function, which seems to be ok, given the MIT license.

Original issue reported on code.google.com by [email protected] on 26 Jan 2011 at 12:54

missing license

The murmurhash code is open-source, but lacking a proper license.
"Public domain is not a license", as they say. 
<http://www.linuxjournal.com/article/6225>

Original issue reported on code.google.com by [email protected] on 14 Mar 2014 at 6:21

murmurhash3 128bit gives wrong result on big-endian platform

string "str_hash_mmh"

should give mmh3_128 value

EBC36FB5EAEF57B81E12217B6E10C8B1 (x64 Linux/Solaris/Windows/etc)

while for big-endian machines it gives

B857EFEAB56FC3EBB1C8106E7B21121E (SPARC Solaris)


Original issue reported on code.google.com by alexandr.ustinov on 5 Dec 2014 at 4:35

Class similar to CMurmurHash2A, using the MurmurHash3 algorithm?

Is there a class similar to CMurmurHash2A, using the MurmurHash3 algorithm, or 
can you explain breifly how it might be produced? I imagine it needs 3 state 
fields, namely h1, c1, and c2. I previously converted CMurmurHash2A to a struct 
in C#, and plan to do the same for the new algorithm. 

Thank you, Frank Hileman

Original issue reported on code.google.com by [email protected] on 8 Jan 2011 at 5:46

Probably bug in KeysetTest.cpp

Trying to port C++ implementation to C# I found a funny bug that confused me 
when I compared the hashes between both implementations.

The hard coded value for the MurMurHash3_x86_32 (end probably for the rest of 
methods) is not correct because of the bug in KeysetTest.cpp.

The problem is in the following piece of code (lines 28-33):

 for(int i = 0; i < 256; i++)
  {
    key[i] = (uint8_t)i;

    hash(key,i,256-i,&hashes[i*hashbytes]);
  }

At first iteration the first element of key array is set to "0" and the length 
of the key array is 1, but as far as i is sent to hash function as len 
parameter, it is set to 0.

The correct code should look like:

    hash(key,i+1,256-i,&hashes[i*hashbytes]);

and the resulting hash is 0xDBFB92BE.

Original issue reported on code.google.com by [email protected] on 28 Feb 2013 at 4:09

Wrote a constexpr MurmurHash3 function

If of any interest I wrote a 32bit constexpr MurmurHash3 implementation that 
calculates the hash value at compile time:

https://gist.github.com/mattyclarkson/5318077

Original issue reported on code.google.com by [email protected] on 20 May 2013 at 9:21

Please provided basic test vectors

Would it be possible to have some basic test vectors for testing alternate 
implementations ?

Even a simple single one, like the hash of "The quick brown fox jumps over the 
lazy dog" for all the algorithm flavor would be enough.

I found a lot of wrong implementations of Murmur3 and a such simple test vector 
from the official site will help a lot the developers.


Original issue reported on code.google.com by [email protected] on 30 Apr 2011 at 2:06

Lacking 3 MurmurHash3 variations

What steps will reproduce the problem?
Read the code.

What is the expected output? What do you see instead?
In MurmurHash3.h there are all 6 variations of MurmurHash3. However in 
MurmurHash3.cpp, there are only 3 implementations.

What version of the product are you using? On what operating system?
SVN repository by 4/6/2011. Revision 125.

Please provide any additional information below.
In project homepage, update on 4/2/2011 states that "MurmurHash3, all versions, 
is final." Is this suggesting that the other 3 variations are on the way? Will 
be the package that contains all 6 variation released for downloading?

Original issue reported on code.google.com by [email protected] on 8 Apr 2011 at 3:34

MurmurHash3_x64_128 reads past end of key buffer

o it calculates the buffer size in quadwords, rounded down
  nblocks = len/16;  

o then the body loop iterates i up to nblocks-1:
  for(int i = 0; i < nblocks; i++)

o Inside the loop, these accesses happen:
  uint64_t k1 = getblock(blocks,i*2+0);
  uint64_t k2 = getblock(blocks,i*2+1);

o "blocks" is a qword pointer and the getblock calls are equivalent to
  blocks[i*2]  and   blocks[i*2+1]

o So these qword accesses will occur up to qword offset:
  (nblocks-1)*2+1
  ...almost twice the caller's buffer size.





Original issue reported on code.google.com by [email protected] on 2 Apr 2013 at 6:02

Fixes to City.h to to allow inclusion in third-party sources

City.h shouldn't include Platform.h, as that can lead to compile errors when 
City.h is included in other third-party software (such as Chromium: 
https://code.google.com/p/chromium/issues/detail?id=353157).

The attached patch should fix the issue, in a way analogous to what was done in 
r151 for the MurmurHash functions.  If this looks good to you could you please 
apply it?

Original issue reported on code.google.com by [email protected] on 24 Mar 2014 at 11:23

Attachments:

Minor changes to allow compiling MurmurHash3.cpp as C code.

Compiling MurmurHash3.cpp as C code instead of C++ outputs some errors that are 
trivially fixed.

The attached diff makes two changes.

Added the suffixes 32 and 64 to getblock and fmix since C does not do function 
overloading. Updated function calls sites to use the correct one.

Changed "uint64_t(tail[14])" style casts to "((uint64_t)tail[14])" in the 
function MurmurHash3_x64_128.

Original issue reported on code.google.com by [email protected] on 9 Jul 2012 at 2:53

Attachments:

PMurHash.c on Solaris

On Solaris, with native cc, _BIG_ENDIAN or _LITTLE_ENDIAN is #define'd, but 
just #define'd; there is no numeric value. This causes tests like 
(_BIG_ENDIAN==1) (PMurHash.c:96) to fail with a "missing operand".

Perhaps choose BIG if _BIG_ENDIAN is defined and _LITTLE_ENDIAN is not defined, 
and vice versa. Using (_BIG_ENDIAN+0==1) should get around the syntax problem 
for that part of the test.

Original issue reported on code.google.com by [email protected] on 15 Oct 2014 at 8:48

City.h case

In City.cpp, change #include "city.h" to #include "City.h"

Otherwise it doesn't compile in *nix.

Original issue reported on code.google.com by [email protected] on 12 Apr 2011 at 6:14

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.