cyan4973 / xxhash
Extremely fast non-cryptographic hash algorithm
Home Page: http://www.xxhash.com/
License: Other
In OS X, when the Terminal window is narrower than the output of xxhsum, the "Loading filename..." display is not erased after the hash is calculated. This causes a confusing display, as shown below. Here I issue the same command twice, but the second time, the Terminal window is narrower. This is more of a problem when using a very long path or filename as input.
macpro-yosemite:A007R6QT dit$ xxhsum-dev A007C001_160108_R6QT.mov
1886a9da141ee804 A007C001_160108_R6QT.mov
macpro-yosemite:A007R6QT dit$ xxhsum-dev A007C001_160108_R6QT.mov
Loading A007C001_160108_R6QT.mov..1886a9da141ee804 A007C001_160108_R6QT.mov
Not sure why, but you've put only forward declarations of XXH64_state_s and XXH64_state_t in the header file. As a result, I can't instantiate XXH64_state_t in my code when xxhash is built as a library and I include xxhash.h.
Is there any reason for this design choice? Seems to work just fine when I move the declarations to the header itself.
It would be interesting to compare the performance of this function with PJW or ELF:
https://en.wikipedia.org/wiki/PJW_hash_function
Some source code is available at:
http://www.partow.net/downloads/GeneralHashFunctions_-_C.zip
From the NuGet Package Manager:
Package System.Data.HashFunction.xxHash 1.8.2.2 is not compatible with uap10.0.10586 (UAP,Version=v10.0.10586).
Good Morning.
We need to test a fast hash algorithm, so I have installed xxhsum. How do I install xxhash on Fedora 24 and CentOS 7.2?
Kindly advise how to install xxhash and xxHash64,
and how to test xxHash64 from the command line.
Thanks,
Chellasundar SR
As per vurtun/nuklear#285 (comment) there is an issue with forcing the user of xxHash to and I quote "reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution". This puts a burden on developers and in case of such a tiny code base does not seem to be that necessary.
Would you @Cyan4973 mind changing the BSD license to MIT, which requires referencing only in the source code?
Hi, do you have any plans to create a 128-bit version (16 bytes for the resulting hash)?
The review of API/ABI changes for xxHash since 0.5.0 version: https://abi-laboratory.pro/tracker/timeline/xxhash/
Created with the help of open-source abi-tracker tool: https://github.com/lvc/abi-tracker.
The report is updated 3 times a week. Hope it will be helpful for users and maintainers of the library.
In the development environment we've established where I'm trying to use XXHASH, we have a hard limit on source line length of 120 characters. There are a few lines in the source which are longer than this. Can these be adjusted to use additional lines (some preprocessor statements will require backslash to split a line)
If you compile xxHash with a compiler supporting -fsanitize=undefined (I used clang version 3.5)
CFLAGS=-fsanitize=undefined make
then run xxh64sum with the following input
xxh64sum <(echo -n abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq)
It looks like there is an alignment error of some sort.
xxhash.c:240:43: runtime error: member access within misaligned address 0x7fff573102d4 for type 'U64_S' (aka 'struct _U64_S'), which requires 8 byte alignment
0x7fff573102d4: note: pointer points here
18 00 00 00 69 6a 6b 6c 6a 6b 6c 6d 6b 6c 6d 6e 6c 6d 6e 6f 6d 6e 6f 70 6e 6f 70 71 6c 35 2f 62
^
xxhash.c:240:43: runtime error: load of misaligned address 0x7fff573102d4 for type 'U64' (aka 'unsigned long long'), which requires 8 byte alignment
0x7fff573102d4: note: pointer points here
18 00 00 00 69 6a 6b 6c 6a 6b 6c 6d 6b 6c 6d 6e 6c 6d 6e 6f 6d 6e 6f 70 6e 6f 70 71 6c 35 2f 62
^
I don't know enough about the xxHash internals or memory alignment to diagnose the cause.
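The address in the report (0x7fff573102d4) is 4-byte aligned but not 8-byte aligned, so a direct 64-bit load through a cast pointer is the kind of access the sanitizer objects to. A minimal sketch of the usual portable alternative is to copy the bytes into a local variable with memcpy. The helper name read_u64_unaligned is hypothetical, and this is not necessarily how xxHash itself resolves the report:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load 64 bits from an address that has no alignment
 * guarantee. Copying into a local avoids the misaligned dereference;
 * compilers typically lower the memcpy to a single load where that is legal. */
static uint64_t read_u64_unaligned(const void *p)
{
    uint64_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

The value returned is in host byte order; an implementation that must produce the same hash on big-endian machines would still need to byte-swap it.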
A question rather than an issue,
Does xxHash guarantee to produce the same output from the same input across all implementations and past/future versions?
Basically, is it viable to be used cross vendor?
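One way to reason about the question: if an implementation reads its input in an explicit byte order and uses only fixed-width unsigned arithmetic, its result cannot depend on host endianness or word size. Below is a minimal sketch of the 64-bit algorithm written that way, using the constants and rounds of the reference code. The function name xxh64_portable is hypothetical and this is not the library's actual source:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const uint64_t P1 = 11400714785074694791ULL;
static const uint64_t P2 = 14029467366897019727ULL;
static const uint64_t P3 = 1609587929392839161ULL;
static const uint64_t P4 = 9650029242287828579ULL;
static const uint64_t P5 = 2870177450012600261ULL;

static uint64_t rotl64(uint64_t x, int r) { return (x << r) | (x >> (64 - r)); }

/* Byte-wise little-endian reads: no alignment or endianness assumptions. */
static uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i) v = (v << 8) | p[i];
    return v;
}

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t round64(uint64_t acc, uint64_t in)
{
    acc += in * P2;
    acc = rotl64(acc, 31);
    return acc * P1;
}

static uint64_t merge_round(uint64_t acc, uint64_t val)
{
    acc ^= round64(0, val);
    return acc * P1 + P4;
}

/* Hypothetical name; one-shot 64-bit hash of len bytes at data. */
uint64_t xxh64_portable(const void *data, size_t len, uint64_t seed)
{
    const uint8_t *p = (const uint8_t *)data;
    size_t left = len;
    uint64_t h;

    assert(data != NULL || len == 0);
    if (left >= 32) {
        uint64_t v1 = seed + P1 + P2, v2 = seed + P2, v3 = seed, v4 = seed - P1;
        do {                                  /* four lanes, 32 bytes per pass */
            v1 = round64(v1, read_le64(p));
            v2 = round64(v2, read_le64(p + 8));
            v3 = round64(v3, read_le64(p + 16));
            v4 = round64(v4, read_le64(p + 24));
            p += 32; left -= 32;
        } while (left >= 32);
        h = rotl64(v1, 1) + rotl64(v2, 7) + rotl64(v3, 12) + rotl64(v4, 18);
        h = merge_round(h, v1); h = merge_round(h, v2);
        h = merge_round(h, v3); h = merge_round(h, v4);
    } else {
        h = seed + P5;
    }
    h += (uint64_t)len;
    for (; left >= 8; p += 8, left -= 8) {    /* remaining 8-byte words */
        h ^= round64(0, read_le64(p));
        h = rotl64(h, 27) * P1 + P4;
    }
    if (left >= 4) {                          /* one 4-byte word, if any */
        h ^= (uint64_t)read_le32(p) * P1;
        h = rotl64(h, 23) * P2 + P3;
        p += 4; left -= 4;
    }
    for (; left > 0; ++p, --left) {           /* trailing bytes */
        h ^= (uint64_t)(*p) * P5;
        h = rotl64(h, 11) * P1;
    }
    h ^= h >> 33; h *= P2;                    /* final avalanche */
    h ^= h >> 29; h *= P3;
    h ^= h >> 32;
    return h;
}
```

For the empty input with seed 0, this sketch should return 0xEF46DB3751D8E999, the commonly published XXH64 value. Whether future versions keep the same output is a question only the maintainer can answer.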
As it says above, I currently use md5sum to check files, since it's the fastest tool I have that can do that. I would like xxhsum to offer a similar mechanism, with a -c option, which would greatly speed things up for me. At the moment, I'm pretty sure all it does is hash the file, with no way to check a file against a hash without using something else.
Hi,
Just a note: the R package digest (https://github.com/eddelbuettel/digest) also implements xxHash as one of its choices, so it could be added to the list of language wrappers on https://cyan4973.github.io/xxHash/.
The header xxhash.h uses #pragma once as its inclusion guard, which is still not in the standard (it was rejected because, according to the committees, it cannot be implemented reliably), therefore it is not portable.
I would recommend using the good ol' CPP definition-based solution:
#ifndef XXHASH_H_5627135585666179
#define XXHASH_H_5627135585666179 1
/* Here goes the header content */
#endif /* XXHASH_H_5627135585666179 */
(The numbers are random ones, just to make sure the header guard identifier is unique.)
Hi, this is not an issue or a request, but I could not find any other way to contact you.
In case you care, I just wanted to let you know about another use of your awesome library: Keypirinha. Cheers
// This is a derivative work based on xxHash 0.6.2, copyright below:
/*
xxHash - Extremely Fast Hash algorithm
Header File
Copyright (C) 2012-2016, Yann Collet.
BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
You can contact the author at :
- xxHash source repository : https://github.com/Cyan4973/xxHash
*/
#ifndef NUDB_DETAIL_XXHASH_HPP
#define NUDB_DETAIL_XXHASH_HPP
#include <nudb/detail/endian.hpp>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <type_traits>
namespace nudb {
namespace detail {
#define NUDB_GCC_VERSION (__GNUC__ * 100 + __GNUC_MINOR__)
// minGW _rotl gives poor performance
#if defined(_MSC_VER)
# define NUDB_XXH_rotl64(x,r) _rotl64(x,r)
#else
# define NUDB_XXH_rotl64(x,r) ((x << r) | (x >> (64 - r)))
#endif
#if defined(_MSC_VER)
# define NUDB_XXH_swap32 _byteswap_ulong
#elif NUDB_GCC_VERSION >= 403
# define NUDB_XXH_swap32 __builtin_bswap32
#endif
#if defined(_MSC_VER)
# define NUDB_XXH_swap64 _byteswap_uint64
#elif NUDB_GCC_VERSION >= 403
# define NUDB_XXH_swap64 __builtin_bswap64
#endif
#ifndef NUDB_XXH_swap32
inline
std::uint32_t
NUDB_XXH_swap32(std::uint32_t x)
{
return ((x << 24) & 0xff000000 ) |
((x << 8) & 0x00ff0000 ) |
((x >> 8) & 0x0000ff00 ) |
((x >> 24) & 0x000000ff );
}
#endif
#ifndef NUDB_XXH_swap64
inline
std::uint64_t
NUDB_XXH_swap64(std::uint64_t x)
{
return ((x << 56) & 0xff00000000000000ULL) |
((x << 40) & 0x00ff000000000000ULL) |
((x << 24) & 0x0000ff0000000000ULL) |
((x << 8) & 0x000000ff00000000ULL) |
((x >> 8) & 0x00000000ff000000ULL) |
((x >> 24) & 0x0000000000ff0000ULL) |
((x >> 40) & 0x000000000000ff00ULL) |
((x >> 56) & 0x00000000000000ffULL);
}
#endif
static std::uint64_t constexpr prime64_1 = 11400714785074694791ULL;
static std::uint64_t constexpr prime64_2 = 14029467366897019727ULL;
static std::uint64_t constexpr prime64_3 = 1609587929392839161ULL;
static std::uint64_t constexpr prime64_4 = 9650029242287828579ULL;
static std::uint64_t constexpr prime64_5 = 2870177450012600261ULL;
// Portable and safe solution. Generally efficient.
// see : http://stackoverflow.com/a/32095106/646947
inline
std::uint32_t
XXH_read32(void const* p)
{
std::uint32_t v;
memcpy(&v, p, sizeof(v));
return v;
}
inline
std::uint64_t
XXH_read64(void const* p)
{
std::uint64_t v;
memcpy(&v, p, sizeof(v));
return v;
}
// little endian, aligned
inline
std::uint32_t
XXH_readLE32_align(void const* p, std::true_type, std::true_type)
{
return *reinterpret_cast<std::uint32_t const*>(p);
}
// little endian, unaligned
inline
std::uint32_t
XXH_readLE32_align(void const* p, std::true_type, std::false_type)
{
return XXH_read32(p);
}
// big endian, aligned
inline
std::uint32_t
XXH_readLE32_align(void const* p, std::false_type, std::true_type)
{
return NUDB_XXH_swap32(
*reinterpret_cast<std::uint32_t const*>(p));
}
// big endian, unaligned
inline
std::uint32_t
XXH_readLE32_align(void const* p, std::false_type, std::false_type)
{
return NUDB_XXH_swap32(XXH_read32(p));
}
// little endian, aligned
inline
std::uint64_t
XXH_readLE64_align(void const* p, std::true_type, std::true_type)
{
return *reinterpret_cast<std::uint64_t const*>(p);
}
// little endian, unaligned
inline
std::uint64_t
XXH_readLE64_align(void const* p, std::true_type, std::false_type)
{
return XXH_read64(p);
}
// big endian, aligned
inline
std::uint64_t
XXH_readLE64_align(void const* p, std::false_type, std::true_type)
{
return NUDB_XXH_swap64(
*reinterpret_cast<std::uint64_t const*>(p));
}
// big endian, unaligned
inline
std::uint64_t
XXH_readLE64_align(void const* p, std::false_type, std::false_type)
{
return NUDB_XXH_swap64(XXH_read64(p));
}
inline
std::uint64_t
XXH64_round(std::uint64_t acc, std::uint64_t input)
{
acc += input * prime64_2;
acc = NUDB_XXH_rotl64(acc, 31);
acc *= prime64_1;
return acc;
}
inline
std::uint64_t
XXH64_mergeRound(std::uint64_t acc, std::uint64_t val)
{
val = XXH64_round(0, val);
acc ^= val;
acc = acc * prime64_1 + prime64_4;
return acc;
}
template<bool LittleEndian, bool Aligned>
std::uint64_t
XXH64_endian_align(
void const* input, std::size_t len, std::uint64_t seed,
std::integral_constant<bool, LittleEndian> endian,
std::integral_constant<bool, Aligned> align)
{
const std::uint8_t* p = (const std::uint8_t*)input;
const std::uint8_t* const bEnd = p + len;
std::uint64_t h64;
auto const XXH_get32bits =
[](void const* p)
{
return XXH_readLE32_align(p,
decltype(endian){}, decltype(align){});
};
auto const XXH_get64bits =
[](void const* p)
{
return XXH_readLE64_align(p,
decltype(endian){}, decltype(align){});
};
if(len>=32)
{
const std::uint8_t* const limit = bEnd - 32;
std::uint64_t v1 = seed + prime64_1 + prime64_2;
std::uint64_t v2 = seed + prime64_2;
std::uint64_t v3 = seed + 0;
std::uint64_t v4 = seed - prime64_1;
do
{
v1 = XXH64_round(v1, XXH_get64bits(p)); p+=8;
v2 = XXH64_round(v2, XXH_get64bits(p)); p+=8;
v3 = XXH64_round(v3, XXH_get64bits(p)); p+=8;
v4 = XXH64_round(v4, XXH_get64bits(p)); p+=8;
}
while(p<=limit);
h64 = NUDB_XXH_rotl64(v1, 1) +
NUDB_XXH_rotl64(v2, 7) +
NUDB_XXH_rotl64(v3, 12) +
NUDB_XXH_rotl64(v4, 18);
h64 = XXH64_mergeRound(h64, v1);
h64 = XXH64_mergeRound(h64, v2);
h64 = XXH64_mergeRound(h64, v3);
h64 = XXH64_mergeRound(h64, v4);
}
else
{
h64 = seed + prime64_5;
}
h64 += len;
while(p + 8 <= bEnd)
{
std::uint64_t const k1 = XXH64_round(0, XXH_get64bits(p));
h64 ^= k1;
h64 = NUDB_XXH_rotl64(h64,27) * prime64_1 + prime64_4;
p+=8;
}
if(p+4<=bEnd)
{
h64 ^= (std::uint64_t)(XXH_get32bits(p)) * prime64_1;
h64 = NUDB_XXH_rotl64(h64, 23) * prime64_2 + prime64_3;
p+=4;
}
while(p<bEnd)
{
h64 ^= (*p) * prime64_5;
h64 = NUDB_XXH_rotl64(h64, 11) * prime64_1;
p++;
}
h64 ^= h64 >> 33;
h64 *= prime64_2;
h64 ^= h64 >> 29;
h64 *= prime64_3;
h64 ^= h64 >> 32;
return h64;
}
/* Calculate the 64-bit hash of a block of memory.
@param data A pointer to the buffer to compute the hash on.
The buffer may be unaligned.
@note This function runs faster on 64-bits systems, but slower
on 32-bits systems (see benchmark).
@param bytes The size of the buffer in bytes.
@param seed A value which may be used to permute the output.
Using a different seed with the same input will produce a
different value.
@return The 64-bit hash of the input data.
*/
template<class = void>
std::uint64_t
XXH64(void const* data, size_t bytes, std::uint64_t seed)
{
// Use faster algorithm if aligned
if((reinterpret_cast<std::uintptr_t>(data) & 7) == 0)
return XXH64_endian_align(data, bytes, seed,
is_little_endian{}, std::true_type{});
return XXH64_endian_align(data, bytes, seed,
is_little_endian{}, std::false_type{});
}
} // detail
} // nudb
#endif
As a follow-up to http://stackoverflow.com/questions/34058947/hashing-tuple-in-python-causing-different-results-in-different-systems/37360914?noredirect=1#comment64261200_37360914, it would be great to include a few sanity tests that prove that a hash value is the same when computed on different OSes and architectures. One possibility could be to use AppVeyor to run tests on Windows and QEMU on Travis to test various OSes and architectures.
If this can help, there is a Google Summer of Code project and some test setup code available in strace that could be used as a base for QEMU.
When compiling on Mac OS X 10.10, the output filename is xxHash.exe, whereas it should be xxHash.
Hi Yann,
As per the trial mail, I have installed xxh64sum and xxh32sum. I need to check an entire directory: we need the checksum value of every file and folder. How do I run it to check the directory?
[root@localhost xxHash-master]# ./xxh
xxh32sum  xxh64sum  xxhsum
[root@localhost xxHash-master]# ./xxh64sum
Wrong parameters
./xxh64sum 0.6.2 (64-bits little endian), by Yann Collet
Usage :
      ./xxh64sum [arg] [filenames]
When no filename provided, or - provided : use stdin as input
Arguments :
 -H# : hash selection : 0=32bits, 1=64bits (default: 1)
 -c  : read xxHash sums from the [filenames] and check them
 -h  : help
[root@localhost xxHash-master]#
I recently imported xxhash into my project https://github.com/paulkramme/btsoot/. I am, however, unsure whether I have fulfilled the license's demands. I mentioned xxhash in the README and put xxhash.* inside my repository. Is this all I have to do?
It's much easier for cmake-based projects to integrate xxhash if it has CMakeLists.txt.
Something like this?
cmake_minimum_required(VERSION 2.6)
cmake_policy(VERSION 2.6)
project(xxhash)
# I don't find xxHash's release version info. Let's say r39 means 0.39.
set(XXHASH_LIB_VERSION "0.39.0")
set(XXHASH_LIB_SOVERSION "0")
set(BUILD_STATIC_LIBS ON CACHE BOOL "Set to ON to also build static libraries")
if(BUILD_STATIC_LIBS)
add_library(xxhashstatic xxhash.c)
set_target_properties(xxhashstatic PROPERTIES OUTPUT_NAME xxhash)
endif(BUILD_STATIC_LIBS)
add_library(xxhash SHARED xxhash.c)
set_target_properties(xxhash PROPERTIES
COMPILE_DEFINITIONS "XXHASH_EXPORT"
VERSION "${XXHASH_LIB_VERSION}"
SOVERSION "${XXHASH_LIB_SOVERSION}")
The above example leaves out xxhsum.c, which is under the GPL and which some projects might therefore not want to include.
// edit: markdown format
Clang's AddressSanitizer flagged a global buffer overflow in my unit tests. Here's a self-contained example:
#include "xxhash.h"
int main() {
XXH64_state_t state;
XXH64_update(&state, "foo", 3);
XXH64_update(&state, "bar", 3);
XXH64_update(&state, "baz", 3);
XXH64_digest(&state);
}
I'm on OSX with Clang:
c++ -v
Apple LLVM version 7.0.0 (clang-700.0.72)
Target: x86_64-apple-darwin14.5.0
Thread model: posix
And compiled this as follows:
c++ -g -std=c++11 -stdlib=libc++ -fsanitize=address -fno-omit-frame-pointer test.cc xxhash.c
Running the executable gives me the following output:
==62514==ERROR: AddressSanitizer: global-buffer-overflow on address 0x000103b16de4 at pc 0x000103b5732a bp 0x7fff5c0ee2e0 sp 0x7fff5c0eda98
READ of size 1519340655 at 0x000103b16de4 thread T0
==62514==atos returned: An admin user name and password is required to enter Developer Mode.
#0 0x103b57329 in __asan_memcpy (/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/7.0.0/lib/darwin/libclang_rt.asan_osx_dynamic.dylib+0x39329)
#1 0x103b16c16 in XXH_memcpy(void*, void const*, unsigned long) (/private/tmp/xxHash/./a.out+0x100005c16)
#2 0x103b15070 in XXH64_update_endian(XXH64_state_t*, void const*, unsigned long, XXH_endianess) (/private/tmp/xxHash/./a.out+0x100004070)
#3 0x103b14c54 in XXH64_update (/private/tmp/xxHash/./a.out+0x100003c54)
#4 0x103b11bfc in main (/private/tmp/xxHash/./a.out+0x100000bfc)
#5 0x7fff8828d5c8 in start (/usr/lib/system/libdyld.dylib+0x35c8)
#6 0x0 (<unknown module>)
0x000103b16de4 is located 60 bytes to the left of global variable '<string literal>' defined in 'test.cc:6:24' (0x103b16e20) of size 4
'<string literal>' is ascii string 'bar'
0x000103b16de4 is located 0 bytes to the right of global variable '<string literal>' defined in 'test.cc:5:24' (0x103b16de0) of size 4
'<string literal>' is ascii string 'foo'
SUMMARY: AddressSanitizer: global-buffer-overflow ??:0 __asan_memcpy
Shadow bytes around the buggy address:
0x100020762d60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020762d70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020762d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020762d90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020762da0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x100020762db0: 00 00 00 00 00 00 00 00 00 00 00 00[04]f9 f9 f9
0x100020762dc0: f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9
0x100020762dd0: f9 f9 f9 f9 00 00 00 00 04 f9 f9 f9 f9 f9 f9 f9
0x100020762de0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020762df0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020762e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Heap right redzone: fb
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack partial redzone: f4
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==62514==ABORTING
[1] 62514 abort ./a.out
With all the casting going on, I can't tell whether that's a false positive or whether it's actually legit. I hope you can reproduce this.
In xxhsum.c, you define some Windows specific constructs when building under Cygwin. That is not correct. The purpose of Cygwin is to provide a POSIX API layer over Windows APIs (whereas the purpose of MinGW* projects is to allow Windows programs to be compiled using the GCC toolchain).
$ gcc -DXXH_PRIVATE_API -O3 -march=native xxhsum.c -o xxhsum-cygwin.exe
xxhsum.c: In function ‘BMK_hash’:
xxhsum.c:71:42: warning: implicit declaration of function ‘_fileno’ [-Wimplicit-function-declaration]
# define SET_BINARY_MODE(file) _setmode(_fileno(file), _O_BINARY)
^
xxhsum.c:523:9: note: in expansion of macro ‘SET_BINARY_MODE’
SET_BINARY_MODE(stdin);
^
xxhsum.c:71:57: error: ‘_O_BINARY’ undeclared (first use in this function)
# define SET_BINARY_MODE(file) _setmode(_fileno(file), _O_BINARY)
^
xxhsum.c:523:9: note: in expansion of macro ‘SET_BINARY_MODE’
SET_BINARY_MODE(stdin);
^
xxhsum.c:71:57: note: each undeclared identifier is reported only once for each function it appears in
# define SET_BINARY_MODE(file) _setmode(_fileno(file), _O_BINARY)
^
xxhsum.c:523:9: note: in expansion of macro ‘SET_BINARY_MODE’
SET_BINARY_MODE(stdin);
^
xxhsum.c: In function ‘main’:
xxhsum.c:72:33: warning: implicit declaration of function ‘_isatty’ [-Wimplicit-function-declaration]
# define IS_CONSOLE(stdStream) _isatty(_fileno(stdStream))
^
xxhsum.c:1226:33: note: in expansion of macro ‘IS_CONSOLE’
if ( (filenamesStart==0) && IS_CONSOLE(stdin) ) return badusage(exename);
With the following change:
$ git diff
diff --git a/xxhsum.c b/xxhsum.c
index 1928141..322cfd1 100644
--- a/xxhsum.c
+++ b/xxhsum.c
@@ -62,7 +62,7 @@
/*-************************************
* OS-Specific Includes
**************************************/
-#if defined(MSDOS) || defined(OS2) || defined(WIN32) || defined(_WIN32) || defined(__CYGWIN__)
+#if defined(MSDOS) || defined(OS2) || defined(WIN32) || defined(_WIN32)
# include <fcntl.h> /* _O_BINARY */
# include <io.h> /* _setmode, _isatty */
# ifdef __MINGW32__
We get:
$ gcc -DXXH_PRIVATE_API -O3 -march=native xxhsum.c -o xxhsum-cygwin
$ ./xxhsum-cygwin -h
./xxhsum-cygwin 0.6.2 (64-bits little endian), by Yann Collet
...
HTH.
http://www.xxhash.com should probably be changed to http://cyan4973.github.io/xxHash
Hi Yann,
Good Evening.
Hope all is well.
I tried to run the script using the xxh32sum and xxh64sum files. It shows an error message:
-bash: /root/xxHash-master/xxh64sum: Argument list too long.
We have 29,000 files in total.
How many files can xxh32sum and xxh64sum handle?
Is any alternative open-source or paid software available to compute checksums across subfolders and files?
Thanks,
Chellasundar SR
Thank-you for your incredibly fast hash!
two issues on FreeBSD;
I have a patch for the Makefile; the options may not be correct but I won't know more until the deprecation issue is resolved.
commit b0db64779234163feb539bbf40129e72fd4df205
Author: Dave Cottlehuber <[email protected]>
Date: Tue Oct 28 09:26:45 2014 +0000
FreeBSD uses clang
diff --git a/Makefile b/Makefile
index c579ea6..d564ebd 100644
--- a/Makefile
+++ b/Makefile
@@ -23,9 +23,14 @@
# xxHash.exe : benchmark program, to demonstrate xxHash speed
# ################################################################
-CC=gcc
-CFLAGS+= -I. -std=c99 -O3 -Wall -Wextra -Wundef -Wshadow -Wstrict-prototypes
-
+uname_S := $(shell sh -c 'uname -s 2>/dev/null || echo not')
+ifeq ($(uname_S),FreeBSD)
+ CXX=clang++
+ CFLAGS+=-O3 -I. -DNDEBUG -Wall -Wno-sign-compare -Wno-unused -g -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE
+else
+ CC=gcc
+ CFLAGS+= -I. -std=c99 -O3 -Wall -Wextra -Wundef -Wshadow -Wstrict-prototypes
+endif
# Define *.exe as extension for Windows systems
ifneq (,$(filter Windows%,$(OS)))
But I still get an error because sys/timeb.h is deprecated. Sorry, as I'm not a C programmer, I don't know if this is a trivial fix or a big portability nightmare.
dch /r/xxHash git:master ❯❯❯gmake
cc -O3 -I. -DNDEBUG -Wall -Wno-sign-compare -Wno-unused -g -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE xxhash.c xxhsum.c -o xxhsum
In file included from xxhsum.c:40:
/usr/include/sys/timeb.h:42:2: warning: "this file includes <sys/timeb.h> which is deprecated" [-W#warnings]
#warning "this file includes <sys/timeb.h> which is deprecated"
^
1 warning generated.
/tmp/dch/xxhsum-b4d529.o: In function `BMK_GetMilliStart':
/ramdisk/xxHash/xxhsum.c:122: undefined reference to `ftime'
/ramdisk/xxHash/xxhsum.c:122: undefined reference to `ftime'
/ramdisk/xxHash/xxhsum.c:122: undefined reference to `ftime'
/ramdisk/xxHash/xxhsum.c:122: undefined reference to `ftime'
/ramdisk/xxHash/xxhsum.c:122: undefined reference to `ftime'
/tmp/dch/xxhsum-b4d529.o:/ramdisk/xxHash/xxhsum.c:122: more undefined references to `ftime' follow
cc: error: linker command failed with exit code 1 (use -v to see invocation)
Makefile:47: recipe for target 'xxhsum' failed
gmake: *** [xxhsum] Error 1
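The link failure above comes from the legacy ftime() call used by the benchmark timer. A minimal sketch of a millisecond timer built on POSIX clock_gettime() instead, assuming CLOCK_MONOTONIC is available on the target. The names bmk_milli_now and bmk_milli_span are hypothetical, and this is not necessarily the fix adopted upstream:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime under strict -std=c99 */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical replacement for an ftime()-based timer: milliseconds read
 * from a monotonic clock. Only the difference between two calls is
 * meaningful; the absolute value has no defined epoch. */
static uint64_t bmk_milli_now(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;  /* silence unused-variable warnings when NDEBUG is defined */
    return (uint64_t)ts.tv_sec * 1000u + (uint64_t)ts.tv_nsec / 1000000u;
}

/* Milliseconds elapsed since a value previously returned by bmk_milli_now(). */
static uint64_t bmk_milli_span(uint64_t start)
{
    return bmk_milli_now() - start;
}
```

A monotonic clock is chosen here because benchmark timings should not jump when the wall clock is adjusted.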
Can you please provide your SMHasher tests? Since in my attempt xxHash32 failed the "Differential Tests": https://github.com/Bulat-Ziganshin/FARSH/blob/master/SMHasher/reports/smhasher-XXH32-report.txt#L81
I tried xxHash 0.6.1 with integration code as simple as https://github.com/Bulat-Ziganshin/FARSH/blob/master/SMHasher/xxHashTest.cpp#L4
Please do not strip the install targets from the Makefile. I am packaging xxHash for Gentoo Linux, and noticed that the Makefile is incomplete compared to what's in the git repository.
Hi,
5 months ago I created an alternative wrapper for xxHash in Ruby. Please consider adding it to http://cyan4973.github.io/xxHash/#other-languages.
The difference compared to the existing wrapper is that it stays consistent with how the Digest functions are implemented, i.e., digest produces a string instead of a number. I added idigest for producing the number instead. hexdigest is also implemented. The strings produced by digest and hexdigest are big-endian based.
Homepage: https://rubygems.org/gems/digest-xxhash
Source: https://github.com/konsolebox/digest-xxhash-ruby
Auto-generated documentation: http://www.rubydoc.info/gems/digest-xxhash/0.0.3
Is there any formal documentation of the xxhash algorithm itself? Some have concerns about the algorithm without something describing how it works.
Many Thanks,
Denis
Works great on Mac, but when used in an AWS Lambda function, this module throws an error during require():
Error: /var/task/node_modules/xxhash/build/Release/hash.node: invalid ELF header
at Error (native)
at Object.Module._extensions..node (module.js:597:18)
at Module.load (module.js:487:32)
at tryModuleLoad (module.js:446:12)
at Function.Module._load (module.js:438:3)
at Module.require (module.js:497:17)
at require (internal/module.js:20:19)
at Object.<anonymous> (/var/task/node_modules/xxhash/lib/xxhash.js:4:13)
at Module._compile (module.js:570:32)
at Object.Module._extensions..js (module.js:579:10)
The module is npm installed on mac and zipped for deployment to Lambda. Does it use something specific where it would need to be installed on AWS Linux only in order to work in Lambda? Haven't had to do that for any other modules yet.
A quick search shows we might have to npm install bcrypt? https://stackoverflow.com/questions/15809611/bcrypt-invalid-elf-header-when-running-node-app I haven't tried it as I just went for a crypto built-in hash, but if that is the case it would be nice to call out Mac support in the README of xxHash.
Original request by @42Bastian : #72
Inspired from the AVX2 discussion, I suggest following code for ARM targets:
Function:
FORCE_INLINE U32 XXH32_endian_align(const void* input, size_t len, U32 seed, XXH_endianess endian, XXH_alignment align)
if (len>=16) {
const BYTE* const limit = bEnd - 16;
const uint32_t initial[4] = {
PRIME32_1 + PRIME32_2,
PRIME32_2,
0,
-PRIME32_1
};
U32 v1;
U32 v2;
U32 v3;
U32 v4;
uint32x4_t vseed = vdupq_n_u32 (seed); // v(0,1,2,3) = seed
uint32x4_t prime1 = vdupq_n_u32(PRIME32_1); // prime1(0,1,2,3) = prime1
uint32x4_t prime2 = vdupq_n_u32(PRIME32_2); // prime2(0,1,2,3) = prime2
uint32x4_t v = vld1q_u32 (initial); // read initial into vector
uint32x4_t input;
uint32x4_t tmp;
v += vseed;
do {
input = vld1q_u32((uint32_t *)p);
p += 16;
/* round */
v = vmlaq_u32 (v, input, prime2); // seed += input * PRIME32_2;
tmp = vshrq_n_u32 (v, 19); // XXH_rotl32(seed, 13);
v = vsliq_n_u32 (tmp, v, 13);
v = vmulq_u32 (v, prime1); // seed *= PRIME32_1;
} while (p<=limit);
v1 = vgetq_lane_u32(v,0);
v2 = vgetq_lane_u32(v,1);
v3 = vgetq_lane_u32(v,2);
v4 = vgetq_lane_u32(v,3);
h32 = XXH_rotl32(v1, 1) + XXH_rotl32(v2, 7) + XXH_rotl32(v3, 12) + XXH_rotl32(v4, 18);
} else {
h32 = seed + PRIME32_5;
}
On a ZYNQ (Cortex-A9) it nearly doubles speed.
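For readers comparing the NEON loop with the scalar code, here is a small sketch of what each lane computes. NEON has no vector rotate, so the posted code builds the rotate-left by 13 from a shift right by 19 followed by a shift-left-and-insert by 13. The function names below are hypothetical and the code is plain C, not NEON:

```c
#include <assert.h>
#include <stdint.h>

#define PRIME32_1 2654435761U
#define PRIME32_2 2246822519U

/* Rotate left by r bits (0 < r < 32), written as the two shifts that the
 * vshrq_n_u32 / vsliq_n_u32 pair performs on every lane. */
static uint32_t rotl32_by_shifts(uint32_t x, unsigned r)
{
    assert(r > 0 && r < 32);
    uint32_t tmp = x >> (32 - r);  /* vshrq_n_u32(v, 19) when r == 13 */
    return tmp | (x << r);         /* vsliq_n_u32(tmp, v, 13)         */
}

/* One scalar XXH32 round for a single lane; the vector loop applies the
 * same three steps to four lanes at once. */
static uint32_t xxh32_round(uint32_t acc, uint32_t input)
{
    acc += input * PRIME32_2;          /* vmlaq_u32 */
    acc = rotl32_by_shifts(acc, 13);   /* vshrq + vsliq */
    return acc * PRIME32_1;            /* vmulq_u32 */
}
```

Because the shifted-right value occupies only the low 13 bits, inserting the left-shifted value above it gives exactly the rotation.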
Trying to build on Windows 7 x64 with Python 3 I get the following error:
python-xxhash.c
python-xxhash.c(363) : error C2099: initializer is not a constant
python-xxhash.c(687) : error C2099: initializer is not a constant
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 10.0\\VC\\Bin\\
amd64\cl.exe' failed with exit status 2
Consider semantic versioning for xxHash. See http://semver.org/
I'm supporting an xxHash port to another programming language (Java), and ought to read all commits to determine whether I should put some fixes to the port, accordingly, or not. Neither version (r39, r40, ...), release messages (https://github.com/Cyan4973/xxHash/releases) nor commit messages point if hash algorithm was changed.
If you don't want to change versioning scheme, you could just put a short sentence in release messages, if this release has exactly the same semantics (improvements only), or has semantic changes.
Not a big deal on Unix, but kinda pita on Windows.
(it's not that complicated, bit chatty)
@echo off
setlocal enabledelayedexpansion
set EXE=xxhsum.exe
:: This batch adds "-r <folder>" argument for recursive hashing,
:: otherwise acts transparent.
if "%~1"=="-r" (
:: Strip "-r" and folder name from arguments that we are going to pass to xxhsum.exe
set ARGS=%*
set ARGS=!ARGS:%~1=!
set ARGS=!ARGS:%~2=!
:: Process root dir
call :process-dir "%~2" "%~2"
:: Process subdirs
for /D /r "%~f2" %%d in (*) do (
:: Dont feed xxhsum.exe with folders that contain no files
for /F %%_ in ('dir /b /a:-d "%%d" 2^>nul') do (
:: Process subdir only once (the above for loop will cycle once for each file in %%d)
if "%%d" neq "!LAST!" (
set LAST=%%d
call :process-dir "%%d" "%~2"
)
)
)
) else (
:: Be transparent
%EXE% %*
)
goto :eof
:: Usage: process-dir <the-dir-being-processed> <the-dir-from-r-arg>
:process-dir
:: Get relative path to dir
set REL=%~f1
set REL=!REL:%~f2=%~2!
%EXE% %ARGS% "%REL%\*"
goto :eof
Sorry to bother you, but why there are no binaries yet?
I would appreciate stepping into the future with xxHash for Windows x64 on board.
There are many *.iso, *.mkv, etc waiting to be processed with your rapid checksum gem.
It's great that you try to keep it compatible with ISO C90 (ANSI C89). However, the typedefs for the fixed-width types are not C90 compatible: while you do not include stdint.h (a header added in C99), the type long long was also added in C99 and does not exist in C90.
This line here has the non-compatible definition.
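To make the point concrete, here is a minimal sketch of how fixed-width types are commonly selected by language version. The type names xxh_u8, xxh_u32 and xxh_u64 are hypothetical. The fallback branch is the one at issue: it only compiles where the compiler offers long long as an extension, which C90 itself does not guarantee:

```c
#include <assert.h>
#include <limits.h>

#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
#  include <stdint.h>                 /* C99 and later: exact-width types */
   typedef uint8_t  xxh_u8;
   typedef uint32_t xxh_u32;
   typedef uint64_t xxh_u64;
#else
   /* Strict C90 has neither <stdint.h> nor long long; this branch assumes
    * a compiler extension and assumes that int is 32 bits wide. */
   typedef unsigned char      xxh_u8;
   typedef unsigned int       xxh_u32;
   typedef unsigned long long xxh_u64;
#endif

/* Returns 1 when the selected types have the expected widths. */
static int xxh_types_ok(void)
{
    return sizeof(xxh_u8) == 1 &&
           sizeof(xxh_u32) * CHAR_BIT == 32 &&
           sizeof(xxh_u64) * CHAR_BIT == 64;
}
```

A runtime or compile-time width check of this kind at least turns a silent mismatch into a visible failure on unusual targets.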
The code generates the following warning messages under VS2015:
xxhash.cpp(565): warning C4804: '/': unsafe use of type 'bool' in operation
xxhash.cpp(576): warning C4804: '/': unsafe use of type 'bool' in operation
These warnings refer to the following line:
The seed variable is stored in the XXH32_state_s struct because it's used when XXH32_digest_endian() is called after fewer than 16 bytes. However, in that case state->v1...v4 are still unmodified, and in particular state->v3 still has its initial value, which was state->seed.
If you replace line 665 by "h32 = state->v3 + PRIME32_5;" then we can get rid of state->seed.
See my xxHash implementation at http://create.stephan-brumme.com/xxhash/ (written from scratch) for a proof-of-concept.
I haven't analyzed xxHash64 yet, but after a quick look at your code it seems we could apply the same trick.
in xxhash.h:
#if !(defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)) /* ! C99 */
# define restrict /* disable restrict */
#endif
restrict is technically a keyword only in C, not C++, yet this header needs to be includible from either. I guess most C++ compilers either accept it as an extension or don't set __STDC_VERSION__. However, one host in my build farm (a Solaris x86_64 machine with gcc 4.7.3) is getting a compile error when this header is included from C++ code:
include/xxhash.h:233:61: error: expected ',' or '...' before 'dst_state'
include/xxhash.h:234:61: error: expected ',' or '...' before 'dst_state'
According to wikipedia https://en.wikipedia.org/wiki/Restrict
C++ does not have standard support for restrict, but many compilers have equivalents that usually work in both C++ and C, such as the GNU Compiler Collection's and Clang's __restrict__, and Visual C++'s __restrict and __declspec(restrict).
I'll try fixing it tomorrow, but I'll probably do something like:
#ifdef __cplusplus
#  if defined(__GNUC__) || defined(__clang__)
#    define XXH_RESTRICT __restrict__
#  elif defined (_MSC_VER)
#    define XXH_RESTRICT __restrict
#  endif
#else
#  if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L) /* C99 */
#    define XXH_RESTRICT restrict
#  endif
#endif
#ifndef XXH_RESTRICT
#  define XXH_RESTRICT /* disable restrict */
#endif
I think defining an xxhash-specific macro like XXH_RESTRICT is less intrusive than potentially defining restrict, which might affect how system headers, etc., are interpreted.
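As a rough illustration of how such a macro might be used in a declaration, here is a small sketch. The helper demo_copy is hypothetical, and in this sketch the fallback branch defines XXH_RESTRICT itself as empty, so the keyword restrict is never redefined.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __cplusplus
#  if defined(__GNUC__) || defined(__clang__)
#    define XXH_RESTRICT __restrict__
#  elif defined(_MSC_VER)
#    define XXH_RESTRICT __restrict
#  endif
#elif defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
#  define XXH_RESTRICT restrict
#endif
#ifndef XXH_RESTRICT
#  define XXH_RESTRICT /* no restrict support: expand to nothing */
#endif

/* Hypothetical helper: dst and src must not overlap. */
static void demo_copy(unsigned char *XXH_RESTRICT dst,
                      const unsigned char *XXH_RESTRICT src,
                      size_t len)
{
    size_t i;
    assert(dst != NULL && src != NULL);
    for (i = 0; i < len; i++) {
        dst[i] = src[i];
    }
}
```

Because only the XXH_-prefixed name is defined, code and system headers that spell out restrict themselves are left alone.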
If you use the crc32 instruction properly, available since Nehalem (SSE 4.2), you can achieve a throughput of 1.17 cycles per 8 bytes, which would be a theoretical performance of 20.5 GB/s on a 3 GHz processor under ideal conditions. Source: http://www.drdobbs.com/parallel/fast-parallelized-crc-computation-using/229401411?pgno=2
Googling a little brings up this SO question, which quotes 20GB/s throughput, which matches up to the theoretical numbers very nicely: http://stackoverflow.com/questions/17645167/implementing-sse-4-2s-crc32c-in-software
Could you make a little note that hardware crc32 is actually ~3x faster than xxhash? That's not to say it's a more suitable hash algorithm, but I wasted considerable time considering a vectorized xxhash vs crc32 for checksum purposes, before I realized I couldn't come close to crc32 in performance.
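The 20.5 GB/s figure quoted above follows directly from the cycle count. A small sketch of the arithmetic (the function name is hypothetical, and it models only the ideal case with no memory or pipeline stalls):

```c
#include <assert.h>

/* Bytes per second, given the clock frequency in Hz, the cost of one
 * block in cycles, and the block size in bytes. */
static double throughput_bytes_per_sec(double clock_hz,
                                       double cycles_per_block,
                                       double bytes_per_block)
{
    assert(cycles_per_block > 0.0);
    return clock_hz / cycles_per_block * bytes_per_block;
}
```

With the numbers from the post, throughput_bytes_per_sec(3.0e9, 1.17, 8.0) evaluates to roughly 2.05e10 bytes per second, i.e. about 20.5 GB/s.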
For testing purposes, for example, I need to generate different sequences whose XXH64 hash codes collide.
I've found this thread and applied (rather blindly) the same algorithm to XXH64, so I got:
0xBA79078168D4BAF * 14029467366897019727 = 1 (mod 2^64)
-7046029288634856825 * 2^31 = 0xC2F5E54380000000 (mod 2^64)
0x9C90005B80000000 * -4417276706812531889 = -0xC2F5E54380000000 (mod 2^64)
So I add 0xBA79078168D4BAF to the 64-bit value at position P (in little-endian order) and 0x9C90005B80000000 to the value at P + 32.
But it doesn't work (the hash codes are different).
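One way to sanity-check constants like the ones above is to recompute the modular inverse independently. The sketch below is only a checking aid under that assumption: the function names are hypothetical, and it does not reproduce the collision construction itself. It relies on unsigned 64-bit arithmetic wrapping modulo 2^64.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplicative inverse of an odd value modulo 2^64, using Newton's
 * iteration x <- x * (2 - a * x). Each step doubles the number of
 * correct low-order bits. */
static uint64_t inverse_mod_2_64(uint64_t a)
{
    uint64_t x = a;   /* a * a == 1 (mod 8) for odd a: 3 correct bits */
    int i;
    assert((a & 1u) == 1u);   /* only odd values are invertible */
    for (i = 0; i < 5; i++) { /* 3 -> 6 -> 12 -> 24 -> 48 -> 96 bits */
        x *= 2u - a * x;
    }
    return x;
}

/* Returns nonzero when a * b == 1 (mod 2^64). */
static int is_inverse_pair(uint64_t a, uint64_t b)
{
    return (uint64_t)(a * b) == 1u;
}
```

Comparing inverse_mod_2_64(14029467366897019727ULL) with 0xBA79078168D4BAF would confirm or refute the first identity in the post; the remaining two identities can be checked the same way with plain uint64_t multiplication.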
Hi,
xxhash.c defines a couple of unsigned types (U16, U32, etc.) depending on the implemented C standard. Since they are also used as return types of public functions, declaration and definition differ in some cases:
xxhash.h: unsigned int XXH32_digest (const XXH32_state_t* statePtr);
xxhash.c: U32 XXH32_digest (const XXH32_state_t* statePtr);
XXH64_digest uses unsigned long long in both cases.
Wouldn't it be better to either use the U* types consistently in all declarations and definitions, or drop them altogether?
Regards,
Martin
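A minimal sketch of the first option, using one shared typedef in both the declaration and the definition. The names demo_u32 and demo_digest32 are hypothetical and are not part of xxHash; the pre-C99 branch assumes a 32-bit unsigned int.

```c
#include <assert.h>
#include <stddef.h>

/* ---- would live in the public header ---- */
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
#  include <stdint.h>
typedef uint32_t demo_u32;
#else
typedef unsigned int demo_u32;   /* assumption: unsigned int is 32 bits */
#endif

demo_u32 demo_digest32(const demo_u32 *acc);   /* declaration */

/* ---- would live in the .c file ---- */
demo_u32 demo_digest32(const demo_u32 *acc)    /* definition, same type */
{
    assert(acc != NULL);
    return *acc;
}
```

Since the typedef is visible to both files, the declared and defined return types cannot drift apart, whichever branch of the preprocessor test is taken.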
Hi all – I saw all the different programming language implementations on the xxHash website. Is anyone posting binaries for standalone programs, presumably a CLI? I'm most interested in programs I could run on Windows 10, newer releases of Fedora/RHEL, and Alpine.
All I've seen so far are some online xxHash sites, but I'd rather run local.
Any tips on which library implementations would be easiest to compile into a program on my own? Maybe the Go versions?
Hi,
I tried to install xxHash via pip and I get an error:
`Collecting xxhash
Using cached xxhash-0.6.1.tar.gz
Building wheels for collected packages: xxhash
Running setup.py bdist_wheel for xxhash ... error
Complete output from command /bin/python35 -u -c "import setuptools, tokenize;file='/tmp/pip-build-k2nynhgm/xxhash/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" bdist_wheel -d /tmp/tmpv4u1_b9wpip-wheel- --python-tag cp35:
running bdist_wheel
running build
running build_ext
building 'xxhash' extension
creating build
creating build/temp.linux-x86_64-3.5
creating build/temp.linux-x86_64-3.5/xxhash
gcc -pthread -Wno-unused-result -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -DVERSION=0.6.1 -I/usr/include/python3.5m -c python-xxhash.c -o build/temp.linux-x86_64-3.5/python-xxhash.o -std=c99 -O3 -Wall -W -Wundef -Wno-error=declaration-after-statement
gcc: error: /usr/lib/rpm/redhat/redhat-hardened-cc1: No such file or directory
error: command 'gcc' failed with exit status 1
Failed building wheel for xxhash
Running setup.py clean for xxhash
Failed to build xxhash
Installing collected packages: xxhash
Running setup.py install for xxhash ... error
Complete output from command /bin/python35 -u -c "import setuptools, tokenize;file='/tmp/pip-build-k2nynhgm/xxhash/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-trdp7asl-record/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_ext
building 'xxhash' extension
creating build
creating build/temp.linux-x86_64-3.5
creating build/temp.linux-x86_64-3.5/xxhash
gcc -pthread -Wno-unused-result -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -DVERSION=0.6.1 -I/usr/include/python3.5m -c python-xxhash.c -o build/temp.linux-x86_64-3.5/python-xxhash.o -std=c99 -O3 -Wall -W -Wundef -Wno-error=declaration-after-statement
gcc: error: /usr/lib/rpm/redhat/redhat-hardened-cc1: No such file or directory
error: command 'gcc' failed with exit status 1
----------------------------------------
Command "/bin/python35 -u -c "import setuptools, tokenize;file='/tmp/pip-build-k2nynhgm/xxhash/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-trdp7asl-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-k2nynhgm/xxhash/`
I'm attempting to use this tool to verify checksums created by Pomfort Silverstack. They never match, and the Silverstack developers say this is because they use little endian, and xxhsum uses big endian. Would it make sense to add this switch as a command-line option? Or at least specify in the documentation which version is being used, so that users aren't surprised when hashes don't match.
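To make the byte-order difference concrete, here is a small sketch that prints the same 64-bit value in both orders. The function name hash64_to_hex is hypothetical, and the sketch makes no claim about which order either tool actually uses beyond what is reported above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write the 16 hex digits of h into out (17 bytes including the NUL).
 * big_endian != 0: most significant byte first;
 * big_endian == 0: least significant byte first. */
static void hash64_to_hex(uint64_t h, int big_endian, char out[17])
{
    static const char digits[] = "0123456789abcdef";
    size_t i;
    assert(out != NULL);
    for (i = 0; i < 8; i++) {
        unsigned shift = (unsigned)(big_endian ? (7 - i) : i) * 8u;
        unsigned byte = (unsigned)((h >> shift) & 0xFFu);
        out[2 * i]     = digits[byte >> 4];
        out[2 * i + 1] = digits[byte & 0xFu];
    }
    out[16] = '\0';
}
```

For the value 0x0123456789abcdef, the big-endian form is "0123456789abcdef" and the little-endian form is "efcdab8967452301", so two tools that agree on the hash can still print strings that do not match.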
Hello,
Since XXH32_state_t and XXH64_state_t are incomplete types, memcpy doesn't work.
How about providing XXH*_copyState?
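A sketch of what such a function might look like, assuming it is implemented in the source file where the complete struct is visible. The struct layout and the names demo_state_t and demo_copyState are hypothetical stand-ins, not the library's real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the library's private state layout.
 * Callers of the public header would only see the incomplete type. */
struct demo_state_s {
    unsigned long long total_len;
    unsigned v1, v2, v3, v4;
    unsigned char buffer[16];
    unsigned buffered;
};
typedef struct demo_state_s demo_state_t;

/* Copy a state object. The size is known here because the complete
 * type is visible, which is not the case in user code. */
static void demo_copyState(demo_state_t *dst, const demo_state_t *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof(*dst));
}
```

User code would then call the copy function on pointers obtained from the library, without ever needing sizeof on the opaque type.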
I'm using 64-bit PowerPC and I'd like a 32-bit hash result over a sequence of non-contiguous strings.
Is there any loss in hash quality if I am hashing a sequence of non contiguous strings using XXH64 and simply passing the result of each hash as the seed of the next XXH64 call? Also, I would only be taking the lower 32 bits of the final result as my final single 32 bit hash value representing the sequence of strings.
Subsequent hashes that are expected to be equal will be done against the exact same sequence of strings. In other words, I have no need for the final hash of this sequence "STRING1" , "STRING2" to be the same as the final hash of "STRIN", "G1STRING2"
My current code uses CRC32 and does the above (passing the intermediate result into the next string's hash as a seed).
Thanks.
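The chaining scheme described above can be sketched as follows. To keep the example self-contained, demo_hash64 is a stand-in (a seeded FNV-1a), not XXH64; a thin wrapper around XXH64 with the same parameters could be passed instead. All names here are hypothetical, and the sketch only shows the mechanics, not an answer to the hash-quality question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Signature of a seeded 64-bit hash function. */
typedef uint64_t (*hash64_fn)(const void *data, size_t len, uint64_t seed);

/* Stand-in hash (seeded FNV-1a), NOT XXH64. */
static uint64_t demo_hash64(const void *data, size_t len, uint64_t seed)
{
    const unsigned char *p = (const unsigned char *)data;
    uint64_t h = 14695981039346656037ULL ^ seed;
    size_t i;
    for (i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Hash the strings in order, feeding each result into the next call
 * as its seed, then keep only the low 32 bits of the final value. */
static uint32_t chain_hash32(hash64_fn hash, const char *const *strings,
                             size_t count, uint64_t seed)
{
    uint64_t h = seed;
    size_t i;
    assert(hash != NULL && strings != NULL);
    for (i = 0; i < count; i++) {
        h = hash(strings[i], strlen(strings[i]), h);
    }
    return (uint32_t)(h & 0xFFFFFFFFu);
}
```

The same sequence of strings always yields the same 32-bit result, while a different split of the same bytes is not expected to, which matches the requirement stated in the post.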
From Mikkel Fahnøe Jørgensen :
I got a few compiler warnings with my default pedantic build settings that I just wanted to mention, see below.
[2/3] Building C object test/CMakeFiles/performance_lmv.dir/__/external/xxhash.c.o
../../external/xxhash.c:538:2: warning: extra ';' outside of a function [-Wextra-semi]
};
^
../../external/xxhash.c:549:2: warning: extra ';' outside of a function [-Wextra-semi]
};