
blake2s-opt

ABOUT

This is a portable, performant implementation of BLAKE2s using optimized block compression functions. The compression functions are compatible with tree/parallel mode, although only serial mode (single threaded, the common use case) is currently implemented.

BLAKE2s is a 256 bit hash, i.e. the hashes produced are 32 bytes long.

All assembler is PIC safe.

INITIALIZING

Initializing the library automatically selects the most optimized implementation that passes internal tests. It can be done in two ways, neither of which is thread safe:

  1. int blake2s_startup(void); explicitly initializes the library, and returns a non-zero value if no suitable implementation is found that passes internal tests

  2. Do nothing and use the library like normal. It will auto-initialize itself when needed, and hard exit if no suitable implementation is found.

CALLING

Common assumptions:

  • When using the incremental functions, the blake2s_state struct is assumed to be word aligned, if necessary, for the system in use.

ONE SHOT

The one-shot functions assume that in is word aligned. The incremental functions have no alignment requirement, but are slower when passed pointers that are not word aligned.
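The alignment assumption can be checked before calling a one-shot function. The following is a minimal sketch, not part of the library: the helper name is_word_aligned is hypothetical, and it assumes that "word" means sizeof(size_t) bytes, which this README does not define.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of blake2s-opt.
   Returns 1 if p is aligned to sizeof(size_t), which is ASSUMED here
   to match the machine word size; returns 0 otherwise. */
int is_word_aligned(const void *p) {
    return ((uintptr_t)p % sizeof(size_t)) == 0;
}
```

If the check fails, one option is to use the incremental functions instead, since they accept unaligned input.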

void blake2s(unsigned char *hash, const unsigned char *in, const size_t inlen);

Hashes inlen bytes from in and stores the result in hash.

void blake2s_keyed(unsigned char *hash, const unsigned char *in, const size_t inlen, const unsigned char *key, size_t keylen);

Hashes inlen bytes from in in keyed mode using key, and stores the result in hash. keylen must be <= 32.

INCREMENTAL

Incremental in buffers are not required to be word aligned. Unaligned buffers must be copied to aligned buffers, however, which incurs a speed penalty.

void blake2s_init(blake2s_state *S)

Initializes S to the default state.

void blake2s_keyed_init(blake2s_state *S, const unsigned char *key, size_t keylen)

Initializes S in keyed mode with key. keylen must be <= 32.

void blake2s_update(blake2s_state *S, const unsigned char *in, size_t inlen)

Updates the state S with inlen bytes from in.

void blake2s_final(blake2s_state *S, unsigned char *hash)

Performs the final pass on state S and stores the result in hash.

Examples

HASHING DATA WITH ONE CALL

size_t bytes = ...;
unsigned char data[...] = {...};
unsigned char hash[32];

blake2s(hash, data, bytes);

HASHING INCREMENTALLY

Incremental hashing makes multiple calls to update the hash state.

size_t bytes = ...;
unsigned char data[...] = {...};
unsigned char hash[32];
blake2s_state state;
size_t i;

blake2s_init(&state);
/* add one byte at a time, extremely inefficient */
for (i = 0; i < bytes; i++) {
    blake2s_update(&state, data + i, 1);
}
blake2s_final(&state, hash);
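Both examples leave the 32-byte result in hash as raw bytes. To print or compare it as text, it can be hex encoded. The following is a minimal sketch; to_hex is a hypothetical helper and is not provided by the library.

```c
#include <stddef.h>

/* Hypothetical helper, not part of blake2s-opt.
   Writes 2*len lowercase hex digits followed by a NUL terminator to out.
   The caller must provide at least 2*len + 1 bytes in out. */
void to_hex(char *out, const unsigned char *in, size_t len) {
    static const char digits[] = "0123456789abcdef";
    size_t i;
    for (i = 0; i < len; i++) {
        out[2 * i]     = digits[in[i] >> 4];   /* high nibble */
        out[2 * i + 1] = digits[in[i] & 0x0f]; /* low nibble */
    }
    out[2 * len] = '\0';
}
```

For a BLAKE2s hash, a char buffer of 65 bytes is enough: to_hex(text, hash, 32).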

VERSIONS

Reference

There are 3 reference versions, specialized for increasingly capable systems from 8 bit only operations (with the world's most inefficient portable carries, you really don't want to use this unless nothing else runs) to unrolled 32 bit.

x86 (32 bit)

x86-64

From what I've seen, the x86-64 compatible version is usually slower than the optimized SIMD version for that platform, so it is not included.

ARM

I attempted a NEON version, but it is almost entirely serial NEON, and the NEON latencies were too high to overcome the ARMv6 version.

BUILDING

See asm-opt#configuring for full configure options.

If you would like to use Yasm with a gcc-compatible compiler, pass --yasm to configure.

The Visual Studio projects are generated assuming Yasm is available. You will need to have Yasm.exe somewhere in your path to build them.

STATIC LIBRARY

./configure
make lib

Then run make install-lib, or copy bin/blake2s.lib and app/include/blake2s.h to your desired location.

SHARED LIBRARY

./configure --pic
make shared
make install-shared

UTILITIES / TESTING

./configure
make util
bin/blake2s-util [bench|fuzz]

BENCHMARK / TESTING

Benchmarking will implicitly test every available version. If any fail, it will exit with an error indicating which versions did not pass. Features tested include:

  • One-shot hashing
  • Incremental hashing
  • Counter handling when the 32-bit low half overflows to the upper half
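The counter case in the last item comes from BLAKE2s keeping its 64-bit byte counter as two 32-bit words. The following sketch shows the carry that such a test exercises. It is illustrative only: the type and function names are hypothetical and this is not the library's internal code.

```c
#include <stdint.h>

/* Illustrative only: a 64-bit byte counter stored as two 32-bit halves. */
typedef struct {
    uint32_t lo; /* low 32 bits */
    uint32_t hi; /* high 32 bits */
} counter64_sketch;

/* Adds inc to the counter. Unsigned addition wraps modulo 2^32, so if the
   low half ends up smaller than inc, it overflowed and the high half is
   incremented by one. */
void counter64_add(counter64_sketch *t, uint32_t inc) {
    t->lo += inc;
    t->hi += (t->lo < inc);
}
```

Starting from lo = 0xFFFFFFC0 and hi = 0, adding a 64-byte block wraps the low half to 0 and carries 1 into the high half.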

FUZZING

Fuzzing tests every available implementation for the current CPU against the reference implementation. Features tested are:

  • Arbitrary starting state
  • Arbitrary starting counter

BENCHMARKS

Only the top 3 benchmarks per mode will be shown. Anything past 3 or so is pretty irrelevant to the current architecture.

Implementation  1 byte  576 bytes  8192 bytes
SSSE3-64        433     6.10       5.91
SSSE3-32        500     6.27       5.89
SSE2-64         505     7.18       7.04
SSE2-32         575     7.42       7.05
x86-32          754     10.15      9.87

Timings are with Turbo Boost and Hyperthreading, so their accuracy is not concrete. For reference, OpenSSL and Crypto++ give ~0.8cpb for AES-128-CTR and ~1.1cpb for AES-256-CTR, ~7.4cpb for SHA-512, and ~4.5cpb for MD5.

Implementation  1 byte  576 bytes  8192 bytes
AVX-64          355     5.10       5.02
SSSE3-64        356     5.10       5.03
SSSE3-32        384     5.17       5.05
AVX-32          390     5.24       5.11
SSE2-64         411     5.92       5.88
SSE2-32         437     6.03       5.91

AMD FX-8120

Timings are with Turbo on, so accuracy is not concrete. I'm not sure how to adjust for it either. Depending on clock speed (3.1 GHz vs 4.0 GHz), OpenSSL gives 0.73cpb - 0.94cpb for AES-128-CTR, 1.03cpb - 1.33cpb for AES-256-CTR, 10.96cpb - 14.1cpb for SHA-512, and 4.7cpb - 5.16cpb for MD5.

Implementation  1 byte  576 bytes  8192 bytes
XOP-64          512     6.80       6.61
XOP-32          523     6.90       6.66
AVX-64          611     8.68       8.54
SSSE3-64        604     8.69       8.56
SSSE3-32        646     8.86       8.59
AVX-32          664     9.03       8.80

ZedBoard (Cortex-A9)

I don't have access to the cycle counter yet, so cycles are computed by taking the microseconds times the clock speed (666 MHz) divided by 1 million. For comparison, on long messages, OpenSSL 1.0.0e gives 52.3cpb for AES-128-CBC (woof), ~123cpb for SHA-512 (really woof), ~49.11cpb for SHA-256, ~16.38cpb for SHA-1, and ~9.6cpb for MD5.

Implementation  1 byte  576 bytes  8192 bytes
ARMv6-32        1014    13.26      12.87
Generic-32      1806    22.06      21.20
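The cycle estimate described for the ZedBoard reduces to a single multiplication: microseconds times the clock speed in Hz, divided by 1 million, equals microseconds times the clock speed in MHz. The following sketch shows that arithmetic. The function names are hypothetical, and dividing by the message length to get cycles per byte is an assumption about how the longer-message figures are derived, not something this README states.

```c
#include <stddef.h>

/* Hypothetical helper: estimates cycles from elapsed wall-clock time.
   microseconds * (clock_hz / 1e6) == microseconds * clock_mhz */
double cycles_from_us(double microseconds, double clock_mhz) {
    return microseconds * clock_mhz;
}

/* Hypothetical helper: cycles per byte for a message of the given length.
   Returns 0.0 for an empty message to avoid dividing by zero. */
double cycles_per_byte(double cycles, size_t bytes) {
    if (bytes == 0) {
        return 0.0;
    }
    return cycles / (double)bytes;
}
```

For example, 2 microseconds at 666 MHz is estimated as 1332 cycles. The estimate inherits the resolution of the timer, so short messages are less reliable than long ones.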

LICENSE

Public Domain, or MIT
