
FiniteStateEntropy's People

Contributors

bulat-ziganshin, cyan4973, dimitripapadopoulos, iiseymour, inikep, krzysfr, pl0q1n, rockeet, sean-purcell, thevice, timgates42, tv-s

FiniteStateEntropy's Issues

NULL pointer dereference in BIT_reloadDStream()

Crashing line:
430: bitD->bitContainer = MEM_readLEST(bitD->ptr);
in bitstream.h, in the function BIT_reloadDStream(), triggered by FSE_decompressU16() with the following code:

const uint8_t* casted2 = (const uint8_t*)"Àâ(¢x(Kùÿÿcb¿\a";
uint16_t out[256];
size_t ret = FSE_decompressU16(out, 256, casted2, 14);

This is caused by
size_t const NSize = FSE_readNCount (NCount, &maxSymbolValue, &tableLog, istart, cSrcSize);
returning zero in the following block of FSE_decompressU16():

{ size_t const NSize = FSE_readNCount (NCount, &maxSymbolValue, &tableLog, istart, cSrcSize);
if (FSE_isError(NSize)) return NSize;
ip += NSize;
cSrcSize -= NSize;
}
thus resulting in a cSrcSize of zero when entering FSE_decompressU16_usingDTable(), which is unexpected.
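
A minimal sketch of the kind of guard that could reject such inputs early; whether ERROR(srcSize_wrong) is the right code to return here is my assumption, not the library's current behavior:

{ size_t const NSize = FSE_readNCount (NCount, &maxSymbolValue, &tableLog, istart, cSrcSize);
if (FSE_isError(NSize)) return NSize;
ip += NSize;
cSrcSize -= NSize;
/* assumed guard: with nothing left after the header, BIT_reloadDStream()
   would otherwise dereference past the end of the buffer */
if (cSrcSize == 0) return ERROR(srcSize_wrong);
}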

I don't know if this can happen in the 8-bit version.

Error 39 : Decoding error : Destination buffer is too small

I spent about 10 minutes diagnosing the source code with the clang-6.0 reports, but had no luck. My best guess is that the clang diagnosis is right, and that in lib/huf_decompress.c line 422 symbol is not assigned correctly to a U16, since it was formerly stored as a BYTE while assembling sortedSymbol in HUF_readDTableX4, line 494. Is the out-of-bounds read of weight and symbol intentional?

Best Eodj

Full output:
make -C programs test
make[1]: Entering directory '/home/kvothe/Arbeit/EntropyCoder/FiniteStateEntropy/programs'
./probagen 20%
Binary file generator
Generating 1023 KB with P=20.00%
File proba.bin generated
**** compress using FSE ****
./fse -f proba.bin tmp
FSE : Finite State Entropy, 64-bits demo by Yann Collet (Mar 29 2018)
Compressed 1048575 bytes into 474414 bytes ==> 45.24%
./fse -df tmp result
FSE : Finite State Entropy, 64-bits demo by Yann Collet (Mar 29 2018)
Error 39 : Decoding error : Destination buffer is too small
Makefile:111: recipe for target 'test-fse' failed
make[1]: *** [test-fse] Error 39
make[1]: Leaving directory '/home/kvothe/Arbeit/EntropyCoder/FiniteStateEntropy/programs'
Makefile:36: recipe for target 'test' failed
make: *** [test] Error 2

Why Huffman code in reverse order?

I'm very interested in compression. Huffman coding can be performed sequentially, so I can't understand why the Huffman code is processed in reverse order in this code.

Inconsistencies in stream format

I was looking into porting the streaming format, but hit some inconsistencies.

First off, this seems like a straight-up bug: https://github.com/Cyan4973/FiniteStateEntropy/blob/dev/programs/fileio.c#L374

Note that the write is going to the wrong offsets.

Secondly, the streaming format doc states "max block size, 2^value from 0 to 0xA". This seems wrong, since the maximum block size is 16 bits, so the uncompressed block size can be stored in 2 bytes. Technically I guess you can use bigger values, but that would force you to emit "full sized" blocks, and decompressing them will fail this check.

Thirdly, it states for the regenerated size that "0 = 64 KB". This seems misleading, if I am reading the code correctly. There is no special value for 0. However, if bit 5 (full block) is set, the block size is assumed to be the size set in the stream header (1 << (n+10)). So 64 KB isn't a special value.

I will probably implement a slightly different streaming format instead, since this seems a bit too flaky, and I would like the option of bigger blocks.

Can't compile on cygwin64.

In fileio.c and in commandline.c, both of the defined(CYGWIN) lines must be removed for it to compile, as IS_CONSOLE and SET_BINARY_MODE are not supported.

Comparison to arithmetic coding?

I know the wiki says the performance is similar, but can we get a benchmark comparing processing time and compression ratio, to know exactly how well it performs against its closest competition?

FSE performance optimizations

Hello, I am working on FSE performance optimizations and trying to implement 8 states instead of the default 2, which might accelerate decompression by processing the 8 states in parallel. To handle the 8 states, I changed the bitContainer to an __m128i type and modified all the related functions. In my tests, some files compressed and decompressed correctly, but others came out corrupted, with minor differences between the decoded and the original files.
I tried to find the bug but couldn't. Would it be possible to have someone review my code, or discuss it with me? Thank you!

FSE_isError and FSE_getErrorName

I feel like I might be running into an issue with the error codes.

When FiniteStateEntropy finds that the input is simply one symbol repeated many times, it should send back a code suggesting run-length encoding.

Also, when it does not manage to make the input smaller, it should send back another code.

But FSE_isError and FSE_getErrorName don't do what I would expect here:

What is intended?

#include "fse.h"
#include <stdio.h>
#include <stdlib.h>

int
main()
{
  char* out_buffer = (char*)malloc(1000);
  if (out_buffer == NULL)
    exit(1);
  size_t maxlength = 1000;

  // gives an error code of 1?
  const char* input = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
  size_t in_length = 30;

  // gives an error code of 0?
  // const char* input = "ab";
  // size_t in_length = 2;

  // works without error
  // const char* input = "abaaaaaaaaaaaaaaaaaabaaaaaaaaa";
  // size_t in_length = 30;

  size_t size = FSE_compress(out_buffer, maxlength, input, in_length);

  printf("size = %d\n", (int)size);

  if (size <= 1) { // sometimes the size is an error code

    printf("\n[%s]\n\n", FSE_getErrorName(size));

    printf("We infer there was probably some kind of error since size = %zu\n",
           size);
    if (size == 0) {
      printf("\n\n*********************************\nFailed to get "
             "smaller?\n+++++++++++++++++++++++++++++++++\n\n");
    }
    if (size == 1) {
      printf("\n\n*********************************\nsource is a single symbol "
             "a bunch of times?\n+++++++++++++++++++++++++++++++++\n\n");
    }
  }
  if (FSE_isError(size)) { // should detect when size is an error code
    printf("Considered to be an error\n");
  } else {
    printf("Not considered to be an error\n");
  }

  return 0;
}
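
For what it's worth, fse.h documents 0 and 1 as special non-error return values: 0 means the input was judged not compressible, and 1 means the input is a single symbol repeated srcSize times (the RLE case). FSE_isError() only flags genuine error codes, so the output above is consistent with that. A sketch of distinguishing all the cases, reusing the names from the program above:

size_t const r = FSE_compress(out_buffer, maxlength, input, in_length);
if (FSE_isError(r)) {
  printf("error: %s\n", FSE_getErrorName(r)); /* a genuine error */
} else if (r == 0) {
  /* not compressible: store the input raw */
} else if (r == 1) {
  /* single symbol repeated in_length times: store as RLE */
} else {
  /* r is the compressed size */
}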

Inline FSE_abs usages

FSE_abs is defined twice, which causes problems for amalgamated compilation. Can you rename one of the definitions, or just inline those two calls?

And of course, make the same change in ZSTD.

Decoding error : Destination buffer is too small

Hello, this might be a n00b question, but when running "make" I get the following error:

./fse -f proba.bin tmp
FSE : Finite State Entropy, 64-bits demo by Yann Collet (Feb 20 2018)
Compressed 1048575 bytes into 474414 bytes ==> 45.24%
./fse -df tmp result
FSE : Finite State Entropy, 64-bits demo by Yann Collet (Feb 20 2018)
Error 39 : Decoding error : Destination buffer is too small
Makefile:111: recipe for target 'test-fse' failed
make[1]: *** [test-fse] Error 39

If I comment out the test-fse target in the Makefile, I manage to compile without error, but the encoder/decoder does not work and gives random errors (such as a segmentation fault when I use the -z flag, and "Error 39 : Decoding error : Corrupted block detected" when I use huff0).

Could you kindly tell me what may be causing the error? I am trying the dev branch.

UPDATE: I have also tried compiling it on a different machine, and it is the same problem.
Specs: i7-4790, 8 GB RAM, Ubuntu 16.04.

question: FSE doesn't always compress better than huf - expected?

I can't share the data, but in short, it's 9108363519 bytes (~9 GB) of almost-incompressible data (IIRC it's the 9 GB tail of a larger, already-compressed stream).

% ./fse -e ./almost-uncompressable
Compressed 9108363519 bytes into 8992064047 bytes ==> 98.72%
% ./fse -h ./almost-uncompressable
Compressed 9108363519 bytes into 8943423537 bytes ==> 98.19%
% ./fse -z ./almost-uncompressable
Compressed 9108363519 bytes into 8944678105 bytes ==> 98.20%

Granted, I don't know the intimate details of FSE and this is a near-pathological case, but I'd have expected the two Huffman implementations to fare rather worse than FSE on these almost-but-not-quite-uniform distributions of data.

Am I wrong?

Cheers.

FSE_compress returns 0 even when maxDstSize is still equal to or slightly larger than final compression size

While adding more unit tests, I found a corner case when compressing a document with a maxDstSize that is exactly the expected compressed size (as found by a previous compression attempt): in this test, FSE_compress(..) returns 0 (not compressible), whereas previously it was able to compress the same input given a larger destination buffer.

I'm doing a two-step process:

  1. Allocate a buffer large enough (using FSE_compressBound), compress the source by calling FSE_compress(..., maxDstSize: FSE_compressBound(ORIGINAL_SIZE)), and measure the final compressed size COMPRESSED_SIZE.
  2. Repeat the process on the same buffer and payload, but this time calling FSE_compress(..., maxDstSize: COMPRESSED_SIZE). I get a result code of 0, which means the data is not compressible.

I tried probing for the minimum size that allows compressing the buffer (which is known to be compressible), and each time I need to call FSE_compress(..) with at least COMPRESSED_SIZE + 8. At first I thought it could be a pointer alignment issue, but it is always +8 bytes whatever the compressed size is (I varied the source a bit).

In my test, the raw original size is 4,288 bytes, FSE_compressBound returns 4,833 bytes, and the compressed size is 2,821 bytes. I need to pass at least 2,821 + 8 = 2,829 bytes for FSE_compress to succeed (return value > 0).

Is this expected behavior? I'm not sure whether the "+8 rule" is real, or whether this is random chance with the inputs I'm passing in.
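
For reference, here is a condensed sketch of the two-step process described above (FSE_compress and FSE_compressBound are the real API calls; the harness around them is a placeholder):

#include <stdlib.h>
#include "fse.h"

/* Returns the result of step 2: compressing again with dstCapacity set to
   the size measured in step 1. Per the report above, this yields 0 ("not
   compressible") unless roughly COMPRESSED_SIZE + 8 bytes are provided. */
static size_t twoStepRepro(const void* src, size_t srcSize)
{
  size_t const bound = FSE_compressBound(srcSize); /* step 1: generous buffer */
  void* const dst = malloc(bound);
  if (dst == NULL) return 0;
  size_t const compressedSize = FSE_compress(dst, bound, src, srcSize);
  if (FSE_isError(compressedSize) || compressedSize <= 1) { free(dst); return 0; }
  /* step 2: exact-size destination */
  size_t const secondTry = FSE_compress(dst, compressedSize, src, srcSize);
  free(dst);
  return secondTry;
}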

Crash when FSE_MAX_MEMORY_USAGE 13

  1. Clone current master
  2. Edit FSE_MAX_MEMORY_USAGE to 13
  3. fse blows up its stack when compressing

Valgrind and gdb traces are useless. Changing DEBUGLEVEL to 1 in debug.h does not help; no assert fires. Compiling with -fstack-protector says "*** stack smashing detected ***", and gdb says it happens when exiting FSE_compress2 at ../lib/fse_compress.c:706.
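
For reproduction, step 2 amounts to changing the default value (which I believe is defined in lib/fse.h) like so:

#define FSE_MAX_MEMORY_USAGE 13 /* default is 14 */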

Integer division by zero in FSE_normalizeM2

Hi, I sometimes get an integer division by zero in FSE_normalizeM2 when using large max symbol values. Specifically, I am compiling FSE with the following compiler defines:

FSE_DEFAULT_MEMORY_USAGE=14
FSE_MAX_MEMORY_USAGE=16
FSEU16_DEFAULT_MEMORY_USAGE=14
FSEU16_MAX_MEMORY_USAGE=16
FSEU16_MAX_SYMBOL_VALUE=4095

Please find below example code that reproduces the error.

#include <stdint.h>
#include <assert.h>   /* static_assert */
#include "fseU16.h"   /* FSE_compressU16() is declared here */

void test()
{
	uint16_t sourceBuffer[64] =
	{
		4048, 1072, 1532, 1936, 2252, 2460, 2536, 2480, 2292, 1988, 1596, 1140, 668, 200, 
		207, 535, 747, 839, 795, 615, 327, 64, 512, 988, 1456, 1868, 2208, 2432, 2532,
		2500, 2332, 2048, 1672, 1224, 748, 280, 139, 483, 723, 831, 811, 659, 383, 11,
		432, 904, 1376, 1800, 2152, 2404, 2524, 2516, 2372, 2104, 1744, 1304, 832, 364,
		71, 431, 687, 823, 827, 263
	};
	uint16_t destinationBuffer[64];
	static_assert(FSEU16_MAX_SYMBOL_VALUE == 4095, "Custom fseu16 max symbol value");
	FSE_compressU16(destinationBuffer, 64, sourceBuffer, 64, 4048, 13);
	/* -> Integer division by zero at fse_compress.c line 540 (FSE_normalizeM2) */
}

Thanks,
Markus

FSE_compressU16() produces clipped data when dstCapacity differs

Repro:

constexpr size_t inSize = 24;
uint16_t in[inSize] =
  {0, 0, 3, 2, 0, 0, 0, 0, 314, 0, 0, 0, 0, 51, 50, 0, 0, 59, 22, 36, 0, 55, 32, 22};
unsigned int maxSymbolValue = 314;
unsigned int outSize = 48;

uint8_t out1[256];
uint8_t out2[256];
size_t numBytes1 = FSE_compressU16(out1, outSize, in, inSize, maxSymbolValue, 0);
size_t numBytes2 = FSE_compressU16(out2, 256, in, inSize, maxSymbolValue, 0);
// numBytes1 is now 34 bytes
// numBytes2 is now 43 bytes

FSE_decompressU16(in, inSize, out2, numBytes2);
FSE_decompressU16(in, inSize, out1, numBytes1); // <- crashes

When 48 rather than 256 is passed as the output capacity to FSE_compressU16(), the generated bytes are identical up to the first 34 bytes; the rest are clipped. The difference in code paths is that in the 34-byte case, the non-fast path is chosen.

resync with changes from zstd

zstd has accumulated a lot of small fixes and changes to its embedded fse/huff0. It would be great to see those backported to the standalone FiniteStateEntropy, if feasible.

Build fails with link errors

I fetched the latest code today and tried cd test; make, but this failed with undefined-symbol errors:

~/git/FiniteStateEntropy/test$ make
gcc      -O3 -I. -std=c99 -Wall -W -Wundef bench.c commandline.c fileio.c lz4hce.c xxhash.c ../fse.c -o fse
Undefined symbols:
  "_FSED_compressU16", referenced from:
      _BMK_benchMemU16 in ccynhgmC.o
  "_FSED_decompressU16", referenced from:
      _BMK_benchMemU16 in ccynhgmC.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
make: *** [fse] Error 1

This change seems to resolve the problem:

diff --git a/test/Makefile b/test/Makefile
index bd63dcc..089c55e 100644
--- a/test/Makefile
+++ b/test/Makefile
@@ -40,10 +40,10 @@ default: fse

 all: fse fse32 probagen

-fse: bench.c commandline.c fileio.c lz4hce.c xxhash.c ../fse.c
+fse: bench.c commandline.c fileio.c lz4hce.c xxhash.c fseDist.c ../fse.c
        $(CC)      -O3 $(CFLAGS) $^ -o $@$(EXT)

-fse32: bench.c commandline.c fileio.c lz4hce.c xxhash.c ../fse.c
+fse32: bench.c commandline.c fileio.c lz4hce.c xxhash.c fseDist.c ../fse.c
        $(CC) -m32 -O3 $(CFLAGS) $^ -o $@$(EXT)

 probagen: probaGenerator.c

Feature: high-level pseudocode in README

It would be great if you could add pseudocode to the README. I'd love to understand how the algorithm works, but there's too much C for me to understand, and the whitepaper is not good for a layman. :/

Build fails because 32 bit absolute addressing isn't supported

I'm on OS X and just did a fresh clone of the repo; clang outputs the following when calling make.

I'm currently using the Xcode 8 beta, but this issue has been ongoing for a while.

make CFLAGS="-march=native -ofast" LDFLAGS="-flto"

/Applications/Xcode-beta.app/Contents/Developer/usr/bin/make -C programs test
cc -I../lib -march=native -ofast -flto probaGenerator.c -o probagen
cc -I../lib -march=native -ofast -flto ../lib/huf_decompress.c ../lib/entropy_common.c bench.c commandline.c fileio.c xxhash.c zlibh.c ../lib/fse_decompress.c ../lib/fse_compress.c ../lib/fseU16.c ../lib/huf_compress.c -o fse
./probagen 20%
Binary file generator
Generating 1023 KB with P=20.00%
File proba.bin generated
**** compress using FSE ****
./fse -f proba.bin tmp
FSE : Finite State Entropy, 64-bits demo by Yann Collet (Sep 4 2016)
Compressed 1048575 bytes into 474414 bytes ==> 45.24%
./fse -df tmp result
FSE : Finite State Entropy, 64-bits demo by Yann Collet (Sep 4 2016)
Decoded 1048575 bytes
diff proba.bin result
**** compress using HUF ****
./fse -fh proba.bin tmp
FSE : Finite State Entropy, 64-bits demo by Yann Collet (Sep 4 2016)
Compressed 1048575 bytes into 478412 bytes ==> 45.62%
./fse -df tmp result
FSE : Finite State Entropy, 64-bits demo by Yann Collet (Sep 4 2016)
Decoded 1048575 bytes
diff proba.bin result
**** compress using zlibh ****
./fse -fz proba.bin tmp
FSE : Finite State Entropy, 64-bits demo by Yann Collet (Sep 4 2016)
Compressed 1048575 bytes into 478213 bytes ==> 45.61%
./fse -df tmp result
FSE : Finite State Entropy, 64-bits demo by Yann Collet (Sep 4 2016)
Decoded 1048575 bytes
diff proba.bin result
rm result
rm proba.bin
rm tmp
cc -I../lib -march=native -ofast -flto fullbench.c xxhash.c ../lib/fse_decompress.c ../lib/fse_compress.c ../lib/fseU16.c ../lib/huf_compress.c ../lib/huf_decompress.c ../lib/entropy_common.c -o fullbench
LLVM ERROR: 32-bit absolute addressing is not supported in 64-bit mode
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[1]: *** [fullbench] Error 1
make: *** [test] Error 2

argument order bug?!?

On line 466 of fse_compress.c:
FSE_CTable* FSE_createCTable (unsigned maxSymbolValue, unsigned tableLog)

Then on line 186 of fse.h:
FSE_PUBLIC_API FSE_CTable* FSE_createCTable (unsigned tableLog, unsigned maxSymbolValue);

The arguments are in a different order. Which one is correct? Are callers of this function passing them backwards?

Compression wedges

Commit d5ff8d4 works for me, but the latest one wedges because nbSymbols becomes -1. This is with the "fse foo -o bar" syntax. I think it's related to using the 286-symbol alphabet.

Architecture specific optimizations

Hello, would it be possible to add ARM SIMD (NEON) routines to the huff0 and/or FSE encode/decode paths? That way, they could run a bit faster on a Raspberry Pi.

cmake tree

I'd like to upstream CMake build support; how do I go about doing that?

benchmark mode claims a decompression error on this data w/huf

 % ./fse -h -b ./xxx    
FSE : Finite State Entropy, 64-bits demo by Yann Collet (Jul 17 2020)
!! Error decompressing block 4 of cSize 18041 !! => (Corrupted block detected)

gunzip the below file and run the above.
xxx.gz
The 'xxx' file appears to survive a huf compress and then a huf decompress intact when doing them individually, so perhaps this is an issue specific to benchmark mode.

Tested with 3865a70

Could use updated benchmarks; identify OS

A few notes on the benchmarks:

They're out of date

  • The benchmarks here on the main GitHub page have been stuck on Zstd 0.8.2 and brotli 0.4 for a while.

  • The benchmarks on the separate Zstd website use Zstd 1.0.0, and still brotli 0.4. (The website I'm referring to is listed as zstd.net, but redirects to http://facebook.github.io/zstd/ – github.io is not the GitHub you are on right now, but is their custom web hosting.)

Given all the releases since those benchmarks, it would be helpful to have updated results for Zstd ≥ 1.1.4 and brotli ≥ 0.5.2.

Different compilers

Note also that the benchmarks here on GitHub use two different compilers: gcc 5.4 for the table, and gcc 5.2.1 for the graphs. Unless there's a compelling reason, there's no point in introducing a confound like two different compilers.

No operating system

There's no mention of an OS in any of the benchmarks, not even broad families like Linux vs. Windows.

Summary

Ideally, what we want is: updated builds, the same (and updated) compiler, the OS identified, the amount and type of memory reported, and whether storage is an SSD or a spinning disk (unless the benchmarks are in-memory only).

I've got some useful machines for benchmarking, but lzbench only works with gcc (on Windows). I think @inikep and others would be better for gcc benchmarks. I could do Visual Studio 2017 or 2015 benchmarks.

FSEU16_MAX_MEMORY_USAGE is defined in fseU16.c, not in fseU16.h

While testing the FSE 16-bit version, I found an issue with FSE_MAX_MEMORY_USAGE.
This is the code I tested.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include "fseU16.h"

int main() {
    // Change the FSEU16_MAX_SYMBOL_VALUE to 4095
    int n = 10000, max = 4095;
    uint16_t *ip = malloc(sizeof(uint16_t) * n);
    uint8_t *op = malloc(sizeof(uint8_t) * 2 * n);

    for (int i=0; i<n; i++){
        ip[i] = i % max;
    }

    size_t compress = FSE_compressU16(op, 2 * n, ip, n, max, 0);
    printf("Code : %zu\n", compress);

    free(ip);
    free(op);
}

I include fseU16.h to use the 16-bit version, so I expected FSE_MAX_MEMORY_USAGE to be 15.
However, FSEU16_MAX_MEMORY_USAGE is defined inside fseU16.c itself, so in this code FSE_MAX_MEMORY_USAGE is 14, via fse.h.
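
To illustrate my reading of the situation (this is a paraphrase of the visibility problem, not the actual file contents):

/* fseU16.c -- translation-unit local, so callers never see these: */
#define FSEU16_MAX_MEMORY_USAGE 15
#define FSE_MAX_MEMORY_USAGE FSEU16_MAX_MEMORY_USAGE

/* user code -- only fseU16.h is included, so fse.h's default of 14 applies: */
#include "fseU16.h"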

Before I open a PR about this, I would like to ask whether this is intended.

GCC and LLVM CLANG Stats

Hi,

I ran these on a mediocre 1.3 GHz Core2Duo with 4 GB DDR on Gentoo Linux with the latest kernel, and thought that sharing these stats might be useful. I have not integrated Yeppp! into FSE, but would be curious whether it could bring any advantage. (The -lyeppp flag was used in the hope that it would have a positive effect.) I have tried various combinations of gcc/clang flags, and none had a positive effect, except -O2 in combination with -lyeppp, and most visibly the use of -funroll-loops with clang 3.3.

FSE : Finite State Entropy, capability demo by Yann Collet (Jan 12 2014)

File already compressed

GCC
../data/win98-lz : 4671615 -> 4671758 (100.0%), 73.0 MB/s , 1420.2 MB/s
GCC -funroll-loops
../data/win98-lz : 4671615 -> 4671758 (100.0%), 76.5 MB/s , 1405.2 MB/s
GCC -funroll-loops -lyeppp
../data/win98-lz : 4671615 -> 4671758 (100.0%), 75.9 MB/s , 1420.9 MB/s

CLANG
../data/win98-lz : 4671615 -> 4671758 (100.0%), 78.4 MB/s , 1409.0 MB/s
CLANG -funroll-loops
../data/win98-lz : 4671615 -> 4671758 (100.0%), 78.4 MB/s , 1418.3 MB/s
CLANG -funroll-loops -lyeppp
../data/win98-lz : 4671615 -> 4671758 (100.0%), 78.3 MB/s , 1431.4 MB/s

File is uncompressed

GCC
../data/win98-lz : 12536244 -> 4671591 (37.26%), 73.0 MB/s , 96.9 MB/s
GCC -funroll-loops
../data/win98-lz : 12536244 -> 4671591 (37.26%), 76.5 MB/s , 112.9 MB/s
GCC -funroll-loops -lyeppp
../data/win98-lz : 12536244 -> 4671591 (37.26%), 76.5 MB/s , 112.9 MB/s

CLANG
../data/win98-lz : 12536244 -> 4671591 (37.26%), 78.2 MB/s , 107.9 MB/s
CLANG -funroll-loops
../data/win98-lz : 12536244 -> 4671591 (37.26%), 78.2 MB/s , 108.0 MB/s
CLANG -funroll-loops -lyeppp
../data/win98-lz : 12536244 -> 4671591 (37.26%), 78.6 MB/s , 108.0 MB/s

EDIT:
I have been thinking for several months about entropy, the universe, and the use and state of compression in computer science. I've also used entropy as a main theme in my thesis. What strikes me is how the difference between this and a neural network would diminish if you chained multiple FSEs into a multi-layer network. The thought leader in this area is currently Prof. Dr. Jürgen Schmidhuber, along with his students, whose work you can study here: http://www.idsia.ch/~juergen/onlinepub.html
With the topology of data being a central problem in the study of entropy, it raises the question why topological data analysis is so rarely used in the field as a method of exploiting the nature of the dataset to achieve higher compression ratios. It would be a pleasure to exchange ideas on entropy with you. Thanks for this great contribution! I've recently come to a growing awe of groundbreaking algorithms that achieve linear, near-optimal, and occasionally near-perfect solutions.

Public methods are not exported to DLL/LIB

As stated in facebook/zstd#472, I would like to use the FSE and HUFF0 methods from .NET, which requires a Windows DLL with the public methods marked as exported.

As was done for zstd, we just need to add a macro that does the same thing as here: https://github.com/facebook/zstd/blob/426a9d4b7128ef54d79627cff346173e833f733a/lib/zstd.h#L21
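
For illustration, here's a minimal sketch of what such a macro could look like; FSE_DLL_EXPORT as the opt-in define is my assumption, mirroring zstd's ZSTD_DLL_EXPORT, not the repo's current code:

#if defined(FSE_DLL_EXPORT) && (FSE_DLL_EXPORT==1)
#  define FSE_PUBLIC_API __declspec(dllexport)
#else
#  define FSE_PUBLIC_API
#endif

/* each public declaration then gets the prefix, e.g.: */
FSE_PUBLIC_API size_t FSE_compress(void* dst, size_t dstCapacity,
                                   const void* src, size_t srcSize);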

I can try putting a PR together for this, but I have a few questions:

  1. Should there be a single PUBLIC_API macro used in all of fse.h, huf.h, and error_public.h? Or should there be separate FSE_API and HUF_API macros to opt in/out of each module?

  2. Can we consider all methods declared in fse.h, huf.h, and error_public.h as public, and everything else as private? What about bitstream.h and mem.h?

  3. The VC2012 project does not seem to provide a name for the generated DLL (so it's probably using a default name). Would there be objections to updating the project to generate libfse.dll instead? Though if it contains huff0 as well, maybe libfse.dll is too specific, and there should be another name for this library? (The repo is named FiniteStateEntropy, but the title says "New Generation Entropy coders"...)

how to start? and how to use the Huffman codec?

Hello author, I have just started learning about data compression, and I am ashamed to ask: how do I use your code? I want to try compressing image data using the Huffman codec. How do I start?
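
For a starting point, here is a minimal sketch using the one-shot API from huf.h (the buffer contents and sizes are placeholders; HUF_compress, HUF_compressBound, HUF_isError, and HUF_getErrorName are the real calls):

#include <stdio.h>
#include <stdlib.h>
#include "huf.h"

int main(void)
{
    unsigned char src[4096]; /* pretend this holds image bytes */
    size_t const srcSize = sizeof(src);
    for (size_t i = 0; i < srcSize; i++) src[i] = (unsigned char)(i % 16);

    size_t const dstCapacity = HUF_compressBound(srcSize);
    void* const dst = malloc(dstCapacity);
    if (dst == NULL) return 1;

    size_t const cSize = HUF_compress(dst, dstCapacity, src, srcSize);
    if (HUF_isError(cSize)) { printf("error: %s\n", HUF_getErrorName(cSize)); return 1; }
    if (cSize == 0) printf("not compressible; store the input raw\n");
    else printf("compressed %zu -> %zu bytes\n", srcSize, cSize);

    free(dst);
    return 0;
}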

huff0 with > 256 symbols

I am fascinated by the speed of the huff0 encoder, but unfortunately I need to encode/decode data with 512 symbols. I have already tried fseU16 and it works great, but is there a quick hack/modification I can make so that huff0 also works with 9-bit data?

Any pointers in this direction are appreciated.
Thanks

UPDATE: My apologies, I hadn't realized that HUF_compress2() does exactly what I want. However, I will have to use it before saying so for sure.
