LZFSE compression library and command line tool

License: BSD 3-Clause "New" or "Revised" License

lzfse's Introduction

LZFSE

This is a reference C implementation of the LZFSE compressor introduced in the Compression library with OS X 10.11 and iOS 9.

LZFSE is a Lempel-Ziv style data compression algorithm using Finite State Entropy coding. It targets compression ratios similar to deflate (as implemented by zlib) at higher compression and decompression speeds.

Files

README.md                             This file ;-)
Makefile                              Linux / macOS Makefile
lzfse.xcodeproj                       Xcode project

src/lzfse.h                           Main LZFSE header
src/lzfse_tunables.h                  LZFSE encoder configuration
src/lzfse_internal.h                  LZFSE internal header
src/lzfse_decode.c                    LZFSE decoder API entry point
src/lzfse_encode.c                    LZFSE encoder API entry point
src/lzfse_decode_base.c               LZFSE decoder internal functions
src/lzfse_encode_base.c               LZFSE encoder internal functions
src/lzfse_encode_tables.h             LZFSE encoder tables

src/lzfse_fse.h                       FSE entropy encoder/decoder header
src/lzfse_fse.c                       FSE entropy encoder/decoder functions

src/lzvn_decode_base.h                LZVN decoder
src/lzvn_decode_base.c
src/lzvn_encode_base.h                LZVN encoder
src/lzvn_encode_base.c

src/lzfse_main.c                      Command line tool

Building on OS X

$ xcodebuild install DSTROOT=/tmp/lzfse.dst

Produces the following files in /tmp/lzfse.dst:

usr/local/bin/lzfse                   command line tool
usr/local/include/lzfse.h             LZFSE library header
usr/local/lib/liblzfse.a              LZFSE library

Building on Linux

Tested on Ubuntu 15.10 with gcc 5.2.1 and clang 3.6.2. Should work on any recent distribution.

$ make install INSTALL_PREFIX=/tmp/lzfse.dst/usr/local

Produces the following files in /tmp/lzfse.dst:

usr/local/bin/lzfse                   command line tool
usr/local/include/lzfse.h             LZFSE library header
usr/local/lib/liblzfse.a              LZFSE library

Building with cmake

$ mkdir build
$ cd build
$ cmake ..
$ make install

Installs the header, library, and command line tool in /usr/local.

Bindings

Python: dimkr/pylzfse

lzfse's Issues

Clarify patent situation

I would appreciate some sort of clarification on LZFSE's patent situation. At minimum, if Apple believes it has any patents relevant to LZFSE they should disclose them, and if Apple doesn't have any relevant patents it should provide a statement to that effect (Google recently did something similar for Brotli).

Ideally, though, I would like to see some sort of patent license grant for any patents it owns or acquires which read on LZFSE, at least for the purpose of using LZFSE.

Conan package

Hello,
Do you know about Conan?
Conan is a modern dependency manager for C++. It would be great if your library were available through a package manager for other developers.

Here you can find an example of how to create a package for the library.

If you have any questions, just ask :-)

Expose streaming API

LZFSE already contains a streaming API, I'd like to see it exposed (probably after a bit of cleanup, at least on the encoder side).

License and Usage Approval Questions

So far LZFSE seems to be quite nice as an alternative to zlib. But I hope I'm not the only one who has trouble using a proprietary, copyrighted algorithm whose terms might prevent me from using this software in the future.

My questions so far are:

Will there be an RFC for LZFSE, or any intention to standardize this algorithm publicly so that cross-platform adoption is possible?
Will LZFSE be distributed under another license (like Apache, BSD or MIT)?
Which Apple patents are involved in the use and adoption of LZFSE?

Thanks in advance.

Leftover conflict in the LICENSE

The LICENSE file in its current form is in a conflicted state (HEAD vs a7fa1f9dd645a3937217f51b2052221eeadeae45).

It looks by eye to be just formatting changes, but I can't be sure without a proper comparison, and given the importance of the document it's probably worth resolving as soon as possible!

Attempt to decode throws a malloc error

./lzfse -decode -i inputfile -o outputFile

lzfse(34763,0x10ee8b5c0) malloc: can't allocate region
*** mach_vm_map(size=264883518046208) failed (error code=3)
lzfse(34763,0x10ee8b5c0) malloc: *** set a breakpoint in malloc_error_break to debug
malloc: Cannot allocate memory

I want to compress a file using lzfse at the system level, like lzvn or zlib. What can I do?

Mention bindings in the README

Python bindings at https://github.com/dimkr/pylzfse

Tried implementing on Android, but the app crashes

I added the C files to the project

and added the CMakeLists file:

externalNativeBuild {
        cmake {
            path "src/main/lzfse/CMakeLists.txt"
            version "3.10.2"
        }
    }

Then I wrote a JNI wrapper for decode:

JNIEXPORT jint JNICALL
Java_com_android_Decompressor_decode(

        JNIEnv* env, jclass cls, jobject src, jobject dst

) {

    uint8_t* src_buffer = (*env)->GetDirectBufferAddress(env,src);
    const size_t src_size = (const size_t) (*env)->GetDirectBufferCapacity(env, src);

    uint8_t* dst_buffer = (*env)->GetDirectBufferAddress(env,dst);

    size_t dst_size = (size_t) (*env)->GetDirectBufferCapacity(env, dst);

    jlong test = lzfse_decode_buffer(dst_buffer, dst_size, src_buffer, src_size, NULL);

    return (jint) test;
}

Then I call the decode function from Kotlin:

val buf = ByteBuffer.wrap(byteArray)

val buf_out = ByteBuffer.allocateDirect(byteArray.size * 20)
val size = decode(dstArray = buf_out, srcArray = buf)

but the app crashes:


2020-02-25 20:12:25.717 28603-28603/ A/libc: Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0 in tid 28603 , pid 28603

https://github.com/sbingner/lzfse.git

No portable fallbacks for GCC builtins

I've been playing with PGI lately; it's a great portability test because it doesn't pretend to be GCC (no defining __GNUC__, GCC builtins and attributes don't work, etc.) or MSVC, unlike a lot of other compilers.

When I tried building LZFSE I ended up with errors about undefined references to __builtin_clz, __builtin_ctzl, and __builtin_ctzll. Turns out there are fallbacks for MSVC, but nothing portable. If you want to copy some code, I've been working on something which takes care of this (public domain).

Also, I notice that __builtin_ctzll and __builtin_ctzl are called exclusively with 32-bit values, which is a bit wasteful. unsigned long long is ≥ 64-bit everywhere (the C spec defines the minimum range as 0 .. 2⁶⁴-1), and unsigned long is 64-bit on most 64-bit platforms (Windows is LLP64, but most others are LP64). The project I linked to above has some logic for choosing the right variant, but just using __builtin_ctz would probably be okay… there are some places where int is 16-bit (AVR comes to mind), but honestly I'm not sure how many of them LZFSE would work on anyways.

If you want to catch this stuff in CI, I also threw together a quick script to install PGI on Travis, so it would be trivial to add a PGI build.
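
As a rough illustration of the portable fallbacks requested above, here is a minimal sketch of count-leading-zeros and count-trailing-zeros for 32-bit values in plain C. The names portable_clz32 and portable_ctz32 are hypothetical and are not part of LZFSE or of the project linked above. Returning 32 for a zero input is an assumption made here for safety; the GCC builtins leave that case undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of LZFSE: count leading zero bits of a
 * 32-bit value with a binary search over progressively smaller halves.
 * Returns 32 for x == 0 (an assumption; __builtin_clz(0) is undefined). */
static int portable_clz32(uint32_t x) {
  int n = 0;
  if (x == 0) return 32;
  if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
  if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
  if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }
  if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  }
  if ((x & 0x80000000u) == 0) { n += 1; }
  return n;
}

/* Hypothetical helper, not part of LZFSE: count trailing zero bits of a
 * 32-bit value. Returns 32 for x == 0 (same assumption as above). */
static int portable_ctz32(uint32_t x) {
  int n = 0;
  if (x == 0) return 32;
  if ((x & 0x0000FFFFu) == 0) { n += 16; x >>= 16; }
  if ((x & 0x000000FFu) == 0) { n += 8;  x >>= 8;  }
  if ((x & 0x0000000Fu) == 0) { n += 4;  x >>= 4;  }
  if ((x & 0x00000003u) == 0) { n += 2;  x >>= 2;  }
  if ((x & 0x00000001u) == 0) { n += 1; }
  return n;
}

/* Small self-check that can be called from a test or from main(). */
static void portable_bits_selfcheck(void) {
  assert(portable_clz32(1u) == 31);
  assert(portable_ctz32(8u) == 3);
}
```

A build could select these only when neither the GCC builtins nor the MSVC intrinsics are available; how that selection is wired into LZFSE's headers is left open here.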

lzfse decode issue - msys2 build

Compiles successfully under MSYS2 (Windows 10 x64).
Encoding works fine.
The only issue is with decoding:

When decoding a 14 MB file, RAM usage climbs to 5 GB or more,
and then the output is:

malloc: Not enough space

add support for LZBITMAP

Will you please open source the LZBITMAP algo as well?

Provide a function or macro to get the max compressed size

Most libraries which use buffer-to-buffer compression (as opposed to streaming) have a function which will return the maximum compressed size of a buffer; for example, zlib has compressBound.

It would be nice if LZFSE had something similar. AFAICK this is currently just input_size + 12. IMHO it would be better to make it a function so it's not part of the API/ABI, but if not I think you should at least document it.

On a related note, I believe the lzfse CLI will fail if you try to compress incompressible data; see https://github.com/lzfse/lzfse/blob/master/src/lzfse_main.c#L199… assuming I'm right about the requirements, changing that from in_size to in_size + 12 would fix the issue.
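
To make the request concrete, a bound helper in the spirit of zlib's compressBound might look like the sketch below. The function name example_compress_bound and the constant EXAMPLE_LZFSE_OVERHEAD are hypothetical, and the 12-byte overhead is only the estimate given in this issue, not a value documented by LZFSE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed worst-case overhead in bytes, taken from the estimate in this
 * issue (input_size + 12). Not a documented LZFSE guarantee. */
#define EXAMPLE_LZFSE_OVERHEAD ((size_t)12)

/* Hypothetical helper, not part of the LZFSE API: returns a destination
 * buffer size assumed to be large enough for any input of src_size bytes,
 * or 0 if the addition would overflow size_t. */
static size_t example_compress_bound(size_t src_size) {
  if (src_size > SIZE_MAX - EXAMPLE_LZFSE_OVERHEAD)
    return 0;
  return src_size + EXAMPLE_LZFSE_OVERHEAD;
}
```

Keeping the constant behind a function, as suggested above, means the overhead could change in a later release without callers having to recompile against a new macro value.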

Benchmarks

Would be great to see some performance benchmarks on standard datasets with some comparison to known compression algorithms.

Lots of compiler warnings

By default, LZFSE triggers quite a few compiler warnings which would be nice to fix. For example, if we just set the cflags to -Wall -Wextra:

$ make CFLAGS="-Wall -Wextra"
cc -Wall -Wextra -c src/lzfse_encode.c -o build/obj/lzfse_encode.o
In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_encode.c:25:
src/lzfse_fse.h:45:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Bit utils

In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_encode.c:25:
src/lzfse_fse.h:124:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Bit stream

In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_encode.c:25:
src/lzfse_fse.h:414:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Encode/Decode

In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_encode.c:25:
src/lzfse_fse.h:541:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Tables

In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_encode.c:25:
src/lzfse_fse.h: In function ‘fse_check_freq’:
src/lzfse_fse.h:560:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   for (int i = 0; i < table_size; i++) {
                     ^
In file included from src/lzfse_encode.c:25:0:
src/lzfse_internal.h: At top level:
src/lzfse_internal.h:108:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Encoder and Decoder state objects

src/lzfse_internal.h:227:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Block header objects

In file included from src/lzfse_encode.c:25:0:
src/lzfse_internal.h:339:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZFSE encode/decode interfaces

src/lzfse_internal.h:347:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZVN encode/decode interfaces

In file included from src/lzfse_encode.c:25:0:
src/lzfse_internal.h:376:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZFSE utility functions

In file included from src/lzfse_encode.c:25:0:
src/lzfse_internal.h:524:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - L, M, D encoding constants for LZFSE

src/lzfse_encode.c: In function ‘lzfse_encode_buffer’:
src/lzfse_encode.c:94:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       while (src_size >= encoder_block_size) {
                       ^~
In file included from src/lzfse_encode.c:25:0:
At top level:
src/lzfse_internal.h:561:16: warning: ‘d_base_value’ defined but not used [-Wunused-variable]
 static int32_t d_base_value[LZFSE_ENCODE_D_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:555:16: warning: ‘d_extra_bits’ defined but not used [-Wunused-variable]
 static uint8_t d_extra_bits[LZFSE_ENCODE_D_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:552:16: warning: ‘m_base_value’ defined but not used [-Wunused-variable]
 static int32_t m_base_value[LZFSE_ENCODE_M_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:549:16: warning: ‘m_extra_bits’ defined but not used [-Wunused-variable]
 static uint8_t m_extra_bits[LZFSE_ENCODE_M_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:546:16: warning: ‘l_base_value’ defined but not used [-Wunused-variable]
 static int32_t l_base_value[LZFSE_ENCODE_L_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:543:16: warning: ‘l_extra_bits’ defined but not used [-Wunused-variable]
 static uint8_t l_extra_bits[LZFSE_ENCODE_L_SYMBOLS] = {
                ^~~~~~~~~~~~
cc -Wall -Wextra -c src/lzfse_decode.c -o build/obj/lzfse_decode.o
In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_decode.c:25:
src/lzfse_fse.h:45:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Bit utils

In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_decode.c:25:
src/lzfse_fse.h:124:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Bit stream

In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_decode.c:25:
src/lzfse_fse.h:414:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Encode/Decode

In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_decode.c:25:
src/lzfse_fse.h:541:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Tables

In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_decode.c:25:
src/lzfse_fse.h: In function ‘fse_check_freq’:
src/lzfse_fse.h:560:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   for (int i = 0; i < table_size; i++) {
                     ^
In file included from src/lzfse_decode.c:25:0:
src/lzfse_internal.h: At top level:
src/lzfse_internal.h:108:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Encoder and Decoder state objects

src/lzfse_internal.h:227:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Block header objects

In file included from src/lzfse_decode.c:25:0:
src/lzfse_internal.h:339:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZFSE encode/decode interfaces

src/lzfse_internal.h:347:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZVN encode/decode interfaces

In file included from src/lzfse_decode.c:25:0:
src/lzfse_internal.h:376:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZFSE utility functions

In file included from src/lzfse_decode.c:25:0:
src/lzfse_internal.h:524:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - L, M, D encoding constants for LZFSE

In file included from src/lzfse_decode.c:25:0:
src/lzfse_internal.h:561:16: warning: ‘d_base_value’ defined but not used [-Wunused-variable]
 static int32_t d_base_value[LZFSE_ENCODE_D_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:555:16: warning: ‘d_extra_bits’ defined but not used [-Wunused-variable]
 static uint8_t d_extra_bits[LZFSE_ENCODE_D_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:552:16: warning: ‘m_base_value’ defined but not used [-Wunused-variable]
 static int32_t m_base_value[LZFSE_ENCODE_M_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:549:16: warning: ‘m_extra_bits’ defined but not used [-Wunused-variable]
 static uint8_t m_extra_bits[LZFSE_ENCODE_M_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:546:16: warning: ‘l_base_value’ defined but not used [-Wunused-variable]
 static int32_t l_base_value[LZFSE_ENCODE_L_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:543:16: warning: ‘l_extra_bits’ defined but not used [-Wunused-variable]
 static uint8_t l_extra_bits[LZFSE_ENCODE_L_SYMBOLS] = {
                ^~~~~~~~~~~~
cc -Wall -Wextra -c src/lzfse_encode_base.c -o build/obj/lzfse_encode_base.o
In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_encode_base.c:24:
src/lzfse_fse.h:45:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Bit utils

In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_encode_base.c:24:
src/lzfse_fse.h:124:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Bit stream

In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_encode_base.c:24:
src/lzfse_fse.h:414:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Encode/Decode

In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_encode_base.c:24:
src/lzfse_fse.h:541:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Tables

In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_encode_base.c:24:
src/lzfse_fse.h: In function ‘fse_check_freq’:
src/lzfse_fse.h:560:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   for (int i = 0; i < table_size; i++) {
                     ^
In file included from src/lzfse_encode_base.c:24:0:
src/lzfse_internal.h: At top level:
src/lzfse_internal.h:108:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Encoder and Decoder state objects

src/lzfse_internal.h:227:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Block header objects

In file included from src/lzfse_encode_base.c:24:0:
src/lzfse_internal.h:339:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZFSE encode/decode interfaces

src/lzfse_internal.h:347:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZVN encode/decode interfaces

In file included from src/lzfse_encode_base.c:24:0:
src/lzfse_internal.h:376:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZFSE utility functions

In file included from src/lzfse_encode_base.c:24:0:
src/lzfse_internal.h:524:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - L, M, D encoding constants for LZFSE

src/lzfse_encode_base.c: In function ‘lzfse_encode_v1_freq_table’:
src/lzfse_encode_base.c:129:9: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     assert(bits < (1 << nbits));
         ^
cc -Wall -Wextra -c src/lzfse_decode_base.c -o build/obj/lzfse_decode_base.o
In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_decode_base.c:22:
src/lzfse_fse.h:45:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Bit utils

In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_decode_base.c:22:
src/lzfse_fse.h:124:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Bit stream

In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_decode_base.c:22:
src/lzfse_fse.h:414:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Encode/Decode

In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_decode_base.c:22:
src/lzfse_fse.h:541:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Tables

In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_decode_base.c:22:
src/lzfse_fse.h: In function ‘fse_check_freq’:
src/lzfse_fse.h:560:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   for (int i = 0; i < table_size; i++) {
                     ^
In file included from src/lzfse_decode_base.c:22:0:
src/lzfse_internal.h: At top level:
src/lzfse_internal.h:108:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Encoder and Decoder state objects

src/lzfse_internal.h:227:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Block header objects

In file included from src/lzfse_decode_base.c:22:0:
src/lzfse_internal.h:339:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZFSE encode/decode interfaces

src/lzfse_internal.h:347:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZVN encode/decode interfaces

In file included from src/lzfse_decode_base.c:22:0:
src/lzfse_internal.h:376:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZFSE utility functions

In file included from src/lzfse_decode_base.c:22:0:
src/lzfse_internal.h:524:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - L, M, D encoding constants for LZFSE

src/lzfse_decode_base.c: In function ‘lzfse_decode_lmd’:
src/lzfse_decode_base.c:240:30: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         for (size_t i = 0; i < M; i++)
                              ^
src/lzfse_decode_base.c:256:30: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         for (size_t i = 0; i < L; i++)
                              ^
src/lzfse_decode_base.c:268:30: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         for (size_t i = 0; i < remaining_bytes; i++)
                              ^
src/lzfse_decode_base.c:280:30: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         for (size_t i = 0; i < M; i++)
                              ^
src/lzfse_decode_base.c:294:30: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         for (size_t i = 0; i < remaining_bytes; i++)
                              ^
cc -Wall -Wextra -c src/lzvn_encode_base.c -o build/obj/lzvn_encode_base.o
In file included from src/lzfse_internal.h:30:0,
                 from src/lzvn_encode_base.h:27,
                 from src/lzvn_encode_base.c:24:
src/lzfse_fse.h:45:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Bit utils

In file included from src/lzfse_internal.h:30:0,
                 from src/lzvn_encode_base.h:27,
                 from src/lzvn_encode_base.c:24:
src/lzfse_fse.h:124:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Bit stream

In file included from src/lzfse_internal.h:30:0,
                 from src/lzvn_encode_base.h:27,
                 from src/lzvn_encode_base.c:24:
src/lzfse_fse.h:414:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Encode/Decode

In file included from src/lzfse_internal.h:30:0,
                 from src/lzvn_encode_base.h:27,
                 from src/lzvn_encode_base.c:24:
src/lzfse_fse.h:541:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Tables

In file included from src/lzfse_internal.h:30:0,
                 from src/lzvn_encode_base.h:27,
                 from src/lzvn_encode_base.c:24:
src/lzfse_fse.h: In function ‘fse_check_freq’:
src/lzfse_fse.h:560:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   for (int i = 0; i < table_size; i++) {
                     ^
In file included from src/lzvn_encode_base.h:27:0,
                 from src/lzvn_encode_base.c:24:
src/lzfse_internal.h: At top level:
src/lzfse_internal.h:108:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Encoder and Decoder state objects

src/lzfse_internal.h:227:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Block header objects

In file included from src/lzvn_encode_base.h:27:0,
                 from src/lzvn_encode_base.c:24:
src/lzfse_internal.h:339:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZFSE encode/decode interfaces

src/lzfse_internal.h:347:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZVN encode/decode interfaces

In file included from src/lzvn_encode_base.h:27:0,
                 from src/lzvn_encode_base.c:24:
src/lzfse_internal.h:376:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZFSE utility functions

In file included from src/lzvn_encode_base.h:27:0,
                 from src/lzvn_encode_base.c:24:
src/lzfse_internal.h:524:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - L, M, D encoding constants for LZFSE

In file included from src/lzvn_encode_base.h:27:0,
                 from src/lzvn_encode_base.c:24:
src/lzfse_internal.h:561:16: warning: ‘d_base_value’ defined but not used [-Wunused-variable]
 static int32_t d_base_value[LZFSE_ENCODE_D_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:555:16: warning: ‘d_extra_bits’ defined but not used [-Wunused-variable]
 static uint8_t d_extra_bits[LZFSE_ENCODE_D_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:552:16: warning: ‘m_base_value’ defined but not used [-Wunused-variable]
 static int32_t m_base_value[LZFSE_ENCODE_M_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:549:16: warning: ‘m_extra_bits’ defined but not used [-Wunused-variable]
 static uint8_t m_extra_bits[LZFSE_ENCODE_M_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:546:16: warning: ‘l_base_value’ defined but not used [-Wunused-variable]
 static int32_t l_base_value[LZFSE_ENCODE_L_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:543:16: warning: ‘l_extra_bits’ defined but not used [-Wunused-variable]
 static uint8_t l_extra_bits[LZFSE_ENCODE_L_SYMBOLS] = {
                ^~~~~~~~~~~~
cc -Wall -Wextra -c src/lzvn_decode_base.c -o build/obj/lzvn_decode_base.o
In file included from src/lzfse_internal.h:30:0,
                 from src/lzvn_decode_base.h:29,
                 from src/lzvn_decode_base.c:24:
src/lzfse_fse.h:45:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Bit utils

In file included from src/lzfse_internal.h:30:0,
                 from src/lzvn_decode_base.h:29,
                 from src/lzvn_decode_base.c:24:
src/lzfse_fse.h:124:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Bit stream

In file included from src/lzfse_internal.h:30:0,
                 from src/lzvn_decode_base.h:29,
                 from src/lzvn_decode_base.c:24:
src/lzfse_fse.h:414:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Encode/Decode

In file included from src/lzfse_internal.h:30:0,
                 from src/lzvn_decode_base.h:29,
                 from src/lzvn_decode_base.c:24:
src/lzfse_fse.h:541:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Tables

In file included from src/lzfse_internal.h:30:0,
                 from src/lzvn_decode_base.h:29,
                 from src/lzvn_decode_base.c:24:
src/lzfse_fse.h: In function ‘fse_check_freq’:
src/lzfse_fse.h:560:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   for (int i = 0; i < table_size; i++) {
                     ^
In file included from src/lzvn_decode_base.h:29:0,
                 from src/lzvn_decode_base.c:24:
src/lzfse_internal.h: At top level:
src/lzfse_internal.h:108:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Encoder and Decoder state objects

src/lzfse_internal.h:227:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Block header objects

In file included from src/lzvn_decode_base.h:29:0,
                 from src/lzvn_decode_base.c:24:
src/lzfse_internal.h:339:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZFSE encode/decode interfaces

src/lzfse_internal.h:347:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZVN encode/decode interfaces

In file included from src/lzvn_decode_base.h:29:0,
                 from src/lzvn_decode_base.c:24:
src/lzfse_internal.h:376:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZFSE utility functions

In file included from src/lzvn_decode_base.h:29:0,
                 from src/lzvn_decode_base.c:24:
src/lzfse_internal.h:524:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - L, M, D encoding constants for LZFSE

src/lzvn_decode_base.c: In function ‘lzvn_decode’:
src/lzvn_decode_base.c:225:9: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   if (D > dst_ptr - state->dst_begin || D == 0)
         ^
In file included from src/lzvn_decode_base.h:29:0,
                 from src/lzvn_decode_base.c:24:
At top level:
src/lzfse_internal.h:561:16: warning: ‘d_base_value’ defined but not used [-Wunused-variable]
 static int32_t d_base_value[LZFSE_ENCODE_D_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:555:16: warning: ‘d_extra_bits’ defined but not used [-Wunused-variable]
 static uint8_t d_extra_bits[LZFSE_ENCODE_D_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:552:16: warning: ‘m_base_value’ defined but not used [-Wunused-variable]
 static int32_t m_base_value[LZFSE_ENCODE_M_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:549:16: warning: ‘m_extra_bits’ defined but not used [-Wunused-variable]
 static uint8_t m_extra_bits[LZFSE_ENCODE_M_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:546:16: warning: ‘l_base_value’ defined but not used [-Wunused-variable]
 static int32_t l_base_value[LZFSE_ENCODE_L_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:543:16: warning: ‘l_extra_bits’ defined but not used [-Wunused-variable]
 static uint8_t l_extra_bits[LZFSE_ENCODE_L_SYMBOLS] = {
                ^~~~~~~~~~~~
cc -Wall -Wextra -c src/lzfse_fse.c -o build/obj/lzfse_fse.o
In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_fse.c:22:
src/lzfse_fse.h:45:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Bit utils

In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_fse.c:22:
src/lzfse_fse.h:124:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Bit stream

In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_fse.c:22:
src/lzfse_fse.h:414:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Encode/Decode

In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_fse.c:22:
src/lzfse_fse.h:541:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Tables

In file included from src/lzfse_internal.h:30:0,
                 from src/lzfse_fse.c:22:
src/lzfse_fse.h: In function ‘fse_check_freq’:
src/lzfse_fse.h:560:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   for (int i = 0; i < table_size; i++) {
                     ^
In file included from src/lzfse_fse.c:22:0:
src/lzfse_internal.h: At top level:
src/lzfse_internal.h:108:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Encoder and Decoder state objects

src/lzfse_internal.h:227:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - Block header objects

In file included from src/lzfse_fse.c:22:0:
src/lzfse_internal.h:339:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZFSE encode/decode interfaces

src/lzfse_internal.h:347:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZVN encode/decode interfaces

In file included from src/lzfse_fse.c:22:0:
src/lzfse_internal.h:376:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - LZFSE utility functions

In file included from src/lzfse_fse.c:22:0:
src/lzfse_internal.h:524:0: warning: ignoring #pragma mark  [-Wunknown-pragmas]
 #pragma mark - L, M, D encoding constants for LZFSE

In file included from src/lzfse_fse.c:22:0:
src/lzfse_internal.h:561:16: warning: ‘d_base_value’ defined but not used [-Wunused-variable]
 static int32_t d_base_value[LZFSE_ENCODE_D_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:555:16: warning: ‘d_extra_bits’ defined but not used [-Wunused-variable]
 static uint8_t d_extra_bits[LZFSE_ENCODE_D_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:552:16: warning: ‘m_base_value’ defined but not used [-Wunused-variable]
 static int32_t m_base_value[LZFSE_ENCODE_M_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:549:16: warning: ‘m_extra_bits’ defined but not used [-Wunused-variable]
 static uint8_t m_extra_bits[LZFSE_ENCODE_M_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:546:16: warning: ‘l_base_value’ defined but not used [-Wunused-variable]
 static int32_t l_base_value[LZFSE_ENCODE_L_SYMBOLS] = {
                ^~~~~~~~~~~~
src/lzfse_internal.h:543:16: warning: ‘l_extra_bits’ defined but not used [-Wunused-variable]
 static uint8_t l_extra_bits[LZFSE_ENCODE_L_SYMBOLS] = {
                ^~~~~~~~~~~~
ld -r -o ./build/obj/liblzfse_master.o ./build/obj/lzfse_encode.o  ./build/obj/lzfse_decode.o ./build/obj/lzfse_encode_base.o ./build/obj/lzfse_decode_base.o ./build/obj/lzvn_encode_base.o ./build/obj/lzvn_decode_base.o ./build/obj/lzfse_fse.o
ar rvs build/bin/liblzfse.a ./build/obj/liblzfse_master.o
ar: creating build/bin/liblzfse.a
a - ./build/obj/liblzfse_master.o
cc -Wall -Wextra -c src/lzfse_main.c -o build/obj/lzfse_main.o
src/lzfse_main.c: In function ‘usage’:
src/lzfse_main.c:58:16: warning: unused parameter ‘argc’ [-Wunused-parameter]
 void usage(int argc, char **argv) {
                ^~~~
src/lzfse_main.c: In function ‘main’:
src/lzfse_main.c:149:20: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     if (st.st_size > SIZE_MAX) {
                    ^
cc -Wall -Wextra -o build/bin/lzfse ./build/obj/lzfse_main.o ./build/bin/liblzfse.a
nemequ@peltast:~/local/src/lzfse$ git:(master) git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean

If you'd like, I can submit a patch so lzfse is automatically tested on a bunch of different versions of different compilers on different operating systems like Squash does; see https://travis-ci.org/quixdb/squash/builds/138154148 and https://ci.appveyor.com/project/quixdb/squash/build/466

Consider requiring scratch memory to already be zeroed

One potential optimization is to require that memory passed to the encode and decode functions already be zeroed, which would let you skip the memset call. Calling calloc can be much faster than malloc + memset. For people doing lots of small operations without reusing memory this could result in a significant speedup (I've seen big boosts in other codecs' performance from doing this).
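
A rough sketch of the two allocation patterns being compared. Both helper names are hypothetical and not part of lzfse; in practice the size would be whatever the library reports as its scratch requirement.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns scratch memory that is already zeroed.
 * calloc can often hand back memory the OS has already zeroed, so no
 * explicit pass over the whole region is needed. */
static void *alloc_zeroed_scratch(size_t size) {
  assert(size > 0);
  return calloc(1, size);
}

/* The pattern the suggestion tries to avoid: allocate, then clear by hand. */
static void *alloc_then_memset(size_t size) {
  assert(size > 0);
  void *p = malloc(size);
  if (p != NULL)
    memset(p, 0, size);
  return p;
}
```

Both helpers return the same contents; the difference is only in who does the zeroing. Whether this pays off depends on the allocator and on how large the scratch area is.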

MSVC support

LZFSE currently fails to build with MSVC. See https://ci.appveyor.com/project/quixdb/squash/build/470/job/1fvyqd3lrkrpem6t#L1388 for a build log.

@jibsen kindly looked into this, and it seems the major issue is reliance on a gcc extension in lzvn_decode_base.c.

[Question] How to build this as dylib (consumable in a .NET Core project)

How can I build this as a dylib on macOS, rather than a .a (static library)?

I've tried the following with no success:

passing -DBUILD_SHARED_LIBS=ON parameter to xcodebuild
Explicitly specifying "Dynamic Library" as the "Mach-O Type": https://take.ms/4Tzj6

Add to homebrew

Since macOS does not ship with this tool, it would be nice to have it available on homebrew:
http://brew.sh

For this, you have to meet certain requirements:
https://github.com/Homebrew/brew/blob/master/share/doc/homebrew/Formula-Cookbook.md

how to inline compile assembly

I am trying to use c2goasm to generate a pure Go version of lzfse. However, it doesn't support calls, so I was wondering how I can generate only inlined assembly for lzfse, or more specifically for just the lzfse_decode_buffer function?

$ mkdir build
$ cd build
$ cmake ..
$ make src/lzfse_decode.c.s

I'm using clang on latest macOS.

Thank you!!

UBsan flags 2 loops in lzfse_decode_base.c with "Pointer Overflow" warnings

Hi, I've compiled the most recent lzfse library using Xcode 11.7 and ran it with UB sanitizer enabled. It flags these two warnings when I ran it against my test data:

file lzfse_decode_base.c
lines 240-241:
for (size_t i = 0; i < M; i++)
dst[i] = dst[i - D];

UBsan flags line 241 as "Thread 1: Pointer overflow"
"Addition of unsigned offset to 0x000106a85801 overflowed to 0x000106a85800"

variables:
D = 1
M = 1023
i = 0

The actual problem is that this code is trying to copy bytes with overlapping buffers and it ends up performing the copy incorrectly.

In this specific case, it will first copy dst[-1] to dst[0] and then on the next loop iteration, it will copy dst[0] to dst[1] but this will be the same value as dst[-1] which is almost certainly not what the code's author intended because the comment above this code states
// ..."a more
// careful path that applies a permutation to account for the
// possible overlap between source and destination if the distance
// is small".
which is referring to this loop.

I believe the code should be performing the equivalent of

memmove(dst, dst - D, M);

which is what the fast code path above it does (the fast code path assumes that the buffers do not overlap however so it performs its work using a memcpy-like loop).

I've patched my working copy of the source file to use memmove and this stopped the warning from being generated.

There is another UBsan warning in the same file on line 280-281:

    for (size_t i = 0; i < M; i++)
      dst[i] = dst[i - D];

UBsan flags line 281 as "Thread 1: Pointer overflow"
"Addition of unsigned offset to 0x000106b790f4 overflowed to 0x000106b3f0f4"

with variables:

M = 1801
D = 237568
i = 0

which clearly is a non-overlapping copy so the only issue here is the way the code appears to access invalid array elements.

I've patched my working copy of the source file to use memcpy(dst, dst - D, M) and this stopped the warning from being generated.

I will also note that lines 294-295 are another loop similar to these 2 and may also generate a similar warning (but my test data did not cause this code to execute).
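
To make the difference between the two kinds of copy concrete, here is a minimal sketch. The function names are hypothetical and this is not the library's code. The forward loop reads through a separate source pointer, which should avoid forming an index such as `i - D` that wraps around as an unsigned value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Forward byte-by-byte match copy, like the quoted loops, but without
 * the wrapping index. The caller must guarantee that at least D valid
 * bytes precede dst. */
static void copy_match_forward(uint8_t *dst, size_t D, size_t M) {
  assert(D > 0);
  const uint8_t *src = dst - D;
  for (size_t i = 0; i < M; i++)
    dst[i] = src[i]; /* when D < M this re-reads bytes written earlier in the loop */
}

/* memmove-based variant: copies the M source bytes as they were before the call. */
static void copy_match_memmove(uint8_t *dst, size_t D, size_t M) {
  assert(D > 0);
  memmove(dst, dst - D, M);
}
```

Starting from the two bytes "ab" with D = 2 and M = 6, the forward loop yields "abababab", while the memmove variant yields "abab" followed by whatever the buffer held before. The two variants agree only when D >= M, so swapping one for the other changes the decoded output for short distances.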

Cheers.

Problem with lzfse_decode_buffer_with_scratch

When checking the return value of lzfse_decode_buffer (which calls lzfse_decode_buffer_with_scratch), I'm supposed to treat a return value equal to dst_size as an error:

if (status == LZFSE_STATUS_DST_FULL)
    return dst_size;

But I've encountered a situation where the last line of lzfse_decode_buffer_with_scratch:

return (size_t)(s->dst - dst_buffer); // bytes written

also returns dst_size! That case must not be treated as an error.

My current workaround is to return 0 instead of dst_size in the DST_FULL case, but this could also be handled explicitly, e.g. with an extra error parameter.
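
From the caller's side, the ambiguity can be sketched as follows. The enum and function names are hypothetical, and the sketch assumes that a return value of 0 signals a failure, which the report does not confirm.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
  DECODE_FAILED,         /* decoder returned 0 (assumed to mean error) */
  DECODE_OK,             /* output fit with room to spare */
  DECODE_MAYBE_TRUNCATED /* return value equals the buffer capacity */
} decode_outcome;

/* Classify the value returned by a decode call, given the capacity that
 * was passed as dst_size. A return equal to the capacity cannot be told
 * apart from an exact fit, so it is reported as "maybe truncated". */
static decode_outcome classify_decode_result(size_t returned,
                                             size_t dst_capacity) {
  assert(returned <= dst_capacity);
  if (returned == 0)
    return DECODE_FAILED;
  if (returned == dst_capacity)
    return DECODE_MAYBE_TRUNCATED;
  return DECODE_OK;
}
```

One way to sidestep the ambiguous case without patching the library, when the expected output size N is known, is to pass a buffer of N + 1 bytes: a return value of N is then classified as DECODE_OK, and only a return of N + 1 indicates that the output did not fit.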

Not able to decompress iCloud backup file when size is greater than 64KB

Library release versioning

Hi, I noticed the project doesn't have proper releases, which makes it hard to build packages for third-party distribution.

In particular, I was adding a package for this library to conda-forge (see conda-forge/staged-recipes#2596), where having versions is more or less required. Would it be possible to have tags (and optionally releases) that we can refer to?

CC @jakirkham

error when using on centos 7

I built with
make install INSTALL_PREFIX=/usr/local

but when I use it, I get an error:

[root@cp ~]# /usr/local/bin/lzfse -encode -i /root/temp.mp3 -o /root/testmp3.lzfse
*** Error in `/usr/local/bin/lzfse': double free or corruption (fasttop): 0x0000000001fe8fb0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7c503)[0x7dcc9ae13503]
/usr/local/bin/lzfse[0x400bd8]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7dcc9adb8b35]
/usr/local/bin/lzfse[0x400e01]
======= Memory map: ========
00400000-00407000 r-xp 00000000 09:02 924310 /usr/local/bin/lzfse
00606000-00607000 r--p 00006000 09:02 924310 /usr/local/bin/lzfse
00607000-00608000 rw-p 00007000 09:02 924310 /usr/local/bin/lzfse
00608000-01fe8000 ---p 00000000 00:00 0
01fe8000-0200a000 rw-p 00000000 00:00 0 [heap]
7dcc94000000-7dcc94021000 rw-p 00000000 00:00 0
7dcc94021000-7dcc98000000 ---p 00000000 00:00 0
7dcc9ab81000-7dcc9ab96000 r-xp 00000000 09:02 397899 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7dcc9ab96000-7dcc9ad95000 ---p 00015000 09:02 397899 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7dcc9ad95000-7dcc9ad96000 r--p 00014000 09:02 397899 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7dcc9ad96000-7dcc9ad97000 rw-p 00015000 09:02 397899 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7dcc9ad97000-7dcc9af4e000 r-xp 00000000 09:02 397858 /usr/lib64/libc-2.17.so
7dcc9af4e000-7dcc9b14d000 ---p 001b7000 09:02 397858 /usr/lib64/libc-2.17.so
7dcc9b14d000-7dcc9b151000 r--p 001b6000 09:02 397858 /usr/lib64/libc-2.17.so
7dcc9b151000-7dcc9b153000 rw-p 001ba000 09:02 397858 /usr/lib64/libc-2.17.so
7dcc9b153000-7dcc9b158000 rw-p 00000000 00:00 0
7dcc9b158000-7dcc9b178000 r-xp 00000000 09:02 392654 /usr/lib64/ld-2.17.so
7dcc9b367000-7dcc9b36a000 rw-p 00000000 00:00 0
7dcc9b375000-7dcc9b377000 rw-p 00000000 00:00 0
7dcc9b377000-7dcc9b378000 r-xp 00000000 00:00 0 [vdso]
7dcc9b378000-7dcc9b379000 r--p 00020000 09:02 392654 /usr/lib64/ld-2.17.so
7dcc9b379000-7dcc9b37a000 rw-p 00021000 09:02 392654 /usr/lib64/ld-2.17.so
7dcc9b37a000-7dcc9b37b000 rw-p 00000000 00:00 0
7fddc2fb7000-7fddc2fd8000 rw-p 00000000 00:00 0 [stack]
ffffffffff600000-ffffffffff601000 r--p 00000000 00:00 0 [vsyscall]
Aborted

Is this a bug?

lzfse_decode can't distinguish between decompression error and insufficient space

It would be nice if, when lzfse_decode fails, it would be possible to distinguish between the decoder running out of space in the output buffer (in which case retrying with a larger buffer may fix the issue) and more serious errors such as corrupt data.

In order to get around this limitation in Squash, I'm using the internal API.

My suggestion would be to change the function to something like

int lzfse_decode_buffer(uint8_t *__restrict dst_buffer,
                           size_t* dst_size,
                           const uint8_t *__restrict src_buffer,
                           size_t src_size,
                           void *__restrict scratch_buffer) LZFSE_LIB_API;

dst_size becomes an in/out variable, and the return value can be used to indicate a failure. See https://quixdb.github.io/squash/api/c/group__SquashCodec.html#gada8fa80a9fe604d8b2bb48f84e3914d1 for an example.

Crash (double free) when compressing empty file or pipe

In lzfse 1.0, built with gcc 7.2.0 -O3 -flto :

> rm -f empty.txt ; touch empty.txt
> lzfse -v -encode -i empty.txt
LZFSE encode
Input: empty.txt
Output: /dev/null
*** Error in `lzfse': double free or corruption (fasttop): 0x000000000243e010 ***
Abort (core dumped)

> lzfse -v -encode
LZFSE encode
Input: stdin
Output: stdout
Input size: 0 B
Output buffer was too small, increasing size...
*** Error in `lzfse': double free or corruption (fasttop): 0x0000000001deb010 ***
Abort (core dumped)
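
The report does not identify the cause. One possibility, not confirmed here, is a realloc-then-free-on-failure helper being called with a size of 0: on some C libraries `realloc(ptr, 0)` frees `ptr` and returns NULL, so a following `free(ptr)` frees the same block twice. The sketch below shows a variant that never passes 0 to realloc; `safe_reallocf` is a hypothetical name, not the tool's own helper.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reallocf-style helper: resizes ptr, and frees it if the
 * resize fails. A request for 0 bytes is rounded up to 1 so that
 * realloc never sees size 0, whose behaviour differs between C
 * libraries. */
static void *safe_reallocf(void *ptr, size_t size) {
  if (size == 0)
    size = 1;
  void *p = realloc(ptr, size);
  if (p == NULL)
    free(ptr); /* realloc failed, so ptr is still owned by us */
  return p;
}
```

With this shape, an empty input still produces a valid (if tiny) buffer, and the failure path frees the old block exactly once.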

Unaligned loads in decoder fail on asm.js.

Asm.js doesn't support unaligned loads, but WASM does. These are both 32-bit compiles. I've noted that for most test inputs, the lzfse reference decoder fails to decode in asm.js but works in wasm as a result. It looks like there were attempts to avoid unaligned loads, but that they weren't all fixed. Compression libraries often have a flag to enable/disable unaligned load usage.
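
For reference, the usual portable way to express a load or store that may be unaligned is to go through memcpy, as sketched below. The function names are hypothetical, and whether a given asm.js toolchain lowers this to safe byte accesses is an assumption that would need checking.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read 4 bytes from an address that may not be 4-byte aligned. */
static uint32_t load4_unaligned(const void *ptr) {
  assert(ptr != NULL);
  uint32_t v;
  memcpy(&v, ptr, sizeof v);
  return v;
}

/* Write 4 bytes to an address that may not be 4-byte aligned. */
static void store4_unaligned(void *ptr, uint32_t v) {
  assert(ptr != NULL);
  memcpy(ptr, &v, sizeof v);
}
```

Unlike a cast such as `*(uint32_t *)ptr`, this form makes no alignment assumption in the C source, so the compiler is free to pick whatever access sequence the target supports.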

Here's where the first failure occurs and this halts the decode.
err: -3 lzfse_decode_base.c 224
Decode failed with size 0 into buffer 2706.

    //  Error if D is out of range, so that we avoid passing through
    //  uninitialized data or accesssing memory out of the destination
    //  buffer.
    if ((uint32_t)D > dst + L - s->dst_begin)  <- this is a compile warning already
      return LZFSE_STATUS_ERROR;

Running "Undefined Behavior Sanitizer" on the Xcode 64-bit build turns up these. There are also about 20 unsigned/signed mismatch warnings throughout, and many teams like ours have "warnings as errors" enabled. These may not be indicative of the unaligned load issue, but they seem worth investigating.

lzfse_decode_base.c:297:20: runtime error: addition of unsigned offset to 0x00011480a46f overflowed to 0x00011480a36a
lzfse_decode_base.c:243:20: runtime error: addition of unsigned offset to 0x00011480b106 overflowed to 0x00011480b105

The decode works for very small data sets, but anything above a given byte count fails.

Also, the growth loop in the decoder example seems to treat a return value of 0 the same as a return value equal to the destination length, even though 0 seems to be returned only in the error case.

Memory-related crash

Hello! I'm trying to fix a crash in my app (a WatchKit complication, to be precise). I have already searched all of the usual places and found no clues. I came here because the crash log contains lzfse symbols, so maybe you know the source of this bug.

The error I get:

Thread 1: EXC_RESOURCE (RESOURCE_TYPE_MEMORY: high watermark memory limit exceeded) (limit=15 MB)

The call stack:

Thread 1 Queue : com.apple.main-thread (serial)
#0	0x507c8864 in lzfseDecode ()
#1	0x507c788c in lzfse_decode_buffer ()
#2	0x507c645c in compression_decode_buffer ()
#3	0x52200ff0 in Deepmap2DecodeDefault ()
#4	0x5221d184 in DecodeTiledImage ()
#5	0x521ea5a4 in vImageDeepmap2Decode ()
#6	0x41ea601c in __CUIUncompressDeepmap2ImageData_block_invoke ()
#7	0x41ea074c in CUIUncompressDeepmap2ImageData ()
#8	0x41ea4548 in -[_CSIRenditionBlockData expandCSIBitmapData:fromSlice:makeReadOnly:] ()
#9	0x41ea1d48 in __csiCompressImageProviderCopyImageBlockSetWithOptions ()
#10	0x2d7cdcd8 in IIOImagePixelDataProvider::getBytesImageProvider(void*, unsigned long) ()
#11	0x2d7dac6c in PNGWritePlugin::writePNG(IIOImagePixelDataProvider*, IIODictionary*) ()
#12	0x2d7e1d64 in PNGWritePlugin::writeAll() ()
#13	0x2d7da558 in PNGWritePlugin::WriteProc(void*, void*, void*, void*) ()
#14	0x2d7cd044 in IIOImageDestination::finalizeDestination() ()
#15	0x2d807770 in CGImageDestinationFinalize ()
#16	0x244cd564 in ___lldb_unnamed_symbol212136 ()
#17	0x244cd0ec in ___lldb_unnamed_symbol212135 ()
#18	0x244cebfc in ___lldb_unnamed_symbol212193 ()
#19	0x2402f64c in ___lldb_unnamed_symbol166857 ()
#20	0x2402c180 in ___lldb_unnamed_symbol166740 ()
#21	0x2402cdec in ___lldb_unnamed_symbol166744 ()
#22	0x2402ca94 in ___lldb_unnamed_symbol166742 ()
#23	0x244cc404 in ___lldb_unnamed_symbol212112 ()
#24	0x244cce08 in ___lldb_unnamed_symbol212134 ()
#25	0x23b3f110 in ___lldb_unnamed_symbol123404 ()
#26	0x241fed00 in ___lldb_unnamed_symbol183123 ()
#27	0x241fce70 in ___lldb_unnamed_symbol183102 ()
#28	0x241fcf54 in ___lldb_unnamed_symbol183102 ()
#29	0x241fcf54 in ___lldb_unnamed_symbol183102 ()
#30	0x241fcf54 in ___lldb_unnamed_symbol183102 ()
#31	0x241fcf54 in ___lldb_unnamed_symbol183102 ()
#32	0x23c0e600 in ___lldb_unnamed_symbol130598 ()
#33	0x23c10a20 in ___lldb_unnamed_symbol130761 ()
#34	0x242f05e4 in ___lldb_unnamed_symbol193041 ()
#35	0x23c0dbf0 in ___lldb_unnamed_symbol130593 ()
#36	0x23c0e888 in ___lldb_unnamed_symbol130605 ()
#37	0x23c0e7ac in ___lldb_unnamed_symbol130604 ()
#38	0x23c0e9e8 in ___lldb_unnamed_symbol130607 ()
#39	0x27b5dc48 in ___lldb_unnamed_symbol5926 ()
#40	0x27ae5124 in ___lldb_unnamed_symbol2483 ()
#41	0x27ae4558 in ___lldb_unnamed_symbol2482 ()
#42	0x27b9f844 in ___lldb_unnamed_symbol7756 ()
#43	0x26214138 in _dispatch_call_block_and_release ()
#44	0x262159ac in _dispatch_client_callout ()
#45	0x2622246c in _dispatch_main_queue_drain ()
#46	0x262220c4 in _dispatch_main_queue_callback_4CF ()
#47	0x1d0aede4 in __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ ()
#48	0x1d08274c in __CFRunLoopRun ()
#49	0x1d0c76bc in CFRunLoopRunSpecific ()
#50	0x1baf5d78 in -[NSRunLoop(NSRunLoop) runMode:beforeDate:] ()
#51	0x1bb2ec14 in -[NSRunLoop(NSRunLoop) run] ()
#52	0x50520c3c in _xpc_objc_main ()
#53	0x50522d64 in _xpc_main ()
#54	0x50522f24 in xpc_main ()
#55	0x1bb66208 in -[NSXPCListener resume] ()
#56	0x3351142c in -[_EXRunningExtension resume] ()
#57	0x335112dc in -[_EXRunningExtension startWithArguments:count:] ()
#58	0x3352e52c in EXExtensionMain ()
#59	0x1c2644b8 in NSExtensionMain ()
#60	0x4e69167c in start ()

Looking at the stack, I assume that I have some sort of memory leak connected with a PNG image. Normal complication memory usage is 6.8 MB, but sometimes it starts to grow rapidly and reaches the 15 MB limit after 3-5 seconds.

My complication doesn't use any heavy images. I commented out all of them, I also commented out all custom fonts, just in case. The crash is still there.

Segmentation fault

On my Fedora 23 (x86_64) box, I get a segfault when using the lzfse CLI to compress asyoulik.txt from the Canterbury Corpus. After adding AddressSanitizer to get details (i.e., the segfault happens without it), I get:

$ make CFLAGS="-g -fsanitize=address -fno-omit-frame-pointer" && ./build/bin/lzfse -encode -i ~/local/src/squash-benchmark/asyoulik.txt -o compressed.lzfse
cc -g -fsanitize=address -fno-omit-frame-pointer -c src/lzfse_encode.c -o build/obj/lzfse_encode.o
cc -g -fsanitize=address -fno-omit-frame-pointer -c src/lzfse_decode.c -o build/obj/lzfse_decode.o
cc -g -fsanitize=address -fno-omit-frame-pointer -c src/lzfse_encode_base.c -o build/obj/lzfse_encode_base.o
cc -g -fsanitize=address -fno-omit-frame-pointer -c src/lzfse_decode_base.c -o build/obj/lzfse_decode_base.o
cc -g -fsanitize=address -fno-omit-frame-pointer -c src/lzvn_encode_base.c -o build/obj/lzvn_encode_base.o
cc -g -fsanitize=address -fno-omit-frame-pointer -c src/lzvn_decode_base.c -o build/obj/lzvn_decode_base.o
cc -g -fsanitize=address -fno-omit-frame-pointer -c src/lzfse_fse.c -o build/obj/lzfse_fse.o
ld -r -o ./build/obj/liblzfse_master.o ./build/obj/lzfse_encode.o  ./build/obj/lzfse_decode.o ./build/obj/lzfse_encode_base.o ./build/obj/lzfse_decode_base.o ./build/obj/lzvn_encode_base.o ./build/obj/lzvn_decode_base.o ./build/obj/lzfse_fse.o
ar rvs build/bin/liblzfse.a ./build/obj/liblzfse_master.o
ar: creating build/bin/liblzfse.a
a - ./build/obj/liblzfse_master.o
cc -g -fsanitize=address -fno-omit-frame-pointer -c src/lzfse_main.c -o build/obj/lzfse_main.o
cc -g -fsanitize=address -fno-omit-frame-pointer -o build/bin/lzfse ./build/obj/lzfse_main.o ./build/bin/liblzfse.a
=================================================================
==19712==ERROR: AddressSanitizer: unknown-crash on address 0x7fa27100b9ef at pc 0x00000040ac7d bp 0x7ffdbdf24760 sp 0x7ffdbdf24750
READ of size 8 at 0x7fa27100b9ef thread T0
    #0 0x40ac7c in load8 src/lzfse_internal.h:395
    #1 0x40ac7c in lzfse_encode_base src/lzfse_encode_base.c:707
    #2 0x402a83 in lzfse_encode_buffer src/lzfse_encode.c:91
    #3 0x401fb1 in main src/lzfse_main.c:221
    #4 0x7fa26fb01730 in __libc_start_main (/lib64/libc.so.6+0x20730)
    #5 0x4013c8 in _start (/home/nemequ/local/src/lzfse/build/bin/lzfse+0x4013c8)

0x7fa27100b9f6 is located 0 bytes to the right of 250358-byte region [0x7fa270fce800,0x7fa27100b9f6)
allocated by thread T0 here:
    #0 0x7fa26ff6b150 in realloc (/lib64/libasan.so.3+0xc7150)
    #1 0x4014b8 in lzfse_reallocf src/lzfse_main.c:34
    #2 0x401d70 in main src/lzfse_main.c:172
    #3 0x7fa26fb01730 in __libc_start_main (/lib64/libc.so.6+0x20730)

SUMMARY: AddressSanitizer: unknown-crash src/lzfse_internal.h:395 in load8
Shadow bytes around the buggy address:
  0x0ff4ce1f96e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff4ce1f96f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff4ce1f9700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff4ce1f9710: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff4ce1f9720: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0ff4ce1f9730: 00 00 00 00 00 00 00 00 00 00 00 00 00[00]06 fa
  0x0ff4ce1f9740: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0ff4ce1f9750: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0ff4ce1f9760: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0ff4ce1f9770: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0ff4ce1f9780: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==19712==ABORTING
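
In the report above, the 8-byte read starts 7 bytes before the end of the 250358-byte input buffer, so its last byte falls outside the allocation. One way to avoid that kind of over-read near the end of a buffer is a bounded load, sketched here; `load8_bounded` is a hypothetical helper and not the fix adopted by the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load up to 8 bytes starting at p without reading at or beyond end.
 * Bytes that are not available are left as zero. */
static uint64_t load8_bounded(const uint8_t *p, const uint8_t *end) {
  assert(p <= end);
  uint64_t v = 0;
  size_t avail = (size_t)(end - p);
  memcpy(&v, p, avail < sizeof v ? avail : sizeof v);
  return v;
}
```

An alternative with less per-load overhead is to stop the fast path 8 bytes before the end of the input, or to over-allocate the input buffer by a few bytes of padding.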

Support passing NULL as scratch_buffer

If passed NULL, the encode/decode functions should allocate the scratch buffer internally.

Compression of inputs larger than 2GiB degraded

lzfse uses int32_t internally to store offsets. In lzfse_encode_buffer_with_scratch() there is code which handles huge input sizes by processing it in blocks and translating the offsets so they never become too large.

lzfse/src/lzfse_encode.c

Lines 82 to 89 in e634ca5

    
    if (src_size >= 0xffffffffU) {
      //  lzfse only uses 32 bits for offsets internally, so if the input
      //  buffer is really huge, we need to process it in smaller chunks.
      //  Note that we switch over to this path for sizes much smaller
      //  2GB because it's actually faster to change algorithms well before
      //  it's necessary for correctness.
      //  The first chunk, we just process normally.
      const lzfse_offset encoder_block_size = 262144;

Initially there was a bug in this code where the blocking could be used on smaller input sizes than the block size, but the fix in af2993d introduced another problem -- the blocking is now only used for inputs larger than approximately 4GiB.

This means once the offsets approach 2GiB, the addition at

lzfse/src/lzfse_encode_base.c

Lines 685 to 687 in e634ca5

    
    int32_t ref = h.pos[k];
    if (ref + LZFSE_ENCODE_MAX_D_VALUE < pos)
      continue; // too far

results in integer overflow (undefined behavior). On x86_64 (I tried GCC and Clang) this makes the comparison true, skipping matches.

As the offsets exceed 2GiB, the cast at

lzfse/src/lzfse_encode_base.c

Line 665 in e634ca5

newH.pos[0] = (int32_t)pos;

stores a large negative value in the int32_t, which also makes the comparison mentioned above true.

One possible solution is to change the block handling so it is applied to smaller input sizes, another would be to use int64_t for offsets internally.
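
As an illustration of the overflow in the distance check, the comparison can be written so that no 32-bit addition takes place. This is only a sketch: `match_too_far` and `MAX_D_VALUE` are hypothetical names, the constant's value is assumed for the example, and `pos` is assumed to be a 64-bit offset.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value, for illustration only; the real limit is whatever
 * LZFSE_ENCODE_MAX_D_VALUE is defined as. */
#define MAX_D_VALUE 262139

/* Returns nonzero if the candidate at ref is further than MAX_D_VALUE
 * behind pos. The subtraction is done in 64 bits, so it cannot overflow
 * even when ref is close to INT32_MAX. */
static int match_too_far(int32_t ref, int64_t pos) {
  assert(ref >= 0 && pos >= ref);
  return (pos - (int64_t)ref) > MAX_D_VALUE;
}
```

This only removes the overflow in the comparison. Positions stored as int32_t would still go negative past 2GiB, so it does not replace either of the two solutions proposed above.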

As an example, here are the results for compressing the first 3GiB of enwik10 - note the compressed size:

$ ./lzfse -encode -i enwik3 -o enwik3.lzfse -v
LZFSE encode
Input: enwik3
Output: enwik3.lzfse
Input size: 3221225472 B
Output size: 1440940699 B
Compression ratio: 2.236
Speed: 25.10 ns/B, 38.00 MB/s

$ ./lzfse -decode -i enwik3.lzfse -o enwik3.de -v
LZFSE decode
Input: enwik3.lzfse
Output: enwik3.de
Input size: 1440940699 B
Output size: 3221225472 B
Compression ratio: 2.236
Speed: 2.92 ns/B, 326.40 MB/s

And here are the results using a 512MiB block size on inputs above 512MiB (the original code uses a 256KiB block size; I am unsure why):

$ ./lzfse -encode -i enwik3 -o enwik3.lzfse -v
LZFSE encode
Input: enwik3
Output: enwik3.lzfse
Input size: 3221225472 B
Output size: 1069320268 B
Compression ratio: 3.012
Speed: 22.80 ns/B, 41.82 MB/s

$ ./lzfse -decode -i enwik3.lzfse -o enwik3.de -v
LZFSE decode
Input: enwik3.lzfse
Output: enwik3.de
Input size: 1069320268 B
Output size: 3221225472 B
Compression ratio: 3.012
Speed: 2.61 ns/B, 365.21 MB/s

If there is any interest in fixing this I am happy to open a PR.

LZFSE doesn't work on big endian

I'm setting up CI for Squash on POWER8, and LZFSE is one of the codecs which fails. To make sure the issue isn't with Squash, I tried a simple test of just LZFSE:

[fedora@squash-be lzfse]$ ./build/bin/lzfse -encode < README.md | ./build/bin/lzfse -decode
malloc: Cannot allocate memory
[fedora@squash-be lzfse]$ uname -a
Linux squash-be.novalocal 4.5.7-202.fc23.ppc64 #1 SMP Thu Jun 30 10:45:03 UTC 2016 ppc64 ppc64 ppc64 GNU/Linux

There are lots of other errors for LZFSE in the Squash test suite (which is pretty evil) on ppc64be (sample log), so it could be helpful for testing.

If it would help, I can provide access to a VM for testing. Just e-mail me, or drop by #squash on freenode, and we can figure something out.
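
The report does not say where the failure comes from. A common source of big-endian breakage in codecs is reading or writing multi-byte fields in host byte order, and the sketch below shows the usual byte-order-independent alternative. It assumes, without confirmation from the report, that the stream stores such fields in little-endian order; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value regardless of host byte order. */
static uint32_t load32_le(const uint8_t *p) {
  assert(p != NULL);
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit value in little-endian order regardless of host byte order. */
static void store32_le(uint8_t *p, uint32_t v) {
  assert(p != NULL);
  p[0] = (uint8_t)(v & 0xff);
  p[1] = (uint8_t)((v >> 8) & 0xff);
  p[2] = (uint8_t)((v >> 16) & 0xff);
  p[3] = (uint8_t)((v >> 24) & 0xff);
}
```

Because the bytes are assembled with shifts rather than copied as a block, the result is the same on ppc64 big-endian and on x86_64.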

gzip switch compatibility

I wanted to test lzfse for serious work, but it seems it is not a drop-in replacement for gzip.

Please consider making a command that is switch compatible with gzip/bzip2/zstd/lzip/lzop/xz.

For my use I need 'cat foo | lzfse -1 | wc' and 'cat foo.lze | lzfse -d | wc' to work.
