
lzsa's Introduction

LZSA is a collection of byte-aligned compression formats that are specifically engineered for very fast decompression on 8-bit systems. It can compress files of any size by using blocks of a maximum size of 64 Kb with block-interdependent compression and up to 64 Kb of back-references for matches.

(Chart: Pareto frontier of compression ratio vs. decompression speed, measured on ZX Spectrum)

Check out The Hollow by Darklite and Offense, winner of the Solskogen 2019 wild compo, that uses LZSA on Z80.

Gabba by Stardust ranked 2nd in the ZX Spectrum demo compo at CAFe demoparty 2019 and also used LZSA on Z80.

Myst Demake for the Apple II by Vince Weaver, uses LZSA on 6502.

The 8-Bit Guy's Commander X16 ROM uses LZSA on 6502 as well.

RomWBW uses LZSA on Z80 for a variety of hobbyist computers.

The popular rasm assembler for Z80 features LZSA-compressed data sections.

The desolate game port to the ZX Spectrum uses LZSA compression on Z80.

Marsmare: Alienation, the winner of the recent Yandex Retro Games Battle 2020, is using LZSA to compress its assets.

The Lowtech demo for the Apple II+ and IIe, by Wiz/Imphobia, compresses data with LZSA.

The Druid & Droid game for the Amstrad CPC, also uses LZSA for compression.

The LZSA compression tool uses an aggressive optimal packing strategy: it tries to find the sequence of commands that yields the smallest packed file that decompresses to the original, while maintaining the maximum possible decompression speed.

The compression formats give the user choices that range from decompressing faster than LZ4 on 8-bit systems with better compression, to compressing as well as ZX7 with much better decompression speed. LZSA1 is designed to replace LZ4 and LZSA2 to replace ZX7, in 8-bit scenarios.

Compression ratio comparison between LZSA and other optimal packers, for a workload composed of ZX Spectrum and C64 files:

                     Bytes            Ratio            Decompression speed vs. LZ4
LZSA2                676681           52.49% <------   75%
MegaLZ 4.89          679041           52.68%           Not measured
ZX7                  687133           53.30%           47.73%
LZ5 1.4.1            727107           56.40%           75%
LZSA1                735785           57.08% <------   90%
Lizard -29           776122           60.21%           Not measured
LZ4_HC -19 -B4 -BD   781049           60.59%           100%
Uncompressed         1289127          100%             N/A

Performance over well-known compression corpus files:

                     Uncompressed     LZ4_HC -19 -B4 -BD    LZSA1                LZSA2
Canterbury           2810784          935827 (33.29%)       850792 (30.27%)      770877 (27.43%)
Silesia              211938580        77299725 (36.47%)     73706340 (34.78%)    68928564 (32.52%)
Calgary              3251493          1248780 (38.40%)      1192123 (36.67%)     1110290 (34.15%)
Large                11159482         3771025 (33.79%)      3648393 (32.69%)     3519480 (31.54%)
enwik9               1000000000       371841591 (37.18%)    355360043 (35.54%)   334900611 (33.49%)

As an example of LZSA1's simplicity, a size-optimized decompressor on Z80 has been implemented in 67 bytes.

The compressor is approximately 2X slower than LZ4_HC but compresses better while maintaining similar decompression speeds and decompressor simplicity.

The main differences between LZSA1 and the LZ4 compression format are:

  • The use of short (8-bit) match offsets where possible. The match-finder and optimizer cooperate to try and use the shortest match offsets possible.
  • Shorter encoding of lengths. As blocks are maximum 64 Kb in size, lengths can only be up to 64 Kb.
  • As a result of the smaller commands due to the possibly shorter match offsets, a minimum match size of 3 bytes instead of 4. The use of small matches is driven by the optimizer, and used where they provide gains.

As for LZSA2:

  • 5-bit, 9-bit, 13-bit and 16-bit match offsets, using nibble encoding
  • Rep-matches
  • Shorter encoding of lengths, also using nibbles
  • A minmatch of 2 bytes
  • No (slow) bit-packing. LZSA2 uses byte alignment in the hot path, and nibbles.

Inspirations:

  • LZ4 by Yann Collet.
  • LZ5/Lizard by Przemyslaw Skibinski and Yann Collet.
  • The suffix array intervals in Wimlib by Eric Biggers.
  • ZX7 by Einar Saukas
  • apc by Sven-Åke Dahl
  • Charles Bloom's compression blog

License:

  • The LZSA code is available under the Zlib license.
  • The match finder (matchfinder.c) is available under the CC0 license due to using portions of code from Eric Biggers' Wimlib in the suffix array-based matchfinder.

8-bit assembly code:

  • Z80 decompressors (size- and speed-optimized) written by introspec with optimizations by uniabis
  • 6502 and 8088 size-optimized improvements by Peter Ferrie
  • 6502 speed-optimized decompressor by John Brandwood
  • 8088 speed-optimized decompressor by Jim Leonard
  • 6809 decompressors (Tandy CoCo, Thomson MO/TO, Dragon 32/64..) optimized by Doug Masten
  • Hitachi 6309 decompressors (Tandy CoCo 3) also contributed by Doug Masten

Compressed format

Decompression code is provided for common 8-bit CPUs such as Z80 and 6502. However, if you would like to write your own, or understand the encoding, LZSA compresses data to a format that is fast and simple to decompress on 8-bit CPUs. It is encoded in either a stream of blocks, or as a single raw block, depending on command-line settings. The encoding is deliberately designed to avoid operations that are complicated on 8-bit CPUs (such as 16-bit math).

lzsa's People

Contributors

arm-in, dougmasten, emmanuel-marty, francois-berder, jbrandwood, marty-emmanuel, mistranger, mobygamer, odzhan, peterferrie, specke, uniabis


lzsa's Issues

gcc with C++11 incompatibilities

Marty, many thanks for the LZSA code, I re-used it for my Sav2Cart utility here: https://github.com/nzeemin/ukncbtl-utils/tree/master/Sav2Cartridge

Just FYI, when compiling the LZSA code under Ubuntu with gcc in C++11 mode, I found two small incompatibilities, both very easy to fix:

lzsa/shrink_block_v2.c:444:28: error: ‘for’ loop initial declarations are only allowed in C99 mode
                            for (int nn = n;
                            ^
lzsa/shrink_block_v2.c:444:28: note: use option -std=c99 or -std=gnu99 to compile your code
lzsa/matchfinder.c:135:4: error: ‘for’ loop initial declarations are only allowed in C99 mode
    for (int r = 1; r < nInWindowSize; r++) {
    ^
lzsa/matchfinder.c:135:4: note: use option -std=c99 or -std=gnu99 to compile your code

Format documentation typo in LZSA2.

Hello, Emmanuel.

Given the simplicity of LZSA2, I am using it as a replacement for the Heatshrink library. The role of the compression library is to assist in OS booting on a 32-bit ARMv7-A processor with megabytes of RAM, running at GHz speeds.

I am using a custom decoder that allows me to feed the decompressor byte-by-byte, just as is the case for Heatshrink. This has allowed me to cut boot times since I can do something else while I wait for more compressed data to stream into my RAM, even if the decompression is no longer very fast. The bottleneck is the flash memory, which I can access in parallel to the decompression, so this was a worthwhile trade.

I have been relying on the GitHub documentation in creating my decoder. While doing so I have encountered this puzzle:

If the encoded match length is 7 or more, the 'M' bits in the token form the value 7, and an extra nibble is read:
0-14: the value is added to the ***3*** stored in the token, and then the minmatch of 2 is added, to compose the final match length.

Shouldn't this be 7? Or 3 bits?

Thank you for the hard work and for the beautiful file format.

Suggestion: Support for streamed decompression?

Hi there, thanks for this great project.

A while ago, I wrote a custom 8-bit oriented LZ4 compressor/decoder (but nowhere near as optimal as lzsa!), mainly to solve a particular problem of "streamed decompression", where we want to partially decompress data on the fly but without requiring full access to all of the previously decompressed data stream.

This is useful in 8-bit scenarios for example where we might be decompressing video or audio data to be consumed byte-by-byte through a small in-memory buffer, and it is not practical nor desirable to decompress the whole thing in one go due to memory or latency constraints.

In my custom modification to LZ4 I achieved this by just limiting the window size (similar to BLOCK_SIZE in lzsa I suspect) for match offsets, and setting it to some user provided command line value (in my use-case anywhere from 256 bytes to 2048 bytes).

In this way, we know the decoder will never need to persist more than WINDOW_SIZE previously decompressed bytes in memory, so all we need is a WINDOW_SIZE memory buffer on the decoder side, and some fairly trivial helper functions to supply decompressed bytes one at a time from the compressed data stream. (I just implemented a simple state machine in my 6502 decoder to keep a track of ongoing literal and match runs to facilitate fetching of individual bytes)

Naturally, setting a smaller window size for match offsets will degrade compression ratio, but we can happily accept that trade-off in exchange for the streamed decompression capability. I still achieved pretty decent ratios even with a tiny 256 byte window.

In summary, do you think the ability to specify the maximum match offset window size would be a feasible possibility for lzsa to support?
Thanks!

Segmentation fault when compressing since commit on 2021-06-04

I tried updating my lzsa repo from its prior revision of 5cfec00 today. However, the compression crashed with a segmentation fault. A git bisect run ended up pointing to 5e404e9 as the first bad commit. This is on a Debian 11 amd64 server. Observe:

lzsa$ clang --version
Debian clang version 11.0.1-2
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
lzsa$ git checkout 65d6972 && make && lzsa -c -f2 --prefer-ratio -v debug.big /tmp/output.bin
Previous HEAD position was 5e404e9 Small LZSA2 ratio increase for some files
HEAD is now at 65d6972 Add link to streamed LZSA2 depacker for 8088
clang -O3 -g -fomit-frame-pointer -Isrc/libdivsufsort/include -Isrc -c src/../src/shrink_block_v2.c -o obj/src/shrink_block_v2.o
clang -O3 -g -fomit-frame-pointer -Isrc/libdivsufsort/include -Isrc -c src/../src/shrink_context.c -o obj/src/shrink_context.o
clang obj/src/lzsa.o obj/src/dictionary.o obj/src/expand_block_v1.o obj/src/expand_block_v2.o obj/src/expand_context.o obj/src/expand_inmem.o obj/src/expand_streaming.o obj/src/frame.o obj/src/matchfinder.o obj/src/shrink_block_v1.o obj/src/shrink_block_v2.o obj/src/shrink_context.o obj/src/shrink_inmem.o obj/src/shrink_streaming.o obj/src/stream.o obj/src/libdivsufsort/lib/divsufsort.o obj/src/libdivsufsort/lib/divsufsort_utils.o obj/src/libdivsufsort/lib/sssort.o obj/src/libdivsufsort/lib/trsort.o  -o lzsa
Compressed 'debug.big' in 0.358186 seconds, 0.26 Mb/s, 12710 tokens (7.66515 bytes/token), 97424 into 62853 bytes ==> 64.5149 %
Compared '/tmp/output.bin' in 0.000311 seconds, 298.748 Mb/s
lzsa$ git checkout 5e404e9 && make && lzsa -c -f2 --prefer-ratio -v debug.big /tmp/output.bin
Previous HEAD position was 65d6972 Add link to streamed LZSA2 depacker for 8088
HEAD is now at 5e404e9 Small LZSA2 ratio increase for some files
clang -O3 -g -fomit-frame-pointer -Isrc/libdivsufsort/include -Isrc -c src/../src/shrink_block_v2.c -o obj/src/shrink_block_v2.o
clang -O3 -g -fomit-frame-pointer -Isrc/libdivsufsort/include -Isrc -c src/../src/shrink_context.c -o obj/src/shrink_context.o
clang obj/src/lzsa.o obj/src/dictionary.o obj/src/expand_block_v1.o obj/src/expand_block_v2.o obj/src/expand_context.o obj/src/expand_inmem.o obj/src/expand_streaming.o obj/src/frame.o obj/src/matchfinder.o obj/src/shrink_block_v1.o obj/src/shrink_block_v2.o obj/src/shrink_context.o obj/src/shrink_inmem.o obj/src/shrink_streaming.o obj/src/stream.o obj/src/libdivsufsort/lib/divsufsort.o obj/src/libdivsufsort/lib/divsufsort_utils.o obj/src/libdivsufsort/lib/sssort.o obj/src/libdivsufsort/lib/trsort.o  -o lzsa
Segmentation fault

Dropping any of the switches doesn't help:

lzsa$ lzsa -c -f2 --prefer-ratio -v debug.big /tmp/output.bin
Segmentation fault
lzsa$ lzsa -c -f2 --prefer-ratio debug.big /tmp/output.bin
Segmentation fault
lzsa$ lzsa -c -f2 debug.big /tmp/output.bin
Segmentation fault
lzsa$ lzsa -c debug.big /tmp/output.bin
Segmentation fault
lzsa$ lzsa debug.big /tmp/output.bin
Segmentation fault

Test file is available at https://pushbx.org/ecm/test/20220323/debug.big -- however it does not seem like it matters, trying to compress the readme of lzsa itself also fails:

lzsa$ ./lzsa README.md /tmp/output.bin
Segmentation fault

improvement of literal run length & match length encoding

    low = LITERALS_RUN_LEN_V1 or MATCH_RUN_LEN_V1

    if (len >= low) {
        if ((len -= low) < 254)
            output len
        else if ((len -= 254) < 256)
            output 254, len
        else
            output 255, len & 255, len >> 8
    }

    EOD marker:
    output 0, 255, 255, 255   (because it is greater than BLOCK_SIZE)

Can't decompress raw block backwards on 6502

Likely my own fault, but here is what I have:

  • Compressed data using
        auto packed_size =
            lzsa_compress_inmem(..., LZSA_FLAG_RAW_BACKWARD | LZSA_FLAG_RAW_BLOCK, 0, 1);
  • Verified the result using the lzsa command line tool

  • Using 6502/decompress_fast_v1.asm for decompression, with -DBACKWARDS_DECOMPRESS=1

  • Packed data is at $880, unpacker at $400

  • Passing the address of the last byte (end-1) as source and a high memory location ($c000) as dest

I see the unpacker running, first reading 3 bytes, and then just writing and reading one byte at a time, going further and further down in memory, way past the beginning of the packed data, until it hits the depacker routine (at $400) and crashes.

The first 3 bytes read are the last 3 bytes of the compressed data (0x70 0x2c 0x00, in reverse order) like expected.

Any idea what could be wrong?

Bug in decompress_small_v1.S for 8088?

The following is compressed using lzsa on Linux: lzsa -f1 -r test.txt test.bin

00000000: 5468 6973 2069 7320 6120 7465 7374 2066  This is a test f
00000010: 6f72 204c 5a53 4131 2064 6563 6f6d 7072  or LZSA1 decompr
00000020: 6573 736f 7220 696e 2038 3038 3820 6173  essor in 8088 as
00000030: 7365 6d62 6c79 2e0a                      sembly..

Which results in the following:
00000000: 5054 6869 7320 fd7f 2961 2074 6573 7420  PThis ..)a test 
00000010: 666f 7220 4c5a 5341 3120 6465 636f 6d70  for LZSA1 decomp
00000020: 7265 7373 6f72 2069 6e20 3830 3838 2061  ressor in 8088 a
00000030: 7373 656d 626c 792e 0a00 ee00 00         ssembly......

However, when trying to decompress this using the decompress_small_v1.S, it doesn't appear to decompress properly.

00000000: 5468 6973 2000 0000 6120 7465 7374 2066  This ...a test f
00000010: 6f72 204c 5a53 4131 2064 6563 6f6d 7072  or LZSA1 decompr
00000020: 6573 736f 7220 696e 2038 3038 3820 6173  essor in 8088 as
00000030: 7365 6d62 6c79 2e0a                      sembly..

Any idea what's wrong and why "is" doesn't decompress? When I decompress with the lzsa app, it works fine.

8086 depacker for non-raw LZSA2 file

Hello,

I used your NASM source for the space-efficient 8088 LZSA2 raw decompressor to build my 8086 depacker for the lDOS/lDebug/RxDOS kernel file.

I use -f2 for the lzsa tool to create an LZSA2 compressed stream. My files are usually larger than 64 KiB so using raw blocks was not an option.

I adapted your source so it counts down the remaining lengths of the source and destination. This provides a simple type of error check. I also added checks for the end of the block data similar to your C source. I further made it so a back reference can cross a segment boundary by doing segment arithmetic. I changed the code not to use bp so I can use it as a stack frame base pointer throughout. Finally, I added support for overlapping source and destination buffers, which checks the far pointers after every match to ensure the source data is not corrupted.

Faster 6502 decompressor & suggestions for possible format changes.

Hi, I've been trying out LZSA and I am really impressed with the work that you've done!

I have written a 6502 LZSA2 decompressor that seems to be a bit more efficient than Peter Ferrie's code that you are currently distributing.

I've also identified a couple of potential changes that you might consider making to the LZSA2 format to make it decompress a bit better on the 6502.

Here are some test results (from running the 6502 code on a HuC6280 which has slightly different instruction timings to a standard 6502) ...

lzsa2: Peter Ferrie's 6502 FAST decompression (code 253 bytes long) of gpl-3.0.txt (35147 bytes decompressed)

2723148 cycles : 77 cycles per byte decompressed.

lzsa2: 6502 elmer-standard decompression (code 252 bytes long, 1 byte shorter) of gpl-3.0.txt (35147 bytes decompressed)

1937239 cycles with LZSA_BEST_SIZE==1 and LZSA_SLOW_NIBL==1 (29% faster than Peter Ferrie's code).

lzse2: 6502 elmer-modified decompression (code 243 bytes long, 10 bytes shorter) of gpl-3.0.txt (35147 bytes decompressed)

1912587 cycles with LZSA_BEST_SIZE==1 and LZSA_SLOW_NIBL==1 (30% faster than Peter Ferrie's code).

24652 cycles saved by changing format.

lzsa2: 6502 elmer-standard decompression (code 267 bytes long, 14 bytes longer) of gpl-3.0.txt (35147 bytes decompressed)

1813175 cycles with LZSA_BEST_SIZE==0 and LZSA_SLOW_NIBL==1 (33% faster than Peter Ferrie's code).

lzse2: 6502 elmer-modified decompression (code 258 bytes long, 5 bytes longer) of gpl-3.0.txt (35147 bytes decompressed)

1788523 cycles with LZSA_BEST_SIZE==0 and LZSA_SLOW_NIBL==1 (35% faster than Peter Ferrie's code).

1788523 cycles : 50 cycles per byte decompressed.

24652 cycles saved by changing format.

For comparison ...

aplib 6502 elmer-standard decompression (code 256 bytes long) of gpl-3.0.txt (35147 bytes decompressed)

2668144 cycles : 75 cycles per byte decompressed.

I'm attaching the decompressor here, and I have checked it into my fork of lzsa, together with the potential changes to the compressor that I wish to bring to your attention.

Note: Of the two potential changes, one seems like it wouldn't cause any slowdown in decompressors on other targets, but the second change probably would.

I can't currently see any simple and elegant way to make your compressor apply a platform-specific change to the output format at runtime, but I don't think the changes I propose would justify calling them a separate format.

Anyway, thanks for your consideration!

lzsa2_6502.txt

[Question] Cortex M4 assembly decompressor

Hi Emmanuel,

Before I spend time learning ARM/Cortex M4 assembly, have you already written an assembly or C-optimized version of the decompressor?

My hardware targets are STM32F4 and ATSMAD51.

Thanks!

13-bit match offset description error

There's an error in the description of the 13-bit match offset in BlockFormat_LZSA2.md.
You write:

10Z 13-bit offset: read a nibble for offset bits 9-12 and use the inverted bit Z for bit 8 of the offset, then read a byte for offset bits 0-7. set bits 13-15 of the offset to 1.

But you don't mention subtracting 512 from the resulting value.

When creating my own unpacking code, I couldn't understand the reason for the error, and found this value of 512 only in your sources.


And one more note about stream format (StreamFormat.md)

You write:

0    1                2
0x7b 0x9e             7 6 5 4 3 2 1
                      V V V Z Z Z Z
<--- signature --->   <- traits ->

Trait bits:

  • V: 3 bit code that indicates which block data encoding is used. 0 is LZSA1 and 2 is LZSA2.
  • Z: these bits in the traits are set to 0 for LZSA1 and LZSA2.

Where is bit zero?
Your packer writes 20h into the traits byte for LZSA2 and 0 for LZSA1. So there must either be four V bits (if V = 2 means LZSA2), or five Z bits (but then V = 1 would mean LZSA2, not V = 2).

Feature request

Hi there,

I have been using an LZ packer derived from doynax-lz for a while with my 8-bit loader system bitfire on the C64. The pack ratio is about the same as yielded by lzsa. I had a look into the faster depacker, saved a few bytes, and found a few spots to squeeze out a few cycles. Your depacker is really fast. I am thinking of adding this packer to my loader suite, or might even switch completely to lzsa. So far I lack support for load addresses on the packed files, as well as in-place depacking that allows for depacking without overlap. I think this could also be implemented in lzsa, just as done in my LZ: by treating everything that would reach into the overlap as literals, and prepending the end position to the file.

Greetings Toby aka bitbreaker/performers^oxyron

lzsa v1.3.6 crashes when built under linux

Hello,

I've updated from a previous version 1.2.0 of lzsa, which compiled and ran fine under Linux, to 1.3.6.

The new version compiles successfully but fails when executed, reporting

*** stack smashing detected ***: terminated

Failure can be invoked using:

lzsa -test

I am not adept at Linux or C, but I am available to assist in testing to help resolve this; please let me know what I can do.

Thanks.

8088 decompressors won't run on 8088

Hi, there are a few instances of this:

shl al,2

...which actually require an 80186 or higher to work. These should be replaced with:

shl al,1
shl al,1

typo?

In shrink_block_v2.c, lines 1033-1044: the comment is not correct, because the @param nStartOffset and @param nEndOffset parameters do not exist.
