main-- / rust-lz-fear
A fast pure-rust no-unsafe implementation of LZ4 compression and decompression
Home Page: https://docs.rs/lz-fear
License: MIT License
Hi,
I've been looking over some solutions for compression on embedded Rust, e.g. a no_std environment.
Does this crate require the standard library?
Thanks, Sam
Really like this crate, nice work @main--!
I bumped into a decompression that fails. The input is truncated below, but here's a link to the full code.
use std::io::Read; // brings `read_to_end` into scope

fn main() {
let input = [0x80, 0x01, 0x04, 0x90, 0x4E, 0xD5, 0x8D, 0xA5, 0xDF, 0xA8, 0xE3, 0xDA, 0xE9, 0x3B, 0xE9, 0x8E, 0xF9, 0xA0, 0xDE, 0x81, 0xC2, 0x84, 0xF9, 0x01, 0x40, 0x40, 0x40, 0x40, 0x87, 0x8C, 0x65, 0x58, 0xFB, 0xF2, 0xB9, 0x23, 0xDF, 0xCB, 0xE2, 0x07, 0x34, 0x2A, 0x92, 0x15, 0x3C, 0xA1, 0x8E, 0x76, 0x1F, 0xB2, 0xED, 0x03, 0xA0, 0xEB, 0x04, 0xA2, 0x25, 0x64, 0x78, 0x2A, 0x64, 0xBF, 0x2E, 0x65, 0x9F, 0xB3, 0x7A, 0x2D, 0x17, 0xC7, 0xBC, 0x67, 0x29, 0x3E, 0x9F, 0x76, 0xA0, 0xAE, 0xAC, 0xB0, 0x4B, 0x5C, 0x30, 0xC9, 0xD0, 0x41, 0x3C, 0x8C, 0x54, 0x24, 0x38, 0x11, 0xEE, 0x19, 0xDF, 0xB5, 0x72, 0xAA, 0x8A, 0x58, 0xF1, 0x2E, 0xCB, 0x29, 0x97, 0x8D, 0xBA, 0xEC, 0xAA, 0x63, 0x1D, 0x29, 0xCB, 0xDB, 0x69, 0x06, 0x90, 0xF5, 0xA0, 0x75, 0xE5, 0x6B, 0x2F, 0xBD, 0xE7, 0xDD, 0xB8, 0x19, 0xDB, 0x1A, 0xB9, 0xF3, 0x0F, 0x37, 0xF3, 0x02, 0xA9, 0x7D, 0x07, 0x16, 0x90, 0x48, 0x1A, 0x2B, 0xB4, 0x3E, 0x7E, 0xDD, 0x3E, 0xFE, 0x15, 0xA1, 0xDB, 0x56, 0x20, 0xDF, 0xB2, 0xD7, 0xA7, 0x75, 0xCF, 0xEC, 0xAD, 0x97, 0x38, 0xE3, 0x6E, 0x1D, 0x1C, 0x51, 0xE9, 0x5A, 0x40, 0x7D, 0xD7, 0xCC, 0x1D, 0x2A, 0x25, 0x31, 0xD4, 0xEC, 0x1C, 0x8D, 0x2D, 0x43, 0x45, 0x11, 0xC5, 0x02, 0xE2, 0xAE, 0xF1, 0xDC, 0x4E, 0xD9, 0x87, 0x0D, 0xB6, 0xB7, 0x3A, 0x29, 0x78, 0xE3, 0x53, 0xE8, 0xDD, 0x71, 0xC8, 0x2E, 0x4A, 0x4C, 0x9A, 0x26, 0x4A, 0xE9, 0x36, 0x08, 0x17, 0x68, 0x18, 0x3F, 0x13, 0x04, 0xCA, 0x61, 0x45, 0x66, 0x9A, 0xC1, 0x09, 0xEB, 0xCA, 0x8F, 0x50, 0xA3, 0xFA, 0x0E, 0x28, 0x37, 0xD3, 0xCA, 0xC4, 0x38, 0xD5, 0x6C, 0x79, 0x9B, 0x1F, 0x8F, 0xA8, 0x8E, 0x91, 0x87, 0x09, 0x31, 0xFB, 0x75, 0xCA, 0xDA, 0xC9, 0x1C, 0x3D, 0xD6, 0xF3, 0x79, 0x87, 0xF9, 0xEE, 0x85, 0x19, 0x9C, 0x6A, 0xC8, 0xA9, 0xA4, 0x76, 0x61, 0x8A, 0xD8, 0x51, 0x3C, 0x70, 0x4E, 0x79, 0x19, 0x58, 0xD5, 0x66, 0x77, 0xC2, 0x71, 0x4D, 0xE3, 0xDB, 0xB2, 0x3E, 0xB4, 0x05, 0x43, 0x62, 0xEB, 0x01, 0xD2, 0x74, 0xA9, 0xD4, 0x7A, 0xCC, 0xB8, 0x69, 0x08, 0xE8, 0x99, 0x28, 0x2C, 0xE9, 0xFC, 0x58, 0x69, 0x68, 0x4B, 0x48, 0xCC, 0x76, 0xFA, 0x83, 0x04, 0x78, 0xA7, 0xF6, 
0x20, 0xF2, 0x59, 0x65, 0x23, 0x49, 0xD0, 0x54, 0x77, 0x33, 0xC8, 0xD8, 0xE5, 0x20, 0xB3, 0xB2, 0x76, 0x3C, 0x5E, 0x55, 0x87, 0xFB, 0xF7, 0x0B, 0x89, 0xD7, 0xF7, 0x2B, 0xD1, 0xA5, 0x66, 0x4F, 0x84, 0x94, 0x44, 0xBD, 0x65, 0x5D, 0x15, 0x27, 0x10, 0xC3, 0x21, 0xC2, 0xB4, 0xC7, 0x90, 0x95, 0x02, 0x7D, 0x28, 0x6E, 0xD1, 0xF4, 0xE1, 0x5D, 0x83, 0x79, 0xF5, ...];
    let mut compressed = vec![];
    lz_fear::framed::CompressionSettings::default()
        .independent_blocks(false)
        .block_size(64 * 1024)
        .compress(&input[..], &mut compressed)
        .unwrap();
    let mut decompressed = vec![];
    lz_fear::framed::LZ4FrameReader::new(&*compressed)
        .unwrap()
        .into_read()
        .read_to_end(&mut decompressed)
        .unwrap();
    assert_eq!(&input[..], &*decompressed);
}
"underlying IO error: the raw LZ4 decompression failed (data corruption?)"
The compressed bytes are identical to the output of the lz4 crate with default config.
Right now it looks like the output from dictionary mode works, but never matches the C implementation. The most important reason is that the C implementation maintains two distinct hashtables (one for the dictionary, one for the encoded input) whereas this implementation currently simply initializes the encoder hashtable with the dictionary.
Using two tables obviously gives better compression ratios. It's not clear to me however whether this is actually worth it. Perhaps it makes sense to give up on the goal of byte-perfect output when a dictionary is involved.
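To make the difference concrete, here is a minimal sketch of the two-table idea, assuming a single-slot hash table keyed on 4-byte sequences. All names (`hash4`, `build_table`, `lookup`, `Source`) and the table size are hypothetical and are not taken from lz-fear or the C implementation. With one shared table, each input position written into a slot replaces whatever dictionary position was there; with two tables, a lookup can still fall back to the dictionary.

```rust
const TABLE_BITS: u32 = 12; // hypothetical table size, chosen for illustration
const TABLE_SIZE: usize = 1 << TABLE_BITS;
const EMPTY: u32 = u32::MAX; // marker for an unused slot

#[derive(Debug, PartialEq)]
enum Source {
    Input,
    Dict,
}

/// Hashes the first four bytes of `bytes` into a table slot.
/// Panics if fewer than four bytes are given.
fn hash4(bytes: &[u8]) -> usize {
    let v = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    (v.wrapping_mul(2654435761u32) >> (32 - TABLE_BITS)) as usize
}

/// Records, for every 4-byte sequence in `data`, the last position it was seen at.
fn build_table(data: &[u8]) -> Vec<u32> {
    let mut table = vec![EMPTY; TABLE_SIZE];
    for pos in 0..data.len().saturating_sub(3) {
        table[hash4(&data[pos..])] = pos as u32;
    }
    table
}

/// Two-table lookup: prefer a candidate from the input seen so far,
/// and fall back to the dictionary table if that slot is empty.
fn lookup(input_table: &[u32], dict_table: &[u32], slot: usize) -> Option<(Source, u32)> {
    if input_table[slot] != EMPTY {
        Some((Source::Input, input_table[slot]))
    } else if dict_table[slot] != EMPTY {
        Some((Source::Dict, dict_table[slot]))
    } else {
        None
    }
}
```

In this sketch the dictionary table is built once with `build_table(dict)` and never modified, while the input table starts empty and is filled as encoding proceeds.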
Hello and thank you for this project.
The speed is incredible, though if possible I'd like to trade some of that speed for a better compression ratio. How can I do this?
Thanks again!
With #6 out of the way, decoding fuzzing has discovered another issue in less than 5 minutes - a panic with message 'end drain index (is 7) should be <= len (is 0)'
Sample input triggering the bug, gzipped so that github would accept the upload: lz4-fear-panic.lz4.gz
Code to reproduce is in #5
Backtrace:
thread '<unnamed>' panicked at 'end drain index (is 7) should be <= len (is 0)', src/liballoc/vec.rs:1331:13
stack backtrace:
0: backtrace::backtrace::libunwind::trace
at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
1: backtrace::backtrace::trace_unsynchronized
at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
2: std::sys_common::backtrace::_print_fmt
at src/libstd/sys_common/backtrace.rs:78
3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
at src/libstd/sys_common/backtrace.rs:59
4: core::fmt::write
at src/libcore/fmt/mod.rs:1069
5: std::io::Write::write_fmt
at src/libstd/io/mod.rs:1504
6: std::sys_common::backtrace::_print
at src/libstd/sys_common/backtrace.rs:62
7: std::sys_common::backtrace::print
at src/libstd/sys_common/backtrace.rs:49
8: std::panicking::default_hook::{{closure}}
at src/libstd/panicking.rs:198
9: std::panicking::default_hook
at src/libstd/panicking.rs:218
10: libfuzzer_sys::initialize::{{closure}}
11: std::panicking::rust_panic_with_hook
at src/libstd/panicking.rs:515
12: rust_begin_unwind
at src/libstd/panicking.rs:419
13: core::panicking::panic_fmt
at src/libcore/panicking.rs:111
14: alloc::vec::Vec<T>::drain::end_assert_failed
at src/liballoc/vec.rs:1331
15: lz_fear::framed::decompress::LZ4FrameReader<R>::decode_block
16: <lz_fear::framed::decompress::LZ4FrameIoReader<R> as std::io::Read>::read
17: rust_fuzzer_test_input
18: LLVMFuzzerTestOneInput
19: _ZN6fuzzer6Fuzzer15ExecuteCallbackEPKhm
20: _ZN6fuzzer10RunOneTestEPNS_6FuzzerEPKcm
21: _ZN6fuzzer12FuzzerDriverEPiPPPcPFiPKhmE
22: main
23: __libc_start_main
24: _start
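The backtrace points at `Vec::drain` being called with a range end larger than the vector's length. A minimal sketch of the usual guard follows, assuming the drain length comes from untrusted input; `drain_front` is a hypothetical helper and not part of lz-fear.

```rust
/// Removes the first `n` bytes of `buf` and returns them.
/// Returns an error instead of panicking when fewer than `n` bytes are buffered.
fn drain_front(buf: &mut Vec<u8>, n: usize) -> Result<Vec<u8>, String> {
    if n > buf.len() {
        return Err(format!(
            "need {} bytes but only {} are buffered",
            n,
            buf.len()
        ));
    }
    // The range is now known to be in bounds, so `drain` cannot panic here.
    Ok(buf.drain(..n).collect())
}
```

Calling `drain_front(&mut Vec::new(), 7)` returns an `Err` that the decoder could surface as a corruption error, rather than aborting the thread.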
The C implementation uses the U16Table for small inputs, but our framed encoder currently always uses the U32Table.
(compress2)
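A sketch of how the selection could look, assuming the choice depends only on input length. `TableKind`, `pick_table` and the `U16_LIMIT` threshold are hypothetical; the exact cutoff used by the C implementation is not reproduced here. The idea is that every position in an input of at most 64 KiB fits in a `u16`, so entries can be half as wide.

```rust
/// Hypothetical threshold: inputs shorter than this have only positions
/// in 0..=65534, all of which fit in a u16.
const U16_LIMIT: usize = 1 << 16;

#[derive(Debug, PartialEq)]
enum TableKind {
    U16,
    U32,
}

/// Picks the narrower table whenever every input position fits in 16 bits.
fn pick_table(input_len: usize) -> TableKind {
    if input_len < U16_LIMIT {
        TableKind::U16
    } else {
        TableKind::U32
    }
}
```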
The attached files, most of which are under 1 KB in size, require more than 2 GB of RAM to decode. Worse, this occurs in an allocation internal to the code: the memory will be consumed even if the client code uses 4 KB buffers.
Sample files triggering the issue: lz4-fear-decode-ooms.zip
The code to reproduce the issue can be found in #5
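One common way to avoid this class of problem is sketched below, under the assumption that the allocation is sized from a length field in the untrusted input. `read_block` and `MAX_BLOCK_SIZE` are hypothetical names, not lz-fear APIs. The cap of 4 MiB matches the largest block size the LZ4 frame format allows, and the buffer grows only as bytes actually arrive instead of being allocated up front.

```rust
use std::io::{self, Read};

/// Hypothetical cap; 4 MiB is the largest block size in the LZ4 frame format.
const MAX_BLOCK_SIZE: usize = 4 * 1024 * 1024;

/// Reads exactly `declared_len` bytes, where `declared_len` comes from untrusted input.
fn read_block<R: Read>(reader: &mut R, declared_len: usize) -> io::Result<Vec<u8>> {
    if declared_len > MAX_BLOCK_SIZE {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "declared block size exceeds the maximum",
        ));
    }
    let mut buf = Vec::new();
    // `take` bounds the read; the Vec grows with the data that is really there,
    // so a tiny file cannot force a huge allocation.
    let read = reader
        .by_ref()
        .take(declared_len as u64)
        .read_to_end(&mut buf)?;
    if read != declared_len {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "block is shorter than declared",
        ));
    }
    Ok(buf)
}
```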
Decoding the attached file using code from #5 results in a panic:
end drain index (is 33073) should be <= len (is 33058)
Input triggering the crash, gzipped so that github would accept the upload:
lz4-fear-drain-index-panic.lz4.gz
Backtrace:
thread '<unnamed>' panicked at 'end drain index (is 33073) should be <= len (is 33058)', src/liballoc/vec.rs:1331:13
stack backtrace:
0: backtrace::backtrace::libunwind::trace
at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
1: backtrace::backtrace::trace_unsynchronized
at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
2: std::sys_common::backtrace::_print_fmt
at src/libstd/sys_common/backtrace.rs:78
3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
at src/libstd/sys_common/backtrace.rs:59
4: core::fmt::write
at src/libcore/fmt/mod.rs:1069
5: std::io::Write::write_fmt
at src/libstd/io/mod.rs:1504
6: std::sys_common::backtrace::_print
at src/libstd/sys_common/backtrace.rs:62
7: std::sys_common::backtrace::print
at src/libstd/sys_common/backtrace.rs:49
8: std::panicking::default_hook::{{closure}}
at src/libstd/panicking.rs:198
9: std::panicking::default_hook
at src/libstd/panicking.rs:218
10: libfuzzer_sys::initialize::{{closure}}
11: std::panicking::rust_panic_with_hook
at src/libstd/panicking.rs:515
12: rust_begin_unwind
at src/libstd/panicking.rs:419
13: core::panicking::panic_fmt
at src/libcore/panicking.rs:111
14: alloc::vec::Vec<T>::drain::end_assert_failed
at src/liballoc/vec.rs:1331
15: lz_fear::framed::decompress::LZ4FrameReader<R>::decode_block
16: <lz_fear::framed::decompress::LZ4FrameIoReader<R> as std::io::Read>::read
17: rust_fuzzer_test_input
18: LLVMFuzzerTestOneInput
19: _ZN6fuzzer6Fuzzer15ExecuteCallbackEPKhm
20: _ZN6fuzzer10RunOneTestEPNS_6FuzzerEPKcm
21: _ZN6fuzzer12FuzzerDriverEPiPPPcPFiPKhmE
22: main
23: __libc_start_main
24: _start