
ionescu007 / minlzma

Stars: 339, Watchers: 12, Forks: 28, Size: 263 KB

The Minimal LZMA (minlzma) project aims to provide a minimalistic, cross-platform, highly commented, standards-compliant C library (minlzlib) for decompressing LZMA2-encapsulated compressed data in LZMA format within an XZ container, as can be generated with Python 3.6, 7-Zip, and XZ Utils.

License: MIT License

Languages: C 96.80%, CMake 3.20%
Topics: compression-algorithm compression-library lzma lzma2 lzma-sdk lzmasdkoc arithmetic-coding range-coder lempel-ziv markov-chain

minlzma's People

Contributors

fizbin, gtanadam, ionescu007


minlzma's Issues

Support uncompressed lzma2 packets

Support for uncompressed packets is not really an optional feature. Without it, whether minlzdec succeeds depends on the content of the original file: when part of the input does not compress (random data, for example), the encoder stores it as an uncompressed LZMA2 packet, and minlzdec cannot decode the result.

For example, this works as expected:

$ rm testnoise*
$ (perl -e 'print("This is a test file.\n" x 40)'; dd if=/dev/urandom bs=512 count=1; perl -e 'print("This is a test file.\n" x 40)';) > testnoise
1+0 records in
1+0 records out
512 bytes transferred in 0.000082 secs (6242685 bytes/sec)
$ shasum -a 256 -b testnoise
76dddc82b3e6f1eb29cdb88955f920fe5bb120bba9dcdc9ae49c0ced23fcfa9a *testnoise
$ xz -C crc32 testnoise
$ ls testnoise*
testnoise.xz
$ minlzdec/minlzdec testnoise.xz testnoise
minlzdec v.1.0.5 -- http://ionescu007.github.io/minlzma
Copyright(c) 2020 Alex Ionescu (@aionescu)

Input file size: 628
Decompressed file will be 2192 bytes (0.286496% ratio)
Decompressed 2192 bytes
$ shasum -a 256 -b testnoise
76dddc82b3e6f1eb29cdb88955f920fe5bb120bba9dcdc9ae49c0ced23fcfa9a *testnoise

However, if we change the original (uncompressed) data so that the random section is much larger (800 blocks of 512 bytes instead of one), decoding fails:

$ rm testnoise*
$ (perl -e 'print("This is a test file.\n" x 40)'; dd if=/dev/urandom bs=512 count=800; perl -e 'print("This is a test file.\n" x 40)';) > testnoise
800+0 records in
800+0 records out
409600 bytes transferred in 0.019239 secs (21290144 bytes/sec)
$ shasum -a 256 -b testnoise
e3ec02171cc7ff79912d5a7574ec339095fdf44bebdff76f395e45cebf6f827c *testnoise
$ xz -C crc32 testnoise
$ ls testnoise*
testnoise.xz
$ minlzdec/minlzdec testnoise.xz testnoise
minlzdec v.1.0.5 -- http://ionescu007.github.io/minlzma
Copyright(c) 2020 Alex Ionescu (@aionescu)

Input file size: 411168
Decoding failed after 0 bytes
$ shasum -a 256 -b testnoise
shasum: testnoise: 

(Also note that files produced with xz's default settings cannot be decoded by minlzdec, because xz uses CRC64 integrity checks by default; hence the -C crc32 option above.)
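As background for this issue: LZMA2 data is a sequence of chunks, each introduced by a control byte. 0x00 ends the data, 0x01 and 0x02 introduce uncompressed chunks (0x01 also resets the dictionary), and 0x80 to 0xFF introduce range-coded LZMA chunks. When a chunk does not compress, as with the 409,600 bytes of random data above, xz stores it as an uncompressed chunk, so a decoder that only handles LZMA chunks fails on it. The C sketch below parses one chunk header following that layout. The type and function names are hypothetical and not part of minlzlib.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk kinds selected by the LZMA2 control byte. */
typedef enum {
    LZMA2_END,          /* 0x00: end of LZMA2 data */
    LZMA2_UNCOMPRESSED, /* 0x01 (dictionary reset) or 0x02 (no reset) */
    LZMA2_LZMA          /* 0x80..0xFF: range-coded LZMA chunk */
} Lzma2ChunkKind;

typedef struct {
    Lzma2ChunkKind kind;
    bool dict_reset;        /* 0x01, or an LZMA chunk with reset level 3 */
    unsigned reset_level;   /* LZMA chunks only: bits 5-6 of the control byte */
    uint32_t unpacked_size; /* bytes this chunk produces */
    uint32_t packed_size;   /* payload bytes that follow the header */
    bool has_props;         /* a properties byte follows (reset level >= 2) */
    uint8_t props;
} Lzma2ChunkHeader;

/* Parses one chunk header. Returns the header length in bytes,
   or 0 if the input is truncated or the control byte is invalid. */
static size_t lzma2_parse_chunk_header(const uint8_t *buf, size_t len,
                                       Lzma2ChunkHeader *out)
{
    Lzma2ChunkHeader zero = {0};
    *out = zero;
    if (len < 1) return 0;
    uint8_t control = buf[0];

    if (control == 0x00) {
        out->kind = LZMA2_END;
        return 1;
    }
    if (control == 0x01 || control == 0x02) {
        /* Stored chunk: big-endian 16-bit (size - 1), then raw bytes. */
        if (len < 3) return 0;
        out->kind = LZMA2_UNCOMPRESSED;
        out->dict_reset = (control == 0x01);
        out->unpacked_size = (((uint32_t)buf[1] << 8) | buf[2]) + 1;
        out->packed_size = out->unpacked_size;
        return 3;
    }
    if (control < 0x80) return 0; /* 0x03..0x7F are invalid */

    /* LZMA chunk: bits 0-4 hold bits 16-20 of (unpacked size - 1). */
    if (len < 5) return 0;
    out->kind = LZMA2_LZMA;
    out->reset_level = (control >> 5) & 0x03;
    out->dict_reset = (out->reset_level == 3);
    out->unpacked_size = ((((uint32_t)control & 0x1F) << 16) |
                          ((uint32_t)buf[1] << 8) | buf[2]) + 1;
    out->packed_size = (((uint32_t)buf[3] << 8) | buf[4]) + 1;
    out->has_props = (out->reset_level >= 2);
    if (out->has_props) {
        if (len < 6) return 0;
        out->props = buf[5];
        return 6;
    }
    return 5;
}
```

In a decoder loop, an uncompressed chunk's packed_size bytes would be copied straight into the dictionary and output, while only LZMA chunks go through the range decoder.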

Certain files fail to decompress

I have a series of LZMA-compressed files hosted on an HTTP server that I plan to download and decompress with minlzma. The issue is that some files fail to decompress for unknown reasons. I know they are valid files because the Linux xz command and functions from liblzma decompress them just fine, but they fail to decompress with minlzma.

I did some research into a few of the files, and it seems that when the "reset state" is 1, that case is unhandled, which causes decompression to fail. I tried to implement my own fix for this; however, I know very little about the LZMA compression method and may have done it incorrectly. My attempt can be found here. It allowed many more files to decompress successfully, but some still failed, and I did not look into those very deeply.
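For reference, the "reset state" is the two-bit field in bits 5 and 6 of an LZMA chunk's control byte. In the LZMA2 format as produced by xz, value 0 continues with the current state, value 1 resets the LZMA state (probabilities, state machine, and repeat distances) while keeping the dictionary and properties, value 2 additionally supplies a new properties byte, and value 3 also resets the dictionary. A minimal C sketch of that mapping follows; the enum and function names are hypothetical, not minlzlib's.

```c
#include <stdint.h>

/* Hypothetical flags for what an LZMA2 LZMA chunk asks the decoder to reset. */
enum {
    LZMA2_RESET_STATE = 1u << 0, /* reinitialize probabilities, state, reps */
    LZMA2_NEW_PROPS   = 1u << 1, /* a new lc/lp/pb properties byte follows */
    LZMA2_RESET_DICT  = 1u << 2  /* discard the dictionary (sliding window) */
};

/* Maps the control byte of an LZMA chunk (0x80..0xFF) to reset actions. */
static unsigned lzma2_reset_actions(uint8_t control)
{
    switch ((control >> 5) & 0x03) {
    case 0:  return 0; /* continue with the existing state */
    case 1:  return LZMA2_RESET_STATE;
    case 2:  return LZMA2_RESET_STATE | LZMA2_NEW_PROPS;
    default: return LZMA2_RESET_STATE | LZMA2_NEW_PROPS | LZMA2_RESET_DICT;
    }
}
```

Under this reading, handling a reset state of 1 means running the same state initialization a full reset performs, without touching the dictionary or the current properties.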

I have attached a zip file called "reset-state", which contains files that failed for me because of the reset state, and one called "misc-fail", which contains files that failed for some other, unknown reason.

reset-state.zip
misc-fail.zip

If you wish to view the entire repository of LZMA-compressed files, they are currently hosted here:
https://svn.openfortress.fun/files/

And for reference, this is how I tried implementing minlzma in the launcher:
https://github.com/Walaryne/oflauncher-stainless/blob/bc03388f1bd427af906511d3d72de7f2d40278f9/src/launcher/net/OFSNet.cpp#L61

warning: multi-character character constant

When compiling minlzdec with zig cc (a clang wrapper), I get the following warning:

./minlzlib/xzstream.h:39:40: warning: multi-character character constant [-Wmultichar]
const uint16_t k_XzStreamFooterMagic = 'ZY';
                                       ^
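The warning exists because the value of a multi-character constant such as 'ZY' is implementation-defined in C. The XZ stream footer ends with the bytes 0x59 0x5A ("YZ"), and on GCC, Clang, and MSVC 'ZY' evaluates to 0x5A59, which matches those bytes read as a little-endian 16-bit value. A minimal warning-free alternative, assuming the footer is read little-endian as the original constant implies; XZ_FOOTER_MAGIC and xz_footer_magic_ok are hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Footer magic "YZ" (0x59 0x5A) as a little-endian 16-bit value,
   written in hex to avoid an implementation-defined 'ZY' constant. */
#define XZ_FOOTER_MAGIC ((uint16_t)0x5A59)

/* Byte-wise check of the 12-byte stream footer (CRC32, backward size,
   stream flags, magic); independent of host endianness. */
static bool xz_footer_magic_ok(const uint8_t footer[12])
{
    return footer[10] == 0x59 && footer[11] == 0x5A;
}
```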

static_assert is not a C11 keyword

static_assert is a C++ keyword; in C11 it is only a convenience macro defined in <assert.h>, so this does not compile with a pure C compiler (clang via LLVM for Windows, in my case) unless that header is included.

_Static_assert is the appropriate keyword in C11.

However, MSVC's cl compiler is primarily a C++ compiler and (at the time of writing) doesn't really implement C11, so using _Static_assert unconditionally is not portable either.

An important thing to note here is that when checking for MSVC in CMake, you cannot rely purely on the MSVC variable. When using the clang-cl driver for Visual Studio (configured with CMake by passing -T clangcl), CMake will set MSVC to true, since clang-cl is an MSVC-style compiler. But being MSVC-style doesn't mean it is 100% compatible.

clang and clang-cl on Windows work properly with _Static_assert since they both compile .c files as C11. Neither will work with static_assert.

MSVC's cl compiler will compile with static_assert but will not compile with _Static_assert.
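One way to avoid depending on CMake's MSVC variable is to select the assertion form from preprocessor feature macros. The sketch below assumes that clang and clang-cl define __STDC_VERSION__ as C11 or later when compiling .c files, and falls back to the classic negative-array-size typedef for pre-C11 C compilers, such as cl in its default C mode. The MINLZ_ macro names are hypothetical.

```c
#include <stdint.h>

/* Two-level paste so that __LINE__ is expanded before concatenation. */
#define MINLZ_CONCAT_(a, b) a##b
#define MINLZ_CONCAT(a, b) MINLZ_CONCAT_(a, b)

#if defined(__cplusplus)
  /* C++11 and later: static_assert is a keyword. */
  #define MINLZ_STATIC_ASSERT(cond, msg) static_assert(cond, msg)
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
  /* C11 and later (e.g. clang, clang-cl, gcc): use the keyword directly. */
  #define MINLZ_STATIC_ASSERT(cond, msg) _Static_assert(cond, msg)
#else
  /* Pre-C11 C compilers: the array size becomes -1, and the typedef
     fails to compile, when cond is false. msg is unused here. */
  #define MINLZ_STATIC_ASSERT(cond, msg) \
      typedef char MINLZ_CONCAT(minlz_static_assert_, __LINE__)[(cond) ? 1 : -1]
#endif

MINLZ_STATIC_ASSERT(sizeof(uint16_t) == 2, "uint16_t must be two bytes");
MINLZ_STATIC_ASSERT(sizeof(uint32_t) == 4, "uint32_t must be four bytes");
```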

Certain compressed files with a small final block won't decompress

Consider this:

bash-5.0# dd if=/dev/zero bs=1 count=20 | xz -C none -c > zeros.xz
20+0 records in
20+0 records out
bash-5.0# minlzma/minlzdec/minlzdec zeros.xz zeros
minlzdec v.1.1.1 -- http://ionescu007.github.io/minlzma
Copyright(c) 2020 Alex Ionescu (@aionescu)

Input file size: 60
Decompressed file will be 20 bytes (3.000000% ratio)
Decoding failed after 0 bytes

This doesn't happen only with all-zero files; it can also happen with "normal" files that just end at an unfortunate spot:

bash-5.0# wget -O- https://www.gutenberg.org/ebooks/16328.txt.utf-8 | dd bs=1 count=184645 | xz -C none -c > beowulf.xz
Connecting to www.gutenberg.org (152.19.134.47:443)
Connecting to www.gutenberg.org (152.19.134.47:80)
writing to stdout
184645+0 records in
184645+0 records out
bash-5.0# minlzma/minlzdec/minlzdec beowulf.xz beowulf
minlzdec v.1.1.1 -- http://ionescu007.github.io/minlzma
Copyright(c) 2020 Alex Ionescu (@aionescu)

Input file size: 61516
Decompressed file will be 184645 bytes (0.333158% ratio)
Decoding failed after 184626 bytes

#pragma once is not standard

First off, very nice! I am still looking, and I apologize for starting with a "complaint", albeit a very minor one: you label this a "standards-compliant C library", yet you use #pragma once, which is a non-standard compiler extension. Luckily, this is easy to rectify by using good old standard include guards. :-)
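Standard include guards give the same protection as #pragma once using only the C preprocessor. The sketch below simulates a header being included twice in one translation unit: the second copy is skipped because the guard macro is already defined, so the struct is not redefined. The guard name MINLZLIB_XZSTREAM_H and the struct are hypothetical, not taken from minlzlib.

```c
/* --- first inclusion of a hypothetical xzstream.h --- */
#ifndef MINLZLIB_XZSTREAM_H
#define MINLZLIB_XZSTREAM_H

#include <stdint.h>

typedef struct XzStreamFooter {
    uint32_t Crc32;
    uint32_t BackwardSize;
    uint8_t Flags[2];
    uint8_t Magic[2];
} XzStreamFooter;

#endif /* MINLZLIB_XZSTREAM_H */

/* --- second inclusion: the guard is already defined, so this body is
   skipped; without the guard, redefining the struct would be an error --- */
#ifndef MINLZLIB_XZSTREAM_H
#define MINLZLIB_XZSTREAM_H
typedef struct XzStreamFooter { int Redefined; } XzStreamFooter;
#endif /* MINLZLIB_XZSTREAM_H */
```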

Usage question

Hello!

This is not an issue, but more of a usage question. Looking at the API of the library and the sample application, I can see that it reads from an input buffer and decompresses into a single output buffer. Does that mean it supports only archives that contain a single file, or just that the library is concerned only with decompressing LZMA streams and delegates archive processing (traversing the container to get all entries) to the user or another library? If the latter, can you recommend libraries you have successfully combined it with to build a full-featured extractor that supports multiple files?

Thanks!
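On the question itself: the .xz format compresses a single stream of bytes and stores no file names or directory structure, so multi-file archives are normally .tar.xz files, where tar supplies the file structure after decompression. As a sketch of that division of labor, the C code below walks the headers of a tar archive that has already been decompressed into memory. It assumes plain ustar headers and ignores the ustar prefix field, checksums, and extensions such as pax headers and GNU long names; tar_walk and tar_entry_fn are hypothetical names, not a minlzlib API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAR_BLOCK 512u

/* Called once per regular file (hypothetical callback type). */
typedef void (*tar_entry_fn)(const char *name, const uint8_t *data,
                             size_t size, void *ctx);

/* Parses a ustar octal field, skipping leading spaces and stopping
   at the first non-octal character (NUL or space terminator). */
static size_t tar_octal(const uint8_t *field, size_t len)
{
    size_t i = 0, value = 0;
    while (i < len && field[i] == ' ') i++;
    for (; i < len && field[i] >= '0' && field[i] <= '7'; i++)
        value = (value << 3) | (size_t)(field[i] - '0');
    return value;
}

/* Walks a decompressed tar image. Returns the number of regular files
   passed to fn, or -1 if the image is truncated. */
static int tar_walk(const uint8_t *buf, size_t len, tar_entry_fn fn, void *ctx)
{
    int files = 0;
    size_t off = 0;
    while (off + TAR_BLOCK <= len) {
        const uint8_t *hdr = buf + off;
        if (hdr[0] == '\0') /* zero block: end-of-archive marker */
            return files;
        char name[101];
        memcpy(name, hdr, 100); /* name field: offset 0, 100 bytes */
        name[100] = '\0';
        size_t size = tar_octal(hdr + 124, 12); /* size field */
        uint8_t type = hdr[156];                /* typeflag field */
        off += TAR_BLOCK;
        if (size > len - off)
            return -1;
        if (type == '0' || type == '\0') { /* regular file */
            fn(name, buf + off, size, ctx);
            files++;
        }
        /* File data is padded to a multiple of 512 bytes. */
        off += (size + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;
    }
    return off == len ? files : -1;
}
```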
