
lrzip - Long Range ZIP or LZMA RZIP

A compression utility that excels at compressing large files (usually > 10-50 MB). Larger files and/or more free RAM mean the utility can compress your files more effectively (i.e. faster and/or smaller), especially if the file sizes exceed 100 MB. You can choose to optimise for speed (fast compression/decompression) or for size, but not both.

haneefmubarak's TL;DR for the long explanation:

Just change the word directory to the name of the directory you wish to compress.

Compression:

lrzdir=directory; tar cvf $lrzdir.tar $lrzdir; lrzip -Ubvvp `nproc` -S .bzip2-lrz -L 9 $lrzdir.tar; rm -fv $lrzdir.tar; unset lrzdir

tars the directory, then maxes out all of the system's processor cores along with the sliding-window RAM to give the best BZIP2 compression while staying as fast as possible, enables maximum-verbosity output, attaches the extension .bzip2-lrz, and finally removes the temporary tarfile. Uses a temporary variable, lrzdir, which is unset at the end.
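
The one-liner above can also be written as a small, commented function. This is a sketch, not part of lrzip itself: the function name is illustrative, and it assumes tar, lrzip and nproc are installed. It only removes the intermediate tarball if compression succeeded.

```shell
# Sketch of the compression one-liner as a reusable function.
# lrz_compress_dir is a hypothetical name; pass the directory as $1.
lrz_compress_dir() {
    local dir="${1%/}"                   # strip a trailing slash, if any
    tar cvf "$dir.tar" "$dir" || return  # lrzip works on single files, so tar first
    lrzip -Ubvvp "$(nproc)" -S .bzip2-lrz -L 9 "$dir.tar" &&
        rm -fv "$dir.tar"                # remove the tarball only on success
}
# Usage: lrz_compress_dir mydir   ->  mydir.tar.bzip2-lrz
```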

Decompression for the kind of file from above:

lrzdir=directory; lrunzip -cdivvp `nproc` -o $lrzdir.tar $lrzdir.tar.bzip2-lrz; tar xvf $lrzdir.tar; rm -vf $lrzdir.tar

Checks integrity, then decompresses the archive using all of the processor cores for maximum speed, enables maximum-verbosity output, unpacks the resulting tarfile, and finally removes the temporary tarfile. Uses the same kind of temporary variable.
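
The decompression one-liner, written the same way (again a sketch with an illustrative function name; it assumes lrunzip, tar and nproc are available):

```shell
# Sketch of the decompression one-liner as a reusable function.
# lrz_decompress_dir is a hypothetical name; pass the directory name as $1.
lrz_decompress_dir() {
    local dir="${1%/}"
    lrunzip -cdivvp "$(nproc)" -o "$dir.tar" "$dir.tar.bzip2-lrz" || return
    tar xvf "$dir.tar" && rm -vf "$dir.tar"   # unpack, then drop the tarball
}
```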

lrzip build/install guide:

A quick guide on building and installing.

What you will need

  • gcc
  • bash or zsh
  • pthreads
  • tar
  • libc
  • libm
  • libz-dev
  • libbz2-dev
  • liblzo2-dev
  • liblz4-dev
  • coreutils
  • nasm (optional)
  • git if you want a repo-fresh copy
  • an OS with the usual *nix headers and libraries

Obtaining the source

Two different ways of doing this:

Stable: Packaged tarball that is known to work:

Go to https://github.com/ckolivas/lrzip/releases and download the tar.gz file at the top. cd to the directory you downloaded it to and use tar xvzf lrzip-X.X.tar.gz to extract the files (don't forget to replace X.X with the correct version). Finally, cd into the directory you just extracted.
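
The extract-and-enter steps can be wrapped in a tiny helper (a sketch; the function name is illustrative, and the version number is passed in rather than hard-coded):

```shell
# Hypothetical helper: extract a downloaded lrzip release tarball and cd into it.
lrz_unpack_release() {
    tar xvzf "lrzip-$1.tar.gz" && cd "lrzip-$1"
}
# Usage: lrz_unpack_release 0.631
```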

Latest: git clone -v https://github.com/ckolivas/lrzip.git; cd lrzip

Build

./autogen.sh
./configure
make -j `nproc` # maxes out all cores

Install

Simple 'n Easy™: sudo make install

lrzip 101:

Command                       Result
lrztar directory              An archive directory.tar.lrz compressed with LZMA.
lrzuntar directory.tar.lrz    A directory extracted from an lrztar archive.
lrzip filename                An archive filename.lrz compressed with LZMA, meaning slow compression and fast decompression.
lrzip -z filename             An archive filename.lrz compressed with ZPAQ, which can give extreme compression but takes a bit longer than forever to compress and decompress.
lrzip -l filename             An archive lightly compressed with LZO, meaning really, really fast compression and decompression.
lrunzip filename.lrz          Decompresses filename.lrz to filename.
lrz filename                  As per lrzip above, but with gzip-compatible semantics (i.e. quiet, and deletes the original file).
lrz -d filename.lrz           As per lrunzip above, but with gzip-compatible semantics (i.e. quiet, and deletes the original file).
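
Putting the first two rows together, a round trip with the default LZMA mode looks like this (a sketch; the function name is illustrative and it assumes the lrztar/lrzuntar wrappers are installed):

```shell
# Hypothetical round-trip helper using the wrappers from the table above.
lrz_roundtrip() {
    lrztar "$1" &&            # -> "$1".tar.lrz, compressed with LZMA
    lrzuntar "$1.tar.lrz"     # extracts the directory back out
}
```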

lrzip internals

lrzip uses an extended version of rzip which does a first pass long distance redundancy reduction. lrzip's modifications allow it to scale to accommodate various memory sizes.

Then, one of the following scenarios occurs:

  • Compressed, using one of:
      • (default) LZMA gives excellent compression @ ~2x the speed of bzip2
      • ZPAQ gives extreme compression while taking forever
      • LZO gives insanely fast compression that can actually be faster than simply copying a large file
      • GZIP gives compression almost as fast as LZO but with better compression
      • BZIP2 is a de facto Linux standard and hacker favorite which usually gives quite good compression (ZPAQ>LZMA>BZIP2>GZIP>LZO) while staying fairly fast (LZO>GZIP>BZIP2>LZMA>ZPAQ); in other words, a good middle ground and a good choice overall
  • Uncompressed, in the words of the software's original author:

Leaving it uncompressed and rzip prepared. This form improves substantially any compression performed on the resulting file in both size and speed (due to the nature of rzip preparation merging similar compressible blocks of data and creating a smaller file). By "improving" I mean it will either speed up the very slow compressors with minor detriment to compression, or greatly increase the compression of simple compression algorithms.

(Con Kolivas, from the original lrzip README)

The only real disadvantages:

  • The main program, lrzip, only works on single files, and therefore requires the use of an lrztar wrapper to fake a complete archiver.
  • lrzip requires quite a bit of memory along with a modern processor to get the best performance in reasonable time. This usually means that it is somewhat unusable with less than 256 MB. However, decompression usually requires less RAM and can work on less powerful machines with much less RAM. On machines with less RAM, it may be a good idea to enable swap if you want to keep your operating system happy.
  • Piping to/from STDIN and STDOUT works fine for both compression and decompression, but larger files compressed this way will likely end up compressed less efficiently. Decompression doesn't really have any issues with piping, though.

One of the more unique features of lrzip is that it will try to use all of the available RAM as best it can at all times to provide maximum benefit. This is the default operating method, where it will create and use the single largest memory window that will still fit in available memory without freezing up the system. It does this by mmaping the small portions of the file that it is working on. However, it also has a unique "sliding mmap" feature, which allows it to use compression windows that far exceed the size of your RAM if the file you are compressing is large. It does this by using one large mmap along with a smaller moving mmap buffer to track the part of the file that is currently being examined. From a higher level, this can be seen as simply emulating a single, large mmap buffer. The unfortunate thing about this feature is that it can become extremely slow. The counter-argument to being slower is that it will usually give a better compression factor.

The file doc/README.benchmarks has some performance examples to show what kind of data lrzip is good with.

FAQ

Q: What kind of encryption does lrzip use?

A: lrzip uses SHA2-512 repetitive hashing of the password along with a salt to provide a key which is used by AES-128 to do block encryption. Each block has more random salts added to the block key. The amount of initial hashing increases as the timestamp goes forward, in direct relation to Moore's law, which means that the amount of time required to encrypt/decrypt the file stays the same on a contemporary computer. It is virtually guaranteed that the same file encrypted with the same password will never be the same twice. The weakest link in this encryption mode by far is the password chosen by the user. There is currently no known attack or backdoor for this encryption mechanism, and there is absolutely no way of retrieving your password should you forget it.
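
For completeness, turning encryption on is a single flag. This is a sketch with a hypothetical function name; the -e flag comes from lrzip's own manual page (it is not mentioned elsewhere in this README), and lrzip prompts for the password itself:

```shell
# Hedged sketch: compress with password-based encryption as described above.
# The -e flag is taken from lrzip's manual; lrz_encrypt is a made-up name.
lrz_encrypt() {
    lrzip -e "$1"     # writes "$1".lrz, encrypted with the scheme above
}
```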

Q: How do I make a static build?

A: ./configure --enable-static-bin

Q: I want the absolute maximum compression I can possibly get, what do I do?

A: Try the command line options "-Uzp 1 -L 9". This uses all available ram and ZPAQ compression, and even allows a compression window larger than your ram. The -p 1 option disables multithreading, which improves compression at the expense of speed. Expect it to take many times longer.
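
As a full invocation (a sketch; the function name is illustrative, the flags are exactly those from the answer above):

```shell
# Maximum-compression sketch: unlimited window (-U), ZPAQ (-z),
# single thread (-p 1), level 9. Very slow by design.
lrz_max_compress() {
    lrzip -Uzp 1 -L 9 "$1"
}
```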

Q: I want the absolute fastest decent compression I can possibly get.

A: Try the command line option -l. This will use the lzo backend compression, and level 7 compression (1 isn't much faster).

Q: How much slower is the unlimited mode?

A: It depends on two things. First, just how much larger than your ram the file is: the bigger the difference, the slower it will be. The second is how much redundant data there is: the more there is, the slower it will be, but ultimately the better the compression. Why isn't it on by default? If the compression window is a LOT larger than ram, with a lot of redundant information it can be drastically slower. I may revisit this possibility in the future if I can make it any faster.

Q: Can I use your tool for even more compression than lzma offers?

A: Yes, the rzip preparation of files makes them more compressible by most other compression techniques I have tried. Using the -n option will generate a .lrz file smaller than the original, which should be more compressible, and since it is smaller it will compress faster than it otherwise would have.
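
A sketch of that workflow (the function name is illustrative, and xz is used purely as an example of "another compressor"; any external tool would do):

```shell
# Sketch: rzip-prepare only (-n), then hand the smaller file to an
# external compressor. xz here stands in for "any other compressor".
lrz_prepare_then_xz() {
    lrzip -n "$1" &&    # writes "$1".lrz, rzip-prepared but uncompressed
    xz -9 "$1.lrz"      # -> "$1".lrz.xz
}
```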

Q: 32bit?

A: 32bit machines have a limit of 2GB sized compression windows due to userspace limitations on mmap and malloc, so even if you have much more ram you will not be able to use compression windows larger than 2GB. Also you may be unable to decompress files compressed on 64bit machines which have used windows larger than 2GB.

Q: How about 64bit?

A: 64bit machines with their ability to address massive amounts of ram will excel with lrzip due to being able to use compression windows limited only in size by the amount of physical ram.

Q: Other operating systems?

A: The code is POSIXy with GNU extensions. Patches are welcome. Version 0.43+ should build on Mac OS X 10.5+.

Q: Does it work on stdin/stdout?

A: Yes it does. Compression and decompression work well to/from STDIN/STDOUT. However, because lrzip does multiple passes on the data, it has to store a large amount in ram before it dumps it to STDOUT (and vice versa), so it is unable to use the massive compression windows regular operation provides. Thus compression of files larger than approximately 25% of RAM size will be less efficient (though still beneficial compared to traditional compression formats).

Q: I have another compression format that is even better than zpaq, can you use that?

A: You can use it yourself on rzip prepared files (see above). Alternatively if the source code is compatible with the GPL license it can be added to the lrzip source code. Libraries with functions similar to compress() and decompress() functions of zlib would make the process most painless. Please tell me if you have such a library so I can include it :)

Q: What's this "Starting lzma back end compression thread..." message?

A: While I'm a big fan of progress percentage being visible, unfortunately lzma compression can't currently be tracked when handing over 100+MB chunks over to the lzma library. Therefore you'll see progress percentage until each chunk is handed over to the lzma library.

Q: What's this "lz4 testing for incompressible data" message?

A: Other compression is much slower, and lz4 is the fastest. To help speed up the process, lz4 compression is performed on the data first to test that the data is at all compressible. If a small block of data is not compressible, it tests progressively larger blocks until it has tested all the data (if it fails to compress at all). If no compressible data is found, then the subsequent compression is not even attempted. This can save a lot of time during the compression phase when there is incompressible data. Theoretically it may be possible that data is compressible by the other backend (zpaq, lzma etc) and not at all by lz4, but in practice such data achieves only minuscule amounts of compression which are not worth pursuing. Most of the time it is clear one way or the other that data is compressible or not. If you wish to disable this test and force it to try compressing it anyway, use -T.

Q: I have truckloads of ram so I can compress files much better, but can my generated file be decompressed on machines with less ram?

A: Yes. Ram requirements for decompression go up only by the -L compression option with lzma and are never anywhere near as large as the compression requirements. However if you're on 64bit and you use a compression window greater than 2GB, it might not be possible to decompress it on 32bit machines.

Q: Why are you including bzip2 compression?

A: To maintain a similar compression format to the original rzip (although the other modes are more useful).

Q: What about multimedia?

A: Most multimedia is already in a heavily compressed "lossy" format which by its very nature has very little redundancy. This means that there is not much that can actually be compressed. If your video/audio/picture is at a high bitrate, there will be more redundancy than at a low bitrate, making it more suitable for compression. None of the compression techniques in lrzip are optimised for this sort of data. However, the nature of rzip preparation means that you'll still get better compression than most normal compression algorithms give you if you have very large files. ISO images of DVDs, for example, are best compressed directly instead of as individual .VOB files. ZPAQ is the only compression format that can do any significant compression of multimedia.

Q: Is this multithreaded?

A: As of version 0.540, it is HEAVILY multithreaded with the back end compression and decompression phase, and will continue to process the rzip pre-processing phase so when using one of the more CPU intensive backend compressions like lzma or zpaq, SMP machines will show massive speed improvements. Lrzip will detect the number of CPUs to use, but it can be overridden with the -p option if the slightly better compression is desired more than speed. -p 1 will give the best compression but also be the slowest.

Q: This uses heaps of memory, can I make it use less?

A: Well you can by setting -w to the lowest value (1) but the huge use of memory is what makes the compression better than ordinary compression programs so it defeats the point. You'll still derive benefit with -w 1 but not as much.

Q: What CFLAGS should I use?

A: With a recent enough compiler (gcc > 4), set both CFLAGS and CXXFLAGS to "-O2 -march=native -fomit-frame-pointer".
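
Applied to the build, that looks like the following (a sketch with an illustrative function name; run it from the lrzip source directory after ./autogen.sh):

```shell
# Hedged sketch: configure and build with the recommended flags.
lrz_optimised_build() {
    CFLAGS="-O2 -march=native -fomit-frame-pointer" \
    CXXFLAGS="-O2 -march=native -fomit-frame-pointer" \
    ./configure && make -j "$(nproc)"
}
```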

Q: What compiler does this work with?

A: It has previously been tested successfully with gcc, EKOPath and the Intel compiler. Whether the commercial compilers help or not, I could not tell you.

Q: What codebase are you basing this on?

A: rzip v2.1 and lzma sdk920, but it should be possible to stay in sync with each of these in the future.

Q: Do we really need yet another compression format?

A: It's not really a new one at all; simply a reimplementation of a few very good performing ones that will scale with memory and file size.

Q: How do you use lrzip yourself?

A: Three basic uses. I compress large files currently on my drive with the -l option since it is so quick to get a space saving. When archiving data for permanent storage I compress it with the default options. When compressing small files for distribution I use the -z option for the smallest possible size.

Q: I found a file that compressed better with plain lzma. How can that be?

A: When the file is more than 5 times the size of the compression window you have available, the efficiency of rzip preparation drops off as a means of getting better compression. Eventually when the file is large enough, plain lzma compression will get better ratios. The lrzip compression will be a lot faster though. The only way around this is to use as large compression windows as possible with -U option.

Q: Can I use swapspace as ram for lrzip with a massive window?

A: It will indirectly do this with -U (unlimited) mode enabled. This mode will make the compression window as big as the file itself no matter how big it is, but it will slow down proportionately more the bigger the file is than your ram.

Q: Why do you nice it to +19 by default? Can I speed up the compression by changing the nice value?

A: This is a common misconception about what nice values do. They only tell the cpu process scheduler how to prioritise workloads, and if your application is the only thing running it will be no faster at nice -20 nor will it be any slower at +19.

Q: What is the LZ4 Testing option, -T?

A: LZ4 testing is normally performed for the slower back-end compression of LZMA and ZPAQ. The reasoning is that if it is completely incompressible by LZ4 then it will also be incompressible by them. Thus if a block fails to be compressed by the very fast LZ4, lrzip will not attempt to compress that block with the slower compressor, thereby saving time. If this option is enabled, it will bypass the LZ4 testing and attempt to compress each block regardless.

Q: Compression and decompression progress on large archives slows down and speeds up. There's also a jump in the percentage at the end?

A: Yes, that's the nature of the compression/decompression mechanism. The jump is because the rzip preparation leaves the compression backend (lzma) with much less data to compress than the original file size.

Q: Tell me about patented compression algorithms, GPL, lawyers and copyright.

A: No

Q: I receive an error "LZMA ERROR: 2. Try a smaller compression window." what does this mean?

A: LZMA requests large amounts of memory. When a higher compression window is used, there may not be enough contiguous memory for LZMA: LZMA may request up to 25% of TOTAL ram depending on compression level. If contiguous blocks of memory are not free, LZMA will return an error. This is not a fatal error, and a backup mode of compression will be used.

Q: Where can I get more information about the internals of LZMA?

A: See http://www.7-zip.org and http://www.p7zip.org. Also, see the file ./lzma/C/lzmalib.h which explains the LZMA properties used and the LZMA memory requirements and computation.

Q: This version is much slower than the old version?

A: Make sure you have set CFLAGS and CXXFLAGS. An unoptimised build will be almost 3 times slower.

Q: Why not update to the latest version of libzpaq?

A: For reasons that are unclear, later versions of libzpaq create corrupt archives when included with lrzip.

LIMITATIONS

Due to mmap limitations the maximum size a window can be set to is currently 2GB on 32bit unless the -U option is specified. Files generated on 64 bit machines with windows >2GB in size might not be decompressible on 32bit machines. Large files might not decompress on machines with less RAM if SWAP is disabled.

BUGS:

Probably lots. https://github.com/ckolivas/lrzip/issues if you spot any :D

Any known ones should be documented in the file BUGS.

Backends:

rzip: http://rzip.samba.org/

lzo: http://www.oberhumer.com/opensource/lzo/

lzma: http://www.7-zip.org/

zpaq: http://mattmahoney.net/dc/

Thanks (CONTRIBUTORS)

Person(s) Thanks for
Andrew Tridgell rzip
Markus Oberhumer lzo
Igor Pavlov lzma
Jean-Loup Gailly & Mark Adler zlib
Con Kolivas Original Code, binding all of this together, managing the project, original README
Christian Leber lzma compatibility layer
Michael J Cohen Darwin/OSX support
Lasse Collin fixes to LZMALib.cpp and Makefile.in
Everyone else who coded along the way (add yourself where appropriate if that's you) Miscellaneous Coding
Peter Hyman Most of the 0.19 to 0.24 changes
^^^^^^^^^^^ Updating the multithreaded lzma lib
^^^^^^^^^^^ All sorts of other features
René Rhéaume Fixing executable stacks
Ed Avis Various fixes
Matt Mahoney zpaq integration code
Jukka Laurila Additional Darwin/OSX support
George Makrydakis lrztar wrapper
Ulrich Drepper special implementation of md5
Michael Blumenkrantz New config tools
^^^^^^^^^^^^^^^^^^^^ liblrzip
Authors of PolarSSL Encryption code
Serge Belyshev Extensive help, advice, and patches to implement secure encryption
Jari Aalto Fixing typos, esp. in code
Carlo Alberto Ferraris Code cleanup
Peter Hyman Additional documentation
Haneef Mubarak Cleanup, Rewrite, and GH Markdown of README --> README.md

Persons above are listed in chronological order of first contribution to lrzip. Person(s) with names in bold have multiple major contributions, person(s) with names in italics have made massive contributions, person(s) with names in both have made innumerable massive contributions.

README Authors

Con Kolivas (ckolivas on GitHub) [email protected] Tuesday, 16 February 2021: README

Also documented by Peter Hyman [email protected] Sun, 04 Jan 2009: README

Mostly Rewritten + GFMified: Haneef Mubarak (haneefmubarak on GitHub) Sun/Mon Sep 01-02 2013: README.md

lrzip's People

Contributors

ahmedeldaly097, areading, atsampson, ckolivas, cspiegel, danieldjewell, emallickhossain, ffontaine, ghost, haneefmubarak, ib, irrequietus, jaalto, jonbirge, kata198, maeyanie, oded-ist, orthographic-pedant, patterner, pete4abw, rubenllorente, scop, seano-vs, sebalos314, sssseb, tpwrules, zetok


lrzip's Issues

Bogus output: No such file or directory

$ lrzip -d -o - -m 10 hda.img.lrz -f
Decompressing...
Unable to malloc buffer of size 693574770 in this environment
No such file or directory
Fatal error - exiting

The No such file or directory seems wrong. What file is missing? (the specified file exists and is a good lrzip compressed file).

Using version 0.631 on cygwin64.

NULL pointer dereference in bufRead::get (libzpaq.h)

On 0.631:

# lrzip -t $FILE
Decompressing...                                                                                                                                                                                                  
Inconsistent length after decompression. Got 0 bytes, expected 2                                                                                                                                                  
Inconsistent length after decompression. Got 0 bytes, expected 2                                                                                                                                                  
Inconsistent length after decompression. Got 0 bytes, expected 2                                                                                                                                                  
Inconsistent length after decompression. Got 0 bytes, expected 2                                                                                                                                                  
Inconsistent length after decompression. Got 0 bytes, expected 2                                                                                                                                                  
Inconsistent length after decompression. Got 0 bytes, expected 2                                                                                                                                                  
Inconsistent length after decompression. Got 0 bytes, expected 2                                                                                                                                                  
Inconsistent length after decompression. Got 0 bytes, expected 2                                                                                                                                                  
Inconsistent length after decompression. Got 0 bytes, expected 2                                                                                                                                                  
ASAN:DEADLYSIGNAL                                                                                                                                                                                                 
=================================================================                                                                                                                                                 
==24966==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x0000005e7caa bp 0x7f7c755a58d0 sp 0x7f7c755a5870 T2)                                                                               
==24966==The signal is caused by a READ memory access.                                                                                                                                                            
==24966==Hint: address points to the zero page.                                                                                                                                                                   
    #0 0x5e7ca9 in bufRead::get() /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/libzpaq/libzpaq.h:485:24                                                                                                     
    #1 0x5856f1 in libzpaq::Decompresser::findBlock(double*) /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/libzpaq/libzpaq.cpp:1236:21                                                                       
    #2 0x55f79a in libzpaq::decompress(libzpaq::Reader*, libzpaq::Writer*) /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/libzpaq/libzpaq.cpp:1363:12                                                         
    #3 0x55f4e2 in zpaq_decompress /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/libzpaq/libzpaq.h:538:2                                                                                                     
    #4 0x54b3a4 in zpaq_decompress_buf /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:453:2                                                                                                          
    #5 0x54b3a4 in ucompthread /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1534                                                                                                                   
    #6 0x7f81b7a434a3 in start_thread /tmp/portage/sys-libs/glibc-2.23-r3/work/glibc-2.23/nptl/pthread_create.c:333
    #7 0x7f81b6d6e66c in clone /tmp/portage/sys-libs/glibc-2.23-r3/work/glibc-2.23/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:109

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/libzpaq/libzpaq.h:485:24 in bufRead::get()
Thread T2 created by T0 here:
    #0 0x42d49d in pthread_create /tmp/portage/sys-devel/llvm-3.9.1-r1/work/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_interceptors.cc:245
    #1 0x53e70f in create_pthread /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:133:6
    #2 0x53e70f in fill_buffer /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1673
    #3 0x53e70f in read_stream /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1755
    #4 0x531075 in unzip_literal /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:162:16
    #5 0x531075 in runzip_chunk /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:320
    #6 0x531075 in runzip_fd /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:382
    #7 0x519b41 in decompress_file /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/lrzip.c:826:6
    #8 0x511074 in main /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/main.c:669:4
    #9 0x7f81b6ca778f in __libc_start_main /tmp/portage/sys-libs/glibc-2.23-r3/work/glibc-2.23/csu/../csu/libc-start.c:289

==24966==ABORTING

Reproducer:
https://github.com/asarubbo/poc/blob/master/00229-lrzip-nullptr-bufRead-get

lrzip-0.621 fails to build on ARMv7: narrowing conversion of '-1' from 'int' to 'char' inside { } [-Wnarrowing]

lrzip cannot be built on ARMv7 with GCC 6:

  CXX      libzpaq.lo
libzpaq/libzpaq.cpp: In member function 'void libzpaq::Compressor::startBlock(int)':
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-1' from 'int' to 'char' inside { } [-Wnarrowing]
   0,0}; // 0,0 = end of list
      ^
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-49' from 'int' to 'char' inside { } [-Wnarrowing]
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-60' from 'int' to 'char' inside { } [-Wnarrowing]
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-96' from 'int' to 'char' inside { } [-Wnarrowing]
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-1' from 'int' to 'char' inside { } [-Wnarrowing]
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-1' from 'int' to 'char' inside { } [-Wnarrowing]
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-1' from 'int' to 'char' inside { } [-Wnarrowing]
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-1' from 'int' to 'char' inside { } [-Wnarrowing]
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-1' from 'int' to 'char' inside { } [-Wnarrowing]
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-73' from 'int' to 'char' inside { } [-Wnarrowing]
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-17' from 'int' to 'char' inside { } [-Wnarrowing]
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-25' from 'int' to 'char' inside { } [-Wnarrowing]
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-122' from 'int' to 'char' inside { } [-Wnarrowing]
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-105' from 'int' to 'char' inside { } [-Wnarrowing]
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-33' from 'int' to 'char' inside { } [-Wnarrowing]
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-113' from 'int' to 'char' inside { } [-Wnarrowing]
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-44' from 'int' to 'char' inside { } [-Wnarrowing]
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-113' from 'int' to 'char' inside { } [-Wnarrowing]
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-40' from 'int' to 'char' inside { } [-Wnarrowing]
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-81' from 'int' to 'char' inside { } [-Wnarrowing]
libzpaq/libzpaq.cpp:1480:6: error: narrowing conversion of '-49' from 'int' to 'char' inside { } [-Wnarrowing]
Makefile:813: recipe for target 'libzpaq.lo' failed

This is caused by a warning that became an error in the newly default C++11 mode. On ARM, plain char is unsigned, so these negative int values cannot be represented in the char type.

Inaccurate percentage when using zpaq on large files

I use lrzip -z to compress backups (in the form of plain tar archives). It provides great compression, and I'm fine with it taking time to do so. The issue is that the total percentage lrzip displays quickly goes up, slowly decreasing in speed, before it finally stops at 99% and sits there for sometimes several minutes before finishing. I'd like accurate percentages, in terms of the time remaining in the compression -- perhaps behind a command line flag if it takes a significant amount of extra computing power.

Compiling lrzip-0.621 on Solaris 11.3 x86

I have a problem with compiling lrzip on Solaris 11.

CXXLD lrzip
Undefined                       first referenced
 symbol                             in file
fake_mremap                         ./.libs/libtmplrzip.a(rzip.o)
ld: fatal: symbol referencing errors
collect2: error: ld returned 1 exit status
*** Error code 1
The following command caused the error:
echo " CXXLD " lrzip;/bin/sh ./libtool --silent --tag=CXX --mode=link g++ -I. -I lzma/C -DNDEBUG -g -O2 -o lrzip main.o libtmplrzip.la -llzo2 -lbz2 -lz -lm -lpthread
make: Fatal error: Command failed for target lrzip' Current working directory /root/lrzip-0.621 *** Error code 1 The following command caused the error: fail=; \ if (target_option=k; case ${target_option-} in ?) ;; *) echo "am__make_running_with_option: internal error: invalid" "target option '${target_option-}' specified" >&2; exit 1;; esac; has_opt=no; sane_makeflags=$MAKEFLAGS; if test -n '' && test -n '1'; then sane_makeflags=$MFLAGS; else case $MAKEFLAGS in *\\[\ \ ]*) bs=\\; sane_makeflags=printf '%s\n' "$MAKEFLAGS" | sed "s/$bs$bs[$bs $bs ]//g";; esac; fi; skip_next=no; strip_trailopt () { flg=printf '%s\n' "$flg" | sed "s/$1.$//"; }; for flg in $sane_makeflags; do test $skip_next = yes && { skip_next=no; continue; }; case $flg in *=*|--*) continue;; -*I) strip_trailopt 'I'; skip_next=yes;; -*I?*) strip_trailopt 'I';; -*O) strip_trailopt 'O'; skip_next=yes;; -*O?*) strip_trailopt 'O';; -*l) strip_trailopt 'l'; skip_next=yes;; -*l?*) strip_trailopt 'l';; -[dEDm]) skip_next=yes;; -[JT]) skip_next=yes;; esac; case $flg in *$target_option*) has_opt=yes; break;; esac; done; test $has_opt = yes); then \ failcom='fail=yes'; \ else \ failcom='exit 1'; \ fi; \ dot_seen=no; \ target=echo all-recursive | sed s/-recursive//; \ case "all-recursive" in \ distclean-* | maintainer-clean-*) list='lzma man doc' ;; \ *) list='lzma man doc' ;; \ esac; \ for subdir in $list; do \ echo "Making $target in $subdir"; \ if test "$subdir" = "."; then \ dot_seen=yes; \ local_target="$target-am"; \ else \ local_target="$target"; \ fi; \ (CDPATH="${ZSH_VERSION+.}:" && cd $subdir && make $local_target) \ || eval $failcom; \ done; \ if test "$dot_seen" = "no"; then \ make "$target-am" || exit 1; \ fi; test -z "$fail" make: Fatal error: Command failed for targetall-recursive'
Current working directory /root/lrzip-0.621
*** Error code 1
make: Fatal error: Command failed for target `all'

does not seem to work on device files

tested with v0.621

lrzip /dev/sda -o /mnt/backup/sda.compressed

This creates a 66-byte file that decompresses to a zero-length file.

Looks like lrzip reads the size of the device node itself instead of the size of the underlying device.
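That diagnosis matches how stat(2) behaves on device nodes: st_size for a device node is typically 0, so a size check on the path sees nothing to compress. A minimal shell sketch of the symptom, with a hypothetical workaround commented out (device path and options are illustrative, not lrzip behaviour):

```shell
# stat reports the size of the device node itself, not the device's capacity:
stat -c %s /dev/null        # prints 0, although the node is a usable device
# Hypothetical workaround: image the device to a regular file first, e.g.
#   dd if=/dev/sda of=/mnt/backup/sda.img bs=4M
#   lrzip /mnt/backup/sda.img -o /mnt/backup/sda.compressed
```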

Input data from pipe creates an empty archive

Submitting input data from a pipe (bash, Mac OS X 10.8.3) produces an archive that, when decompressed, contains only zeroes:

> cat text.txt | lrzip -o archive.lrz
> lrunzip archive.lrz
> od archive
0000000    000000  000000  000000  000000  000000  000000  000000  000000
*
2071040

(2071040 is not important, just the size of my test file)

Same goes for lrztar - I get very small compressed files which, when uncompressed, contain only zeroes.

Separate hash storage for later adding files to archive

Can you please check if it's possible to add this feature to lrzip?
Storing the hashset beside the archive would allow adding files to the archive without an uncompress->untar->tar->compress cycle. For files that differ only slightly, like logs or binary backup snapshots, this would allow using lrzip as the main compression mechanism.
It might also need some kind of file-list structure, to allow extracting specific files instead of the whole archive contents, although for archives that are seldom uncompressed, like logs/backups, that is less of an issue.
Thanks!

Failed to malloc ckbuf in hash_search

On "lrzip -lU gsy.tar", where the tar file was 37 GB (an update of https://archive.org/download/ftp-ftp.hp.com_ftp1/gsy.zip ):

Failed to malloc ckbuf in hash_search
Cannot allocate memory
Fatal error - exiting

$ lrzip -V
lrzip version 0.608
$ uname -a
Linux i-00000355 3.2.0-53-virtual #81-Ubuntu SMP Thu Aug 22 21:21:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

4 GB RAM, disk volume on glusterfs (in case very slow I/O may be at fault).

lrzip exceedingly slow on my computer.

Hello,
I downloaded the OSM street map planet data and need to keep it in sync via the change sets, which means that about once a month I must decompress and then recompress the entire thing.
I decided that I would benchmark the various compressors and then report the results to some popular linux place, like the lwn or linux journal as the information would also benefit others.
Sadly, I'm actually seeing a speed loss as compared to other compressors.
The benchmark line "lrzip 1372218189 12.8 10m23s 2m53s" suggests it must be compressing at a rate of roughly 1 GB/min. I am getting (with no options) a rate of about 160 MB/min. I have a chain like this:
bzip2 -dkc planet*.osm.bz2 | pv -WCcarbB 1G | lrzip -o - -nN 0 -p5 -w 200 | pv -Wcarb > /dev/null
I don't know what to do about this; it's very far off from your claimed results on the 10GB image. Please advise.

compat: -c,-C options

While I see merit in making options similar to gzip (and many already are), having both -c and -C is not a good idea IMO. Since we are still in development versions, instead of keeping synonyms, maybe this branch could redefine check as a long option only, since it has no parallel in gzip. Until we ever reach version 1.0 (production), we do have some flexibility in naming and changing options. JM2C.

I do like the --fast and --best synonym options.

(Feature request) Add LZMA2 compression support

Currently, lrzip supports LZMA; however, a revised version of this algorithm, which achieves similar compression ratios at significantly higher performance, has been released and is supported in archive formats such as 7z and XZ.

It would be great if LZMA2 could become the new default algorithm in lrzip, as it would deliver similar results at better performance.

segfault with "p" option

Using something like "-p1" results in a segfault.
The call to getopt_long() in main.c:334 is missing the ":" after the "p", so "optarg" isn't valid.
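The optstring semantics behind this can be reproduced with bash's getopts, which follows the same rules as getopt_long (a sketch, not lrzip's actual code): with "p:" the parser fills OPTARG for -p1, while plain "p" leaves it empty, the analogue of the invalid optarg that triggers the crash.

```shell
# Parse the remaining args against the optstring in $1; print -p's argument.
parse() {
  spec=$1; shift
  OPTIND=1 OPTARG=
  while getopts "$spec" opt "$@" 2>/dev/null; do
    [ "$opt" = p ] && printf '%s' "${OPTARG:-<no arg>}"
  done
}
parse "p:" -p1   # prints 1        ("p:" declares that -p takes an argument)
parse "p"  -p1   # prints <no arg> (plain "p" declares that it does not)
```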

Describe return values/exit codes in man pages

It would be nice for users to have a reference that describes which return values/exit codes to expect from lrzip and friends.

Currently nothing is mentioned anywhere in man:

~/lrzip/man$ egrep "(exit|ret|status)" -ri
~/lrzip/man$

Automatic native suffix

Since lrzip is mostly a preprocessor for standard compression formats, it would be easier for clients to decompress the end result if the compressor automatically attached the suffix of whatever the final compression format is.
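As a sketch of the idea, a hypothetical wrapper could map the chosen backend flag to a suffix and pass it through lrzip's -S option; the names below merely extend the .bzip2-lrz convention used earlier in this document and are not lrzip behaviour:

```shell
# Map an lrzip backend flag to a descriptive suffix (hypothetical convention).
suffix_for() {
  case $1 in
    -b) echo .bzip2-lrz ;;  # bzip2 backend
    -g) echo .gzip-lrz ;;   # gzip backend
    -z) echo .zpaq-lrz ;;   # zpaq backend
    -n) echo .rzip-lrz ;;   # rzip preprocessing only
    *)  echo .lrz ;;        # default backend (lzma)
  esac
}
suffix_for -b                                # prints .bzip2-lrz
# lrzip -b -S "$(suffix_for -b)" files.tar   # usage sketch
```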

heap-based buffer overflow write in read_1g (stream.c)

On 0.631:

# lrzip -t $FILE
Decompressing...
=================================================================
==25584==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200000ef33 at pc 0x00000045246e bp 0x7ffd881d4970 sp 0x7ffd881d4120
WRITE of size 8 at 0x60200000ef33 thread T0
    #0 0x45246d in read /tmp/portage/sys-devel/llvm-3.9.1-r1/work/llvm-3.9.1.src/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:765
    #1 0x537ce1 in read_1g /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:731:9
    #2 0x53e349 in read_buf /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:774:8
    #3 0x53e349 in fill_buffer /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1648
    #4 0x53e349 in read_stream /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1755
    #5 0x5307fc in read_vchars /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:79:6
    #6 0x5307fc in unzip_match /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:208
    #7 0x5307fc in runzip_chunk /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:329
    #8 0x5307fc in runzip_fd /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:382
    #9 0x519b41 in decompress_file /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/lrzip.c:826:6
    #10 0x511074 in main /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/main.c:669:4
    #11 0x7f02ed48f78f in __libc_start_main /tmp/portage/sys-libs/glibc-2.23-r3/work/glibc-2.23/csu/../csu/libc-start.c:289
    #12 0x41abf8 in _init (/usr/bin/lrzip+0x41abf8)

0x60200000ef33 is located 0 bytes to the right of 3-byte region [0x60200000ef30,0x60200000ef33)
allocated by thread T0 here:
    #0 0x4d39b8 in malloc /tmp/portage/sys-devel/llvm-3.9.1-r1/work/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:64
    #1 0x53e2ab in fill_buffer /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1643:10
    #2 0x53e2ab in read_stream /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1755
    #3 0x5307fc in read_vchars /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:79:6
    #4 0x5307fc in unzip_match /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:208
    #5 0x5307fc in runzip_chunk /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:329
    #6 0x5307fc in runzip_fd /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:382
    #7 0x519b41 in decompress_file /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/lrzip.c:826:6
    #8 0x511074 in main /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/main.c:669:4
    #9 0x7f02ed48f78f in __libc_start_main /tmp/portage/sys-libs/glibc-2.23-r3/work/glibc-2.23/csu/../csu/libc-start.c:289

SUMMARY: AddressSanitizer: heap-buffer-overflow /tmp/portage/sys-devel/llvm-3.9.1-r1/work/llvm-3.9.1.src/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:765 in read
Shadow bytes around the buggy address:
  0x0c047fff9d90: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff9da0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff9db0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff9dc0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff9dd0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c047fff9de0: fa fa fa fa fa fa[03]fa fa fa fd fd fa fa fd fa
  0x0c047fff9df0: fa fa fd fd fa fa 04 fa fa fa 03 fa fa fa 05 fa
  0x0c047fff9e00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff9e10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff9e20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff9e30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==25584==ABORTING

Reproducer:
https://github.com/asarubbo/poc/blob/master/00232-lrzip-heapoverflow-read_1g

./lrzip -z COPYING segfaults on ARMv7

lrzip-0.621 has no tests. I tried to add a simple round-trip test to the Fedora package that does:

./lrzip -z COPYING
./lrzip --info COPYING.lrz
./lrzip -d -o COPYING.new COPYING.lrz
cmp COPYING COPYING.new

and it turned out that the compression command segfaults at the end of compression on ARMv7. See [https://kojipkgs.fedoraproject.org//work/tasks/5807/13035807/build.log]. It happens even with GCC 5 without the patch from issue #46:

------------------------------------------------------------------------
lrzip 0.621
------------------------------------------------------------------------
Configuration Options Summary:
  ASM.(32 bit only)..: no
  Static binary......: no
Documentation..........: no
Compilation............: make (or gmake)
  CPPFLAGS.............: 
  CFLAGS...............: -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -march=armv7-a -mfpu=vfpv3-d16  -mfloat-abi=hard
  CXXFLAGS.............: -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -march=armv7-a -mfpu=vfpv3-d16  -mfloat-abi=hard
  LDFLAGS..............: -Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld
Installation...........: make install (as root if needed, with 'su' or 'sudo')
  prefix...............: /usr
+ make -j4
[...]
Executing(%check): /bin/sh -e /var/tmp/rpm-tmp.c7AUDz
+ umask 022
+ cd /builddir/build/BUILD
+ cd lrzip-0.621
+ ./lrzip -z COPYING
Output filename is: COPYING.lrz
Total:  1%  Chunk:  1%
Total:  2%  Chunk:  2%
[...]
Total: 98%  Chunk: 98%
Total: 99%  Chunk: 99%
/var/tmp/rpm-tmp.c7AUDz: line 31: 24743 Segmentation fault      (core dumped) ./lrzip -z COPYING

I have no opportunity to debug it; I just want to notify you of this issue. Maybe you will have an idea.

divide-by-zero in bufRead::get (libzpaq.h)

On 0.631:

# lrzip -t $FILE
Decompressing...
ASAN:DEADLYSIGNAL
=================================================================
==8026==ERROR: AddressSanitizer: FPE on unknown address 0x0000005e7957 (pc 0x0000005e7957 bp 0x7fcdf9ba58d0 sp 0x7fcdf9ba5870 T1)
    #0 0x5e7956 in bufRead::get() /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/libzpaq/libzpaq.h:468:41
    #1 0x5856f1 in libzpaq::Decompresser::findBlock(double*) /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/libzpaq/libzpaq.cpp:1236:21
    #2 0x55f79a in libzpaq::decompress(libzpaq::Reader*, libzpaq::Writer*) /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/libzpaq/libzpaq.cpp:1363:12
    #3 0x55f4e2 in zpaq_decompress /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/libzpaq/libzpaq.h:538:2
    #4 0x54b3a4 in zpaq_decompress_buf /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:453:2
    #5 0x54b3a4 in ucompthread /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1534
    #6 0x7fd33c0594a3 in start_thread /tmp/portage/sys-libs/glibc-2.23-r3/work/glibc-2.23/nptl/pthread_create.c:333
    #7 0x7fd33b38466c in clone /tmp/portage/sys-libs/glibc-2.23-r3/work/glibc-2.23/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:109                                                                             
                                                                                                                                                                                                                  
AddressSanitizer can not provide additional info.                                                                                                                                                                 
SUMMARY: AddressSanitizer: FPE /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/libzpaq/libzpaq.h:468:41 in bufRead::get()                                                                                      
Thread T1 created by T0 here:                                                                                                                                                                                     
    #0 0x42d49d in pthread_create /tmp/portage/sys-devel/llvm-3.9.1-r1/work/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_interceptors.cc:245                                                                 
    #1 0x53e70f in create_pthread /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:133:6                                                                                                               
    #2 0x53e70f in fill_buffer /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1673                                                                                                                   
    #3 0x53e70f in read_stream /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1755                                                                                                                   
    #4 0x5303e3 in read_u8 /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:55:6                                                                                                                       
    #5 0x5303e3 in read_header /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:144                                                                                                                    
    #6 0x5303e3 in runzip_chunk /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:314                                                                                                                   
    #7 0x5303e3 in runzip_fd /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:382                                                                                                                      
    #8 0x519b41 in decompress_file /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/lrzip.c:826:6                                                                                                               
    #9 0x511074 in main /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/main.c:669:4                                                                                                                           
    #10 0x7fd33b2bd78f in __libc_start_main /tmp/portage/sys-libs/glibc-2.23-r3/work/glibc-2.23/csu/../csu/libc-start.c:289                                                                                       
                                                                                                                                                                                                                  
==8026==ABORTING

Reproducer:
https://github.com/asarubbo/poc/blob/master/00228-lrzip-fpe-bufRead-get

lrzip-bz2 performs worse than just bz2 for large json file

See here:

 880M 14. Mai 21:25 files.tar
  11M 14. Mai 21:25 files.tar.bz2
  21M 14. Mai 21:31 files.tar.lrz-bz2
  22M 14. Mai 21:33 files.tar.lrz-lzma
  51M 14. Mai 21:29 files.tar.lrz-nocompress
  21M 14. Mai 21:29 files.tar.lrz-nocompress.bz2
  19M 14. Mai 21:35 files.tar.lrz-zpaq

Using just bzip2, the 880 MB file compresses to 11 MB. Using lrzip -b, it is 21 MB.
How can this happen, and is it expected?

Here is the file (.zip so github accepts the upload):
files.tar.zip

Cygwin build crashes when using bzip compression

$ ./lrzip.exe -b /cygdrive/c/Users/Adam/Documents/acorndata.sql
Output filename is: /cygdrive/c/Users/Adam/Documents/acorndata.sql.lrz
pthread_mutex_unlock failedNo error
Fatal error - exiting

Performance regressions with -n and -l

Using git bisect on a Celeron 220 running Ubuntu 10.04 x64, I found the following performance regressions:

c0ac813 - runs at 18 MB/s with -n, 11 MB/s with -l
9c00229 - this commit slows -n down from 18 to 14 MB/s
29b1666 - this commit slows -n down from 15 to 10 MB/s
017ec9e - this finally reduces -n to 7.5 MB/s, and -l to 6 MB/s

That means performance was cut in half by the 8 commits between c0ac813 and 017ec9e!

lrztar doesn't accept -U option

$ lrztar -lDU $dir

Cannot have -U and stdin, unlimited mode disabled.

It could fall back to creating a temporary tar file; doing it manually is tedious.
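The manual fallback looks something like the sketch below (directory name and data are illustrative; the lrzip call itself is commented out so the rest stands alone):

```shell
dir=demo.d                                 # directory to compress (example)
mkdir -p "$dir" && echo hello > "$dir/a.txt"
tmptar=${TMPDIR:-/tmp}/demo.$$.tar
tar cf "$tmptar" "$dir"                    # a real file gives -U seekable input
# lrzip -lDU "$tmptar" && rm -f "$tmptar"  # what `lrztar -lDU` can't do today
count=$(tar tf "$tmptar" | grep -c a.txt)  # sanity check: the file is archived
rm -rf "$tmptar" "$dir"
echo "$count entry archived"
```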

lrztar misses support for some lrzip options

I'm running lrzip 0.621.

Using lrztar I noticed that the -m option (needed to circumvent #44) results in:
lrztar: illegal option -- m
The same behavior is caused by the -e and -i options.
The -t and -V options don't work either, but they print a different error:
lrztar: invalid option for lrztar: t

Thanks

Segmentation fault when using stdin on large files/streams

I commonly tar + lrz + gpg encrypt backups/archives. To validate them I typically run gpg -d <file> | lrzcat | tar tvf - or similar. However, this appears to segfault.

Running lrzip 0.616 on Arch Linux on an x86_64 machine with 16 GB of memory. It also fails on the current HEAD commit, 2c151a0.

Steps to reproduce with random 5GB raw file, (note: no compression, also fails with standard lzma for me):

  1. Generate large file

    ~/t/lrz $ dd if=/dev/zero bs=1M count=5k 2>/dev/null | openssl enc -rc4-40 -pass pass:weak | lrzip -n -fo 5GB.raw.lrz
    Compression Ratio: 1.000. Average Compression Speed: 25.859MB/s.
    Total time: 00:03:17.95
    
  2. File details

    ~/t/lrz $ lrzip -i 5GB.raw.lrz
    5GB.raw.lrz:
    lrzip version: 0.6 file
    Compression: rzip alone
    Decompressed file size: 5368709136
    Compressed file size: 5368955036
    Compression ratio: 1.000
    MD5 used for integrity testing
    MD5: e09435b9e754cf88c93c5c1c85477e81
    
  3. Attempt to stream the file (sorry, pv makes noise on the console, but it always crashes at 2.61 GB for me):

    ~/t/lrz $ pv -cN in 5GB.raw.lrz | lrzcat | pv -cN out  > /dev/null
            in: 2.61GiB 0:00:29 [90.5MiB/s] [====================================>                                   ] 52%            
           out:    0 B 0:00:29 [   0 B/s] [<=>                                                                                       ]
    [1]    4018 done                              pv -cN in 5GB.raw.lrz | 
           4020 segmentation fault (core dumped)  lrzcat | 
           4021 done                              pv -cN out > /dev/null
    
  4. Successfully decompress the same file without using a pipe/stdin

    ~/t/lrz $ time lrzip 5GB.raw.lrz -o out
    5GB.raw.lrz - Compression Ratio: 1.000. Average Compression Speed: 13.838MB/s.
    Total time: 00:06:10.36
    lrzip 5GB.raw.lrz -o out  171.24s user 35.71s system 55% cpu 6:10.37 total
    

Memory info at the time:

~/t/lrz$ free -m
             total       used       free     shared    buffers     cached
Mem:         16036      15792        244         93         46      10460
-/+ buffers/cache:       5285      10751
Swap:         4095        773       3322

Core dump, no time to look deeper yet:

Core was generated by `/home/nitro/tmp/lrz/lrzip/lrzip -d'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000040ea78 in read_fdin (control=0x63fd40 <local_control>, len=5368709152) at stream.c:679
679                     control->tmp_inbuf[control->in_ofs + i] = (char)tmpchar;
(gdb) where
#0  0x000000000040ea78 in read_fdin (control=0x63fd40 <local_control>, len=5368709152) at stream.c:679
#1  0x000000000040f26b in read_seekto (control=0x63fd40 <local_control>, sinfo=<optimized out>, pos=5368709184) at stream.c:842
#2  0x0000000000412c5b in fill_buffer (streamno=<optimized out>, sinfo=<optimized out>, control=<optimized out>) at stream.c:1567
#3  read_stream (control=0x7f62149dd9c0 <_IO_2_1_stdin_>, ss=0x253f070, streamno=1834450999, 
    p=0x7f62147117a0 <__read_nocancel+7> "H=\001\360\377\377s1\303H\203\354\b莖\001", len=1) at stream.c:1735
#4  0x000000000040d3c8 in read_u8 (stream=<optimized out>, err=<optimized out>, ss=<optimized out>, control=<optimized out>) at runzip.c:55
#5  read_header (head=<optimized out>, ss=<optimized out>, control=<optimized out>) at runzip.c:148
#6  runzip_chunk (tally=<optimized out>, expected_size=<optimized out>, fd_in=<optimized out>, control=<optimized out>) at runzip.c:312
#7  runzip_fd (control=0x63fd40 <local_control>, fd_in=223, fd_in@entry=3, fd_out=1834450999, fd_out@entry=4, fd_hist=342955936, fd_hist@entry=5, 
      expected_size=5368709136) at runzip.c:379
#8  0x0000000000406c49 in decompress_file (control=0x63fd40 <local_control>) at lrzip.c:784
#9  0x0000000000403685 in main (argc=0, argv=0x7fffd435da88) at main.c:474

Lrzip should avoid writing large temporary files to tmpfs

Lrzip runs my system out of memory (sometimes fatally, depending on what the OOM-killer does) if $TMPDIR is on a tmpfs while decompressing a very large stream with lrzcat. My tmpfs size is half my system memory, per the default mount options.

Many distributions set /tmp to a tmpfs and may also experience this problem when decompressing very large streams.

Setting $TMPDIR to a traditional harddrive backed filesystem works fine.

Perhaps lrzip should check before writing temporary files to tmpfs and fail in this case until the user sets a reasonable $TMPDIR.
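Until such a check exists, the workaround from the report is to point $TMPDIR at a disk-backed filesystem before invoking lrzcat; the sketch assumes /var/tmp is disk-backed on your system (verify with df):

```shell
export TMPDIR=/var/tmp       # assumed disk-backed, unlike a tmpfs /tmp
scratch=$(mktemp)            # mktemp honours $TMPDIR, as the report says lrzip does
case $scratch in
  /var/tmp/*) echo "temporary files now land on disk" ;;
esac
rm -f "$scratch"
# TMPDIR=/var/tmp lrzcat huge.lrz | tar xf -   # usage sketch
```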

lrzip sticks at total: 3%

Hello. I am currently testing lrzip on a small system backup before moving on to more serious applications. However, even this "small" backup is over 100 GB (143 GB to be exact), so lrzip should be a godsend.

Currently I tar the backup (as it contains many full directories) and then compress with lrzip. For a little while it works fine, but once it reaches Total: 3% it freezes up; no more progress is made. I don't really know what the problem is.

My most recent test contained no flags at all. I have ~12 GB in physical memory.

compression level and zpaq

The help output suggests that we can choose a compression level with -L # switch when using lzma/bzip2/gzip compression.

Low level options:
-L level set lzma/bzip2/gzip compression level (1-9, default 7)

However, -L also affects the compression level when using ZPAQ compression with -z.
ZPAQ's number of compression levels varies with its version number [1]. Which ZPAQ compression levels do lrzip's levels 1-9 correspond to?

NULL pointer dereference in join_pthread (stream.c)

On 0.631:

# lrzip -t $FILE
Decompressing...
100%       2.00 /      2.00
ASAN:DEADLYSIGNAL
=================================================================
==1329==ERROR: AddressSanitizer: SEGV on unknown address 0x0000000002d0 (pc 0x7fa931ad7660 bp 0x7ffff4a30c30 sp 0x7ffff4a309f8 T0)
==1329==The signal is caused by a READ memory access.
==1329==Hint: address points to the zero page.
    #0 0x7fa931ad765f  /tmp/portage/sys-libs/glibc-2.23-r3/work/glibc-2.23/nptl/pthread_join.c:34
    #1 0x53ee0d in join_pthread /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:147:6
    #2 0x53ee0d in fill_buffer /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1697
    #3 0x53ee0d in read_stream /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1755
    #4 0x531075 in unzip_literal /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:162:16
    #5 0x531075 in runzip_chunk /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:320
    #6 0x531075 in runzip_fd /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:382
    #7 0x519b41 in decompress_file /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/lrzip.c:826:6
    #8 0x511074 in main /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/main.c:669:4
    #9 0x7fa930d3a78f in __libc_start_main /tmp/portage/sys-libs/glibc-2.23-r3/work/glibc-2.23/csu/../csu/libc-start.c:289
    #10 0x41abf8 in _init (/usr/bin/lrzip+0x41abf8)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /tmp/portage/sys-libs/glibc-2.23-r3/work/glibc-2.23/nptl/pthread_join.c:34 
==1329==ABORTING

Reproducer:
https://github.com/asarubbo/poc/blob/master/00231-lrzip-nullptr-join_pthread

Compile fails on Cygwin

make[2]: Entering directory `/home/Adam/lrzip'
  CXXLD    libtmplrzip.la
  CXXLD    liblrzip.la
libtool: link: warning: undefined symbols not allowed in i686-pc-cygwin shared libraries
  CXXLD    lrzip.exe
  CC       decompress_demo.o
In file included from decompress_demo.c:26:
./Lrzip.h:134: error: parse error before "va_list"
Makefile:709: recipe for target `decompress_demo.o' failed
make[2]: *** [decompress_demo.o] Error 1
make[2]: Leaving directory `/home/Adam/lrzip'
Makefile:855: recipe for target `all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/Adam/lrzip'
Makefile:480: recipe for target `all' failed
make: *** [all] Error 2

invalid memory read in lzo1x_decompress (lzo1x_d.ch)

On 0.631:

# lrzip -t $FILE
Decompressing...
Failed to decompress buffer - lzmaerr=6
ASAN:DEADLYSIGNAL
=================================================================
==3311==ERROR: AddressSanitizer: SEGV on unknown address 0x602000010000 (pc 0x7f75cabe8834 bp 0x62100002c11f sp 0x7f7085ab4d78 T5)
==3311==The signal is caused by a READ memory access.
    #0 0x7f75cabe8833 in lzo1x_decompress /tmp/portage/dev-libs/lzo-2.08/work/lzo-2.08/src/lzo1x_d.ch:108
    #1 0x54af2f in lzo_decompress_buf /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:590:10
    #2 0x54af2f in ucompthread /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1525
    #3 0x7f75ca2944a3 in start_thread /tmp/portage/sys-libs/glibc-2.23-r3/work/glibc-2.23/nptl/pthread_create.c:333
    #4 0x7f75c95bf66c in clone /tmp/portage/sys-libs/glibc-2.23-r3/work/glibc-2.23/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:109

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /tmp/portage/dev-libs/lzo-2.08/work/lzo-2.08/src/lzo1x_d.ch:108 in lzo1x_decompress
Thread T5 created by T0 here:
    #0 0x42d49d in pthread_create /tmp/portage/sys-devel/llvm-3.9.1-r1/work/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_interceptors.cc:245
    #1 0x53e70f in create_pthread /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:133:6
    #2 0x53e70f in fill_buffer /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1673
    #3 0x53e70f in read_stream /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1755
    #4 0x531075 in unzip_literal /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:162:16
    #5 0x531075 in runzip_chunk /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:320
    #6 0x531075 in runzip_fd /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:382
    #7 0x519b41 in decompress_file /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/lrzip.c:826:6
    #8 0x511074 in main /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/main.c:669:4
    #9 0x7f75c94f878f in __libc_start_main /tmp/portage/sys-libs/glibc-2.23-r3/work/glibc-2.23/csu/../csu/libc-start.c:289

Dunno wtf decompression type to use!
==3311==AddressSanitizer: while reporting a bug found another one. Ignoring.
Fatal error - exiting

Reproducer:
https://github.com/asarubbo/poc/blob/master/00230-lrzip-invalidread-lzo1x_decompress

possible call to malloc with argument value 0 in unzip_match

At runzip.c:219, malloc is called with a value that is checked earlier to be non-negative, but not non-zero.

I can't rule out a zero-valued len at runzip.c:327, simply because there are too many paths for it to be immediately obvious.

(picked up running a scan-build make)

error running lrunzip on large encrypted lrzip file

My tests suggest that both of these conditions need to apply to produce this error:

  • lrzip using the -e (encrypt) flag
  • input file big enough to cause lrzip to do more than one pass

Here's a test scenario with an 8GB input file: lrzip it, then lrunzip the result. The output test8.lrz.out should be identical to the input test8, but it is short:

lrzip -e -f -v -o test8.lrz test8 && lrunzip -f -v -o test8.lrz.out test8.lrz
-rw-------  1 root   root 8589934592 2016/04/04 13:26:53 test8
-rw-------  1 root   root 3100380112 2016/04/04 17:39:10 test8.lrz
-rw-------  1 root   root 8413675520 2016/04/04 17:46:44 test8.lrz.out

No error messages are seen during lrzip, but "chunk_bytes <#> is invalid in runzip_chunk" is seen during lrunzip.

The same error can be seen with smaller input files if the lrzip -w flag is used to specify a smaller-than-default compression window: e.g. "-w 1" makes the error show up with a 100 MB input file.

I'm running on a 64-bit Debian Stable (Jessie) system with 12GB of RAM and an Intel Core i7-950 processor. I've tried both lrzip-0.616 (the current version in the Debian Stable repo) and lrzip-0.621, which I built myself from the source at https://github.com/ckolivas/lrzip/releases
Both error in the same way.

Any idea what might be happening, and how I can investigate further?

lrzip 0.631 occasionally hangs forever

I have not investigated the cause, but occasionally when running lrztar it will hang for an unreasonably long period of time until it is killed. In one case the process sat in the background for nearly a month on a 27M directory!

lrzip silently ignores arguments after parse error

IMO lrzip should either parse the following command line correctly or flag an error:

$ lrzip -L9Uz

This completes without complaint and compresses the file, but as far as I can see it does not apply zpaq compression.

The following works fine:

$ lrzip -L 9 -U -z

Version: 0.616

Last but not least: Thanks for this otherwise absolutely great tool!

use-after-free in read_stream (stream.c)

On 0.631:

# lrzip -t $FILE
Decompressing...
=================================================================
==4026==ERROR: AddressSanitizer: heap-use-after-free on address 0x62100000dd00 at pc 0x0000004bccc5 bp 0x7ffcf3b4d9f0 sp 0x7ffcf3b4d1a0
READ of size 1 at 0x62100000dd00 thread T0
    #0 0x4bccc4 in __asan_memcpy /tmp/portage/sys-devel/llvm-3.9.1-r1/work/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_interceptors.cc:413
    #1 0x53cff6 in read_stream /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1747:4
    #2 0x5307fc in read_vchars /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:79:6
    #3 0x5307fc in unzip_match /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:208
    #4 0x5307fc in runzip_chunk /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:329
    #5 0x5307fc in runzip_fd /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:382
    #6 0x519b41 in decompress_file /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/lrzip.c:826:6
    #7 0x511074 in main /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/main.c:669:4
    #8 0x7f743a5d278f in __libc_start_main /tmp/portage/sys-libs/glibc-2.23-r3/work/glibc-2.23/csu/../csu/libc-start.c:289
    #9 0x41abf8 in _init (/usr/bin/lrzip+0x41abf8)

0x62100000dd00 is located 0 bytes inside of 4096-byte region [0x62100000dd00,0x62100000ed00)
freed by thread T0 here:
    #0 0x4d3660 in free /tmp/portage/sys-devel/llvm-3.9.1-r1/work/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:47
    #1 0x53d186 in fill_buffer /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1574:3
    #2 0x53d186 in read_stream /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1755
    #3 0x5307fc in read_vchars /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:79:6
    #4 0x5307fc in unzip_match /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:208
    #5 0x5307fc in runzip_chunk /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:329
    #6 0x5307fc in runzip_fd /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:382
    #7 0x519b41 in decompress_file /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/lrzip.c:826:6
    #8 0x511074 in main /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/main.c:669:4
    #9 0x7f743a5d278f in __libc_start_main /tmp/portage/sys-libs/glibc-2.23-r3/work/glibc-2.23/csu/../csu/libc-start.c:289

previously allocated by thread T1 here:
    #0 0x4d39b8 in malloc /tmp/portage/sys-devel/llvm-3.9.1-r1/work/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:64
    #1 0x54b0d7 in lzma_decompress_buf /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:546:20
    #2 0x54b0d7 in ucompthread /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1522
    #3 0x7f743b36e4a3 in start_thread /tmp/portage/sys-libs/glibc-2.23-r3/work/glibc-2.23/nptl/pthread_create.c:333

Thread T1 created by T0 here:
    #0 0x42d49d in pthread_create /tmp/portage/sys-devel/llvm-3.9.1-r1/work/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_interceptors.cc:245
    #1 0x53e70f in create_pthread /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:133:6
    #2 0x53e70f in fill_buffer /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1673
    #3 0x53e70f in read_stream /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/stream.c:1755
    #4 0x5303e3 in read_u8 /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:55:6
    #5 0x5303e3 in read_header /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:144
    #6 0x5303e3 in runzip_chunk /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:314
    #7 0x5303e3 in runzip_fd /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/runzip.c:382
    #8 0x519b41 in decompress_file /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/lrzip.c:826:6
    #9 0x511074 in main /tmp/portage/app-arch/lrzip-0.631/work/lrzip-0.631/main.c:669:4
    #10 0x7f743a5d278f in __libc_start_main /tmp/portage/sys-libs/glibc-2.23-r3/work/glibc-2.23/csu/../csu/libc-start.c:289

SUMMARY: AddressSanitizer: heap-use-after-free /tmp/portage/sys-devel/llvm-3.9.1-r1/work/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_interceptors.cc:413 in __asan_memcpy
Shadow bytes around the buggy address:
  0x0c427fff9b50: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c427fff9b60: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c427fff9b70: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c427fff9b80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c427fff9b90: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c427fff9ba0:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c427fff9bb0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c427fff9bc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c427fff9bd0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c427fff9be0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c427fff9bf0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==4026==ABORTING

Reproducer:
https://github.com/asarubbo/poc/blob/master/00233-lrzip-UAF-read_stream

lrztar fails on directory names with spaces

There seems to be an issue with how lrztar handles directory names containing spaces:

$ lrztar "a directory"
Failed to open directory.tar.lrz
No such file or directory
Fatal error - exiting

$ lrztar "one more directory"
Cannot specify output filename with more than 1 file
Fatal error - exiting

Quite peculiar :)
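lrztar appears to be a shell wrapper, and the two different failures above look like what unquoted word-splitting of the argument would produce (one "file" per space-separated word). A minimal illustration of the suspected mechanism, not lrztar's actual code:

```shell
#!/bin/sh
# Suspected mechanism (illustrative only): unquoted expansion
# word-splits "a directory" into two arguments.
dir="a directory"

count_unquoted() {
    set -- $dir          # splits on whitespace: "a" "directory"
    echo "$#"
}

count_quoted() {
    set -- "$dir"        # quoted: one argument, space preserved
    echo "$#"
}

echo "unquoted sees $(count_unquoted) args"
echo "quoted sees $(count_quoted) args"
```

With two spaces in the name ("one more directory") the unquoted form sees three words, matching the "Cannot specify output filename with more than 1 file" error; quoting every expansion in the wrapper (or using "$@") would fix both cases.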

Compress failed in compthread

I tried to compress a 25 GB Windows 7 VMware vmdk file, which worked as expected with the standard LZMA mode. But when I tried bzip2 and gzip modes, both failed with the above error.

I took a short look at the source and found the following solution, which let me get further; I now hit a malloc problem that I will try to solve later.

--- lrzip-0.551/stream.c    2010-12-12 07:49:00.000000000 +0100
+++ lrzip-0.551/stream.c.new    2010-12-14 15:23:51.192320814 +0100
@@ -213,6 +213,7 @@
 {
    u32 dlen = cthread->s_len;
    uchar *c_buf;
+   int bzip2_ret;

    if (!lzo_compresses(cthread->s_buf, cthread->s_len))
        return 0;
@@ -223,9 +224,21 @@
        return -1;
    }

-   if (BZ2_bzBuffToBuffCompress((char *)c_buf, &dlen,
+   bzip2_ret = BZ2_bzBuffToBuffCompress((char *)c_buf, &dlen,
        (char *)cthread->s_buf, cthread->s_len,
-       control.compression_level, 0, control.compression_level * 10) != BZ_OK) {
+       control.compression_level, 0, control.compression_level * 10);
+
+   /* if compressed data is bigger than original data leave
+           as CTYPE_NONE */
+
+   if (bzip2_ret == BZ_OUTBUFF_FULL) {
+       print_maxverbose("Incompressible block\n");
+       /* Incompressible, leave as CTYPE_NONE */
+       free(c_buf);
+       return 0;
+   }
+
+   if (bzip2_ret != BZ_OK) {
            free(c_buf);
            print_maxverbose("BZ2 compress failed\n");
            return -1;
@@ -249,6 +262,7 @@
 {
    unsigned long dlen = cthread->s_len;
    uchar *c_buf;
+   int gzip_ret;

    c_buf = malloc(dlen);
    if (!c_buf) {
@@ -256,8 +270,20 @@
        return -1;
    }

-   if (compress2(c_buf, &dlen, cthread->s_buf, cthread->s_len,
-       control.compression_level) != Z_OK) {
+   gzip_ret = compress2(c_buf, &dlen, cthread->s_buf, cthread->s_len,
+   control.compression_level);
+
+   /* if compressed data is bigger than original data leave
+      as CTYPE_NONE */
+
+   if (gzip_ret == Z_BUF_ERROR) {
+       print_maxverbose("Incompressible block\n");
+       /* Incompressible, leave as CTYPE_NONE */
+       free(c_buf);
+       return 0;
+   }
+
+   if (gzip_ret != Z_OK) {
            free(c_buf);
            print_maxverbose("compress2 failed\n");
            return -1;

0.608 gcc -Wall -pedantic build warnings

FYI,

Debian now builds with hardened CFLAGS to reveal possible problems. I cranked the flags to maximum "-Wall -pedantic". Perhaps some of the issues could be addressed in later releases.

Flags used were:

LDFLAGS="-Wl,-z,relro -Wl,--as-needed"

CFLAGS="-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security -Wall -pedantic"

...(this is only an excerpt)...

Types.h:88:14: warning: ISO C90 does not support 'long long' [-Wlong-long]
Types.h:89:23: warning: ISO C90 does not support 'long long' [-Wlong-long]

... warning: C++ style comments are not allowed in ISO C90 [enabled by default]

LzmaEnc.c:2159:8: warning: variable 'allocaDummy' set but not used [-Wunused-but-set-variable]

... warning: ISO C90 does not support the 'll' gnu_printf length modifier [-Wformat]

lrzip.c:165:55: warning: ISO C99 requires rest arguments to be used [enabled by default]

main.c:163:2: warning: ISO C does not support the '%Lu' gnu_scanf format [-Wformat]

0.614 warning: ISO C forbids omitting the middle term of a ?: expression

Hi Con,

Following messages are displayed for the 0.614

  CC       liblrzip.lo
liblrzip.c: In function 'lrzip_log_cb_set':
liblrzip.c:601:24: warning: ISO C forbids conversion of function pointer to object pointer type [-peda\
ntic]
liblrzip.c:601:22: warning: ISO C forbids assignment between function pointer and 'void *' [-pedantic]
liblrzip.c: In function 'lrzip_pass_cb_set':
liblrzip.c:637:25: warning: ISO C forbids conversion of function pointer to object pointer type [-peda\
ntic]
liblrzip.c:637:23: warning: ISO C forbids assignment between function pointer and 'void *' [-pedantic]
liblrzip.c: In function 'lrzip_info_cb_set':
liblrzip.c:645:25: warning: ISO C forbids conversion of function pointer to object pointer type [-peda\
ntic]
liblrzip.c:645:23: warning: ISO C forbids assignment between function pointer and 'void *' [-pedantic]
  CC       lrzip.lo
lrzip.c: In function 'percentage':
lrzip.c:873:17: warning: ISO C forbids omitting the middle term of a ?: expression [-pedantic]
  CC       rzip.lo
rzip.c: In function 'show_distrib':
rzip.c:555:117: warning: ISO C forbids omitting the middle term of a ?: expression [-pedantic]
rzip.c: In function 'hash_search':
rzip.c:643:28: warning: ISO C forbids omitting the middle term of a ?: expression [-pedantic]
rzip.c:644:35: warning: ISO C forbids omitting the middle term of a ?: expression [-pedantic]
rzip.c: In function 'rzip_fd':
rzip.c:1086:55: warning: ISO C forbids omitting the middle term of a ?: expression [-pedantic]
rzip.c:1164:81: warning: ISO C forbids omitting the middle term of a ?: expression [-pedantic]
  CC       runzip.lo
runzip.c: In function 'runzip_fd':
runzip.c:398:121: warning: ISO C forbids omitting the middle term of a ?: expression [-pedantic]
  CC       stream.lo
stream.c: In function 'lzma_compress_buf':
stream.c:316:52: warning: ISO C forbids omitting the middle term of a ?: expression [-pedantic]
  CC       util.lo
util.c: In function 'setup_overhead':
util.c:113:52: warning: ISO C forbids omitting the middle term of a ?: expression [-pedantic]

Failed to decompress large file

I created an archive on 28 July 2011, which I cannot open anymore.

$ du -h myLargeArchive.tar.lrz
152G    myLargeArchive.tar.lrz

$ file myLargeArchive.tar.lrz
myLargeArchive.tar.lrz: LRZIP compressed data - version 0.6

$ lrunzip myLargeArchive.tar.lrz
Output filename is: myLargeArchive.tar
Decompressing...
Failed to decompress buffer - lzmaerr=1
Failed to decompress buffer - lzmaerr=1
Failed to decompress in ucompthread
Fatal error - exiting

I remember that, at that time, I did some testing with smaller archives, and compression and decompression worked fine.

Googling around for lrzip "Failed to decompress in ucompthread", I found that someone posted this error message in your (?) blog, but only concerning very small archives.

I am on Debian Testing, and the error occurred with Debian's lrzip version 0.608-2. I compiled lrzip-0.614 from http://ck.kolivas.org/apps/lrzip/ but get the same result.

Is there something I can do about it?

I probably created the archive like this

$ tar ... | lrzip -q -L6 -p2 -w5

Thanks

zstd support

I am hugely impressed by (p)zstd https://github.com/facebook/zstd

The parallelized version is blazingly fast on a multicore system: You are getting gzip compression at lzo speed.

Will that be an option in addition to --zpaq --bzip2 --gzip --lzo --lzma?

no way of non-interactively entering a passphrase for -e

I've looked at the source and it seems that there is no way to non-interactively specify a passphrase to the lrzip binary?

Is that really so? I have about 300k files to recompress and would like to encrypt them as well, but don't really feel like entering a passphrase on the terminal 300k times ;)

lrzip does not work on big endian unless endian.h is present.

When compiling lrzip 0.621 on a Solaris Sparc machine, I noticed that lrzip could neither produce correct lrz files nor decompress correct lrz files.

The problem is in the endian detection in lrzip_private.h when the system does not have an endian.h file.
This might look OK, but unless __BIG_ENDIAN and __LITTLE_ENDIAN are already defined to something, the value of __BYTE_ORDER ends up the same regardless of what WORDS_BIGENDIAN is set to:

```
#ifndef __BYTE_ORDER
#ifdef WORDS_BIGENDIAN
#define __BYTE_ORDER __BIG_ENDIAN
#else
#define __BYTE_ORDER __LITTLE_ENDIAN
#endif
#endif
```

This means that further down in lrzip_private.h, when you do

```
#if __BYTE_ORDER == __LITTLE_ENDIAN
```

this will evaluate to true even when WORDS_BIGENDIAN is set.

I have attached a patch which should fix the problem.

Running the attached test with gcc 4.9.3 demonstrates the problem (it reproduces on Linux x64 as well).
% gcc -DWORDS_BIGENDIAN endian_test.c
% ./a.out
Little Endian
but WORDS_BIGENDIAN is true !!
% gcc -DWORDS_BIGENDIAN -D__BIG_ENDIAN=1 -D__LITTLE_ENDIAN=2 endian_test.c
% ./a.out
Big Endian

endian_patch.txt
endian_test_c.txt

Successful path through bool lrzip_compress_full may leak

I noticed that the Lrzip *lr is freed when lrzip_run returns an error, but not when lrzip_run (and subsequently lrzip_compress_full) succeeds.

lrzip_decompress also has the same issue. Is this intentional?

I see no other place where that memory is freed, so I highly doubt it is.

(picked up running a scan-build make)
