
compress's People

Contributors

dsnet, niksko, rathann


compress's Issues

Question: Is there an example of buffered read/seek in chunks?

I was playing with bgzf archives, and it was fairly easy to wrap bgzf.Reader in a bufio.Reader so that the archive could be read in chunks. In one pass I would build a useful index of offsets so that later I could treat the very large archive as if it were a memory-mapped file on disk.

I tried to find an example of using xflate in a similar way, but all of the examples I could find read the whole compressed archive into memory.

So, my question is: is there an example of buffered read/seek in chunks of a compressed "xflated" archive?

I found that a custom implementation of what I tried to describe, via io.ReadSeeker, is not trivial, so if there's already an example I would appreciate it immensely :)

bzip2: empty files produce invalid bz2 files

Hi,

The following will generate an invalid bzip2 file:

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/dsnet/compress/bzip2"
)

func main() {
	f, err := os.Create("test.bz2")
	if err != nil {
		log.Fatal(err)
	}

	w, err := bzip2.NewWriter(f, &bzip2.WriterConfig{Level: 2})
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()

	// fmt.Fprint(w, "test")
}

Resulting in:

➜  tmp bzip2 -d test.bz2
bzip2: test.bz2 is not a bzip2 file.

Uncommenting fmt.Fprint(w, "test") produces a valid bz2 file.

Go get is stuck

Stuck while getting this package; I've waited a long time.

bzip2: decoder does not check CRC

Found through fuzz testing.

The input string,
"BZh21AY&SY0000\x00\x00\x00\x01\x00?\x80 \x0010\x00\xc4\xc2e\xe2\xeeH\xa7\n\x12\x000000",
is successfully decompressed using github.com/dsnet/compress/bzip2, while it is rejected by the C library.

The string is also rejected by the standard library's compress/bzip2 implementation. Since both the Go standard library and the C library use the same Huffman implementation, the divergence in behavior likely stems from there.
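As a cross-check of the claim above, the fuzz input can be run through the standard library's decoder; assuming io.ReadAll surfaces the structural error, the result should be non-nil:

```go
package main

import (
	"compress/bzip2"
	"fmt"
	"io"
	"strings"
)

// decodeWithStdlib runs the fuzz input through the standard library's
// bzip2 decoder and returns the resulting error, if any.
func decodeWithStdlib() error {
	const input = "BZh21AY&SY0000\x00\x00\x00\x01\x00?\x80 \x0010\x00\xc4\xc2e\xe2\xeeH\xa7\n\x12\x000000"
	_, err := io.ReadAll(bzip2.NewReader(strings.NewReader(input)))
	return err
}

func main() {
	fmt.Println("stdlib rejects input:", decodeWithStdlib() != nil)
}
```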

Lack of deflate backwards compatibility?

I've run into issues with the standard Linux unzip being unable to extract a file that I created programmatically with XFLATE compression. I decided to reproduce it on a smaller case and got the same results. I basically took the example from the xflate package documentation and wrote the resulting zip to a file; source code follows:

package main

import (
	"archive/zip"
	"io"
	"io/ioutil"
	"log"
	"os"

	"github.com/dsnet/compress/xflate"
)

func init() { log.SetFlags(log.Lshortfile) }

// MustLoadFile must load a file or else panics.
func MustLoadFile(file string) []byte {
	b, err := ioutil.ReadFile(file)
	if err != nil {
		panic(err)
	}
	return b
}

func main() {
	// Test files of non-trivial sizes.
	files := map[string][]byte{
		"twain.txt":   MustLoadFile("testdata/twain.txt"),
		"digits.txt":  MustLoadFile("testdata/digits.txt"),
		"huffman.txt": MustLoadFile("testdata/huffman.txt"),
	}

	// Write the Zip archive.
	out, err := os.Create("output.zip")
	if err != nil {
		log.Fatal(err)
	}
	zw := zip.NewWriter(out)
	zw.RegisterCompressor(zip.Deflate, func(wr io.Writer) (io.WriteCloser, error) {
		// Instead of the default DEFLATE compressor, register one that uses
		// XFLATE instead. We choose a relatively small chunk size of 64KiB for
		// better random access properties, at the expense of compression ratio.
		return xflate.NewWriter(wr, &xflate.WriterConfig{
			Level:     xflate.BestSpeed,
			ChunkSize: 1 << 16,
		})
	})
	for _, name := range []string{"twain.txt", "digits.txt", "huffman.txt"} {
		body := files[name]
		f, err := zw.Create(name)
		if err != nil {
			log.Fatal(err)
		}
		if _, err = f.Write(body); err != nil {
			log.Fatal(err)
		}
	}
	if err := zw.Close(); err != nil {
		log.Fatal(err)
	}
	err = out.Close()
	if err != nil {
		log.Fatal(err)
	}
}

Here's what's happening with output.zip that this program created:

$ unzip -l output.zip
Archive:  output.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
   387969  1980-00-00 00:00   twain.txt
   100003  1980-00-00 00:00   digits.txt
   262144  1980-00-00 00:00   huffman.txt
---------                     -------
   750116                     3 files

$ unzip -t output.zip
Archive:  output.zip
    testing: twain.txt               (incomplete d-tree)
  error:  invalid compressed data to inflate
    testing: digits.txt              (incomplete d-tree)
  error:  invalid compressed data to inflate
    testing: huffman.txt             (incomplete d-tree)
  error:  invalid compressed data to inflate
At least one error was detected in output.zip.

I'm not sure that's expected behavior. To double-check, I took the zip to my Windows system, and it was also unable to extract the files.

bzip2: encoder crashes in SAIS code on some inputs

Reported by email some time ago by @flanglet; recording it here so I don't forget.

On some files in the Silesia corpus, the bzip2 encoder will crash:

panic: runtime error: index out of range [recovered]
    panic: runtime error: index out of range

goroutine 1 [running]:
github.com/dsnet/compress/bzip2.errRecover(0xc820060648)
    /home/fred/workspace/kanzi/go/src/github.com/dsnet/compress/bzip2/common.go:62 +0xd6
github.com/dsnet/compress/bzip2/internal/sais.getBuckets_int(0xc82015a000, 0x50a, 0x50a, 0x0, 0x0, 0x0, 0x50a, 0x501)
    /home/fred/workspace/kanzi/go/src/github.com/dsnet/compress/bzip2/internal/sais/sais_int.go:47 +0x68
github.com/dsnet/compress/bzip2/internal/sais.computeSA_int(0xc820148840, 0x1954, 0x1954, 0xc82013a000, 0x365c, 0x365c, 0x3b4, 0x1954, 0x50a)
    /home/fred/workspace/kanzi/go/src/github.com/dsnet/compress/bzip2/internal/sais/sais_int.go:474 +0x2b7
github.com/dsnet/compress/bzip2/internal/sais.computeSA_byte(0xc82006c000, 0x365c, 0x927c0, 0xc82013a000, 0x365c, 0x365c, 0x0, 0x365c, 0x100)
    /home/fred/workspace/kanzi/go/src/github.com/dsnet/compress/bzip2/internal/sais/sais_byte.go:569 +0x882
github.com/dsnet/compress/bzip2/internal/sais.ComputeSA(0xc82006c000, 0x365c, 0x927c0, 0xc82013a000, 0x365c, 0x365c)
    /home/fred/workspace/kanzi/go/src/github.com/dsnet/compress/bzip2/internal/sais/common.go:28 +0xcd
github.com/dsnet/compress/bzip2.(*burrowsWheelerTransform).Encode(0xc8200606a0, 0xc82006c000, 0x1b2e, 0x927c0, 0x0)
    /home/fred/workspace/kanzi/go/src/github.com/dsnet/compress/bzip2/bwt.go:43 +0x1de
github.com/dsnet/compress/bzip2.(*Writer).encodeBlock(0xc820060400, 0xc82006c000, 0x1b2e, 0x927c0)
    /home/fred/workspace/kanzi/go/src/github.com/dsnet/compress/bzip2/writer.go:163 +0x10e
github.com/dsnet/compress/bzip2.(*Writer).flush.func1(0xc820060400, 0xc82006c000, 0x1b2e, 0x927c0)
    /home/fred/workspace/kanzi/go/src/github.com/dsnet/compress/bzip2/writer.go:103 +0x134
github.com/dsnet/compress/bzip2.(*Writer).flush(0xc820060400, 0x0, 0x0)
    /home/fred/workspace/kanzi/go/src/github.com/dsnet/compress/bzip2/writer.go:104 +0xcd
github.com/dsnet/compress/bzip2.(*Writer).Close(0xc820060400, 0x0, 0x0)
    /home/fred/workspace/kanzi/go/src/github.com/dsnet/compress/bzip2/writer.go:127 +0xa5
main.main()
    /ws/compress/bzip2/bzip2.go:12 +0x11e

brotli: test failure on Fedora rawhide ppc64le

cc9eb1d with #64 applied fails the brotli tests when built on the Fedora rawhide ppc64le arch:

GOPATH=/builddir/build/BUILD/compress-cc9eb1d7ad760af14e8f918698f745e80377af4f/_build:/usr/share/gocode
+ go test -buildmode pie -compiler gc -ldflags '-extldflags '\''-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld '\'''
--- FAIL: TestReader (0.15s)
    reader_test.go:528: test 20, mismatching error:
        got brotli: corrupted input
        want IsCorrupted(err) == true
    reader_test.go:528: test 22, mismatching error:
        got brotli: corrupted input
        want IsCorrupted(err) == true
    reader_test.go:528: test 26, mismatching error:
        got brotli: corrupted input
        want IsCorrupted(err) == true
    reader_test.go:528: test 27, mismatching error:
        got brotli: corrupted input
        want IsCorrupted(err) == true
    reader_test.go:528: test 28, mismatching error:
        got brotli: corrupted input
        want IsCorrupted(err) == true
    reader_test.go:528: test 33, mismatching error:
        got brotli: corrupted input
        want IsCorrupted(err) == true
    reader_test.go:528: test 34, mismatching error:
        got brotli: corrupted input
        want IsCorrupted(err) == true
    reader_test.go:528: test 39, mismatching error:
        got brotli: corrupted input
        want IsCorrupted(err) == true
    reader_test.go:528: test 44, mismatching error:
        got brotli: corrupted input
        want IsCorrupted(err) == true
    reader_test.go:528: test 49, mismatching error:
        got brotli: corrupted input
        want IsCorrupted(err) == true
    reader_test.go:528: test 50, mismatching error:
        got brotli: corrupted input
        want IsCorrupted(err) == true
    reader_test.go:528: test 53, mismatching error:
        got brotli: corrupted input
        want IsCorrupted(err) == true
    reader_test.go:528: test 54, mismatching error:
        got brotli: corrupted input
        want IsCorrupted(err) == true
    reader_test.go:528: test 56, mismatching error:
        got brotli: corrupted input
        want IsCorrupted(err) == true
    reader_test.go:528: test 57, mismatching error:
        got brotli: corrupted input
        want IsCorrupted(err) == true
    reader_test.go:528: test 63, mismatching error:
        got brotli: corrupted input
        want IsCorrupted(err) == true
FAIL

Unfortunately, I don't know enough Go and I'm not familiar with brotli to fix this right now.

BufferedReader (missing Discard method)

I just tried to go get this package with Go 1.4.2:

$ go get github.com/dsnet/compress
# github.com/dsnet/compress
go/src/github.com/dsnet/compress/api.go:52: cannot use (*bufio.Reader)(nil) (type *bufio.Reader) as type BufferedReader in assignment:
    *bufio.Reader does not implement BufferedReader (missing Discard method)

Thank you for this great package

(This is not a bug; you may close the issue.) I'm still on the edge of my seat for brotli write support, but in the meantime I have found the bzip2 writer very useful. Thank you for writing this package! Looking forward to its completion.

bzip2: slow compression performance

I'm currently converting a python tool that uses bzip2 for compression to go.

I stumbled upon missing bzip2 compression in golang and found your package (thanks for implementing it!).

Yet, bzip2 compression speed seems to be far lower than with my Python variant.
The speed appears to be independent of the compression level that I apply.
According to pprof, most of the time is spent in computeSA_byte, which in turn spends most of its time in sortLMS2_int and induceSA_int.

My Python implementation is at least 3x faster.

Do you have any hints on how I can speed it up?

My application feeds lines of (JSON) data to the bzip2.Writer, i.e., basically a loop over the lines calling the Write method for each one (the lines are actually streamed from somewhere else, so I don't know when the stream ends).

In contrast to my Python implementation, where I manually call a flush method when I expect no more data (basically after a timeout without receiving further lines), the Go implementation seems to call flush automatically after each call to Write (I have only briefly looked at the code, but the functions mentioned above seem to originate from flush). I also found that your implementation produces larger files than my Python equivalent (which I suppose might also come from prematurely flushing data?).
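If per-Write overhead is indeed the cost, one standard mitigation worth trying is to batch the small line writes through a bufio.Writer placed in front of the compressor, so it sees large buffers instead of one tiny Write per line. A minimal sketch; the standard library's gzip.Writer stands in for the bzip2 writer here (the stdlib has no bzip2 writer), and the same wrapping would apply to this package's bzip2.NewWriter:

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"log"
)

// compressLines writes n small JSON lines through a bufio.Writer into the
// compressor and returns the compressed bytes.
func compressLines(n int) ([]byte, error) {
	var out bytes.Buffer
	zw := gzip.NewWriter(&out) // stand-in for a bzip2 writer
	// Batch many small line writes into 64 KiB chunks so the compressor
	// sees large buffers instead of one tiny Write per line.
	bw := bufio.NewWriterSize(zw, 64<<10)
	for i := 0; i < n; i++ {
		fmt.Fprintf(bw, `{"line":%d}`+"\n", i)
	}
	// Flush order matters: drain the bufio buffer into the compressor
	// first, then close the compressor to write its trailer.
	if err := bw.Flush(); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

func main() {
	b, err := compressLines(1000)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("compressed bytes:", len(b))
}
```

Whether this helps depends on the assumption above that the encoder does significant work per Write call; measuring with pprof before and after would confirm it.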

Thanks for any help
Jan

Vulnerability in xz version v0.5.6

This version of xz is vulnerable to a denial of service and an infinite loop.

Your bz2 encoder seems to be the only one out there at the moment. This means that packages that need to do bz2 encoding (or test bz2 decoding) eventually depend on this package, so this vulnerability is likely to cause headaches.
