codecbgzf.jl's Introduction

CodecBGZF.jl

Codec for BGZF files

This package implements an efficient codec for BGZF files. A BGZF file is a concatenation of small gzip blocks. Because the format is blocked, it allows random access and significantly faster compression and decompression, since blocks can be processed independently of each other.
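To make the block structure concrete, here is a minimal sketch that reads the size of a single BGZF block from its gzip header. It is based on the header layout in the BGZF specification, not on this package's internals: the helper names `le16` and `bgzf_block_size` are hypothetical, and the sketch assumes the `BC` subfield is the first (and only) gzip extra subfield, as it is in files written by common BGZF tools.

```julia
# The 28-byte empty block that the BGZF specification uses as the EOF marker.
const EOF_BLOCK = UInt8[
    0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x06, 0x00, # gzip header, XLEN = 6
    0x42, 0x43, 0x02, 0x00, 0x1b, 0x00,                                     # 'B', 'C', SLEN = 2, BSIZE = 27
    0x03, 0x00,                                                             # empty DEFLATE stream
    0x00, 0x00, 0x00, 0x00,                                                 # CRC32 of the empty payload
    0x00, 0x00, 0x00, 0x00,                                                 # ISIZE: 0 decompressed bytes
]

# Read a little-endian UInt16 starting at 1-based index i.
le16(data, i) = UInt16(data[i]) | (UInt16(data[i + 1]) << 8)

# Return the total size in bytes of the BGZF block starting at data[1].
# Hypothetical helper: assumes the BC subfield is the first extra subfield.
function bgzf_block_size(data::AbstractVector{UInt8})
    length(data) >= 18 || error("too short for a BGZF header")
    data[1] == 0x1f && data[2] == 0x8b || error("not a gzip member")
    data[3] == 0x08 || error("unexpected compression method")
    data[4] & 0x04 != 0 || error("FEXTRA flag not set")
    (data[13] == UInt8('B') && data[14] == UInt8('C')) || error("missing BC subfield")
    le16(data, 15) == 2 || error("unexpected BC subfield length")
    # BSIZE stores the total block size minus one.
    return Int(le16(data, 17)) + 1
end
```

Because every block records its own size, a reader can hop from block to block without decompressing anything. For the EOF marker, `bgzf_block_size(EOF_BLOCK)` equals `length(EOF_BLOCK)`.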

The package has the following notable features:

  • Correctness above all: The BGZF format is well specified, and the package must write and read spec-compliantly. This includes validating the given checksums, decompression lengths, and the trailing EOF block.
  • Integration with the Julia ecosystem: The package achieves this by being a codec for TranscodingStreams.jl.
  • Speed: This package should be the fastest Julia implementation of a BGZF parser. It achieves this by using LibDeflate.jl and by compressing and decompressing in a multithreaded, asynchronous manner.
  • Convenient random access with virtual file offsets.
  • Creation of GZI index files directly from BGZF-compressed files.

API

High level API

  • BGZFDecompressorStream(io::IO; nthreads=Threads.nthreads()) - create a decompressing TranscodingStream.
  • BGZFCompressorStream(io::IO; nthreads=Threads.nthreads(), compresslevel=6) - create a compressing TranscodingStream compressing to level compresslevel.
  • gzi(io::IO) - return a Vector{UInt8} representing the GZI index for a BGZF file io. To be used like this: gzi(open("/path/to/file.bgz"))
  • VirtualOffset(s::BGZFDecompressorStream) - get an object representing the current offset of the stream. You can obtain the block offset and the in-block offset with offsets(v).
  • seek(s::BGZFDecompressorStream, v::VirtualOffset) - seek the stream to the given offset.
  • Because the streams are TranscodingStreams, the usual IO-related functions work on them.
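The idea behind virtual file offsets can be sketched in a few lines. In the BGZF specification, a virtual offset is a 64-bit integer whose upper 48 bits hold the byte position of a block's start in the compressed file and whose lower 16 bits hold the position inside that block's decompressed data. The functions `pack_virtual_offset` and `unpack_virtual_offset` below are hypothetical names for illustration only; whether this package's VirtualOffset type stores its value this way internally is an assumption.

```julia
# Pack a block offset (byte position of the block's start in the compressed file)
# and an in-block offset (position within the decompressed block) into one UInt64.
function pack_virtual_offset(block_offset::Integer, inblock_offset::Integer)
    0 <= block_offset < (1 << 48) ||
        throw(ArgumentError("block offset must fit in 48 bits"))
    0 <= inblock_offset < (1 << 16) ||
        throw(ArgumentError("in-block offset must fit in 16 bits"))
    return (UInt64(block_offset) << 16) | UInt64(inblock_offset)
end

# Split a packed virtual offset back into (block offset, in-block offset).
unpack_virtual_offset(v::UInt64) = (Int(v >> 16), Int(v & 0xffff))

v = pack_virtual_offset(1000, 25)
block_offset, inblock_offset = unpack_virtual_offset(v)
```

A useful property of this packing is that comparing two virtual offsets as plain integers orders them the same way as their positions in the decompressed data.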

codecbgzf.jl's People

Contributors

jakobnissen

codecbgzf.jl's Issues

Fix multithreaded writing

MWE:

$ julia --project=. --startup-file=no -q -t 3
julia> using CodecBGZF

julia> open("/tmp/foo.bgz", "w") do file
           io = BGZFCompressorStream(file)
           write(io, [1,2,3,4])
           close(io)
       end

julia> open(x -> read(BGZFDecompressorStream(x)), "/tmp/foo.bgz")
UInt8[]

It works with 1 and 2 threads.

Also, this package REALLY needs to have some multithreaded testing.

BGZFCompressorStream is writing (almost) empty files

Hi there,

I did this:

open(BGZFCompressorStream,"test.csv.gz", "w") do stream
       CSV.write(stream, df; delim=';', stringtype=String)
end

This writes an almost empty file (28 bytes long, although df has 1000 rows of 4 columns).

After reading back with:

io = IOBuffer(Base.read(BGZFDecompressorStream(open("test.csv.gz", "r"))))
df = DataFrame(CSV.File(io; delim=';', stringtype=String))

df is empty (no error/exception)

What am I doing wrong?

(It's working with GzipCompressorStream/GzipDecompressorStream.)

Bye
Oliver
