kaitai_compress's People

Contributors

cugu, generalmimon, greycat, kolanich

kaitai_compress's Issues

How will we deal with the testing blobs

I dislike storing the blobs in this repo, and I also dislike using CLI tools to generate them. Yes, that tests compatibility with the CLI tools, but there are some issues:

  • Today I implemented the tests for #2 and saw that the zstd tests don't pass. It turned out that the test files are incompatible with the current version (I have not tested against older ones). I regenerated them with the recent version and everything started passing.

  • lzma (LZMA version 1, "alone" format) files also don't work, and I don't know why.

  • The test files are binary, which doesn't suit git well.

  • The test files don't align well with the Kaitai Struct use case. KS is used to parse custom binary formats, and in those formats compressed streams are produced not by calling CLI utilities but through the APIs of the libraries.

So I propose getting rid of all the test files (and removing them from history) and instead generating random data at runtime, compressing it through the serialization interface I have introduced, then decompressing it and verifying that the decompressed stream matches the original. I have implemented only the Python part of the serialization interface because I don't use Node.js or Ruby.

If you prefer to keep them, they should probably be moved to LFS, rewriting all the history.
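The round-trip test proposed above could look roughly like the following. This is only a minimal sketch: it uses Python's standard `zlib` and `lzma` modules directly rather than the serialization interface mentioned in the issue, and the `CODECS` table and the `round_trip` helper are hypothetical names made up for illustration.

```python
import lzma
import random
import zlib

# Hypothetical codec table: name -> (compress, decompress).
# Only standard-library codecs are used here; the real project
# would plug in its own serialization interface instead.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "lzma_xz": (lzma.compress, lzma.decompress),
    "lzma_alone": (
        lambda data: lzma.compress(data, format=lzma.FORMAT_ALONE),
        lambda data: lzma.decompress(data, format=lzma.FORMAT_ALONE),
    ),
}


def round_trip(name, size=4096, seed=0):
    """Compress random bytes with the named codec, decompress, and compare."""
    compress, decompress = CODECS[name]
    rng = random.Random(seed)  # seeded so that a failure can be reproduced
    original = bytes(rng.getrandbits(8) for _ in range(size))
    return decompress(compress(original)) == original


results = {name: round_trip(name) for name in CODECS}
```

Seeding the generator keeps the repository free of binary blobs while still making a failing input reproducible. Note that purely random bytes are close to incompressible, so a real test suite would probably also want some repetitive inputs.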

Meet fileTestSuite

@GreyCat, @generalmimon, @armijnhemel, what do you think about https://github.com/fileTestSuite/fileTestSuite (an example of a compliant repo is https://github.com/implode-compression-impls/implode_test_files)? Please note that this software and its dependencies are currently of alpha quality: extensive refactoring is under way, it currently contains memory-safety bugs, and commits land in the repos with the history being rewritten.

@generalmimon, I have already considered using the reuse tool for that, but merging changes in dep5 files would be a bit painful (and having a license file for each file is not OK, as it would only create clutter). So the reuse spec doesn't currently match the needs of that project.

WebAssembly?

All sorts of languages can already compile to WebAssembly, there are several efforts to write standalone VMs (awesome-wasm#non-web-embeddings, life, and others), and it is already possible to run WebAssembly through Node (wast example, which can easily be modified to run wasm, too).

So how about implementing all the pre-processors in WebAssembly and just having a wrapper for each language? That way the pre-processors would only have to be implemented once (especially useful for things like LZHAM, which has only a C++ source). Thoughts?

Hello, what about inflate / decompress?

Various compression algorithms can inflate a stream given only the uncompressed size; it would be awesome to have this in Kaitai.

I am dealing with git packfiles: they are pretty classic bundles of zlib blobs.
The annoying thing is that git stores in these packfiles, as a blob header, only the inflated (decompressed) size,
so deserializing with Kaitai needs either a dependency on zlib or simultaneous parsing of the sibling external index file, which has the bundle offsets.
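To illustrate why a zlib dependency is needed here, the sketch below walks a buffer of back-to-back zlib streams when only each inflated size is known. It uses Python's standard `zlib` module; the `inflate_known_size` helper is a hypothetical name, the buffer is a simplified stand-in rather than the real packfile layout (object headers are omitted), and none of this is Kaitai Struct API.

```python
import zlib


def inflate_known_size(buf, offset, inflated_size):
    """Inflate one zlib stream starting at buf[offset].

    Returns (data, compressed_length). The compressed length is not
    stored anywhere, so it is recovered from what the decompressor
    left unconsumed.
    """
    d = zlib.decompressobj()
    data = d.decompress(buf[offset:])
    if not d.eof:
        raise ValueError("truncated zlib stream")
    if len(data) != inflated_size:
        raise ValueError("inflated size does not match the header")
    consumed = len(buf) - offset - len(d.unused_data)
    return data, consumed


# Simplified stand-in for a pack: zlib streams stored back to back.
blobs = [b"hello " * 10, b"world" * 7]
pack = b"".join(zlib.compress(b) for b in blobs)

offset = 0
out = []
for expected in blobs:
    data, used = inflate_known_size(pack, offset, len(expected))
    out.append(data)
    offset += used  # the next stream starts where this one ended
```

The only way to find where the next blob starts is to run the decompressor to the end of the current stream, which is why a purely declarative parse needs either the inflater or the offsets from the index file.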

P.S.: how can I help?
