kaitai-io / kaitai_compress
Kaitai Struct: data compression algorithms processing routines
License: MIT License
I dislike storing the blobs in this repo. I also dislike using any CLI tools to generate them. Yes, it tests compatibility with the CLI tools, but there are some issues:
Today I implemented the tests for #2 and saw that the zstd tests don't pass. It turned out the test files are incompatible with the current version (I have not tested against older ones). I regenerated them with the recent version and everything started passing.
lzma (LZMA version 1, "alone" format) files also don't work, and I don't know why.
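When debugging the "alone" files, a first step could be to inspect their 13-byte header, since that is what distinguishes them from .xz containers. The sketch below assumes the classic LZMA_Alone layout (1 properties byte, 4-byte little-endian dictionary size, 8-byte little-endian uncompressed size, where all 0xFF bytes mean "unknown, terminated by an end marker") and uses only Python's standard lzma and struct modules; parse_alone_header is a hypothetical helper, not part of kaitai_compress.

```python
import lzma
import struct

# 8 bytes of 0xFF in the size field mean "size unknown, stream ends with a marker"
UNKNOWN_SIZE = 0xFFFF_FFFF_FFFF_FFFF


def parse_alone_header(blob: bytes) -> dict:
    """Decode the 13-byte header of an LZMA_Alone (.lzma) stream (hypothetical helper)."""
    if len(blob) < 13:
        raise ValueError("too short for an LZMA_Alone header")
    # '<' = little-endian, no padding: B (1 byte) + I (4 bytes) + Q (8 bytes) = 13 bytes
    props, dict_size, unpacked_size = struct.unpack_from("<BIQ", blob, 0)
    if props >= 9 * 5 * 5:
        raise ValueError("invalid properties byte")
    # The properties byte packs three parameters: props = (pb * 5 + lp) * 9 + lc
    return {
        "lc": props % 9,
        "lp": (props // 9) % 5,
        "pb": props // 45,
        "dict_size": dict_size,
        "unpacked_size": None if unpacked_size == UNKNOWN_SIZE else unpacked_size,
    }


# Usage: produce an "alone" stream through the library API and inspect it.
data = b"kaitai " * 100
blob = lzma.compress(data, format=lzma.FORMAT_ALONE)
print(parse_alone_header(blob))
assert lzma.decompress(blob, format=lzma.FORMAT_ALONE) == data
```

Comparing this output for a failing fixture against a stream produced through the library API may show whether the problem is in the header (for example a known versus unknown size) or in the payload itself.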
Testing files are binary, which doesn't suit git well.
Testing files don't align well with the Kaitai Struct use case. KS is used to parse custom binary formats, and in these formats compressed streams are produced not by calling CLI utilities but by using the libraries' APIs.
So I propose to get rid of all the testing files (and remove them from the history) and instead generate some random data at runtime, compress it using the serialization interface I have introduced, then decompress it and verify that the decompressed stream matches the original one. I have implemented only the Python part of the serialization interface, because I don't use Node.js or Ruby.
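The proposed runtime round-trip test could look roughly like the sketch below. It is only an illustration: the CODECS table uses Python standard-library compressors as stand-ins, and is NOT the kaitai_compress serialization interface; the real encode/decode pairs would be plugged in instead. Payloads are generated from a fixed seed so failures are reproducible without storing any binary fixtures.

```python
import bz2
import lzma
import random
import zlib

# Stand-in (compress, decompress) pairs from the standard library.
# Assumption: replace these with the kaitai_compress serialization interface.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma_xz": (lzma.compress, lzma.decompress),
    "lzma_alone": (
        lambda d: lzma.compress(d, format=lzma.FORMAT_ALONE),
        lambda d: lzma.decompress(d, format=lzma.FORMAT_ALONE),
    ),
}


def sample_payloads(seed: int = 0) -> list[bytes]:
    """Generate test inputs at runtime instead of keeping blobs in the repo."""
    rng = random.Random(seed)  # seeded so a failing case can be replayed
    return [
        b"",  # empty input edge case
        rng.randbytes(4096),  # incompressible random noise
        bytes(rng.choice(b"ACGT") for _ in range(8192)),  # low-entropy, compressible
    ]


def round_trip_all(seed: int = 0) -> dict[str, bool]:
    """For each codec, check that decompress(compress(x)) == x for every payload."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        results[name] = all(
            decompress(compress(p)) == p for p in sample_payloads(seed)
        )
    return results


print(round_trip_all())
```

Mixing incompressible and highly compressible inputs exercises both the "stored/literal" and the "matched" paths of most codecs, which a single random buffer would not.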
If you prefer to keep them, they should probably be moved to LFS, rewriting the whole history.
@GreyCat, @generalmimon, @armijnhemel, what do you think about https://github.com/fileTestSuite/fileTestSuite (an example of a compliant repo is https://github.com/implode-compression-impls/implode_test_files)? Please note that this software and its dependencies are currently of alpha quality: extensive refactoring is underway, it currently contains memory-safety bugs, and all commits land in the repos with the history being rewritten.
@generalmimon, I have already considered using the reuse tool for that, but it would be a bit painful to merge changes in dep5 files (and having a license file for each file is not OK; it would only create junk). So the reuse spec doesn't currently match the needs of that project.
Currently it is problematic to reference the Python runtime as a package, because it has to be installed from git rather than from PyPI.
All sorts of languages can already compile to WebAssembly, there are several efforts to write standalone VMs (awesome-wasm#non-web-embeddings, life, and others), and it is already possible to run WebAssembly through Node (the wast example can easily be modified to run wasm, too).
So how about implementing all the pre-processors in WebAssembly and just having a wrapper for each language? That way the pre-processors would only have to be implemented once (especially useful for something like LZHAM, which only has a C++ source). Thoughts?
Various compression algorithms allow inflating a stream when only its uncompressed size is known; it would be awesome to have this in Kaitai.
I am dealing with git packfiles: they are pretty classic bundles of zlib blobs.
The annoying thing is that in these pack files git stores only the inflated (decompressed) size in each blob header, so
deserializing with Kaitai needs either a dependency on zlib or simultaneous parsing of the sibling external index file, which holds the bundle offsets.
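The problem can be illustrated with a simplified sketch: because each entry header records only the inflated size, the only way to find where the next entry starts (without the index file) is to run zlib until its stream ends and see how many input bytes it consumed. The sketch below uses Python's zlib.decompressobj and its unused_data attribute for that. It assumes a synthetic, simplified pack layout: the 12-byte "PACK" header, the delta object types with their base-reference fields, and the trailing checksum of real packfiles are all omitted, and the helper names are hypothetical.

```python
import zlib

OBJ_BLOB = 3  # git pack object type code for a blob


def encode_entry_header(obj_type: int, size: int) -> bytes:
    """Pack entry header: 3-bit type plus inflated size as a little-endian varint."""
    byte = (obj_type << 4) | (size & 0x0F)  # low 4 size bits share the first byte
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # MSB set: another size byte follows
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def decode_entry_header(buf: bytes, pos: int) -> tuple[int, int, int]:
    """Return (type, inflated size, offset where the zlib stream begins)."""
    byte = buf[pos]
    obj_type = (byte >> 4) & 0x07
    size = byte & 0x0F
    shift = 4
    pos += 1
    while byte & 0x80:
        byte = buf[pos]
        size |= (byte & 0x7F) << shift
        shift += 7
        pos += 1
    return obj_type, size, pos


def inflate_entry(buf: bytes, pos: int) -> tuple[bytes, int]:
    """Inflate one zlib stream at buf[pos:]; return (data, offset just past it)."""
    d = zlib.decompressobj()
    data = d.decompress(buf[pos:])
    if not d.eof:
        raise ValueError("truncated zlib stream")
    # Whatever zlib did not consume belongs to the following entries.
    consumed = len(buf) - pos - len(d.unused_data)
    return data, pos + consumed


def walk_entries(buf: bytes) -> list[tuple[int, bytes]]:
    pos, out = 0, []
    while pos < len(buf):
        obj_type, size, pos = decode_entry_header(buf, pos)
        data, pos = inflate_entry(buf, pos)
        if len(data) != size:  # the header size can only validate, not locate
            raise ValueError("size mismatch")
        out.append((obj_type, data))
    return out


blobs = [b"hello\n", b"x" * 1000, b"git packfile entry"]
buf = b"".join(encode_entry_header(OBJ_BLOB, len(b)) + zlib.compress(b) for b in blobs)
print(walk_entries(buf))
```

Note that the stored size is only used as a consistency check here; the entry boundaries come entirely from zlib detecting the end of each stream, which is exactly the dependency described above.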
P.S.: How can I help?