Comments (4)
One possible improvement: when decompressing, a void * buffer is created and later copied into a bytes object with PyBytes_FromStringAndSize.
You can instead do:
PyObject *return_value = PyBytes_FromStringAndSize(NULL, decompressed_size);
if (return_value == NULL) return NULL;
// Decompress straight into the bytes object's own buffer
void *decompressed_data = (void *)PyBytes_AS_STRING(return_value);
// Decompression set up here
size_t actual_size;
libdeflate_gzip_decompress(
    decompressor, data.buf, data.len, decompressed_data, decompressed_size, &actual_size);
// error-handling code here
return return_value;
This way you allocate the output buffer only once, as part of the bytes object. No copying required.
from deflate.
Thanks for the suggestion! It wasn't quite that simple, since decompressed_size isn't known before decompression, but there is a _PyBytes_Resize function. Resizing is almost certainly better than a copy.
I wrote this to use in https://github.com/imsweb/pzip, which compresses (and encrypts) in chunks, so I had no need for a streaming interface -- libdeflate is very well suited for this case.
from deflate.
> It wasn't quite that simple, since decompressed_size isn't known before decompression
Well, it should be equal to the ISIZE field in the gzip trailer; otherwise the gzip data is corrupt. So you can already allocate the buffer with the correct size. And the nice thing is that _PyBytes_Resize returns early when the size is already correct, so no resizing happens in the correct case.
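To illustrate the ISIZE idea, here is a minimal sketch in plain C. The helper name gzip_read_isize is hypothetical and not part of libdeflate or this module. It assumes the whole gzip member is already in memory. Per RFC 1952, ISIZE is the uncompressed size modulo 2^32 and describes only the last member, so the value is best treated as a sizing hint that still needs to be checked against what the decompressor actually writes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a libdeflate API): read the ISIZE field, i.e. the
 * last four bytes of a gzip member, stored little-endian.
 * Returns 0 on success, -1 if the buffer cannot be a gzip member. */
static int gzip_read_isize(const unsigned char *buf, size_t len, uint32_t *isize)
{
    /* A 10-byte header plus the 8-byte trailer (CRC32 + ISIZE) is a lower bound. */
    if (buf == NULL || isize == NULL || len < 18)
        return -1;
    /* Check the gzip magic bytes ID1 and ID2. */
    if (buf[0] != 0x1f || buf[1] != 0x8b)
        return -1;

    const unsigned char *p = buf + len - 4;
    *isize = (uint32_t)p[0]
           | ((uint32_t)p[1] << 8)
           | ((uint32_t)p[2] << 16)
           | ((uint32_t)p[3] << 24);
    return 0;
}
```

The returned value could then be passed as the size to PyBytes_FromStringAndSize, with _PyBytes_Resize as the fallback when the actual output size turns out to differ.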
> I wrote this to use in https://github.com/imsweb/pzip, which compresses (and encrypts) in chunks, so I had no need for a streaming interface -- libdeflate is very well suited for this case.
Ah, very useful. Chunked compression is also used by a bioinformatics format called BAM. It uses the blocked gzip format (BGZF), which is basically a series of independently compressed gzip blocks. The length of the compressed block is saved in the first EXTRA field, while the length of the decompressed result is saved in ISIZE. This is very useful, as you know the exact sizes of the buffers up front.
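As a sketch of how both sizes could be read from one BGZF block: the helper name bgzf_block_sizes is hypothetical, and the layout follows my reading of the SAM/BAM specification, where the "BC" extra subfield holds BSIZE (total block size minus 1) and ISIZE occupies the last four bytes of the block. It assumes the complete block is in memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: extract the total block size and the uncompressed
 * size (ISIZE) from a single BGZF block held in buf.
 * Returns 0 on success, -1 if the data does not look like a BGZF block. */
static int bgzf_block_sizes(const unsigned char *buf, size_t len,
                            size_t *block_size, uint32_t *isize)
{
    if (buf == NULL || block_size == NULL || isize == NULL || len < 12)
        return -1;
    /* gzip magic, CM = 8 (deflate), and the FEXTRA flag bit must be set. */
    if (buf[0] != 0x1f || buf[1] != 0x8b || buf[2] != 8 || !(buf[3] & 0x04))
        return -1;

    size_t xlen = (size_t)buf[10] | ((size_t)buf[11] << 8);
    size_t pos = 12, end = 12 + xlen;
    if (end > len)
        return -1;

    /* Walk the extra subfields looking for SI1='B', SI2='C', SLEN=2. */
    while (pos + 4 <= end) {
        size_t slen = (size_t)buf[pos + 2] | ((size_t)buf[pos + 3] << 8);
        if (pos + 4 + slen > end)
            return -1;
        if (buf[pos] == 'B' && buf[pos + 1] == 'C' && slen == 2) {
            /* BSIZE stores the total block size minus one. */
            size_t total = ((size_t)buf[pos + 4] | ((size_t)buf[pos + 5] << 8)) + 1;
            if (total < end + 8 || total > len)
                return -1;
            const unsigned char *p = buf + total - 4;
            *block_size = total;
            *isize = (uint32_t)p[0] | ((uint32_t)p[1] << 8)
                   | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
            return 0;
        }
        pos += 4 + slen;
    }
    return -1;
}
```

With both values known, the input slice and the output buffer for a one-shot decompression call can be sized exactly, one block at a time.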
from deflate.
I just got a notification (I co-maintain the conda feedstock for libdeflate): https://github.com/ebiggers/libdeflate/releases/tag/v1.9. FYI.
from deflate.