
Comments (8)

FrancescAlted commented on August 25, 2024

Yes, for Blosc1 compressed chunks, I think this makes total sense. The proposal would be to add a new function for safe decompression, something like:

int blosc2_decompress_safe_ctx(blosc2_context* context, const void* src, size_t srcsize, void* dest, size_t destsize);

Also, I was thinking of optionally adding a (64-bit) hash at the end of frames (a new container that has recently been introduced in Blosc2). This could bring much higher safety for them. If a hash exists, functions like blosc2_frame_decompress_chunk() could double-check that the contents of the frame are OK and safely proceed if that is the case. What do you think?

from c-blosc2.

asomers commented on August 25, 2024

I think that if blosc2 is already making backwards-incompatible changes to the API, why bother introducing a new decompression function? It would be simpler just to make the normal decompression function safe by default. An alternate version would, however, be useful for blosc1, which doesn't want to break backwards compatibility.

I think that a built-in checksum might be useful for some applications, but it doesn't help with the safety situation. A malicious chunk, after all, could still have a valid checksum. Checksums, even cryptographic ones, only protect against accidental damage.

Regarding bundled checksums, there is one benefit that really intrigues me: cache efficiency. Using a separate checksum layer atop the compression layer requires an application to traverse a block of data twice. Later parts of the first pass could evict useful data from the CPU cache. But a combined hash+compress algorithm could do the whole thing in a single pass. I don't know how much of a performance impact that would have, but it sounds like it might be significant.


FrancescAlted commented on August 25, 2024

I have given this some serious thought, and I think that in scenarios where safety is important, the user can still call blosc_cbuffer_sizes() first to check whether the cbytes (i.e. the number of bytes to be read from the src buffer) is safe for them. I have added a note about this in commit 2c9d598.

One might argue that the destsize argument could be removed for the same reason, but IMO limiting the destination size is generally more useful, so I think it is fine to keep it.


FrancescAlted commented on August 25, 2024

For reference, I have just filed a new ticket for implementing checksums.


asomers commented on August 25, 2024

However, blosc_cbuffer_sizes() works by interrogating the provided buffer, so it still doesn't protect against a corrupted buffer. Blosc's compression and decompression routines should be safe even when handed a corrupted or garbage buffer. Even a maliciously crafted buffer shouldn't be able to cause an array overrun.


FrancescAlted commented on August 25, 2024

Uh, I don't completely understand the difference between passing the size of the source buffer and making a decision based on the blosc_cbuffer_sizes() feedback. Normally you have an educated guess for the size of the source buffer, and this can be compared with the cbytes returned by blosc_cbuffer_sizes(); if there is too much of a difference, you can distrust the buffer, right?

On the other hand, one cannot enforce a limit on the size of the source buffer as you suggest, because Blosc compressed buffers are not decompressed strictly sequentially: decompression requires accessing the index of the different blocks inside the buffer, which is stored at the end of the buffer, so we are forced to read at minimum cbytes bytes.

For the record, the reason the compressed buffers need the index at the end is that parallelization creates blocks of different sizes, and the decoder needs this info in order to pass the starting point of every block to each decompressing thread.


asomers commented on August 25, 2024

What I have in mind is an application that receives a blosc buffer from an untrusted source. The pseudocode would look like this:

let buffer = read_from_(net|disk|other);
let destlen = length_that_I_expect_based_on_metadata_stored_somewhere_else();
let dest = make_buffer(destlen);
let err = blosc2_decompress_safe_ctx(ctx, buffer.base_address, buffer.size, dest.base_address, destlen);

Even if the received buffer is malicious, then blosc2_decompress_safe_ctx won't overrun the source buffer, because it knows the source length. You seem to suggest that I should do the following instead:

let buffer = read_from_(net|disk|other);
let destlen = length_that_I_expect_based_on_metadata_stored_somewhere_else();
let dest = make_buffer(destlen);
let (nbytes, cbytes, blocksize) = blosc_cbuffer_sizes(buffer.base_address);
if (cbytes != buffer.size)
    return EINVAL;
if (nbytes != destlen)
    return EINVAL;
let err = blosc2_decompress_ctx(ctx, buffer.base_address, dest.base_address, destlen);

Not only does that strategy require additional error-prone code in every blosc consumer, but the output of blosc_cbuffer_sizes cannot be trusted if the buffer came from an untrusted source.

Why can't blosc2_decompress_safe_ctx enforce the length of the source buffer? It shouldn't be a problem that the buffer is read non-sequentially. All that is required is that it never read beyond buffer.base_address + buffer.size.


FrancescAlted commented on August 25, 2024

Yes, I am suggesting exactly that. I agree that you cannot trust the output of blosc_cbuffer_sizes(), but you can trust your buffer.size, and if they do not match, then you should not proceed with decompressing the buffer. I concede that this is a bit more work for users (not that much, just one more call), but this should be a fair price to pay for people who are worried about source trustworthiness.

