I'm not sure how "invalid" input data shall be processed. Could this be clarified ?</p

[Question] Invalid input data handling ? about base64 HOT 10 OPEN

aklomp commented on July 21, 2024 2

[Question] Invalid input data handling ?

from base64.

Comments (10)

aklomp commented on July 21, 2024

The compliance baseline for this library should be the relevant RFC 4648. I never liked Postel's law much ("be strict in what you send, and liberal in what you accept"). Being liberal just invites abuse. All your examples should throw an error when they violate the preconditions.

from base64.

mayeut commented on July 21, 2024

I'll check the RFC 4648and try to make a PR to update the test suite and library to handle invalid input properly.

from base64.

aklomp commented on July 21, 2024

Thanks! Let's have a library that can at least strictly validate. If at any later moment we want to relax the validation, we could create a flag for it.

(Personally, I think that any string that violates the base64 spec as given in the RFC, is not a valid base64 string, and should not be given the benefit of the doubt.)

from base64.

aklomp commented on July 21, 2024

Maybe one comment: it's quite common for base64 input to contain whitespace (newlines and such), and it would be helpful to have a flag to tell the library to ignore it. This does not negate the fact that a bona-fide base64 string with such characters is invalid, but it would indicate that it would be useful to have a flag that tells the library to strip out those characters before processing. (See also #15).

from base64.

mayeut commented on July 21, 2024

So, not much else than what you're saying after reviewing RFC 4648.

First step, report all violations.
Second step, allow whitespace. See #33

from base64.

mayeut commented on July 21, 2024

For now, I added validation for base64_decode function only.
The streaming implementation would require a final step (as can be seen in validation for base64_decode). This requires adding a new function in the API or asking integrators to do the same check as what's done for base64_decode.
I'm more in favor of adding a new API function. Could be something like:

int
base64_stream_decode_final (struct base64_state *state)
{
    state->eof = BASE64_EOF;
    if (state->bytes == 0) {
        return 1;
    }
    return 0;
}

from base64.

aklomp commented on July 21, 2024

Hm, I'd like to avoid changing the API if possible. Your suggested addition basically checks if there are no trailing characters in the decode buffer. I'm not sure I consider that something that we must intercept and note. My attitude would be "as long as we honour the EOF marker and output no further bytes, who cares about trailing bytes". As far as decoding is concerned, those bytes are neither here or there, and it seems pedantic to point out to the user that his buffer length calculation is "optimistic".

What do you think? Maybe the right way is to be pedantic and technically correct.

from base64.

mayeut commented on July 21, 2024

Well, if I quote what you said in an earlier comment:

The compliance baseline for this library should be the relevant RFC 4648. I never liked Postel's law much ("be strict in what you send, and liberal in what you accept"). Being liberal just invites abuse. All your examples should throw an error when they violate the preconditions.

I'd say that the library should throw an error even when there are trailing bytes remaining. For sure, it shall be the case for the non streaming API. Since there's no API to get the number of unread bytes, I'd say the library shall return an error in this case (it might be possible to compute this with this version but adding whitespace character stripping will remove that possibility).

I did not understand this:

As far as decoding is concerned, those bytes are neither here or there, and it seems pedantic to point out to the user that his buffer length calculation is "optimistic".

The only thing that will be pointed out to the user is that he's feeding too much (or not enough) data in the decoder.

The API change might be considered an enhancement only. Backward compatibility is still here. If the user wants a stricter behavior then, he can integrate the new function in his code.

from base64.

aklomp commented on July 21, 2024

The viewpoint changes depending on how you conceptualize the buffer. If you define it as being strictly a Base64-encoded string and nothing more, you get a different viewpoint than when you define it as a buffer that contains a properly-terminated Base64-encoded string starting at offset 0.

In the first case trailing bytes are an error, while in the second case trailing bytes after a valid string are simply irrelevant (which is what I meant with "neither here nor there": they don't matter).

I've been holding more or less the second view of the buffer, whereas you are closer to the first. I don't think my view contradicts what I said earlier about violating the preconditions, because the preconditions are defined for the Base64-encoded string itself and not the buffer it's embedded in. Of your four examples, I believe the first three should throw errors, whereas the fourth is probably OK (but no bytes should be output after seeing the EOF marker).

Concretely, I've assumed that it's not an error to pass an oversized buffer (of "optimistic" length :), as long as the Base64 string starting at offset 0 is valid.

Maybe a stricter check would be a good idea, though. I do still think that it's better to be strict than liberal in these cases.

from base64.

mayeut commented on July 21, 2024

OK, I see what you meant now.
What bothers me with that viewpoint is that, if I'm not mistaken or missing something else, it's only valid for "terminated" Base64-encoded string. That is, for Base64-encoded strings that are multiple of 4 characters, what remains will be decoded (or return an error at some point).
Given what I just said, I don't know what are the real use cases for using an "optimistic" length. It's seems to be error prone more than anything else. Still, I might be missing something.

from base64.

[Question] Invalid input data handling ? about base64 HOT 10 OPEN

Comments (10)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent