<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

In the specific case that is mentioned, I think that the value of <code class="notrans

My thinking was that we could have a set of macros like <div class="snippet-clipbo

Systematically avoiding overflow for integer operations about dqlite HOT 6 CLOSED

cole-miller commented on July 2, 2024

Systematically avoiding overflow for integer operations

from dqlite.

Comments (6)

freeekanayaka commented on July 2, 2024 1

Validating user input makes sense to me, but I worry about cases where we can't foresee exactly what validity properties are needed at the point of receiving input. For example, it wasn't immediately obvious to me when reviewing #459 that some multiplications at type size_t couldn't overflowm, where the quantities being multiplied are not related in any remotely transparent way to user input. Mathieu convinced me that those particular multiplications are fine, but if we could reach for __builtin_mul_overflow it would be clear from the code that if overflow did occur we wouldn't continue operating in an unsafe state.

In general, I agree with doing validation up-front where that's practical, but I'm strongly in favor of making liberal use of checked arithmetic for (1) cases where early validation just isn't practical, and (2) even where we can and do validate the inputs, to provide a second line of defense in the event of faulty reasoning on our part, refactorings that introduce unanticipated states, etc. With unchecked arithmetic, the worst case for those kinds of subtle bugs is that we introduce UB; with checked arithmetic, the worst case is that we get a predictable crash that at least points to a call site.

I see your point. To me it sounds like the check should be some kind of assertion (possibly with a macro if that's more convenient), which makes the program abort, in the same vein of all the assertions we have all around the code. Especially at the beginning of functions we place a number of assertions to make sure that the assumptions we make locally about current state or function arguments actually hold, precisely because the reasons you mention, it's hard sometimes to reason through all code and it helps against regressions in refactorings. It could be as easy as placing a couple of assertions before a multiplication, to validate that the terms of the multiplications meet the expectations and assumptions.

Our testing code (unit tests all the way up to jenkins) should be good enough to trigger those assertions in case we screw up something.

Note that in some sense these are not genuine failure modes for which we should really return errors, they are more "bugs" in the code, e.g. we didn't perform validation or we broke some assumptions or similar things. Those are really software defects that should be fixed rather than bumped to the user.

FWIW SQLite uses this kind of technique very extensively (via assertions indeed). There's also an interesting take about assertions here: https://www.sqlite.org/assert.html, which I find quite sensible.

from dqlite.

freeekanayaka commented on July 2, 2024

In the specific case that is mentioned, I think that the value of c->request.words should be checked at the time the request received by the server from the client over the wire gets decoded, basically right after this line:

dqlite/src/conn.c

Line 194 in e618e62

rv = message__decode(&cursor, &c->request);

The other fields of c->request that are decoded by message__decode should be checked too in that spot, so that subsequent code and rely on them to be sane.

Validating the values that the client sends us over the wire is something we probably don't always do, so it'd be nice to check that too.

I'm not entirely sure what macros can do in a case like the mentioned one, what would the macro do?

from dqlite.

cole-miller commented on July 2, 2024

My thinking was that we could have a set of macros like

#define CHECKED_ADD(TY, A, B) \
    { \
        /* If the (infinite-precision) result of adding A to B fits in the type TY, return that value; otherwise, abort/return an error code. */ \
    }

(Or, more likely, with an int return value and out parameter for the result to allow more flexibility for what happens on overflow/wrapping.) It would be a function rather than a macro to handle a variety of input/output types, but we could also just write monomorphic functions for particular sets of types as needed. Then in a case like the one I linked we'd use CHECKED_MUL instead of a regular multiplication, erroring out if overflow/wrapping is detected.

Rather than reinventing the wheel, we could use the GCC/Clang builtins that exist for this purpose:

https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
https://clang.llvm.org/docs/LanguageExtensions.html#checked-arithmetic-builtins

(There was some kind of proposal to standardize these in C23, but it seems not to have made it into the final cut.)

from dqlite.

freeekanayaka commented on July 2, 2024

The drawback I see is that it's difficult to understand what went wrong, because the check does not happen when the user provides the input, but at some point later down the road, so it'd be difficult to produce a meaningful error message (e.g. "invalid message words count: %d", to stick with the mentioned example).

Given that input should be ideally validated, I'd suggest to do check all our assumptions as soon as user input is received. After that point we can safely assume that those assumptions are met (e.g. the words count is not greater than X). Note that there might be other types of issues related to invalid user input that don't fall under the overflow category, so it feels to me that anyway the generic solution would be to validate the input. That should cover the overflow problem and any other ones where we make some sort of assumption on the input.

from dqlite.

cole-miller commented on July 2, 2024

Validating user input makes sense to me, but I worry about cases where we can't foresee exactly what validity properties are needed at the point of receiving input. For example, it wasn't immediately obvious to me when reviewing #459 that some multiplications at type size_t couldn't overflowm, where the quantities being multiplied are not related in any remotely transparent way to user input. Mathieu convinced me that those particular multiplications are fine, but if we could reach for __builtin_mul_overflow it would be clear from the code that if overflow did occur we wouldn't continue operating in an unsafe state.

In general, I agree with doing validation up-front where that's practical, but I'm strongly in favor of making liberal use of checked arithmetic for (1) cases where early validation just isn't practical, and (2) even where we can and do validate the inputs, to provide a second line of defense in the event of faulty reasoning on our part, refactorings that introduce unanticipated states, etc. With unchecked arithmetic, the worst case for those kinds of subtle bugs is that we introduce UB; with checked arithmetic, the worst case is that we get a predictable crash that at least points to a call site.

from dqlite.

freeekanayaka commented on July 2, 2024

Note also that placing individual assertions about the individual arguments of a multiplication just before the multiplication takes place helps in identifying what exactly is wrong, as opposed to just checking the result of the multiplication, in which case you don't know what factor is to blame. This is probably irrelevant for the specific case that was outlined above, for the words count (since one of the two factors is a constant), but it might be true in general.

from dqlite.

Systematically avoiding overflow for integer operations about dqlite HOT 6 CLOSED

Comments (6)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent