Code Monkey home page Code Monkey logo

Comments (6)

freeekanayaka avatar freeekanayaka commented on July 2, 2024 1

Validating user input makes sense to me, but I worry about cases where we can't foresee exactly what validity properties are needed at the point of receiving input. For example, it wasn't immediately obvious to me when reviewing #459 that some multiplications at type size_t couldn't overflowm, where the quantities being multiplied are not related in any remotely transparent way to user input. Mathieu convinced me that those particular multiplications are fine, but if we could reach for __builtin_mul_overflow it would be clear from the code that if overflow did occur we wouldn't continue operating in an unsafe state.

In general, I agree with doing validation up-front where that's practical, but I'm strongly in favor of making liberal use of checked arithmetic for (1) cases where early validation just isn't practical, and (2) even where we can and do validate the inputs, to provide a second line of defense in the event of faulty reasoning on our part, refactorings that introduce unanticipated states, etc. With unchecked arithmetic, the worst case for those kinds of subtle bugs is that we introduce UB; with checked arithmetic, the worst case is that we get a predictable crash that at least points to a call site.

I see your point. To me it sounds like the check should be some kind of assertion (possibly with a macro if that's more convenient), which makes the program abort, in the same vein of all the assertions we have all around the code. Especially at the beginning of functions we place a number of assertions to make sure that the assumptions we make locally about current state or function arguments actually hold, precisely because the reasons you mention, it's hard sometimes to reason through all code and it helps against regressions in refactorings. It could be as easy as placing a couple of assertions before a multiplication, to validate that the terms of the multiplications meet the expectations and assumptions.

Our testing code (unit tests all the way up to jenkins) should be good enough to trigger those assertions in case we screw up something.

Note that in some sense these are not genuine failure modes for which we should really return errors, they are more "bugs" in the code, e.g. we didn't perform validation or we broke some assumptions or similar things. Those are really software defects that should be fixed rather than bumped to the user.

FWIW SQLite uses this kind of technique very extensively (via assertions indeed). There's also an interesting take about assertions here: https://www.sqlite.org/assert.html, which I find quite sensible.

from dqlite.

freeekanayaka avatar freeekanayaka commented on July 2, 2024

In the specific case that is mentioned, I think that the value of c->request.words should be checked at the time the request received by the server from the client over the wire gets decoded, basically right after this line:

rv = message__decode(&cursor, &c->request);

The other fields of c->request that are decoded by message__decode should be checked too in that spot, so that subsequent code and rely on them to be sane.

Validating the values that the client sends us over the wire is something we probably don't always do, so it'd be nice to check that too.

I'm not entirely sure what macros can do in a case like the mentioned one, what would the macro do?

from dqlite.

cole-miller avatar cole-miller commented on July 2, 2024

My thinking was that we could have a set of macros like

#define CHECKED_ADD(TY, A, B) \
    { \
        /* If the (infinite-precision) result of adding A to B fits in the type TY, return that value; otherwise, abort/return an error code. */ \
    }

(Or, more likely, with an int return value and out parameter for the result to allow more flexibility for what happens on overflow/wrapping.) It would be a function rather than a macro to handle a variety of input/output types, but we could also just write monomorphic functions for particular sets of types as needed. Then in a case like the one I linked we'd use CHECKED_MUL instead of a regular multiplication, erroring out if overflow/wrapping is detected.

Rather than reinventing the wheel, we could use the GCC/Clang builtins that exist for this purpose:

https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
https://clang.llvm.org/docs/LanguageExtensions.html#checked-arithmetic-builtins

(There was some kind of proposal to standardize these in C23, but it seems not to have made it into the final cut.)

from dqlite.

freeekanayaka avatar freeekanayaka commented on July 2, 2024

The drawback I see is that it's difficult to understand what went wrong, because the check does not happen when the user provides the input, but at some point later down the road, so it'd be difficult to produce a meaningful error message (e.g. "invalid message words count: %d", to stick with the mentioned example).

Given that input should be ideally validated, I'd suggest to do check all our assumptions as soon as user input is received. After that point we can safely assume that those assumptions are met (e.g. the words count is not greater than X). Note that there might be other types of issues related to invalid user input that don't fall under the overflow category, so it feels to me that anyway the generic solution would be to validate the input. That should cover the overflow problem and any other ones where we make some sort of assumption on the input.

from dqlite.

cole-miller avatar cole-miller commented on July 2, 2024

Validating user input makes sense to me, but I worry about cases where we can't foresee exactly what validity properties are needed at the point of receiving input. For example, it wasn't immediately obvious to me when reviewing #459 that some multiplications at type size_t couldn't overflowm, where the quantities being multiplied are not related in any remotely transparent way to user input. Mathieu convinced me that those particular multiplications are fine, but if we could reach for __builtin_mul_overflow it would be clear from the code that if overflow did occur we wouldn't continue operating in an unsafe state.

In general, I agree with doing validation up-front where that's practical, but I'm strongly in favor of making liberal use of checked arithmetic for (1) cases where early validation just isn't practical, and (2) even where we can and do validate the inputs, to provide a second line of defense in the event of faulty reasoning on our part, refactorings that introduce unanticipated states, etc. With unchecked arithmetic, the worst case for those kinds of subtle bugs is that we introduce UB; with checked arithmetic, the worst case is that we get a predictable crash that at least points to a call site.

from dqlite.

freeekanayaka avatar freeekanayaka commented on July 2, 2024

Note also that placing individual assertions about the individual arguments of a multiplication just before the multiplication takes place helps in identifying what exactly is wrong, as opposed to just checking the result of the multiplication, in which case you don't know what factor is to blame. This is probably irrelevant for the specific case that was outlined above, for the words count (since one of the two factors is a constant), but it might be true in general.

from dqlite.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.