

freeekanayaka commented on July 18, 2024

Currently, dqlite and raft make an effort to handle allocation failures. I would like to explore the idea of unconditionally calling pthread_exit when this happens, or at least something closer to that extreme in the space of possible error-handling strategies.

Maybe we could offer that as an option, e.g. raft_set_abort_upon_oom()? However, I wouldn't remove the user's ability to handle it gracefully (see below).
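A minimal sketch of what an opt-in abort-upon-OOM switch might look like, assuming the library routes every internal allocation through a single wrapper. All names here (`set_abort_upon_oom`, `lib_malloc`, `underlying_malloc`, `always_fail`) are hypothetical stand-ins for the proposed raft_set_abort_upon_oom(), which is not an existing API. The sketch uses `abort()` as the terminating call for simplicity; the discussion itself mentions pthread_exit.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical process-wide flag: off by default, so OOM is reported as an error. */
static bool abort_upon_oom = false;

/* Hypothetical counterpart of the proposed raft_set_abort_upon_oom(). */
void set_abort_upon_oom(bool enabled)
{
    abort_upon_oom = enabled;
}

/* Underlying allocator, kept as a pointer so tests can swap in a failing one. */
static void *(*underlying_malloc)(size_t) = malloc;

/* Test double (hypothetical): an allocator that always fails. */
static void *always_fail(size_t size)
{
    (void)size;
    return NULL;
}

/* Every internal allocation would go through this wrapper. */
void *lib_malloc(size_t size)
{
    void *p = underlying_malloc(size);
    if (p == NULL && abort_upon_oom) {
        /* Opt-in "brutal" strategy: never let the failure propagate. */
        fprintf(stderr, "out of memory, aborting\n");
        abort();
    }
    /* Default strategy: NULL tells the caller to report e.g. RAFT_NOMEM. */
    return p;
}
```

With the flag off (the default), a failed allocation just returns NULL and the existing error paths stay in charge; with it on, the process stops at the first allocation failure, so no caller ever sees one.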

I am not militant about this, but I think it would have some real benefits and in any case it would be good to get clear on what value RAFT_NOMEM et al. are providing (if only for my benefit 🙂).

If we report an OOM failure (like any other failure), then the user has the choice of deciding what to do. If they want to abort using pthread_exit, that's easy to do. However, they might also want to handle the error gracefully, for example on an embedded device with limited memory that should ideally not go down, or because a malicious client is trying to attack the server in some way.

Like all other errors, leaving the choice to the user offers the most flexibility, especially for a low-level piece like libraft.
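To illustrate how reporting OOM leaves the decision with the user, here is a sketch of an application picking its own policy on top of a library that returns an error code. `lib_store`, `LIB_NOMEM` (modelled loosely on RAFT_NOMEM), `handle_request` and the policy enum are hypothetical names. The exit path uses the POSIX `pthread_exit()` call mentioned above (older toolchains need `-pthread` at link time).

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error codes, loosely modelled on RAFT_NOMEM. */
enum { LIB_OK = 0, LIB_NOMEM = 1 };

/* Hypothetical library call: copies a string, reporting OOM instead of dying. */
int lib_store(const char *data, char **out)
{
    size_t n = strlen(data) + 1;
    char *copy = malloc(n);
    if (copy == NULL) {
        return LIB_NOMEM;
    }
    memcpy(copy, data, n);
    *out = copy;
    return LIB_OK;
}

/* The policy belongs to the application, not to the library. */
enum oom_policy { OOM_EXIT_THREAD, OOM_REJECT_REQUEST };

int handle_request(const char *data, char **out, enum oom_policy policy)
{
    int rv = lib_store(data, out);
    if (rv == LIB_NOMEM) {
        if (policy == OOM_EXIT_THREAD) {
            /* Brutal strategy: terminate the calling thread. */
            pthread_exit(NULL);
        }
        /* Graceful strategy: stay up and refuse this request; *out is untouched. */
        fprintf(stderr, "request rejected: out of memory\n");
        return -1;
    }
    return 0;
}
```

Both strategies are built on the same library behaviour; only the caller's policy differs, which is the flexibility argued for above.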

Possible advantages of trying less hard to handle allocation failure

The allocation failure handling code is a pain to test (though not impossible)

The current testing approach for OOM is basically to have parameterized tests that inject OOM failures progressively. You run the same test multiple times, but each time the memory allocator injects a failure at a different spot (e.g. in the first run the first call to malloc fails right away, in the second run the first call to malloc succeeds but the second fails, etc.). That should already cover a fair amount of ground, but we can surely improve it.
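The progressive fault-injection approach described above can be sketched roughly as follows, assuming a test allocator driven by a call counter. This is not raft's actual test harness: `test_malloc`, `build_pair` and `run_oom_sweep` are hypothetical. Each run fails a different allocation, and the sweep stops once a run completes without ever reaching the injection point.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Fault-injection state: fail the allocation whose index equals fail_at (-1 = never). */
static int alloc_count = 0;
static int fail_at = -1;

static void *test_malloc(size_t size)
{
    int index = alloc_count++;
    if (index == fail_at) {
        return NULL; /* injected OOM */
    }
    return malloc(size);
}

/* Hypothetical code under test: two allocations, must clean up on any failure. */
static int build_pair(char **a, char **b)
{
    *a = test_malloc(8);
    if (*a == NULL) {
        return -1;
    }
    *b = test_malloc(8);
    if (*b == NULL) {
        free(*a);
        *a = NULL;
        return -1;
    }
    strcpy(*a, "first");
    strcpy(*b, "second");
    return 0;
}

/* Fail allocation 0, then 1, then 2, ... until a run no longer reaches the
 * injection point. Returns the number of failing runs exercised. */
static int run_oom_sweep(void)
{
    int failures = 0;
    for (int i = 0;; i++) {
        char *a = NULL;
        char *b = NULL;
        alloc_count = 0;
        fail_at = i;
        int rv = build_pair(&a, &b);
        if (alloc_count <= i) {
            /* Injection point never reached: every allocation site is covered. */
            assert(rv == 0);
            free(a);
            free(b);
            break;
        }
        /* The failure was reached: it must be reported and nothing may leak. */
        assert(rv != 0);
        assert(a == NULL && b == NULL);
        failures++;
    }
    fail_at = -1;
    return failures;
}
```

For `build_pair`, which allocates twice, the sweep exercises two failing runs (first and second allocation) before a clean run ends it.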

and I wouldn't be at all surprised if there are bugs there.

Yeah, we should definitely improve this. However, in most (if not all) cases I believe such a bug wouldn't be related to the type of failure (OOM), but to the fact that a failure occurs at all and is not properly handled. In other words, if we find such a bug, it's worth fixing regardless of the type of failure: if we later modify that code so that it can legitimately produce other (non-OOM) failures, the bug is almost certainly going to still be there.

The simpler our overall error-handling strategy is, the easier it is for us to implement it consistently and without introducing subtle bugs, and the better we can describe it to users of dqlite.

Perhaps, until we feel more confident about the robustness of our error handling for this particular failure, we could turn on the abort-upon-oom option (raft_set_abort_upon_oom) in the projects where we use raft directly, e.g. dqlite and LXD.

Possible disadvantages

I'm pretty sure no current user of dqlite depends on it gracefully handling allocation failure.

Agreed. But that might change in the future, plus dqlite is not the only consumer of libraft.

Anybody who's using go-dqlite has already signed up for unrecoverable errors when memory is exhausted. But if dqlite gets more users in the future, via the C client, that have different requirements, then "OOM -> pthread_exit" might become a problem. And it would be pretty annoying to have to backtrack from the more brutal error-handling strategy to the more graceful one.

Right. I'd leave the door open and make abort-upon-oom an opt-in option, so we can (eventually) meet the needs of both audiences.

from dqlite.
