Code Monkey home page Code Monkey logo

cap-faq's People

Contributors

apsops avatar henryr avatar ptarjan avatar sergeivalko avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

cap-faq's Issues

Discussion of CAP consistency and Application-level consistency

As I mentioned in #3, I think there are two major causes of confusion when it comes to "beating CAP." Number two is the difference between CAP "consistency" and application-level "consistency."

While linearizable registers can be used to guarantee almost any application-level integrity constraint, it's not always required. For example, if we're trying to generate unique IDs for a client, if we opt for a sequential ID generation scheme, then we're going to have to be unavailable. However, if we generate IDs using a clientID and a logical clock, then we can guarantee uniqueness without coordination. This example assumes client IDs are unique, but there are other useful "consistency" properties we can achieve with HA, like ensuring that fields in a database are non-NULL.

What's going on here is that, at a high level, for restricted data types and operations, it is possible to satisfy integrity constraints without coordination and giving up HA. This difference between read/write linearizability and application-level consistency appears to be a source of confusion/contention (e.g., explicitly windowed queries over immutable data). This isn't limited to linearizability: the same debate crops up in conflict serializable databases (e.g., weak isolation models).

Unfortunately, I don't think it's well-understood at this point which application-level integrity constraints are achievable with HA; we've been thinking about this a bit, but I don't have anything concrete enough to add to the FAQ at this point. However, I think it's possible to tie this issue up neatly by stressing that CAP pertains only to read-write registers. That is, application-level integrity constraints are not addressed by the Gilbert and Lynch result. To be complete, the FAQ might mention that some constraints are achievable, but not all are. I think that's the limit of what we know today. This might go in "3. What is 'read-write storage'?" but could also go in the "common fallacies" section.

This is similar to the "What's the relationship between CAP and ACID?" question, but only for CAP "C" and ACID "C," not CAP "C" and ACID {"A", "I", and "D"}. I think it's worth a separate discussion.

Pedantic comment

Maybe being a bit pedantic but w.r.t

"the CAP theorem tells us that we can't build a database that both responds to every request and returns the results that you would expect every time. It's an impossibility result"

CA systems are possible but then the P goes for a toss !!

Incorrect Url for paper trail

In below statement (Line number 5) you have incorrect href attribute(2 : ) which is causing page to go blank when clicked.

<pre><a href="http:://the-paper-trail.org/">http://the-paper-trail.org/</a></pre>&gt;

it should be

<pre><a href="http://the-paper-trail.org/">http://the-paper-trail.org/</a></pre>

Or best option only provide the url and let markdown do rest for you

href="http://the-paper-trail.org/

Discussion of weaker failure modes

I think there are two major causes of confusion when it comes to "beating CAP." Number one is the fault model that Gilbert and Lynch assume. (Number two is the confusion over "application-level" versus "CAP" consistency, as in #4.)

I've recently seen several discussions of CAP (even in academic publications) that discuss the availability requirement. Neither of these is actually Gilbert and Lynch HA, but, for their relaxed failure domain, guarantee a response. Here are two examples:

  • "up to F server faults": If you can contact a majority of servers, you can get a response. Not "HA" as minority servers may be partitioned. The HyperDex paper, Section 8 states this assumption rather clearly, noting it is "thus able to provide seemingly impossible guarantees."
  • "for specific fault model[s]": If we provision networks appropriately, and partitions never happen, there many be no partitions! The Windows Azure Storage paper, Section 8 discusses this. It's stronger than the asynchronous model and is not HA (I'll not speculate as to how realistic this is, but the paper is fairly adamant that the system circumvents CAP.)

I'm not quite sure how to best address these in the text, but it might be useful.

Two concrete suggestions:

Under "15. Is a failed machine the same as a partitioned one?" the FAQ could mention that, in an HA system, a minority partitioned server still needs to guarantee a response.
Under "12. Is my network really asynchronous?" the FAQ could mention that, in the limit, failures can render any communication network asynchronous.

Alternatively, (at the risk of starting a "list of shame"), the FAQ might expand "17. Have I 'got around' or 'beaten' the CAP theorem?" into a "list of common fallacies" like those above.

I'm curious what you think and am happy to drop a pull request if there's interest.

Confusion about "writing to one side of a partition"

In section 8. why is CAP true it says

The basic idea is that if a client writes to one side of a partition, any reads that go to the other side of that partition can't possibly know about the most recent write.

but this is confusing to me. Because in section 7. What is a partition a partition is defined as

A partition is when the network fails to deliver some messages to one or more nodes by losing them (not by delaying them - eventual delivery is not a partition).

Basically it's not clear to me what you mean by "a client writes to one side of a partition". A partition as defined as a network failing to deliver some messages to node(s). I don't understand how that is something that is written to or has sides.

Definition of Availability

When Gilbert and Lynch proved the CAP theorem they did it by relying on a very tight definition of availability, that included the condition that all nodes in the distributed system should be able to respond to a request in a finite amount of time. Importantly this excludes a typical high availability solution like failover to a replica, where the minority partition becomes unavailable but the distributed system as a whole can still respond to requests.

I think it's important to include this information prominently in your (otherwise very good) CAP write up, as it starts to show why the CAP theorem (as proved) often is not relevant to real world distributed systems.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.