henryr / cap-faq
The CAP FAQ
As I mentioned in #3, I think there are two major causes of confusion when it comes to "beating CAP." Number two is the difference between CAP "consistency" and application-level "consistency."
While linearizable registers can be used to guarantee almost any application-level integrity constraint, linearizability is not always required. For example, suppose we're trying to generate unique IDs for clients. If we opt for a sequential ID generation scheme, then we have to give up availability. However, if we generate IDs using a clientID and a logical clock, then we can guarantee uniqueness without coordination. This example assumes client IDs are unique, but there are other useful "consistency" properties we can achieve with HA, like ensuring that fields in a database are non-NULL.
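As a minimal sketch of the clientID-plus-logical-clock scheme (the `IdGenerator` class and its field names are hypothetical, and client IDs are simply assumed to be unique), each client can mint IDs from purely local state:

```python
from dataclasses import dataclass


@dataclass
class IdGenerator:
    """Coordination-free ID generator (hypothetical name, for illustration only)."""

    client_id: str  # assumed to be globally unique
    clock: int = 0  # local logical clock; never shared with other clients

    def next_id(self) -> tuple[str, int]:
        # Advancing a local counter sends no messages to any other node,
        # so this call can always respond, even during a partition.
        self.clock += 1
        return (self.client_id, self.clock)


a = IdGenerator("client-a")
b = IdGenerator("client-b")
ids = [a.next_id(), b.next_id(), a.next_id(), b.next_id()]

# Uniqueness holds across clients because the client_id component differs,
# and within a client because the clock only ever increases.
assert len(set(ids)) == len(ids)
```

The IDs are unique but not globally sequential, which is exactly the property being traded away to avoid coordination.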
What's going on here is that, at a high level, for restricted data types and operations, it is possible to satisfy integrity constraints without coordination and without giving up HA. This difference between read/write linearizability and application-level consistency appears to be a source of confusion/contention (e.g., explicitly windowed queries over immutable data). This isn't limited to linearizability: the same debate crops up in conflict serializable databases (e.g., weak isolation models).
Unfortunately, I don't think it's well-understood at this point which application-level integrity constraints are achievable with HA; we've been thinking about this a bit, but I don't have anything concrete enough to add to the FAQ at this point. However, I think it's possible to tie this issue up neatly by stressing that CAP pertains only to read-write registers. That is, application-level integrity constraints are not addressed by the Gilbert and Lynch result. To be complete, the FAQ might mention that some constraints are achievable, but not all are. I think that's the limit of what we know today. This might go in "3. What is 'read-write storage'?" but could also go in the "common fallacies" section.
This is similar to the "What's the relationship between CAP and ACID?" question, but only for CAP "C" and ACID "C," not CAP "C" and ACID {"A", "I", and "D"}. I think it's worth a separate discussion.
Maybe I'm being a bit pedantic, but w.r.t.
"the CAP theorem tells us that we can't build a database that both responds to every request and returns the results that you would expect every time. It's an impossibility result"
CA systems are possible, but then the P goes for a toss!
In the statement below (line number 5) you have an incorrect href attribute (two colons, "http:://"), which causes the page to go blank when clicked.
<pre><a href="http:://the-paper-trail.org/">http://the-paper-trail.org/</a></pre>
It should be
<pre><a href="http://the-paper-trail.org/">http://the-paper-trail.org/</a></pre>
Or, as the best option, provide only the URL and let Markdown do the rest for you:
http://the-paper-trail.org/
I think there are two major causes of confusion when it comes to "beating CAP." Number one is the fault model that Gilbert and Lynch assume. (Number two is the confusion over "application-level" versus "CAP" consistency, as in #4.)
I've recently seen several discussions of CAP (even in academic publications) that discuss the availability requirement. Neither of these is actually Gilbert and Lynch HA, but each, for its relaxed failure domain, guarantees a response. Here are two examples:
I'm not quite sure how best to address these in the text, but doing so might be useful.
Two concrete suggestions:
Under "15. Is a failed machine the same as a partitioned one?" the FAQ could mention that, in an HA system, a minority partitioned server still needs to guarantee a response.
Under "12. Is my network really asynchronous?" the FAQ could mention that, in the limit, failures can render any communication network asynchronous.
Alternatively, (at the risk of starting a "list of shame"), the FAQ might expand "17. Have I 'got around' or 'beaten' the CAP theorem?" into a "list of common fallacies" like those above.
I'm curious what you think and am happy to drop a pull request if there's interest.
In section "8. Why is CAP true?", it says:
The basic idea is that if a client writes to one side of a partition, any reads that go to the other side of that partition can't possibly know about the most recent write.
But this is confusing to me, because in section "7. What is a partition?" a partition is defined as:
A partition is when the network fails to deliver some messages to one or more nodes by losing them (not by delaying them - eventual delivery is not a partition).
Basically, it's not clear to me what you mean by "a client writes to one side of a partition". A partition is defined as a network failing to deliver some messages to node(s). I don't understand how that is something that is written to or has sides.
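One possible reading, offered as a sketch rather than as the FAQ author's answer: when the network loses every message between two groups of nodes, each group is a "side", and a write accepted by a node on one side never reaches the other. The `Replica` class and the `dropped_links` parameter below are hypothetical names for a toy two-node model:

```python
class Replica:
    """Toy replica that copies each write to its peers (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.peers = []

    def write(self, value, dropped_links):
        self.value = value
        for peer in self.peers:
            # The partition is modeled as the network losing (not delaying)
            # every message sent over a dropped link.
            if (self.name, peer.name) in dropped_links:
                continue
            peer.value = value

    def read(self):
        return self.value


a, b = Replica("A"), Replica("B")
a.peers, b.peers = [b], [a]

# No partition: the write to A is delivered to B.
a.write("v1", dropped_links=set())
assert b.read() == "v1"

# Partition: messages between A and B are lost in both directions,
# so A and B are now on different "sides".
partition = {("A", "B"), ("B", "A")}
a.write("v2", dropped_links=partition)

# A read served by B cannot know about the most recent write.
assert a.read() == "v2"
assert b.read() == "v1"
```

In this model B can either answer with the stale value (giving up consistency) or refuse to answer (giving up availability), which is the choice the quoted sentence is pointing at.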
When Gilbert and Lynch proved the CAP theorem, they did it by relying on a very tight definition of availability, which included the condition that all (non-failed) nodes in the distributed system should be able to respond to a request in a finite amount of time. Importantly, this excludes a typical high availability solution like failover to a replica, where the minority partition becomes unavailable but the distributed system as a whole can still respond to requests.
I think it's important to include this information prominently in your (otherwise very good) CAP write-up, as it starts to show why the CAP theorem (as proved) often is not relevant to real-world distributed systems.
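The gap between the two notions of availability can be sketched as follows. The `Node` type, the majority-only serving policy, and both predicate names are assumptions made for illustration; they are not taken from the FAQ or from the Gilbert and Lynch paper:

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    failed: bool = False      # crashed nodes are exempt from the requirement
    in_majority: bool = True  # which side of the partition the node is on


def serves_requests(node: Node) -> bool:
    # Failover-style policy (an assumption): only majority-side nodes answer.
    return not node.failed and node.in_majority


def gilbert_lynch_available(nodes) -> bool:
    # Strict reading: every non-failed node must respond to requests it receives.
    return all(serves_requests(n) for n in nodes if not n.failed)


def some_node_available(nodes) -> bool:
    # Looser, system-level reading: at least one node can respond.
    return any(serves_requests(n) for n in nodes)


# n3 is alive but cut off on the minority side of a partition.
cluster = [Node("n1"), Node("n2"), Node("n3", in_majority=False)]

assert some_node_available(cluster)          # the system as a whole still answers
assert not gilbert_lynch_available(cluster)  # but n3 is alive and silent
```

Under the strict definition, the live but silent minority node is what makes the failover design count as unavailable.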