
The CAP FAQ


redirect_to
http://www.the-paper-trail.org/page/cap-faq


Version 1.0, May 9th 2013
By: Henry Robinson / @henryr

http://the-paper-trail.org/

0. What is this document?

No subject appears to be more controversial to distributed systems engineers than the oft-quoted, oft-misunderstood CAP theorem. The purpose of this FAQ is to explain what is known about CAP, so as to help those new to the theorem get up to speed quickly, and to settle some common misconceptions or points of disagreement.

Of course, there's every possibility I've made superficial or completely thorough mistakes here. Corrections and comments are welcome: let me have them.

There are some questions I still intend to answer. For example:

What's the relationship between CAP and performance?
What does CAP mean to me as an engineer?
What's the relationship between CAP and ACID?

Please suggest more.

1. Where did the CAP Theorem come from?

Dr. Eric Brewer gave a keynote speech at the Principles of Distributed Computing conference in 2000 called 'Towards Robust Distributed Systems' [1]. In it he posed his 'CAP Theorem' - at the time unproven - which illustrated the tensions between being correct and being always available in distributed systems.

Two years later, Seth Gilbert and Professor Nancy Lynch - researchers in distributed systems at MIT - formalised and proved the conjecture in their paper “Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services” [2].

[1] http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
[2] https://users.ece.cmu.edu/~adrian/731-sp04/readings/GL-cap.pdf

2. What does the CAP Theorem actually say?

The CAP Theorem (henceforth 'CAP') says that it is impossible to build an implementation of read-write storage in an asynchronous network that satisfies all of the following three properties:

Availability - will a request made to the data store always eventually complete?
Consistency - will all executions of reads and writes seen by all nodes be atomic or linearizably consistent?
Partition tolerance - the network is allowed to drop any messages.

The next few items define any unfamiliar terms.

More informally, the CAP theorem tells us that we can't build a database that, on a network that may lose messages, both responds to every request and returns the results that you would expect every time. It's an impossibility result: it tells us that something we might want to do is provably out of reach. It matters now because it applies directly to the many, many distributed systems that have been built in recent years and are still being built, but it is not a death knell: it does not mean that we cannot build useful systems while working within these constraints.

The devil is in the details, however. Before you start crying 'yes, but what about...', make sure you understand the following about exactly what the CAP theorem does and does not allow.

3. What is 'read-write storage'?

CAP specifically concerns itself with a theoretical construct called a register. A register is a data structure with two operations:

set(X) sets the value of the register to X
get() returns the last value set in the register

A key-value store can be modelled as a collection of registers. Even though registers appear very simple, they capture the essence of what many distributed systems want to do - write data and read it back.
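To make the model concrete, here is a minimal single-machine sketch of a register and of a key-value store built as a collection of registers. The class names, and the choice that an unset register reads as None, are illustrative assumptions rather than anything defined by CAP; the point is only how small the interface is.

```python
from typing import Dict, Generic, Optional, TypeVar

T = TypeVar("T")

class Register(Generic[T]):
    """set(X) stores X; get() returns the last value set (None if never set)."""

    def __init__(self) -> None:
        self._value: Optional[T] = None  # assumption: unset register reads as None

    def set(self, x: T) -> None:
        self._value = x

    def get(self) -> Optional[T]:
        return self._value

class KeyValueStore(Generic[T]):
    """A key-value store modelled as one register per key, created on first write."""

    def __init__(self) -> None:
        self._registers: Dict[str, Register[T]] = {}

    def set(self, key: str, x: T) -> None:
        self._registers.setdefault(key, Register()).set(x)

    def get(self, key: str) -> Optional[T]:
        reg = self._registers.get(key)
        return reg.get() if reg is not None else None

kv: KeyValueStore[int] = KeyValueStore()
kv.set("x", 1)
kv.set("x", 2)
print(kv.get("x"))  # 2
```

Everything that makes CAP interesting happens once these registers are replicated across machines that talk over a network.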

4. What does atomic (or linearizable) mean?

Atomic, or linearizable, consistency is a guarantee about what values it's ok to return when a client performs get() operations. The idea is that the register appears to all clients as though it ran on just one machine, and responded to operations in the order they arrive.

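Consider an execution consisting of the total set of operations performed by all clients, potentially concurrently. Under atomic consistency, the results of those operations must be equivalent to a single serial (in order, one after the other) execution of all operations, and that serial order must respect real time: if one operation completes before another begins, it must come first in the serial order.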

This guarantee is very strong. It rules out, amongst other guarantees, eventual consistency (EC), which allows a delay before a write becomes visible. So under EC, you might have:

set(10), set(5), get() = 10

But this execution is invalid under atomic consistency.

Atomic consistency also ensures that external communication about the value of a register is respected. That is, if I read X and tell you about it, you can go to the register and read X for yourself. Under slightly weaker guarantees (serializability, for example), that need not be true. In the following, we write A:op to mean that client A executes op; operations are listed in the real-time order in which they happened, and none of them overlap.

B:set(5), A:set(10), A:get() = 10, B:get() = 10

This is an atomic history. But the following is not:

B:set(5), A:set(10), A:get() = 10, B:get() = 5

even though it is equivalent to the following serial history:

B:set(5), B:get() = 5, A:set(10), A:get() = 10

In the second example, if A tells B about the value of the register (10) after it does its get(), B will falsely believe that some third party has written 5 between A:get() and B:get(). If external communication isn't allowed, B cannot know about A:set, and so sees a consistent view of the register state; it's as if B:get really did happen before A:set.
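The difference between the two histories can be checked mechanically. The sketch below is a brute-force checker, assuming each operation carries hypothetical invocation and response timestamps (`start`, `end`). `is_linearizable` looks for a legal serial order that respects real-time order; `is_serial_equivalent` only keeps each client's own operations in order, which is a stand-in for the weaker guarantee illustrated above rather than a full definition of serializability. It tries every permutation, so it is only suitable for tiny histories.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Optional, Sequence, Tuple

@dataclass(frozen=True)
class Op:
    client: str   # e.g. "A" or "B"
    kind: str     # "set" or "get"
    value: int    # value written by set, or value returned by get
    start: float  # hypothetical invocation time
    end: float    # hypothetical response time

def legal_serial(order: Sequence[Op]) -> bool:
    """Replay ops one at a time against a single register (initially None)."""
    current: Optional[int] = None
    for op in order:
        if op.kind == "set":
            current = op.value
        elif op.value != current:  # a get must return the latest set
            return False
    return True

def _exists_order(history: List[Op], before: List[Tuple[int, int]]) -> bool:
    """Is there a legal serial order that puts i before j for every (i, j)?"""
    for order in permutations(range(len(history))):
        pos = {op_index: slot for slot, op_index in enumerate(order)}
        if all(pos[i] < pos[j] for i, j in before) and legal_serial(
            [history[k] for k in order]
        ):
            return True
    return False

def is_linearizable(history: List[Op]) -> bool:
    # real-time order: if i finished before j started, i must come first
    rt = [(i, j) for i, a in enumerate(history) for j, b in enumerate(history)
          if a.end < b.start]
    return _exists_order(history, rt)

def is_serial_equivalent(history: List[Op]) -> bool:
    # weaker: only each client's own operations must keep their order
    po = [(i, j) for i, a in enumerate(history) for j, b in enumerate(history)
          if a.client == b.client and a.end < b.start]
    return _exists_order(history, po)

# The two histories from the text, with non-overlapping timestamps
atomic = [Op("B", "set", 5, 0, 1), Op("A", "set", 10, 2, 3),
          Op("A", "get", 10, 4, 5), Op("B", "get", 10, 6, 7)]
not_atomic = [Op("B", "set", 5, 0, 1), Op("A", "set", 10, 2, 3),
              Op("A", "get", 10, 4, 5), Op("B", "get", 5, 6, 7)]

print(is_linearizable(atomic))           # True
print(is_linearizable(not_atomic))       # False
print(is_serial_equivalent(not_atomic))  # True
```

The second history fails the real-time check but passes the weaker one, because the order B:set, B:get, A:set, A:get is legal once real time is ignored.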

Wikipedia [1] has more information. Maurice Herlihy and Jeannette Wing's original paper from 1990 is available at [2].

[1] http://en.wikipedia.org/wiki/Linearizability
[2] http://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf

5. What does asynchronous mean?

An asynchronous network is one in which there is no bound on how long messages may take to be delivered by the network or processed by a machine. The important consequence of this property is that there's no way to distinguish between a machine that has failed, and one whose messages are getting delayed.
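A small illustration of that consequence, using Python's asyncio: one hypothetical node is alive but slow, the other has crashed and will never reply. A caller that waits with a timeout (an assumed detection strategy, not something CAP prescribes) observes exactly the same thing from both.

```python
import asyncio

async def slow_node(delay: float) -> str:
    await asyncio.sleep(delay)    # alive, but its reply is delayed
    return "pong"

async def crashed_node() -> str:
    await asyncio.Event().wait()  # never set: this node never replies
    return "pong"

async def probe(reply, timeout: float) -> str:
    """Wait up to `timeout` seconds for a reply, then give up."""
    try:
        return await asyncio.wait_for(reply, timeout)
    except asyncio.TimeoutError:
        return "no reply yet"

async def main():
    slow = await probe(slow_node(0.5), timeout=0.1)
    dead = await probe(crashed_node(), timeout=0.1)
    return slow, dead

print(asyncio.run(main()))  # ('no reply yet', 'no reply yet')
```

Picking a longer timeout only moves the problem: in an asynchronous network a message can always be delayed by more than whatever bound you choose.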

6. What does available mean?

A data store is available if and only if all get and set requests eventually return a response that's part of their specification. This does not permit error responses, since a system could be trivially available by always returning an error.

There is no requirement for a fixed time bound on the response, so the system can take as long as it likes to process a request. But the system must eventually respond.

Notice how this is both a strong and a weak requirement. It's strong because 100% of the requests must return a response (there's no 'degree of availability' here), but weak because the response can take an unbounded (but finite) amount of time.

7. What is a partition?

A partition is when the network fails to deliver some messages to one or more nodes by losing them (not by delaying them - eventual delivery is not a partition).

The term is sometimes used to refer to a period during which no messages are delivered between two sets of nodes. This is a more restrictive failure model. We'll call these kinds of partitions total partitions.

The proof of CAP relied on a total partition. In practice, total partitions are arguably the most likely kind, since all messages between two nodes may flow through one component; if that component fails, message loss between those nodes is usually total.

8. Why is CAP true?

The basic idea is that if a client writes to one side of a partition, any reads that go to the other side of that partition can't possibly know about the most recent write. Now you're faced with a choice: do you respond to the reads with potentially stale information, or do you wait (potentially forever) to hear from the other side of the partition and compromise availability?
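The choice can be played out in a toy model. This is not the proof, just a sketch assuming two hypothetical replicas that copy every write to each other when the network is healthy and silently lose the copy during a total partition. The `prefer` flag stands for the design decision: answer anyway (giving up C) or refuse (giving up A). With only two replicas, neither side can know it has seen the latest write, so a consistency-first replica refuses both reads and writes while partitioned.

```python
from typing import Optional

class Unavailable(Exception):
    """Raised when a replica refuses a request rather than risk inconsistency."""

class Replica:
    def __init__(self, name: str, prefer: str) -> None:
        if prefer not in ("consistency", "availability"):
            raise ValueError(prefer)
        self.name = name
        self.prefer = prefer
        self.value: Optional[int] = None
        self.peer: Optional["Replica"] = None
        self.partitioned = False  # True: all messages to and from the peer are lost

    def set(self, x: int) -> None:
        if self.partitioned and self.prefer == "consistency":
            raise Unavailable(f"{self.name}: cannot replicate write")
        self.value = x
        if not self.partitioned and self.peer is not None:
            self.peer.value = x   # replication message delivered
        # otherwise the replication message is silently dropped

    def get(self) -> Optional[int]:
        if self.partitioned and self.prefer == "consistency":
            raise Unavailable(f"{self.name}: cannot confirm latest value")
        return self.value         # may be stale during a partition

def connect(a: Replica, b: Replica) -> None:
    a.peer, b.peer = b, a

def partition(a: Replica, b: Replica) -> None:
    a.partitioned = b.partitioned = True

a, b = Replica("A", "availability"), Replica("B", "availability")
connect(a, b)
a.set(1)
partition(a, b)
a.set(2)          # write lands on A's side only
print(b.get())    # 1: B answers, but with a stale value
```

Swapping both replicas to `prefer="consistency"` makes the same sequence raise `Unavailable` instead of returning a stale value.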

This is a proof by construction - we demonstrate a single situation where a system cannot be consistent and available. One reason that CAP gets some press is that this constructed scenario is not completely unrealistic. It is not uncommon for a total partition to occur if networking equipment should fail.

9. When does a system have to give up C or A?

CAP only guarantees that there is some circumstance in which a system must give up either C or A. Let's call that circumstance a critical condition. The theorem doesn't say anything about how likely that critical condition is. Both C and A are strong guarantees: they hold only if 100% of operations meet their requirements. A single inconsistent read, or unavailable write, invalidates either C or A. But until that critical condition is met, a system can be happily consistent and available and not contravene any known laws.

Since most distributed systems are long running, and may see millions of requests in their lifetime, CAP tells us to be cautious: there's a good chance that you'll realistically hit one of these critical conditions, and it's prudent to understand how your system will fail to meet either C or A.
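A back-of-the-envelope way to see why: if each request independently has some small probability p of running into a critical condition, the chance of hitting at least one over n requests is 1 - (1 - p)^n. Independence and a fixed p are simplifying assumptions (real partitions are bursty and correlated), and the numbers below are hypothetical.

```python
import math

def p_at_least_one(p_per_request: float, requests: int) -> float:
    """1 - (1 - p)^n: chance that at least one of n independent requests
    runs into a critical condition (a simplifying assumption)."""
    # log1p/expm1 keep the result accurate when p is tiny
    return -math.expm1(requests * math.log1p(-p_per_request))

# hypothetical numbers: one-in-a-million chance per request, ten million requests
print(f"{p_at_least_one(1e-6, 10_000_000):.5f}")  # 0.99995
```

Even a one-in-a-million event becomes close to certain over a long-running system's lifetime.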

10. Why do some people get annoyed when I characterise my system as CA?

Brewer's keynote, the Gilbert paper, and many other treatments place C, A and P on an equal footing as desirable properties of an implementation and effectively say 'choose two!'. However, this is often considered to be a misleading presentation, since you cannot build - or choose! - 'partition tolerance': your system either might experience partitions or it won't.

CAP is better understood as describing the tradeoffs you have to make when you are building a system that may suffer partitions. In practice, this is every distributed system: there is no 100% reliable network. So (at least in the distributed context) there is no realistic CA system. You will potentially suffer partitions, therefore you must at some point compromise C or A.

Therefore it's arguably more instructive to rewrite the theorem as the following:

Possibility of Partitions => Not (C and A)

i.e. if your system may experience partitions, you cannot always be C and A.

There are some systems that won't experience partitions - single-site databases, for example. These systems aren't generally relevant to the contexts in which CAP is most useful. If you describe your distributed database as 'CA', you are misunderstanding something.

11. What about when messages don't get lost?

A perhaps surprising result from the Gilbert paper is that no implementation of an atomic register in an asynchronous network can be available at all times and also consistent in every execution in which no messages are lost.

This result depends upon the asynchronous network property: a node cannot tell whether a message has been dropped or merely delayed. It therefore cannot wait indefinitely for a response while still maintaining availability, but if it responds too early it might be inconsistent.

12. Is my network really asynchronous?

Arguably, yes. Different networks have vastly differing characteristics, but if:

Your nodes do not have clocks (unlikely) or they have clocks that may drift apart (more likely)
System processes may arbitrarily delay delivery of a message (due to retries, or GC pauses)

then your network may be considered asynchronous.

Gilbert and Lynch also proved that in a partially synchronous system, where every node has a clock and all clocks advance at the same rate but are not synchronised, and there is a bound on how long a message that is not lost takes to be delivered and processed, it is still impossible to implement available atomic storage when messages may be lost.

However, the result from #11 does not hold in the partially synchronous model; it is possible to implement atomic storage that is available all the time, and consistent when all messages are delivered.

13. What, if any, is the relationship between FLP and CAP?

The Fischer, Lynch and Patterson theorem ('FLP') (see [1] for a link to the paper and a proof explanation) is an extraordinary impossibility result from nearly thirty years ago, which determined that the problem of consensus - having all nodes agree on a common value - is unsolvable in general in asynchronous networks where one node might fail.

The FLP result is not directly related to CAP, although they are similar in some respects. Both are impossibility results about problems that may not be solved in distributed systems. The devil is in the details. Here are some of the ways in which FLP is different from CAP:

FLP permits the possibility of one 'failed' node which is totally partitioned from the network and does not have to respond to requests.
Otherwise, FLP does not allow message loss; the network is only asynchronous but not lossy.
FLP deals with consensus, which is a similar but different problem to atomic storage.

For a bit more on this topic, consult the blog post at [2].

[1] http://the-paper-trail.org/blog/a-brief-tour-of-flp-impossibility/
[2] https://www.the-paper-trail.org/post/2012-03-25-flp-and-cap-arent-the-same-thing/

14. Are C and A 'spectrums'?

It is possible to relax both consistency and availability guarantees from the strong requirements that CAP imposes and get useful systems. In fact, the whole point of CAP is that you must do this, and any system you have designed and built relaxes one or both of these guarantees. The onus is on you to figure out when, and how, this occurs.

Some real systems, such as ZooKeeper, choose to relax availability because consistency is of the utmost importance to them. Other systems, like Amazon's Dynamo, relax consistency in order to maintain high degrees of availability.
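One way to picture the two choices is to ask which nodes can still answer after a partition. The sketch compares two hypothetical policies, loosely in the spirit of the systems named above but not a model of either: "majority-only", where only a side holding a strict majority keeps serving, and "availability-first", where every node keeps serving and replicas may diverge. Majority-only serving is a common ingredient of consistency-first designs, not by itself a proof of linearizability.

```python
from typing import List, Set

def serving_nodes(sides: List[Set[str]], policy: str) -> Set[str]:
    """Nodes that still answer requests after the cluster splits into `sides`."""
    n = sum(len(side) for side in sides)
    if policy == "availability-first":
        # every node keeps answering; replicas on different sides may diverge
        return set().union(*sides)
    if policy == "majority-only":
        # only a side holding a strict majority answers; the others refuse or wait
        return set().union(*(side for side in sides if len(side) > n // 2))
    raise ValueError(f"unknown policy: {policy}")

split = [{"n1", "n2", "n3"}, {"n4", "n5"}]
print(sorted(serving_nodes(split, "majority-only")))    # ['n1', 'n2', 'n3']
print(len(serving_nodes(split, "availability-first")))  # 5
```

Note that an even split leaves a majority-only cluster with no serving nodes at all.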

Once you weaken any of the assumptions made in the statement or proof of CAP, you have to start again when it comes to proving an impossibility result.

15. Is a failed machine the same as a partitioned one?

No. A 'failed' machine is usually excused the burden of having to respond to client requests. CAP does not allow any machines to fail (in that sense it is a strong result, since it shows impossibility without having any machines fail).

It is possible to prove a similar result about the impossibility of atomic storage in an asynchronous network when there are up to N-1 failures. This result has ramifications about the tradeoff between how many nodes you write to (which is a performance concern) and how fault tolerant you are (which is a reliability concern).
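The tradeoff in the last sentence can be illustrated with simple quorum arithmetic. This is not the impossibility proof referred to above; it is a sketch assuming a hypothetical replicated register in which each write waits for `w` of `n` replicas and each read consults `r` of them. Read and write quorums overlap when r + w > n, which is what lets a read find the latest write (though overlap alone does not make the register linearizable), while larger quorums tolerate fewer unreachable replicas.

```python
from dataclasses import dataclass

@dataclass
class QuorumConfig:
    n: int  # replicas holding the register
    w: int  # replicas that must acknowledge a write
    r: int  # replicas that must answer a read

    def quorums_overlap(self) -> bool:
        # every read quorum shares at least one replica with every write quorum
        return self.r + self.w > self.n

    def write_fault_tolerance(self) -> int:
        # replicas that can be unreachable while writes still complete
        return self.n - self.w

    def read_fault_tolerance(self) -> int:
        # replicas that can be unreachable while reads still complete
        return self.n - self.r

for cfg in (QuorumConfig(5, 3, 3), QuorumConfig(5, 5, 1), QuorumConfig(5, 1, 1)):
    print(cfg, cfg.quorums_overlap(),
          cfg.write_fault_tolerance(), cfg.read_fault_tolerance())
```

Writing to more nodes buys overlap at the cost of write fault tolerance; writing to fewer does the reverse.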

16. Is a slow machine the same as a partitioned one?

No: messages eventually get delivered to a slow machine, but they never get delivered to a totally partitioned one. However, slow machines play a significant role in making it very hard to distinguish between lost messages (or failed machines) and a slow machine. This difficulty is right at the heart of why CAP, FLP and other results are true.

17. Have I 'got around' or 'beaten' the CAP theorem?

No. You might have designed a system that is not heavily affected by it. That's good.


cap-faq's Issues

CAP theorem paper link broken

http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf

Incorrect Url for paper trail

In the statement below (line 5), the href attribute is incorrect (it contains two colons), which causes the page to go blank when the link is clicked.

<pre><a href="http:://the-paper-trail.org/">http://the-paper-trail.org/</a></pre>

it should be

<pre><a href="http://the-paper-trail.org/">http://the-paper-trail.org/</a></pre>

Or, better still, provide only the URL and let Markdown do the rest for you:

http://the-paper-trail.org/

Definition of Availability

When Gilbert and Lynch proved the CAP theorem they did it by relying on a very tight definition of availability, that included the condition that all nodes in the distributed system should be able to respond to a request in a finite amount of time. Importantly this excludes a typical high availability solution like failover to a replica, where the minority partition becomes unavailable but the distributed system as a whole can still respond to requests.

I think it's important to include this information prominently in your (otherwise very good) CAP write up, as it starts to show why the CAP theorem (as proved) often is not relevant to real world distributed systems.

Discussion of weaker failure modes

I think there are two major causes of confusion when it comes to "beating CAP." Number one is the fault model that Gilbert and Lynch assume. (Number two is the confusion over "application-level" versus "CAP" consistency, as in #4.)

I've recently seen several discussions of CAP (even in academic publications) that address the availability requirement. Here are two examples; neither is actually Gilbert and Lynch HA, but each guarantees a response within its relaxed failure model:

"up to F server faults": If you can contact a majority of servers, you can get a response. Not "HA" as minority servers may be partitioned. The HyperDex paper, Section 8 states this assumption rather clearly, noting it is "thus able to provide seemingly impossible guarantees."
"for specific fault model[s]": If we provision networks appropriately, and partitions never happen, there may be no partitions! The Windows Azure Storage paper, Section 8 discusses this. It's stronger than the asynchronous model and is not HA (I'll not speculate as to how realistic this is, but the paper is fairly adamant that the system circumvents CAP.)

I'm not quite sure how to best address these in the text, but it might be useful.

Two concrete suggestions:

Under "15. Is a failed machine the same as a partitioned one?" the FAQ could mention that, in an HA system, a minority partitioned server still needs to guarantee a response.
Under "12. Is my network really asynchronous?" the FAQ could mention that, in the limit, failures can render any communication network asynchronous.

Alternatively, (at the risk of starting a "list of shame"), the FAQ might expand "17. Have I 'got around' or 'beaten' the CAP theorem?" into a "list of common fallacies" like those above.

I'm curious what you think and am happy to drop a pull request if there's interest.

Confusion about "writing to one side of a partition"

In section 8. why is CAP true it says

The basic idea is that if a client writes to one side of a partition, any reads that go to the other side of that partition can't possibly know about the most recent write.

but this is confusing to me. Because in section 7. What is a partition a partition is defined as

A partition is when the network fails to deliver some messages to one or more nodes by losing them (not by delaying them - eventual delivery is not a partition).

Basically it's not clear to me what you mean by "a client writes to one side of a partition". A partition is defined as the network failing to deliver some messages to node(s). I don't understand how that is something that is written to or has sides.

Pedantic comment

Maybe being a bit pedantic but w.r.t

"the CAP theorem tells us that we can't build a database that both responds to every request and returns the results that you would expect every time. It's an impossibility result"

CA systems are possible but then the P goes for a toss !!

Discussion of CAP consistency and Application-level consistency

As I mentioned in #3, I think there are two major causes of confusion when it comes to "beating CAP." Number two is the difference between CAP "consistency" and application-level "consistency."

While linearizable registers can be used to guarantee almost any application-level integrity constraint, it's not always required. For example, if we're trying to generate unique IDs for a client, if we opt for a sequential ID generation scheme, then we're going to have to be unavailable. However, if we generate IDs using a clientID and a logical clock, then we can guarantee uniqueness without coordination. This example assumes client IDs are unique, but there are other useful "consistency" properties we can achieve with HA, like ensuring that fields in a database are non-NULL.
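The coordination-free scheme described above can be sketched in a few lines. This assumes, as the comment does, that client IDs are unique, and additionally that each client's counter is never reset (in practice it would need to survive restarts); `id_generator` is a hypothetical name. No messages are exchanged, so every client can keep issuing IDs during a partition.

```python
import itertools
from typing import Iterator, Tuple

def id_generator(client_id: str) -> Iterator[Tuple[str, int]]:
    """Yield IDs of the form (client_id, counter) without contacting any other node.

    Assumes client_id is unique across clients and the counter is never reset.
    """
    for counter in itertools.count(1):  # per-client logical clock
        yield (client_id, counter)

alice, bob = id_generator("alice"), id_generator("bob")
print([next(alice), next(alice), next(bob)])
# [('alice', 1), ('alice', 2), ('bob', 1)]
```

IDs from different clients can never collide because they differ in the first component; IDs from the same client differ in the second.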

What's going on here is that, at a high level, for restricted data types and operations, it is possible to satisfy integrity constraints without coordination and without giving up HA. This difference between read/write linearizability and application-level consistency appears to be a source of confusion/contention (e.g., explicitly windowed queries over immutable data). This isn't limited to linearizability: the same debate crops up in conflict serializable databases (e.g., weak isolation models).

Unfortunately, I don't think it's well-understood at this point which application-level integrity constraints are achievable with HA; we've been thinking about this a bit, but I don't have anything concrete enough to add to the FAQ at this point. However, I think it's possible to tie this issue up neatly by stressing that CAP pertains only to read-write registers. That is, application-level integrity constraints are not addressed by the Gilbert and Lynch result. To be complete, the FAQ might mention that some constraints are achievable, but not all are. I think that's the limit of what we know today. This might go in "3. What is 'read-write storage'?" but could also go in the "common fallacies" section.

This is similar to the "What's the relationship between CAP and ACID?" question, but only for CAP "C" and ACID "C," not CAP "C" and ACID {"A", "I", and "D"}. I think it's worth a separate discussion.
