TLA+ Talk

Overview

TLA+ is a language for specifying concurrent and distributed systems and for checking that these systems do what you expect. It has been used in industry:

See "Formal Methods for Real-World Systems" on Page 3 of this CACM article.

and in academia:

Tutorial

Note that this tutorial is stolen from Leslie Lamport's excellent video course: https://lamport.azurewebsites.net/video/videos.html. If you want to learn TLA+, I recommend watching this series.

DieHard
- We model the execution of a system as a series of states.
- A specification of a system describes the set of legal executions.
- A specification does NOT provide instructions on how to transition states, just which state transitions are legal.
- Play clip: https://youtu.be/6cAbgAaEOVE.
- EXTENDS is like import.
- We model the state of the system with two variables, big and small.
- Note that TLA+ is untyped.
- Init says the initial state of the system.
- Next, we can specify formulas that describe legal state transitions.
- A primed variable like small' represents the value of variable small in the next state.
- Note that we have to constrain every variable. For example, if we did not constraint big' in FillSmall, then the value of big' could be anything.
- /\ is conjunction, / is disjuntion.
- Next is the disjunction of all our formula.
- We can specify invariants for our executions. For example, small should be in the set {0, 1, 2, 3} and big should be in the range {0, 1, 2, 3, 4, 5}.
- TLC is a model checker. It checks that all executions of our system satisfy the invariants we claim.
- Show cfg file.
- Run TLC with TypeOk only.
- Flipping invariant checking on our head, we can also find particular executions, like NotFourGallons.
- Pause for questions.
AtomicCommit
- Atomic commit is the problem where a set of nodes are all initially assigned a value 0 or 1. They must all agree to abort or commit. The nodes can only agree to abort if all initial values are 1.
- Explain Constants and assumptions.
- Explain functions.
- Explain \A and \E.
- Explain EXCEPT.
- Explain /=.
- Explain specs.
- Explain theorem.
- Note that we haven't really specified a system. We've specified the legal behaviors allowed by a particular problem, namely atomic commit.
TwoPhaseCommit
- Now, we'll look at a particular implementation of atomic commit, namely two-phase commit.
- Explain tuples.
- Explain records and messages. Messages never dropped.
- I said two-phase commit implements atomic commit, but what do I mean by that? Can we formalize that?
- Draw execution of two-phase commit, erase everything that is not state. The theorem says that anything satisfying the two-phase commit spec also satisfies the atomic commitment spec.

Limitations and Alternatives

See #1 for clarifications!

Model checking is not theorem proving.
- Model checkers exhaustively enumerate executions looking for errors, but they're not proving anything about the correctness of your specification.
- If there are infinite executions, you cannot check all of them.
- Even if TLC does terminate, it often depends on a choice of constants.
- Anecdote: Generalized Paxos has a TLA+ specification, but a later paper shows that this definition is incorrect (see 6.3)!
- There is no substitute for a simple proof.
TLA+ is for specifications, not for implementations.
- You can't run TLA+ specs. Even if your specification is correct, you may implement it incorrectly.
- Need to resort to automated theorem proving for really being confident that code is bug free.
TLC can be slow, really slow
- I ran TLC on my BPaxos stuff on 64 cores for hours and I exhausted 16 GB of disk before it could finish. Could take days to simulate.
- See https://lamport.azurewebsites.net/video/video7-script.pdf slide 200.
Sometimes hard to check if your system is doing what you expect.
- Your specification might accidentally be disallowing certain actions that you think should be legal.
- Your invariants are satisfied, so it's hard to tell when this is happening.
- Liveness constraints can help discover this.
Levels of testing:
- unit tests
- quickcheck
- model checking
- automated theorem proving

Other Things

Constant expression vs state expression vs action expression vs temporal formula
Liveness
More advanced forms of refinement
PlusCal
TLAPS

Resources

Comments about "Limitations and Alternatives"

A few comments about the "Limitations and Alternatives" section:

Model checkers exhaustively enumerate executions looking for errors, but they're not proving anything about the correctness of your specification.

For the (finite) model, TLC actually checks the correctness of the specification for the stated safety and liveness properties. For simple algorithms - such as the one below (taken from here) - TLC can even check the complete algorithm, not just a model of it.

EXTENDS Integers
(*****
--algorithm SimpleMutex {
  variables flag = [i \in {0, 1} |-> FALSE] ;
    process (proc \in {0,1}) {
         s1: while (TRUE) {
                  flag[self] := TRUE ;
         s2:
                  await flag[1-self] = FALSE ;
         cs:
                  skip ;
         s3:
                  flag[self] := FALSE
    }
  }
}
*****)

If one needs higher assurances for infinite models, TLA+ comes with a proof system to write machine checked proofs. Usually people combine the powers of TLC and TLAPS (see Proving Safety Properties and Using TLC to Check Inductive Invariance).

If there are infinite executions, you cannot check all of them.

Infinite executions do not imply that the state space is infinite. By definition a behavior is always an infinite sequence of states (a terminating execution has a suffix of infinite stuttering). TLC can easily check specs that allow infinitely many behaviors:

VARIABLE x
Spec == x = 0 /\ [][x' \in {0,1}]_x

TLC checks this specification in less than a second. Do not replace {0,1} with Nat though! ;-)

Need to resort to automated theorem proving for really being confident that code is bug free.

True, even better if the implementation is correct-by-construction. In reality though, even machine checked proofs do not provide absolute confidence. Anyway, engineers are usually happy with less. SPIN for example supports model-driven verification which apparently is good enough for NASA.

TLC can be slow, really slow

I ran TLC on my BPaxos stuff on 64 cores for hours and I exhausted 16 GB of disk before it could finish. Could take days to simulate.

In comparison, how long did it take you to find the inductive invariant and write a machine checked proof (compare "Proof assistants are addictive and a huge time sink.")?

By the way, can you share your BPaxos spec to study it as part of our scalability tests?

Also, apalache might in the future scale to larger models.

Sometimes hard to check if your system is doing what you expect.

I'm inclined to argue that this problem is not unique for TLA+. ;-)

mwhittaker / tla_talk Goto Github PK

tla_talk's Introduction

TLA+ Talk

Overview

Tutorial

Limitations and Alternatives

Other Things

Resources

tla_talk's People

Contributors

Stargazers

Watchers

tla_talk's Issues

Comments about "Limitations and Alternatives"

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent