
biscuit's People

Contributors

adeinega, clementd-fretlink, derekkraan, divarvel, geal, gitter-badger, jdtw, kellerfuchs, mjolivet-lucca, urbainy, yaronf


biscuit's Issues

add to spec: verifier queries

A verifier should be able to run queries and gather data from the token, such as revocation IDs: in some cases, loading all of the revocation IDs as logic facts might be too much data, so we would validate the token, get the revocation IDs, then look them up elsewhere.
Since the facts we want to query might be generated by block or verifier rules, those queries should be run as part of the verification process. The verifier validates the token and, if there were queries, returns the associated data.
This is currently implemented in the Rust version, but not in the Java version yet.
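As a sketch, using the rule syntax from elsewhere in this tracker, such a query could look like the following (the revocation_id fact name is an assumption, not part of the spec):

*revocation_ids($id) <- revocation_id(#authority, $id)

The verifier would run the query after validation and hand the resulting $id values to an external revocation store.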

Roadmap

After a few iterations on both cryptographic primitives and language design, Biscuit is now in a phase of iterative improvement.

Biscuit is used in production in several places. The Rust implementation now has a stable API and is used as the basis for other implementations (biscuit-wasm and biscuit-python).

Biscuit blocks are versioned, and this mechanism has allowed the gradual introduction of new features without disrupting existing deployments. Features that are not part of token serialization and authorization are not versioned and remain the responsibility of each implementation; this includes, for instance, Datalog parsing.

This roadmap starts from biscuit v2, which is the minimal version supported by libraries. Due to an issue in the initial release of biscuit v2, a breaking-change update had to be released, so biscuit v2 corresponds to version 3 of blocks. In the roadmap, only block versions will be used (v3+).

Roadmap

Documentation

Support for v3 blocks

Support for v4 blocks

API support for third-party blocks

API support for snapshots

Integration with web frameworks

Rust

  • Actix-web
  • Tower.rs

Javascript

  • Express

Haskell

  • Servant
  • WAI
  • Twain

Upcoming features

  • ECDSA
    • Initial design
    • PoC
    • Specification
  • WebAuthn signatures
    • Initial design
    • PoC
    • Specification
  • New datalog features (v5 blocks) #147
  • Resumable execution
Old roadmap

There's already been a lot of discussion around what biscuit can be and how it could work, so it's probably a good time to sum things up and see what's needed.

Roadmap

To get to a usable token implementation, here's what we would need now:

  • credential language
  • semantics
  • cryptography
    • asymmetric mode
      • propose some usable schemes
      • experimental implementation: #10
      • benchmarks (speed for attenuation and verification, size overhead)
      • choose one of the schemes: #19
      • check how easy it is to implement it in various languages (availability of high quality libraries, serialization, etc)
      • write a specification for the cryptography primitives we will use
      • security audit for the chosen scheme and its implementations
    • symmetric mode (replaced with a seal based on signatures)
  • serialization: #18
    • define a format to store the credential language and its basic types
    • define a binary format for the asymmetric token, with its blocks and signature(s)
    • define a binary format for the symmetric token
    • define a way to store the token in text formats (base64 of the binary token, etc)
  • tooling

Semantics

With #6 and #8, along with some out of band discussions, we have a better idea of how attenuation should work:

  • the first caveat specifies an authority field indicating the basic rights for this token, out of a set of rights the verifier knows about
  • some of those rights can be marked as critical, and the verifier should refuse a token that does not put bounds on them (to avoid accidental elevation of rights)
  • there's an ambient authority with information coming from the context, i.e. which resource is accessed, current time, source IP address, etc
  • the verifier also provides information from its own context (different verifiers might care about different rights or resources, etc)
  • each caveat checks that the request is within what it authorizes (resource accessed, kind of operation...)

Caveat language

We're evaluating a Datalog-like language to express the caveats. It is simple to implement and allows complex queries. It can also be used to generate the authority field in a compact way.

We have been exploring example queries to get a feel for how it could work.
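As a taste of what this could look like (a sketch consistent with the examples further down this tracker, not a final syntax), a caveat restricting a token to read operations might be written:

*caveat1($file) <- right(#authority, $file, #read), resource(#ambient, $file), operation(#ambient, #read)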

Cryptographic primitives

Asymmetric mode

We need to support our goals of a token that can be attenuated offline and verified in a decentralized way. To that end, we explored a few cryptographic systems:

  • first one is an aggregated signature scheme using pairing based cryptography on BLS curves. Easy to follow and very compact, but verification can be quite slow, and very few good libraries are available
  • second one is an aggregated zero knowledge discrete log proof using verifiable random functions. Quite fast and there's a version that does not add too much size overhead
  • third one uses a PKI-like system with a challenge/response to authenticate the last element of the chain (its properties are a little different from the others)

All three of them would be usable, but we will need an audit of the schemes before deciding which one to go with.

Symmetric mode

It would be useful to have an alternative mode to transform a biscuit token to a symmetric construction, a bit like macaroons.
That mode is not well defined yet, but the idea would be to send an asymmetric token to the authentication service, which will check the token and its caveats, and create a new token with the same caveats, but using a symmetric mode, possibly with encryption.
At the cost of one RTT, we get a token that is much faster to check and can be fixed for requests to only one service (the one that knows the secret key).

New atom type: Set

It could be interesting to have a Set atom type in the spec.

We recently merged the feature into biscuit-go (see: biscuit-auth/biscuit-go@19073ab)

This allows grouping several atoms under a single one, with constraints to check whether all or none of the set values are included in a given set:

# given the fact:
users(#ambient, ["alice", "bob", "eve"])

# doesn't match, since "eve" is not in the allowed list
*allow($usernames) <- users(#ambient, $usernames) @ $usernames in ["alice", "bob"]

# matches, since $usernames does not contain "admin" or "root"
*allow($usernames) <- users(#ambient, $usernames) @ $usernames not in ["admin", "root"]

The set does not restrict embedding different atom types (i.e. [#a, "alice", 42]), but I don't see much usage of that yet. It doesn't seem to affect the constraints anyway.

wdyt?

Clarify attenuation semantics

Compared to macaroons, the attenuation syntax seems to allow capabilities extension, whereas macaroons structurally guarantee that no extension can happen (by relying on boolean algebra).

The example features diff-like attenuation (instead of general properties the query must satisfy, attenuations are modifications to the token's rule set); in isolation, each attenuation rule cannot be checked.
This implies that a reduction has to happen before the receiving service can check whether the token is okay for a given request. On the contrary, macaroon verifiers check each caveat in isolation.

This leads to an important question: "who's responsible for the reduction?".

  • If that's the library, it means that the attenuation semantics have to be formally expressible in a language expressive enough to describe a large range of attenuations (e.g. TTLs), while constrained enough to be able to prove attenuation properties on the reduction (not an easy feat)
  • If that's the user, it means that the user now has the burden of proving that no rights expansion is possible (which is a world-class footgun, unless you force users to issue proofs before using the library in production).

Compare this to macaroons, where we have the best of both worlds: syntax is not constrained and attenuation is proved by basic boolean algebra rules.

Unless biscuit ships with a rock-solid attenuation language, with structural proofs that expansion is impossible (in the current spec, this implies among other things computing regex intersections), the burden of proving attenuation is transferred to the owner, which would make it a dangerous token to use.

Using the symbol table for strings too

The difference between strings and symbols can be confusing, because there are special symbols, #authority and #ambient, and normal symbols that can be user defined. The latter are just special strings with limited constraint checks, a bit faster to compare since they're actually stored as an index into the symbol table.
I'd propose that we store all strings in the symbol table (this can reduce storage size, as strings tend to be repeated) and reserve the symbol type for reserved symbols (only #authority and #ambient for now).
This would require the Datalog engine to access the symbol table during execution (for string constraint checks).
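A minimal sketch of the interning idea, in Rust (illustrative only, not the actual biscuit-rust implementation):

// toy symbol table: every string is stored once and referenced by index
#[derive(Default)]
struct SymbolTable {
    symbols: Vec<String>,
}

impl SymbolTable {
    // return the index of `s`, inserting it if it is not yet known
    fn insert(&mut self, s: &str) -> u64 {
        match self.symbols.iter().position(|sym| sym == s) {
            Some(i) => i as u64,
            None => {
                self.symbols.push(s.to_string());
                (self.symbols.len() - 1) as u64
            }
        }
    }

    // resolve an index back to its string, e.g. for string constraint checks
    fn get(&self, i: u64) -> Option<&str> {
        self.symbols.get(i as usize).map(String::as_str)
    }
}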

Datalog restrictions in blocks

Currently, blocks can be added to any non-sealed biscuit. A block can contain not only caveats (as in macaroons), but also facts and rules. This requires careful thought from both the biscuit minter and the biscuit verifier: the goal is to avoid confusing facts from the biscuit minter with facts from blocks.

To that end, the special #authority atom is used.
It's special in two ways:

  • it's forbidden in blocks rules and facts
  • it's automatically added in authority facts (and rules? not sure)

All that makes the semantics rather complex (and way more complex than, say, macaroons). Another possibility would be to completely forbid rules and facts in blocks, and only allow caveats. This would bring the attenuation semantics closer to those of macaroons, which are arguably easier to reason about.

I still see the value in more complexity in blocks, but I think it should be optional (ideally, opt-in). In that case, I think the biscuit should carry this information when minted. This would avoid validating a restricted biscuit with a verifier that expects an unrestricted one.

All in all, while Datalog brings huge improvements over macaroons (putting aside asymmetric crypto), the ability to add rules and facts in blocks makes the complexity explode and makes auditing a biscuit's properties much harder.

As an additional bonus, not allowing rules in biscuits completely removes an attack channel where a legitimate biscuit is attenuated with rules crafted to exploit the Datalog engine.

new text syntax

I'm simplifying the text syntax, but it can still be a bit confusing, so I'd like to propose a different version that moves further from traditional Datalog syntax.
I'd like to change from:

right(#authority, $1, #read) <- resource(#ambient, $1), owner(#ambient, $0, $1)

to

right(#authority, $1, #read) if resource(#ambient, $1) and owner(#ambient, $0, $1)

with if and and being case insensitive, so we can write right(#authority, $1, #read) IF resource(#ambient, $1) AND owner(#ambient, $0, $1) if that's easier to read without syntax highlighting

And for caveats:

caveat1($0, $1) <- right(#authority, $0, $1), resource(#ambient, $0), operation(#ambient, $1)

to

check right(#authority, $0, $1) and resource(#ambient, $0) and operation(#ambient, $1)

or another keyword, like verify, satisfy, ensures, assert, etc

For constraints, instead of moving them to the end like this:

valid_date("file1") <- time(#ambient, $0 ), resource(#ambient, "file1") @ $0 <= 2019-12-04T09:46:41+00:00

we would write:

valid_date("file1") if time(#ambient, $0 ) and resource(#ambient, "file1") and $0 <= 2019-12-04T09:46:41+00:00

And constraints could be anywhere in the rule's body, like this:

valid_date("file1") if time(#ambient, $0 ) and $0 <= 2019-12-04T09:46:41+00:00 and resource(#ambient, "file1")

add a "cookbook" of common biscuit "recipes"

To make the logic language more useful, we should provide common patterns to be reused, with explanations of their design, for patterns that cannot be integrated right away into the token API.
As an example:

  • public key signature validation: #35
  • RBACL support
  • hierarchical resource access (files, folders, etc)

nitpicking: renaming atom

Currently, we've been using "atom" to talk about terms, while "atom" is generally used to designate the predicates in the head and body of a rule.

Public key signature constraint

To implement use cases such as third-party caveats, or using a biscuit token as an attestation accumulating acknowledgements from multiple parties, caveats should be able to verify a public key signature.

A few questions here:

  • do we fix the allowed algorithms, and if yes, which one(s)? If no, how?
  • should I add a new data type for this? Could we just fit it into strings? Or make a more general byte array type?

Signature verification would work well as a constraint affecting 3 elements (message, key, signature) that could be filled in various ways:

  • verify(message, key, sig?): a signature from a specific key must be provided
  • verify(message, key?, sig?): a signature must be provided, from any key, but we could have other constraints on that key here (like the key must be in a certain set)
  • verify(message?, key, sig?): specify we must have a valid signature from a key, with other constraints on the message

Protobuf schema simplification

When I designed the serialization format, I was not too familiar with Protobuf, so there were a few mistakes I made, like representing sum types with an index, instead of using oneof. As an example, I wrote:

message Constraint {
  required uint32 id = 1;

  enum Kind {
    INT = 0;
    STRING = 1;
    DATE = 2;
    SYMBOL = 3;
    BYTES = 4;
  }

  required Kind kind = 2;

  optional IntConstraint int = 3;
  optional StringConstraint str = 4;
  optional DateConstraint date = 5;
  optional SymbolConstraint symbol = 6;
  optional BytesConstraint bytes = 7;
}

while it could be written:

message Constraint {
  required uint32 id = 1;

  oneof ConstraintEnum {
   IntConstraint int = 2;
   StringConstraint str = 3;
   DateConstraint date = 4;
   SymbolConstraint symbol = 5;
   BytesConstraint bytes = 6;
  }
}

I'd like to clean this up, but that would be a breaking change.

Retrieving a block index

Following from #35, we have designed a biscuit signature flow, where the authority block defines facts requesting the user to append a signature, like so:

The authority block defines facts:

  • should_sign(#authority, dataID, alg, pubkey)
  • data(#authority, dataID, challenge)

where:

  • dataID is an integer, linking the data fact to the should_sign one
  • alg is a signature algorithm symbol
  • pubkey is the public key bytes that is expected to verify the signature
  • challenge is a byte array

And also a caveat:

  • *valid($0) <- should_sign(#authority, $0, $1, $2), valid_signature(#ambient, $0, $1, $2)

Which forces the verifier to provide a valid_signature fact matching the should_sign fact, in order to verify the biscuit.

This allows the verifier to make a query for a user signature:

*to_validate($0,$1,$2,$3,$4,$5,$6,$7) <- should_sign(#authority,$0,$1,$2), data(#authority,$0,$3), signature($0,$2,$4,$5,$6)

With the variables being:

  • $0: dataID integer
  • $1: alg string
  • $2: pubkey byte array
  • $3: challenge byte array
  • $4: signature byte array - the user provided signature
  • $5: signerNonce byte array - for anti replay
  • $6: signerTimestamp date - for anti replay

This expects the user to have appended a block to the biscuit containing a fact as follows:

  • signature(dataID, pubkey, signature, signerNonce, signerTimestamp)

But we can't verify it yet, as we expect the signature to have been constructed like so:

signature = challenge | tokenHash | signerNonce | signerTimestamp

where tokenHash is a hash over all the biscuit blocks preceding the signature block.

Biscuit
  |_ authority       -|
  |_ b1               |
  |_ b2              _| tokenHash
  |_ signature block
  |_ b3
  ...

So in order to verify the signature, the verifier needs to create a hash of N blocks, N being the index of the block containing the signature fact, minus 1.

And now we're wondering: what would be the best way to retrieve this block index?

supporting expressions

Do we need operations like these:

  • integers: + - * / %
  • strings: concat
  • aggregates: count, sum, min, max, average

They would not be too hard to implement, but integrating them into the syntax might be complex.
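To make the question concrete, here is a hedged sketch of what such operations could look like in the current rule syntax (hypothetical, nothing here is specified):

# hypothetical: a sum aggregate plus a comparison in a constraint
total_ok($total) <- item(#ambient, $price) @ $total = sum($price), $total <= 100

# hypothetical: string concatenation in a constraint
full_name($name) <- first(#ambient, $f), last(#ambient, $l) @ $name = concat($f, $l)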

add a new basic type for byte arrays

This will be useful to store arbitrary data in tokens without relying on base64, and also to match on binary data like public keys. There's an example implementation in biscuit-auth/biscuit-rust#11.
I'm using a hex: prefix when printing and parsing; it could be a good idea to add a base64: prefix as well.
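For instance, a hypothetical fact embedding a public key as a byte array could be printed as:

trusted_key(#authority, hex:a0b1c2d3e4f5)

with the hex: (and possibly base64:) prefix marking how the bytes are encoded in the text syntax.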

Text syntax changes

The * character is used to indicate the head of a rule or caveat, but it has been very confusing for users, so I'd propose that we remove it.

Additionally, I'd like to remove the head from caveats, or make it empty by default. We'll have to choose a new syntax for caveats. Something like ? <- fact1("term") ?

Last one: there's a constraint on rules to produce facts that contain at least one term, but I'm not sure that constraint really makes sense, so maybe we could remove it.

add to spec: context field

In a system with a large number of users or resources, loading all of the data into the verifier might be costly.
I am adding an optional "context" field to the blocks, which can be queried before verification, to give an ID to look up or some filters for the data.

Add support for a Macaroon-like, symmetric construction

Do you think it would make sense to support a version/instantiation of the tokens that exposes the same API & caveat language, but uses a Macaroon-like, symmetric-crypto-only construction?

Requirements

  • Use only symmetric primitives, based on a single secret specific to a security domain.
  • Provide authentication and encryption of the caveat content.
  • Support all features of the pubkey version, aside from public verifiability; that includes offline attenuation.

Rationale

Compared to pubkey biscuits, Macaroons provide very different tradeoffs between performance and security/applicability (all verifiers need access to the token-minting key). It could be quite useful to support a symmetric mode, for instance to "cache" the validation (and expiration checking) of a pubkey-based credential by sending back to the user-agent a symmetric, time-limited version of it.

Having the same features and caveat language as the pubkey version supports this sort of translation between the two; in general, there should be as little difference as possible, from a user/developer's perspective, to limit cognitive overhead.

Lastly, there is a triple reason to encrypt those tokens:

  • (slight) increase in privacy;
  • implementable without overhead compared to the Macaroon HMAC construction:
    single-pass AE modes exist that aren't noticeably slower than a MAC;
  • preventing developers from parsing and “checking” caveats without verifying token authenticity when the minting key isn't available; amusingly & sadly, I've seen it happen, and IMO a misuse-resistant design should prevent that sort of silliness.

Test cases

In order to make implementations easier to verify, we should provide, along with the specification, a series of test cases with the expected results.
To test a token, we will need the following (a possible in-code shape is sketched after these lists):

  • its serialized version (if we store the test cases as text, the base64 encoded version)
  • root public key
  • verifier facts (if any)
  • verifier queries (if any)
  • result:
    • if failed: error code(s) and message(s) (we should probably specify the errors returned by the verifier)
  • text explanation of the issue

Here's a temporary list of test cases we should provide:

  • format and signature errors:
    • a series of random bytes: parse error
    • a correctly serialized structure with invalid crypto (wrong size): parse error
    • a correctly serialized structure with invalid crypto (right size, random bytes): signature verification error
    • a correctly serialized structure with valid crypto, but wrong root key: signature verification error
    • correctly serialized, valid crypto and root, random bytes in blocks: parse error
    • correctly serialized, valid crypto and root, valid blocks but wrong order: parse error
  • correct format and signature, logic language serialization errors:
    • how to test an invalid symbol table? symbols not present in the table, blocks referring to symbols defined by later blocks, symbols redefined by later blocks
    • authority block has fact without authority tag: validation error
    • ambient facts without ambient tag: validation error
    • block facts with authority or ambient tag: validation error
    • authority or ambient blocks contain variable type: validation error
    • block at index 0 is not the authority block: validation error
  • caveat verification:
    • block 1 defines fact X, block 2 has a caveat requiring fact X: validation error (the fact X should only be valid when evaluating block 1)
    • ?
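As a possible in-code shape for one test case entry (a sketch in Rust; field names are assumptions, not a specified format):

// hypothetical test case record; errors are kept as plain strings here
struct TestCase {
    title: String,                 // text explanation of the scenario
    token: Vec<u8>,                // serialized token (decoded from base64)
    root_public_key: Vec<u8>,
    verifier_facts: Vec<String>,   // facts provided by the verifier, if any
    verifier_queries: Vec<String>, // queries to run, if any
    expected: Result<(), String>,  // Ok(()) or the expected error code/message
}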

Consider renaming Integer

It's a reserved keyword in Java, which blocks the implementation if we want to be consistent in naming.

I went with "Number" for now

removing the authority tag? Or merging block before validation?

The first block of a token is called the "authority block", and is used to define some rights and rules that will be used when validating the other blocks. Here is the process:

  • load facts and rules from the authority block
  • load ambient facts and rules from the verifier
  • test authority caveats and verifier caveats
  • take a snapshot of the current facts and rules
  • for each following block:
    • start from the snapshot
    • add the block's facts and rules
    • test the block caveats
    • test the verifier caveats

Facts from the authority block (either directly or generated from rules) start with the #authority symbol to differentiate them from facts generated in the other blocks. This was useful in earlier versions, where data generated in one block's validation could be carried over to the next block: we wanted to avoid accidentally increasing rights by adding more authority facts for the validation in the next blocks.

Since the validation of each block is independent from the others, a block adding a fact can only affect its own validation, so I do not think the authority tag is still needed. The only issue I can see is with verifier queries that try to list the facts we can access; but since such a query gets the facts from each block and then returns their intersection, a block adding more facts would have no effect. For queries that return the union of results (like revocation IDs), we would need to be more careful. Maybe we should separately mark "filter queries" and "aggregation queries"?

This can simplify Biscuit significantly:

  • no separation between block and authority caveats in the verifier
  • no separation between block and authority queries in the verifier (like getting revocation ids)
  • no need for separate add_authority_* methods at token creation
  • no more checks for the creation of authority facts in later blocks
  • most examples will look simpler without the authority symbol everywhere

specifying the regular expression features

We should have a clear definition of what is accepted or not in the regular expression constraints, as different engines will have different behaviour. As an example, we probably want to disallow backtracking, since a regex can be provided in an attenuated block (backtracking can lead to high CPU usage).

supporting block level rules?

With biscuit-auth/biscuit-rust#3 I am adding a way to check authority facts from the verifier side, thus allowing tokens with only the authority part. But if we're going to support that use case, we might as well allow block-level caveats in the authority block (for example, an expiration date on the entire token).
Current problem: the block format contains facts and rules, and by convention, in the authority block the rules member contains rules that generate authority facts, while in other blocks that member contains caveats.

I'd propose adding another member, so blocks would contain facts, rules and caveats (a schema sketch follows the lists below).
For the authority block, we would have:

  • facts are authority facts
  • rules generate authority facts
  • caveats apply on authority and ambient facts

For other blocks:

  • facts are block level facts (and not carried from one block's validation to the next)
  • rules generate block level facts
  • caveats apply on authority, ambient and block level facts
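In the serialization format, this would amount to adding a third member to the block message, along these lines (a sketch only, not the actual schema.proto; field numbers are placeholders):

message Block {
  repeated Fact facts = 1;
  repeated Rule rules = 2;   // generate facts scoped to this block
  repeated Rule caveats = 3; // must all succeed during validation
}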

adding serializing method or a print function to sealed biscuit

Hello, I'm working on some use cases with biscuit web tokens and I would like to use the sealed biscuit feature, but I have a problem: after calling the function used to seal the biscuit, I cannot print or serialize the result.

Do you have any suggestions?

Very quick high level review

Hello. Just found biscuit by watching https://docs.rs/releases and decided to review it as you use a few things I'm independently interested in: Ristretto, Macaroons, and Datalog.

Edit: I managed to skip over the "non-goal: revocation". Whoops. I'll leave my feedback though.

The delegation chains embed intermediate public keys in the token; this is sub-optimal for space. Implicit certificates & certificate-less public-key signature schemes suffice for one caveat, though I don't know if it is safe to generalize to multiple hops. I do know that pairings can solve this space problem, at the cost of time. I see you've been looking into pairings already; have you settled on the ristretto-friendly naive chaining?

Global synchronization of clocks is an impractical assumption in an asynchronous distributed system. Despite TLS' misplaced trust on global time, relying on globally synchronous clocks is poor practice. It would be suitable for the validator to ping the appropriate clock for validation.

Revocation may be trivial for macaroons where the single validator never has stale information; but revocation is difficult with distributed validators as they'd all need the up to date revocation state. Again the validator should ping the appropriate party for revocation information.

The two issues above strongly promote asking the normal symmetric mint/validation service to handle all of time, revocation, and validation. The root entity must still be trusted in the asymmetric variant. I do see value in third party validation when the root party is either offline or unable to handle the validation load.

One suitable (but weaker) approach for revocation would be to dedicate a stateful pub/sub service to store the revocation sets, such that neither it nor non-validating users can see the revocation patterns. This service would store and relay a journal of revocation events, with optional bitmap snapshots. The revoking party can only append to the revocation set. The validators can only subscribe to this growing set and detect (and prove) any truncation attacks.

This is an interesting use of Datalog; though I've been thinking about using a DFA (Deterministic Finite Automaton; basically a regular expression). Only the parties adding caveats need the message schema (or subset they attenuate). The automaton may reference validator state such as dynamic sets (I.e. group memberships).

Additionally a DFA (or any join-semilattice) may be reduced such that several caveats flatten to one; stripping redundancies. This might accelerate validators and would be desirable for clients with many caveats. The reduction is both efficient and safe.

Example. Caveats write(*, *) and {read("/foo"),write("/foo", *)}; reduce to write("/foo", *); which matches write("/foo", "bar").

DFAs are much more efficient (when they suffice) but are not as expressive as Datalog. I am not convinced Datalog's complexity brings anything sufficient to warrant its usage for caveats.

define a text syntax for the logic language

The current API uses functions to create facts and rules, but it does not make the Datalog code very readable.

Instead of writing this:

builder.add_authority_fact(&fact("right", &[s("authority"), string("file1"), s("read")]));

it could be nicer to write this:

builder.add_authority_fact("right(#authority, \"file1\", #read)");

Additionally, we could provide a UI to edit rights with text, test things out from a page with the wasm version, etc.

The token printer (cf the samples) already provides a syntax, but we might want to modify it:

  • a fact is made of a name and a tuple of values. In right(#authority, "file1", #read), the name is "right" and the values are inside the parens
  • a value can represent 5 types:
    • 123: integer
    • "abc": string
    • #abc: symbol (a string that is represented by its index in a table; this reduces token size and makes evaluation fast)
    • 0?: variable, used in rules and queries
    • not specified yet: date
  • a rule is made of a head and a body. The head is a fact, the body is a comma-separated list of facts. In caveat1(0?) <- resource(#ambient, 0?) && right(#authority, 0?, #read), caveat1(0?) is the head, separated from the rest by the arrow
  • if a rule puts additional constraints on values, the constraints are added at the end of the rules after a pipe character: expiration(0?) <- time(#ambient, 0?) | 0? <= 1545264000
  • we do not have a syntax for queries yet

Another common Datalog syntax is the following:

  • fact: parent(abc, def)
  • rule: ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
  • query: ?- ancestor(bill, X).

cf https://docs.racket-lang.org/datalog/datalog.html for a BNF grammar of Datalog

Example API

The logic language we're exploring in #11 can already be useful, but it might be better to have an easy-to-use API that covers most use cases, and then lets people with more specific needs use the low-level tools (the example is a bit rust-y, but I'm also thinking of other languages).

Basic API:

// generate a root key pair
fn create_key() -> KeyPair
// generate a root token
fn create_token(root_key: &KeyPair, authority: &[Fact], caveats: &[Caveat]) -> Token
// derive a new, attenuated token (the source token is needed as input)
fn derive_token(token: &Token, key: &KeyPair, caveats: &[Caveat]) -> Token
// verify
fn verify_token(root_public_key: &PublicKey, token: &Token, ambient: &[Fact], query: &Query) -> bool

Generating authority facts?

Generating a caveat?

Generating a query?

Add license

Please add a license to this repo and biscuit-rust/java. Thanks!

Example logic queries

With @clementd-fretlink, we've been looking at a Datalog-like language to express caveats.
Here are some ideas that emerged:

  • we want to avoid rights reactivation that could appear if we implemented negation. On the other hand, a limited model like Datalog with constraints is interesting, because it can express limits on ambient data while keeping the logic simple
  • the authority field in the first block defines some facts
  • the ambient data defines some facts (IP address, time, http headers, etc)
  • the verifier provides some facts
  • the verifier provides some rules that can be used in caveats
  • caveats are queries that use rules and constraints on facts
  • we start from the initial set of facts (authority, ambient and verifier provided) at each caveat evaluation, to avoid new facts being kept between applications and possibly messing up validation

Current questions:

  • could one block define its own rules to make writing caveats a bit easier?
  • could one block define some facts and rules that could be reused in further tokens? (not sure I want to introduce ordering again here)
  • should facts have a scope, to prevent some rules generating them? (example: the ambient facts set should be locked, no new facts should appear)

To make it easier to reason about this language, I propose that we write some example facts, rules and queries in that issue.

First example:

authority=[right(file1, read), right(file2, read), right(file1, write)]
----------
caveat1 = resource(X) & operation(read) & right(X, read)  // restrict to read operations
----------
caveat2 = resource(file1)  // restrict to file1 resource

With resource(file1), operation(read) as ambient facts, caveat1 succeeds because resource(file1) & operation(read) & right(file1, read) is true, and caveat2 succeeds because the fact resource(file1) is present.

With resource(file1), operation(write), caveat1 fails but caveat2 succeeds.
With resource(file2), operation(read), caveat1 succeeds but caveat2 fails.

file format to store and load facts, rules and caveats

Right now, when generating a new token or a verifier, all of the facts and rules are entered manually in the code. We might want to load them from a file or memory instead. Since we already have protobuf definitions to transport them inside a token, I'd propose we reuse protobuf to store them.

allowing variables in constraints

Right now the values used in constraints are statically defined in the rule or caveat, but it could be useful to make them more dynamic.
As an example, if we wanted to express that the owner of a folder has all rights on any file in that folder or its subfolders, we would be able to write the following rule (with variable names instead of numbers, for readability):

right($path, $operation) <- user($user_id), owner($user_id, $folder), operation($operation),
  resource($path) @ $path matches $folder*

With that rule, we can define a path prefix constraint using the folder defined in the owner fact.

With this we could also compare integer variables between facts, and I think there could be some applications with set constraints too.

This would be a breaking change in the binary format.

explicit run time limits for the datalog engine

Right now, we put some hardcoded limits on the number of iterations in a "world run", but it would be better to leave that decision to the user. We can get a token with degenerate cases that produce a lot of facts or require a large number of iterations, so there should be a hard limit there.

A few options (see the sketch after this list):

  • limit on the number of facts produced
  • limit on the execution time
  • if we want a safer way, optionally forbid facts and rules in blocks other than authority (so those blocks would only contain caveats)
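As a sketch of what user-configurable limits could look like (the names here are assumptions, not a settled API):

// hypothetical evaluation limits handed to the datalog engine by the user
use std::time::Duration;

pub struct RunLimits {
    pub max_facts: u64,      // cap on the number of facts produced in a world run
    pub max_iterations: u64, // cap on rule application passes
    pub max_time: Duration,  // wall-clock budget for evaluation
}

impl Default for RunLimits {
    fn default() -> Self {
        RunLimits {
            max_facts: 1000,
            max_iterations: 100,
            max_time: Duration::from_millis(1),
        }
    }
}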

handling different format versions

To ease migrations in case of breaking changes, I'd like to add a version field (uint32) to the format. I know that protobuf is designed for compatibility between versions, but I do not want to see a case where a new field gets ignored by an old Biscuit version, as that might create security issues.

One decision to make, do we put the version field:

  • in the wrapping Biscuit message that contains the serialized blocks and the signature (the version field would not be signed)
  • in the authority block (I'm ok with having a special case for deserializing the authority block, it could carry various configuration data)

support caveats with multiple queries

Currently a caveat contains only one query, so it is difficult to model caveats using OR logic.
Previously, it was possible to make intermediate rules for each case and have a caveat that checks the rules' results, but with the recent block merge, any block could just generate the facts for those rules in advance.
This would change the format at https://github.com/CleverCloud/biscuit/blob/master/schema.proto#L28 to have each caveat contain a list of rules.
When executed, the caveat would try each rule until one of them succeeds, or return an error if all failed.
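As a sketch of the proposed semantics (hypothetical syntax), a caveat would carry a list of alternative queries and succeed as soon as one of them matches:

# hypothetical: one caveat, two queries, OR semantics
caveat1 = [
  *valid($0) <- resource(#ambient, $0), right(#authority, $0, #read)
  *valid($0) <- resource(#ambient, $0), owner(#ambient, #alice, $0)
]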

Choose a cryptographic scheme

We have 4 different methods we can use:

  • using pairings
  • using verifiable random functions (we actually have 2 versions of this one, with different size and speed)
  • using a kind of PKI with a final challenge response
  • using aggregated gamma signatures

We have to choose one, depending on a few criteria:

  • the most important is the security audit: we need to have a proper audit of biscuit's design. If one of the crypto designs is flawed and cannot be fixed, we will remove it from the list
  • performance: the overhead of verifying a token should be small enough (signing should not be too slow, but it's not the limiting factor here)
  • size: the token should not grow too much, so the crypto design should not have too much size overhead
  • availability of good implementations or base libraries in various languages (mainly Rust and Java for now)

While the pairings solution is interesting, it is quite slow (cf. the benchmark results at https://github.com/CleverCloud/biscuit/tree/master/code#benchmarks-summary ). Also, there's no guarantee of finding high-quality crypto libraries to use in various languages, so I think we'll eliminate it right away.

The other solutions are fast enough, and based on the Ristretto group. There's a good implementation in Rust, curve25519-dalek, and a new one in Java, curve25519-elisabeth. Using this group reduces the risk of implementation errors.

The challenge token solution makes fast and small tokens, but its behaviour has an annoying tradeoff: when we want to verify the token, an additional operation must be performed that prevents further attenuation. This might not be what we want for the token.

The gamma signatures solution produces short signatures and is faster than most other schemes.

playground

This isn't an issue, but more a contribution. We have worked on a playground equivalent to jwt.io. We've seen it was on the roadmap, so hopefully it can help.

It is available at https://www.biscuitsec.org and the code is open source. In the readme we credited clevercloud, but please let us know if you want the wording or the display to change.

We also tried to vary the examples, and to explain through everyday-life scenarios (compared to your main use case for pulsar).

Anyway, thanks for the biscuit, we think it's a great project.

Fabien & Mohamed

Using the symbol table for variable names in rules

Variables are defined with an index, and this does not make them very readable. It would help to have variable names instead.
We could reuse the symbol table to provide those: a rule can still be stored with variables containing numbers, but those numbers could be indexes in the symbol table. They would be converted between names and numbers when generating the blocks or printing the rules.

The following rule:

*caveat1($0, $1) <- right(#authority, $0, $1), resource(#ambient, $0), operation(#ambient, $1)

could be printed as:

*caveat1($file, $operation) <- right(#authority, $file, $operation), resource(#ambient, $file), operation(#ambient, $operation)

1.0 release

Biscuit has been in development for two years and is now used in production. Most of the initial roadmap is done (we still need to commission an audit).

So it will soon be time for a stable release and more public communication. Before that, I'd prefer that we clean things up a bit: there are design decisions that were left alone because fixing them would be breaking changes, and a 1.0 release is the right time to take care of them (here I consider a breaking change to be anything that would invalidate how currently existing tokens are validated).

This will be a meta issue for the work needed towards the major release:

  • vocabulary issues:
    • #23 Consider renaming larger
    • #24 Consider renaming In/NotIn
    • #52 renaming atom (not a breaking change, this would mainly change the API in libraries)
  • #54 text syntax changes (changes to the parser and printer)
  • #63 renaming In to Contains, removing NotIn
  • #55 Protobuf schema simplification
  • #59 adding a version field to the Protobuf format
  • #1 revocation identifiers
  • new data types. There might be more we could add. I'd also like to consider whether constraints could be folded into the expression type (by making boolean expressions)
    • #51 new atom type: set
    • #38 new atom type: expression
    • #61 new type: boolean
    • #62 "deny" caveats vs "success caveats"
  • Datalog evaluation:
    • #50 specifying the regular expression features
    • #47 allowing variables in constraints (need to decide if we allow it or not)
    • #53 explicit run time limits for the datalog engine
    • #58 Using the symbol table for strings too
    • #56 Datalog restrictions in blocks

I'll make a branch of this repo with updated test samples once I've started that work on the Rust library.

See anything else we would need?
cc @divarvel @daeMOn63 @titanous @Keruspe @KannarFr @BlackYoup @meh

Serialization format

It will soon be time to define the serialization format for biscuit tokens. To sum up the current ideas:

  • using a binary format to make the token more compact
  • using symbol tables to avoid repeating symbols in the token
  • TAI64 for dates looks like a good idea
  • prefix varints to represent integers (the number of leading zeros indicates the number of bytes encoding the number), since most numbers will be very small; a toy encoder is sketched after this list
  • UTF8 for strings
  • add a version number to the format
  • the authority block should probably be separated from the other blocks
  • blocks carry an index that should be checked (because their order is important for the symbol table)
  • we might need two versions of the signature part: one that works with asymmetric crypto, one that works with symmetric crypto
  • beware formats that have no guarantee on field order in serialization, since we hash those fields. A workaround for this issue when appending new blocks: start from an already serialized token instead of reserializing everything
  • it might be better to have a format that can support nested messages. We have a wrapping structure that contains the blocks and signature, and if that one checks out, we deserialize the blocks
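For the prefix varint idea, here is a toy encoder in Rust (an illustration of the principle, not a specified format):

// the count of leading zero bits in the first byte tells the decoder how
// many extra bytes follow, so small numbers cost a single byte
fn prefix_varint_encode(n: u64) -> Vec<u8> {
    for extra in 0u32..8 {
        let payload_bits = 7 * (extra + 1); // l bytes carry 7*l payload bits
        if n < 1u64 << payload_bits {
            let marker = 1u64 << payload_bits; // the 1 bit ending the zero prefix
            let total = (extra + 1) as usize;
            return (marker | n).to_be_bytes()[8 - total..].to_vec();
        }
    }
    // values needing more than 56 bits: a 0x00 prefix byte, then 8 raw bytes
    let mut out = vec![0u8];
    out.extend_from_slice(&n.to_be_bytes());
    out
}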
