
enclaver's Introduction

Enclaver is an open source toolkit created to enable easy adoption of software enclaves for new and existing backend software.

Enclaves provide several critical features for operating software which processes sensitive data, including:

  • Isolation: Enclaves enable a deny-by-default approach to accessing process memory. Software running in an enclave can expose interfaces for accessing specific data, while disallowing humans or other software on the same computer from reading arbitrary data from memory.
  • Attestation: Enclaves make it possible to determine the exact identity and configuration of software running in an enclave.
  • Network Restrictions: External communication is limited and controlled. The network policy is built into the image and therefore the software attestation.

The demos in this repository show how your apps can use these unique features to improve privacy and security.

Project State

Enclaver is currently in beta and should be used cautiously in production. Enclaver currently only supports AWS Nitro Enclaves; support for Azure Confidential VMs, GCP Confidential VMs, and arbitrary SGX and OP-TEE enclaves is on the roadmap.

Architecture

Read the architecture doc for the full details. Enclaver consists of 3 interrelated pieces of software:

  • enclaver CLI for build and run
  • “Outer Proxy” + Enclave Supervisor
  • “Inner Proxy” + Process Supervisor

FAQ

See the FAQ for common questions and a comparison of Enclaver to similar technologies.

Reporting Security Bugs

Report security bugs confidentially at https://edgebit.io/contact

enclaver's People

Contributors

crawford, eyakubovich, robszumski, russellhaering


enclaver's Issues

Pull image during `enclaver build` if it's not on local disk

On a machine with no docker images, using this config file:

version: v1
image: "us-docker.pkg.dev/edgebit-containers/containers/no-fly-list:4ea61b5"
name: "example-enclave"

I would expect my image to be pulled, since it doesn't exist locally:

$ sudo ./enclaver build -f enclaver.yaml --eif-only app.eif
error: Docker responded with status code 404: no such image: us-docker.pkg.dev/edgebit-containers/containers/no-fly-list:4ea61b5: No such image: us-docker.pkg.dev/edgebit-containers/containers/no-fly-list:4ea61b5
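
Until the build does this automatically, a likely workaround is to pull the source image by hand before building (a sketch; the image name is taken from the config above):

$ docker pull us-docker.pkg.dev/edgebit-containers/containers/no-fly-list:4ea61b5
$ sudo ./enclaver build -f enclaver.yaml --eif-only app.eif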

Decide/implement flags vs env vars vs manifest file for configuration

In the spirit of doc-driven brainstorming, the following was written up:

Enclaver accepts configuration from command line flags, environment variables, and from a configuration file for builds. When all three are present, the order of precedence is flag > env var > config file.

All environment variables are prefixed with ENC_ and exclusively use underscores. Flags exclusively use dashes. Configuration file parameters exclusively use underscores. For example, the --cpu-count flag and the ENC_CPU_COUNT environment variable configure the same behavior.

When overriding a configuration file parameter that is nested, like image > from, flatten it like so: --image-from or ENC_IMAGE_FROM.

We need to decide how we want to implement this.
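
As a concrete sketch of the proposed scheme (nothing here is implemented yet; the cpu_count parameter is assumed purely for illustration):

# config file sets cpu_count, the env var overrides it, and the flag wins over both
$ grep cpu_count enclaver.yaml
cpu_count: 1
$ ENC_CPU_COUNT=2 enclaver run --cpu-count 4 example-enclave   # effective value: 4
$ ENC_CPU_COUNT=2 enclaver run example-enclave                 # effective value: 2

# nested config parameters such as image > from flatten the same way
$ ENC_IMAGE_FROM=foobar:latest enclaver build
$ enclaver build --image-from foobar:latest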

Automatically manipulate Nitro allocator

Should `enclaver run` automatically manage the allocator settings, up to a reasonable default limit, say 80% of the available RAM?

Currently you modify /etc/nitro_enclaves/allocator.yaml by hand. I know the vCPUs are hot-unplugged once allocated, but is the same true of the RAM? If so, this is probably not a great idea, but leaving it manual adds a ton of overhead for managing the value. In a Kubernetes world, this is going to be another thing that the scheduler will need to understand.
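
For reference, the file being edited by hand today looks roughly like this (example values):

# /etc/nitro_enclaves/allocator.yaml
---
# memory (in MiB) and vCPUs reserved for enclaves on this host
memory_mib: 4096
cpu_count: 2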

systemd
Aside from whether this is a good idea UX-wise, what will it do to our system dependency chain? The currently documented unit contains Requires=nitro-enclaves-allocator.service; if the ExecStart within our unit is also going to need to start the allocator service, will that immediately fail our unit? We do have Restart=always, but it would still be weird.
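
For context, the dependency chain under discussion looks roughly like this (a sketch; the unit name and ExecStart path are placeholders):

# enclaver.service (sketch)
[Unit]
Requires=nitro-enclaves-allocator.service
After=nitro-enclaves-allocator.service

[Service]
ExecStart=/usr/local/bin/enclaver run example-enclave
Restart=always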

In the past, restarting the allocator unit killed my connection to the box and, I believe, restarted the entire machine:

$ sudo systemctl restart nitro-enclaves-allocator.service
client_loop: send disconnect: Broken pipe

# reconnect...
$ ssh ec2-user@ipaddress
$ uptime
 23:42:41 up 0 min,  1 user,  load average: 0.19, 0.07, 0.02

Prohibit app access to the vsock

To enforce the network policy reliably, the app needs to always go through the proxy via localhost. As such, it needs to be prohibited from connecting via the vsock directly.

Create a way to test an app's proxy support and egress policies locally

In thinking through what someone would want out of a local enclave simulator, I think the main things that could give people trouble are:

  1. Configuring their app to use the odyn proxy
  2. Tailoring a policy for their app

There are a few more, like testing KMS integration, that aren't easy to simulate locally, but these two actually are. One approach might be:

  1. Add a flag to Odyn that causes it to run in "simulator" mode, where it forwards traffic directly out to the internet instead of over a vsock (or forwards traffic to the "outside-the-enclave" half of the proxy over TCP instead of vsock), but still enforces policy
  2. Create a CLI command which builds a container almost like the one which would be converted to an EIF, but with the Odyn "simulator" flag set, and then just runs that container.

Network transmissions can get interrupted on shutdown

When the process within an enclave sends a large amount of data over an open socket and then exits, not all of the data will be sent before the enclave exits. From the client's point of view, the response to their request is prematurely terminated. I'm guessing this is because odyn isn't flushing its buffers before exiting, but I haven't had a chance to investigate.

Odyn should stay up in case of its own errors

To make troubleshooting easier, if odyn fails during startup but has made it past starting the log/status service, it should stay up. It should go into a zombie state and report that state via the status endpoint. That will allow the Enclaver process on the outside to collect the status and logs before terminating the enclave.

Review enclave configuration file structure

When working through the architecture and command docs in #19, it was clear to me that we need to tweak a few things in the configuration file.

  1. Use the name "configuration", not "policy", since it's more than just policy now
  2. The image inputs and outputs are a little confusing. What if we embrace nesting to make it a bit clearer:
version: v1
name: example-enclave
image:
  from: foobar:latest
  output_tag: latest
  3. We need to figure out which or all of these should be override-able with flags and what the relationship is. I would propose the order of precedence is flag > env var > config file, and that we log when we see conflicts, or just log all of the detected env vars, since it will be clear which flags are set. For nested items in the config file, we need a consistent scheme for "flattening" them.
  4. Figure out how we want to do dashes and underscores. Flags feel better with dashes (--image-from) but env vars feel better with underscores (ENC_IMAGE_FROM). The policy file could go either way. For prior art, Kubernetes skips this issue by using camel case in its objects (outputTag).

Use labels to annotate Enclaver images

Add two labels to images built by enclaver build:

  1. A label to indicate that the image was built by enclaver
  2. A label indicating the hash of the source image

These have a variety of use cases; for example, enclaver run could:

  1. Refuse to run images which are not enclaver images (to avoid confusion)
  2. Automatically build updated images when the target is out of date
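
A sketch of how this could look on a built image (the label keys are hypothetical, not an existing convention):

# inspect the labels on an image produced by `enclaver build`
$ docker image inspect example-enclave:latest --format '{{json .Config.Labels}}'
{"io.edgebit.enclaver.built-by":"enclaver","io.edgebit.enclaver.source-digest":"sha256:4ea61b5..."}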

Send an EOF on logs socket

To ensure that all the logs have been read by the host side, odyn should send an EOF (shut down the socket).

Once the entrypoint has exited and all the data from the pipe has been read, an EOF marker should be inserted and used to shut down the socket when the client reads past it.

`enclaver run` attestation verification flag

Determine if we think it is useful enough to verify an attestation before running an enclave image:

enclaver run --verify-before-run attestation.json will verify an attestation of an image after fetching it, but before executing it. If the comparison fails, the violating PCRs will be logged and the command will fail with a non-zero exit code.

Given our threat model, this is more of a corruption check, since a hostile host could manipulate this functionality anyway.

  • If we move forward, update the "Verifying Cryptographic Attestations" section in the architecture docs

Docs: Remove all enclaver invocation

Since the container image already contains the enclaver run command, we can remove it from any docker run invocations:

$ docker run \
    --rm \
    --detach \
    --name enclave \
    --volume /var/run/docker.sock:/var/run/docker.sock \
    --device=/dev/nitro_enclaves:/dev/nitro_enclaves:rw \
    --publish 443:443 \
-   us-docker.pkg.dev/edgebit-containers/containers/enclaver:v0.1.0 run \
    us-docker.pkg.dev/edgebit-containers/containers/no-fly-list-enclave:latest

CLI: Build EIFs

Running enclaver build -f POLICY_FILE --eif-only EIF_FILE should:

  1. Load the policy file referenced in the command
  2. Build an EIF, where the userspace filesystem is built from:
    a. The contents of the Docker image referenced in policy.yaml
    b. A byte-for-byte copy of policy.yaml written out at /etc/enclaver/policy.yaml
    c. In the future: an odyn binary written out to /sbin/odyn

In the future, dropping the --eif-only EIF_FILE flag will build an OCI-packaged EIF file instead.
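
Putting that together, a sketch of the invocation and the resulting EIF userspace filesystem (paths from the list above; the odyn path is the future piece):

$ enclaver build -f policy.yaml --eif-only app.eif
# resulting filesystem layout inside app.eif:
#   /                          contents of the Docker image referenced in policy.yaml
#   /etc/enclaver/policy.yaml  byte-for-byte copy of policy.yaml
#   /sbin/odyn                 odyn binary (future)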

Expose an attestation API within an enclave

To make it easier to fetch an attestation document from within an enclave, we can expose an HTTP-based endpoint:

curl -X POST -d '{ "nonce": "...", "user_data": "...", "public_key": "..." }' http://localhost:<port>/attestation

would return the attestation document.

Ability to map volumes to the source app Docker image

Hi 👋

I'm currently spiking Enclaver to run a production Vault cluster on AWS EC2 instances.
I went through the guide at https://edgebit.io/enclaver/docs/0.x/guide-vault/.

I'm still getting my head around the main concepts, so apologies if I'm not using the right terminology, or if my question doesn't make the most sense.

In our case, we'd be interested to map volumes on the source Vault container/image so we can access logs on the EC2 host and send them to CloudWatch, for example.

Since the source app Docker image is wrapped by Enclaver when we run enclaver build, running docker run -v /host/logs:/vault/logs vault:enclave doesn't map the volume on the Vault container, which I now understand.
Is this possible?

`enclaver trust` to output attestation document from a given image

Add a new command to make it easy to get an attestation document from an enclave image. You may want this to compare against what you are running or against what was built in CI, or to use as the input to a KMS policy.

$ enclaver trust registry.example.com/my-app:v1.0
{
  "Measurements": {
    "HashAlgorithm": "Sha384 { ... }",
    "PCR0": "7fb5c55bc2ecbb68ed99a13d7122abfc0666b926a79d5379bc58b9445c84217f59cfdd36c08b2c79552928702efe23e4",
    "PCR1": "235c9e6050abf6b993c915505f3220e2d82b51aff830ad14cbecc2eec1bf0b4ae749d311c663f464cde9f718acca5286",
    "PCR2": "0f0ac32c300289e872e6ac4d19b0b5ac4a9b020c98295643ff3978610750ce6a86f7edff24e3c0a4a445f2ff8a9ea79d",
    "PCR8": "70da58334a884328944cd806127c7784677ab60a154249fd21546a217299ccfa1ebfe4fa96a163bf41d3bcfaebe68f6f"
  }
}

Remove mentions from the docs if this does not move forward.

For folks testing on a personal AWS account, it would be nice to have this command take in an existing KMS policy to update, similar to this pseudocode:

$ aws kms get-policy | enclaver trust --kms | aws kms update-policy
Policy updated

enclaver run could also take in a policy to check, as explored in #35

Does Enclaver support mTLS / self-signed certificates?

Hi, it's me again 👋

I'm spiking running a production-grade Vault cluster in Enclaver.

I'm having issues joining a second node to a cluster, at the very last step where the existing leader node needs to communicate to the new-joining node with mTLS.
The client certificate is self-signed and generated by Vault; see an excerpt from the official documentation:

[...]
For the request forwarding method, the servers need direct communication with each other. In order to perform this securely, the active node also advertises, via the encrypted data store entry, a newly-generated private key (ECDSA-P521) and a newly-generated self-signed certificate designated for client and server authentication. Each standby uses the private key and certificate to open a mutually-authenticated TLS 1.2 connection to the active node via the advertised cluster address. When client requests come in, the requests are serialized, sent over this TLS-protected communication channel, and acted upon by the active node. The active node then returns a response to the standby, which sends the response back to the requesting client.

Unfortunately, this communication fails with the following error message from Vault:

{
  "@level": "error",
  "@message": "failed to heartbeat to",
  "@module": "storage.raft",
  "@timestamp": "2023-12-01T09:15:23.527220Z",
  "backoff time": 2500000000,
  "error": "dial tcp 10.1.54.175:8201: connect: network is unreachable",
  "peer": "10.1.54.175:8201"
}

Things I've confirmed:

  • The IP address is correct.

  • The nodes can communicate over HTTP on port 8200, since prior to that last step, the new-joining node makes an HTTP call to the existing leader node to submit its desire to join the cluster.

  • The Enclaver manifest file allows both ingress on port 8201 for the existing leader and egress to the VPC CIDR for the new-joining node:

    # https://edgebit.io/enclaver/docs/0.x/manifest/
    version: v1
    name: "enclaver-vault"
    
    sources:
      # Name and tag of the Docker container that contains the application code
      app: "$SOURCE_DOCKER_IMAGE_NAME"
    
    # Name and tag of the Docker container outputted from the build process
    target: "$TARGET_DOCKER_IMAGE_NAME"
    
    ingress:
      # Vault listens on both 8200 (API) and 8201 (node-to-node communication)
      - listen_port: 8200
      - listen_port: 8201
    
    egress:
      allow:
        # IMDS
        - 169.254.169.254
        # EC2 APIs for auto-join discovery
        - ec2.*.amazonaws.com
        # VPC CIDR
        - 10.1.0.0/16
        # EC2 host (I don't think we need this one)
        - host
    
    kms_proxy:
      listen_port: 9999
    
    defaults:
      memory_mb: 2000
  • I tried the same setup by running the "bare" source Docker images and the node-to-node communication worked fine, i.e. the second node completed joining the cluster.

Do you know if there's something in Enclaver that would prevent this from happening, or if maybe there's a way to make this work?

Thanks, please let me know if you need additional information.

Enclaver should support "cross-compiling" images for other architectures

Enclaver should support building images for architectures other than the local one.

I've started work on this here, but am putting it on hold due to trouble making image resolution work.

Currently, the first step of enclaver build is image resolution, where we take various image identifiers (source image, odyn image, wrapper-base image, etc.) and resolve them to a specific image ID. This is all orchestrated through the Docker daemon; enclaver never talks to an image registry directly.

However, the Docker Engine API doesn't seem to have good support for listing or interacting with non-native images.
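
For what it's worth, the daemon can pull and inspect a non-native image explicitly, which might be enough for the resolution step (a sketch; linux/arm64 is just an example target):

$ docker pull --platform linux/arm64 us-docker.pkg.dev/edgebit-containers/containers/no-fly-list:4ea61b5
$ docker image inspect us-docker.pkg.dev/edgebit-containers/containers/no-fly-list:4ea61b5 --format '{{.Architecture}}'
arm64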

An alternative option might be to attempt to interact with registries directly, but:

  1. We'd need to be able to authenticate to the registry (or possibly use this API?).
  2. This would break the ability to build from a local source image, i.e. source images would always have to be pushed to a registry.

Interestingly, Docker itself seems to be moving away from supporting building from local source images with buildx (for example) - but this seems like a pretty critical requirement for most CI pipelines.

Support systemd-notify

Enclaves can take a variable amount of time to start (and stop). Support for systemd's notify mechanism would greatly simplify the lifecycle management.
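
A sketch of what the service definition could look like once enclaver run reports readiness via sd_notify (Type=notify and TimeoutStartSec are standard systemd; the unit name and path are placeholders):

# enclaver.service (sketch)
[Service]
Type=notify
# give slow-starting enclaves more time before systemd gives up
TimeoutStartSec=300
ExecStart=/usr/local/bin/enclaver run example-enclave
Restart=always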

Add support for transparent TCP egress

Enable processes in enclaves to transparently resolve and connect to any hostname or IP address permitted by the egress policy, without explicit use of any proxy.

We can likely accomplish this through some combination of providing a custom DNS resolver, to track the process's mapping of hostnames to IP addresses, and transparent TCP redirection to our proxy via nftables or iptables.

We should support UDP while we’re at it, if it is easy.
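
As a sketch of the redirection half (standard iptables; the proxy port and owner match are placeholders, and exempting the proxy's own traffic by UID is just one way to avoid a redirect loop):

# inside the enclave: send all outbound TCP from the app to the local proxy
iptables -t nat -A OUTPUT -p tcp -m owner --uid-owner proxy -j RETURN
iptables -t nat -A OUTPUT -p tcp -j REDIRECT --to-ports 10000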

Update Vault guide

Holding issue for all of the updates that need to happen in the Vault guide.

Once #79 is merged, the updated Vault guide should close out the issues tracked here.

Clean up intermediate build artifacts in Docker

Investigate removing "stopped containers" and intermediate images which are currently left behind by enclaver build.

At a minimum, aim to enable docker system prune to work correctly (e.g. by not leaving tagged images behind). Better yet, do the pruning ourselves?
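
For reference, the untagged leftovers can already be cleaned up with the standard commands below; tagged intermediate images are what currently escape them:

$ docker container prune   # remove stopped containers
$ docker image prune       # remove dangling (untagged) images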

CLI: Ability to run an EIF

Introduce an enclaver run-eif command which can be pointed at an EIF file, and will start and supervise a Nitro Enclave.

In the future (out of scope for this task), the "host" side of enclave proxying should be integrated into this process.
