confidential-containers / enclave-cc

Process-based Confidential Container Runtime

License: Apache License 2.0

Makefile 2.26% Rust 15.35% Go 65.90% Dockerfile 7.23% Shell 9.09% C 0.17%

enclave-cc's Introduction

Confidential Containers

Welcome to confidential-containers

Confidential Containers is an open source community working to leverage Trusted Execution Environments to protect containers and data and to deliver cloud native confidential computing.

We have a new release every 6 weeks! See Release Notes or Quickstart Guide

Our key considerations are:

Allow cloud native application owners to enforce application security requirements
Transparent deployment of unmodified containers
Support for multiple TEE and hardware platforms
A trust model which separates Cloud Service Providers (CSPs) from guest applications
Least privilege principles for the Kubernetes cluster administration capabilities which impact delivering Confidential Computing for guest applications or data inside the TEE

Get started quickly...

Kubernetes Operator for Confidential Computing : An operator to deploy confidential containers runtime (and required configs) on a Kubernetes cluster

Further Detail

Contribute...

CONTRIBUTING

License

enclave-cc's Issues

Roadmap to support new image format for eaa-kbc

As stated in #108, we introduced a unified format for encrypted images and a new mechanism to identify KBS resources such as decryption keys. The following work is needed to support this encrypted-image enhancement.

For eaa-kbc

Implement verdictd to support the new resource uri scheme
inclavare-containers/verdictd#29
confidential-containers/attestation-agent#152
update the image-rs version for enclave-agent
prepare new test images
#108
update ci for eaa-kbc

image pull failures with multi-layer images

I have debugged a problem where image pull fails for an image with 8 layers.

image-rs creates a pull thread for each layer, and for some reason Occlum ends up in a lockup state with 8 threads. I tested a custom image-rs version that uses at most 4 threads, but I also needed to give Occlum more resources to finally make it work:

-.resource_limits.kernel_space_heap_size= "600MB" |
+.resource_limits.kernel_space_heap_size= "1024MB" |
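
The workaround described above, capping the number of concurrent layer pulls, can be sketched with scoped standard-library threads. This is only an illustration and not the image-rs change itself: for_each_bounded and the closure standing in for a layer pull are hypothetical names, and image-rs may use a different concurrency mechanism.

```rust
use std::thread;

/// Runs `work` on every item, with at most `max_workers` threads alive at once.
/// Hypothetical helper for illustration; not part of image-rs.
fn for_each_bounded<T, R, F>(items: &[T], max_workers: usize, work: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    // Share the closure by reference so every worker can call it.
    let work = &work;
    let mut results = Vec::with_capacity(items.len());
    // Process the items in batches; each batch spawns at most `max_workers` threads
    // and is fully joined before the next batch starts.
    for batch in items.chunks(max_workers.max(1)) {
        let batch_results: Vec<R> = thread::scope(|s| {
            let handles: Vec<_> = batch
                .iter()
                .map(|item| s.spawn(move || work(item)))
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("layer worker panicked"))
                .collect()
        });
        results.extend(batch_results);
    }
    results
}
```

With 8 layers and max_workers set to 4, the layers are handled in two batches of four, and results come back in layer order.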

TODO:

confidential-containers/guest-components#153
update Occlum resource limits.

Gramine PAL API adaption

adapt enclave-agent to containerd Transfer service

A quote from containerd:
"The transfer service provides a simple interface to transfer artifact objects between any source and destination. This allows for pull and push operations to be done in containerd whether requested from clients or plugins. It is experimental in this release to allow for further plugin development and integration into existing plugins.

See the Transfer Docs"

consider moving to a per node GRPC service
support layer sharing between containers

update crate with official image-rs

Update the crate to the official image-rs once confidential-containers/guest-components#37 is merged. The current dependency is:
image-rs = { git = "https://github.com/zhiwei-intel-h/image-rs", branch = "dev-occlum-adaption", features = ["occlum_feature"], default-features = false }

import shim-rune with a clean base from IC project

build: use APT preferences to force SGX PSW and DCAP versions to what Occlum prefers

something like (but with an ARG override option) :

+++ b/tools/packaging/build/agent-enclave-bundle/Dockerfile
@@ -13,6 +13,7 @@ RUN apt-get update && \
 RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
 RUN echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-sgx.gpg] https://download.01.org/intel-sgx/sgx_repo/ubuntu focal main" | tee -a /etc/apt/sources.list.d/intel-sgx.list \
  && wget -qO - https://download.01.org/intel-sgx/sgx_repo/ubuntu/intel-sgx-deb.key | gpg --dearmor --output /usr/share/keyrings/intel-sgx.gpg \
+ && wget -qO /etc/apt/preferences https://download.01.org/intel-sgx/sgx_repo/ubuntu/apt_preference_files/99sgx_2_17_100_focal_custom_version.cfg \
  && apt-get update \
  && env DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
     libsgx-dcap-ql \
@@ -44,7 +45,7 @@ RUN env DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommend
     occlum

integrate cosign signature verification feature into enclave-cc

The cosign image signature verification feature in image-rs was released in CoCo v0.2.0. This issue requests enabling that feature in enclave-cc and making sure the signature verification method can be configured and used in the e2e and operator CI.

dependencies and status:

dependency 1: rats-tls build into agent (#48)
dependency 2: enable image-rs cosign feature for agent (#87)
dependency 3: use updated verdictd supporting cosign resources
dependency 4: agent should fix the bug of agent reading config (#51 (comment))
dependency 5: encrypted and cosigned image for testing purpose: docker.io/eqmcc/helloworld_enc:latest, in CI if kata-cc has same kind of image and resource, we can use that for consistency

update to combined image-rs+ocicrypt+AA repo

Let's have a separate ticket to ensure we move to the combined image-rs+ocicrypt+AA repo asap

secure security_validate policy

After #83, we need to work out how to secure the security_validate pull policy.

Add a CI pipeline as close to the Kata Containers one as possible

After chatting with @fidencio about the confidential-containers first release CI/CD requirements we felt that this would be an essential step for enclave-cc to integrate into the CC release.

Description

In kata-containers/kata-containers#3992 the Kata Containers project is creating a kata-deploy image that can be used in the operator payload. It also already has some Kubernetes CC integration tests as well as a setup to install the prerequisites.

In order to integrate Enclave-CC into the CC operator, it needs to build its own payload image. To demonstrate that Enclave-CC and 'Kata-CC' offer similar levels of functionality, the Kata CC tests (or as close as possible) should be run on Enclave-CC.

integrate Gramine into enclave-cc

This issue tracks the tasks and status of integrating Gramine into enclave-cc, as it requires cooperation from other components and contributors.
Gramine integration includes the following tasks:

update documentation to reflect new features of enclave-cc

With an external KBS (EAA + verdictd), cosign signature verification, and other capabilities integrated into enclave-cc, the documentation needs to be updated to show how end users can use these new features.

Resolve FOSSA Failure

The FOSSA bot is reporting a failure.

github.com/cilium/ebpf (v0.9.1), Golang, policy flag: https://app.fossa.com/projects/git%2Bgithub.com%2Fconfidential-containers%2Fenclave-cc/refs/branch/main/540d6a00bbb0b44e31471265ae03d10fb8f34c3a/issues/licensing/2444275
Cached by the Go Module Proxy at Tue, 19 Jul 2022 09:51:23 GMT

It might be a false positive on the cilium dependency, or that dependency may need to be removed.

It looks like in order to get into the issue inside fossabot you need to give it invasive access to your github account. Might be worth looking at using Snyk rather than FOSSA.

Install the RATS-TLS library in compile env to fix dependency bugs.

Although PR #48 installs RATS-TLS in the runtime environment, building an enclave agent against the latest image-rs, which depends on RATS-TLS, requires RATS-TLS in the compile environment too.

Otherwise, ld complains that it cannot find rats_tls, as shown here:

 = note: /usr/bin/ld: cannot find -lrats_tls
          collect2: error: ld returned 1 exit status

The build Dockerfile should install rats-tls in the builder stage. It may look like this:

FROM rust:1.63-bullseye as builder


RUN apt-get update && \
    env DEBIAN_FRONTEND=noninteractive apt-get install -y \
    protobuf-compiler

# FIX: install rats-tls
RUN git clone --depth 1 https://github.com/inclavare-containers/rats-tls.git && \
    cd rats-tls && \
    cmake -DRATS_TLS_BUILD_MODE="occlum" -DBUILD_SAMPLES=on -H. -Bbuild && \
    make -C build install

# Build enclave-agent
COPY src/ /enclave-cc/src/
RUN cd /enclave-cc/src/enclave-agent && \
    rustup component add rustfmt && \
    make

# Start preparing enclave-agent "bundle"
FROM ubuntu:20.04
...

Designs the PAL API v4 changes

This issue is used to address the definition of PAL API v4 for enclave-cc.

Update Quickstart for v0.8

https://github.com/confidential-containers/community/blob/main/guides/enclave-cc.md
Mentions verdictd. Should we update this to say cc-kbc?

Also maybe just point them to one of our test files?
https://github.com/confidential-containers/enclave-cc/blob/main/test/e2e/decrypt_config-HW-cc-kbc.conf

We had some feedback that it wasn't clear what to use for IP and Port, for example. So just pointing people to something that will work might be best for the quickstart.

improve payload image creation

The current payload image tooling in tools/packaging was drafted fairly quickly before the release, and there are a few areas for improvement:

build shim-rune in a container (#68)
build agent-enclave by COPYing the code rather than git cloning (#69)
move init maintenance to this repo (#43)
build init by COPYing the code rather than git cloning (#69)

improve CI test coverage with real-world containers

We currently have just a very simple (and small) hello-world image. Improve test coverage by adding new images:

sample redis deployment
sample image from kata-cc
sample HUGE (1G+) image deployment?

update to Occlum NGO

adapt shim-rune to per pod unix-domain-socket (#18)
update agent-instance and boot-instance builds to install official Occlum debian packages

make agent listen on unix socket instead of all devices

Occlum needs to add support for Unix Domain Sockets
Once that is added, update the agent...

Update src/enclave-agent/Cargo.toml to use upstream ttrpc instead of haosanzi's branch

ttrpc = { git = "https://github.com/haosanzi/ttrpc-rust", features = ["async"] }

Fix TODO comment in agent src/enclave-agent/src/main.rs
// TODO: will replace with unix socket
const SOCK_ADDR: &str = "tcp://0.0.0.0:7788";
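
One way to prepare for that switch is to make the agent treat the listen address as a value with an explicit scheme, so moving from TCP to a Unix socket becomes a configuration change. The sketch below is an assumption for illustration only: ListenAddr, parse_sock_addr, and the example socket path are hypothetical and are not taken from enclave-agent or ttrpc.

```rust
/// Where the agent should listen. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
enum ListenAddr {
    /// "host:port" part of a tcp:// address.
    Tcp(String),
    /// Absolute filesystem path of a unix:// address.
    Unix(String),
}

/// Splits an address such as "tcp://0.0.0.0:7788" into scheme and target.
fn parse_sock_addr(addr: &str) -> Result<ListenAddr, String> {
    if let Some(host_port) = addr.strip_prefix("tcp://") {
        if host_port.is_empty() {
            return Err(format!("missing host:port in {}", addr));
        }
        Ok(ListenAddr::Tcp(host_port.to_string()))
    } else if let Some(path) = addr.strip_prefix("unix://") {
        // Require an absolute path so the socket location is unambiguous.
        if !path.starts_with('/') {
            return Err(format!("unix socket path must be absolute: {}", addr));
        }
        Ok(ListenAddr::Unix(path.to_string()))
    } else {
        Err(format!("unsupported address scheme: {}", addr))
    }
}
```

With this shape, the current constant parses as a Tcp address, and a per-pod socket path (hypothetical here) would parse as Unix once Occlum supports Unix domain sockets.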

Enclave-CC development status for the first CoCo release

Let's track enclave-cc development status here for the first CoCo release. It will record what we have done, what we have left and potential issues.

Components

shim-rune
enclave-agent
image-rs
ocicrypt-rs
attestation agent
operator

Enclave-cc development status

Design topic collections

This issue collects the topics about the detailed sub-system/module designs for enclave-cc. The high level arch design will add a section to refer to these topics. Please feel free to contribute your topic.

new component: agent enclave (formerly stub enclave)
shim-rune changes @haosanzi
rune and PAL API v4 changes @YangLiang3
Local Attestation protocol for FUSE key transmission @bigdata-memory
FUSE encryption system

References

Development plan: #2
High level design doc: #1

RFC: Use runc in the first CoCo release

We have worked on getting the agent enclave and boot instance bundles installed using the operator. These are easy as they just involve copying the pre-built bundles to the host filesystem. Similarly, shim-rune installation is straightforward as it's just a stand-alone Go binary.

However, installing rune (and all of its dependencies) in a distro-agnostic way using the operator is currently not possible. In addition, a simple enclave-cc deployment seems to work with just runc.

Since the enclave-cc arch diagram talks about rune, I thought it would make sense to submit this proposal that we start with just runc.

Any feedback?

$ kubectl logs enclave-cc-pod
["init"]
Hello world!

Hello world!
...
$ kubectl get pod enclave-cc-pod -o json | jq '.spec.containers[0]'
{
  "command": [
    "/run/rune/boot_instance/build/bin/occlum-run",
    "/bin/hello_world"
  ],
  "env": [
    {
      "name": "LD_LIBRARY_PATH",
      "value": "/run/rune/boot_instance/build/lib:/lib/x86_64-linux-gnu/:/usr/lib/x86_64-linux-gnu/"
    }
  ],
  "image": "docker.io/huaijin20191223/scratch-base:v1.8",
  "imagePullPolicy": "IfNotPresent",
  "name": "hello-world",
  "resources": {
    "limits": {
      "sgx.intel.com/enclave": "1"
    },
    "requests": {
      "sgx.intel.com/enclave": "1"
    }
  },
  "securityContext": {
    "capabilities": {
      "add": [
        "IPC_LOCK"
      ]
    }
  },
  "terminationMessagePath": "/dev/termination-log",
  "terminationMessagePolicy": "File",
  "volumeMounts": [
    {
      "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
      "name": "kube-api-access-q5dnw",
      "readOnly": true
    }
  ],
  "workingDir": "/run/rune/boot_instance/"
}

[RFC] Development plan

I created this issue to gather development resources and track the task list. At this moment, the IC team from Ali and the Intel developers covering the container runtime parts will explicitly participate in this project. I'm also glad to add more tasks detailing our work according to feedback.

Components

shim-rune @haosanzi
rune and PAL API v4 @YangLiang3, @bigdata-memory
agent-enclave @YangLiang3 @arronwy @zhiwei-intel-h
Gramine (including PAL adaption) @ying2liu, @bigdata-memory
Occlum (including PAL adaption) @qzheng527
FUSE encryption fs
CI/CD

Milestone 1: Initial PoC

The goal is to enable the enclave-cc architecture with a LibOS to launch an unencrypted/unsigned hello-world container image. The first LibOS to support this milestone is Occlum. For Gramine, the PAL API adaption and decoupling design still need to be discussed.

agent-enclave can pull an unencrypted/unsigned image and unpack it as an OCI bundle to unprotected storage. @zhiwei-intel-h @YangLiang3
the LibOS in the app enclave can mount the OCI bundle from unprotected storage as the app container rootfs. @zhiwei-intel-h @qzheng527 @YangLiang3
shim-rune can launch agent-enclave during Pod creation. @haosanzi
demo initial enclave-cc PoC based on Occlum (#8 ) @YangLiang3 @qzheng527 @zhiwei-intel-h @haosanzi
draft local attestation design doc describing the protocol between agent-enclave and app-enclave for exchanging the FUSE key. @bigdata-memory

Milestone 2: initial code finalization

The primary goal is to submit and review the initial PoC code base, and enable container image protections.

import shim-rune (#4 ) component with a clean base from IC project. @haosanzi @zhiwei-intel-h
integrate ocicrypt-rs to enable the container image decryption support. @intchr @HaokunX-intel @zhiwei-intel-h
enable the image signature verification support. @zhiwei-intel-h
documentation works (in documentation repo and this repo).
add the basic CI/CD for compilation error check.
add the CI/CD runtime test on genuine SGX 2.0 machine for enclave-cc project.

Milestone 3: attest agent-enclave through remote attestation

The primary goal is to enable the image protections and E2E demo with remote attestation support.

Adapt rats-tls/librats for Gramine LibOS. @vijaydhanraj @ying2liu
enclave-agent as a single process needs to integrate attestation-agent as internal module/service. @jialez0
Finalize the PAL API v4 changes: #6 @YangLiang3 @ying2liu @bigdata-memory
Finalize Occlum PAL API adaption. @qzheng527
Finalize Gramine PAL API adaption (see #7 for details). @ying2liu, @bigdata-memory
add E2E K8s CI/CD test.
add nightly test.

Milestone 4: initial release

The goal is to accomplish the initial design of enclave-cc, and add operator deployment support for enclave-cc.

add CC operator support to deploy enclave-cc.
implement API limiting in enclave-agent.
LA protocol design closed. @bigdata-memory
implement the ttrpc client and server for the attested channel used to exchange the decryption key between agent enclave and app enclaves. @YangLiang3 @zhiwei-intel-h @ying2liu @bigdata-memory
provide a build and test script/Makefile to help to develop enclave-cc.
rebase shim-rune and rune from IC and upstream. @haosanzi @YangLiang3
enhance CI/CD to cover above new features.

Follow-up Story

create the coordination of enclave-cc and kata-cc on Intel CPU, forming the so-called "small TEE and big TEE" similar to "small core and big core", and deploy them together in a node.
kata-sgx

Reference

update operator flows for NFD and Debug

There's work ongoing on Kata-CC side to improve RuntimeClass creation with capabilities provided by NFD and to configure the installation to be debug enabled.

This issue is to track/follow the work and implement the same functionality for enclave-cc.

enclave-agent build error.

Running "make" under src/enclave-agent fails with the errors below:

error[E0425]: cannot find function decode_config in crate base64
   --> /home/jie/.cargo/registry/src/github.com-1ecc6299db9ec823/sequoia-openpgp-1.11.0/src/armor/base64_utils.rs:166:19
    |
166 |     match base64::decode_config(&bytes, base64::STANDARD) {
    |                   ^^^^^^^^^^^^^ not found in base64

error[E0425]: cannot find value STANDARD in crate base64
   --> /home/jie/.cargo/registry/src/github.com-1ecc6299db9ec823/sequoia-openpgp-1.11.0/src/armor/base64_utils.rs:166:49
    |
166 |     match base64::decode_config(&bytes, base64::STANDARD) {
    |                                                 ^^^^^^^^ not found in base64
    |
help: consider importing this constant
    |
1   | use base64::alphabet::STANDARD;
    |
help: if you import STANDARD, refer to it directly
    |
166 -     match base64::decode_config(&bytes, base64::STANDARD) {
166 +     match base64::decode_config(&bytes, STANDARD) {
    |

image pull errors

CoCo quickstart documentation uses bitnami/nginx:1.22.0 image as an example and I gave it a try. I'm seeing different image pull errors:

Failed to pull image "docker.io/bitnami/nginx:1.22.0": rpc error: code = Internal desc = failed to mount "unionfs" to "/run/enclave-cc/containers/nginx_1.22.0/rootfs", with error: EIO: I/O error

To debug this in more detail, I ran enclave-agent (sudo runc run 123) from the bundle with OCCLUM_LOG_LEVEL=debug and used the "async-client" to debug. This time I'm getting a different error:

Green Thread 1 - pull_image -> Err(RpcStatus(code: INTERNAL message: "unpack destination \"/var/lib/image-rs/layers/sha256_3a52f76b4a6462386fe51fabf6cc829dbdef540dc64cdcc809f907bfb6c68195\" already exists")) ended: 5.798667018s

The latter blocks me from investigating the former error in detail.

import rune with a clean base from IC project

update boot-instance Occlum to 0.29.7

agent-enclave uses 0.29.5 so it's good for this to follow:

enclave-cc/tools/packaging/build/boot-instance-bundle/Dockerfile, lines 24 to 26 in 8164a4c:

RUN echo "deb [arch=amd64 signed-by=/usr/share/keyrings/occlum.gpg] http://mirrors.openanolis.cn/inclavare-containers/ubuntu20.04 focal main" | tee -a /etc/apt/sources.list.d/occlum.list \
 && wget -qO - http://mirrors.openanolis.cn/inclavare-containers/ubuntu20.04/DEB-GPG-KEY.key | gpg --dearmor --output /usr/share/keyrings/occlum.gpg \
 && apt-get update

Change image-rs to use sefs fstype instead of unionfs
Change runtime boot struct user_rootfs_config
Update image-rs and change boot-instance/agent-instance Dockerfile to use "upstream" 0.29.6 Occlum

Get rid of eaa-kbc & verdictd

cc-kbc is now supported with the Occlum attester.
Gramine support is under development in confidential-containers/attestation-agent#212.

It is time to get rid of eaa-kbc and verdictd, as they are no longer covered by any CI test.

create rootfs_key dynamically and seal it

We have been waiting for #20, but in the meantime let's work on something simpler to get rid of the static rootfs_key.

The proposal is to create rootfs_key dynamically and seal it with the MRSIGNER key from Occlum's getkey ioctl().

Steps:

#126
#114
update image-rs Occlum snapshotter to make the key available under the containerd_id path
update runtime boot to get the key from containerd_id path

add basic build and unit tests for enclave-agent

Adding a basic compilation test and unit tests for enclave-agent in basic.yaml would be very useful for troubleshooting basic problems.

Occlum PoC

Track the status of Occlum PoC.

cc-operator-daemon-install POD keeps crashing in enclave-cc operator-based deployment.

Reproduce steps:

$ kubectl apply -k github.com/confidential-containers/operator/config/release?ref=v0.2.0
$ kubectl apply -f https://raw.githubusercontent.com/confidential-containers/operator/main/config/samples/enclave-cc/base/ccruntime-enclave-cc.yaml
ccruntime.confidentialcontainers.org/ccruntime-enclave-cc created
$ kubectl get pods -n confidential-containers-system
NAME                                              READY   STATUS             RESTARTS          AGE
cc-operator-controller-manager-5bf6d49bb5-94ff4   2/2     Running            0                 9h
cc-operator-daemon-install-6fmdz                  0/1     CrashLoopBackOff   116 (3m47s ago)   9h

Use Kata Containers rust crate for container ID verification

Background

As mentioned in #14 (comment), that PR includes some code from the Kata Containers agent (added on kata-containers/kata-containers#1521).

Upcoming Kata changes

Hence, there are now two versions of this code. However, the original version has been moved into a separate rust crate in the Kata 3.x runtime-rs branch:

https://github.com/kata-containers/kata-containers/blob/runtime-rs/src/libs/kata-sys-util/src/validate.rs#L17

The plan is to merge the runtime-rs branch into Kata's main branch soon.

CoCo Plan

Once the runtime-rs branch has been merged into Kata's main branch, we should make this repo consume the kata-sys-util crate as a dependency.

Why bother?

The function is only ~18 lines of code, so is it worth doing this? I would say yes for the following reasons:

Unit tests

The original version of the code comes with a set of tests, whereas the version of the PR in this repo does not.

Maintenance issues

If there are multiple copies of the code, who's going to maintain them and ensure they stay in sync, with the latest fixes and improvements?

Rust eschews the golang approach of code copying (aka "vendoring") since crates.io makes that approach unnecessary.

Security

This is the more important reason. The backstory for my raising kata-containers/kata-containers#1521 was that container / sandbox IDs have a habit of ending up as part of path names for a container/sandbox specific data store. In CLI parlance...

$ cid="foo"
$ tree "/run/coco/containers/${cid}/"
/run/coco/containers/foo/
├── bar.json
└── baz.config

At some point, that path will be deleted:

$ sudo rm -rf "/run/coco/containers/${cid}/"

That seems reasonable. How about now?

$ cid='../../../sbin/init'
$ sudo rm -rf "/run/coco/containers/${cid}/"

Ouch. Not so good. This is just an example, but the point is that the function in question provides a basic level of protection against these sorts of abuses (see the unit tests on the original PR for further details).
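
To make the idea concrete, here is a minimal sketch of such a check. It is not the kata-sys-util code: the function name is_valid_container_id and the exact rule (at least two characters, an ASCII alphanumeric first character, then only alphanumerics, '.', '-' and '_') are assumptions chosen for illustration.

```rust
/// Returns true if `id` looks safe to embed as a single path component.
/// Hypothetical rule for illustration; consult the Kata crate for the real one.
fn is_valid_container_id(id: &str) -> bool {
    let mut chars = id.chars();
    // The first character must be alphanumeric, which rules out ids
    // starting with '.' or '/'.
    match chars.next() {
        Some(c) if c.is_ascii_alphanumeric() => {}
        _ => return false,
    }
    // The remaining characters may not include '/', so the id can never
    // name anything outside its parent directory.
    id.len() >= 2
        && chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '.' | '-' | '_'))
}
```

Under this rule the id foo is accepted, while ../../../sbin/init is rejected before it ever reaches a path join or an rm.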

/cc @YangLiang3, @quanweiZhou, @bergwolf for thoughts.

In summary, we should try to reuse code as much as possible in CoCo, but avoid copying it if at all possible.

But if we do have to copy code for some unusual situation, ensure that it is trivial for any interested parties to determine the original source location and authorship, aka its provenance.

agent fail to start with "Failed to open Intel SGX device"

When starting the container with runc in the bundle directory, I see the following error:

root@iZ2ze49w79e4zvkn2mcbscZ:/opt/confidential-containers/share/enclave-cc-agent-instance# runc run 1234567
[get_driver_type /home/sgx/jenkins/ubuntuServer2004-release-build-trunk-218/build_target/PROD/label/Builder-UbuntuSrv20/label_exp/ubuntu64/linux-trunk-opensource/psw/urts/linux/edmm_utility.cpp:111] Failed to open Intel SGX device.
[ERROR] occlum-pal: Failed to create enclave with error code 0x2006: Invalid SGX device. Please make sure SGX module is enabled in the BIOS, and install SGX driver afterwards. (line 152, file src/pal_enclave.c)

The devices section in config.json of the agent bundle is as follows:
"devices": [
    {
        "path": "/dev/sgx_enclave",
        "type": "c",
        "major": 10,
        "minor": 125,
        "fileMode": 438
    }
]

container ENV variables passing and parsing

For complicated workloads, the tenant will set some specific ENV variables in the config file. To run the workload successfully in the LibOS, these ENV variables must be passed through.
The shim and enclave-agent will work together to parse and combine the environment variables, and eventually pass them to the app enclave.
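
The combining step could look like the sketch below. The issue does not say which source wins on a conflict, so the precedence used here (entries from the pod spec override entries from the image config) is an assumption, as are the function name merge_env and the sample variables.

```rust
/// Merges two lists of "KEY=VALUE" strings. Later entries override earlier
/// ones with the same key; first-seen key order is preserved.
/// Hypothetical helper; the override order is an assumption.
fn merge_env(image_env: &[&str], pod_env: &[&str]) -> Vec<String> {
    let mut merged: Vec<(String, String)> = Vec::new();
    for entry in image_env.iter().chain(pod_env.iter()) {
        // Split at the first '=' only, so values may contain '='.
        // Entries without '=' are skipped as malformed.
        if let Some((key, value)) = entry.split_once('=') {
            if let Some(pos) = merged.iter().position(|(k, _)| k.as_str() == key) {
                merged[pos].1 = value.to_string();
            } else {
                merged.push((key.to_string(), value.to_string()));
            }
        }
    }
    merged
        .into_iter()
        .map(|(k, v)| format!("{}={}", k, v))
        .collect()
}
```

The resulting list is in the same "KEY=VALUE" form that an OCI config.json process.env array uses, so it can be handed on unchanged.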

Roadmap for enclave-cc to support CoCo Key Broker System

Background

Currently, enclave-cc uses eaa-kbc and verdictd as the underlying attestation and confidential resource broker components. At the same time, the CoCo Key Broker System is under development. The CoCo Key Broker System includes:

CC-KBC in Attestation-Agent: To collect hardware & software evidence and do remote attestation and make resource request from AS
Attestation-Service: To verify the evidence sent by CC-KBC
CC-KBS: To distribute confidential resources

Currently, we also use eaa-kbc & verdictd, where

eaa-kbc: same functionality as cc-kbc
verdictd: same functionality as AS + CC-KBS

What we want

To support CoCo key broker system in enclave-cc, we need the following things. Let's make a simple roadmap for this

Implement Occlum attester in Attestation-agent's cc-kbc
confidential-containers/guest-components#213
Implement SGX verifier in Attestation-Service (As Occlum & Gramine share sgx qvl)
confidential-containers/trustee#202
Add build features for both attestation-agent/eaa-kbc and attestation-agent/cc-kbc for agent-enclave
Related tests/ci using coco-key broker system

Append CI rules of format and linter for the enclave agent.

The current CI lacks format and linter rules for the enclave agent, which makes it hard for contributors to collaborate in one coding style.

[RFC] Shim-rune Design Proposal

Design

According to Enclave Confidential Containers (Enclave-CC) Design and Architecture, shim-rune should launch the agent enclave container during pod creation. The agent enclave container is deployed in the form of an OCI bundle rather than a container image, so shim-rune can call rune to start it. The agent enclave container has the same life cycle as the pod. Its main job is to receive and then load the actual workload's container image into an app enclave.

As is shown in the picture, the binary name of shim-rune is containerd-shim-rune-v2. shim-rune should implement the Containerd Runtime V2 (Shim API) for Enclave-CC. With shim-rune, Kubernetes can launch Pod and OCI-compatible containers with one shim per Pod.

Goals

implements the Containerd Runtime V2 (Shim API)
use rune OCI runtime to manage container
launch agent enclave container during PodSandbox creation
complete PullImage with the agent enclave container over ttrpc
launch application enclave container with FUSE encryption filesystem generated by agent enclave container.
kill/cleanup agent enclave container resource when stopping PodSandbox

Workflow

1. RunPodSandbox

Compared with containerd-shim-runc-v2, shim-rune should add the following functions when launching the pod sandbox.

In the Create API of shim-rune, shim-rune will create an agent enclave container with the pre-created oci bundle after the pause container is created successfully.
In the Start API of shim-rune, shim-rune will start the agent enclave container after the pause container is started successfully.

A pre-created OCI bundle is needed for the agent enclave container, rather than a container image, because the agent enclave container is the first component started during pod creation and there is no direct support for pulling and unpacking it in a TEE. The agent_container_instance field in the shim-rune configuration file (see the Appendix section) gives the host path of the OCI bundle for the agent enclave container.

The agent enclave container has the same life cycle as the pod. Its main job is to receive and then load the actual workload's container image into an app enclave. The contents of the bundle file are as follows:

tree /opt/enclave-cc/agent-instance/ -L 1
/opt/enclave-cc/agent-instance/
├── config.json
└── rootfs

Note that the OCI bundle of the agent enclave container (the agent_container_instance field in the shim-rune configuration file) must be read-only because the agent enclave containers of all Pods on this node share this bundle. Modifying it would affect the behavior of existing or newly created agent enclave containers. Therefore, every time an agent enclave container is created, agent_container_instance is used as the read-only lower layer, and a read-write layer is overlaid on top of it to assemble the final OCI bundle for the agent enclave container.

In addition, the OCI bundle of the agent container may exist in multiple versions, which contain different versions of the untrusted PAL (possibly even supporting a different LibOS) and its dependencies, such as configuration files. shim-rune just needs to call rune and pass it the location of the OCI bundle.

rune receives the path of the OCI bundle passed by shim-rune, parses the OCI container configuration file, creates and enters a new mount namespace, and uses a bind mount to mount the rootfs directory tree in the OCI bundle as the container rootfs. Finally, the LibOS is loaded and initialized through the PAL API pal_init() as the app container's No. 1 process, runelet. Please refer to rune for detailed information.

2. Pull application Image

The PullImage API of shim-rune will send PullImageReq to the agent enclave container.
The agent enclave program uses image-rs to pull images and uses the encrypted file system capabilities provided by the LibOS to create an encrypted unionfs (sefs-based) image.
The agent enclave stores the encrypted unionfs (sefs-based) image on the host. The path where the image is stored is discussed in the next subsection (Create app container).

Communication protocol

Below is the Image communication protocol between shim-rune and the agent enclave container. The protocol is modeled on the one used by Kata.

syntax = "proto3";
...
// Image defines the public APIs for managing images.
service Image {
    // PullImage pulls an image with authentication config.
    rpc PullImage(PullImageRequest) returns (PullImageResponse) {}
}

message PullImageRequest {
    // Image name (e.g. docker.io/library/busybox:latest).
    string image = 1;
    // Unique image identifier, used to avoid duplication when unpacking the image layers.
    string container_id = 2;
    // Use USERNAME[:PASSWORD] for accessing the registry
    string source_creds = 3;
}

message PullImageResponse {
    // Reference to the image in use. For most runtimes, this should be an
    // image ID or digest.
    string image_ref = 1;
}
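
The source_creds field carries credentials in the USERNAME[:PASSWORD] form noted in the comment above. A receiver could split it as in the following sketch; parse_source_creds is a hypothetical helper, and treating an empty string as "no credentials" is an assumption rather than something the protocol states.

```rust
/// Splits "USERNAME[:PASSWORD]" into its parts.
/// Returns None when no usable username is present.
/// Hypothetical helper for illustration.
fn parse_source_creds(creds: &str) -> Option<(String, Option<String>)> {
    // Split at the first ':' only, so the password may itself contain ':'.
    let (user, password) = match creds.split_once(':') {
        Some((user, password)) => (user, Some(password.to_string())),
        None => (creds, None),
    };
    if user.is_empty() {
        return None;
    }
    Some((user.to_string(), password))
}
```

Splitting only at the first colon keeps passwords that contain colons intact, at the cost of not supporting usernames that contain one.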

3. Create app container

Ideally, just like kata-agent, the agent enclave container generates the bundle (config.json + rootfs) of the app container based on Occlum or Gramine. This way, shim-rune avoids generating an app container bundle that depends on a specific LibOS (Occlum or Gramine).

However, the LibOSes have limited capabilities, such as:

For the UnionFS supported by Occlum, the lower layer must be a single-layer FUSE encrypted file system, whereas the app container image to be aggregated by the agent enclave may contain multiple layers, so the lower layer is not a single layer.
Gramine may not be able to easily support the ability to aggregate filesystems.

Possible solutions:

In the long run, encourage the LibOSes to support aggregation of multiple lower layers.
In the short to medium term, the enclave-agent program cannot rely on the rootfs aggregation capability of a specific LibOS to construct the OCI bundle of the app container. shim-rune is responsible for generating the bundle (config.json + rootfs) of the app container.

Considering these limitations, and after discussing with the LibOS teams, the first stage of enclave-cc uses the following methods to run the application enclave container.

Occlum

Refer to the Occlum guide "Runtime boot pre-generated UnionFS image". A pre-created OCI bundle for the boot instance is needed on the host. The boot instance is responsible for using the customized init to mount and boot a pre-generated UnionFS image.

Considering how Occlum implements dynamic image mounting, shim-rune should overlay the boot instance bundle and the encrypted UnionFS (SEFS-based) image to generate the final bundle of the application container. For example:

mount -t overlay overlay -o lowerdir=<path of boot instance bundle>:<path of encrypted unionfs (based sefs) image>,\
upperdir=upper,workdir=work merged
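The option string in the command above can be assembled programmatically. The following Go sketch only builds the string; `overlayOptions` is a hypothetical helper and the paths are placeholders, not the paths shim-rune actually uses.

```go
package main

import (
	"fmt"
	"strings"
)

// overlayOptions (hypothetical) builds the option string for an overlay
// mount. lowerDirs are listed topmost layer first, as overlayfs expects,
// and are joined with ":" without any whitespace.
func overlayOptions(lowerDirs []string, upperDir, workDir string) string {
	return fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s",
		strings.Join(lowerDirs, ":"), upperDir, workDir)
}

func main() {
	opts := overlayOptions(
		[]string{
			"/path/to/boot-instance-bundle", // placeholder
			"/path/to/unionfs-image",        // placeholder
		},
		"upper", "work")
	fmt.Println("mount -t overlay overlay -o", opts, "merged")
}
```

Keeping the whole option list in one comma-separated string matters because any whitespace inside it would be parsed by the shell as the start of a new argument.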

The remaining question is where the boot instance bundle and the encrypted UnionFS (SEFS-based) image are located.

Location of boot instance bundle

Each host needs only one pre-created Occlum OCI bundle for the boot instance. The boot_container_instance field in the shim configuration file specifies its location.

The pre-generated boot template instance looks like an OCI bundle rootfs on the host side. It contains the Occlum boot template instance.

tree /opt/enclave-cc/boot-instance -L 1
/opt/enclave-cc/boot-instance
└── rootfs

1 directory, 1 file

Location of encrypted UnionFS (SEFS-based) image

Each pod has only one agent enclave container, and the agent container needs to manage multiple images. After the agent container pulls an image, it converts all layers of this image into an encrypted UnionFS (SEFS-based) image on the host.

Since one image corresponds to one encrypted UnionFS (SEFS-based) image, we can follow the implementation in kata-agent: the agent enclave container can generate a <cid> based on the image name. The agent enclave container can then store the encrypted UnionFS (SEFS-based) image in a directory distinguished by <cid>, such as the <agent_container_bundle_path>/rootfs/run/rune/<cid>/rootfs dir.

When shim-rune launches the application enclave container, it generates the same <cid> based on the image name and then finds the corresponding encrypted UnionFS (SEFS-based) image. shim-rune then overlays the encrypted UnionFS (SEFS-based) image and the boot instance bundle to generate the final app container bundle.
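The lookup described above relies on both sides deriving the same identifier from the image name. The text does not say how the identifier is computed, so the Go sketch below assumes a SHA-256 hash of the image name purely for illustration; `cidFromImage` and `imageRootfsPath` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"path/filepath"
)

// cidFromImage (hypothetical) derives a deterministic identifier from the
// image name. Hashing with SHA-256 is an assumption made for this sketch.
func cidFromImage(image string) string {
	sum := sha256.Sum256([]byte(image))
	return fmt.Sprintf("%x", sum)
}

// imageRootfsPath returns the directory layout from the text:
// <agent_container_bundle_path>/rootfs/run/rune/<cid>/rootfs
func imageRootfsPath(agentBundlePath, image string) string {
	return filepath.Join(agentBundlePath, "rootfs", "run", "rune",
		cidFromImage(image), "rootfs")
}

func main() {
	// The bundle path below is a placeholder.
	fmt.Println(imageRootfsPath("/path/to/agent-bundle",
		"docker.io/library/busybox:latest"))
}
```

Because the derivation is a pure function of the image name, the agent enclave container and shim-rune arrive at the same directory without exchanging the identifier.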

Gramine

TODO

Finally, shim-rune calls rune to launch the app enclave container. The process is similar to launching an agent enclave container with rune.

4. Stop sandbox

Compared with containerd-shim-runc-v2, shim-rune additionally kills and cleans up the agent enclave container.

How to integrate with containerd

Because shim-rune supports the containerd shim v2 API, you can add the associated configuration for shim-rune to the containerd config file, e.g., /etc/containerd/config.toml, on your system.

        [plugins.cri.containerd]
          ...
          [plugins.cri.containerd.runtimes.rune]
            runtime_type = "io.containerd.rune.v2"

Then restart containerd on your system.

Appendix

shim configuration sample

log_level = "info" # "debug" "info" "warn" "error"

[containerd]
    socket = "/run/containerd/containerd.sock"
    agent_container_instance = "/opt/enclave-cc/agent-instance/"
    boot_container_instance = "/opt/enclave-cc/boot-instance/"
    agent_container_root_dir = "/run/containerd/agentenclave"
    agent_url = "tcp://0.0.0.0:7788"

where:

@log_level: specify the log level for shim-rune.
@socket: the containerd socket
@agent_container_instance: the host path of pre-created oci bundle for agent enclave container
@boot_container_instance: the host path of pre-created oci bundle for boot instance
@agent_container_root_dir: the root dir of agent enclave container running state
@agent_url: the listening address of the agent container, through which the shim communicates with the agent container to perform PullImage (unix and tcp communication are supported).
- Note: Occlum 0.27 does not currently support cross_world_uds, so ttrpc communication is temporarily implemented over tcp.
- After Occlum releases NGO at the end of June, the communication between shim-rune and the agent container will be switched to ttrpc based on cross_world_uds.
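Since agent_url may use either tcp or unix transport, a client has to split the value into a network and an address before connecting. The Go sketch below shows one way to do that with the standard library; `dialTarget` is a hypothetical helper, and the `unix://` form with an absolute socket path is an assumption, since the sample only shows a tcp URL.

```go
package main

import (
	"fmt"
	"net/url"
)

// dialTarget (hypothetical) turns an agent_url value into the
// (network, address) pair accepted by net.Dial.
func dialTarget(agentURL string) (network, address string, err error) {
	u, err := url.Parse(agentURL)
	if err != nil {
		return "", "", fmt.Errorf("parse agent_url: %w", err)
	}
	switch u.Scheme {
	case "tcp":
		return "tcp", u.Host, nil // host:port
	case "unix":
		return "unix", u.Path, nil // socket path (assumed URL form)
	default:
		return "", "", fmt.Errorf("unsupported agent_url scheme %q", u.Scheme)
	}
}

func main() {
	network, address, err := dialTarget("tcp://0.0.0.0:7788")
	fmt.Println(network, address, err)
}
```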

Reference

enable signature checks for sample_kbc in CI

as a follow-up to confidential-containers/confidential-containers#139 (comment)

Specification of user defined claims in RA evidence in CC-KBC Attester for SGX

Related to #120

I am working on the Occlum attester in cc-kbc (confidential-containers/attestation-agent#136). The Evidence is currently defined as follows. Please ignore the name, as I think we can use the same Evidence format for Occlum and Gramine.

struct SgxOcclumAttesterEvidence {
    /// Base64 encoded SGX quote.
    quote: String,
}

For now it only contains the base64-encoded SGX quote. We can include more claims in the Evidence by putting the digest of the claims into the report_data field, which binds the claims to the quote.
That is, a claim like

{
    "a": "value a",
    ...
}

could be part of the evidence.
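To make the binding concrete, here is a minimal Go sketch of computing a report_data value from a set of claims. The JSON encoding and the SHA-256 digest are assumptions chosen for illustration (the proposal does not fix either), and `reportDataFromClaims` is a hypothetical name. The 64-byte size matches the report_data field of an SGX report.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// reportDataSize is the size in bytes of the SGX report_data field.
const reportDataSize = 64

// reportDataFromClaims (hypothetical) hashes the JSON encoding of the
// claims and places the digest in the first 32 bytes of report_data.
func reportDataFromClaims(claims map[string]string) ([reportDataSize]byte, error) {
	var reportData [reportDataSize]byte
	// encoding/json writes map keys in sorted order, so the same claims
	// always produce the same bytes.
	encoded, err := json.Marshal(claims)
	if err != nil {
		return reportData, fmt.Errorf("encode claims: %w", err)
	}
	digest := sha256.Sum256(encoded)
	copy(reportData[:], digest[:]) // the remaining 32 bytes stay zero
	return reportData, nil
}

func main() {
	reportData, err := reportDataFromClaims(map[string]string{"a": "value a"})
	fmt.Printf("%x %v\n", reportData, err)
}
```

A verifier that receives the claims alongside the quote would recompute the digest the same way and compare it with the report_data carried in the quote.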

The question is: what can we include?

Some initial ideas:

The verifier gets the raw mr_enclave from the quote, but it does not know which payload was measured, i.e., which payload corresponds to the mr_enclave. We could add the type or name of the payload. For example, a key "mrenclave-id" could identify the payload, e.g., "mrenclave-id":"occlumv1.0+enclave-agentv1.0" (?), to tell the verifier which reference value to compare against.
mr_signer: as with mr_enclave, do we need to specify the signer of the SGX .so file?

We might need to have a public specification for different keys and their usages?

FUSE key protection scheme

Background

We have discussed the FUSE key provisioning approach for a long time with the approaches #11 and #3.

Each approach is tied to a particular scenario. Without the scenario as background, it is hard to decide on next actions.

There are 2 scenarios with different approaches to deploying enclave-cc, which heavily affect the approach to FUSE key provisioning. I realized we don't need to seek one unified approach to cover both scenarios.

Scenario 1: Tenant owning platform

In order to run confidential containers, a tenant pays for a VM or bare-metal machine as a platform/node on which to set up enclave-cc. In this scenario, the tenant also has to set up K8s and take responsibility for platform maintenance, and hopes to have strong control over the platform.

Here is the workflow for this scenario:

The App-Enclave and Agent-Enclave are signed by the tenant as signer.
Using a sealing key with the MRSIGNER policy is much simpler than other approaches.
The FUSE key is protected by the sealing key, which acts as a wrapping key.
By the way, the LA protocol can still support this scenario.
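The wrapping step in this workflow can be sketched as follows. This Go example uses AES-GCM from the standard library as the wrapping primitive, which is an assumption made for illustration. Inside a real enclave the sealing key would be derived by the SGX hardware under the MRSIGNER policy; here it is a placeholder byte slice, and `wrapKey` and `unwrapKey` are hypothetical names.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// newGCM builds an AES-GCM instance keyed with the sealing key.
func newGCM(sealingKey []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(sealingKey)
	if err != nil {
		return nil, fmt.Errorf("create cipher: %w", err)
	}
	return cipher.NewGCM(block)
}

// wrapKey encrypts fuseKey under sealingKey and returns nonce || ciphertext.
func wrapKey(sealingKey, fuseKey []byte) ([]byte, error) {
	gcm, err := newGCM(sealingKey)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, fmt.Errorf("generate nonce: %w", err)
	}
	return gcm.Seal(nonce, nonce, fuseKey, nil), nil
}

// unwrapKey reverses wrapKey and fails if the data was modified or the
// sealing key differs.
func unwrapKey(sealingKey, wrapped []byte) ([]byte, error) {
	gcm, err := newGCM(sealingKey)
	if err != nil {
		return nil, err
	}
	if len(wrapped) < gcm.NonceSize() {
		return nil, fmt.Errorf("wrapped key too short")
	}
	nonce, ciphertext := wrapped[:gcm.NonceSize()], wrapped[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	sealingKey := make([]byte, 16) // placeholder, NOT a real sealing key
	wrapped, err := wrapKey(sealingKey, []byte("example-fuse-key"))
	fmt.Println(len(wrapped), err)
}
```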

Scenario 2: CSP owning platform

In this scenario, a tenant only needs to pay for a Pod to deploy a protected container image to enclave-cc provided by the CSP. This is the mainstream PaaS usage model: it reduces cost efficiently at the expense of control over the platform.

v1

Here is the workflow for the scenario 2 v1 approach:

A sealing key with the MRSIGNER policy is not useful.
Even if the enclave binaries used to host the app-enclave and agent-enclave are the same, their configuration data or manifests differ, and those differences are reflected in MRENCLAVE, so a sealing key with the MRENCLAVE policy plus other factors is not useful either.
Launch the local attestation protocol as described in #11.
The Relying Party authenticates the identity of the Agent-Enclave as attester, and then provisions the App-Enclave's MRENCLAVE to the Agent-Enclave as a reference value.
The Agent-Enclave, acting as verifier this time, can authenticate the identity of the App-Enclave using the App-Enclave's MRENCLAVE as the reference value.
This approach assumes the App-Enclave doesn't need to authenticate the identity of the Agent-Enclave.
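The verification step on the Agent-Enclave side amounts to comparing the MRENCLAVE reported during local attestation with the provisioned reference value. The Go sketch below shows only that comparison; obtaining and validating the local attestation report is out of scope, and `verifyAppEnclave` is a hypothetical name.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// mrEnclaveSize is the size in bytes of an MRENCLAVE measurement (256 bits).
const mrEnclaveSize = 32

// verifyAppEnclave (hypothetical) checks the MRENCLAVE reported by the
// App-Enclave against the reference value provisioned by the Relying Party.
func verifyAppEnclave(reference, reported [mrEnclaveSize]byte) error {
	if subtle.ConstantTimeCompare(reference[:], reported[:]) != 1 {
		return fmt.Errorf("MRENCLAVE mismatch: App-Enclave not authenticated")
	}
	return nil
}

func main() {
	var reference, reported [mrEnclaveSize]byte // placeholder measurements
	reported[0] = 0x01
	fmt.Println(verifyAppEnclave(reference, reported))
}
```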

v2

It is more flexible to decouple the configuration data or manifest from the enclave binary in MRENCLAVE. This is what KSS can do. Here is good material describing the details of using KSS.
The v2 approach additionally needs to authenticate CONFIGID in remote and local attestation.
Occlum can leverage KSS to load runtime-configurable configuration data or a manifest.
Still, this approach assumes the App-Enclave doesn't need to authenticate the identity of the Agent-Enclave.

v3

In order to allow the App-Enclave to authenticate the identity of the Agent-Enclave, the Agent-Enclave's MRENCLAVE is recorded in the configuration data or manifest of the App-Enclave.
When launching the App-Enclave, its configuration data or manifest is provisioned to it.
The App-Enclave then uses the Agent-Enclave's MRENCLAVE to authenticate the identity of the Agent-Enclave.
The integrity of the App-Enclave's configuration data or manifest containing the Agent-Enclave's MRENCLAVE is verified by the Agent-Enclave during local attestation.
Open: is it really necessary for the app-enclave to authenticate the agent-enclave? Even if the agent-enclave is an impostor, it has no way to spoof the app-enclave in order to retrieve the genuine FUSE key.

Conclusions

Implement the sealing key with MRSIGNER policy as the first supported approach.
Add the details of identity authentication to #11 and merge that PR, driving the continued development work to support scenario 2.

CI failed because of key not found

https://github.com/confidential-containers/enclave-cc/actions/runs/5418442602/jobs/9850738755

error report

E0630 10:18:18.576510 3425599 remote_image.go:238] "PullImage from image service failed" err="rpc error: code = Internal desc = failed to handle layer: failed to get decrypt key missing private key needed for decryption" image="ghcr.io/confidential-containers/test-container-enclave-cc:encrypted"
time="2023-06-30T10:18:18+08:00" level=fatal msg="creating container: rpc error: code = Internal desc = failed to handle layer: failed to get decrypt key missing private key needed for decryption"
Error: Process completed with exit code 1.

Prepare new test images for image metadata enhancement

We are working on unifying the encrypted images on the Attestation-Agent and image-rs side; the related proposals are published (confidential-containers/guest-components#218 and confidential-containers/documentation#85). If interested, please leave comments :-) The impacts on enclave-cc are:

When we use the new version of Attestation-Agent for CI, new images should be built, or enclave-agent will fail to decrypt the old ones.
Wherever a confidential resource is used, it should be indicated in kbs uri format.

I'd like to help with this once the AA and image-rs work is finished.

agent needs to set image-rs and attestation agent config in a proper place

Currently the ENABLE_SECURITY_VALIDATE env var is set in pull_image, but the image-rs client (with the validate image flag) is initialized before pull_image is called, so it fails to pick up the env var. As a result, signature verification is not enabled/disabled according to the agent's config.

The agent's config should be read right after the agent starts, so that it can later be used to set the config for image-rs and the attestation agent.

tasks to do

rats-tls build into agent bundle to support EAA KBC (#48)
operator to set agent config during runtime install (#83 and confidential-containers/operator#147)
agent to read config file #70

enable dependabot updates for rust dependencies

enclave-agent
runtime-boot

limiting entry points with rootfs_entry

In #109 (in occlum) we opened things up to the root path. We should discuss whether we need limitations elsewhere in enclave-cc to prevent abusing that flexibility, or whether this does not really pose additional security risk.

improve payload image creation v2

make it possible to specify enclave signing keys during build time (occlum build --sign-key <path_to/your_key.pem>)
#128

	RUN echo "deb [arch=amd64 signed-by=/usr/share/keyrings/occlum.gpg] http://mirrors.openanolis.cn/inclavare-containers/ubuntu20.04 focal main" | tee -a /etc/apt/sources.list.d/occlum.list \
	&& wget -qO - http://mirrors.openanolis.cn/inclavare-containers/ubuntu20.04/DEB-GPG-KEY.key | gpg --dearmor --output /usr/share/keyrings/occlum.gpg \
	&& apt-get update
