
sebastiandero commented on August 17, 2024

Run SPIRE Agent on top of k8s as a workload sidecar

from spire.

Comments (3)

evan2645 commented on August 17, 2024

Hi @sebastiandero - thanks for your patience here, and for opening this issue. This has come up in many forms in the past, and I've realized that we haven't adequately captured any of those conversations, so I'll use this issue to do that. Speaking personally, I'd really like to see a better answer to this problem. If you or others are willing to help, that would be wonderful 🙏

SPIRE under K8s vs SPIRE over K8s

SPIRE today is generally deployed "under" K8s, meaning the agent expects some elevated access to the underlying host. It can serve both k8s and non-k8s workloads simultaneously. SPIRE Agents are sometimes run via DaemonSet for convenience, and other times run via e.g. systemd.

This arrangement has pros and cons, and the trade-offs there aren't appropriate for everyone. The alternative is a world in which SPIRE runs wholly on top of k8s, with no access to or knowledge of the underlying host. For the purposes of this issue, we'll call the former "SPIRE under K8s" and the latter "SPIRE over K8s".

The user and UX

One major difference between these two modes is the shape of the user and their experience. SPIRE under K8s is generally deployed and operated by a cluster admin. In smaller companies, this is often someone who has full control of both the app and the cluster it runs on. In larger companies, this is often someone like a "platform engineer" offering SPIFFE services to other developer teams.

This excludes a couple groups:

Users at larger companies that do not have cluster admin
Users leveraging "serverless" K8s services (e.g. EKS on Fargate)
(anyone else?)

It also results in a number of UX challenges:

K8s RBAC
CSI Driver / socket injection
hostPID
hostPath
Kubelet API access
etc
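To make the list above a bit more concrete, here is a minimal sketch that scans a pod spec (as a plain dict) for the host-level settings mentioned. `hostPID`, `hostPath`, and `csi` are real PodSpec fields; the function name, the example specs, and the socket path are illustrative assumptions, not taken from any SPIRE chart.

```python
def host_access_requirements(pod_spec: dict) -> list:
    """Return the host-level settings a pod spec asks for."""
    found = []
    # hostPID lets the pod see processes in the host PID namespace.
    if pod_spec.get("hostPID"):
        found.append("hostPID")
    for volume in pod_spec.get("volumes", []):
        # hostPath mounts a directory from the node's filesystem.
        if "hostPath" in volume:
            found.append("hostPath:" + volume["hostPath"].get("path", ""))
        # A CSI volume depends on a driver installed at the cluster level.
        if "csi" in volume:
            found.append("csi:" + volume["csi"].get("driver", ""))
    return found


# Hypothetical "under K8s" style spec: needs host access.
agent_like_spec = {
    "hostPID": True,
    "volumes": [
        {"name": "agent-socket", "hostPath": {"path": "/run/spire/sockets"}},
    ],
}

# Hypothetical "over K8s" style spec: pod-local volume only.
sidecar_like_spec = {"volumes": [{"name": "agent-socket", "emptyDir": {}}]}

print(host_access_requirements(agent_like_spec))
print(host_access_requirements(sidecar_like_spec))
```

The first spec reports two host-level requirements and the second reports none, which is roughly the gap between the two deployment modes from the perspective of a user without cluster admin.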

The threat model

Another major difference between these two modes is the threat model. In SPIRE under K8s, users can (and do) remove some level of trust from the K8s control plane by leveraging provider- and hardware-centric attestation mechanisms. This allows them to issue identities only when they know that the software is running on the correct hardware, rather than trusting just any pod that a K8s cluster can schedule.

In SPIRE over K8s, you must 100% trust the K8s control plane. If users are OK with this, then the above UX challenges can probably be avoided. Even the concept of registration could be avoided, because we can unilaterally trust whatever the K8s API Server says and look at e.g. an annotation instead.
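As a rough sketch of what "look at an annotation instead of a registration" could mean: if the pod object returned by the K8s API Server is trusted outright, the identity can be computed directly from its metadata. The annotation key, the function name, and the ID layout below are assumptions for illustration; nothing like this exists in SPIRE today.

```python
from typing import Optional

# Hypothetical annotation key; not defined by SPIRE or Kubernetes.
ANNOTATION_KEY = "example.org/spiffe-id-path"


def spiffe_id_for_pod(pod: dict, trust_domain: str) -> Optional[str]:
    """Derive a SPIFFE ID from a pod annotation, trusting the API server's view."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    path = annotations.get(ANNOTATION_KEY)
    if not path:
        # No opt-in annotation: the pod gets no identity.
        return None
    return "spiffe://" + trust_domain + "/" + path.strip("/")


pod = {
    "metadata": {
        "name": "api-0",
        "annotations": {ANNOTATION_KEY: "/ns/payments/sa/api"},
    }
}
print(spiffe_id_for_pod(pod, "example.org"))
```

Note that the only input is what the control plane reports, so anyone able to create or edit pods in the cluster effectively controls identity issuance. That is the trust trade-off described above.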

Possible solutions

I think there is a range of possible solutions, with varying levels of lift and benefit. I can think of two, which I've documented below.

New lightweight implementation

In a mode where you fully trust k8s for everything, and you run an agent in a sidecar, workload attestation is no longer needed. Neither is workload registration. You can label or annotate a deployment, and have a socket appear in the pod via sidecar. I think this is the ideal experience - it is k8s native, has none of the sharp edges from the UX challenges list above, etc.
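A minimal sketch of the "annotate and get a socket" flow, written as a pure function over a pod dict (the kind of transformation an admission webhook might apply). The opt-in annotation, volume name, mount path, and image are all hypothetical; only `emptyDir`, `volumeMounts`, and the general pod layout are real Kubernetes concepts.

```python
import copy

# All three names below are hypothetical, chosen for illustration only.
OPT_IN_ANNOTATION = "example.org/inject-spire-agent"
SOCKET_VOLUME = "spiffe-workload-api"
SOCKET_DIR = "/spiffe-workload-api"


def inject_agent_sidecar(pod: dict, agent_image: str) -> dict:
    """Return a copy of the pod with an agent sidecar and a shared socket volume."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    if annotations.get(OPT_IN_ANNOTATION) != "true":
        return pod  # not opted in: leave the pod untouched

    patched = copy.deepcopy(pod)
    spec = patched.setdefault("spec", {})
    # A pod-local emptyDir replaces the hostPath / CSI driver socket sharing.
    spec.setdefault("volumes", []).append({"name": SOCKET_VOLUME, "emptyDir": {}})

    mount = {"name": SOCKET_VOLUME, "mountPath": SOCKET_DIR}
    # Existing workload containers get a read-only view of the socket directory.
    for container in spec.setdefault("containers", []):
        container.setdefault("volumeMounts", []).append(dict(mount, readOnly=True))
    # The sidecar gets a writable mount so it can create the socket.
    spec["containers"].append(
        {"name": "spire-agent", "image": agent_image, "volumeMounts": [mount]}
    )
    return patched


pod = {
    "metadata": {"annotations": {OPT_IN_ANNOTATION: "true"}},
    "spec": {"containers": [{"name": "app", "image": "example.org/app:dev"}]},
}
patched = inject_agent_sidecar(pod, "example.org/spire-agent:dev")
print([c["name"] for c in patched["spec"]["containers"]])
```

Nothing in the patched pod references the host, which is what would make this shape workable on something like EKS on Fargate.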

This solves well for the excluded user groups, but it does not solve the needs of e.g. platform engineers, who need more control. Since it doesn't fit the needs of our existing user group(s), it can't be the only solution, and we'll need to continue supporting the current approach. For this and other reasons, taking this approach probably means it's a new implementation geared specifically towards this kind of use.

A specialized node attestor

SPIRE strives to uniquely identify the node that the agent is running on via node attestation. There are a lot of assumptions built around this, including one that enforces a single agent per node. In k8s node attestation, we attest the node uid and disallow multiple agents existing simultaneously on the same node.

That said, recent changes in SPIRE core may allow us to side-step some of these assumptions. For example, we could write a node attestor similar to k8s_psat, except that it issues agent identities as a function of pod uid rather than node uid. We'd also need to update or reconfigure spire-controller-manager to create registrations based on pod uid rather than node uid. Perhaps the biggest downside is that you'd still have to deal with things like hostPID, Kubelet API access, etc., in addition to registration, as is the case today. Since removing those requirements is what would really unlock non-platform eng use cases, it's hard for me to understand how much value there is in this approach.
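To illustrate the difference between keying agent identities on node uid versus pod uid, here is a small sketch. The ID layout is modeled on the general shape of k8s_psat agent IDs (trust domain, attestor name, cluster, uid), but treat the exact path as an assumption; the `k8s_pod` attestor name and all uids are hypothetical.

```python
def agent_id(trust_domain: str, attestor: str, cluster: str, uid: str) -> str:
    """Build an agent SPIFFE ID keyed on a single uid (node or pod)."""
    return "spiffe://{}/spire/agent/{}/{}/{}".format(trust_domain, attestor, cluster, uid)


# Two sidecar agents scheduled onto the same node (all values hypothetical).
node_uid = "node-1111"
pod_uids = ["pod-aaaa", "pod-bbbb"]

# Keyed on node uid, both agents resolve to the same identity, which is
# why only one agent per node can exist under that scheme.
by_node = {agent_id("example.org", "k8s_psat", "demo", node_uid) for _ in pod_uids}

# Keyed on pod uid, each sidecar agent gets a distinct identity.
by_pod = {agent_id("example.org", "k8s_pod", "demo", uid) for uid in pod_uids}

print(len(by_node), len(by_pod))
```

Pods are far shorter-lived than nodes, so a pod-keyed scheme would create agent records at a much higher rate than a node-keyed one.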

/cc @kfox1111 who had some interest in this option


evan2645 commented on August 17, 2024

Oh, I will also add that, in order to make the second possible solution sustainable, we'll need to implement an automated periodic purge of stale agents.
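A minimal sketch of the selection step such a purge might perform, assuming each agent record carries an ID and an expiry timestamp. The record shape, the grace period, and the function name are assumptions; how the records are listed from and removed on the SPIRE Server is deliberately left out.

```python
from datetime import datetime, timedelta, timezone


def stale_agents(agents: list, now: datetime, grace: timedelta) -> list:
    """Return IDs of agents whose identity expired more than `grace` ago."""
    return [a["id"] for a in agents if a["expires_at"] + grace < now]


# Fixed clock so the example is deterministic.
now = datetime(2024, 8, 17, 12, 0, tzinfo=timezone.utc)

# Hypothetical agent records keyed on pod uid.
agents = [
    {"id": "spiffe://example.org/spire/agent/k8s_pod/demo/pod-aaaa",
     "expires_at": now - timedelta(hours=2)},     # expired long ago
    {"id": "spiffe://example.org/spire/agent/k8s_pod/demo/pod-bbbb",
     "expires_at": now - timedelta(minutes=30)},  # expired, still within grace
    {"id": "spiffe://example.org/spire/agent/k8s_pod/demo/pod-cccc",
     "expires_at": now + timedelta(hours=1)},     # still valid
]

for agent in stale_agents(agents, now, grace=timedelta(hours=1)):
    print("would purge", agent)
```

Only the first record is selected. The grace period is there so that an agent that is merely late to renew is not purged immediately.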


evan2645 commented on August 17, 2024

IMO it's pretty clear that it would be good for the project to support this model somehow, so I'll go ahead and move it to the backlog as unscoped.
