
Comments (3)

evan2645 commented on August 17, 2024

Hi @sebastiandero - thanks for your patience here, and for opening this issue. It has come up in many forms in the past, and I've realized that we haven't adequately captured any of those conversations, so I'll use this issue to do that. Speaking personally, I'd really like to see a better answer to this problem. If you or others are willing to help, that would be wonderful 🙏

SPIRE under K8s vs SPIRE over K8s

SPIRE today is generally deployed "under" K8s, meaning the agent expects some elevated access to the underlying host. It can serve both K8s and non-K8s workloads simultaneously. SPIRE Agents are sometimes run via DaemonSet for convenience, and other times run via e.g. systemd.

This arrangement has pros and cons, and the trade-offs there aren't appropriate for everyone. The alternative is a world in which SPIRE runs wholly on top of k8s, with no access to or knowledge of the underlying host. For the purposes of this issue, we'll call the former "SPIRE under K8s" and the latter "SPIRE over K8s".

The user and UX

One major difference between these two modes is the shape of the user and their experience. SPIRE under K8s is generally deployed and operated by a cluster admin. In smaller companies, this is often someone who has full control of both the app and the cluster it runs on. In larger companies, this is often someone like a "platform engineer" offering SPIFFE services to other developer teams.

This excludes a couple of groups:

  • Users at larger companies that do not have cluster admin
  • Users leveraging "serverless" K8s services (e.g. EKS on Fargate)
  • (anyone else?)

It also results in a number of UX challenges:

  • K8s RBAC
  • CSI Driver / socket injection
  • hostPID
  • hostPath
  • Kubelet API access
  • etc

The threat model

Another major difference between these two modes is the threat model. In SPIRE under K8s, users can (and do) remove some level of trust from the K8s control plane by leveraging provider- and hardware-centric attestation mechanisms. This allows them to issue identities only when they know that the software is running on the correct hardware, rather than trusting just any pod that a K8s cluster can schedule.

In SPIRE over K8s, you must 100% trust the K8s control plane. If users are OK with this, then the above UX challenges can probably be avoided. Even the concept of registration could be avoided, because we can unilaterally trust whatever the K8s API Server says and look at e.g. an annotation instead.
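To make the "look at an annotation instead of a registration" idea concrete, here is a minimal sketch. It assumes the pod object has already been read from the K8s API Server and is trusted as-is. The annotation key and the function name are hypothetical, made up for illustration, and are not an existing SPIRE or Kubernetes convention.

```python
from typing import Optional

# Hypothetical annotation key, invented for this sketch.
SPIFFE_PATH_ANNOTATION = "example.org/spiffe-id-path"


def spiffe_id_from_pod(pod: dict, trust_domain: str) -> Optional[str]:
    """Derive a SPIFFE ID from a pod annotation, or None if the pod did not opt in."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    # Normalize so that "payments/api" and "/payments/api" are equivalent.
    path = (annotations.get(SPIFFE_PATH_ANNOTATION) or "").strip("/")
    if not path:
        return None
    return f"spiffe://{trust_domain}/{path}"


pod = {
    "metadata": {
        "name": "api-7d9f",
        "namespace": "payments",
        "annotations": {SPIFFE_PATH_ANNOTATION: "/payments/api"},
    }
}
print(spiffe_id_from_pod(pod, "example.org"))
```

Note that nothing here verifies the pod beyond what the API server reports, which is exactly the trust trade-off described above.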

Possible solutions

I think there is a range of possible solutions, with varying levels of lift and benefit. I can think of two, which I've documented below.

New lightweight implementation

In a mode where you fully trust k8s for everything, and you run an agent in a sidecar, workload attestation is no longer needed. Neither is workload registration. You can label or annotate a deployment, and have a socket appear in the pod via sidecar. I think this is the ideal experience - it is k8s native, requires none of the sharp edges needed above in the UX challenges list, etc.
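As a rough sketch of what "label a deployment and have a socket appear via sidecar" could look like, the function below patches a pod manifest the way a mutating admission webhook might. Everything named here (the opt-in label, the volume name, the socket directory, the sidecar image) is hypothetical and chosen only for illustration; no such implementation exists in SPIRE today.

```python
import copy

# All names below are hypothetical and exist only for this sketch.
OPT_IN_LABEL = "example.org/spiffe-sidecar"
SOCKET_VOLUME = "spiffe-workload-api"
SOCKET_DIR = "/run/spiffe"
SIDECAR_IMAGE = "example.org/spiffe-sidecar:latest"


def inject_sidecar(pod: dict) -> dict:
    """Return the pod unchanged, or a patched copy if it carries the opt-in label."""
    labels = pod.get("metadata", {}).get("labels") or {}
    if labels.get(OPT_IN_LABEL) != "enabled":
        return pod  # pod did not opt in, leave it untouched

    patched = copy.deepcopy(pod)
    spec = patched.setdefault("spec", {})
    mount = {"name": SOCKET_VOLUME, "mountPath": SOCKET_DIR}

    # An emptyDir is pod-scoped, so no hostPath or CSI driver is involved.
    spec.setdefault("volumes", []).append({"name": SOCKET_VOLUME, "emptyDir": {}})

    # Each application container sees the socket directory read-only.
    for container in spec.setdefault("containers", []):
        container.setdefault("volumeMounts", []).append({**mount, "readOnly": True})

    # The sidecar writes the Workload API socket into the shared directory.
    spec["containers"].append(
        {"name": "spiffe-sidecar", "image": SIDECAR_IMAGE, "volumeMounts": [mount]}
    )
    return patched
```

The point of the design is that the only shared resource is a pod-local volume, so none of hostPath, hostPID, or Kubelet API access is needed.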

This solves well for the excluded user groups, but it does not solve the needs of e.g. platform engineers, who need more control. Since it doesn't fit the needs of our existing user group(s), it can't be the only solution, and we'll need to continue supporting the current approach. For this and other reasons, taking this approach probably means it's a new implementation geared specifically towards this kind of use.

A specialized node attestor

SPIRE strives to uniquely identify the node that the agent is running on via node attestation. There are a lot of assumptions built around this, including one that enforces a single agent per node. In k8s node attestation, we attest the node uid, and disallow the existence of multiple agents simultaneously on the same node.

That said, recent changes in SPIRE core may allow us to side-step some of it. For example, we could build a node attestor similar to k8s_psat, except that it issues agent identities as a function of pod uid rather than node uid. We'd also need to update or reconfigure spire-controller-manager to create registrations based on pod uid rather than node uid. And perhaps the biggest downside is that you'll still have to deal with things like hostPID, Kubelet API access, etc., in addition to registration, as is the case today. Since removing those requirements is what really unlocks non-platform eng use cases, it's hard for me to understand how much value there is in this approach.
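A small sketch of why keying the agent identity on pod uid matters: two agents on the same node collapse to one identity when keyed by node uid, but stay distinct when keyed by pod uid. The node-scoped layout below follows the agent ID form that, to my understanding, k8s_psat documents (trust domain, cluster, node uid). The pod-scoped variant and its "k8s_pod" path segment are hypothetical, since no such attestor exists.

```python
def node_scoped_agent_id(trust_domain: str, cluster: str, node_uid: str) -> str:
    # Layout assumed to match what k8s_psat produces today.
    return f"spiffe://{trust_domain}/spire/agent/k8s_psat/{cluster}/{node_uid}"


def pod_scoped_agent_id(trust_domain: str, cluster: str, pod_uid: str) -> str:
    # Hypothetical attestor name "k8s_pod", invented for this sketch.
    return f"spiffe://{trust_domain}/spire/agent/k8s_pod/{cluster}/{pod_uid}"


# Two agent pods scheduled onto the same node.
agents = [
    {"node_uid": "node-a", "pod_uid": "pod-1"},
    {"node_uid": "node-a", "pod_uid": "pod-2"},
]

node_ids = {node_scoped_agent_id("example.org", "demo", a["node_uid"]) for a in agents}
pod_ids = {pod_scoped_agent_id("example.org", "demo", a["pod_uid"]) for a in agents}

print(len(node_ids), len(pod_ids))  # prints "1 2"
```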

/cc @kfox1111 who had some interest in this option

from spire.

evan2645 commented on August 17, 2024

Oh, I will also add that, in order to make the second possible solution sustainable, we'll need to implement an automated periodic purge of stale agents.
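The reason is that pod-scoped agents would come and go with their pods, so their records would pile up on the server. A minimal sketch of the selection step for such a purge, assuming each agent record carries the expiry time of its current SVID; the record type, field names, and grace period are hypothetical and do not reflect SPIRE's actual data model or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class AgentRecord:
    # Hypothetical shape of an attested-agent record.
    spiffe_id: str
    svid_expires_at: datetime


def stale_agents(
    agents: Iterable[AgentRecord], now: datetime, grace: timedelta
) -> List[AgentRecord]:
    """Return agents whose SVID expired more than `grace` before `now`."""
    cutoff = now - grace
    return [a for a in agents if a.svid_expires_at < cutoff]


now = datetime(2024, 8, 17, 12, 0, tzinfo=timezone.utc)
agents = [
    AgentRecord("spiffe://example.org/spire/agent/a", now - timedelta(days=3)),
    AgentRecord("spiffe://example.org/spire/agent/b", now + timedelta(hours=1)),
]
to_purge = stale_agents(agents, now, grace=timedelta(days=1))
print([a.spiffe_id for a in to_purge])
```

A periodic job would run this selection on a timer and then delete the returned records. The grace period keeps an agent that is merely late to renew from being purged immediately.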


evan2645 commented on August 17, 2024

IMO it's pretty clear that it would be good for the project to support this model somehow, so I'll go ahead and move it to the backlog as unscoped.

