As discussed in <a href="https://github.com/kubernetes/kubernetes/issues/28311" data-h

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Explore design for cells (named node classes) about node-feature-discovery HOT 19 CLOSED

ConnorDoyle commented on August 27, 2024

Explore design for cells (named node classes)

from node-feature-discovery.

Comments (19)

jeremyeder commented on August 27, 2024

@timothysc @smarterclayton @derekwaynecarr

Thanks for filing this. The idea for cells is to:

- Retain the ability for developers to ship code quickly and easily
- De-couple shipping code from the need to understand the infrastructure at its lowest levels.
- Assuming this code has special requirements, such as:
  - Which geography it should run in
  - Which mix of hardware/software is required (containers don't fix everything!)
  - What SLA the application has
  - Any other business need/logic

Also I noted that you titled this "node classes". In a way, cells are directly analogous to storage classes that have already been merged.


derekwaynecarr commented on August 27, 2024

A StorageClass is primarily for dynamic provisioning. The PVC itself is not mutated other than a reference to the StorageClass, and that reference is optional, and we still have issues that arise when the StorageClass changes or is removed while the PVC remains.

This appears more to modify the incoming pod with default labels/taints/tolerations/resource requests/etc which may or may not have its own problems, and the scope of things that you would want to modify may be quite large.


fabiand commented on August 27, 2024

Initially I was worried that exposing low-level features like CPU flags in annotations might quickly clutter them.
But grouping these features in Cells (or maybe NodeFeatures?) makes this concern go away.

But what about using reverse FQDN notation, which looks nicer when sorted:

io.kubernetes.kernel.psap/flags: svm
io.kubernetes.net1.psap/name: data
io.kubernetes.net2.psap/name: control


ConnorDoyle commented on August 27, 2024

@fabiand thanks for the feedback. See also #27 where we're bikeshedding the prefix for all labels published by NFD.


davidopp commented on August 27, 2024

Is there some way we can get the same benefits without having to introduce a new API object (Cell)?

You could put annotations directly on the daemonset pod, which could read them through the downward API (or even better, configure it using ConfigMap).

On the requesting pod, you could apply an annotation that is read by an admission controller (configured via the same ConfigMap as the daemonset pod uses) that adds the corresponding node selectors.

This isn't very different than what you have proposed, but the Cell just becomes something defined via a ConfigMap rather than as its own API object.
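The admission step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the annotation key `example.com/cell` is hypothetical, the cell definitions are assumed to be already parsed from the ConfigMap into a plain dict, and the pod is a plain dict shaped like a Pod manifest rather than a real API object.

```python
import copy

# Hypothetical annotation key; not defined by Kubernetes or NFD.
CELL_ANNOTATION = "example.com/cell"


def apply_cell(pod, cells):
    """Return a copy of pod with the node selectors of its requested cell added.

    cells maps a cell name to the node labels that define it (for example the
    parsed contents of a ConfigMap). Pods without the annotation are returned
    as an unchanged copy.
    """
    annotations = pod.get("metadata", {}).get("annotations", {})
    cell_name = annotations.get(CELL_ANNOTATION)
    if cell_name is None:
        return copy.deepcopy(pod)
    if cell_name not in cells:
        raise KeyError(f"unknown cell: {cell_name}")

    mutated = copy.deepcopy(pod)
    selector = mutated.setdefault("spec", {}).setdefault("nodeSelector", {})
    for key, value in cells[cell_name].items():
        # Keep any selector the user already set explicitly.
        selector.setdefault(key, value)
    return mutated
```

The input pod is never modified in place, so the same function could be used both to preview and to apply the mutation.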


fabiand commented on August 27, 2024

@davidopp to me that sounds a little too generic. Features look special enough to me (they will be valuable for any application with some kind of hardware dependence) that they should be exposed in a well-defined place.

I am not sure whether this was already discussed elsewhere, but why don't we expose the features of a host in the Node status? In that case we could reuse the existing object.

What speaks against this, from my POV, is that this information should probably be gathered outside of the core node, and might be an add-on. In that case we don't want add-ons to update the Node status, do we?

Besides that: I would not expose/list all the features in an entity directly, at least not low-level features like svm/vmx or whatever. I'd rather have the features either grouped (as in cells) with the group being referenced, or have some kind of controller derive more high-level features from a set of primitives (e.g. virtualization from svm or vmx), so that the high-level feature can then be used for scheduling.
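To make the second option concrete, here is a minimal sketch of deriving a high-level feature from primitives. The rule table and function name are made up for illustration; the only rule taken from the comment above is "virtualization from svm or vmx".

```python
# Hypothetical rule table: a high-level feature is present if the node
# exposes at least one of the listed low-level CPU flags.
DERIVED_FEATURES = {
    "virtualization": {"svm", "vmx"},
}


def derive_features(cpu_flags, rules=DERIVED_FEATURES):
    """Return the sorted high-level features implied by a node's CPU flags."""
    flags = set(cpu_flags)
    return sorted(name for name, any_of in rules.items() if flags & any_of)
```

A scheduler-facing label would then only need to carry `virtualization`, not the individual flags.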


balajismaniam commented on August 27, 2024

@davidopp, I think ConfigMap might work. @ConnorDoyle, @jeremyeder and I were discussing using third party resources to define a Cell and associating the Cell with a node using labels. That way, we could select a node based on the label (or Cell). WDYT?
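As a rough illustration of selecting nodes by a Cell label, the sketch below builds a node affinity stanza in the shape used by the Pod API's `nodeAffinity` field. The label prefix `cell.example.com` and the one-label-per-cell convention are assumptions made here, not something agreed in this thread.

```python
def cell_affinity(cell_name, label_prefix="cell.example.com"):
    """Build an affinity stanza that requires the membership label of cell_name.

    Assumes (hypothetically) that each node in the cell carries a label named
    "<label_prefix>/<cell_name>", whatever its value.
    """
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": f"{label_prefix}/{cell_name}",
                                "operator": "Exists",
                            }
                        ]
                    }
                ]
            }
        }
    }
```

Using one label per cell, matched with `Exists`, lets a node belong to several cells at once.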


jeremyeder commented on August 27, 2024

I agree w/TPR primarily because it is a zero-friction method of prototyping the long term solution, which I'd hope is a new API object. There's nothing in the API that cleanly lets us do this now while retaining a simple UX for users.

We also want Cells tied to other features in the product such as admission control and RBAC. While we can do the first generation of this with TPR, I would at least like us to have a clear goal of promotion to a real API object that people can rally their design/architectures/security stance around.

As for @davidopp's point about adding node selectors... I think we want users to be comfortably numb about individual nodes, and target "any node that can successfully run my magic-app", which is abstracted cleanly with a Cell object.


davidopp commented on August 27, 2024

I'm still not understanding why you need a new API object (or a TPR). Any kind of configuration that you can put in an API object you can put in a ConfigMap.


timothysc commented on August 27, 2024

> I'm still not understanding why you need a new API object (or a TPR).

Usability and sane RBAC rules.

> Any kind of configuration that you can put in an API object you can put in a ConfigMap.

Users already have a tough time, I think that would obfuscate it even more.


davidopp commented on August 27, 2024

Ah I see, you want to limit who can make changes to the Cell information, which I guess wouldn't be possible with a ConfigMap? (Though it seems that's a pretty big deficiency if we can't control who can modify ConfigMap, since we're moving towards doing all Kubernetes component config via ConfigMap)


ConnorDoyle commented on August 27, 2024

I've been meaning to comment since we chatted but have been on the road. Thanks Balaji for summarizing what we talked about. Just a couple of additional comments/ideas:

- Another vote for TPR: it allows for a seamless UX between experimentation and a potential first-class resource in the future.
- If we represent Cell membership as a special label on nodes, then we can schedule against them for free using affinity.
- Node cell membership labels could be published by the NFD container (via a new "cell" source). The cell source could fetch the criteria for the defined cells via the API server and compare them against the discovered features. If the NFD container is configured to use --sources=cpuid,cell then the --label-whitelist attribute could be used to filter out everything but the cell membership labels.
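The proposed "cell" source could work along these lines. This is a sketch of the idea only, not NFD code: the function names, the `cell.example.com` prefix, and the representation of cell criteria as a dict of required labels are all assumptions, and fetching the criteria from the API server is left out.

```python
import re


def cell_labels(discovered, cells, prefix="cell.example.com"):
    """Return membership labels for every cell whose criteria the node meets.

    discovered: feature labels found on this node (label name -> value).
    cells: cell name -> required labels (label name -> value), assumed to have
           been fetched from the API server beforehand.
    """
    labels = {}
    for name, required in cells.items():
        if all(discovered.get(key) == value for key, value in required.items()):
            labels[f"{prefix}/{name}"] = "true"
    return labels


def filter_labels(labels, whitelist_pattern):
    """Keep only labels whose name matches the regular expression."""
    regex = re.compile(whitelist_pattern)
    return {key: value for key, value in labels.items() if regex.search(key)}
```

Publishing the union of discovered and cell labels and then filtering on the cell prefix would leave only the membership labels on the node, which is the effect described for the whitelist option above.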

Anyway, this seems to be shaping up. Should we collaborate on a design doc to work through the details?

On Nov 14, 2016, at 13:28, David Oppenheimer [email protected] wrote:

Ah I see, you want to limit who can make changes to the Cell information, which I guess wouldn't be possible with a ConfigMap? (Though it seems that's a pretty big deficiency if we can't control who can modify ConfigMap, since we're moving towards doing all Kubernetes component config via ConfigMap)



fabiand commented on August 27, 2024

Was there any further movement on this topic outside of this thread? Our use-case (http://kubevirt.io) also requires publishing node features.

I do like the separation of Cells as TPRs and associating them to pods using labels.

What would be the next step, @ConnorDoyle?


fejta-bot commented on August 27, 2024

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale


fabiand commented on August 27, 2024

/remove-lifecycle stale


fejta-bot commented on August 27, 2024

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale


fejta-bot commented on August 27, 2024

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten


fejta-bot commented on August 27, 2024

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close


k8s-ci-robot commented on August 27, 2024

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

