
node-feature-discovery's Introduction

Node Feature Discovery


Welcome to Node Feature Discovery – a Kubernetes add-on for detecting hardware features and system configuration!

See our Documentation for detailed instructions and reference.

Quick-start – the short-short version

$ kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.15.4
  namespace/node-feature-discovery created
  customresourcedefinition.apiextensions.k8s.io/nodefeaturerules.nfd.k8s-sigs.io created
  customresourcedefinition.apiextensions.k8s.io/nodefeatures.nfd.k8s-sigs.io created
  serviceaccount/nfd-gc created
  serviceaccount/nfd-master created
  serviceaccount/nfd-worker created
  role.rbac.authorization.k8s.io/nfd-worker created
  clusterrole.rbac.authorization.k8s.io/nfd-gc created
  clusterrole.rbac.authorization.k8s.io/nfd-master created
  rolebinding.rbac.authorization.k8s.io/nfd-worker created
  clusterrolebinding.rbac.authorization.k8s.io/nfd-gc created
  clusterrolebinding.rbac.authorization.k8s.io/nfd-master created
  configmap/nfd-master-conf created
  configmap/nfd-worker-conf created
  deployment.apps/nfd-gc created
  deployment.apps/nfd-master created
  daemonset.apps/nfd-worker created

$ kubectl -n node-feature-discovery get all
  NAME                              READY   STATUS    RESTARTS   AGE
  pod/nfd-gc-565fc85d9b-94jpj       1/1     Running   0          18s
  pod/nfd-master-6796d89d7b-qccrq   1/1     Running   0          18s
  pod/nfd-worker-nwdp6              1/1     Running   0          18s
...

$ kubectl get no -o json | jq '.items[].metadata.labels'
  {
    "kubernetes.io/arch": "amd64",
    "kubernetes.io/os": "linux",
    "feature.node.kubernetes.io/cpu-cpuid.ADX": "true",
    "feature.node.kubernetes.io/cpu-cpuid.AESNI": "true",
...

node-feature-discovery's People

Contributors

adrianchiris, arangogutierrez, balajismaniam, connordoyle, dbaker-rh, dependabot[bot], ffromani, fidencio, fmuyassarov, jjacobelli, jlojosnegros, jschintag, k8s-ci-robot, marquiz, mbssaiakhil, mythi, nfd-merge-bot, okartau, ozhuraki, piotrprokop, sajiyah-salat, spiffxp, stek29, swatisehgal, tal-or, tessaio, testwill, uniemimu, vaibhav2107, yselkowitz


node-feature-discovery's Issues

Consider expanding the scope to discovering/publishing opaque integer resources.

Per the original NFD scope document that was circulated at project incubation time, this project only handles "binary" (there-or-not) attributes.

Since then, alpha support for opaque integer resources was added to Kubernetes. See Kubernetes PR 31652.

Should we expand the scope to allow NFD sources to advertise new kinds of consumable resources?

For one concrete usage example, consider extending issue #35 (Discover SR-IOV capable network interfaces) to allow allocatable virtual functions. In this case, there is work in progress to isolate VFs in a CNI plugin.

Graduate from incubator

Criteria for graduating from the incubator.

Incubation date: 29 August 2016
Incubation deadline: 29 August 2017

When the OWNERS of the Incubator project determine they have met the criteria to graduate they should contact their Champion and Sponsor to discuss. If both the Sponsor and Champion agree that the above criteria have been met the project can graduate to become a Kubernetes Community Project and exit the Kubernetes Incubator.

TODO: Link sub-issues to gather evidence for each topic below.

  • Documented users: the project tracks evidence from downloads, mailing lists, Twitter, blogs, GitHub issues, etc that the project has gained a user base.
  • Regular releases: the project is making releases at least once every 3 months, although releases should happen more often.
  • Incubation time: a minimum of 6 months has passed since incubation (March 2017)
  • Healthy community: a healthy number of users, contributors, and/or OWNERS outside of your own company have participated in the project. The Sponsor and Champion can make this call.
  • Roadmap: A roadmap for the next release is maintained; this can be in the form of a doc, GitHub milestones, or any other tool but it must be documented in the README.
  • An announcement of graduation must be sent to the [email protected] mailing list by the Champion.

Detect IOMMU

Detect if IOMMU is present (supported and enabled, by HW and the kernel).
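A minimal sketch of one possible check, assuming the usual sysfs layout where an enabled IOMMU shows up as entries under /sys/class/iommu (the actual NFD detection logic may differ):

package main

import (
    "fmt"
    "os"
)

// iommuEnabled reports whether the kernel exposes any IOMMU devices in sysfs.
func iommuEnabled() (bool, error) {
    entries, err := os.ReadDir("/sys/class/iommu")
    if err != nil {
        if os.IsNotExist(err) {
            return false, nil // no IOMMU class directory at all
        }
        return false, err
    }
    return len(entries) > 0, nil
}

func main() {
    enabled, err := iommuEnabled()
    if err != nil {
        fmt.Fprintln(os.Stderr, "iommu detection failed:", err)
        os.Exit(1)
    }
    fmt.Println("iommu enabled:", enabled)
}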

mock*.go (generated) files process undocumented

Being new to the project, I tried to improve the gofmt/golint metrics in the mock*.go files and was told these files are generated and manual changes should be avoided. There is no mention of generated files in the documentation, and I don't find any code in the repo that deals with the generation. Who generates those files, when and how, and what triggers a refresh?
Doesn't an (undocumented) manual generate-and-commit process create a risk that the mock*.go files get out of sync?
Proposals:

  • Can we document the mock*.go file maintenance process?
  • Idea: avoid storing mock* files as source elements and instead
    generate them on the fly when needed, if possible?

P.S.
mockapihelpers.go has a few commits that touch only small parts, so they look like manual edits. With manual changes, is that file now in a mixed state?

"Failure updating labels" sometimes happens, exits daemonset-mode pod

When a one-shot job is turned into a daemon, a few things should be double-checked: resource leaks and stability (i.e. exit points), as these are minor issues for a one-shot run but more serious for a daemon.
K8s is good at restarting pods, but that's no excuse for low quality.
So I keep the NFD DaemonSet running to check for both.
There is a weak sign of a slow memory-growth pattern, but it may level off over a longer run, so I planned to keep it up for long-term monitoring. No luck there: the nfd pod exits and gets restarted about once per 24 hours. Next, I shortened the cycle time from 60 seconds hoping to see the memory pattern more quickly, but that does not work either, as the exit rate follows suit: with a 1-second cycle I see one exit roughly every hour.

I ran the pod log in -f mode to capture the exit messages:

2018/04/18 11:00:27 can't update node: Operation cannot be fulfilled on nodes "k8-vm-2": the object has been modified; please apply your changes to the latest version and try again
2018/04/18 11:00:27 failed to advertise labels: Operation cannot be fulfilled on nodes "k8-vm-2": the object has been modified; please apply your changes to the latest version and try again
2018/04/18 11:00:27 error occurred while updating node with feature labels: Operation cannot be fulfilled on nodes "k8-vm-2": the object has been modified; please apply your changes to the latest version and try again

This issue has likely always been there, but the probability of hitting it was low, so it had not surfaced before.
It seems there is a window during which a node label update is bound to fail?
Can that state be detected somehow so that we don't attempt the update?
Or should we retry after some delay once such a condition is hit?
Or is this a sign of some other problem elsewhere?
In any case, we should try to do better than exit, now that we can run as a DaemonSet.
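The error above is the standard optimistic-concurrency conflict from the API server: the cached Node object is stale by the time the update is sent. One possible shape of the retry approach, sketched with recent client-go (package, function, and variable names here are illustrative, not NFD's actual code):

package labeler

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/util/retry"
)

// advertiseLabels re-fetches the Node before every attempt so a conflict
// simply triggers another try with a fresh resourceVersion.
func advertiseLabels(cli kubernetes.Interface, nodeName string, labels map[string]string) error {
    return retry.RetryOnConflict(retry.DefaultRetry, func() error {
        node, err := cli.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
        if err != nil {
            return err
        }
        for k, v := range labels {
            node.Labels[k] = v
        }
        _, err = cli.CoreV1().Nodes().Update(context.TODO(), node, metav1.UpdateOptions{})
        return err // a Conflict error makes RetryOnConflict call us again
    })
}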

Detection of nodes in Ready state in label-nodes.sh fails too easily

label-nodes.sh determines the nodes in Ready state using this:
kubectl get nodes | grep -i ready
but this parsing is too simplistic.
A few examples of status lines that produce a false "ready" match
in the context of what label-nodes.sh expects:

node-1 NotReady 164d v1.10.2
node-2 Ready,SchedulingDisabled 165d v1.10.2

As a first idea, I propose a small parsing change that
accepts only a clean "ready" with a space on each side:

kubectl get nodes | grep -i ' ready '

Note that this may leave out nodes where some other non-blocking attribute
is appended, so we may need something more elaborate.
This whole node detection would be cleaner through a deterministic API instead of parsing status lines,
but is there any readily available command that doesn't require writing even more parsing code? At least
I don't see a better way using kubectl options.
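For the "deterministic API" direction, a hedged sketch with recent client-go that checks the NodeReady condition directly instead of grepping kubectl output (package and function names are made up for illustration):

package nodeutil

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// readyNodes returns the names of nodes whose NodeReady condition is True,
// regardless of extra status text such as SchedulingDisabled.
func readyNodes(cli kubernetes.Interface) ([]string, error) {
    nodes, err := cli.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
    if err != nil {
        return nil, err
    }
    var ready []string
    for _, n := range nodes.Items {
        for _, cond := range n.Status.Conditions {
            if cond.Type == corev1.NodeReady && cond.Status == corev1.ConditionTrue {
                ready = append(ready, n.Name)
                break
            }
        }
    }
    return ready, nil
}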

Discover TDP of the CPU

Detect the TDP of the CPU package of the node.

This feature label would require enabling non-binary labels.

Container image size is huge

With the build instructions given, the resulting docker container becomes 1.2GB in size. This is huge.

Improve the Dockerfile to only ship the resulting node-feature-discovery and whatever else may be needed and drop the build environment parts.

Detect if "die clustering" is enabled

Cluster-on-die (COD) and sub-NUMA-Cluster (SNC) are technologies where a physical NUMA node is further split into separate NUMA domains, taking advantage of multiple memory controllers per physical CPU socket. COD and SNC improve the performance of highly NUMA optimized workloads.

These settings are not directly detectable from Linux user space. However, we can detect the number of NUMA nodes and the number of physical CPU package IDs. If the number of NUMA nodes is greater than the number of physical packages, it is likely that COD/SNC is enabled. There are corner cases like NUMA emulation (which isn't likely to be used in a k8s cluster).
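A rough sketch of that heuristic, assuming the standard sysfs paths (counting NUMA node directories and distinct physical_package_id values):

package main

import (
    "fmt"
    "os"
    "path/filepath"
    "strings"
)

// countNUMANodes counts nodeN directories under the NUMA sysfs hierarchy.
func countNUMANodes() int {
    matches, _ := filepath.Glob("/sys/devices/system/node/node[0-9]*")
    return len(matches)
}

// countPhysicalPackages counts distinct physical CPU package IDs.
func countPhysicalPackages() int {
    ids := map[string]bool{}
    matches, _ := filepath.Glob("/sys/devices/system/cpu/cpu[0-9]*/topology/physical_package_id")
    for _, m := range matches {
        data, err := os.ReadFile(m)
        if err != nil {
            continue
        }
        ids[strings.TrimSpace(string(data))] = true
    }
    return len(ids)
}

func main() {
    numa, pkgs := countNUMANodes(), countPhysicalPackages()
    fmt.Printf("NUMA nodes: %d, physical packages: %d, COD/SNC likely: %v\n",
        numa, pkgs, numa > pkgs)
}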

Plugin mechanism for customer-specific labels

What: provide a way for customers to write their own feature detectors, creating custom labels.

Originally I thought it would be sufficient to implement custom feature sources as user-specific modules under the source/ directory in the nfd source code. The reasoning was that the source modules and their API towards the "nfd core" are simple, so writing and maintaining a custom plugin would not be too cumbersome. However, this would effectively mean forking nfd, requiring you to maintain your own code base and to build and use custom Docker images.

So I was thinking about providing a relatively simple mechanism for custom feature detectors, without the need to fork the code or use custom Docker images.

The first, obvious idea would be to use hooks: e.g. look under /etc/kubernetes/node-feature-discovery/source.d/ and run all executable binaries found there. The hooks would write the detected features to stdout and log to stderr.

E.g.
running /etc/kubernetes/node-feature-discovery/source.d/my-source would have stdout output:

FEATURE_1
FEATURE_2

which, in turn, would cause nfd to create labels

node.alpha.kubernetes-incubator.io/nfd-my-source-FEATURE_1=true
node.alpha.kubernetes-incubator.io/nfd-my-source-FEATURE_2=true

This could easily be extended to non-binary labels, too.
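An illustrative sketch of the hook idea only; the directory path and label prefix come from the proposal above, and nothing here is an implemented NFD API:

package main

import (
    "bufio"
    "bytes"
    "fmt"
    "os"
    "os/exec"
    "path/filepath"
)

const hookDir = "/etc/kubernetes/node-feature-discovery/source.d"

// runHooks executes every entry in hookDir and turns each stdout line into a
// boolean label. (A real implementation would also check the executable bit.)
func runHooks() (map[string]string, error) {
    labels := map[string]string{}
    entries, err := os.ReadDir(hookDir)
    if err != nil {
        return nil, err
    }
    for _, e := range entries {
        hook := e.Name()
        out, err := exec.Command(filepath.Join(hookDir, hook)).Output() // stderr is left for logging
        if err != nil {
            fmt.Fprintf(os.Stderr, "hook %s failed: %v\n", hook, err)
            continue
        }
        // Each stdout line is a discovered feature name.
        scanner := bufio.NewScanner(bytes.NewReader(out))
        for scanner.Scan() {
            feature := scanner.Text()
            if feature == "" {
                continue
            }
            labels["node.alpha.kubernetes-incubator.io/nfd-"+hook+"-"+feature] = "true"
        }
    }
    return labels, nil
}

func main() {
    labels, err := runHooks()
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Println(labels)
}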

Spinning up of large number of erroneous pods

In case of some configuration errors, e.g. an SSL misconfiguration, the Kubernetes job fails to complete successfully. It keeps creating new pods and eventually ends up with ~5000 pods in the Error state.
It should stop or time out. A simple fix could be setting the RestartPolicy to OnFailure or Never. Are there any other recommendations for handling this error gracefully?

moving to v1.0 beta release

Before we move node-feature-discovery to a beta release, we need to finish the items below.
Note: this checklist is not exhaustive and may change in the future.

  • move import path from github.com/kubernetes-incubator to k8s.io.
  • make an agreement on the label schema about node-feature-discovery. #71
  • adjust node-feature-discovery to meet the requirements to be a Kubernetes repo described in kubernetes-template-project.
  • make an agreement on the release schedule (v1.0 beta and GA) for node-feature-discovery.
  • update client-go to newest stable version.
  • replace glide with godep.
  • decide which registry (quay.io or gcr.io) to use.
  • change node-feature-discovery image namespace from kubernetes-incubator to kubernetes.
    /cc @ConnorDoyle @balajismaniam

org-wide CNCF CLA bot on kubernetes-incubator

Drawing attention to: https://groups.google.com/forum/#!topic/kubernetes-dev/Q9eNHdkIdQ0
As of tomorrow, the application of cncf labels and posting of the message in case someone has not signed it should be handled by our org-wide bot, and I would suggest that you disable the "cla" munger on the fork of mungegithub that you are currently running tomorrow evening, after we deploy. Otherwise, you're likely to see two sets of bots doing the same thing and possible interaction between them.

cc @balajismaniam @ConnorDoyle

Output currently going to stderr

Most (all?) of the print statements use log.Printf which prints to stderr by default. I would expect it to print to stdout for non-errors.
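One simple option with the standard library logger (a sketch; the project might instead prefer separate loggers for errors and informational output):

package main

import (
    "log"
    "os"
)

func main() {
    // Send the default logger's output to stdout; errors could use a
    // separate logger that still writes to stderr.
    log.SetOutput(os.Stdout)
    log.Printf("feature discovery completed")
}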

Detect Kernel and OS version

These would help in placement of workloads which are dependent on certain OS or kernel version.

I proposed implementing these in kubelet (Issue #65850), but that request seems unlikely to be accepted. This feature could be implemented in NFD instead.

Kernel version can be read from /proc/sys/kernel/osrelease. The kernel version would be split into multiple labels to make it more practical to use, since label selectors need exact values and no comparison operators (e.g. "greater than") can be used. Suggested labels, with example values:

node.alpha.kubernetes-incubator.io/nfd-kernel-version=4.4.0-112-generic
node.alpha.kubernetes-incubator.io/nfd-kernel-version-x=4
node.alpha.kubernetes-incubator.io/nfd-kernel-version-xy=4.4
node.alpha.kubernetes-incubator.io/nfd-kernel-version-xyz=4.4.0

An initial RFD implementation of kernel version detection is available at:
https://github.com/marquiz/node-feature-discovery/tree/feature/kernel-version

OS version would be read from /etc/os-release. At least ID and VERSION_ID should be advertised, I think:

node.alpha.kubernetes-incubator.io/nfd-os-release-ID=ubuntu
node.alpha.kubernetes-incubator.io/nfd-os-release-VERSION_ID=16.04

An initial RFD implementation of OS version detection is available at:
https://github.com/marquiz/node-feature-discovery/tree/feature/os-release
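A sketch of the kernel-version splitting described above (label prefix copied from the proposal; the /etc/os-release part is analogous and omitted):

package main

import (
    "fmt"
    "os"
    "strings"
)

const prefix = "node.alpha.kubernetes-incubator.io/nfd-kernel-version"

// kernelVersionLabels reads the running kernel version and splits it into
// full, x, x.y and x.y.z labels.
func kernelVersionLabels() (map[string]string, error) {
    raw, err := os.ReadFile("/proc/sys/kernel/osrelease")
    if err != nil {
        return nil, err
    }
    full := strings.TrimSpace(string(raw)) // e.g. "4.4.0-112-generic"
    parts := strings.SplitN(strings.SplitN(full, "-", 2)[0], ".", 3)

    labels := map[string]string{prefix: full}
    if len(parts) > 0 {
        labels[prefix+"-x"] = parts[0]
    }
    if len(parts) > 1 {
        labels[prefix+"-xy"] = parts[0] + "." + parts[1]
    }
    if len(parts) > 2 {
        labels[prefix+"-xyz"] = strings.Join(parts, ".")
    }
    return labels, nil
}

func main() {
    labels, err := kernelVersionLabels()
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Println(labels)
}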

Graceful failure if discovery from a source fails

Currently, because of this line, whenever a source returns an error from its Discover() call, nfd fails.

Would it make sense to log the failure and proceed to discover features and add labels from the sources nfd is able to handle, instead of hard-failing on any source failure?
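One possible shape of the log-and-continue behaviour (FeatureSource here is a simplified stand-in interface, not the project's actual type):

package source

import "log"

// FeatureSource is a simplified stand-in for a feature detector.
type FeatureSource interface {
    Name() string
    Discover() ([]string, error)
}

// discoverAll logs failures per source and keeps going instead of aborting
// the whole discovery run on the first error.
func discoverAll(sources []FeatureSource) map[string]string {
    labels := map[string]string{}
    for _, s := range sources {
        features, err := s.Discover()
        if err != nil {
            log.Printf("discovery failed for source %q: %v (continuing)", s.Name(), err)
            continue
        }
        for _, f := range features {
            labels[s.Name()+"-"+f] = "true"
        }
    }
    return labels
}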

Release v0.2.0

There hasn't been a released version of NFD in nearly one and a half years. There have been some major changes and fixes since v0.1.0. I propose to release the current version as v0.2.0.

Release checklist:

  • All OWNERS must LGTM the release proposal.
  • Update the job template to use the new tagged container image
  • An OWNER runs git tag -s $VERSION and inserts the changelog into the tag description.
  • Build and push a container image with the same tag to quay.io.
  • Update the :latest virtual tag in quay.io to track the last stable (this) release.
  • An OWNER pushes the tag with git push $VERSION.
  • Write the change log into the Github release info.
  • Add a link to the tagged release in this issue.
  • An announcement email is sent to [email protected] with the
    subject [ANNOUNCE] node-feature-discovery $VERSION is released
  • Close this issue.

Changelog

New features

  • Add support for Kubernetes 1.8+ (#82)
  • Adding SR-IOV capability discovery to node-feature-discovery (#49)
  • Enable nfd framework on Arm64 platform
  • Added nonrotational storage detection
  • Add memory source and NUMA detection.
  • Export proxy env vars inside docker build. Makes it possible to build node-feature-discovery from behind a proxy server, e.g. in corporate networks.
  • Advertise selinux status by adding labels
  • Make it possible to run nfd as a DaemonSet (#105)
  • Added more Intel RDT capability discovery: CMT,MBM,MBA (#120)
  • Added template spec for configuring RBAC (#126)

Misc fixes

  • Fix to report the correct version inside container.
  • Updated RDT Discovery() to use exit status
  • Clean-up RDT helper programs.
  • Improved unit test coverage
  • Add GoReport Card Widget
  • Update Community Code of Conduct
  • Improve GoReportCard metrics, both gofmt and golint (#90)
  • Get node name from spec.nodeName instead of indirectly from pod. (#93)
  • Remove hardcoding of nfd source path (#94)
  • Use a specific released version of intel-cmt-cat

Known issues

  • Occasional restarts when run as a DaemonSet (issue #122).

Explore design for cells (named node classes)

As discussed in Kubernetes issue #28311,

From @jeremyeder:

Here's what I'd propose. The concept of a cell (bikeshed the name...)

Labels are dynamically adjusted per the system/node capabilities by the "fleecing pod", as per @ConnorDoyle's proposal. Next is how to express that in a way that keeps the UX under control.

Propose to de-couple into two parts; one being sort of a cluster administrator job -- defining what a cell contains. It could be a logical group of nodes. A cell could be exclusive or there could be overlap. Not sure how else to describe it, but a cell might be described as cgroup controllers for datacenters.

And so the admins would define a cell such as this below, which is the union of labels that the fleecer pod applies to a node, and that are well defined:

kind: Cell
apiVersion: v1
metadata:
    name: prod-trading
    daemonset: tunedaemonset
    annotations:
            psap.server.kubernetes.io/model: XYZ1234
            psap.network1.kubernetes.io/name: mgmt
            psap.network2.kubernetes.io/name: backup
            psap.network3.kubernetes.io/name: prod
            psap.passthrough.kubernetes.io/name: /dev/sdd,eth2
            psap.cpuset.mems.kubernetes.io/nodes: 1,exclusive
            psap.cpuset.cpus.kubernetes.io/cores: 2,4,6-11
            psap.memory.hugepages.kubernetes.io/hugetlb: 2MB
            psap.kernel.kubernetes.io/module: nvidia,onload
            psap.kernel.kubernetes.io/version: >= 3.10.0-229
            psap.tuned.kubernetes.io/profile: network-latency

And then ... the UX for a developer is very simple...just specify what cell you want to run on:

kind: Pod
apiVersion: v1
metadata:
    name: matching-engine
    cell: prod-trading
    annotations:
            psap.kernel.kubernetes.io/ulimit: n:1048576

I should also say that @ConnorDoyle's proposal covers over 90% of what we're envisioning, but there's at least one missing piece: a long-running daemon that ensures a node is "fit for service". A concept I'm calling in my head "extended-node-fitness".

In RHEL we have a daemon called "tuned" which handles system tuning based on workload. Tuned has a "verify" verb that goes through and ensures desired state matches what's in place. To avoid sprawl of infra pods/daemonsets I was thinking we could let this pod be long running and have the fitness test (in our case tuned) run.

There would need to be a call-back from this pod to the Kube scheduler if the fitness test ever failed (let's say, a rogue admin ssh'd in and flipped a few tunables that conflict with application needs). The tuned pod would inform the scheduler that it's out of sync and take some action:

  • re-enable the tuning
  • evict the node from the cluster
  • make it unschedulable or drain it
  • shut the node down hard
  • whatever else the app needs

Now ... I have a set of proposals and use-cases that center around delivering this and several other associated features. I'd love to talk with folks at KubeCon about it in more depth!

From @vikaschoudhary16:

+1 for cells. Additionally, I think it would be helpful if an API were available for the user to compose a cell from the k/v pairs (capabilities) she thinks are desired for her workload. Different cells could be composed for different workloads over the same nodes and then referenced accordingly in the pod spec.

I'm wondering what the concerns would be if we triggered node-feature-discovery when a cell is composed, to discover nodes with the capabilities mentioned in the cell and label them with the cell name.

Parameterize RDT detection binary location.

NFD shouldn't necessarily assume it runs inside the container we build by default. In sources.go:

RDTBin = "/go/src/github.com/kubernetes-incubator/node-feature-discovery/rdt-discovery"

The program could get the value from an environment variable instead (and the build could inject the value above when building the container).
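A sketch of that idea; the variable name RDT_BIN_DIR is an assumption, not an agreed interface:

package main

import (
    "fmt"
    "os"
)

// defaultRDTBin matches the path baked into the default container image.
const defaultRDTBin = "/go/src/github.com/kubernetes-incubator/node-feature-discovery/rdt-discovery"

// rdtBinDir prefers the environment override and falls back to the default.
func rdtBinDir() string {
    if dir := os.Getenv("RDT_BIN_DIR"); dir != "" {
        return dir
    }
    return defaultRDTBin
}

func main() {
    fmt.Println("using RDT helpers from:", rdtBinDir())
}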

Deployment script for NFD

A deployment/initialization script would make NFD deployment easier and hopefully less error-prone, especially for new users.

Tasks it could handle:

  • DaemonSet deployment
  • Job deployment
  • RBAC configuration
  • Configuration options
    • Common
      • which container image to use
      • k8s namespace
      • feature source whitelist flag (--sources)
      • feature label whitelist flag (--label-whitelist)
    • DaemonSet
      • sleep interval
  • Reset / de-initialization

Release version v0.1.0

Use the 0.1.0 release milestone to track burndown.

  • All OWNERS must LGTM the release proposal.
  • Update the job template to use the new tagged container image (PR #53)
  • An OWNER runs git tag -s $VERSION and inserts the changelog into the tag description.
  • Build and push a container image with the same tag to quay.io.
  • Update the :latest virtual tag in quay.io to track the last stable (this) release.
  • An OWNER pushes the tag with git push $VERSION.
  • Write the change log into the Github release info.
  • Add a link to the tagged release in this issue.
  • An announcement email is sent to [email protected] with the
    subject [ANNOUNCE] node-feature-discovery $VERSION is released
  • Close this issue.

Question about the status of NFD

I think that NFD is fundamental to the Kubernetes ecosystem. It will provide detailed cluster features to the Kubernetes scheduler.

IMO, many issues and PRs are pending because we need a beta release. So here is the question: when should we move NFD to beta?

  • What is the status of NFD?
  • What is the milestone blueprint for the beta release of NFD?

I am still willing to help develop and enhance NFD as a reviewer and maintainer, just like for many other satellite projects in the Kubernetes org.

can we add the value of cpu-architecture into the label format

Hi all,
I'm from ARM. Currently I want to enable NFD on the Arm platform,
but it seems there is no CPU architecture value in the labels.
So, can we redesign the label format?

For reference, there is a similar project in OpenStack called 'os_traits'.
Its trait format looks like this: 'HW_CPU_X86_AVX512ER', 'HW_CPU_X86_AVX512CD'.

Please see this link for more reference: https://docs.openstack.org/os-traits/latest/user/index.html

Validated NFD with a recent Kubernetes version (1.9.2), could add some doc notes

Here are some notes from "a newcomer run attempt" validating NFD on top of
a current k8s (1.9.2). Some doc (code?) changes could be made based on the findings.

System configuration:
4 nodes as KVM-style VMs on top of x86_64 HW
Distro on all nodes: CentOS 7, stock "minimal server" config
Added repositories for:

  • Kubernetes: packages.cloud.google.com/yum/
  • Docker: download.docker.com/linux/centos/7

Kernel: 3.10.0-693.17.1.el7.x86_64
Docker: docker-ce-17.12.0.ce-1.el7.centos.x86_64
kubectl/kubeadm/kubelet versions: 1.9.2
Initial cluster creation by "kubeadm init" and "kubeadm join"

  1. Building NFD does not work behind a proxy. This can be improved by PR #91; thanks to @marquiz for creating it.

  2. Tried a stand-alone run as in the README's "Command Line Interface" chapter. This chapter does not describe how to run NFD standalone. By checking the Dockerfile it becomes clear that container activation is enough. Mentioning this in the README would add clarity.

  3. Tried to run as a cluster job. label-nodes.sh schedules a pod run on every node that is in "Ready" state. In a default-config cluster, the master node is tainted NoSchedule, which results in one of the scheduled pods being doomed to fail. If a clean run is desired, we should either schedule NUM-1 jobs and leave the master out (the master will not get labels, but if it remains NoSchedule that is at least consistent), or the master config needs to be modified. We could add a note in the README, and/or a comment in label-nodes.sh?

  4. A job run on a node will detect features as designed, but the label-adding part will fail in a default-config cluster because RBAC (Role-Based Access Control) blocks the get/set operations. RBAC was added after NFD was written? The first error will be:

can't get pods: pods "node-feature-discovery-f2v8m" is forbidden: User "system:serviceaccount:default:default" cannot get pods in the namespace "default"

To solve the RBAC blockers:
Role access rights can be added using the audit2rbac tool, which requires auditing to be configured. Adding the roles takes 3 iterations to cover all stages of similar errors:

  • User cannot get pods in the namespace "default"
  • User cannot get nodes at the cluster scope
  • User cannot update nodes at the cluster scope

With the role rights added, NFD label adding starts to work.
The top-level README could mention the role rights dependency.

  5. The README states that "The discovery script will launch a job on each unlabeled node". But that does not seem to be the case: the label-nodes.sh script does not check labeled vs. unlabeled nodes. It launches a cluster job with NumNodes desired completions, which means the job is simply restarted on any Ready node.

  6. Mixed-success case: if some node is not able to complete a successful run, for example if the RBAC state (see point 4 above) is incorrect on some nodes and correct on others (this happens, for example, if roles have been added using audit2rbac and a new node joins afterwards), the cluster will run repeated NFD jobs on the "good" nodes in order to fulfill the number of desired completions, and will mark the overall job as successful. This is misleading and incorrect, as some nodes will remain unlabeled. To cover such mixed cases, more sophisticated logic like "give me exactly one good completion per node" would be required instead of a simple "give me N overall completions". A doc change, or a separate issue based on this? I'm not sure whether k8s provides an easy API for such scheduling.

Discover nodes with SSD/Rotational hard drives

It is often important for I/O-intensive workloads to utilize SSDs. Being able to select nodes with an SSD available will make operating Kubernetes clusters much easier.
For other workloads, a rotational hard drive may be good enough.
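A sketch, assuming the standard sysfs layout where /sys/block/<dev>/queue/rotational is "0" for SSDs/NVMe and "1" for spinning disks:

package main

import (
    "fmt"
    "os"
    "path/filepath"
    "strings"
)

// hasNonRotationalDisk reports whether any block device advertises itself
// as non-rotational in sysfs.
func hasNonRotationalDisk() bool {
    matches, _ := filepath.Glob("/sys/block/*/queue/rotational")
    for _, m := range matches {
        data, err := os.ReadFile(m)
        if err != nil {
            continue
        }
        if strings.TrimSpace(string(data)) == "0" {
            return true
        }
    }
    return false
}

func main() {
    fmt.Println("non-rotational storage present:", hasNonRotationalDisk())
}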

Consider moving to cpu flags

In a follow-up to #130, I started thinking: why can't we rely more on the kernel's CPU flags? That would give a more realistic view of what the node/kernel supports (HW support alone isn't necessarily what the node supports).

The additional benefit would be reduced dependencies and code maintenance: RDT detection tools and cpuid Go package would no longer be needed.

Improve SGX labels

Currently, the SGX label is set to true if the cpuid instruction says SGX is supported. However, the SGX leaf instructions may be disabled via the feature control register, making the label void (SGX=true but you cannot execute SGX enclaves).

Additionally, the SGX Launch Control feature bit should also be exposed as a label.

Create a SECURITY_CONTACTS file.

As per the email sent to kubernetes-dev[1], please create a SECURITY_CONTACTS
file.

The template for the file can be found in the kubernetes-template repository[2].
A description for the file is in the steering-committee docs[3], you might need
to search that page for "Security Contacts".

Please feel free to ping me on the PR when you make it, otherwise I will see when
you close this issue. :)

Thanks so much, let me know if you have any questions.

(This issue was generated from a tool, apologies for any weirdness.)

[1] https://groups.google.com/forum/#!topic/kubernetes-dev/codeiIoQ6QE
[2] https://github.com/kubernetes/kubernetes-template-project/blob/master/SECURITY_CONTACTS
[3] https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance-template-short.md

Discover node CPU hyper-threading status

This is a binary feature that can be detected by reading the /sys filesystem, so I thought I could add it as a next exercise. There are two values we could add: cpu-ht-capable and cpu-ht-enabled. The question, however, is which source should expose them: the existing cpuid one? It's not strictly on the same level as the other cpuid flags, right? Should there be another source, "cpu", or would that add confusion?
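For the "enabled" part, a sketch of one sysfs check: if any CPU lists more than one thread sibling, SMT/hyper-threading is in use (capability detection, and where the label should live, are exactly the open questions above):

package main

import (
    "fmt"
    "os"
    "path/filepath"
    "strings"
)

// htEnabled reports whether any CPU shares a core with a sibling thread.
func htEnabled() bool {
    matches, _ := filepath.Glob("/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list")
    for _, m := range matches {
        data, err := os.ReadFile(m)
        if err != nil {
            continue
        }
        // A lone CPU id (e.g. "3") means no sibling; "0-1" or "0,4" means SMT.
        if strings.ContainsAny(strings.TrimSpace(string(data)), ",-") {
            return true
        }
    }
    return false
}

func main() {
    fmt.Println("cpu-ht-enabled:", htEnabled())
}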

Detect kernel config flags

Detect (and advertise) kernel config flags, such as CONFIG_NO_HZ, CONFIG_PREEMPT etc. This feature source will be configurable, allowing the user to specify which kernel config flags are detected.

Initially, I thought this should be a non-binary label reflecting the true value of the kconfig flag. That is, the kconfig value would be used directly as the value of the node label – i.e. "y", "m", "32", "apparmor" etc. However, I'm not sure this is a good idea: the varying nature of these flags could cause more gray hairs than good for end users, because label selectors depend on exact values and a small change in the kconfig could cause pod affinity problems (no node satisfying the request anymore).

Alternatively, using binary labels raises some questions too: which kconfig flag values should be advertised as 'true'? One obvious answer is that only '=y' would be accepted, and any other value (e.g. '=m' or '="64"') would be ignored.
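A sketch of reading the flags from /proc/config.gz (only present when the kernel was built with CONFIG_IKCONFIG_PROC; /boot/config-$(uname -r) would be a common fallback, not shown here). Which flags to advertise would come from the user's configuration:

package main

import (
    "bufio"
    "compress/gzip"
    "fmt"
    "os"
    "strings"
)

// readKconfig parses /proc/config.gz into a map of CONFIG_* flags to values.
func readKconfig() (map[string]string, error) {
    f, err := os.Open("/proc/config.gz")
    if err != nil {
        return nil, err
    }
    defer f.Close()

    gz, err := gzip.NewReader(f)
    if err != nil {
        return nil, err
    }
    defer gz.Close()

    flags := map[string]string{}
    scanner := bufio.NewScanner(gz)
    for scanner.Scan() {
        line := strings.TrimSpace(scanner.Text())
        if line == "" || strings.HasPrefix(line, "#") {
            continue
        }
        if k, v, ok := strings.Cut(line, "="); ok {
            flags[k] = strings.Trim(v, `"`) // e.g. CONFIG_NO_HZ=y, CONFIG_PREEMPT=y
        }
    }
    return flags, scanner.Err()
}

func main() {
    flags, err := readKconfig()
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Println("CONFIG_NO_HZ =", flags["CONFIG_NO_HZ"])
}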

Remove version string from published label key

There has been feedback from many people that including the version in the published label is bad because it causes affinity annotations in pod specs to be invalidated upon upgrading the version of node-feature-discovery.
