
node-feature-discovery's Introduction

Node Feature Discovery


Welcome to Node Feature Discovery – a Kubernetes add-on for detecting hardware features and system configuration!

See our Documentation for detailed instructions and reference.

Quick-start – the short-short version

$ kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.15.4
  namespace/node-feature-discovery created
  customresourcedefinition.apiextensions.k8s.io/nodefeaturerules.nfd.k8s-sigs.io created
  customresourcedefinition.apiextensions.k8s.io/nodefeatures.nfd.k8s-sigs.io created
  serviceaccount/nfd-gc created
  serviceaccount/nfd-master created
  serviceaccount/nfd-worker created
  role.rbac.authorization.k8s.io/nfd-worker created
  clusterrole.rbac.authorization.k8s.io/nfd-gc created
  clusterrole.rbac.authorization.k8s.io/nfd-master created
  rolebinding.rbac.authorization.k8s.io/nfd-worker created
  clusterrolebinding.rbac.authorization.k8s.io/nfd-gc created
  clusterrolebinding.rbac.authorization.k8s.io/nfd-master created
  configmap/nfd-master-conf created
  configmap/nfd-worker-conf created
  deployment.apps/nfd-gc created
  deployment.apps/nfd-master created
  daemonset.apps/nfd-worker created

$ kubectl -n node-feature-discovery get all
  NAME                              READY   STATUS    RESTARTS   AGE
  pod/nfd-gc-565fc85d9b-94jpj       1/1     Running   0          18s
  pod/nfd-master-6796d89d7b-qccrq   1/1     Running   0          18s
  pod/nfd-worker-nwdp6              1/1     Running   0          18s
...

$ kubectl get no -o json | jq '.items[].metadata.labels'
  {
    "kubernetes.io/arch": "amd64",
    "kubernetes.io/os": "linux",
    "feature.node.kubernetes.io/cpu-cpuid.ADX": "true",
    "feature.node.kubernetes.io/cpu-cpuid.AESNI": "true",
...

node-feature-discovery's People

Contributors

adrianchiris, arangogutierrez, balajismaniam, connordoyle, dbaker-rh, dependabot[bot], ffromani, fidencio, fmuyassarov, jjacobelli, jlojosnegros, jschintag, k8s-ci-robot, marquiz, mbssaiakhil, mythi, nfd-merge-bot, okartau, ozhuraki, piotrprokop, sajiyah-salat, spiffxp, stek29, swatisehgal, tal-or, tessaio, testwill, uniemimu, vaibhav2107, yselkowitz


node-feature-discovery's Issues

Consider expanding the scope to discovering/publishing opaque integer resources.

Per the original NFD scope document that was circulated at project incubation time, this project only handles "binary" (there-or-not) attributes.

Since then, alpha support for opaque integer resources was added to Kubernetes. See Kubernetes PR 31652.

Should we expand the scope to allow NFD sources to advertise new kinds of consumable resources?

For one concrete usage example, consider extending issue #35 (Discover SR-IOV capable network interfaces) to allow allocatable virtual functions. In this case, there is work in progress to isolate VFs in a CNI plugin.

Graduate from incubator

Criteria for graduating from the incubator.

Incubation date: 29 August 2016
Incubation deadline: 29 August 2017

When the OWNERS of the Incubator project determine they have met the criteria to graduate they should contact their Champion and Sponsor to discuss. If both the Sponsor and Champion agree that the above criteria have been met the project can graduate to become a Kubernetes Community Project and exit the Kubernetes Incubator.

TODO: Link sub-issues to gather evidence for each topic below.

  • Documented users: the project tracks evidence from downloads, mailing lists, Twitter, blogs, GitHub issues, etc that the project has gained a user base.
  • Regular releases: the project is making releases at least once every 3 months, although releases should happen more often.
  • Incubation time: a minimum of 6 months has passed since incubation (March 2017)
  • Healthy community: a healthy number of users, contributors, and/or OWNERS outside of your own company have participated in the project. The Sponsor and Champion can make this call.
  • Roadmap: A roadmap for the next release is maintained; this can be in the form of a doc, GitHub milestones, or any other tool but it must be documented in the README.
  • An announcement of graduation must be sent to the [email protected] mailing list by the Champion.

Detect IOMMU

Detect if IOMMU is present (supported and enabled, by HW and the kernel).
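A minimal sketch of one possible check, assuming the usual sysfs layout where an enabled IOMMU shows up as entries under /sys/class/iommu (the actual NFD detection logic may differ):

package main

import (
    "fmt"
    "os"
)

// iommuEnabled reports whether the kernel exposes any IOMMU devices in sysfs.
func iommuEnabled() (bool, error) {
    entries, err := os.ReadDir("/sys/class/iommu")
    if err != nil {
        if os.IsNotExist(err) {
            return false, nil // no IOMMU class directory at all
        }
        return false, err
    }
    return len(entries) > 0, nil
}

func main() {
    enabled, err := iommuEnabled()
    if err != nil {
        fmt.Fprintln(os.Stderr, "iommu detection failed:", err)
        os.Exit(1)
    }
    fmt.Println("iommu enabled:", enabled)
}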

mock*.go (generated) files process undocumented

Being new to the project, I tried to improve the gofmt/golint metrics in the mock*.go files and was told these files are generated and manual changes should be avoided. There is no mention of generated files in the documentation, and I don't find any code in the repo that deals with the generation. Who generates those files, when and how, and what triggers a refresh?
Doesn't an (undocumented) manual generate-and-commit process create a risk that the mock*.go files get out of sync?
Proposals:

  • Can we document the mock*.go file maintenance process?
  • Idea: avoid storing mock* files as source elements and instead
    generate them on the fly when needed, if possible?

P.S.
mockapihelpers.go has a few commits that touch only small parts, so they look like manual edits. With manual changes, is that file now in a mixed state?

"Failure updating labels" sometimes happens, exits daemonset-mode pod

When a one-shot job is turned into a daemon, a few things should be double-checked: resource leaks and stability (i.e. exit points), as these are minor issues for a one-shot run but more serious for a daemon.
K8s is good at restarting pods, but that's no excuse for low quality.
So I keep the NFD DaemonSet running to check for both.
There is a weak sign of a slow memory-growth pattern, but it may level off over a longer run, so I planned to keep it up for long-term monitoring. No luck there: the nfd pod exits and gets restarted about once per 24 hours. Next, I shortened the cycle time from 60 seconds hoping to see the memory pattern more quickly, but that does not work either, as the exit rate follows suit: with a 1-second cycle I see one exit roughly every hour.

I ran the pod log in -f mode to capture the exit messages:

2018/04/18 11:00:27 can't update node: Operation cannot be fulfilled on nodes "k8-vm-2": the object has been modified; please apply your changes to the latest version and try again
2018/04/18 11:00:27 failed to advertise labels: Operation cannot be fulfilled on nodes "k8-vm-2": the object has been modified; please apply your changes to the latest version and try again
2018/04/18 11:00:27 error occurred while updating node with feature labels: Operation cannot be fulfilled on nodes "k8-vm-2": the object has been modified; please apply your changes to the latest version and try again

This issue has likely always been there, but the probability of hitting it was low, so it had not surfaced before.
It seems there is a window during which a node label update is bound to fail?
Can that state be detected somehow so that we don't attempt the update?
Or should we retry after some delay once such a condition is hit?
Or is this a sign of some other problem elsewhere?
In any case, we should try to do better than exit, now that we can run as a DaemonSet.
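The error above is the standard optimistic-concurrency conflict from the API server: the cached Node object is stale by the time the update is sent. One possible shape of the retry approach, sketched with recent client-go (package, function, and variable names here are illustrative, not NFD's actual code):

package labeler

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/util/retry"
)

// advertiseLabels re-fetches the Node before every attempt so a conflict
// simply triggers another try with a fresh resourceVersion.
func advertiseLabels(cli kubernetes.Interface, nodeName string, labels map[string]string) error {
    return retry.RetryOnConflict(retry.DefaultRetry, func() error {
        node, err := cli.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
        if err != nil {
            return err
        }
        for k, v := range labels {
            node.Labels[k] = v
        }
        _, err = cli.CoreV1().Nodes().Update(context.TODO(), node, metav1.UpdateOptions{})
        return err // a Conflict error makes RetryOnConflict call us again
    })
}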

Detection of nodes in Ready state in label-nodes.sh fails too easily

label-nodes.sh determines the nodes in Ready state using this:
kubectl get nodes | grep -i ready
but this parsing is too simplistic.
A few examples of status lines that produce a false "ready" match
in the context of what label-nodes.sh expects:

node-1 NotReady 164d v1.10.2
node-2 Ready,SchedulingDisabled 165d v1.10.2

As a first idea, I propose a small parsing change that
accepts only a clean "ready" with a space on each side:

kubectl get nodes | grep -i ' ready '

Note that this may leave out nodes where some other non-blocking attribute
is appended, so we may need something more elaborate.
This whole node detection would be cleaner through a deterministic API instead of parsing status lines,
but is there any readily available command that doesn't require writing even more parsing code? At least
I don't see a better way using kubectl options.
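For the "deterministic API" direction, a hedged sketch with recent client-go that checks the NodeReady condition directly instead of grepping kubectl output (package and function names are made up for illustration):

package nodeutil

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// readyNodes returns the names of nodes whose NodeReady condition is True,
// regardless of extra status text such as SchedulingDisabled.
func readyNodes(cli kubernetes.Interface) ([]string, error) {
    nodes, err := cli.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
    if err != nil {
        return nil, err
    }
    var ready []string
    for _, n := range nodes.Items {
        for _, cond := range n.Status.Conditions {
            if cond.Type == corev1.NodeReady && cond.Status == corev1.ConditionTrue {
                ready = append(ready, n.Name)
                break
            }
        }
    }
    return ready, nil
}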

Discover TDP of the CPU

Detect the TDP of the CPU package of the node.

This feature label would require enabling non-binary labels.

Container image size is huge

With the build instructions given, the resulting docker container becomes 1.2GB in size. This is huge.

Improve the Dockerfile to only ship the resulting node-feature-discovery and whatever else may be needed and drop the build environment parts.

Detect if "die clustering" is enabled

Cluster-on-die (COD) and sub-NUMA-Cluster (SNC) are technologies where a physical NUMA node is further split into separate NUMA domains, taking advantage of multiple memory controllers per physical CPU socket. COD and SNC improve the performance of highly NUMA optimized workloads.

These settings are not directly detectable from Linux user space. However, we can detect the number of NUMA nodes and the number of physical CPU package IDs. If the number of NUMA nodes is greater than the number of physical packages, it is likely that COD/SNC is enabled. There are corner cases like NUMA emulation (which isn't likely to be used in a k8s cluster).
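A rough sketch of that heuristic, assuming the standard sysfs paths (counting NUMA node directories and distinct physical_package_id values):

package main

import (
    "fmt"
    "os"
    "path/filepath"
    "strings"
)

// countNUMANodes counts nodeN directories under the NUMA sysfs hierarchy.
func countNUMANodes() int {
    matches, _ := filepath.Glob("/sys/devices/system/node/node[0-9]*")
    return len(matches)
}

// countPhysicalPackages counts distinct physical CPU package IDs.
func countPhysicalPackages() int {
    ids := map[string]bool{}
    matches, _ := filepath.Glob("/sys/devices/system/cpu/cpu[0-9]*/topology/physical_package_id")
    for _, m := range matches {
        data, err := os.ReadFile(m)
        if err != nil {
            continue
        }
        ids[strings.TrimSpace(string(data))] = true
    }
    return len(ids)
}

func main() {
    numa, pkgs := countNUMANodes(), countPhysicalPackages()
    fmt.Printf("NUMA nodes: %d, physical packages: %d, COD/SNC likely: %v\n",
        numa, pkgs, numa > pkgs)
}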

Plugin mechanism for customer-specific labels

What: provide a way for customers to write their own feature detectors, creating custom labels.

Originally I thought it would be sufficient to implement custom feature sources as user-specific modules under the source/ directory in the nfd source code. The reasoning was that the source modules and their API towards the "nfd core" are simple, so writing and maintaining a custom plugin would not be too cumbersome. However, this would effectively mean forking nfd, requiring you to maintain your own code base and to build and use custom Docker images.

So I was thinking about providing a relatively simple mechanism for custom feature detectors, without the need to fork the code or use custom Docker images.

The first, obvious idea would be to use hooks: e.g. look under /etc/kubernetes/node-feature-discovery/source.d/ and run all executable binaries found there. The hooks would write the detected features to stdout and log to stderr.

E.g.
running /etc/kubernetes/node-feature-discovery/source.d/my-source would have stdout output:

FEATURE_1
FEATURE_2

which, in turn, would cause nfd to create labels

node.alpha.kubernetes-incubator.io/nfd-my-source-FEATURE_1=true
node.alpha.kubernetes-incubator.io/nfd-my-source-FEATURE_2=true

This could easily be extended to non-binary labels, too.
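An illustrative sketch of the hook idea only; the directory path and label prefix come from the proposal above, and nothing here is an implemented NFD API:

package main

import (
    "bufio"
    "bytes"
    "fmt"
    "os"
    "os/exec"
    "path/filepath"
)

const hookDir = "/etc/kubernetes/node-feature-discovery/source.d"

// runHooks executes every entry in hookDir and turns each stdout line into a
// boolean label. (A real implementation would also check the executable bit.)
func runHooks() (map[string]string, error) {
    labels := map[string]string{}
    entries, err := os.ReadDir(hookDir)
    if err != nil {
        return nil, err
    }
    for _, e := range entries {
        hook := e.Name()
        out, err := exec.Command(filepath.Join(hookDir, hook)).Output() // stderr is left for logging
        if err != nil {
            fmt.Fprintf(os.Stderr, "hook %s failed: %v\n", hook, err)
            continue
        }
        // Each stdout line is a discovered feature name.
        scanner := bufio.NewScanner(bytes.NewReader(out))
        for scanner.Scan() {
            feature := scanner.Text()
            if feature == "" {
                continue
            }
            labels["node.alpha.kubernetes-incubator.io/nfd-"+hook+"-"+feature] = "true"
        }
    }
    return labels, nil
}

func main() {
    labels, err := runHooks()
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Println(labels)
}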

Spinning up of large number of erroneous pods

In case of some configuration errors, e.g. an SSL misconfiguration, the Kubernetes job fails to complete successfully. It keeps creating new pods and eventually ends up with ~5000 pods in the Error state.
It should stop or time out. A simple fix could be setting the RestartPolicy to OnFailure or Never. Are there any other recommendations for handling this error gracefully?

moving to v1.0 beta release

Before we move node-feature-discovery to a beta release, we need to finish the items below.
Note: this checklist is not exhaustive and may change in the future.

  • move import path from github.com/kubernetes-incubator to k8s.io.
  • make an agreement on the label schema about node-feature-discovery. #71
  • adjust node-feature-discovery to meet the requirements to be a Kubernetes repo described in kubernetes-template-project.
  • make an agreement on the release schedule (v1.0 beta and GA) for node-feature-discovery.
  • update client-go to newest stable version.
  • replace glide with godep.
  • decide which registry (quay.io or gcr.io) to use.
  • change node-feature-discovery image namespace from kubernetes-incubator to kubernetes.
    /cc @ConnorDoyle @balajismaniam

org-wide CNCF CLA bot on kubernetes-incubator

Drawing attention to: https://groups.google.com/forum/#!topic/kubernetes-dev/Q9eNHdkIdQ0
As of tomorrow, the application of cncf labels and posting of the message in case someone has not signed it should be handled by our org-wide bot, and I would suggest that you disable the "cla" munger on the fork of mungegithub that you are currently running tomorrow evening, after we deploy. Otherwise, you're likely to see two sets of bots doing the same thing and possible interaction between them.

cc @balajismaniam @ConnorDoyle

Output currently going to stderr

Most (all?) of the print statements use log.Printf which prints to stderr by default. I would expect it to print to stdout for non-errors.
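One simple option with the standard library logger (a sketch; the project might instead prefer separate loggers for errors and informational output):

package main

import (
    "log"
    "os"
)

func main() {
    // Send the default logger's output to stdout; errors could use a
    // separate logger that still writes to stderr.
    log.SetOutput(os.Stdout)
    log.Printf("feature discovery completed")
}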

Detect Kernel and OS version

These would help in placement of workloads which are dependent on certain OS or kernel version.

I proposed implementing these in kubelet (Issue #65850), but that request seems unlikely to be accepted. This feature could be implemented in NFD instead.

Kernel version can be read from /proc/sys/kernel/osrelease. The kernel version would be split into multiple labels to make it more practical to use, since label selectors need exact values and no comparison operators (e.g. "greater than") can be used. Suggested labels, with example values:

node.alpha.kubernetes-incubator.io/nfd-kernel-version=4.4.0-112-generic
node.alpha.kubernetes-incubator.io/nfd-kernel-version-x=4
node.alpha.kubernetes-incubator.io/nfd-kernel-version-xy=4.4
node.alpha.kubernetes-incubator.io/nfd-kernel-version-xyz=4.4.0

An initial RFD implementation of kernel version detection is available at:
https://github.com/marquiz/node-feature-discovery/tree/feature/kernel-version

OS version would be read from /etc/os-release. At least ID and VERSION_ID should be advertised, I think:

node.alpha.kubernetes-incubator.io/nfd-os-release-ID=ubuntu
node.alpha.kubernetes-incubator.io/nfd-os-release-VERSION_ID=16.04

An initial RFD implementation of OS version detection is available at:
https://github.com/marquiz/node-feature-discovery/tree/feature/os-release
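A sketch of the kernel-version splitting described above (label prefix copied from the proposal; the /etc/os-release part is analogous and omitted):

package main

import (
    "fmt"
    "os"
    "strings"
)

const prefix = "node.alpha.kubernetes-incubator.io/nfd-kernel-version"

// kernelVersionLabels reads the running kernel version and splits it into
// full, x, x.y and x.y.z labels.
func kernelVersionLabels() (map[string]string, error) {
    raw, err := os.ReadFile("/proc/sys/kernel/osrelease")
    if err != nil {
        return nil, err
    }
    full := strings.TrimSpace(string(raw)) // e.g. "4.4.0-112-generic"
    parts := strings.SplitN(strings.SplitN(full, "-", 2)[0], ".", 3)

    labels := map[string]string{prefix: full}
    if len(parts) > 0 {
        labels[prefix+"-x"] = parts[0]
    }
    if len(parts) > 1 {
        labels[prefix+"-xy"] = parts[0] + "." + parts[1]
    }
    if len(parts) > 2 {
        labels[prefix+"-xyz"] = strings.Join(parts, ".")
    }
    return labels, nil
}

func main() {
    labels, err := kernelVersionLabels()
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Println(labels)
}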

Graceful failure if discovery from a source fails

Currently, because of this line, whenever a source returns an error from its Discover() call, nfd fails.

Would it make sense to log the failure and proceed to discover features and add labels from the sources nfd is able to handle, instead of hard-failing on any source failure?
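One possible shape of the log-and-continue behaviour (FeatureSource here is a simplified stand-in interface, not the project's actual type):

package source

import "log"

// FeatureSource is a simplified stand-in for a feature detector.
type FeatureSource interface {
    Name() string
    Discover() ([]string, error)
}

// discoverAll logs failures per source and keeps going instead of aborting
// the whole discovery run on the first error.
func discoverAll(sources []FeatureSource) map[string]string {
    labels := map[string]string{}
    for _, s := range sources {
        features, err := s.Discover()
        if err != nil {
            log.Printf("discovery failed for source %q: %v (continuing)", s.Name(), err)
            continue
        }
        for _, f := range features {
            labels[s.Name()+"-"+f] = "true"
        }
    }
    return labels
}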

Release v0.2.0

There hasn't been a released version of NFD in nearly one and a half years. There have been some major changes and fixes since v0.1.0. I propose to release the current version as v0.2.0.

Release checklist:

  • All OWNERS must LGTM the release proposal.
  • Update the job template to use the new tagged container image
  • An OWNER runs git tag -s $VERSION and inserts the changelog into the tag description.
  • Build and push a container image with the same tag to quay.io.
  • Update the :latest virtual tag in quay.io to track the last stable (this) release.
  • An OWNER pushes the tag with git push $VERSION.
  • Write the change log into the Github release info.
  • Add a link to the tagged release in this issue.
  • An announcement email is sent to [email protected] with the
    subject [ANNOUNCE] node-feature-discovery $VERSION is released
  • Close this issue.

Changelog

New features

  • Add support for Kubernetes 1.8+ (#82)
  • Adding SR-IOV capability discovery to node-feature-discovery (#49)
  • Enable nfd framework on Arm64 platform
  • Added nonrotational storage detection
  • Add memory source and NUMA detection.
  • Export proxy env vars inside docker build. Makes it possible to build node-feature-discovery from behind a proxy server, e.g. in corporate networks.
  • Advertise selinux status by adding labels
  • Make it possible to run nfd as a DaemonSet (#105)
  • Added more Intel RDT capability discovery: CMT,MBM,MBA (#120)
  • Added template spec for configuring RBAC (#126)

Misc fixes

  • Fix to report the correct version inside container.
  • Updated RDT Discovery() to use exit status
  • Clean-up RDT helper programs.
  • Improved unit test coverage
  • Add GoReport Card Widget
  • Update Community Code of Conduct
  • Improve GoReportCard metrics, both gofmt and golint (#90)
  • Get node name from spec.nodeName instead of indirectly from pod. (#93)
  • Remove hardcoding of nfd source path (#94)
  • Use a specific released version of intel-cmt-cat

Known issues

  • Occasional restarts when run as a DaemonSet (issue #122).

Explore design for cells (named node classes)

As discussed in Kubernetes issue #28311,

From @jeremyeder:

Here's what I'd propose. The concept of a cell (bikeshed the name...)

Labels are dynamically adjusted per the system/node capabilities by the "fleecing pod", as per @ConnorDoyle's proposal. Next is how to express that in a way that keeps the UX under control.

Propose to de-couple into two parts; one being sort of a cluster administrator job -- defining what a cell contains. It could be a logical group of nodes. A cell could be exclusive or there could be overlap. Not sure how else to describe it, but a cell might be described as cgroup controllers for datacenters.

And so the admins would define a cell such as this below, which is the union of labels that the fleecer pod applies to a node, and that are well defined:

kind: Cell
apiVersion: v1
metadata:
    name: prod-trading
    daemonset: tunedaemonset
    annotations:
            psap.server.kubernetes.io/model: XYZ1234
            psap.network1.kubernetes.io/name: mgmt
            psap.network2.kubernetes.io/name: backup
            psap.network3.kubernetes.io/name: prod
            psap.passthrough.kubernetes.io/name: /dev/sdd,eth2
            psap.cpuset.mems.kubernetes.io/nodes: 1,exclusive
            psap.cpuset.cpus.kubernetes.io/cores: 2,4,6-11
            psap.memory.hugepages.kubernetes.io/hugetlb: 2MB
            psap.kernel.kubernetes.io/module: nvidia,onload
            psap.kernel.kubernetes.io/version: >= 3.10.0-229
            psap.tuned.kubernetes.io/profile: network-latency

And then ... the UX for a developer is very simple...just specify what cell you want to run on:

kind: Pod
apiVersion: v1
metadata:
    name: matching-engine
    cell: prod-trading
    annotations:
            psap.kernel.kubernetes.io/ulimit: n:1048576

I should also say that @ConnorDoyle's proposal covers over 90% of what we're envisioning, but there's at least one missing piece: a long-running daemon that ensures a node is "fit for service". A concept I'm calling in my head "extended-node-fitness".

In RHEL we have a daemon called "tuned" which handles system tuning based on workload. Tuned has a "verify" verb that goes through and ensures desired state matches what's in place. To avoid sprawl of infra pods/daemonsets I was thinking we could let this pod be long running and have the fitness test (in our case tuned) run.

There would need to be a call-back from this pod to the Kube scheduler if the fitness test ever failed (let's say, a rogue admin ssh'd in and flipped a few tunables that conflict with application needs). The tuned pod would inform the scheduler that it's out of sync and take some action:

  • re-enable the tuning
  • evict the node from the cluster
  • make it unschedulable or drain it
  • shut the node down hard
  • whatever else the app needs

Now ... I have a set of proposals and use-cases that center around delivering this and several other associated features. I'd love to talk with folks at KubeCon about it in more depth!

From @vikaschoudhary16:

+1 for cells. Additionally, I think it would be helpful if an API were available for the user to compose a cell from the k/v pairs (capabilities) she thinks are desired for her workload. Different cells could be composed for different workloads over the same nodes and then referenced accordingly in the pod spec.

I'm wondering what the concerns would be if we triggered node-feature-discovery when a cell is composed, to discover nodes with the capabilities mentioned in the cell and label them with the cell name.

Parameterize RDT detection binary location.

NFD shouldn't necessarily assume it runs inside the container we build by default. In sources.go:

RDTBin = "/go/src/github.com/kubernetes-incubator/node-feature-discovery/rdt-discovery"

The program could get the value from an environment variable instead (and the build could inject the value above when building the container).
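A sketch of that idea; the variable name RDT_BIN_DIR is an assumption, not an agreed interface:

package main

import (
    "fmt"
    "os"
)

// defaultRDTBin matches the path baked into the default container image.
const defaultRDTBin = "/go/src/github.com/kubernetes-incubator/node-feature-discovery/rdt-discovery"

// rdtBinDir prefers the environment override and falls back to the default.
func rdtBinDir() string {
    if dir := os.Getenv("RDT_BIN_DIR"); dir != "" {
        return dir
    }
    return defaultRDTBin
}

func main() {
    fmt.Println("using RDT helpers from:", rdtBinDir())
}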

Deployment script for NFD

A deployment/initialization script would make NFD deployment easier and hopefully less error-prone, especially for new users.

Tasks it could handle:

  • DaemonSet deployment
  • Job deployment
  • RBAC configuration
  • Configuration options
    • Common
      • which container image to use
      • k8s namespace
      • feature source whitelist flag (--sources)
      • feature label whitelist flag (--label-whitelist)
    • DaemonSet
      • sleep interval
  • Reset / de-initialization

Release version v0.1.0

Use the 0.1.0 release milestone to track burndown.

  • All OWNERS must LGTM the release proposal.
  • Update the job template to use the new tagged container image (PR #53)
  • An OWNER runs git tag -s $VERSION and inserts the changelog into the tag description.
  • Build and push a container image with the same tag to quay.io.
  • Update the :latest virtual tag in quay.io to track the last stable (this) release.
  • An OWNER pushes the tag with git push $VERSION.
  • Write the change log into the Github release info.
  • Add a link to the tagged release in this issue.
  • An announcement email is sent to [email protected] with the
    subject [ANNOUNCE] node-feature-discovery $VERSION is released
  • Close this issue.

Question about the status of NFD

I think that NFD is fundamental to the Kubernetes ecosystem. It will provide detailed cluster features to the Kubernetes scheduler.

IMO, many issues and PRs are pending because we need a beta release. So here is the question: when should we move NFD to beta?

  • What is the status of NFD?
  • What is the milestone blueprint for the beta release of NFD?

I am still willing to help develop and enhance NFD as a reviewer and maintainer, just like for many other satellite projects in the Kubernetes org.

can we add the value of cpu-architecture into the label format

Hi all,
I'm from ARM. Currently I want to enable NFD on the Arm platform,
but it seems there is no CPU architecture value in the labels.
So, can we redesign the label format?

For reference, there is a similar project in OpenStack called 'os_traits'.
Its trait format looks like this: 'HW_CPU_X86_AVX512ER', 'HW_CPU_X86_AVX512CD'.

Please see this link for more reference: https://docs.openstack.org/os-traits/latest/user/index.html

Validated NFD with a recent Kubernetes version (1.9.2), could add some doc notes

Here are some notes from "a newcomer run attempt" validating NFD on top of
a current k8s (1.9.2). Some doc (code?) changes could be made based on the findings.

System configuration:
4 nodes as KVM-style VMs on top of x86_64 HW
Distro on all nodes: CentOS 7, stock "minimal server" config
Added repositories for:

  • Kubernetes: packages.cloud.google.com/yum/
  • Docker: download.docker.com/linux/centos/7

Kernel: 3.10.0-693.17.1.el7.x86_64
Docker: docker-ce-17.12.0.ce-1.el7.centos.x86_64
kubectl/kubeadm/kubelet versions: 1.9.2
Initial cluster creation by "kubeadm init" and "kubeadm join"

  1. Building NFD does not work behind a proxy. This can be improved by PR #91; thanks to @marquiz for creating it.

  2. Tried a stand-alone run as in the README's "Command Line Interface" chapter. This chapter does not describe how to run NFD standalone. By checking the Dockerfile it becomes clear that container activation is enough. Mentioning this in the README would add clarity.

  3. Tried to run as a cluster job. label-nodes.sh schedules a pod run on every node that is in "Ready" state. In a default-config cluster, the master node is tainted NoSchedule, which results in one of the scheduled pods being doomed to fail. If a clean run is desired, we should either schedule NUM-1 jobs and leave the master out (the master will not get labels, but if it remains NoSchedule that is at least consistent), or the master config needs to be modified. We could add a note in the README, and/or a comment in label-nodes.sh?

  4. A job run on a node will detect features as designed, but the label-adding part will fail in a default-config cluster because RBAC (Role-Based Access Control) blocks the get/set operations. RBAC was added after NFD was written? The first error will be:

can't get pods: pods "node-feature-discovery-f2v8m" is forbidden: User "system:serviceaccount:default:default" cannot get pods in the namespace "default"

To solve the RBAC blockers:
Role access rights can be added using the audit2rbac tool, which requires auditing to be configured. Adding the roles takes 3 iterations to cover all stages of similar errors:

  • User cannot get pods in the namespace "default"
  • User cannot get nodes at the cluster scope
  • User cannot update nodes at the cluster scope

With the role rights added, NFD label adding starts to work.
The top-level README could mention the role rights dependency.

  5. The README states that "The discovery script will launch a job on each unlabeled node". But that does not seem to be the case: the label-nodes.sh script does not check labeled vs. unlabeled nodes. It launches a cluster job with NumNodes desired completions, which means the job is simply restarted on any Ready node.

  6. Mixed-success case: if some node is not able to complete a successful run, for example if the RBAC state (see point 4 above) is incorrect on some nodes and correct on others (this happens, for example, if roles have been added using audit2rbac and a new node joins afterwards), the cluster will run repeated NFD jobs on the "good" nodes in order to fulfill the number of desired completions, and will mark the overall job as successful. This is misleading and incorrect, as some nodes will remain unlabeled. To cover such mixed cases, more sophisticated logic like "give me exactly one good completion per node" would be required instead of a simple "give me N overall completions". A doc change, or a separate issue based on this? I'm not sure whether k8s provides an easy API for such scheduling.

Discover nodes with SSD/Rotational hard drives

It is often important for I/O-intensive workloads to utilize SSDs. Being able to select nodes with an SSD available will make operating Kubernetes clusters much easier.
For other workloads, a rotational hard drive may be good enough.
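A sketch, assuming the standard sysfs layout where /sys/block/<dev>/queue/rotational is "0" for SSDs/NVMe and "1" for spinning disks:

package main

import (
    "fmt"
    "os"
    "path/filepath"
    "strings"
)

// hasNonRotationalDisk reports whether any block device advertises itself
// as non-rotational in sysfs.
func hasNonRotationalDisk() bool {
    matches, _ := filepath.Glob("/sys/block/*/queue/rotational")
    for _, m := range matches {
        data, err := os.ReadFile(m)
        if err != nil {
            continue
        }
        if strings.TrimSpace(string(data)) == "0" {
            return true
        }
    }
    return false
}

func main() {
    fmt.Println("non-rotational storage present:", hasNonRotationalDisk())
}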

Consider moving to cpu flags

In a follow-up to #130, I started thinking: why can't we rely more on the kernel's CPU flags? That would give a more realistic view of what the node/kernel supports (HW support alone isn't necessarily what the node supports).

The additional benefit would be reduced dependencies and code maintenance: RDT detection tools and cpuid Go package would no longer be needed.

Improve SGX labels

Currently, the SGX label is set to true if the cpuid instruction says SGX is supported. However, the SGX leaf instructions may be disabled via the feature control register, making the label void (SGX=true but you cannot execute SGX enclaves).

Additionally, the SGX Launch Control feature bit should also be exposed as a label.

Create a SECURITY_CONTACTS file.

As per the email sent to kubernetes-dev[1], please create a SECURITY_CONTACTS
file.

The template for the file can be found in the kubernetes-template repository[2].
A description for the file is in the steering-committee docs[3], you might need
to search that page for "Security Contacts".

Please feel free to ping me on the PR when you make it, otherwise I will see when
you close this issue. :)

Thanks so much, let me know if you have any questions.

(This issue was generated from a tool, apologies for any weirdness.)

[1] https://groups.google.com/forum/#!topic/kubernetes-dev/codeiIoQ6QE
[2] https://github.com/kubernetes/kubernetes-template-project/blob/master/SECURITY_CONTACTS
[3] https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance-template-short.md

Discover node CPU hyper-threading status

This is a binary feature that can be detected by reading the /sys filesystem, so I thought I could add it as a next exercise. There are two values we could add: cpu-ht-capable and cpu-ht-enabled. The question, however, is which source should expose them: the existing cpuid one? It's not strictly on the same level as the other cpuid flags, right? Should there be another source, "cpu", or would that add confusion?
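For the "enabled" part, a sketch of one sysfs check: if any CPU lists more than one thread sibling, SMT/hyper-threading is in use (capability detection, and where the label should live, are exactly the open questions above):

package main

import (
    "fmt"
    "os"
    "path/filepath"
    "strings"
)

// htEnabled reports whether any CPU shares a core with a sibling thread.
func htEnabled() bool {
    matches, _ := filepath.Glob("/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list")
    for _, m := range matches {
        data, err := os.ReadFile(m)
        if err != nil {
            continue
        }
        // A lone CPU id (e.g. "3") means no sibling; "0-1" or "0,4" means SMT.
        if strings.ContainsAny(strings.TrimSpace(string(data)), ",-") {
            return true
        }
    }
    return false
}

func main() {
    fmt.Println("cpu-ht-enabled:", htEnabled())
}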

Detect kernel config flags

Detect (and advertise) kernel config flags, such as CONFIG_NO_HZ, CONFIG_PREEMPT etc. This feature source will be configurable, allowing the user to specify which kernel config flags are detected.

Initially, I thought this should be a non-binary label reflecting the true value of the kconfig flag. That is, the kconfig value would be used directly as the value of the node label – i.e. "y", "m", "32", "apparmor" etc. However, I'm not sure this is a good idea: the varying nature of these flags could cause more gray hairs than good for end users, because label selectors depend on exact values and a small change in the kconfig could cause pod affinity problems (no node satisfying the request anymore).

Alternatively, using binary labels raises some questions too: which kconfig flag values should be advertised as 'true'? One obvious answer is that only '=y' would be accepted, and any other value (e.g. '=m' or '="64"') would be ignored.
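A sketch of reading the flags from /proc/config.gz (only present when the kernel was built with CONFIG_IKCONFIG_PROC; /boot/config-$(uname -r) would be a common fallback, not shown here). Which flags to advertise would come from the user's configuration:

package main

import (
    "bufio"
    "compress/gzip"
    "fmt"
    "os"
    "strings"
)

// readKconfig parses /proc/config.gz into a map of CONFIG_* flags to values.
func readKconfig() (map[string]string, error) {
    f, err := os.Open("/proc/config.gz")
    if err != nil {
        return nil, err
    }
    defer f.Close()

    gz, err := gzip.NewReader(f)
    if err != nil {
        return nil, err
    }
    defer gz.Close()

    flags := map[string]string{}
    scanner := bufio.NewScanner(gz)
    for scanner.Scan() {
        line := strings.TrimSpace(scanner.Text())
        if line == "" || strings.HasPrefix(line, "#") {
            continue
        }
        if k, v, ok := strings.Cut(line, "="); ok {
            flags[k] = strings.Trim(v, `"`) // e.g. CONFIG_NO_HZ=y, CONFIG_PREEMPT=y
        }
    }
    return flags, scanner.Err()
}

func main() {
    flags, err := readKconfig()
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Println("CONFIG_NO_HZ =", flags["CONFIG_NO_HZ"])
}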

Remove version string from published label key

There has been feedback from many people that including the version in the published label is bad because it causes affinity annotations in pod specs to be invalidated upon upgrading the version of node-feature-discovery.
