
sriov-cni's Introduction


SR-IOV CNI plugin

This plugin enables the configuration and usage of SR-IOV VF networks in containers and orchestrators like Kubernetes.

Network Interface Cards (NICs) with SR-IOV capabilities are managed through physical functions (PFs) and virtual functions (VFs). A PF is used by the host and usually represents a single NIC port. VF configurations are applied through the PF. With SR-IOV CNI each VF can be treated as a separate network interface, assigned to a container, and configured with its own MAC, VLAN, IP and more.

The SR-IOV CNI plugin works with the SR-IOV Device Plugin for VF allocation in Kubernetes. A metaplugin such as Multus gets the allocated VF's deviceID (PCI address) and is responsible for invoking the SR-IOV CNI plugin with that deviceID.
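
For illustration, the delegate config that the SR-IOV CNI ultimately receives might look like the following minimal sketch, where the deviceID field carries the PCI address of the allocated VF (the address 0000:03:02.0 is a made-up example):

{
  "cniVersion": "0.3.1",
  "name": "sriov-network",
  "type": "sriov",
  "deviceID": "0000:03:02.0"
}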

Build

This plugin uses Go modules for dependency management and requires Go 1.17+ to build.

To build the plugin binary:

make

Upon successful build the plugin binary will be available in build/sriov.
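
The binary then has to be placed where the container runtime looks for CNI plugins; /opt/cni/bin is the conventional location, but verify the path used by your runtime's CNI configuration. For example:

cp build/sriov /opt/cni/bin/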

Kubernetes Quick Start

A full guide on orchestrating SR-IOV virtual functions in Kubernetes can be found at the SR-IOV Device Plugin project.

Creating VFs is outside the scope of the SR-IOV CNI plugin. More information about allocating VFs on different NICs can be found here
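
As a rough example, on many NICs VFs can be created through sysfs; the PF name and VF count below are placeholders, and the exact mechanism is NIC and driver specific:

echo 8 > /sys/class/net/<pf-name>/device/sriov_numvfs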

To deploy SR-IOV CNI by itself on a Kubernetes 1.16+ cluster:

kubectl apply -f images/sriov-cni-daemonset.yaml

Note: The above deployment is not sufficient to manage and configure SR-IOV virtual functions. See the full orchestration guide for more information.

Usage

SR-IOV CNI networks are commonly configured with Multus and the SR-IOV Device Plugin, using Network Attachment Definitions. More information about configuring Kubernetes networks with this pattern can be found in the Multus configuration reference document.

A Network Attachment Definition for SR-IOV CNI takes the form:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_netdevice
spec:
  config: '{
  "type": "sriov",
  "cniVersion": "0.3.1",
  "name": "sriov-network",
  "ipam": {
    "type": "host-local",
    "subnet": "10.56.217.0/24",
    "routes": [{
      "dst": "0.0.0.0/0"
    }],
    "gateway": "10.56.217.1"
  }
}'

The .spec.config field contains the configuration information used by the SR-IOV CNI.

Basic configuration parameters

The following parameters are generic CNI parameters rather than SR-IOV-specific ones; with the exception of ipam, they all need to be included in the config.

  • cniVersion : the version of the CNI spec used.
  • type : CNI plugin used. "sriov" corresponds to SR-IOV CNI.
  • name : the name of the network created.
  • ipam (optional) : the configuration of the IP Address Management plugin. Required to designate an IP for a kernel interface.

Example configurations

The following examples show the config needed to set up basic SR-IOV networking in a container. Each of the JSON config objects below can be placed in the .spec.config field of a Network Attachment Definition to integrate with Multus.

Kernel driver config

This is the minimum configuration for a working kernel driver interface using an SR-IOV Virtual Function. It applies an IP address using the host-local IPAM plugin in the range of the subnet provided.

{
  "type": "sriov",
  "cniVersion": "0.3.1",
  "name": "sriov-network",
  "ipam": {
    "type": "host-local",
    "subnet": "10.56.217.0/24",
    "routes": [{
      "dst": "0.0.0.0/0"
    }],
    "gateway": "10.56.217.1"
  }
}

Extended kernel driver config

This configuration sets a number of extra parameters that are often key for SR-IOV networks: a VLAN tag, disabled spoof checking, and enabled trust mode. These parameters are commonly set in more advanced SR-IOV VF based networks.

{
  "cniVersion": "0.3.1",
  "name": "sriov-advanced",
  "type": "sriov",
  "vlan": 1000,
  "spoofchk": "off",
  "trust": "on",
  "ipam": {
    "type": "host-local",
    "subnet": "10.56.217.0/24",
    "routes": [{
      "dst": "0.0.0.0/0"
    }],
    "gateway": "10.56.217.1"
  }
}

DPDK userspace driver config

The below config will configure a VF using a userspace driver (uio/vfio) for use in a container. If this plugin is used with a VF bound to a DPDK driver, the IPAM configuration will still be respected, but it will only allocate IP address(es) using the specified IPAM plugin, not apply them to the container interface. Other config parameters should be applicable, but their implementation may be driver specific.

{
    "cniVersion": "0.3.1",
    "name": "sriov-dpdk",
    "type": "sriov",
    "vlan": 1000
}

Note: The DHCP IPAM plugin cannot be used for a VF bound to a DPDK driver (uio/vfio).

Note: When VLAN is not specified in the Network Attachment Definition, or when it is given a value of 0, VFs connected to this network will have no VLAN tag.

Advanced Configuration

SR-IOV CNI allows the setting of other SR-IOV options such as link state and quality of service parameters. To learn more about how these parameters are set, consult the SR-IOV CNI configuration reference guide.
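
As a sketch only (consult the reference guide for the authoritative option names and semantics), such a config might combine a link state setting with transmit rate limits given in Mbps; the values below are illustrative:

{
  "cniVersion": "0.3.1",
  "name": "sriov-qos",
  "type": "sriov",
  "link_state": "enable",
  "min_tx_rate": 100,
  "max_tx_rate": 200
}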

Contributing

To report a bug or request a feature, open an issue on this repo using one of the available templates.


sriov-cni's Issues

Issue with vlan

I am using this plugin for a Kubernetes POD running a DPDK based application. There are 10 virtual functions defined for the physical NIC. For the first POD, three virtual functions are allocated: the first without any VLAN tagging, and the second and third defined with vlan fields. The POD came up fine and we can see the three virtual functions are bound by the igb_uio driver. "ip link show" shows VF-0 without any VLAN, VF-1 with vlan 1925 and VF-2 with vlan 1926, as defined.

Now we start the second POD. Two virtual functions are allocated for it, both with a VLAN. The SR-IOV plugin should find the next available virtual functions, VF-3 and VF-4, but we can see it allocates VF-0 for this second POD as well, and due to this the connectivity of the first POD is broken.

The problem seems to be the untagged VLAN used for the first POD. If no untagged VLAN is used, this works fine, i.e. for the second POD, VF-3 and VF-4 are assigned.

Supporting SR-IOV non-capable devices

Hi,

I know it goes a bit against the name of this CNI, but are there any plans to support devices that do not use SR-IOV? For my team it would be useful in a few different scenarios.

I noticed that the Intel SR-IOV network device plugin that this CNI works with seems to support non-SR-IOV-capable network devices, judging from its README (which seems to have been updated recently). The CNI itself does not seem to support them, however, so are there any plans for this in the future?

Multus: error in invoke Delegate add - "sriov", invalid argument

Just started using the multus-cni, sriov-cni, sriov-device-plugin.

Multus has 2 delegates, Calico on default network, sriov on another.

sriov-device-plugin could see the VFs, "intel.com/sriov_net_A": "16"

$ kubectl get node mtx-huawei2-bld05 -o json | jq '.status.allocatable'
{
  "cpu": "64",
  "ephemeral-storage": "48294789041",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "intel.com/sriov_net_A": "16",
  "intel.com/sriov_net_B": "0",
  "memory": "196389160Ki",
  "pods": "110"
}

multus conf:

cat 05-multus.conf

{
  "name": "multus-cni-network",
  "type": "multus",
  "delegates": [
    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.0",
      "plugins": [
        {
          "type": "calico",
          "log_level": "info",
          "datastore_type": "kubernetes",
          "nodename": "mtx-huawei2-bld05",
          "mtu": 1440,
          "ipam": {
            "type": "host-local",
            "subnet": "usePodCidr"
          },
          "policy": {
            "type": "k8s"
          },
          "kubernetes": {
            "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
          }
        },
        {
          "type": "portmap",
          "snat": true,
          "capabilities": {"portMappings": true}
        }
      ]
    }
  ],
  "kubeconfig": "/etc/cni/net.d/multus.d/multus.kubeconfig"
}

sriov crd:

cat sriov-nad-4.yaml

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net-4
  namespace: mtx-test10
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/sriov_net_A
spec:
  config: '{
    "type": "sriov",
    "name": "sriov-network-4",
    "ipam": {
      "type": "host-local",
      "subnet": "10.64.217.0/24",
      "routes": [{
        "dst": "0.0.0.0/0"
      }],
      "gateway": "10.64.217.1"
    }
  }'

sriov-device-plugin resources:

cat configMap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
      "resourceList": [{
        "resourceName": "sriov_net_A",
        "selectors": {
          "vendors": ["14e4"],
          "devices": ["16af"],
          "drivers": ["bnx2x"],
          "pfNames": ["eno31"]
        }
      }]
    }

pod yaml:

cat pod-tc1.yaml

apiVersion: v1
kind: Pod
metadata:
  name: testpod1
  namespace: mtx-test10
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-net-4
spec:
  containers:
  - name: appcntr1
    image: centos/tools
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
    resources:
      requests:
        intel.com/sriov_net_A: '1'
      limits:
        intel.com/sriov_net_A: '1'
  nodeName: mtx-huawei2-bld05

$ kubectl describe pod testpod1 -n mtx-test10
...
...
Warning FailedCreatePodSandBox 3m28s (x296 over 42m) kubelet, mtx-huawei2-bld05 (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "1bc793ca392c9745f0ce6352f01bd86bb509567308416b09e02ad9a0011ca70c" network for pod "testpod1": NetworkPlugin cni failed to set up pod "testpod1_mtx-test10" network: Multus: Err adding pod to network "sriov-network-4": Multus: error in invoke Delegate add - "sriov": failed to setup netlink device enp59s4f1 "invalid argument", failed to clean up sandbox container "1bc793ca392c9745f0ce6352f01bd86bb509567308416b09e02ad9a0011ca70c" network for pod "testpod1": NetworkPlugin cni failed to teardown pod "testpod1_mtx-test10" network: Multus: error in invoke Delegate del - "sriov": failed to lookup vf device "net1": Link not found]

/var/log/messages reported:
do-change-link[82]: failure changing link: failure 22 (Invalid argument)

Jun 3 15:37:07 mtx-huawei2-bld05 kernel: [444586.545494] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Jun 3 15:37:07 mtx-huawei2-bld05 kubelet[82494]: 2019-06-03 15:37:07.824 [INFO][24437] network.go 380: Disabling IPv4 forwarding ContainerID="fa008dddb917c7d21b86f52555b1641dd03d6f08404af0032c2b48a4d8988389" Namespace="mtx-test10" Pod="testpod1" WorkloadEndpoint="mtx--huawei2--bld05-k8s-testpod1-eth0"
Jun 3 15:37:07 mtx-huawei2-bld05 kernel: [444586.569626] IPv6: ADDRCONF(NETDEV_UP): cali38aee219176: link is not ready
Jun 3 15:37:07 mtx-huawei2-bld05 kubelet[82494]: 2019-06-03 15:37:07.848 [INFO][24437] k8s.go 392: Added Mac, interface name, and active container ID to endpoint ContainerID="fa008dddb917c7d21b86f52555b1641dd03d6f08404af0032c2b48a4d8988389" Namespace="mtx-test10" Pod="testpod1" WorkloadEndpoint="mtx--huawei2--bld05-k8s-testpod1-eth0" endpoint=&v3.WorkloadEndpoint{TypeMeta:v1.TypeMeta{Kind:"WorkloadEndpoint", APIVersion:"projectcalico.org/v3"}, ObjectMeta:v1.ObjectMeta{Name:"mtx--huawei2--bld05-k8s-testpod1-eth0", GenerateName:"", Namespace:"mtx-test10", SelfLink:"", UID:"1ab4bff0-8650-11e9-bd09-14579fa1adef", ResourceVersion:"9279897", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63695198221, loc:(*time.Location)(0x1ede720)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"projectcalico.org/serviceaccount":"default", "projectcalico.org/namespace":"mtx-test10", "projectcalico.org/orchestrator":"k8s"}, Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Spec:v3.WorkloadEndpointSpec{Orchestrator:"k8s", Workload:"", Node:"mtx-huawei2-bld05", ContainerID:"fa008dddb917c7d21b86f52555b1641dd03d6f08404af0032c2b48a4d8988389", Pod:"testpod1", Endpoint:"eth0", IPNetworks:[]string{"10.100.11.135/32"}, IPNATs:[]v3.IPNAT(nil), IPv4Gateway:"", IPv6Gateway:"", Profiles:[]string{"kns.mtx-test10", "ksa.mtx-test10.default"}, InterfaceName:"cali38aee219176", MAC:"36:da:3a:16:f9:62", Ports:[]v3.EndpointPort(nil)}}
Jun 3 15:37:07 mtx-huawei2-bld05 kernel: [444586.570825] IPv6: ADDRCONF(NETDEV_CHANGE): cali38aee219176: link becomes ready
Jun 3 15:37:07 mtx-huawei2-bld05 kernel: [444586.571065] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jun 3 15:37:07 mtx-huawei2-bld05 kubelet[82494]: 2019-06-03 15:37:07.850 [INFO][24437] k8s.go 424: Wrote updated endpoint to datastore ContainerID="fa008dddb917c7d21b86f52555b1641dd03d6f08404af0032c2b48a4d8988389" Namespace="mtx-test10" Pod="testpod1" WorkloadEndpoint="mtx--huawei2--bld05-k8s-testpod1-eth0"
Jun 3 15:37:07 mtx-huawei2-bld05 NetworkManager[28813]: [1559601427.8607] device (cali38aee219176): carrier: link connected
Jun 3 15:37:07 mtx-huawei2-bld05 NetworkManager[28813]: [1559601427.8909] manager: (cali38aee219176): new Veth device (/org/freedesktop/NetworkManager/Devices/2925)
Jun 3 15:37:07 mtx-huawei2-bld05 NetworkManager[28813]: [1559601427.8990] device (enp59s4f1): state change: disconnected -> unmanaged (reason 'removed', sys-iface-state: 'removed')
Jun 3 15:37:07 mtx-huawei2-bld05 NetworkManager[28813]: [1559601427.9197] manager: (dev82): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2926)
Jun 3 15:37:07 mtx-huawei2-bld05 NetworkManager[28813]: [1559601427.9321] device (enp59s4f1): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
Jun 3 15:37:07 mtx-huawei2-bld05 kernel: [444586.655546] IPv6: ADDRCONF(NETDEV_UP): enp59s4f1: link is not ready
Jun 3 15:37:07 mtx-huawei2-bld05 NetworkManager[28813]: [1559601427.9372] platform-linux: do-change-link[82]: failure changing link: failure 22 (Invalid argument)
Jun 3 15:37:07 mtx-huawei2-bld05 NetworkManager[28813]: [1559601427.9384] device (enp59s4f1): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
Jun 3 15:37:07 mtx-huawei2-bld05 kernel: [444586.660476] IPv6: ADDRCONF(NETDEV_UP): enp59s4f1: link is not ready
Jun 3 15:37:07 mtx-huawei2-bld05 kubelet[82494]: 2019-06-03 15:37:07.982 [INFO][24537] calico.go 442: Extracted identifiers ContainerID="fa008dddb917c7d21b86f52555b1641dd03d6f08404af0032c2b48a4d8988389" Node="mtx-huawei2-bld05" Orchestrator="k8s" WorkloadEndpoint="mtx--huawei2--bld05-k8s-testpod1-eth0"
Jun 3 15:37:07 mtx-huawei2-bld05 kubelet[82494]: 2019-06-03 15:37:07.987 [WARNING][24537] workloadendpoint.go 70: Operation Delete is not supported on WorkloadEndpoint type

Allow to rename VF in container when using DHCP IPAM and Multus

By changing the CNI_IFNAME environment variable to the VF's new name, we can support renaming the interface when using intel-multus. When Multus moves interfaces into the container, it names the first VF eth0 and then follows the pattern netX, where X starts from 0. So when the interface is moved and renamed by sriov-cni, Multus will not find the interface under the name it set.

Create veth interfaces to mirror VF details for userspace mode

@ahalim-intel @rkamudhan

When using the device plugin in userspace mode, it would be nice to have sriov-cni create dummy interfaces in the network namespace with MAC details matching the VF and the IPAM results applied. This way DPDK apps can look up the information in a generic way, without hostpath/file sharing, which would greatly increase the usability of userspace mode.

https://intel-corp-team.slack.com/archives/C4C5RSEER/p1547232544029500

MaxSharedVf functionality not explained

MaxSharedVf is documented and the value is set to 2.
Is this a config param that can be set? Could you provide some documentation/explanation of shared VFs?
Is this something to do with a VF being shared with different VLANs, or with and without a VLAN?
What is the use case?

Release VF if cmdAdd failed after setupVF

After moving the VF to the container, check whether an error occurred and, if so, move the VF back to the original namespace.

This code can help solve the issue in cmdAdd:

err = setupVF(n, n.IF0, args.IfName, args.ContainerID, netns)
// If a later step of cmdAdd fails, move the VF back to the host namespace.
defer func() {
	if err != nil {
		// Only release the VF if it is actually present in the container netns.
		err = netns.Do(func(_ ns.NetNS) error {
			_, err := netlink.LinkByName(args.IfName)
			return err
		})
		if err == nil {
			releaseVF(n, args.IfName, args.ContainerID, netns)
		}
	}
}()

VF configured/attached with unsupported operations in net-attach-def doesn't get de-configured on pod deletion

Problem Statement: If an unsupported operation is added to a network attachment definition, it still allows configuration of the Virtual Function during pod creation, and the configuration remains even after pod deletion.

The pod never gets created with such a network attachment definition; the end result is that someone has to delete the pod.

This is a problem in a multi-tenant environment where tenants may be given rights to configure network attachment definitions and may try configuring settings which can leave the VF in an undesired state.

Expectations:
On pod deletion, the configurations that were set on Virtual Functions using net-attach-definitions are reverted.

Steps to reproduce:

$ #Get the pool size for resource name - intel_sriov_dpdk
$ kubectl describe nodes compute-1 | grep -A15 Capacity
Capacity:
 cpu:                              16
 ephemeral-storage:                238721632Ki
 hugepages-1Gi:                    210Gi
 intel.com/intel_sriov_dpdk:       16
 intel.com/intel_sriov_netdevice:  8
 memory:                           264162932Ki
 pods:                             110
Allocatable:
 cpu:                              16
 ephemeral-storage:                220005855687
 hugepages-1Gi:                    210Gi
 intel.com/intel_sriov_dpdk:       16
 intel.com/intel_sriov_netdevice:  8
 memory:                           43859572Ki
 pods:                             110
$
$ #Current resource allocations
$ kubectl describe nodes compute-1 | grep -A9 Allocated
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                         Requests   Limits
  --------                         --------   ------
  cpu                              350m (2%)  100m (0%)
  memory                           50Mi (0%)  50Mi (0%)
  ephemeral-storage                0 (0%)     0 (0%)
  intel.com/intel_sriov_dpdk       1          1
  intel.com/intel_sriov_netdevice  1          1
Events:                            <none>
$
$ #current VF assignments - before creation of faulty pod
$ ip link show enp68s0f0
4: enp68s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:26:f9:30 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 6 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 7 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 8 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 9 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 10 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 11 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 12 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 13 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 14 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 15 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 16 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 17 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 18 MAC 00:00:00:00:00:00, vlan 1000, spoof checking off, link-state auto, trust on, query_rss off
    vf 19 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 20 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 21 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 22 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 23 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 24 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 25 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 26 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 27 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 28 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 29 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 30 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 31 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
$
$ #Definition of network attachment definition - with link state (probably not a supported function on ixgbevf)
$ cat intel-sriov-dpdk-nad-faulty.yaml 
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: intel-sriov-dpdk-nad-faulty
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_dpdk
spec:
  config: '{
  "type": "sriov",
  "cniVersion": "0.3.1",
  "name": "sriov-network",
  "vlan": 1000,
  "spoofchk": "off",
  "trust": "on",
  "link_state": "enable"
}'
$
$ #Definition of pod using the network attachment definition
$ cat sriov-testpod-dpdk-faulty.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: sriov-testpod-dpdk-faulty
  annotations:
    k8s.v1.cni.cncf.io/networks: intel-sriov-dpdk-nad-faulty 
spec:
  containers:
  - name: appcntr1 
    image: centos/tools 
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
    resources:
      requests:
        intel.com/intel_sriov_dpdk: '1'
      limits:
        intel.com/intel_sriov_dpdk: '1'
$
$ #Create network attachment definition
$ kubectl create -f intel-sriov-dpdk-nad-faulty.yaml 
networkattachmentdefinition.k8s.cni.cncf.io/intel-sriov-dpdk-nad-faulty created
$
$ #Get all network attachment definitions. The newly created one is named "intel-sriov-dpdk-nad-faulty"
$ kubectl get net-attach-def
NAME                          AGE
intel-sriov-dpdk-nad-faulty   7s
sriov-dev-net1                23h
sriov-dpdk-net1               23h
$
$ #Create a pod with intel-sriov-dpdk-nad-faulty network attachment definition
$ kubectl create -f sriov-testpod-dpdk-faulty.yaml 
pod/sriov-testpod-dpdk-faulty created
$
$ #Get the status of all the pods
$ kubectl get po -A | grep sriov
default        sriov-testpod-dpdk                                      1/1     Running             0          45m
default        sriov-testpod-dpdk-faulty                               0/1     ContainerCreating   0          10s
default        sriov-testpod1                                          1/1     Running             0          23h
kube-system    kube-sriov-cni-ds-amd64-8wmzf                           1/1     Running             0          24h
kube-system    kube-sriov-cni-ds-amd64-lc4hp                           1/1     Running             0          24h
kube-system    kube-sriov-device-plugin-amd64-kqhgz                    1/1     Running             0          23h
kube-system    kube-sriov-device-plugin-amd64-prpfj                    1/1     Running             0          8h
$
$ #describe the pod status 
$ kubectl describe po sriov-testpod-dpdk-faulty 
Name:         sriov-testpod-dpdk-faulty
Namespace:    default
Priority:     0
Node:         compute-1/70.151.43.232
Start Time:   Fri, 13 Mar 2020 02:21:01 +0000
Labels:       <none>
Annotations:  cni.projectcalico.org/podIP: 172.16.100.107/32
              k8s.v1.cni.cncf.io/networks: intel-sriov-dpdk-nad-faulty
              k8s.v1.cni.cncf.io/networks-status: 
Status:       Pending
IP:           
IPs:          <none>
Containers:
  appcntr1:
    Container ID:  
    Image:         centos/tools
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      --
    Args:
      while true; do sleep 300000; done;
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      intel.com/intel_sriov_dpdk:  1
    Requests:
      intel.com/intel_sriov_dpdk:  1
    Environment:                   <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-pkm2n (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-pkm2n:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-pkm2n
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age               From                  Message
  ----     ------                  ----              ----                  -------
  Normal   Scheduled               62s               default-scheduler     Successfully assigned default/sriov-testpod-dpdk-faulty to compute-1
  Warning  FailedCreatePodSandBox  54s               kubelet, compute-1  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "412de788aeb04c5cd151c7815dfd16ccd99f65df35a738f65ad46e74ed23bb86" network for pod "sriov-testpod-dpdk-faulty": networkPlugin cni failed to set up pod "sriov-testpod-dpdk-faulty_default" network: Multus: [default/sriov-testpod-dpdk-faulty]: error adding container to network "sriov-network": delegateAdd: error invoking DelegateAdd - "sriov": error in getting result from AddNetwork: SRIOV-CNI failed to configure VF "failed to set vf 28 link state to 1: operation not supported", failed to clean up sandbox container "412de788aeb04c5cd151c7815dfd16ccd99f65df35a738f65ad46e74ed23bb86" network for pod "sriov-testpod-dpdk-faulty": networkPlugin cni failed to teardown pod "sriov-testpod-dpdk-faulty_default" network: delegateDel: error invoking DelegateDel - "sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/sriov with name 412de788aeb04c5cd151c7815dfd16ccd99f65df35a738f65ad46e74ed23bb86-net1]
  Warning  FailedCreatePodSandBox  46s               kubelet, compute-1  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "d8263d4b1f71eaefc02e5db7b2486dea86567564d2e156b20fe8ea65273498e8" network for pod "sriov-testpod-dpdk-faulty": networkPlugin cni failed to set up pod "sriov-testpod-dpdk-faulty_default" network: Multus: [default/sriov-testpod-dpdk-faulty]: error adding container to network "sriov-network": delegateAdd: error invoking DelegateAdd - "sriov": error in getting result from AddNetwork: SRIOV-CNI failed to configure VF "failed to set vf 28 link state to 1: operation not supported", failed to clean up sandbox container "d8263d4b1f71eaefc02e5db7b2486dea86567564d2e156b20fe8ea65273498e8" network for pod "sriov-testpod-dpdk-faulty": networkPlugin cni failed to teardown pod "sriov-testpod-dpdk-faulty_default" network: delegateDel: error invoking DelegateDel - "sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/sriov with name d8263d4b1f71eaefc02e5db7b2486dea86567564d2e156b20fe8ea65273498e8-net1]
  Warning  FailedCreatePodSandBox  38s               kubelet, compute-1  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "3adb32a0702d6476dca9d106c96610c756517e8abf6266cd2408231a3479df6e" network for pod "sriov-testpod-dpdk-faulty": networkPlugin cni failed to set up pod "sriov-testpod-dpdk-faulty_default" network: Multus: [default/sriov-testpod-dpdk-faulty]: error adding container to network "sriov-network": delegateAdd: error invoking DelegateAdd - "sriov": error in getting result from AddNetwork: SRIOV-CNI failed to configure VF "failed to set vf 28 link state to 1: operation not supported", failed to clean up sandbox container "3adb32a0702d6476dca9d106c96610c756517e8abf6266cd2408231a3479df6e" network for pod "sriov-testpod-dpdk-faulty": networkPlugin cni failed to teardown pod "sriov-testpod-dpdk-faulty_default" network: delegateDel: error invoking DelegateDel - "sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/sriov with name 3adb32a0702d6476dca9d106c96610c756517e8abf6266cd2408231a3479df6e-net1]
  Warning  FailedCreatePodSandBox  30s               kubelet, compute-1  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "a239769ec3b60cf7db81d590a996165b95ab23e75cbd17a578bd067508cef678" network for pod "sriov-testpod-dpdk-faulty": networkPlugin cni failed to set up pod "sriov-testpod-dpdk-faulty_default" network: Multus: [default/sriov-testpod-dpdk-faulty]: error adding container to network "sriov-network": delegateAdd: error invoking DelegateAdd - "sriov": error in getting result from AddNetwork: SRIOV-CNI failed to configure VF "failed to set vf 28 link state to 1: operation not supported", failed to clean up sandbox container "a239769ec3b60cf7db81d590a996165b95ab23e75cbd17a578bd067508cef678" network for pod "sriov-testpod-dpdk-faulty": networkPlugin cni failed to teardown pod "sriov-testpod-dpdk-faulty_default" network: delegateDel: error invoking DelegateDel - "sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/sriov with name a239769ec3b60cf7db81d590a996165b95ab23e75cbd17a578bd067508cef678-net1]
  Warning  FailedCreatePodSandBox  22s               kubelet, compute-1  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "46abcd7d2d18a8d1b90ee2764ab92626df67a15bed0a3171f479f6641b23ba56" network for pod "sriov-testpod-dpdk-faulty": networkPlugin cni failed to set up pod "sriov-testpod-dpdk-faulty_default" network: Multus: [default/sriov-testpod-dpdk-faulty]: error adding container to network "sriov-network": delegateAdd: error invoking DelegateAdd - "sriov": error in getting result from AddNetwork: SRIOV-CNI failed to configure VF "failed to set vf 28 link state to 1: operation not supported", failed to clean up sandbox container "46abcd7d2d18a8d1b90ee2764ab92626df67a15bed0a3171f479f6641b23ba56" network for pod "sriov-testpod-dpdk-faulty": networkPlugin cni failed to teardown pod "sriov-testpod-dpdk-faulty_default" network: delegateDel: error invoking DelegateDel - "sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/sriov with name 46abcd7d2d18a8d1b90ee2764ab92626df67a15bed0a3171f479f6641b23ba56-net1]
  Warning  FailedCreatePodSandBox  13s               kubelet, compute-1  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "5835a83db2c1b6f24a97e1d2ea901b4163459d54d99a37b5d153fe634d91243d" network for pod "sriov-testpod-dpdk-faulty": networkPlugin cni failed to set up pod "sriov-testpod-dpdk-faulty_default" network: Multus: [default/sriov-testpod-dpdk-faulty]: error adding container to network "sriov-network": delegateAdd: error invoking DelegateAdd - "sriov": error in getting result from AddNetwork: SRIOV-CNI failed to configure VF "failed to set vf 28 link state to 1: operation not supported", failed to clean up sandbox container "5835a83db2c1b6f24a97e1d2ea901b4163459d54d99a37b5d153fe634d91243d" network for pod "sriov-testpod-dpdk-faulty": networkPlugin cni failed to teardown pod "sriov-testpod-dpdk-faulty_default" network: delegateDel: error invoking DelegateDel - "sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/sriov with name 5835a83db2c1b6f24a97e1d2ea901b4163459d54d99a37b5d153fe634d91243d-net1]
  Warning  FailedCreatePodSandBox  5s                kubelet, compute-1  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "0e1d5bd3d2cdd128e3cfe33959881a7065605de37cc3319e3f489641ecefa720" network for pod "sriov-testpod-dpdk-faulty": networkPlugin cni failed to set up pod "sriov-testpod-dpdk-faulty_default" network: Multus: [default/sriov-testpod-dpdk-faulty]: error adding container to network "sriov-network": delegateAdd: error invoking DelegateAdd - "sriov": error in getting result from AddNetwork: SRIOV-CNI failed to configure VF "failed to set vf 28 link state to 1: operation not supported", failed to clean up sandbox container "0e1d5bd3d2cdd128e3cfe33959881a7065605de37cc3319e3f489641ecefa720" network for pod "sriov-testpod-dpdk-faulty": networkPlugin cni failed to teardown pod "sriov-testpod-dpdk-faulty_default" network: delegateDel: error invoking DelegateDel - "sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/sriov with name 0e1d5bd3d2cdd128e3cfe33959881a7065605de37cc3319e3f489641ecefa720-net1]
  Normal   SandboxChanged          4s (x7 over 54s)  kubelet, compute-1  Pod sandbox changed, it will be killed and re-created.
$
$ #pod is stuck in ContainerCreating state with error reason "failed to set vf 28 link state to 1: operation not supported"
$
$ #Check the state of Virtual Functions - vf 28 is now configured with vlan 1000, trust mode on and spoof check off
$ ip link show enp68s0f0
4: enp68s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:26:f9:30 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 6 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 7 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 8 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 9 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 10 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 11 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 12 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 13 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 14 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 15 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 16 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 17 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 18 MAC 00:00:00:00:00:00, vlan 1000, spoof checking off, link-state auto, trust on, query_rss off
    vf 19 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 20 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 21 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 22 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 23 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 24 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 25 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 26 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 27 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 28 MAC 00:00:00:00:00:00, vlan 1000, spoof checking off, link-state auto, trust on, query_rss off
    vf 29 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 30 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 31 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
$
$ #Get the current allocated pool 
$ kubectl describe nodes compute-1 | grep -A9 Allocated
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                         Requests   Limits
  --------                         --------   ------
  cpu                              350m (2%)  100m (0%)
  memory                           50Mi (0%)  50Mi (0%)
  ephemeral-storage                0 (0%)     0 (0%)
  intel.com/intel_sriov_dpdk       2          2
  intel.com/intel_sriov_netdevice  1          1
Events:                            <none>

Now, even after pod deletion, the VF settings are still intact.

After pod deletion:
==================
$ #Deleting the faulty pod
$ kubectl delete po sriov-testpod-dpdk-faulty
pod "sriov-testpod-dpdk-faulty" deleted
$
$ #pod has been deleted now
$ kubectl get po -A | grep sriov
default        sriov-testpod-dpdk                                      1/1     Running            0          67m
default        sriov-testpod1                                          1/1     Running            0          24h
kube-system    kube-sriov-cni-ds-amd64-8wmzf                           1/1     Running            0          24h
kube-system    kube-sriov-cni-ds-amd64-lc4hp                           1/1     Running            0          24h
kube-system    kube-sriov-device-plugin-amd64-kqhgz                    1/1     Running            0          24h
kube-system    kube-sriov-device-plugin-amd64-prpfj                    1/1     Running            0          9h
$
$ #Check the status of VF. It still has vlan 1000 attached to it
$ ip link show enp68s0f0
4: enp68s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:26:f9:30 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 6 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 7 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 8 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 9 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 10 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 11 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 12 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 13 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 14 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 15 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 16 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 17 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 18 MAC 00:00:00:00:00:00, vlan 1000, spoof checking off, link-state auto, trust on, query_rss off
    vf 19 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 20 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 21 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 22 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 23 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 24 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 25 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 26 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 27 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 28 MAC 00:00:00:00:00:00, vlan 1000, spoof checking off, link-state auto, trust on, query_rss off
    vf 29 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 30 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 31 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
$
$ #Allocated pool gets updated properly - the dpdk allocation is released on deletion of the pod
$ kubectl describe nodes compute-1 | grep -A9 Allocated
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                         Requests   Limits
  --------                         --------   ------
  cpu                              350m (2%)  100m (0%)
  memory                           50Mi (0%)  50Mi (0%)
  ephemeral-storage                0 (0%)     0 (0%)
  intel.com/intel_sriov_dpdk       1          1
  intel.com/intel_sriov_netdevice  1          1
Events:                            <none>

Component                     Version
SR-IOV CNI Plugin             v2.2
Multus                        v3.4
SR-IOV Network Device Plugin  v3.1
Kubernetes                    v1.16.2
OS                            Ubuntu 18.04.1, kernel version 4.15.0-88-generic

Config Files

Config file locations may be deployment dependent - SR-IOV device plugin config:

  config.json: |
    {
        "resourceList": [{
                "resourceName": "intel_sriov_dpdk",
                "selectors": {
                    "vendors": ["8086"],
                    "devices": ["10ed"],
                    "drivers": ["vfio-pci"],
                    "pfNames": ["enp67s0f1","enp68s0f0"]
                }
            }
        ]
    }
Multus config (Try '/etc/cni/multus/net.d')
{
  "name": "multus-cni-network",
  "type": "multus",
  "cniVersion": "0.3.1",
  "kubeconfig": "/etc/cni/net.d/multus.d/multus.kubeconfig",
  "confDir": "/etc/cni/net.d",
  "cniDir": "/var/lib/cni/multus",
  "binDir": "/opt/cni/bin",
  "logFile": "/var/log/multus.log",
  "logLevel": "panic",
  "capabilities": {
    "portMappings": true
  },
  "readinessindicatorfile": "",
  "namespaceIsolation": false,
  "clusterNetwork": "k8s-pod-network",
  "defaultNetwork": [],
  "systemNamespaces": ["kube-system"]
}

Support for PFOnly mode and deviceID compatibility

I have a use-case of passing PFs directly to pods/VMs, and I tested out different CNI options:

  • Upstream SRIOV CNI supports a pfOnly option and is functional, but it is not deviceID aware, so it is hard to use with the SRIOV DP.
  • ehost-device CNI is functional and deviceID aware. It essentially moves the root device identified by the device plugin into the pod's namespace and configures IPAM - functionally similar to the Intel SR-IOV CNI.
  • Intel SRIOV CNI doesn't support pfOnly as yet. #8 was raised previously, but it doesn't quite work. This is an RFE requesting this support.

VLAN is not set if VF is chosen across PFs

I've created a POD with two VFs attached as net devices from the same network, using a resource pool configured via the SR-IOV network device plugin. I can see the POD is created with two VFs from two different PFs as per the selection criteria, but only one VF is set with the VLAN ID and the other is not.

Here is the net-attach-def:

$ kubectl describe net-attach-def sriov-net1
Name:         sriov-net1
Namespace:    default
Labels:       <none>
Annotations:  k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_netdevice
API Version:  k8s.cni.cncf.io/v1
Kind:         NetworkAttachmentDefinition
Metadata:
  Creation Timestamp:  2019-07-05T10:02:40Z
  Generation:          1
  Resource Version:    412456
  Self Link:           /apis/k8s.cni.cncf.io/v1/namespaces/default/network-attachment-definitions/sriov-net1
  UID:                 064ea8c8-9f0c-11e9-9f49-3cfdfe9eac40
Spec:
  Config:  { "type": "sriov", "name": "sriov-network", "vlan": 2222, "ipam": { "type": "host-local", "subnet": "10.56.217.0/24", "routes": [{ "dst": "0.0.0.0/0" }], "gateway": "10.56.217.1" } }
Events:    <none>

POD definition:

$ kubectl get pod -o yaml testpod2
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-net1, sriov-net1
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "k8s-pod-network",
          "ips": [
              "192.168.162.237"
          ],
          "default": true,
          "dns": {}
      },{
          "name": "sriov-network",
          "dns": {}
      },{
          "name": "sriov-network",
          "dns": {}
      }]
  creationTimestamp: "2019-07-05T11:08:34Z"
  name: testpod2
  namespace: default
  resourceVersion: "418786"
  selfLink: /api/v1/namespaces/default/pods/testpod2
  uid: 3ac54a80-9f15-11e9-9f49-3cfdfe9eac40
spec:
  containers:
......

ip link show output:

$ ip link show ens3f0
2: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq portid 3cfdfe9eac40 state UP mode DEFAULT group default qlen 1000
    link/ether 3c:fd:fe:9e:ac:40 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC fe:6e:17:e5:8f:30, vlan 2222, spoof checking on, link-state auto, trust off
    vf 1 MAC 6e:6e:2e:47:d3:ac, spoof checking on, link-state auto, trust off
    vf 2 MAC ee:45:d9:a2:0b:03, spoof checking on, link-state auto, trust off
    vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
$ ip link show ens3f3
9: ens3f3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq portid 3cfdfe9eac43 state UP mode DEFAULT group default qlen 1000
    link/ether 3c:fd:fe:9e:ac:43 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
    vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off

In this case, vf 0 from ens3f0 and vf 1 from ens3f3 are chosen for VF passthrough, but vf 1 does not have VLAN ID 2222 configured.

When both VFs are chosen from the same interface ens3f0, I can see the VLAN ID configured on both VFs, as below.

$ ip link show ens3f0
2: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq portid 3cfdfe9eac40 state UP mode DEFAULT group default qlen 1000
    link/ether 3c:fd:fe:9e:ac:40 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC fe:6e:17:e5:8f:30, spoof checking on, link-state auto, trust off
    vf 1 MAC 6e:6e:2e:47:d3:ac, spoof checking on, link-state auto, trust off
    vf 2 MAC ee:45:d9:a2:0b:03, vlan 2222, spoof checking on, link-state auto, trust off
    vf 3 MAC 00:00:00:00:00:00, vlan 2222, spoof checking on, link-state auto, trust off

Incorrect documentation

README.md still has its original form from the time of the fork from the hustcat repo, while the parameter names were changed quite a long time ago.

A correct example can be found in the multus docs, but IMO README.md in this repo should also at least rename master to if0 and so on.

Plugin Returns Error

Hi, I'm trying to get the SR-IOV plugin working with Multus, with no luck so far. When I deploy, I get the error:

  Warning  FailedCreatePodSandBox  73s   kubelet, s011  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "af8b92d1581875f0eedfce9773867127d5db4a204dedf075b5006a009fc9b8e9" network for pod "ut-cburdick-b6f192": NetworkPlugin cni failed to set up pod "ut-cburdick-b6f192_default" network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input, failed to clean up sandbox container "af8b92d1581875f0eedfce9773867127d5db4a204dedf075b5006a009fc9b8e9" network for pod "ut-cburdick-b6f192": NetworkPlugin cni failed to teardown pod "ut-cburdick-b6f192_default" network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input]

My network attachment definition looks like this:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: host-device-conf1
spec:
  config: '{
            "cniVersion": "0.2.0",
            "name": "mynet1",
            "type": "sriov",
            "if0": "ens1f0",
            "l2enable": true
        }'

I'm running multus:snapshot as of today, and SR-IOV version 1.0.0 of the plugin. I also tried the master branch with "master" instead of "if0" with the same error.

My goal is to eventually use DPDK on a Mellanox card across pods. I intentionally didn't use the DPDK section since I don't want the driver to be loaded (Mellanox doesn't need that), but I'm not sure if that's correct.

Multus: error in invoke Delegate add - "sriov": netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input

[root@master ~]# kubectl describe pod pod-case-03
Name:         pod-case-03
Namespace:    default
Priority:     0
Node:         master/192.168.1.203
Start Time:   Tue, 03 Dec 2019 11:15:10 +0800
Labels:       <none>
Annotations:  k8s.v1.cni.cncf.io/networks: sriov-conf-1
              k8s.v1.cni.cncf.io/networks-status:
              kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{"k8s.v1.cni.cncf.io/networks":"sriov-conf-1"},"name":"pod-case-03","namespace":...
Status:       Pending
IP:
Containers:
  pod-case-03:
    Container ID:
    Image:          centos/tools
    Image ID:
    Port:           <none>
    Host Port:      <none>
    Command:
      /sbin/init
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-k52z9 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-k52z9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-k52z9
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                       From             Message
  ----     ------                  ----                      ----             -------
  Warning  FailedCreatePodSandBox  3m38s (x1597 over 3h12m)  kubelet, master  (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "1ab737d326ac02dc180994dba6b0b35f1307e6bd9b3111527918a446fb5d6659" network for pod "pod-case-03": NetworkPlugin cni failed to set up pod "pod-case-03_default" network: Multus: Err adding pod to network "sriov-conf-1": Multus: error in invoke Delegate add - "sriov": netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input, failed to clean up sandbox container "1ab737d326ac02dc180994dba6b0b35f1307e6bd9b3111527918a446fb5d6659" network for pod "pod-case-03": NetworkPlugin cni failed to teardown pod "pod-case-03_default" network: Multus: error in invoke Delegate del - "sriov": netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input / Multus: error in invoke Conflist Del - "cbr0": error in getting result from DelNetworkList: failed to convert major version part "": strconv.Atoi: parsing "": invalid syntax]

Using this NetworkAttachmentDefinition causes the error:

[root@master ~]# cat sriov-conf-1.yml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-conf-1
spec:
  config: '{
    "cniVersion": "0.3.0",
    "type": "sriov",
    "if0": "ens2f3",
    "if0name": "sriov-net1",
    "ipam": {
      "type": "host-local",
      "subnet": "192.168.3.0/24",
      "rangeStart": "192.168.3.82",
      "rangeEnd": "192.168.3.86",
      "gateway": "192.168.3.4"
    }
  }'

But when I change it to l2enable, it works, as follows:

[root@master ~]# cat sriov-conf-1.yml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-conf-1
spec:
  config: '{
    "cniVersion": "0.3.0",
    "type": "sriov",
    "if0": "ens2f3",
    "if0name": "sriov-net1",
    "l2enable": true
  }'

[root@master cbr0]# kubectl get pod
NAME READY STATUS RESTARTS AGE
pod-case-03 1/1 Running 0 11m
[root@master cbr0]# kubectl exec -it pod-case-03 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
3: eth0@if1998: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
link/ether 0e:92:1c:b8:79:bb brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.244.0.199/24 scope global eth0
valid_lft forever preferred_lft forever
37: sriov-net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether ca:51:61:13:3e:83 brd ff:ff:ff:ff:ff:ff

How can I assign an IP to the pod? Thank you.

releaseVF fails

In releaseVF:

  • the VF is removed from DPDK
  • the vlan is set to 0

Between these two operations the VF interface is switched back to the i40evf driver before the vlan can be set to 0. If the vlan is set before the i40evf driver binding completes, netlink.LinkSetVfVlan() fails, and as a result releaseVF returns without finishing the rest of its tasks.

This leaves the VF in an unusable state and causes a VF resource leak. As PODs are deleted and re-created, the SR-IOV CNI eventually fails as it runs out of VFs.

The workaround is to allow a few seconds after unbinding from DPDK before clearing the VLAN.

// bind the sriov vf to the kernel driver
if err := enabledpdkmode(df, df.Ifname, false); err != nil {
	return fmt.Errorf("DPDK: failed to bind %s to kernel space: %s", df.Ifname, err)
}

// Workaround:
// unbinding from DPDK and binding to the kernel driver takes a few seconds;
// the VLAN resetting call below fails for i40e, which takes a couple of seconds
time.Sleep(4 * time.Second)

// reset vlan for DPDK code here
pfLink, err := netlink.LinkByName(conf.IF0)
if err != nil {
	return fmt.Errorf("DPDK: master device %s not found: %v", conf.IF0, err)
}

if err = netlink.LinkSetVfVlan(pfLink, df.VFID, 0); err != nil {
	return fmt.Errorf("DPDK: failed to reset vlan tag for vf %d: %v", df.VFID, err)
}

return nil

Release IP from IPAM if cmdAdd failed

After obtaining an IP from IPAM in cmdAdd, release it if an error happens.

This change can help solve the issue in cmdAdd.

After:

if result.IP4 == nil {
    return errors.New("IPAM plugin returned missing IPv4 config")
}

add:

defer func() {
    if err != nil {
        ipam.ExecDel(n.IPAM.Type, args.StdinData)
    }
}()
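Note that for the deferred release to observe failures that happen later in cmdAdd, err must be the function's named return value. A minimal self-contained sketch of the pattern, assuming the containernetworking skel and ipam helper packages and a pared-down config struct (illustrative, not the actual plugin source):

package main

import (
    "encoding/json"

    "github.com/containernetworking/cni/pkg/skel"
    "github.com/containernetworking/plugins/pkg/ipam"
)

// netConf is a pared-down config struct for illustration only.
type netConf struct {
    IPAM struct {
        Type string `json:"type"`
    } `json:"ipam"`
}

// cmdAdd sketches the suggested pattern. err is a *named* return value,
// which is what lets the deferred function observe failures that happen
// after the IPAM allocation and release it.
func cmdAdd(args *skel.CmdArgs) (err error) {
    n := &netConf{}
    if err = json.Unmarshal(args.StdinData, n); err != nil {
        return err
    }

    // Obtain an address from the IPAM plugin.
    result, err := ipam.ExecAdd(n.IPAM.Type, args.StdinData)
    if err != nil {
        return err
    }

    // If anything below fails, release the IPAM allocation so the
    // address is not leaked.
    defer func() {
        if err != nil {
            ipam.ExecDel(n.IPAM.Type, args.StdinData)
        }
    }()

    _ = result // ... configure the VF, returning err on failure ...
    return nil
}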

A Name Conflict Occurs When Assigning an Intermediary Name

What happened?

The following error snippet was found multiple times in the event log of a Kubernetes pod I was attempting to create:

Multus: error in invoke Delegate add - "sriov": failed to set up pod interface "net1" from the device "p2p1": error setting temp IF name p2p1_18 for p2p1_1

The error snippet was a result of the following code:

https://github.com/intel/sriov-cni/blob/2aa9b0e71e2d9eb53ac996005d32bd90c5f39d1d/pkg/sriov/sriov.go#L154-L165

Dell systems use biosdevname to name Ethernet devices. As a result, I have 128 virtual functions named p2p1_1 - p2p1_127 all based off physical function p2p1. p2p1_1 happens to be the 8th Ethernet device on my system, so the above code assigns it the temporary name p2p1_18, which already exists on my system.

What did you expect to happen?

I expected pod creation to finish without error.

What are the minimal steps needed to reproduce the bug?

Use a Dell server with a large number of virtual functions and wait for Kubernetes to choose a virtual function for which the above code causes a name conflict.

Anything else we need to know?

My Dell server, specifically, causes this problem. I also have a Lenovo server which uses systemd naming on which naming conflicts do not occur.

I can't pretend to be an expert on the design of the sriov-cni, but I recommend something like the following:

tempName := fmt.Sprintf("%s%s", linkName, "_temp")

This should help avoid the potential for name conflicts with other virtual functions.
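A hedged sketch of a collision-resistant variant: derive the temporary name from something unique to the invocation (using a prefix of the container ID here is an assumption, not the plugin's actual scheme), and respect the kernel's 15-character limit on interface names:

// tempIfName builds a temporary interface name that cannot collide with
// biosdevname-style VF names such as p2p1_18. Deriving it from the
// container ID is an illustrative assumption, not the plugin's scheme.
func tempIfName(containerID string) string {
    name := "tmp_" + containerID
    if len(name) > 15 { // IFNAMSIZ-1: Linux caps interface names at 15 chars
        name = name[:15]
    }
    return name
}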

Component Versions

Please fill in the below table with the version numbers of applicable components used.

Component Version
SR-IOV CNI Plugin 0.2.1
Multus 3.1
SR-IOV Network Device Plugin 3.0.0
Kubernetes 1.14.3
OS Centos 7.6

Host reboots During POD deployment with wrong values in sriov plugin fields

We are using this sriov plugin for a Kubernetes pod with a DPDK application. We found that if there is a wrong value in the network files for sriov, the host server reboots during pod deployment. For example, the following is our yaml file for the network definition.

plugin: sriov
args: '[
  {
    "name": "",
    "type": "sriov",
    "if0": "",
    "vlan": ,
    "if0name": "",
    "dpdk": {
      "kernel_driver": "ixgbevf",
      "dpdk_driver": "vfio-pci",
      "dpdk_tool": "dpdk-devbind.py"
    }
  }
]'

If a wrong value is entered for any field, such as vfio_pci instead of vfio-pci, the host server reboots during pod deployment.

Device name lookup failure in netns

In commit 63b44a3, the device name is changed before moving it into the container namespace in order to avoid possible name conflicts inside the container, for example:

rename 'em1_0' to 'dev13'

but this also causes a failure when the CNI tries to rename the interface to 'netX' again after it has been moved into the container, for example:

rename 'em1_0' to 'net1'

because inside the container namespace the interface is no longer em1_0, but dev13.

error message:

Warning  FailedCreatePodSandBox  6s    kubelet, nfvsdn-17.oot.lab.eng.rdu2.redhat.com  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "f3264b820129f6786899e648b2ec8808285028edb4a3776b3b70f2aad287014b" network for pod "testpod2": NetworkPlugin cni failed to set up pod "testpod2_default" network: Multus: Err in tearing down failed plugins: Multus: error in invoke Delegate add - "sriov": failed to set up pod interface "net1" from the device "em1": failed to rename vf 0 of the device "em1_0" to "net1": failed to lookup device "em1_0": Link not found, failed to clean up sandbox container "f3264b820129f6786899e648b2ec8808285028edb4a3776b3b70f2aad287014b" network for pod "testpod2": NetworkPlugin cni failed to teardown pod "testpod2_default" network: Multus: error in invoke Delegate del - "sriov": failed to lookup vf device "net1": Link not found]
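A minimal sketch of the sequence that avoids this failure, assuming the vishvananda/netlink and containernetworking/plugins/pkg/ns packages (names are illustrative): rename in the host namespace, move the link, then look it up by its temporary name inside the container namespace before the final rename.

package main

import (
    "github.com/containernetworking/plugins/pkg/ns"
    "github.com/vishvananda/netlink"
)

// moveAndRename renames the VF to a temporary name in the host namespace,
// moves it into the container namespace, and only then renames it to its
// final name, looking it up by the temporary name it actually has.
func moveAndRename(hostName, tempName, contName string, netns ns.NetNS) error {
    link, err := netlink.LinkByName(hostName)
    if err != nil {
        return err
    }
    if err := netlink.LinkSetName(link, tempName); err != nil {
        return err
    }
    if err := netlink.LinkSetNsFd(link, int(netns.Fd())); err != nil {
        return err
    }
    return netns.Do(func(_ ns.NetNS) error {
        // Inside the container the device is known as tempName, not hostName.
        l, err := netlink.LinkByName(tempName)
        if err != nil {
            return err
        }
        return netlink.LinkSetName(l, contName)
    })
}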

Should we deprecate support of allocating VF using master PF?

@rkamudhan
@zshi-redhat

With the introduction of sriov-device-plugin, the sriov-cni no longer needs to find and allocate an unused VF using a master (SR-IOV PF) interface. Using the master interface to determine and allocate a free VF to a container has other unwanted consequences too: it is not concurrency safe. For proper resource accounting and allocation in K8s, the device plugin should be used. The device plugin also takes care of device file permission issues should a VF require any device files in the container (e.g. vfio, rdma etc.). With that in mind, attaching a dpdk driver in the CNI is no longer needed, and deprecating DPDKConf is currently in progress. Should we also remove the master field from the json conf and only allow the user to specify the VF's PCI address as deviceID (ideally this should be coming through sriov-device-plugin)?

What is your opinion on this?

VF MAC address collision

Intermittent ping failures

We persist the IP address and MAC address of a Pod. We generate a random MAC address in such a way that for a given IP address the MAC address remains the same.

SRIOV PodA is assigned address IP-A and MAC address MAC-A, which are applied to a VF on deployment. If the pod restarts, on our platform it retains both IP-A and MAC-A, but the VF assigned to the pod may have changed.

This is fine for a while, but occasionally we have seen MAC-A assigned both to the new VF and to the host interface corresponding to the old VF that was released. In essence, on the host we see two entities with the MAC-A address. When this happens, pings to that container fail.

Multus: v3.4.1
Kubernetes: 1.16.3
SRIOV CNI: Latest

IPv6 on SR-IOV

Hi,

If SR-IOV is configured with Multus, does the SR-IOV interface support IPv6 only?

If I try to configure only IPv6, I receive the error below:

Multus: error in invoke Delegate add - "sriov": IPAM plugin returned missing IPv4 config.

Code needs cohesive restructuring to make it testable

We have done some refactoring recently so that we can add structured unit tests. However, there is still more refactoring to be done, as we can see some challenges with mocking netlink functions and some side-effects of doing so. See #42.

Here's what I suggest we can do to improve the code-base:

We are using netlink library in following pkgs/files:

  • sriov/sriov.go
  • pkg/config.go

Here is what I suggest we do in the upcoming days:

  1. Move sriov/main.go into a new cmd/main.go directory - add an integration test for this file
  2. Move sriov/sriov.go into a new package named sriov at pkg/sriov/sriov.go
  3. Define a NetlinkManager interface in pkg/sriov/sriov.go with all of the netlink functions that we use there, so we can mock them all at once and in only one place, its unit-test file (see the sketch after this list)
  4. Move the AssignFreeVF() function from pkg/config.go into pkg/sriov/sriov.go so the netlink library is only imported there. We then have to mock it for unit tests only once, and netlink is no longer needed in config.go
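A hedged sketch of what such an interface could look like; the method set is illustrative, drawn from netlink calls the plugin is known to use, not a final design:

package sriov

import (
    "net"

    "github.com/vishvananda/netlink"
)

// NetlinkManager wraps the netlink calls used by the plugin so unit tests
// can substitute a mock implementation.
type NetlinkManager interface {
    LinkByName(name string) (netlink.Link, error)
    LinkSetVfVlan(link netlink.Link, vf, vlan int) error
    LinkSetVfHardwareAddr(link netlink.Link, vf int, hwaddr net.HardwareAddr) error
    LinkSetUp(link netlink.Link) error
    LinkSetNsFd(link netlink.Link, fd int) error
    LinkSetName(link netlink.Link, name string) error
}

// realNetlink would satisfy NetlinkManager by delegating each call to the
// library; one method is shown, the rest follow the same one-line pattern.
type realNetlink struct{}

func (realNetlink) LinkByName(name string) (netlink.Link, error) {
    return netlink.LinkByName(name)
}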

PODs with SRIOV VFs fail after server reboot

Have 3 or 4 pods with SRIOV VFs and reboot the server. At system startup (assuming the VFs are created), kubelet starts all of these pods at pretty much the same time. The Multus/SRIOV plugins are called for the VFs, and multiple VF requests hit the i40e/i40evf driver on the host simultaneously. VF configuration then fails randomly, since the drivers cannot handle multiple requests: the VLAN config fails, or the SRIOV plugin add fails.

It looks like the SRIOV plugin add/release VF requests need to be serialised and locked, for example as sketched below.
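A minimal sketch of such serialisation, assuming a host-wide lock file (the path is illustrative): each cmdAdd/cmdDel takes an exclusive flock before touching the PF/VF, so simultaneous pod starts no longer race inside the driver.

package main

import (
    "os"
    "syscall"
)

// withSriovLock serialises critical sections across concurrently invoked
// CNI processes using an exclusive advisory lock on a shared file.
func withSriovLock(fn func() error) error {
    f, err := os.OpenFile("/var/run/sriov-cni.lock", os.O_CREATE|os.O_RDWR, 0600)
    if err != nil {
        return err
    }
    defer f.Close()
    if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
        return err
    }
    // The lock is released explicitly here, or automatically when the
    // process exits and the file descriptor closes.
    defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)
    return fn()
}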

Poor DPDK Performance

Hi, I'm using this plugin with the Mellanox ConnectX-5 100G card with DPDK. On bare metal with SRIOV off, I can hit 100Gbps with 2 transmitting threads and 4KB packets. When I use this plugin, I'm not able to get higher than about 70Gbps, regardless of how many threads I have sending out the VFs. I've tried both more queues per VF and more VFs on the same interface, to no avail. All transmitting cores are on the same NUMA node.

Since it's Mellanox, I don't specify dpdk in the CRD because of their bifurcated driver. DPDK recognizes the devices properly, so I didn't think that was the cause either. Does anyone have any ideas to try?

sriov-ipam not working

Kubernetes version 1.12, Docker version 1.13.1
SRIOV definition:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
name: sriov-ipam-1
spec:
config: '{
"cniVersion": "0.3.0",
"LogFile": "/var/log/multus.log",
"LogLevel": "debug",
"name": "sriov-ipam-1",
"type": "sriov",
"if0": "ens12f1",
"vlan": 1925,
"ipam": {
"type": "host-local",
"subnet": "subnet/mask",
"rangeStart": "IP",
"rangeEnd": "IP",
"gateway": "GW IP"
}
}'

Deployment:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: sample-storage
  labels:
    k8s-app: cmm-k8s
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: cmm-k8s
  minReadySeconds: 111
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      annotations:
        k8s.v1.cni.cncf.io/networks: '[
          { "name": "sriov-ipam-1",
            "interfaceRequest": "eth2" }
        ]'
      labels:
        app: app
        k8s-app: app-k8s
        kubernetes.io/cluster-service: "true"
    spec:
      hostNetwork: false
      nodeSelector:
        nodetype: storage
        nodename: storage
      containers:
      - name: sample
        image: 172.24.17.100:5000/ubuntu_combined:v4
        imagePullPolicy: IfNotPresent
        command: ["/bin/sh"]
        args: ["-c", "while true; do echo hello; sleep 10;done"]
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c","/stop.sh; sleep 30;"]
        env:
        - name: HOSTNAME
          value: test
        - name: ENV
          value: "CMM_ON_DOCKER"
        - name: KUBE_DNS_IP
          value: "10.96.0.10"
        - name: CONSUL_SVC
          value: "consul-svc.kube-system.svc.cluster.local"
        - name: DOCKER_NETWORK
          value: "cilium"
        securityContext:
          capabilities:
            add:
            - SYS_ADMIN

Result: the pod is not coming up. "kubectl describe" shows:

Events:
Type Reason Age From Message


Normal Scheduled 17s default-scheduler Successfully assigned default/sample-storage-84c4f57584-wjjqk to storage
Warning FailedCreatePodSandBox 11s kubelet, storage Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "948f820bc191df3199023d6dd0665636c0bd65f41fe49be21c25e0b1fd129940" network for pod "sample-storage-84c4f57584-wjjqk": NetworkPlugin cni failed to set up pod "sample-storage-84c4f57584-wjjqk_default" network: Multus: Err in tearing down failed plugins: Multus: error in invoke Delegate add - "sriov": IPAM plugin returned missing IPv4 config, failed to clean up sandbox container "948f820bc191df3199023d6dd0665636c0bd65f41fe49be21c25e0b1fd129940" network for pod "sample-storage-84c4f57584-wjjqk": NetworkPlugin cni failed to teardown pod "sample-storage-84c4f57584-wjjqk_default" network: Multus: error in invoke Delegate del - "sriov": failed to lookup vf device "eth2": Link not found]
Normal SandboxChanged 5s (x2 over 11s) kubelet, storage Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 5s kubelet, storage Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "668ec1c5880b6a2b3ab2965a1adae6cdaf01add662df3cffa649473422811dc2" network for pod "sample-storage-84c4f57584-wjjqk": NetworkPlugin cni failed to set up pod "sample-storage-84c4f57584-wjjqk_default" network: Multus: Err in tearing down failed plugins: Multus: error in invoke Delegate add - "sriov": IPAM plugin returned missing IPv4 config, failed to clean up sandbox container "668ec1c5880b6a2b3ab2965a1adae6cdaf01add662df3cffa649473422811dc2" network for pod "sample-storage-84c4f57584-wjjqk": NetworkPlugin cni failed to teardown pod "sample-storage-84c4f57584-wjjqk_default" network: Multus: error in invoke Delegate del - "sriov": failed to lookup vf device "eth2": Link not found]

Have separate log level/file/tracing for SRIOV plugin

Currently SRIOV errors are reported via the Multus log file /var/log/multus.log when an error occurs in the SRIOV plugin. If I set the log level to debug in Multus, all debug logs are dumped and it becomes cumbersome to navigate. Essentially, I am looking for a way to trace/debug the SRIOV plugin only, without the Multus logs/debug logs. We should be able to set the trace/debug level for SRIOV alone.

remove IF0NAME from NetConf

The IF0NAME field was added long before the CRD feature existed in Multus, as a way of bypassing CNI_IFNAME from the environment. This is no longer needed, as a meta-plugin like Multus can set this env variable as required.

IF0NAME is no longer being used, and we can safely remove this field from the sriov NetConf.

Failed to build on golang v1.6 and glide v0.13.1

Hello,

Are you still building this based on golang v1.5+?
I got an error like

obj.IsAlias undefined (type *types.TypeName has no field or method IsAlias)

while building with golang v1.6.

After I upgraded my golang to v1.10, I was able to build successfully.

i40e: Failed to delete MAC filter, error I40E_ERR_PARAM

What happened?

When deleting an SR-IOV pod that has a customized MAC address configured, the i40e driver reports the error messages below:

[ 1931.834872] i40e 0000:88:00.1: MAC addr ca:fe:c0:ff:ee:ee has been set by PF, cannot delete it for VF 1, reset VF to change MAC addr
[ 1931.834999] i40e 0000:88:00.1: VF 1 failed opcode 11, retval: -5
[ 1931.835109] i40evf 0000:88:0a.1: Failed to delete MAC filter, error I40E_ERR_PARAM
[ 1931.865124] i40e 0000:88:00.1: Removing MAC on VF 1
[ 1931.865249] i40evf 0000:88:0a.1: Reset warning received from the PF
[ 1931.865257] i40evf 0000:88:0a.1: Scheduling reset task
[ 1931.953132] i40e 0000:88:00.1: Reload the VF driver to make this change effective.
[ 1932.045835] i40e 0000:88:00.1: VF 1 is now untrusted
[ 1932.051530] IPv6: ADDRCONF(NETDEV_UP): enp136s10f1: link is not ready
[ 1932.053490] IPv6: ADDRCONF(NETDEV_UP): enp136s10f1: link is not ready
[ 1932.160215] i40evf 0000:88:0a.1 enp136s10f1: NIC Link is Up 25 Gbps Full Duplex
[ 1932.160231] IPv6: ADDRCONF(NETDEV_CHANGE): enp136s10f1: link becomes ready

What did you expect to happen?

No error messages reported from the i40e drivers, for example:

[ 2057.031200] i40evf 0000:88:0a.0: Reset warning received from the PF
[ 2057.031209] i40evf 0000:88:0a.0: Scheduling reset task
[ 2057.124303] i40e 0000:88:00.1: VF 0 is now trusted
[ 2057.216816] IPv6: ADDRCONF(NETDEV_UP): net1: link is not ready
[ 2057.258262] i40evf 0000:88:0a.0 net1: NIC Link is Up 25 Gbps Full Duplex
[ 2057.260185] IPv6: ADDRCONF(NETDEV_CHANGE): net1: link becomes ready
[ 2113.069530] i40evf 0000:88:0a.0: Reset warning received from the PF
[ 2113.069539] i40evf 0000:88:0a.0: Scheduling reset task
[ 2113.161275] i40e 0000:88:00.1: VF 0 is now untrusted
[ 2113.167315] IPv6: ADDRCONF(NETDEV_UP): enp136s10: link is not ready
[ 2113.168271] IPv6: ADDRCONF(NETDEV_UP): enp136s10: link is not ready
[ 2113.281669] i40evf 0000:88:0a.0 enp136s10: NIC Link is Up 25 Gbps Full Duplex
[ 2113.281690] IPv6: ADDRCONF(NETDEV_CHANGE): enp136s10: link becomes ready

What are the minimal steps needed to reproduce the bug?

Create an SR-IOV pod with the spec below, then delete the pod and check the system log with dmesg.

apiVersion: v1
kind: Pod
metadata:
  name: testpod2
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
        {
                "name": "sriov-net1",
                "mac": "CA:FE:C0:FF:EE:EE"
        }
]'
spec:
  containers:
  - name: appcntr2 
    image: centos/tools 
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
    resources:
      requests:
        intel.com/intel_sriov_netdevice: '1' 
      limits:
        intel.com/intel_sriov_netdevice: '1'

Anything else we need to know?

Component Versions

Please fill in the below table with the version numbers of applicable components used.

Component Version
SR-IOV CNI Plugin master
Multus master
SR-IOV Network Device Plugin master
Kubernetes 1.16.1
OS CentOS 7.6

Config Files

Config file locations may be config dependent.

CNI config (Try '/etc/cni/net.d/')

using default:

{
        "name": "multus-cni-network",
        "cniVersion": "0.3.1",
        "type": "multus",
        "logLevel": "debug",
        "logFile": "/tmp/multus.log",
        "delegates": [{
                "type": "flannel",
                "name": "flannel.1",
                "delegate": {
                        "isDefaultGateway": true
                }
        }],
        "kubeconfig": "/etc/kubernetes/admin.conf"
}

Device pool config file location (Try '/etc/pcidp/config.json')

using default:

{
    "resourceList": [{
            "resourceName": "intel_sriov_netdevice",
            "selectors": {
                "vendors": ["8086"],
                "devices": ["154c", "10ed"],
                "drivers": ["i40evf", "ixgbevf"]
            }
        },
        {
            "resourceName": "intel_sriov_dpdk",
            "selectors": {
                "vendors": ["8086"],
                "devices": ["154c", "10ed"],
                "drivers": ["vfio-pci"],
                "pfNames": ["enp0s0f0","enp2s2f1"]
            }
        },
        {
            "resourceName": "mlnx_sriov_rdma",
            "isRdma": true,
            "selectors": {
                "vendors": ["15b3"],
                "devices": ["1018"],
                "drivers": ["mlx5_ib"]
            }
        }
    ]
}
Kubernetes deployment type ( Bare Metal, Kubeadm etc.)

Bare Metal, Kubeadm

SR-IOV Network Custom Resource Definition
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_netdevice
spec:
  config: '{
  "type": "sriov",
  "cniVersion": "0.3.1",
  "name": "sriov-network",
  "trust": "on",
  "ipam": {
    "type": "host-local",
    "subnet": "10.56.217.0/24",
    "routes": [{
      "dst": "0.0.0.0/0"
    }],
    "gateway": "10.56.217.1"
  }
}'

sriov cni network status is not correctly updated in pod annotation

Multus supports updating network status from each delegated CNI plugins to pod annotation, for example:

# kubectl describe pod testpod1
Name:               testpod1
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               worker-0/10.19.111.16
Start Time:         Sun, 16 Jun 2019 22:35:14 -0400
Labels:             <none>
Annotations:        k8s.v1.cni.cncf.io/networks: sriov-net1
                    k8s.v1.cni.cncf.io/networks-status:
                      [{
                          "name": "",
                          "ips": [
                              "10.96.1.110"
                          ],
                          "default": true,
                          "dns": {}
                      },{
                          "name": "sriov-network",
                          "dns": {}
                      }]
Status:             Running
IP:                 10.96.1.110

But the sriov-cni network status is not correctly updated: as shown above, only the name and dns fields appear in the pod annotation for sriov-network; others such as ips and mac are missing.

Tested with

  1. Multus latest master: 1f8b44c575ee60f86ec99decd008a2328586952d
  2. SR-IOV CNI latest master: 968b85e

DESIGN: NETWORK POLICY IN SR-IOV CNI PLUGIN

Problem

A network policy in Kubernetes is a specification of how groups of pods are allowed to communicate with each other and with other network endpoints. Multus enables attaching multiple network interfaces to pods. For now SR-IOV CNI is supposed to be used with Multus, but SR-IOV CNI doesn't support Network Policy. However, Network Policy should be applied to all CNI plugins, otherwise there is a security issue: a pod can still reach another pod through the SR-IOV network even when that access is forbidden on the interfaces managed by other plugins. For example, suppose Multus is used and Calico, which supports Network Policy, is the default plugin, and Pod1 is forbidden to access Pod2 by a Network Policy. Pod1 then cannot access Pod2 through eth0, which is created by Calico, but it can still access Pod2 through net1, which is created by SR-IOV CNI, because Calico cannot control the SR-IOV network.

+------------+           +--------------+
+    Pod1    +           +    Pod2      +
+            +           +              +
+    +-------+           +-------+      +
+    + eth0  +---  X  ---+ eth0  +      +
+    +-------+           +-------+      +
+            +           +              +
+    +-------+           +-------+      +
+    + net1  +-----------+ net1  +      +
+    +-------+           +-------+      +
+------------+           +--------------+

Proposal

There is an embedded switch in an SR-IOV capable NIC; it is used to forward network traffic between the physical port on the adapter and internal virtual ports. By utilizing DPDK or TC flower, network policy can be offloaded to this embedded switch.

Proposed changes

  1. Add a Network Policy Driver. The Network Policy Driver should be a separate entity. It can be agents running on Kubernetes nodes (aka hosts). DPDK and TC flower can be implementations of the Network Policy Driver. The base Network Policy Driver needs to watch kube-apiserver for network policy and pod status.
+-------------------------------------------------+
+                  kube-apiserver                 +
+  +---------------------+   +----------------+   +
+  +   network policy    +   +  pod status    +   +
+  +-----------+---------+   +---------+------+   +
+--------------|-----------------------|----------+
               |                       |
               |        watch          |
               |                       |
        +------+-----------------------+-----+ 
        +                                    +
        +        Network Policy Driver       +
        +                                    +
        +  +----------+   +--------------+   +
        +  +   DPDK   +   +  TC flower   +   +
        +  +----------+   +--------------+   +
        +------------------------------------+
  2. Offload network policy to the embedded switch through the implementation of the Network Policy Driver. All network policies can be mapped to DPDK testpmd and TC flower as follows. (Testpmd is an example application built with the DPDK SDK; the features implemented by testpmd can also be implemented by our own application.)

  • DPDK:
    This needs the user space driver igb_uio as the PF driver, with the VF then configured through the PF. Because the PF is not bound to a kernel space driver, the PF won't have a name, and the existing functions in the SR-IOV CNI plugin need to be reimplemented since the kernel space driver is not used. The following shows how the existing functions map to testpmd:

    • Set VF's MAC: testpmd> set vf mac addr (port_id) (vf_id) (XX:XX:XX:XX:XX:XX). By default a MAC address is given to the VF; if the user needs a specific MAC address, they will need to unbind and rebind the VF's driver to change it.
    • Set VF's vlan: testpmd> set vf vlan insert (port_id) (vf_id) (vlan_id)
    • Set VF's vlanQos: needs investigation; should be supported by DPDK. If not, we can request the DPDK community to implement this feature.
    • Set VF spoofcheck: testpmd> set vf vlan antispoof (port_id) (vf_id) (on|off) and testpmd> set vf mac antispoof (port_id) (vf_id) (on|off)
    • Set VF trust: needs investigation; should be supported by DPDK. If not, we can request the DPDK community to implement this feature.
    • Set VF max_tx_rate: testpmd> set vf tx max-bandwidth (port_id) (vf_id) (max_bandwidth)
  • TC flower:
    This can reuse the kernel space driver with few code changes to the existing SR-IOV CNI code, but it requires that the NIC has the NETIF_F_HW_TC feature enabled and implements ndo_setup_tc.
    To enable TC hardware offload:
    ethtool -K eth0 hw-tc-offload on
    To check whether hw-tc-offload has been enabled:
    ethtool -k eth0 | grep hw-tc-offload

Benefits

  1. Network traffic is processed on the embedded switch, saving CPU resources. Network Policies are usually implemented with iptables, which forces network packets through kernel space and consumes CPU to process the traffic.
  2. It can accelerate traffic processing when most of the packets should be dropped, such as under DDoS.
  3. The security issue described in the section “Problem” won’t occur.

Can't bind VM interface as DPDK interface inside a k8s POD

Binding a VM's virtio interface (attached as a vHost port to OVS running on the host OS) as a DPDK interface in a K8S pod fails with the following error.

"Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "017d716ce70a585a7f23e38aaf463f058d3b396929d791204b9253ed58a27b29" network for pod "samplepod": NetworkPlugin cni failed to set up pod "samplepod_default" network: Multus: Err in tearing down failed plugins: Multus: error in invoke Delegate add - "sriov": failed to set up pod interface "net0" from the device "ens18": failed to open the sriov_numfs of device "ens18": lstat /sys/class/net/ens18/device/sriov_numvfs: no such file or directory"

Here is the configuration used to create the pod:

cat <<EOF | kubectl create -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: dpdk-conf
spec:
  config: '{
    "name": "mynet",
    "type": "sriov",
    "if0": "ens18",
    "if0name": "net0",
    "dpdk": {
      "kernel_driver": "uio",
      "dpdk_driver": "igb_uio",
      "dpdk_tool": "/usr/src/dpdk-stable-17.11.4/install/share/dpdk/usertools/dpdk-devbind.py"
    }
  }'
EOF

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: samplepod
  annotations:
    k8s.v1.cni.cncf.io/networks: dpdk-conf
spec:
  containers:
  - name: samplepod
    command: ["/bin/bash", "-c", "sleep 2000000000000"]
    image: dougbtv/centos-network
EOF

Excerpts from kubectl describe command on the samplepod:

Events:
Type Reason Age From Message


Normal Scheduled 8s default-scheduler Successfully assigned default/nginx to ubuntu
Warning FailedCreatePodSandBox 7s kubelet, ubuntu Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "dc4ed48ee816067a2bcb1d9d577cd8304b8183545aee103e19b52175a5093623" network for pod "nginx": NetworkPlugin cni failed to set up pod "nginx_default" network: failed to set up pod interface "net0" from the device "ens18": failed to open the sriov_numfs of device "ens18": lstat /sys/class/net/ens18/device/sriov_numvfs: no such file or directory
Warning FailedCreatePodSandBox 6s kubelet, ubuntu Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "9f47783f0743ea5a5f32d4dd75c64506ec7988f6ed2fcc32aa2a300a1737ae4d" network for pod "nginx": NetworkPlugin cni failed to set up pod "nginx_default" network: failed to set up pod interface "net0" from the device "ens18": failed to open the sriov_numfs of device "ens18": lstat /sys/class/net/ens18/device/sriov_numvfs: no such file or directory
Warning FailedCreatePodSandBox 5s kubelet, ubuntu Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "d3d7c9869e26bb1343dc5f2ab43b786ea002ff4e979f1810f393caef299ff273" network for pod "nginx": NetworkPlugin cni failed to set up pod "nginx_default" network: failed to set up pod interface "net0" from the device "ens18": failed to open the sriov_numfs of device "ens18": lstat /sys/class/net/ens18/device/sriov_numvfs: no such file or directory
Warning FailedCreatePodSandBox 4s kubelet, ubuntu Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "49c0f432a088087ff22aeee1a9f4b957d15df88064204d4c90d7839b3f9dea2e" network for pod "nginx": NetworkPlugin cni failed to set up pod "nginx_default" network: failed to set up pod interface "net0" from the device "ens18": failed to open the sriov_numfs of device "ens18": lstat /sys/class/net/ens18/device/sriov_numvfs: no such file or directory
Warning FailedCreatePodSandBox 3s kubelet, ubuntu Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "7f9098119ca8614d5cd499412e852e2203597f667ddb305a0977f2ca00b6e53c" network for pod "nginx": NetworkPlugin cni failed to set up pod "nginx_default" network: failed to set up pod interface "net0" from the device "ens18": failed to open the sriov_numfs of device "ens18": lstat /sys/class/net/ens18/device/sriov_numvfs: no such file or directory
Warning FailedCreatePodSandBox 2s kubelet, ubuntu Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "a07e6f8dfb3d5369cd8ff9d51d033c18e3f97fc96c9264552d7ab6270b8a0078" network for pod "nginx": NetworkPlugin cni failed to set up pod "nginx_default" network: failed to set up pod interface "net0" from the device "ens18": failed to open the sriov_numfs of device "ens18": lstat /sys/class/net/ens18/device/sriov_numvfs: no such file or directory
Warning FailedCreatePodSandBox 1s kubelet, ubuntu Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "6c46db7b766ff80e6e15beb12865961d6d75a31ce9c7b73d049306517598c344" network for pod "nginx": NetworkPlugin cni failed to set up pod "nginx_default" network: failed to set up pod interface "net0" from the device "ens18": failed to open the sriov_numfs of device "ens18": lstat /sys/class/net/ens18/device/sriov_numvfs: no such file or directory
Normal SandboxChanged 0s (x7 over 6s) kubelet, ubuntu Pod sandbox changed, it will be killed and re-created.

Debugging issue with sriov-cni

We have created a Custom Resource Definition for the K8S object "Network" and we use the sriov-cni plugin in the Network definition file.

  1. The issue with the SRIOV plugin is that if there is a mistake in the parent interface (or physical function) definition, as shown in "if0" below, or a similar mistake, the pod comes up fine but no VF is allocated to it. It is difficult to debug this scenario, as "journalctl -e" doesn't show any related error messages from sriov-cni. Are there any log files generated by it which can be used to find the error messages?

apiVersion: "kubernetes.com/v1"
kind: Network
metadata:
name: sriov-ipds-external1
plugin: sriov
args: '[
{
"name": "sriov-ipds-external1",
"type": "sriov",
"if0": "eno1",
"vlan": 1925,
"dpdk": {
"kernel_driver":"ixgbevf",
"dpdk_driver":"igb_uio",
"dpdk_tool":"/var/lib/docker/dpdk-devbind.py"
}
}

]'

  2. Along with this, there is another issue: if there is a mistake in VF creation and a pod is deployed with the sriov plugin, subsequent pod termination reboots the underlying server.

Support moving a VF which is eth0 on the host to the namespace

When using the Multus plugin, the first interface will be eth0 and the others will be netX. We need to make sure that before we move a VF which is eth0 to the container namespace, it is first renamed to a temporary name. This avoids a conflict with the eth0 interface which already exists in the namespace.

cmdDel fails releasing the device when kubelet deletes pause container

What happened?

Kubelet doesn't guarantee that the pause container stays alive while the CNI tries to delete all the devices attached to the pod. When the pause container is deleted, the netns is not available to release the device from cmdDel. This results in the device being left on the host with the wrong name, a missing IP, and wrong settings.

What did you expect to happen?

Kubelet should provide some guarantee that the netns is available for the CNI to delete all attached devices.

What are the minimal steps needed to reproduce the bug?

Attach at least 4 sriov devices to a pod, then kill the pod.
To consistently reproduce the error, add a 1 sec sleep in cmdDel.

Anything else we need to know?

Raised the issue with Kubernetes but was unable to get any positive response:
kubernetes/kubernetes#89440
As a workaround, a daemon periodically tries to fix the broken devices on the host.
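Inside the plugin, one defensive pattern used by several CNI plugins (a sketch assuming the containernetworking/plugins/pkg/ns package, not the actual sriov-cni code) is to treat an already-deleted netns as a signal to skip the in-namespace cleanup rather than fail the whole delete:

// Fragment of a hypothetical cmdDel:
netns, err := ns.GetNS(args.Netns)
if err != nil {
    if _, ok := err.(ns.NSPathNotExistErr); ok {
        // The pause container (and its netns) is gone; release any
        // host-side state we still can, then report success so the
        // runtime does not retry forever.
        return nil
    }
    return err
}
defer netns.Close()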

Component Versions

Please fill in the below table with the version numbers of applicable components used.

Component Version
SR-IOV CNI Plugin v2.2
Multus v3.4
SR-IOV Network Device Plugin v2.2
Kubernetes 1.13.5
OS Ubuntu 18

Config Files

Config file locations may be config dependent.

CNI config (Try '/etc/cni/net.d/')
Device pool config file location (Try '/etc/pcidp/config.json')
Multus config (Try '/etc/cni/multus/net.d')
Kubernetes deployment type ( Bare Metal, Kubeadm etc.)
Kubeconfig file
SR-IOV Network Custom Resource Definition

Logs

SR-IOV Network Device Plugin Logs (use kubectl logs $PODNAME)

Added some custom logs to print cmdArgs and netns
time="2020-04-24T17:22:52Z" level=info msg="read from cache &{NetConf:{CNIVersion:0.3.1 Name:sriov-network Type:sriov Capabilities:map[] IPAM:{Type:} DNS:{Nameservers:[] Domain: Search:[] Options:[]} RawPrevResult:map[dns:map[] interfaces:[map[name:net1 sandbox:/proc/4281/ns/net]]] PrevResult:} DPDKMode:false Master:enp5s0 MAC: AdminMAC: EffectiveMAC: Vlan:0 VlanQoS:0 DeviceID:0000:05:00.1 VFID:0 HostIFNames:net1 ContIFNames:net1 MinTxRate: MaxTxRate: SpoofChk: Trust: LinkState: Delegates:[{CNIVersion:0.3.1 Name:sbr Type:sbr Capabilities:map[] IPAM:{Type:} DNS:{Nameservers:[] Domain: Search:[] Options:[]} RawPrevResult:map[] PrevResult:}] RuntimeConfig:{Mac:} IPNet:}"
time="2020-04-24T17:22:52Z" level=info msg="empty netns , error = failed to Statfs "/proc/4281/ns/net": no such file or directory"
time="2020-04-24T17:22:52Z" level=info msg="ReleaseVF "
time="2020-04-24T17:22:52Z" level=error msg="failed to get netlink device with name net1"

Multus logs (If enabled. Try '/var/log/multus.log' )
Kubelet logs (journalctl -u kubelet)

Mar 23 21:04:42 dgx0098 kubelet[29124]: 2020-03-23T21:04:42Z [error] Multus: error in invoke Delegate del - "sriov": error in removing device from net namespace: 1failed to get netlink device with name net3: Link not found
Mar 23 21:04:42 dgx0098 kubelet[29124]: 2020-03-23T21:04:42Z [debug] delegateDel: , net2, &{{0.3.1 sriov-network sriov map[] {} {[] [] []}} { []} false false [123 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 100 101 108 101 103 97 116 101 115 34 58 91 123 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 110 9 101 34 58 34 115 98 114 34 44 34 116 121 112 101 34 58 34 115 98 114 34 125 93 44 34 100 101 118 105 99 101 73 68 34 58 34 48 48 48 48 58 48 99 58 48 48 46 49 34 44 34 110 97 109 101 34 58 34 115 114 105 111 118 45 110 101 116 119 111 114 107 34 44 34 116 121 112 101 34 58 34 115 114 105 111 118 34 125]}, &{cfba15035e7ef328153ba5c88853b52f97740560bc27a0707ab2f5b536a8f863 /proc/32764/ns/net net2 [[IgnoreUnknown 1] [K8S_POD_NAMESPACE user] [K8S_POD_NAME 847138-worker-1] [K8S_POD_INFRA_CONTAINER_ID cfba15035e7ef328153ba5c88853b52f97740560bc27a0707ab2f5b536a8f863]] map[] }, /opt/cni/bin
Mar 23 21:04:42 dgx0098 kubelet[29124]: 2020-03-23T21:04:42Z [verbose] Del: user:847138-worker-1:sriov-network:net2 {"cniVersion":"0.3.1","delegates":[{"cniVersion":"0.3.1","name":"sbr","type":"sbr"}],"deviceID":"0000:0c:00.1","name":"sriov-network","type":"sriov"}
Mar 23 21:04:46 dgx0098 kubelet[29124]: I0323 21:04:46.544632 29124 plugins.go:391] Calling network plugin cni to tear down pod "847138-worker-1_user"

SRIOV CNI to determine the driver and user config

Currently the kernel driver is specified as part of the SRIOV CNI config, and the SRIOV plugin binds to the user-specified driver if L2Enable is true. If DPDK is enabled, it runs dpdk_devbind to bind to a DPDK driver.

spec:
  config: '{
    "cniVersion": "0.3.0",
    "type": "sriov",
    "vlan": 200,
    "dpdk": {
      "kernel_driver": "i40evf",
      "dpdk_driver": "igb_uio",
      "dpdk_tool": "/usr/local/sbin/dpdk-devbind"
    }
  }'

This config does not work if there are multiple different types of NIC (i40e, ConnectX); it means that one requires a network definition per NIC type.

Currently L2Enable and DPDK are mutually exclusive. But ConnectX support in DPDK does not require dpdk-devbind to be called; instead it requires that the VF/PF stays bound to the kernel driver. As a result, one needs multiple definitions of a network based on NIC types even though the DPDK application in a pod consumes both NIC types.

Instead, the SRIOV plugin can itself determine the driver type based on the VF/PF PCIe address.

I believe the NIC type and the driver to bind/unbind belong to the SRIOV plugin; this will allow a generic network definition which is defined by SRIOVDP and the resource type.

Basically, remove the dependency on

"kernel_driver": "i40evf",
"dpdk_driver": "igb_uio",

and let the SRIOV plugin determine the driver based on the VF/PF PCIe address and sysfs.

This should work for both cases with and without SRIOVDP.
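A minimal sketch of the sysfs lookup this would rely on: on Linux, the driver bound to a PCI device is the target of its driver symlink, so no driver names would need to appear in the network definition.

package main

import (
    "fmt"
    "os"
    "path/filepath"
)

// driverForDevice returns the name of the driver currently bound to the
// PCI device, e.g. "i40evf", "mlx5_core" or "vfio-pci".
func driverForDevice(pciAddr string) (string, error) {
    link := filepath.Join("/sys/bus/pci/devices", pciAddr, "driver")
    target, err := os.Readlink(link)
    if err != nil {
        return "", fmt.Errorf("no driver bound to %s: %v", pciAddr, err)
    }
    return filepath.Base(target), nil
}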

VLAN not set on VF when DPDK mode is used

What happened?

I am using the latest SRIOV-CNI plugin with Multus. I have a PF with 4 VFs, and I bind the VFs to the DPDK vfio-pci driver.

Now when I create a NAD like the one below:

spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "sriov",
    "vlan": 1556,
    "deviceID": "0000:5e:0a.3",
    "name": "ani-netdevice-network"
  }'

everything works fine from the container, but the VLAN is not set on the VF. Why are VF attributes like VLAN not set when the VF is bound to DPDK?

What did you expect to happen?

Expect VLAN to be set on VF

What are the minimal steps needed to reproduce the bug?

Just try the NAD above on a VF bound to the DPDK vfio-pci driver.

Anything else we need to know?

NA

Component Versions

Please fill in the below table with the version numbers of applicable components used.

Component Version
SR-IOV CNI Plugin latest
Multus latest
SR-IOV Network Device Plugin latest
Kubernetes 1.18.3
OS Ubuntu 18.04

Config Files

Config file locations may be config dependent.

CNI config (Try '/etc/cni/net.d/')
Device pool config file location (Try '/etc/pcidp/config.json')
Multus config (Try '/etc/cni/multus/net.d')
Kubernetes deployment type ( Bare Metal, Kubeadm etc.)
Kubeconfig file
SR-IOV Network Custom Resource Definition

Logs

SR-IOV Network Device Plugin Logs (use kubectl logs $PODNAME)
Multus logs (If enabled. Try '/var/log/multus.log' )
Kubelet logs (journalctl -u kubelet)

Add capability of directly assigning PFs to Pods

The original SRIOV-CNI was enhanced with the capability of directly assigning PFs to Pods:
hustcat/sriov-cni@793ec5d

The use-cases (mostly telco related) mentioned by the commit are still valid, and could be satisfied by merging the code and introducing the pfOnly configuration parameter.

The code could be further enhanced by defining the interworking between the newly introduced pfOnly and the already existing dpdk configuration options.
When both are defined, the plugin could first bind the PF directly to the configured DPDK kernel driver, then assign it to the pod via its PCI address.
Such an enhancement would unite the best of both plugins: the capability of directly assigning PFs to pods via the PF's PCI address, while also taking care of the management of the PF's control plane.

I don't think the existing plugin can serve this use-case right now, but feel free to correct me if I missed something!
I (and/or some of my colleagues) could also handle the contribution in case the idea is welcomed, but I wanted to open a discussion about it first.

make does not build the latest code

make clean is always required before make

~/github/intel/sriov-cni$
modify sriov/sriov.go
~/github/intel/sriov-cni$make
running gofmt...
running golint...
This does not build the SRIOV plugin
~/github/intel/sriov-cni$make clean
~/github/intel/sriov-cni$make
github.com/intel/sriov-cni/sriov
Building sriov...
Done!

building image failed

Running the command make image failed with:

package golang is not installed
The command '/bin/sh -c yum install -y $INSTALL_PKGS &&     rpm -V $INSTALL_PKGS &&     cd /usr/src/sriov-cni &&     ./build &&     yum autoremove -y $INSTALL_PKGS &&     yum clean all &&     rm -rf /tmp/*' returned a non-zero code: 1
Makefile:149: recipe for target 'image' failed
make: *** [image] Error 1

It is related to this issue.

Support VFIO backed VFs

When the SR-IOV device plugin allocates a VFIO-typed VF, the device doesn't have a netlink representation. So when SR-IOV CNI then tries to move the device into the pod, it fails because the netlink interface is not present.

While one may argue that in case of VFIO devices we may not need a CNI at all, Multus requires that a network has type specified. If it's not sriov then it should be something else. We could write a new CNI plugin that would do nothing in terms of device binding into the namespace; but it's not convenient if we already have SR-IOV CNI with meaningful type name. We could also benefit from SR-IOV CNI handling the binding if we e.g. want IPAM info to be allocated for the device (perhaps because later we will use this information to pass it into software running inside the pod; in case of kubevirt it would be libvirt-backed VM that receives IPAM configuration via cloud-init; #37 is the place where one of potential approaches to pass IPAM into pod is discussed.)

It's my belief that SR-IOV CNI should gracefully handle VFs that are registered with the VFIO driver, like it already does for non-VFIO devices. This issue is just that: making the plugin not fail on a VFIO-registered VF, and return the desired IPAM information in the result returned to the caller.

Ability to set MTU for the VF

What would you like to be added?

Ability to set the MTU for the VF by sending the configuration

What is the use case for this feature / enhancement?

I have a PF configured with MTU 9000, but the VF attached inside the pod shows up with the default 1500. I am trying to solve this using the tuning plugin, but it would be better for the CNI plugin to handle this instead of needing an additional plugin.

If the CNI handles this, there is an opportunity to reset the VF's MTU back to the original value as part of the reset-VF function. I don't see MTU in the VF parameters; is there a reason?
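A minimal sketch of the setting side, assuming the vishvananda/netlink library and a hypothetical mtu config field; this would run inside the container namespace after the VF is moved and renamed:

// setVFMTU applies a configured MTU to the container-side VF interface.
// The mtu value would come from a hypothetical config field, not part of
// the current NetConf.
func setVFMTU(ifName string, mtu int) error {
    link, err := netlink.LinkByName(ifName)
    if err != nil {
        return err
    }
    return netlink.LinkSetMTU(link, mtu)
}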

Thanks

delegateDel fails when l2enable is True

The network yaml is:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
name: vccap-access
namespace: mytest-demo
annotations:
k8s.v1.cni.cncf.io/resourceName: mytest/sriov
spec:
config: '{
"cniVersion": "0.3.0",
"type": "sriov",
"vlan": 200,
"l2enable": true
}'

We assign a static IP address, and the VF is bound to the i40evf driver.

===
Dec 13 21:16:36 skyhi1 kubelet[5783]: 2018-12-13T21:16:36-05:00 [debug] getKubernetesDelegate: &{0xc4203525a0}, &{vccap-access-host casa-cin3701 }, /etc/cni/multus/net.d
Dec 13 21:16:36 skyhi1 kubelet[5783]: 2018-12-13T21:16:36-05:00 [debug] getKubernetesDelegate: found resourceName annotation : vccap/sriov
Dec 13 21:16:36 skyhi1 kubelet[5783]: 2018-12-13T21:16:36-05:00 [debug] getKubernetesDelegate: podID: f71c4b8c-ff3a-11e8-9098-f8bc12195210 deviceID: 0000:19:02.5
Dec 13 21:16:36 skyhi1 kubelet[5783]: 2018-12-13T21:16:36-05:00 [debug] cniConfigFromNetworkResource: &TypeMeta{Kind:NetworkAttachmentDefinition,APIVersion:k8s.cni.cncf.io/v1,}, /etc/cni/multus/net.d
Dec 13 21:16:36 skyhi1 kubelet[5783]: 2018-12-13T21:16:36-05:00 [debug] getCNIConfigFromSpec: { "cniVersion": "0.3.0", "type": "sriov", "vlan": 3701, "l2enable": true }, access-host
Dec 13 21:16:36 skyhi1 kubelet[5783]: 2018-12-13T21:16:36-05:00 [debug] LoadDelegateNetConf: {"cniVersion":"0.3.0","deviceID":"0000:19:02.5","l2enable":true,"name":"vccap-access-host","type":"sriov","vlan":3701},

--- DelegateDel is called ------------------

Dec 13 21:16:40 skyhi1 kubelet[5783]: 2018-12-13T21:16:40-05:00 [debug] delegateDel: , net2, &{{0.3.0 access-host sriov map[] {} {[] [] []}} { []} false false [123 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 48 34 44 34 100 101 118 105 99 101 73 68 34 58 34 48 48 48 48 58 49 57 58 48 50 46 53 34 44 34 108 50 101 110 97 98 108 101 34 58 116 114 117 101 44 34 110 97 109 101 34 58 34 118 99 99 97 112 45 97 99 99 101 115 115 45 104 111 115 116 34 44 34 116 121 112 101 34 58 34 115 114 105 111 118 34 44 34 118 108 97 110 34 58 51 55 48 49 125]}, &{cf6bdd3a34705a36a609ac12ea554e1605f6da0d5cfe1d45bfbd7ed077d5c952 /proc/412754/ns/net net2 [[IgnoreUnknown 1] [K8S_POD_NAMESPACE casa-cin3701] [K8S_POD_NAME vccap-cp-0] [K8S_POD_INFRA_CONTAINER_ID cf6bdd3a34705a36a609ac12ea554e1605f6da0d5cfe1d45bfbd7ed077d5c952]] map[] }, /opt/cni/bin
Dec 13 21:16:40 skyhi1 kubelet[5783]: 2018-12-13T21:16:40-05:00 [error] Multus: error in invoke Delegate del - "sriov": unable to get shared PF device: Link not found
Dec 13 21:16:40 skyhi1 kubelet[5783]: E1213 21:16:40.971868 5783 cni.go:280] Error deleting network: Multus: error in invoke Delegate del - "sriov": unable to get shared PF device: Link not found

---- Looking at SRIOV.go (k8s-deviceid-model)
ReleaseVF

if conf.L2Mode != false {
	//check for the shared vf net interface
	ifName := podifName + "d1"
	_, err := netlink.LinkByName(ifName)

	if err != nil {
		return fmt.Errorf("unable to get shared PF device: %v", err)
	}
	conf.Sharedvf = true
}

Looks like netlink.LinkByName() is always going to fail here. What are ifName := podifName + "d1" and the shared vf net interface? Is this code stale?

Is anyone else seeing similar issues?

SR-IOV CNI to properly restore the state of a VF

What happened?

SR-IOV CNI does not restore the state of a VF to its original value in case of:

  1. a failure midway through the cmdAdd call
  2. the cmdDel call

During cmdDel, SR-IOV CNI implicitly restores some VF parameters (spoofchk, vf trust, vf rate) to hardcoded predefined values.

Failure to properly restore the state may cause future cmdAdd operations to fail, as described in:
#113 #114

What did you expect to happen?

VF state is restored to its original value. That is to the state that preceded the last cmdAdd operation.

What are the minimal steps needed to reproduce the bug?

Invoke the CNI when the initial state of the VFs in the system differs from what is depicted in ResetVfConfig().

Anything else we need to know?

In my opinion the approach should be:

  1. Clean up whatever was configured during cmdAdd in case of a failure, or cache the partial configuration state for later cleanup. (When using K8s + Multus, cmdDel is called even if cmdAdd fails, from what I saw, but the CNI spec does not impose any restriction on the caller regarding calling cmdDel after a failed cmdAdd.) See the caching sketch after this list.
  2. Clean up as much as possible in cmdDel; today it exits on the first failure.
  3. The definition of cleanup should be to restore the system to its original state.
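A hedged sketch of the caching idea from point 1 (the types and paths are illustrative assumptions): snapshot the VF attributes before cmdAdd mutates them, persist the snapshot keyed by container ID, and replay it in cmdDel or after a failed cmdAdd.

package main

import (
    "encoding/json"
    "os"
    "path/filepath"
)

// vfState captures the VF attributes we intend to restore; the field set
// mirrors the parameters named above and is illustrative.
type vfState struct {
    MAC      string `json:"mac"`
    Vlan     int    `json:"vlan"`
    SpoofChk bool   `json:"spoofchk"`
    Trust    bool   `json:"trust"`
}

// saveVFState persists the pre-cmdAdd snapshot so a later cmdDel (or the
// failure path of cmdAdd) can restore the original values instead of
// hardcoded defaults.
func saveVFState(cacheDir, containerID string, st *vfState) error {
    data, err := json.Marshal(st)
    if err != nil {
        return err
    }
    return os.WriteFile(filepath.Join(cacheDir, containerID), data, 0600)
}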

how do sriov enabled pods connect across nodes?

Hello,

With multus-cni, sriov-cni, and sriov-device-plugin, we can start multiple pods on the same node, and they can connect to each other using host-local or static ipam.

How do pods connect across different nodes?

Say we have pod1 on node1 with static IP 10.1.1.101, pod2 on node2 with static IP 10.1.1.102, and pod3 on node3 with static IP 10.1.1.103; how do they connect?

example:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
name: sriov-net-4
namespace: mtx-test10
annotations:
k8s.v1.cni.cncf.io/resourceName: intel.com/sriov_net_A
spec:
config: '{
"type": "sriov",
"name": "sriov-network-4",
"ipam": {
"type": "static"
}
}'

--

apiVersion: v1
kind: Pod
metadata:
  name: testpod1
  namespace: mtx-test10
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
      { "name": "sriov-net-4",
        "ips": "10.1.1.101/24"
      }
    ]'
spec:
  containers:
  - name: appcntr1
    image: centos/tools
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
    resources:
      requests:
        intel.com/sriov_net_A: '1'
      limits:
        intel.com/sriov_net_A: '1'
  nodeName: mtx-hw3-bld01

--

apiVersion: v1
kind: Pod
metadata:
  name: testpod2
  namespace: mtx-test10
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
      { "name": "sriov-net-4",
        "ips": "10.1.1.102/24"
      }
    ]'
spec:
  containers:
  - name: appcntr1
    image: centos/tools
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
    resources:
      requests:
        intel.com/sriov_net_A: '1'
      limits:
        intel.com/sriov_net_A: '1'
  nodeName: mtx-hw3-bld02

Thanks. -Jessica
