k8snetworkplumbingwg / bond-cni

Bond-cni is for fail-over and high availability of networking in cloud-native orchestration

Home Page: https://networkbuilders.intel.com/network-technologies/container-experience-kits

License: Apache License 2.0

Go 89.92% Shell 1.95% Makefile 8.13%
cni networking bonding-cni high-availability failover interface-bonding fault-tolerance active-backup load-balancing link-aggregator tlb alb

bond-cni's Introduction

Bond CNI plugin


  • Bonding provides a method for aggregating multiple network interfaces into a single logical "bonded" interface.
  • According to the 802.3ad specification, the Linux bonding driver provides various flavours of bonded interfaces depending on the mode (bonding policy), such as round-robin and active aggregation.
  • When Bond CNI is configured as a standalone plugin, interfaces are obtained from the host network namespace. With these physical interfaces a bonded interface is created in the container network namespace.
  • When used with Multus, users can bond two interfaces that have previously been passed into the container.
  • A major use case for bonding in containers is network redundancy for an application in the case of network device or path failure or unavailability. For more information, refer to network redundancy using interface bonding.
  • For more information on the bonding driver, please refer to the kernel documentation.

Build

It is recommended that Bond CNI be built with Go 1.12+ with dependencies managed using Go modules.

  • Build the source code to binary:
make build-bin
  • Copy the binary to the CNI folder for testing:
cp ./bin/bond /opt/cni/bin/

The binary should be placed at /opt/cni/bin on all nodes on which bonding will take place, that is, all nodes to which a container with a bonded interface can be deployed. A sketch of one way to distribute it is shown below.
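
For example, one possible way to distribute the binary (a sketch only, assuming passwordless SSH and hypothetical node names node1-node3; adapt to your environment or use your configuration management tooling):

# Copy the built plugin binary to each node's CNI directory (node names are placeholders).
for node in node1 node2 node3; do
  scp ./bin/bond "${node}":/opt/cni/bin/bond
done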

Network configuration reference

  • name (string, required): the name of the network
  • type (string, required): "bond"
  • miimon (int, required): specifies the MII link monitoring frequency in milliseconds
  • mtu (int, optional): the mtu of the bond. Default is 1500.
  • failOverMac (int, optional): specifies the failOverMac setting for the bond. Should be set to 1 for active-backup bond modes. Default is 0.
  • linksInContainer (boolean, optional): specifies whether the slave links are already in the container at startup. Default is false, i.e. look for the interfaces on the host before bonding.
  • links (dictionary, required): names of the interfaces to be bonded (the slave links)
  • ipam (dictionary, required): IPAM configuration to be used for this network
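
For illustration only (not taken verbatim from the project's documentation), the parameters above could be combined into a minimal standalone configuration like the following; the interface names and the host-local subnet are placeholders reused from the examples on this page:

{
	"cniVersion": "0.3.1",
	"name": "bond-standalone",
	"type": "bond",
	"mode": "active-backup",
	"failOverMac": 1,
	"miimon": "100",
	"mtu": 1500,
	"linksInContainer": false,
	"links": [
		{"name": "ens3f2"},
		{"name": "ens3f2d1"}
	],
	"ipam": {
		"type": "host-local",
		"subnet": "10.56.217.0/24"
	}
}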

Usage

Standalone operation

Given the following network configuration:

# cat > /etc/cni/net.d/00-flannel-bonding.conf <<EOF
{
	"name": "mynet",
	"type": "flannel",
	"delegate": {
		"type": "bond",
		"mode": "active-backup",
		"miimon": "100",
		"mtu": 1500,
                "failOverMac": 1,
		"links": [
            {
				"name": "ens3f2"
			},
			{
				"name": "ens3f2d1"
			}
		]
	}
}
EOF

Note: in the example configuration above, the required "ipam" section is provided implicitly by the flannel plugin.
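
As a rough illustration (not part of the project's documented workflow), a standalone configuration like this can be exercised outside Kubernetes with the cnitool utility from the containernetworking/cni repository, assuming cnitool is installed, the flannel daemon is running on the host, and a test network namespace exists; the paths below are typical defaults:

# Create a test namespace and invoke the "mynet" configuration against it.
ip netns add testns
CNI_PATH=/opt/cni/bin NETCONFPATH=/etc/cni/net.d cnitool add mynet /var/run/netns/testns
ip netns exec testns ip addr show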

Integration with Multus, SRIOV CNI and SRIOV Device Plugin

Users can take advantage of Multus to add multiple interfaces to a K8s Pod. The SRIOV CNI plugin allows an SRIOV VF (Virtual Function) to be added to a container, and the SRIOV Device Plugin allows Kubelet to manage SRIOV virtual functions. This example shows how the Bond CNI can be used in conjunction with these plugins to handle more advanced use cases, e.g. a high-performance container networking solution for NFV environments. Specifically, the steps below show how to set up failover for SR-IOV interfaces in Kubernetes. This configuration is only applicable to SRIOV VFs using the kernel driver; userspace-driver VFs - such as those used in DPDK workloads - cannot be bonded with the Bond CNI.

Configuration is based on the Multus CRD Network Attachment Definition. Please follow the configuration details in the link: Usage with Kubernetes CRD based Network Objects

For more information and advanced use, refer to the Network Custom Resource standard.

Bonded failover for SRIOV Workloads

Prerequisites:

  • Multus configured as per the quick start guide

  • SRIOV CNI and Multus CNI placed in /opt/cni/bin

  • SRIOV Device Plugin running as a Daemonset on the cluster

The SRIOV Device Plugin will need to be configured to ensure the VFs in the pod come from different network cards. This is important because failover requires that the bonded interface still has connectivity even if one of the slave interfaces goes down. If both virtual functions come from the same physical card, any connection issue on the physical interface or card will be reflected in both VFs at the same time.

An example SRIOV config - which selects VFs on the basis of physical interface names - is:

apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
        "resourceList": [{
                "resourceName": "intel_sriov_PF_1",
                "selectors": {
                    "vendors": ["8086"],
                    "devices": ["154c", "10ed"],
                    "drivers": ["i40evf", "ixgbevf"],
                    "pfNames": ["<PF_NAME_2>"]

                }
            },
        {
                "resourceName": "intel_sriov_PF_2",
                "selectors": {
                    "vendors": ["8086"],
                    "devices": ["154c", "10ed"],
                    "drivers": ["i40evf", "ixgbevf"],
                    "pfNames": ["<PF_NAME_2>"]
                }
            }
        ]
    }

In the above, specific PF names have to be entered - based on the PFs available in the cluster - so that the selectors pick up the correct VFs. The other selectors in the configuration are identical. Note that the SRIOV device plugin only picks up new configuration at startup, so if the daemonset was already running its pods have to be killed and redeployed before the new resources are advertised. One possible way to do this is shown below.
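
For example (assuming the daemonset name used by the upstream SRIOV Device Plugin deployment, kube-sriov-device-plugin-amd64, in the kube-system namespace; adjust the name to your cluster):

kubectl -n kube-system rollout restart daemonset kube-sriov-device-plugin-amd64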

Steps for deployment

  1. Deploy the Network Attachment Definition for SRIOV:
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_PF_1
spec:
  config: '{
  "type": "sriov",
  "name": "sriov-network",
  "spoofchk":"off"
}'

We will create a second - equivalent except for its name - SRIOV network attachment definition. This allows us to keep the definitions separate for the two Physical Function pools defined above.

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net2
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_PF_2
spec:
  config: '{
  "type": "sriov",
  "name": "sriov-network",
  "spoofchk":"off"
}'
  2. Deploy the Network Attachment Definition for the Bond CNI:
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: bond-net1
spec:
  config: '{
  "type": "bond",
  "cniVersion": "0.3.1",
  "name": "bond-net1",
  "mode": "active-backup",
  "failOverMac": 1,
  "linksInContainer": true,
  "miimon": "100",
  "mtu": 1500,
  "links": [
     {"name": "net1"},
     {"name": "net2"}
  ],
  "ipam": {
    "type": "host-local",
    "subnet": "10.56.217.0/24",
    "routes": [{
      "dst": "0.0.0.0/0"
    }],
    "gateway": "10.56.217.1"
  }
}'

Note the "linksInContainer": true flag above. This tells the Bond CNI that the interfaces it is looking for are inside the container. By default it looks for these interfaces on the host, which does not work for integration with SRIOV and Multus.

  3. Deploy a pod which requests two SRIOV networks, one from each PF, and one bonded network.
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  annotations:
        k8s.v1.cni.cncf.io/networks: '[
{"name": "sriov-net1",
"interface": "net1"
},
{"name": "sriov-net2",
"interface": "net2"
},
{"name": "bond-net",
"interface": "bond0"
}]'
spec:
  restartPolicy: Never
  containers:
  - name: bond-test
    image: alpine:latest
    command:
      - /bin/sh
      - "-c"
      - "sleep 60m"
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        intel.com/intel_sriov_PF_1: '1'
        intel.com/intel_sriov_PF_2: '1'
      limits:
        intel.com/intel_sriov_PF_1: '1'
        intel.com/intel_sriov_PF_2: '1'

The order in the request annotation k8s.v1.cni.cncf.io/networks: sriov-net1, sriov-net2, bond-net1 is important as it is the same order in which networks will be added. In the above spec we add one SRIOV network, then we add another identically configured SRIOV network from our second SRIOV VF pool. Multus will give these networks the names net1 and net2 respectively.

Next the bond-net1 network is created, using interfaces net1 and net2. If the bond is created before the SRIOV networks, the CNI will not be able to find the interfaces in the container.

The name of each interface can be set manually in the annotation according to the CRD Spec. Changing the names applied in the annotation configuration requires matching changes to be made in the bond network attachment definition.
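
For illustration only (hypothetical interface names): if the annotation renamed the SRIOV interfaces to sriov1 and sriov2, the links section of the bond network attachment definition would have to be updated to match.

  k8s.v1.cni.cncf.io/networks: '[
    {"name": "sriov-net1", "interface": "sriov1"},
    {"name": "sriov-net2", "interface": "sriov2"},
    {"name": "bond-net1", "interface": "bond0"}]'

And in bond-net1:

  "links": [
     {"name": "sriov1"},
     {"name": "sriov2"}
  ]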

After deploying the above pod spec on Kubernetes, running the following command:

kubectl exec -it test-pod -- ip a

will produce output like:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if150: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP 
    link/ether 62:b1:b5:c8:fb:7a brd ff:ff:ff:ff:ff:ff
    inet 10.244.1.122/24 brd 10.244.1.255 scope global eth0
       valid_lft forever preferred_lft forever
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 9e:23:69:42:fb:8a brd ff:ff:ff:ff:ff:ff
    inet 10.56.217.66/24 scope global bond0
       valid_lft forever preferred_lft forever
43: net1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 9e:23:69:42:fb:8a brd ff:ff:ff:ff:ff:ff
44: net2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 9e:23:69:42:fb:8a brd ff:ff:ff:ff:ff:ff

We have three new interfaces added to our pod: net1 and net2 are SRIOV interfaces, while bond0 is the bond over the two of them. net1 and net2 do not require IP addresses - this can be changed in their network attachment definitions.
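
To inspect the bond state from inside the pod, the Linux bonding driver's procfs entry can also be checked (this is standard bonding driver behaviour, not specific to this CNI):

kubectl exec -it test-pod -- cat /proc/net/bonding/bond0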

bond-cni's Issues

Bond iface does not work when active slave is torn down #12

Update: similar issue to #11 - in my case I see this behaviour when I tear down net1, which is the active slave interface.

Hi,

I have created a bond interface composed of two SR-IOV slaves. I can see that the bond interface is successfully connected to the physical network, and the load-balancing mode, set to active-backup, is working as expected.

These are the NAD and the pod definitions:

oc get -o yaml net-attach-def bond-dhcp
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"k8s.cni.cncf.io/v1","kind":"NetworkAttachmentDefinition","metadata":{"annotations":{},"name":"bond-dhcp","namespace":"sriov-test-bond"},"spec":{"config":"{ \"type\": \"bond\", \"cniVersion\": \"0.3.1\", \"name\": \"bond-net1\", \"ifname\": \"net3\", \"spoofChk\": \"off\", \"mode\": \"active-backup\", \"linksInContainer\": true, \"miimon\": \"100\", \"links\": [ {\"name\": \"net1\"}, {\"name\": \"net2\"} ], \"ipam\": { \"type\":\"dhcp\"\n} }"}}
  creationTimestamp: "2020-05-04T10:08:13Z"
  generation: 2
  name: bond-dhcp
  namespace: sriov-test-bond
  resourceVersion: "4331357"
  selfLink: /apis/k8s.cni.cncf.io/v1/namespaces/sriov-test-bond/network-attachment-definitions/bond-dhcp
  uid: 7fdc413f-16bb-4044-a8ef-666fa6717922
spec:
  config: |-
    { "type": "bond", "cniVersion": "0.3.1", "name": "bond-net1", "ifname": "net3", "spoofChk": "off", "mode": "active-backup", "linksInContainer": true, "miimon": "100", "links": [ {"name": "net1"}, {"name": "net2"} ], "ipam": { "type":"dhcp"
    } }

apiVersion: v1
kind: Pod
metadata:
  name: bond-pod-dhcp
  namespace: sriov-test-bond
  annotations:
    k8s.v1.cni.cncf.io/networks: slaves, slaves, bond-dhcp
spec:
  containers:
  - name: podexample
    securityContext:
      privileged: true
    image: quay.io/schseba/utility-container:latest
    command: ["/bin/bash", "-c", "sleep INF"]

The pod is able to get an IP from the DHCP server:

oc rsh bond-pod-dhcp ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
3: eth0@if9459: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default 
    link/ether 16:f4:4c:87:00:06 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.135.0.5/23 brd 10.135.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::44a8:8dff:fe66:2437/64 scope link 
       valid_lft forever preferred_lft forever
4: net3: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether d6:51:55:dd:09:f4 brd ff:ff:ff:ff:ff:ff
    inet 172.22.0.63/24 scope global net3
       valid_lft forever preferred_lft forever
    inet6 fe80::d451:55ff:fedd:9f4/64 scope link 
       valid_lft forever preferred_lft forever
20: net1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master net3 state UP group default qlen 1000
    link/ether d6:51:55:dd:09:f4 brd ff:ff:ff:ff:ff:ff
21: net2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master net3 state UP group default qlen 1000
    link/ether d6:51:55:dd:09:f4 brd ff:ff:ff:ff:ff:ff

And it is able to reach the network:

oc rsh bond-pod-dhcp ping 172.22.0.253
PING 172.22.0.253 (172.22.0.253) 56(84) bytes of data.
64 bytes from 172.22.0.253: icmp_seq=1 ttl=64 time=0.205 ms
64 bytes from 172.22.0.253: icmp_seq=2 ttl=64 time=0.256 ms

The bond configuration is the expected one as well:

sh-4.2# cat /proc/net/bonding/net3 
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: net1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: net1
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d6:51:55:dd:09:f4
Slave queue ID: 0

Slave Interface: net2
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 62:17:0c:6e:b6:f0
Slave queue ID: 0

If I tear down the net1 interface, net2 becomes the active one, but I am not able to connect to the network:

sh-4.2# ip link set net1 down

sh-4.2# ping 172.22.0.253
PING 172.22.0.253 (172.22.0.253) 56(84) bytes of data.
From 172.22.0.63 icmp_seq=10 Destination Host Unreachable
From 172.22.0.63 icmp_seq=11 Destination Host Unreachable
From 172.22.0.63 icmp_seq=12 Destination Host Unreachable

sh-4.2# cat /proc/net/bonding/net3 
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: net2
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: net1
MII Status: down
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: d6:51:55:dd:09:f4
Slave queue ID: 0

Slave Interface: net2
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 62:17:0c:6e:b6:f0
Slave queue ID: 0

dmesg output:

[518128.738347] net3: link status definitely down for interface net1, disabling it
[518128.745685] net3: making interface net2 the new active one
[518147.220852] device f07c8aba10ef6a0 left promiscuous mode
[518155.644880] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[518155.648273] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[518155.911369] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[518155.938309] IPv6: ADDRCONF(NETDEV_UP): veth73f8e2a0: link is not ready
[518155.945130] IPv6: ADDRCONF(NETDEV_CHANGE): veth73f8e2a0: link becomes ready
[518155.947316] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[518155.952284] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[518685.844857] 4e518eaf8d029fd: renamed from vethfbeea613
[518685.895350] device 4e518eaf8d029fd entered promiscuous mode
[518687.915377] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[518695.548931] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[518695.822684] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[518711.233240] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[518735.567543] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[518735.851176] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[518807.358873] device 4e518eaf8d029fd left promiscuous mode

Notes:

  • If I tear down the net2 interface and activate net1 again, so that net1 becomes the active one in the bond, I am able to connect to the network again.

  • The spoofChk is set to off as I can see from the node vf 0 and 3:

2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 98:03:9b:61:ba:d0 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on, query_rss off
vf 1 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust on, query_rss off
vf 2 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust on, query_rss off
vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state enable, trust on, query_rss off
vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on, query_rss off

  • dmesg does not say anything special apart from showing that the active interface is now net2.

Questions:

  • It looks like something related to spoofing; probably the MAC address of the bond interface must change when net2 becomes the active one, or the other way around? I do not think there is anything related to port security at the switch level...
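
For reference only, and not a confirmed fix for this report: the configuration reference earlier on this page recommends setting failOverMac to 1 for active-backup bonds, and the NAD above does not set it. The same bond config with that field added would look like:

{ "type": "bond", "cniVersion": "0.3.1", "name": "bond-net1", "ifname": "net3", "spoofChk": "off", "mode": "active-backup", "failOverMac": 1, "linksInContainer": true, "miimon": "100", "links": [ {"name": "net1"}, {"name": "net2"} ], "ipam": { "type": "dhcp" } }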

Failed to set vf 0 vlan: operation not supported

Hello,

I'm trying to set up bonding for SR-IOV VFs (Mellanox InfiniBand) inside a Kubernetes pod using bond-cni.
I deployed sriov-cni, sriov-device-plugin, and multus,
and I also applied the ConfigMap for the VF pool.

But I'm facing an error when creating pods, 'failed to set vf 0 vlan: operation not supported', as follows:

root@node1$ kubectl describe pod sample-pod2-7487977bcb-c8wd8
Name:           sample-pod2-7487977bcb-c8wd8
Namespace:      default
Priority:       0
Node:           node3/10.253.4.149
Start Time:     Wed, 16 Mar 2022 04:25:14 +0000
Labels:         app=sriov
                pod-template-hash=7487977bcb
Annotations:    cni.projectcalico.org/podIP:
                cni.projectcalico.org/podIPs:
                k8s.v1.cni.cncf.io/networks:
                  [ {"name": "sriov-net1", "interface": "net1" }, {"name": "sriov-net2", "interface": "net2" }, {"name": "bond-net1", "interface": "bond0" }...
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/sample-pod2-7487977bcb
Containers:
  mlnx-inbox-ctr:
    Container ID:
    Image:         mellanox/mofed-5.5-1.0.3.2:ubuntu20.04-amd64
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      sleep inf
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      intel.com/intel_sriov_PF_1:  1
      intel.com/intel_sriov_PF_2:  1
    Requests:
      intel.com/intel_sriov_PF_1:  1
      intel.com/intel_sriov_PF_2:  1
    Environment:                   <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-c6h4x (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-c6h4x:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-c6h4x
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                    From               Message
  ----     ------                  ----                   ----               -------
  Normal   Scheduled               40m                    default-scheduler  Successfully assigned default/sample-pod2-7487977bcb-c8wd8 to node3
  Normal   AddedInterface          40m                    multus             Add eth0 [10.233.92.68/32]
  Warning  FailedCreatePodSandBox  40m                    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "004cb7f4aab95f957f0cf82114ecf45d7d5c28f86657c89bc62aa55023fe6cff" network for pod "sample-pod2-7487977bcb-c8wd8": networkPlugin cni failed to set up pod "sample-pod2-7487977bcb-c8wd8_default" network: [default/sample-pod2-7487977bcb-c8wd8:sriov-network]: error adding container to network "sriov-network": SRIOV-CNI failed to configure VF "failed to set vf 0 vlan: operation not supported", failed to clean up sandbox container "004cb7f4aab95f957f0cf82114ecf45d7d5c28f86657c89bc62aa55023fe6cff" network for pod "sample-pod2-7487977bcb-c8wd8": networkPlugin cni failed to teardown pod "sample-pod2-7487977bcb-c8wd8_default" network: delegateDel: error invoking DelegateDel - "bond": error in getting result from DelNetwork: Failed to retrieve link objects from configuration file (&{NetConf:{CNIVersion:0.3.1 Name:bond-net1 Type:bond Capabilities:map[] IPAM:{Type:host-local} DNS:{Nameservers:[] Domain: Search:[] Options:[]} RawPrevResult:map[] PrevResult:<nil>} Name:bond0 Mode:active-backup LinksContNs:true FailOverMac:1 Miimon:100 Links:[map[name:net1] map[name:net2]] MTU:1500}), error: Failed to confirm that link (net1) exists, error: Failed to lookup link name net1, error: Link not found]
...

The content of the corresponding files is as follows:

root@node1$ cat 1.sriov-nat-net1.yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_PF_1
spec:
  config: '{
  "type": "sriov",
  "cniVersion": "0.3.1",
  "name": "sriov-network",
  "spoofchk":"off"
}'

root@node1$ cat 2.sriov-nat-net2.yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net2
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_PF_2
spec:
  config: '{
  "type": "sriov",
  "cniVersion": "0.3.1",
  "name": "sriov-network",
  "spoofchk":"off"
}'

root@node1$ cat 3.bond-nat.yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: bond-net1
spec:
  config: '{
  "type": "bond",
  "cniVersion": "0.3.1",
  "name": "bond-net1",
  "ifname": "bond0",
  "mode": "active-backup",
  "failOverMac": 1,
  "linksInContainer": true,
  "miimon": "100",
  "mtu": 1500,
  "links": [
     {"name": "net1"},
     {"name": "net2"}
  ],
  "ipam": {
    "type": "host-local",
    "subnet": "192.168.101.0/24",
    "routes": [{
      "dst": "0.0.0.0/0"
    }],
    "gateway": "192.168.101.1"
  }
}'

root@node1$ cat 4.sample-dep2.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-pod2
  labels:
    app: sriov
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sriov
  template:
    metadata:
      labels:
        app: sriov
      annotations:
        k8s.v1.cni.cncf.io/networks: '[
{"name": "sriov-net1",
"interface": "net1"
},
{"name": "sriov-net2",
"interface": "net2"
},
{"name": "bond-net1",
"interface": "bond0"
}]'
    spec:
      containers:
      - image: "mellanox/mofed-5.5-1.0.3.2:ubuntu20.04-amd64"
        name: mlnx-inbox-ctr
        securityContext:
          capabilities:
            add: [ "IPC_LOCK" ]
        resources:
          requests:
            intel.com/intel_sriov_PF_1: '1'
            intel.com/intel_sriov_PF_2: '1'
          limits:
            intel.com/intel_sriov_PF_1: '1'
            intel.com/intel_sriov_PF_2: '1'
        command:
        - sh
        - -c
        - sleep inf

root@node1$ cat sriov-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
        "resourceList": [{
                "resourceName": "intel_sriov_PF_1",
                "selectors": {
                    "vendors": ["15b3"],
                    "devices": ["101c"],
                    "drivers": ["mlx5_core"],
                    "pfNames": ["ibp12s0"]

                }
            },
        {
                "resourceName": "intel_sriov_PF_2",
                "selectors": {
                    "vendors": ["15b3"],
                    "devices": ["101c"],
                    "drivers": ["mlx5_core"],
                    "pfNames": ["ibp75s0"]
                }
            }
        ]
    }

VFs are also created appropriately as follows:

root@node3$ ibdev2netdev
mlx5_0 port 1 ==> ibp12s0 (Up)
mlx5_1 port 1 ==> ibp18s0 (Down)
mlx5_10 port 1 ==> ibp225s0f0 (Down)
mlx5_11 port 1 ==> ibp225s0f1 (Up)
mlx5_12 port 1 ==> ibp12s0v0 (Down)
mlx5_13 port 1 ==> ibp12s0v1 (Down)
mlx5_14 port 1 ==> ibp12s0v2 (Down)
mlx5_15 port 1 ==> ibp12s0v3 (Down)
mlx5_16 port 1 ==> ibp12s0v4 (Down)
mlx5_17 port 1 ==> ibp12s0v5 (Down)
mlx5_18 port 1 ==> ibp12s0v6 (Down)
mlx5_19 port 1 ==> ibp12s0v7 (Down)
mlx5_2 port 1 ==> ibp75s0 (Up)
mlx5_20 port 1 ==> ibp75s0v0 (Down)
mlx5_21 port 1 ==> ibp75s0v1 (Down)
mlx5_22 port 1 ==> ibp75s0v2 (Down)
mlx5_23 port 1 ==> ibp75s0v3 (Down)
mlx5_24 port 1 ==> ibp75s0v4 (Down)
mlx5_25 port 1 ==> ibp75s0v5 (Down)
mlx5_26 port 1 ==> ibp75s0v6 (Down)
mlx5_27 port 1 ==> ibp75s0v7 (Down)
mlx5_3 port 1 ==> ibp84s0 (Down)
mlx5_4 port 1 ==> enp97s0f0 (Up)
mlx5_5 port 1 ==> ibp97s0f1 (Down)
mlx5_6 port 1 ==> ibp141s0 (Down)
mlx5_7 port 1 ==> ibp148s0 (Down)
mlx5_8 port 1 ==> ibp186s0 (Down)
mlx5_9 port 1 ==> ibp204s0 (Down)

root@node3$ lspci | grep -i Virtual
0c:00.1 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
0c:00.2 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
0c:00.3 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
0c:00.4 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
0c:00.5 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
0c:00.6 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
0c:00.7 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
0c:01.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
4b:00.1 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
4b:00.2 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
4b:00.3 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
4b:00.4 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
4b:00.5 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
4b:00.6 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
4b:00.7 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
4b:01.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]

I'm pretty new to bonding, so any advice will be very helpful to me.
Thank you!

Incorrect bond name

The bond-cni uses the "name" parameter from the config to name the bond interface it creates.
This is not correct, as the CNI spec says that the CNI_IFNAME parameter should be used to name the interface. It also says that naming the interface differently is not allowed:
CNI_IFNAME: Name of the interface to create inside the container; if the plugin is unable to use this interface name it must return an error.

Not conforming to this can lead to potential issues. One is an issue with plugin chaining: a metaplugin following the bond-cni will assume the name of the created interface is the one found in CNI_IFNAME, and if the names differ it will fail, trying to act on an incorrect (non-existent) interface.

We can fix this by either:

  • getting rid of the "name" parameter in the config - the API will be backwards incompatible
  • ensuring the value in both locations is the same and returning an error otherwise - not nice, but it will not break the API
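
To illustrate the chaining concern (an illustrative conflist only, not taken from this issue): a plugin placed after bond in a chain, such as tuning, acts on the interface named by CNI_IFNAME, so the bond must actually be created under that name:

{
  "cniVersion": "0.4.0",
  "name": "bond-chain",
  "plugins": [
    { "type": "bond", "mode": "active-backup", "miimon": "100",
      "linksInContainer": true,
      "links": [ {"name": "net1"}, {"name": "net2"} ],
      "ipam": { "type": "host-local", "subnet": "10.56.217.0/24" } },
    { "type": "tuning", "mtu": 9000 }
  ]
}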

IP assigned to bond interface is not shown in metadata

Hi, I use bond-cni in my project. However, I have realized that the IP assigned to the pod's bond interface does not appear in the pod object when I GET it from the API server. Usually I would see it in the networks-status annotation, as shown below:

(screenshot of the networks-status annotation omitted)

This causes inconvenience when I want to manage IP allocation to the bond interface. I assume the bond interface has to wait for the other interfaces to be added before it is created, and that this misses a pod status update event to the API server. I can still view the IP granted to the bond interface by going into the pod to check, but this is inconvenient for monitoring. So is there a way to check which IP has been issued to the pod's bond interface, or did I configure something in Bond-CNI that leads to this result?

Pod hangs on terminating state due to missing slave interfaces

Scenario

Bond with two (in-container) interfaces.

k8s.v1.cni.cncf.io/networks: openshift-sriov-network-operator/test-sriov-for-bond-network,openshift-sriov-network-operator/test-sriov-for-bond-network,bond-testing/bond@bond0

"{"cniVersion":"0.4.0","name":"bond","plugins":[{"ipam": {"type":
"host-local", "subnet": "1.1.1.0/24"}, \n\t "type": "bond",\n\t\t"ifname":
"bond0",\n\t\t"mode": "active-backup",\n\t\t"failOverMac": 1,\n\t\t"linksInContainer":
true,\n\t\t"miimon": "100",\n\t\t"mtu": 1300,\n\t\t"links": [ {"name":
"net1"}, {"name": "net2"} ]}]}"

Issue

When the pod was deleted, the DEL command was called on the bond cni. This failed at detaching the slave links from the bond. For some reason, it couldn't bring the slave device back up (device or resource busy error).

The subsequent DEL commands of the SRIOV CNI deleted the slave links, and when the DEL command on the bond CNI was retried it kept failing because it could not find the bond's slave links.

Pod deletion could not be completed (the pod hung in the Terminating state), affecting other operations.

"Error syncing pod, skipping" err="failed to "KillPodSandbox" for "d2b3c5db-ba5f-44a8-a7b9-0a8c087fd3dd" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for pod sandbox k8s_testpod-kgwvl_bond-testing_d2b3c5db-ba5f-44a8-a7b9-0a8c087fd3dd_0(24a237e7b3529293c682030db373fb6d902ffa861f60862beb60bcdaf93fa89f): error removing pod bond-testing_testpod-kgwvl from CNI network \"multus-cni-network\": plugin type=\"multus\" name=\"multus-cni-network\" failed (delete): delegateDel: error invoking ConflistDel - \"bond\": conflistDel: error in getting result from DelNetworkList: Failed to retrieve link objects from configuration file (&{NetConf:{CNIVersion:0.4.0 Name:bond Type:bond Capabilities:map[] IPAM:{Type:host-local} DNS:{Nameservers:[] Domain: Search:[] Options:[]} RawPrevResult:map[cniVersion:0.4.0 dns:map[] interfaces:[map[mac:c2:fb:59:b7:71:d3 name:bond0 sandbox:/var/run/netns/4fe7ca59-b4e2-4025-a199-894307cbe8b3]] ips:[map[address:1.1.1.6/24 gateway:1.1.1.1 interface:0 version:]]] PrevResult:} Mode:active-backup LinksContNs:true FailOverMac:1 Miimon:100 Links:[map[name:net1] map[name:net2]] MTU:1300}), error: Failed to confirm that link (net1) exists, error: Failed to lookup link name net1, error: Link not found"" pod="bond-testing/testpod-kgwvl" podUID=d2b3c5db-ba5f-44a8-a7b9-0a8c087fd3dd

SRIOV WITH DPDK

Hello team:
Have you tested SRIOV with DPDK?
I tested it, but it does not work well:

# 1. Create net1's NAD:
# cat nad-net1.yaml 
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: net1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/sriov_dpdk_3
spec:
  config: '{
  "type": "sriov",
  "cniVersion": "0.3.1",
  "name": "sriov-kernelnet0",
  "spoofchk": "off",
  "type": "sriov"
}'

# 2. Create net2's NAD:
# cat nad-net2.yaml 
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: net2
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/sriov_dpdk_4
spec:
  config: '{
  "type": "sriov",
  "cniVersion": "0.3.1",
  "name": "sriov-kernelnet0",
  "spoofchk": "off",
  "type": "sriov"
}'
# 3. Create the bond's NAD:
# cat bond.yaml 
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: bond
spec:
  config: '{
  "type": "bond",
  "cniVersion": "0.3.1",
  "name": "bond",
  "ifname": "bond0",
  "mode": "active-backup",
  "failOverMac": 1,
  "linksInContainer": true,
  "miimon": "100",
  "links": [
     {"name": "net1"},
     {"name": "net2"}
  ]
}'

# Pod:
# 1. Pod in namespace net1:
apiVersion: v1
kind: Pod
metadata:
  name: pod-bond
  annotations:
        k8s.v1.cni.cncf.io/networks: '[
{"name": "net1",
"interface": "net1"
},
{"name": "net2",
"interface": "net2"
},
{"name": "bond",
"interface": "bond0"
}]'
spec:
  restartPolicy: Never
  containers:
  - name: bond-test
    image: centos/tools:v1
    imagePullPolicy: IfNotPresent
    command:
    - /sbin/init
    resources:
      requests:
        intel.com/sriov_dpdk_3: '1'
        intel.com/sriov_dpdk_4: '1'
      limits:
        intel.com/sriov_dpdk_3: '1'
        intel.com/sriov_dpdk_4: '1'

# 2. Pod in namespace net2:
apiVersion: v1
kind: Pod
metadata:
  name: pod-bond
  annotations:
        k8s.v1.cni.cncf.io/networks: '[
{"name": "net1",
"interface": "net1"
},
{"name": "net2",
"interface": "net2"
},
{"name": "bond",
"interface": "bond0"
}]'
spec:
  restartPolicy: Never
  containers:
  - name: bond-test
    image: centos/tools:v1
    imagePullPolicy: IfNotPresent
    command:
    - /sbin/init
    resources:
      requests:
        intel.com/sriov_dpdk_3: '1'
        intel.com/sriov_dpdk_4: '1'
      limits:
        intel.com/sriov_dpdk_3: '1'
        intel.com/sriov_dpdk_4: '1'


The logs show:
0399bde5fa556692257101cd15ca0264d5a87fdbbf2f0c29" network for pod "pod-bond": networkPlugin cni failed to set up pod "pod-bond_net1" network: [net1/pod-bond:bond]: error adding container to network "bond": Failed to retrieve link objects from configuration file (&{NetConf:{CNIVersion:0.3.1 Name:bond Type:bond Capabilities:map[] IPAM:{Type:host-local} DNS:{Nameservers:[] Domain: Search:[] Options:[]}} Name:bond0 Mode:active-backup LinksContNs:true Miimon:100 Mtu:0 FailOverMac:1 Links:[map[name:net1] map[name:net2]]}), error: Failed to confirm that link (net1) exists, error: Failed to lookup link name net1, error: Link not found, failed to clean up sandbox container "d5aa4031590678cb0399bde5fa556692257101cd15ca0264d5a87fdbbf2f0c29" network for pod "pod-bond": networkPlugin cni failed to teardown pod "pod-bond_net1" network: delegateDel: error invoking DelegateDel - "bond": error in getting result from DelNetwork: Failed to retrieve link objects from configuration file (&{NetConf:{CNIVersion:0.3.1 Name:bond Type:bond Capabilities:map[] IPAM:{Type:host-local} DNS:{Nameservers:[] Domain: Search:[] Options:[]}} Name:bond0 Mode:active-backup LinksContNs:true Miimon:100 Mtu:0 FailOverMac:1 Links:[map[name:net1] map[name:net2]]}), error: Failed to confirm that link (net1) exists, error: Failed to lookup link name net1, error: Link not found / delegateDel: error invoking DelegateDel - "sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/sriov with name d5aa4031590678cb0399bde5fa556692257101cd15ca0264d5a87fdbbf2f0c29-net2 / delegateDel: error invoking DelegateDel - "sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/sriov with name d5aa4031590678cb0399bde5fa556692257101cd15ca0264d5a87fdbbf2f0c29-net1]
  Normal   AddedInterface          5s                 multus             Add net2 [] from net1/net2
  Normal   AddedInterface          5s                 multus             Add eth0 [10.20.45.159/32]
  Normal   AddedInterface          5s                 multus             Add net1 [] from net1/net1
  Normal   AddedInterface          3s                 multus             Add net1 [] from net1/net1
  Normal   AddedInterface          3s                 multus             Add net2 [] from net1/net2
  Normal   AddedInterface          3s                 multus             Add eth0 [10.20.45.168/32]
  Warning  FailedCreatePodSandBox  2s (x2 over 4s)    kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "f3fb5684ae3bc2277e54bed3f4713e07ab1087b7923ab2bf48554809050d3df8" network for pod "pod-bond": networkPlugin cni failed to set up pod "pod-bond_net1" network: [net1/pod-bond:bond]: error adding container to network "bond": Failed to retrieve link objects from configuration file (&{NetConf:{CNIVersion:0.3.1 Name:bond Type:bond Capabilities:map[] IPAM:{Type:host-local} DNS:{Nameservers:[] Domain: Search:[] Options:[]}} Name:bond0 Mode:active-backup LinksContNs:true Miimon:100 Mtu:0 FailOverMac:1 Links:[map[name:net1] map[name:net2]]}), error: Failed to confirm that link (net1) exists, error: Failed to lookup link name net1, error: Link not found, failed to clean up sandbox container "f3fb5684ae3bc2277e54bed3f4713e07ab1087b7923ab2bf48554809050d3df8" network for pod "pod-bond": networkPlugin cni failed to teardown pod "pod-bond_net1" network: delegateDel: error invoking DelegateDel - "bond": error in getting result from DelNetwork: Failed to retrieve link objects from configuration file (&{NetConf:{CNIVersion:0.3.1 Name:bond Type:bond Capabilities:map[] IPAM:{Type:host-local} DNS:{Nameservers:[] Domain: Search:[] Options:[]}} Name:bond0 Mode:active-backup LinksContNs:true Miimon:100 Mtu:0 FailOverMac:1 Links:[map[name:net1] map[name:net2]]}), error: Failed to confirm that link (net1) exists, error: Failed to lookup link name net1, error: Link not found / delegateDel: error invoking DelegateDel - "sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/sriov with name f3fb5684ae3bc2277e54bed3f4713e07ab1087b7923ab2bf48554809050d3df8-net2 / delegateDel: error invoking DelegateDel - "sriov": error in getting result from DelNetwork: error reading cached NetConf in /var/lib/cni/sriov with name f3fb5684ae3bc2277e54bed3f4713e07ab1087b7923ab2bf48554809050d3df8-net1]
  Normal   AddedInterface          1s                 multus             Add net2 [] from net1/net2
  Normal   AddedInterface          1s                 multus             Add eth0 [10.20.45.163/32]
  Normal   AddedInterface          1s                 multus             Add net1 [] from net1/net1
  Normal   SandboxChanged

Unit Tests

There is a lack of Unit Tests for this project.

Any UT contributions are most welcome :)

Unable to create Bond Interface

I am not able to create a bond interface when I manually specify the MAC addresses for both SRIOV NICs. If I use the same MAC address for both NICs, then the creation goes through fine.

Multus error logs - I will capture relevant snippets from the logs; please let me know if you need the entire logs.

2020-02-05T13:18:18-08:00 [debug] delegateAdd: set MAC address "52:45:00:21:00:90" to "net1"
2020-02-05T13:18:18-08:00 [verbose] Add: default:test-pod:sriov-network:net1 {"cniVersion":"0.4.0","interfaces":[{"name":"net1","sandbox":"/proc/21415/ns/net"}],"dns":{}}
2020-02-05T13:18:18-08:00 [debug] LoadNetworkStatus: Interfaces:[{Name:net1 Mac: Sandbox:/proc/21415/ns/net}], DNS:{Nameservers:[] Domain: Search:[] Options:[]}, sriov-network, false
2020-02-05T13:18:18-08:00 [debug] getIfname: &{{0.3.1 sriov-network sriov map[] {} {[] [] []} map[] } { false []} net2 52:45:00:21:00:91 false false [123 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 100 101 118 105 99 101 73 68 34 58 34 48 48 48 48 58 48 49 58 49 48 46 51 34 44 34 110 97 109 101 34 58 34 115 114 105 111 118 45 110 101 116 119 111 114 107 34 44 34 112 99 105 66 117 115 73 68 34 58 34 48 48 48 48 58 48 49 58 49 48 46 51 34 44 34 116 121 112 101 34 58 34 115 114 105 111 118 34 125]}, eth0, 2
2020-02-05T13:18:18-08:00 [debug] LoadCNIRuntimeConf: &{503e0271d83a0ed307cd1d364324f0cf2906b3208b4a2922557e4101363f49a2 /proc/21415/ns/net eth0 IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=test-pod;K8S_POD_INFRA_CONTAINER_ID=503e0271d83a0ed307cd1d364324f0cf2906b3208b4a2922557e4101363f49a2 /opt/cni/bin [123 34 98 105 110 68 105 114 34 58 34 47 111 112 116 47 99 110 105 47 98 105 110 34 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 99 111 110 102 68 105 114 34 58 34 47 101 116 99 47 99 110 105 47 110 101 116 46 100 34 44 34 100 101 108 101 103 97 116 101 115 34 58 91 123 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 110 97 109 101 34 58 34 99 97 108 105 99 111 34 44 34 112 108 117 103 105 110 115 34 58 91 123 34 100 97 116 97 115 116 111 114 101 95 116 121 112 101 34 58 34 107 117 98 101 114 110 101 116 101 115 34 44 34 105 112 97 109 34 58 123 34 116 121 112 101 34 58 34 114 111 98 105 110 45 105 112 97 109 34 125 44 34 107 117 98 101 114 110 101 116 101 115 34 58 123 34 107 117 98 101 99 111 110 102 105 103 34 58 34 47 101 116 99 47 99 110 105 47 110 101 116 46 100 47 99 97 108 105 99 111 45 107 117 98 101 99 111 110 102 105 103 34 125 44 34 108 111 103 95 108 101 118 101 108 34 58 34 105 110 102 111 34 44 34 109 116 117 34 58 49 52 56 48 44 34 110 111 100 101 110 97 109 101 34 58 34 99 101 110 116 111 115 45 54 48 45 50 49 56 34 44 34 112 111 108 105 99 121 34 58 123 34 116 121 112 101 34 58 34 107 56 115 34 125 44 34 116 121 112 101 34 58 34 99 97 108 105 99 111 34 125 44 123 34 99 97 112 97 98 105 108 105 116 105 101 115 34 58 123 34 112 111 114 116 77 97 112 112 105 110 103 115 34 58 116 114 117 101 125 44 34 115 110 97 116 34 58 116 114 117 101 44 34 116 121 112 101 34 58 34 112 111 114 116 109 97 112 34 125 93 125 93 44 34 107 117 98 101 99 111 110 102 105 103 34 58 34 47 101 116 99 47 99 110 105 47 110 101 116 46 100 47 109 117 108 116 117 115 46 100 47 109 117 108 116 117 115 46 107 117 98 101 99 111 110 102 105 103 34 44 34 108 111 103 70 105 108 101 34 58 34 47 118 97 114 47 108 111 103 47 109 117 108 116 117 115 46 108 111 103 34 44 34 108 111 103 76 101 118 101 108 34 58 34 100 101 98 117 103 34 44 34 109 117 108 116 117 115 78 97 109 101 115 112 97 99 101 34 58 34 107 117 98 101 45 115 121 115 116 101 109 34 44 34 110 97 109 101 34 58 34 109 117 108 116 117 115 45 99 110 105 45 110 101 116 119 111 114 107 34 44 34 114 101 97 100 105 110 101 115 115 105 110 100 105 99 97 116 111 114 102 105 108 101 34 58 34 47 104 111 109 101 47 114 111 98 105 110 100 115 47 101 116 99 47 114 111 98 105 110 47 98 111 111 116 115 116 114 97 112 95 100 111 110 101 34 44 34 116 121 112 101 34 58 34 109 117 108 116 117 115 34 125]}, &{{true} test-pod default 503e0271d83a0ed307cd1d364324f0cf2906b3208b4a2922557e4101363f49a2}, net2,
2020-02-05T13:18:18-08:00 [debug] delegateAdd: , net2, &{{0.3.1 sriov-network sriov map[] {} {[] [] []} map[] } { false []} net2 52:45:00:21:00:91 false false [123 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 100 101 118 105 99 101 73 68 34 58 34 48 48 48 48 58 48 49 58 49 48 46 51 34 44 34 110 97 109 101 34 58 34 115 114 105 111 118 45 110 101 116 119 111 114 107 34 44 34 112 99 105 66 117 115 73 68 34 58 34 48 48 48 48 58 48 49 58 49 48 46 51 34 44 34 116 121 112 101 34 58 34 115 114 105 111 118 34 125]}, &{503e0271d83a0ed307cd1d364324f0cf2906b3208b4a2922557e4101363f49a2 /proc/21415/ns/net net2 [[IgnoreUnknown 1] [K8S_POD_NAMESPACE default] [K8S_POD_NAME test-pod] [K8S_POD_INFRA_CONTAINER_ID 503e0271d83a0ed307cd1d364324f0cf2906b3208b4a2922557e4101363f49a2]] map[] }, /opt/cni/bin
2020-02-05T13:18:18-08:00 [debug] validateIfName: /proc/21415/ns/net, net2
2020-02-05T13:18:18-08:00 [debug] delegateAdd: set MAC address "52:45:00:21:00:91" to "net2"

2020-02-05T13:18:18-08:00 [verbose] Add: default:test-pod:sriov-network:net2 {"cniVersion":"0.4.0","interfaces":[{"name":"net2","sandbox":"/proc/21415/ns/net"}],"dns":{}}
2020-02-05T13:18:18-08:00 [debug] LoadNetworkStatus: Interfaces:[{Name:net2 Mac: Sandbox:/proc/21415/ns/net}], DNS:{Nameservers:[] Domain: Search:[] Options:[]}, sriov-network, false

2020-02-05T13:18:18-08:00 [debug] delegateAdd: , bond10, &{{0.3.1 bond-net1 bond map[] {host-local} {[] [] []} map[] } { false []} bond10 false false [123 32 34 116 121 112 101 34 58 32 34 98 111 110 100 34 44 32 34 99 110 105 86 101 114 115 105 111 110 34 58 32 34 48 46 51 46 49 34 44 32 34 110 97 109 101 34 58 32 34 98 111 110 100 45 110 101 116 49 34 44 32 34 109 111 100 101 34 58 32 34 97 99 116 105 118 101 45 98 97 99 107 117 112 34 44 32 34 108 105 110 107 115 73 110 67 111 110 116 97 105 110 101 114 34 58 32 116 114 117 101 44 32 34 109 105 105 109 111 110 34 58 32 34 49 48 48 34 44 32 34 108 105 110 107 115 34 58 32 91 32 123 34 110 97 109 101 34 58 32 34 110 101 116 49 34 125 44 32 123 34 110 97 109 101 34 58 32 34 110 101 116 50 34 125 32 93 44 32 34 105 112 97 109 34 58 32 123 32 34 116 121 112 101 34 58 32 34 104 111 115 116 45 108 111 99 97 108 34 44 32 34 115 117 98 110 101 116 34 58 32 34 49 48 46 53 54 46 50 49 55 46 48 47 50 52 34 44 32 34 114 111 117 116 101 115 34 58 32 91 123 32 34 100 115 116 34 58 32 34 48 46 48 46 48 46 48 47 48 34 32 125 93 44 32 34 103 97 116 101 119 97 121 34 58 32 34 49 48 46 53 54 46 50 49 55 46 49 34 32 125 32 125]}, &{503e0271d83a0ed307cd1d364324f0cf2906b3208b4a2922557e4101363f49a2 /proc/21415/ns/net bond10 [[IgnoreUnknown 1] [K8S_POD_NAMESPACE default] [K8S_POD_NAME test-pod] [K8S_POD_INFRA_CONTAINER_ID 503e0271d83a0ed307cd1d364324f0cf2906b3208b4a2922557e4101363f49a2]] map[] }, /opt/cni/bin
2020-02-05T13:18:18-08:00 [debug] validateIfName: /proc/21415/ns/net, bond10
2020-02-05T13:18:18-08:00 [error] delegateAdd: error invoking DelegateAdd - "bond": Failed to attached links to bond, error: Failed to set link: net2 MASTER, master index used: 6, error: operation not permitted
2020-02-05T13:18:18-08:00 [debug] delPlugins: , eth0, [0xc0002a7600 0xc000570000 0xc000134000 0xc0001349a0], 3, &{503e0271d83a0ed307cd1d364324f0cf2906b3208b4a2922557e4101363f49a2 /proc/21415/ns/net bond10 [[IgnoreUnknown 1] [K8S_POD_NAMESPACE default] [K8S_POD_NAME test-pod] [K8S_POD_INFRA_CONTAINER_ID 503e0271d83a0ed307cd1d364324f0cf2906b3208b4a2922557e4101363f49a2]] map[] }, /opt/cni/bin
2020-02-05T13:18:18-08:00 [debug] getIfname: &{{0.3.1 bond-net1 bond map[] {host-local} {[] [] []} map[] } { false []} bond10 false false [123 32 34 116 121 112 101 34 58 32 34 98 111 110 100 34 44 32 34 99 110 105 86 101 114 115 105 111 110 34 58 32 34 48 46 51 46 49 34 44 32 34 110 97 109 101 34 58 32 34 98 111 110 100 45 110 101 116 49 34 44 32 34 109 111 100 101 34 58 32 34 97 99 116 105 118 101 45 98 97 99 107 117 112 34 44 32 34 108 105 110 107 115 73 110 67 111 110 116 97 105 110 101 114 34 58 32 116 114 117 101 44 32 34 109 105 105 109 111 110 34 58 32 34 49 48 48 34 44 32 34 108 105 110 107 115 34 58 32 91 32 123 34 110 97 109 101 34 58 32 34 110 101 116 49 34 125 44 32 123 34 110 97 109 101 34 58 32 34 110 101 116 50 34 125 32 93 44 32 34 105 112 97 109 34 58 32 123 32 34 116 121 112 101 34 58 32 34 104 111 115 116 45 108 111 99 97 108 34 44 32 34 115 117 98 110 101 116 34 58 32 34 49 48 46 53 54 46 50 49 55 46 48 47 50 52 34 44 32 34 114 111 117 116 101 115 34 58 32 91 123 32 34 100 115 116 34 58 32 34 48 46 48 46 48 46 48 47 48 34 32 125 93 44 32 34 103 97 116 101 119 97 121 34 58 32 34 49 48 46 53 54 46 50 49 55 46 49 34 32 125 32 125]}, eth0, 3

YAML files

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
name: bond-sriov-net1
namespace: kube-system
annotations:
k8s.v1.cni.cncf.io/resourceName: intel.com/sriov0
spec:
config: '{
"type": "sriov",
"cniVersion": "0.3.1",
"name": "sriov-network"
}'

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
name: bond-net1
namespace: kube-system
spec:
config: '{
"type": "bond",
"cniVersion": "0.3.1",
"name": "bond-net1",
"mode": "active-backup",
"linksInContainer": true,
"miimon": "100",
"links": [
{"name": "net1"},
{"name": "net2"}
],
"ipam": {
"type": "host-local",
"subnet": "10.56.217.0/24",
"routes": [{
"dst": "0.0.0.0/0"
}],
"gateway": "10.56.217.1"
}
}'

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
{"name": "bond-sriov-net1",
"mac": "52:45:00:21:00:90",
"interface": "net1",
"namespace": "kube-system"
},
{"name": "bond-sriov-net1",
"mac": "52:45:00:21:00:91",
"interface": "net2",
"namespace": "kube-system"
},
{"name": "bond-net1",
"interface": "bond10",
"namespace": "kube-system"
}]'
spec: # specification of the pod's contents
  restartPolicy: Never
  containers:
  - name: multus-test
    image: alpine:latest
    command:
      - /bin/sh
      - "-c"
      - "sleep 60m"
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        intel.com/sriov0: '2'
      limits:
        intel.com/sriov0: '2'

Request to enhance Bond CNI to use resources discovered via the sriov device plugin

Bond CNI currently expects the VF names to be provided manually. This won't be scalable and does not fit well with the Intel SRIOV device plugin flow, where VFs are allocated based on K8s resource names.

  1. Requesting Bond CNI to be enhanced to use resource names.
  2. Control the name of the interface inside the container

The network attachment definition could be something as shown below. In this case, VFs are from the same PFs, but could also be from different PFs.

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
name: bond-net
namespace: kube-system
spec:
config: '{
"cniVersion": "0.3.1",
"name": "bond-network",
"plugins": [
{
"type": "bond",
"ifname": "bond0",
"mode": "active-backup",
"miimon": "100",
"links": [
{
"k8s.v1.cni.cncf.io/resourceName: "intel.com/sriov0",
"name": "net0"
},
{
"k8s.v1.cni.cncf.io/resourceName": "intel.com/sriov0",
"name": "net1"
}
],

    "ipam": {
         "type": "host-local",
         "subnet": "192.168.1.0/24",
         "rangeStart": "192.168.1.21",
         "rangeEnd": "192.168.1.30",
         "routes": [
              { "dst": "0.0.0.0/0" }
         ],
         "gateway": "192.168.1.1"
    }
}

]
}'

Then the pod can be deployed using the following spec:

apiVersion: v1
kind: Pod
metadata:
  name: testpod1
  annotations:
    k8s.v1.cni.cncf.io/networks: kube-system/bond-net
spec:
  containers:
  - name: appcntr1
    image: centos/tools
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
    resources:
      requests:
        intel.com/sriov0: '2'
      limits:
        intel.com/sriov0: '2'

Issue with bonding (active-backup mode)

I am deploying 25 pods with bonded SRIOV VFs in active-backup mode. At least one of the pods goes into a state where it does not receive any traffic from other pods deployed on the same physical node.

The issue is seen only when the following things happen:

  • Instead of eth1, eth2 becomes the active interface
  • Bond0 MAC address is set to eth1 MAC address

[root@test1-pktgen-22 /]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if598: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
    link/ether f2:cb:cf:b2:3e:6d brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.21.20.97/32 scope global eth0
       valid_lft forever preferred_lft forever
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether fa:41:ef:aa:5e:4c brd ff:ff:ff:ff:ff:ff
    inet 192.168.60.124/24 scope global bond0
       valid_lft forever preferred_lft forever
23: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP group default qlen 1000
    link/ether fa:41:ef:aa:5e:4c brd ff:ff:ff:ff:ff:ff
151: eth2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP group default qlen 1000
    link/ether fa:41:ef:aa:5e:4c brd ff:ff:ff:ff:ff:ff

[root@test1-pktgen-22 /]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth2
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: fa:41:ef:aa:5e:4c
Slave queue ID: 0

Slave Interface: eth2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: a2:98:78:e0:01:4e
Slave queue ID: 0

[root@cscale-82-73 ~]# echo "$out" | egrep "test1-pktgen-22|test1-pktgen-18|test1-pktgen-21"
test1-pktgen-18 2/2 Running 0 10m 192.168.60.105 qct-03
test1-pktgen-21 2/2 Running 0 10m 192.168.60.142 qct-05
test1-pktgen-22 2/2 Running 0 10m 192.168.60.124 qct-05

Issue in action

Pod with issue - test1-pktgen-22 on Host qct-05

Ping to Pod on the same host fails
[root@test1-pktgen-22 /]# ping 192.168.60.142
PING 192.168.60.142 (192.168.60.142) 56(84) bytes of data.
From 192.168.60.124 icmp_seq=1 Destination Host Unreachable

Ping to Pod on a different host works fine
[root@test1-pktgen-22 /]# ping 192.168.60.105
PING 192.168.60.105 (192.168.60.105) 56(84) bytes of data.
64 bytes from 192.168.60.105: icmp_seq=1 ttl=64 time=0.150 ms

If I bring down the eth2 interface, which changes the active slave of bond0 to eth1, then everything works fine

[root@test1-pktgen-22 /]# ip link set down dev eth2
[root@test1-pktgen-22 /]# ip link set up dev eth2
[root@test1-pktgen-22 /]# ping 192.168.60.142
PING 192.168.60.142 (192.168.60.142) 56(84) bytes of data.
64 bytes from 192.168.60.142: icmp_seq=1 ttl=64 time=0.275 ms

If I disable eth1 and enable it again, the active slave of the bond0 interface changes back to eth2, and then things don't work again

[root@test1-pktgen-22 /]# ip link set down dev eth1
[root@test1-pktgen-22 /]# ip link set up dev eth1
[root@test1-pktgen-22 /]# ping 192.168.60.142
PING 192.168.60.142 (192.168.60.142) 56(84) bytes of data.

For "active-backup" mode, as far as I know, there is no need for any upstream switch configuration since the switch would only see one MAC address which is picked by the Bond interface. Seems like something is not getting set up properly.

Network Attachments

[root@cscale-82-73 ~]# kubectl get net-attach-def -n t001-u000003 test1-pktgen-22-bond0 test1-pktgen-22-net1-0 test1-pktgen-22-net1-1 -o yaml
apiVersion: v1
items:
- apiVersion: k8s.cni.cncf.io/v1
  kind: NetworkAttachmentDefinition
  metadata:
    creationTimestamp: "2020-04-30T22:28:34Z"
    generation: 1
    name: test1-pktgen-22-bond0
    namespace: t001-u000003
    resourceVersion: "370830"
    selfLink: /apis/k8s.cni.cncf.io/v1/namespaces/t001-u000003/network-attachment-definitions/test1-pktgen-22-bond0
    uid: 9ae2acf9-e2bd-4eb2-af0d-c02d93e4729d
  spec:
    config: '{ "cniVersion": "0.3.1", "name": "bond-net", "plugins": [ { "type": "bond",
      "mode": "active-backup", "linksInContainer": true, "miimon": "100", "links":
      [{"name": "eth1"}, {"name": "eth2"}], "ipam": { "type": "robin-ipam" } }, {
      "type": "tuning", "mtu": 9000, "sysctl": { "net.ipv6.conf.all.accept_ra": "0",
      "net.ipv6.conf.default.accept_ra": "0", "net.ipv6.conf.bond0.accept_ra": "0"
      } } ] }'
- apiVersion: k8s.cni.cncf.io/v1
  kind: NetworkAttachmentDefinition
  metadata:
    annotations:
      k8s.v1.cni.cncf.io/resourceName: intel.com/enp59s0f0_iavf
    creationTimestamp: "2020-04-30T22:28:26Z"
    generation: 1
    name: test1-pktgen-22-net1-0
    namespace: t001-u000003
    resourceVersion: "370733"
    selfLink: /apis/k8s.cni.cncf.io/v1/namespaces/t001-u000003/network-attachment-definitions/test1-pktgen-22-net1-0
    uid: cb21d2cd-4c5f-4e2d-80b2-64c0932cf87e
  spec:
    config: '{ "cniVersion": "0.3.1", "name": "sriov-network", "plugins": [ { "vlan":
      20, "spoofchk": "off", "type": "sriov" }, { "type": "tuning", "mtu": 9000, "sysctl":
      { "net.ipv6.conf.all.accept_ra": "0", "net.ipv6.conf.default.accept_ra": "0",
      "net.ipv6.conf.eth1.accept_ra": "0" } } ] }'
- apiVersion: k8s.cni.cncf.io/v1
  kind: NetworkAttachmentDefinition
  metadata:
    annotations:
      k8s.v1.cni.cncf.io/resourceName: intel.com/enp59s0f1_iavf
    creationTimestamp: "2020-04-30T22:28:31Z"
    generation: 1
    name: test1-pktgen-22-net1-1
    namespace: t001-u000003
    resourceVersion: "370784"
    selfLink: /apis/k8s.cni.cncf.io/v1/namespaces/t001-u000003/network-attachment-definitions/test1-pktgen-22-net1-1
    uid: a95b01f2-ea5f-4d37-b4af-5f60de85a050
  spec:
    config: '{ "cniVersion": "0.3.1", "name": "sriov-network", "plugins": [ { "vlan":
      20, "spoofchk": "off", "type": "sriov" }, { "type": "tuning", "mtu": 9000, "sysctl":
      { "net.ipv6.conf.all.accept_ra": "0", "net.ipv6.conf.default.accept_ra": "0",
      "net.ipv6.conf.eth2.accept_ra": "0" } } ] }'
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Multus: v3.4.1
Kubernetes: 1.16.3
SRIOV: Latest Version
Bond CNI: Latest Version

Expose bond interface IP as external IP to a Kubernetes service

I am trying to expose the bond interface IP as an external IP on a Kubernetes service.

I have a Pod in the default namespace that is attached to 8 SRIOV VFs. These VFs are aggregated inside the Pod as a bond interface (bond0). I am able to reach the bond interface from the physical ports, but the traffic is not being routed from the bond interface to the Kubernetes service endpoint.

Is it possible to achieve such a use case?

Below are the configurations that I use.

http-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: http-service
  labels:
    app.kubernetes.io/name: http-server
spec:
  externalIPs:
  - 10.21.52.224  # Replace with your desired external IP address
  selector:
    app.kubernetes.io/name: http-server
  ports:
  - protocol: TCP
    port: 8010
    targetPort: 443

bond-cni.yaml NAD:

[root@appliance-1 ~]# cat conf_file/bond.yaml 
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: bond-net
spec:
  config: '{
  "type": "bond",
  "cniVersion": "0.3.1",
  "name": "bond-net",
  "ifname": "bond0",
  "mode": "802.3ad",
  "linksInContainer": true,
  "miimon": "100",
  "links": [
     {"name": "net0"},
     {"name": "net1"},
     {"name": "net2"},
     {"name": "net3"},
     {"name": "net4"},
     {"name": "net5"},
     {"name": "net6"},
     {"name": "net7"}
  ],
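
The bond NAD config above appears to be cut off after the links array. Purely as a hedged sketch of how a complete 802.3ad bond config in this style is typically closed (the ipam block and subnet below are illustrative assumptions, not the poster's actual settings):

{
  "type": "bond",
  "cniVersion": "0.3.1",
  "name": "bond-net",
  "ifname": "bond0",
  "mode": "802.3ad",
  "linksInContainer": true,
  "miimon": "100",
  "links": [
    {"name": "net0"},
    {"name": "net1"}
  ],
  "ipam": {
    "type": "host-local",
    "subnet": "192.168.100.0/24"
  }
}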

SRIOV VF 0 NAD YAML (NADs 1-7 are created in the same way):

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_hadevice0
  generation: 1
  name: sriov-ha0-vm1
  namespace: default
spec:
  config: '{ "cniVersion": "0.3.1", "type": "sriov", "mac": "<mac>", "vlan": 2152, "spoofChk":
    "off", "trust": "on" }'

Pod_spec.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: ha-agent
  labels:
    app.kubernetes.io/name: ha-agent
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
      {"name": "sriov-ha0-vm1", "interface": "net0" },
      {"name": "sriov-ha1-vm1", "interface": "net1" },
      {"name": "sriov-ha2-vm1", "interface": "net2" },
      {"name": "sriov-ha3-vm1", "interface": "net3" },
      {"name": "sriov-ha4-vm1", "interface": "net4" },
      {"name": "sriov-ha5-vm1", "interface": "net5" },
      {"name": "sriov-ha6-vm1", "interface": "net6" },
      {"name": "sriov-ha7-vm1", "interface": "net7" },
      {"name": "bond-net", "interface": "bond0" }
    ]'
<truncated>

SR-IOV and LACP

Hi,

I don't think this is an issue with the plugin per se, but I'm hoping someone can help me out.
I have a host where physical LACP bonding is set up.
This LACP bond is assigned an IP on the Linux host, and two ports connect to a switch with LACP on the other end.

I've got several SR-IOV VFs created on both PF interface cards. When creating the Pod I get a bond assigned as expected, but when both PF devices are up only half of the requests work. When I disable one of the PF interfaces everything works as expected.

My network attachment definitions look as follows:

apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: sriov-diag-s1-pf1 annotations: k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_netdevice_pf1 spec: config: '{ "type": "sriov", "cniVersion": "0.3.1", "name": "sriov-network", "vlan": 854, "spoofchk":"off" }'

apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: sriov-diag-s1-pf2 annotations: k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_netdevice_pf2 spec: config: '{ "type": "sriov", "cniVersion": "0.3.1", "name": "sriov-network", "vlan": 854, "spoofchk":"off" }'

And then the bond just looks like the example

apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: diag-bond spec: config: '{ "type": "bond", "cniVersion": "0.3.1", "name": "diag-bond", "ifname": "diag-bond", "mode": "active-backup", "failOverMac": 1, "linksInContainer": true, "miimon": "100", "links": [ {"name": "diag1"}, {"name": "diag2"} ], "ipam": { "type": "host-local", "subnet": "10.190.4.240/28", "gateway": "10.190.4.241" } }'

ip link show displays both VFs with spoof checking off. PF1 shows:

vf 5 MAC e6:16:f9:31:30:ae, vlan 854, spoof checking off, link-state auto, trust off, query_rss off

PF2 shows:

vf 1 MAC be:74:47:a9:2e:9d, vlan 854, spoof checking off, link-state auto, trust off, query_rss off

The diag bond in the container shows:

bash-5.0# cat /proc/net/bonding/diag-bond
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
Primary Slave: None
Currently Active Slave: diag1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: diag1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: e6:16:f9:31:30:ae
Slave queue ID: 0

Slave Interface: diag2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: be:74:47:a9:2e:9d
Slave queue ID: 0

The bond on the host shows:

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable

Slave Interface: enp4s0f1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: a0:36:9f:27:a3:56
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0

Slave Interface: enp4s0f0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: a0:36:9f:27:a3:54
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0

I think I am misunderstanding how LACP and SR-IOV play together.

Any help is greatly appreciated.

Bonding SR-IOV interfaces fails

I followed the README step by step. Normal SR-IOV Pods with a single interface and an IP set on the SR-IOV interface work fine, but trying to create a bond on top of SR-IOV VFs fails every time.

I expect to see a Pod start with the bonded interface present, but the Pods all sit in ContainerCreating and do not proceed. If I replace the definition shown below with a basic SR-IOV network, everything springs to life. Likewise, MACVLAN, Canal, host-network and all other network types I have tried also work.

The OS is SLES12SP4 on Intel X710 hardware. Kubernetes is RKE2 version v1.19.8+rke2r1.

I see that the first interface is being added but it never proceeds further:

  Normal   AddedInterface          6s                      multus             Add net1 [] from kube-system/sriov-pf1
  Normal   AddedInterface          5s                      multus             Add net1 [] from kube-system/sriov-pf1
  Normal   AddedInterface          5s                      multus             Add eth0 [10.42.1.130/32]
  Normal   AddedInterface          4s                      multus             Add eth0 [10.42.1.131/32]
  Normal   AddedInterface          4s                      multus             Add net1 [] from kube-system/sriov-pf1
  Normal   AddedInterface          3s                      multus             Add net1 [] from kube-system/sriov-pf1
  Normal   AddedInterface          3s                      multus             Add eth0 [10.42.1.132/32]
  Normal   AddedInterface          2s                      multus             Add eth0 [10.42.1.133/32]
  Normal   AddedInterface          2s                      multus             Add net1 [] from kube-system/sriov-pf1
  Normal   AddedInterface          1s                      multus             Add eth0 [10.42.1.134/32]
  Normal   AddedInterface          1s                      multus             Add net1 [] from kube-system/sriov-pf1
  Normal   AddedInterface          0s                      multus             Add eth0 [10.42.1.135/32]
  Normal   AddedInterface          0s                      multus             Add net1 [] from kube-system/sriov-pf1

Annotations used:

      annotations:
        k8s.v1.cni.cncf.io/networks: '[
          {"name": "sriov-pf1",
          "interface": "net1"},
          {"name": "sriov-pf2",
          "interface": "net2"},
          {"name": "bond-net1",
          "interface": "bond0"}
          ]'

sriovdp-config (note the intel_sriov_netdevice pool which is working just fine):

apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
        "resourceList": [{
                "resourceName": "intel_sriov_PF_1",
                "selectors": {
                    "vendors": ["8086"],
                    "devices": ["154c", "10ed"],
                    "drivers": ["i40evf", "iavf", "ixgbevf"],
                    "pfNames": ["pslave-0#1-10"]
                }
            },
            {
                "resourceName": "intel_sriov_PF_2",
                "selectors": {
                    "vendors": ["8086"],
                    "devices": ["154c", "10ed"],
                    "drivers": ["i40evf", "iavf", "ixgbevf"],
                    "pfNames": ["pslave-1#1-10"]
                }
            },
            {
                "resourceName": "intel_sriov_netdevice",
                "selectors": {
                    "vendors": ["8086"],
                    "devices": ["154c", "10ed"],
                    "drivers": ["i40evf", "iavf", "ixgbevf"]
                }
            },
            {
                "resourceName": "intel_sriov_dpdk",
                "selectors": {
                    "vendors": ["8086"],
                    "devices": ["154c", "10ed"],
                    "drivers": ["vfio-pci"],
                    "pfNames": ["enp0s0f0","enp2s2f1"]
                }
            },
            {
                "resourceName": "mlnx_sriov_rdma",
                "selectors": {
                    "vendors": ["15b3"],
                    "devices": ["1018"],
                    "drivers": ["mlx5_ib"],
                    "isRdma": true
                }
            }
        ]
    }

First SRIOV NetworkAttachmentDefinition:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-pf1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_PF_1
spec:
  config: '{
  "type": "sriov",
  "name": "sriov-network",
  "spoofchk": "off"
}'

Second SRIOV NetworkAttachmentDefinition:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-pf2
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_PF_2
spec:
  config: '{
  "type": "sriov",
  "name": "sriov-network",
  "spoofchk": "off"
}'

Bond NetworkAttachmentDefinition:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: bond-net1
spec:
  config: '{
    "type": "bond",
    "cniVersion": "0.3.1",
    "name": "bond-net1",
    "ifname": "bond0",
    "mode": "802.3ad",
    "xmit_hash_policy": "layer3+4",
    "failOverMac": 1,
    "linksInContainer": true,
    "miimon": "100",
    "links": [
      {"name": "net1"},
      {"name": "net2"}
    ],
    "ipam": {
      "type": "host-local",
      "subnet": "192.168.3.0/24",
      "rangeStart": "192.168.3.200",
      "rangeEnd": "192.168.3.216",
      "routes": [{
        "dst": "0.0.0.0/0"
      }]
    }
  }'

If I swap all this for a basic SRIOV network config with IPs directly on the VFs then it works fine:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
 name: sriov-net1
 annotations:
   k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_netdevice
spec:
 config: '{
 "type": "sriov",
 "cniVersion": "0.3.1",
 "name": "sriov-network",
 "ipam": {
   "type": "host-local",
   "subnet": "192.168.1.0/24",
   "rangeStart": "192.168.1.100",
   "rangeEnd": "192.168.1.116",
   "routes": [{
     "dst": "0.0.0.0/0"
   }]
 }
}'

Bonding SR-IOV VFs in LACP mode fails

Hello,

I'm trying to attach a Pod to two networks (Ethernet and InfiniBand) and bond two InfiniBand SR-IOV VFs inside the container.
It works in active-backup mode, but it does not work in 802.3ad mode.

While attaching the VFs to bond0, it returns the error 'Failed to set link: net1 MASTER, master index used: 5, error: operation not supported', as follows:

root@node$ kubectl describe pod sample-pod2-7487977bcb-vhpml
Name:           sample-pod2-7487977bcb-vhpml
Namespace:      default
Priority:       0
Node:           node2/10.253.4.109
Start Time:     Fri, 18 Mar 2022 11:43:29 +0000
Labels:         app=sriov
                pod-template-hash=7487977bcb
Annotations:    cni.projectcalico.org/podIP: 10.233.96.68/32
                cni.projectcalico.org/podIPs: 10.233.96.68/32
                k8s.v1.cni.cncf.io/networks:
                  [ {"name": "sriov-net1", "interface": "net1" }, {"name": "sriov-net2", "interface": "net2" }, {"name": "bond-net1", "interface": "bond0" }...
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/sample-pod2-7487977bcb
Containers:
  mlnx-inbox-ctr:
    Container ID:
    Image:         mellanox/mofed-5.5-1.0.3.2:ubuntu20.04-amd64
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      sleep inf
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      intel.com/intel_sriov_PF_1:  1
      intel.com/intel_sriov_PF_2:  1
    Requests:
      intel.com/intel_sriov_PF_1:  1
      intel.com/intel_sriov_PF_2:  1
    Environment:                   <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-dbwbz (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-dbwbz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-dbwbz
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age   From               Message
  ----     ------                  ----  ----               -------
  Normal   Scheduled               14s   default-scheduler  Successfully assigned default/sample-pod2-7487977bcb-vhpml to node2
  Normal   AddedInterface          14s   multus             Add eth0 [10.233.96.65/32]
  Normal   AddedInterface          14s   multus             Add net1 [] from default/sriov-net1
  Normal   AddedInterface          11s   multus             Add net2 [] from default/sriov-net2
  Warning  FailedCreatePodSandBox  7s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "099870524e914bcc220521098e9dc3fb92d3ba4d5ea58acdd1ecd9f43ec20562" network for pod "sample-pod2-7487977bcb-vhpml": networkPlugin cni failed to set up pod "sample-pod2-7487977bcb-vhpml_default" network: [default/sample-pod2-7487977bcb-vhpml:bond-lacp]: error adding container to network "bond-lacp": Failed to attached links to bond, error: Failed to set link: net1 MASTER, master index used: 5, error: operation not supported, failed to clean up sandbox container "099870524e914bcc220521098e9dc3fb92d3ba4d5ea58acdd1ecd9f43ec20562" network for pod "sample-pod2-7487977bcb-vhpml": networkPlugin cni failed to teardown pod "sample-pod2-7487977bcb-vhpml_default" network: delegateDel: error invoking DelegateDel - "bond": error in getting result from DelNetwork: Failed to retrieve link objects from configuration file (&{NetConf:{CNIVersion:0.3.1 Name:bond-lacp Type:bond Capabilities:map[] IPAM:{Type:host-local} DNS:{Nameservers:[] Domain: Search:[] Options:[]} RawPrevResult:map[] PrevResult:<nil>} Name:bond0 Mode:802.3ad LinksContNs:true FailOverMac:0 Miimon:100 Links:[map[name:net1] map[name:net2]] MTU:1500}), error: Failed to confirm that link (net1) exists, error: Failed to lookup link name net1, error: Link not found / delegateDel: error invoking ConflistDel - "sriov-network": conflistDel: error in getting result from DelNetworkList: error reading cached NetConf in /var/lib/cni/ib-sriov with name 099870524e914bcc220521098e9dc3fb92d3ba4d5ea58acdd1ecd9f43ec20562-net2 / delegateDel: error invoking ConflistDel - "sriov-network": conflistDel: error in getting result from DelNetworkList: error reading cached NetConf in /var/lib/cni/ib-sriov with name 099870524e914bcc220521098e9dc3fb92d3ba4d5ea58acdd1ecd9f43ec20562-net1]
  Normal   SandboxChanged          6s    kubelet            Pod sandbox changed, it will be killed and re-created.
  Normal   AddedInterface          5s    multus             Add eth0 [10.233.96.68/32]
  Normal   AddedInterface          4s    multus             Add net1 [] from default/sriov-net1
  Normal   AddedInterface          3s    multus             Add net2 [] from default/sriov-net2

The NetworkAttachmentDefinition for the bonding is as follows (I found this example here):

root@node1$ cat bond-nat.yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: bond-net1
spec:
  config: '{
  "type": "bond",
  "cniVersion": "0.3.1",
  "name": "bond-lacp",
  "ifname": "bond0",
  "mode": "802.3ad",
  "xmitHashPolicy": "layer2+3",
  "lacpRate": "fast",
  "mtu": 1500,
  "linksInContainer": true,
  "miimon": "100",
  "links": [
     {"name": "net1"},
     {"name": "net2"}
  ],
  "ipam": {
    "type": "host-local",
    "subnet": "192.168.101.0/24",
    "routes": [{
      "dst": "0.0.0.0/0"
    }],
    "gateway": "192.168.101.1"
  }
}'

But it works when I choose the active-backup mode as follows.

root@node1$ kubectl exec -ti sample-pod2-7487977bcb-2qpj7 -- bash
root@sample-pod2-7487977bcb-2qpj7:/# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if50708: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
    link/ether 0e:cb:cc:bf:a1:54 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.233.96.51/32 brd 10.233.96.51 scope global eth0
       valid_lft forever preferred_lft forever
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/infiniband 00:00:18:78:fe:80:00:00:00:00:00:00:22:2a:90:f4:fd:a3:a0:00 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 192.168.101.4/24 brd 192.168.101.255 scope global bond0
       valid_lft forever preferred_lft forever
50710: net1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 256
    link/infiniband 00:00:18:78:fe:80:00:00:00:00:00:00:22:2a:90:f4:fd:a3:a0:00 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
50712: net2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 256
    link/infiniband 00:00:7b:0c:fe:80:00:00:00:00:00:00:de:e3:fd:ef:36:a0:3b:9b brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

root@sample-pod2-7487977bcb-2qpj7:/#

Did I miss something, or is bonding SR-IOV VFs inherently not supported for LACP?
Please let me know what else is left to do for this.
Thank you!


Kubernetes: v1.20.7
