Comments (9)
You only need to add the PF netdevice information here.
The device plugin automatically detects the PF's child VFs and uses them.
Please remove child VF netdevices from the list.
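As a rough illustration of "only PFs in the list", the sketch below drops VF netdevices from a candidate list before building the plugin's config.json. It assumes the usual Linux sysfs layout, where a VF's PCI device directory carries a physfn link to its parent PF. The helper names and the netdevice name ens5f2 are hypothetical and not part of the plugin.

```python
import json
from pathlib import Path


def is_virtual_function(netdev, sysfs_net="/sys/class/net"):
    """Return True if the netdevice sits on an SR-IOV VF.

    Assumption: a VF exposes <sysfs_net>/<netdev>/device/physfn,
    a link pointing at the parent PF's PCI device.
    """
    return (Path(sysfs_net) / netdev / "device" / "physfn").exists()


def keep_only_pfs(netdevs, sysfs_net="/sys/class/net"):
    """Filter out VF netdevices, preserving the original order."""
    return [n for n in netdevs if not is_virtual_function(n, sysfs_net)]


def build_config(netdevs, sysfs_net="/sys/class/net"):
    """Build the config.json body with mode "sriov" and PFs only."""
    config = {"mode": "sriov", "pfNetdevices": keep_only_pfs(netdevs, sysfs_net)}
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # ens5f0 is the PF from this thread; ens5f2 is a made-up VF name.
    print(build_config(["ens5f0", "ens5f2"]))
```

On a node where ens5f2 really is a VF, it would be dropped and only ens5f0 would remain in pfNetdevices.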
from k8s-rdma-sriov-dev-plugin.
Also, please don't enable sriov by yourself. This device plugin enables SR-IOV and does the necessary configuration of the VFs for InfiniBand and RoCE, depending on whether the upstream kernel or MOFED is in use.
@flymark2010 I have updated the documentation for the same. Let me know how it goes with only PFs in the list.
Sorry for not replying for so long. We had been waiting for the new OFED driver; we have now installed OFED 4.4 and tried again, but it still failed.
First, I'm not sure what "don't enable sriov by yourself" means. I used the command mlxconfig -d /dev/mst/mt4115_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=9 and then rebooted the system. Then I got the HCA info with the command ibv_devinfo:
hca_id: mlx5_1
    transport: InfiniBand (0)
    fw_ver: 14.23.1000
    node_guid: 506b:4b03:002f:1a3d
    sys_image_guid: 506b:4b03:002f:1a3c
    vendor_id: 0x02c9
    vendor_part_id: 4117
    hw_ver: 0x0
    board_id: MT_2420110034
    phys_port_cnt: 1
    Device ports:
        port: 1
            state: PORT_DOWN (1)
            max_mtu: 4096 (5)
            active_mtu: 1024 (3)
            sm_lid: 0
            port_lid: 0
            port_lmc: 0x00
            link_layer: Ethernet
hca_id: mlx5_0
    transport: InfiniBand (0)
    fw_ver: 14.23.1000
    node_guid: 506b:4b03:002f:1a3c
    sys_image_guid: 506b:4b03:002f:1a3c
    vendor_id: 0x02c9
    vendor_part_id: 4117
    hw_ver: 0x0
    board_id: MT_2420110034
    phys_port_cnt: 1
    Device ports:
        port: 1
            state: PORT_ACTIVE (4)
            max_mtu: 4096 (5)
            active_mtu: 1024 (3)
            sm_lid: 0
            port_lid: 0
            port_lmc: 0x00
            link_layer: Ethernet
and the Ethernet interface info with the command ifconfig:
ens5f0    Link encap:Ethernet  HWaddr 50:6b:4b:2f:1a:3c
          inet addr:10.128.1.16  Bcast:10.128.1.255  Mask:255.255.255.0
          inet6 addr: fe80::526b:4bff:fe2f:1a3c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:34153 errors:0 dropped:231 overruns:0 frame:0
          TX packets:8405 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:7302014 (7.3 MB)  TX bytes:7965863 (7.9 MB)
ens5f1    Link encap:Ethernet  HWaddr 50:6b:4b:2f:1a:3d
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
We have a card with two ports on the node, but only one is in use: the HCA mlx5_0 and the Ethernet interface ens5f0 are the active ones.
The content of rdma-sriov-node-config.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: rdma-devices
  namespace: kube-system
data:
  config.json: |
    {
      "mode" : "sriov",
      "pfNetdevices": [ "ens5f0" ]
    }
Then I created the device plugin DaemonSet; all the DaemonSet Pods run normally with status Running.
I can then see 9 virtual HCA devices, all with port state PORT_ACTIVE, and the same holds for the Ethernet interfaces.
But there is still no rdma/vhca resource in the node description, and the test Pod stays in the Pending state with the message:
Warning FailedScheduling 50s (x91 over 25m) default-scheduler 0/7 nodes are available: 7 Insufficient rdma/vhca.
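One way to check the "no rdma/vhca in the node description" symptom programmatically is to look at status.allocatable in the node object (for example the JSON printed by kubectl get node <node> -o json). The sketch below is only an illustration: the function name and the sample node dictionaries are made up, and it assumes the resource is reported as a plain integer string.

```python
def allocatable_count(node, resource="rdma/vhca"):
    """Return how many units of an extended resource a node advertises.

    `node` is the parsed JSON of a Kubernetes Node object. A resource
    that is missing from status.allocatable counts as 0.
    """
    allocatable = node.get("status", {}).get("allocatable", {})
    return int(allocatable.get(resource, "0"))


if __name__ == "__main__":
    # Made-up node objects, trimmed to the only field this sketch reads.
    without_plugin = {"status": {"allocatable": {"cpu": "8", "pods": "110"}}}
    with_plugin = {"status": {"allocatable": {"cpu": "8", "rdma/vhca": "9"}}}
    for name, node in (("before", without_plugin), ("after", with_plugin)):
        print(name, allocatable_count(node))
```

A count of 0 on every node matches the scheduler's "Insufficient rdma/vhca" message: the Pod requests a resource that no kubelet has advertised.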
np @flymark2010.
I will make the documentation crisper than just saying "don't enable sriov by yourself".
Basically, the RDMA device plugin enables SR-IOV and does the necessary RDMA configuration.
Therefore, the user should not enable it by writing to sysfs files.
What you have done to enable it at the HCA (firmware/hardware) level is correct.
Can you please share the output of
ip link show ens5f0
and
kubectl logs --namespace=kube-system <pod_of_device_plugin_ds>
This will help to debug and understand why vhca resources are not published, or whether something else went wrong.
Output of ip link show ens5f0:
4: ens5f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 50:6b:4b:2f:1a:3c brd ff:ff:ff:ff:ff:ff
    vf 0 MAC fa:8e:ae:13:f6:8f, spoof checking off, link-state auto
    vf 1 MAC da:46:61:b7:b2:2f, spoof checking off, link-state auto
    vf 2 MAC 6a:f6:3c:f9:75:69, spoof checking off, link-state auto
    vf 3 MAC ee:f9:5e:0a:c9:1e, spoof checking off, link-state auto
    vf 4 MAC fe:8c:fe:4a:af:bb, spoof checking off, link-state auto
    vf 5 MAC 9a:4c:c5:74:7f:75, spoof checking off, link-state auto
    vf 6 MAC a2:8a:40:ee:a1:89, spoof checking off, link-state auto
    vf 7 MAC 0e:d1:77:26:c3:68, spoof checking off, link-state auto
    vf 8 MAC 72:ff:98:e1:54:9c, spoof checking off, link-state auto
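To confirm that the number of VFs on the PF matches NUM_OF_VFS, the vf lines of this output can be counted with a small parser. This is a sketch under assumptions: parse_vfs is a hypothetical helper, and the pattern accepts both the "vf N MAC ..." form shown here and the "vf N link/ether ..." form that newer iproute2 releases are believed to print.

```python
import re
from typing import Dict

# Matches e.g. "vf 0 MAC fa:8e:ae:13:f6:8f, spoof checking off, ..."
VF_LINE = re.compile(r"^\s*vf\s+(\d+)\s+(?:MAC|link/ether)\s+([0-9a-fA-F:]{17})")


def parse_vfs(ip_link_output: str) -> Dict[int, str]:
    """Map VF index to MAC address from `ip link show <pf>` output."""
    vfs = {}
    for line in ip_link_output.splitlines():
        match = VF_LINE.match(line)
        if match:
            vfs[int(match.group(1))] = match.group(2).lower()
    return vfs


if __name__ == "__main__":
    # First lines of the output posted in this thread.
    sample = """\
4: ens5f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
    link/ether 50:6b:4b:2f:1a:3c brd ff:ff:ff:ff:ff:ff
    vf 0 MAC fa:8e:ae:13:f6:8f, spoof checking off, link-state auto
    vf 1 MAC da:46:61:b7:b2:2f, spoof checking off, link-state auto
"""
    vfs = parse_vfs(sample)
    print(len(vfs), "VFs found:", vfs)
```

The PF's own link/ether line is ignored because it does not start with "vf".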
The device plugin log keeps repeating the following:
2018/07/11 05:45:51 Starting to serve on /var/lib/kubelet/device-plugins/rdma-sriov-dp.sock
2018/07/11 05:45:51 Could not register device plugin: rpc error: code = Unimplemented desc = unknown service v1beta1.Registration
2018/07/11 05:45:51 Could not contact Kubelet, retrying. Did you enable the device plugin feature gate?
2018/07/11 05:45:51 sriov device mode
Configuring SRIOV on ndev= ens5f0 6
max_vfs = 9
cur_vfs = 9
vf = &{0 virtfn0 true false}
vf = &{1 virtfn1 true false}
vf = &{2 virtfn2 true false}
vf = &{3 virtfn3 true false}
vf = &{4 virtfn4 true false}
vf = &{5 virtfn5 true false}
vf = &{6 virtfn6 true false}
vf = &{7 virtfn7 true false}
vf = &{8 virtfn8 true false}
I'm sure the device plugin feature gate is set for k8s; here is the ps result:
# ps -ef | grep kubelet
root 2082 1 5 11:10 ? 00:08:09 /usr/local/kubernetes/kubelet --address=10.128.1.16 --hostname-override=10.128.1.16 --pod-infra-container-image=10.128.2.6/kube-system/pause-amd64:3.0 --experimental-bootstrap-kubeconfig=/etc/kubernetes/bootstrap.kubeconfig --kubeconfig=/etc/kubernetes/kubelet.kubeconfig --cert-dir=/etc/kubernetes/ssl --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/usr/local/kubernetes --cluster-dns=10.0.0.2 --cluster-domain=cluster.cloudwalk. --hairpin-mode hairpin-veth --feature-gates=DevicePlugins=true --allow-privileged=true --fail-swap-on=false --logtostderr=true --v=2
root 16674 10575 0 13:46 pts/2 00:00:00 grep --color=auto kubelet
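The feature-gate check done by eye on the ps output can also be scripted. The sketch below pulls the --feature-gates value out of a kubelet command line and turns it into a dictionary; feature_gates is a hypothetical helper, and it assumes the flag is given either as --feature-gates=A=true,B=false or as two separate arguments.

```python
import shlex


def feature_gates(cmdline):
    """Return {gate_name: bool} parsed from a kubelet command line."""
    gates = {}
    tokens = shlex.split(cmdline)
    for i, token in enumerate(tokens):
        if token.startswith("--feature-gates="):
            value = token.split("=", 1)[1]
        elif token == "--feature-gates" and i + 1 < len(tokens):
            value = tokens[i + 1]
        else:
            continue
        # The value is a comma-separated list of Name=true|false pairs.
        for pair in value.split(","):
            if "=" in pair:
                name, state = pair.split("=", 1)
                gates[name.strip()] = state.strip().lower() == "true"
    return gates


if __name__ == "__main__":
    # Shortened version of the kubelet command line from the ps output.
    cmd = (
        "/usr/local/kubernetes/kubelet --hairpin-mode hairpin-veth "
        "--feature-gates=DevicePlugins=true --allow-privileged=true"
    )
    print(feature_gates(cmd).get("DevicePlugins", False))
```

An enabled gate only shows that the device plugin feature is switched on; it says nothing about which version of the device plugin API the kubelet serves.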
@flymark2010
The plugin seems to configure the VFs correctly, and the feature gate is also enabled.
Which kubeadm and kubelet versions are you using? 1.10.3 or higher should work.
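A version check against the "1.10.3 or higher" guidance can be sketched as below. The threshold is taken from the comment above; the function names are hypothetical. Versions are compared as integer tuples because a plain string comparison would rank "1.9.0" above "1.10.3".

```python
import re

# Minimum kubelet version suggested in this thread.
MIN_KUBELET = (1, 10, 3)


def parse_version(text):
    """Extract (major, minor, patch) from strings like 'Kubernetes v1.10.4'."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version found in %r" % text)
    return tuple(int(part) for part in match.groups())


def meets_minimum(text, minimum=MIN_KUBELET):
    """Return True if the version in `text` is at least `minimum`."""
    return parse_version(text) >= minimum


if __name__ == "__main__":
    for version in ("1.9.0", "1.10.3", "Kubernetes v1.10.4"):
        print(version, meets_minimum(version))
```

The input could come from kubelet --version on each node; a kubelet below the threshold is a plausible explanation for the "unknown service v1beta1.Registration" error in the plugin log.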
The kubelet version is 1.9.0. I'll try a higher kubelet version.
After upgrading the kubelet to version 1.10.4, I can see the rdma/vhca resource in the node description, and the test Pod runs normally.
Thanks a lot!