Comments (30)
Thanks for your help!
Now OK on 1.17 and 1.18!
My taints:
NAME TAINTS
l01 [map[effect:NoSchedule key:node-role.kubernetes.io/master]]
l02 <none>
l03 <none>
My nodes:
+--------------------------------------------------------+
| Node | NodeType | Addresses | State |
|========================================================|
| l01 | SATELLITE | 192.168.2.100:3366 (PLAIN) | Online |
| l02 | SATELLITE | 192.168.2.101:3366 (PLAIN) | Online |
| l03 | SATELLITE | 192.168.2.102:3366 (PLAIN) | Online |
+--------------------------------------------------------+
from kube-linstor.
Yessss ! Thanks a lot !
from kube-linstor.
Hey, thanks for the report!
We're forced to use an older version of kube-scheduler for linstor due to the upstream bug kubernetes/kubernetes#86281 and, formerly, kubernetes/kubernetes#84169.
v1.16.9 is working fine on a v1.18.x cluster, so I set it as the default.
But I agree that we need to support newer versions too, so I'm going to add the leases.coordination.k8s.io resource to the stork-scheduler role.
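For reference, the rule being added would look roughly like this (a sketch of the extra rule for the stork-scheduler ClusterRole; the exact layout in the chart's stork-scheduler-rbac.yaml template may differ):
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - get
  - create
  - update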
from kube-linstor.
Fixed in 1bf3eca. @fondemen, please check the version from master to see if it solves your problem.
Thanks!
from kube-linstor.
Thanks for the quick response!
Unfortunately not. The same message is thrown at me.
Now, even if I give full access (cluster-admin) to linstor-linstor-stork-scheduler, the erroneous access log entries stop, but my pod is still scheduled on the wrong node:
vagrant@l01:~$ linstor v l
+-------------------------------------------------------------------------------------------------------------------------------------------+
| Node | Resource | StoragePool | VolNr | MinorNr | DeviceName | Allocated | InUse | State |
|===========================================================================================================================================|
| l01 | pvc-7734f1e1-c8e6-436d-b3da-6d60344706da | default | 0 | 1000 | /dev/drbd1000 | 148.60 MiB | Unused | UpToDate |
| l02 | pvc-7734f1e1-c8e6-436d-b3da-6d60344706da | default | 0 | 1000 | /dev/drbd1000 | 148.60 MiB | Unused | UpToDate |
| l03 | pvc-7734f1e1-c8e6-436d-b3da-6d60344706da | DfltDisklessStorPool | 0 | 1000 | /dev/drbd1000 | | InUse | Diskless |
+-------------------------------------------------------------------------------------------------------------------------------------------+
vagrant@l01:~$ k get pod nginx-deploy-bf9dcc9c9-zw2nq -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deploy-bf9dcc9c9-zw2nq 1/1 Running 0 152m 192.168.126.66 l03 <none> <none>
This is a sandbox 3-node cluster with the master on l01.
from kube-linstor.
That's an interesting issue. Just to be sure:
- Have you specified schedulerName: stork in your pod spec (see the sketch after this list)?
- Is stork working for you with the default kube-scheduler:v1.16.9 image?
- Do the l01 and l02 nodes have any taints?
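For reference, a minimal pod spec using the stork scheduler would look roughly like this (a sketch; the pod name and image are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: example          # placeholder name
spec:
  schedulerName: stork   # ask Kubernetes to use stork instead of the default scheduler
  containers:
  - name: app
    image: nginx         # placeholder image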
from kube-linstor.
- I didn't know about that schedulerName parameter, my bad... But my pod is still scheduled on the wrong node.
- Regarding node taints, l01 is the master, that's all:
vagrant@l01:~/kube-linstor$ k get no --show-labels
NAME STATUS ROLES AGE VERSION LABELS
l01 Ready master 4h17m v1.18.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=l01,kubernetes.io/os=linux,linbit.com/hostname=l01,node-role.kubernetes.io/master=
l02 Ready <none> 4h13m v1.18.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=l02,kubernetes.io/os=linux,linbit.com/hostname=l02
l03 Ready <none> 4h11m v1.18.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=l03,kubernetes.io/os=linux,linbit.com/hostname=l03
Linstor-related services are all located on the master (with a nodeSelector), except those belonging to daemon sets:
vagrant@l01:~/kube-linstor$ k -n linstor get pods -o=custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
NAME NODE
linstor-db-7dbdd66fc5-qmhz8 l01
linstor-linstor-controller-0 l01
linstor-linstor-csi-controller-0 l01
linstor-linstor-csi-node-2fzj7 l01
linstor-linstor-csi-node-6g7jm l03
linstor-linstor-csi-node-xg7w5 l02
linstor-linstor-satellite-9fphx l01
linstor-linstor-satellite-flvtt l03
linstor-linstor-satellite-pwxh4 l02
linstor-linstor-stork-fcc868d4b-scj8z l01
linstor-linstor-stork-scheduler-546dd9bbcf-dm28x l01
- I tried the default image (1.16.9). In that case, the cluster-admin rights are no longer necessary and the stork-scheduler logs look clean... until the pod is run. Here are the logs for stork-scheduler: https://gist.github.com/fondemen/0ccdee9c2a4e98d41aadcdf9512101aa
We can see the scheduler is indeed invoked, with some ACL errors regarding event creation.
When applying full rights: https://gist.github.com/fondemen/0c5400ba48a1ac2100db7b040b849c03. No more event generation problems, but the pod is still on the wrong node.
It's weird how it consistently schedules on the wrong node (I've never seen it go to l02, always l03)...
Just to be sure, here is my storage class:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: "linstor"
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"
  storagePool: "default"
I also tried with localStoragePolicy=preferred, but linstor-csi complains as if this parameter didn't exist.
from kube-linstor.
Same behavior on K8s 1.17.6...
I still see the following on stork-scheduler, which disappears when granting full access:
E0617 15:11:06.358725 1 reflector.go:153] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:209: Failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:linstor:linstor-linstor-stork-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
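For reference, this kind of error is commonly addressed by granting the scheduler's service account read access to that configmap, for example by binding it to the built-in extension-apiserver-authentication-reader role in kube-system. A sketch of such a binding (an assumption for illustration, not something the chart ships, and the role must grant the verbs your Kubernetes version needs):
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: stork-scheduler-auth-reader   # assumed name
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: linstor-linstor-stork-scheduler
  namespace: linstor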
When scheduling my pod (with schedulerName: stork), stork-scheduler says:
Trace[47690483]: [67.512657ms] [67.500912ms] Computing predicates done
Trace[47690483]: [102.43624ms] [34.922473ms] Prioritizing done
no matter whether I use scheduler image 1.16.9 or 1.17.6...
from kube-linstor.
Hey, could you check whether the following resources are created in your cluster:
kubectl get clusterrole/linstor-linstor-stork-scheduler -o yaml
kubectl get clusterrolebinding/linstor-linstor-stork-scheduler -o yaml
They are templated from this file:
https://github.com/kvaps/kube-linstor/blob/master/helm/kube-linstor/templates/stork-scheduler-rbac.yaml
from kube-linstor.
K8s 1.17.6, deployed in the linstor namespace:
kubectl get clusterrole/linstor-linstor-stork-scheduler clusterrolebinding/linstor-linstor-stork-scheduler -o yaml
apiVersion: v1
items:
- apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    annotations:
      meta.helm.sh/release-name: linstor
      meta.helm.sh/release-namespace: linstor
    creationTimestamp: "2020-06-17T17:36:09Z"
    labels:
      app.kubernetes.io/managed-by: Helm
    name: linstor-linstor-stork-scheduler
    resourceVersion: "34192"
    selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/linstor-linstor-stork-scheduler
    uid: cbd78aaf-dfa1-4d7e-a725-b950e1294cc1
  rules:
  - apiGroups:
    - ""
    resources:
    - endpoints
    verbs:
    - get
    - update
  - apiGroups:
    - ""
    resources:
    - configmaps
    verbs:
    - get
  - apiGroups:
    - ""
    resources:
    - events
    verbs:
    - create
    - patch
    - update
  - apiGroups:
    - ""
    resources:
    - endpoints
    verbs:
    - create
  - apiGroups:
    - ""
    resourceNames:
    - kube-scheduler
    resources:
    - endpoints
    verbs:
    - delete
    - get
    - patch
    - update
  - apiGroups:
    - ""
    resources:
    - nodes
    verbs:
    - get
    - list
    - watch
  - apiGroups:
    - ""
    resources:
    - pods
    verbs:
    - delete
    - get
    - list
    - watch
  - apiGroups:
    - ""
    resources:
    - bindings
    - pods/binding
    verbs:
    - create
  - apiGroups:
    - ""
    resources:
    - pods/status
    verbs:
    - patch
    - update
  - apiGroups:
    - ""
    resources:
    - replicationcontrollers
    - services
    verbs:
    - get
    - list
    - watch
  - apiGroups:
    - '*'
    resources:
    - replicasets
    verbs:
    - get
    - list
    - watch
  - apiGroups:
    - apps
    resources:
    - statefulsets
    verbs:
    - get
    - list
    - watch
  - apiGroups:
    - policy
    resources:
    - poddisruptionbudgets
    verbs:
    - get
    - list
    - watch
  - apiGroups:
    - ""
    resources:
    - persistentvolumeclaims
    - persistentvolumes
    verbs:
    - get
    - list
    - watch
  - apiGroups:
    - storage.k8s.io
    resources:
    - storageclasses
    - csinodes
    verbs:
    - get
    - list
    - watch
  - apiGroups:
    - coordination.k8s.io
    resources:
    - leases
    verbs:
    - get
    - create
    - update
- apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRoleBinding
  metadata:
    annotations:
      meta.helm.sh/release-name: linstor
      meta.helm.sh/release-namespace: linstor
    creationTimestamp: "2020-06-17T17:36:09Z"
    labels:
      app.kubernetes.io/managed-by: Helm
    name: linstor-linstor-stork-scheduler
    resourceVersion: "34195"
    selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/linstor-linstor-stork-scheduler
    uid: 6ce3d636-cd89-42c2-86b4-51753954c12d
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: linstor-linstor-stork-scheduler
  subjects:
  - kind: ServiceAccount
    name: linstor-linstor-stork-scheduler
    namespace: linstor
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
from kube-linstor.
I just found that the list verb was indeed missing for stork-scheduler; I fixed it in d25cbba.
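The commit itself isn't quoted here, but based on the configmaps error above (the dumped role only grants get on configmaps), the change would presumably be along these lines (an assumption for illustration, not a quote of d25cbba):
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - list
  # watch may be needed as well for the scheduler's reflector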
from kube-linstor.
Related PR to the upstream project: libopenstorage/stork#629
from kube-linstor.
New error here:
'events.events.k8s.io is forbidden: User "system:serviceaccount:linstor:linstor-linstor-stork-scheduler" cannot create resource "events" in API group "events.k8s.io" in the namespace "default"' (will not retry!)
and later
User "system:serviceaccount:linstor:linstor-linstor-stork-scheduler" cannot patch resource "events" in API group "events.k8s.io" in the namespace "default"' (will not retry!)
even though the "create" verb is there for events.
from kube-linstor.
Merely adding
- apiGroups: ["events.k8s.io"]
  resources: ["events"]
  verbs: ["create", "patch", "update"]
to helm/kube-linstor/templates/stork-scheduler-rbac.yaml seems to solve the issue,
but still, my stupid pod is on the wrong node
from kube-linstor.
Merely adding
- apiGroups: ["events.k8s.io"] resources: ["events"] verbs: ["create", "patch", "update"]
Thanks, added events.k8s.io in 89f91fa
but still, my stupid pod is on the wrong node
Check your taints:
kubectl get node -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
and your linstor nodes:
linstor n l
from kube-linstor.
maybe storage pools?
linstor sp l
from kube-linstor.
+------------------------------------------------------------------------------------------------------------+
| StoragePool | Node | Driver | PoolName | FreeCapacity | TotalCapacity | CanSnapshots | State |
|============================================================================================================|
| DfltDisklessStorPool | l01 | DISKLESS | | | | False | Ok |
| DfltDisklessStorPool | l02 | DISKLESS | | | | False | Ok |
| DfltDisklessStorPool | l03 | DISKLESS | | | | False | Ok |
| default | l01 | LVM_THIN | linvg/linlv | 59.75 GiB | 59.75 GiB | True | Ok |
| default | l02 | LVM_THIN | linvg/linlv | 59.75 GiB | 59.75 GiB | True | Ok |
| default | l03 | LVM_THIN | linvg/linlv | 59.75 GiB | 59.75 GiB | True | Ok |
+------------------------------------------------------------------------------------------------------------+
from kube-linstor.
What if you cordon the l03 node? Will the pod be scheduled to l02, or will it get stuck in the Pending state?
from kube-linstor.
Good point. But no, it gets scheduled on l02. Worse: if I uncordon l03, then delete and recreate my deployment, the pod goes to l03 again.
After running more tests with additional deployment+PVC pairs, it seems that, most of the time, pods are scheduled on the right node. But only most of the time.
I also ran some tests installing stork myself and got similar results (though I'm not 100% sure I did it properly).
I set --debug in stork and got these logs:
1 time="2020-06-18T08:22:16Z" level=debug msg="Nodes in filter request:" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
2 time="2020-06-18T08:22:16Z" level=debug msg="l02 [{Type:InternalIP Address:192.168.2.101} {Type:Hostname Address:l02}]" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
3 time="2020-06-18T08:22:16Z" level=debug msg="l03 [{Type:InternalIP Address:192.168.2.102} {Type:Hostname Address:l03}]" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
4 time="2020-06-18T08:22:16Z" level=info msg="called: GetPodVolumes(nginx, default)"
5 time="2020-06-18T08:22:16Z" level=info msg="called: OwnsPVC(test-pvc)"
6 time="2020-06-18T08:22:16Z" level=info msg="-> yes"
7 time="2020-06-18T08:22:16Z" level=info msg="called: InspectVolume(pvc-151e8c5c-7e48-462d-90db-ded27f1d5377)"
8 [DEBUG] curl -X 'GET' -H 'Accept: application/json' 'https://linstor-linstor-controller:3371/v1/resource-definitions/pvc-151e8c5c-7e48-462d-90db-ded27f1d5377'
9 [DEBUG] curl -X 'GET' -H 'Accept: application/json' 'https://linstor-linstor-controller:3371/v1/resource-definitions/pvc-151e8c5c-7e48-462d-90db-ded27f1d5377/resources'
10 [DEBUG] curl -X 'GET' -H 'Accept: application/json' 'https://linstor-linstor-controller:3371/v1/resource-definitions/pvc-151e8c5c-7e48-462d-90db-ded27f1d5377/volume-definitions/0'
11 time="2020-06-18T08:22:16Z" level=info msg="called: GetNodes()"
12 [DEBUG] curl -X 'GET' -H 'Accept: application/json' 'https://linstor-linstor-controller:3371/v1/nodes'
13 time="2020-06-18T08:22:16Z" level=debug msg="nodeInfo: &{l01 l01 [192.168.2.100] Online}" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
14 time="2020-06-18T08:22:16Z" level=debug msg="nodeInfo: &{l02 l02 [192.168.2.101] Online}" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
15 time="2020-06-18T08:22:16Z" level=debug msg="nodeInfo: &{l01 l01 [192.168.2.100] Online}" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
16 time="2020-06-18T08:22:16Z" level=debug msg="nodeInfo: &{l02 l02 [192.168.2.101] Online}" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
17 time="2020-06-18T08:22:16Z" level=debug msg="nodeInfo: &{l03 l03 [192.168.2.102] Online}" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
18 time="2020-06-18T08:22:16Z" level=debug msg="Nodes in filter response:" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
19 time="2020-06-18T08:22:16Z" level=debug msg="l02 [{Type:InternalIP Address:192.168.2.101} {Type:Hostname Address:l02}]"
20 time="2020-06-18T08:22:16Z" level=debug msg="l03 [{Type:InternalIP Address:192.168.2.102} {Type:Hostname Address:l03}]"
21 time="2020-06-18T08:22:16Z" level=debug msg="Nodes in prioritize request:" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
22 time="2020-06-18T08:22:16Z" level=debug msg="[{Type:InternalIP Address:192.168.2.101} {Type:Hostname Address:l02}]" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
23 time="2020-06-18T08:22:16Z" level=debug msg="[{Type:InternalIP Address:192.168.2.102} {Type:Hostname Address:l03}]" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
24 time="2020-06-18T08:22:16Z" level=info msg="called: GetPodVolumes(nginx, default)"
25 time="2020-06-18T08:22:16Z" level=info msg="called: OwnsPVC(test-pvc)"
26 time="2020-06-18T08:22:16Z" level=info msg="-> yes"
27 time="2020-06-18T08:22:16Z" level=info msg="called: InspectVolume(pvc-151e8c5c-7e48-462d-90db-ded27f1d5377)"
28 [DEBUG] curl -X 'GET' -H 'Accept: application/json' 'https://linstor-linstor-controller:3371/v1/resource-definitions/pvc-151e8c5c-7e48-462d-90db-ded27f1d5377'
29 [DEBUG] curl -X 'GET' -H 'Accept: application/json' 'https://linstor-linstor-controller:3371/v1/resource-definitions/pvc-151e8c5c-7e48-462d-90db-ded27f1d5377/resources'
30 [DEBUG] curl -X 'GET' -H 'Accept: application/json' 'https://linstor-linstor-controller:3371/v1/resource-definitions/pvc-151e8c5c-7e48-462d-90db-ded27f1d5377/volume-definitions/0'
31 time="2020-06-18T08:22:16Z" level=debug msg="Got driverVolumes: [0xc0000dec40]" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
32 time="2020-06-18T08:22:16Z" level=info msg="called: GetNodes()"
33 [DEBUG] curl -X 'GET' -H 'Accept: application/json' 'https://linstor-linstor-controller:3371/v1/nodes'
34 time="2020-06-18T08:22:16Z" level=debug msg="nodeInfo: &{l01 l01 [192.168.2.100] Online}" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
35 time="2020-06-18T08:22:16Z" level=debug msg="nodeInfo: &{l02 l02 [192.168.2.101] Online}" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
36 time="2020-06-18T08:22:16Z" level=debug msg="nodeInfo: &{l03 l03 [192.168.2.102] Online}" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
37 time="2020-06-18T08:22:16Z" level=debug msg="rackMap: map[l01: l02: l03:]" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
38 time="2020-06-18T08:22:16Z" level=debug msg="zoneMap: map[l01: l02: l03:]" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
39 time="2020-06-18T08:22:16Z" level=debug msg="regionMap: map[l01: l02: l03:]" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
40 time="2020-06-18T08:22:16Z" level=debug msg="Volume pvc-151e8c5c-7e48-462d-90db-ded27f1d5377 allocated on nodes:" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
41 time="2020-06-18T08:22:16Z" level=debug msg="ID: l01 Hostname: l01"
42 time="2020-06-18T08:22:16Z" level=debug msg="ID: l02 Hostname: l02"
43 time="2020-06-18T08:22:16Z" level=debug msg="ID: l03 Hostname: l03"
44 time="2020-06-18T08:22:16Z" level=debug msg="Volume pvc-151e8c5c-7e48-462d-90db-ded27f1d5377 allocated on racks: [ ]" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
45 time="2020-06-18T08:22:16Z" level=debug msg="Volume pvc-151e8c5c-7e48-462d-90db-ded27f1d5377 allocated in zones: [ ]" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
46 time="2020-06-18T08:22:16Z" level=debug msg="Volume pvc-151e8c5c-7e48-462d-90db-ded27f1d5377 allocated in regions: [ ]" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
47 time="2020-06-18T08:22:16Z" level=debug msg="getNodeScore, let's go" node=l02
48 time="2020-06-18T08:22:16Z" level=debug msg="rack info: &{HostnameMap:map[l01: l02: l03:] PreferredLocality:[ ]}" node=l02
49 time="2020-06-18T08:22:16Z" level=debug msg="zone info: &{HostnameMap:map[l01: l02: l03:] PreferredLocality:[ ]}" node=l02
50 time="2020-06-18T08:22:16Z" level=debug msg="region info: &{HostnameMap:map[l01: l02: l03:] PreferredLocality:[ ]}" node=l02
51 time="2020-06-18T08:22:16Z" level=debug msg="nodeRack: " node=l02
52 time="2020-06-18T08:22:16Z" level=debug msg="nodeZone: " node=l02
53 time="2020-06-18T08:22:16Z" level=debug msg="nodeRegion: " node=l02
54 time="2020-06-18T08:22:16Z" level=debug msg="node match, returning node priority score (100)" node=l02
55 time="2020-06-18T08:22:16Z" level=debug msg="getNodeScore, let's go" node=l03
56 time="2020-06-18T08:22:16Z" level=debug msg="rack info: &{HostnameMap:map[l01: l02: l03:] PreferredLocality:[ ]}" node=l03
57 time="2020-06-18T08:22:16Z" level=debug msg="zone info: &{HostnameMap:map[l01: l02: l03:] PreferredLocality:[ ]}" node=l03
58 time="2020-06-18T08:22:16Z" level=debug msg="region info: &{HostnameMap:map[l01: l02: l03:] PreferredLocality:[ ]}" node=l03
59 time="2020-06-18T08:22:16Z" level=debug msg="nodeRack: " node=l03
60 time="2020-06-18T08:22:16Z" level=debug msg="nodeZone: " node=l03
61 time="2020-06-18T08:22:16Z" level=debug msg="nodeRegion: " node=l03
62 time="2020-06-18T08:22:16Z" level=debug msg="node match, returning node priority score (100)" node=l03
63 time="2020-06-18T08:22:16Z" level=debug msg="Nodes in response:" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
64 time="2020-06-18T08:22:16Z" level=debug msg="{Host:l02 Score:100}" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
65 time="2020-06-18T08:22:16Z" level=debug msg="{Host:l03 Score:100}" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-wbp76
To me, L.43 is the problem: why is the volume believed to be present on l03?
If I curl the linstor-linstor-controller service (curl -k -X 'GET' -H 'Accept: application/json' 'https://10.109.43.182:3371/v1/resource-definitions/pvc-151e8c5c-7e48-462d-90db-ded27f1d5377/volume-definitions/0') I've got an empty response.
Maybe I should add another node and disable linstor on l01...
from kube-linstor.
Could it be that DRBD uses master/slave replication, and that as long as the pod cannot be assigned to the master node, it gets scheduled anywhere else? Is there a way to check which node is the master for a volume? Is there a way to prevent the master from being placed on certain nodes?
from kube-linstor.
It seems there have been a lot of changes in the upstream stork driver since the last update:
kvaps/stork@linstor-configurable-endpoint...LINBIT:linstor-driver
I prepared new images with the latest changes:
kvaps/linstor-csi:v1.7.1-3
kvaps/linstor-stork:v1.7.1-3
Please try them, just to check whether it was already fixed there.
from kube-linstor.
Would it be the case that DRBD is a master/slave replication and that as long as the pod cannot be assigned to the master node, it is to be scheduled anywhere else?
Stork just looks for diskful resources and schedules your pod onto the same nodes, if possible. You can see them with:
linstor r l -r pvc-151e8c5c-7e48-462d-90db-ded27f1d5377 | grep -b Diskless
Is there a mean to check who is the master node for a volume? Is there a mean to avoid master being scheduled on some nodes?
If I remember correctly, all diskful resources are somehow "master"; the current primary writes and reads the data on all of them.
from kube-linstor.
No change with 1.7.1-3.
I tried linstor v l -a
before running my pod and got:
+---------------------------------------------------------------------------------------------------------------------------------------------+
| Node | Resource | StoragePool | VolNr | MinorNr | DeviceName | Allocated | InUse | State |
|=============================================================================================================================================|
| l01 | pvc-151e8c5c-7e48-462d-90db-ded27f1d5377 | default | 0 | 1000 | /dev/drbd1000 | 148.60 MiB | Unused | UpToDate |
| l02 | pvc-151e8c5c-7e48-462d-90db-ded27f1d5377 | default | 0 | 1000 | /dev/drbd1000 | 148.60 MiB | Unused | UpToDate |
| l03 | pvc-151e8c5c-7e48-462d-90db-ded27f1d5377 | DfltDisklessStorPool | 0 | 1000 | /dev/drbd1000 | | Unused | TieBreaker |
+---------------------------------------------------------------------------------------------------------------------------------------------+
This means that l03 plays a part regarding my volume: TieBreaker, which would explain why it's considered schedulable. Might be an issue in the linstor driver for stork.
I guess I need to perform more tests with more nodes.
from kube-linstor.
No change with 1.7.1-3.
But is it working fine?
This means that l03 plays a game regarding my volume : TieBreaker, which would explain why it's considered as schedulable. Might be a linstor driver for stork issue.
Yep, try to temporarily disable the tiebreaker:
linstor c sp DrbdOptions/auto-add-quorum-tiebreaker False
and delete this resource:
linstor r d l03 pvc-151e8c5c-7e48-462d-90db-ded27f1d5377
from kube-linstor.
I've added 2 more nodes and disabled the tiebreaker and... the pod is scheduled on l02!!!
I'll run more tests, but it looks good!
Yes, with 1.7.1-3.
from kube-linstor.
I confirm. I've started 3 more deployments and all are scheduled on a proper node.
I'm using your new images and running K8s 1.18.4.
Now, it might be useful to file an issue with the linstor stork driver project.
from kube-linstor.
Their issues board is closed, but I think you can try reporting it to the golinstor project or directly to the [email protected] mailing list.
from kube-linstor.
It seems the upstream bug is fixed. I just rebuilt the images and updated the helm chart; the changes are already in master.
FYI
from kube-linstor.
Thanks. But stork is no longer working. I've got plenty of
2020/06/22 19:48:59 failed to create cluster domains status object for driver linstor: failed to query linstor controller properties: Get "https://localhost:3371/v1/controller/properties": dial tcp 127.0.0.1:3371: connect: connection refused Next retry in: 10s
time="2020-06-22T19:49:09Z" level=info msg="called: GetClusterID()"
[DEBUG] curl -X 'GET' -H 'Accept: application/json' 'https://localhost:3371/v1/controller/properties'
time="2020-06-22T19:49:09Z" level=info msg="called: String()"
Of course, scheduling fails:
[DEBUG] curl -X 'GET' -H 'Accept: application/json' 'https://localhost:3371/v1/resource-definitions/pvc-9877167d-dea9-4333-955c-e4c5b30e73f4'
time="2020-06-22T20:06:24Z" level=info msg="called: GetPodVolumes(nginx, default)"
time="2020-06-22T20:06:24Z" level=info msg="called: OwnsPVC(test-pvc)"
time="2020-06-22T20:06:24Z" level=info msg="-> yes"
time="2020-06-22T20:06:24Z" level=info msg="called: InspectVolume(pvc-9877167d-dea9-4333-955c-e4c5b30e73f4)"
[DEBUG] curl -X 'GET' -H 'Accept: application/json' 'https://localhost:3371/v1/nodes'
time="2020-06-22T20:06:24Z" level=info msg="called: GetNodes()"
time="2020-06-22T20:06:24Z" level=error msg="Error getting list of driver nodes, returning all nodes: failed to get linstor nodes: Get \"https://localhost:3371/v1/nodes\": dial tcp 127.0.0.1:3371: connect: connection refused" Namespace=default Owner=ReplicaSet/nginx-deploy-6ffc789457 PodName=nginx-deploy-6ffc789457-79sk8
time="2020-06-22T20:06:24Z" level=info msg="called: GetPodVolumes(nginx, default)"
time="2020-06-22T20:06:24Z" level=info msg="called: OwnsPVC(test-pvc)"
time="2020-06-22T20:06:24Z" level=info msg="-> yes"
localhost is clearly the problem here, though LS_ENDPOINT is properly set to 'https://linstor-linstor-controller:3371'
I guess there is a regression here...
from kube-linstor.
You're right: in LINBIT/stork@854a531, LS_ENDPOINT was changed to the built-in LS_CONTROLLERS.
Fixed in 3df6c06 and tested; stork is now working fine for me.
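The fix presumably amounts to exposing the controller URL under the variable name the updated driver reads; a rough sketch of the relevant container env (an assumption, the actual chart template may differ):
env:
- name: LS_CONTROLLERS   # variable name now read by the stork linstor driver
  value: "https://linstor-linstor-controller:3371"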
Test instance:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: linstor-volume-pvc
spec:
  accessModes:
  - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 8Gi
  storageClassName: linstor-1
---
apiVersion: v1
kind: Pod
metadata:
  name: fedora
  namespace: default
spec:
  schedulerName: stork
  containers:
  - name: fedora
    image: fedora
    command: [/bin/bash]
    args: ["-c", "while true; do sleep 10; done"]
    volumeMounts:
    - name: linstor-volume-pvc
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: linstor-volume-pvc
    persistentVolumeClaim:
      claimName: "linstor-volume-pvc"
from kube-linstor.