Comments (11)
Hi, could you please provide the complete kuscia logs from xxx-kuscia-lite-master, xxx-kuscia-lite-alice, and xxx-kuscia-lite-bob? The log path is /home/kuscia/var/logs/kuscia.log.
The complete logs are too large, so I've picked out just the ERROR lines.
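(For reference, extracting and de-duplicating the ERROR lines can be done with a one-liner along these lines — a sketch assuming the `<date> <time> LEVEL <caller> <message>` layout that kuscia.log uses below:)

```shell
# Pull out ERROR lines, strip the timestamp, and collapse duplicates with a count,
# most frequent first. 2>/dev/null keeps the pipeline quiet if the file is absent.
grep ' ERROR ' /home/kuscia/var/logs/kuscia.log 2>/dev/null \
  | sed -E 's/^[0-9-]+ [0-9:.]+ //' \
  | sort | uniq -c | sort -rn
```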
master kuscia.log:
2023-07-26 14:58:48.623 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory
    [same error repeated roughly once per second until 14:59:17]
2023-07-26 14:59:17.602 ERROR modules/k3s.go:201 wait k3s ready timeout
2023-07-26 14:59:17.602 ERROR master/master.go:139 error building kubernetes client config from token, detail-> build config from flags failed, detail-> stat /home/kuscia/etc/kubeconfig: no such file or directory
    [this whole cycle — per-second server-ca.crt errors ending in "wait k3s ready timeout" plus the kubeconfig error — repeats, with the timeouts at 15:00:09, 15:00:53, and 15:01:39; further server-ca.crt errors continue through 15:02:02]
2023-07-26 15:02:28.561 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused
    [repeated once per second until 15:02:32]
2023-07-26 15:03:26.255 ERROR clusterdomainroute/controller.go:359 domainroutes.kuscia.secretflow "alice-bob" already exists
2023-07-26 15:03:27.758 ERROR clusterdomainroute/controller.go:359 domainroutes.kuscia.secretflow "bob-alice" already exists
2023-07-26 15:03:49.123 ERROR kusciascheduling/kusciascheduling.go:137 PreFilter failed for pod alice/secretflow-task-20230726150347-single-psi-0, failed to get task resource alice/ for pod
2023-07-26 15:03:49.154 ERROR kusciascheduling/kusciascheduling.go:137 PreFilter failed for pod bob/secretflow-task-20230726150347-single-psi-0, failed to get task resource bob/ for pod
2023-07-26 15:10:32.401 ERROR controller/gateway.go:161 update gateway(name:xiaomi-kuscia-master namespace:kuscia-system) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-master": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 15:10:32.401 ERROR controller/gateway.go:102 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-master": the object has been modified; please apply your changes to the latest version and try again
    [the same gateway.go:161/gateway.go:102 conflict pair recurs at 15:11:20, 15:12:22, 15:15:58, and 16:26:13]
2023-07-26 15:39:34.291 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused
    [repeated roughly once per second until 15:39:58]
2023-07-26 15:39:58.975 ERROR modules/k3s.go:201 wait k3s ready timeout
2023-07-26 15:39:58.997 ERROR modules/k3s.go:196 context had done, no need to wait to restart
2023-07-26 15:39:59.092 ERROR master/master.go:144 Post "https://127.0.0.1:6443/api/v1/namespaces": dial tcp 127.0.0.1:6443: connect: connection refused
    [readyz connection-refused errors repeat once per second from 15:40:03 to 15:40:20]
2023-07-26 15:40:39.640 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused
2023-07-26 15:53:19.221 ERROR kusciascheduling/kusciascheduling.go:137 PreFilter failed for pod bob/fate-task-20230726155318-data-reader-0, failed to get task resource bob/ for pod
2023-07-26 17:52:51.674 ERROR kusciascheduling/kusciascheduling.go:137 PreFilter failed for pod alice/secretflow-task-20230726175251-single-psi-0, failed to get task resource alice/ for pod
2023-07-26 17:52:51.724 ERROR kusciascheduling/kusciascheduling.go:137 PreFilter failed for pod bob/secretflow-task-20230726175251-single-psi-0, failed to get task resource bob/ for pod
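The master-side errors all chain back to k3s never becoming ready: both server-ca.crt and /home/kuscia/etc/kubeconfig are generated only after k3s starts successfully, so their absence is a symptom rather than the root cause. A minimal triage sketch to run inside the master container — the paths come from the log above, while `check_file` is a hypothetical helper, not part of kuscia:

```shell
# Report which k3s-generated prerequisites exist (hypothetical helper).
check_file() {
  # usage: check_file <path>; prints "OK <path>" or "MISS <path>"
  if [ -f "$1" ]; then echo "OK   $1"; else echo "MISS $1"; fi
}

# Both files are created by k3s on a successful start, per the log paths above.
check_file /home/kuscia/var/k3s/server/tls/server-ca.crt
check_file /home/kuscia/etc/kubeconfig

# If both are missing, probe the k3s apiserver directly for the underlying failure.
curl -sk --max-time 2 https://127.0.0.1:6443/readyz || echo "k3s apiserver not reachable"
```

If both files report MISS, the next step is usually the k3s process's own logs rather than kuscia.log.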
alice kuscia.log:
2023-07-26 15:02:38.915 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
    [repeated roughly once per second until 15:02:47]
2023-07-26 15:14:55.167 ERROR controller/gateway.go:161 update gateway(name:xiaomi-kuscia-lite-alice namespace:alice) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-alice": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 15:14:55.167 ERROR controller/gateway.go:102 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-alice": the object has been modified; please apply your changes to the latest version and try again
    [the same gateway.go:161/gateway.go:102 conflict pair recurs at 15:15:58 and 15:42:25]
2023-07-26 15:40:13.247 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:13.247 ERROR modules/containerd.go:121 wait containerd ready timeout
2023-07-26 15:40:13.248 ERROR modules/coredns.go:146 context canceled
2023-07-26 15:40:13.248 ERROR modules/domainroute.go:90 domain route wait ready failed with error: context canceled
2023-07-26 15:40:13.248 ERROR modules/transport.go:129 context canceled
2023-07-26 15:40:13.248 ERROR modules/envoy.go:146 context canceled
2023-07-26 15:40:13.248 ERROR modules/envoy.go:141 startup process failed at first time, so stop at once, error: start process(0) failed with context canceled
2023-07-26 15:40:13.248 ERROR modules/agent.go:97 context canceled
    [pause-image import errors repeat from 15:40:18 to 15:40:20]
2023-07-26 15:40:28.624 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused
    [repeated once per second until 15:40:39]
the latest version and try again 2023-07-26 16:26:13.537 ERROR controller/gateway.go:161 update gateway(name:xiaomi-kuscia-lite-alice namespace:alice) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-alice": the object has been modified; please apply your changes to the latest version and try again 2023-07-26 16:26:13.537 ERROR controller/gateway.go:102 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-alice": the object has been modified; please apply your changes to the latest version and try again
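The dump above repeats a handful of distinct messages at different timestamps. When a full log is too large to paste, a small helper like the following (a hypothetical sketch, not part of kuscia; `summarize_errors` is an illustrative name) can collapse entries that differ only in their leading timestamp, so each unique message appears once with a repeat count:

```python
import re
from collections import Counter

# Matches the leading "2023-07-26 15:02:38.915 " timestamp in kuscia log lines.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\s+")

def summarize_errors(lines):
    """Return 'Nx <message>' strings, most frequent message first."""
    counts = Counter(
        TIMESTAMP.sub("", line).strip() for line in lines if line.strip()
    )
    return [f"{n}x {msg}" for msg, n in counts.most_common()]
```

Feeding it the output of `grep ERROR kuscia.log` would reduce the dump above to a few lines while keeping the repeat counts visible.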
bob kuscia.log:
2023-07-26 15:03:13.287 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:03:14.269 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:03:15.278 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:05:52.976 ERROR kuberuntime/kuberuntime_manager.go:713 Container "secretflow" start failed in pod "secretflow-task-20230726150347-single-psi-0_bob(254561d8-03ab-4d01-9b95-5a0f31808cf1)", containerMessage-> context deadline exceeded, err-> CreateContainerError
2023-07-26 15:05:52.976 ERROR framework/pod_workers.go:986 Error syncing pod "secretflow-task-20230726150347-single-psi-0_bob(254561d8-03ab-4d01-9b95-5a0f31808cf1)", skipping: failed to "StartContainer" for "secretflow" with CreateContainerError: "context deadline exceeded"
2023-07-26 15:05:52.983 ERROR kuberuntime/kuberuntime_manager.go:713 Container "secretflow" start failed in pod "secretflow-task-20230726150347-single-psi-0_bob(254561d8-03ab-4d01-9b95-5a0f31808cf1)", containerMessage-> failed to reserve container name "secretflow_secretflow-task-20230726150347-single-psi-0_bob_254561d8-03ab-4d01-9b95-5a0f31808cf1_0": name "secretflow_secretflow-task-20230726150347-single-psi-0_bob_254561d8-03ab-4d01-9b95-5a0f31808cf1_0" is reserved for "66d5665affb01305a3f122f348838236e1f23f3d27bbb992843cf8384d78e73b", err-> CreateContainerError
2023-07-26 15:05:52.983 ERROR framework/pod_workers.go:986 Error syncing pod "secretflow-task-20230726150347-single-psi-0_bob(254561d8-03ab-4d01-9b95-5a0f31808cf1)", skipping: failed to "StartContainer" for "secretflow" with CreateContainerError: "failed to reserve container name \"secretflow_secretflow-task-20230726150347-single-psi-0_bob_254561d8-03ab-4d01-9b95-5a0f31808cf1_0\": name \"secretflow_secretflow-task-20230726150347-single-psi-0_bob_254561d8-03ab-4d01-9b95-5a0f31808cf1_0\" is reserved for \"66d5665affb01305a3f122f348838236e1f23f3d27bbb992843cf8384d78e73b\""
2023-07-26 15:14:55.152 ERROR controller/gateway.go:161 update gateway(name:xiaomi-kuscia-lite-bob namespace:bob) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-bob": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 15:14:55.152 ERROR controller/gateway.go:102 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-bob": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 15:15:58.443 ERROR controller/gateway.go:161 update gateway(name:xiaomi-kuscia-lite-bob namespace:bob) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-bob": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 15:15:58.443 ERROR controller/gateway.go:102 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-bob": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 15:40:13.246 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:13.628 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:13.628 ERROR modules/containerd.go:121 wait containerd ready timeout
2023-07-26 15:40:13.628 ERROR modules/coredns.go:146 context canceled
2023-07-26 15:40:13.629 ERROR modules/transport.go:129 context canceled
2023-07-26 15:40:13.629 ERROR modules/domainroute.go:90 domain route wait ready failed with error: context canceled
2023-07-26 15:40:18.213 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:18.364 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:19.396 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:20.344 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:22.641 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:22.711 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:23.340 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:24.336 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:25.326 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:26.415 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:27.316 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:28.318 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:29.317 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:30.317 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:31.309 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:32.309 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:38.916 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused
2023-07-26 15:40:39.916 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused
2023-07-26 15:42:25.028 ERROR controller/gateway.go:161 update gateway(name:xiaomi-kuscia-lite-bob namespace:bob) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-bob": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 15:42:25.028 ERROR controller/gateway.go:102 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-bob": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 16:26:13.999 ERROR controller/gateway.go:161 update gateway(name:xiaomi-kuscia-lite-bob namespace:bob) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-bob": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 16:26:13.999 ERROR controller/gateway.go:102 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-bob": the object has been modified; please apply your changes to the latest version and try again
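The repeated "Operation cannot be fulfilled ... the object has been modified" lines are standard Kubernetes optimistic-concurrency conflicts: the gateway controller tried to update a Gateway object using a stale resourceVersion because another writer updated it first. The usual resolution is to re-read the latest object and reapply the change. A minimal, self-contained sketch of that pattern (the `Store` class below is an illustrative stand-in for the apiserver, not kuscia's actual controller code):

```python
class Conflict(Exception):
    """Raised when an update is attempted with a stale version."""

class Store:
    """Toy apiserver: objects carry a version bumped on every update."""
    def __init__(self):
        self.version = 1
        self.obj = {"spec": "old"}

    def get(self):
        return self.version, dict(self.obj)

    def update(self, version, obj):
        if version != self.version:  # stale resourceVersion
            raise Conflict("the object has been modified")
        self.version += 1
        self.obj = obj

def update_with_retry(store, mutate, retries=5):
    """GET the latest object, apply `mutate`, and retry on conflict."""
    for _ in range(retries):
        version, obj = store.get()  # always re-read the latest version
        mutate(obj)
        try:
            store.update(version, obj)
            return True
        except Conflict:
            continue  # another writer won the race; re-read and retry
    return False
```

In real Go controllers this is typically done with client-go's conflict-retry helper; the point here is only that the error is recoverable by retrying from a fresh read, which is why these lines may be benign if the update eventually succeeds.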
from kuscia.
In the master container, /home/kuscia/var/k3s/server/tls/server-ca.crt does exist.
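The timing suggests a startup race rather than a genuinely missing file: the "no such file or directory" errors were logged around 14:58-14:59, while the k3s log further down shows k3s generating CN=k3s-server-ca only at 15:02:03, so kuscia was likely polling for the cert before k3s had written it, and the file naturally exists now. A minimal sketch of such a poll-until-exists loop (hypothetical, not kuscia's actual modules/k3s.go):

```python
import os
import time

def wait_for_file(path, timeout=60.0, interval=1.0):
    """Poll until `path` exists or `timeout` elapses, mirroring a startup
    loop that retries once per second and logs 'no such file' on each miss."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

If that reading is right, the errors before 15:02 are expected noise during startup, and the real question is why k3s took several minutes to come up.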
from kuscia.
@cs1317 Could you also paste the k3s log from the master container: /home/kuscia/var/logs/k3s.log
from kuscia.
head -n 100 k3s.log
time="2023-07-26T15:02:02+08:00" level=warning msg="Webhooks and apiserver aggregation may not function properly without an agent; please set egress-selector-mode to 'cluster' or 'pod'"
time="2023-07-26T15:02:02+08:00" level=info msg="Starting k3s v1.26.5+k3s1 (7cefebea)"
time="2023-07-26T15:02:02+08:00" level=info msg="Configuring sqlite3 database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s"
time="2023-07-26T15:02:02+08:00" level=info msg="Configuring database table schema and indexes, this may take a moment..."
time="2023-07-26T15:02:02+08:00" level=info msg="Database tables and indexes are up to date"
time="2023-07-26T15:02:02+08:00" level=info msg="Kine available at unix://kine.sock"
time="2023-07-26T15:02:03+08:00" level=info msg="generated self-signed CA certificate CN=k3s-client-ca@1690354923: notBefore=2023-07-26 07:02:03.081147645 +0000 UTC notAfter=2033-07-23 07:02:03.081147645 +0000 UTC"
time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=system:admin,O=system:masters signed by CN=k3s-client-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC"
time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=system:kube-controller-manager signed by CN=k3s-client-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC"
time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=system:kube-scheduler signed by CN=k3s-client-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC"
time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=system:apiserver,O=system:masters signed by CN=k3s-client-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC"
time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=system:kube-proxy signed by CN=k3s-client-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC"
time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=system:k3s-controller signed by CN=k3s-client-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC"
time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=k3s-cloud-controller-manager signed by CN=k3s-client-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC"
time="2023-07-26T15:02:03+08:00" level=info msg="generated self-signed CA certificate CN=k3s-server-ca@1690354923: notBefore=2023-07-26 07:02:03.086314283 +0000 UTC notAfter=2033-07-23 07:02:03.086314283 +0000 UTC"
time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=kube-apiserver signed by CN=k3s-server-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC"
time="2023-07-26T15:02:03+08:00" level=info msg="generated self-signed CA certificate CN=k3s-request-header-ca@1690354923: notBefore=2023-07-26 07:02:03.08757997 +0000 UTC notAfter=2033-07-23 07:02:03.08757997 +0000 UTC"
time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=system:auth-proxy signed by CN=k3s-request-header-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC"
time="2023-07-26T15:02:03+08:00" level=info msg="generated self-signed CA certificate CN=etcd-server-ca@1690354923: notBefore=2023-07-26 07:02:03.088598811 +0000 UTC notAfter=2033-07-23 07:02:03.088598811 +0000 UTC"
time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=etcd-server signed by CN=etcd-server-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC"
time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=etcd-client signed by CN=etcd-server-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC"
time="2023-07-26T15:02:03+08:00" level=info msg="generated self-signed CA certificate CN=etcd-peer-ca@1690354923: notBefore=2023-07-26 07:02:03.089942846 +0000 UTC notAfter=2033-07-23 07:02:03.089942846 +0000 UTC"
time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=etcd-peer signed by CN=etcd-peer-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC"
time="2023-07-26T15:02:03+08:00" level=info msg="Saving cluster bootstrap data to datastore"
time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=k3s,O=k3s signed by CN=k3s-server-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC"
time="2023-07-26T15:02:03+08:00" level=warning msg="dynamiclistener [::]:6443: no cached certificate available for preload - deferring certificate load until storage initialization or first client request"
time="2023-07-26T15:02:03+08:00" level=info msg="Active TLS secret / (ver=) (count 10): map[listener.cattle.io/cn-10.43.0.1:10.43.0.1 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-172.18.0.2:172.18.0.2 listener.cattle.io/cn-__1-f16284:::1 listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc:kubernetes.default.svc listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/cn-xiaomi-kuscia-master:xiaomi-kuscia-master listener.cattle.io/fingerprint:SHA1=99CE9A8C8941A3059AD52CE46BA8D0916CE697BE]"
time="2023-07-26T15:02:03+08:00" level=info msg="Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=https://kubernetes.default.svc.cluster.local,k3s --authorization-mode=Node,RBAC --bind-address=127.0.0.1 --cert-dir=/home/kuscia/var/k3s/server/tls/temporary-certs --client-ca-file=/home/kuscia/var/k3s/server/tls/client-ca.crt --egress-selector-config-file=/home/kuscia/var/k3s/server/etc/egress-selector-config.yaml --enable-admission-plugins=NodeRestriction --enable-aggregator-routing=true --enable-bootstrap-token-auth=true --etcd-servers=unix://kine.sock --feature-gates=JobTrackingWithFinalizers=true --kubelet-certificate-authority=/home/kuscia/var/k3s/server/tls/server-ca.crt --kubelet-client-certificate=/home/kuscia/var/k3s/server/tls/client-kube-apiserver.crt --kubelet-client-key=/home/kuscia/var/k3s/server/tls/client-kube-apiserver.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --profiling=false --proxy-client-cert-file=/home/kuscia/var/k3s/server/tls/client-auth-proxy.crt --proxy-client-key-file=/home/kuscia/var/k3s/server/tls/client-auth-proxy.key --requestheader-allowed-names=system:auth-proxy --requestheader-client-ca-file=/home/kuscia/var/k3s/server/tls/request-header-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6444 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/home/kuscia/var/k3s/server/tls/service.key --service-account-signing-key-file=/home/kuscia/var/k3s/server/tls/service.current.key --service-cluster-ip-range=10.43.0.0/16 --service-node-port-range=30000-32767 --storage-backend=etcd3 --tls-cert-file=/home/kuscia/var/k3s/server/tls/serving-kube-apiserver.crt --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 --tls-private-key-file=/home/kuscia/var/k3s/server/tls/serving-kube-apiserver.key"
time="2023-07-26T15:02:03+08:00" level=info msg="Running kube-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/home/kuscia/var/k3s/server/cred/controller.kubeconfig --authorization-kubeconfig=/home/kuscia/var/k3s/server/cred/controller.kubeconfig --bind-address=127.0.0.1 --cluster-cidr=10.42.0.0/16 --cluster-signing-kube-apiserver-client-cert-file=/home/kuscia/var/k3s/server/tls/client-ca.nochain.crt --cluster-signing-kube-apiserver-client-key-file=/home/kuscia/var/k3s/server/tls/client-ca.key --cluster-signing-kubelet-client-cert-file=/home/kuscia/var/k3s/server/tls/client-ca.nochain.crt --cluster-signing-kubelet-client-key-file=/home/kuscia/var/k3s/server/tls/client-ca.key --cluster-signing-kubelet-serving-cert-file=/home/kuscia/var/k3s/server/tls/server-ca.nochain.crt --cluster-signing-kubelet-serving-key-file=/home/kuscia/var/k3s/server/tls/server-ca.key --cluster-signing-legacy-unknown-cert-file=/home/kuscia/var/k3s/server/tls/server-ca.nochain.crt --cluster-signing-legacy-unknown-key-file=/home/kuscia/var/k3s/server/tls/server-ca.key --controllers=*,tokencleaner --feature-gates=JobTrackingWithFinalizers=true --kubeconfig=/home/kuscia/var/k3s/server/cred/controller.kubeconfig --leader-elect=false --profiling=false --root-ca-file=/home/kuscia/var/k3s/server/tls/server-ca.crt --secure-port=10257 --service-account-private-key-file=/home/kuscia/var/k3s/server/tls/service.current.key --service-cluster-ip-range=10.43.0.0/16 --use-service-account-credentials=true"
time="2023-07-26T15:02:03+08:00" level=info msg="Waiting for API server to become available"
W0726 15:02:03.893728 52 feature_gate.go:241] Setting GA feature gate JobTrackingWithFinalizers=true. It will be removed in a future release.
time="2023-07-26T15:02:03+08:00" level=info msg="Server node token is available at /home/kuscia/var/k3s/server/token"
time="2023-07-26T15:02:03+08:00" level=info msg="To join server node to cluster: k3s server -s https://0.0.0.0:6443 -t ${SERVER_NODE_TOKEN}"
time="2023-07-26T15:02:03+08:00" level=info msg="Agent node token is available at /home/kuscia/var/k3s/server/agent-token"
time="2023-07-26T15:02:03+08:00" level=info msg="To join agent node to cluster: k3s agent -s https://0.0.0.0:6443 -t ${AGENT_NODE_TOKEN}"
time="2023-07-26T15:02:03+08:00" level=info msg="Wrote kubeconfig /home/kuscia/etc/kubeconfig"
time="2023-07-26T15:02:03+08:00" level=info msg="Run: k3s kubectl"
I0726 15:02:04.912717 52 server.go:569] external host was not specified, using 172.18.0.2
I0726 15:02:04.961511 52 server.go:171] Version: v1.26.5+k3s1
I0726 15:02:04.961540 52 server.go:173] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
time="2023-07-26T15:02:05+08:00" level=info msg="certificate CN=k3s,O=k3s signed by CN=k3s-server-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:05 +0000 UTC"
time="2023-07-26T15:02:05+08:00" level=info msg="Active TLS secret / (ver=) (count 11): map[listener.cattle.io/cn-0.0.0.0:0.0.0.0 listener.cattle.io/cn-10.43.0.1:10.43.0.1 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-172.18.0.2:172.18.0.2 listener.cattle.io/cn-__1-f16284:::1 listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc:kubernetes.default.svc listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/cn-xiaomi-kuscia-master:xiaomi-kuscia-master listener.cattle.io/fingerprint:SHA1=9AC6ACCA436E8A09B0B4AD8CF7B011F5798C86CC]"
I0726 15:02:05.553476 52 shared_informer.go:270] Waiting for caches to sync for node_authorizer
I0726 15:02:05.588054 52 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0726 15:02:05.588083 52 plugins.go:161] Loaded 12 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionPolicy,ValidatingAdmissionWebhook,ResourceQuota.
time="2023-07-26T15:02:05+08:00" level=info msg="certificate CN=xiaomi-kuscia-master signed by CN=k3s-server-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:05 +0000 UTC"
W0726 15:02:05.839581 52 genericapiserver.go:660] Skipping API apiextensions.k8s.io/v1beta1 because it has no resources.
I0726 15:02:05.840586 52 instance.go:277] Using reconciler: lease
time="2023-07-26T15:02:06+08:00" level=info msg="certificate CN=system:node:xiaomi-kuscia-master,O=system:nodes signed by CN=k3s-client-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:06 +0000 UTC"
I0726 15:02:06.240038 52 instance.go:621] API group "internal.apiserver.k8s.io" is not enabled, skipping.
I0726 15:02:06.352715 52 instance.go:621] API group "resource.k8s.io" is not enabled, skipping.
I0726 15:02:06.462378 52 cert_rotation.go:137] Starting client certificate rotation controller
time="2023-07-26T15:02:06+08:00" level=info msg="Connecting to proxy" url="wss://0.0.0.0:6443/v1-k3s/connect"
I0726 15:02:06.463406 52 cert_rotation.go:137] Starting client certificate rotation controller
time="2023-07-26T15:02:06+08:00" level=info msg="Handling backend connection request [xiaomi-kuscia-master]"
I0726 15:02:06.465147 52 cacher.go:435] cacher (prioritylevelconfigurations.flowcontrol.apiserver.k8s.io): initialized
I0726 15:02:06.465162 52 watch_cache.go:565] Replace watchCache (rev: 3)
I0726 15:02:06.468448 52 cacher.go:435] cacher (prioritylevelconfigurations.flowcontrol.apiserver.k8s.io): initialized
I0726 15:02:06.468464 52 watch_cache.go:565] Replace watchCache (rev: 3)
I0726 15:02:06.469148 52 cacher.go:435] cacher (controllerrevisions.apps): initialized
I0726 15:02:06.469163 52 watch_cache.go:565] Replace watchCache (rev: 3)
I0726 15:02:06.472234 52 storage_factory.go:274] storing tokenreviews.authentication.k8s.io in authentication.k8s.io/v1, reading as authentication.k8s.io/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)}
I0726 15:02:06.472283 52 etcd.go:383] "Using watch cache" resource="tokenreviews.authentication.k8s.io"
W0726 15:02:06.472310 52 genericapiserver.go:660] Skipping API authentication.k8s.io/v1beta1 because it has no resources.
W0726 15:02:06.472316 52 genericapiserver.go:660] Skipping API authentication.k8s.io/v1alpha1 because it has no resources.
I0726 15:02:06.473671 52 storage_factory.go:274] storing localsubjectaccessreviews.authorization.k8s.io in authorization.k8s.io/v1, reading as authorization.k8s.io/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)}
I0726 15:02:06.473700 52 etcd.go:383] "Using watch cache" resource="localsubjectaccessreviews.authorization.k8s.io"
I0726 15:02:06.473795 52 storage_factory.go:274] storing selfsubjectaccessreviews.authorization.k8s.io in authorization.k8s.io/v1, reading as authorization.k8s.io/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)}
I0726 15:02:06.473844 52 etcd.go:383] "Using watch cache" resource="selfsubjectaccessreviews.authorization.k8s.io"
I0726 15:02:06.474053 52 storage_factory.go:274] storing selfsubjectrulesreviews.authorization.k8s.io in authorization.k8s.io/v1, reading as authorization.k8s.io/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)}
I0726 15:02:06.474090 52 etcd.go:383] "Using watch cache" resource="selfsubjectrulesreviews.authorization.k8s.io"
I0726 15:02:06.474171 52 storage_factory.go:274] storing subjectaccessreviews.authorization.k8s.io in authorization.k8s.io/v1, reading as authorization.k8s.io/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)}
I0726 15:02:06.474205 52 etcd.go:383] "Using watch cache" resource="subjectaccessreviews.authorization.k8s.io"
W0726 15:02:06.474253 52 genericapiserver.go:660] Skipping API authorization.k8s.io/v1beta1 because it has no resources.
I0726 15:02:06.475705 52 storage_factory.go:274] storing horizontalpodautoscalers.autoscaling in autoscaling/v1, reading as autoscaling/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)}
I0726 15:02:06.475745 52 etcd.go:383] "Using watch cache" resource="horizontalpodautoscalers.autoscaling"
I0726 15:02:06.475901 52 storage_factory.go:274] storing horizontalpodautoscalers.autoscaling in autoscaling/v1, reading as autoscaling/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)}
I0726 15:02:06.475934 52 etcd.go:383] "Using watch cache" resource="horizontalpodautoscalers.autoscaling"
I0726 15:02:06.476141 52 cacher.go:435] cacher (mutatingwebhookconfigurations.admissionregistration.k8s.io): initialized
I0726 15:02:06.476155 52 watch_cache.go:565] Replace watchCache (rev: 3)
I0726 15:02:06.476591 52 cacher.go:435] cacher (validatingwebhookconfigurations.admissionregistration.k8s.io): initialized
I0726 15:02:06.476607 52 watch_cache.go:565] Replace watchCache (rev: 3)
I0726 15:02:06.477593 52 cacher.go:435] cacher (prioritylevelconfigurations.flowcontrol.apiserver.k8s.io): initialized
I0726 15:02:06.477606 52 watch_cache.go:565] Replace watchCache (rev: 3)
I0726 15:02:06.478675 52 storage_factory.go:274] storing horizontalpodautoscalers.autoscaling in autoscaling/v1, reading as autoscaling/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)}
I0726 15:02:06.478715 52 etcd.go:383] "Using watch cache" resource="horizontalpodautoscalers.autoscaling"
I0726 15:02:06.492009 52
storage_factory.go:274] storing horizontalpodautoscalers.autoscaling in autoscaling/v1, reading as autoscaling/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)} I0726 15:02:06.492060 52 etcd.go:383] "Using watch cache" resource="horizontalpodautoscalers.autoscaling" W0726 15:02:06.492133 52 genericapiserver.go:660] Skipping API autoscaling/v2beta1 because it has no resources. W0726 15:02:06.492144 52 genericapiserver.go:660] Skipping API autoscaling/v2beta2 because it has no resources. 
I0726 15:02:06.495126 52 storage_factory.go:274] storing cronjobs.batch in batch/v1, reading as batch/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)} I0726 15:02:06.495159 52 etcd.go:383] "Using watch cache" resource="cronjobs.batch" I0726 15:02:06.495269 52 storage_factory.go:274] storing cronjobs.batch in batch/v1, reading as batch/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)} I0726 15:02:06.495286 52 etcd.go:383] "Using watch cache" resource="cronjobs.batch" I0726 15:02:06.495551 52 storage_factory.go:274] storing jobs.batch in batch/v1, reading as batch/__internal from storagebackend.Config{Type:"etcd3", 
Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)} I0726 15:02:06.495571 52 etcd.go:383] "Using watch cache" resource="jobs.batch" I0726 15:02:06.495669 52 storage_factory.go:274] storing jobs.batch in batch/v1, reading as batch/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)} I0726 15:02:06.495686 52 etcd.go:383] "Using watch cache" resource="jobs.batch" W0726 15:02:06.495746 52 genericapiserver.go:660] Skipping API batch/v1beta1 because it has no resources. 
I0726 15:02:06.496692 52 storage_factory.go:274] storing certificatesigningrequests.certificates.k8s.io in certificates.k8s.io/v1, reading as certificates.k8s.io/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)}
from kuscia.
Looking at the k3s log above, there is no error in it. If you do see a large number of error entries in this log locally, please paste them as well.
Since the job above failed to run, please paste the details of the job along with the logs of its pods. The specific commands are as follows:
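As a side note, when the full log is too large to paste (as mentioned earlier in this thread), one option is to summarize the distinct ERROR messages instead of every repeat. This is only a sketch; the sample file below is a stand-in, and on a real node you would point the pipeline at /home/kuscia/var/logs/kuscia.log.

```shell
# Sketch: summarize ERROR lines in a kuscia log.
# Demo on a small sample file; on a real node use
# /home/kuscia/var/logs/kuscia.log instead.
cat > /tmp/kuscia-sample.log <<'EOF'
2023-07-26 14:58:48.623 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory
2023-07-26 14:58:49.001 INFO modules/k3s.go:40 waiting for k3s to be ready
2023-07-26 14:58:49.624 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory
EOF
# Strip the timestamp (fields 1-2), keep ERROR lines, and count
# how many times each distinct message occurs:
cut -d' ' -f3- /tmp/kuscia-sample.log | grep '^ERROR' | sort | uniq -c
```

For the sample above this prints one line with a count of 2, since both ERROR entries are identical once the timestamp is removed.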
- List the job's tasks:
  kubectl get kt
- Paste the task's output:
  kubectl get kt xxx -o yaml
- Paste the task pods' logs; pod information can be viewed with the commands below:
  kubectl get pod -n alice
  kubectl get pod xxxx -o yaml -n alice
  kubectl get pod -n bob
  kubectl get pod xxxx -o yaml -n bob
- For FATE-related tasks, also paste the fate-alice container log:
  docker ps | grep fate-alice
  -> If the fate-alice container does not exist, the FATE task is bound to fail.
  docker logs fate-alice
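The checklist above could be bundled into a small helper. This is a hypothetical sketch (the `dump_task` name and loop are my own, not from the thread), assuming kubectl can reach the kuscia master's cluster:

```shell
# Hypothetical helper bundling the diagnostics above.
# dump_task <kusciatask-name> <namespace> prints the KusciaTask,
# the pod list, and every pod manifest in that namespace.
dump_task() {
  kubectl get kt "$1" -o yaml
  kubectl get pod -n "$2"
  for pod in $(kubectl get pod -n "$2" -o name); do
    kubectl get -n "$2" "$pod" -o yaml
  done
}
# Example: dump_task secretflow-task-20230727141300-single-psi alice
```

Redirecting its output to a file gives one artifact to attach to an issue instead of several separate pastes.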
from kuscia.
kubectl get kt
kubectl get kt xxx -o yaml
kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
bob fate-deploy-bob-6b85647f8b-bfb5n 1/1 Running 2 (16m ago) 23h
alice secretflow-task-20230727141300-single-psi-0 0/1 Pending 0 12m
bob secretflow-task-20230727141300-single-psi-0 0/1 Pending 0 12m
bob fate-task-20230727141852-data-reader-0 0/1 Pending 0 6m38s
[root@xiaomi-kuscia-master kuscia]# kubectl get pod secretflow-task-20230727141300-single-psi-0 -o yaml -n alice
apiVersion: v1
kind: Pod
metadata:
annotations:
kuscia.secretflow/config-template-volumes: config-template
kuscia.secretflow/task-resource-reserving-timestamp: "2023-07-27T14:18:00+08:00"
creationTimestamp: "2023-07-27T06:13:00Z"
labels:
kuscia.secretflow/communication-role-client: "true"
kuscia.secretflow/communication-role-server: "true"
kuscia.secretflow/controller: kusciatask
kuscia.secretflow/initiator: alice
kuscia.secretflow/task-id: secretflow-task-20230727141300-single-psi
kuscia.secretflow/task-resource: secretflow-task-20230727141300-single-psi-2e9c25132bbf
kuscia.secretflow/task-resource-group: secretflow-task-20230727141300-single-psi
task.kuscia.secretflow/pod-name: secretflow-task-20230727141300-single-psi-0
task.kuscia.secretflow/pod-role: ""
name: secretflow-task-20230727141300-single-psi-0
namespace: alice
ownerReferences:
- apiVersion: kuscia.secretflow/v1alpha1
blockOwnerDeletion: true
controller: true
kind: KusciaTask
name: secretflow-task-20230727141300-single-psi
uid: e8db39b1-228d-43ad-acfa-eee459feb288
resourceVersion: "213451"
uid: 41f6fd91-5089-43b1-a7e6-86e23bd4d05c
spec:
automountServiceAccountToken: false
containers:
- args:
- -c
- python -m secretflow.kuscia.entry /etc/kuscia/task-config.conf
command:
- sh
env:
- name: TASK_ID
value: secretflow-task-20230727141300-single-psi
- name: TASK_CLUSTER_DEFINE
value: '{"parties":[{"name":"bob","services":[{"port_name":"spu","endpoints":["secretflow-task-20230727141300-single-psi-0-spu.bob.svc"]},{"port_name":"fed","endpoints":["secretflow-task-20230727141300-single-psi-0-fed.bob.svc"]},{"port_name":"global","endpoints":["secretflow-task-20230727141300-single-psi-0-global.bob.svc:8081"]}]},{"name":"alice","services":[{"port_name":"fed","endpoints":["secretflow-task-20230727141300-single-psi-0-fed.alice.svc"]},{"port_name":"global","endpoints":["secretflow-task-20230727141300-single-psi-0-global.alice.svc:8081"]},{"port_name":"spu","endpoints":["secretflow-task-20230727141300-single-psi-0-spu.alice.svc"]}]}],"self_party_idx":1}'
- name: ALLOCATED_PORTS
value: '{"ports":[{"name":"spu","port":54509,"scope":"Cluster","protocol":"GRPC"},{"name":"fed","port":8080,"scope":"Cluster","protocol":"GRPC"},{"name":"global","port":8081,"scope":"Domain","protocol":"GRPC"}]}'
- name: TASK_INPUT_CONFIG
value: '{"sf_datasource_config":{"bob":{"id":"default-data-source"},"alice":{"id":"default-data-source"}},"sf_cluster_desc":{"parties":["alice","bob"],"devices":[{"name":"spu","type":"spu","parties":["alice","bob"],"config":"{"runtime_config":{"protocol":"REF2K","field":"FM64"},"link_desc":{"connect_retry_times":60,"connect_retry_interval_ms":1000,"brpc_channel_protocol":"http","brpc_channel_connection_type":"pooled","recv_timeout_ms":1200000,"http_timeout_ms":1200000}}"},{"name":"heu","type":"heu","parties":["alice","bob"],"config":"{"mode":
"PHEU", "schema": "paillier", "key_size": 2048}"}]},"sf_node_eval_param":{"domain":"preprocessing","name":"psi","version":"0.0.1","attr_paths":["input/receiver_input/key","input/sender_input/key","protocol","precheck_input","bucket_size","curve_type"],"attrs":[{"ss":["id1"]},{"ss":["id2"]},{"s":"ECDH_PSI_2PC"},{"b":true},{"i64":"1048576"},{"s":"CURVE_FOURQ"}],"inputs":[{"type":"sf.table.individual","meta":{"@type":"type.googleapis.com/secretflow.component.IndividualTable","schema":{"ids":["id1"],"features":["age","education","default","balance","housing","loan","day","duration","campaign","pdays","previous","job_blue-collar","job_entrepreneur","job_housemaid","job_management","job_retired","job_self-employed","job_services","job_student","job_technician","job_unemployed","marital_divorced","marital_married","marital_single"],"id_types":["str"],"feature_types":["f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32"]},"num_lines":"-1"},"data_refs":[{"uri":"alice.csv","party":"alice","format":"csv"}]},{"type":"sf.table.individual","meta":{"@type":"type.googleapis.com/secretflow.component.IndividualTable","schema":{"ids":["id2"],"features":["contact_cellular","contact_telephone","contact_unknown","month_apr","month_aug","month_dec","month_feb","month_jan","month_jul","month_jun","month_mar","month_may","month_nov","month_oct","month_sep","poutcome_failure","poutcome_other","poutcome_success","poutcome_unknown"],"labels":["y"],"id_types":["str"],"feature_types":["f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32"],"label_types":["i32"]},"num_lines":"-1"},"data_refs":[{"uri":"bob.csv","party":"bob","format":"csv"}]}]},"sf_output_uris":["psi-output.csv"],"sf_output_ids":["psi-output"]}'
image: secretflow/secretflow-lite-anolis8:latest
imagePullPolicy: IfNotPresent
name: secretflow
ports:
- containerPort: 54509
name: spu
protocol: TCP
- containerPort: 8080
name: fed
protocol: TCP
- containerPort: 8081
name: global
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: FallbackToLogsOnError
volumeMounts:
- mountPath: /etc/kuscia/task-config.conf
name: config-template
subPath: task-config.conf
workingDir: /work
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeSelector:
kuscia.secretflow/namespace: alice
preemptionPolicy: PreemptLowerPriority
priority: 0
restartPolicy: Never
schedulerName: kuscia-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoSchedule
key: kuscia.secretflow/agent
operator: Exists
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- configMap:
defaultMode: 420
name: secretflow-task-20230727141300-single-psi-configtemplate
name: config-template
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2023-07-27T06:13:00Z"
message: '0/2 nodes are available: 2 node(s) had untolerated taint {node.kubernetes.io/disk-pressure:
}. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.,
reject the pod secretflow-task-20230727141300-single-psi-0 is unschedulable
even after PostFilter.'
reason: Unschedulable
status: "False"
type: PodScheduled
phase: Pending
qosClass: BestEffort
[root@xiaomi-kuscia-master kuscia]# kubectl get pod secretflow-task-20230727141300-single-psi-0 -o yaml -n bob
apiVersion: v1
kind: Pod
metadata:
annotations:
kuscia.secretflow/config-template-volumes: config-template
kuscia.secretflow/task-resource-reserving-timestamp: "2023-07-27T14:18:00+08:00"
creationTimestamp: "2023-07-27T06:13:00Z"
labels:
kuscia.secretflow/communication-role-client: "true"
kuscia.secretflow/communication-role-server: "true"
kuscia.secretflow/controller: kusciatask
kuscia.secretflow/initiator: alice
kuscia.secretflow/task-id: secretflow-task-20230727141300-single-psi
kuscia.secretflow/task-resource: secretflow-task-20230727141300-single-psi-6de118d1b2d0
kuscia.secretflow/task-resource-group: secretflow-task-20230727141300-single-psi
task.kuscia.secretflow/pod-name: secretflow-task-20230727141300-single-psi-0
task.kuscia.secretflow/pod-role: ""
name: secretflow-task-20230727141300-single-psi-0
namespace: bob
ownerReferences:
- apiVersion: kuscia.secretflow/v1alpha1
blockOwnerDeletion: true
controller: true
kind: KusciaTask
name: secretflow-task-20230727141300-single-psi
uid: e8db39b1-228d-43ad-acfa-eee459feb288
resourceVersion: "213452"
uid: dca2a812-f57b-4e6f-8113-0f43b639fe4f
spec:
automountServiceAccountToken: false
containers:
- args:
- -c
- python -m secretflow.kuscia.entry /etc/kuscia/task-config.conf
command:
- sh
env:
- name: TASK_ID
value: secretflow-task-20230727141300-single-psi
- name: TASK_CLUSTER_DEFINE
value: '{"parties":[{"name":"bob","services":[{"port_name":"spu","endpoints":["secretflow-task-20230727141300-single-psi-0-spu.bob.svc"]},{"port_name":"fed","endpoints":["secretflow-task-20230727141300-single-psi-0-fed.bob.svc"]},{"port_name":"global","endpoints":["secretflow-task-20230727141300-single-psi-0-global.bob.svc:8081"]}]},{"name":"alice","services":[{"port_name":"fed","endpoints":["secretflow-task-20230727141300-single-psi-0-fed.alice.svc"]},{"port_name":"global","endpoints":["secretflow-task-20230727141300-single-psi-0-global.alice.svc:8081"]},{"port_name":"spu","endpoints":["secretflow-task-20230727141300-single-psi-0-spu.alice.svc"]}]}]}'
- name: ALLOCATED_PORTS
value: '{"ports":[{"name":"fed","port":8080,"scope":"Cluster","protocol":"GRPC"},{"name":"global","port":8081,"scope":"Domain","protocol":"GRPC"},{"name":"spu","port":54509,"scope":"Cluster","protocol":"GRPC"}]}'
- name: TASK_INPUT_CONFIG
value: '{"sf_datasource_config":{"bob":{"id":"default-data-source"},"alice":{"id":"default-data-source"}},"sf_cluster_desc":{"parties":["alice","bob"],"devices":[{"name":"spu","type":"spu","parties":["alice","bob"],"config":"{"runtime_config":{"protocol":"REF2K","field":"FM64"},"link_desc":{"connect_retry_times":60,"connect_retry_interval_ms":1000,"brpc_channel_protocol":"http","brpc_channel_connection_type":"pooled","recv_timeout_ms":1200000,"http_timeout_ms":1200000}}"},{"name":"heu","type":"heu","parties":["alice","bob"],"config":"{"mode":
"PHEU", "schema": "paillier", "key_size": 2048}"}]},"sf_node_eval_param":{"domain":"preprocessing","name":"psi","version":"0.0.1","attr_paths":["input/receiver_input/key","input/sender_input/key","protocol","precheck_input","bucket_size","curve_type"],"attrs":[{"ss":["id1"]},{"ss":["id2"]},{"s":"ECDH_PSI_2PC"},{"b":true},{"i64":"1048576"},{"s":"CURVE_FOURQ"}],"inputs":[{"type":"sf.table.individual","meta":{"@type":"type.googleapis.com/secretflow.component.IndividualTable","schema":{"ids":["id1"],"features":["age","education","default","balance","housing","loan","day","duration","campaign","pdays","previous","job_blue-collar","job_entrepreneur","job_housemaid","job_management","job_retired","job_self-employed","job_services","job_student","job_technician","job_unemployed","marital_divorced","marital_married","marital_single"],"id_types":["str"],"feature_types":["f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32"]},"num_lines":"-1"},"data_refs":[{"uri":"alice.csv","party":"alice","format":"csv"}]},{"type":"sf.table.individual","meta":{"@type":"type.googleapis.com/secretflow.component.IndividualTable","schema":{"ids":["id2"],"features":["contact_cellular","contact_telephone","contact_unknown","month_apr","month_aug","month_dec","month_feb","month_jan","month_jul","month_jun","month_mar","month_may","month_nov","month_oct","month_sep","poutcome_failure","poutcome_other","poutcome_success","poutcome_unknown"],"labels":["y"],"id_types":["str"],"feature_types":["f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32"],"label_types":["i32"]},"num_lines":"-1"},"data_refs":[{"uri":"bob.csv","party":"bob","format":"csv"}]}]},"sf_output_uris":["psi-output.csv"],"sf_output_ids":["psi-output"]}'
image: secretflow/secretflow-lite-anolis8:latest
imagePullPolicy: IfNotPresent
name: secretflow
ports:
- containerPort: 54509
name: spu
protocol: TCP
- containerPort: 8080
name: fed
protocol: TCP
- containerPort: 8081
name: global
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: FallbackToLogsOnError
volumeMounts:
- mountPath: /etc/kuscia/task-config.conf
name: config-template
subPath: task-config.conf
workingDir: /work
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeSelector:
kuscia.secretflow/namespace: bob
preemptionPolicy: PreemptLowerPriority
priority: 0
restartPolicy: Never
schedulerName: kuscia-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoSchedule
key: kuscia.secretflow/agent
operator: Exists
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- configMap:
defaultMode: 420
name: secretflow-task-20230727141300-single-psi-configtemplate
name: config-template
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2023-07-27T06:13:00Z"
message: '0/2 nodes are available: 2 node(s) had untolerated taint {node.kubernetes.io/disk-pressure:
}. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.,
reject the pod secretflow-task-20230727141300-single-psi-0 is unschedulable
even after PostFilter.'
reason: Unschedulable
status: "False"
type: PodScheduled
phase: Pending
qosClass: BestEffort
from kuscia.
From the output above, the machine you are using is running out of disk space, which is why the task pods fail and the fate-alice container behaves abnormally.
Fix: clean up the machine to free some disk space, then check whether the disk-pressure taint is still present. As long as that taint remains, task pods cannot be scheduled, so tasks and jobs cannot run.
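A minimal sketch of that check (the exact kubectl invocation is my assumption; the thread's original command was not captured):

```shell
# On the machine running the kuscia master, list node taints:
#   kubectl describe nodes | grep -i taint
# The node is schedulable again once no line mentions
# node.kubernetes.io/disk-pressure.
# Demo of the same filter on canned `describe` output:
printf 'Taints: node.kubernetes.io/disk-pressure:NoSchedule\n' |
  grep -ci 'disk-pressure'
```

A count of 0 (and a non-zero grep exit status) would mean the taint has been cleared and the pending task pods can be rescheduled.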
from kuscia.
Hi, is anything still failing after you cleaned up the disk space?
from kuscia.
After switching to another machine, the job ran successfully!
from kuscia.
OK, I'll close this issue then.
from kuscia.