
Comments (11)

santiago-wjq commented on September 27, 2024

Hello, please provide the complete kuscia logs from xxx-kuscia-lite-master, xxx-kuscia-lite-alice, and xxx-kuscia-lite-bob. The log path is /home/kuscia/var/logs/kuscia.log
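For anyone collecting these files, here is a minimal sketch of one way to copy the log out of each node. It assumes the three nodes run as Docker containers and that `docker cp` is available on the host; neither is stated in the comment. The `xxx` prefix is the placeholder from the comment, the container names are taken as written there (check them against `docker ps`), and the function names are hypothetical.

```python
import subprocess

# Path and container name suffixes as quoted in the comment above.
LOG_PATH = "/home/kuscia/var/logs/kuscia.log"
NODES = ("kuscia-lite-master", "kuscia-lite-alice", "kuscia-lite-bob")


def build_copy_commands(prefix, dest_dir="."):
    """Return one `docker cp` argument list per container.

    Assumption: the nodes are Docker containers named "<prefix>-<node>".
    """
    commands = []
    for node in NODES:
        container = f"{prefix}-{node}"
        commands.append(
            ["docker", "cp", f"{container}:{LOG_PATH}", f"{dest_dir}/{container}.log"]
        )
    return commands


def collect_logs(prefix, dest_dir="."):
    """Run the copy commands; a failure for one container does not stop the rest."""
    for argv in build_copy_commands(prefix, dest_dir):
        result = subprocess.run(argv, check=False)
        if result.returncode != 0:
            print("copy failed:", " ".join(argv))
```

Calling `collect_logs("xxx")` with the real prefix would leave one `.log` file per container in the current directory, which can then be attached to the issue.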

from kuscia.

cs1317 commented on September 27, 2024

The complete logs are too large, so I picked out the ERROR entries from them.
master kuscia.log:
2023-07-26 14:58:48.623 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:58:49.624 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:58:50.623 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:58:51.624 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:58:52.624 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:58:53.624 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:58:54.624 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:58:55.624 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:58:56.624 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:58:57.624 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:58:58.602 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:58:59.602 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:00.601 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:01.602 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:02.602 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:03.602 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such 
file or directory 2023-07-26 14:59:04.601 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:05.602 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:06.602 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:07.601 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:08.603 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:09.602 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:10.603 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:11.601 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:12.601 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:13.601 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:14.601 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:15.602 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:16.602 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:17.602 ERROR modules/k3s.go:201 wait k3s ready timeout 2023-07-26 14:59:17.602 ERROR master/master.go:139 error building kubernetes client config from token, detail-> build config from flags failed, detail-> stat /home/kuscia/etc/kubeconfig: no such file or directory 2023-07-26 14:59:40.948 ERROR modules/k3s.go:56 open 
/home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:41.949 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:42.948 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:43.947 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:44.949 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:45.948 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:46.948 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:47.948 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:48.949 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:49.948 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:50.948 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:52.331 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:52.948 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:53.948 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:54.948 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:55.948 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:56.948 ERROR 
modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:57.925 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:58.925 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 14:59:59.927 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:00.926 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:01.925 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:02.926 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:03.925 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:04.926 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:05.926 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:06.926 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:07.927 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:08.926 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:09.926 ERROR modules/k3s.go:201 wait k3s ready timeout 2023-07-26 15:00:09.926 ERROR master/master.go:139 error building kubernetes client config from token, detail-> build config from flags failed, detail-> stat /home/kuscia/etc/kubeconfig: no such file or directory 2023-07-26 15:00:24.051 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no 
such file or directory 2023-07-26 15:00:25.051 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:26.051 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:27.051 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:28.028 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:29.028 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:30.029 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:31.028 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:32.028 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:33.028 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:34.028 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:35.030 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:36.029 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:37.028 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:38.028 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:39.029 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:40.028 ERROR modules/k3s.go:56 open 
/home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:41.029 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:42.028 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:43.028 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:44.030 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:45.032 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:46.029 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:47.028 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:48.028 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:49.028 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:50.028 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:51.028 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:52.029 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:00:53.028 ERROR modules/k3s.go:201 wait k3s ready timeout 2023-07-26 15:00:53.028 ERROR master/master.go:139 error building kubernetes client config from token, detail-> build config from flags failed, detail-> stat /home/kuscia/etc/kubeconfig: no such file or directory 2023-07-26 15:01:10.337 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 
2023-07-26 15:01:11.338 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:12.338 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:13.338 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:14.340 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:15.338 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:16.338 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:17.338 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:18.338 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:19.337 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:20.338 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:21.338 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:22.338 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:23.338 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:24.338 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:25.338 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:26.338 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such 
file or directory 2023-07-26 15:01:27.338 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:28.316 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:29.315 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:30.317 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:31.315 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:32.315 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:33.316 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:34.316 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:35.317 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:36.315 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:37.316 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:38.315 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:39.315 ERROR modules/k3s.go:201 wait k3s ready timeout 2023-07-26 15:01:39.315 ERROR master/master.go:139 error building kubernetes client config from token, detail-> build config from flags failed, detail-> stat /home/kuscia/etc/kubeconfig: no such file or directory 2023-07-26 15:01:53.885 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:54.885 ERROR modules/k3s.go:56 open 
/home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:55.884 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:56.884 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:57.863 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:58.864 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:01:59.863 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:02:00.862 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:02:01.862 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:02:02.866 ERROR modules/k3s.go:56 open /home/kuscia/var/k3s/server/tls/server-ca.crt: no such file or directory 2023-07-26 15:02:28.561 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused 2023-07-26 15:02:29.562 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused 2023-07-26 15:02:30.561 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused 2023-07-26 15:02:31.562 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused 2023-07-26 15:02:32.561 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused 2023-07-26 15:03:26.255 ERROR clusterdomainroute/controller.go:359 domainroutes.kuscia.secretflow "alice-bob" already exists 2023-07-26 15:03:27.758 
ERROR clusterdomainroute/controller.go:359 domainroutes.kuscia.secretflow "bob-alice" already exists 2023-07-26 15:03:49.123 ERROR kusciascheduling/kusciascheduling.go:137 PreFilter failed for pod alice/secretflow-task-20230726150347-single-psi-0, failed to get task resource alice/ for pod 2023-07-26 15:03:49.154 ERROR kusciascheduling/kusciascheduling.go:137 PreFilter failed for pod bob/secretflow-task-20230726150347-single-psi-0, failed to get task resource bob/ for pod 2023-07-26 15:10:32.401 ERROR controller/gateway.go:161 update gateway(name:xiaomi-kuscia-master namespace:kuscia-system) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-master": the object has been modified; please apply your changes to the latest version and try again 2023-07-26 15:10:32.401 ERROR controller/gateway.go:102 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-master": the object has been modified; please apply your changes to the latest version and try again 2023-07-26 15:11:20.284 ERROR controller/gateway.go:161 update gateway(name:xiaomi-kuscia-master namespace:kuscia-system) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-master": the object has been modified; please apply your changes to the latest version and try again 2023-07-26 15:11:20.284 ERROR controller/gateway.go:102 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-master": the object has been modified; please apply your changes to the latest version and try again 2023-07-26 15:12:22.577 ERROR controller/gateway.go:161 update gateway(name:xiaomi-kuscia-master namespace:kuscia-system) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-master": the object has been modified; please apply your changes to the latest version and try again 2023-07-26 15:12:22.577 ERROR controller/gateway.go:102 sync gateway error: Operation cannot be fulfilled 
on gateways.kuscia.secretflow "xiaomi-kuscia-master": the object has been modified; please apply your changes to the latest version and try again 2023-07-26 15:15:58.447 ERROR controller/gateway.go:161 update gateway(name:xiaomi-kuscia-master namespace:kuscia-system) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-master": the object has been modified; please apply your changes to the latest version and try again 2023-07-26 15:15:58.447 ERROR controller/gateway.go:102 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-master": the object has been modified; please apply your changes to the latest version and try again 2023-07-26 15:39:34.291 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:34.291 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:34.996 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:35.997 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:36.998 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:37.997 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:38.999 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:39.998 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:40.999 ERROR modules/k3s.go:97 Get ready err:Get 
"https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:41.998 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:42.997 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:43.998 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:44.996 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:45.996 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:46.997 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:47.998 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:48.996 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:49.996 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:50.997 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:51.997 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:52.997 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:53.997 ERROR modules/k3s.go:97 Get ready err:Get 
"https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:54.996 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:55.975 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:56.975 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:57.974 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:39:58.975 ERROR modules/k3s.go:201 wait k3s ready timeout 2023-07-26 15:39:58.997 ERROR modules/k3s.go:196 context had done, no need to wait to restart 2023-07-26 15:39:59.092 ERROR master/master.go:144 Post "https://127.0.0.1:6443/api/v1/namespaces": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:40:03.225 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:40:04.225 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:40:05.224 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:40:06.223 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:40:07.223 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:40:08.224 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:40:09.223 ERROR modules/k3s.go:97 Get ready err:Get 
"https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:40:10.223 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:40:11.224 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:40:12.223 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:40:13.223 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:40:14.223 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:40:15.223 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:40:16.223 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:40:17.223 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:40:18.223 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:40:19.223 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:40:20.225 ERROR modules/k3s.go:97 Get ready err:Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused 2023-07-26 15:40:39.640 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused 2023-07-26 15:53:19.221 ERROR kusciascheduling/kusciascheduling.go:137 PreFilter failed 
for pod bob/fate-task-20230726155318-data-reader-0, failed to get task resource bob/ for pod 2023-07-26 16:26:13.524 ERROR controller/gateway.go:161 update gateway(name:xiaomi-kuscia-master namespace:kuscia-system) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-master": the object has been modified; please apply your changes to the latest version and try again 2023-07-26 16:26:13.524 ERROR controller/gateway.go:102 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-master": the object has been modified; please apply your changes to the latest version and try again 2023-07-26 17:52:51.674 ERROR kusciascheduling/kusciascheduling.go:137 PreFilter failed for pod alice/secretflow-task-20230726175251-single-psi-0, failed to get task resource alice/ for pod 2023-07-26 17:52:51.724 ERROR kusciascheduling/kusciascheduling.go:137 PreFilter failed for pod bob/secretflow-task-20230726175251-single-psi-0, failed to get task resource bob/ for pod
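Most of the entries in the master log above repeat a handful of messages at roughly one-second intervals, which makes the excerpt hard to scan. Below is a rough sketch of how such an excerpt could be collapsed into one row per distinct error. It assumes every entry has the layout `<date> <time> ERROR <file.go:line> <message>`, inferred only from the excerpt above; the function name `summarize_errors` is hypothetical and not part of kuscia.

```python
import re
from collections import OrderedDict

# Assumed entry layout, inferred from the excerpt above:
#   <date> <time> ERROR <file.go:line> <message>
TIMESTAMP = r"\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d{3}"
# Zero-width split point in front of every "<timestamp> ERROR".
ENTRY_START = re.compile(rf"(?={TIMESTAMP}\s+ERROR\s)")
ENTRY = re.compile(
    rf"(?P<ts>{TIMESTAMP})\s+ERROR\s+(?P<src>\S+:\d+)\s+(?P<msg>.*)",
    re.DOTALL,
)


def summarize_errors(text):
    """Group ERROR entries by (source location, message), in first-seen order.

    Returns a list of (source, message, count, first_timestamp, last_timestamp).
    """
    groups = OrderedDict()
    for chunk in ENTRY_START.split(text):
        match = ENTRY.match(chunk.strip())
        if match is None:
            continue  # empty leading chunk or text that is not an ERROR entry
        # Collapse whitespace so entries wrapped across lines compare equal.
        ts = " ".join(match["ts"].split())
        msg = " ".join(match["msg"].split())
        key = (match["src"], msg)
        if key not in groups:
            groups[key] = {"count": 0, "first": ts, "last": ts}
        groups[key]["count"] += 1
        groups[key]["last"] = ts
    return [
        (src, msg, info["count"], info["first"], info["last"])
        for (src, msg), info in groups.items()
    ]


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < kuscia_errors.txt
    for src, msg, count, first, last in summarize_errors(sys.stdin.read()):
        print(f"{count:5d}  {first} .. {last}  {src}  {msg}")
```

Grouping on the source location plus the message keeps entries such as `modules/k3s.go:56` and `modules/k3s.go:201` separate, while the first and last timestamps show when each burst of retries started and stopped.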

alice kuscia.log:
2023-07-26 15:02:38.915 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:02:39.846 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:02:40.879 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:02:41.911 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:02:42.842 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:02:43.847 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:02:44.856 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:02:45.853 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:02:46.845 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:02:47.846 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:14:55.167 ERROR controller/gateway.go:161 update gateway(name:xiaomi-kuscia-lite-alice namespace:alice) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-alice": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 15:14:55.167 ERROR controller/gateway.go:102 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-alice": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 15:15:58.440 ERROR controller/gateway.go:161 update gateway(name:xiaomi-kuscia-lite-alice namespace:alice) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-alice": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 15:15:58.440 ERROR controller/gateway.go:102 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-alice": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 15:40:13.247 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:13.247 ERROR modules/containerd.go:121 wait containerd ready timeout
2023-07-26 15:40:13.248 ERROR modules/coredns.go:146 context canceled
2023-07-26 15:40:13.248 ERROR modules/domainroute.go:90 domain route wait ready failed with error: context canceled
2023-07-26 15:40:13.248 ERROR modules/transport.go:129 context canceled
2023-07-26 15:40:13.248 ERROR modules/envoy.go:146 context canceled
2023-07-26 15:40:13.248 ERROR modules/envoy.go:141 startup process failed at first time, so stop at once, error: start process(0) failed with context canceled
2023-07-26 15:40:13.248 ERROR modules/agent.go:97 context canceled
2023-07-26 15:40:18.124 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:18.462 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:19.475 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:20.353 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:28.624 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused
2023-07-26 15:40:29.596 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused
2023-07-26 15:40:30.595 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused
2023-07-26 15:40:31.595 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused
2023-07-26 15:40:32.596 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused
2023-07-26 15:40:33.596 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused
2023-07-26 15:40:34.596 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused
2023-07-26 15:40:35.596 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused
2023-07-26 15:40:36.596 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused
2023-07-26 15:40:37.596 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused
2023-07-26 15:40:38.595 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused
2023-07-26 15:40:39.596 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused
2023-07-26 15:42:25.032 ERROR controller/gateway.go:161 update gateway(name:xiaomi-kuscia-lite-alice namespace:alice) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-alice": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 15:42:25.032 ERROR controller/gateway.go:102 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-alice": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 16:26:13.537 ERROR controller/gateway.go:161 update gateway(name:xiaomi-kuscia-lite-alice namespace:alice) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-alice": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 16:26:13.537 ERROR controller/gateway.go:102 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-alice": the object has been modified; please apply your changes to the latest version and try again

bob:
2023-07-26 15:03:13.287 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:03:14.269 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:03:15.278 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:05:52.976 ERROR kuberuntime/kuberuntime_manager.go:713 Container "secretflow" start failed in pod "secretflow-task-20230726150347-single-psi-0_bob(254561d8-03ab-4d01-9b95-5a0f31808cf1)", containerMessage-> context deadline exceeded, err-> CreateContainerError
2023-07-26 15:05:52.976 ERROR framework/pod_workers.go:986 Error syncing pod "secretflow-task-20230726150347-single-psi-0_bob(254561d8-03ab-4d01-9b95-5a0f31808cf1)", skipping: failed to "StartContainer" for "secretflow" with CreateContainerError: "context deadline exceeded"
2023-07-26 15:05:52.983 ERROR kuberuntime/kuberuntime_manager.go:713 Container "secretflow" start failed in pod "secretflow-task-20230726150347-single-psi-0_bob(254561d8-03ab-4d01-9b95-5a0f31808cf1)", containerMessage-> failed to reserve container name "secretflow_secretflow-task-20230726150347-single-psi-0_bob_254561d8-03ab-4d01-9b95-5a0f31808cf1_0": name "secretflow_secretflow-task-20230726150347-single-psi-0_bob_254561d8-03ab-4d01-9b95-5a0f31808cf1_0" is reserved for "66d5665affb01305a3f122f348838236e1f23f3d27bbb992843cf8384d78e73b", err-> CreateContainerError
2023-07-26 15:05:52.983 ERROR framework/pod_workers.go:986 Error syncing pod "secretflow-task-20230726150347-single-psi-0_bob(254561d8-03ab-4d01-9b95-5a0f31808cf1)", skipping: failed to "StartContainer" for "secretflow" with CreateContainerError: "failed to reserve container name \"secretflow_secretflow-task-20230726150347-single-psi-0_bob_254561d8-03ab-4d01-9b95-5a0f31808cf1_0\": name \"secretflow_secretflow-task-20230726150347-single-psi-0_bob_254561d8-03ab-4d01-9b95-5a0f31808cf1_0\" is reserved for \"66d5665affb01305a3f122f348838236e1f23f3d27bbb992843cf8384d78e73b\""
2023-07-26 15:14:55.152 ERROR controller/gateway.go:161 update gateway(name:xiaomi-kuscia-lite-bob namespace:bob) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-bob": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 15:14:55.152 ERROR controller/gateway.go:102 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-bob": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 15:15:58.443 ERROR controller/gateway.go:161 update gateway(name:xiaomi-kuscia-lite-bob namespace:bob) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-bob": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 15:15:58.443 ERROR controller/gateway.go:102 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-bob": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 15:40:13.246 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:13.628 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:13.628 ERROR modules/containerd.go:121 wait containerd ready timeout
2023-07-26 15:40:13.628 ERROR modules/coredns.go:146 context canceled
2023-07-26 15:40:13.629 ERROR modules/transport.go:129 context canceled
2023-07-26 15:40:13.629 ERROR modules/domainroute.go:90 domain route wait ready failed with error: context canceled
2023-07-26 15:40:18.213 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:18.364 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:19.396 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:20.344 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:22.641 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:22.711 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:23.340 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:24.336 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:25.326 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:26.415 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:27.316 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:28.318 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:29.317 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:30.317 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:31.309 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:32.309 ERROR modules/containerd.go:79 Unable to import pause image: failed to run command "/home/kuscia/bin/ctr -a=/home/kuscia/containerd/run/containerd.sock -n=k8s.io images import /home/kuscia/pause/pause.tar", detail-> exit status 1
2023-07-26 15:40:38.916 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused
2023-07-26 15:40:39.916 ERROR modules/envoy.go:50 Get ready err:Get "http://127.0.0.1:10000/ready": dial tcp 127.0.0.1:10000: connect: connection refused
2023-07-26 15:42:25.028 ERROR controller/gateway.go:161 update gateway(name:xiaomi-kuscia-lite-bob namespace:bob) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-bob": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 15:42:25.028 ERROR controller/gateway.go:102 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-bob": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 16:26:13.999 ERROR controller/gateway.go:161 update gateway(name:xiaomi-kuscia-lite-bob namespace:bob) fail: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-bob": the object has been modified; please apply your changes to the latest version and try again
2023-07-26 16:26:13.999 ERROR controller/gateway.go:102 sync gateway error: Operation cannot be fulfilled on gateways.kuscia.secretflow "xiaomi-kuscia-lite-bob": the object has been modified; please apply your changes to the latest version and try again
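
The ERROR excerpts above repeat the same handful of messages many times, which makes the distinct failures hard to spot. The following is a minimal sketch, not part of Kuscia, for counting ERROR entries per source location. It assumes the entry layout seen in this thread (timestamp, `ERROR`, `file.go:line`, message); `summarize_errors` and `ENTRY_START` are hypothetical names introduced only for illustration.

```python
import re

# Assumed entry layout, taken from the excerpts in this thread:
#   2023-07-26 15:40:13.247 ERROR modules/containerd.go:121 <message>
ENTRY_START = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ERROR (\S+\.go:\d+)"
)


def summarize_errors(text):
    """Map each source location to (count, first_timestamp, last_timestamp)."""
    summary = {}
    for match in ENTRY_START.finditer(text):
        timestamp, location = match.groups()
        if location in summary:
            # Seen before: bump the count and move the last-seen timestamp.
            count, first, _ = summary[location]
            summary[location] = (count + 1, first, timestamp)
        else:
            summary[location] = (1, timestamp, timestamp)
    return summary
```

Because the pattern matches entry starts rather than whole lines, it should work on the log text whether entries are one per line or run together. Passing it the contents of /home/kuscia/var/logs/kuscia.log from each container would give a short per-location table that is easier to post than the full log.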


cs1317 commented on September 27, 2024

In the master container, /home/kuscia/var/k3s/server/tls/server-ca.crt does exist (screenshot attached).


gshilei commented on September 27, 2024

@cs1317 Could you also paste the k3s log from the master container? It is at /home/kuscia/var/logs/k3s.log


cs1317 commented on September 27, 2024

head -n 100 k3s.log
time="2023-07-26T15:02:02+08:00" level=warning msg="Webhooks and apiserver aggregation may not function properly without an agent; please set egress-selector-mode to 'cluster' or 'pod'" time="2023-07-26T15:02:02+08:00" level=info msg="Starting k3s v1.26.5+k3s1 (7cefebea)" time="2023-07-26T15:02:02+08:00" level=info msg="Configuring sqlite3 database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s" time="2023-07-26T15:02:02+08:00" level=info msg="Configuring database table schema and indexes, this may take a moment..." time="2023-07-26T15:02:02+08:00" level=info msg="Database tables and indexes are up to date" time="2023-07-26T15:02:02+08:00" level=info msg="Kine available at unix://kine.sock" time="2023-07-26T15:02:03+08:00" level=info msg="generated self-signed CA certificate CN=k3s-client-ca@1690354923: notBefore=2023-07-26 07:02:03.081147645 +0000 UTC notAfter=2033-07-23 07:02:03.081147645 +0000 UTC" time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=system:admin,O=system:masters signed by CN=k3s-client-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC" time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=system:kube-controller-manager signed by CN=k3s-client-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC" time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=system:kube-scheduler signed by CN=k3s-client-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC" time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=system:apiserver,O=system:masters signed by CN=k3s-client-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC" time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=system:kube-proxy signed by CN=k3s-client-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC" 
time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=system:k3s-controller signed by CN=k3s-client-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC" time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=k3s-cloud-controller-manager signed by CN=k3s-client-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC" time="2023-07-26T15:02:03+08:00" level=info msg="generated self-signed CA certificate CN=k3s-server-ca@1690354923: notBefore=2023-07-26 07:02:03.086314283 +0000 UTC notAfter=2033-07-23 07:02:03.086314283 +0000 UTC" time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=kube-apiserver signed by CN=k3s-server-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC" time="2023-07-26T15:02:03+08:00" level=info msg="generated self-signed CA certificate CN=k3s-request-header-ca@1690354923: notBefore=2023-07-26 07:02:03.08757997 +0000 UTC notAfter=2033-07-23 07:02:03.08757997 +0000 UTC" time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=system:auth-proxy signed by CN=k3s-request-header-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC" time="2023-07-26T15:02:03+08:00" level=info msg="generated self-signed CA certificate CN=etcd-server-ca@1690354923: notBefore=2023-07-26 07:02:03.088598811 +0000 UTC notAfter=2033-07-23 07:02:03.088598811 +0000 UTC" time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=etcd-server signed by CN=etcd-server-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC" time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=etcd-client signed by CN=etcd-server-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC" time="2023-07-26T15:02:03+08:00" level=info msg="generated self-signed CA certificate CN=etcd-peer-ca@1690354923: 
notBefore=2023-07-26 07:02:03.089942846 +0000 UTC notAfter=2033-07-23 07:02:03.089942846 +0000 UTC" time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=etcd-peer signed by CN=etcd-peer-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC" time="2023-07-26T15:02:03+08:00" level=info msg="Saving cluster bootstrap data to datastore" time="2023-07-26T15:02:03+08:00" level=info msg="certificate CN=k3s,O=k3s signed by CN=k3s-server-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:03 +0000 UTC" time="2023-07-26T15:02:03+08:00" level=warning msg="dynamiclistener [::]:6443: no cached certificate available for preload - deferring certificate load until storage initialization or first client request" time="2023-07-26T15:02:03+08:00" level=info msg="Active TLS secret / (ver=) (count 10): map[listener.cattle.io/cn-10.43.0.1:10.43.0.1 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-172.18.0.2:172.18.0.2 listener.cattle.io/cn-__1-f16284:::1 listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc:kubernetes.default.svc listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/cn-xiaomi-kuscia-master:xiaomi-kuscia-master listener.cattle.io/fingerprint:SHA1=99CE9A8C8941A3059AD52CE46BA8D0916CE697BE]" time="2023-07-26T15:02:03+08:00" level=info msg="Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=https://kubernetes.default.svc.cluster.local,k3s --authorization-mode=Node,RBAC --bind-address=127.0.0.1 --cert-dir=/home/kuscia/var/k3s/server/tls/temporary-certs --client-ca-file=/home/kuscia/var/k3s/server/tls/client-ca.crt --egress-selector-config-file=/home/kuscia/var/k3s/server/etc/egress-selector-config.yaml 
--enable-admission-plugins=NodeRestriction --enable-aggregator-routing=true --enable-bootstrap-token-auth=true --etcd-servers=unix://kine.sock --feature-gates=JobTrackingWithFinalizers=true --kubelet-certificate-authority=/home/kuscia/var/k3s/server/tls/server-ca.crt --kubelet-client-certificate=/home/kuscia/var/k3s/server/tls/client-kube-apiserver.crt --kubelet-client-key=/home/kuscia/var/k3s/server/tls/client-kube-apiserver.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --profiling=false --proxy-client-cert-file=/home/kuscia/var/k3s/server/tls/client-auth-proxy.crt --proxy-client-key-file=/home/kuscia/var/k3s/server/tls/client-auth-proxy.key --requestheader-allowed-names=system:auth-proxy --requestheader-client-ca-file=/home/kuscia/var/k3s/server/tls/request-header-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6444 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/home/kuscia/var/k3s/server/tls/service.key --service-account-signing-key-file=/home/kuscia/var/k3s/server/tls/service.current.key --service-cluster-ip-range=10.43.0.0/16 --service-node-port-range=30000-32767 --storage-backend=etcd3 --tls-cert-file=/home/kuscia/var/k3s/server/tls/serving-kube-apiserver.crt --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 --tls-private-key-file=/home/kuscia/var/k3s/server/tls/serving-kube-apiserver.key" time="2023-07-26T15:02:03+08:00" level=info msg="Running kube-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/home/kuscia/var/k3s/server/cred/controller.kubeconfig --authorization-kubeconfig=/home/kuscia/var/k3s/server/cred/controller.kubeconfig 
--bind-address=127.0.0.1 --cluster-cidr=10.42.0.0/16 --cluster-signing-kube-apiserver-client-cert-file=/home/kuscia/var/k3s/server/tls/client-ca.nochain.crt --cluster-signing-kube-apiserver-client-key-file=/home/kuscia/var/k3s/server/tls/client-ca.key --cluster-signing-kubelet-client-cert-file=/home/kuscia/var/k3s/server/tls/client-ca.nochain.crt --cluster-signing-kubelet-client-key-file=/home/kuscia/var/k3s/server/tls/client-ca.key --cluster-signing-kubelet-serving-cert-file=/home/kuscia/var/k3s/server/tls/server-ca.nochain.crt --cluster-signing-kubelet-serving-key-file=/home/kuscia/var/k3s/server/tls/server-ca.key --cluster-signing-legacy-unknown-cert-file=/home/kuscia/var/k3s/server/tls/server-ca.nochain.crt --cluster-signing-legacy-unknown-key-file=/home/kuscia/var/k3s/server/tls/server-ca.key --controllers=*,tokencleaner --feature-gates=JobTrackingWithFinalizers=true --kubeconfig=/home/kuscia/var/k3s/server/cred/controller.kubeconfig --leader-elect=false --profiling=false --root-ca-file=/home/kuscia/var/k3s/server/tls/server-ca.crt --secure-port=10257 --service-account-private-key-file=/home/kuscia/var/k3s/server/tls/service.current.key --service-cluster-ip-range=10.43.0.0/16 --use-service-account-credentials=true" time="2023-07-26T15:02:03+08:00" level=info msg="Waiting for API server to become available" W0726 15:02:03.893728 52 feature_gate.go:241] Setting GA feature gate JobTrackingWithFinalizers=true. It will be removed in a future release. 
time="2023-07-26T15:02:03+08:00" level=info msg="Server node token is available at /home/kuscia/var/k3s/server/token" time="2023-07-26T15:02:03+08:00" level=info msg="To join server node to cluster: k3s server -s https://0.0.0.0:6443 -t ${SERVER_NODE_TOKEN}" time="2023-07-26T15:02:03+08:00" level=info msg="Agent node token is available at /home/kuscia/var/k3s/server/agent-token" time="2023-07-26T15:02:03+08:00" level=info msg="To join agent node to cluster: k3s agent -s https://0.0.0.0:6443 -t ${AGENT_NODE_TOKEN}" time="2023-07-26T15:02:03+08:00" level=info msg="Wrote kubeconfig /home/kuscia/etc/kubeconfig" time="2023-07-26T15:02:03+08:00" level=info msg="Run: k3s kubectl" I0726 15:02:04.912717 52 server.go:569] external host was not specified, using 172.18.0.2 I0726 15:02:04.961511 52 server.go:171] Version: v1.26.5+k3s1 I0726 15:02:04.961540 52 server.go:173] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK="" time="2023-07-26T15:02:05+08:00" level=info msg="certificate CN=k3s,O=k3s signed by CN=k3s-server-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:05 +0000 UTC" time="2023-07-26T15:02:05+08:00" level=info msg="Active TLS secret / (ver=) (count 11): map[listener.cattle.io/cn-0.0.0.0:0.0.0.0 listener.cattle.io/cn-10.43.0.1:10.43.0.1 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-172.18.0.2:172.18.0.2 listener.cattle.io/cn-__1-f16284:::1 listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc:kubernetes.default.svc listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/cn-xiaomi-kuscia-master:xiaomi-kuscia-master listener.cattle.io/fingerprint:SHA1=9AC6ACCA436E8A09B0B4AD8CF7B011F5798C86CC]" I0726 15:02:05.553476 52 shared_informer.go:270] Waiting for caches to sync for node_authorizer I0726 15:02:05.588054 52 
plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook. I0726 15:02:05.588083 52 plugins.go:161] Loaded 12 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionPolicy,ValidatingAdmissionWebhook,ResourceQuota. time="2023-07-26T15:02:05+08:00" level=info msg="certificate CN=xiaomi-kuscia-master signed by CN=k3s-server-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:05 +0000 UTC" W0726 15:02:05.839581 52 genericapiserver.go:660] Skipping API apiextensions.k8s.io/v1beta1 because it has no resources. I0726 15:02:05.840586 52 instance.go:277] Using reconciler: lease time="2023-07-26T15:02:06+08:00" level=info msg="certificate CN=system:node:xiaomi-kuscia-master,O=system:nodes signed by CN=k3s-client-ca@1690354923: notBefore=2023-07-26 07:02:03 +0000 UTC notAfter=2024-07-25 07:02:06 +0000 UTC" I0726 15:02:06.240038 52 instance.go:621] API group "internal.apiserver.k8s.io" is not enabled, skipping. I0726 15:02:06.352715 52 instance.go:621] API group "resource.k8s.io" is not enabled, skipping. 
I0726 15:02:06.462378 52 cert_rotation.go:137] Starting client certificate rotation controller time="2023-07-26T15:02:06+08:00" level=info msg="Connecting to proxy" url="wss://0.0.0.0:6443/v1-k3s/connect" I0726 15:02:06.463406 52 cert_rotation.go:137] Starting client certificate rotation controller time="2023-07-26T15:02:06+08:00" level=info msg="Handling backend connection request [xiaomi-kuscia-master]" I0726 15:02:06.465147 52 cacher.go:435] cacher (prioritylevelconfigurations.flowcontrol.apiserver.k8s.io): initialized I0726 15:02:06.465162 52 watch_cache.go:565] Replace watchCache (rev: 3) I0726 15:02:06.468448 52 cacher.go:435] cacher (prioritylevelconfigurations.flowcontrol.apiserver.k8s.io): initialized I0726 15:02:06.468464 52 watch_cache.go:565] Replace watchCache (rev: 3) I0726 15:02:06.469148 52 cacher.go:435] cacher (controllerrevisions.apps): initialized I0726 15:02:06.469163 52 watch_cache.go:565] Replace watchCache (rev: 3) I0726 15:02:06.472234 52 storage_factory.go:274] storing tokenreviews.authentication.k8s.io in authentication.k8s.io/v1, reading as authentication.k8s.io/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)} I0726 15:02:06.472283 52 etcd.go:383] "Using watch cache" resource="tokenreviews.authentication.k8s.io" W0726 15:02:06.472310 52 genericapiserver.go:660] 
Skipping API authentication.k8s.io/v1beta1 because it has no resources. W0726 15:02:06.472316 52 genericapiserver.go:660] Skipping API authentication.k8s.io/v1alpha1 because it has no resources. I0726 15:02:06.473671 52 storage_factory.go:274] storing localsubjectaccessreviews.authorization.k8s.io in authorization.k8s.io/v1, reading as authorization.k8s.io/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)} I0726 15:02:06.473700 52 etcd.go:383] "Using watch cache" resource="localsubjectaccessreviews.authorization.k8s.io" I0726 15:02:06.473795 52 storage_factory.go:274] storing selfsubjectaccessreviews.authorization.k8s.io in authorization.k8s.io/v1, reading as authorization.k8s.io/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, 
LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)} I0726 15:02:06.473844 52 etcd.go:383] "Using watch cache" resource="selfsubjectaccessreviews.authorization.k8s.io" I0726 15:02:06.474053 52 storage_factory.go:274] storing selfsubjectrulesreviews.authorization.k8s.io in authorization.k8s.io/v1, reading as authorization.k8s.io/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)} I0726 15:02:06.474090 52 etcd.go:383] "Using watch cache" resource="selfsubjectrulesreviews.authorization.k8s.io" I0726 15:02:06.474171 52 storage_factory.go:274] storing subjectaccessreviews.authorization.k8s.io in authorization.k8s.io/v1, reading as authorization.k8s.io/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, 
ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)} I0726 15:02:06.474205 52 etcd.go:383] "Using watch cache" resource="subjectaccessreviews.authorization.k8s.io" W0726 15:02:06.474253 52 genericapiserver.go:660] Skipping API authorization.k8s.io/v1beta1 because it has no resources. I0726 15:02:06.475705 52 storage_factory.go:274] storing horizontalpodautoscalers.autoscaling in autoscaling/v1, reading as autoscaling/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)} I0726 15:02:06.475745 52 etcd.go:383] "Using watch cache" resource="horizontalpodautoscalers.autoscaling" I0726 15:02:06.475901 52 storage_factory.go:274] storing horizontalpodautoscalers.autoscaling in autoscaling/v1, reading as autoscaling/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, 
CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)} I0726 15:02:06.475934 52 etcd.go:383] "Using watch cache" resource="horizontalpodautoscalers.autoscaling" I0726 15:02:06.476141 52 cacher.go:435] cacher (mutatingwebhookconfigurations.admissionregistration.k8s.io): initialized I0726 15:02:06.476155 52 watch_cache.go:565] Replace watchCache (rev: 3) I0726 15:02:06.476591 52 cacher.go:435] cacher (validatingwebhookconfigurations.admissionregistration.k8s.io): initialized I0726 15:02:06.476607 52 watch_cache.go:565] Replace watchCache (rev: 3) I0726 15:02:06.477593 52 cacher.go:435] cacher (prioritylevelconfigurations.flowcontrol.apiserver.k8s.io): initialized I0726 15:02:06.477606 52 watch_cache.go:565] Replace watchCache (rev: 3) I0726 15:02:06.478675 52 storage_factory.go:274] storing horizontalpodautoscalers.autoscaling in autoscaling/v1, reading as autoscaling/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)} I0726 15:02:06.478715 52 etcd.go:383] "Using watch cache" resource="horizontalpodautoscalers.autoscaling" I0726 15:02:06.492009 52 
storage_factory.go:274] storing horizontalpodautoscalers.autoscaling in autoscaling/v1, reading as autoscaling/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)} I0726 15:02:06.492060 52 etcd.go:383] "Using watch cache" resource="horizontalpodautoscalers.autoscaling" W0726 15:02:06.492133 52 genericapiserver.go:660] Skipping API autoscaling/v2beta1 because it has no resources. W0726 15:02:06.492144 52 genericapiserver.go:660] Skipping API autoscaling/v2beta2 because it has no resources. 
I0726 15:02:06.495126 52 storage_factory.go:274] storing cronjobs.batch in batch/v1, reading as batch/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)} I0726 15:02:06.495159 52 etcd.go:383] "Using watch cache" resource="cronjobs.batch" I0726 15:02:06.495269 52 storage_factory.go:274] storing cronjobs.batch in batch/v1, reading as batch/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)} I0726 15:02:06.495286 52 etcd.go:383] "Using watch cache" resource="cronjobs.batch" I0726 15:02:06.495551 52 storage_factory.go:274] storing jobs.batch in batch/v1, reading as batch/__internal from storagebackend.Config{Type:"etcd3", 
Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)} I0726 15:02:06.495571 52 etcd.go:383] "Using watch cache" resource="jobs.batch" I0726 15:02:06.495669 52 storage_factory.go:274] storing jobs.batch in batch/v1, reading as batch/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)} I0726 15:02:06.495686 52 etcd.go:383] "Using watch cache" resource="jobs.batch" W0726 15:02:06.495746 52 genericapiserver.go:660] Skipping API batch/v1beta1 because it has no resources. 
I0726 15:02:06.496692 52 storage_factory.go:274] storing certificatesigningrequests.certificates.k8s.io in certificates.k8s.io/v1, reading as certificates.k8s.io/__internal from storagebackend.Config{Type:"etcd3", Prefix:"/registry", Transport:storagebackend.TransportConfig{ServerList:[]string{"unix://kine.sock"}, KeyFile:"", CertFile:"", TrustedCAFile:"", EgressLookup:(egressselector.Lookup)(0x358e6e0), TracerProvider:trace.noopTracerProvider{}}, Paging:true, Codec:runtime.Codec(nil), EncodeVersioner:runtime.GroupVersioner(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000, DBMetricPollInterval:30000000000, HealthcheckTimeout:2000000000, ReadycheckTimeout:2000000000, LeaseManagerConfig:etcd3.LeaseManagerConfig{ReuseDurationSeconds:60, MaxObjectCount:1000}, StorageObjectCountTracker:(*request.objectCountTracker)(0xc000ec2630)}

gshilei commented on September 27, 2024

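From the k3s logs above, I don't see any error messages. If you see a large number of errors in that log locally, feel free to paste those as well.

[image]
The job above failed to run. Please paste the relevant job information and the logs of the job's pods. The commands are as follows: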

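  1. List the tasks of the job
    kubectl get kt

  2. Paste the output of the task
    kubectl get kt xxx -o yaml

  3. Paste the logs of the task pods; the pod information can be viewed with the commands below
    kubectl get pod -n alice
    kubectl get pod xxxx -o yaml -n alice
    kubectl get pod -n bob
    kubectl get pod xxxx -o yaml -n bob

  4. For fate-related tasks, please also paste the fate-alice container logs
    docker ps | grep fate-alice
    -> If the fate-alice container does not exist, the fate task will certainly fail.
    docker logs fate-alice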
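
The kubectl steps above could be gathered in one pass with a small script. This is only a sketch: the function names, the output file name kuscia-diagnostics.txt, and the task and pod names passed in are hypothetical, and it assumes kubectl is on PATH wherever the script runs. The docker commands from step 4 are left out, since they may need to run in a different shell than kubectl.

```python
import subprocess


def diagnostic_commands(task, pods):
    """Build the kubectl command lines from steps 1-3.

    task: name of the KusciaTask (the "xxx" placeholder above).
    pods: dict mapping a namespace (e.g. "alice") to a pod name in it.
    """
    cmds = [
        ["kubectl", "get", "kt"],
        ["kubectl", "get", "kt", task, "-o", "yaml"],
    ]
    for namespace, pod in pods.items():
        cmds.append(["kubectl", "get", "pod", "-n", namespace])
        cmds.append(["kubectl", "get", "pod", pod, "-o", "yaml", "-n", namespace])
    return cmds


def collect(cmds, out_path="kuscia-diagnostics.txt"):
    """Run each command and append its stdout/stderr to one text file."""
    with open(out_path, "w") as f:
        for cmd in cmds:
            f.write("$ " + " ".join(cmd) + "\n")
            result = subprocess.run(cmd, capture_output=True, text=True)
            f.write(result.stdout + result.stderr + "\n")
```

Calling collect(diagnostic_commands("my-task", {"alice": "my-task-0", "bob": "my-task-0"})) would write everything to a single file that can be attached to the issue instead of pasting screenshots.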

cs1317 commented on September 27, 2024

kubectl get kt
kubectl get kt xxx -o yaml
[image]

kubectl get pods -A
NAMESPACE   NAME                                          READY   STATUS    RESTARTS      AGE
bob         fate-deploy-bob-6b85647f8b-bfb5n              1/1     Running   2 (16m ago)   23h
alice       secretflow-task-20230727141300-single-psi-0   0/1     Pending   0             12m
bob         secretflow-task-20230727141300-single-psi-0   0/1     Pending   0             12m
bob         fate-task-20230727141852-data-reader-0        0/1     Pending   0             6m38s

[root@xiaomi-kuscia-master kuscia]# kubectl get pod secretflow-task-20230727141300-single-psi-0 -o yaml -n alice
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kuscia.secretflow/config-template-volumes: config-template
    kuscia.secretflow/task-resource-reserving-timestamp: "2023-07-27T14:18:00+08:00"
  creationTimestamp: "2023-07-27T06:13:00Z"
  labels:
    kuscia.secretflow/communication-role-client: "true"
    kuscia.secretflow/communication-role-server: "true"
    kuscia.secretflow/controller: kusciatask
    kuscia.secretflow/initiator: alice
    kuscia.secretflow/task-id: secretflow-task-20230727141300-single-psi
    kuscia.secretflow/task-resource: secretflow-task-20230727141300-single-psi-2e9c25132bbf
    kuscia.secretflow/task-resource-group: secretflow-task-20230727141300-single-psi
    task.kuscia.secretflow/pod-name: secretflow-task-20230727141300-single-psi-0
    task.kuscia.secretflow/pod-role: ""
  name: secretflow-task-20230727141300-single-psi-0
  namespace: alice
  ownerReferences:
  - apiVersion: kuscia.secretflow/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: KusciaTask
    name: secretflow-task-20230727141300-single-psi
    uid: e8db39b1-228d-43ad-acfa-eee459feb288
  resourceVersion: "213451"
  uid: 41f6fd91-5089-43b1-a7e6-86e23bd4d05c
spec:
  automountServiceAccountToken: false
  containers:
  - args:
    - -c
    - python -m secretflow.kuscia.entry /etc/kuscia/task-config.conf
    command:
    - sh
    env:
    - name: TASK_ID
      value: secretflow-task-20230727141300-single-psi
    - name: TASK_CLUSTER_DEFINE
      value: '{"parties":[{"name":"bob","services":[{"port_name":"spu","endpoints":["secretflow-task-20230727141300-single-psi-0-spu.bob.svc"]},{"port_name":"fed","endpoints":["secretflow-task-20230727141300-single-psi-0-fed.bob.svc"]},{"port_name":"global","endpoints":["secretflow-task-20230727141300-single-psi-0-global.bob.svc:8081"]}]},{"name":"alice","services":[{"port_name":"fed","endpoints":["secretflow-task-20230727141300-single-psi-0-fed.alice.svc"]},{"port_name":"global","endpoints":["secretflow-task-20230727141300-single-psi-0-global.alice.svc:8081"]},{"port_name":"spu","endpoints":["secretflow-task-20230727141300-single-psi-0-spu.alice.svc"]}]}],"self_party_idx":1}'
    - name: ALLOCATED_PORTS
      value: '{"ports":[{"name":"spu","port":54509,"scope":"Cluster","protocol":"GRPC"},{"name":"fed","port":8080,"scope":"Cluster","protocol":"GRPC"},{"name":"global","port":8081,"scope":"Domain","protocol":"GRPC"}]}'
    - name: TASK_INPUT_CONFIG
      value: '{"sf_datasource_config":{"bob":{"id":"default-data-source"},"alice":{"id":"default-data-source"}},"sf_cluster_desc":{"parties":["alice","bob"],"devices":[{"name":"spu","type":"spu","parties":["alice","bob"],"config":"{"runtime_config":{"protocol":"REF2K","field":"FM64"},"link_desc":{"connect_retry_times":60,"connect_retry_interval_ms":1000,"brpc_channel_protocol":"http","brpc_channel_connection_type":"pooled","recv_timeout_ms":1200000,"http_timeout_ms":1200000}}"},{"name":"heu","type":"heu","parties":["alice","bob"],"config":"{"mode":
      "PHEU", "schema": "paillier", "key_size": 2048}"}]},"sf_node_eval_param":{"domain":"preprocessing","name":"psi","version":"0.0.1","attr_paths":["input/receiver_input/key","input/sender_input/key","protocol","precheck_input","bucket_size","curve_type"],"attrs":[{"ss":["id1"]},{"ss":["id2"]},{"s":"ECDH_PSI_2PC"},{"b":true},{"i64":"1048576"},{"s":"CURVE_FOURQ"}],"inputs":[{"type":"sf.table.individual","meta":{"@type":"type.googleapis.com/secretflow.component.IndividualTable","schema":{"ids":["id1"],"features":["age","education","default","balance","housing","loan","day","duration","campaign","pdays","previous","job_blue-collar","job_entrepreneur","job_housemaid","job_management","job_retired","job_self-employed","job_services","job_student","job_technician","job_unemployed","marital_divorced","marital_married","marital_single"],"id_types":["str"],"feature_types":["f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32"]},"num_lines":"-1"},"data_refs":[{"uri":"alice.csv","party":"alice","format":"csv"}]},{"type":"sf.table.individual","meta":{"@type":"type.googleapis.com/secretflow.component.IndividualTable","schema":{"ids":["id2"],"features":["contact_cellular","contact_telephone","contact_unknown","month_apr","month_aug","month_dec","month_feb","month_jan","month_jul","month_jun","month_mar","month_may","month_nov","month_oct","month_sep","poutcome_failure","poutcome_other","poutcome_success","poutcome_unknown"],"labels":["y"],"id_types":["str"],"feature_types":["f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32"],"label_types":["i32"]},"num_lines":"-1"},"data_refs":[{"uri":"bob.csv","party":"bob","format":"csv"}]}]},"sf_output_uris":["psi-output.csv"],"sf_output_ids":["psi-output"]}'
    image: secretflow/secretflow-lite-anolis8:latest
    imagePullPolicy: IfNotPresent
    name: secretflow
    ports:
    - containerPort: 54509
      name: spu
      protocol: TCP
    - containerPort: 8080
      name: fed
      protocol: TCP
    - containerPort: 8081
      name: global
      protocol: TCP
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /etc/kuscia/task-config.conf
      name: config-template
      subPath: task-config.conf
    workingDir: /work
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeSelector:
    kuscia.secretflow/namespace: alice
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: kuscia-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: kuscia.secretflow/agent
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      name: secretflow-task-20230727141300-single-psi-configtemplate
    name: config-template
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-07-27T06:13:00Z"
    message: '0/2 nodes are available: 2 node(s) had untolerated taint {node.kubernetes.io/disk-pressure:
      }. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.,
      reject the pod secretflow-task-20230727141300-single-psi-0 is unschedulable
      even after PostFilter.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort

[root@xiaomi-kuscia-master kuscia]# kubectl get pod secretflow-task-20230727141300-single-psi-0 -o yaml -n bob
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kuscia.secretflow/config-template-volumes: config-template
    kuscia.secretflow/task-resource-reserving-timestamp: "2023-07-27T14:18:00+08:00"
  creationTimestamp: "2023-07-27T06:13:00Z"
  labels:
    kuscia.secretflow/communication-role-client: "true"
    kuscia.secretflow/communication-role-server: "true"
    kuscia.secretflow/controller: kusciatask
    kuscia.secretflow/initiator: alice
    kuscia.secretflow/task-id: secretflow-task-20230727141300-single-psi
    kuscia.secretflow/task-resource: secretflow-task-20230727141300-single-psi-6de118d1b2d0
    kuscia.secretflow/task-resource-group: secretflow-task-20230727141300-single-psi
    task.kuscia.secretflow/pod-name: secretflow-task-20230727141300-single-psi-0
    task.kuscia.secretflow/pod-role: ""
  name: secretflow-task-20230727141300-single-psi-0
  namespace: bob
  ownerReferences:
  - apiVersion: kuscia.secretflow/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: KusciaTask
    name: secretflow-task-20230727141300-single-psi
    uid: e8db39b1-228d-43ad-acfa-eee459feb288
  resourceVersion: "213452"
  uid: dca2a812-f57b-4e6f-8113-0f43b639fe4f
spec:
  automountServiceAccountToken: false
  containers:
  - args:
    - -c
    - python -m secretflow.kuscia.entry /etc/kuscia/task-config.conf
    command:
    - sh
    env:
    - name: TASK_ID
      value: secretflow-task-20230727141300-single-psi
    - name: TASK_CLUSTER_DEFINE
      value: '{"parties":[{"name":"bob","services":[{"port_name":"spu","endpoints":["secretflow-task-20230727141300-single-psi-0-spu.bob.svc"]},{"port_name":"fed","endpoints":["secretflow-task-20230727141300-single-psi-0-fed.bob.svc"]},{"port_name":"global","endpoints":["secretflow-task-20230727141300-single-psi-0-global.bob.svc:8081"]}]},{"name":"alice","services":[{"port_name":"fed","endpoints":["secretflow-task-20230727141300-single-psi-0-fed.alice.svc"]},{"port_name":"global","endpoints":["secretflow-task-20230727141300-single-psi-0-global.alice.svc:8081"]},{"port_name":"spu","endpoints":["secretflow-task-20230727141300-single-psi-0-spu.alice.svc"]}]}]}'
    - name: ALLOCATED_PORTS
      value: '{"ports":[{"name":"fed","port":8080,"scope":"Cluster","protocol":"GRPC"},{"name":"global","port":8081,"scope":"Domain","protocol":"GRPC"},{"name":"spu","port":54509,"scope":"Cluster","protocol":"GRPC"}]}'
    - name: TASK_INPUT_CONFIG
      value: '{"sf_datasource_config":{"bob":{"id":"default-data-source"},"alice":{"id":"default-data-source"}},"sf_cluster_desc":{"parties":["alice","bob"],"devices":[{"name":"spu","type":"spu","parties":["alice","bob"],"config":"{"runtime_config":{"protocol":"REF2K","field":"FM64"},"link_desc":{"connect_retry_times":60,"connect_retry_interval_ms":1000,"brpc_channel_protocol":"http","brpc_channel_connection_type":"pooled","recv_timeout_ms":1200000,"http_timeout_ms":1200000}}"},{"name":"heu","type":"heu","parties":["alice","bob"],"config":"{"mode":
      "PHEU", "schema": "paillier", "key_size": 2048}"}]},"sf_node_eval_param":{"domain":"preprocessing","name":"psi","version":"0.0.1","attr_paths":["input/receiver_input/key","input/sender_input/key","protocol","precheck_input","bucket_size","curve_type"],"attrs":[{"ss":["id1"]},{"ss":["id2"]},{"s":"ECDH_PSI_2PC"},{"b":true},{"i64":"1048576"},{"s":"CURVE_FOURQ"}],"inputs":[{"type":"sf.table.individual","meta":{"@type":"type.googleapis.com/secretflow.component.IndividualTable","schema":{"ids":["id1"],"features":["age","education","default","balance","housing","loan","day","duration","campaign","pdays","previous","job_blue-collar","job_entrepreneur","job_housemaid","job_management","job_retired","job_self-employed","job_services","job_student","job_technician","job_unemployed","marital_divorced","marital_married","marital_single"],"id_types":["str"],"feature_types":["f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32"]},"num_lines":"-1"},"data_refs":[{"uri":"alice.csv","party":"alice","format":"csv"}]},{"type":"sf.table.individual","meta":{"@type":"type.googleapis.com/secretflow.component.IndividualTable","schema":{"ids":["id2"],"features":["contact_cellular","contact_telephone","contact_unknown","month_apr","month_aug","month_dec","month_feb","month_jan","month_jul","month_jun","month_mar","month_may","month_nov","month_oct","month_sep","poutcome_failure","poutcome_other","poutcome_success","poutcome_unknown"],"labels":["y"],"id_types":["str"],"feature_types":["f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32","f32"],"label_types":["i32"]},"num_lines":"-1"},"data_refs":[{"uri":"bob.csv","party":"bob","format":"csv"}]}]},"sf_output_uris":["psi-output.csv"],"sf_output_ids":["psi-output"]}'
    image: secretflow/secretflow-lite-anolis8:latest
    imagePullPolicy: IfNotPresent
    name: secretflow
    ports:
    - containerPort: 54509
      name: spu
      protocol: TCP
    - containerPort: 8080
      name: fed
      protocol: TCP
    - containerPort: 8081
      name: global
      protocol: TCP
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /etc/kuscia/task-config.conf
      name: config-template
      subPath: task-config.conf
    workingDir: /work
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeSelector:
    kuscia.secretflow/namespace: bob
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: kuscia-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: kuscia.secretflow/agent
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      name: secretflow-task-20230727141300-single-psi-configtemplate
    name: config-template
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-07-27T06:13:00Z"
    message: '0/2 nodes are available: 2 node(s) had untolerated taint {node.kubernetes.io/disk-pressure:
      }. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.,
      reject the pod secretflow-task-20230727141300-single-psi-0 is unschedulable
      even after PostFilter.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort

The fate-alice container does exist.

gshilei commented on September 27, 2024

From the output above, the machine you are using is short on disk space, which is why the task Pods failed and the fate-alice container is behaving abnormally.
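
For reference, the disk-pressure diagnosis can be read off the PodScheduled condition message in the pod YAML posted earlier. Below is a minimal sketch of pulling the untolerated taint out of such a message; the function name and the regular expression are illustrative assumptions, not part of Kuscia or kubectl.

```python
import re

# Matches "<N> node(s) had untolerated taint {<key>: <value>}" in a scheduler message.
TAINT_RE = re.compile(
    r"(\d+) node\(s\) had untolerated taint \{([^:}\s]+):\s*([^}]*)\}"
)


def untolerated_taints(message):
    """Return a list of (node_count, taint_key, taint_value) tuples."""
    # kubectl wraps long strings across lines, so collapse all whitespace first.
    flat = " ".join(message.split())
    return [
        (int(count), key, value.strip())
        for count, key, value in TAINT_RE.findall(flat)
    ]
```

Fed the message from either pod above, this yields the node.kubernetes.io/disk-pressure key with an empty value for 2 nodes, which matches the "0/2 nodes are available" part of the message.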

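Fix: clean up the machine to free some disk space. You can check whether the disk-pressure taint is still present with the following command. As long as the taint remains, the task Pods cannot be scheduled, so tasks and jobs cannot run normally.
[image: command for checking the taint]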
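
As a rough sketch of such a check (not necessarily the command from the screenshot): the script below assumes kubectl is on PATH, reads the output of kubectl get nodes -o json, and lists the nodes that still carry the node.kubernetes.io/disk-pressure taint. The function names are hypothetical.

```python
import json
import subprocess

DISK_PRESSURE = "node.kubernetes.io/disk-pressure"


def nodes_with_taint(node_list, taint_key=DISK_PRESSURE):
    """Return the names of nodes whose spec.taints contains taint_key.

    node_list is the parsed JSON of `kubectl get nodes -o json`.
    """
    tainted = []
    for node in node_list.get("items", []):
        # spec.taints is absent (or null) on nodes without taints.
        taints = node.get("spec", {}).get("taints") or []
        if any(t.get("key") == taint_key for t in taints):
            tainted.append(node["metadata"]["name"])
    return tainted


def fetch_nodes():
    """Run `kubectl get nodes -o json` and parse its output."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def main():
    remaining = nodes_with_taint(fetch_nodes())
    if remaining:
        print("disk-pressure taint still present on:", ", ".join(remaining))
    else:
        print("no node carries the disk-pressure taint")
```

Running main() after freeing disk space should report no tainted nodes once the node has re-evaluated its disk usage; until then the Pending pods will stay unschedulable.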

santiago-wjq commented on September 27, 2024

Hi, are there still any errors after cleaning up the disk space?

cs1317 commented on September 27, 2024

After I switched to a different machine, the job ran successfully!

gshilei commented on September 27, 2024

OK, I'll close the issue then.
