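I followed the steps in the operating documentation:
Peer-to-peer (P2P) networking mode
Start the cluster. This brings up two docker containers, which represent the Autonomy nodes alice and bob.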
./start_standalone.sh p2p
Log in to the alice node container (or the bob node container).
docker exec -it ${USER}-kuscia-autonomy-alice bash
Create and start a job (a two-party PSI task).
scripts/user/create_example_job.sh
Check the job status.
kubectl get kj
docker logs root-kuscia-autonomy-bob
2023-07-17 11:19:32.102 INFO status/status_manager.go:634 Status for pod "secretflow-task-20230717111921-single-psi-0_bob(5826f340-a6ee-4854-bded-47f0469804a8)" updated successfully, statusVersion=4, status={Failed [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2023-07-17 11:19:24 +0800 CST } {Ready False 0001-01-01 00:00:00 +0000 UTC 2023-07-17 11:19:31 +0800 CST PodFailed } {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2023-07-17 11:19:31 +0800 CST PodFailed } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2023-07-17 11:19:24 +0800 CST }] 172.18.0.3 [] 2023-07-17 11:19:24 +0800 CST [] [{secretflow {nil nil &ContainerStateTerminated{ExitCode:1,Signal:0,Reason:Error,Message:entry.py", line 255, in
main()
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1157, in call
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/secretflow/kuscia/entry.py", line 237, in main
sf_cluster_config = get_sf_cluster_config(task_conf, datamesh_addr, datasource_id)
File "/usr/local/lib/python3.8/site-packages/secretflow/kuscia/sf_config.py", line 142, in get_sf_cluster_config
return compose_sf_cluster_config(
File "/usr/local/lib/python3.8/site-packages/secretflow/kuscia/sf_config.py", line 86, in compose_sf_cluster_config
domain_data_source = get_domain_data_source(stub, datasource_id)
File "/usr/local/lib/python3.8/site-packages/secretflow/kuscia/datamesh.py", line 94, in get_domain_data_source
ret = stub.QueryDomainDataSource(QueryDomainDataSourceRequest(datasource_id=id))
File "/usr/local/lib/python3.8/site-packages/grpc/_channel.py", line 946, in call
return _end_unary_response_blocking(state, call, False, None)
File "/usr/local/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "DNS resolution failed for datamesh:8071: C-ares status is not ARES_SUCCESS qtype=A name=datamesh is_balancer=0: Domain name not found"
debug_error_string = "UNKNOWN:DNS resolution failed for datamesh:8071: C-ares status is not ARES_SUCCESS qtype=A name=datamesh is_balancer=0: Domain name not found {created_time:"2023-07-17T03:19:31.226849931+00:00", grpc_status:14}"
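
The traceback above ends in a gRPC `UNAVAILABLE` error because the name `datamesh` could not be resolved from inside the task container. One way to narrow this down is to test name resolution directly from the environment where the task runs. The following is a minimal sketch, not part of Kuscia or SecretFlow: the helper name `can_resolve` is hypothetical, and the host and port are taken from the error message. Note that it uses the system resolver through Python's `socket` module, while gRPC used c-ares here, so the two can disagree in unusual resolver setups.

```python
import socket


def can_resolve(host: str, port: int) -> bool:
    """Return True if `host` resolves to at least one address, else False."""
    try:
        return len(socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)) > 0
    except socket.gaierror:
        # Raised when the resolver cannot find the name, which corresponds to
        # the "Domain name not found" condition reported in the log.
        return False


if __name__ == "__main__":
    # "datamesh" and 8071 come from the error message in the log above.
    host, port = "datamesh", 8071
    status = "resolves" if can_resolve(host, port) else "does NOT resolve"
    print(f"{host}:{port} {status}")
```

If the name does not resolve here either, the problem is likely in the container's DNS or hosts configuration rather than in the PSI task itself.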