Comments (15)
The status is now Available and prediction works. Thanks!
from kuscia.
Serving 0.4.x will be released soon, and the app_image section of the documentation will be updated along with it. You are welcome to try it out then 😁
from kuscia.
Detailed log:
2024-06-11T17:39:02.424966395+08:00 stdout F 2024-06-11 17:39:02.424 [info] [trace.cc:SetUpTracerProvider:137] no span processor configured, noop tracer will be used
2024-06-11T17:39:02.425013026+08:00 stdout F 2024-06-11 17:39:02.424 [info] [main.cc:main:83] version: 0.3.1b0
2024-06-11T17:39:02.425022007+08:00 stdout F 2024-06-11 17:39:02.424 [info] [main.cc:main:95] op list: MERGE_Y, DOT_PRODUCT, ARROW_PROCESSING, TREE_SELECT, TREE_MERGE, TREE_ENSEMBLE_PREDICT
2024-06-11T17:39:02.42504776+08:00 stdout F 2024-06-11 17:39:02.424 [info] [config_parser.cc:KusciaConfigParser:40] raw kuscia serving config content: {
2024-06-11T17:39:02.425055042+08:00 stdout F "serving_id": "zze-serving",
2024-06-11T17:39:02.425067115+08:00 stdout F "input_config": "{"partyConfigs": {"node1": {"serverConfig": {"featureMapping": {"x1": "x1", "x2": "x2"}}, "modelConfig": {"modelId": "model-export", "basePath": "/home/kuscia/var/storage/data", "sourcePath": "/home/kuscia/var/storage/data/model.tar", "sourceType": "ST_FILE"}, "featureSourceConfig": {"mockOpts": {}}, "channel_desc": {"protocol": "http"}}, "node2": {"serverConfig": {"featureMapping": {"x3": "x3"}}, "modelConfig": {"modelId": "model-export", "basePath": "/home/kuscia/var/storage/data", "sourcePath": "/home/kuscia/var/storage/data/model.tar", "sourceType": "ST_FILE"}, "featureSourceConfig": {"mockOpts": {}}, "channel_desc": {"protocol": "http"}}}}",
2024-06-11T17:39:02.425076789+08:00 stdout F "cluster_def": "{"parties":[{"name":"node1","role":"","services":[{"portName":"service","endpoints":["zze-serving-service.node1.svc:22504"]},{"portName":"communication","endpoints":["zze-serving-communication.node1.svc"]},{"portName":"internal","endpoints":["zze-serving-internal.node1.svc:22506"]},{"portName":"brpc-builtin","endpoints":["zze-serving-brpc-builtin.node1.svc:22503"]}]},{"name":"node2","role":"","services":[{"portName":"communication","endpoints":["zze-serving-communication.node2.svc"]}]}],"selfPartyIdx":0,"selfEndpointIdx":0}",
2024-06-11T17:39:02.425083731+08:00 stdout F "allocated_ports": "{"ports":[{"name":"service","port":22504,"scope":"Domain","protocol":"HTTP"},{"name":"communication","port":22505,"scope":"Cluster","protocol":"HTTP"},{"name":"internal","port":22506,"scope":"Domain","protocol":"HTTP"},{"name":"brpc-builtin","port":22503,"scope":"Domain","protocol":"HTTP"}]}"
2024-06-11T17:39:02.425089475+08:00 stdout F }
2024-06-11T17:39:02.425094965+08:00 stdout F
2024-06-11T17:39:02.426042651+08:00 stdout F 2024-06-11 17:39:02.425 [info] [source.cc:PullModel:52] remove tmp model file:/home/kuscia/var/storage/data/zze-serving/model-export/model_bundle.tar.gz
2024-06-11T17:39:02.426222942+08:00 stdout F 2024-06-11 17:39:02.426 [info] [filesystem_source.cc:OnPullModel:37] copy model file from /home/kuscia/var/storage/data/model.tar to /home/kuscia/var/storage/data/zze-serving/model-export/model_bundle.tar.gz
2024-06-11T17:39:02.432330243+08:00 stdout F 2024-06-11 17:39:02.432 [info] [model_loader.cc:Load:37] begin load file: /home/kuscia/var/storage/data/zze-serving/model-export/model_bundle.tar.gz
2024-06-11T17:39:02.432359529+08:00 stdout F 2024-06-11 17:39:02.432 [warning] [model_loader.cc:Load:43] remove tmp model dir: /home/kuscia/var/storage/data/zze-serving/model-export/data
2024-06-11T17:39:02.443818929+08:00 stdout F 2024-06-11 17:39:02.443 [info] [model_loader.cc:Load:82] end load model bundle, name: zze-model_156b6676-7b6d-483f-9160-b61e1d7b1be0, desc: zze model desc, graph version: 0.1.0
2024-06-11T17:39:02.444036952+08:00 stdout F 2024-06-11 17:39:02.443 [info] [thread_pool.h:Start:94] Create and start thread pool with 8 threads
2024-06-11T17:39:02.445483283+08:00 stdout F 2024-06-11 17:39:02.445 [info] [execution_core.cc:ExecutionCore:73] create feature adapter, type:1
2024-06-11T17:39:02.490860377+08:00 stdout F 2024-06-11 17:39:02.490 [info] [server.cc:Start:173] begin metrics service listen at 0.0.0.0:22506,
2024-06-11T17:39:02.490954294+08:00 stdout F 2024-06-11 17:39:02.490 [info] [model_info_collector.cc:ModelInfoCollector:100] local model info: party: node1 : {"name":"zze-model_156b6676-7b6d-483f-9160-b61e1d7b1be0","desc":"zze model desc","graphView":{"version":"0.1.0","nodeList":[{"name":"ss_sgd_1_dot","op":"DOT_PRODUCT","opVersion":"0.0.2"},{"name":"ss_sgd_1_merge_y","op":"MERGE_Y","parents":["ss_sgd_1_dot"],"opVersion":"0.0.2"}],"executionList":[{"nodes":["ss_sgd_1_dot"],"config":{"dispatchType":"DP_ALL"}},{"nodes":["ss_sgd_1_merge_y"],"config":{"dispatchType":"DP_ANYONE"}}]}}
2024-06-11T17:39:02.493223669+08:00 stdout F 2024-06-11 17:39:02.493 [info] [server.cc:Start:230] begin communication server listen at 0.0.0.0:22505,
2024-06-11T17:39:02.493247336+08:00 stdout F 2024-06-11 17:39:02.493 [info] [server.cc:Start:243] brpc built-in service port: 22503
2024-06-11T17:39:02.496824273+08:00 stdout F 2024-06-11 17:39:02.496 [info] [server.cc:Start:273] begin service server listen at 0.0.0.0:22504,
2024-06-11T17:39:02.496843471+08:00 stdout F 2024-06-11 17:39:02.496 [info] [server.cc:Start:276] start exchange model_info
2024-06-11T17:39:02.501598612+08:00 stdout F 2024-06-11 17:39:02.501 [warning] [model_info_collector.cc:TryCollect:148] call (node2) from (node1) GetModelInfo failed, msg:[E1010]HTTP/1.1 404 Not Found, may need retry
2024-06-11T17:39:07.504682308+08:00 stdout F 2024-06-11 17:39:07.504 [warning] [model_info_collector.cc:TryCollect:148] call (node2) from (node1) GetModelInfo failed, msg:[E1010]HTTP/1.1 404 Not Found, may need retry
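The `allocated_ports` entry in the log above lists the ports Kuscia actually assigned to this serving instance, and the later "listen at" lines use the same numbers. As a rough sketch for reading that entry (the string is copied from the log; `ports_by_name` is a hypothetical helper, not part of Kuscia or SecretFlow-Serving):

```python
import json

# Value of "allocated_ports" copied from the log above.
ALLOCATED_PORTS = (
    '{"ports":[{"name":"service","port":22504,"scope":"Domain","protocol":"HTTP"},'
    '{"name":"communication","port":22505,"scope":"Cluster","protocol":"HTTP"},'
    '{"name":"internal","port":22506,"scope":"Domain","protocol":"HTTP"},'
    '{"name":"brpc-builtin","port":22503,"scope":"Domain","protocol":"HTTP"}]}'
)


def ports_by_name(allocated_ports_json):
    """Hypothetical helper: map each port name to its allocated port number."""
    ports = json.loads(allocated_ports_json)["ports"]
    return {p["name"]: p["port"] for p in ports}


if __name__ == "__main__":
    for name, port in ports_by_name(ALLOCATED_PORTS).items():
        print(f"{name}: {port}")
```

Comparing this mapping with the ports a probe or client is pointed at is one quick way to spot a mismatch.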
from kuscia.
@Sept98 hi
Could you share the parameters of your create request?
Has the other party started its serving service as well?
from kuscia.
Parameters of the create request:
{
  "serving_id": "zze-serving",
  "serving_input_config": "{"partyConfigs": {"node1": {"serverConfig": {"featureMapping": {"x1": "x1", "x2": "x2"}}, "modelConfig": {"modelId": "model-export", "basePath": "/home/kuscia/var/storage/data", "sourcePath": "/home/kuscia/var/storage/data/model.tar", "sourceType": "ST_FILE"}, "featureSourceConfig": {"mockOpts": {}}, "channel_desc": {"protocol": "http"}}, "node2": {"serverConfig": {"featureMapping": {"x3": "x3"}}, "modelConfig": {"modelId": "model-export", "basePath": "/home/kuscia/var/storage/data", "sourcePath": "/home/kuscia/var/storage/data/model.tar", "sourceType": "ST_FILE"}, "featureSourceConfig": {"mockOpts": {}}, "channel_desc": {"protocol": "http"}}}}",
  "initiator": "node1",
  "parties": [
    {
      "domain_id": "node1",
      "app_image": "secretflow-serving-image",
      "role": "",
      "replicas": 1,
      "resources": []
    },
    {
      "domain_id": "node2",
      "app_image": "secretflow-serving-image",
      "role": "",
      "replicas": 1,
      "resources": []
    }
  ]
}
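In the request above, `serving_input_config` is itself a JSON document carried as a string value. As displayed here its inner quotes are unescaped, which may just be a rendering artifact of the page. A minimal sketch of building the body so that the nested config is escaped correctly, assuming the request is assembled in Python (`party_config` and `build_create_request` are hypothetical helpers; the values only mirror those in the post):

```python
import json


def party_config(feature_mapping, model_path="/home/kuscia/var/storage/data/model.tar"):
    """Hypothetical helper: per-party config with the fields shown in the post."""
    return {
        "serverConfig": {"featureMapping": feature_mapping},
        "modelConfig": {
            "modelId": "model-export",
            "basePath": "/home/kuscia/var/storage/data",
            "sourcePath": model_path,
            "sourceType": "ST_FILE",
        },
        "featureSourceConfig": {"mockOpts": {}},
        "channel_desc": {"protocol": "http"},
    }


def build_create_request(serving_id, party_configs, initiator, app_image):
    """Hypothetical helper: assemble the create-request body."""
    return {
        "serving_id": serving_id,
        # json.dumps turns the nested config into one escaped JSON string.
        "serving_input_config": json.dumps({"partyConfigs": party_configs}),
        "initiator": initiator,
        "parties": [
            {"domain_id": d, "app_image": app_image, "role": "",
             "replicas": 1, "resources": []}
            for d in party_configs
        ],
    }


request_body = build_create_request(
    "zze-serving",
    {
        "node1": party_config({"x1": "x1", "x2": "x2"}),
        "node2": party_config({"x3": "x3"}),
    },
    initiator="node1",
    app_image="secretflow-serving-image",
)

if __name__ == "__main__":
    # Serializing the whole body escapes the inner quotes of the nested string.
    print(json.dumps(request_body, indent=2))
```

Serializing twice like this keeps the outer body valid JSON regardless of what the inner config contains.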
from kuscia.
The next step was to query the serving status:
The status stays at Progressing.
from kuscia.
Please check whether your model package is a tar or a tar.gz archive, and whether the sourcePath is correct.
from kuscia.
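Also, on both nodes the model has been copied to {basePath}/{modelId} in the corresponding folder.
If sourcePath were wrong, the copy presumably would not have succeeded, right?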
from kuscia.
The model is a tar package; when I saved it I simply gave it the .tar name. Would that cause a problem?
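Since the file name alone does not say whether an archive is gzip-compressed, one way to check is to look at the file contents. A minimal sketch using only the Python standard library (`archive_kind` is a hypothetical helper; it says nothing about which format serving itself requires):

```python
import tarfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def archive_kind(path):
    """Return "tar.gz", "tar", or "unknown" based on the file contents."""
    with open(path, "rb") as f:
        is_gzip = f.read(2) == GZIP_MAGIC
    # "r:gz" reads a gzip-compressed tar, "r:" an uncompressed one.
    mode = "r:gz" if is_gzip else "r:"
    try:
        with tarfile.open(path, mode):
            pass
    except (tarfile.TarError, OSError):
        return "unknown"
    return "tar.gz" if is_gzip else "tar"


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        print(p, archive_kind(p))
```

Running it against the file at sourcePath on each node shows what the package really is, independent of its extension.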
from kuscia.
Was this step performed on both nodes?
from kuscia.
Could you post your serving app_image configuration? For reference, see https://www.secretflow.org.cn/zh-CN/docs/kuscia/v0.8.0b0/reference/concepts/appimage_cn#id2
from kuscia.
I changed the image. Oh, does this configuration also need to match what is in the create request? Here it is:
apiVersion: kuscia.secretflow/v1alpha1
kind: AppImage
metadata:
  name: secretflow-serving-image
spec:
  configTemplates:
    serving-config.conf: |
      {
        "serving_id": "{{.SERVING_ID}}",
        "input_config": "{{.INPUT_CONFIG}}",
        "cluster_def": "{{.CLUSTER_DEFINE}}",
        "allocated_ports": "{{.ALLOCATED_PORTS}}"
      }
  deployTemplates:
    - name: secretflow
      replicas: 1
      spec:
        containers:
          - command:
              - sh
              - -c
              - ./secretflow_serving --flagfile=conf/gflags.conf --config_mode=kuscia --serving_config_file=/etc/kuscia/serving-config.conf
            configVolumeMounts:
              - mountPath: /etc/kuscia/serving-config.conf
                subPath: serving-config.conf
            name: secretflow
            ports:
              - name: service
                port: 53508
                protocol: HTTP
                scope: Domain
              - name: communication
                port: 53509
                protocol: HTTP
                scope: Cluster
              - name: internal
                port: 53510
                protocol: HTTP
                scope: Domain
              - name: brpc-builtin
                port: 53511
                protocol: HTTP
                scope: Domain
            readinessProbe:
              httpGet:
                path: /health
                port: 53511
            livenessProbe:
              httpGet:
                path: /health
                port: 53511
            startupProbe:
              failureThreshold: 30
              httpGet:
                path: /health
                port: 53511
              periodSeconds: 10
              successThreshold: 1
              timeoutSeconds: 1
            workingDir: /root/sf_serving
  image:
    name: secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/serving-anolis8
    tag: 0.3.1b0
from kuscia.
Kuscia now allocates some ports automatically, so the app_image needs to be written differently. Try changing it to this:
apiVersion: kuscia.secretflow/v1alpha1
kind: AppImage
metadata:
  name: secretflow-serving-image
spec:
  configTemplates:
    serving-config.conf: |
      {
        "serving_id": "{{.SERVING_ID}}",
        "input_config": "{{.INPUT_CONFIG}}",
        "cluster_def": "{{.CLUSTER_DEFINE}}",
        "allocated_ports": "{{.ALLOCATED_PORTS}}"
      }
  deployTemplates:
    - name: secretflow
      replicas: 1
      spec:
        containers:
          - command:
              - sh
              - -c
              - ./secretflow_serving --flagfile=conf/gflags.conf --config_mode=kuscia --serving_config_file=/etc/kuscia/serving-config.conf
            configVolumeMounts:
              - mountPath: /etc/kuscia/serving-config.conf
                subPath: serving-config.conf
            name: secretflow
            ports:
              - name: service
                port: 53508
                protocol: HTTP
                scope: Domain
              - name: communication
                port: 53509
                protocol: HTTP
                scope: Cluster
              - name: internal
                port: 53510
                protocol: HTTP
                scope: Domain
              - name: brpc-builtin
                port: 53511
                protocol: HTTP
                scope: Domain
            readinessProbe:
              httpGet:
                path: /health
                port: brpc-builtin
            livenessProbe:
              httpGet:
                path: /health
                port: brpc-builtin
            startupProbe:
              failureThreshold: 30
              httpGet:
                path: /health
                port: brpc-builtin
              periodSeconds: 10
              successThreshold: 1
              timeoutSeconds: 1
            workingDir: /root/sf_serving
  image:
    name: secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/serving-anolis8
    tag: 0.3.1b0
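The difference between the two AppImage versions in this thread is that the probes refer to the port by name (`brpc-builtin`) instead of the fixed number 53511. A rough sketch of a check for that, assuming the container section has already been parsed into a Python dict (for example with a YAML library); `probe_port_issues` is a hypothetical helper, not a Kuscia tool:

```python
PROBE_KEYS = ("readinessProbe", "livenessProbe", "startupProbe")


def probe_port_issues(container):
    """Hypothetical check: list probes that do not reference a declared port name."""
    declared = {p["name"] for p in container.get("ports", [])}
    issues = []
    for key in PROBE_KEYS:
        port = container.get(key, {}).get("httpGet", {}).get("port")
        if port is None:
            continue
        if isinstance(port, int):
            # A fixed number will not follow an automatically allocated port.
            issues.append((key, port, "hard-coded port number"))
        elif port not in declared:
            issues.append((key, port, "port name not declared"))
    return issues


if __name__ == "__main__":
    container = {
        "ports": [{"name": "brpc-builtin", "port": 53511}],
        "readinessProbe": {"httpGet": {"path": "/health", "port": 53511}},
        "livenessProbe": {"httpGet": {"path": "/health", "port": "brpc-builtin"}},
    }
    for issue in probe_port_issues(container):
        print(issue)
```

An empty result means every probe refers to a port name that the container declares.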
from kuscia.