Comments (10)
from graph-learn.
By the way, running
export DgsServiceIP=$(kubectl get ingress --namespace default dgs-u2i-frontend-ingress --output jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo $DgsServiceIP
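also prints no service IP; the output is empty.
Running kubectl get ingress dgs-u2i-frontend-ingress
returns the following:
The ADDRESS field is empty. Why is that? Could it be because I have not configured the nginx controller correctly?
Running kubectl get svc
returns the following:
Can anyone help me?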
from graph-learn.
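Could you provide more detailed logs from the serving worker? The log you posted so far only shows that the serving worker registered with the coordinator; it does not show the worker reaching the ready state. The service port only becomes accessible once the serving worker is ready. @Homura2333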
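One way to look at readiness from the Kubernetes side is to read the pod's Ready condition. This is only a rough sketch: the function names are made up for illustration, the pod name is supplied by the caller, and whether the pod-level Ready condition reflects the serving worker's own ready state depends on how the chart defines its probes, which is an assumption here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; nothing here is specific to DGS.

# Print the status of the pod's Ready condition ("True" or "False").
pod_ready_status() {
  local pod="$1" ns="${2:-default}"
  kubectl get pod "$pod" --namespace "$ns" \
    --output jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Return success only when the pod reports Ready=True.
is_pod_ready() {
  [ "$(pod_ready_status "$@")" = "True" ]
}
```

For example, `is_pod_ready <serving-worker-pod> default && echo ready`, with the real pod name taken from `kubectl get pods`. If the pod is not ready, `kubectl describe pod <serving-worker-pod>` and `kubectl logs <serving-worker-pod> --previous` may show events or output from an earlier container run.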
from graph-learn.
This is all that the more detailed log shows:
Is there any other way to get more useful logs?
from graph-learn.
Inside the container, the logs under /serving_workdir/package/bin are as follows:
from graph-learn.
@Homura2333 It looks like the serving worker only registered with the coordinator but never received the init info, so it never actually started. Could you post the coordinator logs as well?
from graph-learn.
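@Homura2333 Judging from the coordinator log, the serving worker's startup flow is normal, but nothing is logged after init, which suggests the serving worker hung or hit an error during the init phase. I suggest checking for causes such as a conflict in the configured ports or in the k8s storage, then deleting this release and installing it again.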
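Deleting the release could look roughly like the sketch below. It assumes Helm 3, and that the release is named dgs-u2i in the default namespace (inferred from the ingress name earlier in this thread); `reset_release` is a hypothetical helper name. The PVC listing is there because persistent volume claims may be left behind after an uninstall and could be one source of a storage conflict.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove a Helm release, then show any PVCs that remain.
reset_release() {
  local release="$1" ns="${2:-default}"
  # Remove the release and the resources Helm manages for it.
  helm uninstall "$release" --namespace "$ns"
  # List remaining PersistentVolumeClaims so leftovers can be inspected by hand.
  kubectl get pvc --namespace "$ns"
}
```

Usage would be `reset_release dgs-u2i default`, followed by rerunning the original helm install command.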
from graph-learn.
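I ran into a problem while setting up DGS as well: after running the following command, all DGS-related pods end up in the Error state.
The command I ran was: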
helm install dgs-u2i DGS/dgs --set frontend.ingressHostName="dynamic-graph-service.info" --set-file graphSchema=./conf/u2i/schema.u2i.json --set kafka.dl2spl.servers=[localhost:9092] --set kafka.dl2spl.topic="record-batches" --set kafka.dl2spl.partitions=4 --set kafka.spl2srv.servers=[localhost:9092] --set kafka.spl2srv.topic="sample-batches" --set kafka.spl2srv.partitions=4 --set glog.toConsole=true
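The pod status is as follows:
The pod logs show that the expected directories, such as /coordinator_workdir, /serving_workdir, and /sampling_workdir, do not exist:
How should I resolve this? Could it be a problem with the container image?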
from graph-learn.
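@Homura2333, it looks like the container in the pod failed to download the package it needs to run. From within your k8s cluster, you can run wget https://graphlearn.oss-cn-hangzhou.aliyuncs.com/package/dgs-built-1.0.0.tgz
to confirm whether the package can be downloaded, or to investigate further for network issues.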
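A possible way to run that wget from the cluster's network is a throwaway pod, sketched below. The function name, the pod name dgs-net-check, and the alpine image are assumptions for illustration; any image that ships a wget with HTTPS support should do. This tests a fresh pod rather than the failing DGS pods themselves, so it only approximates their network conditions.

```shell
#!/usr/bin/env bash
# Package URL taken from the comment above.
PACKAGE_URL="https://graphlearn.oss-cn-hangzhou.aliyuncs.com/package/dgs-built-1.0.0.tgz"

# Hypothetical helper: start a temporary pod, try the download, discard the file.
check_package_download() {
  local url="${1:-$PACKAGE_URL}" image="${2:-alpine}"
  # --rm deletes the pod afterwards; it requires attaching to the container.
  kubectl run dgs-net-check --rm --attach --restart=Never \
    --image="$image" --command -- wget -O /dev/null "$url"
}
```

Calling `check_package_download` with no arguments uses the URL above; wget's own output indicates whether the download went through.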
from graph-learn.