Comments (20)

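koutann commented on September 27, 2024

Thanks for the reply, and sorry for the late response, @gshilei.
Storing serviceStatuses separately, the same way as podStatuses, makes later extension easier. Adding a custom service time parameter in the endpoint handling and having KusciaTask pick up the change through List-Watch keeps the two sides decoupled. The improvements you pointed out make sense and the approach is workable, so I am going to start development along these lines.
I plan to finish development this week and submit a PR next week after verification.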

from kuscia.

hdkoutann commented on September 27, 2024

Please assign this to me.

gshilei commented on September 27, 2024

Hi @hdkoutann, use the commands below to download the dependency images:
docker pull secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/kuscia-deps:0.1.0b0
docker tag secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/kuscia-deps:0.1.0b0 docker.io/secretflow/kuscia-deps:0.1.0b0

docker pull secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/kuscia-envoy:0.2.0b0
docker tag secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/kuscia-envoy:0.2.0b0 docker.io/secretflow/kuscia-envoy:0.2.0b0

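hdkoutann commented on September 27, 2024

  1. Overview

    1. Pod resource time observation

      This covers two things: the timestamp at which the node creates the pod, and the timestamp at which the pod starts successfully.

      The TaskStatus definition of KusciaTask already contains podStatuses, which lists every pod that the task parties need to create. Add two fields to podStatuses in the current version:

      1. podCreateTime (pod creation time)
      2. podStartupTime (pod startup time)

      podStatuses:
          alice/secretflow-task-psi-0:
            podCreateTime: "2023-06-26T03:46:58Z"
            podStartupTime: "2023-06-26T03:46:58Z"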
    2. Service (Endpoints) resource time observation

      The current KusciaTask CRD definition has nothing related to services, so fields need to be added.

      Add an envoyServices field under each podStatuses entry, with a decorateTime timestamp recording when the service was processed:

      podStatuses:
          alice/secretflow-task-psi-0:
            podCreateTime: "2023-06-26T03:46:58Z"
            podStartupTime: "2023-06-26T03:46:58Z"
            envoyServices:
              - serviceName: task-template-psi-0-global
                decorateTime: "2023-06-26T03:46:58Z"
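  2. Detailed design

    1. Adding the pod creation and startup times

      The KusciaTask handling logic already watches pod status and updates accordingly. Add a check on the pod status in pkg\controllers\kusciatask\handler\running_handler.go:

      When the pod is in the Pending state, update the podCreateTime field in podStatuses.

      When the pod is in the Running state, update the podStartupTime field in podStatuses.

      This should also cover the case where a pod dies and is restarted automatically.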

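    2. Adding the envoy completion time

      The envoy post-processing logic lives in pkg\gateway\controller\endpoints.go, which creates the envoy node after observing a Service change event.

      The Service change event gives access to the kuscia Service. The pod can be found through the Service's OwnerReferences, and the KusciaTask through the pod's OwnerReferences. The decorateTime field under envoyServices is then updated on that KusciaTask.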
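
For illustration only, the two steps of the detailed design could be sketched as below. This is Python over plain dicts shaped like Kubernetes JSON objects, not the project's Go code. The helper names (`record_pod_times`, `find_owner`, `task_for_service`), the `pods_by_name` lookup, and passing the previous phase explicitly are all assumptions made for the example. The Service -> Pod -> KusciaTask ownership chain is taken from the description above and has not been checked against the codebase.

```python
def record_pod_times(pod_statuses, pod_key, old_phase, new_phase, now):
    """Stamp podCreateTime / podStartupTime when a pod enters a new phase.

    Only phase transitions are recorded, so a restarted pod that goes
    back through Pending and Running gets fresh timestamps.
    """
    entry = pod_statuses.setdefault(pod_key, {})
    if new_phase == old_phase:
        return entry  # no transition, nothing to update
    if new_phase == "Pending":
        entry["podCreateTime"] = now
    elif new_phase == "Running":
        entry["podStartupTime"] = now
    return entry


def find_owner(obj, kind):
    """Return the name of the first ownerReference of the given kind, or None."""
    refs = obj.get("metadata", {}).get("ownerReferences") or []
    for ref in refs:
        if ref.get("kind") == kind:
            return ref.get("name")
    return None


def task_for_service(service, pods_by_name):
    """Walk Service -> Pod -> KusciaTask through ownerReferences (assumed chain)."""
    pod_name = find_owner(service, "Pod")
    pod = pods_by_name.get(pod_name) if pod_name else None
    if pod is None:
        return None
    return find_owner(pod, "KusciaTask")


# Example: a pod observed first as Pending, then as Running.
statuses = {}
key = "alice/secretflow-task-psi-0"
record_pod_times(statuses, key, None, "Pending", "2023-06-26T03:46:58Z")
record_pod_times(statuses, key, "Pending", "Running", "2023-06-26T03:47:05Z")
```

Recording only on transitions is one way to handle the restart case mentioned above; whether the real handler should overwrite or keep the first timestamp is a design decision the thread leaves open.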

hdkoutann commented on September 27, 2024

Would this design work?

gshilei commented on September 27, 2024

@hdkoutann Thanks for getting involved. A few suggestions on the design above:

  1. Pod resource time observation
  • podStartupTime can be split more finely into scheduledTime and readyTime. The podStatuses structure then looks like this:
podStatuses:
   alice/secretflow-task-psi-0:
     ...
     createdTime: "2023-06-26T03:46:58Z"       -> pod metadata.creationTimestamp
     scheduledTime: "2023-06-26T03:47:01Z"     -> pod status.conditions[PodScheduled].lastTransitionTime
     readyTime: "2023-06-26T03:47:05Z"         -> pod status.conditions[Ready].lastTransitionTime
  2. Service (Endpoints) resource time observation
  • Add a serviceStatuses field to hold the time information of each service.
  • Each service entry has two fields: createdTime and readyTime.
  • readyTime: when the gateway handles a watched service and endpoint, it ends by calling AddEnvoyCluster. After the envoy cluster has been added successfully, the gateway adds the annotation kuscia.secretflow/ready-time to that service. The KusciaTask controller sees the service change through the List-Watch mechanism and updates the serviceStatuses field in the KusciaTask.
podStatuses:
  ...
serviceStatuses:
  alice/secretflow-task-xxx-single-psi-0-spu:
    createdTime: "2023-06-26T03:46:58Z"    -> service metadata.creationTimestamp
    readyTime: "2023-06-26T03:46:59Z"      -> service annotation kuscia.secretflow/ready-time: 2023-06-26T03:46:59Z
  alice/secretflow-task-xxx-single-psi-0-global:
    createdTime: "2023-06-26T03:46:58Z"
    readyTime: "2023-06-26T03:46:59Z"
  bob/secretflow-task-xxx-single-psi-0-spu:
    createdTime: "2023-06-26T03:46:58Z"
    readyTime: "2023-06-26T03:46:59Z"
  bob/secretflow-task-xxx-single-psi-0-global:
    createdTime: "2023-06-26T03:46:58Z"
    readyTime: "2023-06-26T03:46:59Z"
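
The controller side of the readyTime flow above might look roughly like this. It is a Python sketch over dicts shaped like Service JSON, not Kuscia's Go implementation. `service_status` and `build_service_statuses` are hypothetical names, and keying entries by `namespace/name` is inferred from the example keys above. The annotation key is the one named in this comment.

```python
READY_TIME_ANNOTATION = "kuscia.secretflow/ready-time"


def service_status(service):
    """Derive createdTime / readyTime for one Service object (as a dict)."""
    meta = service["metadata"]
    status = {"createdTime": meta["creationTimestamp"]}
    # readyTime only exists once the gateway has written the annotation.
    ready = (meta.get("annotations") or {}).get(READY_TIME_ANNOTATION)
    if ready:
        status["readyTime"] = ready
    return status


def build_service_statuses(services):
    """Map "<namespace>/<name>" to the status entry of each Service."""
    return {
        f'{s["metadata"]["namespace"]}/{s["metadata"]["name"]}': service_status(s)
        for s in services
    }
```

A service whose envoy cluster has not been added yet simply has no readyTime entry, which keeps the gateway and the KusciaTask controller decoupled: the annotation is the only thing they share.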

hdkoutann commented on September 27, 2024

make image is extremely slow on my Mac, to the point that it basically never finishes. Is there any way to speed it up? @gshilei

gshilei commented on September 27, 2024

Can you check which step of the make process is slow?

hdkoutann commented on September 27, 2024

[screenshot]
Downloading the kuscia-envoy and kuscia-deps dependencies is very slow.

hdkoutann commented on September 27, 2024

I also tried on a CentOS virtual machine and it does not work there either. Is the machine underpowered, or is something wrong with the network configuration?

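gshilei commented on September 27, 2024

Judging from the above, the two base images that the Kuscia image build depends on, secretflow/kuscia-envoy:0.2.0b0 and secretflow/kuscia-deps:0.1.0b0, are downloading slowly. You could try configuring a registry mirror (https://gist.github.com/y0ngb1n/7e8f16af3242c7815e7ca2f0833d3ea6) and see whether it helps.

Before building the Kuscia image, you can also pull the two images to your local machine manually.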

hdkoutann commented on September 27, 2024

I can't open https://gist.github.com/y0ngb1n/7e8f16af3242c7815e7ca2f0833d3ea6.

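gshilei commented on September 27, 2024

Hi @hdkoutann, sorry for the late reply. The time fields under podStatuses need one more adjustment so that the timestamp of each stage is shown at a finer granularity. The suggestion is as follows.

All four time fields below will be defined up front. scheduleTime is left empty in this iteration; the other fields are filled with actual values.

podStatuses:
  alice/secretflow-task-psi-0:
    ...
    createTime: "2023-06-26T03:46:58Z"        -> 1. pod creation time, from pod.metadata.creationTimestamp
    scheduleTime: "2023-06-26T03:40:00Z"      -> 2. pod scheduling time, not filled in this iteration
    startTime: "2023-06-26T03:47:02Z"         -> 3. time the pod was accepted by the agent, from pod.status.startTime
    readyTime: "2023-06-26T03:47:05Z"         -> 4. time the pod was brought up by the agent, from pod.status.conditions[Ready].lastTransitionTime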
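
As a rough illustration of the field mapping above, here is a Python sketch that reads the timestamps from a pod represented as a dict (the shape of `kubectl get pod -o json`). `condition_time` and `pod_times` are hypothetical helpers, and requiring the Ready condition to have status "True" is an assumption of the sketch. The actual controller is written in Go and may differ.

```python
def condition_time(pod, cond_type):
    """Return lastTransitionTime of a condition that is currently True, else None."""
    conditions = pod.get("status", {}).get("conditions") or []
    for cond in conditions:
        if cond.get("type") == cond_type and cond.get("status") == "True":
            return cond.get("lastTransitionTime")
    return None


def pod_times(pod):
    """Collect the time fields for one podStatuses entry.

    scheduleTime is deliberately not set, matching the note that it is
    left empty in this iteration. Fields without a value are omitted.
    """
    times = {
        "createTime": pod.get("metadata", {}).get("creationTimestamp"),
        "startTime": pod.get("status", {}).get("startTime"),
        "readyTime": condition_time(pod, "Ready"),
    }
    return {name: value for name, value in times.items() if value is not None}
```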

hdkoutann commented on September 27, 2024

Understood, I'll adjust it.

hdkoutann commented on September 27, 2024

[screenshot]
Adjusted. The createTime and scheduleTime fields are named differently from the earlier createdTime and scheduledTime definitions. Should those be renamed to match as well?

gshilei commented on September 27, 2024

Those two field names need to change:

  • createdTime becomes createTime
  • scheduledTime becomes scheduleTime

hdkoutann commented on September 27, 2024

Done. I also renamed the service createTime so the style stays consistent. @gshilei

hdkoutann commented on September 27, 2024

Centralized mode
[screenshot]
[screenshot]

hdkoutann commented on September 27, 2024

P2P mode
[screenshot]
[screenshot]

hdkoutann commented on September 27, 2024

Verification passed. @gshilei
