jobflow's Introduction

JobFlow

Project Status

This project is being donated to the Volcano community.

Introduction

Volcano is a CNCF sandbox project that aims to run traditional batch jobs on Kubernetes. It abstracts those batch jobs into a CRD called VCJob and has an excellent scheduler that improves resource utilization. However, solving a real-world problem usually takes many VCJobs cooperating with each other, and they have to be orchestrated manually or by another job orchestration platform to get the work done. We present a new way of orchestrating VCJobs called JobFlow. We propose two concepts for running multiple batch jobs automatically, JobTemplate and JobFlow, so that end users can easily declare their jobs and run them using complex control primitives such as sequential or parallel execution, if-then-else statements, switch-case statements, and loops.

JobFlow helps migrate AI, big data, and HPC workloads to the cloud-native world. Although some workflow engines already exist, they are not designed for batch job workloads. Those jobs typically have complex dependencies and take a long time to run, for example days or weeks. JobFlow lets end users declare their jobs as JobTemplates and then reuse them as needed. JobFlow also orchestrates those jobs using complex control primitives and launches them automatically. This can significantly reduce the time a complex job takes and improve resource utilization. Finally, JobFlow is not a general-purpose workflow engine; it knows the details of VCJobs. End users can get a better understanding of their jobs, for example each job's running state, start and end timestamps, the next jobs to run, pod-failure-ratio, and so on.
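
As a rough illustration of sequential and parallel execution, the sketch below reuses only the fields that appear in the JobFlow manifest quoted in an issue further down this page (flows, name, dependsOn.targets). The flow name and the JobTemplate names prepare, train-a, train-b, and report are hypothetical. It is also an assumption that a flow without dependsOn starts immediately, that flows sharing the same dependency run in parallel, and that several targets can be listed together.

```yaml
# Hypothetical sketch, not taken from the project documentation.
# Assumes JobTemplates named prepare, train-a, train-b and report already exist.
apiVersion: batch.volcano.sh/v1alpha1
kind: JobFlow
metadata:
  name: example-flow        # hypothetical name
  namespace: default
spec:
  jobRetainPolicy: retain   # keep the generated jobs after the flow finishes
  flows:
    - name: prepare         # no dependsOn: assumed to start first
    - name: train-a         # train-a and train-b both wait for prepare,
      dependsOn:            # so they are assumed to run in parallel
        targets: ['prepare']
    - name: train-b
      dependsOn:
        targets: ['prepare']
    - name: report          # assumed to wait for both training jobs
      dependsOn:
        targets: ['train-a', 'train-b']
```

The if-then-else, switch-case, and loop primitives are left out because their syntax does not appear anywhere on this page.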

Demo video

https://www.bilibili.com/video/BV1c44y1Y7FX

Deploy

kubectl apply -f https://raw.githubusercontent.com/BoCloud/JobFlow/main/deploy/jobflow.yaml

Donation Self-Check Form

| ID | Item | Description | Required | Compliance Conditions | Note | Complete |
|----|------|-------------|----------|-----------------------|------|----------|
| 1 | Code of Conduct | The conduct for the source code | Y | Contributor Covenant Code of Conduct | Submit the code scanning report | yes |
| 2 | License | The License the project obeys | Y | Apache 2.0 | | yes |
| 3 | Readme | Brief introduction of the project along with the source code | Y | | | yes |
| 4 | CI/CD | The CI/CD to judge the compliance for all PRs | Y | GitHub Action | | yes |
| 5 | Security | Security policy including vulnerability discovery and disposal | Y | Security Release Process | Submit security scanning report | yes |
| 6 | Roadmap | Roadmap file about the important features in the future | Y | | | yes |
| 7 | Design Documentations | Documentations about the record of feature designs | Y | | | yes |

jobflow's People

Contributors

hecarimv, hwdef, sssl93, zhangzhenhua, zhoumingcheng

jobflow's Issues

How to use jobflow to depend on the status of a job in the previous task?

Hi, I used jobflow to build a TensorFlow ps-worker task. When the distributed TensorFlow training finished, the ps job was terminated and the workers succeeded, but jobflow could not start the next job, which depends on the whole previous task. It looks like jobflow was affected by the state of the ps job.
So I would like to know how to start the next job based on just one part of the whole task (such as the completed worker job).
Hope to hear back soon, thanks!

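Question about how dependent job execution is implemented

Hello, I have recently been working on a project that involves similar dependent job execution. Suppose job b depends on job a: when reconciliation starts, job a is certainly created first.
After that, how is job b made to wait? Put differently, how does a callback let the resource in the reconcile loop learn that job a has finished, so that job b can run?
My understanding is that the reconcile loop should not poll and wait on the status, correct?
Sorry, I could not find any logic in controller.go that resembles a callback on job a's completion. I would appreciate a reply when you have time. Thanks!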

JobFlow does not run properly after it is created

Hello, I am trying to use jobflow on a cluster I have set up.
jobtemplate.yaml is as follows:

apiVersion: batch.volcano.sh/v1alpha1
kind: JobTemplate
metadata:
  name: a
spec:
  minAvailable: 1
  schedulerName: volcano
  queue: default
  policies:
    - event: PodEvicted
      action: RestartJob
  maxRetry: 5
  queue: default

  tasks:
    - replicas: 1
      name: "test-task"
      policies: 
        - event: TaskCompleted
          action: CompletedJob
      template:
        metadata:
          name: test-task
        spec:
          containers:
            - name: test
              image: centos
              imagePullPolicy: IfNotPresent
              command: ["/bin/bash", "-c"]
              args: ["touch /home/a.txt && echo 'this is test string' >> /home/a.txt && sleep 10 "]
              resources:
                requests:
                  cpu: "1"
              volumeMounts:
                - mountPath: /home
                  name: my-pvc
          volumes:
          - name: my-pvc
            persistentVolumeClaim:
              claimName: my-pvc-test
---
apiVersion: batch.volcano.sh/v1alpha1
kind: JobTemplate
metadata:
  name: b
spec:
  minAvailable: 1
  schedulerName: volcano
  queue: default
  policies:
    - event: PodEvicted
      action: RestartJob
  maxRetry: 5
  queue: default

  tasks:
    - replicas: 1
      name: "test-task"
      policies: 
        - event: TaskCompleted
          action: CompletedJob
      template:
        metadata:
          name: test-task
        spec:
          containers:
            - name: test
              image: centos
              imagePullPolicy: IfNotPresent
              command: ["/bin/bash", "-c"]
              args: ["touch /home/b.txt && echo 'this is test string' >> /home/b.txt && sleep 10 "]
              resources:
                requests:
                  cpu: "1"
              volumeMounts:
                - mountPath: /home
                  name: my-pvc
          volumes:
          - name: my-pvc
            persistentVolumeClaim:
              claimName: my-pvc-test
---

jobflow.yaml is as follows:

apiVersion: batch.volcano.sh/v1alpha1
kind: JobFlow
metadata:
  name: test
  namespace: default
spec:
  jobRetainPolicy: retain   # After jobflow runs, keep the generated job. Otherwise, delete it.
  flows:
    - name: a
    - name: b
      dependsOn:
        targets: ['a']

The problem I am facing is that after creating the jobtemplates and then the jobflow, running kubectl get jf shows the existing jf with no status information, as follows:

NAME   STATUS   AGE
test            10m

Running kubectl get pods also shows that no pods have been created.

Install steps

Do you have a detailed build tutorial, for example one that covers setting GOPATH and other requirements?

I want to build a Docker image on a machine with a different architecture.
I followed the Volcano jobflow README, cloned the source code with git, and ran make in the JobFlow directory, but hit the following problem.

go: creating new go.mod: module tmp
Downloading sigs.k8s.io/controller-tools/cmd/controller-gen@v0.4.1
go: added github.com/fatih/color v1.7.0
go: added github.com/gobuffalo/flect v0.2.0
go: added github.com/gogo/protobuf v1.3.1
go: added github.com/google/gofuzz v1.1.0
go: added github.com/inconshreveable/mousetrap v1.0.0
go: added github.com/json-iterator/go v1.1.8
go: added github.com/mattn/go-colorable v0.1.2
go: added github.com/mattn/go-isatty v0.0.8
go: added github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd
go: added github.com/modern-go/reflect2 v1.0.1
go: added github.com/spf13/cobra v1.0.0
go: added github.com/spf13/pflag v1.0.5
go: added golang.org/x/mod v0.2.0
go: added golang.org/x/net v0.0.0-20200226121028-0de0cce0169b
go: added golang.org/x/sys v0.0.0-20191022100944-742c48ecaeb7
go: added golang.org/x/text v0.3.2
go: added golang.org/x/tools v0.0.0-20200616195046-dc31b401abb5
go: added golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543
go: added gopkg.in/inf.v0 v0.9.1
go: added gopkg.in/yaml.v2 v2.2.8
go: added gopkg.in/yaml.v3 v3.0.0-20190905181640-827449938966
go: added k8s.io/api v0.18.2
go: added k8s.io/apiextensions-apiserver v0.18.2
go: added k8s.io/apimachinery v0.18.2
go: added k8s.io/klog v1.0.0
go: added k8s.io/utils v0.0.0-20200324210504-a9aa75ae1b89
go: added sigs.k8s.io/controller-tools v0.4.1
go: added sigs.k8s.io/structured-merge-diff/v3 v3.0.0
go: added sigs.k8s.io/yaml v1.2.0
/Users/xiaozhu/Code/go/bin
/Users/xiaozhu/JobFlow/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
bash: /Users/xiaozhu/JobFlow/bin/controller-gen: No such file or directory

How do I solve this problem? Is this related to the GOPATH?
