Let's support listing job runs. We'll want to update the OpenAPI spec accordingly <a c

/ cc <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url

continued : I think it would be fine to define the endpoint: <div class="s

Chiming in here, would rather see <div class="snippet-clipboard-content notranslat

Being added in <a class="issue-link js-issue-link" data-error-text="Failed to load tit

List job runs about marquez HOT 8 CLOSED

marquezproject commented on August 27, 2024

List job runs

from marquez.

Comments (8)

wslulciuc commented on August 27, 2024

/ cc @Liorba

from marquez.

ashulmanWeWork commented on August 27, 2024

Before I started coding, I wanted to better understand the intent of the issue before using the API definition suggested above.

The main way today to get info about current job runs is with this endpoint--
namespaces/{namespace}/jobs/{job}/runs

If we want to list the active jobs run for each job when we list all jobs, that might involve some changes to how we represent the Job object in the API, namely that we'll need to add the job's current runs to its description. This is problematic b/c the service layer will now need to also know about that current job runs for every job. So this change will permeate all the layers.

If the goal here is to get all the job runs in a namespace, could we instead create an endpoint for that? For example /namespaces/:namespace/job_runs, or create a new endpoint, /jobs/runs/, that will list all the job runs, with the optional parameter of namespace, which will get all the job runs for the namespace.

from marquez.

wslulciuc commented on August 27, 2024

It might help to highlight the question we are looking to answer:

Given job K, return a list of runs L

So, let's say I have the job my_job under the namespace my_namespace. The API call:

GET namespaces/my_namespace/jobs/my_job/runs

will return the list of runs for my_job. Cool. Now, maybe I'm only interested in failed run attempts, we can and should add the filter run_state to limit our results:

GET namespaces/my_namespace/jobs/my_job/runs?run_state=FAILED

Note that the jobs/runs/* endpoints were introduced to simplify interactions with a single job run instance, not a global list.

But, now let's say we wanted to view runs for multiple jobs: my_job, my_other_job, then that would require following the steps outlined above for each job.

The issue does outline (possibly) returning a list of runs when retrieving a job:

  ...
  "runs": [
    "/jobs/runs/cfc4b5e6-c630-48d4-ad19-f2bd16c93a9d",
    "/jobs/runs/d33ef190-73bd-4a65-ab59-1bbd65364d0b",
    "/jobs/runs/5ced1097-8d59-46d8-933e-c9a688be8b8c",
    ...
  ]

My thinking here is that it's more of an optimization for the caller. Maybe we return the last N completed runs or something similar, but not a feature we'd need to support in release 0.2.0.

from marquez.

wslulciuc commented on August 27, 2024

continued: I think it would be fine to define the endpoint:

GET /jobs/runs

returning a list of run IDs. But to learn more about the job run, the caller would have to make another API call:

GET /jobs/runs/cfc4b5e6-c630-48d4-ad19-f2bd16c93a9d

from marquez.

ashulmanWeWork commented on August 27, 2024

I like the idea of calling GET /jobs/runs, and then potentially filtering by namespace or by namespace and job_name. For example, to find all the jobs runs for namespace finance:
GET /api/v1/jobs/runs?namespace=finance
To find info about any runs for a given job, say quarterly_billings, which resides in the finance namespace, we would do this:
GET /api/v1/jobs/runs?namespace=finance?&job_name=quarterly_billings

Does this sound like a sensible way to proceed?

from marquez.

wslulciuc commented on August 27, 2024

The endpoint /jobs/runs/{id} has a fundamental assumption:

The caller may or may not know the namespace and/or the job associated with {id}.

That is, a run ID would encode both the namespace and job version associated with the run instance, but this is very much an internal association maintained by Marquez.

I guess I'm not really sure how adding the filters namespace or job to /jobs/runs is any different than:

GET namespaces/{namespace}/jobs/{job}/runs

The call above would return a list of runs, allowing the caller to filter runs by job under a given namespace. What it wouldn't allow you to do is filter runs only by job name, but that's not a feature we have thought about supporting.

/cc @sshah-wework @hougs

from marquez.

sshah-wework commented on August 27, 2024

Chiming in here, would rather see

GET namespaces/{namespace}/jobs/{job}/runs

Than

/jobs/runs?namespace=...&job=

since the first is more canonical. the /jobs/runs endpoint was just meant to be convenient for fetching a single run by ID, but I'd rather not support more functionality from that endpoint.

One day, we may want the equivalent lookup via a id filter (e.g. GET namespaces/{namespace}/jobs/{job}/runs?id=...) for consistency.

from marquez.

wslulciuc commented on August 27, 2024

Being added in #633

from marquez.

List job runs about marquez HOT 8 CLOSED

Comments (8)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent