Comments (8)
/cc @Liorba
from marquez.
Before I started coding, I wanted to better understand the intent of the issue before using the API definition suggested above.
The main way today to get info about current job runs is with this endpoint:
namespaces/{namespace}/jobs/{job}/runs
If we want to list the active job runs for each job when we list all jobs, that might involve some changes to how we represent the Job object in the API, namely that we'll need to add the job's current runs to its description. This is problematic because the service layer will then also need to know about the current job runs for every job, so the change will permeate all the layers.
If the goal here is to get all the job runs in a namespace, could we instead create an endpoint for that? For example, /namespaces/:namespace/job_runs, or a new endpoint, /jobs/runs/, that lists all the job runs and takes an optional namespace parameter to restrict the list to that namespace.
from marquez.
It might help to highlight the question we are looking to answer:
Given job K, return a list of runs L
So, let's say I have the job my_job under the namespace my_namespace. The API call:
GET namespaces/my_namespace/jobs/my_job/runs
will return the list of runs for my_job. Cool. Now, maybe I'm only interested in failed run attempts; we can and should add the filter run_state to limit our results:
GET namespaces/my_namespace/jobs/my_job/runs?run_state=FAILED
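As a rough illustration of the filtered call above, here is a minimal Python sketch that builds the request path. The helper name runs_path is hypothetical, and run_state is the filter proposed in this comment, not something confirmed to exist in the API; the leading slash is also an assumption.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def runs_path(namespace: str, job: str, run_state: Optional[str] = None) -> str:
    """Build the path for listing a job's runs (hypothetical helper).

    `run_state` is the filter proposed in the discussion; it is appended
    as a query parameter only when provided.
    """
    # Percent-encode the path segments so unusual characters stay in their segment.
    path = f"/namespaces/{quote(namespace, safe='')}/jobs/{quote(job, safe='')}/runs"
    if run_state is not None:
        path += "?" + urlencode({"run_state": run_state})
    return path


print(runs_path("my_namespace", "my_job"))
print(runs_path("my_namespace", "my_job", run_state="FAILED"))
```

Leaving run_state out yields the unfiltered listing, so the same helper covers both calls shown above.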
Note that the jobs/runs/* endpoints were introduced to simplify interactions with a single job run instance, not a global list.
But now let's say we wanted to view runs for multiple jobs, my_job and my_other_job: that would require following the steps outlined above for each job.
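To make the per-job repetition concrete, here is a small sketch of what a caller would do today: one listing call per job. The function runs_for_jobs and the stand-in fake_fetch_runs are hypothetical; a real client would replace the stand-in with an HTTP request to GET namespaces/{namespace}/jobs/{job}/runs.

```python
from typing import Callable, Dict, Iterable, List


def runs_for_jobs(
    namespace: str,
    jobs: Iterable[str],
    fetch_runs: Callable[[str, str], List[str]],
) -> Dict[str, List[str]]:
    """Collect runs for several jobs by issuing one listing call per job."""
    return {job: fetch_runs(namespace, job) for job in jobs}


def fake_fetch_runs(namespace: str, job: str) -> List[str]:
    # Hypothetical stand-in for the HTTP call; returns made-up run labels.
    return [f"{namespace}/{job}/run-{i}" for i in range(2)]


result = runs_for_jobs("my_namespace", ["my_job", "my_other_job"], fake_fetch_runs)
for job, runs in result.items():
    print(job, runs)
```

The number of requests grows with the number of jobs, which is the cost this comment is pointing at.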
The issue does outline (possibly) returning a list of runs when retrieving a job:
...
"runs": [
"/jobs/runs/cfc4b5e6-c630-48d4-ad19-f2bd16c93a9d",
"/jobs/runs/d33ef190-73bd-4a65-ab59-1bbd65364d0b",
"/jobs/runs/5ced1097-8d59-46d8-933e-c9a688be8b8c",
...
]
My thinking here is that it's more of an optimization for the caller. Maybe we return the last N completed runs or something similar, but it's not a feature we'd need to support in release 0.2.0.
from marquez.
continued: I think it would be fine to define the endpoint:
GET /jobs/runs
returning a list of run IDs. But to learn more about the job run, the caller would have to make another API call:
GET /jobs/runs/cfc4b5e6-c630-48d4-ad19-f2bd16c93a9d
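The two-step flow described here (list run IDs, then look each one up) could be sketched as below. Everything is an in-memory stand-in: list_run_ids plays the role of the proposed GET /jobs/runs, get_run plays the role of GET /jobs/runs/{id}, and the "state" field in the fake payloads is invented for illustration.

```python
from typing import Callable, Dict, List

# Made-up run payloads keyed by run IDs taken from the example above.
FAKE_RUNS: Dict[str, Dict[str, str]] = {
    "cfc4b5e6-c630-48d4-ad19-f2bd16c93a9d": {"state": "FAILED"},
    "d33ef190-73bd-4a65-ab59-1bbd65364d0b": {"state": "COMPLETED"},
}

calls: List[str] = []  # records each simulated API call


def list_run_ids() -> List[str]:
    # Stand-in for the proposed GET /jobs/runs.
    calls.append("GET /jobs/runs")
    return list(FAKE_RUNS)


def get_run(run_id: str) -> Dict[str, str]:
    # Stand-in for GET /jobs/runs/{id}.
    calls.append(f"GET /jobs/runs/{run_id}")
    return {"id": run_id, **FAKE_RUNS[run_id]}


def fetch_run_details(
    list_ids: Callable[[], List[str]],
    get_one: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """One call to list the IDs, then one call per ID for the details."""
    return [get_one(run_id) for run_id in list_ids()]


details = fetch_run_details(list_run_ids, get_run)
print(len(calls), "calls for", len(details), "runs")
```

With N runs the caller makes N + 1 requests, which is the trade-off of returning only IDs from the listing endpoint.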
from marquez.
I like the idea of calling GET /jobs/runs, and then potentially filtering by namespace or by namespace and job_name. For example, to find all the job runs for namespace finance:
GET /api/v1/jobs/runs?namespace=finance
To find info about any runs for a given job, say quarterly_billings, which resides in the finance namespace, we would do this:
GET /api/v1/jobs/runs?namespace=finance&job_name=quarterly_billings
Does this sound like a sensible way to proceed?
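For illustration, the optional filters could be assembled like this. The helper job_runs_query is hypothetical, and the namespace and job_name parameters are the ones proposed in this comment, not an agreed API; the sketch only shows how the query string would be built so that parameters are joined with a single & after one ?.

```python
from typing import Optional
from urllib.parse import urlencode


def job_runs_query(namespace: Optional[str] = None, job_name: Optional[str] = None) -> str:
    """Build the proposed /api/v1/jobs/runs URL with optional filters."""
    # Keep only the filters that were actually supplied, in a fixed order.
    params = {
        key: value
        for key, value in (("namespace", namespace), ("job_name", job_name))
        if value is not None
    }
    query = urlencode(params)
    return "/api/v1/jobs/runs" + (f"?{query}" if query else "")


print(job_runs_query(namespace="finance"))
print(job_runs_query(namespace="finance", job_name="quarterly_billings"))
```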
from marquez.
The endpoint /jobs/runs/{id} has a fundamental assumption:
The caller may or may not know the namespace and/or the job associated with {id}.
That is, a run ID would encode both the namespace and job version associated with the run instance, but this is very much an internal association maintained by Marquez.
I guess I'm not really sure how adding the filters namespace or job to /jobs/runs is any different than:
GET namespaces/{namespace}/jobs/{job}/runs
The call above would return a list of runs, allowing the caller to filter runs by job under a given namespace. What it wouldn't allow you to do is filter runs only by job name, but that's not a feature we have thought about supporting.
/cc @sshah-wework @hougs
from marquez.
Chiming in here, I would rather see
GET namespaces/{namespace}/jobs/{job}/runs
than
/jobs/runs?namespace=...&job=
since the first is more canonical. The /jobs/runs endpoint was just meant to be convenient for fetching a single run by ID, and I'd rather not support more functionality from that endpoint.
One day, we may want the equivalent lookup via an id filter (e.g. GET namespaces/{namespace}/jobs/{job}/runs?id=...) for consistency.
from marquez.
Being added in #633
from marquez.