I was wondering, wouldn't it be nice to have a neat API for getting an experiment by i

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Clean API for getting experiment name about mlflow HOT 8 CLOSED

cryptexis commented on May 19, 2024 2

Clean API for getting experiment name

from mlflow.

Comments (8)

mateiz commented on May 19, 2024

I agree, we should add a way to get it directly by name in the AbstractStore class. This sounds like it will be a pretty common use case. We can also document that we expect experiments to have unique names (I'm not sure we're actually enforcing this yet).

from mlflow.

cryptexis commented on May 19, 2024

@mateiz thanks for the feedback. Currently, it throws an exception if one tries to create experiment with an existing name. Also, what would be the best way to tackle this issue? This kind of enhancement requires changes in several parts of the project. Should I commit the whole change at once or in step-by-step manner ?

from mlflow.

mateiz commented on May 19, 2024

I think doing it in smaller patches is fine, as you did by starting with the protobuf. It would also be good to see a quick API sketch if possible -- what do we expect the Python API to be, as well as the command line interface (where you can currently pass an experiment ID, how would you pass a name).

from mlflow.

mparkhe commented on May 19, 2024

+1 to request of accessing experiment by name.

My recommendation however is that it should be a different API GetExperimentByName for instance. The way it is implemented in the proto -- means that both id and name arguments will be necessary inputs. Also, interface logic would need to ascertain that only 1 of them is provided.

You can share the response message of course like suggested here

Also as @mateiz mentioned, this would give us an opportunity to unify the store interfaces as well. As you must have seen FileStore already supports that API. Would be good to add this to AbstractStore as well. #68, seems to be creating an experiment. It would be clearer to continue with get_experiment_by_name() semantics.

I have left some comments in #122 as well.

from mlflow.

cryptexis commented on May 19, 2024

@mparkhe thanks for the suggestion. Definitely a way to go. What do you think, do we need correct the naming for the existing functionality e.g. GetExperiment -> GetExperimentById ?
@mateiz I have found couple of more touch points where experiment_id is used.

As part of mlflow.tracking API: mlflow.tracking.start_run -> if we are getting the Experiment Object by calling GetExperimentByName we can also obtain experiment_id from it. e.g.

exp_obj = mlflow.tracking.GetExperimentByName("name")
with mlflow.start_run(experiment_id=exp_obj.experiment_id)

I don't think, that in this particular scenario we need to make changes in start_run function.
2) As part of mlflow project API: CLI interface allows to pass experiment_id as an argument, I think we should allow experiment_name also there. I see two possibilities there:
a) add experiment_name as parameter to every single function on the way and add parameter handling chunk where we check which parameter was actually passed and which one held to its default value.
b) get experiment object -> extract_id -> pass experiment_id to projects.run (cli.py line 97) and rely on previous logic
3) CLI experiments -> actually here I am not sure. At this moment it supports only FileStore backend. On the other hand it supports only two API calls: create and list, which are implemented also in RestStore class. Maybe there is a special reason for that which I don't see. Also, is there any intention to add more commands there such as GetExperimentById and GetExperimentByName. Could you please clarify a bit here ?

from mlflow.

mparkhe commented on May 19, 2024

One of the concerns about changing interfaces is that it affects a whole bunch of users. I would not recommending changing GetExperiment APIs name or what it does. We expect people to know their experiments by # more than names, because of the most the APIs use experiment_id (more on that in a bit). My input on your thoughts above

I think that usage makes sense. Maybe users can add a check for None returned, but using the same Experiment object means will help.
My concern with adding name as an additional argument to all CLI/python APIs is that each of them need to implement some boiler plate handling code: precedence checks between id vs name arguments, what if they dont match, what if they are both null ... etc? The other solution of possibly duplicating all APIs to handle id version (what we have currently) and new set for name version. The concerns I have with that is a lot of replicated code, and if we add a new unique way of identifying experiments like GetExperimentByArtifactLocation that would be another flavor.

It is a legit ask to be able to query experiments by id and name (for now). Those apis definitely could be added. For everything else, my recommendation is that we stay with the flow that does

get_experiment (by id) OR get_experiment_by_name
null and other checks on returned object
use experiment_id from returned object for all APIs.

As above, it definitely makes sense to have a first class query level API to "get experiment by name". We should add that to AbstractStore interface and then add those all store implementations and then expose that out using both Python and REST API.

from mlflow.

cryptexis commented on May 19, 2024

Can't disagree, all makes sense! Thanks for the input :)

from mlflow.

cryptexis commented on May 19, 2024

@mparkhe @mateiz The following snippet helps to access experiment object either by name or by id.

    from mlflow.tracking import get_service
    mlflow.set_tracking_uri("http://localhost:5000")
    service = get_service()

    exp_id = service.get_experiment_by_id(ID).experiment_id
    with mlflow.start_run(experiment_id=exp_id):

    exp_id = service.get_experiment_by_name(NAME).experiment_id
    with mlflow.start_run(experiment_id=exp_id):

I tested it on tutorial scripts. Please let me know if anything is missing or should be corrected.
Also, in case everything is fine, shall I add the info about API calls to the documentation too ?

from mlflow.

Clean API for getting experiment name about mlflow HOT 8 CLOSED

Comments (8)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent