Comments (8)
I agree, we should add a way to get it directly by name in the AbstractStore class. This sounds like it will be a pretty common use case. We can also document that we expect experiments to have unique names (I'm not sure we're actually enforcing this yet).
from mlflow.
@mateiz thanks for the feedback. Currently, it throws an exception if one tries to create experiment with an existing name. Also, what would be the best way to tackle this issue? This kind of enhancement requires changes in several parts of the project. Should I commit the whole change at once or in step-by-step manner ?
from mlflow.
I think doing it in smaller patches is fine, as you did by starting with the protobuf. It would also be good to see a quick API sketch if possible -- what do we expect the Python API to be, as well as the command line interface (where you can currently pass an experiment ID, how would you pass a name).
from mlflow.
+1 to request of accessing experiment by name.
My recommendation however is that it should be a different API GetExperimentByName
for instance. The way it is implemented in the proto -- means that both id
and name
arguments will be necessary inputs. Also, interface logic would need to ascertain that only 1 of them is provided.
You can share the response message of course like suggested here
Also as @mateiz mentioned, this would give us an opportunity to unify the store interfaces as well. As you must have seen FileStore
already supports that API. Would be good to add this to AbstractStore
as well. #68, seems to be creating an experiment. It would be clearer to continue with get_experiment_by_name()
semantics.
I have left some comments in #122 as well.
from mlflow.
@mparkhe thanks for the suggestion. Definitely a way to go. What do you think, do we need correct the naming for the existing functionality e.g. GetExperiment -> GetExperimentById ?
@mateiz I have found couple of more touch points where experiment_id is used.
- As part of mlflow.tracking API: mlflow.tracking.start_run -> if we are getting the Experiment Object by calling GetExperimentByName we can also obtain experiment_id from it. e.g.
exp_obj = mlflow.tracking.GetExperimentByName("name")
with mlflow.start_run(experiment_id=exp_obj.experiment_id)
I don't think, that in this particular scenario we need to make changes in start_run function.
2) As part of mlflow project API: CLI interface allows to pass experiment_id as an argument, I think we should allow experiment_name also there. I see two possibilities there:
a) add experiment_name as parameter to every single function on the way and add parameter handling chunk where we check which parameter was actually passed and which one held to its default value.
b) get experiment object -> extract_id -> pass experiment_id to projects.run (cli.py line 97) and rely on previous logic
3) CLI experiments -> actually here I am not sure. At this moment it supports only FileStore backend. On the other hand it supports only two API calls: create and list, which are implemented also in RestStore class. Maybe there is a special reason for that which I don't see. Also, is there any intention to add more commands there such as GetExperimentById and GetExperimentByName. Could you please clarify a bit here ?
from mlflow.
One of the concerns about changing interfaces is that it affects a whole bunch of users. I would not recommending changing GetExperiment
APIs name or what it does. We expect people to know their experiments by # more than names, because of the most the APIs use experiment_id
(more on that in a bit). My input on your thoughts above
-
I think that usage makes sense. Maybe users can add a check for
None
returned, but using the same Experiment object means will help. -
My concern with adding
name
as an additional argument to all CLI/python APIs is that each of them need to implement some boiler plate handling code: precedence checks betweenid
vsname
arguments, what if they dont match, what if they are both null ... etc? The other solution of possibly duplicating all APIs to handleid
version (what we have currently) and new set forname
version. The concerns I have with that is a lot of replicated code, and if we add a new unique way of identifying experiments likeGetExperimentByArtifactLocation
that would be another flavor.
It is a legit ask to be able to query experiments by id and name (for now). Those apis definitely could be added. For everything else, my recommendation is that we stay with the flow that does
- get_experiment (by id) OR get_experiment_by_name
- null and other checks on returned object
- use experiment_id from returned object for all APIs.
- As above, it definitely makes sense to have a first class query level API to "get experiment by name". We should add that to AbstractStore interface and then add those all store implementations and then expose that out using both Python and REST API.
from mlflow.
Can't disagree, all makes sense! Thanks for the input :)
from mlflow.
@mparkhe @mateiz The following snippet helps to access experiment object either by name or by id.
from mlflow.tracking import get_service
mlflow.set_tracking_uri("http://localhost:5000")
service = get_service()
exp_id = service.get_experiment_by_id(ID).experiment_id
with mlflow.start_run(experiment_id=exp_id):
or
exp_id = service.get_experiment_by_name(NAME).experiment_id
with mlflow.start_run(experiment_id=exp_id):
I tested it on tutorial scripts. Please let me know if anything is missing or should be corrected.
Also, in case everything is fine, shall I add the info about API calls to the documentation too ?
from mlflow.
Related Issues (20)
- [BUG] Cannot save prompt with partial variables. HOT 3
- [BUG] langchain could not determine a constructor for the tag error HOT 2
- [BUG] Loading Pyfunc artifact from pre- saved model uses Windows path format instead of linux. HOT 2
- Artificial artifact limit in `log_table` [BUG] HOT 1
- [BUG] When viewing a json file in artifact as a table, scrollbar is not available. HOT 1
- [DOC-FIX] Missing complete list of REST APIs. HOT 1
- [FR] Add regex comparator `~` to search functions for string attributes and tags HOT 2
- [BUG] Azure Databricks disk_offload error HOT 9
- [BUG] Security Vulnerability HOT 3
- [BUG] UI crashes or is extremely slow from too many columns HOT 1
- [BUG] Conversion of non-dataframe type to dataframe in a pyfunc.PythonModel predict calls HOT 3
- [BUG] performing a custom split persists an additional feature labeled `split` HOT 1
- [FR] Auto-refresh metrics HOT 1
- [FR] ability to drop columns that get used for split, but not for training HOT 1
- Hello, I encountered an issue while attempting to run the official MLflow image. Can anyone assist me with this? I'm a beginner. Thanks HOT 5
- Add load_delta support with `deltalake` package HOT 2
- [BUG] Exception During Model Logging with Custom Ollama Class HOT 6
- comparing metrics with timestamp is not working if step is same between them [BUG] HOT 2
- [BUG] Can't delete runs with mlflow gc due to api timeout HOT 9
- [FR] Display the complete dataset profile in the data details view
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from mlflow.