Comments (4)
I have the same problem, also with the same example run. I have a Linux machine (Ubuntu 22.04) with Python 3.11 running via Miniconda, using MLflow 2.12.2 and Ray 2.22.0.
One thing I did notice is that when I call the trainable function manually with a dictionary, it works fine. But when I call it via ray.tune, it no longer works. In that case I am also unable to retrieve the tracking URI from MLflow: mlflow.get_tracking_uri() just returns None.
This blocks my work as well, so I would also appreciate some help :)
I have been able to solve it with the following:
import time

import mlflow
from ray import train, tune
from ray.air.integrations.mlflow import setup_mlflow

# Note: evaluation_fn is the scoring function from the example script
# discussed in this issue; it is not redefined here.


def train_function_mlflow(config):
    # Pull the MLflow settings out so only hyperparameters remain in config.
    mlflow_config = config.pop('mlflow', None)
    setup_mlflow(config)
    # Set the tracking URI and experiment explicitly inside the trial.
    mlflow.set_tracking_uri(uri=mlflow_config['tracking_uri'])
    mlflow.set_experiment(experiment_name=mlflow_config['experiment_name'])
    mlflow.log_params(config)

    # Hyperparameters
    width, height = config["width"], config["height"]

    for step in range(config.get("steps", 100)):
        # Iterative training function - can be any arbitrary training procedure
        intermediate_score = evaluation_fn(step, width, height)
        # Log the metrics to mlflow
        mlflow.log_metrics(dict(mean_loss=intermediate_score), step=step)
        # Feed the score back to Tune.
        train.report({"iterations": step, "mean_loss": intermediate_score})
        time.sleep(0.1)


def tune_with_setup(mlflow_tracking_uri, experiment_name, finish_fast=False):
    # Set the experiment, or create a new one if it does not exist yet.
    mlflow.set_tracking_uri(mlflow_tracking_uri)
    mlflow.set_experiment(experiment_name=experiment_name)

    tuner = tune.Tuner(
        train_function_mlflow,
        run_config=train.RunConfig(
            name="mlflow",
        ),
        tune_config=tune.TuneConfig(
            num_samples=5,
        ),
        param_space={
            "width": tune.randint(10, 100),
            "height": tune.randint(0, 100),
            "steps": 5 if finish_fast else 25,
            # Passed through to each trial and popped in train_function_mlflow.
            "mlflow": {
                "experiment_name": experiment_name,
                "tracking_uri": mlflow_tracking_uri,
            },
        },
    )
    tuner.fit()
Note that I have to log the parameters manually. It does log to MLflow correctly, but the runs don't get the correct Trial name.
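For anyone adapting this: the workaround relies on separating the nested "mlflow" entry from the hyperparameters before calling mlflow.log_params. Below is a minimal sketch of that separation as a standalone helper that leaves the original config untouched, in case the dictionary is reused elsewhere. The name split_mlflow_settings and the example URI are hypothetical (they are not part of Ray or MLflow), and the helper uses only the standard library.

```python
from typing import Any, Dict, Tuple


def split_mlflow_settings(
    config: Dict[str, Any], key: str = "mlflow"
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (hyperparameters, mlflow_settings) without mutating `config`.

    Hypothetical helper: it mirrors the config.pop('mlflow', None) step of
    the workaround, but builds new dictionaries instead of popping in place.
    """
    # Everything except the nested settings entry counts as a hyperparameter.
    hyperparams = {k: v for k, v in config.items() if k != key}
    # Copy the nested settings; fall back to an empty dict if the key is absent.
    mlflow_settings = dict(config.get(key) or {})
    return hyperparams, mlflow_settings


if __name__ == "__main__":
    # Example trial config shaped like the param_space above (values made up).
    trial_config = {
        "width": 10,
        "height": 3,
        "steps": 5,
        "mlflow": {
            "experiment_name": "demo",
            "tracking_uri": "file:///tmp/mlruns",
        },
    }
    params, settings = split_mlflow_settings(trial_config)
    print(params)
    print(settings)
```

Inside the trainable, params would then be what gets passed to mlflow.log_params, and settings would supply the tracking URI and experiment name.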
@JMBokhorst fantastic! I also did a bit of experimenting, and it seems the setup_mlflow line makes no difference to the final results. With this workaround I am able to log my parameters and metrics and save a trained model.
So it is still a big bug, but I'm glad that we found a workaround!
I hope the Ray team can fix it, as it would be neater to use the setup_mlflow function.