Thanks for the repo!
## Problem
I have been trying to report the accuracy of a Keras model trained with hyperparameter tuning on AI Platform. The method outlined in the AI Platform documentation does not work, whereas yours does, and I would like to understand why (it might be a documentation bug?).
## Documentation
I have been following these documentation pages:
1. Overview of hyperparameter tuning
2. Using hyperparameter tuning
According to [1]:

> **How AI Platform Training gets your metric**
>
> You may notice that there are no instructions in this documentation for passing your hyperparameter metric to the AI Platform Training training service. That's because the service monitors TensorFlow summary events generated by your training application and retrieves the metric.
And according to [2], one way of generating such a TensorFlow summary event is to create a callback class like so:

```python
class MyMetricCallback(tf.keras.callbacks.Callback):

    def on_epoch_end(self, epoch, logs=None):
        tf.summary.scalar('metric1', logs['RootMeanSquaredError'], epoch)
```
## My code
So in my code I included:
```yaml
# hptuning_config.yaml
trainingInput:
  hyperparameters:
    goal: MAXIMIZE
    maxTrials: 4
    maxParallelTrials: 2
    hyperparameterMetricTag: val_accuracy
    params:
      - parameterName: learning_rate
        type: DOUBLE
        minValue: 0.001
        maxValue: 0.01
        scaleType: UNIT_LOG_SCALE
```
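(For context: each tuned parameter declared under `params` is handed to the trainer as a command-line flag named after its `parameterName`, e.g. `--learning_rate=0.005`. A minimal sketch of how my trainer consumes it — flag name derived from the config above, default value my own:)

```python
# Sketch: AI Platform passes each tuned hyperparameter to the trainer
# as a command-line flag named after `parameterName` in the YAML config.
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('--learning_rate', type=float, default=0.001)
    return parser.parse_args(argv)

# Simulate the flag a single tuning trial would pass:
args = parse_args(['--learning_rate=0.005'])
print(args.learning_rate)  # 0.005
```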
```python
# model.py
class MetricCallback(tf.keras.callbacks.Callback):

    def on_epoch_end(self, epoch, logs):
        tf.summary.scalar('val_accuracy', logs['val_accuracy'], epoch)
```
I even tried
```python
# model.py
class MetricCallback(tf.keras.callbacks.Callback):

    def __init__(self, logdir):
        super().__init__()
        self.writer = tf.summary.create_file_writer(logdir)

    def on_epoch_end(self, epoch, logs):
        with self.writer.as_default():
            tf.summary.scalar('val_accuracy', logs['val_accuracy'], epoch)
```
This successfully saved the `val_accuracy` metric to Google Cloud Storage, where I could also inspect it with TensorBoard. But it was not picked up by AI Platform jobs, despite the claim made in [1].
Using your package, I created the following class:
```python
# model.py
class MetricCallback(tf.keras.callbacks.Callback):

    def __init__(self):
        self.hpt = hypertune.HyperTune()

    def on_epoch_end(self, epoch, logs):
        self.hpt.report_hyperparameter_tuning_metric(
            hyperparameter_metric_tag='val_accuracy',
            metric_value=logs['val_accuracy'],
            global_step=epoch
        )
```
which works! But I don't see how, since all it seems to do is write to a file on the AI Platform worker at `/tmp/hypertune/*`. There is nothing in the documentation that explains how this gets picked up by AI Platform...
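To illustrate what I mean, here is a minimal pure-Python sketch of the only behaviour I can observe: appending one JSON record per reported metric to a file under a `hypertune/` directory. The file name (`output.metrics`) and the record keys (`tag`, `metric_value`, `global_step`, `timestamp`) are my guesses — I have not confirmed the actual schema your package or the service uses:

```python
import json
import os
import tempfile
import time

def report_metric(metrics_file, tag, value, step):
    """Append one JSON line describing a metric (hypothetical schema)."""
    record = {
        'tag': tag,                    # assumed key name
        'metric_value': float(value),  # assumed key name
        'global_step': int(step),
        'timestamp': time.time(),
    }
    os.makedirs(os.path.dirname(metrics_file), exist_ok=True)
    with open(metrics_file, 'a') as f:
        f.write(json.dumps(record) + '\n')

# Demo in a temp dir rather than /tmp/hypertune, to keep it self-contained.
metrics_file = os.path.join(tempfile.mkdtemp(), 'hypertune', 'output.metrics')
report_metric(metrics_file, 'val_accuracy', 0.91, 1)
report_metric(metrics_file, 'val_accuracy', 0.93, 2)

with open(metrics_file) as f:
    records = [json.loads(line) for line in f]
print(records[-1]['metric_value'])  # 0.93
```

If the service really does just tail a file like this, I'd love to know where that contract is documented.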
Could you please explain why your `HyperTune.report_hyperparameter_tuning_metric` works? Are the docs wrong or out of date?