
cloudml-hypertune's Introduction

Metric Reporting Python Package for CloudML Hypertune

Helper Functions for CloudML Engine Hypertune Services.


Prerequisites

Installation

Install via pip:

pip install cloudml-hypertune

Usage

import hypertune

hpt = hypertune.HyperTune()
hpt.report_hyperparameter_tuning_metric(
    hyperparameter_metric_tag='my_metric_tag',
    metric_value=0.987,
    global_step=1000)

By default, metric entries are written to /tmp/hypertune/output.metrics in JSON format:

{"global_step": "1000", "my_metric_tag": "0.987", "timestamp": 1525851440.123456, "trial": "0"}

Licensing

  • Apache 2.0

cloudml-hypertune's People

Contributors

nseay, wenzhel101


cloudml-hypertune's Issues

Import Error

I am doing import hypertune and then hypertune.HyperTune(), as in the docs for hyperparameter tuning jobs, but should it be
from hypertune import hypertune instead?
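
For comparison, this is the pattern from the Usage section above: the installed package exposes a top-level hypertune module, and the class is accessed directly on it, with no from hypertune import hypertune needed:

import hypertune  # top-level module installed by the cloudml-hypertune package

hpt = hypertune.HyperTune()  # class is accessed directly on the module, as in Usage above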

ROOT user required to report metrics

I used the package for custom container training on the AI Platform training service, and it works when the user running the training process in the container is root. I feel a bit queasy about that and would ideally like to run the training as a less privileged user. However, the location this package writes the training metrics to appears to be writable only by root.
It would be useful if the package also worked for non-root users.
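
One possible workaround, sketched below under the assumption that the package keeps writing to the default /tmp/hypertune/output.metrics location mentioned above, is to pre-create that directory while still running as root (e.g. in the container entrypoint) and make it writable by the unprivileged training user before switching to it:

import os
import stat

metrics_dir = '/tmp/hypertune'  # default directory used by the package (see Usage above)

# Run once as root at container start-up so a non-root training process can
# later create and append to the metrics file inside this directory.
os.makedirs(metrics_dir, exist_ok=True)
os.chmod(metrics_dir, stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO)  # mode 0o777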

Report hyperparameters to Google Cloud ML

Hello,
I was trying to set up hyperparameter tuning with custom containers on Google Cloud ML, but the reported values do not show up in the training dashboard in the column where the metric values are supposed to appear. Is anything special needed (e.g. environment variables) to make this work with Google Cloud ML?
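
One detail worth double-checking (not confirmed to be the cause here) is that the tag reported from the container matches the hyperparameterMetricTag configured for the tuning job; a minimal sketch:

import hypertune

hpt = hypertune.HyperTune()

# The tag below must match the job's hyperparameterMetricTag setting
# (e.g. hyperparameterMetricTag: val_accuracy in the config shown in a later
# issue). The metric value and step here are hypothetical.
hpt.report_hyperparameter_tuning_metric(
    hyperparameter_metric_tag='val_accuracy',
    metric_value=0.93,
    global_step=10)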

Stable release

Even though this is the recommended way to use hyperparameter tuning jobs on Vertex AI according to the documentation, the latest release is an alpha-quality one (v0.1.0.dev6) from 2019, which only declares compatibility with Python 3.5 (unsupported for a year and a half now) and has failing CI on main.

The current state of affairs means the only reliable path I see to using hyperparameter tuning on Vertex AI is writing my own implementation. That is simple enough, since it is just writing a JSON file to a specific location on disk, but it seems a waste given that there is already an "official" implementation.

But I think ideally this should be better supported by GCP itself, as easy distributed hparam tuning is one of the major competitive advantages Vertex AI has over some alternatives.

It would be good to have a stable (i.e. non-alpha) release of this library with explicit support for Python 3.7 through 3.10.

Why does this work when tf.summary.scalar doesn't?

Thanks for the repo!

Problem

I have been trying to report the accuracy of a Keras model that I run with hyperparameter tuning on AI Platform. The method outlined in the AI Platform documentation does not work, whereas yours does, and I would like to know why (is it a bug?).

Documentation

I have been following these documentation pages:
1. Overview of hyperparameter tuning
2. Using hyperparameter tuning

According to [1]:

How AI Platform Training gets your metric
You may notice that there are no instructions in this documentation for passing your hyperparameter metric to the AI Platform Training service. That's because the service monitors TensorFlow summary events generated by your training application and retrieves the metric.

And according to [2], one way to generate such a TensorFlow summary event is to create a callback class like so:

class MyMetricCallback(tf.keras.callbacks.Callback):

    def on_epoch_end(self, epoch, logs=None):
        tf.summary.scalar('metric1', logs['RootMeanSquaredError'], epoch)

My code

So in my code I included:

# hptuning_config.yaml

trainingInput:
  hyperparameters:
    goal: MAXIMIZE
    maxTrials: 4
    maxParallelTrials: 2
    hyperparameterMetricTag: val_accuracy
    params:
    - parameterName: learning_rate
      type: DOUBLE
      minValue: 0.001
      maxValue: 0.01
      scaleType: UNIT_LOG_SCALE
# model.py

class MetricCallback(tf.keras.callbacks.Callback):

    def on_epoch_end(self, epoch, logs):
        tf.summary.scalar('val_accuracy', logs['val_accuracy'], epoch)

I even tried

# model.py

class MetricCallback(tf.keras.callbacks.Callback):
    def __init__(self, logdir):
        super().__init__()
        self.writer = tf.summary.create_file_writer(logdir)

    def on_epoch_end(self, epoch, logs):
        with self.writer.as_default():
            tf.summary.scalar('val_accuracy', logs['val_accuracy'], epoch)

This successfully saved the 'val_accuracy' metric to Cloud Storage, and I could also see it in TensorBoard. But it was not picked up by the AI Platform job, despite the claim made in [1].

Using your package, I created the following class:

# model.py

import hypertune

class MetricCallback(tf.keras.callbacks.Callback):
    def __init__(self):
        super().__init__()
        self.hpt = hypertune.HyperTune()

    def on_epoch_end(self, epoch, logs):
        self.hpt.report_hyperparameter_tuning_metric(
            hyperparameter_metric_tag='val_accuracy',
            metric_value=logs['val_accuracy'],
            global_step=epoch
        )

which works! But I don't see how, since all it seems to do is write to a file on the AI Platform worker at /tmp/hypertune/*. There is nothing in the documentation that explains how this gets picked up by AI Platform...

Could you please explain why your HyperTune.report_hyperparameter_tuning_metric works? Are the docs wrong or out of date?

Maintain and update the package :)

This seems to be a dependency for Vertex AI hyperparameter tuning jobs, yet it is rarely maintained :)

  • Upgrade the supported Python versions
  • Support multiple GPUs (the HyperTune constructor fails in a multi-GPU layout because it fails when recreating the same directory path; see the sketch after this list)
  • etc.
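
As a rough illustration of the multi-GPU point above (not a patch to the package, and whether it helps depends on how the constructor actually creates its output directory), idempotent directory creation with exist_ok=True is the usual way to make concurrent creation of the same path safe:

import os

# Safe even if several worker processes (one per GPU) execute it concurrently
# for the same path; a plain "check, then makedirs" sequence can race instead.
os.makedirs('/tmp/hypertune', exist_ok=True)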
