Thanks for the repo!
## Problem
I have been trying to report the accuracy of a Keras model trained with hyperparameter tuning on AI Platform. The method outlined in the AI Platform documentation does not work, whereas yours does, and I would like to understand why (it might be a documentation bug?).
## Documentation
I have been following these documentation pages:
1. Overview of hyperparameter tuning
2. Using hyperparameter tuning
According to [1]:

> **How AI Platform Training gets your metric**
>
> You may notice that there are no instructions in this documentation for passing your hyperparameter metric to the AI Platform Training training service. That's because the service monitors TensorFlow summary events generated by your training application and retrieves the metric.
And according to [2], one way of generating such a TensorFlow summary event is to create a callback class like so:

```python
class MyMetricCallback(tf.keras.callbacks.Callback):

    def on_epoch_end(self, epoch, logs=None):
        tf.summary.scalar('metric1', logs['RootMeanSquaredError'], epoch)
```
## My code
So in my code I included:
```yaml
# hptuning_config.yaml
trainingInput:
  hyperparameters:
    goal: MAXIMIZE
    maxTrials: 4
    maxParallelTrials: 2
    hyperparameterMetricTag: val_accuracy
    params:
      - parameterName: learning_rate
        type: DOUBLE
        minValue: 0.001
        maxValue: 0.01
        scaleType: UNIT_LOG_SCALE
```
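(For context: each tuned parameter declared under `params` is handed to the trainer as a command-line flag named after its `parameterName`, e.g. `--learning_rate=0.005`. A minimal sketch of how my trainer consumes it — flag name derived from the config above, default value my own:)

```python
# Sketch: AI Platform passes each tuned hyperparameter to the trainer
# as a command-line flag named after `parameterName` in the YAML config.
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('--learning_rate', type=float, default=0.001)
    return parser.parse_args(argv)

# Simulate the flag a single tuning trial would pass:
args = parse_args(['--learning_rate=0.005'])
print(args.learning_rate)  # 0.005
```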
```python
# model.py
class MetricCallback(tf.keras.callbacks.Callback):

    def on_epoch_end(self, epoch, logs):
        tf.summary.scalar('val_accuracy', logs['val_accuracy'], epoch)
```
I even tried
```python
# model.py
class MetricCallback(tf.keras.callbacks.Callback):

    def __init__(self, logdir):
        super().__init__()
        self.writer = tf.summary.create_file_writer(logdir)

    def on_epoch_end(self, epoch, logs):
        with self.writer.as_default():
            tf.summary.scalar('val_accuracy', logs['val_accuracy'], epoch)
```
This successfully saved the `val_accuracy` metric to Google Cloud Storage, where I could also inspect it with TensorBoard. But it was not picked up by AI Platform jobs, despite the claim made in [1].
Using your package, I created the following class:
```python
# model.py
class MetricCallback(tf.keras.callbacks.Callback):

    def __init__(self):
        self.hpt = hypertune.HyperTune()

    def on_epoch_end(self, epoch, logs):
        self.hpt.report_hyperparameter_tuning_metric(
            hyperparameter_metric_tag='val_accuracy',
            metric_value=logs['val_accuracy'],
            global_step=epoch
        )
```
which works! But I don't see how, since all it seems to do is write to a file on the AI Platform worker at `/tmp/hypertune/*`. There is nothing in the documentation that explains how this gets picked up by AI Platform...
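To illustrate what I mean, here is a minimal pure-Python sketch of the only behaviour I can observe: appending one JSON record per reported metric to a file under a `hypertune/` directory. The file name (`output.metrics`) and the record keys (`tag`, `metric_value`, `global_step`, `timestamp`) are my guesses — I have not confirmed the actual schema your package or the service uses:

```python
import json
import os
import tempfile
import time

def report_metric(metrics_file, tag, value, step):
    """Append one JSON line describing a metric (hypothetical schema)."""
    record = {
        'tag': tag,                    # assumed key name
        'metric_value': float(value),  # assumed key name
        'global_step': int(step),
        'timestamp': time.time(),
    }
    os.makedirs(os.path.dirname(metrics_file), exist_ok=True)
    with open(metrics_file, 'a') as f:
        f.write(json.dumps(record) + '\n')

# Demo in a temp dir rather than /tmp/hypertune, to keep it self-contained.
metrics_file = os.path.join(tempfile.mkdtemp(), 'hypertune', 'output.metrics')
report_metric(metrics_file, 'val_accuracy', 0.91, 1)
report_metric(metrics_file, 'val_accuracy', 0.93, 2)

with open(metrics_file) as f:
    records = [json.loads(line) for line in f]
print(records[-1]['metric_value'])  # 0.93
```

If the service really does just tail a file like this, I'd love to know where that contract is documented.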
Could you please explain why your `HyperTune.report_hyperparameter_tuning_metric` works? Are the docs wrong or out of date?