We need to improve the API for kubebench jobs. The current idea is to have 2 tiers of parameters: the first tier specifies the kubebench job workflow (e.g. location of the second tier configs, type and location of outputs and reports, etc.), and the second tier specifies the tfjob (and the like), including job-specific parameters. We need 2 tiers of config because we want to decouple kubebench's configuration/reporting workflow from the actual tf jobs' parameters (which are what we care about for actual benchmarks). We expect users to keep a less frequently changed 1st tier config that specifies the config/result locations, store a bunch of 2nd tier configs in .yaml files, and have the 1st tier config point to one of them, so they can easily run multiple benchmark jobs by changing a single parameter line. Below is an example of the 1st tier parameters, followed by an example of a 2nd tier config.
name: my-kubebench-job
namespace: default
configuratorImage: kubeflow/kubebench-helper:0.0.1 // the image here is just an example
configuratorCmd: kubebench-configurator
configuratorArgs: --source=local,--runner-config=config/tf-cnn-scenario-1.yaml
configuratorSecrets: github-token,gcloud-cred
configuratorVolumes: kubebench-pvc
outputProcessorImage: some-repo/tf-cnn-output-processor:1.0
outputProcessorCmd: python main.py
outputProcessorArgs: null
outputProcessorSecrets: null
outputProcessorVolumes: kubebench-pvc
reporterImage: kubeflow/kubebench-helper:0.0.1
reporterCmd: kubebench-reporter
reporterArgs: --dest=local,--type=csv,--report-file=report/report.csv
reporterSecrets: null
reporterVolumes: kubebench-pvc
metadata:
  name: my-test-scenario
spec:
  prototype:
    name: tf-job
    package: tf-job
    registry: github.com/kubeflow/kubeflow/tree/master/kubeflow
  parameters:
    name: my-tf-job
    namespace: default
    args: null
    image: null
    image_gpu: null
    image_pull_secrets: null
    num_masters: 1
    num_ps: 1
    num_workers: 1
    num_gpus: 1
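
To illustrate the "single-line change" idea, here is a minimal Python sketch, not actual kubebench code. It assumes the 1st tier `*Args` values are comma-separated argument strings, as the example above suggests. The helper names `split_args` and `with_runner_config` and the second scenario file `config/tf-cnn-scenario-2.yaml` are hypothetical and exist only for illustration.

```python
from copy import deepcopy

# A few 1st tier parameters copied from the example above.
BASE_PARAMS = {
    "name": "my-kubebench-job",
    "namespace": "default",
    "configuratorArgs": "--source=local,--runner-config=config/tf-cnn-scenario-1.yaml",
    "reporterArgs": "--dest=local,--type=csv,--report-file=report/report.csv",
}

RUNNER_CONFIG_PREFIX = "--runner-config="


def split_args(value):
    """Split a comma-separated argument string into a list (None or '' gives [])."""
    return [arg for arg in (value or "").split(",") if arg]


def with_runner_config(params, config_path):
    """Return a copy of params whose configuratorArgs point at config_path.

    Only the --runner-config argument changes; everything else is kept as-is.
    """
    new_arg = RUNNER_CONFIG_PREFIX + config_path
    args = split_args(params.get("configuratorArgs"))
    found = any(arg.startswith(RUNNER_CONFIG_PREFIX) for arg in args)
    # Replace the existing argument in place, or append it if it was absent.
    args = [new_arg if arg.startswith(RUNNER_CONFIG_PREFIX) else arg for arg in args]
    if not found:
        args.append(new_arg)
    new_params = deepcopy(params)
    new_params["configuratorArgs"] = ",".join(args)
    return new_params


# Hypothetical 2nd tier config files; only scenario-1 appears in the example above.
SCENARIOS = ["config/tf-cnn-scenario-1.yaml", "config/tf-cnn-scenario-2.yaml"]
JOBS = [with_runner_config(BASE_PARAMS, path) for path in SCENARIOS]

for job in JOBS:
    print(job["configuratorArgs"])
```

Each entry in `JOBS` differs from the base 1st tier config only in its `--runner-config` argument, so the output and report locations stay fixed while the benchmarked tf job varies.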