tensorflow / tfx
TFX is an end-to-end platform for deploying production ML pipelines
Home Page: https://tensorflow.org/tfx
License: Apache License 2.0
Jupyter instance: 8 vCPUs, 30 GB RAM
Versions:
tensorflow==1.13.1
tensorflow-data-validation==0.13.1
tensorflow-metadata==0.13.0
tensorflow-model-analysis==0.13.2
tensorflow-serving-api==1.13.0
tensorflow-transform==0.13.0
When I run the notebook chicago_taxi_tfma.ipynb, at the following step:
tfma.view.render_slicing_metrics(result, slicing_column='trip_start_hour')
nothing happens.
PATH_TO_RESULT shows 3 files in the eval_result_dir folder:
!gsutil ls $PATH_TO_RESULT
gs://dpe-cloud-mle-chicago-taxi/chicago-taxi-preprocess-20190509-155657/chicago_taxi_output/tft_output/eval_result_dir/eval_config
gs://dpe-cloud-mle-chicago-taxi/chicago-taxi-preprocess-20190509-155657/chicago_taxi_output/tft_output/eval_result_dir/metrics
gs://dpe-cloud-mle-chicago-taxi/chicago-taxi-preprocess-20190509-155657/chicago_taxi_output/tft_output/eval_result_dir/plots
When I run this cell, I just get this:
result = tfma.load_eval_result(PATH_TO_RESULT)
WARNING:tensorflow:From /home/jupyter/.local/lib/python2.7/site-packages/tensorflow_model_analysis/evaluators/metrics_and_plots_evaluator.py:83: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
tfma.view.render_slicing_metrics(result, slicing_column='trip_start_hour')
I just see this:
U2xpY2luZ01ldHJpY3NWaWV3ZXIoY29uZmlnPXsnd2VpZ2h0ZWRFeGFtcGxlc0NvbHVtbic6ICdwb3N0X2V4cG9ydF9tZXRyaWNzL2V4YW1wbGVfY291bnQnfSwgZGF0YT1beydtZXRyaWNzJzrigKY=
Similarly, when I run:
tfma.view.render_plot(result, tfma.slicer.SingleSliceSpec(features=[('trip_start_hour', 1)]))
I see this text only:
UGxvdFZpZXdlcihjb25maWc9eydzbGljZU5hbWUnOiAndHJpcF9zdGFydF9ob3VyOjEnLCAnbWV0cmljS2V5cyc6IHsnYXVjUGxvdCc6IHsnZGF0YVNlcmllcyc6ICdtYXRyaWNlcycsICdtZXTigKY
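Decoding the start of that Base64 payload shows it is simply the text repr of the TFMA Jupyter widget, which suggests the widget front-end is not rendering (rather than the data being missing). A minimal check:

```python
import base64

# First chunk of the payload printed by the notebook cell.
payload_prefix = 'U2xpY2luZ01ldHJpY3NWaWV3ZXIo'
print(base64.b64decode(payload_prefix).decode('utf-8'))  # SlicingMetricsViewer(
```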
We need to support specifying a list of splits in the various example_gen implementations (and the corresponding distributions), and make sure all components can select which splits they process.
In this example the label is removed with a pop (here), but unless I'm mistaken it doesn't actually get removed.
Instead of
return transformed_features, transformed_features.pop([...] LABEL_KEY)
it should be
transformed_labels = transformed_features.pop([...] LABEL_KEY)
return transformed_features, transformed_labels
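As a plain-dict sketch of why the proposed form works (plain dicts standing in for the actual transformed-features dict): pop removes the key in place and returns its value.

```python
# Plain dicts standing in for the transformed features; 'label' is the key
# to strip before the features are fed to the model.
transformed_features = {'feature_a': 1.0, 'label': 0}
transformed_labels = transformed_features.pop('label')

assert transformed_labels == 0
assert 'label' not in transformed_features
```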
The commit 7541df5 seems to break the TFX Developer Tutorial in step 2, when executing:
# Open a new terminal window, and in that window ...
source ~/tfx-env/bin/activate
airflow webserver -p 8080
It outputs the error message:
[2019-04-25 18:57:19,644] {__init__.py:416} ERROR - Failed to import: /home/j3soon/airflow/dags/taxi_pipeline_solution.py
Traceback (most recent call last):
File "/home/j3soon/tfx-env/local/lib/python2.7/site-packages/airflow/models/__init__.py", line 413, in process_file
m = imp.load_source(mod_name, filepath)
File "/home/j3soon/airflow/dags/taxi_pipeline_solution.py", line 147, in <module>
taxi_pipeline = AirflowDAGRunner(_airflow_config).run(_create_pipeline())
File "/home/j3soon/tfx-env/local/lib/python2.7/site-packages/tfx/orchestration/airflow/airflow_runner.py", line 45, in run
airflow_dag = airflow_pipeline.AirflowPipeline(**self._config)
TypeError: __init__() got an unexpected keyword argument 'components'
Running the Gunicorn Server with:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120
Logfiles: - -
My partner followed the tutorial last week and it worked perfectly. However, when I tried to follow the tutorial again on the same machine yesterday, this error occurred.
After replacing the file /home/j3soon/airflow/dags/taxi_pipeline_solution.py
with the old file, the error is resolved.
As in issue #47, I still have a problem with running CTE on Dataflow. When I use the code with no modifications, the error from the previous issue persists; it seems that somehow the try-except around the imports doesn't do its job.
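For reference, the guarded-import pattern in question has this shape; stdlib names stand in for the example's own modules so the sketch runs anywhere:

```python
# Try the packaged layout first, fall back to the flat import otherwise.
# 'package_that_is_absent' is deliberately missing; 'json' stands in for
# the fallback module actually taken here.
try:
    import package_that_is_absent as mod
except ImportError:
    import json as mod

assert mod.loads('{"ok": true}') == {'ok': True}
```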
When I changed the code to include only the relative import in my fork here, the problem disappeared, but another one manifested. This time, there's a problem with importing estimator from tensorflow somewhere in the dependencies. Stack trace:
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 773, in run
self._load_main_session(self.local_staging_directory)
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 489, in _load_main_session
pickler.load_session(session_file)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 269, in load_session
return dill.load_session(file_path)
File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 410, in load_session
module = unpickler.load()
File "/usr/lib/python2.7/pickle.py", line 864, in load
dispatch[key](self)
File "/usr/lib/python2.7/pickle.py", line 1139, in load_reduce
value = func(*args)
File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 828, in _import_module
return getattr(__import__(module, None, None, [obj]), obj)
File "/usr/local/lib/python2.7/dist-packages/trainer/taxi.py", line 19, in <module>
from tensorflow_transform import coders as tft_coders
File "/usr/local/lib/python2.7/dist-packages/tensorflow_transform/__init__.py", line 19, in <module>
from tensorflow_transform.analyzers import *
File "/usr/local/lib/python2.7/dist-packages/tensorflow_transform/analyzers.py", line 39, in <module>
from tensorflow_transform import tf_utils
File "/usr/local/lib/python2.7/dist-packages/tensorflow_transform/tf_utils.py", line 24, in <module>
from tensorflow.contrib.proto.python.ops import encode_proto_op
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/__init__.py", line 48, in <module>
from tensorflow.contrib import distribute
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/distribute/__init__.py", line 34, in <module>
from tensorflow.contrib.distribute.python.tpu_strategy import TPUStrategy
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/distribute/python/tpu_strategy.py", line 27, in <module>
from tensorflow.contrib.tpu.python.ops import tpu_ops
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/__init__.py", line 73, in <module>
from tensorflow.contrib.tpu.python.tpu.keras_support import tpu_model as keras_to_tpu_model
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/keras_support.py", line 71, in <module>
from tensorflow.python.estimator import model_fn as model_fn_lib
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/__init__.py", line 25, in <module>
import tensorflow.python.estimator.estimator_lib
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator_lib.py", line 22, in <module>
from tensorflow.python.estimator.canned.baseline import BaselineClassifier
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/baseline.py", line 50, in <module>
from tensorflow.python.estimator import estimator
ImportError: cannot import name estimator
Is there anything I can do to fix this?
The example shows running the ML pipeline on bare metal using Airflow/Kubeflow.
However, there is no information on how to launch the Beam jobs on Cloud Dataflow, how to train on ML Engine at scale, or how to serve the model on Google Cloud ML Engine.
Environment:
Ubuntu 18.04
TFX tutorial
TFX 0.13rc1
Python 3.5
tfx-env virtual environment
When attempting to execute Jupyter notebooks as part of the tutorial, all notebooks are unable to connect to the kernel and therefore cannot be run. The problem is unique to the tfx-env; Jupyter notebooks outside the environment work fine.
This is a known issue with Jupyter and tornado 6.0.x (see jupyterhub/jupyterhub#2451). To fix the issue I uninstalled tornado while the environment was active and reinstalled tornado 5.1.1. The commands used were:
pip3 uninstall tornado
pip3 install tornado==5.1.1
After this was done and Jupyter was restarted, I was able to execute step3.ipynb (or any other notebook) using both the tfx and python3 kernels.
I suggest the environment either:
For the purpose of development, it would be useful to have a nightly build from head, both for .whl file for package, as well as Docker image for Kubeflow Runner.
I've never used Airflow before, but it appears you need to run:
airflow initdb
before you start the webserver for it to operate correctly. This command is not listed in the tutorial instructions; please update them.
Failing to do so gives an error about None not having a method to create DAGs, because a get_dag() somewhere in the Airflow code returns None.
I have implemented a custom example after working through the taxi example.
My model takes in a string of Stack Overflow tags separated by a |,
like 'javascript|python|java'.
In the preprocessing function I split the one input tensor into multiple tensors using tf.string_split.
The model is a regression that should result in a single number.
The custom implementation of the pipeline runs without problems all the way through.
The ML Engine model exists, and the saved_model.pb exists.
I did not change anything in any code that seems responsible for serving.
When I input incorrect data it correctly fails, so the data is actually making it into the model.
import json
import numpy as np
import googleapiclient
from googleapiclient import discovery, errors


def predict_json(project, model, instances, version=None):
    """Send json data to a deployed model for prediction.

    Args:
      project (str): project where the Cloud ML Engine Model is deployed.
      model (str): model name.
      instances ([Mapping[str: Any]]): Keys should be the names of Tensors
        your deployed model expects as inputs. Values should be datatypes
        convertible to Tensors, or (potentially nested) lists of datatypes
        convertible to tensors.
      version: str, version of the model to target.

    Returns:
      Mapping[str: any]: dictionary of prediction results defined by the
      model.
    """
    # Create the ML Engine service object.
    # To authenticate set the environment variable
    # GOOGLE_APPLICATION_CREDENTIALS=<path_to_service_account_file>
    service = googleapiclient.discovery.build('ml', 'v1')
    name = 'projects/{}/models/{}'.format(project, model)
    if version is not None:
        name += '/versions/{}'.format(version)
    response = service.projects().predict(
        name=name,
        body={'instances': instances}
    ).execute()
    if 'error' in response:
        print(response['error'])
    return response


data = ['javascript|python|java', 'javascript|java|node']
response = predict_json("my-project", "stackoverflow_tfx", data)
print(response)
Could I use PyTorch model code with TFX?
After the examples were moved in the latest commit 7f5df9f, the link in the README is incorrect.
Hi.
I have a use case where I want to use date features as input values for a predictive model. I need to transform the date features to be useful.
For example, I need to know the difference between two dates (for example, just the difference in days between 01-04-2019 and 16-04-2019, but the dates can also be months or years apart).
Or just getting the day of the month, the month itself, or the year (i.e. for 16-04-2019, getting 16, 4 and 2019 as separate values).
My question is if it is possible to do this within TFX and if not, is this a feature that is coming up?
It would be important for my use case because the transform needs to be done in the graph format so that I can serve the model with the transformations inside the pipeline.
Otherwise I would need to add something that can do this for me outside of TFX.
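For reference, the desired behaviour expressed with Python's datetime outside the graph (the in-graph Transform equivalent is what the question asks about):

```python
from datetime import date

d1 = date(2019, 4, 1)
d2 = date(2019, 4, 16)

print((d2 - d1).days)             # difference in days: 15
print(d2.day, d2.month, d2.year)  # separate values: 16 4 2019
```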
Thanks in advance!
Martijn
Hello,
I successfully converted my model trained and served with TFX to TF Lite with the following code:
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.target_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
I am trying to run the TF Lite benchmark, and after having built the TF Lite libraries with the additional TensorFlow ops library ('benchmark_model_plus_flex'), I got the following error when I run the benchmark:
STARTING!
Min num runs: [50]
Min runs duration (seconds): [1]
Inter-run delay (seconds): [-1]
Num threads: [1]
Benchmark name: []
Output prefix: []
Min warmup runs: [1]
Min warmup runs duration (seconds): [0.5]
Graph: [converted_model.tflite]
Input layers: []
Input shapes: []
Use nnapi : [0]
Use legacy nnapi : [0]
Use gpu : [0]
Allow fp16 : [0]
Loaded model converted_model.tflite
resolved reporter
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for select TF ops.
2019-05-07 15:25:15.623225: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
Initialized session in 16.254ms
Running benchmark for at least 1 iterations and at least 0.5 seconds
ERROR: Expected names to be a vector, got shape: []
(while executing 'ParseExample' via Eager)
ERROR: Node number 176 (TfLiteFlexDelegate) failed to invoke.
Failed to invoke!
Aborted (core dumped)
I'm not sure this error comes from the model served by TFX, but I didn't find info on a similar error elsewhere. The benchmark works with public .tflite models (Mobilenet 1.0), though.
I'm using TFX 0.12 and TF 1.13.1.
Thank you!
Right now, the training/evaluation split is managed by TFX internally. To make sure model parameters are set appropriately and to avoid having a 'lucky' split affect the model quality, k-fold cross-validation would be useful to have baked into TFX. Is this on the roadmap?
Moreover, will there be support for train/evaluation/validation splits, which is especially useful when tuning the hyperparameters of deep networks? The evaluation data may bleed into the training procedure by tweaking the hyperparameters, so a pure validation data set would be ideal: one the model does not see during training or hyperparameter tuning. Is this on TFX's radar?
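The k-fold scheme being requested, sketched as a plain index split (illustrative only; nothing like this exists in TFX today):

```python
def kfold_indices(n_examples, k):
    """Yield (train, eval) index lists for k-fold cross-validation."""
    indices = list(range(n_examples))
    fold_size = n_examples // k
    for i in range(k):
        eval_idx = indices[i * fold_size:(i + 1) * fold_size]
        train_idx = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train_idx, eval_idx

for train_idx, eval_idx in kfold_indices(6, 3):
    print(eval_idx)  # [0, 1] then [2, 3] then [4, 5]
```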
Is it possible to (re-)use the TFX logger from the pipeline in the module file?
I have adapted the taxi example for my own use case, but I am using print to get additional entries for debugging. It works, but it's not as nice, since there is no timestamp or log level. It would be great if we could reuse the TFX logger. If that's not possible, what is the preferred/recommended way of logging in the @PipelineDecorator function or module file: Python's logging module or tf.logging?
I was following the tfx/examples/chicago_taxi_pipeline/README.md. Everything worked pretty flawlessly. I did have to make a change to:
https://github.com/tensorflow/tfx/blob/master/examples/chicago_taxi/start_model_server_local.sh#L31
I changed LOCAL_MODEL_DIR to:
LOCAL_MODEL_DIR=$TAXI_DIR/serving_model/taxi_simple/
It looks like the code in the chicago_taxi dir differs from the chicago_taxi_pipeline dir example in where the serving model is stored.
Should I submit a small PR for the change?
If so, would it be preferred to have a copy of the start_model_server_local.sh file in both chicago_taxi and chicago_taxi_pipeline with different values for LOCAL_MODEL_DIR, or is it better to have one start_model_server_local.sh file that accepts a LOCAL_MODEL_DIR argument?
If I do a PR, I can also include an update to the README.md to document the change.
My software versions:
Mac OS X
Python 2.7
p.s.
I just want to thank you for all of the work that Google has done on TFX. It's amazing for ML to be so accessible and for the ecosystem to come together like this.
Thanks
The TFMA guide shows what charts are available inside a Jupyter notebook, but no such notebook exists in the TFX repository, although there is one for TFDV.
On my own local machine I have done the following to connect to the MLMD:
import os

import tensorflow_model_analysis as tfma
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

TFX_HOME = os.path.join(os.environ['HOME'], 'tfx')
METADATA_URI = os.path.join(TFX_HOME, 'metadata/taxi/metadata.db')

conn_conf = metadata_store_pb2.ConnectionConfig()
conn_conf.sqlite.filename_uri = METADATA_URI
conn_conf.sqlite.connection_mode = 3
store = metadata_store.MetadataStore(conn_conf)

def get_uris(type_name, split):
    return [a.uri for a in store.get_artifacts_by_type(type_name)
            if a.properties['split'].string_value == split]

model_path = get_uris('ModelEvalPath', '')[-1]  # grab last in list
result = tfma.load_eval_result(model_path)
tfma.view.render_plot(result)             # shows nothing!
tfma.view.render_slicing_metrics(result)  # shows nothing!
I started Jupyter with jupyter notebook. Note that the notebook does not throw any exceptions: it executes the tfma.view.render_* code but outputs nothing.
I kept the code from the (local) taxi pipeline as is and I've successfully run it locally with Airflow.
Either the code I have is incorrect, or I am missing something, because I do not see any plots in my notebooks.
Is this a warning or a requirement?
tensorflow-model-analysis 0.12.1 has requirement protobuf==3.7.0rc2, but you'll have protobuf 3.7.0 which is incompatible.
tensorflow-transform 0.12.0 has requirement protobuf==3.7.0rc2, but you'll have protobuf 3.7.0 which is incompatible.
apache-beam 2.11.0 has requirement httplib2<=0.11.3,>=0.8, but you'll have httplib2 0.12.1 which is incompatible.
googledatastore 7.0.2 has requirement httplib2<=0.12.0,>=0.9.1, but you'll have httplib2 0.12.1 which is incompatible.
After I do pip install protobuf==3.7.0rc2, I get this error:
ml-metadata 0.13.2 has requirement protobuf<4,>=3.7, but you'll have protobuf 3.7.0rc2 which is incompatible.
apache-beam 2.11.0 has requirement httplib2<=0.11.3,>=0.8, but you'll have httplib2 0.12.1 which is incompatible.
tfx 0.12.0 has requirement protobuf<4,>=3.7, but you'll have protobuf 3.7.0rc2 which is incompatible.
Which one should I follow?
It would be amazing to have that type of project; it would win a lot of hearts and help onboard people to ml-metadata!
It would be beneficial to have core connector components which connect to sources such as BigQuery, CSV, Kafka, S3, etc. without any additional logic, returning either a dict or a PCollection.
The reasoning for this is that data sources are not perfect, and some transformations may be required prior to example_gen. E.g., BigQueryExampleGen fails if the table contains a TIMESTAMP field.
Does this go along with what TFX is attempting to solve, or should we assume our data is clean enough for a tf.Example?
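As a stopgap for the TIMESTAMP case, a hedged sketch of a pre-example_gen cleanup step (the row layout and field names are assumptions, not TFX API):

```python
from datetime import datetime

def stringify_timestamps(row):
    """Cast datetime values to ISO strings so rows are tf.Example friendly."""
    return {key: value.isoformat() if isinstance(value, datetime) else value
            for key, value in row.items()}

row = {'fare': 12.5, 'trip_start': datetime(2019, 5, 9, 15, 56, 57)}
print(stringify_timestamps(row))
# {'fare': 12.5, 'trip_start': '2019-05-09T15:56:57'}
```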
Here in setup.py:
https://github.com/tensorflow/tfx/blob/master/setup.py#L66-L67
some generated files are being created, but they're not gitignored. I tested it out, and they can be gitignored with:
# .gitignore
# ignore files generated in setup.py by generate_proto
tfx/proto/*_pb2.py
Should they be gitignored then?
Attempting to run tfx==0.13rc1 on Ubuntu 18.04
Install threw this error:
pip install tfx==0.13rc1
c-google-iam-v1 googleapis-common-protos backcall tornado prometheus-client pandocfilters pyrsistent
ERROR: apache-beam 2.12.0 has requirement httplib2<=0.11.3,>=0.8, but you'll have httplib2 0.12.3 which is incompatible.
Continuing on to Airflow, I ran into:
Initializing Airflow database
/home/jerry/tfx-env/lib/python3.5/site-packages/airflow/configuration.py:214: FutureWarning: The task_runner setting in [core] has the old default value of 'BashTaskRunner'. This value has been changed to 'StandardTaskRunner' in the running config, but please update your config before Apache Airflow 2.0.
FutureWarning,
/home/jerry/tfx-env/lib/python3.5/site-packages/airflow/configuration.py:575: DeprecationWarning: Specifying airflow_home in the config file is deprecated. As you have left it at the default value you should remove the setting from your airflow.cfg and suffer no change in behaviour.
category=DeprecationWarning,
/home/jerry/tfx-env/lib/python3.5/site-packages/apache_beam/__init__.py:84: UserWarning: Running the Apache Beam SDK on Python 3 is not yet fully supported. You may encounter buggy behavior or missing features.
'Running the Apache Beam SDK on Python 3 is not yet fully supported. '
[2019-04-30 17:46:19,679] {plugins_manager.py:143} ERROR - No module named 'tfx.executors'
Traceback (most recent call last):
File "/home/jerry/tfx-env/lib/python3.5/site-packages/airflow/plugins_manager.py", line 137, in
m = imp.load_source(namespace, filepath)
File "/home/jerry/tfx-env/lib/python3.5/imp.py", line 172, in load_source
module = _load(spec)
File "", line 693, in _load
File "", line 673, in _load_unlocked
File "", line 697, in exec_module
File "", line 222, in _call_with_frames_removed
File "/home/jerry/airflow/plugins/tfx_example/model.py", line 27, in
from tfx.executors.trainer import TrainingSpec
ImportError: No module named 'tfx.executors'
[2019-04-30 17:46:19,679] {plugins_manager.py:144} ERROR - Failed to import plugin /home/jerry/airflow/plugins/tfx_example/model.py
[2019-04-30 17:46:19,695] {plugins_manager.py:143} ERROR - No module named 'tfx.executors'
The error repeats over and over, and eventually the server crashes.
When creating a pipeline to load data from CSV, the function tfx.utils.dsl_utils.csv_input(uri) is used. Both from the documentation and from my own tests, I noticed that it is not possible to have multiple CSV files inside the folder.
I think it would be a good feature to be able to have an arbitrary number of CSV files inside the folder, with the reader function putting the data together. Some check should be done to ensure that the header is the same for all CSVs.
The use case is to skip joining datasets outside of the framework: one could simply add a new csv inside the folder and the next pipeline execution would pick that up together with the rest of the data.
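A hedged sketch of what the reader-side merge could look like (pure Python, outside TFX; the header check is the one suggested above):

```python
import csv

def read_csvs(paths):
    """Concatenate rows from several CSV files, requiring identical headers."""
    header, rows = None, []
    for path in paths:
        with open(path, newline='') as f:
            reader = csv.reader(f)
            this_header = next(reader)
            if header is None:
                header = this_header
            elif this_header != header:
                raise ValueError('header mismatch in {}'.format(path))
            rows.extend(reader)
    return header, rows
```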
In build_estimator the variable transformed_feature_spec is created but seems never to be used in the function.
The TFX guide explains that every component is composed of a driver, an executor and a publisher.
However, looking at the code, I can find evidence of the executors and drivers but not the publishers.
Is there an example of an implemented publisher? If not, how do the components save their artifacts to MLMD in the current examples?
Hi there,
After going through the workshop tutorial, I am attempting to build my own pipeline ingesting from BigQuery rather than a CSV.
The only example using BigQuery is taxi_pipeline_kubeflow.py, which assumes execution on GCP.
Is it possible to use the same pipeline as the tutorial where Airflow & AirflowScheduler are running locally but pull data from BigQuery?
How does GCP authentication work in this scenario?
I have tried this snippet, along with editing bigquery_default under Admin > Connections in the Airflow web app, with no luck:
@PipelineDecorator(
    pipeline_name='series',
    enable_cache=True,
    metadata_db_root=_metadata_db_root,
    additional_pipeline_args={
        'logger_args': logger_overrides,
        'beam_pipeline_args': [
            '--runner=DirectRunner',
            '--experiments=shuffle_mode=auto',
            '--project=<MY-PROJECT-ID>',
            '--temp_location=<TEMP-DIR-GCP>',
            '--region=us-central1',
        ],
    },
    pipeline_root=_pipeline_root)
def _create_pipeline():
    """Series Pipeline."""
    query = """
        SELECT * FROM `<MY-TABLE>`
    """
    example_gen = BigQueryExampleGen(query=query)
    return [
        example_gen,
    ]

pipeline = AirflowDAGRunner(_airflow_config).run(_create_pipeline())
The example pipeline in Kubeflow in GCP has artifacts (static HTML) that show the various outputs in visual form (e.g. with TFDV and TFMA). I have created my own pipeline by adapting the TFX kubeflow Python module file, but when running it in Kubeflow (on GCP) I see output paths that lead to the data, but no artifacts being generated.
I have been unable to find output viewers in the TFX repo. Since it's available in the example pipeline I'm assuming it's possible.
Is there special set-up that needs to be done in the code (e.g. an annotation or parameters) or is this something that's lacking in the Docker image that runs in Kubeflow?
Kubeflow's docs say it needs to be in a file /mlpipeline-ui-metadata.json in the root of the container that's run for the step. One of the example notebooks actually uses different images for the steps, which might be needed to avoid the JSON being overwritten.
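A minimal sketch of that manifest for a Tensorboard viewer (the GCS path is a placeholder, and the file is written to the working directory here rather than the container root):

```python
import json

ui_metadata = {
    'outputs': [{
        'type': 'tensorboard',
        'source': 'gs://<bucket>/<training-logs-dir>',  # placeholder path
    }]
}

# In the real container this file would live at /mlpipeline-ui-metadata.json.
with open('mlpipeline-ui-metadata.json', 'w') as f:
    json.dump(ui_metadata, f)
```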
A default Tensorboard viewer for the training would also be extremely helpful and appears to be supported by Kubeflow.
Hi
is there an estimation when tfx will be based on python 3 (and not as today - python 2)?
When running the Trainer component, eval gets stuck forever:
[2019-05-13 17:40:53,499] {logging_mixin.py:95} INFO - [2019-05-13 17:40:53,499] {saver.py:1270} INFO - Restoring parameters from /home/benjamintan/workspace/darkrai/logs/shapes_768_1024_20190513T1740/model.ckpt-100
[2019-05-13 17:40:54,151] {logging_mixin.py:95} INFO - [2019-05-13 17:40:54,151] {session_manager.py:491} INFO - Running local_init_op.
[2019-05-13 17:40:54,198] {logging_mixin.py:95} INFO - [2019-05-13 17:40:54,198] {session_manager.py:493} INFO - Done running local_init_op.
Strangely, if I replace the eval example path with the training example path, it manages to make progress to the model validator (though it fails model validation).
Any pointers on how to debug this?
On the README.md page, the link to the 'Chicago Taxi...' example returns a 404.
I just saw the announcement at the TensorFlow Summit, and I'm looking at the serving-models portion of the TFX docs to understand how the serving API fits into TFX, but all of the links:
https://www.tensorflow.org/tfx/guide/serving/api_rest
https://www.tensorflow.org/tfx/guide/serving/api_docs/cc
https://www.tensorflow.org/tfx/guide/serving/architecture
are returning 404.
I followed the steps for installing and running the chicago taxi example and I seem to be getting this timeout error when initializing the DAG in airflow:
[2019-05-04 16:29:56,995] {__init__.py:416} ERROR - Failed to import: /Users/rwu1997/airflow/dags/taxi/taxi_pipeline_simple.py
Traceback (most recent call last):
File "/Users/rwu1997/anaconda3/envs/taxi_pipeline/lib/python2.7/site-packages/airflow/models/__init__.py", line 413, in process_file
m = imp.load_source(mod_name, filepath)
File "/Users/rwu1997/airflow/dags/taxi/taxi_pipeline_simple.py", line 23, in <module>
from tfx.components.evaluator.component import Evaluator
File "/Users/rwu1997/anaconda3/envs/taxi_pipeline/lib/python2.7/site-packages/tfx/components/evaluator/component.py", line 24, in <module>
from tfx.components.evaluator import executor
File "/Users/rwu1997/anaconda3/envs/taxi_pipeline/lib/python2.7/site-packages/tfx/components/evaluator/executor.py", line 22, in <module>
import tensorflow_model_analysis as tfma
File "/Users/rwu1997/anaconda3/envs/taxi_pipeline/lib/python2.7/site-packages/tensorflow_model_analysis/__init__.py", line 19, in <module>
from tensorflow_model_analysis import view
[2019-05-04 16:31:19 -0400] [82036] [CRITICAL] WORKER TIMEOUT (pid:82765)
File "/Users/rwu1997/anaconda3/envs/taxi_pipeline/lib/python2.7/site-packages/tensorflow_model_analysis/view/__init__.py", line 15, in <module>
from tensorflow_model_analysis.view.widget_view import render_plot
File "/Users/rwu1997/anaconda3/envs/taxi_pipeline/lib/python2.7/site-packages/tensorflow_model_analysis/view/widget_view.py", line 21, in <module>
from tensorflow_model_analysis.api import model_eval_lib
File "/Users/rwu1997/anaconda3/envs/taxi_pipeline/lib/python2.7/site-packages/tensorflow_model_analysis/api/model_eval_lib.py", line 31, in <module>
from tensorflow_model_analysis import types
File "/Users/rwu1997/anaconda3/envs/taxi_pipeline/lib/python2.7/site-packages/tensorflow_model_analysis/types.py", line 24, in <module>
from tensorflow_transform.beam import shared
File "/Users/rwu1997/anaconda3/envs/taxi_pipeline/lib/python2.7/site-packages/tensorflow_transform/__init__.py", line 19, in <module>
from tensorflow_transform.analyzers import *
File "/Users/rwu1997/anaconda3/envs/taxi_pipeline/lib/python2.7/site-packages/tensorflow_transform/analyzers.py", line 39, in <module>
from tensorflow_transform import tf_utils
File "/Users/rwu1997/anaconda3/envs/taxi_pipeline/lib/python2.7/site-packages/tensorflow_transform/tf_utils.py", line 24, in <module>
from tensorflow.contrib.proto.python.ops import encode_proto_op
File "/Users/rwu1997/anaconda3/envs/taxi_pipeline/lib/python2.7/site-packages/tensorflow/contrib/__init__.py", line 49, in <module>
from tensorflow.contrib import distributions
File "/Users/rwu1997/anaconda3/envs/taxi_pipeline/lib/python2.7/site-packages/tensorflow/contrib/distributions/__init__.py", line 16, in <module>
"""
File "/Users/rwu1997/anaconda3/envs/taxi_pipeline/lib/python2.7/site-packages/airflow/utils/timeout.py", line 43, in handle_timeout
raise AirflowTaskTimeout(self.error_message)
AirflowTaskTimeout: Timeout, PID: 82765
This is my anaconda environment:
WARNING: The conda.compat module is deprecated and will be removed in a future release.
# packages in environment at /Users/rwu1997/anaconda3/envs/taxi_pipeline:
#
# Name Version Build Channel
absl-py 0.7.1 pypi_0 pypi
alembic 0.9.10 pypi_0 pypi
apache-airflow 1.10.3 pypi_0 pypi
apache-beam 2.12.0 pypi_0 pypi
appnope 0.1.0 pypi_0 pypi
astor 0.7.1 pypi_0 pypi
attrs 19.1.0 pypi_0 pypi
avro 1.8.2 pypi_0 pypi
babel 2.6.0 pypi_0 pypi
backports-abc 0.5 pypi_0 pypi
backports-shutil-get-terminal-size 1.0.0 pypi_0 pypi
backports-ssl-match-hostname 3.7.0.1 pypi_0 pypi
backports-weakref 1.0.post1 pypi_0 pypi
bleach 3.1.0 pypi_0 pypi
cachetools 3.1.0 pypi_0 pypi
certifi 2019.3.9 py27_0
chardet 3.0.4 pypi_0 pypi
click 7.0 pypi_0 pypi
colorama 0.4.1 pypi_0 pypi
configparser 3.5.3 pypi_0 pypi
crcmod 1.7 pypi_0 pypi
croniter 0.3.30 pypi_0 pypi
decorator 4.4.0 pypi_0 pypi
defusedxml 0.6.0 pypi_0 pypi
dill 0.2.9 pypi_0 pypi
docker 3.7.2 pypi_0 pypi
docker-pycreds 0.4.0 pypi_0 pypi
docopt 0.6.2 pypi_0 pypi
docutils 0.14 pypi_0 pypi
entrypoints 0.3 pypi_0 pypi
enum34 1.1.6 pypi_0 pypi
fastavro 0.21.22 pypi_0 pypi
fasteners 0.14.1 pypi_0 pypi
flask 1.0.2 pypi_0 pypi
flask-admin 1.5.3 pypi_0 pypi
flask-appbuilder 1.12.3 pypi_0 pypi
flask-babel 0.12.2 pypi_0 pypi
flask-caching 1.3.3 pypi_0 pypi
flask-login 0.4.1 pypi_0 pypi
flask-openid 1.2.5 pypi_0 pypi
flask-sqlalchemy 2.4.0 pypi_0 pypi
flask-swagger 0.2.13 pypi_0 pypi
flask-wtf 0.14.2 pypi_0 pypi
funcsigs 1.0.0 pypi_0 pypi
functools32 3.2.3.post2 pypi_0 pypi
future 0.16.0 pypi_0 pypi
futures 3.2.0 pypi_0 pypi
gast 0.2.2 pypi_0 pypi
gitdb2 2.0.5 pypi_0 pypi
gitpython 2.1.11 pypi_0 pypi
google-api-core 1.10.0 pypi_0 pypi
google-api-python-client 1.7.8 pypi_0 pypi
google-apitools 0.5.26 pypi_0 pypi
google-auth 1.6.3 pypi_0 pypi
google-auth-httplib2 0.0.3 pypi_0 pypi
google-cloud-bigquery 1.6.1 pypi_0 pypi
google-cloud-bigtable 0.31.1 pypi_0 pypi
google-cloud-core 0.28.1 pypi_0 pypi
google-cloud-pubsub 0.39.0 pypi_0 pypi
google-resumable-media 0.3.2 pypi_0 pypi
googleapis-common-protos 1.5.10 pypi_0 pypi
googledatastore 7.0.2 pypi_0 pypi
grpc-google-iam-v1 0.11.4 pypi_0 pypi
grpcio 1.20.1 pypi_0 pypi
gunicorn 19.9.0 pypi_0 pypi
h5py 2.9.0 pypi_0 pypi
hdfs 2.5.2 pypi_0 pypi
httplib2 0.12.3 pypi_0 pypi
idna 2.8 pypi_0 pypi
ipaddress 1.0.22 pypi_0 pypi
ipykernel 4.10.0 pypi_0 pypi
ipython 5.8.0 pypi_0 pypi
ipython-genutils 0.2.0 pypi_0 pypi
ipywidgets 7.4.2 pypi_0 pypi
iso8601 0.1.12 pypi_0 pypi
itsdangerous 1.1.0 pypi_0 pypi
jinja2 2.10 pypi_0 pypi
json-merge-patch 0.2 pypi_0 pypi
jsonschema 3.0.1 pypi_0 pypi
jupyter 1.0.0 pypi_0 pypi
jupyter-client 5.2.4 pypi_0 pypi
jupyter-console 5.2.0 pypi_0 pypi
jupyter-core 4.4.0 pypi_0 pypi
keras-applications 1.0.7 pypi_0 pypi
keras-preprocessing 1.0.9 pypi_0 pypi
libcxx 4.0.1 hcfea43d_1
libcxxabi 4.0.1 hcfea43d_1
libedit 3.1.20181209 hb402a30_0
libffi 3.2.1 h475c297_4
lockfile 0.12.2 pypi_0 pypi
lxml 4.3.3 pypi_0 pypi
mako 1.0.9 pypi_0 pypi
markdown 2.6.11 pypi_0 pypi
markupsafe 1.1.1 pypi_0 pypi
mistune 0.8.4 pypi_0 pypi
ml-metadata 0.13.2 pypi_0 pypi
mock 2.0.0 pypi_0 pypi
monotonic 1.5 pypi_0 pypi
nbconvert 5.5.0 pypi_0 pypi
nbformat 4.4.0 pypi_0 pypi
ncurses 6.1 h0a44026_1
notebook 5.7.8 pypi_0 pypi
numpy 1.16.3 pypi_0 pypi
oauth2client 3.0.0 pypi_0 pypi
ordereddict 1.1 pypi_0 pypi
pandas 0.24.2 pypi_0 pypi
pandocfilters 1.4.2 pypi_0 pypi
pathlib2 2.3.3 pypi_0 pypi
pbr 5.2.0 pypi_0 pypi
pendulum 1.4.4 pypi_0 pypi
pexpect 4.7.0 pypi_0 pypi
pickleshare 0.7.5 pypi_0 pypi
pip 19.1 py27_0
prometheus-client 0.6.0 pypi_0 pypi
prompt-toolkit 1.0.16 pypi_0 pypi
proto-google-cloud-datastore-v1 0.90.4 pypi_0 pypi
protobuf 3.7.1 pypi_0 pypi
psutil 5.6.2 pypi_0 pypi
ptyprocess 0.6.0 pypi_0 pypi
pyarrow 0.11.1 pypi_0 pypi
pyasn1 0.4.5 pypi_0 pypi
pyasn1-modules 0.2.5 pypi_0 pypi
pydot 1.2.4 pypi_0 pypi
pygments 2.3.1 pypi_0 pypi
pyparsing 2.4.0 pypi_0 pypi
pyrsistent 0.15.1 pypi_0 pypi
python 2.7.16 h97142e2_0
python-daemon 2.1.2 pypi_0 pypi
python-dateutil 2.8.0 pypi_0 pypi
python-editor 1.0.4 pypi_0 pypi
python-openid 2.2.5 pypi_0 pypi
pytz 2019.1 pypi_0 pypi
pytzdata 2019.1 pypi_0 pypi
pyvcf 0.6.8 pypi_0 pypi
pyyaml 3.13 pypi_0 pypi
pyzmq 18.0.1 pypi_0 pypi
qtconsole 4.4.4 pypi_0 pypi
readline 7.0 h1de35cc_5
requests 2.21.0 pypi_0 pypi
rsa 4.0 pypi_0 pypi
scandir 1.10.0 pypi_0 pypi
scikit-learn 0.20.3 pypi_0 pypi
scipy 0.19.1 pypi_0 pypi
send2trash 1.5.0 pypi_0 pypi
setproctitle 1.1.10 pypi_0 pypi
setuptools 41.0.1 py27_0
simplegeneric 0.8.1 pypi_0 pypi
singledispatch 3.4.0.3 pypi_0 pypi
six 1.12.0 pypi_0 pypi
smmap2 2.0.5 pypi_0 pypi
sqlalchemy 1.2.19 pypi_0 pypi
sqlite 3.28.0 ha441bb4_0
tabulate 0.8.3 pypi_0 pypi
tenacity 4.12.0 pypi_0 pypi
tensorboard 1.12.2 pypi_0 pypi
tensorflow 1.12.0 pypi_0 pypi
tensorflow-data-validation 0.12.0 pypi_0 pypi
tensorflow-metadata 0.12.1 pypi_0 pypi
tensorflow-model-analysis 0.12.1 pypi_0 pypi
tensorflow-transform 0.12.0 pypi_0 pypi
termcolor 1.1.0 pypi_0 pypi
terminado 0.8.2 pypi_0 pypi
testpath 0.4.2 pypi_0 pypi
text-unidecode 1.2 pypi_0 pypi
tfx 0.12.0 pypi_0 pypi
thrift 0.11.0 pypi_0 pypi
tk 8.6.8 ha441bb4_0
tornado 5.1.1 pypi_0 pypi
traitlets 4.3.2 pypi_0 pypi
typing 3.6.6 pypi_0 pypi
tzlocal 1.5.1 pypi_0 pypi
unicodecsv 0.14.1 pypi_0 pypi
uritemplate 3.0.0 pypi_0 pypi
urllib3 1.24.3 pypi_0 pypi
wcwidth 0.1.7 pypi_0 pypi
webencodings 0.5.1 pypi_0 pypi
websocket-client 0.56.0 pypi_0 pypi
werkzeug 0.14.1 pypi_0 pypi
wheel 0.33.1 py27_0
widgetsnbextension 3.4.2 pypi_0 pypi
wtforms 2.2.1 pypi_0 pypi
zlib 1.2.11 h1de35cc_3
zope-deprecation 4.4.0 pypi_0 pypi
It would be great to have an example of how to run TFX pipelines in Kubeflow with hyperparameters. Right now all the parameters are hard-coded; having them exposed in the Kubeflow Pipelines UI (with defaults) would help with setting up a professional ML pipeline.
It's not entirely clear how to do this so that @PipelineDecorator can pick them up. I presume they need to live in input_dict as kfp.dsl.PipelineParams, but a simple example would make that a lot easier to do.
Can a standard Python .gitignore be added?
This is the one that I usually start off with:
https://github.com/github/gitignore/blob/master/Python.gitignore
I got a KeyError when calling this method: base_component.BaseComponent.__str__
Here is the code to reproduce:
import os
from tfx.utils.dsl_utils import csv_input
from tfx.components.example_gen.csv_example_gen.component import CsvExampleGen
_taxi_root = os.path.join(os.environ['HOME'], 'taxi')
_data_root = os.path.join(_taxi_root, 'data/simple')
examples = csv_input(_data_root)
example_gen = CsvExampleGen(input_base=examples)
print(example_gen)
The error trace is:
/Users/alelevier/Documents/github/tfx/tfx/components/base/base_component.pyc in __str__(self)
89 input_dict=self.input_dict,
90 outputs=self.outputs,
---> 91 exec_properties=self.exec_properties)
92
93 def __repr__(self):
KeyError: '\n component_name'
I looked at the method; it needs to use double {{ and }}. So change from:
def __str__(self):
  return """
{
  component_name: {component_name},
  unique_name: {unique_name},
  driver: {driver},
  executor: {executor},
  input_dict: {input_dict},
  outputs: {outputs},
  exec_properties: {exec_properties}
}
""".format(  # pylint: disable=missing-format-argument-key
    component_name=self.component_name,
    unique_name=self.unique_name,
    driver=self.driver,
    executor=self.executor,
    input_dict=self.input_dict,
    outputs=self.outputs,
    exec_properties=self.exec_properties)
To:
def __str__(self):
  return """
{{
  component_name: {component_name},
  unique_name: {unique_name},
  driver: {driver},
  executor: {executor},
  input_dict: {input_dict},
  outputs: {outputs},
  exec_properties: {exec_properties}
}}
""".format(  # pylint: disable=missing-format-argument-key
    component_name=self.component_name,
    unique_name=self.unique_name,
    driver=self.driver,
    executor=self.executor,
    input_dict=self.input_dict,
    outputs=self.outputs,
    exec_properties=self.exec_properties)
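The root cause can be reproduced without TFX at all: in str.format(), a single { opens a replacement field, so the literal outer braces swallow the newline and make format() look up a key like '\n  name' (hence the KeyError: '\n component_name'); doubling the braces escapes them. A minimal, self-contained illustration:

```python
# Single braces: format() parses '{\n  name: ...' as one replacement field,
# so the lookup key becomes '\n  name' and raises KeyError.
bad = "{\n  name: {name}\n}"
try:
    bad.format(name="CsvExampleGen")
except KeyError as err:
    print("KeyError:", err)

# Doubled braces are emitted literally; only {name} is substituted.
good = "{{\n  name: {name}\n}}"
print(good.format(name="CsvExampleGen"))
```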
I'm trying to run the Chicago Taxi Example on Google Dataflow and I ran into a problem with imports. After setting the PYTHONPATH environment variable so that the Python files run in my environment, the pipeline fails with:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 766, in run
self._load_main_session(self.local_staging_directory)
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 482, in _load_main_session
pickler.load_session(session_file)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 269, in load_session
return dill.load_session(file_path)
File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 410, in load_session
module = unpickler.load()
File "/usr/lib/python2.7/pickle.py", line 864, in load
dispatch[key](self)
File "/usr/lib/python2.7/pickle.py", line 1139, in load_reduce
value = func(*args)
File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 828, in _import_module
return getattr(__import__(module, None, None, [obj]), obj)
ImportError: No module named tfx.examples.chicago_taxi.trainer
I don't know how to remedy this. Is it possible to set an environment variable in a Dataflow pipeline?
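For context, Dataflow workers don't inherit your local PYTHONPATH; Beam instead ships local code via pipeline options. A sketch of the relevant flags (the project id and file paths here are placeholders, not taken from this issue):

```python
# Hypothetical Beam/Dataflow options: --setup_file builds and stages an sdist
# of the local source tree (including tfx.examples.chicago_taxi.trainer) so
# workers can import it when unpickling the main session.
pipeline_args = [
    '--runner=DataflowRunner',
    '--project=my-gcp-project',        # placeholder project id
    '--setup_file=./setup.py',         # package local code for the workers
    # Alternatively, stage a prebuilt package:
    # '--extra_package=dist/trainer-0.1.tar.gz',
]
```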
Hi,
We are looking for a pipeline example around the MNIST, CIFAR-10, or COCO datasets.
Guidance on how to implement one would also do; we would be happy to help.
Thanks.
@thak123 FYI
Hi TFX team,
Amazing work at TF Dev Summit '19. My team has been all in on setting up a TFX pipeline.
We have gotten the Taxi example up and running on Airflow successfully. However, when we run it, it runs the entire pipeline all the way through.
It was our understanding that TFX would not re-run tasks if the underlying component's inputs did not change. For instance, when we hit "Trigger DAG", if the data did not change, CsvExampleGen and StatisticsGen should not run.
Please let us know if we are missing some simple param, or what gap we need filled in, if possible. Happy to submit a PR to the docs if needed as well!
Can it be documented how to run the test suite?
I tried running the tests. I wanted to write a simple test for a code change for #23 to the __str__ method, but trying to run the test suite as-is didn't initially work, so I think I'm doing something wrong. Here are the commands I tried:
(taxi_pipeline) aaron@ ~/Documents/github/tfx (object_detection) $ python -m unittest tfx
----------------------------------------------------------------------
Ran 0 tests in 0.000s
OK
(taxi_pipeline) aaron@ ~/Documents/github/tfx (object_detection) $ python -m unittest tfx/
CONTRIBUTING.md build_docker_image.sh proto/ utils/
__init__.py components/ scripts/ version.py
__init__.pyc orchestration/ tools/ version.pyc
(taxi_pipeline) aaron@ ~/Documents/github/tfx (object_detection) $ python -m unittest tfx.components.example_gen
----------------------------------------------------------------------
Ran 0 tests in 0.000s
OK
(taxi_pipeline) aaron@ ~/Documents/github/tfx (object_detection) $ python tfx/components/example_gen/base_example_gen_executor_test.py
Traceback (most recent call last):
File "tfx/components/example_gen/base_example_gen_executor_test.py", line 24, in <module>
from tfx.components.example_gen import base_example_gen_executor
ImportError: cannot import name base_example_gen_executor
Hello,
I'm running a TFX pipeline in Airflow, with a Transform component. Each time I trigger my DAG, the CsvExampleGen, StatisticsGen, SchemaGen and ExampleValidator components all use their cached outputs since no changes occurred; each one takes ~20s on my machine. On the other hand, the Transform component recomputes its outputs every time (taking ~10min) even though nothing changed.
My preprocessing_fn only calculates z_score on dense float features (plus filling in missing values).
I also have problems with the Evaluator component: it uses its cached results even when the Trainer outputs are different. I am forced to manually delete the Evaluator output folder before running my DAG.
Is this normal behavior?
Thanks
Hi,
Currently the pipeline supports inputs only from CSV and BigQuery.
Is there a chance to add support for Parquet files as well?
Perhaps even a tutorial on how to build a custom ExampleGen (component, driver, executor).
Thanks!
Is TFX on Kubeflow Pipelines strictly tied to GCS access?
I have been working with a very simple pipeline that loads the iris dataset and generates some statistics about it. But when I run the pipeline, even without specifying a train-eval split anywhere, eval and train folders are created under the CsvExampleGen pipeline folder, with TFRecords inside, and an apparently predefined split is applied (around 100 training examples and 50 evaluation examples).
My question is: where can I opt in or out of the split, and where can I set its size?
Pipeline code below, run in Airflow:
import os
import logging
import datetime
from tfx.orchestration.airflow.airflow_runner import AirflowDAGRunner
from tfx.orchestration.pipeline import PipelineDecorator
from tfx.utils.dsl_utils import csv_input
from tfx.components.example_gen.csv_example_gen.component import CsvExampleGen
from tfx.components.statistics_gen.component import StatisticsGen
from tfx.orchestration.tfx_runner import TfxRunner
_CASE_FOLDER = os.path.join(os.environ['HOME'], 'cases', 'iris')
_DATA_FOLDER = os.path.join(_CASE_FOLDER, 'data')
_PIPELINE_ROOT_FOLDER = os.path.join(_CASE_FOLDER, 'pipelines')
_METADATA_DB_ROOT_FOLDER = os.path.join(_CASE_FOLDER, 'metadata')
_LOG_ROOT_FOLDER = os.path.join(_CASE_FOLDER, 'logs')
@PipelineDecorator(
    pipeline_name='test_tfx_pipeline_iris',
    pipeline_root=_PIPELINE_ROOT_FOLDER,
    metadata_db_root=_METADATA_DB_ROOT_FOLDER,
    additional_pipeline_args={'logger_args': {
        'log_root': _LOG_ROOT_FOLDER,
        'log_level': logging.INFO
    }}
)
def create_pipeline():
  print("HELLO")
  examples = csv_input(_DATA_FOLDER)
  # ingests the CSV examples and returns tf.Example records
  example_gen = CsvExampleGen(input_base=examples, name='iris_example_gen_1')
  statistics_gen = StatisticsGen(input_data=example_gen.outputs.examples)
  return [example_gen, statistics_gen]
_airflow_config = {
    'schedule_interval': None,
    'start_date': datetime.datetime(2019, 1, 1),
}
pipeline = AirflowDAGRunner(_airflow_config).run(create_pipeline())
The folder structure being generated:
Hello,
In my Airflow pipeline, I can't set num_steps in TrainArgs to None. According to the doc of TrainSpec, max_steps can be set to None (useful for training exactly one epoch)
max_steps: Int. Positive number of total steps for which to train model. If None, train forever.
trainer = Trainer(
    module_file=_module_file,
    transformed_examples=transform.outputs.transformed_examples,
    schema=infer_schema.outputs.output,
    transform_output=transform.outputs.transform_output,
    train_args=trainer_pb2.TrainArgs(num_steps=None),
    eval_args=trainer_pb2.EvalArgs(num_steps=None))
At execution I get "ERROR - Must specify max_steps > 0, given: 0".
In my utils.py, the trainer_fn takes hparams as a parameter, populated from the TrainArgs and EvalArgs above, but inside trainer_fn hparams.train_steps is set to 0, not None.
The problem seems to come from tfx/proto/trainer_pb2.py:
_TRAINARGS = _descriptor.Descriptor( name='TrainArgs', ... has_default_value=False, default_value=0, ...
However, changing the default_value to None didn't seem to help my case. I hard-coded None in my trainer_fn for the moment. Can you allow us to pass None?
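A note on why editing the descriptor can't work: proto3 scalar fields have no notion of None, so an unset int64 always reads back as 0. Pending TFX support, one workaround is a sentinel mapping inside trainer_fn (a sketch; steps_or_none is a made-up helper name):

```python
def steps_or_none(num_steps):
    """Treat the proto3 default 0 as 'unset', i.e. train until the input ends."""
    return num_steps if num_steps > 0 else None

train_steps = steps_or_none(0)     # None: train forever / exactly one epoch
eval_steps = steps_or_none(5000)   # 5000
```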
I'm using TFX 0.12 and TF 1.12.0
Thanks