googlecloudplatform / training-data-analyst

Labs and demos for courses for GCP Training (http://cloud.google.com/training).

License: Apache License 2.0

Shell 0.28% HTML 0.86% Python 6.82% Jupyter Notebook 89.25% Java 0.90% R 0.01% PigLatin 0.01% JavaScript 1.46% C++ 0.01% Dockerfile 0.03% Jsonnet 0.01% Makefile 0.01% C 0.01% CSS 0.02% HCL 0.03% C# 0.01% Go 0.03% Pug 0.17% Jinja 0.09% Scala 0.01%


training-data-analyst's Issues

Below error while installing packages PYTHON

google-cloud-storage 1.13.2 has requirement google-cloud-core<0.30dev,>=0.29.0, but you'll have google-cloud-core 0.25.0 which is incompatible.
google-gax 0.15.16 has requirement future<0.17dev,>=0.16.0, but you'll have future 0.17.1 which is incompatible.
apache-beam 2.5.0 has requirement httplib2<0.10,>=0.8, but you'll have httplib2 0.12.0 which is incompatible.
google-cloud-logging 1.9.1 has requirement google-cloud-core<0.30dev,>=0.29.0, but you'll have google-cloud-core 0.25.0 which is incompatible.
google-cloud-spanner 1.7.1 has requirement google-cloud-core<0.30dev,>=0.29.0, but you'll have google-cloud-core 0.25.0 which is incompatible.

Installing collected packages: six
Found existing installation: six 1.10.0
Uninstalling six-1.10.0:
Successfully uninstalled six-1.10.0
Successfully installed six-1.10.0

This happens while installing the packages below, if I am not mistaken:

$ cat install_packages.sh
#!/bin/bash
apt-get install python-pip
pip install google-cloud-dataflow oauth2client==3.0.0
pip install --force six==1.10 # downgrade as 1.11 breaks apitools
pip install -U pip

Use train_and_evaluate() rather than learn_runner

For example:

update 08_image/mnistmodel/trainer/task.py to use model.train_and_evaluate() instead of using learn_runner

Production ML models

Hi,
When can we have contents for 6. Production ML models ?

Thanks

ImportError: cannot import name pywrap_tensorflow - deepdive/07_structured/4_preproc_tft.ipynb

I got this error while running through this codelab

Provide a Feature-Engineering example with complex operation

Hi Team,

For the feature engineering function section can you please provide an example with a complex calculation.

Example: We want to generate a new feature as division of two columns only if a third column has value = "Y" else set value for that row as -1.

In pandas this is easy to do with the .apply() function, but how should it be done in a TensorFlow pipeline?

I tried using tf.where and tf.cond, but they do not seem to work properly in pipelines for me.
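
One way to read the requirement above is as an element-wise select: for each row, take the ratio when the flag column equals "Y", otherwise -1. The plain-Python sketch below only illustrates that row-wise logic. The argument names (numerator, denominator, flag) and the zero-denominator guard are assumptions made for illustration, not part of the course material. tf.where performs the same kind of element-wise selection over tensors, whereas tf.cond picks a branch from a single scalar predicate, which may be why it did not behave as expected on batched columns.

```python
def conditional_ratio(numerator, denominator, flag, default=-1.0):
    """Row-wise: numerator / denominator where flag == 'Y', else default.

    Rows with a zero denominator also fall back to the default (an assumption
    made here so that the sketch never divides by zero).
    """
    result = []
    for num, den, f in zip(numerator, denominator, flag):
        if f == 'Y' and den != 0:
            result.append(num / float(den))
        else:
            result.append(default)
    return result


# Three rows: flag 'Y', flag 'N', and flag 'Y' with a zero denominator.
print(conditional_ratio([10.0, 8.0, 3.0], [2.0, 4.0, 0.0], ['Y', 'N', 'Y']))
```

In a tensor version, the guard matters for the same reason: both branches are computed for every row before the selection is applied, so the division needs a safe denominator even on rows that end up with -1.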

How to create currConditionsTable based on sensorId dynamically?

Dear Sir,

I am executing streaming process job using https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/streaming/process/sandiego/src/main/java/com/google/cloud/training/dataanalyst/sandiego/CurrentConditions.java.

I would like to create a BigQuery table dynamically per sensorId, with the sensorId as the table name, as shown below. I am not taking the project name from the options; it is hard-coded.

String sensorId = info.getSensorKey();
String currConditionsTable =  "brss-711:demos." + sensorId ;

The table name is defined in the main function. Since I want to create a new table per sensorId, I need to call the above statements from the "ToBQRow" function.

Even though I made "currConditionsTable" a global variable, it does not work and I get a NullPointerException, since the variable contains null.

Please help me to resolve the issue.

Regards,
Kiran.

response: { message: 'publisher is not defined', internalCode: undefined } }

I'm following the codelab instructions to publish to a Pub/Sub topic as shown below, but an error is returned:
response: { message: 'publisher is not defined', internalCode: undefined } }

at next (/home/google2145703_student/training-data-analyst/courses/developingapps/nodejs/pubsub-languageapi-spanner/start/node_modules/express/lib/router/index.js:275:10

Code:
// Handler for feedback POSTed from the client app
router.post('/feedback/:quiz', (req, res, next) => {
  const feedback = req.body;
  // TODO: Publish the message into Cloud Pub/Sub
  publisher.publishFeedback(feedback).then(() => {
    // TODO: Move the statement that returns a message to
    // the client app here
    res.json('Feedback received');

    // END TODO

  // TODO: Add a catch
  }).catch(err => {
    // TODO: There was an error, invoke the next middleware
    next(err);

    // END TODO

  });

  // END TODO
});

Problem creating the datalab compute engine.

When I execute the script in this location:
training-data-analyst/datalab/cloudshell/create_vm.sh
I get the following error:
ERROR: (gcloud.compute.instances.create) Could not fetch resource:

The resource 'projects/google-containers/global/images/family/container-vm' was not found

During the stop of tensorboard, the cell will stack trace if no running tensorboard instances

Here is a fix

Stop tensorboard instances

pids_df = TensorBoard.list()
if not pids_df.empty:
  for pid in pids_df['pid']:
    TensorBoard().stop(pid)
    print('Stopped TensorBoard with pid {}'.format(pid))

Use --job-dir and not --output-dir

In Cloud ML models, use the supplied --job-dir as the output dir. This avoids the need to do code like this:

output_dir = os.path.join(
    output_dir,
    json.loads(
        os.environ.get('TF_CONFIG', '{}')
    ).get('task', {}).get('trial', '')
)

(it's already done for --job-dir)
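
A minimal sketch of what the suggestion could look like in a trainer's task.py, assuming the trainer parses its own arguments with argparse. The function name parse_output_dir is hypothetical. The value of --job-dir is used as the output directory directly, with no TF_CONFIG handling in user code.

```python
import argparse


def parse_output_dir(argv):
    """Return the output directory taken from the --job-dir argument."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--job-dir',
        dest='job_dir',
        required=True,
        help='Location to write checkpoints and export models',
    )
    # Other trainer flags are ignored here; a real task.py would declare them.
    args, _unknown = parser.parse_known_args(argv)
    return args.job_dir


output_dir = parse_output_dir(['--job-dir', 'gs://some-bucket/trained', '--train_steps', '10'])
print(output_dir)
```

The bucket path and the extra --train_steps flag are placeholders for illustration.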

--delete--

Error while running python transform.py

While running python transform.py in SSH, getting the below error:

Traceback (most recent call last):
  File "transform.py", line 11, in <module>
    import urllib.request, urllib.error, urllib.parse
ImportError: No module named request

Please help.
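
This is the message a Python 2 interpreter prints for that import, because the urllib.request, urllib.error and urllib.parse submodules exist only in Python 3. Running the script with python3 is probably the simplest fix. If the script has to run under both versions, a guarded import is one option. The sketch below shows it, with host_of as a hypothetical helper used only to exercise the import.

```python
try:
    # Python 3 module layout
    from urllib.request import urlopen
    from urllib.parse import urlparse
except ImportError:
    # Python 2 fallback: the same names live in urllib2 and urlparse
    from urllib2 import urlopen
    from urlparse import urlparse


def host_of(url):
    """Return the network location part of a URL."""
    return urlparse(url).netloc


print(host_of('https://example.com/data.csv'))
```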

Hidden units is string; needs to be parsed to list of ints

This line in lab and in solution

should have:

arguments['hidden_units'] = [int(v) for v in arguments['hidden_units'].split(' ')]
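
A small self-contained sketch of the same parsing, wrapped in a hypothetical helper named parse_hidden_units. It uses split() without an argument rather than split(' '), so repeated spaces do not produce empty strings that int() would reject. That is a design choice of this sketch, not something the lab prescribes.

```python
def parse_hidden_units(value):
    """Turn a space-separated string such as '128 32 4' into a list of ints."""
    return [int(v) for v in value.split()]


# The flag arrives as a single string from the command line.
arguments = {'hidden_units': '128 32 4'}
arguments['hidden_units'] = parse_hidden_units(arguments['hidden_units'])
print(arguments['hidden_units'])
```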

stacktrace during execution of deepdive/04_features/taxifare/feateng.ipynb

I'm not sure if this is related to new changes released. This code used to work.

python 2.
All cells cleared and restarted. Everything runs until this cell:

preprocess(50*100, 'DataflowRunner') 
#change first arg to None to preprocess full dataset

Result is this stack trace:

Launching Dataflow job preprocess-taxifeatures-180901-165026 ... hang on
/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/coders/typecoders.py:135: UserWarning: Using fallback coder for typehint: Any.
  warnings.warn('Using fallback coder for typehint: %r.' % typehint)

CalledProcessErrorTraceback (most recent call last)
<ipython-input-10-b4775e416971> in <module>()
----> 1 preprocess(50*100, 'DataflowRunner')
      2 #change first arg to None to preprocess full dataset

<ipython-input-8-8419c1762ff8> in preprocess(EVERY_N, RUNNER)
     53     )
     54 
---> 55   p.run()

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/pipeline.pyc in run(self, test_runner_api)
    174       finally:
    175         shutil.rmtree(tmpdir)
--> 176     return self.runner.run(self)
    177 
    178   def __enter__(self):

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.pyc in run(self, pipeline)
    250     # Create the job
    251     result = DataflowPipelineResult(
--> 252         self.dataflow_client.create_job(self.job), self)
    253 
    254     self._metrics = DataflowMetrics(self.dataflow_client, result, self.job)

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/utils/retry.pyc in wrapper(*args, **kwargs)
    166       while True:
    167         try:
--> 168           return fun(*args, **kwargs)
    169         except Exception as exn:  # pylint: disable=broad-except
    170           if not retry_filter(exn):

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.pyc in create_job(self, job)
    423   def create_job(self, job):
    424     """Creates job description. May stage and/or submit for remote execution."""
--> 425     self.create_job_description(job)
    426 
    427     # Stage and submit the job when necessary

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.pyc in create_job_description(self, job)
    446     """Creates a job described by the workflow proto."""
    447     resources = dependency.stage_job_resources(
--> 448         job.options, file_copy=self._gcs_file_copy)
    449     job.proto.environment = Environment(
    450         packages=resources, options=job.options,

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/runners/dataflow/internal/dependency.pyc in stage_job_resources(options, file_copy, build_setup_args, temp_dir, populate_requirements_cache)
    377       else:
    378         sdk_remote_location = setup_options.sdk_location
--> 379       _stage_beam_sdk_tarball(sdk_remote_location, staged_path, temp_dir)
    380       resources.append(names.DATAFLOW_SDK_TARBALL_FILE)
    381     else:

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/runners/dataflow/internal/dependency.pyc in _stage_beam_sdk_tarball(sdk_remote_location, staged_path, temp_dir)
    462   elif sdk_remote_location == 'pypi':
    463     logging.info('Staging the SDK tarball from PyPI to %s', staged_path)
--> 464     _dependency_file_copy(_download_pypi_sdk_package(temp_dir), staged_path)
    465   else:
    466     raise RuntimeError(

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/runners/dataflow/internal/dependency.pyc in _download_pypi_sdk_package(temp_dir)
    525       '--no-binary', ':all:', '--no-deps']
    526   logging.info('Executing command: %s', cmd_args)
--> 527   processes.check_call(cmd_args)
    528   zip_expected = os.path.join(
    529       temp_dir, '%s-%s.zip' % (package_name, version))

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/utils/processes.pyc in check_call(*args, **kwargs)
     42   if force_shell:
     43     kwargs['shell'] = True
---> 44   return subprocess.check_call(*args, **kwargs)
     45 
     46 

/usr/local/envs/py2env/lib/python2.7/subprocess.pyc in check_call(*popenargs, **kwargs)
    188         if cmd is None:
    189             cmd = popenargs[0]
--> 190         raise CalledProcessError(retcode, cmd)
    191     return 0
    192 

CalledProcessError: Command '['/usr/local/envs/py2env/bin/python', '-m', 'pip', 'install', '--download', '/tmp/tmp6JRn77', 'google-cloud-dataflow==2.0.0', '--no-binary', ':all:', '--no-deps']' returned non-zero exit status 2

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/io/gcp/gcsio.py:113: DeprecationWarning: object() takes no parameters
  super(GcsIO, cls).__new__(cls, storage_client))
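
The trace ends in subprocess.check_call, which reports only the exit status (2 here), so whatever pip itself printed is not visible. A sketch of how the child process output could be captured to see the underlying message follows. run_and_capture is a hypothetical helper, and the child command is a stand-in that exits with status 2, not the real pip invocation.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return (exit_status, combined stdout/stderr as text)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output, _ = proc.communicate()
    return proc.returncode, output.decode('utf-8', 'replace')


# Stand-in child process: prints a message and exits with status 2.
status, text = run_and_capture(
    [sys.executable, '-c', "import sys; print('no such option'); sys.exit(2)"]
)
print(status)
print(text)
```

Re-running the exact command from the CalledProcessError message in a terminal should surface the same information without any code.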

Python3 Support

It seems that the code in some of the projects does not support Python 3.

For example in devenv project server.py starts as follows:

from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer

This import will not work in Python 3, as BaseHTTPRequestHandler and HTTPServer have been moved to the http.server module.

Also the output stream for response must be written as bytes.

A fix for both Python2/3 compatibility for import is

try:
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
except ImportError:
    from http.server import BaseHTTPRequestHandler, HTTPServer

and for response output stream is

self.wfile.write(b'Hello GCP dev!')

Dual imports of different bigquery libraries as bq can create confusion

In courses/machine_learning/deepdive/02_generalization/create_datasets.ipynb there are 2 imports. One at the top

import google.datalab.bigquery as bq

and second at the last cell

import datalab.bigquery as bq

The problem is that the first one uses standard SQL by default and the second one uses legacy SQL by default. If someone runs through the whole notebook and then tries to re-run earlier queries, they fail with errors about enabling standard SQL.

Migrate to Python 3

Change scripts and notebooks in this repo from Python 2 to Python 3. You can do this course-by-course & submit pull-requests.

How to predict in feateng.ipynb from csv data

I used training-data-analyst/courses/machine_learning/feateng/feateng.ipynb
with the Kaggle NYC taxi fare dataset on Colab. The issue I encountered is how to predict with the trained model. Since I am not using GCP directly, I read the test data and call predict, but the predictions were all empty. Can you please let me know how to perform prediction?

CSV_COLUMNS = 'key,fare_amount,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude,passenger_count,dayofweek,hourofday'.split(',')
LABEL_COLUMN = 'fare_amount'

def pandas_test_input_fn(df):
    return tf.estimator.inputs.pandas_input_fn(
        x=df,
        y=None,
        batch_size=512,
        num_epochs=1,
        shuffle=False,
        queue_capacity=1000
    )
df_valid2 = pd.read_csv('mydata/valid.csv', header = None, names = CSV_COLUMNS)
predictions = estimator.predict(input_fn = pandas_test_input_fn(df_valid2))

It complains "ValueError: Feature euclidean is not in features dictionary." since that feature comes from add_engineered. But I am confused about how to apply add_engineered when I feed the data through pandas.
It seems I need to define tf.estimator.ModeKeys.EVAL, but how?
Thanks
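
The error suggests the model expects features that add_engineered produces, so one possible approach is to wrap the prediction input function so that add_engineered runs on its output. The sketch below shows only that wrapping pattern in plain Python. with_engineered and toy_input_fn are hypothetical, and this add_engineered is a stand-in working on floats rather than the notebook's tensor version.

```python
import math


def add_engineered(features):
    """Stand-in for the notebook's function: adds a 'euclidean' feature."""
    features = dict(features)
    latdiff = features['pickup_latitude'] - features['dropoff_latitude']
    londiff = features['pickup_longitude'] - features['dropoff_longitude']
    features['euclidean'] = math.sqrt(latdiff * latdiff + londiff * londiff)
    return features


def with_engineered(input_fn):
    """Wrap input_fn so its features pass through add_engineered."""
    def wrapped():
        result = input_fn()
        if isinstance(result, tuple):
            # Training/eval style: (features, labels)
            features, labels = result
            return add_engineered(features), labels
        # Prediction style: features only
        return add_engineered(result)
    return wrapped


def toy_input_fn():
    # Hypothetical single example with plain floats instead of tensors.
    return {'pickup_latitude': 3.0, 'dropoff_latitude': 0.0,
            'pickup_longitude': 4.0, 'dropoff_longitude': 0.0}


print(with_engineered(toy_input_fn)()['euclidean'])
```

Applied to the code above, the call would presumably become estimator.predict(input_fn=with_engineered(pandas_test_input_fn(df_valid2))), using the notebook's own add_engineered. This has not been verified against the notebook.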

Error while running Beam on Dataflow in feateng.ipynb

Hello - I'm getting an error when running the code in "Run Beam pipeline on Cloud Dataflow" section of the "feateng" notebook.

Command:
preprocess(50*100, 'DataflowRunner')

Stacktrace:

Launching Dataflow job preprocess-taxifeatures-181109-182408 ... hang on

ContextualVersionConflictTraceback (most recent call last)
<ipython-input-14-b4775e416971> in <module>()
----> 1 preprocess(50*100, 'DataflowRunner')
      2 #change first arg to None to preprocess full dataset

<ipython-input-8-0ab357cc98ce> in preprocess(EVERY_N, RUNNER)
     50           p | 'read_{}'.format(phase) >> beam.io.Read(beam.io.BigQuerySource(query=query))
     51             | 'tocsv_{}'.format(phase) >> beam.Map(to_csv)
---> 52             | 'write_{}'.format(phase) >> beam.io.Write(beam.io.WriteToText(outfile))
     53         )
     54   print("Done")

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/pipeline.pyc in __exit__(self, exc_type, exc_val, exc_tb)
    421   def __exit__(self, exc_type, exc_val, exc_tb):
    422     if not exc_type:
--> 423       self.run().wait_until_finish()
    424 
    425   def visit(self, visitor):

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/pipeline.pyc in run(self, test_runner_api)
    401     if test_runner_api and self._verify_runner_api_compatible():
    402       return Pipeline.from_runner_api(
--> 403           self.to_runner_api(), self.runner, self._options).run(False)
    404 
    405     if self._options.view_as(TypeOptions).runtime_type_check:

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/pipeline.pyc in run(self, test_runner_api)
    414       finally:
    415         shutil.rmtree(tmpdir)
--> 416     return self.runner.run_pipeline(self)
    417 
    418   def __enter__(self):

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.pyc in run_pipeline(self, pipeline)
    387     # raise an exception.
    388     result = DataflowPipelineResult(
--> 389         self.dataflow_client.create_job(self.job), self)
    390 
    391     # TODO(BEAM-4274): Circular import runners-metrics. Requires refactoring.

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/utils/retry.pyc in wrapper(*args, **kwargs)
    182       while True:
    183         try:
--> 184           return fun(*args, **kwargs)
    185         except Exception as exn:  # pylint: disable=broad-except
    186           if not retry_filter(exn):

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.pyc in create_job(self, job)
    488   def create_job(self, job):
    489     """Creates job description. May stage and/or submit for remote execution."""
--> 490     self.create_job_description(job)
    491 
    492     # Stage and submit the job when necessary

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.pyc in create_job_description(self, job)
    517 
    518     # Stage other resources for the SDK harness
--> 519     resources = self._stage_resources(job.options)
    520 
    521     job.proto.environment = Environment(

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.pyc in _stage_resources(self, options)
    450         options,
    451         temp_dir=tempfile.mkdtemp(),
--> 452         staging_location=google_cloud_options.staging_location)
    453     return resources
    454 

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/runners/portability/stager.pyc in stage_job_resources(self, options, build_setup_args, temp_dir, populate_requirements_cache, staging_location)
    221         resources.extend(
    222             self._stage_beam_sdk(sdk_remote_location, staging_location,
--> 223                                  temp_dir))
    224       elif setup_options.sdk_location == 'container':
    225         # Use the SDK that's built into the container, rather than re-staging

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/runners/portability/stager.pyc in _stage_beam_sdk(self, sdk_remote_location, staging_location, temp_dir)
    464       """
    465     if sdk_remote_location == 'pypi':
--> 466       sdk_local_file = Stager._download_pypi_sdk_package(temp_dir)
    467       sdk_sources_staged_name = Stager.\
    468           _desired_sdk_filename_in_staging_location(sdk_local_file)

/usr/local/envs/py2env/lib/python2.7/site-packages/apache_beam/runners/portability/stager.pyc in _download_pypi_sdk_package(temp_dir, fetch_binary, language_version_tag, language_implementation_tag, abi_tag, platform_tag)
    513     package_name = Stager.get_sdk_package_name()
    514     try:
--> 515       version = pkg_resources.get_distribution(package_name).version
    516     except pkg_resources.DistributionNotFound:
    517       raise RuntimeError('Please set --sdk_location command-line option '

/usr/local/envs/py2env/lib/python2.7/site-packages/pkg_resources/__init__.pyc in get_distribution(dist)
    469         dist = Requirement.parse(dist)
    470     if isinstance(dist, Requirement):
--> 471         dist = get_provider(dist)
    472     if not isinstance(dist, Distribution):
    473         raise TypeError("Expected string, Requirement, or Distribution", dist)

/usr/local/envs/py2env/lib/python2.7/site-packages/pkg_resources/__init__.pyc in get_provider(moduleOrReq)
    345     """Return an IResourceProvider for the named module or requirement"""
    346     if isinstance(moduleOrReq, Requirement):
--> 347         return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
    348     try:
    349         module = sys.modules[moduleOrReq]

/usr/local/envs/py2env/lib/python2.7/site-packages/pkg_resources/__init__.pyc in require(self, *requirements)
    889         included, even if they were already activated in this working set.
    890         """
--> 891         needed = self.resolve(parse_requirements(requirements))
    892 
    893         for dist in needed:

/usr/local/envs/py2env/lib/python2.7/site-packages/pkg_resources/__init__.pyc in resolve(self, requirements, env, installer, replace_conflicting, extras)
    780                 # Oops, the "best" so far conflicts with a dependency
    781                 dependent_req = required_by[req]
--> 782                 raise VersionConflict(dist, req).with_context(dependent_req)
    783 
    784             # push the new requirements onto the stack

ContextualVersionConflict: (pytz 2016.7 (/usr/local/envs/py2env/lib/python2.7/site-packages), Requirement.parse('pytz<=2018.4,>=2018.3'), set(['apache-beam']))
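
The last line reports that pytz 2016.7 is installed while apache-beam requires a version between 2018.3 and 2018.4 inclusive. Installing a pytz release in that range into py2env and restarting the kernel would presumably clear the conflict, though that is not verified here. The sketch below reproduces the comparison for simple dotted numeric versions. in_range and version_tuple are hypothetical helpers, and real requirement parsing handles many more cases.

```python
def version_tuple(version):
    """Convert a dotted numeric version such as '2018.4' to a tuple of ints."""
    return tuple(int(part) for part in version.split('.'))


def in_range(installed, lower, upper):
    """True if lower <= installed <= upper, comparing numerically per component."""
    return version_tuple(lower) <= version_tuple(installed) <= version_tuple(upper)


# Installed version from the error message versus the required range.
print(in_range('2016.7', '2018.3', '2018.4'))
# A version inside the required range.
print(in_range('2018.4', '2018.3', '2018.4'))
```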

Simplify REST access to deployed ML service

Avoid the use of Discovery API and directly hit the ML end-point (since it is documented and won't change)

c9228cb

The resulting code is simpler and easier to understand
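
As a rough illustration of calling the endpoint directly, the sketch below only builds the request URL and JSON body; it sends nothing. The URL shape is an assumption based on the documented v1 predict method, and the project, model and version names are placeholders. A real call would also need an OAuth access token in the Authorization header.

```python
import json


def build_predict_request(project, model, version, instances):
    """Return (url, json_body) for an online prediction request."""
    # Assumed URL shape for the v1 predict method of the ML service.
    url = 'https://ml.googleapis.com/v1/projects/{}/models/{}/versions/{}:predict'.format(
        project, model, version)
    body = json.dumps({'instances': instances})
    return url, body


url, body = build_predict_request(
    'my-project', 'my-model', 'v1',
    [{'feature_a': 1.0, 'feature_b': 'x'}],
)
print(url)
print(body)
```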

cffi.error.VerificationError (undefined symbol: SSLv2_client_method)

Hi,
Running the first block of feateng.ipynb notebook results in cffi.error.VerificationError so the rest doesn't work as well. The same notebook worked fine yesterday.

Isn't this supposed to be a TODO (also the next line)?

training-data-analyst/courses/machine_learning/deepdive/06_structured/serving/application/main.py

Line 32 in 86692d3

credentials = GoogleCredentials.get_application_default()

Hey @lakshmanok
the course is nice, but there is very little chance to actively learn in most labs, as usually one just clicks through pre-written code. In this case, this example is supposed to be a TODO according to the lab sheet on Qwiklabs...

Updated the CLA with correct details but still the checks are failing!

Originally posted by @lokeshsoni in #330

Error in serving lab 2 with run_dataflow.sh

In the "Prod ML Systems Lab 2 : Serving ML Predictions in batch and real-time" lab, it says:

Step 2

Back in your Cloud Shell, modify the script run_dataflow.sh to get Project Id (using --project) from command line arguments, and then run as follows:

cd ~/training-data-analyst/courses/machine_learning/deepdive/06_structured/labs/serving
./run_dataflow.sh

However, I can already see this set here: https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/deepdive/06_structured/labs/serving/run_dataflow.sh#L11

I then get this Java error running the script:

[WARNING]
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions)
        at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:233)
        at org.apache.beam.sdk.util.InstanceBuilder.build(InstanceBuilder.java:162)
        at org.apache.beam.sdk.PipelineRunner.fromOptions(PipelineRunner.java:55)
        at org.apache.beam.sdk.Pipeline.create(Pipeline.java:150)
        at com.google.cloud.training.mlongcp.AddPrediction.main(AddPrediction.java:69)
        ... 6 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:222)
        ... 10 more
Caused by: java.lang.NoSuchMethodError: com.google.api.client.googleapis.services.json.AbstractGoogleJsonClient$Builder.setBatchPath(Ljava/lang/String;)Lcom/google/api/client/googleapis/services/AbstractGoogleClient$Builder;
        at com.google.api.services.cloudresourcemanager.CloudResourceManager$Builder.setBatchPath(CloudResourceManager.java:5929)
        at com.google.api.services.cloudresourcemanager.CloudResourceManager$Builder.<init>(CloudResourceManager.java:5908)
        at org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.newCloudResourceManagerClient(GcpOptions.java:370)
        at org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:240)
        at org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:228)
        at org.apache.beam.sdk.options.ProxyInvocationHandler.returnDefaultHelper(ProxyInvocationHandler.java:592)
        at org.apache.beam.sdk.options.ProxyInvocationHandler.getDefault(ProxyInvocationHandler.java:533)
        at org.apache.beam.sdk.options.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:155)
        at com.sun.proxy.$Proxy37.getGcpTempLocation(Unknown Source)
        at org.apache.beam.runners.dataflow.DataflowRunner.fromOptions(DataflowRunner.java:240)
        ... 15 more
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 8.330 s
[INFO] Finished at: 2018-10-14T14:31:16+01:00
[INFO] Final Memory: 26M/62M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.4.0:java (default-cli) on project pipeline: An exception occured while executing the Java class. null: InvocationTargetException: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions): com.google.api.client.googleapis.services.json.AbstractGoogleJsonClient$Builder.setBatchPath(Ljava/lang/String;)Lcom/google/api/client/googleapis/services/AbstractGoogleClient$Builder; -> [Help 1]

Permission denied on resource project

When running b_hyperparam.ipynb from
datalab/notebooks/training-data-analyst/courses/machine_learning/deepdive/05_artandscience/labs
or
datalab/notebooks/training-data-analyst/courses/machine_learning/deepdive/05_artandscience/

I keep on getting:
Removing gs://qwiklabs-gcp-0c3e9ec8e4427080/house_trained/packages/0a26bfb7f0d02a513fe9c410c169dd06a3a4a5c1f6fdd14ca96cc66ac17fdf17/trainer-0.0.0.tar.gz#1545415738922023...
/ [1 objects]
Operation completed over 1 objects.
ERROR: (gcloud.ml-engine.jobs.submit.training) User [[email protected]] does not have permission to access project [qwiklabs-gcp-0c3e9ec8e4427080s] (or it may not exist): Permission denied on resource project qwiklabs-gcp-0c3e9ec8e4427080s.

'@type': type.googleapis.com/google.rpc.Help
links:
- description: Google developers console
  url: https://console.developers.google.com

gcloud SDK, a better solution reproducible environmental variables?

I've been doing the

Machine Learning with TensorFlow on Google Cloud Platform

course on Coursera. While working through the labs, I have noticed that the strategies for configuring the gcloud SDK are not very robust. Perhaps that is because they are intended to be run on GCP in Datalab, but I prefer running them on my own computer or VMs: Datalab itself has shown an unresponsive UI, which may be caused by poor network latency or by my persistent use of Firefox.

Anyhow, there does not seem to be a place in the documentation with an advised way of automating the setup of a GCP config, and I have broken quite a few gcloud configurations by running scripts like the one here. It changes the project ID, bucket and region in my currently active config. These configurations are proving quite tedious to keep an eye on.

I know Terraform and other DevOps tools offer partial solutions, but this really feels like something that should be native. Does anyone have suggestions on how we could improve the scripts used for setting up GCP environment variables so that, instead of overwriting existing configs, they use a temporary one that belongs exclusively to the script?

Perhaps it is possible to set all of these variables with the Python API and avoid changing any of the configs that are used for bash calls.
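
One possible direction, sketched under the assumption that gcloud reads property overrides from environment variables named CLOUDSDK_<SECTION>_<PROPERTY>, is to pass the settings only to the child process that a script launches, leaving the saved configuration untouched. The helper name gcloud_env and the project and region values are hypothetical.

```python
import os


def gcloud_env(project, region, base_env=None):
    """Return a copy of the environment with per-process gcloud overrides."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumed override variables for the core/project and compute/region properties.
    env['CLOUDSDK_CORE_PROJECT'] = project
    env['CLOUDSDK_COMPUTE_REGION'] = region
    return env


base = {'PATH': '/usr/bin'}
env = gcloud_env('my-lab-project', 'us-central1', base_env=base)
print(env['CLOUDSDK_CORE_PROJECT'], env['CLOUDSDK_COMPUTE_REGION'])
```

A script could then call something like subprocess.check_call(['gcloud', 'compute', 'instances', 'list'], env=env), so the overrides last only for that call.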

San Diego Traffic Example: What is the role of LaneInfo.Java? help wanted

Hi,
I am currently learning GCP, and I've been following some of the examples in the codelabs. To be more specific, I've been studying the San Diego traffic example. I don't quite understand the role of the file LaneInfo.java. It seems that this file defines the input variables as strings, and then CurrentConditions.java and AverageSpeeds.java use those variable definitions? As part of my learning, I am trying to replicate the same process using the Chicago Traffic dataset, but I keep running into issues when running AverageSpeeds.java and LaneInfo.java. Any insight would be helpful. I am still very new to GCP and Java/Apache Beam in general.

datalab/cloudshell doesn't exist

The coursera video is referencing a folder that doesn't exist

No module named sklearn_crfsuite.estimator

I am using the sklearn_crfsuite estimator:

crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',
    c1=0.1,
    c2=0.1,
    max_iterations=2,
    all_possible_transitions=True
)

I'm saving the model as described below:

model = 'model.joblib'
joblib.dump(crf, model)

and when I try to deploy the model it reports this error:

ERROR: (gcloud.alpha.ml-engine.versions.create) Bad model detected with error: "Failed to load model: Could not load the model: /tmp/model/0001/model.joblib. No module named sklearn_crfsuite.estimator. (Error code: 0)"

deploy model:
gcloud alpha ml-engine versions create v1 --model teste --origin $ORI --python-version 2.7 --runtime-version 1.8 --framework scikit-learn

Small typo in compose_gcf_trigger lab

In the jupyter notebook courses/machine_learning/deepdive/10_recommend/composer_gcf_trigger/composertriggered.ipynb
there is a typo/inconsistency in the name of the airflow-variable gcp_completion_bucket.
In the section Complete the DAG file it is called gcs_completion_bucket (note the s at third position).
However, in the section Setting Airflow variables it is called gcp_completion_bucket. (I guess this name is correct since it conforms with the names of the other variables.)

The same applies to the Jupyter notebook in the labs folder courses/machine_learning/deepdive/10_recommend/labs/composer_gcf_trigger/composertriggered.ipynb

Error running TPU lab: prefetch() missing 1 required positional argument

Reminder that this might be enough to correct the lab problem.
#360

tensorflow.python.framework.errors_impl.FailedPreconditionError: Table not initialized.

2019-02-19 11:20:15.948958: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at lookup_table_op.cc:674 : Failed precondition: Table not initialized.
2019-02-19 11:20:15.948958: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at lookup_table_op.cc:674 : Failed precondition: Table not initialized.
2019-02-19 11:20:15.958287: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file ./temp_output/vocab.txt is already initialized.
Traceback (most recent call last):
File "/home/user12/Documents/answer_evaluation_12_2_19/env3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/home/user12/Documents/answer_evaluation_12_2_19/env3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/user12/Documents/answer_evaluation_12_2_19/env3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Table not initialized.
[[{{node hash_table_Lookup}} = LookupTableFindV2[Tin=DT_STRING, Tout=DT_INT64, _device="/device:CPU:0"](string_to_index/hash_table, SparseToDense, string_to_index/hash_table/Const)]]
[[{{node IteratorGetNext}} = IteratorGetNextoutput_shapes=[[?,?], [?,?], [?,1]], output_types=[DT_INT64, DT_INT64, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

I am trying to run the model natively in a local environment. tf.contrib.lookup.index_table_from_file throws a "Table not initialized" error.

debug_demo.ipynb downloads but does not run.

debug_demo.ipynb downloads but does not run. Where are the Google Cloud Datalab credentials etc. for this lab? Is this just a dry lab?

print() is a function in Python 3

flake8 testing of https://github.com/GoogleCloudPlatform/training-data-analyst on Python 3.7.0

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./blogs/nexrad2/visualize/plot_pngs.py:41:40: E999 SyntaxError: invalid syntax
  print "Plotting {} into {} upto {} km".format(args.nexrad, args.png, args.range)
                                       ^
./blogs/landsat/ndvi.py:32:33: E999 SyntaxError: invalid syntax
      print 'Getting {0} to {1} '.format(self.gsfile, self.dest)
                                ^
./blogs/landsat/setup.py:81:31: E999 SyntaxError: invalid syntax
    print 'Running command: %s' % command_list
                              ^
./blogs/landsat/dfndvi.py:33:42: E999 SyntaxError: invalid syntax
        print "WARNING! format error on {", line, "}"        
                                         ^
./blogs/lightning/ltgpred/create_dataset.py:330:43: E999 SyntaxError: invalid syntax
    print 'Launching local job ... hang on'
                                          ^
./blogs/timeseries/simplernn/trainer/model.py:48:22: E999 SyntaxError: invalid syntax
    print 'readcsv={}'.format(value_column)
                     ^
./blogs/tf_dataflow_serving/run_pipeline.py:57:12: E999 SyntaxError: invalid syntax
    print ''
           ^
./blogs/tf_dataflow_serving/simulate_stream.py:52:93: E999 SyntaxError: invalid syntax
        print 'Topic does not exist. Please run a stream pipeline first to create the topic.'
                                                                                            ^
./blogs/goes16/maria/create_image.py:84:39: E999 SyntaxError: invalid syntax
        | 'to_jpg' >> beam.Map(lambda (dt,name,lat,lon): 
                                      ^
./courses/machine_learning/feateng/taxifare/trainer/model.py:201:21: F821 undefined name 'SCALE_COLUMNS'
        for name in SCALE_COLUMNS:
                    ^
./courses/machine_learning/feateng/taxifare_tft/trainer/model.py:184:17: F821 undefined name 'tflearn'
        'rmse': tflearn.MetricSpec(metric_fn=metrics.streaming_root_mean_squared_error),
                ^
./courses/machine_learning/feateng/taxifare_tft/trainer/model.py:184:46: F821 undefined name 'metrics'
        'rmse': tflearn.MetricSpec(metric_fn=metrics.streaming_root_mean_squared_error),
                                             ^
./courses/machine_learning/feateng/taxifare_tft/trainer/model.py:185:37: F821 undefined name 'tflearn'
        'training/hptuning/metric': tflearn.MetricSpec(metric_fn=metrics.streaming_root_mean_squared_error),
                                    ^
./courses/machine_learning/feateng/taxifare_tft/trainer/model.py:185:66: F821 undefined name 'metrics'
        'training/hptuning/metric': tflearn.MetricSpec(metric_fn=metrics.streaming_root_mean_squared_error),
                                                                 ^
./courses/machine_learning/deepdive/08_image/mnistmodel/trainer/task.py:117:34: E999 SyntaxError: invalid syntax
     print "Training for {} steps".format(hparams['train_steps'])
                                 ^
./courses/machine_learning/deepdive/08_image/labs/flowersmodel/model.py:103:44: E999 SyntaxError: invalid syntax
    image = #TODO: decode contents into JPEG
                                           ^
./courses/machine_learning/deepdive/08_image/labs/mnistmodel/trainer/task.py:117:34: E999 SyntaxError: invalid syntax
     print "Training for {} steps".format(hparams['train_steps'])
                                 ^
./courses/machine_learning/deepdive/06_structured/labs/serving/application/main.py:32:20: E999 SyntaxError: invalid syntax
credentials = # TODO
                   ^
./courses/machine_learning/deepdive/10_recommend/labs/hybrid_recommendations/hybrid_recommendations_module/trainer/model.py:266:24: F821 undefined name 'NON_FACTOR_COLUMNS'
        for colname in NON_FACTOR_COLUMNS[1:-1]
                       ^
./courses/machine_learning/deepdive/10_recommend/hybrid_recommendations/hybrid_recommendations_module/trainer/model.py:266:24: F821 undefined name 'NON_FACTOR_COLUMNS'
        for colname in NON_FACTOR_COLUMNS[1:-1]
                       ^
./courses/machine_learning/deepdive/09_sequence/kubeflow-app/vendor/kubeflow/core/jupyterhub_spawner.py:66:1: F821 undefined name 'c'
c.JupyterHub.ip = '0.0.0.0'
^
./courses/machine_learning/deepdive/09_sequence/kubeflow-app/vendor/kubeflow/core/jupyterhub_spawner.py:67:1: F821 undefined name 'c'
c.JupyterHub.hub_ip = '0.0.0.0'
^
./courses/machine_learning/deepdive/09_sequence/kubeflow-app/vendor/kubeflow/core/jupyterhub_spawner.py:70:1: F821 undefined name 'c'
c.JupyterHub.cleanup_servers = False
^
./courses/machine_learning/deepdive/09_sequence/kubeflow-app/vendor/kubeflow/core/jupyterhub_spawner.py:76:1: F821 undefined name 'c'
c.JupyterHub.spawner_class = KubeFormSpawner
^
./courses/machine_learning/deepdive/09_sequence/kubeflow-app/vendor/kubeflow/core/jupyterhub_spawner.py:77:1: F821 undefined name 'c'
c.KubeSpawner.singleuser_image_spec = 'gcr.io/kubeflow/tensorflow-notebook'
^
./courses/machine_learning/deepdive/09_sequence/kubeflow-app/vendor/kubeflow/core/jupyterhub_spawner.py:78:1: F821 undefined name 'c'
c.KubeSpawner.cmd = 'start-singleuser.sh'
^
./courses/machine_learning/deepdive/09_sequence/kubeflow-app/vendor/kubeflow/core/jupyterhub_spawner.py:79:1: F821 undefined name 'c'
c.KubeSpawner.args = ['--allow-root']
^
./courses/machine_learning/deepdive/09_sequence/kubeflow-app/vendor/kubeflow/core/jupyterhub_spawner.py:81:1: F821 undefined name 'c'
c.KubeSpawner.start_timeout = 60 * 10
^
./courses/machine_learning/deepdive/09_sequence/kubeflow-app/vendor/kubeflow/core/jupyterhub_spawner.py:90:1: F821 undefined name 'c'
c.KubeSpawner.user_storage_pvc_ensure = True
^
./courses/machine_learning/deepdive/09_sequence/kubeflow-app/vendor/kubeflow/core/jupyterhub_spawner.py:92:1: F821 undefined name 'c'
c.KubeSpawner.user_storage_capacity = '10Gi'
^
./courses/machine_learning/deepdive/09_sequence/kubeflow-app/vendor/kubeflow/core/jupyterhub_spawner.py:93:1: F821 undefined name 'c'
c.KubeSpawner.pvc_name_template = 'claim-{username}{servername}'
^
./courses/machine_learning/deepdive/09_sequence/kubeflow-app/vendor/kubeflow/core/jupyterhub_spawner.py:94:1: F821 undefined name 'c'
c.KubeSpawner.volumes = [
^
./courses/machine_learning/deepdive/09_sequence/kubeflow-app/vendor/kubeflow/core/jupyterhub_spawner.py:102:1: F821 undefined name 'c'
c.KubeSpawner.volume_mounts = [
^
./courses/machine_learning/deepdive/09_sequence/labs/txtclsmodel/trainer/model.py:89:36: E999 SyntaxError: invalid syntax
    x = # TODO (hint: use tokenizer)
                                   ^
./courses/data_analysis/deepdive/pubsub-prework-solution/python/action_publisher.py:31:11: F821 undefined name 'topic_name'
          topic_name, message_future.exception()))
          ^
./courses/data_analysis/deepdive/composer-exercises/hello_world_solution.py:36:14: F821 undefined name 'xrange'
    for i in xrange(number_of_templated_tasks):
             ^
./courses/developingapps/demos/gs2ds/gs2ds.py:33:18: F821 undefined name 'unicode'
    'firstName': unicode(firstName), 
                 ^
./courses/developingapps/demos/gs2ds/gs2ds.py:34:17: F821 undefined name 'unicode'
    'lastName': unicode(lastName), 
                ^
./courses/developingapps/demos/gs2ds/gs2ds.py:37:14: F821 undefined name 'unicode'
    'party': unicode(party), 
             ^
./courses/developingapps/demos/gs2ds/gs2ds.py:38:18: F821 undefined name 'unicode'
    'homeState': unicode(homeState), 
                 ^
./courses/developingapps/python/cloudstorage/end/quiz/__init__.py:26:24: F821 undefined name 'api'
app.register_blueprint(api.routes.api_blueprint, url_prefix='/api')
                       ^
./courses/developingapps/python/cloudstorage/end/quiz/__init__.py:27:24: F821 undefined name 'webapp'
app.register_blueprint(webapp.routes.webapp_blueprint, url_prefix='')
                       ^
./courses/developingapps/python/cloudstorage/end/quiz/webapp/questions.py:58:28: F821 undefined name 'unicode'
        data['imageUrl'] = unicode(upload_file(image_file, True))
                           ^
./courses/developingapps/python/cloudstorage/start/quiz/__init__.py:26:24: F821 undefined name 'api'
app.register_blueprint(api.routes.api_blueprint, url_prefix='/api')
                       ^
./courses/developingapps/python/cloudstorage/start/quiz/__init__.py:27:24: F821 undefined name 'webapp'
app.register_blueprint(webapp.routes.webapp_blueprint, url_prefix='')
                       ^
./courses/developingapps/python/kubernetesengine/end/frontend/quiz/__init__.py:26:24: F821 undefined name 'api'
app.register_blueprint(api.routes.api_blueprint, url_prefix='/api')
                       ^
./courses/developingapps/python/kubernetesengine/end/frontend/quiz/__init__.py:27:24: F821 undefined name 'webapp'
app.register_blueprint(webapp.routes.webapp_blueprint, url_prefix='')
                       ^
./courses/developingapps/python/kubernetesengine/end/frontend/quiz/webapp/questions.py:39:28: F821 undefined name 'unicode'
        data['imageUrl'] = unicode(upload_file(image_file, True))
                           ^
./courses/developingapps/python/kubernetesengine/end/backend/start/frontend/quiz/__init__.py:26:24: F821 undefined name 'api'
app.register_blueprint(api.routes.api_blueprint, url_prefix='/api')
                       ^
./courses/developingapps/python/kubernetesengine/end/backend/start/frontend/quiz/__init__.py:27:24: F821 undefined name 'webapp'
app.register_blueprint(webapp.routes.webapp_blueprint, url_prefix='')
                       ^
./courses/developingapps/python/kubernetesengine/end/backend/start/frontend/quiz/webapp/questions.py:39:28: F821 undefined name 'unicode'
        data['imageUrl'] = unicode(upload_file(image_file, True))
                           ^
./courses/developingapps/python/kubernetesengine/start/frontend/quiz/__init__.py:26:24: F821 undefined name 'api'
app.register_blueprint(api.routes.api_blueprint, url_prefix='/api')
                       ^
./courses/developingapps/python/kubernetesengine/start/frontend/quiz/__init__.py:27:24: F821 undefined name 'webapp'
app.register_blueprint(webapp.routes.webapp_blueprint, url_prefix='')
                       ^
./courses/developingapps/python/kubernetesengine/start/frontend/quiz/webapp/questions.py:39:28: F821 undefined name 'unicode'
        data['imageUrl'] = unicode(upload_file(image_file, True))
                           ^
./courses/developingapps/python/kubernetesengine/bonus/frontend/quiz/__init__.py:26:24: F821 undefined name 'api'
app.register_blueprint(api.routes.api_blueprint, url_prefix='/api')
                       ^
./courses/developingapps/python/kubernetesengine/bonus/frontend/quiz/__init__.py:27:24: F821 undefined name 'webapp'
app.register_blueprint(webapp.routes.webapp_blueprint, url_prefix='')
                       ^
./courses/developingapps/python/kubernetesengine/bonus/frontend/quiz/api/api.py:71:27: E999 SyntaxError: invalid syntax
        print 'answer sent'
                          ^
./courses/developingapps/python/kubernetesengine/bonus/frontend/quiz/webapp/questions.py:39:28: F821 undefined name 'unicode'
        data['imageUrl'] = unicode(upload_file(image_file, True))
                           ^
./courses/developingapps/python/pubsub-languageapi-spanner/end/quiz/__init__.py:26:24: F821 undefined name 'api'
app.register_blueprint(api.routes.api_blueprint, url_prefix='/api')
                       ^
./courses/developingapps/python/pubsub-languageapi-spanner/end/quiz/__init__.py:27:24: F821 undefined name 'webapp'
app.register_blueprint(webapp.routes.webapp_blueprint, url_prefix='')
                       ^
./courses/developingapps/python/pubsub-languageapi-spanner/end/quiz/webapp/questions.py:39:28: F821 undefined name 'unicode'
        data['imageUrl'] = unicode(upload_file(image_file, True))
                           ^
./courses/developingapps/python/pubsub-languageapi-spanner/start/quiz/__init__.py:26:24: F821 undefined name 'api'
app.register_blueprint(api.routes.api_blueprint, url_prefix='/api')
                       ^
./courses/developingapps/python/pubsub-languageapi-spanner/start/quiz/__init__.py:27:24: F821 undefined name 'webapp'
app.register_blueprint(webapp.routes.webapp_blueprint, url_prefix='')
                       ^
./courses/developingapps/python/pubsub-languageapi-spanner/start/quiz/webapp/questions.py:39:28: F821 undefined name 'unicode'
        data['imageUrl'] = unicode(upload_file(image_file, True))
                           ^
./courses/developingapps/python/pubsub-languageapi-spanner/start/quiz/gcp/pubsub.py:69:16: E999 IndentationError: expected an indented block
"""pull_feedback
               ^
./courses/developingapps/python/pubsub-languageapi-spanner/start/quiz/gcp/spanner.py:87:0: E999 SyntaxError: unexpected EOF while parsing
^
./courses/developingapps/python/pubsub-languageapi-spanner/start/quiz/gcp/languageapi.py:60:14: E999 SyntaxError: unexpected EOF while parsing
    # END TODO
             ^
./courses/developingapps/python/pubsub-languageapi-spanner/bonus/quiz/__init__.py:26:24: F821 undefined name 'api'
app.register_blueprint(api.routes.api_blueprint, url_prefix='/api')
                       ^
./courses/developingapps/python/pubsub-languageapi-spanner/bonus/quiz/__init__.py:27:24: F821 undefined name 'webapp'
app.register_blueprint(webapp.routes.webapp_blueprint, url_prefix='')
                       ^
./courses/developingapps/python/pubsub-languageapi-spanner/bonus/quiz/webapp/questions.py:39:28: F821 undefined name 'unicode'
        data['imageUrl'] = unicode(upload_file(image_file, True))
                           ^
./courses/developingapps/python/datastore/end/quiz/__init__.py:26:24: F821 undefined name 'api'
app.register_blueprint(api.routes.api_blueprint, url_prefix='/api')
                       ^
./courses/developingapps/python/datastore/end/quiz/__init__.py:27:24: F821 undefined name 'webapp'
app.register_blueprint(webapp.routes.webapp_blueprint, url_prefix='')
                       ^
./courses/developingapps/python/datastore/start/quiz/__init__.py:26:24: F821 undefined name 'api'
app.register_blueprint(api.routes.api_blueprint, url_prefix='/api')
                       ^
./courses/developingapps/python/datastore/start/quiz/__init__.py:27:24: F821 undefined name 'webapp'
app.register_blueprint(webapp.routes.webapp_blueprint, url_prefix='')
                       ^
./courses/developingapps/python/datastore/bonus/quiz/__init__.py:26:24: F821 undefined name 'api'
app.register_blueprint(api.routes.api_blueprint, url_prefix='/api')
                       ^
./courses/developingapps/python/datastore/bonus/quiz/__init__.py:27:24: F821 undefined name 'webapp'
app.register_blueprint(webapp.routes.webapp_blueprint, url_prefix='')
                       ^
./courses/developingapps/python/firebase/end/quiz/__init__.py:26:24: F821 undefined name 'api'
app.register_blueprint(api.routes.api_blueprint, url_prefix='/api')
                       ^
./courses/developingapps/python/firebase/end/quiz/__init__.py:27:24: F821 undefined name 'webapp'
app.register_blueprint(webapp.routes.webapp_blueprint, url_prefix='')
                       ^
./courses/developingapps/python/firebase/end/quiz/webapp/questions.py:58:28: F821 undefined name 'unicode'
        data['imageUrl'] = unicode(upload_file(image_file, True))
                           ^
./courses/developingapps/python/firebase/start/quiz/__init__.py:26:24: F821 undefined name 'api'
app.register_blueprint(api.routes.api_blueprint, url_prefix='/api')
                       ^
./courses/developingapps/python/firebase/start/quiz/__init__.py:27:24: F821 undefined name 'webapp'
app.register_blueprint(webapp.routes.webapp_blueprint, url_prefix='')
                       ^
./courses/developingapps/python/firebase/start/quiz/webapp/questions.py:58:28: F821 undefined name 'unicode'
        data['imageUrl'] = unicode(upload_file(image_file, True))
                           ^
./courses/developingapps/python/appengine/end/frontend/quiz/__init__.py:26:24: F821 undefined name 'api'
app.register_blueprint(api.routes.api_blueprint, url_prefix='/api')
                       ^
./courses/developingapps/python/appengine/end/frontend/quiz/__init__.py:27:24: F821 undefined name 'webapp'
app.register_blueprint(webapp.routes.webapp_blueprint, url_prefix='')
                       ^
./courses/developingapps/python/appengine/end/frontend/quiz/webapp/questions.py:39:28: F821 undefined name 'unicode'
        data['imageUrl'] = unicode(upload_file(image_file, True))
                           ^
./courses/developingapps/python/appengine/start/frontend/quiz/__init__.py:26:24: F821 undefined name 'api'
app.register_blueprint(api.routes.api_blueprint, url_prefix='/api')
                       ^
./courses/developingapps/python/appengine/start/frontend/quiz/__init__.py:27:24: F821 undefined name 'webapp'
app.register_blueprint(webapp.routes.webapp_blueprint, url_prefix='')
                       ^
./courses/developingapps/python/appengine/start/frontend/quiz/webapp/questions.py:39:28: F821 undefined name 'unicode'
        data['imageUrl'] = unicode(upload_file(image_file, True))
                           ^
./bootcamps/imagereco/fashionmodel/trainer/model.py:52:12: F821 undefined name 'p2'
  outlen = p2.shape[1]*p2.shape[2]*p2.shape[3] #outlen should be 980
           ^
./bootcamps/imagereco/fashionmodel/trainer/model.py:52:24: F821 undefined name 'p2'
  outlen = p2.shape[1]*p2.shape[2]*p2.shape[3] #outlen should be 980
                       ^
./bootcamps/imagereco/fashionmodel/trainer/model.py:52:36: F821 undefined name 'p2'
  outlen = p2.shape[1]*p2.shape[2]*p2.shape[3] #outlen should be 980
                                   ^
./bootcamps/imagereco/fashionmodel/trainer/model.py:53:23: F821 undefined name 'p2'
  p2flat = tf.reshape(p2, [-1, outlen]) # flattened
                      ^
./bootcamps/imagereco/fashionmodel/trainer/model.py:85:10: F821 undefined name 'ylogits'
  return ylogits, NCLASSES
         ^
./bootcamps/imagereco/fashionmodel/trainer/task.py:117:34: E999 SyntaxError: invalid syntax
     print "Training for {} steps".format(hparams['train_steps'])
                                 ^
19    E999 SyntaxError: invalid syntax
75    F821 undefined name 'p2'
94
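
Most of the E999 reports above come from Python 2 print statements, and several F821 reports come from the removed xrange and unicode builtins. A minimal sketch of the usual replacements that run on both Python 2 and 3, assuming illustrative values and helper names (training_message, to_text and describe are hypothetical and are not taken from the repository):

```python
from __future__ import print_function  # enables print() on Python 2; harmless on Python 3

try:
    text_type = unicode  # noqa: F821 (Python 2 only)
except NameError:
    text_type = str  # Python 3: str is already a Unicode string


def training_message(steps):
    # Was: print "Training for {} steps".format(hparams['train_steps'])
    return "Training for {} steps".format(steps)


def to_text(value):
    # Was: unicode(value)
    return text_type(value)


def describe(record):
    # Was: lambda (dt, name, lat, lon): ...
    # Tuple parameters are not allowed in Python 3, so unpack inside the body.
    dt, name, lat, lon = record
    return "{} {} {} {}".format(dt, name, lat, lon)


print(training_message(1000))
for i in range(3):  # was: xrange(3)
    print(i, to_text("Ada"))
```

The labs files flagged for `x = # TODO` style lines are intentionally incomplete exercises, so they would need a placeholder such as `None` before the TODO comment rather than a Python 3 port.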

Serverless Machine Learning - Lab 7 : Feature Engineering v1.3: apache-beam not installed because of dill

I have an issue with the first cell of the following notebook:
https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/feateng/feateng.ipynb

%%bash
conda update -y -n base -c defaults conda
source activate py2env
pip uninstall -y google-cloud-dataflow
conda install -y pytz 
pip install apache-beam[gcp]

It seems the issue is the installation of apache-beam:

...
Requirement already satisfied: cachetools>=2.0.0 in /usr/local/envs/py2env/lib/python2.7/site-packages (from google-auth<2.0dev,>=0.4.0->google-api-core[grpc]<2.0.0dev,>=1.4.1->google-cloud-pubsub==0.39.0; extra == "gcp"->apache-beam[gcp]) (2.1.0)
Installing collected packages: dill, pyarrow, typing, pyvcf, fastavro, httplib2, docopt, hdfs, grpc-google-iam-v1, google-api-core, google-cloud-pubsub, monotonic, fasteners, google-apitools, google-cloud-bigquery, apache-beam
Found existing installation: dill 0.2.6
Skipping google-cloud-dataflow as it is not installed.
google-cloud-monitoring 0.28.0 has requirement google-api-core<0.2.0dev,>=0.1.1, but you'll have google-api-core 1.7.0 which is incompatible.
googledatastore 7.0.1 has requirement httplib2<0.10,>=0.9.1, but you'll have httplib2 0.11.3 which is incompatible.
Cannot uninstall 'dill'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

then if I check, I don't see the package installed:
conda list

the right env is activated:

py2env * /usr/local/envs/py2env

so importing the package fails (restarting the kernel doesn't help since the package was not installed):

ImportErrorTraceback (most recent call last)
<ipython-input-4-830e0319c5fc> in <module>()
      1 import tensorflow as tf
----> 2 import apache_beam as beam
      3 import shutil
      4 print(tf.__version__)


ImportError: No module named apache_beam

It seems that

!conda uninstall dill=0.2.6 -y

removes dill, after which the installation of apache-beam works. My 1:30 min session is over. I will start again and see whether this was a temporary glitch and whether the solution above works.

r'^' seems unnecessary in re.match

training-data-analyst/courses/data_analysis/lab2/python/grep.py

Line 21 in 10f6da3

if re.match( r'^' + re.escape(term), line):

training-data-analyst/courses/data_analysis/lab2/python/grepc.py

Line 20 in 10f6da3

if re.match( r'^' + re.escape(term), line):

Since re.match() checks for a match only at the beginning of the string, regardless of mode, I wonder whether we can drop the r'^' in front of re.escape(term): the result is the same and, IMHO, the code is more readable.

Alternatively, I am also considering simply using line.startswith(term), which doesn't need the re module at all and is, again IMHO, more Pythonic and faster.
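
To illustrate the point, here is a small sketch comparing the three variants on a few made-up lines (the function names and sample data are assumptions for illustration, not code from grep.py):

```python
import re


def match_with_caret(term, line):
    # Current form in the lab: explicit '^' plus the escaped term.
    return re.match(r'^' + re.escape(term), line) is not None


def match_without_caret(term, line):
    # re.match only ever matches at the start of the string, so '^' is redundant.
    return re.match(re.escape(term), line) is not None


def match_startswith(term, line):
    # Plain string method; no regular expression needed for a literal prefix.
    return line.startswith(term)


sample_lines = ["import java.util.List;", "  import os", "important", "export x", ""]
for line in sample_lines:
    results = {
        match_with_caret("import", line),
        match_without_caret("import", line),
        match_startswith("import", line),
    }
    # All three variants agree on every sample line.
    assert len(results) == 1
    print(repr(line), results.pop())
```

All three agree because the term is escaped and therefore treated as a literal prefix; the variants would differ only if the term were meant to be a regular expression.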

Passing RNN state to next batch in custom estimator?

source: https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/deepdive/05_artandscience/d_customestimator.ipynb

In model code "simple_rnn" the state is not passed on to the next batch (if I understand it correctly):

   outputs, _ = rnn.static_rnn(lstm_cell, x, dtype = tf.float32)

However, if we had a very long time series (large SEQ_LEN, e.g. 10000) that we wanted to split into smaller chunks, how could we pass the state on to the next batch (i.e. rather than starting from a zero state each time)?

   # initialize somewhere
   state = tf.zeros([BATCH_SIZE, LSTM_SIZE], dtype=tf.float32)  

   # in model code
   outputs, state = rnn.static_rnn(lstm_cell, x, initial_state=state, dtype=tf.float32)

   # the outputs are passed on and used to produce the "predictions_dict"

QUESTION: where & how do we initialize the state, and how is the state passed on when creating a custom estimator?

Improve BigQuery Link for Recommendation Systems Lab 2 (Content-based NN)

The BigQuery link in the notebook points to the welcome page of the (legacy) BigQuery Web UI. Because the project is always a fresh (Qwiklabs) project, the project and data list is empty.

Finding the dataset is a chore, and you really need to scan it to solve the TODO (get the custom dimension index). So replace the link with a deep link, e.g.:
https://console.cloud.google.com/bigquery?dataset=GA360_test&p=cloud-training-demos&d=GA360_test&t=ga_sessions_sample

[edit] Ah, I can just make a small PR. Will do so when I have time.

IoT data is inserted into BigQuery table based on item name

Hi Sir,

The records are now created based not on timestamp but on item name, as shown below.
If you look at the timestamp column, it is not in order. Every time a new record is created, it is appended based on item name.

Please find the output below.

device	item	type	state	timestamp
fueb_38B1DB168ABB	dimmer	Dimmer	41	2018-07-24 12:27:31 UTC
fueb_38B1DB168ABB	dimmer	Dimmer	63	2018-07-24 12:24:50 UTC
fueb_38B1DB168ABB	dimmer	Dimmer	80	2018-07-24 12:27:04 UTC
fueb_38B1DB168ABB	light	Switch	ON	2018-07-24 12:24:43 UTC
fueb_38B1DB168ABB	light	Switch	ON	2018-07-24 12:26:03 UTC
fueb_38B1DB168ABB	light	Switch	OFF	2018-07-24 12:22:39 UTC
fueb_38B1DB168ABB	light	Switch	OFF	2018-07-24 12:25:47 UTC
fueb_38B1DB168ABB	color	Color	109100100	2018-07-24 12:27:56 UTC
fueb_38B1DB168ABB	color	Color	201100100	2018-07-24 12:24:57 UTC

Please find attached the Dataflow program that I am using to push IoT data to the BQ table.
PubSubReader.java.zip

How can I resolve this issue and have the BQ records ordered by timestamp?
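
For context, BigQuery does not guarantee any row order unless the query asks for one, so a table preview need not reflect insertion time. A minimal sketch of requesting the order explicitly, assuming a hypothetical table name (my_dataset.iot_events) and using three columns of the rows listed above to show the same ordering locally:

```python
from datetime import datetime

# Hypothetical table name; substitute the table that the Dataflow job writes to.
QUERY = """
SELECT device, item, type, state, timestamp
FROM `my_dataset.iot_events`
ORDER BY timestamp
"""

# (item, state, timestamp) copied from the output shown in this issue.
rows = [
    ("dimmer", "41", "2018-07-24 12:27:31 UTC"),
    ("dimmer", "63", "2018-07-24 12:24:50 UTC"),
    ("dimmer", "80", "2018-07-24 12:27:04 UTC"),
    ("light", "ON", "2018-07-24 12:24:43 UTC"),
    ("light", "ON", "2018-07-24 12:26:03 UTC"),
    ("light", "OFF", "2018-07-24 12:22:39 UTC"),
    ("light", "OFF", "2018-07-24 12:25:47 UTC"),
    ("color", "109100100", "2018-07-24 12:27:56 UTC"),
    ("color", "201100100", "2018-07-24 12:24:57 UTC"),
]


def parse_timestamp(text):
    # The trailing "UTC" is matched as a literal in the format string.
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S UTC")


# Local equivalent of ORDER BY timestamp.
ordered = sorted(rows, key=lambda row: parse_timestamp(row[2]))
for item, state, timestamp in ordered:
    print(timestamp, item, state)
```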

train_and_evaluate does not seem to work for me

Hi,

I was running this code on GCP, and when I got to this line of code it ended in an error:

train_and_evaluate('babyweight_trained')

I checked every single line of code in the training section and it seems like the error is from the line I mentioned.

InvalidArgumentErrorTraceback (most recent call last)
in ()
26
27 shutil.rmtree('babyweight_trained', ignore_errors=True) # start fresh each time
---> 28 train_and_evaluate('babyweight_trained')

in train_and_evaluate(output_dir)
23 steps=None,
24 exporters=exporter)
---> 25 tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
26
27 shutil.rmtree('babyweight_trained', ignore_errors=True) # start fresh each time

/usr/local/envs/py2env/lib/python2.7/site-packages/tensorflow/python/estimator/training.pyc in train_and_evaluate(estimator, train_spec, eval_spec)
437 '(with task id 0). Given task id {}'.format(config.task_id))
438
--> 439 executor.run()
440
441

/usr/local/envs/py2env/lib/python2.7/site-packages/tensorflow/python/estimator/training.pyc in run(self)
516 config.task_type != run_config_lib.TaskType.EVALUATOR):
517 logging.info('Running training and evaluation locally (non-distributed).')
--> 518 self.run_local()
519 return
520

/usr/local/envs/py2env/lib/python2.7/site-packages/tensorflow/python/estimator/training.pyc in run_local(self)
648 input_fn=self._train_spec.input_fn,
649 max_steps=self._train_spec.max_steps,
--> 650 hooks=train_hooks)
651
652 # Final export signal: For any eval result with global_step >= train

/usr/local/envs/py2env/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.pyc in train(self, input_fn, hooks, steps, max_steps, saving_listeners)
361
362 saving_listeners = _check_listeners_type(saving_listeners)
--> 363 loss = self._train_model(input_fn, hooks, saving_listeners)
364 logging.info('Loss for final step: %s.', loss)
365 return self

/usr/local/envs/py2env/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.pyc in _train_model(self, input_fn, hooks, saving_listeners)
841 return self._train_model_distributed(input_fn, hooks, saving_listeners)
842 else:
--> 843 return self._train_model_default(input_fn, hooks, saving_listeners)
844
845 def _train_model_default(self, input_fn, hooks, saving_listeners):

/usr/local/envs/py2env/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.pyc in _train_model_default(self, input_fn, hooks, saving_listeners)
857 return self._train_with_estimator_spec(estimator_spec, worker_hooks,
858 hooks, global_step_tensor,
--> 859 saving_listeners)
860
861 def _train_model_distributed(self, input_fn, hooks, saving_listeners):

/usr/local/envs/py2env/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.pyc in _train_with_estimator_spec(self, estimator_spec, worker_hooks, hooks, global_step_tensor, saving_listeners)
1057 loss = None
1058 while not mon_sess.should_stop():
-> 1059 _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
1060 return loss
1061

/usr/local/envs/py2env/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.pyc in __exit__(self, exception_type, exception_value, traceback)
677 if exception_type in [errors.OutOfRangeError, StopIteration]:
678 exception_type = None
--> 679 self._close_internal(exception_type)
680 # __exit__ should return True to suppress an exception.
681 return exception_type is None

/usr/local/envs/py2env/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.pyc in _close_internal(self, exception_type)
714 if self._sess is None:
715 raise RuntimeError('Session is already closed.')
--> 716 self._sess.close()
717 finally:
718 self._sess = None

/usr/local/envs/py2env/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.pyc in close(self)
962 if self._sess:
963 try:
--> 964 self._sess.close()
965 except _PREEMPTION_ERRORS:
966 pass

/usr/local/envs/py2env/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.pyc in close(self)
1106 self._coord.join(
1107 stop_grace_period_secs=self._stop_grace_period_secs,
-> 1108 ignore_live_threads=True)
1109 finally:
1110 try:

/usr/local/envs/py2env/lib/python2.7/site-packages/tensorflow/python/training/coordinator.pyc in join(self, threads, stop_grace_period_secs, ignore_live_threads)
387 self._registered_threads = set()
388 if self._exc_info_to_raise:
--> 389 six.reraise(*self._exc_info_to_raise)
390 elif stragglers:
391 if ignore_live_threads:

/usr/local/envs/py2env/lib/python2.7/site-packages/tensorflow/python/training/queue_runner_impl.pyc in _run(self, sess, enqueue_op, coord)
250 break
251 try:
--> 252 enqueue_callable()
253 except self._queue_closed_exception_types: # pylint: disable=catching-non-exception
254 # This exception indicates that a queue was closed.

/usr/local/envs/py2env/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _single_operation_run()
1242
1243 def _single_operation_run():
-> 1244 self._call_tf_sessionrun(None, {}, [], target_list, None)
1245
1246 return _single_operation_run

/usr/local/envs/py2env/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
1407 return tf_session.TF_SessionRun_wrapper(
1408 self._session, options, feed_dict, fetch_list, target_list,
-> 1409 run_metadata)
1410 else:
1411 with errors.raise_exception_on_not_ok_status() as status:

InvalidArgumentError: assertion failed: [string_input_producer requires a non-null input tensor]
[[Node: input_producer/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](input_producer/Greater, input_producer/Assert/Assert/data_0)]]

TF Serving for "Distributed training and monitoring" by running d_traineval.ipynb gives { "error": "Serving signature name: "serving_default" not found in signature def" }

When I ran "d_traineval.ipynb", the model exported successfully. I then served the exported model with a local Docker image of TensorFlow Serving 1.8 (CPU), and the REST POST call returns the following:
{
"error": "Serving signature name: \"serving_default\" not found in signature def"
}

My request JSON:
{
"instances": [
{"pickuplon" : -73.987625,
"pickuplat" : 40.750617,
"dropofflat" : 40.78518,
"dropofflon" : -73.971163,
"passengers" : 2
}
]
}

Can you please help me understand what the error is?
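
One thing that may be worth trying: the TensorFlow Serving REST predict API accepts an optional signature_name field in the request body, and falls back to serving_default when it is omitted. A minimal sketch of building such a request, assuming the exported model exposes a signature named predict (an assumption; `saved_model_cli show --dir <export_dir> --all` lists the signatures that were actually exported):

```python
import json


def build_predict_request(instances, signature_name="predict"):
    # "predict" is an assumed signature name; replace it with one that
    # saved_model_cli reports for the exported model.
    return json.dumps({"signature_name": signature_name, "instances": instances})


# Same instance as in the request JSON above.
instance = {
    "pickuplon": -73.987625,
    "pickuplat": 40.750617,
    "dropofflat": 40.78518,
    "dropofflon": -73.971163,
    "passengers": 2,
}

body = build_predict_request([instance])
print(body)
```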

Update use of google.datalab.bigquery to google-cloud-bigquery

Update tutorials and course content that currently use Datalab's BigQuery module to use the official Python client library. The google-cloud-* client libraries are now Google's recommended way of interacting with GCP.
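
As a rough illustration of what the migrated notebooks could look like, here is a minimal sketch using the google-cloud-bigquery client. The helper name run_query is hypothetical, and the commented usage assumes the library is installed and default credentials are available:

```python
def run_query(client, sql):
    """Run `sql` with a google.cloud.bigquery.Client and return a pandas DataFrame."""
    # Client.query() starts a query job; to_dataframe() waits for it and
    # loads the result rows into pandas.
    return client.query(sql).to_dataframe()


# Usage (requires google-cloud-bigquery, pandas and GCP credentials):
#
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   df = run_query(client, "SELECT 1 AS x")
#   print(df.head())
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub in environments without credentials.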

Error while running Beam on Dataflow in stager.pyc

RuntimeError Traceback (most recent call last)
in ()
89
90 if __name__=="__main__":
---> 91 preprocessing()

in preprocessing(argv)
85 # print(lines)
86 messages | beam.io.Write(beam.io.WriteToText("gs://anadarko/output.txt"))
---> 87 result = p.run()
88 result.wait_until_finish()
89

/root/anaconda2/lib/python2.7/site-packages/apache_beam/pipeline.pyc in run(self, test_runner_api)
401 if test_runner_api and self._verify_runner_api_compatible():
402 return Pipeline.from_runner_api(
--> 403 self.to_runner_api(), self.runner, self._options).run(False)
404
405 if self._options.view_as(TypeOptions).runtime_type_check:

/root/anaconda2/lib/python2.7/site-packages/apache_beam/pipeline.pyc in run(self, test_runner_api)
414 finally:
415 shutil.rmtree(tmpdir)
--> 416 return self.runner.run_pipeline(self)
417
418 def __enter__(self):

/root/anaconda2/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.pyc in run_pipeline(self, pipeline)
387 # raise an exception.
388 result = DataflowPipelineResult(
--> 389 self.dataflow_client.create_job(self.job), self)
390
391 # TODO(BEAM-4274): Circular import runners-metrics. Requires refactoring.

/root/anaconda2/lib/python2.7/site-packages/apache_beam/utils/retry.pyc in wrapper(*args, **kwargs)
182 while True:
183 try:
--> 184 return fun(*args, **kwargs)
185 except Exception as exn: # pylint: disable=broad-except
186 if not retry_filter(exn):

/root/anaconda2/lib/python2.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.pyc in create_job(self, job)
488 def create_job(self, job):
489 """Creates job description. May stage and/or submit for remote execution."""
--> 490 self.create_job_description(job)
491
492 # Stage and submit the job when necessary

/root/anaconda2/lib/python2.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.pyc in create_job_description(self, job)
517
518 # Stage other resources for the SDK harness
--> 519 resources = self._stage_resources(job.options)
520
521 job.proto.environment = Environment(

/root/anaconda2/lib/python2.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.pyc in _stage_resources(self, options)
450 options,
451 temp_dir=tempfile.mkdtemp(),
--> 452 staging_location=google_cloud_options.staging_location)
453 return resources
454

/root/anaconda2/lib/python2.7/site-packages/apache_beam/runners/portability/stager.pyc in stage_job_resources(self, options, build_setup_args, temp_dir, populate_requirements_cache, staging_location)
221 resources.extend(
222 self._stage_beam_sdk(sdk_remote_location, staging_location,
--> 223 temp_dir))
224 elif setup_options.sdk_location == 'container':
225 # Use the SDK that's built into the container, rather than re-staging

/root/anaconda2/lib/python2.7/site-packages/apache_beam/runners/portability/stager.pyc in _stage_beam_sdk(self, sdk_remote_location, staging_location, temp_dir)
464 """
465 if sdk_remote_location == 'pypi':
--> 466 sdk_local_file = Stager._download_pypi_sdk_package(temp_dir)
467 sdk_sources_staged_name = Stager.\
468 _desired_sdk_filename_in_staging_location(sdk_local_file)

/root/anaconda2/lib/python2.7/site-packages/apache_beam/runners/portability/stager.pyc in _download_pypi_sdk_package(temp_dir, fetch_binary, language_version_tag, language_implementation_tag, abi_tag, platform_tag)
552 processes.check_call(cmd_args)
553 except subprocess.CalledProcessError as e:
--> 554 raise RuntimeError(repr(e))
555
556 for sdk_file in expected_files:

RuntimeError: CalledProcessError()

TypeError: __init__() takes exactly 2 arguments (3 given) / Dataflow

I am running (https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/feateng/tftransform.ipynb)

based on the updates discussed in the previous issue:

#313

But I'm getting the following error:

Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 642, in do_work
work_executor.execute()
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 130, in execute
test_shuffle_sink=self._test_shuffle_sink)
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 104, in create_operation
is_streaming=False)
File "apache_beam/runners/worker/operations.py", line 636, in apache_beam.runners.worker.operations.create_operation
op = create_pgbk_op(name_context, spec, counter_factory, state_sampler)
File "apache_beam/runners/worker/operations.py", line 482, in apache_beam.runners.worker.operations.create_pgbk_op
return PGBKCVOperation(step_name, spec, counter_factory, state_sampler)
File "apache_beam/runners/worker/operations.py", line 538, in apache_beam.runners.worker.operations.PGBKCVOperation.__init__
fn, args, kwargs = pickler.loads(self.spec.combine_fn)[:3]
File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 246, in loads
return dill.loads(s)
File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 316, in loads
return load(file, ignore)
File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 304, in load
obj = pik.load()
File "/usr/lib/python2.7/pickle.py", line 864, in load
dispatch[key](self)
File "/usr/lib/python2.7/pickle.py", line 1139, in load_reduce
value = func(*args)
TypeError: __init__() takes exactly 2 arguments (3 given)

apache-airflow==1.9.0
apache-beam==2.8.0
tensorflow==1.9.0
tensorflow-metadata==0.9.0
tensorflow-transform==0.9.0

I also noticed the SDK changed from 2.7 (10/22) to 2.8 (10/26)

Use Keras to implement model_fn in machine_learning/deepdive

In machine_learning/deepdive, simplify the model functions by implementing them with Keras instead of tf.layers.

Failure with feature engineering notebook when using a model trained in the cloud

@alexhanna @lakshmanok Notice that /tmp/test.json works for the locally trained model but not for the cloud-trained model. Once the additional features are added, the cloud version can run a prediction.

taxi-feateng.pdf

Big Data & ML Fundamentals Lab 4: Recommendations ML with Dataproc v1.3: "19/02/13 12:26:23 WARN org.apache.hadoop.hdfs.DataStreamer: Caught exception java.lang.InterruptedException"

Hi there,

Just for info: in "Big Data & ML Fundamentals Lab 4: Recommendations ML with Dataproc v1.3", when running a PySpark job with Dataproc, the code runs but the log shows "Caught exception" warnings. Maybe something needs to be updated in the config. In the end the job runs and succeeds. Below are the code that is run and the full log:

https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/CPB100/lab3b/sparkml/train_and_apply.py

19/02/13 12:25:54 INFO org.spark_project.jetty.util.log: Logging initialized @3300ms
19/02/13 12:25:54 INFO org.spark_project.jetty.server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
19/02/13 12:25:54 INFO org.spark_project.jetty.server.Server: Started @3435ms
19/02/13 12:25:54 INFO org.spark_project.jetty.server.AbstractConnector: Started ServerConnector@5a39e97c{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
19/02/13 12:25:54 WARN org.apache.spark.scheduler.FairSchedulableBuilder: Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in fairscheduler.xml or set spark.scheduler.allocation.file to a file that contains the configuration.
19/02/13 12:25:56 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at cluster-d34e-m/10.128.0.2:8032
19/02/13 12:25:56 INFO org.apache.hadoop.yarn.client.AHSProxy: Connecting to Application History server at cluster-d34e-m/10.128.0.2:10200
19/02/13 12:25:59 WARN org.apache.hadoop.hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1252)
at java.lang.Thread.join(Thread.java:1326)
at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:980)
at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:630)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:807)
19/02/13 12:26:00 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_1550060283142_0001
19/02/13 12:26:09 WARN org.apache.spark.SparkContext: Spark is not running in local mode, therefore the checkpoint directory must not be on the local filesystem. Directory 'checkpoint/' appears to be on the local filesystem.
read ...
19/02/13 12:26:23 WARN org.apache.hadoop.hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1252)
at java.lang.Thread.join(Thread.java:1326)
at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:980)
at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:630)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:807)
trained ...
predicted for user=0
predicted for user=1
predicted for user=2

Thanks
Cheers
Fabien

regarding the code sample courses/machine_learning/deepdive/03_tensorflow/e_cloudmle.ipynb

In the decode_csv() function, it has

features = dict(zip(CSV_COLUMNS, columns))

and CSV_COLUMNS includes the key column but it does not do:

features.pop('key')

Although 'key' is listed as one of the columns in the CSV, shouldn't this column be dropped before the features are assigned?
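To make the question concrete, here is a minimal sketch of the pattern in plain Python, with ordinary values standing in for the decoded tensors. The CSV_COLUMNS and LABEL_COLUMN values below are assumptions chosen to resemble the taxi-fare example, and split_features_and_label is a hypothetical stand-in for the body of decode_csv(), not the notebook's actual code.

```python
# Assumed column layout; the notebook's actual list may differ.
CSV_COLUMNS = ['fare_amount', 'pickuplon', 'pickuplat',
               'dropofflon', 'dropofflat', 'passengers', 'key']
LABEL_COLUMN = 'fare_amount'


def split_features_and_label(columns):
    """Pair each decoded column with its name, then drop key and label."""
    features = dict(zip(CSV_COLUMNS, columns))
    features.pop('key')                 # the row identifier is not a model input
    label = features.pop(LABEL_COLUMN)  # the label is returned separately
    return features, label


# One made-up row, in the same order as CSV_COLUMNS.
row = [12.5, -73.987625, 40.750617, -73.971163, 40.78518, 2, 'a1']
features, label = split_features_and_label(row)
print(sorted(features))
print(label)
```

Without the features.pop('key') line, the key stays in the dictionary that is handed to the model as features, which is the situation the question describes.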

Also, this routine prints this message when doing training. Is this a problem?

INFO:tensorflow:'serving_default' : Regression input must be a single string Tensor; got {'passengers': <tf.Tensor 'Placeholder_4:0' shape=(?,) dtype=float32>, 'pickuplon': <tf.Tensor 'Placeholder:0' shape=(?,) dtype=float32>, 'dropofflon': <tf.Tensor 'Placeholder_3:0' shape=(?,) dtype=float32>, 'pickuplat': <tf.Tensor 'Placeholder_1:0' shape=(?,) dtype=float32>, 'dropofflat': <tf.Tensor 'Placeholder_2:0' shape=(?,) dtype=float32>}
INFO:tensorflow:'regression' : Regression input must be a single string Tensor; got {'passengers': <tf.Tensor 'Placeholder_4:0' shape=(?,) dtype=float32>, 'pickuplon': <tf.Tensor 'Placeholder:0' shape=(?,) dtype=float32>, 'dropofflon': <tf.Tensor 'Placeholder_3:0' shape=(?,) dtype=float32>, 'pickuplat': <tf.Tensor 'Placeholder_1:0' shape=(?,) dtype=float32>, 'dropofflat': <tf.Tensor 'Placeholder_2:0' shape=(?,) dtype=float32>}

d_customestimator.ipynb and other ipynb's won't load

Small Typo on Sklearn + Cloud ML Tutorial

On the Jupyter Notebook (https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/blogs/sklearn/babyweight_skl.ipynb), there's a small typo.
In the 3rd coding cell under the section "Packaging up as a Python package", and for the "install_requires" argument, it says 'cloudml-hypertune,'. The comma should go outside of the string and not within the string.
The file generated by the %writefile magic function (babyweight/setup.py) of that coding cell is actually correct, so my guess is that the Jupyter Notebook cell must have been changed after the file was written.
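For anyone skimming, the difference is easy to see in plain Python. This is only an illustration of the two spellings described above; the variable names are made up and the rest of setup.py is omitted.

```python
# Comma inside the string: the requirement name itself ends with a comma.
requires_as_in_notebook_cell = ['cloudml-hypertune,']

# Comma outside the string: an ordinary trailing comma in the list.
requires_as_in_written_file = ['cloudml-hypertune', ]

print(repr(requires_as_in_notebook_cell[0]))
print(repr(requires_as_in_written_file[0]))
```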

Thanks Lak for the great tutorials! On my team, we are definitely following them carefully and closely.