The dotscience from dotmesh-io

[1d] publish a python client library for printing dotscience metadata

Context

We want to make it easier for people to emit the DOTSCIENCE_ annotations. The current print statements have the following problems:

They're hard to read: the json.dumps call isn't obvious to a data scientist.
They rely on having variables in global scope, but parameters may be used inside a function. Technically, users can put the print statements inside a function, but it's not obvious to them that this will work.
There's no validation of the input and output datasets, or of the output format.
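
For reference, the hand-written approach described above might look roughly like the following. This is a reconstruction, assuming the annotation format matches the DOTSCIENCE_ lines shown in the expected output later in this issue; the variable names and values are placeholders.

```python
import json

# Placeholder values standing in for a real training run.
batch_size = 0.9
f_score = 0.1

# Each annotation is a DOTSCIENCE_ prefix followed by a JSON-encoded value.
inputs_line = "DOTSCIENCE_INPUTS=" + json.dumps(["agent1", "agent2"])
outputs_line = "DOTSCIENCE_OUTPUTS=" + json.dumps(["model"])
parameters_line = "DOTSCIENCE_PARAMETERS=" + json.dumps({"batch-size": batch_size})
summary_line = "DOTSCIENCE_SUMMARY=" + json.dumps({"f-score": f_score})

for line in (inputs_line, outputs_line, parameters_line, summary_line):
    print(line)
```

Nothing here checks that the dataset names exist or that the values are serialisable, and the json.dumps calls are easy to get wrong.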

Requirements

Develop a trivial Python library that solves the above problems, and publish it on PyPI. Usage should look like:

import dotscience as ds
# Both of the following methods throw an exception if agent1, agent2 or model isn't a mountpoint
ds.input("agent1", "agent2")
ds.output("model")

# The following methods can either copy the values at call time, or keep a
# reference and read it when the report is produced. A copy is probably better,
# as the user will expect the value _right now_ to be captured.
ds.metric("f-score", f_score)
ds.parameter("batch-size", batch_size)
ds.label("frobrinator", "off")

# They also return the result for handy use like this:
tensorflow.setBatchSize(ds.parameter("batch-size", 0.3))

# Multiple stats, params or labels can be passed as long as they come as
# name, value pairs (i.e. an even number of arguments)
ds.metric("f-score", f_score, "batch_size", batch_size)

# Alternate calling style with **kwargs
ds.metric(a=1, b=2)
ds.parameter(c=3, d=4)

# Preview the metrics in human-readable form without publishing them to
# dotscience even if the notebook is saved
ds.debug()

# Report the metrics
ds.report()

# Report data changes, but no metrics (summary stats)
ds.report(plot=False)

The ds.report() call will print:

---
DOTSCIENCE_INPUTS=["agent1", "agent2"]
DOTSCIENCE_OUTPUTS=["model"]
DOTSCIENCE_SUMMARY={"f-score": 0.1, "batch-size": 0.9}
DOTSCIENCE_PARAMETERS={"c": 3, "d": 4}
---
Note to Jupyter users: don't forget to save your notebook in order to publish
these results to dotscience.
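
A minimal sketch of how the input/output validation and the report output might fit together, assuming the format shown above. The `Run` class, its `render` method and the injectable `is_mountpoint` check are hypothetical names used for illustration; how dotscience actually locates its mountpoints is not stated in this issue, so the default simply uses os.path.ismount on the given name.

```python
import json
import os


class Run:
    """Hypothetical container for the annotations of one run."""

    def __init__(self, is_mountpoint=os.path.ismount):
        # The mountpoint check is injectable so it can be swapped or tested.
        self._is_mountpoint = is_mountpoint
        self.inputs = []
        self.outputs = []
        self.summary = {}
        self.parameters = {}

    def _check(self, names):
        # Reject any dataset name that does not resolve to a mountpoint.
        for name in names:
            if not self._is_mountpoint(name):
                raise ValueError("%r is not a mountpoint" % (name,))
        return list(names)

    def input(self, *names):
        self.inputs.extend(self._check(names))

    def output(self, *names):
        self.outputs.extend(self._check(names))

    def render(self):
        # Build the annotation block as a string, one JSON value per line.
        lines = [
            "DOTSCIENCE_INPUTS=" + json.dumps(self.inputs),
            "DOTSCIENCE_OUTPUTS=" + json.dumps(self.outputs),
            "DOTSCIENCE_SUMMARY=" + json.dumps(self.summary),
            "DOTSCIENCE_PARAMETERS=" + json.dumps(self.parameters),
        ]
        return "\n".join(["---"] + lines + ["---"])

    def report(self):
        print(self.render())


# Usage with a stand-in mountpoint check, so the example runs anywhere.
known = {"agent1", "agent2", "model"}
run = Run(is_mountpoint=lambda name: name in known)
run.input("agent1", "agent2")
run.output("model")
run.summary.update({"f-score": 0.1, "batch-size": 0.9})
run.parameters.update({"c": 3, "d": 4})
run.report()
```

Separating render from report keeps the formatting testable without capturing stdout. The Jupyter note and the plot=False variant are left out of this sketch.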

Open questions

Should the method be called metric or summary? Let's decide.
