
vetiver-python's Introduction

vetiver

Lifecycle: experimental

Vetiver, the oil of tranquility, is used as a stabilizing ingredient in perfumery to preserve more volatile fragrances.

The goal of vetiver is to provide fluent tooling to version, share, deploy, and monitor a trained model. Functions handle both recording and checking the model's input data prototype, and predicting from a remote API endpoint. The vetiver package is extensible, with generics that can support many kinds of models, and it is available for both Python and R. To learn more about vetiver, see the documentation at https://vetiver.rstudio.com.

You can use vetiver with scikit-learn, torch, statsmodels, xgboost, and spacy models.

Installation

You can install the released version of vetiver from PyPI:

python -m pip install vetiver

And the development version from GitHub with:

python -m pip install git+https://github.com/rstudio/vetiver-python

Example

A VetiverModel() object collects the information needed to store, version, and deploy a trained model.

from vetiver import mock, VetiverModel

X, y = mock.get_mock_data()
model = mock.get_mock_model().fit(X, y)

v = VetiverModel(model, model_name='mock_model', prototype_data=X)

You can version and share your VetiverModel() by choosing a pins "board" for it, including a local folder, Connect, Amazon S3, and more.

from pins import board_temp
from vetiver import vetiver_pin_write

model_board = board_temp(versioned = True, allow_pickle_read = True)
vetiver_pin_write(model_board, v)

You can deploy your pinned VetiverModel() using VetiverAPI(), an extension of FastAPI.

from vetiver import VetiverAPI
app = VetiverAPI(v, check_prototype = True)

To start a server using this object, use app.run(port = 8080) or your port of choice.
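Once the server is running, you can make predictions against it with vetiver's predict helper. A minimal sketch, assuming the API above is listening locally on port 8080:

from vetiver.server import predict, vetiver_endpoint

# assumes the VetiverAPI above is running locally on port 8080
endpoint = vetiver_endpoint("http://127.0.0.1:8080/predict")
predictions = predict(endpoint, X)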

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

  • For questions and discussions about deploying models, statistical modeling, and machine learning, please post on Posit Community.

  • If you think you have encountered a bug, please submit an issue.


vetiver-python's Issues

Use singledispatch to assign a handler

Like the ptype, the handler depends on a model type that may not be known to vetiver. Using singledispatch would allow users to extend vetiver to their custom models.
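A minimal sketch of the idea using functools.singledispatch from the standard library; the function and class names here are illustrative, not vetiver's confirmed API:

from functools import singledispatch

@singledispatch
def create_handler(model, ptype_data):
    # fallback when no handler is registered for this model type
    raise NotImplementedError(f"No handler registered for {type(model)}")

# a user could then register a handler for their own model class:
# @create_handler.register(MyCustomModel)
# def _(model, ptype_data):
#     return MyCustomHandler(model, ptype_data)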

rerunning rsconnect_deploy deploys new API

Describe the bug
When using rsconnect_deploy to redeploy an existing API on Connect, it deploys a new API instead of updating the existing one. I did not specify the new argument either.

To Reproduce
You can use the Jupyter notebook published here.

Expected behavior
Redeploying an existing API should overwrite the existing content.

change monitoring column from `estimate` to `score`

In monitoring, we have 4 output columns for compute_metrics: index, n, metric, and estimate.

Is estimate the right label? It is not an estimated value; rather, it is a specific, calculated value. It might be confusing outside of the dataframe (i.e., in plots created from it).

CC: @juliasilge

using joblib to serve sklearn models

Currently, vetiver uses joblib to uniformly serve sklearn models. This is because of its straightforward integration with FastAPI and its ability to handle many types of models.

Concerns:

  • Should the user or the package create this file?
  • Where should the joblib file be stored (or cached)?
  • Can joblib handle all sorts of sklearn models (i.e., will it only serve predict methods? what about transform?)
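For reference, the core joblib round trip is straightforward; a sketch with a small fitted sklearn model:

import joblib
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit([[1], [2], [3]], [2, 4, 6])
joblib.dump(model, "model.joblib")    # serialize the fitted model
loaded = joblib.load("model.joblib")  # restore it for serving
loaded.predict([[4]])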

remove nested async calls

If possible, it would be good to remove the nested async calls when people run VetiverAPI.run in Jupyter notebooks, as they could have unintended consequences for the user's original server.

less vetiver in function names (?)

When using this package, it can feel awkward to type vetiver.vetiver_XXX for many functions. On the R side this makes sense, as it follows the naming conventions of the language, but it is often redundant when importing from vetiver-python.

I am thinking about renaming a few functions (i.e., maybe vetiver_pin_write() -> vetiver.pin_write()) to feel more fluent in use.

CC: @has2k1

allow rewriting py file

Is your feature request related to a problem? Please describe.
While using the vetiver_write_app function, it does not allow overwriting the file if the file already exists.

Describe the solution you'd like
Allow rewriting the file

Describe alternatives you've considered
For now, we manually delete the file first before rerunning the function.


VetiverAPI() auto-docs redirects to base of URL

Describe the bug
When redirecting to the auto-generated docs, the URL goes to the base of the deployment URL: e.g., when deploying to https://colorado.rstudio.com/rsc/superbowlads/, it would be expected to redirect to https://colorado.rstudio.com/rsc/superbowlads/__docs__, but instead it redirects to https://colorado.rstudio.com/rsc/__docs__.
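One possible fix, sketched in plain FastAPI rather than vetiver's actual route code: use a relative redirect, which preserves whatever path prefix a proxy like Connect adds, while an absolute redirect resolves against the server root.

from fastapi import FastAPI
from fastapi.responses import RedirectResponse

app = FastAPI()

@app.get("/")
def docs_redirect():
    # "./__docs__" resolves relative to the deployed prefix;
    # "/__docs__" would resolve to the server root instead
    return RedirectResponse(url="./__docs__")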

add mock into README.md or other documentation

For users just getting into the package, it would be helpful to have a self-contained example without other dependencies such as numpy, pandas, scikit-learn, etc. This keeps new users from being overwhelmed by external dependencies and lets them focus on the tasks that vetiver handles.

Docs redirection on Connect not working

Describe the bug
After deploying a vetiver API on Connect, it does not redirect to /docs and instead errors out.

To Reproduce
Deploy an API to Connect

Expected behavior
The landing page should be https://colorado.rstudio.com/rsc/vetiver_api/docs


Desktop (please complete the following information):

  • OS: Ubuntu
  • Browser: Chrome
  • Version: 0.1.3

CC: @xuf12

Support different types of prediction

Currently, vetiver supports the predict method when handling models. In other use cases, it may be important to be able to use predict_proba or predict_log_proba.
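A hedged sketch of one way to support this, dispatching on the method name; the prediction_type parameter is hypothetical:

def make_prediction(model, data, prediction_type="predict"):
    # e.g. prediction_type="predict_proba" or "predict_log_proba"
    method = getattr(model, prediction_type)
    return method(data)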

raise better error for checking ptype

Is your feature request related to a problem? Please describe.
When users want to check the ptype, they must give VetiverModel() a sample of ptype data. Currently, omitting this data raises an AttributeError that gives users no guidance on how to remedy the problem.

Describe the solution you'd like
A better error, i.e., something like: "check_ptype is True, but you did not give any data to create a ptype. Do you need to add data to ptype_data?"
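A minimal sketch of the proposed check, with argument names assumed from this issue:

def validate_ptype_data(check_ptype: bool, ptype_data=None):
    # raise early with guidance instead of a bare AttributeError
    if check_ptype and ptype_data is None:
        raise ValueError(
            "check_ptype is True, but you did not give any data to create "
            "a ptype. Do you need to add data to ptype_data?"
        )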

Identical rapidoc visual documentation

To enhance the feeling of feature parity, we would like to have identical visual documentation inside the vetiver API for R and Python vetiver.

We will most likely use the Rapidoc web component for OpenAPI specification as the default for the /docs route in the API.

Resources on Rapidoc vs. Swagger vs. Redoc:
rapi-doc/RapiDoc#141

dictionary input for metrics

Related to #76 (comment)

It would be ideal to allow users to be able to submit a dictionary input for the monitoring metrics, to give custom labels if the __qualname__ is not as expected.
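For example, the dictionary keys could become the labels in the metric output column; a sketch using sklearn metrics, whose __qualname__ values can be unwieldy:

from sklearn import metrics

# hypothetical input: keys are custom labels, values are metric callables
metric_set = {
    "mae": metrics.mean_absolute_error,
    "r2": metrics.r2_score,
}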

add support `xgboost` models

Describe the solution you'd like
Request for a new handler for the xgboost package.

Additional context
XGBoost is currently supported by vetiver-r, and is a multilingual package. This would be a great step for parity between R and Python vetiver.

wrap `rsconnect` for vetiver-specific deployment

Similar to vetiver-r, help users deploy to RStudio Connect in a vetiver-aware way. Currently, users are able to deploy to Connect using the rsconnect-python package via deploy_python_fastapi, but we would like to make this more clear and discoverable for vetiver users. This functionality can come from a thin wrapper around rsconnect-python.

versioned argument not available

Describe the bug
In the documentation for the function vetiver.pin_read_write.vetiver_pin_read, it is mentioned that a versioned argument is available, but I get an error while using it.


To Reproduce
Use this vetiver_pin_read call:

v = pin_read_write.vetiver_pin_read(
    board,
    "username/modelname"
)

Expected behavior
versioned argument should be available

Desktop (please complete the following information):

  • OS: Ubuntu
  • Browser: Chrome
  • Version: 0.1.3

CC: @xuf12

Double check that predict(...) raises for http errors

Describe the bug

From a meeting with the RStudio SE team, it looks like predict tried to JSON-decode an error page.

To Reproduce

Create an endpoint on RStudio Connect and set it so that one of us can't access it. Then run something like:

from vetiver.server import predict
from vetiver.data import mtcars

endpoint = "URL_TO_ENDPOINT"
predict(endpoint, mtcars)

We may need to include response.raise_for_status() in predict.
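For reference, a sketch of the requests behavior involved (endpoint URL elided as in the snippet above):

import requests

response = requests.post("URL_TO_ENDPOINT", json=[{"x": 1}])
response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
data = response.json()       # only decode after confirming success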

support batch prediction in `VetiverAPI`

We would like to support batch prediction for API calls. Currently, this does not work with Pydantic BaseModel handling predictions. With Pandera naturally handling DataFrames while still being based on Pydantic, it seems like it might be able to support batch calls elegantly.

We want Pandera to also be able to work nicely with non-DataFrame data structures, such as numpy arrays.
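A sketch of what Pandera validation could look like; the schema columns here are hypothetical stand-ins for a model's input prototype:

import pandas as pd
import pandera as pa

schema = pa.DataFrameSchema({
    "hour": pa.Column(int),
    "temp": pa.Column(float),
})

df = pd.DataFrame({"hour": [1, 10], "temp": [20.5, 23.1]})
schema.validate(df)  # raises a SchemaError on mismatch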

.Renv / requirements.txt parity

In both vetiver-r and vetiver-python, the metadata of a vetiver model holds a minimal set of dependencies needed to make a prediction.

For R, the packages needed to make a prediction are saved in the vetiver_model() metadata, and then we use renv to create a scoped-down renv.lock that only tracks the packages + versions needed for prediction.

In Python, a requirements.txt must be generated. The big options here might be pip freeze, pip-compile, or generating this file manually from a pre-set list of requirements. The benefit of the pip options is that there would be defined version numbers, saving users headaches in case different metadata is used in new vetiver versions. This seems preferable to a list with unspecified version numbers. There might need to be some parsing of either pip freeze or pip-compile output to specify only the packages needed for a prediction, rather than all packages a user has installed.

It seems that the "best parity" option might be to take the necessary requirements from VetiverModel() metadata, then generate a more robust requirements.txt via some other tool.

Is there any other useful context I am missing?
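A sketch of the "best parity" idea, assuming the same metadata field that load_pkgs reads elsewhere in this tracker, with v a VetiverModel and pip-compile (from pip-tools) installed:

import subprocess

# assumption: required_pkgs lives in the VetiverModel metadata
required_pkgs = v.metadata.get("required_pkgs") or []
with open("model_requirements.in", "w") as f:
    f.write("\n".join(required_pkgs))

# pip-compile resolves and pins exact versions
subprocess.run(
    ["pip-compile", "model_requirements.in",
     "--output-file=vetiver_requirements.txt"],
    check=True,
)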

what to do with ptype enforcement at deployment

Currently, if you save a VetiverModel() with no ptype, you can deploy with check_ptype=True with no error. This can POST to an endpoint with no error, but with certain data types, predictions come back as dictionaries of errors.

I am planning on enforcing the following (a rough sketch follows the list) ⬇️

  • ptype saved, deploy with ptype ✅
  • ptype saved, deploy with NO ptype ❓ give warning
  • NO ptype saved, deploy with ptype ❌ raise error
  • NO ptype saved, deploy with NO ptype ✅
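A rough sketch of that logic, with names assumed:

import warnings

def check_deploy_ptype(saved_ptype, check_ptype: bool):
    # enforcement matrix from the list above
    if saved_ptype is None and check_ptype:
        raise ValueError(
            "No ptype was saved with this model; it cannot be deployed "
            "with check_ptype=True."
        )
    if saved_ptype is not None and not check_ptype:
        warnings.warn(
            "This model was saved with a ptype, but check_ptype is False; "
            "input data will not be validated."
        )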

Adding function output for rsconnect_deploy

The current version of rsconnect_deploy does not produce any output, which is not an ideal experience.

Describe the solution you'd like
This function wraps the deploy_fastapi() function from rsconnect-python, which outputs a lot of information after the deployment is complete. We do not need the whole output, but some text with details on where on Connect the API is deployed would be helpful (this info is already available from deploy_fastapi()).

Example dropdown improvement

Describe the bug
Not sure if this is a bug, but wanted to capture it here. In the docs page of the API, the drop-down under the Examples section has the same ptype values for both single and batch predictions.

Expected behavior
Different names for different types of predictions (single/batch, etc.)


On documentation

  • Add link to documentation in the about section on the main page
  • Add link to documentation in the readme
  • Have 2 documentation destinations
    • Released / stable
    • Unreleased / main branch / latest
    • Make the versioning of the package under development clearly show that it is not a released version
  • Prevent docs workflows initiated by PRs (esp. from contributors) from attempting to push to the gh-pages branch. Currently this happens and is indicated as a failed test

load_pkgs doesn't work on Windows

Describe the bug
The load_pkgs method generates a "permission denied" error when run on Windows 10.
On Windows, the tempfile module used to create the temporary .in file doesn't allow the file to be written with a context manager, because NamedTemporaryFile opens the file when it is created. This prevents open(tmp.name) and the pip-compile command, which both use this temporary file, from opening it.

To Reproduce
Steps to reproduce the behavior:

  1. On Windows 10 OS, run vetiver.load_pkgs(path="YOUR PATH HERE")
  2. See error

Expected behavior
This should create a temp file, write the list of requirements from the model/list, then compile the temp file into a requirements.txt file.


Desktop (please complete the following information):

  • OS: Windows 10
  • Browser chrome/edge

Additional context
This can be solved by closing the temporary file after it's created (which requires delete=False when creating the temp file), and then deleting it after the requirements.txt file has been compiled (see below).

import os
import tempfile

from vetiver import VetiverModel  # needed for the type annotation below

def load_pkgs(model: VetiverModel = None, packages: list = None, path=""):
    """Load packages necessary for predictions

    Args
    ----
        model: VetiverModel
            VetiverModel to extract packages from
        packages: list
            List of extra packages to include
        path: str
            Where to save output file
    """

    required_pkgs = ["vetiver"]
    if packages:
        required_pkgs = list(set(required_pkgs + packages))
    if model.metadata.get("required_pkgs"):
        required_pkgs = list(set(required_pkgs + model.metadata.get("required_pkgs")))

    tmp = tempfile.NamedTemporaryFile(suffix=".in", delete=False)  # delete=False needed so file not deleted after closing
    tmp.close()  # Added to close file after created
    with open(tmp.name, "a") as f:
        for package in required_pkgs:
            f.write(package + "\n")

    os.system(f"pip-compile {tmp.name} --output-file={path}vetiver_requirements.txt")
    os.remove(tmp.name)  # Need to delete file after compiling completes

Add support for Model Card reporting

Once a VetiverModel is created, we would like to nudge users to create a model card for reporting relevant information about the model. We would like some of this to be automatically populated, possibly by using a Quarto template.

Considerations:

  • how can users create this model card?

ideas: maybe users create this through a CLI (which feels clunky), or have a function that automatically generates the template in the user's directory, where they can edit it at their discretion

Original paper: https://arxiv.org/pdf/1810.03993.pdf
VetiveR model card: https://github.com/tidymodels/vetiver/blob/main/vignettes/model-card.Rmd

model monitoring

issue for tracking purposes 🥳

main monitoring functions (usage sketch after the list)

  • vetiver.compute_metrics
  • vetiver.pin_metrics
  • vetiver.plot_metrics
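A hedged sketch of how these could fit together; the parameter names mirror vetiver-r's monitoring API and are assumptions, not a confirmed Python signature:

import datetime
import vetiver
from sklearn import metrics

# df: a DataFrame with a date column, observed values, and predictions;
# all parameter names below are assumed from the R API
metric_df = vetiver.compute_metrics(
    data=df,
    date_var="date",
    period=datetime.timedelta(weeks=1),
    metric_set=[metrics.mean_absolute_error, metrics.r2_score],
    truth="actual",
    estimate="pred",
)
vetiver.plot_metrics(metric_df)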

Update to PyPI `pins`

Is your feature request related to a problem? Please describe.
With a stable release of pins on PyPI, the dependency can be updated in the setup.cfg file, and other small changes as needed.

Error importing vetiver.handlers when torch not installed

Describe the bug

Because vetiver's TorchHandler class tries to set base_class = torch.nn.Module when it is defined, it cannot be imported without torch. Since it is imported in vetiver.handlers, not having torch causes an error on import.

To Reproduce

From the vetiver repo:

python -m venv env
source env/bin/activate

pip install .

From Python:

import vetiver.handlers

output:

>>> import vetiver.handlers
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/machow/repos/vetiver-python/vetiver/__init__.py", line 9, in <module>
    from .vetiver_model import VetiverModel
  File "/Users/machow/repos/vetiver-python/vetiver/vetiver_model.py", line 3, in <module>
    from vetiver.handlers._interface import create_handler
  File "/Users/machow/repos/vetiver-python/vetiver/handlers/_interface.py", line 1, in <module>
    from vetiver.handlers import torch, sklearn, base
  File "/Users/machow/repos/vetiver-python/vetiver/handlers/torch.py", line 13, in <module>
    class TorchHandler(VetiverHandler):
  File "/Users/machow/repos/vetiver-python/vetiver/handlers/torch.py", line 22, in TorchHandler
    base_class = torch.nn.Module
NameError: name 'torch' is not defined
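One hedged fix sketch, following the module layout in the traceback: guard the import so that defining TorchHandler no longer requires torch at import time.

# sketch for vetiver/handlers/torch.py;
# VetiverHandler is the base class shown in the traceback
try:
    import torch
except ImportError:
    torch = None

class TorchHandler(VetiverHandler):
    # only touch torch.nn.Module when torch is actually present
    base_class = torch.nn.Module if torch is not None else None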

Wrong vetiver module in function generated file

Describe the bug
When using the vetiver.vetiver_write_app function, the auto-generated .py file calls vetiver.vetiver_pin_read, a function that is not available.

To Reproduce
Steps to reproduce the behavior:
Run vetiver.vetiver_write_app function to generate the file

vetiver_write_app(
        board,
        "username/modelname",
        file="superbowlads.py"
)

Expected behavior
The correct function is vetiver.pin_read_write.vetiver_pin_read.

Desktop (please complete the following information):

  • OS: Ubuntu
  • Browser: Chrome
  • Version: 0.1.3

CC: @xuf12

`VetiverAPI(model, check_ptype = False)` has QUERY parameters

Is your feature request related to a problem? Please describe.
When users use VetiverAPI(model, check_ptype = False), the data posted to the /predict/ endpoint is ingested into VetiverAPI() as a QUERY parameter. When VetiverAPI(model, check_ptype = True) is used, the data is ingested as the body of the request.

Describe the solution you'd like
Data should be POSTed to the body of the request, regardless of whether the data prototype is checked.
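A sketch of the desired behavior in plain FastAPI (not vetiver's actual route code): reading the request body directly means the payload never becomes query parameters, even without a known prototype.

from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/predict")
async def predict(request: Request):
    # the posted payload is read from the request body,
    # regardless of whether a prototype is available to type it
    payload = await request.json()
    return {"n_rows": len(payload)}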

arrow support in vetiver

Trying to figure out how to support arrow in vetiver. What would this look like? Where does it make sense to implement?

Cleanup files created by the testing

Some tests create directory/files at the root of the repository. These should be cleaned up when the tests end or should be created in the temporary directory.

Identified files and directories:

  • model/*: model directory is probably created by test_pin_write.py

Add credentials to predict

Currently, if people use vetiver.server.predict() to POST data to a model in Connect that is not open to the public, it will give a JSONDecodeError: [Errno Extra data] 404 page not found. We should use the information users provide when generating an RSConnectServer to add credentials so private APIs can be accessed (add an optional parameter to pass in RSConnectServer)
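A sketch of a request with Connect credentials attached; the endpoint URL and API key are hypothetical, and Connect accepts an Authorization: Key header:

import requests

response = requests.post(
    "https://connect.example.com/content/my-model/predict",  # hypothetical
    json=[{"x": 1}],
    headers={"Authorization": "Key MY_API_KEY"},  # hypothetical key
)
response.raise_for_status()  # surface a 403/404 instead of a JSONDecodeError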

Pins board function argument missing

Describe the bug
The .py file generated by the function does not have the allow_pickle_read=True option set, which is needed to read models stored as pins.

To Reproduce
Run vetiver.vetiver_write_app function to generate the file

vetiver_write_app(
        board,
        "username/modelname",
        file="superbowlads.py"
)

Expected behavior
allow_pickle_read=True should be set by default

Desktop (please complete the following information):

  • OS: Ubuntu
  • Browser: Chrome
  • Version: 0.1.3

CC: @xuf12

Error when predicting with a dataframe using an API deployed version 0.1.6

Describe the bug
I get this error when predicting with both 0.1.5 and 0.1.6 versions of vetiver using an endpoint deployed with version 0.1.6 of vetiver from PyPI.

Traceback (most recent call last):
  File "/usr/home/xu.fei/bike_predict_python/app/.venv/lib/python3.9/site-packages/requests/models.py", line 971, in json
    return complexjson.loads(self.text, **kwargs)
  File "/opt/python/3.9.6/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/opt/python/3.9.6/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/python/3.9.6/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/home/xu.fei/bike_predict_python/app/.venv/lib/python3.9/site-packages/shiny/session/_session.py", line 977, in output_obs
    message[output_name] = fn()
  File "/usr/home/xu.fei/bike_predict_python/app/.venv/lib/python3.9/site-packages/shiny/render/_render.py", line 205, in __call__
    return _utils.run_coro_sync(self._run())
  File "/usr/home/xu.fei/bike_predict_python/app/.venv/lib/python3.9/site-packages/shiny/_utils.py", line 178, in run_coro_sync
    coro.send(None)
  File "/usr/home/xu.fei/bike_predict_python/app/.venv/lib/python3.9/site-packages/shiny/render/_render.py", line 219, in _run
    x = await self._fn()
  File "/usr/home/xu.fei/bike_predict_python/app/.venv/lib/python3.9/site-packages/shiny/_utils.py", line 132, in fn_async
    return fn()
  File "/usr/home/xu.fei/bike_predict_python/app/app.py", line 202, in plot
    df_to_plot_id["pred"] = predict(endpoint, df_to_pred)
  File "/usr/home/xu.fei/bike_predict_python/app/.venv/lib/python3.9/site-packages/vetiver/server.py", line 227, in predict
    response_df = pd.DataFrame.from_dict(response.json())
  File "/usr/home/xu.fei/bike_predict_python/app/.venv/lib/python3.9/site-packages/requests/models.py", line 975, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

To Reproduce
Steps to reproduce the behavior:

import pandas as pd
from vetiver.server import predict, vetiver_endpoint

data = {
    "id": ["1", "2"],
    "hour": [1, 10],
    "month": [7, 8],
    "Friday": [0, 0],
    "Monday": [0, 0],
    "Saturday": [1, 0],
    "Sunday": [0, 0],
    "Thursday": [0, 0],
    "Tuesday": [0, 0],
    "Wednesday": [0, 1],
}
df_to_test = pd.DataFrame.from_dict(data)

# works with API deployed with 0.1.5
endpoint_150 = vetiver_endpoint(
    "https://colorado.rstudio.com/rsc/new-bikeshare-model/predict/"
)
predict(endpoint_150, df_to_test)

# doesn't work with API deployed with 0.1.6
endpoint_160 = vetiver_endpoint(
    "https://colorado.rstudio.com/rsc/bike-predict-python-api/predict/"
)
predict(endpoint_160, df_to_test)

Expected behavior
I expect endpoint_160 to work the same way as endpoint_150.


@gsingh91
