
vetiver-python's Introduction

vetiver

Lifecycle: experimental

Vetiver, the oil of tranquility, is used as a stabilizing ingredient in perfumery to preserve more volatile fragrances.

The goal of vetiver is to provide fluent tooling to version, share, deploy, and monitor a trained model. Functions handle both recording and checking the model's input data prototype, and predicting from a remote API endpoint. The vetiver package is extensible, with generics that can support many kinds of models, and it is available for both Python and R. To learn more about vetiver, see the documentation at https://vetiver.rstudio.com.

You can use vetiver with scikit-learn, torch, statsmodels, xgboost, and spacy models.

Installation

You can install the released version of vetiver from PyPI:

python -m pip install vetiver

And the development version from GitHub with:

python -m pip install git+https://github.com/rstudio/vetiver-python

Example

A VetiverModel() object collects the information needed to store, version, and deploy a trained model.

from vetiver import mock, VetiverModel

X, y = mock.get_mock_data()
model = mock.get_mock_model().fit(X, y)

v = VetiverModel(model, model_name='mock_model', prototype_data=X)

You can version and share your VetiverModel() by choosing a pins "board" for it, including a local folder, Connect, Amazon S3, and more.

from pins import board_temp
from vetiver import vetiver_pin_write

model_board = board_temp(versioned = True, allow_pickle_read = True)
vetiver_pin_write(model_board, v)

You can deploy your pinned VetiverModel() using VetiverAPI(), an extension of FastAPI.

from vetiver import VetiverAPI
app = VetiverAPI(v, check_prototype = True)

To start a server using this object, use app.run(port = 8080) or your port of choice.
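Once the server is running, you can make predictions against it with vetiver's predict helper. A minimal sketch, assuming the API above is listening locally on port 8080:

from vetiver.server import predict, vetiver_endpoint

# assumes the VetiverAPI above is running locally on port 8080
endpoint = vetiver_endpoint("http://127.0.0.1:8080/predict")
predictions = predict(endpoint, X)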

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

  • For questions and discussions about deploying models, statistical modeling, and machine learning, please post on Posit Community.

  • If you think you have encountered a bug, please submit an issue.


vetiver-python's Issues

Use singledispatch to assign a handler

Like the ptype, the handler depends on a model type that may not be known to vetiver. Using singledispatch would allow users to extend vetiver to their custom models.
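A minimal sketch of the idea using functools.singledispatch from the standard library; the function and class names here are illustrative, not vetiver's confirmed API:

from functools import singledispatch

@singledispatch
def create_handler(model, ptype_data):
    # fallback when no handler is registered for this model type
    raise NotImplementedError(f"No handler registered for {type(model)}")

# a user could then register a handler for their own model class:
# @create_handler.register(MyCustomModel)
# def _(model, ptype_data):
#     return MyCustomHandler(model, ptype_data)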

rerunning rsconnect_deploy deploys new API

Describe the bug
When using rsconnect_deploy to redeploy an existing API on Connect, it deploys a new API instead of updating the existing one. I did not specify the new argument either.

To Reproduce
You can use the Jupyter notebook published here.

Expected behavior
Redeploying an existing API should overwrite the existing content.

change monitoring column from `estimate` to `score`

In monitoring, we have 4 output columns for compute_metrics: index, n, metric, and estimate.

Is estimate the right label? It is not an estimated value; rather, it is a specific, calculated value. It might be confusing outside of the dataframe (i.e., in plots created from it).

CC: @juliasilge

using joblib to serve sklearn models

Currently, vetiver uses joblib to uniformly serve sklearn models. This is because of its straightforward integration with FastAPI and its ability to handle many types of models.

Concerns:

  • Should the user or the package create this file?
  • Where should the joblib file be stored (or cached)?
  • Can joblib handle all sorts of sklearn models (i.e., will it only serve predict methods? what about transform?)
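For reference, the core joblib round trip is straightforward; a sketch with a small fitted sklearn model:

import joblib
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit([[1], [2], [3]], [2, 4, 6])
joblib.dump(model, "model.joblib")    # serialize the fitted model
loaded = joblib.load("model.joblib")  # restore it for serving
loaded.predict([[4]])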

remove nested async calls

If possible, it would be good to remove the nested async calls when people run VetiverAPI.run in Jupyter notebooks, as they could have unintended consequences for the user's original server.

less vetiver in function names (?)

When using this package, it can feel awkward to type vetiver.vetiver_XXX for many functions. On the R side this makes sense, as it follows the naming conventions of the language, but it is often redundant when importing from vetiver-python.

I am thinking about renaming a few functions (i.e., maybe vetiver_pin_write() -> vetiver.pin_write()) to feel more fluent in use.

CC: @has2k1

allow rewriting py file

Is your feature request related to a problem? Please describe.
While using the vetiver_write_app function, it does not allow overwriting the file if the file already exists.

Describe the solution you'd like
Allow rewriting the file

Describe alternatives you've considered
For now, we manually delete the file first before rerunning the function.


VetiverAPI() auto-docs redirects to base of URL

Describe the bug
When redirecting to the auto-generated docs, the URL goes to the base of the deployment URL: e.g., when deploying to https://colorado.rstudio.com/rsc/superbowlads/, it would be expected to redirect to https://colorado.rstudio.com/rsc/superbowlads/__docs__, but instead it redirects to https://colorado.rstudio.com/rsc/__docs__.
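One possible fix, sketched in plain FastAPI rather than vetiver's actual route code: use a relative redirect, which preserves whatever path prefix a proxy like Connect adds, while an absolute redirect resolves against the server root.

from fastapi import FastAPI
from fastapi.responses import RedirectResponse

app = FastAPI()

@app.get("/")
def docs_redirect():
    # "./__docs__" resolves relative to the deployed prefix;
    # "/__docs__" would resolve to the server root instead
    return RedirectResponse(url="./__docs__")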

add mock into README.md or other documentation

For users just getting into the package, it would be helpful to have a self-contained example without other dependencies such as numpy, pandas, scikit-learn, etc. This keeps new users from being overwhelmed by external dependencies and lets them focus on the tasks that vetiver handles.

Docs redirection on Connect not working

Describe the bug
After deploying a vetiver API on Connect, it does not redirect to /docs and instead errors out.

To Reproduce
Deploy an API to Connect

Expected behavior
The landing page should be https://colorado.rstudio.com/rsc/vetiver_api/docs


Desktop (please complete the following information):

  • OS: Ubuntu
  • Browser: Chrome
  • Version: 0.1.3

CC: @xuf12

Support different types of prediction

Currently, vetiver supports the predict method when handling models. In other use cases, it may be important to be able to use predict_proba or predict_log_proba.
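A hedged sketch of one way to support this, dispatching on the method name; the prediction_type parameter is hypothetical:

def make_prediction(model, data, prediction_type="predict"):
    # e.g. prediction_type="predict_proba" or "predict_log_proba"
    method = getattr(model, prediction_type)
    return method(data)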

raise better error for checking ptype

Is your feature request related to a problem? Please describe.
When users want to check the ptype, they must give VetiverModel() a sample of ptype data. Currently, omitting this data raises an AttributeError that gives users no guidance on how to remedy the problem.

Describe the solution you'd like
A better error, i.e., something like: "check_ptype is True, but you did not give any data to create a ptype. Do you need to add data to ptype_data?"
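A minimal sketch of the proposed check, with argument names assumed from this issue:

def validate_ptype_data(check_ptype: bool, ptype_data=None):
    # raise early with guidance instead of a bare AttributeError
    if check_ptype and ptype_data is None:
        raise ValueError(
            "check_ptype is True, but you did not give any data to create "
            "a ptype. Do you need to add data to ptype_data?"
        )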

Identical rapidoc visual documentation

To enhance the feeling of feature parity, we would like to have identical visual documentation inside the vetiver API for R and Python vetiver.

We will most likely use the Rapidoc web component for OpenAPI specification as the default for the /docs route in the API.

Resources on Rapidoc vs. Swagger vs. Redoc:
rapi-doc/RapiDoc#141

dictionary input for metrics

Related to #76 (comment)

It would be ideal to allow users to be able to submit a dictionary input for the monitoring metrics, to give custom labels if the __qualname__ is not as expected.
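For example, the dictionary keys could become the labels in the metric output column; a sketch using sklearn metrics, whose __qualname__ values can be unwieldy:

from sklearn import metrics

# hypothetical input: keys are custom labels, values are metric callables
metric_set = {
    "mae": metrics.mean_absolute_error,
    "r2": metrics.r2_score,
}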

add support `xgboost` models

Describe the solution you'd like
Request for a new handler for the xgboost package.

Additional context
XGBoost is currently supported by vetiver-r, and is a multilingual package. This would be a great step for parity between R and Python vetiver.

wrap `rsconnect` for vetiver-specific deployment

Similar to vetiver-r, help users deploy to RStudio Connect in a vetiver-aware way. Currently, users are able to deploy to Connect using the rsconnect-python package via deploy_python_fastapi, but we would like to make this more clear and discoverable for vetiver users. This functionality can come from a thin wrapper around rsconnect-python.

versioned argument not available

Describe the bug
In the documentation for the function vetiver.pin_read_write.vetiver_pin_read, it is mentioned that a versioned argument is available, but I get an error while using it.


To Reproduce
Use this vetiver_pin_read call:

v = pin_read_write.vetiver_pin_read(
    board,
    "username/modelname"
)

Expected behavior
versioned argument should be available

Desktop (please complete the following information):

  • OS: Ubuntu
  • Browser: Chrome
  • Version: 0.1.3

CC: @xuf12

Double check that predict(...) raises for http errors

Describe the bug

From a meeting with the RStudio SE team, it looks like predict tried to JSON-decode an error page.

To Reproduce

Create an endpoint on RStudio Connect and set it so that one of us can't access it. Then run something like:

from vetiver.server import predict
from vetiver.data import mtcars

endpoint = "URL_TO_ENDPOINT"
predict(endpoint, mtcars)

We may need to include response.raise_for_status() in predict.
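For reference, a sketch of the requests behavior involved (endpoint URL elided as in the snippet above):

import requests

response = requests.post("URL_TO_ENDPOINT", json=[{"x": 1}])
response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
data = response.json()       # only decode after confirming success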

support batch prediction in `VetiverAPI`

We would like to support batch prediction for API calls. Currently, this does not work with Pydantic BaseModel handling predictions. With Pandera naturally handling DataFrames while still being based on Pydantic, it seems like it might be able to support batch calls elegantly.

We want Pandera to also be able to work nicely with non-DataFrame data structures, such as numpy arrays.
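A sketch of what Pandera validation could look like; the schema columns here are hypothetical stand-ins for a model's input prototype:

import pandas as pd
import pandera as pa

schema = pa.DataFrameSchema({
    "hour": pa.Column(int),
    "temp": pa.Column(float),
})

df = pd.DataFrame({"hour": [1, 10], "temp": [20.5, 23.1]})
schema.validate(df)  # raises a SchemaError on mismatch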

.Renv / requirements.txt parity

In both vetiver-r and vetiver-python, the metadata of a vetiver model holds a minimal set of dependencies needed to make a prediction.

For R, the packages needed to make a prediction are saved in the vetiver_model() metadata, and then we use renv to create a scoped-down renv.lock that only tracks the packages + versions needed for prediction.

In Python, a requirements.txt must be generated. The big options here might be pip freeze, pip-compile, or generating this file manually from a pre-set list of requirements. The benefit of the pip options is that there would be defined version numbers, saving users headaches in case different metadata is used in new vetiver versions. This seems preferable to a list with unspecified version numbers. There might need to be some parsing of either pip freeze or pip-compile output to specify only the packages needed for a prediction, rather than all packages a user has installed.

It seems that the "best parity" option might be to take the necessary requirements from VetiverModel() metadata, then generate a more robust requirements.txt via some other tool.

Is there any other useful context I am missing?
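A sketch of the "best parity" idea, assuming the same metadata field that load_pkgs reads elsewhere in this tracker, with v a VetiverModel and pip-compile (from pip-tools) installed:

import subprocess

# assumption: required_pkgs lives in the VetiverModel metadata
required_pkgs = v.metadata.get("required_pkgs") or []
with open("model_requirements.in", "w") as f:
    f.write("\n".join(required_pkgs))

# pip-compile resolves and pins exact versions
subprocess.run(
    ["pip-compile", "model_requirements.in",
     "--output-file=vetiver_requirements.txt"],
    check=True,
)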

what to do with ptype enforcement at deployment

Currently, if you save a VetiverModel() with no ptype, you can deploy with check_ptype=True with no error. This can POST to an endpoint with no error, but with certain data types, predictions come back as dictionaries of errors.

I am planning on enforcing the following (a rough sketch follows the list) ⬇️

  • ptype saved, deploy with ptype ✅
  • ptype saved, deploy with NO ptype ❓ give warning
  • NO ptype saved, deploy with ptype ❌ raise error
  • NO ptype saved, deploy with NO ptype ✅
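A rough sketch of that logic, with names assumed:

import warnings

def check_deploy_ptype(saved_ptype, check_ptype: bool):
    # enforcement matrix from the list above
    if saved_ptype is None and check_ptype:
        raise ValueError(
            "No ptype was saved with this model; it cannot be deployed "
            "with check_ptype=True."
        )
    if saved_ptype is not None and not check_ptype:
        warnings.warn(
            "This model was saved with a ptype, but check_ptype is False; "
            "input data will not be validated."
        )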

Adding function output for rsconnect_deploy

The current version of rsconnect_deploy does not produce any output, which is not an ideal experience.

Describe the solution you'd like
This function wraps the deploy_fastapi() function from rsconnect-python, which outputs a lot of information after the deployment is complete. We do not need the whole output, but some text with details on where on Connect the API is deployed would be helpful (this info is already available from deploy_fastapi()).

Example dropdown improvement

Describe the bug
Not sure if this is a bug, but wanted to capture it here. In the docs page of the API, the drop-down under the Examples section has the same ptype values for both single and batch predictions.

Expected behavior
Different names for different types of predictions (single/batch, etc.)


On documentation

  • Add link to documentation in the about section on the main page
  • Add link to documentation in the readme
  • Have 2 documentation destinations
    • Released / stable
    • Unreleased / main branch / latest
    • Make the versioning of the package under development clearly show that it is not a released version
  • Prevent docs workflows initiated by PRs (esp. from contributors) from attempting to push to the gh-pages branch. Currently this happens and is indicated as a failed test

load_pkgs doesn't work on Windows

Describe the bug
The load_pkgs method generates a "permission denied" error when run on Windows 10.
On Windows, the tempfile module used to create the temporary .in file doesn't allow the file to be written with a context manager, because NamedTemporaryFile opens the file when it is created. This prevents open(tmp.name) and the pip-compile command, which both use this temporary file, from opening it.

To Reproduce
Steps to reproduce the behavior:

  1. On Windows 10 OS, run vetiver.load_pkgs(path="YOUR PATH HERE")
  2. See error

Expected behavior
This should create a temp file, write the list of requirements from the model/list, then compile the temp file into a requirements.txt file.


Desktop (please complete the following information):

  • OS: Windows 10
  • Browser chrome/edge

Additional context
This can be solved by closing the temporary file after it's created (which requires delete=False when creating the temp file), and then deleting it after the requirements.txt file has been compiled (see below).

import os
import tempfile

from vetiver import VetiverModel  # needed for the type annotation below

def load_pkgs(model: VetiverModel = None, packages: list = None, path=""):
    """Load packages necessary for predictions

    Args
    ----
        model: VetiverModel
            VetiverModel to extract packages from
        packages: list
            List of extra packages to include
        path: str
            Where to save output file
    """

    required_pkgs = ["vetiver"]
    if packages:
        required_pkgs = list(set(required_pkgs + packages))
    if model.metadata.get("required_pkgs"):
        required_pkgs = list(set(required_pkgs + model.metadata.get("required_pkgs")))

    tmp = tempfile.NamedTemporaryFile(suffix=".in", delete=False)  # delete=False needed so file not deleted after closing
    tmp.close()  # Added to close file after created
    with open(tmp.name, "a") as f:
        for package in required_pkgs:
            f.write(package + "\n")

    os.system(f"pip-compile {tmp.name} --output-file={path}vetiver_requirements.txt")
    os.remove(tmp.name)  # Need to delete file after compiling completes

Add support for Model Card reporting

Once a VetiverModel is created, we would like to nudge users to create a model card for reporting relevant information about the model. We would like some of this to be automatically populated, possibly by using a Quarto template.

Considerations:

  • how can users create this model card?

ideas: maybe users create this through a CLI (which feels clunky), or have a function that automatically generates the template in the user's directory, where they can edit it at their discretion

Original paper: https://arxiv.org/pdf/1810.03993.pdf
VetiveR model card: https://github.com/tidymodels/vetiver/blob/main/vignettes/model-card.Rmd

model monitoring

issue for tracking purposes 🥳

main monitoring functions (usage sketch after the list)

  • vetiver.compute_metrics
  • vetiver.pin_metrics
  • vetiver.plot_metrics
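A hedged sketch of how these could fit together; the parameter names mirror vetiver-r's monitoring API and are assumptions, not a confirmed Python signature:

import datetime
import vetiver
from sklearn import metrics

# df: a DataFrame with a date column, observed values, and predictions;
# all parameter names below are assumed from the R API
metric_df = vetiver.compute_metrics(
    data=df,
    date_var="date",
    period=datetime.timedelta(weeks=1),
    metric_set=[metrics.mean_absolute_error, metrics.r2_score],
    truth="actual",
    estimate="pred",
)
vetiver.plot_metrics(metric_df)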

Update to PyPI `pins`

Is your feature request related to a problem? Please describe.
With a stable release of pins on PyPI, the dependency can be updated in the setup.cfg file, and other small changes as needed.

Error importing vetiver.handlers when torch not installed

Describe the bug

Because vetiver's TorchHandler class tries to set base_class = torch.nn.Module when it is defined, it cannot be imported without torch. Since it is imported in vetiver.handlers, not having torch causes an error on import.

To Reproduce

From the vetiver repo:

python -m venv env
source env/bin/activate

pip install .

From Python:

import vetiver.handlers

output:

>>> import vetiver.handlers
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/machow/repos/vetiver-python/vetiver/__init__.py", line 9, in <module>
    from .vetiver_model import VetiverModel
  File "/Users/machow/repos/vetiver-python/vetiver/vetiver_model.py", line 3, in <module>
    from vetiver.handlers._interface import create_handler
  File "/Users/machow/repos/vetiver-python/vetiver/handlers/_interface.py", line 1, in <module>
    from vetiver.handlers import torch, sklearn, base
  File "/Users/machow/repos/vetiver-python/vetiver/handlers/torch.py", line 13, in <module>
    class TorchHandler(VetiverHandler):
  File "/Users/machow/repos/vetiver-python/vetiver/handlers/torch.py", line 22, in TorchHandler
    base_class = torch.nn.Module
NameError: name 'torch' is not defined
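One hedged fix sketch, following the module layout in the traceback: guard the import so that defining TorchHandler no longer requires torch at import time.

# sketch for vetiver/handlers/torch.py;
# VetiverHandler is the base class shown in the traceback
try:
    import torch
except ImportError:
    torch = None

class TorchHandler(VetiverHandler):
    # only touch torch.nn.Module when torch is actually present
    base_class = torch.nn.Module if torch is not None else None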

Wrong vetiver module in function generated file

Describe the bug
When using the vetiver.vetiver_write_app function, the auto-generated .py file calls vetiver.vetiver_pin_read, a function that is not available.

To Reproduce
Steps to reproduce the behavior:
Run vetiver.vetiver_write_app function to generate the file

vetiver_write_app(
        board,
        "username/modelname",
        file="superbowlads.py"
)

Expected behavior
The correct function is vetiver.pin_read_write.vetiver_pin_read.

Desktop (please complete the following information):

  • OS: Ubuntu
  • Browser: Chrome
  • Version: 0.1.3

CC: @xuf12

`VetiverAPI(model, check_ptype = False)` has QUERY parameters

Is your feature request related to a problem? Please describe.
When users use VetiverAPI(model, check_ptype = False), the data posted to the /predict/ endpoint is ingested into VetiverAPI() as a QUERY parameter. When VetiverAPI(model, check_ptype = True) is used, the data is ingested as the body of the request.

Describe the solution you'd like
Data should be POSTed to the body of the request, regardless of whether the data prototype is checked.
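A sketch of the desired behavior in plain FastAPI (not vetiver's actual route code): reading the request body directly means the payload never becomes query parameters, even without a known prototype.

from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/predict")
async def predict(request: Request):
    # the posted payload is read from the request body,
    # regardless of whether a prototype is available to type it
    payload = await request.json()
    return {"n_rows": len(payload)}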

arrow support in vetiver

Trying to figure out how to support arrow in vetiver. What would this look like? Where does it make sense to implement?

Cleanup files created by the testing

Some tests create directory/files at the root of the repository. These should be cleaned up when the tests end or should be created in the temporary directory.

Identified files and directories:

  • model/*: model directory is probably created by test_pin_write.py

Add credentials to predict

Currently, if people use vetiver.server.predict() to POST data to a model in Connect that is not open to the public, it will give a JSONDecodeError: [Errno Extra data] 404 page not found. We should use the information users provide when generating an RSConnectServer to add credentials so private APIs can be accessed (add an optional parameter to pass in RSConnectServer)
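A sketch of a request with Connect credentials attached; the endpoint URL and API key are hypothetical, and Connect accepts an Authorization: Key header:

import requests

response = requests.post(
    "https://connect.example.com/content/my-model/predict",  # hypothetical
    json=[{"x": 1}],
    headers={"Authorization": "Key MY_API_KEY"},  # hypothetical key
)
response.raise_for_status()  # surface a 403/404 instead of a JSONDecodeError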

Pins board function argument missing

Describe the bug
The .py file generated by the function does not have the allow_pickle_read=True option set, which is needed to read models stored as pins.

To Reproduce
Run vetiver.vetiver_write_app function to generate the file

vetiver_write_app(
        board,
        "username/modelname",
        file="superbowlads.py"
)

Expected behavior
allow_pickle_read=True should be set by default

Desktop (please complete the following information):

  • OS: Ubuntu
  • Browser: Chrome
  • Version: 0.1.3

CC: @xuf12

Error when predicting with a dataframe using an API deployed version 0.1.6

Describe the bug
I get this error when predicting with both 0.1.5 and 0.1.6 versions of vetiver using an endpoint deployed with version 0.1.6 of vetiver from PyPI.

Traceback (most recent call last):
  File "/usr/home/xu.fei/bike_predict_python/app/.venv/lib/python3.9/site-packages/requests/models.py", line 971, in json
    return complexjson.loads(self.text, **kwargs)
  File "/opt/python/3.9.6/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/opt/python/3.9.6/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/python/3.9.6/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/home/xu.fei/bike_predict_python/app/.venv/lib/python3.9/site-packages/shiny/session/_session.py", line 977, in output_obs
    message[output_name] = fn()
  File "/usr/home/xu.fei/bike_predict_python/app/.venv/lib/python3.9/site-packages/shiny/render/_render.py", line 205, in __call__
    return _utils.run_coro_sync(self._run())
  File "/usr/home/xu.fei/bike_predict_python/app/.venv/lib/python3.9/site-packages/shiny/_utils.py", line 178, in run_coro_sync
    coro.send(None)
  File "/usr/home/xu.fei/bike_predict_python/app/.venv/lib/python3.9/site-packages/shiny/render/_render.py", line 219, in _run
    x = await self._fn()
  File "/usr/home/xu.fei/bike_predict_python/app/.venv/lib/python3.9/site-packages/shiny/_utils.py", line 132, in fn_async
    return fn()
  File "/usr/home/xu.fei/bike_predict_python/app/app.py", line 202, in plot
    df_to_plot_id["pred"] = predict(endpoint, df_to_pred)
  File "/usr/home/xu.fei/bike_predict_python/app/.venv/lib/python3.9/site-packages/vetiver/server.py", line 227, in predict
    response_df = pd.DataFrame.from_dict(response.json())
  File "/usr/home/xu.fei/bike_predict_python/app/.venv/lib/python3.9/site-packages/requests/models.py", line 975, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

To Reproduce
Steps to reproduce the behavior:

import pandas as pd
from vetiver.server import predict, vetiver_endpoint

data = {
    "id": ["1", "2"],
    "hour": [1, 10],
    "month": [7, 8],
    "Friday": [0, 0],
    "Monday": [0, 0],
    "Saturday": [1, 0],
    "Sunday": [0, 0],
    "Thursday": [0, 0],
    "Tuesday": [0, 0],
    "Wednesday": [0, 1],
}
df_to_test = pd.DataFrame.from_dict(data)

# works with API deployed with 0.1.5
endpoint_150 = vetiver_endpoint(
    "https://colorado.rstudio.com/rsc/new-bikeshare-model/predict/"
)
predict(endpoint_150, df_to_test)

# doesn't work with API deployed with 0.1.6
endpoint_160 = vetiver_endpoint(
    "https://colorado.rstudio.com/rsc/bike-predict-python-api/predict/"
)
predict(endpoint_160, df_to_test)

Expected behavior
I expect endpoint_160 to work the same way as endpoint_150.


@gsingh91
