DataRobot User Models

What is this repository?

This repository contains tools, templates, and information for assembling, debugging, testing, and running your custom inference models, custom tasks and custom notebook environments with DataRobot.

The ./task_templates and ./model_templates folders provide reference examples to help users learn how to create custom tasks and/or custom inference models. The templates there are simple, well documented, and can be used as tutorials. These templates should also remain up to date with any API or other changes.

For further examples, provided as-is and often containing more complex logic, see the community examples repo at: https://github.com/datarobot-community/custom-models. Note that these examples may not stay up to date with the latest API or best practices.

The ./public_dropin_notebook_environments folder contains template examples (sample Dockerfile and context) showing how to create custom images to use as environments for DataRobot Notebooks.

For further documentation on these and all other features, see https://docs.datarobot.com/

Terminology

DataRobot has two mechanisms for bringing in custom ML code:

  1. Custom task: an ML algorithm, for example, XGBoost or one-hot encoding, that can be used as a step in an ML pipeline (blueprint) inside DataRobot.

  2. Custom inference model: a pre-trained model or user code prepared for inference. An inference model can have a predefined input/output schema or be unstructured. Learn more here
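
To make the distinction concrete, here is a minimal sketch in plain Python. The names `fit`, `score`, and `MeanRegressor` are illustrative assumptions and are not taken from this README; the actual interfaces are shown in the templates under ./task_templates and ./model_templates.

```python
"""Illustrative only: contrasts a trainable task with a pre-trained inference model.

All names below are hypothetical and are not the DRUM API.
"""
from dataclasses import dataclass
from typing import List


@dataclass
class MeanRegressor:
    """Toy estimator that always predicts the mean of the training target."""

    mean_: float = 0.0

    def predict(self, rows: List[List[float]]) -> List[float]:
        return [self.mean_ for _ in rows]


def fit(X: List[List[float]], y: List[float]) -> MeanRegressor:
    """Custom-task style: learn from training data as one step of a pipeline."""
    return MeanRegressor(mean_=sum(y) / len(y))


def score(rows: List[List[float]], model: MeanRegressor) -> List[float]:
    """Custom-inference-model style: predict with an already trained model."""
    return model.predict(rows)


# A task trains; an inference model only consumes the trained result.
trained = fit([[1.0], [2.0], [3.0]], [10.0, 20.0, 30.0])
predictions = score([[4.0], [5.0]], trained)
print(predictions)
```

In this sketch the training step and the prediction step are deliberately separate functions, mirroring the split between a custom task (which is fitted inside a blueprint) and a custom inference model (which arrives pre-trained).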

Content

  1. Custom Tasks Reference
  2. Custom Inference Models Reference
  3. Contribution & development
  4. Communication

Custom Tasks Reference

Materials for getting started:

Other resources:

  • There is a chance that the task you are looking for has already been implemented. Check the custom tasks community GitHub for off-the-shelf examples.
    • Note: The community repo above is NOT the place to start learning the basic concepts. The examples tend to have more complex logic and are meant to be used as-is rather than as a reference.
    • This repo is the appropriate place to start with tutorial examples.

Custom Inference Models Reference

Materials for getting started:

Other sources:

Contribution & development

Prerequisites for development

Note: Only reference this section if you plan to work with DRUM.

To build it, the following packages are required: make, Java 11, maven, docker, R.

For example, on Ubuntu 18.04:

apt-get install build-essential openjdk-11-jdk openjdk-11-jre maven python3-dev docker apt-utils curl gpg-agent software-properties-common dirmngr libssl-dev ca-certificates locales libcurl4-openssl-dev libxml2-dev libgomp1 gcc libc6-dev pandoc
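
After installing the packages, a quick check that the build tools are on PATH can save time. The following is a minimal sketch, not part of this repository: the executable names (`mvn` for maven, `R` for R) are assumptions about how the tools are usually exposed, and the check does not verify versions such as Java 11.

```python
import shutil
from typing import Iterable, List

# Assumed executable names for the prerequisites listed above.
REQUIRED_TOOLS = ["make", "java", "mvn", "docker", "R"]


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```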

R

Ubuntu 18.04

apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/'
apt-get install r-cran-littler r-base r-base-dev

R packages

Rscript -e "install.packages(c('devtools', 'tidyverse', 'caret', 'recipes', 'glmnet', 'plumber', 'Rook', 'rjson', 'e1071'), Ncpus=4)"
Rscript -e 'library(caret); install.packages(unique(modelLookup()[modelLookup()$forReg, c(1)]), Ncpus=4)'
Rscript -e 'library(caret); install.packages(unique(modelLookup()[modelLookup()$forClass, c(1)]), Ncpus=4)'

DRUM developers

Setting Up Local Env For Testing

  1. Create a Python 3.7 or 3.8 venv
  2. pip install -r requirements_dev.txt
  3. pip install -e custom_model_runner/
  4. pytest to your heart's content
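
The steps above can also be scripted. The sketch below is one possible way to do it, under stated assumptions: it is run from the repository root, the interpreter running the script is already Python 3.7 or 3.8 (this is not enforced), and the venv uses the POSIX layout (`bin/python`). The function names and the `.venv` directory are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def setup_commands(venv_dir: str = ".venv") -> List[List[str]]:
    """Build the commands for the four setup steps, in order."""
    venv = Path(venv_dir)
    # POSIX layout assumed; on Windows the interpreter lives under Scripts/.
    python = str(venv / "bin" / "python")
    return [
        [sys.executable, "-m", "venv", str(venv)],
        [python, "-m", "pip", "install", "-r", "requirements_dev.txt"],
        [python, "-m", "pip", "install", "-e", "custom_model_runner/"],
        [python, "-m", "pytest"],
    ]


def run_setup(venv_dir: str = ".venv") -> None:
    """Run each step and stop at the first failure."""
    for command in setup_commands(venv_dir):
        subprocess.run(command, check=True)
```

Calling `run_setup()` executes the steps; `setup_commands()` alone only returns them, which is useful for a dry run.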

DataRobot Confluence

To get more information, search for custom models and datarobot user models in DataRobot Confluence.

Committing into the repo

  1. Ask repository admin for write access.
  2. Develop your contribution in a separate branch, run the tests, and push the branch to the repository.
  3. Create a pull request.

Testing changes to drum in DR app

The create-drum-dev-image.sh script builds and saves an image containing your latest local changes to the DRUM codebase. To test new DRUM changes in the DR app, run this script with an argument specifying which drop-in environment to modify, then upload the resulting image as an execution environment.

Non-DataRobot developers

To contribute to the project, use a regular GitHub process: fork the repo and create a pull request to the original repository.

Tests

Test artifacts

Artifacts used in tests are located in ./tests/fixtures/drop_in_model_artifacts.
There is also code to generate those artifacts (*.ipynb, Pytorch.py, Rmodel.R, and similar files).
Check for generate* scripts in ./tests/fixtures/drop_in_model_artifacts and ./tests/fixtures/artifacts.py

Model examples in ./model_templates are also used in functional testing. In most cases, the artifacts for those models are the same as those in ./tests/fixtures/drop_in_model_artifacts and can simply be copied over. If the artifact for a model template is not in ./tests/fixtures/drop_in_model_artifacts, check the template's README for further instructions.
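
Copying an artifact into a template folder can be done by hand or with a small helper. The following is a minimal sketch; the function name `copy_artifacts` and the glob-pattern approach are assumptions for illustration, and the actual artifact file names depend on the template.

```python
import shutil
from pathlib import Path
from typing import List


def copy_artifacts(fixtures_dir: str, template_dir: str, pattern: str) -> List[Path]:
    """Copy files matching `pattern` from the fixtures folder into a template folder."""
    destination = Path(template_dir)
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in sorted(Path(fixtures_dir).glob(pattern)):
        if source.is_file():
            # copy2 preserves file metadata such as modification time.
            copied.append(Path(shutil.copy2(source, destination / source.name)))
    return copied


# Hypothetical usage; the pattern and target folder are placeholders:
# copy_artifacts("./tests/fixtures/drop_in_model_artifacts", "<template folder>", "<pattern>")
```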

Communication

Some places to ask for help are:
