Introduction to Yggdrasil Decision Forests

Yggdrasil Decision Forests (YDF) is a collection of state-of-the-art algorithms for training, serving and interpreting Decision Forest models. The library is written in C++ and is available through a C++ API, a CLI (command-line interface, i.e. shell commands) and TensorFlow, under the name TensorFlow Decision Forests (TF-DF).

Developing models in TF-DF and productionizing them (possibly including re-training) in C++ with YDF combines flexible, fast development with efficient and safe serving.

Usage example

Train, evaluate and benchmark the speed of a model in a few shell lines with the CLI:

# Training configuration
echo 'label:"my_label" learner:"RANDOM_FOREST" ' > config.pbtxt
# Scan the dataset
infer_dataspec --dataset="csv:train.csv" --output="spec.pbtxt"
# Train a model
train --dataset="csv:train.csv" --dataspec="spec.pbtxt" --config="config.pbtxt" --output="my_model"
# Evaluate the model
evaluate --dataset="csv:test.csv" --model="my_model" > evaluation.txt
# Benchmark the speed of the model
benchmark_inference --dataset="csv:test.csv" --model="my_model" > benchmark.txt

(see examples/beginner.sh for more details)
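The same steps can also be driven from a script. The following is a minimal Python sketch that only assembles the command lines shown above; `build_pipeline` and `run_pipeline` are hypothetical helpers, not part of YDF, and `run_pipeline` assumes the YDF CLI binaries are on the PATH.

```python
import shlex
import subprocess

# Same training configuration as the one written to config.pbtxt above.
CONFIG_TEXT = 'label:"my_label" learner:"RANDOM_FOREST" '


def build_pipeline(train="csv:train.csv", test="csv:test.csv", model="my_model"):
    """Return (argv, stdout_file) pairs mirroring the shell example (hypothetical helper)."""
    return [
        (["infer_dataspec", f"--dataset={train}", "--output=spec.pbtxt"], None),
        (["train", f"--dataset={train}", "--dataspec=spec.pbtxt",
          "--config=config.pbtxt", f"--output={model}"], None),
        (["evaluate", f"--dataset={test}", f"--model={model}"], "evaluation.txt"),
        (["benchmark_inference", f"--dataset={test}", f"--model={model}"],
         "benchmark.txt"),
    ]


def run_pipeline(steps):
    """Write the config, then run each step; assumes the CLI binaries are on the PATH."""
    with open("config.pbtxt", "w") as f:
        f.write(CONFIG_TEXT)
    for argv, out_path in steps:
        if out_path is None:
            subprocess.run(argv, check=True)
        else:
            # Redirect stdout to a file, like `> evaluation.txt` in the shell.
            with open(out_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv, out_path in build_pipeline():
        line = shlex.join(argv)
        print(line if out_path is None else f"{line} > {out_path}")
```

Keeping command construction separate from execution makes it possible to inspect or log the exact command lines before anything runs.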

or use the C++ interface:

// Typed dataset path: "csv:" is the format prefix, "@10" denotes 10 shards.
auto dataset_path = "csv:/train@10";
// Training configuration
TrainingConfig train_config;
train_config.set_learner("RANDOM_FOREST");
train_config.set_task(Task::CLASSIFICATION);
train_config.set_label("my_label");
// Scan the dataset
DataSpecification spec;
CreateDataSpec(dataset_path, false, {}, &spec);
// Train a model
std::unique_ptr<AbstractLearner> learner;
GetLearner(train_config, &learner);
auto model = learner->Train(dataset_path, spec);
// Export the model
SaveModel("my_model", model.get());

(see examples/beginner.cc for more details)

or use the Keras/Python interface of TensorFlow Decision Forests:

import tensorflow_decision_forests as tfdf
import pandas as pd
# Load the dataset in a Pandas dataframe.
train_df = pd.read_csv("project/train.csv")
# Convert the dataset into a TensorFlow dataset.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="my_label")
# Train the model
model = tfdf.keras.RandomForestModel()
model.fit(train_ds)
# Export a SavedModel.
model.save("project/model")

(see TensorFlow Decision Forests for more details)

Documentation & Resources

The following resources are available:

Installation from pre-compiled binaries

Download one of the build releases, and then run examples/beginner.{sh,bat}.

Installation from Source

Install Bazel and run:

git clone https://github.com/google/yggdrasil-decision-forests.git
cd yggdrasil-decision-forests
bazel build //yggdrasil_decision_forests/cli:all --config=linux_cpp17 --config=linux_avx2

# Then, run the example:
examples/beginner.sh

See the installation page for more details, troubleshooting and alternative installation solutions.

Long-term support commitments

Inference and serving

  • The serving code is isolated from the rest of the framework (i.e., training, evaluation) and has minimal dependencies.
  • Changes to serving-related code are guaranteed to be backward compatible.
  • Model inference is deterministic: the same example is guaranteed to yield the same prediction.
  • Learners and models are extensively tested, including integration testing on real datasets. No execution path in the serving code crashes as a result of an error; instead, in case of failure (e.g., a malformed input example), the inference code returns a util::Status.
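The last guarantee, returning a status rather than crashing, can be illustrated with a small Python sketch. This is not YDF code: `Status`, `Prediction` and `predict` are hypothetical stand-ins, and the "model" is simply the mean of the input features.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Status:
    """Hypothetical stand-in for a util::Status-style result (not the YDF class)."""
    ok: bool
    message: str = ""


@dataclass(frozen=True)
class Prediction:
    status: Status
    value: Optional[float] = None  # Only set when status.ok is True.


def predict(example: dict, feature_names: Sequence[str]) -> Prediction:
    """Toy inference (mean of the features) that reports errors instead of raising."""
    if not feature_names:
        return Prediction(Status(False, "no features requested"))
    missing = [name for name in feature_names if name not in example]
    if missing:
        return Prediction(Status(False, f"missing features: {missing}"))
    try:
        values = [float(example[name]) for name in feature_names]
    except (TypeError, ValueError) as err:
        return Prediction(Status(False, f"malformed feature value: {err}"))
    return Prediction(Status(True), sum(values) / len(values))


good = predict({"a": 1.0, "b": 3.0}, ["a", "b"])
bad = predict({"a": 1.0}, ["a", "b"])  # Malformed example: feature "b" is missing.
print(good.status.ok, good.value)
print(bad.status.ok, bad.status.message)
```

The caller decides what to do with a failed status (log, skip the example, return an error to the client), and repeated calls on the same example yield the same prediction.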

Training

  • The semantics of hyper-parameters are never modified.
  • The default value of hyper-parameters is never modified.
  • The default value of a newly-introduced hyper-parameter is set in such a way that the hyper-parameter is effectively disabled.
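As an illustration of the third commitment, here is a minimal Python sketch of a configuration that gains a new option whose default leaves behaviour unchanged. All names (`LearnerConfig`, `max_depth`, `node_cap`, `allowed_nodes`) and values are hypothetical and chosen for the example only; they are not YDF hyper-parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearnerConfig:
    """Hypothetical configuration; field names are illustrative, not YDF's."""
    max_depth: int = 16  # Existing hyper-parameter: its default stays fixed.
    # Newly introduced hyper-parameter. Its default (-1) means "no cap", so
    # configurations written before it existed behave exactly as before.
    node_cap: int = -1


def allowed_nodes(requested: int, config: LearnerConfig) -> int:
    """Number of nodes a tree may grow under the given config."""
    if config.node_cap < 0:  # Disabled: same result as before the option existed.
        return requested
    return min(requested, config.node_cap)


old_style = LearnerConfig()             # Written without knowing about node_cap.
opted_in = LearnerConfig(node_cap=100)  # Explicitly enables the new option.
print(allowed_nodes(500, old_style), allowed_nodes(500, opted_in))
```

With this convention, upgrading the library does not silently change models trained from existing configurations; users opt in to new behaviour explicitly.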

Quality Assurance

The following mechanisms will be put in place to ensure the quality of the library:

  • Peer-reviewing.
  • Unit testing.
  • Training benchmarks with ranges of acceptable evaluation metrics.
  • Sanitizers.
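A training benchmark with a range of acceptable evaluation metrics could look like the following Python sketch. It is an assumption about the general shape of such a check, not YDF's test code; the helper names, the toy labels and the bounds are made up for illustration.

```python
def accuracy(labels, predictions):
    """Fraction of predictions equal to the corresponding label."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and equally long")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def check_metric_in_range(name, value, low, high):
    """Return the value, or raise AssertionError if it leaves [low, high]."""
    if not (low <= value <= high):
        raise AssertionError(
            f"{name}={value:.4f} is outside the accepted range [{low}, {high}]")
    return value


# Toy data standing in for a held-out test set and a trained model's output.
labels = ["yes", "no", "yes", "yes"]
predictions = ["yes", "no", "no", "yes"]

acc = accuracy(labels, predictions)
check_metric_in_range("accuracy", acc, low=0.70, high=0.80)
print(f"accuracy={acc:.2f} is within the accepted range")
```

Checking an upper bound as well as a lower one can flag suspicious improvements (for example, label leakage) in addition to regressions.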

Contributing

Contributions to TensorFlow Decision Forests and Yggdrasil Decision Forests are welcome. If you want to contribute, make sure to review the user manual, developer manual and contribution guidelines.

Credits

TensorFlow Decision Forests was developed by:

  • Mathieu Guillame-Bert (gbm AT google DOT com)
  • Jan Pfeifer (janpf AT google DOT com)
  • Sebastian Bruch (sebastian AT bruch DOT io)
  • Arvind Srinivasan (arvnd AT google DOT com)

License

Apache License 2.0
