
leaves


Introduction

leaves is a library implementing prediction code for GBRT (Gradient Boosting Regression Trees) models in pure Go. The goal of the project is to make it possible to use models from popular GBRT frameworks in Go programs without C API bindings.

NOTE: Before the 1.0.0 release the API is subject to change.

Features

  • General Features:
    • support parallel predictions for batches
    • support sigmoid, softmax transformation functions
    • support getting leaf indices of decision trees
  • Support LightGBM (repo) models:
    • read models from text format and from JSON format
    • support gbdt, rf (random forest) and dart models
    • support multiclass predictions
    • additional optimizations for categorical features (for example, the one-hot decision rule)
    • additional optimizations exploiting prediction-only usage
  • Support XGBoost (repo) models:
    • read models from binary format
    • support gbtree, gblinear, dart models
    • support multiclass predictions
    • support missing values (nan)
  • Support scikit-learn (repo) tree models (experimental support):
    • read models from pickle format (protocol 0)
    • support sklearn.ensemble.GradientBoostingClassifier

Usage examples

In order to start, go get this repository:

go get github.com/dmitryikh/leaves

Minimal example:

package main

import (
	"fmt"

	"github.com/dmitryikh/leaves"
)

func main() {
	// 1. Read model
	useTransformation := true
	model, err := leaves.LGEnsembleFromFile("lightgbm_model.txt", useTransformation)
	if err != nil {
		panic(err)
	}

	// 2. Do predictions!
	fvals := []float64{1.0, 2.0, 3.0}
	p := model.PredictSingle(fvals, 0)
	fmt.Printf("Prediction for %v: %f\n", fvals, p)
}

In order to use an XGBoost model, just change leaves.LGEnsembleFromFile to leaves.XGEnsembleFromFile.
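For illustration, here is a sketch of the same minimal program for an XGBoost binary model (the file name xgboost_model.bin is a placeholder; the second argument again controls whether the transformation function is loaded):

package main

import (
	"fmt"

	"github.com/dmitryikh/leaves"
)

func main() {
	// 1. Read an XGBoost model from its binary format
	model, err := leaves.XGEnsembleFromFile("xgboost_model.bin", true)
	if err != nil {
		panic(err)
	}

	// 2. Do predictions!
	fvals := []float64{1.0, 2.0, 3.0}
	p := model.PredictSingle(fvals, 0)
	fmt.Printf("Prediction for %v: %f\n", fvals, p)
}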

Documentation

Documentation is hosted on godoc (link). It contains more complex usage examples and a full API reference. Some additional usage examples can be found in leaves_test.go.

Compatibility

Most leaves features are tested for compatibility with old and upcoming versions of GBRT libraries. In compatibility.md one can find a detailed report on leaves correctness against different versions of external GBRT libraries.

Some additional information on new features and backward compatibility can be found in NOTES.md.

Benchmark

Below are comparisons of prediction speed on batches (~1000 objects per API call). Hardware: MacBook Pro (15-inch, 2017), 2.9 GHz Intel Core i7, 16 GB 2133 MHz LPDDR3. The C API implementations were called from Python bindings, but the large batch size should make the overhead of the Python bindings negligible. leaves benchmarks were run by means of the Go test framework: go test -bench. See benchmark for more details on measurements. See testdata/README.md for data preparation pipelines.
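For example, a standard invocation from the repository root looks like this (exact benchmark names may differ):

go test -bench=.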

Single thread:

Test Case            | Features | Trees | Batch size | C API | leaves
LightGBM MS LTR      | 137      | 500   | 1000       | 49ms  | 51ms
LightGBM Higgs       | 28       | 500   | 1000       | 50ms  | 50ms
LightGBM KDD Cup 99* | 41       | 1200  | 1000       | 70ms  | 85ms
XGBoost Higgs        | 28       | 500   | 1000       | 44ms  | 50ms

4 threads:

Test Case            | Features | Trees | Batch size | C API | leaves
LightGBM MS LTR      | 137      | 500   | 1000       | 14ms  | 14ms
LightGBM Higgs       | 28       | 500   | 1000       | 14ms  | 14ms
LightGBM KDD Cup 99* | 41       | 1200  | 1000       | 19ms  | 24ms
XGBoost Higgs        | 28       | 500   | 1000       | ?     | 14ms

(?) - currently I'm unable to utilize multithreading for XGBoost predictions by means of the Python bindings

(*) - KDD Cup 99 problem involves continuous and categorical features simultaneously

Limitations

  • LightGBM models:
    • limited support of transformation functions (only sigmoid and softmax are supported)
  • XGBoost models:
    • limited support of transformation functions (only sigmoid and softmax are supported)
    • there may be slight divergence between C API predictions and leaves because of floating point conversions and comparison tolerances
  • scikit-learn tree models:
    • no support for transformation functions; output scores are raw scores (as from GradientBoostingClassifier.decision_function); see the logistic-transform sketch after this list
    • only pickle protocol 0 is supported
    • there may be slight divergence between sklearn predictions and leaves because of floating point conversions and comparison tolerances
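Where raw scores are returned, client code can apply the transformation itself. A minimal sketch of the logistic (sigmoid) transform in Go:

import "math"

// logistic converts a raw GBRT score into a probability in (0, 1)
func logistic(raw float64) float64 {
	return 1.0 / (1.0 + math.Exp(-raw))
}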

Contacts

If you are interested in the project or have questions, please contact me by email: khdmitryi at gmail.com

leaves's People

Contributors

arnwas, dmitryikh, erjanmx, fredrikluo, imscientist, mottl


leaves's Issues

The prediction is wrong when using XGEnsembleFromFile to load model

I'm using xgbModel.nativeBooster.saveModel on Spark to save the native model, then loading the model with XGEnsembleFromFile to predict on the validation dataset, but the results do not match the same prediction done on Spark. Here are the results predicted with the leaves framework:

label: 1, pred: 0.836042
label: 1, pred: 0.836042
label: 1, pred: 0.797784
label: 1, pred: 0.934794
label: 1, pred: 0.793824
label: 1, pred: 0.797579
label: 1, pred: 0.959390
label: 1, pred: 0.959390
label: 1, pred: 0.959390
label: 1, pred: 0.704733
label: 1, pred: 0.787566
label: 1, pred: 0.941911
label: 1, pred: 0.934794
label: 1, pred: 0.749724
label: 1, pred: 0.929430
label: 1, pred: 0.931993
label: 1, pred: 0.797579
label: 1, pred: 0.839686
label: 1, pred: 0.759537
label: 1, pred: 0.813373
label: 1, pred: 0.760041
label: 1, pred: 0.793824
label: 1, pred: 0.934794
label: 1, pred: 0.759537
label: 1, pred: 0.929430
label: 1, pred: 0.945538
label: 1, pred: 0.785153
label: 1, pred: 0.959390
label: 1, pred: 0.793824
label: 1, pred: 0.779831
label: 1, pred: 0.959390
label: 1, pred: 0.749724
label: 1, pred: 0.941911
label: 1, pred: 0.798052
label: 1, pred: 0.749724
label: 1, pred: 0.931993
label: 1, pred: 0.749724
label: 1, pred: 0.929430
label: 1, pred: 0.839686
label: 1, pred: 0.839686
label: 1, pred: 0.806166
label: 1, pred: 0.934794
label: 1, pred: 0.839686
label: 1, pred: 0.785153
label: 1, pred: 0.806166
label: 1, pred: 0.945538
label: 1, pred: 0.803833
label: 1, pred: 0.759537
label: 1, pred: 0.806166
label: 1, pred: 0.768660
label: 1, pred: 0.797784
label: 1, pred: 0.931993
label: 1, pred: 0.749724
label: 1, pred: 0.824530
label: 1, pred: 0.959390
label: 1, pred: 0.959390
label: 1, pred: 0.806893
label: 1, pred: 0.929430
label: 1, pred: 0.803833
label: 1, pred: 0.797148
label: 1, pred: 0.931993
label: 1, pred: 0.797579
label: 1, pred: 0.787042
label: 1, pred: 0.803833
label: 1, pred: 0.959390
label: 1, pred: 0.931993
label: 1, pred: 0.806166
label: 1, pred: 0.836042
label: 1, pred: 0.934794
label: 1, pred: 0.934794
label: 1, pred: 0.803833
label: 1, pred: 0.749724
label: 1, pred: 0.931993
label: 1, pred: 0.759537
label: 1, pred: 0.779831
label: 1, pred: 0.787042
label: 1, pred: 0.785153
label: 1, pred: 0.749724
label: 1, pred: 0.749724
label: 1, pred: 0.934794
label: 1, pred: 0.929430
label: 1, pred: 0.797579
label: 1, pred: 0.945538
label: 1, pred: 0.934794
label: 1, pred: 0.959390
label: 1, pred: 0.959390
label: 1, pred: 0.787042
label: 1, pred: 0.787042
label: 1, pred: 0.931993
label: 1, pred: 0.759537
label: 1, pred: 0.941911
label: 1, pred: 0.749724
label: 1, pred: 0.850764
label: 1, pred: 0.945538
label: 1, pred: 0.803833
label: 1, pred: 0.749724
label: 1, pred: 0.797579
label: 1, pred: 0.785153
label: 1, pred: 0.941911
label: 1, pred: 0.806166
label: 0, pred: 0.767173
label: 0, pred: 0.807510
label: 0, pred: 0.797784
label: 0, pred: 0.824530
label: 0, pred: 0.839686
label: 0, pred: 0.767173
label: 0, pred: 0.839686
label: 0, pred: 0.767176
label: 0, pred: 0.797579
label: 0, pred: 0.793824
label: 0, pred: 0.772110
label: 0, pred: 0.768660
label: 0, pred: 0.759537
label: 0, pred: 0.839686
label: 0, pred: 0.759537
label: 0, pred: 0.929430
label: 0, pred: 0.941911
label: 0, pred: 0.822525
label: 0, pred: 0.839686
label: 0, pred: 0.945538
label: 0, pred: 0.749724
label: 0, pred: 0.929430
label: 0, pred: 0.787042
label: 0, pred: 0.797579
label: 0, pred: 0.797784
label: 0, pred: 0.797784
label: 0, pred: 0.945538
label: 0, pred: 0.785153
label: 0, pred: 0.797784
label: 0, pred: 0.836042
label: 0, pred: 0.931993
label: 0, pred: 0.836042
label: 0, pred: 0.779831
label: 0, pred: 0.945538
label: 0, pred: 0.812733
label: 0, pred: 0.945538
label: 0, pred: 0.745542
label: 0, pred: 0.779849
label: 0, pred: 0.903047
label: 0, pred: 0.816076
label: 0, pred: 0.807510
label: 0, pred: 0.749971
label: 0, pred: 0.945538
label: 0, pred: 0.804371
label: 0, pred: 0.767173
label: 0, pred: 0.934794
label: 0, pred: 0.785153
label: 0, pred: 0.767173
label: 0, pred: 0.797784
label: 0, pred: 0.785153
label: 0, pred: 0.807510
label: 0, pred: 0.768660
label: 0, pred: 0.804371
label: 0, pred: 0.787042
label: 0, pred: 0.704733
label: 0, pred: 0.813373
label: 0, pred: 0.749724
label: 0, pred: 0.836042
label: 0, pred: 0.772110
label: 0, pred: 0.855798
label: 0, pred: 0.836042
label: 0, pred: 0.784896
label: 0, pred: 0.804371
label: 0, pred: 0.813373
label: 0, pred: 0.749724
label: 0, pred: 0.903047
label: 0, pred: 0.787042
label: 0, pred: 0.839686
label: 0, pred: 0.759537
label: 0, pred: 0.797579
label: 0, pred: 0.803833
label: 0, pred: 0.793824
label: 0, pred: 0.749724
label: 0, pred: 0.806166
label: 0, pred: 0.793824
label: 0, pred: 0.793824
label: 0, pred: 0.787042
label: 0, pred: 0.806166
label: 0, pred: 0.903047
label: 0, pred: 0.839686
label: 0, pred: 0.768660
label: 0, pred: 0.787042
label: 0, pred: 0.745542
label: 0, pred: 0.787042
label: 0, pred: 0.802708
label: 0, pred: 0.797784
label: 0, pred: 0.839686
label: 0, pred: 0.929430
label: 0, pred: 0.803833
label: 0, pred: 0.704733
label: 0, pred: 0.704733
label: 0, pred: 0.793824
label: 0, pred: 0.793824
label: 0, pred: 0.813373
label: 0, pred: 0.836042
label: 0, pred: 0.767173
label: 0, pred: 0.803833
label: 0, pred: 0.793824
label: 0, pred: 0.818067
label: 0, pred: 0.787566

LG: num_leaves=1 support

A LightGBM tree can look like the following:

Tree=2236
num_leaves=1
num_cat=0
split_feature=
split_gain=
threshold=
decision_type=
left_child=
right_child=
leaf_value=0
leaf_count=
internal_value=
internal_count=
shrinkage=1.03754e-322

But leaves treats num_leaves < 2 as an input error.

TODO:

  • support a tree with only one leaf
  • add test for this case
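A single-leaf tree degenerates to a constant: whatever the input, its contribution is just leaf_value. A minimal sketch of the special case (NLeaves and LeafValue are hypothetical accessors, not the actual leaves internals):

// hypothetical: if the tree has a single leaf, skip traversal
// entirely and return its only (constant) leaf value
if tree.NLeaves() == 1 {
	return tree.LeafValue(0)
}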

Question: support for objective:quantile

I have a model trained with quantile regression in LightGBM. I get an error that this is not a valid option for objective when I use my model. Is there a workaround to get it working?

xgEnsemble prediction results are different from xgboost in python

I train and test data with xgboost in Python, then use leaves in the production environment.
More details below:

In Python
For xgb testing, the data structure that I set up with pd.DataFrame is
[0:value1, 1:v2, 2:v3, ... , n:v(n+1)]
where value1 is an arbitrary int value and v2, ... , v(n+1) are float64 values. Column 0 is the prediction value.
This layout reproduces the testing result.

With this structure instead:
[feature1:v2, f2:v3, ... , f(n):v(n+1)]
the testing result is NOT reproduced.

In Golang
Using leaves XGEnsembleFromFile -> model.PredictCSR(), the testing result is also NOT reproduced.

I have tried for over 5 hours to solve it (for example, adding {0:0} to the first feature group), but I can't figure it out.
What's wrong with my testing data?

obtain the leaf index of gbdt tree

My online prediction service wants to use the GBDT + LR algorithm combination (Practical Lessons from Predicting Clicks on Ads at Facebook), which requires the leaf index of each tree, but leaves doesn't support it.
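For context, here is a sketch of the GBDT + LR encoding this would enable. It assumes leafIndices []int holds the predicted leaf index of each tree (however the library would expose it), and nTrees and leavesPerTree describe the ensemble; all three names are hypothetical:

// one-hot encode the leaf index of every tree; the concatenated
// indicator vector becomes the input features for logistic regression
features := make([]float64, nTrees*leavesPerTree)
for i, leaf := range leafIndices {
	features[i*leavesPerTree+leaf] = 1.0
}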

error for load xgboost:gbtree

I got an error when I tried to load a binary model of xgboost:gbtree; the error message is as follows:

panic: unexpected EOF

goroutine 1 [running]:
main.main()
/Users/zhangxiatian/tuotuo/workspace/go/predictor/main.go:13 +0x1ba

Process finished with exit code 2

======================
the code is as follows:

package main

import (
	"github.com/dmitryikh/leaves"
)

func main() {
	// 1. Read model
	model, err := leaves.XGEnsembleFromFile("/Users/zhangxiatian/tuotuo/recsys/engin/model/model")
	if err != nil {
		panic(err)
	}
	// model is not used further in this reproduction; the blank assignment
	// avoids a "declared but not used" compile error
	_ = model
}

support DART from LightGBM?

The source code of the DART class in LightGBM suggests there is nothing special we need to do in leaves, and this class of models is already supported.

At least we should add a test for this case.

Gonum/mat

Currently, leaves uses a standalone matrix implementation that's almost completely binary compatible with the gonum/mat.Dense implementation.

It would be very beneficial to just use the one from gonum/mat (or, at least, implement the mat.Matrix interface).
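For illustration, a sketch of what implementing gonum's mat.Matrix interface could look like for a row-major matrix (the DenseAdapter type and its fields are hypothetical, not the current leaves internals):

import "gonum.org/v1/gonum/mat"

// DenseAdapter wraps a row-major float64 matrix so it satisfies mat.Matrix
type DenseAdapter struct {
	NRows, NCols int
	Values       []float64 // row-major storage, len = NRows*NCols
}

func (d *DenseAdapter) Dims() (r, c int)    { return d.NRows, d.NCols }
func (d *DenseAdapter) At(i, j int) float64 { return d.Values[i*d.NCols+j] }
func (d *DenseAdapter) T() mat.Matrix       { return mat.Transpose{Matrix: d} }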

common model interface{}

Specifically for test purposes it would be useful to have a common interface for LGEnsemble & XGEnsemble.
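A sketch of what such an interface could contain, using only method names that already appear in the usage examples in this document (the interface itself is hypothetical):

// Model is a possible common interface for LGEnsemble and XGEnsemble
type Model interface {
	PredictSingle(fvals []float64, nEstimators int) float64
	Predict(fvals []float64, nEstimators int, predictions []float64) error
	NOutputGroups() int
}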

Unexpected objective field: 'lambdarank'

leaves.LGEnsembleFromFile() fails when loading an objective=lambdarank model (LightGBM).

Error message: unexpected objective field: 'lambdarank', model:

tree
version=v2
num_class=1
num_tree_per_iteration=1
label_index=0
max_feature_idx=24
objective=lambdarank
feature_names=t quality freshness navboost pctr video_type lctr_1_3 lctr_4_7 lctr_8_30 sctr_1_3 sctr_4_7 sctr_8_30 ctr_1_3 ctr_4_7 ctr_8_30 loglclick_1_3 logclick_1_3 logsclick_1_3 lctr_ins ctr_ins sctr_ins loglclick_ins logclick_ins logsclick_ins instant_navboost
feature_infos=[0:1.3200000524520874] [3.3299998904112726e-05:1] [0.36787900328636169:1] [0.36790001392364502:0.9999966025352478] [0:0.98189848661422729] [1:200] [0:10.87989330291748] [0:9.3969650268554688] [0:11.486390113830566] [0:4.2822332382202148] [0:3.7750816345214844] [0:2.8636219501495361] [0:8.7641057968139648] [0:7.1885638236999512] [0:10.401005744934082] [0:6.4371075630187988] [0:6.4220900535583496] [0:5.9216046333312988] [0:5.5910482406616211] [0:3.3260509967803955] [0:1.3753839731216431] [0:5.3813371658325195] [0:5.3396997451782227] [0:4.9291071891784668] [0.36790001392364502:0.9999929666519165]
tree_sizes=1308 911 993 1073 1235 1154 992 1316 1234 1151 997 1163 1234 1077 1090 1244 1237 1152 1400 1228 1246 1310 1240 1072 1327 1068 1242 1081 1312 1082 1162 1000 1330 1310 1408 1253 1165 1328 1082 1004 1172 1328 1161 1081 1151 1323 1325 1321 1410 1166 1073 1403 996 1242 991 1336 1232 1250 995 1309

Does leaves support go 1.11

After changing the Go version limit in go.mod to 1.11, it passed all unit tests.
Go version on my machine: go1.11.2 darwin/amd64

support transformation functions

Currently leaves outputs predictions as raw scores. Client code should transform them into probabilities (logistic), lambdarank scores, and so on.

Let's introduce this ability to leaves.

Short list for XGBoost:

  • "binary:logistic"
  • "binary:logitraw"
  • "multi:softmax"
  • "multi:softprob"
  • "reg:linear"

Short list for LightGBM:
todo
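For reference, a minimal Go sketch of the softmax transformation implied by the multi:softmax and multi:softprob objectives (stabilized by subtracting the maximum raw score):

import "math"

// softmax converts raw per-class scores into probabilities that sum to 1
func softmax(raw []float64) []float64 {
	max := raw[0]
	for _, v := range raw[1:] {
		if v > max {
			max = v
		}
	}
	out := make([]float64, len(raw))
	sum := 0.0
	for i, v := range raw {
		out[i] = math.Exp(v - max)
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}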

TestXGHiggs mismatch

There is a discrepancy between the original predictions and leaves predictions for the XGBoost ensemble on the Higgs problem (45th row of the test data).

One should investigate whether it is a bug or an effect of float tolerances on decision thresholds.

internal/xbin I/O

This internal package makes quite extensive use of calls to binary.Read.

This is very slow because binary.Read relies heavily on reflection.
One should use the binary.ByteOrder.PutUintXX and binary.ByteOrder.UintXX methods instead.
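A sketch of the suggested replacement for a single uint32 read (readUint32 is an illustrative helper, not existing xbin code):

import (
	"encoding/binary"
	"io"
)

// readUint32 reads 4 bytes and decodes them without reflection
func readUint32(r io.Reader) (uint32, error) {
	var buf [4]byte
	if _, err := io.ReadFull(r, buf[:]); err != nil {
		return 0, err
	}
	return binary.LittleEndian.Uint32(buf[:]), nil
}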

Understanding the output of Predict

Hi,

I'm not sure I fully understand the output of the Predict() methods.

I have a fully trained model with 9 classes and 100 estimators. I then run:

predictions := make([]float64, 9)
err = model.Predict(values, 100, predictions)
util.SigmoidFloat64SliceInplace(predictions)
log.Infof("Prediction for %v:\n %v", values, predictions)

That yields:

Prediction for [110 0 12 0 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]: 
[0.2276 0.1822 0.2664 0.0594 0.0682 0.9859 0.1283 0.6349 0.0706]

I understand those are the probabilities for EACH of the 9 classes being the right one. However, how am I able to get the actual value of the class? In Python, if I do y_pred = model.predict(values), it will correctly show me the expected class values. E.g. my class values look like this: 1242, 1152, 1552, 6662, etc. How can I map the prediction output above to the class values? I haven't provided any specific ordering of them to the model.
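One common approach (a sketch, not an authoritative answer from the library): take the argmax over the per-class scores and map it through the class labels in the order they had at training time. classLabels below is a hypothetical, user-maintained slice:

// classLabels must be in the same order as the model's output groups
classLabels := []int{1242, 1152, 1552, 6662 /* ... 9 labels total */}
best := 0
for i, p := range predictions {
	if p > predictions[best] {
		best = i
	}
}
fmt.Printf("predicted class: %d\n", classLabels[best])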

xgboost consistency failed

I built an xgb model in Python and then ran it on the test dataset.
But when I use leaves to load the model and predict, the results are inconsistent with the Python results.

I tested an lgb model with the same dataset, and those results are consistent.

only version=v2 is supported

code:

package main

import (
	"fmt"

	"github.com/dmitryikh/leaves"
)

func main() {
	// 1. Read model
	useTransformation := true
	model, err := leaves.LGEnsembleFromFile("lightgbm_model.txt", useTransformation)
	if err != nil {
		fmt.Println(err)
		panic(err)
	}
	// avoid a "declared but not used" compile error in this reproduction
	_ = model
}

go build lgbuse.go and then run it; it reports an error:

only version=v2 is supported
panic: only version=v2 is supported
goroutine 1 [running]:
main.main()
lgbuse.go:14 +0x246
exit status 2

The line of code that causes the problem:
model, err := leaves.LGEnsembleFromFile("lightgbm_model.txt", useTransformation)

How can I solve this problem?

Totally incorrect results when loading a Python-trained xgboost model with leaves and predicting

We use Spark to generate a libsvm file, then use Python sklearn to load it and xgboost to train and save the model, and finally use leaves to load it and predict.
The prediction results are totally different between the Python demo and Go.
I just want to ask whether leaves doesn't support this, or whether we are using leaves wrong.
The Python code looks like:
the python code like:

from sklearn.datasets import load_svmlight_file
from xgboost import XGBClassifier

my_workpath = 'D:\\project\\py\\train_demo\\'
X_train, y_train = load_svmlight_file(my_workpath + 'train')
X_test, y_test = load_svmlight_file(my_workpath + 'validation')
bst = XGBClassifier()
bst.fit(X_train, y_train)
bst.save_model(my_workpath + "train_model")
train_preds = [x[1] for x in bst.predict_proba(X_train)]
test_preds = [x[1] for x in bst.predict_proba(X_test)]

The Go code looks like:

model, e := leaves.XGEnsembleFromFile(model_path, true)
if e != nil {
	println(e)
}
if model.Transformation().Type() != transformation.Logistic {
	log.Fatalf("expected TransforType = Logistic (got %s)", model.Transformation().Name())
}
csr, err := mat.CSRMatFromLibsvmFile(validate_path, 0, true)
if err != nil {
	println(err)
}
predictions := make([]float64, csr.Rows()*model.NOutputGroups())
e = model.PredictCSR(csr.RowHeaders, csr.ColIndexes, csr.Values, predictions, 50, 5)
if e != nil {
	println(e)
}
fmt.Printf("Prediction for %v\n", predictions)

Prediction result is always 0.000

I use leaves to load my LightGBM model and predict instances; the results are always 0.00, while when I use Python to predict, they are not. Has anyone met this problem?
The feature types include both numerical and categorical.

Support for newer versions of XGBoost

Something has changed in the XGBoost model binary format. The highest version I've managed to make leaves work with is 1.0. Starting from 1.1+ I keep getting "panic: unexpected EOF". Is support for newer versions planned?
Moreover, they've started to save models in JSON format, and it looks like they're going to deprecate binaries altogether.

Error when loading an xgboost model trained with the sklearn API

Does leaves not support sklearn-trained xgboost models?
I use the Python code below to train an xgboost model and get an error when using the following API to load the model in Go code:
leaves.XGEnsembleFromFile("xg_iris.model", false)


The error:
mark@mark:~/golang $ go run predict_iris.go
Name: xgboost.gbtree
NFeatures: 4
NOutputGroups: 3
NEstimators: 100
panic: different sizes: len(a) = 30, len(b) = 90

goroutine 1 [running]:
main.main()
/home/mark/golang/predict_iris.go:44 +0x686
exit status 2


Below is the python code to train the xgboost model using the xgboost API in sklearn:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
import xgboost as xgb
from xgboost.sklearn import XGBClassifier

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

xg_train = xgb.DMatrix(X_train, label=y_train)
xg_test = xgb.DMatrix(X_test, label=y_test)
params = {
    'objective': 'multi:softmax',
    'num_class': 3,
}
n_estimators = 5
#clf = xgb.train(params, xg_train, n_estimators)
clf = XGBClassifier(**params)
clf = clf.fit(X_train, y_train)
y_pred = clf.predict_proba(X_test)[:,1]
clf.save_model('xg_iris.model')
np.savetxt('xg_iris_true_predictions.txt', y_pred, delimiter='\t')
datasets.dump_svmlight_file(X_test, y_test, 'iris_test.libsvm')
