
leaves


Introduction

leaves is a library implementing prediction code for GBRT (Gradient Boosting Regression Trees) models in pure Go. The goal of the project is to make it possible to use models from popular GBRT frameworks in Go programs without C API bindings.

NOTE: Before the 1.0.0 release the API is subject to change.

Features

  • General Features:
    • support parallel predictions for batches
    • support sigmoid, softmax transformation functions
    • support getting leaf indices of decision trees
  • Support LightGBM (repo) models:
    • read models from text format and from JSON format
    • support gbdt, rf (random forest) and dart models
    • support multiclass predictions
    • additional optimizations for categorical features (for example, the one-hot decision rule)
    • additional optimizations exploiting prediction-only usage
  • Support XGBoost (repo) models:
    • read models from binary format
    • support gbtree, gblinear, dart models
    • support multiclass predictions
    • support missing values (nan)
  • Support scikit-learn (repo) tree models (experimental support):
    • read models from pickle format (protocol 0)
    • support sklearn.ensemble.GradientBoostingClassifier

Usage examples

In order to start, go get this repository:

go get github.com/dmitryikh/leaves

Minimal example:

package main

import (
	"fmt"

	"github.com/dmitryikh/leaves"
)

func main() {
	// 1. Read model
	useTransformation := true
	model, err := leaves.LGEnsembleFromFile("lightgbm_model.txt", useTransformation)
	if err != nil {
		panic(err)
	}

	// 2. Do predictions!
	fvals := []float64{1.0, 2.0, 3.0}
	p := model.PredictSingle(fvals, 0)
	fmt.Printf("Prediction for %v: %f\n", fvals, p)
}

In order to use an XGBoost model, just change leaves.LGEnsembleFromFile to leaves.XGEnsembleFromFile.
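For illustration, here is a sketch of the same minimal program for an XGBoost binary model (the file name xgboost_model.bin is a placeholder; the second argument again controls whether the transformation function is loaded):

package main

import (
	"fmt"

	"github.com/dmitryikh/leaves"
)

func main() {
	// 1. Read an XGBoost model from its binary format
	model, err := leaves.XGEnsembleFromFile("xgboost_model.bin", true)
	if err != nil {
		panic(err)
	}

	// 2. Do predictions!
	fvals := []float64{1.0, 2.0, 3.0}
	p := model.PredictSingle(fvals, 0)
	fmt.Printf("Prediction for %v: %f\n", fvals, p)
}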

Documentation

Documentation is hosted on godoc (link). It contains more complex usage examples and a full API reference. Some additional usage examples can be found in leaves_test.go.

Compatibility

Most leaves features are tested for compatibility with old and upcoming versions of GBRT libraries. In compatibility.md one can find a detailed report on leaves correctness against different versions of external GBRT libraries.

Some additional information on new features and backward compatibility can be found in NOTES.md.

Benchmark

Below are comparisons of prediction speed on batches (~1000 objects per API call). Hardware: MacBook Pro (15-inch, 2017), 2.9 GHz Intel Core i7, 16 GB 2133 MHz LPDDR3. The C API implementations were called from Python bindings, but the large batch size should make the overhead of the Python bindings negligible. leaves benchmarks were run by means of the Go test framework: go test -bench. See benchmark for more details on measurements. See testdata/README.md for data preparation pipelines.
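For example, a standard invocation from the repository root looks like this (exact benchmark names may differ):

go test -bench=.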

Single thread:

Test Case            | Features | Trees | Batch size | C API | leaves
LightGBM MS LTR      | 137      | 500   | 1000       | 49ms  | 51ms
LightGBM Higgs       | 28       | 500   | 1000       | 50ms  | 50ms
LightGBM KDD Cup 99* | 41       | 1200  | 1000       | 70ms  | 85ms
XGBoost Higgs        | 28       | 500   | 1000       | 44ms  | 50ms

4 threads:

Test Case            | Features | Trees | Batch size | C API | leaves
LightGBM MS LTR      | 137      | 500   | 1000       | 14ms  | 14ms
LightGBM Higgs       | 28       | 500   | 1000       | 14ms  | 14ms
LightGBM KDD Cup 99* | 41       | 1200  | 1000       | 19ms  | 24ms
XGBoost Higgs        | 28       | 500   | 1000       | ?     | 14ms

(?) - currently I'm unable to utilize multithreading for XGBoost predictions by means of the Python bindings

(*) - KDD Cup 99 problem involves continuous and categorical features simultaneously

Limitations

  • LightGBM models:
    • limited support of transformation functions (only sigmoid and softmax are supported)
  • XGBoost models:
    • limited support of transformation functions (only sigmoid and softmax are supported)
    • there may be slight divergence between C API predictions and leaves because of floating point conversions and comparison tolerances
  • scikit-learn tree models:
    • no support for transformation functions; output scores are raw scores (as from GradientBoostingClassifier.decision_function); see the logistic-transform sketch after this list
    • only pickle protocol 0 is supported
    • there may be slight divergence between sklearn predictions and leaves because of floating point conversions and comparison tolerances
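Where raw scores are returned, client code can apply the transformation itself. A minimal sketch of the logistic (sigmoid) transform in Go:

import "math"

// logistic converts a raw GBRT score into a probability in (0, 1)
func logistic(raw float64) float64 {
	return 1.0 / (1.0 + math.Exp(-raw))
}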

Contacts

If you are interested in the project or have questions, please contact me by email: khdmitryi at gmail.com

leaves's People

Contributors

arnwas, dmitryikh, erjanmx, fredrikluo, imscientist, mottl


leaves's Issues

The prediction is wrong when using XGEnsembleFromFile to load model

I'm using xgbModel.nativeBooster.saveModel on Spark to save the native model, then loading the model with XGEnsembleFromFile to predict on the validation dataset, but the results do not match the same prediction done on Spark. Here are the results predicted with the leaves framework:

label: 1, pred: 0.836042
label: 1, pred: 0.836042
label: 1, pred: 0.797784
label: 1, pred: 0.934794
label: 1, pred: 0.793824
label: 1, pred: 0.797579
label: 1, pred: 0.959390
label: 1, pred: 0.959390
label: 1, pred: 0.959390
label: 1, pred: 0.704733
label: 1, pred: 0.787566
label: 1, pred: 0.941911
label: 1, pred: 0.934794
label: 1, pred: 0.749724
label: 1, pred: 0.929430
label: 1, pred: 0.931993
label: 1, pred: 0.797579
label: 1, pred: 0.839686
label: 1, pred: 0.759537
label: 1, pred: 0.813373
label: 1, pred: 0.760041
label: 1, pred: 0.793824
label: 1, pred: 0.934794
label: 1, pred: 0.759537
label: 1, pred: 0.929430
label: 1, pred: 0.945538
label: 1, pred: 0.785153
label: 1, pred: 0.959390
label: 1, pred: 0.793824
label: 1, pred: 0.779831
label: 1, pred: 0.959390
label: 1, pred: 0.749724
label: 1, pred: 0.941911
label: 1, pred: 0.798052
label: 1, pred: 0.749724
label: 1, pred: 0.931993
label: 1, pred: 0.749724
label: 1, pred: 0.929430
label: 1, pred: 0.839686
label: 1, pred: 0.839686
label: 1, pred: 0.806166
label: 1, pred: 0.934794
label: 1, pred: 0.839686
label: 1, pred: 0.785153
label: 1, pred: 0.806166
label: 1, pred: 0.945538
label: 1, pred: 0.803833
label: 1, pred: 0.759537
label: 1, pred: 0.806166
label: 1, pred: 0.768660
label: 1, pred: 0.797784
label: 1, pred: 0.931993
label: 1, pred: 0.749724
label: 1, pred: 0.824530
label: 1, pred: 0.959390
label: 1, pred: 0.959390
label: 1, pred: 0.806893
label: 1, pred: 0.929430
label: 1, pred: 0.803833
label: 1, pred: 0.797148
label: 1, pred: 0.931993
label: 1, pred: 0.797579
label: 1, pred: 0.787042
label: 1, pred: 0.803833
label: 1, pred: 0.959390
label: 1, pred: 0.931993
label: 1, pred: 0.806166
label: 1, pred: 0.836042
label: 1, pred: 0.934794
label: 1, pred: 0.934794
label: 1, pred: 0.803833
label: 1, pred: 0.749724
label: 1, pred: 0.931993
label: 1, pred: 0.759537
label: 1, pred: 0.779831
label: 1, pred: 0.787042
label: 1, pred: 0.785153
label: 1, pred: 0.749724
label: 1, pred: 0.749724
label: 1, pred: 0.934794
label: 1, pred: 0.929430
label: 1, pred: 0.797579
label: 1, pred: 0.945538
label: 1, pred: 0.934794
label: 1, pred: 0.959390
label: 1, pred: 0.959390
label: 1, pred: 0.787042
label: 1, pred: 0.787042
label: 1, pred: 0.931993
label: 1, pred: 0.759537
label: 1, pred: 0.941911
label: 1, pred: 0.749724
label: 1, pred: 0.850764
label: 1, pred: 0.945538
label: 1, pred: 0.803833
label: 1, pred: 0.749724
label: 1, pred: 0.797579
label: 1, pred: 0.785153
label: 1, pred: 0.941911
label: 1, pred: 0.806166
label: 0, pred: 0.767173
label: 0, pred: 0.807510
label: 0, pred: 0.797784
label: 0, pred: 0.824530
label: 0, pred: 0.839686
label: 0, pred: 0.767173
label: 0, pred: 0.839686
label: 0, pred: 0.767176
label: 0, pred: 0.797579
label: 0, pred: 0.793824
label: 0, pred: 0.772110
label: 0, pred: 0.768660
label: 0, pred: 0.759537
label: 0, pred: 0.839686
label: 0, pred: 0.759537
label: 0, pred: 0.929430
label: 0, pred: 0.941911
label: 0, pred: 0.822525
label: 0, pred: 0.839686
label: 0, pred: 0.945538
label: 0, pred: 0.749724
label: 0, pred: 0.929430
label: 0, pred: 0.787042
label: 0, pred: 0.797579
label: 0, pred: 0.797784
label: 0, pred: 0.797784
label: 0, pred: 0.945538
label: 0, pred: 0.785153
label: 0, pred: 0.797784
label: 0, pred: 0.836042
label: 0, pred: 0.931993
label: 0, pred: 0.836042
label: 0, pred: 0.779831
label: 0, pred: 0.945538
label: 0, pred: 0.812733
label: 0, pred: 0.945538
label: 0, pred: 0.745542
label: 0, pred: 0.779849
label: 0, pred: 0.903047
label: 0, pred: 0.816076
label: 0, pred: 0.807510
label: 0, pred: 0.749971
label: 0, pred: 0.945538
label: 0, pred: 0.804371
label: 0, pred: 0.767173
label: 0, pred: 0.934794
label: 0, pred: 0.785153
label: 0, pred: 0.767173
label: 0, pred: 0.797784
label: 0, pred: 0.785153
label: 0, pred: 0.807510
label: 0, pred: 0.768660
label: 0, pred: 0.804371
label: 0, pred: 0.787042
label: 0, pred: 0.704733
label: 0, pred: 0.813373
label: 0, pred: 0.749724
label: 0, pred: 0.836042
label: 0, pred: 0.772110
label: 0, pred: 0.855798
label: 0, pred: 0.836042
label: 0, pred: 0.784896
label: 0, pred: 0.804371
label: 0, pred: 0.813373
label: 0, pred: 0.749724
label: 0, pred: 0.903047
label: 0, pred: 0.787042
label: 0, pred: 0.839686
label: 0, pred: 0.759537
label: 0, pred: 0.797579
label: 0, pred: 0.803833
label: 0, pred: 0.793824
label: 0, pred: 0.749724
label: 0, pred: 0.806166
label: 0, pred: 0.793824
label: 0, pred: 0.793824
label: 0, pred: 0.787042
label: 0, pred: 0.806166
label: 0, pred: 0.903047
label: 0, pred: 0.839686
label: 0, pred: 0.768660
label: 0, pred: 0.787042
label: 0, pred: 0.745542
label: 0, pred: 0.787042
label: 0, pred: 0.802708
label: 0, pred: 0.797784
label: 0, pred: 0.839686
label: 0, pred: 0.929430
label: 0, pred: 0.803833
label: 0, pred: 0.704733
label: 0, pred: 0.704733
label: 0, pred: 0.793824
label: 0, pred: 0.793824
label: 0, pred: 0.813373
label: 0, pred: 0.836042
label: 0, pred: 0.767173
label: 0, pred: 0.803833
label: 0, pred: 0.793824
label: 0, pred: 0.818067
label: 0, pred: 0.787566

LG: num_leaves=1 support

A LightGBM tree can look like the following:

Tree=2236
num_leaves=1
num_cat=0
split_feature=
split_gain=
threshold=
decision_type=
left_child=
right_child=
leaf_value=0
leaf_count=
internal_value=
internal_count=
shrinkage=1.03754e-322

But leaves treats num_leaves < 2 as an input error.

TODO:

  • support a tree with only one leaf
  • add test for this case
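A single-leaf tree degenerates to a constant: whatever the input, its contribution is just leaf_value. A minimal sketch of the special case (NLeaves and LeafValue are hypothetical accessors, not the actual leaves internals):

// hypothetical: if the tree has a single leaf, skip traversal
// entirely and return its only (constant) leaf value
if tree.NLeaves() == 1 {
	return tree.LeafValue(0)
}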

Question: support for objective:quantile

I have a model trained with quantile regression in LightGBM. I get an error that this is not a valid option for objective when I use my model. Is there a workaround to get it working?

xgEnsemble prediction results are different from xgboost in python

I train and test data with xgboost in Python, then use leaves in the production environment.
More details below:

In Python
For xgb testing, the data structure that I set up with pd.DataFrame is
[0:value1, 1:v2, 2:v3, ... , n:v(n+1)]
where value1 is an arbitrary int value and v2, ... , v(n+1) are float64 values. Column 0 is the prediction value.
This layout reproduces the testing result.

With this structure instead:
[feature1:v2, f2:v3, ... , f(n):v(n+1)]
the testing result is NOT reproduced.

In Golang
Using leaves XGEnsembleFromFile -> model.PredictCSR(), the testing result is also NOT reproduced.

I have tried for over 5 hours to solve it (for example, adding {0:0} to the first feature group), but I can't figure it out.
What's wrong with my testing data?

obtain the leaf index of gbdt tree

My online prediction service wants to use the GBDT + LR algorithm combination (Practical Lessons from Predicting Clicks on Ads at Facebook), which requires the leaf index of each tree, but leaves doesn't support it.
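For context, here is a sketch of the GBDT + LR encoding this would enable. It assumes leafIndices []int holds the predicted leaf index of each tree (however the library would expose it), and nTrees and leavesPerTree describe the ensemble; all three names are hypothetical:

// one-hot encode the leaf index of every tree; the concatenated
// indicator vector becomes the input features for logistic regression
features := make([]float64, nTrees*leavesPerTree)
for i, leaf := range leafIndices {
	features[i*leavesPerTree+leaf] = 1.0
}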

error for load xgboost:gbtree

I got an error when I tried to load a binary model of xgboost:gbtree; the error message is as follows:

panic: unexpected EOF

goroutine 1 [running]:
main.main()
/Users/zhangxiatian/tuotuo/workspace/go/predictor/main.go:13 +0x1ba

Process finished with exit code 2

======================
the code is as follows:

package main

import (
	"github.com/dmitryikh/leaves"
)

func main() {
	// 1. Read model
	model, err := leaves.XGEnsembleFromFile("/Users/zhangxiatian/tuotuo/recsys/engin/model/model")
	if err != nil {
		panic(err)
	}
	// model is not used further in this reproduction; the blank assignment
	// avoids a "declared but not used" compile error
	_ = model
}

support DART from LightGBM?

The source code of the DART class in LightGBM suggests there is nothing special we need to do in leaves, and this class of models is already supported.

At least we should add a test for this case.

Gonum/mat

Currently, leaves uses a standalone matrix implementation that's almost completely binary compatible with the gonum/mat.Dense implementation.

It would be very beneficial to just use the one from gonum/mat (or, at least, implement the mat.Matrix interface).
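For illustration, a sketch of what implementing gonum's mat.Matrix interface could look like for a row-major matrix (the DenseAdapter type and its fields are hypothetical, not the current leaves internals):

import "gonum.org/v1/gonum/mat"

// DenseAdapter wraps a row-major float64 matrix so it satisfies mat.Matrix
type DenseAdapter struct {
	NRows, NCols int
	Values       []float64 // row-major storage, len = NRows*NCols
}

func (d *DenseAdapter) Dims() (r, c int)    { return d.NRows, d.NCols }
func (d *DenseAdapter) At(i, j int) float64 { return d.Values[i*d.NCols+j] }
func (d *DenseAdapter) T() mat.Matrix       { return mat.Transpose{Matrix: d} }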

common model interface{}

Specifically for test purposes it would be useful to have a common interface for LGEnsemble & XGEnsemble.
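A sketch of what such an interface could contain, using only method names that already appear in the usage examples in this document (the interface itself is hypothetical):

// Model is a possible common interface for LGEnsemble and XGEnsemble
type Model interface {
	PredictSingle(fvals []float64, nEstimators int) float64
	Predict(fvals []float64, nEstimators int, predictions []float64) error
	NOutputGroups() int
}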

Unexpected objective field: 'lambdarank'

leaves.LGEnsembleFromFile() fails when loading an objective=lambdarank model (LightGBM).

Error message: unexpected objective field: 'lambdarank', model:

tree
version=v2
num_class=1
num_tree_per_iteration=1
label_index=0
max_feature_idx=24
objective=lambdarank
feature_names=t quality freshness navboost pctr video_type lctr_1_3 lctr_4_7 lctr_8_30 sctr_1_3 sctr_4_7 sctr_8_30 ctr_1_3 ctr_4_7 ctr_8_30 loglclick_1_3 logclick_1_3 logsclick_1_3 lctr_ins ctr_ins sctr_ins loglclick_ins logclick_ins logsclick_ins instant_navboost
feature_infos=[0:1.3200000524520874] [3.3299998904112726e-05:1] [0.36787900328636169:1] [0.36790001392364502:0.9999966025352478] [0:0.98189848661422729] [1:200] [0:10.87989330291748] [0:9.3969650268554688] [0:11.486390113830566] [0:4.2822332382202148] [0:3.7750816345214844] [0:2.8636219501495361] [0:8.7641057968139648] [0:7.1885638236999512] [0:10.401005744934082] [0:6.4371075630187988] [0:6.4220900535583496] [0:5.9216046333312988] [0:5.5910482406616211] [0:3.3260509967803955] [0:1.3753839731216431] [0:5.3813371658325195] [0:5.3396997451782227] [0:4.9291071891784668] [0.36790001392364502:0.9999929666519165]
tree_sizes=1308 911 993 1073 1235 1154 992 1316 1234 1151 997 1163 1234 1077 1090 1244 1237 1152 1400 1228 1246 1310 1240 1072 1327 1068 1242 1081 1312 1082 1162 1000 1330 1310 1408 1253 1165 1328 1082 1004 1172 1328 1161 1081 1151 1323 1325 1321 1410 1166 1073 1403 996 1242 991 1336 1232 1250 995 1309

Does leaves support go 1.11

After changing the Go version limit in go.mod to 1.11, it passed all unit tests.
Go version on my machine: go1.11.2 darwin/amd64

support transformation functions

Currently leaves outputs predictions as raw scores. Client code should transform them into probabilities (logistic), lambdarank scores, and so on.

Let's introduce this ability to leaves.

Short list for XGBoost:

  • "binary:logistic"
  • "binary:logitraw"
  • "multi:softmax"
  • "multi:softprob"
  • "reg:linear"

Short list for LightGBM:
todo
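For reference, a minimal Go sketch of the softmax transformation implied by the multi:softmax and multi:softprob objectives (stabilized by subtracting the maximum raw score):

import "math"

// softmax converts raw per-class scores into probabilities that sum to 1
func softmax(raw []float64) []float64 {
	max := raw[0]
	for _, v := range raw[1:] {
		if v > max {
			max = v
		}
	}
	out := make([]float64, len(raw))
	sum := 0.0
	for i, v := range raw {
		out[i] = math.Exp(v - max)
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}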

TestXGHiggs mismatch

There is a discrepancy between the original predictions and leaves predictions for the XGBoost ensemble on the Higgs problem (45th row of the test data).

One should investigate whether it is a bug or an effect of float tolerances on decision thresholds.

internal/xbin I/O

This internal package makes quite extensive use of calls to binary.Read.

This is very slow because binary.Read relies heavily on reflection.
One should use the binary.ByteOrder.PutUintXX and binary.ByteOrder.UintXX methods instead.
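A sketch of the suggested replacement for a single uint32 read (readUint32 is an illustrative helper, not existing xbin code):

import (
	"encoding/binary"
	"io"
)

// readUint32 reads 4 bytes and decodes them without reflection
func readUint32(r io.Reader) (uint32, error) {
	var buf [4]byte
	if _, err := io.ReadFull(r, buf[:]); err != nil {
		return 0, err
	}
	return binary.LittleEndian.Uint32(buf[:]), nil
}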

Understanding the output of Predict

Hi,

I'm not sure I fully understand the output of the Predict() methods.

I have a fully trained model with 9 classes and 100 estimators. I then run:

predictions := make([]float64, 9)
err = model.Predict(values, 100, predictions)
util.SigmoidFloat64SliceInplace(predictions)
log.Infof("Prediction for %v:\n %v", values, predictions)

That yields:

Prediction for [110 0 12 0 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]: 
[0.2276 0.1822 0.2664 0.0594 0.0682 0.9859 0.1283 0.6349 0.0706]

I understand those are the probabilities for EACH of the 9 classes being the right one. However, how am I able to get the actual value of the class? In Python, if I do y_pred = model.predict(values), it will correctly show me the expected class values. E.g. my class values look like this: 1242, 1152, 1552, 6662, etc. How can I map the prediction output above to the class values? I haven't provided any specific ordering of them to the model.
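One common approach (a sketch, not an authoritative answer from the library): take the argmax over the per-class scores and map it through the class labels in the order they had at training time. classLabels below is a hypothetical, user-maintained slice:

// classLabels must be in the same order as the model's output groups
classLabels := []int{1242, 1152, 1552, 6662 /* ... 9 labels total */}
best := 0
for i, p := range predictions {
	if p > predictions[best] {
		best = i
	}
}
fmt.Printf("predicted class: %d\n", classLabels[best])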

xgboost consistency failed

I built an xgb model in Python and then ran it on the test dataset.
But when I use leaves to load the model and predict, the results are inconsistent with the Python results.

I tested an lgb model with the same dataset, and those results are consistent.

only version=v2 is supported

code:

package main

import (
	"fmt"

	"github.com/dmitryikh/leaves"
)

func main() {
	// 1. Read model
	useTransformation := true
	model, err := leaves.LGEnsembleFromFile("lightgbm_model.txt", useTransformation)
	if err != nil {
		fmt.Println(err)
		panic(err)
	}
	// avoid a "declared but not used" compile error in this reproduction
	_ = model
}

go build lgbuse.go and then run it; it reports an error:

only version=v2 is supported
panic: only version=v2 is supported
goroutine 1 [running]:
main.main()
lgbuse.go:14 +0x246
exit status 2

The line of code that causes the problem:
model, err := leaves.LGEnsembleFromFile("lightgbm_model.txt", useTransformation)

How can I solve this problem?

Totally incorrect results when loading a Python-trained xgboost model with leaves and predicting

We use Spark to generate a libsvm file, then use Python sklearn to load it and xgboost to train and save the model, and finally use leaves to load it and predict.
The prediction results are totally different between the Python demo and Go.
I just want to ask whether leaves doesn't support this, or whether we are using leaves wrong.
The Python code looks like:
the python code like:

from sklearn.datasets import load_svmlight_file
from xgboost import XGBClassifier

my_workpath = 'D:\\project\\py\\train_demo\\'
X_train, y_train = load_svmlight_file(my_workpath + 'train')
X_test, y_test = load_svmlight_file(my_workpath + 'validation')
bst = XGBClassifier()
bst.fit(X_train, y_train)
bst.save_model(my_workpath + "train_model")
train_preds = [x[1] for x in bst.predict_proba(X_train)]
test_preds = [x[1] for x in bst.predict_proba(X_test)]

The Go code looks like:

model, e := leaves.XGEnsembleFromFile(model_path, true)
if e != nil {
	println(e)
}
if model.Transformation().Type() != transformation.Logistic {
	log.Fatalf("expected TransforType = Logistic (got %s)", model.Transformation().Name())
}
csr, err := mat.CSRMatFromLibsvmFile(validate_path, 0, true)
if err != nil {
	println(err)
}
predictions := make([]float64, csr.Rows()*model.NOutputGroups())
e = model.PredictCSR(csr.RowHeaders, csr.ColIndexes, csr.Values, predictions, 50, 5)
if e != nil {
	println(e)
}
fmt.Printf("Prediction for %v\n", predictions)

Prediction result is always 0.000

I use leaves to load my LightGBM model and predict instances; the results are always 0.00, while when I use Python to predict, they are not. Has anyone met this problem?
The feature types include both numerical and categorical.

Support for newer versions of XGBoost

Something has changed in the XGBoost model binary format. The highest version I've managed to make leaves work with is 1.0. Starting from 1.1+ I keep getting "panic: unexpected EOF". Is support for newer versions planned?
Moreover, they've started to save models in JSON format, and it looks like they're going to deprecate binaries altogether.

Error when loading an xgboost model trained with the sklearn API

Does leaves not support sklearn-trained xgboost models?
I use the Python code below to train an xgboost model and get an error when using the following API to load the model in Go code:
leaves.XGEnsembleFromFile("xg_iris.model", false)


The error:
mark@mark:~/golang $ go run predict_iris.go
Name: xgboost.gbtree
NFeatures: 4
NOutputGroups: 3
NEstimators: 100
panic: different sizes: len(a) = 30, len(b) = 90

goroutine 1 [running]:
main.main()
/home/mark/golang/predict_iris.go:44 +0x686
exit status 2


Below is the python code to train the xgboost model using the xgboost API in sklearn:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
import xgboost as xgb
from xgboost.sklearn import XGBClassifier

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

xg_train = xgb.DMatrix(X_train, label=y_train)
xg_test = xgb.DMatrix(X_test, label=y_test)
params = {
    'objective': 'multi:softmax',
    'num_class': 3,
}
n_estimators = 5
#clf = xgb.train(params, xg_train, n_estimators)
clf = XGBClassifier(**params)
clf = clf.fit(X_train, y_train)
y_pred = clf.predict_proba(X_test)[:,1]
clf.save_model('xg_iris.model')
np.savetxt('xg_iris_true_predictions.txt', y_pred, delimiter='\t')
datasets.dump_svmlight_file(X_test, y_test, 'iris_test.libsvm')
