Comments (6)
Hey @oooo26, thanks for using LightGBM.

> Should I add the init_score to lgb prediction?

Yes.

> It seems useful for this "l2" objective, but still weird for e.g. "tweedie".

You should be able to match the outputs of `init_score=None` if you set it to the same value that LightGBM starts the boosting from, which is different for each objective. For l2 it is the mean of the target, but for tweedie it is the log of that:
LightGBM/src/objective/regression_objective.hpp, lines 468 to 470 in 28536a0
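As a rough sketch of what that means (my own illustration, not LightGBM's code; the library's actual `BoostFromScore` also accounts for sample weights), the per-objective starting constant could look like:

```python
import numpy as np

# Simplified sketch of the constant LightGBM starts boosting from.
def boost_from_average(y, objective):
    if objective == "l2":
        return np.mean(y)          # l2: the mean of the target
    if objective == "tweedie":
        return np.log(np.mean(y))  # tweedie: the log of that mean (raw-score scale)
    raise ValueError(f"unhandled objective: {objective}")

y = np.array([1.0, 2.0, 3.0, 6.0])
print(boost_from_average(y, "l2"))       # 3.0
print(boost_from_average(y, "tweedie"))  # log(3) ~= 1.0986
```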
Should I add the init_score to lgb prediction? It seems useful for this "l2" objective, but still weird for e.g. "tweedie".
```python
import pandas as pd
import lightgbm as lgb
from sklearn import datasets
from sklearn.linear_model import LinearRegression

diabetes = datasets.load_diabetes()
n, p = diabetes["data"].shape

def lgb_model(init_score):
    train_set = lgb.Dataset(
        data=diabetes["data"],
        label=diabetes["target"],
        feature_name=diabetes["feature_names"],
        init_score=init_score,
    )
    params = {
        "objective": "tweedie",  # change to tweedie
        "verbosity": -1,
        "seed": 0,
    }
    model = lgb.train(params, train_set)
    predictions = model.predict(diabetes["data"])
    if init_score is not None:
        predictions += init_score  # add init_score
    return predictions

init_model = LinearRegression().fit(diabetes["data"], diabetes["target"])
results = pd.DataFrame(
    {
        "target": diabetes["target"],
        "init - None": lgb_model(None),
        "init - mean": lgb_model([diabetes["target"].mean()] * n),
        "init - linear": lgb_model(init_model.predict(diabetes["data"])),
    }
)
print(results.head())
```
```
   target  init - None  init - mean  init - linear
0   151.0   157.811490   152.952215     207.116677
1    75.0    75.164031   152.952215      69.071033
2   141.0   140.120412   152.952215     177.882790
3   206.0   223.799875   152.952215     167.914458
4   135.0   119.464777   152.952215     129.462258
```
Thank you for helping! Let me double check: for tweedie, I should set init_score to the log scale of the initial predictions and, after fitting, still add them back on the normal scale?
Oh I see, it should be like this:
```python
import numpy as np

def lgb_model(init_score):
    train_set = lgb.Dataset(
        data=diabetes["data"],
        label=diabetes["target"],
        feature_name=diabetes["feature_names"],
    )
    if init_score is not None:
        train_set.set_init_score(np.log(init_score))
    params = {
        "objective": "tweedie",
        "verbosity": -1,
        "seed": 0,
    }
    model = lgb.train(params, train_set)
    predictions = model.predict(diabetes["data"])
    if init_score is not None:
        # go back to the log (raw) scale, add the init_score, then map back
        predictions = np.exp(np.log(predictions) + train_set.init_score)
    return predictions
```
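As an illustrative aside (my own check, not from the thread): because exponentiating a sum of logs is the same as multiplying on the original scale, the back-transform above is simply a multiplicative shift by `exp(init_score)`:

```python
import numpy as np

# exp(log(pred) + init) == pred * exp(init), elementwise.
pred = np.array([2.0, 5.0, 8.0])
init = np.log(np.array([1.5, 1.5, 1.5]))  # init_score on the log (raw) scale

a = np.exp(np.log(pred) + init)
b = pred * np.exp(init)
print(np.allclose(a, b))  # True
print(a)                  # [ 3.   7.5 12. ]
```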
It is a bit complex, I think... but anyway, thank you!
Yes, some objectives have a `ConvertOutput` method, which is used to translate the raw scores (which the model is trained to predict) into the actual scores (the ones we're interested in). In the case of Tweedie it's the exp (as you've correctly done):
LightGBM/src/objective/regression_objective.hpp, lines 460 to 462 in 5dfe716
So another alternative would be to always predict the raw score, add the init score (which will be on the same "scale"), and then apply that transformation. The following would be equivalent, but maybe easier, since for each objective you'd only need to apply the `ConvertOutput` method:
```python
convert_output = np.exp  # for tweedie and poisson
predictions = model.predict(diabetes["data"], raw_score=True)
if init_score is not None:
    predictions += init_score
return convert_output(predictions)
```
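To show the shape of that approach end to end, here is a self-contained sketch. `DummyBooster` and `predict_with_init` are hypothetical names of my own: the dummy stands in for a trained `lgb.Booster` (whose `predict` does accept `raw_score=True`) so the example runs without LightGBM installed:

```python
import numpy as np

class DummyBooster:
    """Stand-in for a trained lgb.Booster; its trees contribute nothing."""
    def predict(self, X, raw_score=False):
        raw = np.zeros(len(X))
        return raw if raw_score else np.exp(raw)

def predict_with_init(model, X, init_score=None, convert_output=np.exp):
    preds = model.predict(X, raw_score=True)  # raw scores
    if init_score is not None:
        preds = preds + init_score            # init_score is on the same raw scale
    return convert_output(preds)              # e.g. exp for tweedie/poisson

X = np.ones((3, 2))
init = np.log(np.array([10.0, 20.0, 30.0]))  # log of the initial predictions
print(predict_with_init(DummyBooster(), X, init))  # [10. 20. 30.]
```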
Closing, since I believe the issue has been solved; feel free to reopen if you run into any more problems.