Comments (6)
The woebin function uses cut(..., right = FALSE) by default because that matches business sense. Take age, for example: the 20s and the 30s are usually split into different groups.
I have added this feature in response to your issue. Please update the package to the latest version on GitHub. You can make the bins closed on the right via options(bin_close_right = TRUE); see the last example of the woebin function.
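To make the age argument concrete, here is a minimal sketch using only base R's cut. The ages and decade breaks are made up for illustration and are not taken from germancredit.

```r
# Hypothetical ages chosen for illustration; not taken from germancredit.
ages <- c(19, 20, 29, 30, 39, 40)
decade_breaks <- c(-Inf, 20, 30, 40, Inf)

# right = FALSE: intervals are closed on the left,
# so age 30 falls in the bin that starts at 30.
left_closed <- cut(ages, breaks = decade_breaks, right = FALSE)

# right = TRUE (the base R default): intervals are closed on the right,
# so age 30 falls in the bin that ends at 30, together with 29.
right_closed <- cut(ages, breaks = decade_breaks, right = TRUE)

# Side-by-side view of which bin each age lands in.
comparison <- data.frame(age = ages, left_closed, right_closed)
print(comparison)
```

With left-closed bins, 20 and 29 share a group and 30 starts the next one, which is the grouping by decade described above.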
from scorecard.
Thanks for your response. I understand the argument for using right = FALSE.
About the feature: I tested it, but I ran into some problems:
- In example 1, options(bin_close_right = TRUE) and woebin(germancredit, y = 'creditability', x = 'age.in.years') produce a missing category, and the counts are not the same as those from cut.
- Using x = NULL in woebin makes bins with right = FALSE even when options(bin_close_right = TRUE) is set.
Tell me if I can help. Thanks.
# devtools::install_github("shichenxie/scorecard")
.Platform$OS.type
#> [1] "windows"
library(scorecard)
packageVersion("scorecard")
#> [1] '0.3.2.999'
options(bin_close_right = TRUE)
data("germancredit")
# example 1
bins <- woebin(germancredit, y = "creditability", x = c("age.in.years", "duration.in.month"))
#> [INFO] creating woe binning ...
bins$age.in.years
#>        variable       bin count count_distr neg pos   posprob        woe
#> 1: age.in.years   missing    16       0.016  10   6 0.3750000  0.3364722
#> 2: age.in.years (-Inf,24]   174       0.174 100  74 0.4252874  0.5461928
#> 3: age.in.years   (24,26]   101       0.101  74  27 0.2673267 -0.1609304
#> 4: age.in.years   (26,33]   257       0.257 172  85 0.3307393  0.1424546
#> 5: age.in.years   (33,35]    79       0.079  67  12 0.1518987 -0.8724881
#> 6: age.in.years (35, Inf]   373       0.373 277  96 0.2573727 -0.2123715
#>         bin_iv  total_iv  breaks is_special_values
#> 1: 0.001922698 0.1312002 missing             FALSE
#> 2: 0.056700011 0.1312002      24             FALSE
#> 3: 0.002528906 0.1312002      26             FALSE
#> 4: 0.005359008 0.1312002      33             FALSE
#> 5: 0.048610052 0.1312002      35             FALSE
#> 6: 0.016079553 0.1312002     Inf             FALSE
setNames(bins$age.in.years$count, bins$age.in.years$bin)
#> missing (-Inf,24] (24,26] (26,33] (33,35] (35, Inf]
#> 16 174 101 257 79 373
table(
cut(
germancredit$age.in.years,
breaks = c(-Inf, na.omit(as.numeric(bins$age.in.years$breaks)))
)
)
#> Warning in na.omit(as.numeric(bins$age.in.years$breaks)): NAs introduced by
#> coercion
#>
#> (-Inf,24] (24,26] (26,33] (33,35] (35, Inf]
#> 149 91 276 72 412
# example 2
bins <- woebin(germancredit, y = "creditability")
#> [INFO] creating woe binning ...
bins$age.in.years$bin
#> [1] "[-Inf,26)" "[26,28)" "[28,35)" "[35,37)" "[37, Inf)"
Created on 2021-05-23 by the reprex package (v2.0.0)
from scorecard.
It should be fixed. Please upgrade to the latest version and try again.
library(scorecard)
data("germancredit")
options(bin_close_right = TRUE)
binsR <- woebin(germancredit, y = "creditability", x = c("age.in.years"))
#> [INFO] creating woe binning ...
binsR
#> $age.in.years
#>        variable       bin count count_distr neg pos   posprob        woe
#> 1: age.in.years (-Inf,25]   190       0.190 110  80 0.4210526  0.5288441
#> 2: age.in.years   (25,27]   101       0.101  74  27 0.2673267 -0.1609304
#> 3: age.in.years   (27,34]   257       0.257 172  85 0.3307393  0.1424546
#> 4: age.in.years   (34,36]    79       0.079  67  12 0.1518987 -0.8724881
#> 5: age.in.years (36, Inf]   373       0.373 277  96 0.2573727 -0.2123715
#>         bin_iv  total_iv breaks is_special_values
#> 1: 0.057921024 0.1304985     25             FALSE
#> 2: 0.002528906 0.1304985     27             FALSE
#> 3: 0.005359008 0.1304985     34             FALSE
#> 4: 0.048610052 0.1304985     36             FALSE
#> 5: 0.016079553 0.1304985    Inf             FALSE
options(bin_close_right = FALSE)
binsL <- woebin(germancredit, y = "creditability", x = c("age.in.years"))
#> [INFO] creating woe binning ...
binsL
#> $age.in.years
#>        variable       bin count count_distr neg pos   posprob        woe
#> 1: age.in.years [-Inf,26)   190       0.190 110  80 0.4210526  0.5288441
#> 2: age.in.years   [26,28)   101       0.101  74  27 0.2673267 -0.1609304
#> 3: age.in.years   [28,35)   257       0.257 172  85 0.3307393  0.1424546
#> 4: age.in.years   [35,37)    79       0.079  67  12 0.1518987 -0.8724881
#> 5: age.in.years [37, Inf)   373       0.373 277  96 0.2573727 -0.2123715
#>         bin_iv  total_iv breaks is_special_values
#> 1: 0.057921024 0.1304985     26             FALSE
#> 2: 0.002528906 0.1304985     28             FALSE
#> 3: 0.005359008 0.1304985     35             FALSE
#> 4: 0.048610052 0.1304985     37             FALSE
#> 5: 0.016079553 0.1304985    Inf             FALSE
Created on 2021-05-24 by the reprex package (v1.0.0)
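The identical counts in binsR and binsL are what one would expect if age.in.years holds only whole numbers (an assumption here, not something the output proves): for integers, (25,27] and [26,28) contain the same values. A minimal sketch with base R's cut on made-up integer ages, reusing the break values printed above:

```r
# Hypothetical integer ages for illustration; not the germancredit data.
ages <- c(19L, 25L, 26L, 27L, 28L, 34L, 35L, 36L, 37L, 60L)

# Right-closed bins, using the breaks shown for binsR.
right_counts <- table(cut(ages, breaks = c(-Inf, 25, 27, 34, 36, Inf), right = TRUE))

# Left-closed bins, using the breaks shown for binsL
# (each finite break is one higher than in binsR).
left_counts <- table(cut(ages, breaks = c(-Inf, 26, 28, 35, 37, Inf), right = FALSE))

# For whole-number data both settings describe the same partition,
# so the counts agree bin by bin even though the labels differ.
print(as.vector(right_counts))
print(as.vector(left_counts))
```

For non-integer data the two settings would generally give different counts, since a value such as 25.5 would change bins.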
from scorecard.
Thanks so much @ShichenXie, the example works perfectly, but when I use woebin(..., x = NULL) the bins are still closed on the left. Please see the reprex below.
Lastly, what do you think about prefixing all the options with the package name? For example, the dplyr and data.table packages use this naming pattern.
options(scorecard.bin_close_right = TRUE)
# like:
options(datatable.auto.index = TRUE)
options(dplyr.show_progress = FALSE)
Thanks again!
# devtools::install_github("shichenxie/scorecard")
library(scorecard)
packageVersion("scorecard")
#> [1] '0.3.2.999'
options(bin_close_right = TRUE)
data("germancredit")
# example 1: age.in.years plus one other variable
bins1 <- woebin(germancredit, y = "creditability", x = c("age.in.years", "duration.in.month"))
#> [INFO] creating woe binning ...
bins1$age.in.years[, c(1, 2, 3)]
#>        variable       bin count
#> 1: age.in.years (-Inf,25]   190
#> 2: age.in.years   (25,27]   101
#> 3: age.in.years   (27,34]   257
#> 4: age.in.years   (34,36]    79
#> 5: age.in.years (36, Inf]   373
# example 2: x = NULL
# the bins are still closed on the left
bins2 <- woebin(germancredit, y = "creditability")
#> [INFO] creating woe binning ...
bins2$age.in.years[, c(1, 2, 3)]
#>        variable       bin count
#> 1: age.in.years [-Inf,26)   190
#> 2: age.in.years   [26,28)   101
#> 3: age.in.years   [28,35)   257
#> 4: age.in.years   [35,37)    79
#> 5: age.in.years [37, Inf)   373
Created on 2021-05-24 by the reprex package (v2.0.0)
from scorecard.
Good point. I have changed the option to options(scorecard.bin_close_right = TRUE).
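For anyone wondering how a package-prefixed option is typically read, here is a minimal sketch with base R's options and getOption. The helper read_close_right is hypothetical and is not part of scorecard; how the package reads the option internally is not shown in this thread.

```r
# Hypothetical helper, not part of scorecard: read the prefixed option
# and fall back to FALSE when it has not been set.
read_close_right <- function() {
  isTRUE(getOption("scorecard.bin_close_right", default = FALSE))
}

# Clear the option first so the fallback is visible; keep the old value.
old <- options(scorecard.bin_close_right = NULL)
unset_value <- read_close_right()

# Set the prefixed option the way a user would.
options(scorecard.bin_close_right = TRUE)
set_value <- read_close_right()

# Restore whatever was set before this example ran.
options(old)

print(c(unset = unset_value, set = set_value))
```

The prefix keeps the name from colliding with an option of the same name set by another package, which is the reason dplyr and data.table follow the same pattern.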
from scorecard.
This issue should be solved. I am closing it now.
from scorecard.