
Comments (6)

ShichenXie commented on June 25, 2024

The woebin function uses cut(..., right = FALSE) by default for business reasons. Take age, for example: people in their 20s and 30s are usually split into different groups, so a value such as 30 should start a new bin rather than close the previous one.

I have added this feature in response to your issue. Please update the package to the latest version on GitHub. You can close the bins on the right via options(bin_close_right = TRUE); see the last example of the woebin function.
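To illustrate the difference, here is a minimal sketch using base R's cut() only. The ages and breaks are made up for the example and are not taken from germancredit or from woebin's internals.

```r
# Hypothetical ages sitting on and around the decade boundaries.
ages <- c(19, 20, 29, 30, 39)
breaks <- c(-Inf, 20, 30, Inf)

# right = FALSE: intervals are [a, b), so 20 and 30 each start a new bin.
left_closed <- cut(ages, breaks = breaks, right = FALSE)

# right = TRUE (the default of cut): intervals are (a, b],
# so 20 and 30 fall into the same bin as the values just below them.
right_closed <- cut(ages, breaks = breaks, right = TRUE)

# Side-by-side view of the bin assigned to each age.
data.frame(ages, left_closed, right_closed)
```

With right = FALSE, 20 and 29 share a bin and 30 joins 39; with right = TRUE, 20 is grouped with 19 and 30 with 29.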

from scorecard.

jbkunst commented on June 25, 2024

Thanks for your response. I understand the argument for using right = FALSE.

About the feature: I tested it, but I ran into some problems:

  1. In example 1, options(bin_close_right = TRUE) followed by woebin(germancredit, y = 'creditability', x = 'age.in.years') creates a missing category, and the counts do not match those obtained with cut.

  2. Using x = NULL in woebin produces bins with right = FALSE even when options(bin_close_right = TRUE) is set.

Tell me if I can help. Thanks.

# devtools::install_github("shichenxie/scorecard")

.Platform$OS.type
#> [1] "windows"

library(scorecard)


packageVersion("scorecard")
#> [1] '0.3.2.999'

options(bin_close_right = TRUE)

data("germancredit")

# example 1
bins <- woebin(germancredit, y = "creditability", x = c("age.in.years", "duration.in.month"))
#> [INFO] creating woe binning ...
bins$age.in.years
#>        variable       bin count count_distr neg pos   posprob        woe
#> 1: age.in.years   missing    16       0.016  10   6 0.3750000  0.3364722
#> 2: age.in.years (-Inf,24]   174       0.174 100  74 0.4252874  0.5461928
#> 3: age.in.years   (24,26]   101       0.101  74  27 0.2673267 -0.1609304
#> 4: age.in.years   (26,33]   257       0.257 172  85 0.3307393  0.1424546
#> 5: age.in.years   (33,35]    79       0.079  67  12 0.1518987 -0.8724881
#> 6: age.in.years (35, Inf]   373       0.373 277  96 0.2573727 -0.2123715
#>         bin_iv  total_iv  breaks is_special_values
#> 1: 0.001922698 0.1312002 missing             FALSE
#> 2: 0.056700011 0.1312002      24             FALSE
#> 3: 0.002528906 0.1312002      26             FALSE
#> 4: 0.005359008 0.1312002      33             FALSE
#> 5: 0.048610052 0.1312002      35             FALSE
#> 6: 0.016079553 0.1312002     Inf             FALSE

setNames(bins$age.in.years$count, bins$age.in.years$bin)
#>   missing (-Inf,24]   (24,26]   (26,33]   (33,35] (35, Inf] 
#>        16       174       101       257        79       373

table(
  cut(
    germancredit$age.in.years,
    breaks = c(-Inf, na.omit(as.numeric(bins$age.in.years$breaks)))
  )
)
#> Warning in na.omit(as.numeric(bins$age.in.years$breaks)): NAs introduced by
#> coercion
#> 
#> (-Inf,24]   (24,26]   (26,33]   (33,35] (35, Inf] 
#>       149        91       276        72       412

# example 2
bins <- woebin(germancredit, y = "creditability")
#> [INFO] creating woe binning ...

bins$age.in.years$bin
#> [1] "[-Inf,26)" "[26,28)"   "[28,35)"   "[35,37)"   "[37, Inf)"

Created on 2021-05-23 by the reprex package (v2.0.0)


ShichenXie commented on June 25, 2024

It should be fixed. Please upgrade to the latest version and try again.

library(scorecard)
data("germancredit")

options(bin_close_right = TRUE)
binsR <- woebin(germancredit, y = "creditability", x = c("age.in.years"))
#> [INFO] creating woe binning ...
binsR
#> $age.in.years
#>        variable       bin count count_distr neg pos   posprob        woe
#> 1: age.in.years (-Inf,25]   190       0.190 110  80 0.4210526  0.5288441
#> 2: age.in.years   (25,27]   101       0.101  74  27 0.2673267 -0.1609304
#> 3: age.in.years   (27,34]   257       0.257 172  85 0.3307393  0.1424546
#> 4: age.in.years   (34,36]    79       0.079  67  12 0.1518987 -0.8724881
#> 5: age.in.years (36, Inf]   373       0.373 277  96 0.2573727 -0.2123715
#>         bin_iv  total_iv breaks is_special_values
#> 1: 0.057921024 0.1304985     25             FALSE
#> 2: 0.002528906 0.1304985     27             FALSE
#> 3: 0.005359008 0.1304985     34             FALSE
#> 4: 0.048610052 0.1304985     36             FALSE
#> 5: 0.016079553 0.1304985    Inf             FALSE

options(bin_close_right = FALSE)
binsL <- woebin(germancredit, y = "creditability", x = c("age.in.years"))
#> [INFO] creating woe binning ...
binsL
#> $age.in.years
#>        variable       bin count count_distr neg pos   posprob        woe
#> 1: age.in.years [-Inf,26)   190       0.190 110  80 0.4210526  0.5288441
#> 2: age.in.years   [26,28)   101       0.101  74  27 0.2673267 -0.1609304
#> 3: age.in.years   [28,35)   257       0.257 172  85 0.3307393  0.1424546
#> 4: age.in.years   [35,37)    79       0.079  67  12 0.1518987 -0.8724881
#> 5: age.in.years [37, Inf)   373       0.373 277  96 0.2573727 -0.2123715
#>         bin_iv  total_iv breaks is_special_values
#> 1: 0.057921024 0.1304985     26             FALSE
#> 2: 0.002528906 0.1304985     28             FALSE
#> 3: 0.005359008 0.1304985     35             FALSE
#> 4: 0.048610052 0.1304985     37             FALSE
#> 5: 0.016079553 0.1304985    Inf             FALSE

Created on 2021-05-24 by the reprex package (v1.0.0)
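The two outputs above have identical counts, with every finite break in the right-closed version one less than in the left-closed version. That is what one would expect for integer-valued data, since x < 26 and x <= 25 select the same integers. A small sketch of that equivalence using base R only; the ages are simulated, and this is not a description of how woebin computes its breaks.

```r
# Simulated integer ages (an assumption for this sketch, not germancredit).
set.seed(1)
ages <- sample(19:75, size = 200, replace = TRUE)

# Breaks as printed above: left-closed, and right-closed shifted down by one.
left_breaks  <- c(-Inf, 26, 28, 35, 37, Inf)
right_breaks <- c(-Inf, 25, 27, 34, 36, Inf)

left_counts  <- as.vector(table(cut(ages, left_breaks,  right = FALSE)))
right_counts <- as.vector(table(cut(ages, right_breaks, right = TRUE)))

# For integer data the two binnings put every observation in the same group.
identical(left_counts, right_counts)
```

The equivalence does not hold for non-integer values: 25.5 falls in the first bin under the left-closed breaks but in the second under the right-closed ones.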


jbkunst commented on June 25, 2024

Thanks so much @ShichenXie, the example works perfectly, but when I use woebin(..., x = NULL) the bins are still closed on the left. Please see the reprex below.

Lastly, what do you think about prefixing all the options with the package name? For example, the dplyr and data.table packages use this naming pattern.

options(scorecard.bin_close_right = TRUE)

# like:
options(datatable.auto.index = TRUE)
options(dplyr.show_progress = FALSE)

Thanks again!

# devtools::install_github("shichenxie/scorecard")
library(scorecard)

packageVersion("scorecard")
#> [1] '0.3.2.999'

options(bin_close_right = TRUE)

data("germancredit")

# example 1: only age.in.years and one other variable
bins1 <- woebin(germancredit, y = "creditability", x = c("age.in.years", "duration.in.month"))
#> [INFO] creating woe binning ...
bins1$age.in.years[, c(1, 2, 3)]
#>        variable       bin count
#> 1: age.in.years (-Inf,25]   190
#> 2: age.in.years   (25,27]   101
#> 3: age.in.years   (27,34]   257
#> 4: age.in.years   (34,36]    79
#> 5: age.in.years (36, Inf]   373

# example 2: x = NULL
# the bins are still closed on the left
bins2 <- woebin(germancredit, y = "creditability")
#> [INFO] creating woe binning ...
bins2$age.in.years[, c(1, 2, 3)]
#>        variable       bin count
#> 1: age.in.years [-Inf,26)   190
#> 2: age.in.years   [26,28)   101
#> 3: age.in.years   [28,35)   257
#> 4: age.in.years   [35,37)    79
#> 5: age.in.years [37, Inf)   373

Created on 2021-05-24 by the reprex package (v2.0.0)


ShichenXie commented on June 25, 2024

Good point. I have changed the argument to options(scorecard.bin_close_right=TRUE).
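For reference, a minimal sketch of setting and reading a package-prefixed option with base R. The option name comes from the comment above; the FALSE fallback passed to getOption() is an assumption for illustration, not a statement about the package's internal default handling.

```r
# Set the prefixed option and keep the previous value so it can be restored.
old <- options(scorecard.bin_close_right = TRUE)

# Read it back; the fallback is used only if the option is unset.
close_right <- getOption("scorecard.bin_close_right", default = FALSE)
close_right

# Restore whatever was set before (an unset option becomes unset again).
options(old)
```

Saving the return value of options() and passing it back is the usual way to change a global option temporarily without leaking the change into the rest of the session.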


ShichenXie commented on June 25, 2024

This issue should be solved. I am closing it now.

