
Comments (6)

ShichenXie commented on June 24, 2024

It could be a problem if you have many variables that need to be handled this way.
I'll add this feature to the woebin function. Thanks.

from scorecard.

ShichenXie commented on June 24, 2024

Thanks. You can specify the breakpoints via the breaks_list option of the woebin function, and you can get the optimal binning from the dataset that excludes the two special values.

library(scorecard)
library(data.table)
dat <- data.table(
  y = c(0,0,0,1,1,1,1,1,0,0,1,1,1,0,0,0,1,1,1,0),
  x = c(1,2,3,4,5,888,888,888,9,10,666,666,666,666,15,16,17,18,19,20)
)

# get optimal breakpoints for rest dataset
bins <- woebin(dat[x != 666 & x != 888], "y")

# specify the breakpoints
bins2 <- woebin(dat, "y", breaks_list = list(x=c(16, 18, 666, 888)))
woebin_plot(bins2)


xiaoguozhi commented on June 24, 2024

Thanks, very detailed answer. In my case, the values '666' and '888' are categorical, so they should be converted to a factor before calling woebin.
Meanwhile, the following code treats '666' and '888' as numerical values:
bins2 <- woebin(dat, "y", breaks_list = list(x=c(16, 18, 666, 888)))
In my opinion, I can do the following:
library(scorecard)
library(data.table)
library(dplyr)
dat<-data.table( y=c(0,0,0,1,1,0,0,1,0,0,1,1,1,0,0,0,1,1,1,0), x=c(1,2,3,4,5,888,888,888,9,10,666,666,666,666,15,16,17,18,19,20))

# special values
sp<-c(666,888)
#get the special data
dat_sp<-filter(dat, x %in% sp)
#get normal data
dat_nor<-filter(dat, !x %in% sp)

# convert it to a factor
dat_sp$x<-as.factor(dat_sp$x)
bins_sp <- woebin(dat_sp, "y")
woebin_plot(bins_sp)

# bin the normal data
bins_nor <- woebin(dat_nor, "y")
woebin_plot(bins_nor)

Now the questions are: 1) how to combine these two plots into one plot; 2) how to combine the two sets of WoE values for this variable, because we can't simply use rbind. If we have many such variables, how do we get the WoE? What really worries me is that we can't bin many variables automatically. For example, many functions in your package support batch processing; obviously, if we bin special values and normal values separately, that breaks batch processing.
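One possible way to keep everything in a single pass is sketched below in base R. It is only an illustration under stated assumptions: the helper bin_with_special is hypothetical and not part of scorecard, the breakpoint 10 is arbitrary rather than optimised, and WoE is assumed to be ln(share of y = 1 in the bin / share of y = 0 in the bin).

```r
# Hypothetical helper: label each value with a bin, keeping special values apart
bin_with_special <- function(x, breaks, special) {
  # Ordinary values fall into left-closed intervals defined by `breaks`
  normal <- as.character(cut(x, breaks = c(-Inf, breaks, Inf), right = FALSE))
  # Special values keep their own label instead of an interval
  ifelse(x %in% special, as.character(x), normal)
}

dat <- data.frame(
  y = c(0,0,0,1,1,0,0,1,0,0,1,1,1,0,0,0,1,1,1,0),
  x = c(1,2,3,4,5,888,888,888,9,10,666,666,666,666,15,16,17,18,19,20)
)

# Breakpoint 10 is an arbitrary illustration, not an optimised value
dat$bin <- bin_with_special(dat$x, breaks = 10, special = c(666, 888))

# One table for all bins, so every WoE value shares the same totals
tab <- table(dat$bin, dat$y)
woe <- log((tab[, "1"] / sum(tab[, "1"])) / (tab[, "0"] / sum(tab[, "0"])))
print(round(woe, 3))
```

Because the special and ordinary bins live in one table, there is nothing to rbind afterwards, and the same helper could be applied column by column for many variables.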


xiaoguozhi commented on June 24, 2024

# get optimal breakpoints for rest dataset
bins <- woebin(dat[x != 666 & x != 888], "y")
Here, can we really get the optimal breakpoints once we omit some special samples?
Also, I realized that computing WoE separately is wrong.
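A minimal base-R sketch of why separate computation gives different numbers, assuming WoE is defined as ln(share of y = 1 in the bin / share of y = 0 in the bin). The helper woe_by_bin is hypothetical and not part of scorecard, and "other" is just a placeholder bin for all non-special rows.

```r
# Data from the example above
y <- c(0,0,0,1,1,0,0,1,0,0,1,1,1,0,0,0,1,1,1,0)
x <- c(1,2,3,4,5,888,888,888,9,10,666,666,666,666,15,16,17,18,19,20)

# WoE of each bin relative to the totals of the rows passed in
# (assumed convention: ln(share of y == 1 / share of y == 0))
woe_by_bin <- function(bin, y) {
  pos <- tapply(y == 1, bin, sum)
  neg <- tapply(y == 0, bin, sum)
  log((pos / sum(pos)) / (neg / sum(neg)))
}

sp <- c(666, 888)
is_sp <- x %in% sp

# Special values binned on their own subset: totals come from 7 rows only
woe_subset <- woe_by_bin(x[is_sp], y[is_sp])

# Special values binned together with the rest: totals come from all 20 rows
bin_full <- ifelse(is_sp, as.character(x), "other")
woe_full <- woe_by_bin(bin_full, y)[c("666", "888")]

print(round(woe_subset, 3))
print(round(woe_full, 3))
```

On this data the subset version gives roughly 0.811 for 666 and -0.981 for 888, while the full-data version gives roughly 1.299 and -0.492. The bin counts are identical; only the totals in the denominators change.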


xiaoguozhi commented on June 24, 2024

Thanks again for your nice solution! Looking forward to your improved version.


ShichenXie commented on June 24, 2024

See the following example:

library(scorecard)

dat <- data.frame(y=c(0,0,0,1,1,1,1,1,0,0,1,1,1,0,0,0,1,1,1,0),
x=c(1,2,3,4,5,888,888,888,9,10,666,666,666,666,15,16,17,18,19,20))

# treat the two values as two separate classes
bin = woebin(dat, "y", special_values = c(666,888))
# treat the two values as a single class
bin2 = woebin(dat, "y", special_values = c("666%,%888"))

