Comments (8)
OK, setting count_distr_limit = 0.01 solves this. Thanks!
from scorecard.
> iv(df, "pflagall", "cus_cus_classall")
           variable info_value
1: cus_cus_classall  0.8214501
> woebin(df, "pflagall", "cus_cus_classall")
[INFO] creating woe binning ...
$cus_cus_classall
           variable   bin  count count_distr  good    bad   badprob woe bin_iv total_iv breaks is_special_values
1: cus_cus_classall 1%,%3 258899           1 15924 242975 0.9384934   0      0        0  1%,%3             FALSE
> smbinning.factor(df, "pflagall", "cus_cus_classall")$ivtable
  Cutpoint CntRec CntGood CntBad CntCumRec CntCumGood CntCumBad PctRec GoodRate BadRate    Odds  LnOdds     WoE     IV
1    = '1' 253491  241213  12278    253491     241213     12278 0.9791   0.9516  0.0484 19.6460  2.9779  0.2527 0.0560
2    = '3'   5408    1762   3646    258899     242975     15924 0.0209   0.3258  0.6742  0.4833 -0.7272 -3.4523 0.7654
3  Missing      0       0      0    258899     242975     15924 0.0000      NaN     NaN     NaN     NaN     NaN    NaN
4    Total 258899  242975  15924        NA         NA        NA 1.0000   0.9385  0.0615 15.2584  2.7251  0.0000 0.8214
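A quick way to see why iv() reports about 0.82 while woebin() reports 0 is to recompute WoE and IV by hand from the counts in the smbinning table. This base-R sketch uses the usual definitions WoE = ln(share of goods / share of bads) and IV = sum((share of goods - share of bads) * WoE); it is not scorecard's own code, and woe_iv is a hypothetical helper. Note that smbinning counts pflagall == 1 as "good" while woebin reports those rows as "bad" (compare the totals 242975 and 15924), which flips the sign of WoE but leaves IV unchanged.

```r
# Counts per category, copied from the smbinning ivtable above.
# smbinning labels pflagall == 1 as "good"; scorecard labels it "bad".
# IV is symmetric in the two labels, so this does not change the total.
good <- c(`1` = 241213, `3` = 1762)
bad  <- c(`1` = 12278,  `3` = 3646)

# Hypothetical helper: WoE and per-bin IV from good/bad counts per bin
woe_iv <- function(good, bad) {
  dg  <- good / sum(good)   # distribution of goods across bins
  db  <- bad / sum(bad)     # distribution of bads across bins
  woe <- log(dg / db)       # smbinning sign convention
  iv  <- (dg - db) * woe    # information value contributed by each bin
  data.frame(bin = names(good), woe = woe, bin_iv = iv, row.names = NULL)
}

# Categories '1' and '3' kept as separate bins
separate <- woe_iv(good, bad)
print(separate)
sum(separate$bin_iv)   # total IV, close to iv()'s 0.8214501

# Both categories merged into one bin, as woebin did with the default
# count_distr_limit: both distributions equal 1, so WoE and IV are 0
merged <- woe_iv(c(`1%,%3` = sum(good)), c(`1%,%3` = sum(bad)))
print(merged)
```

With the categories kept apart, almost all of the information comes from the small category '3'; once it is merged away, a single bin cannot separate anything, which is why woebin printed woe = 0 and total_iv = 0.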
To build a robust model with stable WoE values, each bin should not contain too few observations. A minimum count distribution of 5% per bin is usually recommended.
But this variable is indeed a very strong indicator in our real business, so setting the limit to 0.01 is a trade-off.
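The arithmetic behind the 5% versus 1% choice can be checked directly from the row counts in the tables above: category '3' holds about 2.1% of all rows. The base-R sketch below only compares each category's share with a threshold; it is not scorecard's actual merging algorithm, and below_limit is a hypothetical helper. The woebin call in the final comment uses the count_distr_limit argument discussed in this thread.

```r
# Rows per category of cus_cus_classall, from the tables above
counts <- c(`1` = 253491, `3` = 5408)

# Share of all rows that falls into each category
count_distr <- counts / sum(counts)
round(count_distr, 4)

# Hypothetical helper: categories whose share is below a minimum per bin
below_limit <- function(count_distr, limit) {
  names(count_distr)[count_distr < limit]
}

below_limit(count_distr, 0.05)  # "3" is too small under a 5% minimum
below_limit(count_distr, 0.01)  # empty: both categories can stay separate

# With scorecard installed and df loaded, the setting used in this thread:
# woebin(df, "pflagall", "cus_cus_classall", count_distr_limit = 0.01)
```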
An off-topic question:
I am using woebin on a dataset with 250,000 rows and 309 columns on Windows Server 2016, on a machine with 48 cores and 96 threads.
The run time of woebin is the same with no_cores = 2, 4, 6 or 8: 6 to 7 minutes.
Is this normal? Windows Task Manager shows that background R processes are started, but the run time is always the same.
What about 2 and 20 cores?
In my experience, using more cores should reduce the running time. Parallel computation pays off when you have many features, e.g. thousands of columns; your dataset does not have that many columns.
This package depends on data.table, which might do some parallel computation in the backend. I'm not sure whether this is related.
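The point that parallelism mainly helps with many features can be illustrated with the base parallel package: the unit of work is one column, so a few hundred quick columns leave little to split and worker start-up costs can dominate. This is a generic sketch, not scorecard's internal implementation; bin_one_column is a hypothetical stand-in for binning a single variable. It uses a PSOCK cluster via makeCluster() and parLapply(), which also works on Windows, whereas fork-based mclapply() does not support more than one core there. Separately, data.table::getDTthreads() reports how many threads data.table itself is using.

```r
library(parallel)

# Hypothetical per-feature task standing in for binning one column:
# equal-frequency cut points of a numeric vector
bin_one_column <- function(x, n_bins = 5) {
  unique(quantile(x, probs = seq(0, 1, length.out = n_bins + 1), names = FALSE))
}

set.seed(1)
dat <- as.data.frame(matrix(rnorm(10000 * 20), ncol = 20))

# Serial: one column after another
serial <- lapply(dat, bin_one_column)

# Parallel: columns are split across 2 worker processes (PSOCK cluster)
cl <- makeCluster(2)
par_res <- parLapply(cl, dat, bin_one_column)
stopCluster(cl)

# Same cut points either way; only the scheduling differs
identical(serial, par_res)
```

Wrapping the serial and parallel calls in system.time() on data with few or many columns shows when the extra workers are worth their start-up and data-transfer cost.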
Bad luck: with 20 cores it still takes the same time. The data now has 260,000 rows and 602 columns; woebin takes 13 minutes with 8 cores and also 13 minutes with 20 cores.
It seems OK on a MacBook Pro 2017.
The issue on your computer might be due to the OpenBLAS settings. You can read the posts in "Parallel processing in R limited". Actually, I don't fully understand it; try to figure it out yourself if you have time.
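library(scorecard)
data("germancredit", package = "scorecard")  # example dataset shipped with scorecard
# time woebin on the same data with 1 core, then with 2 cores
system.time(a <- woebin(germancredit, 'creditability', print_info=FALSE, no_cores = 1))
system.time(b <- woebin(germancredit, 'creditability', print_info=FALSE, no_cores = 2))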
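To answer the "2 and 20 cores" question more systematically, the same call can be timed over a list of core counts. In this sketch time_by_cores is a hypothetical helper, not part of scorecard; the woebin arguments (print_info, no_cores) are the ones already used above, and the scorecard part only runs when the package is installed. On the small germancredit data, parallel start-up costs may hide any speed-up, so the timings are most informative on data with many columns.

```r
# Hypothetical helper (not part of scorecard): run f(n) once for each core
# count n and record the elapsed wall-clock seconds
time_by_cores <- function(f, cores) {
  elapsed <- vapply(cores, function(n) system.time(f(n))[["elapsed"]], numeric(1))
  data.frame(no_cores = cores, elapsed_sec = elapsed)
}

# Only runs when scorecard is installed
if (requireNamespace("scorecard", quietly = TRUE)) {
  data("germancredit", package = "scorecard")
  timings <- time_by_cores(
    function(n) scorecard::woebin(germancredit, "creditability",
                                  print_info = FALSE, no_cores = n),
    cores = c(1, 2, 4)
  )
  print(timings)
}
```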
Thanks a lot. My R uses the basic BLAS, not OpenBLAS. My guess is that R's parallel packages malfunction on Windows.
I will try to figure it out if I have time.
Thank you for the wonderful package!
Related Issues (20)
- cut argument right= FALSE
- `woebin`: some count are NA, but neg/pos counts are ok
- Problem with perf_eva density plot
- `woebin`: break_list doesn't work
- Formulas
- Error with woebin equal-frequency binning
- Gini with to = 'bin'
- Scorecard2 issue with probability set to TRUE
- question min and max score
- Information Value from scorecard::iv() is not equal to Information value from scorecard::woebin()
- Question about bin intervals
- Problem with woebin when break_list is specified
- Line plot for woebin_adj with line_value = "woe" resets to positive probability after adjusting breaks
- Native pipe |> requires R >=4.1
- Cannot install.packages("scorecard") on windows
- Fail to install - 0.3.9
- Definition of offset in the scorecard function
- Is there any way to export the scorecard to PMML?
- Error after latest update
- Woe and points do not follow the same pattern