Comments (2)
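Hi, I'm very, very sorry that it took me almost two months to reply. I haven't been checking this mailbox regularly since the New Year holiday, so your email got buried... I only came across it by chance today. Has the problem been resolved? Thank you very much for using what I wrote; I hope the late reply did not cause you too much inconvenience.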
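1. "KeyError: 450~inf"
I looked at the relevant code. Is "KeyError: 450~inf" raised by the line scores = map_np(intervals, score_dict) in the function _applyScoreCard? After checking carefully, I found that in a few cases the intervals produced by ChiMerge during feature discretization did not form a continuous range from negative infinity to positive infinity. This bug has been fixed in the latest version; you can run pip install scorecardbundle -U to upgrade.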
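In detail: when a feature has few unique values in the training set (for example, all values are 0), the largest of the boundaries used to split the intervals equals the feature's maximum value. The old code did deliberately append inf, but because this library uses left-open, right-closed intervals throughout, the feature's maximum value was assigned to the interval xxx~max rather than xxx~inf. If a larger number then appeared in the test set, the subsequent steps based on these intervals, such as WOE and the scorecard, raised an error. Adjusting ChiMerge is therefore the once-and-for-all fix for this bug.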
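The mechanism can be reproduced in a few lines. This is only a minimal sketch, not the library's code: label_interval, uppers and the score values are hypothetical names and data made up for illustration, and only the "low~high" label style and the name score_dict come from the report above.

```python
from bisect import bisect_left

INF = float("inf")


def label_interval(x, uppers):
    """Label the left-open, right-closed interval (low, high] that holds x.

    uppers is the sorted list of upper edges; its last element must be inf.
    """
    # bisect_left finds the first edge >= x, so x == edge lands in (low, edge]
    i = bisect_left(uppers, x)
    low = -INF if i == 0 else uppers[i - 1]
    return f"{low}~{uppers[i]}"


# Hypothetical feature whose largest finite boundary equals its training maximum
uppers = [450.0, INF]
train_values = [0.0, 120.0, 450.0]

# Score table built only from the intervals observed in training (made-up score)
score_dict = {label_interval(v, uppers): 10 for v in train_values}
print(score_dict)

# A larger test value maps to an interval that never received a score
try:
    score_dict[label_interval(500.0, uppers)]
except KeyError as err:
    print("KeyError:", err)
```

Because 450 sits on the closed right edge, every training value falls in -inf~450.0, so the key 450.0~inf is never created and the lookup for 500 fails.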
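- Discretization interval spans negative infinity to positive infinity
This library currently uses the ChiMerge algorithm to discretize features. The algorithm looks at how the response rate of the dependent variable differs across a feature's value intervals, merges similar intervals, and in the end keeps only intervals that differ with statistical significance. If none of a feature's intervals differ significantly, ChiMerge merges them into a single interval, i.e. -inf~inf. In that case the feature can be considered to lack discriminative power and can be dropped. If you still want to keep the feature, set ChiMerge's min_intervals parameter to 2 or larger; merging then stops once only min_intervals intervals remain, so the feature is output with multiple intervals.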
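To make the effect of min_intervals concrete, here is a simplified, self-contained sketch of chi-square merging. It is not the library's implementation: toy_chimerge, chi2_pair and the 3.841 threshold (the 0.05 critical value of chi-square with one degree of freedom) are assumptions chosen for illustration, and the real ChiMerge class may use different stopping rules.

```python
def chi2_pair(a, b):
    """Chi-square statistic for two adjacent intervals given as [n_neg, n_pos]."""
    total = sum(a) + sum(b)
    chi2 = 0.0
    for row in (a, b):
        for j in (0, 1):
            expected = sum(row) * (a[j] + b[j]) / total
            if expected > 0:
                chi2 += (row[j] - expected) ** 2 / expected
    return chi2


def toy_chimerge(x, y, min_intervals=1, threshold=3.841):
    """Return upper edges of right-closed intervals after merging (last is inf)."""
    counts = {}
    for xi, yi in zip(x, y):
        counts.setdefault(xi, [0, 0])[yi] += 1
    edges = sorted(counts)                # one starting interval per unique value
    groups = [counts[e] for e in edges]
    while len(groups) > max(min_intervals, 1):
        chis = [chi2_pair(groups[i], groups[i + 1]) for i in range(len(groups) - 1)]
        i = min(range(len(chis)), key=chis.__getitem__)
        if chis[i] >= threshold:          # every adjacent pair differs significantly
            break
        groups[i] = [groups[i][0] + groups[i + 1][0], groups[i][1] + groups[i + 1][1]]
        del groups[i + 1]
        del edges[i]                      # merged interval keeps the higher upper edge
    edges[-1] = float("inf")
    return edges


# A feature with the same 50% response rate at every value: no discriminative power
x = [1] * 20 + [2] * 20 + [3] * 20
y = ([0] * 10 + [1] * 10) * 3

single = toy_chimerge(x, y, min_intervals=1)
forced = toy_chimerge(x, y, min_intervals=2)
print(single)  # [inf]     -> one interval, -inf~inf
print(forced)  # [2, inf]  -> two intervals, -inf~2 and 2~inf
```

With min_intervals=1 everything collapses into -inf~inf; with min_intervals=2 merging stops early even though the two remaining intervals do not differ statistically.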
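Finally, thank you very much for your feedback. This library is my first open-source project, and looking at it now, I have been maintaining it far too infrequently... Sorry again.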
from scorecard-bundle.
For others who also hit a KeyError (e.g. KeyError: 450~inf) because the test set contains more extreme values than the training set: this issue has been resolved in the newest release. To avoid the bug, run pip install scorecardbundle -U to update to the newest version.
Here is an explanation of what happened. When a feature has fewer unique values than the min_intervals parameter (e.g. all values of the feature are 0), the largest interval boundary may equal the maximum value of the feature. In this case, although I deliberately added inf to the boundaries, the maximum value of the feature was still assigned to the interval "xxx~max" rather than "xxx~inf", since all intervals used in this module are closed on the right. This caused a KeyError in WOE, Scorecard and other subsequent steps that rely on the intervals whenever the test set contained values larger than the maximum value in the training set. Adjusting ChiMerge therefore fixes this bug at its root.
Thanks songshijun007 again for bringing up this bug.