
Comments (2)

Lantianzz avatar Lantianzz commented on June 29, 2024

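Hello, I'm very sorry that it took me almost two months to reply. I haven't checked this mailbox often since the New Year holiday, so your email got buried... I only came across it by chance today. Has the problem been resolved? Thank you very much for using what I wrote; I hope the late reply didn't cause you too much inconvenience...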

1. "KeyError: 450~inf"

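I looked at the relevant code. Is "KeyError: 450~inf" raised by the line scores = map_np(intervals, score_dict) in the _applyScoreCard function? After checking carefully, I found that in a few cases the intervals produced by ChiMerge feature discretization do not form a continuous range from negative infinity to positive infinity. This bug has been fixed in the latest version; you can upgrade with pip install scorecardbundle -U.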

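In detail: when a feature has few unique values in the training set (for example, all values are 0), the largest of the boundaries used to split the intervals equals the feature's maximum value. The original code deliberately appended inf, but because this library uses left-open, right-closed intervals throughout, the feature's maximum value was assigned to the interval xxx~max rather than xxx~inf. If a larger number then appears in the test set, subsequent steps that rely on these intervals, such as WOE and the scorecard, raise an error. Adjusting ChiMerge is therefore the once-and-for-all fix for this bug.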
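The failure can be reproduced in a few lines. This is a minimal sketch in plain NumPy, not the library's code: `assign_intervals`, the score value 10, and the boundaries `[0, inf]` are assumptions made up to mirror the all-zeros case described above.

```python
import numpy as np

def assign_intervals(values, boundaries):
    """Label each value with its left-open, right-closed interval 'lo~hi'."""
    edges = np.concatenate(([-np.inf], np.asarray(boundaries, dtype=float)))
    # side="left" finds the first edge >= value, i.e. the interval (lo, hi]
    idx = np.searchsorted(edges, values, side="left")
    idx = np.clip(idx, 1, len(edges) - 1)
    return [f"{float(edges[i - 1])}~{float(edges[i])}" for i in idx]

# Training set: every value is 0, so the largest finite boundary equals the max.
train = np.array([0, 0, 0, 0])
boundaries = [0, np.inf]  # assumed output of the old discretization step

train_intervals = assign_intervals(train, boundaries)
# Scores exist only for intervals seen during training (hypothetical score).
score_dict = {interval: 10 for interval in set(train_intervals)}

# Test set: contains a value larger than anything seen in training.
test = np.array([0, 5])
test_intervals = assign_intervals(test, boundaries)

missing_key = None
try:
    scores = [score_dict[interval] for interval in test_intervals]
except KeyError as err:
    missing_key = err.args[0]  # the interval that has no score
```

Here the training data only ever produces the key `-inf~0.0`, so the test value 5 lands in an interval that has no entry in the lookup table.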

2. The discretized interval spans negative infinity to positive infinity

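This library currently uses the ChiMerge algorithm for feature discretization. The algorithm looks at how the response rate of the dependent variable differs across a feature's value intervals, merges similar intervals, and ultimately keeps only intervals that differ with statistical significance. If none of a feature's intervals differ significantly, ChiMerge merges them into a single interval, -inf~inf. In that case the feature can be considered to lack discriminative power and can be dropped. If you still want to keep the feature, set ChiMerge's min_intervals parameter to 2 or more; merging then stops once only min_intervals intervals remain, so the feature is output with multiple intervals.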
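The merging behaviour can be illustrated with a simplified sketch. This is not the library's ChiMerge implementation: `chi2_pair`, `chimerge_sketch`, and the fixed threshold 3.841 (the 0.05 critical value of chi-square with one degree of freedom) are assumptions chosen for illustration only.

```python
import numpy as np

def chi2_pair(a, b):
    """Chi-square statistic for two adjacent intervals.

    a and b are [count of y == 0, count of y == 1] for each interval.
    """
    observed = np.array([a, b], dtype=float)
    row = observed.sum(axis=1, keepdims=True)
    col = observed.sum(axis=0, keepdims=True)
    expected = row * col / observed.sum()
    mask = expected > 0  # skip cells whose expected count is zero
    return float((((observed - expected) ** 2)[mask] / expected[mask]).sum())

def chimerge_sketch(x, y, min_intervals=1, threshold=3.841):
    """Return the upper boundaries of the merged intervals (the last is inf)."""
    values = np.unique(x)
    counts = [[int(((x == v) & (y == 0)).sum()),
               int(((x == v) & (y == 1)).sum())] for v in values]
    uppers = [float(v) for v in values]
    while len(counts) > max(min_intervals, 1):
        chis = [chi2_pair(counts[i], counts[i + 1])
                for i in range(len(counts) - 1)]
        i = int(np.argmin(chis))
        if chis[i] >= threshold:
            break  # every adjacent pair differs significantly
        # Merge interval i into i + 1; the merged interval keeps the
        # upper boundary of the right-hand interval.
        counts[i] = [counts[i][0] + counts[i + 1][0],
                     counts[i][1] + counts[i + 1][1]]
        del counts[i + 1]
        del uppers[i]
    uppers[-1] = np.inf
    return uppers

# A feature whose response rate is 50% at every value: no discriminative power.
x = np.repeat([1, 2, 3, 4], 10)
y = np.tile([0, 1], 20)

single = chimerge_sketch(x, y, min_intervals=1)  # collapses to -inf~inf
forced = chimerge_sketch(x, y, min_intervals=2)  # stops at two intervals
```

With `min_intervals=1` the uninformative feature collapses into a single interval, while `min_intervals=2` stops the merging early and keeps two intervals even though they are statistically indistinguishable.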

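Finally, thank you very much for your feedback. This library is my first open-source project, and looking back, I have not maintained it often enough... Sorry again.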

from scorecard-bundle.

Lantianzz avatar Lantianzz commented on June 29, 2024

For others who also encountered a KeyError (e.g. KeyError: 450~inf) caused by more extreme values in the test set: this issue has been resolved in the newest release. To avoid this bug, please run pip install scorecardbundle -U to update to the newest version.

Here is an explanation of what happened. When a feature has fewer unique values than the min_intervals parameter (e.g. all values of the feature are 0), the largest interval boundary may equal the maximum value of the feature. In this case, although I deliberately added 'inf' to the boundaries, the maximum value of the feature would still be assigned to the interval "xxx~max" rather than "xxx~inf", since all intervals used in this module are closed on the right. This causes a KeyError in WOE, Scorecard, and other subsequent steps that rely on the intervals whenever the test set contains values larger than the maximum value in the training set. Adjusting ChiMerge therefore fixes this bug at its root.
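One way such an adjustment could look is sketched below. This is an assumption for illustration, not the actual patch in the library: `extend_top_interval` and `label` are hypothetical helpers, and the boundaries `[300, 450, inf]` are made-up values.

```python
import numpy as np

def extend_top_interval(boundaries, train_max):
    """Drop finite boundaries >= train_max so the last interval ends at inf."""
    kept = [b for b in boundaries if np.isfinite(b) and b < train_max]
    return kept + [np.inf]

def label(value, boundaries):
    """Return the left-open, right-closed interval 'lo~hi' containing value."""
    edges = [-np.inf] + list(boundaries)
    for lo, hi in zip(edges[:-1], edges[1:]):
        if lo < value <= hi:
            return f"{float(lo)}~{float(hi)}"
    raise ValueError(f"no interval contains {value!r}")

original = [300, 450, np.inf]  # the top finite boundary equals the training max
adjusted = extend_top_interval(original, train_max=450)

before = (label(450, original), label(9999, original))
after = (label(450, adjusted), label(9999, adjusted))
```

Before the adjustment, the training maximum 450 and a larger test value 9999 fall into different intervals, and only the first one has been seen in training. After it, both fall into the same open-ended interval, so any lookup table built from the training intervals also covers larger test values.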

Thanks songshijun007 again for bringing up this bug.

