EBCIC: Exact Binomial Confidence Interval Calculator

These programs are mainly for researchers, developers, and designers who need to calculate binomial confidence intervals for the following parameters:

  • n: the number of Bernoulli or binomial trials.
  • k: the number of times the target event occurred.
  • confi_perc: the confidence percentage:
    • the two-sided confidence percentage when 0<k<n, or the one-sided confidence percentage when k=0 or k=n, where 0 < confi_perc < 100.
    • for a one-sided interval with 0<k<n, set confi_perc = (2 * confi_perc_for_one_sided - 100), where 50 < confi_perc_for_one_sided < 100.
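
The conversion for a one-sided interval with 0<k<n can be sketched as a small helper. The function name `to_confi_perc` is hypothetical and not part of ebcic; it only applies the formula stated above.

```python
def to_confi_perc(confi_perc_for_one_sided: float) -> float:
    """Convert a one-sided confidence percentage to the value to pass as confi_perc.

    Hypothetical helper (not part of ebcic); meant for 0 < k < n only.
    """
    if not 50 < confi_perc_for_one_sided < 100:
        raise ValueError("confi_perc_for_one_sided must be in (50, 100)")
    return 2 * confi_perc_for_one_sided - 100


# A 99.5% one-sided bound corresponds to confi_perc = 99.0.
print(to_confi_perc(99.5))  # prints 99.0
```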

EBCIC calculates binomial intervals exactly, i.e., it implements the Clopper-Pearson interval [CP34] without simplifying the mathematical equations, since such simplifications may degrade the intervals for certain combinations of the above parameters. EBCIC can also show graphs that compare exact intervals with approximated ones.
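
To illustrate what the Clopper-Pearson interval means, here is a minimal standard-library sketch that solves the two binomial tail equations by bisection for 0<k<n. It is not ebcic's implementation: the names `binom_cdf`, `_bisect`, and `clopper_pearson` are hypothetical, and the result is limited by ordinary double-precision arithmetic.

```python
import math
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_comb = 0.0  # log C(n, 0)
    total = 0.0
    for i in range(k + 1):
        if i > 0:
            log_comb += math.log(n - i + 1) - math.log(i)  # log C(n, i)
        total += math.exp(log_comb + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def _bisect(func: Callable[[float], float], target: float, increasing: bool) -> float:
    """Find p in [0, 1] with func(p) == target for a monotonic func."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, confi_perc: float) -> Tuple[float, float]:
    """Two-sided Clopper-Pearson interval (lower, upper); sketch for 0 < k < n."""
    if not 0 < k < n:
        raise ValueError("this sketch covers 0 < k < n only")
    if not 0 < confi_perc < 100:
        raise ValueError("confi_perc must be in (0, 100)")
    tail = (1 - confi_perc / 100) / 2
    # Lower limit: P(X >= k | p) = tail, which increases with p.
    lower = _bisect(lambda p: 1 - binom_cdf(k - 1, n, p), tail, increasing=True)
    # Upper limit: P(X <= k | p) = tail, which decreases with p.
    upper = _bisect(lambda p: binom_cdf(k, n, p), tail, increasing=False)
    return lower, upper


lower, upper = clopper_pearson(3, 20, 95.0)
print(f"lower={lower:.6f} upper={upper:.6f}")
```

The sum in `binom_cdf` has k+1 terms, so this sketch is only practical for moderate k; for real use, rely on ebcic itself.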

How to use

Jupyter notebook

  1. Open ebcic.ipynb with a Jupyter-notebook-compatible development environment such as Jupyter Notebook, JupyterLab, or Visual Studio Code.

  2. Run the following initial cells:

    # Run this cell, if `ebcic` package has not been installed yet:
    %pip install ebcic
    import ebcic
    from ebcic import *
  3. Run the cells you want to execute.

Command line

  1. Installation

    • When using the PyPI ebcic package:

      pip install ebcic
    • When using the GitHub ebcic repo:

      git clone https://github.com/KazKobara/ebcic.git
      cd ebcic
  2. Command-line help

    • Check the version and options:

      python -m ebcic -h
  3. See the examples below.

MATLAB (with Python and ebcic package)

  1. Install Python for MATLAB and the ebcic package according to this page.
  2. Open the sample MATLAB code file ebcic_in_matlab.m as a 'live script' as shown on this page.
  3. Edit and run the sections you want to execute.

NOTE: If you manage the edited file with git, save it as a MATLAB code file (*.m) before committing, or commit the live code file (*.mlx) to git LFS (Large File Storage), since live code files (*.mlx) are not git friendly. If necessary, also save it as an *.html file to check how it looks.

Examples

Print exact interval as text

Command line:

python -m ebcic -k 1 -n 501255 --confi-perc 99.0

Python Interpreter or Jupyter cell to run:

"""Print exact interval as text.
Edit the following parameters, k, n, confi_perc, and run this cell.
"""
print_interval(Params(
    k=1,             # Number of errors
    n=501255,        # Number of trials
    confi_perc=99.0  # Confidence percentage
        # for two-sided of 0<k<n where 0 < confi_perc < 100,
        # or for one-sided of k=0 or k=n.
        # For one-sided of 0<k<n, set
        # confi_perc = (2 * confi_perc_for_one_sided - 100)
        # where 50 < confi_perc_for_one_sided < 100.
    ))

Result:

===== Exact interval of p with 99.0 [%] two-sided (or 99.5 [%] one-sided) confidence  =====
Upper : 1.482295806e-05
Lower : 9.99998e-09
Width : 1.481295808e-05
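
As a quick sanity check of the 'Lower' value above: for k=1 the Clopper-Pearson lower limit solves 1 - (1 - p)^n = alpha/2, so it has the closed form p = 1 - (1 - alpha/2)^(1/n). The sketch below evaluates that expression with the standard library only; the function name `lower_bound_k1` is hypothetical and not part of ebcic.

```python
import math


def lower_bound_k1(n: int, confi_perc: float) -> float:
    """Closed-form two-sided Clopper-Pearson lower limit for k = 1."""
    tail = (1 - confi_perc / 100) / 2  # alpha / 2
    # 1 - (1 - tail)**(1/n), written with log1p/expm1 to avoid cancellation.
    return -math.expm1(math.log1p(-tail) / n)


print(f"{lower_bound_k1(501255, 99.0):.5e}")  # prints 9.99998e-09
```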

Depict graphs

Exact intervals and the line of k/n for k=1

This program can show not only the typical 95% and 99% confidence lines but also any confidence percentage lines.

Python Interpreter or Jupyter cell to run:

interval_graph(GraProps(
    # Set the range of k with k_*
    k_start=1,  # >= 0
    k_end=1,    # >= k_start
    k_step=1,   # >= 1
    # Edit the list of confidence percentages to depict, [confi_perc, ...],
    #   for two-sided of 0<k<n where 0 < confi_perc < 100, or
    #   for one-sided of k=0 or k=n.
    # NOTE For one-sided of 0<k<n, set 
    #   confi_perc=(2 * confi_perc_for_one_sided - 100)
    #   where 50 < confi_perc_for_one_sided < 100
    #   (though both lower and upper intervals are shown).
    confi_perc_list=[90, 95, 99, 99.9, 99.99],
    # Lines to depict
    line_list=[
        'with_exact',
        'with_line_kn',  # Line of k/n
    ],
    # savefig=True,  # uncomment on Python Interpreter 
    # fig_file_name='intervals.png',
    ))

Result:

[Figure: Exact intervals and the line of k/n for k=1]

Exact intervals for k=0 to 5

Python Interpreter or Jupyter cell to run:

interval_graph(GraProps(
    k_start=0,  # >= 0
    k_end=5,    # >= k_start
    line_list=['with_exact'],
    # savefig=True,  # uncomment on Python Interpreter 
    # fig_file_name='intervals.png',
    ))

Result:

[Figure: Exact intervals for k=0 to 5]

Comparison of exact and approximated intervals for k=0

Python Interpreter or Jupyter cell to run:

interval_graph(GraProps(
    k_start=0,    # >= 0
    k_end=0,      # >= k_start
    log_n_end=3,  # max(n) = k_end*10**log_n_end
    line_list=[
        'with_exact',
        'with_rule_of_la',  # rule of -ln(alpha)
                            # available only for k=0 and k=n
        #'with_normal',     # not available for k=0 and k=n
        'with_wilson',
        'with_wilson_cc',
        'with_beta_approx',
    ],
    # savefig=True,  # uncomment on Python Interpreter 
    # fig_file_name='intervals.png',
    ))

where the interval names that can be added to line_list, and their conditions, are as follows:

| Interval name (after 'with_') | Explanation | Condition |
|---|---|---|
| exact | Implementation of the Clopper-Pearson interval [CP34] without approximation. | |
| rule_of_la | 'Rule of -ln(a)' or 'Rule of -log_e(alpha)': a generalization of the 'Rule of three' [Lou81,HL83,JL97,Way00,ISO/IEC19795-1], which covers k=0 and alpha=0.05 (95% confidence percentage), to confidence percentages other than 95% and to k=n. | k=0 or k=n |
| wilson | Wilson score interval [Wil27]. | |
| wilson_cc | Wilson score interval with continuity correction [New98]. | |
| beta_approx | Approximated interval using the beta function. | |
| normal | Normal approximation interval, also known as the Wald confidence interval. | 0<k<n |

Result:

As the following figure shows, the 'rule of -ln(a)' is a good approximation for k=0 and large n, depending on the confidence percentage.

For k=0, interval_graph() (v0.0.3 or later) displays only upper intervals, since the lower intervals must be 0 (though some approximations, such as 'Wilson cc', output incorrect nonzero lower values).

[Figure: Comparison of exact and approximated intervals for k=0]
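
The relation between the 'rule of -ln(a)' and the exact upper limit for k=0 can be sketched numerically. This assumes the textbook form of the rule, -ln(alpha)/n, and the one-sided exact limit 1 - alpha^(1/n) obtained from (1 - p)^n = alpha; the function names are hypothetical and not ebcic's API.

```python
import math


def rule_of_la_upper(n: int, confi_perc: float) -> float:
    """Approximate one-sided upper limit for k = 0: -ln(alpha) / n."""
    alpha = 1 - confi_perc / 100
    return -math.log(alpha) / n


def exact_upper_k0(n: int, confi_perc: float) -> float:
    """Exact one-sided upper limit for k = 0: 1 - alpha**(1/n)."""
    alpha = 1 - confi_perc / 100
    return -math.expm1(math.log(alpha) / n)


# For 95%, -ln(0.05) is about 3, which gives the 'Rule of three'.
for n in (10, 100, 1000):
    approx = rule_of_la_upper(n, 95.0)
    exact = exact_upper_k0(n, 95.0)
    print(f"n={n}: rule={approx:.6f} exact={exact:.6f} ratio={approx / exact:.4f}")
```

The ratio approaches 1 as n grows, which matches the observation that the rule works well for large n.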

Comparison of exact and approximated intervals for k=1

Python Interpreter or Jupyter cell to run:

interval_graph(GraProps(
    k_start=1,  # >= 0
    k_end=1,    # >= k_start
    line_list=[
        'with_line_kn',  # Line of k/n
        # 'with_rule_of_la',  # available only for k=0 and k=n
        'with_exact',
        'with_normal',
        'with_wilson',
        'with_wilson_cc',
        'with_beta_approx',
    ],
    # savefig=True,  # uncomment on Python Interpreter 
    # fig_file_name='intervals.png',
    ))

Result:

As the following figures show, and as many papers such as [BLC01] warn, normal approximation intervals are not good approximations for small k.

The upper intervals of the other approximations look tight. The approximation using the beta function looks tight except at the boundary k=n=1.

[Figure: Comparison of exact and approximated intervals for k=1]
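
Why the normal approximation struggles for small k can be seen from the textbook formulas. The sketch below uses the standard Wald and Wilson score expressions with the standard library only; the function names are hypothetical, and this is not necessarily how ebcic computes 'with_normal' or 'with_wilson'.

```python
import math
from statistics import NormalDist


def z_value(confi_perc: float) -> float:
    """Two-sided standard normal quantile for the given confidence percentage."""
    return NormalDist().inv_cdf(0.5 + confi_perc / 200)


def wald_interval(k: int, n: int, confi_perc: float):
    """Normal approximation (Wald) interval: p_hat +/- z * sqrt(p_hat*(1-p_hat)/n)."""
    z = z_value(confi_perc)
    p_hat = k / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def wilson_interval(k: int, n: int, confi_perc: float):
    """Wilson score interval without continuity correction."""
    z = z_value(confi_perc)
    p_hat = k / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


print("Wald  :", wald_interval(1, 100, 99.0))
print("Wilson:", wilson_interval(1, 100, 99.0))
```

For k=1 and n=100 at 99%, the Wald lower limit is negative, which is impossible for a probability, while the Wilson lower limit stays above 0.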

Comparison of exact and approximated intervals for k=10

Python Interpreter or Jupyter cell to run:

interval_graph(GraProps(
    k_start=10,   # >= 0
    k_end=10,     # >= k_start
    log_n_end=2,  # max(n) = k_end*10**log_n_end
    line_list=[
        'with_exact',
        'with_normal',
        'with_wilson',
        'with_wilson_cc',
        'with_beta_approx',
    ],
    # savefig=True,  # uncomment on Python Interpreter 
    # fig_file_name='intervals.png',
    ))

Result:

For k=10, 'normal' still does not provide a good approximation.

[Figure: Comparison of exact and approximated intervals for k=10]

Comparison of exact and approximated intervals for k=100

Python Interpreter or Jupyter cell to run:

interval_graph(GraProps(
    k_start=100,  # >= 0
    k_end=100,    # >= k_start
    log_n_end=2,  # max(n) = k_end*10**log_n_end
    line_list=[
        'with_exact',
        'with_normal',
        'with_wilson',
        'with_wilson_cc',
        'with_beta_approx',
    ],
    # savefig=True,  # uncomment on Python Interpreter 
    # fig_file_name='intervals.png',
    ))

Result:

At least for k=100 and a confidence percentage of confi_perc=99.0, all these approximations look tight.

[Figure: Comparison of exact and approximated intervals for k=100]

  1. Download

    git clone https://github.com/KazKobara/ebcic.git
  2. Open the following file with your browser (after replacing <path to the downloaded ebcic> appropriately):

    file://<path to the downloaded ebcic>/docs/_build/index.html
    

    For WSL Ubuntu-20.04, replace <username> and <path to the downloaded ebcic> appropriately:

    file://wsl%24/Ubuntu-20.04/home/<username>/<path to the downloaded ebcic>/docs/_build/index.html
    
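
If typing the file URL by hand is error-prone, it can be built from the clone location. This is a minimal sketch with a hypothetical helper name, `docs_index_url`; it assumes the browser runs on the same system as Python, so it does not cover the WSL case above, where the browser runs on Windows.

```python
from pathlib import Path


def docs_index_url(repo_dir: str) -> str:
    """Return a file:// URL for docs/_build/index.html inside the cloned repo."""
    index = Path(repo_dir) / "docs" / "_build" / "index.html"
    return index.resolve().as_uri()


# "ebcic" is the directory created by `git clone` in step 1.
print(docs_index_url("ebcic"))
```

Passing the returned URL to `webbrowser.open()` from the standard library opens it in the default browser.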

Bibliography

[CP34]: Clopper, C. and Pearson, E.S. "The use of confidence or fiducial limits illustrated in the case of the binomial," Biometrika. 26 (4): pp.404-413, 1934

[Lou81]: Louis, T.A. "Confidence intervals for a binomial parameter after observing no successes," The American Statistician, 35(3), p.154, 1981

[HL83]: Hanley, J.A. and Lippman-Hand, A. "If nothing goes wrong, is everything all right? Interpreting zero numerators," Journal of the American Medical Association, 249(13), pp.1743-1745, 1983

[JL97]: Jovanovic, B.D. and Levy, P.S. "A look at the rule of three," The American Statistician, 51(2), pp.137-139, 1997

[Way00]: Wayman, J.L. "Technical testing and evaluation of biometric identification devices," Biometrics: Personal identification in networked society, edited by A.K. Jain, et al., Kluwer, pp.345-368, 2000

[ISO/IEC19795-1]: ISO/IEC 19795-1, "Information technology-Biometric performance testing and reporting-Part 1: Principles and framework"

[New98]: Newcombe, R.G. "Two-sided confidence intervals for the single proportion: comparison of seven methods," Statistics in Medicine. 17 (8): pp.857-872, 1998

[Wil27]: Wilson, E.B. "Probable inference, the law of succession, and statistical inference," Journal of the American Statistical Association. 22 (158): pp.209-212, 1927

[BLC01]: Brown, L.D., Cai, T.T. and DasGupta, A. "Interval Estimation for a Binomial Proportion," Statistical Science. 16 (2): pp.101-133, 2001

License

MIT License

When you use or publish the confidence interval obtained with the software, please refer to the software name, version, platform, and so on, so that readers can verify the correctness and reproducibility of the interval with the input parameters.

An example of the reference is:

"The confidence interval is obtained by EBCIC X.X.X on Python 3."

where X.X.X is the version of EBCIC.
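
One way to fill in the version automatically is to read it from the installed package metadata. This is a sketch under the assumption that the package was installed under the distribution name `ebcic` (as with `pip install ebcic`); the helper name `reference_sentence` is hypothetical.

```python
import platform
from importlib import metadata
from typing import Optional


def reference_sentence(version: Optional[str] = None) -> str:
    """Build the reference sentence, looking up the installed ebcic version if needed."""
    if version is None:
        try:
            version = metadata.version("ebcic")
        except metadata.PackageNotFoundError:
            version = "X.X.X"  # placeholder when ebcic is not installed
    python_version = platform.python_version()
    return f"The confidence interval is obtained by EBCIC {version} on Python {python_version}."


print(reference_sentence())
```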

The initial software is based on results obtained from a project, JPNP16007, commissioned by the New Energy and Industrial Technology Development Organization (NEDO).

Copyright (c) 2020-2022 National Institute of Advanced Industrial Science and Technology (AIST)

