


Lisp-Stat Statistics

A consolidation of Common Lisp statistics libraries

Table of Contents

  1. About the Project
  2. Installation
  3. Usage
  4. Functions
  5. Roadmap
  6. Resources
  7. Contributing
  8. License
  9. Contact

About the Project

There are three statistics libraries that can be considered relatively complete and well written:

  • The statistics library from numerical-utilities
  • Larry Hunter's cl-statistics
  • Gary Warren King's cl-mathstats

There are a few challenges in using these as independent systems on projects though:

  • There is a good amount of overlap. Each library implements mean, for example (as do alexandria, cephes, and others)
  • In the case of mean, variance, etc., the functions deal only with samples, not distributions

This library brings these three systems under a single 'umbrella', and adds a few missing functions. To do this we use Tim Bradshaw's conduit-packages. For the few functions that require dispatch on type (sample data vs. a distribution), we use typecase because it is simple and does not require another system. There is a slight performance hit when types must be determined at run time, but until that becomes a problem we prefer it. One alternative considered for dispatch was filtered functions (https://github.com/pcostanza/filtered-functions).
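As a rough illustration of the typecase approach, the sketch below dispatches a single entry point on whether its argument is sample data or a distribution object. The names mean*, sample-mean and normal-dist are hypothetical and are not taken from this library; they only show the shape of the technique.

```lisp
;; Hypothetical stand-in for a distribution object; not part of this library.
(defstruct normal-dist
  (mu 0d0)
  (sigma 1d0))

(defun sample-mean (data)
  "Arithmetic mean of a non-empty sequence of reals."
  (/ (reduce #'+ data) (length data)))

(defun mean* (object)
  "Dispatch on the run-time type of OBJECT using TYPECASE."
  (typecase object
    (sequence (sample-mean object))              ; lists and vectors of samples
    (normal-dist (normal-dist-mu object))        ; mean of the distribution itself
    (t (error "Don't know how to take the mean of ~S" object))))
```

Calling (mean* '(1 2 3 4)) takes the sample branch, while (mean* (make-normal-dist :mu 3d0)) takes the distribution branch. The type test happens on every call, which is the run-time cost mentioned above.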

nu-statistics

These functions cover sample moments in detail, and are accurate. They include moments up to the fourth, and are well suited to the work of an econometrician (and were written by one).
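For readers unfamiliar with the term, a central sample moment of order k averages the k-th powers of deviations from the mean. The function below is a minimal sketch in plain Common Lisp using the 1/n normalisation; central-sample-moment is a hypothetical name used only for illustration and is not the nu-statistics API.

```lisp
(defun central-sample-moment (data k)
  "K-th central moment of the non-empty sequence DATA, using 1/n normalisation."
  (let* ((n (length data))
         (mean (/ (reduce #'+ data) n)))
    ;; Average of (x - mean)^k over all observations.
    (/ (reduce #'+ data :key (lambda (x) (expt (- x mean) k)))
       n)))
```

With k = 2 this is the (biased) variance, and k = 3 and k = 4 are the raw ingredients of skewness and kurtosis.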

cl-statistics

These were written by Larry Hunter, based on the methods described in Bernard Rosner's book, Fundamentals of Biostatistics 5th Edition, along with some from the CLASP system. They cover a wide range of statistical applications.

gwk-statistics

These are from Gary Warren King, and are also partially based on CLASP. They are well written, and the functions have excellent documentation. The major reason we don't include them by default is that they use an older ecosystem of libraries that duplicates more widely used systems (for example, numerical-utilities and alexandria). If you want to use these, you'll need to uncomment the appropriate code in the ASDF system definition and pkgdcl.lisp files.

Accuracy

LH and GWK statistics compute quantiles, CDF, PDF, etc. using routines from CLASP, which in turn are based on algorithms from Numerical Recipes. These are known to be accurate to only about four decimal places. This is probably accurate enough for many statistical problems; however, should you need greater accuracy, look at the distributions system. The computations there are based on special-functions, which has an accuracy of around 15 digits. Unfortunately, documentation of distributions and the 'wrapping' of them here are incomplete, so you'll need to know the naming pattern, e.g. pdf-gamma, cdf-gamma, etc., which is described in the distributions documentation.

Versions

Because this system is likely to change rapidly, we have adopted the system of versioning proposed in defpackage+. This is also the system alexandria uses, where a version number is appended to the API name. So, statistics-1 is our current package name, statistics-2 will be the next, and so on. If you don't like these names, you can always change them locally using a package-local nickname.

Installation

To get a local copy up and running, load the system with Quicklisp:

(ql:quickload :statistics)

or, if you already have the system downloaded to your local machine, with ASDF:

(asdf:load-system :statistics)

If you are using SBCL, you will see a large number of notes printed about the inability to optimise. This was the subject of issue #1. The short answer is that the functions all take arbitrary inputs, do input tests specific to the calculation, and then coerce and provide declarations so that the actual calculations can be optimised. You should therefore be able to ignore the notes.
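The pattern described above (accept anything, check it, coerce, then declare types for the numeric core) can be sketched as follows. This is an illustrative function written for this README, not code from the library; sum-of-squares is a hypothetical name.

```lisp
(defun sum-of-squares (data)
  "Sum of squares of a sequence of reals, computed in double-float arithmetic."
  ;; 1. Input test: accept any sequence (list or vector).
  (check-type data sequence)
  ;; 2. Coerce to a specialised vector so the inner loop has known types.
  (let ((v (map '(simple-array double-float (*))
                (lambda (x) (coerce x 'double-float))
                data))
        (acc 0d0))
    ;; 3. Declarations that let the compiler optimise the arithmetic.
    (declare (type (simple-array double-float (*)) v)
             (type double-float acc)
             (optimize speed))
    (dotimes (i (length v) acc)
      (incf acc (* (aref v i) (aref v i))))))
```

The compiler cannot know the type of DATA at the function boundary, which is typically where notes of this kind originate; the inner loop is the part that benefits from the declarations.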

Usage

Create a data frame of weather data:

(load #P"LS:DATA;sg-weather")

and take the mean maximum temperature:

LS-USER> (statistics-1:mean sg-weather:max-temps)

For more examples, please refer to the Documentation.

You can use a package local nickname to give the package a shorter name, e.g. "stats" if you like.
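A minimal sketch of such a nickname is shown below. It assumes the system has already been loaded, that the sg-weather data from the example above is available, and that your implementation supports the :local-nicknames option of defpackage (it is a widely implemented extension, not part of the ANSI standard). The package name my-analysis is hypothetical.

```lisp
;; Assumes (ql:quickload :statistics) has been run and sg-weather is loaded.
(defpackage :my-analysis                        ; hypothetical package name
  (:use :cl)
  (:local-nicknames (:stats :statistics-1)))    ; STATS is visible only in here

(in-package :my-analysis)

;; Inside MY-ANALYSIS, STATS:MEAN refers to STATISTICS-1:MEAN.
(stats:mean sg-weather:max-temps)
```

Because the nickname is local to my-analysis, it does not clash with any other package that happens to be called stats.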

Often all you'll need is lh-stats for general statistical analysis. You can load that with:

(asdf:load-system :statistics/lh)

NB: You can expect to see many warnings when loading lh-stats. These are expected and nothing to worry about.

LH-Stats Functions

These abbreviations are used in function and variable names:

abbreviation   meaning
ci             confidence interval
cdf            cumulative distribution function
ge             greater than or equal to
le             less than or equal to
pdf            probability density function
sd             standard deviation
rxc            rows by columns
sse            sample size estimate

Descriptive statistics

  • mean
  • median
  • mode
  • geometric mean
  • range
  • percentile
  • variance
  • standard-deviation (sd)
  • coefficient-of-variation
  • standard-error-of-the-mean
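To make the less familiar entries in this list concrete, here is a sketch of the usual textbook definitions in plain Common Lisp. The sketch- names are hypothetical and deliberately differ from the library's; in particular, the coefficient of variation is returned here as a plain ratio, and the library's own convention (for example, a percentage) may differ.

```lisp
(defun sketch-mean (xs)
  "Arithmetic mean of a non-empty sequence."
  (/ (reduce #'+ xs) (length xs)))

(defun sketch-variance (xs)
  "Unbiased sample variance (divides by n - 1); needs at least two values."
  (let ((m (sketch-mean xs)))
    (/ (reduce #'+ xs :key (lambda (x) (expt (- x m) 2)))
       (1- (length xs)))))

(defun sketch-sd (xs)
  "Sample standard deviation as a double-float."
  (sqrt (coerce (sketch-variance xs) 'double-float)))

(defun sketch-coefficient-of-variation (xs)
  "Standard deviation relative to the mean, as a plain ratio."
  (/ (sketch-sd xs) (sketch-mean xs)))

(defun sketch-standard-error (xs)
  "Standard error of the mean: sd / sqrt(n)."
  (/ (sketch-sd xs) (sqrt (coerce (length xs) 'double-float))))
```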

Distribution functions

  • Poisson & Binomial
      • binomial-probability
      • binomial-cumulative-probability
      • binomial-ge-probability
      • poisson-probability
      • poisson-cumulative-probability
      • poisson-ge-probability
  • normal
      • normal-pdf
      • convert-to-standard-normal
      • phi
      • z
  • t-distribution
  • chi-square
  • chi-square-cdf
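The binomial and Poisson entries follow the standard probability mass functions. The sketch below writes them out in plain Common Lisp with hypothetical sketch- names. Note the assumptions: the cumulative function here is inclusive, P(X <= k), and the "ge" function is P(X >= k); the library's own functions may use different boundary conventions, so check their documentation before relying on this.

```lisp
(defun sketch-choose (n k)
  "Number of ways to choose K items from N, as an exact integer."
  (if (or (< k 0) (> k n))
      0
      (let ((result 1))
        ;; After step i, RESULT holds C(n, i), which is always an integer.
        (loop for i from 1 to (min k (- n k))
              do (setf result (/ (* result (+ (- n i) 1)) i)))
        result)))

(defun sketch-binomial-pmf (n k p)
  "P(X = K) for X ~ Binomial(N, P)."
  (* (sketch-choose n k) (expt p k) (expt (- 1 p) (- n k))))

(defun sketch-binomial-cdf (n k p)
  "P(X <= K), inclusive of K."
  (loop for i from 0 to k sum (sketch-binomial-pmf n i p)))

(defun sketch-binomial-ge (n k p)
  "P(X >= K), computed as 1 - P(X <= K - 1)."
  (- 1 (sketch-binomial-cdf n (1- k) p)))

(defun sketch-poisson-pmf (mu k)
  "P(X = K) for X ~ Poisson(MU)."
  (/ (* (exp (- mu)) (expt mu k))
     (reduce #'* (loop for i from 1 to k collect i)))) ; k!, with 0! = 1
```

Passing a rational p such as 1/2 keeps the binomial results exact, which is convenient for checking small cases by hand.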

Confidence Intervals

  • binomial-probability-ci
  • poisson-mu-ci
  • normal-mean-ci
  • normal-mean-ci-on-sequences
  • normal-variance-ci
  • normal-variance-ci-on-sequence
  • normal-sd-ci

Hypothesis tests (parametric)

  • z-test
  • z-test-on-sequence
  • t-test-one-sample
  • t-test-one-sample-on-sequence
  • t-test-paired
  • t-test-paired-on-sequences
  • t-test-two-sample
  • t-test-two-sample-on-sequences
  • chi-square-test-one-sample
  • f-test
  • binomial-test-one-sample
  • binomial-test-two-sample
  • fisher-exact-test
  • mcnemars-test
  • poisson-test-one-sample
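As background for the z and t entries in this list, the sketch below computes only the test statistics, using the usual textbook formulas. Turning a statistic into a significance level requires the normal or t distribution, which is what the library functions provide. The sketch- names and argument orders are hypothetical and are not the library's signatures.

```lisp
(defun sketch-z-statistic (x-bar n mu sigma)
  "z = (x-bar - mu) / (sigma / sqrt(n)), for a known population SIGMA."
  (/ (- x-bar mu)
     (/ sigma (sqrt (coerce n 'double-float)))))

(defun sketch-t-statistic (xs mu)
  "Return the one-sample t statistic and its degrees of freedom for sequence XS."
  (let* ((n (length xs))
         (mean (/ (reduce #'+ xs) n))
         ;; Unbiased sample variance (n - 1 in the denominator).
         (variance (/ (reduce #'+ xs :key (lambda (x) (expt (- x mean) 2)))
                      (1- n)))
         (standard-error (sqrt (coerce (/ variance n) 'double-float))))
    (values (/ (- mean mu) standard-error)
            (1- n))))
```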

Hypothesis tests (non-parametric)

  • sign-test
  • sign-test-on-sequence
  • wilcoxon-signed-rank-test
  • chi-square-test-rxc
  • chi-square-test-for-trend

Sample size estimates

  • t-test-one-sample-sse
  • t-test-two-sample-sse
  • t-test-paired-sse
  • binomial-test-one-sample-sse
  • binomial-test-two-sample-sse
  • binomial-test-paired-sse
  • correlation-sse

Correlation and Regression

  • linear-regression
  • correlation-coefficient
  • correlation-test-two-sample
  • spearman-rank-correlation
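The regression and correlation entries rest on the same three sums of squares. The sketch below computes an ordinary least-squares line and the Pearson correlation coefficient for two equal-length lists. It is a plain Common Lisp illustration with a hypothetical name; the values the library functions return, and their order, may differ.

```lisp
(defun sketch-linear-fit (xs ys)
  "Return slope, intercept and Pearson r for equal-length lists XS and YS."
  (let* ((n (length xs))
         (x-bar (/ (reduce #'+ xs) n))
         (y-bar (/ (reduce #'+ ys) n))
         (sxx (reduce #'+ (mapcar (lambda (x) (expt (- x x-bar) 2)) xs)))
         (syy (reduce #'+ (mapcar (lambda (y) (expt (- y y-bar) 2)) ys)))
         (sxy (reduce #'+ (mapcar (lambda (x y) (* (- x x-bar) (- y y-bar)))
                                  xs ys)))
         (slope (/ sxy sxx)))
    (values slope
            (- y-bar (* slope x-bar))                       ; intercept
            (/ sxy (sqrt (coerce (* sxx syy) 'double-float)))))) ; Pearson r
```

The function assumes that neither list is constant, since a zero sum of squares would cause a division by zero.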

Significance test functions

  • t-significance
  • f-significance

Chi-square significance is calculated from chi-square-cdf in various ways depending on the problem.

Utilities

  • random-sample
  • random-pick
  • bin-and-count
  • fishers-z-transform
  • mean-sd-n
  • square
  • choose
  • permutations
  • round-float

Roadmap

gwk-stats has many useful functions. We'd like to port them to use the Lisp-Stat ecosystem of utilities.

Resources

This system is part of the Lisp-Stat project; that should be your first stop for information. Also see the resources and community pages for more information.

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated. Please see CONTRIBUTING for details on the code of conduct and the process for submitting pull requests.

Licenses

CLASP Copyright

Copyright (c) 1990 - 1994 University of Massachusetts Department of Computer Science Experimental Knowledge Systems Laboratory Professor Paul Cohen, Director. All rights reserved.

Permission to use, copy, modify and distribute this software and its documentation is hereby granted without fee, provided that the above copyright notice of EKSL, this paragraph and the one following appear in all copies and in supporting documentation.

EKSL makes no representation about the suitability of this software for any purposes. It is provided "AS IS", without express or implied warranties including (but not limited to) all implied warranties of merchantability and fitness for a particular purpose, and notwithstanding any other provision contained herein. In no event shall EKSL be liable for any special, indirect or consequential damages whatsoever resulting from loss of use, data or profits, whether in an action of contract, negligence or other tortuous action, arising out of or in connection with the use or performance of this software, even if EKSL is advised of the possibility of such damages.

Contact

Project Link: https://github.com/lisp-stat/statistics


statistics's Issues

Improve documentation

This issue is to track two potential improvements in documentation:

  • API reference
  • System manual

API Reference
Lisp-Stat uses declt to generate reference documentation from doc strings and output HTML, PDF, epub, etc. For example see the reference for data-frame. This doesn't work too well for statistics because either:

  1. There is no doc string
  2. The doc string is in a comment preceding the function.
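The difference between the two cases matters because documentation tools can only see real doc strings. A small sketch, using hypothetical function names, shows what the standard documentation function returns in each case:

```lisp
;; Case 2: the description lives in a comment, so it is lost at read time.
;; Returns the square of X.
(defun square-commented (x)
  (* x x))

;; The fix: move the text into the doc string position of the DEFUN.
(defun square-documented (x)
  "Return the square of X."
  (* x x))

(documentation 'square-documented 'function) ; => "Return the square of X."
(documentation 'square-commented 'function)  ; => NIL
```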

System Manual
It seems that the majority of new users are not from the Common Lisp community and often they're not well versed in statistics either. If we look at other statistical computing systems, like Python, Julia or R, they have robust documentation, with plenty of examples. See the data frame system manual for an example.

3511 compilation notes from SBCL

When compiling with SBCL, 3511 notes are printed. These are mostly optimisation warnings. See attached file.

Generally I've found SBCL's warnings of this kind to be accurate. They act as a kind of 'lint' for Common Lisp and, as with 'lint', one code fix can suppress many warnings. Probably worth looking into.

compile-notes.txt
