dcwuser / metanumerics Goto Github PK

Meta.Numerics is library for advanced numerical computing on the .NET platform. It offers an object-oriented API for statistical analysis, advanced functions, Fourier transforms, numerical integration and optimization, and matrix algebra.

Home Page: http://www.meta-numerics.net

C# 100.00%

math-library matrix statistics numerics numerical-optimization numerical-integration numerical-analysis statistical-analysis statistical-tests matrix-factorization matrix-multiplication matrix-library special-functions math scientific-computing data-analysis dotnet csharp-library matrix-algebra optimization

metanumerics's Introduction

Meta.Numerics

Meta.Numerics is a library for advanced numerical computing for the .NET platform. It offers an object-oriented API for data manipulation, statistical analysis, advanced functions, matrix algebra, Fourier transforms, advanced functions, extended precision arithmetic, and solver functionality such as integration, optimization, and root finding.

For more information, visit http://www.meta-numerics.net.

metanumerics's People

Contributors

Stargazers

Watchers

metanumerics's Issues

EM Clustering

We have K-Means clustering, we need to do EM clustering too.

Whitespace in CSV

Un-escaped leading and tailing whitespace in CSV should be ignored. So
...,"a_b",...
should yield a_b for the illustrated cell.

Data: Interpolate

Clearly useful, but not clear how to organize.

Example scenario: given y and random times t, produce interpolated y's at regular t's.

Example scenario: replace null values with interpolated values.

Extended: Int128

Add Int128 type. Will be useful internally for exact null distributions that need large integers.

Data Frames

See data frames for R, pandas, and deedle.

Support CSV, HTML, and JSON import and export. Eventually we should support IDataReader import and export, but for now System.Data APIs are not in .NET core.

Support Select, Where, GroupBy, Pivot, Join, etc.

Support transformations that do not copy an entire table, but instead represent a view of a table. This is different from R data frames, which copy data, and indeed R data frames do poorly with large data sets.

Support missing values. Pandas does this via NaN, but that has a lot of yucky side-effects and the Pandas people appear to basically admit that this was a bad idea, but they are stuck with it. .NET has Nullable and we are starting from scratch, so we should use it.

Some things are easier for R and python because they are dynamically typed. We should try to support strong typing as much as possible. Deedle is a good example of introducing typing, but their type handling does introduce a lot of overhead (e.g. all frames must declare two index types). Also, their API is very F# oriented; it's pretty confusing for me as a C# developer.

Existing statistics APIs should be re-packaged in a way that allows these tables to be used as input.

K-S for 2 samples - possible issue?

I'm not sure if this is an issue, but I'm using the K-S test for 2 samples to examine the compatibility of samples.

I've found that if I use identical samples for sample a and sample b it sometimes tells me the samples are not compatible (i.e., it's a low probability that both samples are drawn from the same underlying distribution).

I don't know enough about how the K-S test works to have an idea about whether that makes any sense, but it's certainly counterintuitive...

Sorry if I'm just wasting your time by flagging up a non-issue!

Regression Results comparison

Since there are several types of regression results were implemented in last release it became hard to choose most appropriate model for my data because of lack of common values between them. Since R squared seems to be incorrect for noninear regressions would it be possible to see regression standard error as a property of basic FitResult class?
I some classes like LinearRR, MultiLinearRR and PolynimialRR it is already calculated as sigma2. It could simply be assigned to that basic property as squared root of sigma2/sigmaSquared.
Residuals list is such a basic feature too. By the way residuals could be implemented as bivariate sample using original x values and resudual y values, which makes it's Y property as currently available Residuals list.

Matrices: Complex

Add complex matrices and vectors.

Enable reduced QR Decomposition

We use QR decomposition to do MultiLinearRegression. The QRDecomposition produces the full n X n Q, which is too big for memory for large data sets (e.g. n = 10000). But we don't need the full Q for this case, so we should provide at least an internal method for producing the reduced Q.

Add CorrectedStandardDeviation

A lot of people expect StandardDeviation to return the Bessel-corrected standard deviation, and most other software packages support this expectation.

To re-iterate why this doesn't make sense:

Other moments are not similarly corrected, e.g. nobody's Skewness means some approximate estimator of the population's skewness.
Bessel's correction is incomplete; it makes the corrected variance an unbiased estimation of the population's variance, but it does not make the corrected standard deviation an unbiased estimation of the population's standard deviation.

Now that our own PopulationStandardDeviation does better than Bessel's correction, there is no method that gives quantity. From the point of view of good statistical practice, that is absolutely as it should be: people who want the sample's standard deviation get it, people who want the population's standard deviation get a better estimator than that provided by Bessel's correction. But from the point of view of customer expectations, it is a disaster. I keep having to explain this in discussions, and probably more people are turned off by the wordy explanation than are convinced by the logic. So as a compromise, we should provide a CorrectedStandardDeviation property that reports what they want, and put the explanation there. If they read the explanation and are convinced, great. If not, they got what they wanted.

StandardDeviation VS PopulationStandardDeviation.Value

I use Meta.Numeric to process my data automatically. But the result confused me. I analysed a list of data( double) via sample to get Min Max Mean and StandardDeviation. everything is ok except StandardDeviation. it is diffirent with the result come from SPSS.
so I check my code and output more data result. such as PopulationStandardDeviation.
the PopulationStandardDeviation.value matched the SPSS' StandardDeviation. So I confused.
I'm not good at statiscs. I am not sure that PopulationStandardDeviation.value is StandardDeviation.

Does anybody tell me what mistake i made?
Thank you.

Statistics: Circular Statistics

Circular sample analysis for Mean, variance, moments, Kuiper tests, etc.

Abstract circular distribution and concrete sub-types including von Mies, uniform, wrapped normal, etc.

Does not include support for spherical and higher-dimensional distributions on balls.

How to handle different periods?

Fisher Exact Test for arbitrary contingency tables

The Kuiper Exact Test is generalizable to r X c contingency tables. Just iterate over all tables with the same marginal totals, compute the probability of each, and add up all the probabilities less than or equal to the one observed.

The number of degrees of freedom is r * c - r - c + 1. For example:

2 X 2: 4 - 2 - 2 + 1 = 1
3 X 2: 6 - 3 - 2 + 1 = 2
3 X 3: 9 - 3 - 3 + 1 = 4
4 X 2: 8 - 4 - 2 + 1 = 3
4 X 3: 12 - 4 - 3 + 1 = 4
4 X 4: 16 - 4 - 4 + 1 = 9

Only 2 X 2 has only a single degree of freedom corresponding to a hyper-geometrically distributed variable, which is why it is the one most commonly supported.

Others can be supported via enumeration of multiple degrees of freedom, or via Monte Carlo simulation. Here is a recent article about the latter: https://arxiv.org/pdf/1507.00070.pdf

Functions: Incomplete elliptic integral ranges

Right now the ranges of the incomplete elliptic integrals are limited to the first period. Expand to all values using range reduction techniques.

Is it possible to use the library with streaming data?

I need to count mean and variance without keeping all the samples in the memory. They will be arriving in a large amount from a remote source and there's no practical reason to keep the data in memory.

Is it possible with the library?

Asymmetry in Fisher's Exact Test

A user reports that the Fisher exact test for
18 16
12 14
gives P = 0.7947, while for
12 14
18 16
it gives P = 0.6147. Since the rows are just permuted, the result should be the same in both cases.

https://www.graphpad.com/quickcalcs/contingency2/ agrees with the 0.7947 value.

Debugging indicates what is happening is this: the Fisher exact test works by iterating over all possible 2X2 tables with the same marginal totals. For each table, the probability of that table (under the null hypothesis of no interaction) is computed, and if that probability is lower than your original tables, it is added to the P value. This idea is to compute the total probability of all tables as or more unlikely than your original table. Because of floating point "noise", one permutation in the first case comes in with a probability ever so slightly below that of your original table and is counted, while the corresponding table in the second case comes in with a probability ever so slightly above that of your original table and is not.

I can probably fix this problem by moving to use the HypergeometricDistribution to do the calculation instead of explicit table enumeration. But I wanted to gain a bit more experience with that class before relying on it. And indeed I am made suspicious by the fact that it appears to give the result 0.7825, close to but not in exact agreement with the web-site or my present calculation.

Changes to FunctionMath.Integrate

My c# code has been using Meta.Numerics 1.3.0.0 and today I upgraded to 3.1.0.0. Now some of my unit tests that call Integrate are failing. I was able to change some values in my unit tests and now they pass. However, I was interested in what changed with this code in between the two versions I list.

compilation problem

When I try with VS 2017 to load the solution file of metanumerics-4.0.7 I get the error:

E:\dev\code_library\numerical\meeta.numerics\metanumerics-4.0.7\Numerics\Numerics.csproj : error : The expression "[System.IO.Path]::Combine(C:\Users\lou\AppData\Local\Temp, .NETPortable,Version=v5.0,Profile=
.AssemblyAttributes.cs)" cannot be evaluated. Illegal characters in path. C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\MSBuild\15.0\Bin\Microsoft.Common.CurrentVersion.targets

Minimization using slope

An immediate application would be linear logistic regression, for which we can produce derivatives analytically.

Data: Union

Combine two data frames. Could be symmetric, i.e. FrameTable.Union (FrameView a, FrameView b) :: FrameTable, or append-oriented, i.e. FrameTable.Append(FrameView) :: void.

Polynomial Root-Finder

Add a root-finding method to the polynomial class.

Functions: Complex Beta

Data: Randomization

Add method to FrameView that supports randomization:

Randomize(int n): Seperate into n equal groups.
Randomize(double p_1, double p_2, ..., double p_{n-1}): Seperate into n groups in the given proportions.

Zeros of Bessel functions

BesselJZero(double nu, int k), BesselKZero(double nu, int k), AiryAiZero(int k), AiryBiZero(int k)

Known initial approximation are good. We should be able to increase convergence speed by using derivatives (and even second derivatives).

Issue with NonlinearRegression

Hi, I'm getting issues with calculating nonlinear regression with my data.
Data in attached file (which is csv file with "," decimal delimiter and ";" column separator) contains 2 variables: age (x) and length (y). What i'm tried is to build nonlinear regression expressed as
y = p[0] * (1.0 - Math.Exp(-p[1] * (x - p[2])))
using NonlinearRegression() function of BivariateSample class and getting Nonconvergence exception. Is there a way to find out a reason and/or give the parameters for result occuracy to that function (if it is a reason why it does not converge).

Code is:
NonlinearRegressionResult fit = data.NonlinearRegression(
(x, p) => { return p[0] * (1.0 - Math.Exp(-p[1] * (x - p[2])))},
new double[] { 1, 1, 0 }
)

P.S.: r nls function eats that data and gives all 3 parameters
esox.txt

Random Poisson Deviates

There is a possible issue with the generation of Poisson deviates for large mu. For mu = 400 and sufficiently large counts, we seem to fail a chi squared test.

sample.Skewness() returns population skewness

The Statistics class has Skewness() and PopulationSkewness() methods where skewness has a bessel correction.

However if you make a new instance of Sample, the two (extension) methods Skewness() and PopulationSkewness() give the same answer in meta.numerics 4.0.7.

This is because sample.Skewness() actually defaults into the Univariate class implementation of Skewness(), which seems to calculate population skewness.

I believe that either Univariate needs fixing, or Sample needs an override Skewness() method which calls through to the Statistics class, but I don't know which is more sensible.

Example code:

        var sample = new Sample { 2, 2.3, -5.6, 10, -21, 42 };

        var sampleSkewness = Statistics.Skewness(sample);
        var populationSkewness = sample.PopulationSkewness(); // Equivalent to Statistics.PopulationSkewness(sample);
        var alsoPopulationSkewness = sample.Skewness(); // Equivalent to Univariate.Skewness(sample);

MPFit.Solve Exception

Hi,
I'm using MPFit.Solve to solve inverse problems. My code references Meta.Numerics version 4.0.7.0. I have a set of input parameters which are throwing a NonconvergenceException, i.e. the algorithm did not converge within the allowed number of iterations. I have attached a screen shot of the exception.

I wondered if this exception could be handled and passed back to the user via the return status.
Please let me know if I can provide any additional data that would help debug this issue.
Thanks for your help!
Best regards,
Carole

RectangularMatrix.SingularValueDecomposition fails in certain cases

For the following matrix:

44.6667 -392.0000 -66.0000
-392.0000 3488.0000 504.0001
-66.0000 504.0001 216.0001

RectangularMatrix.SingularValueDecomposition throws:
Meta.Numerics.NonconvergenceException : The algorithm did not converge within the allowed number of iterations.

For slightly different matrix:
44.6667 -391.9633 -66.0000
-391.9633 3487.4401 503.8801
-66.0000 503.8801 216.0001

it converges.
Is it not so that a SVD exists for EACH matrix?

Version 4.0.7 does not support .NET Framework 4.0 even though it is listed in release

In release 4.0.7 (https://github.com/dcwuser/metanumerics/releases/tag/v4.0.7) we can see that .NET Framework 4.0 is listed:

Targets .NET Standard 1.1 (.NET Framework 4.0 or higher or .NET Core)

but actually according to Microsoft .NET Standard 1.1 is only supported by .NET Framework 4.5+ (https://github.com/dotnet/standard/blob/master/docs/versions/netstandard1.1.md)

Orthogonal polynomial coefficeints

Introduce a polynomial class hierarchy that allows us to get orthogonal polynomials as classes that inherit from Polynomial. This should enable reading coefficients, as well as using properties of the orthogonal polynomials to implement integration, differentiation, etc.

v3 Eigensystem is now Eigendecomposition or Eigendvalues

Sorry for my questions, but i'm trying to convert some code from v3 to v4.

I'm assuming that SquareMatrix.Eigensystem changed to Eigendecomposition.

if so, the old ComplexEigensystem had a method for Eigenvector(int), what replaced it?

Scale each axis separately for muli-dimensional minimizer.

Add RightRegularizedBeta

And use it in right tail of binomial distribution, where we can get 0/0 = NaN for extreme distributions right now.

Better initial guesses for logistic regression

We do a full-on multi-dimensional optimization to get logistic regression parameters via likelihood maximization. I don't see any alternatives in the literature, but we should at least be able to make a better initial guess to feed into that algorithm than "all zeros", which is what we currently do.

dcwuser / metanumerics Goto Github PK

metanumerics's Introduction

Meta.Numerics

metanumerics's People

Contributors

Stargazers

Watchers

Forkers

metanumerics's Issues

Recommend Projects

Recommend Topics

Recommend Org