Comments (3)
I'll leave the discussion of statistical functions to @AtheMathmo, as this certainly is not my area of expertise, but I have some input on the accuracy of floating point summation. First, I think this is an interesting idea and if there is some need for this kind of feature (which I can imagine there is), I personally think that this would be a nice addition to the library, so it gets my vote.
While summing floating point numbers in ascending order of magnitude certainly improves the accuracy of the result, it's still prone to a relatively large error bound (basically proportional to the number of summands, O(n)). I think if we wish to provide a more accurate method of summation, we should probably provide some stronger guarantees. In fact, it's possible to compute the result to full precision with relatively simple algorithms. This Python cookbook looks useful in this regard. Do you have any thoughts on this, @dgrnbrg?
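For reference, here is a minimal Rust sketch of the kind of full-precision summation mentioned above. It keeps a list of non-overlapping partial sums, in the spirit of the approach behind Python's `math.fsum`; whether it matches the linked recipe line for line is an assumption. The name `full_precision_sum` is hypothetical and not part of rulinalg. Note that the final step adds the partials naively, so the last rounding is not guaranteed to be correct in every corner case.

```rust
/// Hypothetical helper (not part of rulinalg): sums `values` while keeping
/// every rounding error in a list of exact partial sums.
pub fn full_precision_sum(values: &[f64]) -> f64 {
    // Non-overlapping partial sums whose exact total equals the exact
    // total of all values seen so far.
    let mut partials: Vec<f64> = Vec::new();

    for &value in values {
        let mut x = value;
        let mut kept = 0; // number of partials kept so far in this pass

        for j in 0..partials.len() {
            let mut y = partials[j];
            // Make sure |x| >= |y| so the error term below is exact.
            if x.abs() < y.abs() {
                std::mem::swap(&mut x, &mut y);
            }
            let hi = x + y;        // rounded sum
            let lo = y - (hi - x); // exact rounding error of `hi`
            if lo != 0.0 {
                partials[kept] = lo;
                kept += 1;
            }
            x = hi;
        }

        partials.truncate(kept);
        partials.push(x);
    }

    // Simplification: add up the partials naively. A production version
    // would handle the final rounding more carefully.
    partials.iter().sum()
}
```

With the input `[1e100, 1.0, -1e100]`, naive left-to-right summation loses the `1.0` entirely and returns `0.0`, whereas this sketch returns `1.0`. The price is memory that can grow with the input and a data-dependent inner loop.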
from rulinalg.
That's absolutely true, Andlon! I would recommend using Kahan summation, since it has constant memory usage and is easy to implement efficiently. Having to do the sorting step is already bad enough!
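For readers unfamiliar with it, Kahan (compensated) summation can be sketched in a few lines of Rust. The function name `kahan_sum` is hypothetical and this is not rulinalg code; it only illustrates the constant-memory, single-pass structure described above.

```rust
/// Hypothetical helper (not part of rulinalg): Kahan compensated summation.
pub fn kahan_sum(values: &[f64]) -> f64 {
    let mut sum = 0.0_f64;
    // Running compensation: the low-order bits lost in previous additions.
    let mut c = 0.0_f64;

    for &value in values {
        let y = value - c; // apply the correction carried over so far
        let t = sum + y;   // low-order bits of `y` may be lost here
        c = (t - sum) - y; // recover what was lost (with opposite sign)
        sum = t;
    }

    sum
}
```

Summing `1.0` followed by ten copies of `1e-16` naively returns exactly `1.0`, because each small term is rounded away, while the sketch above stays within a couple of units in the last place of `1.000000000000001`. Compensation does not make the result exact, though: for `[1.0, 1e100, 1.0, -1e100]` this sketch returns `0.0`, although the exact sum is `2.0`.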
I think that one big question to me is what the API should look like: should the row_sum() and col_sum() functions be parameterized over the algorithm used? I could imagine different applications wishing to choose a la carte the techniques that improve accuracy.
Sorry for my delay in answering, I've been travelling. Here's some food for thought.
Kahan summation is, as you point out, already an accurate way of computing the sum, while at the same time maintaining high performance and low memory overhead. However, we need to decide on the desired semantics. Kahan summation essentially provides (much) more accurate results than naive summation, whereas summation to full precision gives the most accurate result representable in floating point arithmetic. I can imagine that if someone actually cares about the accuracy of summation, then they might want it to be as exact as possible?
While parametrization over the summation algorithm sounds like a neat solution, the lack of default parameter types in Rust causes some unnecessary friction for the default use case. In what is likely the vast majority of use cases, the user is not overly interested in the accuracy and just wants to use naive summation. Having to write something like
// Example naming - the point is that an extra `use` statement is necessary
// even for the typical use case of naive summation
use rulinalg::NaiveSummation;
let sums = matrix.sum_rows::<NaiveSummation>();
instead of
let sums = matrix.sum_rows();
would quickly become rather annoying. Here I just picked summation instead of mean computation, but the situation is the same. I think the impact on usability is so great that we need to provide separate methods for the algorithms provided. In your initial post, you proposed something like stable_mean. I like this, except I think perhaps the name is slightly misleading - the way I see it, the issue is one of accuracy and not stability.
Note that I think parametrizing over the algorithm would probably be the best solution if we had support for default generic types, so we may want to switch to that sometime in the future.
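To make the trade-off concrete, here is a rough sketch of how separate, plainly named methods could sit on top of a generic entry point. Every name in it (`Summation`, `NaiveSummation`, `KahanSummation`, `ToyMatrix`, `row_sums`, `accurate_row_sums`, `row_sums_with`) is hypothetical and chosen only for illustration; none of this is rulinalg's actual API.

```rust
/// Hypothetical strategy trait for summing a slice (not part of rulinalg).
pub trait Summation {
    fn sum(values: &[f64]) -> f64;
}

/// Plain left-to-right summation.
pub struct NaiveSummation;

/// Kahan (compensated) summation.
pub struct KahanSummation;

impl Summation for NaiveSummation {
    fn sum(values: &[f64]) -> f64 {
        values.iter().sum()
    }
}

impl Summation for KahanSummation {
    fn sum(values: &[f64]) -> f64 {
        let (mut sum, mut c) = (0.0_f64, 0.0_f64);
        for &value in values {
            let y = value - c;
            let t = sum + y;
            c = (t - sum) - y;
            sum = t;
        }
        sum
    }
}

/// Toy row-major matrix standing in for the library's matrix type.
pub struct ToyMatrix {
    pub data: Vec<f64>,
    pub cols: usize, // assumed to be non-zero
}

impl ToyMatrix {
    /// Generic entry point: the caller chooses the algorithm explicitly.
    pub fn row_sums_with<S: Summation>(&self) -> Vec<f64> {
        self.data.chunks(self.cols).map(|row| S::sum(row)).collect()
    }

    /// Default use case: no extra `use` and no type annotation needed.
    pub fn row_sums(&self) -> Vec<f64> {
        self.row_sums_with::<NaiveSummation>()
    }

    /// Separate method for the more accurate algorithm.
    pub fn accurate_row_sums(&self) -> Vec<f64> {
        self.row_sums_with::<KahanSummation>()
    }
}
```

In this arrangement a typical caller writes `matrix.row_sums()` and never sees the strategy types, while code that cares about accuracy can call `accurate_row_sums()` or pick an algorithm through `row_sums_with::<KahanSummation>()`.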