sarah-ek / faer-rs
Linear algebra foundation for the Rust programming language
Home Page: https://faer-rs.github.io
License: MIT License
Sorry, I don't currently have time to track down and fix the issue in more detail. Running cargo t in faer-core fails non-deterministically with the error:
failures:
---- mul::tests::triangular stdout ----
thread 'mul::tests::triangular' panicked at 'attempt to subtract with overflow', /home/daniel/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-0.12.0/src/gemm.rs:960:35
failures:
mul::tests::triangular
test result: FAILED. 12 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.07s
implement addition for Mat<T>
The traits core::ops::Add<Mat<T>> and core::ops::Add<&Mat<T>> should be implemented for Mat<T> and &Mat<T>. T should satisfy the trait ComplexField for those operations to be implementable. Mat::with_dims could be used, with a closure that takes two elements and adds them.
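The suggested approach can be sketched on a toy matrix type; the real faer Mat<T> with its ComplexField bound is more involved, and the with_dims constructor here only imitates the one the issue mentions:

```rust
// Minimal sketch of elementwise addition via a with_dims-style constructor.
// Toy column-major matrix over f64, not faer's actual Mat<T>.
#[derive(Clone, Debug, PartialEq)]
struct Mat {
    nrows: usize,
    ncols: usize,
    data: Vec<f64>, // column-major storage
}

impl Mat {
    // Analogous to the Mat::with_dims(nrows, ncols, f) mentioned in the issue.
    fn with_dims(nrows: usize, ncols: usize, f: impl Fn(usize, usize) -> f64) -> Self {
        let mut data = Vec::with_capacity(nrows * ncols);
        for j in 0..ncols {
            for i in 0..nrows {
                data.push(f(i, j));
            }
        }
        Self { nrows, ncols, data }
    }

    fn read(&self, i: usize, j: usize) -> f64 {
        self.data[i + j * self.nrows]
    }
}

impl core::ops::Add<&Mat> for &Mat {
    type Output = Mat;
    fn add(self, rhs: &Mat) -> Mat {
        assert_eq!((self.nrows, self.ncols), (rhs.nrows, rhs.ncols));
        // The closure takes the two corresponding elements and adds them.
        Mat::with_dims(self.nrows, self.ncols, |i, j| self.read(i, j) + rhs.read(i, j))
    }
}

fn main() {
    let a = Mat::with_dims(2, 2, |i, j| (i + j) as f64);
    let b = Mat::with_dims(2, 2, |_, _| 1.0);
    let c = &a + &b;
    println!("{:?}", c.data); // [1.0, 2.0, 2.0, 3.0]
}
```

The by-value impls (Add<Mat> for Mat etc.) would just forward to this borrowed version.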
implement the qr decomposition with and without column pivoting, along with the solve/reconstruct/inverse/least_squares functions to complete the decomposition API
The qr module is also needed for the future svd module implementation, since it could be used as a preconditioner.
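For orientation, the structure of an unpivoted QR can be sketched with modified Gram–Schmidt on plain vectors; this is illustrative only (faer's actual implementation uses blocked Householder reflections, and the function name here is made up):

```rust
// Modified Gram-Schmidt QR on a matrix stored as a Vec of columns.
// Assumes full column rank (no pivoting); for illustration only.
fn qr_mgs(a: &[Vec<f64>]) -> (Vec<Vec<f64>>, Vec<Vec<f64>>) {
    let n = a.len(); // number of columns
    let m = a[0].len(); // number of rows
    let mut q: Vec<Vec<f64>> = a.to_vec();
    let mut r = vec![vec![0.0; n]; n]; // r[j][k] = R(j, k)
    for k in 0..n {
        for j in 0..k {
            // R(j, k) = q_j . a_k, then subtract the projection from column k
            let dot: f64 = (0..m).map(|i| q[j][i] * q[k][i]).sum();
            r[j][k] = dot;
            for i in 0..m {
                q[k][i] -= dot * q[j][i];
            }
        }
        // Normalize the remaining column to get q_k and R(k, k)
        let norm = (0..m).map(|i| q[k][i] * q[k][i]).sum::<f64>().sqrt();
        r[k][k] = norm;
        for i in 0..m {
            q[k][i] /= norm;
        }
    }
    (q, r)
}

fn main() {
    // Columns of the 2x2 matrix [[3, 1], [4, 2]]
    let a = vec![vec![3.0, 4.0], vec![1.0, 2.0]];
    let (q, r) = qr_mgs(&a);
    println!("q0 = {:?}, R diag = {}, {}", q[0], r[0][0], r[1][1]);
}
```

Column pivoting would additionally select, at each step k, the remaining column of largest norm and record the permutation.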
Is your feature request related to a problem? Please describe.
Faer right now doesn't work with Polars 0.34. We should update it and make sure nothing breaks and the tests still pass.
Describe the solution you'd like
Update the dependency to polars = { version = "0.34", optional = true, features = ["lazy"] } and see if existing functions still work.
Describe alternatives you've considered
NA
Additional context
NA
Thanks in advance
When this matrix
[
[0.0 + 0.0 * I, 0.0 + 0.0 * I, 0.0 + 0.0 * I, 2.220446049250313e-16 + -1.0000000000000002 * I],
[0.0 + 0.0 * I, 0.0 + 0.0 * I, 2.220446049250313e-16 + -1.0000000000000002 * I, 0.0 + 0.0 * I],
[0.0 + 0.0 * I, 2.220446049250313e-16 + -1.0000000000000002 * I, 0.0 + 0.0 * I, 0.0 + 0.0 * I],
[2.220446049250313e-16 + -1.0000000000000002 * I, 0.0 + 0.0 * I, 0.0 + 0.0 * I, 0.0 + 0.0 * I],
]
as a Mat<c64> is passed to eigenvalues or complex_eigenvalues, it throws an error.
numpy does compute eigenvalues:
ar = np.array([
[0.0 + 0.0j, 0.0 + 0.0j, 0.0 + 0.0j, 2.220446049250313e-16 + -1.0000000000000002j],
[0.0 + 0.0j, 0.0 + 0.0j, 2.220446049250313e-16 + -1.0000000000000002j, 0.0 + 0.0j],
[0.0 + 0.0j, 2.220446049250313e-16 + -1.0000000000000002j, 0.0 + 0.0j, 0.0 + 0.0j],
[2.220446049250313e-16 + -1.0000000000000002j, 0.0 + 0.0j, 0.0 + 0.0j, 0.0 + 0.0j],
])
In [1]: np.linalg.eigvals(ar)
Out[1]:
array([ 1.11022302e-16-1.j, -2.22044605e-16+1.j, 1.11022302e-16-1.j,
-2.22044605e-16+1.j])
The error message is:
Assertion failed at /home/lapeyre/.cargo/registry/src/index.crates.io-6f17d22bba15001f/faer-core-0.13.0/src/lib.rs:2093:13:
assert!( row < this.nrows() )
with expansion:
4 < 4
Several other matrices do give the same eigenvalues that numpy does.
Version: 0.13.1
Arch Linux
I was skimming through the high-level API PR, and saw the renaming of set_constant to fill_with_constant. It's really a minor thing, but how about naming this method just fill, as analogous to slice::fill? As for fill_with_zero, it does not have a counterpart in std (or I am not aware of it), but fill_zero sounds fine to me. This way, you can leave the fill_with name for the method taking a filling closure, as is also the convention in std (slice::fill_with).
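For reference, the std conventions the comment alludes to: slice::fill takes a value, while slice::fill_with takes a closure that is called once per element:

```rust
fn main() {
    let mut v = vec![0_i32; 4];

    // slice::fill sets every element to a clone of the given value
    v.fill(7);
    assert_eq!(v, [7, 7, 7, 7]);

    // slice::fill_with calls the closure once per element, in order
    let mut counter = 0;
    v.fill_with(|| {
        counter += 1;
        counter
    });
    assert_eq!(v, [1, 2, 3, 4]);
    println!("{v:?}"); // [1, 2, 3, 4]
}
```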
Is your feature request related to a problem? Please describe.
I have an x: Mat<f64> and a y: MatRef<'_, f64> and I need to update x = x - y.
Describe the solution you'd like
I think it would be convenient to have an impl SubAssign<Rhs = MatRef> for Mat such that it reuses the memory allocated for x.
Describe alternatives you've considered
I am currently solving my issue with Mat::with_dims to create a new matrix. Implementing these traits could be a bit more performant, and it would definitely be more readable.
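The shape of the requested impl can be sketched on toy stand-ins for Mat and MatRef (flat storage, no strides); the point is that sub_assign writes into the existing allocation instead of building a new matrix:

```rust
// Sketch of in-place subtraction reusing the lhs storage. Toy types,
// not faer's actual Mat / MatRef.
use core::ops::SubAssign;

struct Mat {
    data: Vec<f64>, // nrows * ncols elements
}

// Stand-in for MatRef: a borrowed view of another matrix's storage.
struct MatRef<'a> {
    data: &'a [f64],
}

impl SubAssign<MatRef<'_>> for Mat {
    fn sub_assign(&mut self, rhs: MatRef<'_>) {
        assert_eq!(self.data.len(), rhs.data.len());
        // Updates x in place: no new allocation, unlike constructing a
        // fresh matrix with a with_dims-style function.
        for (x, y) in self.data.iter_mut().zip(rhs.data) {
            *x -= y;
        }
    }
}

fn main() {
    let mut x = Mat { data: vec![5.0, 7.0, 9.0] };
    let y_storage = vec![1.0, 2.0, 3.0];
    x -= MatRef { data: &y_storage }; // x = x - y
    assert_eq!(x.data, [4.0, 5.0, 6.0]);
    println!("{:?}", x.data);
}
```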
Running cargo bench faer-mt-cplx-llt-512 leads to a segfault on my machine (the corresponding single-threaded bench seems to run fine). I get the following traceback:
#0 0x000055c4371ca569 in core::ptr::write<[num_complex::Complex<f64>; 3]> () at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/ptr/mod.rs:1354
#1 core::ptr::mut_ptr::{impl#0}::write<[num_complex::Complex<f64>; 3]> () at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/ptr/mut_ptr.rs:1449
#2 gemm_common::pack_operands::pack_generic_inner_loop<num_complex::Complex<f64>, 1, 3> () at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/pack_operands.rs:163
#3 gemm_common::pack_operands::pack_generic<num_complex::Complex<f64>, 1, 3> () at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/pack_operands.rs:256
#4 gemm_common::pack_operands::pack_rhs::{closure#0}<num_complex::Complex<f64>, 1, 3, gemm_common::simd::x86::Fma> () at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/pack_operands.rs:301
#5 gemm_common::simd::x86::{impl#2}::vectorize<gemm_common::pack_operands::pack_rhs::{closure_env#0}<num_complex::Complex<f64>, 1, 3, gemm_common::simd::x86::Fma>> (f=...)
at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/simd.rs:50
#6 0x000055c4371c74ef in gemm_common::pack_operands::pack_rhs<num_complex::Complex<f64>, 1, 3, gemm_common::simd::x86::Fma> (n=<optimized out>, k=8192, dst=..., src=..., src_cs=48, src_rs=255, dst_stride=768)
at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/pack_operands.rs:299
#7 0x000055c4371c4992 in gemm_common::gemm::gemm_basic_generic::{closure#3}<gemm_common::simd::x86::Fma, num_complex::Complex<f64>, 2, 6, 3, 3, gemm_c64::gemm::f64::fma_cplx::gemm_basic_cplx::{closure_env#0}> (
tid=<optimized out>) at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/gemm.rs:395
#8 0x000055c4371f88ea in core::ops::function::impls::{impl#0}::call<(usize), (dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)> (args=..., self=<optimized out>)
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/ops/function.rs:263
#9 core::ops::function::impls::{impl#1}::call_mut<(usize), &(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)> (args=..., self=<optimized out>)
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/ops/function.rs:274
#10 core::iter::traits::iterator::Iterator::for_each::call::{closure#0}<usize, &&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)> (item=8192)
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/iter/traits/iterator.rs:849
#11 core::iter::traits::iterator::Iterator::fold<core::ops::range::Range<usize>, (), core::iter::traits::iterator::Iterator::for_each::call::{closure_env#0}<usize, &&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::
Send + core::marker::Sync)>> (self=..., f=..., init=<optimized out>) at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/iter/traits/iterator.rs:2477
#12 core::iter::traits::iterator::Iterator::for_each<core::ops::range::Range<usize>, &&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)> (self=..., f=0x7fe657ffd740)
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/iter/traits/iterator.rs:852
#13 rayon::iter::for_each::{impl#1}::consume_iter<&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync), usize, core::ops::range::Range<usize>> (self=..., iter=...)
at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-1.7.0/src/iter/for_each.rs:55
#14 rayon::iter::plumbing::Producer::fold_with<rayon::range::IterProducer<usize>, rayon::iter::for_each::ForEachConsumer<&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)>> (self=...,
folder=...) at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-1.7.0/src/iter/plumbing/mod.rs:110
#15 rayon::iter::plumbing::bridge_producer_consumer::helper<rayon::range::IterProducer<usize>, rayon::iter::for_each::ForEachConsumer<&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)>> (
len=<optimized out>, migrated=<optimized out>, splitter=..., producer=..., consumer=...) at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-1.7.0/src/iter/plumbing/mod.rs:438
OS: arch linux
CPU: AMD Threadripper 1920X
version: current main branch (88dd024)
Is your feature request related to a problem? Please describe.
Hey, this might be an embarrassingly dumb question, but I cannot see a way to add two matrices using + (or subtract using -). The official website does mention that the addition and subtraction operators should work as expected, but when I write a + b for e.g. two Mat<f64> instances, it won't compile. Looking at the faer_core crate I cannot find an implementation of core::ops::Add. What am I missing?
Describe the solution you'd like
It would be great if the + operator would just work for two suitable Mat<E> instances, as well as for one Mat<E> and a suitable MatRef instance. I can elaborate on what I mean by suitable, but roughly I mean that if the entries can be added, then the matrices should be addable (given that the matrix dimensions match).
Describe alternatives you've considered
I could of course do element-wise additions manually, but that would feel so much less ergonomic than having an Add trait implemented. Same for AddAssign.
Additional context
If I missed an obvious way to add matrices, I'd really appreciate a hint. If the Add, AddAssign, Sub, and SubAssign traits are indeed not implemented, I'd be very happy to file a PR, because this crate is really awesome. Please let me know, and thanks for your work :)
implement subtraction for Mat<T>
The traits core::ops::Sub<Mat<T>> and core::ops::Sub<&Mat<T>> should be implemented for Mat<T> and &Mat<T>. T should satisfy the trait ComplexField for those operations to be implementable. Mat::with_dims could be used, with a closure that takes two elements and subtracts them.
First of all, thanks a lot for your work on faer, it is impressive. It has also been a lot of fun to play around with the library.
Describe the bug
The results of solving a lower triangular system are unexpected.
To Reproduce
The following code reproduces the error. It also shows how the equivalent code with nalgebra gets the correct solution:
# Cargo.toml
[package]
name = "test"
version = "0.1.0"
edition = "2021"
[dependencies]
faer-core = "0.9.1"
nalgebra = "0.32.2"
// main.rs
const RESPONSE: [f64; 3] = [-18.0, -60.0, -80.0];
// Toeplitz and lower triangular
fn r_element(i: usize, j: usize) -> f64 {
if i >= j {
let diff = i - j;
if diff < RESPONSE.len() {
RESPONSE[diff]
} else {
0.0
}
} else {
0.0
}
}
fn r_matrix_faer(n: usize) -> faer_core::Mat<f64> {
faer_core::Mat::with_dims(n, n, |i, j| r_element(i, j))
}
fn r_matrix_nalgebra(n: usize) -> nalgebra::DMatrix<f64> {
nalgebra::DMatrix::from_fn(n, n, |i, j| r_element(i, j))
}
// Solve for X in Y = R * X
fn solve_x_faer(r: &faer_core::Mat<f64>, y: &mut faer_core::Mat<f64>) {
faer_core::solve::solve_lower_triangular_in_place_with_conj(
r.as_ref(),
faer_core::Conj::No,
y.as_mut(),
faer_core::Parallelism::None,
);
}
// Solve for X in Y = R * X
fn solve_x_nalgebra(r: &nalgebra::DMatrix<f64>, y: &mut nalgebra::DMatrix<f64>) {
r.solve_lower_triangular_mut(y);
}
fn main() {
let (i, j) = (400, 1);
let r = r_matrix_faer(i);
let mut x = faer_core::Mat::zeros(i, j);
x.write(0, 0, 150.0);
let mut y = r.clone() * x;
solve_x_faer(&r, &mut y); // The values in `y` blow up
// Now, the same thing with nalgebra
let r = r_matrix_nalgebra(i);
let mut x = nalgebra::DMatrix::zeros(i, j);
x[(0, 0)] = 150.0;
let mut y = r.clone() * x;
solve_x_nalgebra(&r, &mut y); // The values in `y` are correct
}
Expected behavior
I would expect the same results as nalgebra
Is your feature request related to a problem? Please describe.
Currently there are tests, but no automated tests are run via CI to ensure bugs aren't introduced as changes are made.
Describe the solution you'd like
A GitHub Action to run the tests and output our current test coverage.
Describe alternatives you've considered
None; it seemed natural to use GitHub Actions since we're on GitHub. I believe Travis CI is a viable alternative though.
In the LAPACK Cholesky factorization routines (e.g. DPOTRF) when it encounters an error it returns the details about where it failed (INFO > 0). Is it possible to add something similar to the faer Cholesky routines?
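In Rust the natural counterpart to LAPACK's INFO > 0 is returning a Result carrying the index of the failed pivot. A self-contained sketch of the idea on an unblocked Cholesky (this is not faer's API, just an illustration of the error-reporting shape):

```rust
// Unblocked Cholesky A = L * L^T on a dense column-major matrix.
// Like DPOTRF's INFO > 0, the error reports the order of the leading
// minor that is not positive definite.
fn cholesky_in_place(a: &mut [f64], n: usize) -> Result<(), usize> {
    for j in 0..n {
        // Diagonal update: d = A(j, j) - sum_k L(j, k)^2
        let mut d = a[j + j * n];
        for k in 0..j {
            d -= a[j + k * n] * a[j + k * n];
        }
        if d <= 0.0 {
            // Leading minor of order j + 1 is not positive definite
            return Err(j + 1);
        }
        let d = d.sqrt();
        a[j + j * n] = d;
        // Column update below the diagonal
        for i in (j + 1)..n {
            let mut s = a[i + j * n];
            for k in 0..j {
                s -= a[i + k * n] * a[j + k * n];
            }
            a[i + j * n] = s / d;
        }
    }
    Ok(())
}

fn main() {
    // SPD matrix [[4, 2], [2, 3]] factors as L = [[2, 0], [1, sqrt(2)]]
    let mut spd = vec![4.0, 2.0, 2.0, 3.0];
    assert_eq!(cholesky_in_place(&mut spd, 2), Ok(()));
    assert!((spd[0] - 2.0).abs() < 1e-12);

    // An indefinite matrix fails at the first pivot, like INFO = 1
    let mut bad = vec![-1.0, 0.0, 0.0, 1.0];
    assert_eq!(cholesky_in_place(&mut bad, 2), Err(1));
    println!("ok");
}
```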
Is your feature request related to a problem? Please describe.
I have an x: Mat<f64> and an n: f64 and I need to update x = n * x.
Describe the solution you'd like
I think it would be convenient to implement the MulAssign trait such that we can do x *= n and it reuses the memory allocated for x.
Describe alternatives you've considered
Maybe it would be good to have a generic map_with method that transforms all the elements of a matrix in place. It could be used to implement this MulAssign trait, and it would probably be useful for other things as well.
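Both suggestions fit together; a toy sketch (the type, the in-place helper, and its name are illustrative, not faer API):

```rust
// Sketch: scalar MulAssign that reuses the allocation, built on a
// generic in-place transform in the spirit of the suggested map_with.
use core::ops::MulAssign;

struct Mat {
    data: Vec<f64>,
}

impl Mat {
    // In-place elementwise transform; MulAssign below is one use of it.
    fn apply_in_place(&mut self, f: impl Fn(f64) -> f64) {
        for x in self.data.iter_mut() {
            *x = f(*x);
        }
    }
}

impl MulAssign<f64> for Mat {
    fn mul_assign(&mut self, n: f64) {
        self.apply_in_place(|x| n * x);
    }
}

fn main() {
    let mut x = Mat { data: vec![1.0, 2.0, 3.0] };
    x *= 2.0; // x = n * x, no new allocation
    assert_eq!(x.data, [2.0, 4.0, 6.0]);
    println!("{:?}", x.data);
}
```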
Routines like (bi-)conjugate gradients, MINRES, GMRES, SYMMLQ, LOBPCG, etc. are not part of LAPACK but may fit in faer as well. Would you accept PRs for a faer-iterative crate for large/non-dense problems?
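To make the request concrete, here is a minimal unpreconditioned conjugate gradient for SPD systems, the kind of routine such a crate could host. The dense matvec is only there to keep the sketch self-contained; the point of these methods is that any matvec (e.g. a sparse one) can be plugged in:

```rust
// Unpreconditioned conjugate gradient for symmetric positive definite
// systems A x = b. Illustrative sketch, not a faer API.
fn cg(a: &[Vec<f64>], b: &[f64], tol: f64, max_iter: usize) -> Vec<f64> {
    let n = b.len();
    let matvec = |x: &[f64]| -> Vec<f64> {
        a.iter()
            .map(|row| row.iter().zip(x).map(|(aij, xj)| aij * xj).sum())
            .collect()
    };
    let dot = |u: &[f64], v: &[f64]| -> f64 { u.iter().zip(v).map(|(x, y)| x * y).sum() };

    let mut x = vec![0.0; n];
    let mut r = b.to_vec(); // residual r = b - A * 0
    let mut p = r.clone(); // search direction
    let mut rs = dot(&r, &r);
    for _ in 0..max_iter {
        if rs.sqrt() < tol {
            break;
        }
        let ap = matvec(&p);
        let alpha = rs / dot(&p, &ap);
        for i in 0..n {
            x[i] += alpha * p[i];
            r[i] -= alpha * ap[i];
        }
        let rs_new = dot(&r, &r);
        let beta = rs_new / rs;
        for i in 0..n {
            p[i] = r[i] + beta * p[i];
        }
        rs = rs_new;
    }
    x
}

fn main() {
    // SPD system [[4, 1], [1, 3]] x = [1, 2]; exact solution [1/11, 7/11]
    let a = vec![vec![4.0, 1.0], vec![1.0, 3.0]];
    let x = cg(&a, &[1.0, 2.0], 1e-12, 100);
    println!("{x:?}");
}
```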
Is your feature request related to a problem? Please describe.
Hi, I ran the benchmarks yesterday on my machine (Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz) and did a quick experiment with PGO. The wallclock results are even more impressive and may be worth adding to the benchmark section.
Describe the solution you'd like
Make users aware of PGO, link the relevant section in the Rust documentation, and add some additional benchmark estimates.
Additional context
Two examples with the least and most improvement. I have seen a consistent improvement for all benchmarks, especially for the parallelized version.
Wondering if faer is using a SIMD API. If not, maybe SIMD could make faer faster.
Is your feature request related to a problem? Please describe.
Especially when debugging, it is useful to be able to inspect a matrix in a non-terminal environment. The CSV format is easily written and read, but support for it has not yet been implemented.
Describe the solution you'd like
Methods for writing and reading matrices, either as implementations for a given type, or as generic functions.
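A minimal std-only sketch of what such functions could look like (the names, the row-major layout, and the plain-f64 element type are assumptions for illustration, not faer API):

```rust
// CSV round-tripping for a dense row-major matrix. A real implementation
// would handle quoting, NaN/inf, and richer error types.
fn to_csv(nrows: usize, ncols: usize, data: &[f64]) -> String {
    (0..nrows)
        .map(|i| {
            (0..ncols)
                .map(|j| data[i * ncols + j].to_string())
                .collect::<Vec<_>>()
                .join(",")
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn from_csv(s: &str) -> Result<(usize, usize, Vec<f64>), std::num::ParseFloatError> {
    let mut data = Vec::new();
    let mut nrows = 0;
    let mut ncols = 0;
    for line in s.lines() {
        let row: Vec<f64> = line
            .split(',')
            .map(str::trim)
            .map(str::parse)
            .collect::<Result<_, _>>()?;
        ncols = row.len();
        nrows += 1;
        data.extend(row);
    }
    Ok((nrows, ncols, data))
}

fn main() {
    let data = [1.0, 2.5, 3.0, 4.0];
    let csv = to_csv(2, 2, &data);
    assert_eq!(csv, "1,2.5\n3,4");
    let (r, c, back) = from_csv(&csv).unwrap();
    assert_eq!((r, c), (2, 2));
    assert_eq!(back, data);
    println!("{csv}");
}
```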
Hi @sarah-ek,
Are there any plans on adding high-level APIs for computing eigenvectors and eigenvalues of square, but not necessarily symmetric, matrices?
I really like the ergonomics of your library but it would be nice to just have a functional, simple API for providing matrices and getting out eigenvalues and their associated eigenvectors without having to hand roll some form of iterative/inverse iteration method in each new crate.
Is your feature request related to a problem? Please describe.
Another maintenance task. Unlike #10, which is geared towards developers knowing what low-level changes happened, this one is geared towards user-facing changes, so the average user knows what changed between releases without much hassle on the maintainers' side.
Describe the solution you'd like
This is one I have to research a bit more. I forget the means to do this, but I know it's possible and it seemed rather useful.
You can assign this one to me
Describe the bug
The Vec<c64>
output of eigenvalues()
for the adjacency matrix of the following
Note: the path graph on
To Reproduce
/* use the smaller unit test linked above
#[test]
fn this_other_tree_has_correct_maximum_eigenvalue() {
let edges = [(3, 2), (6, 1), (7, 4), (7, 6), (8, 5), (9, 4), (11, 2), (12, 2), (13, 2), (15, 6), (16, 2), (16, 4), (17, 8), (18, 0), (18, 8), (18, 2), (19, 6), (19, 10), (19, 14)];
let mut a = faer::Mat::zeros(20, 20);
for (v, u) in edges.iter() {
a[(*v, *u)] = 1.0;
a[(*u, *v)] = 1.0;
}
let eigs_complex: Vec<faer::complex_native::c64> = a.eigenvalues();
println!("{eigs_complex:?}");
let eigs_real = eigs_complex
.iter()
.map(|e| e.re)
.collect::<Vec<_>>();
println!("{eigs_real:?}");
let lambda_1 = *eigs_real
.iter()
.max_by(|a, b| a.partial_cmp(&b).unwrap())
.unwrap();
let correct_lamba_1 = 2.6148611139728866;
assert!(
(lambda_1 - correct_lamba_1).abs() < 1e-10,
"lambda_1 = {lambda_1}, correct_lamba_1 = {correct_lamba_1}",
);
}
*/
Expected behavior
name: graphenv
# channels:
# - defaults
# dependencies:
# - python
# - networkx
# - matplotlib
# - imageio
# - msgpack-python
# - scipy
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
# Create the graph
edges = [[3, 2], [6, 1], [7, 4], [7, 6], [8, 5], [9, 4], [11, 2], [12, 2], [13, 2], [15, 6], [16, 2], [16, 4], [17, 8], [18, 0], [18, 8], [18, 2], [19, 6], [19, 10], [19, 14]]
graph = nx.Graph()
graph.add_edges_from(edges)
# Get the adjacency matrix eigenvalues
adjacency_matrix = nx.adjacency_matrix(graph).todense()
eigenvalues = np.linalg.eigvals(adjacency_matrix)
# Print the eigenvalues in decreasing order
print("Eigenvalues in decreasing order:")
for eigenvalue in sorted(eigenvalues, reverse=True):
print(eigenvalue)
# Plot the graph
nx.draw(graph, with_labels=True)
plt.show()
Is your feature request related to a problem? Please describe.
Normal math operators like +, -, *, == etc. can't be used on sparse matrices. There are currently two bounds that can't be satisfied:
- the operand types would need to impl GenericMatrix, which they don't.
- the traits MatMul, MatAdd etc.: right now MatMul is implemented for sparse types, but not any of the other operations.
Describe the solution you'd like
Right now, to multiply a sparse matrix with another matrix, you must manually invoke the MatMul
implementation as so:
<SparseRowMat<_> as MatMul<DenseCol>>::mat_mul(lhs, rhs.as_ref());
And there's no way to add or subtract sparse matrices. It would be nice to be able to do something like lhs * &rhs and lhs + &rhs like normal. As well, it would also be nice to have Debug and Index implementations for sparse matrices.
The current iteration of faer::polars::polars_to_faer_fxx requires that a slice &[&str] of column names be passed to the function. While there is nothing wrong with this in an absolute sense, it does make for a somewhat awkward workflow when working with polars.
A pattern more aligned with how polars functions is to remove the &[&str] item from the function signature, and have the function create a Mat<E> from the entire frame. The function should also error if columns containing invalid data types (such as strings) are passed in.
Hi!
I am really impressed by your colossal work on this math kernel!
I am writing a Julia wrapper to benchmark faer against OpenBLAS and MKL.
So far I have only studied the dense matrix-matrix multiplication. My preliminary results show faer approximately 50% slower than OpenBLAS and 25% slower than MKL on an AMD Ryzen 5 7640U on 8 threads.
This is basically my first Rust project and I want to be fair to faer: is this a reasonable dynamic library exposing faer's in-place matrix multiplication using the C ABI?
I am not sure if opening an issue is the right way to ask but the faer documentation is very sparse at the moment on how to import external matrices.
use faer::modules::core::mul::matmul;
use faer::{mat, Parallelism};
use std::usize;
// inplace c = a * b
#[no_mangle]
pub unsafe extern "C" fn mult(
c_ptr: *mut f64,
c_nrows: u64,
c_ncols: u64,
c_row_stride: u64,
c_col_stride: u64,
a_ptr: *const f64,
a_nrows: u64,
a_ncols: u64,
a_row_stride: u64,
a_col_stride: u64,
b_ptr: *const f64,
b_nrows: u64,
b_ncols: u64,
b_row_stride: u64,
b_col_stride: u64,
nthreads: u32,
) {
assert!(!c_ptr.is_null());
assert!(!a_ptr.is_null());
assert!(!b_ptr.is_null());
let c = unsafe {
mat::from_raw_parts_mut::<f64>(
c_ptr,
c_nrows as usize,
c_ncols as usize,
c_row_stride as isize,
c_col_stride as isize,
)
};
let a = unsafe {
mat::from_raw_parts::<f64>(
a_ptr,
a_nrows as usize,
a_ncols as usize,
a_row_stride as isize,
a_col_stride as isize,
)
};
let b = unsafe {
mat::from_raw_parts::<f64>(
b_ptr,
b_nrows as usize,
b_ncols as usize,
b_row_stride as isize,
b_col_stride as isize,
)
};
matmul(c, a, b, None, 1.0, Parallelism::Rayon(nthreads as usize));
}
Describe the bug
Calling Mat::eigenvalues() may lead to a panic from a failing assertion. To reproduce this, generate a 4096x4096 matrix with random f64 elements in the range 0 to 1 and call the eigenvalues function (see code below).
This will result in the following error:
thread 'main' panicked at /home/sthagel/.cargo/registry/src/index.crates.io-6f17d22bba15001f/faer-0.18.1/src/linalg/evd/hessenberg.rs:816:26:
assertion `left == right` failed: iterators must have the same length
left: 31
right: 32
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
To Reproduce
use faer::mat::Mat;
use num::complex::Complex64;
use rand::prelude::*;
fn main() {
let n = 4096;
let n_sq = n * n;
let mat = (0..n_sq)
.map(|_| {
let mut rng = rand::thread_rng();
rng.gen()
})
.collect::<Vec<f64>>();
let results = eigenvals_faer(&mat, n - 2, n);
println!("{:#?}", results)
}
#[inline]
pub fn eigenvals_faer(
mat: &[f64],
nev: usize,
n: usize,
) -> Result<Vec<Complex64>, &str> {
if nev * n == 0 || nev > n - 2 {
return Err("Invalid parameters!");
}
let faer_mat: Mat<f64> = Mat::from_fn(n, n, |i, j| mat[i + n * j]);
let evs = faer_mat.eigenvalues();
Ok(evs)
}
Expected behavior
One would expect the calculation to produce the desired eigenvalues
Additional info
On smaller matrices (e.g. 40x40 instead of 4096x4096) it works just fine.
implement basic operations (addition, subtraction, matrix multiplication, multiplication by a scalar) for Mat<T>
T should satisfy the trait ComplexField for those operations to be implementable. For the implementation of addition, subtraction, and scalar multiplication, Mat::with_dims could be used, with a closure that takes two elements and adds or subtracts them. For matrix multiplication, we could create a matrix full of zeros with Mat::zeros, and then fill it up with the result of the computation with faer_core::mul::matmul.
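The matrix-multiplication path described above (allocate zeros, then accumulate the product) can be sketched with a naive triple loop; the inner loops here stand in for what faer_core::mul::matmul would do much faster:

```rust
// Naive sketch of "zeros, then fill with the product". Column-major
// flat storage; illustrative only.
fn zeros(nrows: usize, ncols: usize) -> Vec<f64> {
    vec![0.0; nrows * ncols]
}

// c (m x n) += a (m x k) * b (k x n)
fn matmul_naive(c: &mut [f64], a: &[f64], b: &[f64], m: usize, k: usize, n: usize) {
    for j in 0..n {
        for p in 0..k {
            let bpj = b[p + j * k];
            for i in 0..m {
                c[i + j * m] += a[i + p * m] * bpj;
            }
        }
    }
}

fn main() {
    // [[1, 2], [3, 4]] * [[5, 6], [7, 8]] = [[19, 22], [43, 50]]
    let a = [1.0, 3.0, 2.0, 4.0]; // column-major
    let b = [5.0, 7.0, 6.0, 8.0];
    let mut c = zeros(2, 2);
    matmul_naive(&mut c, &a, &b, 2, 2, 2);
    assert_eq!(c, [19.0, 43.0, 22.0, 50.0]); // column-major result
    println!("{c:?}");
}
```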
Is your feature request related to a problem? Please describe.
It makes sense to have the macros col! and row! in order to be consistent with mat!. This also seems like a good opportunity to include a related macro (or function) blockwise!:
blockwise![[A, B], [C, D]];
// ┌───────┬───┐
// │       │   │
// │   A   │ B │
// │       │   │
// ├───────┼───┤
// │   C   │ D │
// └───────┴───┘
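What such a macro could desugar to can be sketched as a function assembling a grid of blocks; the toy row-major Vec<Vec<f64>> blocks and the function name are assumptions for illustration, not faer types:

```rust
// Assemble [[A, B], [C, D]] from a 2D grid of blocks by concatenating
// each block row's rows side by side, then stacking the block rows.
fn blockwise(grid: &[Vec<Vec<Vec<f64>>>]) -> Vec<Vec<f64>> {
    let mut out = Vec::new();
    for block_row in grid {
        let nrows = block_row[0].len();
        for i in 0..nrows {
            let mut row = Vec::new();
            for block in block_row {
                // All blocks in a block row must share a row count
                assert_eq!(block.len(), nrows);
                row.extend_from_slice(&block[i]);
            }
            out.push(row);
        }
    }
    out
}

fn main() {
    let a = vec![vec![1.0, 2.0], vec![3.0, 4.0]]; // 2x2
    let b = vec![vec![5.0], vec![6.0]]; // 2x1
    let c = vec![vec![7.0, 8.0]]; // 1x2
    let d = vec![vec![9.0]]; // 1x1
    let m = blockwise(&[vec![a, b], vec![c, d]]);
    assert_eq!(
        m,
        vec![
            vec![1.0, 2.0, 5.0],
            vec![3.0, 4.0, 6.0],
            vec![7.0, 8.0, 9.0],
        ]
    );
    println!("{m:?}");
}
```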
Describe the bug
Running the faer-bench benchmark on a 2x AMD EPYC 7V13 64-Core Processor is surprisingly slow, both in absolute numbers and in the speedup of faer(par) over faer(seq). This does not seem to be the case on other, larger AMD server CPUs.
To Reproduce
Just to keep track for myself:
CXXFLAGS="-I/u/drehwald/prog" CXX=g++ cargo +nightly run --release --no-default-features --features faer
Expected behavior
Well, don't be slow.
Desktop (please complete the following information):
➜ ~ g++ --version
g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
➜ ~ cargo +nightly --version
cargo 1.71.0-nightly (d0a4cbcee 2023-04-16)
Additional context
Our admin just got back to me; the university admins probably won't adjust the perf settings for us, since the machine is too busy and it would be a performance risk. But I got access to two other AMD machines, maybe we can use those for pinning the issue down.
Is your feature request related to a problem? Please describe.
This is more of a maintenance task to make it obvious and easy to see what changes have happened over the life of the project.
Describe the solution you'd like
I'm thinking of using an action to generate the Changelog on releases.
Describe alternatives you've considered
Considered GitHub Actions and some other options for doing it. Decided to stick with native GitHub tooling.
Feel free to assign to me, as I think it would be fun to do.
Will this repository include topics such as Geometric Algebra, such as G^n? This would be something I would be interested in helping contribute to.
Is your feature request related to a problem? Please describe.
Matrix exponentiation utilizing the standard double-precision algorithm by Al-Mohy and Higham.
Describe the solution you'd like
A simple function that takes in a reference or matrix view (of some sort; I am not familiar with this library just yet) and yields a Result<_, _> containing the exponentiated matrix if it worked and an error if not.
Describe alternatives you've considered
n/a
Additional context
I have implemented this function for ndarray-linalg, but the maintainers have taken months and there is still no word. I am using my implementation for research purposes; using ndarray I see an error one-norm of about 20x scipy's at 200x200 dense matrices, and an error of machine precision for 1-sparse matrices up to 1024x1024. It might take some time, but I'd be willing to port it over to this library to help move it along, and I would like to hear how this should best be done.
Specifying or documenting the MSRV somewhere would be helpful for consumers of faer.
This was discussed on Discord.
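One common way to do this (a suggestion, not current faer policy) is Cargo's rust-version manifest field, which both documents the MSRV and makes Cargo refuse to build on an older toolchain:

```toml
[package]
# ...
rust-version = "1.67"  # hypothetical MSRV, for illustration only
```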
I saw square matrix sizes up to 1k in the benchmark section. Is it in scope of the project to maintain and optimize larger (50-200k) matrix sizes for multiplication (or other operations)?
You do not support no_std, right?
The top line of each of the wiki's examples has using faer_core::mat; or something like that. I think it should be something like use faer::core::mat; and so forth.
Hello,
Just out of curiosity, are you planning on adding support for linear programming that could run on top of the algebraic operations provided by the project? Or do you think that such a thing would be better provided by third parties using faer-rs?
Thanks
I ran cargo run in the faer-bench dir and the result was a panic.
The system is an Arm macOS machine with the newest nightly.
leifeld@MacStudovonDirk faer-bench % cargo +nightly run --release --no-default-features
Finished release [optimized] target(s) in 0.08s
Running target/release/faer-bench
f32
Multiplication of two square matrices of dimension n.
n faer faer(par) ndarray nalgebra eigen
4 33ns 32ns 142ns - -
8 57ns 60ns 122ns - -
16 206ns 211ns 303ns - -
32 1.1µs 1.1µs 996ns - -
64 8.1µs 8.1µs 6.7µs - -
96 24.8µs 30.9µs 1.3ms - -
128 59.8µs 39.1µs 1.2ms - -
192 195.1µs 93.3µs 1.7ms - -
256 479.7µs 202.1µs 1.9ms - -
384 1.6ms 448.6µs 2.3ms - -
512 3.7ms 1ms 2.6ms - -
640 7.3ms 1.7ms 3ms - -
768 12.5ms 3ms 5.1ms - -
896 19.7ms 4.4ms 6.8ms - -
1024 29.8ms 6.7ms 9.2ms - -
Solving AX = B in place where A and B are two square matrices of dimension n, and A is a triangular matrix.
n faer faer(par) ndarray nalgebra eigen
4 14ns 14ns 26.3µs - -
8 90ns 90ns 27µs - -
16 378ns 380ns 29.9µs - -
32 1.4µs 1.4µs 35.9µs - -
64 7.2µs 7.2µs 54.1µs - -
96 20.4µs 27.2µs 93.5µs - -
128 44.9µs 40.6µs 131.9µs - -
192 131.7µs 117.5µs 290.3µs - -
256 301.3µs 152.8µs 487.1µs - -
384 913.6µs 450.8µs 996.1µs - -
512 2.2ms 966µs 2.6ms - -
640 4.1ms 1.8ms 2.7ms - -
768 6.7ms 2.6ms 3.9ms - -
896 10.8ms 3.6ms 5.3ms - -
1024 17.7ms 5.5ms 12.6ms - -
Computing A^-1 where A is a square triangular matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 160ns 9.3µs 26.3µs - -
8 468ns 12µs 27.1µs - -
16 1.2µs 16.4µs 29.9µs - -
32 3.4µs 24.1µs 35.9µs - -
64 9.9µs 31.6µs 54.1µs - -
96 23.2µs 39.9µs 89.1µs - -
128 33.9µs 49.9µs 133.3µs - -
192 86.4µs 94.7µs 310.4µs - -
256 153.1µs 143.9µs 507.7µs - -
384 421.7µs 281.3µs 1ms - -
512 907.5µs 492.6µs 2.7ms - -
640 1.6ms 816.2µs 2.7ms - -
768 2.6ms 1ms 3.9ms - -
896 4.1ms 1.5ms 5.4ms - -
1024 6.8ms 2.4ms 12.5ms - -
Factorizing a square matrix with dimension n as L×L.T, where L is lower triangular.
n faer faer(par) ndarray nalgebra eigen
4 43ns 43ns 120ns - -
8 154ns 154ns 354ns - -
16 767ns 766ns 1.2µs - -
32 2.1µs 2.1µs 4.8µs - -
64 6.3µs 6.3µs 15µs - -
96 15.8µs 15.8µs 34.2µs - -
128 22.2µs 22.3µs 428.7µs - -
192 59.7µs 87µs 12.5ms - -
256 106.1µs 126.4µs 3.2ms - -
384 292.4µs 331.5µs 5.3ms - -
512 651.6µs 483.6µs 7ms - -
640 1.1ms 1.1ms 11.4ms - -
768 2ms 1.5ms 19.6ms - -
896 3ms 2ms 1.82s - -
1024 4.7ms 2.5ms 22.3ms - -
Factorizing a square matrix with dimension n as P×L×U, where P is a permutation matrix, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 92ns 89ns 154ns - -
8 263ns 238ns 456ns - -
16 895ns 905ns 2µs - -
32 2.9µs 2.9µs 7µs - -
64 11.9µs 12.2µs 22.8µs - -
96 27.2µs 28.2µs 46.3µs - -
128 54µs 53.7µs 88.3µs - -
192 141.9µs 215.8µs 206.9µs - -
256 285.8µs 384.5µs 749.8µs - -
384 801.3µs 919.1µs 1.6ms - -
512 1.8ms 1.7ms 2.4ms - -
640 3.2ms 3.1ms 4.2ms - -
768 5.3ms 4.2ms 5.8ms - -
896 8.2ms 5.8ms 7.7ms - -
1024 12.8ms 7.9ms 9.8ms - -
Factorizing a square matrix with dimension n as P×L×U×Q.T, where P and Q are permutation matrices, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 160ns 160ns - - -
8 388ns 403ns - - -
16 1.5µs 1.5µs - - -
32 7.9µs 7.8µs - - -
64 53.4µs 53µs - - -
96 174.2µs 173.9µs - - -
128 417.9µs 417µs - - -
192 1.5ms 1.5ms - - -
256 3.5ms 3.5ms - - -
384 11.8ms 11.3ms - - -
512 28.9ms 21.8ms - - -
640 55.6ms 33.2ms - - -
768 98ms 49.2ms - - -
896 158.7ms 72.1ms - - -
1024 242.3ms 103.1ms - - -
Factorizing a square matrix with dimension n as QR, where Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 102ns 100ns 729ns - -
8 294ns 293ns 1.5µs - -
16 1.1µs 1.2µs 4.7µs - -
32 4.6µs 4.7µs 17.2µs - -
64 25.3µs 25.6µs 65.4µs - -
96 57.7µs 59.4µs 454.9µs - -
128 110.2µs 109.9µs 4.4ms - -
192 298.2µs 297.7µs 37.8ms - -
256 615.9µs 796.9µs 59ms - -
384 1.8ms 2.1ms 118.2ms - -
512 4ms 3.7ms 197.2ms - -
640 6.7ms 5.1ms 301.4ms - -
768 11.2ms 8ms 429.2ms - -
896 17.3ms 11.9ms 523.8ms - -
1024 25.6ms 16.5ms 632.4ms - -
Factorizing a square matrix with dimension n as QRP, where P is a permutation matrix, Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 153ns 140ns - - -
8 414ns 410ns - - -
16 1.5µs 1.6µs - - -
32 6.5µs 6.5µs - - -
64 40.7µs 66.3µs - - -
96 123.3µs 167.8µs - - -
128 266.4µs 327.5µs - - -
192 802.5µs 862.3µs - - -
256 1.8ms 2.3ms - - -
384 5.6ms 6.8ms - - -
512 12.5ms 12.8ms - - -
640 24.6ms 19.9ms - - -
768 41.3ms 28.1ms - - -
896 65.1ms 38.3ms - - -
1024 94.5ms 51.7ms - - -
Computing the inverse of a square matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 726ns 14.3µs 412ns - -
8 1.7µs 17.4µs 907ns - -
16 4.4µs 25.3µs 3.9µs - -
32 13µs 51.3µs 12.6µs - -
64 42µs 84.9µs 49.4µs - -
96 100.1µs 141µs 443.4µs - -
128 168.5µs 209.1µs 2.8ms - -
192 430µs 454.3µs 3.1ms - -
256 837.1µs 711.4µs 7.3ms - -
384 2.3ms 1.6ms 17.9ms - -
512 5.3ms 3ms 19.4ms - -
640 9.6ms 5ms 31.4ms - -
768 15.2ms 7ms 34.6ms - -
896 24ms 9.7ms 43.1ms - -
1024 37.6ms 14.3ms 50.6ms - -
Computing the SVD of a square matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 1.1µs 1.1µs 2µs - -
8 5.8µs 21.5µs 5.8µs - -
16 20.3µs 54µs 19.5µs - -
32 72µs 116.1µs 71.6µs - -
64 270.6µs 360.5µs 3.9ms - -
96 664.6µs 971.7µs 6.7ms - -
128 1.3ms 1.7ms 44.4ms - -
192 3.5ms 5.1ms 108.8ms - -
256 7.3ms 9.3ms 173.6ms - -
384 21.4ms 20.8ms 310.9ms - -
512 47.5ms 35.9ms 466.5ms - -
640 85.2ms 54.2ms 616.3ms - -
768 143ms 80ms 830.5ms - -
896 225.4ms 115.2ms 1.07s - -
1024 327.3ms 160.4ms 1.23s - -
Computing the SVD of a rectangular matrix with shape (4096, n).
n faer faer(par) ndarray nalgebra eigen
4 32.8µs 32.4µs 644.1µs - -
8 112µs 150.6µs 1.4ms - -
16 318.9µs 456.7µs 8.4ms - -
32 947.4µs 1.2ms 15.5ms - -
64 2.9ms 3.5ms 45.5ms - -
96 5.3ms 5.8ms 61ms - -
128 8.9ms 9ms 100.3ms - -
192 18.7ms 17.5ms 248.2ms - -
256 33.7ms 29.3ms 380.9ms - -
384 75.2ms 54.5ms 657.6ms - -
512 138.8ms 94ms 1.15s - -
640 224.2ms 136.4ms 1.63s - -
768 336.6ms 182.1ms 1.97s - -
896 478.8ms 240.6ms 2.45s - -
1024 656ms 327.2ms 2.68s - -
Computing the EVD of a hermitian matrix with shape (n, n).
n faer faer(par) ndarray nalgebra eigen
4 992ns 890ns 1µs - -
8 2.5µs 2.4µs 3.5µs - -
16 9.2µs 8.7µs 12.1µs - -
32 36.5µs 34.8µs 55.5µs - -
64 161.4µs 194.4µs 421.2µs - -
96 391.3µs 499.3µs 1.1ms - -
128 725.3µs 841.1µs 4.3ms - -
192 1.9ms 2.2ms 22.6ms - -
256 3.7ms 4ms 45.9ms - -
384 10.7ms 12.4ms 114.5ms - -
512 23.2ms 22.7ms 206.6ms - -
640 41.9ms 34.5ms 330.7ms - -
768 69.5ms 51ms 471ms - -
896 107.4ms 70.9ms 657ms - -
1024 156.6ms 92.4ms 888.3ms - -
Computing the EVD of a matrix with shape (n, n).
n faer faer(par) ndarray nalgebra eigen
4 3.9µs 4.2µs 2.4µs - -
8 11.6µs 13µs 6.9µs - -
16 43.7µs 42.8µs 27.4µs - -
32 174.2µs 188.9µs 123.3µs - -
64 806.5µs 834.3µs 658.5µs - -
96 1.9ms 2ms 4.4ms - -
128 3.2ms 3.4ms 31.6ms - -
192 8.5ms 9ms 100.3ms - -
256 16.4ms 17ms 228.4ms - -
384 38.3ms 47.9ms 592.5ms - -
512 82.8ms 96.1ms 798.3ms - -
640 146.3ms 164.9ms 1.12s - -
768 230.3ms 243.3ms 1.46s - -
896 340.3ms 341.4ms 1.69s - -
1024 540.2ms 539.9ms 2.45s - -
f64
Multiplication of two square matrices of dimension n.
n faer faer(par) ndarray nalgebra eigen
4 32ns 32ns 128ns - -
8 80ns 84ns 171ns - -
16 311ns 315ns 345ns - -
32 2.1µs 2.2µs 1.8µs - -
64 14.8µs 14.8µs 12.1µs - -
96 50.3µs 41.3µs 1.2ms - -
128 124.1µs 69.5µs 1.4ms - -
192 392.9µs 173.1µs 1.5ms - -
256 943.6µs 315.2µs 1.3ms - -
384 3.2ms 889.3µs 2.3ms - -
512 7.5ms 1.9ms 3.7ms - -
640 14.5ms 3.4ms 4.9ms - -
768 25.7ms 5.8ms 8.5ms - -
896 30ms 7.3ms 13.1ms - -
1024 45.1ms 11ms 17.4ms - -
Solving AX = B in place where A and B are two square matrices of dimension n, and A is a triangular matrix.
n faer faer(par) ndarray nalgebra eigen
4 14ns 14ns 27.9µs - -
8 89ns 90ns 27.7µs - -
16 429ns 431ns 29.8µs - -
32 1.9µs 1.9µs 35.5µs - -
64 11.3µs 11.3µs 54µs - -
96 33.9µs 35.3µs 81.9µs - -
128 75.4µs 63.5µs 126µs - -
192 235µs 149.1µs 264.2µs - -
256 555.4µs 254.1µs 759.9µs - -
384 1.7ms 841.4µs 1.2ms - -
512 4.5ms 1.6ms 3.5ms - -
640 7.8ms 2.8ms 3.4ms - -
768 13.1ms 4.5ms 7.3ms - -
896 20.9ms 6.5ms 7ms - -
1024 35.4ms 10ms 18.3ms - -
Computing A^-1 where A is a square triangular matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 156ns 12.4µs 28µs - -
8 479ns 11.4µs 27.7µs - -
16 1.3µs 17µs 29.9µs - -
32 3.7µs 24.1µs 34.2µs - -
64 11.4µs 33µs 52.2µs - -
96 28.8µs 45.4µs 212.9µs - -
128 45.8µs 65.2µs 334.2µs - -
192 124.9µs 130.4µs 405.3µs - -
256 247µs 226µs 765µs - -
384 712.3µs 395.6µs 1.2ms - -
512 1.8ms 736.7µs 3.4ms - -
640 2.9ms 1.3ms 3.3ms - -
768 4.8ms 2ms 7.2ms - -
896 7.6ms 2.6ms 7.1ms - -
1024 13.5ms 3.9ms 18ms - -
Factorizing a square matrix with dimension n as L×L.T, where L is lower triangular.
n faer faer(par) ndarray nalgebra eigen
4 46ns 46ns 121ns - -
8 146ns 145ns 393ns - -
16 631ns 625ns 1.3µs - -
32 2µs 2µs 5.1µs - -
64 7.3µs 7.3µs 187µs - -
96 18.3µs 18.3µs 1.4ms - -
128 32.2µs 32.4µs 1.3ms - -
192 88.2µs 100.1µs 6ms - -
256 186.8µs 188.5µs 4.8ms - -
384 520.7µs 465µs 8.8ms - -
512 1.3ms 935.6µs 10.4ms - -
640 2.2ms 1.6ms 12.7ms - -
768 3.8ms 2.2ms 20.5ms - -
896 5.7ms 3ms 24.7ms - -
1024 9.5ms 4.6ms 26.2ms - -
Factorizing a square matrix with dimension n as P×L×U, where P is a permutation matrix, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 76ns 93ns 150ns - -
8 225ns 213ns 493ns - -
16 740ns 771ns 2.2µs - -
32 3µs 2.9µs 6.6µs - -
64 14.7µs 14.8µs 21.1µs - -
96 37.7µs 37.7µs 46.5µs - -
128 75.7µs 76.2µs 302.8µs - -
192 218.3µs 276.5µs 620.6µs - -
256 474.4µs 524µs 810.8µs - -
384 1.4ms 1.3ms 1.5ms - -
512 3.4ms 2.6ms 3ms - -
640 5.9ms 4.4ms 4.6ms - -
768 9.9ms 6.2ms 6.5ms - -
896 15.5ms 8.5ms 8.7ms - -
1024 24.8ms 12.8ms 11.3ms - -
Factorizing a square matrix with dimension n as P×L×U×Q.T, where P and Q are permutation matrices, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 119ns 132ns - - -
8 371ns 369ns - - -
16 1.4µs 1.4µs - - -
32 7.7µs 7.7µs - - -
64 53.4µs 53.4µs - - -
96 185.1µs 184.7µs - - -
128 460.1µs 459µs - - -
192 1.7ms 1.7ms - - -
256 4.3ms 4.3ms - - -
384 14.8ms 14ms - - -
512 38.2ms 25.8ms - - -
640 73.1ms 38ms - - -
768 128.8ms 56.5ms - - -
896 207.4ms 80.4ms - - -
1024 306.1ms 119.3ms - - -
Factorizing a square matrix with dimension n as QR, where Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 106ns 106ns 658ns - -
8 295ns 296ns 1.6ยตs - -
16 1µs 1µs 4.7µs - -
32 5µs 5µs 18µs - -
64 31.9µs 31.8µs 75.6µs - -
96 79.7µs 79.7µs 526.7µs - -
128 158.8µs 157.9µs 4.7ms - -
192 451.3µs 450.1µs 49.9ms - -
256 963.8µs 1.1ms 72.8ms - -
384 2.9ms 2.9ms 148.4ms - -
512 6.6ms 5.4ms 291.2ms - -
640 12.4ms 8.9ms 337.5ms - -
768 20.8ms 13.7ms 413.4ms - -
896 31.6ms 20.7ms 546.6ms - -
1024 45.6ms 28.3ms 734.9ms - -
Factorizing a square matrix with dimension n as QRP, where P is a permutation matrix, Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 175ns 162ns - - -
8 422ns 436ns - - -
16 1.6µs 1.6µs - - -
32 8µs 7.9µs - - -
64 58.7µs 93.9µs - - -
96 169µs 213.4µs - - -
128 358.2µs 422.9µs - - -
192 1.2ms 1.2ms - - -
256 2.7ms 3ms - - -
384 8.8ms 8.5ms - - -
512 20.5ms 16.5ms - - -
640 39.9ms 27.4ms - - -
768 67.8ms 43.4ms - - -
896 107.9ms 59.4ms - - -
1024 159.2ms 77ms - - -
Computing the inverse of a square matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 715ns 12.4µs 406ns - -
8 1.6µs 16.7µs 991ns - -
16 4.2µs 25.4µs 4.1µs - -
32 13.6µs 53.5µs 12.8µs - -
64 51.3µs 96.7µs 56.2µs - -
96 130µs 169µs 450.3µs - -
128 240.2µs 260.5µs 3.3ms - -
192 659.7µs 589.3µs 6.1ms - -
256 1.4ms 1.1ms 7.2ms - -
384 4.1ms 2.4ms 15.8ms - -
512 9.8ms 4.5ms 21.3ms - -
640 17.3ms 7.5ms 28.9ms - -
768 28.7ms 10.8ms 40ms - -
896 44.9ms 15.3ms 52.6ms - -
1024 74.2ms 23.2ms 70.7ms - -
Computing the SVD of a square matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 1.4µs 1.3µs 2.4µs - -
8 7.4µs 23.1µs 7.4µs - -
16 23.3µs 56.8µs 24.9µs - -
32 85.2µs 130.1µs 86.7µs - -
64 374µs 464.2µs 4.1ms - -
96 953µs 1.3ms 7ms - -
128 1.9ms 2.2ms 51.7ms - -
192 5.3ms 6.3ms 135.2ms - -
256 11.4ms 11ms 204.5ms - -
384 34.1ms 27.3ms 312.2ms - -
512 77.3ms 48.9ms 516ms - -
640 144.7ms 78.2ms 688.1ms - -
768 248.5ms 119.6ms 912.3ms - -
896 384.6ms 176.2ms 1.16s - -
1024 560.5ms 249.6ms 1.64s - -
Computing the SVD of a rectangular matrix with shape (4096, n).
n faer faer(par) ndarray nalgebra eigen
4 43.6µs 43.7µs 578.8µs - -
8 165.7µs 180.3µs 1.5ms - -
16 471.3µs 573.1µs 7.8ms - -
32 1.4ms 1.5ms 15.2ms - -
64 4.6ms 4.5ms 58.3ms - -
96 9.3ms 8.2ms 62.8ms - -
128 16.1ms 12.8ms 103.5ms - -
192 34ms 24.8ms 237.3ms - -
256 60.9ms 41.3ms 383.9ms - -
384 134.4ms 83.4ms 663.6ms - -
512 248ms 142.1ms 1.05s - -
640 402.7ms 204.7ms 1.52s - -
768 616.6ms 298.7ms 1.81s - -
896 874.1ms 399.3ms 2.44s - -
1024 1.19s 532.8ms 2.87s - -
Computing the EVD of a hermitian matrix with shape (n, n).
n faer faer(par) ndarray nalgebra eigen
4 1.1µs 1µs 1.2µs - -
8 3µs 3.6µs 4.2µs - -
16 12µs 11.8µs 15.5µs - -
32 50.4µs 53.6µs 68µs - -
64 224.6µs 245.4µs 437.8µs - -
96 519.9µs 615.6µs 1.4ms - -
128 989.7µs 1.1ms 6ms - -
192 2.5ms 2.8ms 25.6ms - -
256 5.1ms 4.8ms 51.1ms - -
384 14.5ms 14.8ms 122.3ms - -
512 31.6ms 27.8ms 232.1ms - -
640 58.5ms 44.6ms 370.5ms - -
768 97ms 65.1ms 554ms - -
896 148.8ms 92.1ms 769.1ms - -
1024 218.1ms 122.4ms 1.09s - -
Computing the EVD of a matrix with shape (n, n).
n faer faer(par) ndarray nalgebra eigen
4 4.2µs 4.3µs 2.5µs - -
8 14.7µs 14µs 8.6µs - -
16 51.5µs 49µs 31.3µs - -
32 228.1µs 225.8µs 146.1µs - -
64 1ms 1ms 822.7µs - -
96 2.8ms 3ms 5.1ms - -
128 5.2ms 5.2ms 25.1ms - -
192 14.2ms 14.4ms 121.7ms - -
256 26.9ms 26.7ms 312.8ms - -
384 69.1ms 78.5ms 624.7ms - -
512 159.2ms 177.8ms 1.02s - -
640 256.7ms 259.4ms 1.17s - -
768 407.6ms 382.6ms 1.69s - -
896 588.6ms 531.4ms 2.03s - -
1024 1.03s 931.6ms 2.75s - -
f128
Multiplication of two square matrices of dimension n.
n faer faer(par) ndarray nalgebra eigen
4 150ns 151ns - - -
8 1.3µs 16.2µs - - -
16 8.1µs 25.4µs - - -
32 60.4µs 40.6µs - - -
64 474.6µs 162.2µs - - -
96 1.6ms 519.4µs - - -
128 3.8ms 961.5µs - - -
192 12.8ms 2.6ms - - -
256 30.7ms 5.5ms - - -
384 103.2ms 16.5ms - - -
512 254.1ms 39.6ms - - -
640 481.4ms 70.9ms - - -
768 866.7ms 133.5ms - - -
896 1.34s 185ms - - -
1024 2.25s 311.7ms - - -
Solving AX = B in place where A and B are two square matrices of dimension n, and A is a triangular matrix.
n faer faer(par) ndarray nalgebra eigen
4 73ns 73ns - - -
8 670ns 669ns - - -
16 5.1µs 24.7µs - - -
32 37.8µs 179.7µs - - -
64 269.6µs 471.8µs - - -
96 892.4µs 568.2µs - - -
128 2.1ms 926.1µs - - -
192 6.9ms 2.3ms - - -
256 16ms 4.4ms - - -
384 53.9ms 12ms - - -
512 126.2ms 25.5ms - - -
640 243.8ms 47.3ms - - -
768 419.6ms 71.8ms - - -
896 662.1ms 118.7ms - - -
1024 1.03s 170.5ms - - -
Computing A^-1 where A is a square triangular matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 271ns 16µs - - -
8 965ns 13.8µs - - -
16 4.2µs 35.5µs - - -
32 22µs 85.3µs - - -
64 132.6µs 330.6µs - - -
96 389µs 566.4µs - - -
128 862.3µs 1ms - - -
192 2.7ms 2ms - - -
256 6.1ms 3ms - - -
384 19.6ms 7.6ms - - -
512 46ms 13ms - - -
640 96.4ms 25.3ms - - -
768 155ms 35.2ms - - -
896 235.6ms 54ms - - -
1024 349.3ms 75.2ms - - -
Factorizing a square matrix with dimension n as L×L.T, where L is lower triangular.
n faer faer(par) ndarray nalgebra eigen
4 289ns 286ns - - -
8 692ns 696ns - - -
16 2.4µs 2.4µs - - -
32 18.9µs 58.2µs - - -
64 128.9µs 393.8µs - - -
96 368.2µs 844µs - - -
128 858.1µs 1.6ms - - -
192 2.7ms 3.6ms - - -
256 6.1ms 5.2ms - - -
384 20.1ms 12.8ms - - -
512 45.8ms 19ms - - -
640 88.4ms 36.8ms - - -
768 147.9ms 50.1ms - - -
896 231.4ms 69.7ms - - -
1024 348.7ms 89.7ms - - -
Factorizing a square matrix with dimension n as P×L×U, where P is a permutation matrix, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 214ns 186ns - - -
8 580ns 596ns - - -
16 2.4µs 2.5µs - - -
32 21.7µs 64.4µs - - -
64 173.7µs 369.3µs - - -
96 581.2µs 889.6µs - - -
128 1.3ms 1.5ms - - -
192 4.8ms 3.7ms - - -
256 10.9ms 6.4ms - - -
384 36.4ms 16.3ms - - -
512 85.5ms 33.5ms - - -
640 165.6ms 54ms - - -
768 284.5ms 80.2ms - - -
896 450.1ms 125ms - - -
1024 684.1ms 173.6ms - - -
Factorizing a square matrix with dimension n as P×L×U×Q.T, where P and Q are permutation matrices, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 310ns 314ns - - -
8 1.2µs 1.1µs - - -
16 5.4µs 5.5µs - - -
32 35.5µs 35.6µs - - -
64 255.5µs 256µs - - -
96 843.7µs 840.9µs - - -
128 2ms 2ms - - -
192 6.7ms 6.7ms - - -
256 16.2ms 16.2ms - - -
384 52.2ms 47.8ms - - -
512 125.9ms 76.2ms - - -
640 241.1ms 114.2ms - - -
768 417.1ms 166.8ms - - -
896 661.6ms 232.6ms - - -
1024 996.3ms 326.2ms - - -
Factorizing a square matrix with dimension n as QR, where Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 698ns 699ns - - -
8 2.3µs 2.3µs - - -
16 10.9µs 10.7µs - - -
32 56.8µs 56.6µs - - -
64 519.8µs 518.4µs - - -
96 1.5ms 1.5ms - - -
128 3.4ms 3.5ms - - -
192 10.7ms 10.8ms - - -
256 24.7ms 18.5ms - - -
384 80.8ms 52ms - - -
512 198.4ms 97.7ms - - -
640 368.2ms 146.3ms - - -
768 624ms 219.9ms - - -
896 980.1ms 318.4ms - - -
1024 1.46s 422.8ms - - -
Factorizing a square matrix with dimension n as QRP, where P is a permutation matrix, Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 947ns 939ns - - -
8 3.6µs 3.6µs - - -
16 16.8µs 16.8µs - - -
32 97µs 97.4µs - - -
64 684.6µs 714.8µs - - -
96 2ms 1.9ms - - -
128 4.4ms 4.2ms - - -
192 13.7ms 13.1ms - - -
256 30.7ms 25.1ms - - -
384 99.1ms 49.1ms - - -
512 231.4ms 90ms - - -
640 449.1ms 151.6ms - - -
768 764.5ms 230.5ms - - -
896 1.21s 332.2ms - - -
1024 1.79s 462ms - - -
Computing the inverse of a square matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 1.4µs 15.1µs - - -
8 4.5µs 36.1µs - - -
16 20.4µs 63.4µs - - -
32 110.3µs 196.8µs - - -
64 694.5µs 777.2µs - - -
96 2.1ms 1.7ms - - -
128 4.7ms 3.1ms - - -
192 15.1ms 7.8ms - - -
256 35ms 13.6ms - - -
384 117.1ms 35.4ms - - -
512 267.1ms 64.2ms - - -
640 524.5ms 120.4ms - - -
768 899.5ms 179.5ms - - -
896 1.4s 273.8ms - - -
1024 2.17s 405.6ms - - -
Computing the SVD of a square matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 11.6µs 11.7µs - - -
8 60.4µs 96.7µs - - -
16 259.8µs 333.2µs - - -
32 1.1ms 1.1ms - - -
64 5.9ms 5.6ms - - -
96 16.4ms 12.4ms - - -
128 31.5ms 22.2ms - - -
192 89.6ms 51ms - - -
256 194.7ms 89.3ms - - -
384 581.4ms 243.3ms - - -
512 1.32s 464.9ms - - -
640 2.48s 758ms - - -
768 4.13s 1.16s - - -
896 6.45s 1.72s - - -
1024 9.51s 2.27s - - -
Computing the SVD of a rectangular matrix with shape (4096, n).
n faer faer(par) ndarray nalgebra eigen
4 1.1ms 2.1ms - - -
8 3.6ms 4.6ms - - -
16 10.2ms 13.2ms - - -
32 32.3ms 33.1ms - - -
64 120.3ms 86.5ms - - -
96 253.1ms 142.4ms - - -
128 444.6ms 216.4ms - - -
192 996.1ms 399.4ms - - -
256 1.8s 645.6ms - - -
384 4.05s 1.24s - - -
512 7.37s 2.1s - - -
640 11.67s 3.04s - - -
768 17.43s 4.6s - - -
896 23.81s 6.02s - - -
1024 32.59s 7.85s - - -
Computing the EVD of a hermitian matrix with shape (n, n).
n faer faer(par) ndarray nalgebra eigen
4 5.4µs 5.8µs - - -
8 21.9µs 25.7µs - - -
16 102µs 99.5µs - - -
32 515.3µs 497.8µs - - -
64 3.1ms 3.1ms - - -
96 8.6ms 7.2ms - - -
128 17.5ms 13.1ms - - -
192 51.1ms 33.1ms - - -
256 108.3ms 65.1ms - - -
384 339.3ms 152.3ms - - -
512 745.4ms 283.9ms - - -
640 1.4s 457.2ms - - -
768 2.36s 702.8ms - - -
896 3.72s 1.03s - - -
1024 5.45s 1.37s - - -
Computing the EVD of a matrix with shape (n, n).
n faer faer(par) ndarray nalgebra eigen
4 16.2µs 14.7µs - - -
8 68.3µs 101.5µs - - -
16 349.1µs 388µs - - -
32 1.8ms 1.8ms - - -
64 11.4ms 12.8ms - - -
96 46.5ms 60.2ms - - -
128 92.9ms 104.8ms - - -
192 299.1ms 221.5ms - - -
256 593.9ms 418.8ms - - -
384 1.61s 978.8ms - - -
512 3.46s 2s - - -
640 6.26s 3.63s - - -
768 10.17s 5.24s - - -
896 15.31s 7.25s - - -
1024 22.59s 9.99s - - -
c32
Multiplication of two square matrices of dimension n.
n faer faer(par) ndarray nalgebra eigen
4 61ns 61ns 97ns - -
8 317ns 319ns 208ns - -
16 2µs 2µs 659ns - -
32 14.1µs 14.1µs 4µs - -
64 107.4µs 107.4µs 837.1µs - -
96 356µs 123.6µs 1.1ms - -
128 834.9µs 285.1µs 1.3ms - -
192 2.8ms 675.1µs 1.5ms - -
256 6.6ms 1.4ms 2ms - -
384 22.3ms 4.6ms 3.1ms - -
512 52.6ms 10.6ms 6.7ms - -
640 103.6ms 19.3ms 9.9ms - -
768 178.3ms 32.5ms 16.9ms - -
896 282.8ms 50.6ms 23.1ms - -
1024 422.1ms 74.6ms 33.5ms - -
Solving AX = B in place where A and B are two square matrices of dimension n, and A is a triangular matrix.
n faer faer(par) ndarray nalgebra eigen
4 22ns 22ns 26.8µs - -
8 203ns 202ns 27.6µs - -
16 1.3µs 1.3µs 30.5µs - -
32 8.8µs 8.8µs 40.6µs - -
64 62.7µs 62.7µs 59.5µs - -
96 206µs 132µs 332.9µs - -
128 466.5µs 272µs 432.2µs - -
192 1.5ms 558.7µs 602.8µs - -
256 3.5ms 1.1ms 919.3µs - -
384 11.7ms 2.9ms 1.8ms - -
512 27.4ms 6.2ms 4.5ms - -
640 53.3ms 11.7ms 5ms - -
768 91.6ms 19.5ms 10.2ms - -
896 144.6ms 29.4ms 12ms - -
1024 215.7ms 43.6ms 25.1ms - -
Computing A^-1 where A is a square triangular matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 186ns 15.9µs 27.5µs - -
8 534ns 12.7µs 27.6µs - -
16 1.8µs 16.5µs 30.5µs - -
32 7.1µs 27.2µs 40.7µs - -
64 35.2µs 56.5µs 59.6µs - -
96 101.2µs 146.3µs 85.4µs - -
128 208.8µs 251.4µs 454.6µs - -
192 636µs 576.1µs 597.7µs - -
256 1.4ms 1ms 1.1ms - -
384 4.4ms 1.8ms 1.8ms - -
512 10.1ms 3.1ms 4.5ms - -
640 19.3ms 5.3ms 5ms - -
768 32.8ms 7.9ms 10.2ms - -
896 51.7ms 11.2ms 12ms - -
1024 76.5ms 16.3ms 25ms - -
Factorizing a square matrix with dimension n as L×L.T, where L is lower triangular.
n faer faer(par) ndarray nalgebra eigen
4 45ns 45ns 124ns - -
8 172ns 172ns 397ns - -
16 925ns 925ns 1.3µs - -
32 5.1µs 5.2µs 6.6µs - -
64 30.3µs 30.4µs 176.3µs - -
96 85.9µs 86µs 1.7ms - -
128 192.3µs 192.3µs 1.5ms - -
192 583.1µs 541.4µs 4ms - -
256 1.3ms 934.9µs 5ms - -
384 4.2ms 2.3ms 10ms - -
512 9.8ms 3.8ms 11.5ms - -
640 18.9ms 6.5ms 26.6ms - -
768 32.2ms 9.9ms 20.1ms - -
896 50.8ms 13.8ms 18.9ms - -
1024 76.1ms 19.1ms 28.1ms - -
Factorizing a square matrix with dimension n as P×L×U, where P is a permutation matrix, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 105ns 91ns 176ns - -
8 266ns 273ns 566ns - -
16 1µs 1.1µs 2.3µs - -
32 6.5µs 6.6µs 7.8µs - -
64 45.5µs 45.4µs 30.9µs - -
96 145.9µs 145.5µs 74µs - -
128 327.9µs 326.5µs 360.5µs - -
192 1.1ms 863.8µs 756µs - -
256 2.4ms 1.7ms 1ms - -
384 8ms 4.2ms 2.6ms - -
512 18.7ms 8ms 4.7ms - -
640 36.3ms 13.9ms 7.6ms - -
768 62.6ms 20.9ms 10.8ms - -
896 98.6ms 29.8ms 14.7ms - -
1024 146.5ms 43ms 24ms - -
Factorizing a square matrix with dimension n as P×L×U×Q.T, where P and Q are permutation matrices, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 191ns 185ns - - -
8 573ns 516ns - - -
16 2.2µs 2.3µs - - -
32 12.4µs 12.3µs - - -
64 82.7µs 82.4µs - - -
96 262.9µs 262.7µs - - -
128 609.8µs 609.7µs - - -
192 2ms 2ms - - -
256 4.9ms 4.9ms - - -
384 16.4ms 15.6ms - - -
512 40.7ms 29.4ms - - -
640 77.3ms 41ms - - -
768 134.2ms 63.2ms - - -
896 215.1ms 99.3ms - - -
1024 316.8ms 136.3ms - - -
Factorizing a square matrix with dimension n as QR, where Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 164ns 164ns 879ns - -
8 487ns 487ns 2.1µs - -
16 2.5µs 2.5µs 6.9µs - -
32 16.1µs 16.1µs 29.1µs - -
64 132.5µs 132.5µs 1ms - -
96 377.1µs 377.3µs 5.2ms - -
128 815.1µs 815.2µs 9.7ms - -
192 2.5ms 2.5ms 47.7ms - -
256 5.7ms 4.7ms 68.9ms - -
384 18.2ms 11.3ms 132.7ms - -
512 41.9ms 21.9ms 176.6ms - -
640 78.9ms 34.2ms 266ms - -
768 134.3ms 52.8ms 350.6ms - -
896 210.9ms 77.2ms 453.1ms - -
1024 313.2ms 109.5ms 544.3ms - -
Factorizing a square matrix with dimension n as QRP, where P is a permutation matrix, Q is unitary and R is upper triangular.
thread 'main' panicked at 'cast>SizeMismatch', /Users/leifeld/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bytemuck-1.13.1/src/internal.rs:32:3
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Is your feature request related to a problem? Please describe.
Currently, none of the matrix multiplication crates support the half-precision (fp16) datatype. The half crate (https://crates.io/crates/half) provides a Rust type for it.
This means that any crate needing CPU matrix multiplication with the f16 datatype has to implement its own matrix multiplication algorithm, which can be very slow.
Describe the solution you'd like
It'd be great if faer included configurable support for these data types.
Describe alternatives you've considered
I'm currently using this super naive matmul implementation:
fn naive_gemm<F: num_traits::Float + std::ops::AddAssign, M: Dim, K: Dim, N: Dim>(
    (m, k, n): (M, K, N),
    ap: *const F,
    a_strides: [usize; 2],
    bp: *const F,
    b_strides: [usize; 2],
    cp: *mut F,
    c_strides: [usize; 2],
) {
    for i_m in 0..m.size() {
        for i_k in 0..k.size() {
            for i_n in 0..n.size() {
                unsafe {
                    let a = *ap.add(a_strides[0] * i_m + a_strides[1] * i_k);
                    let b = *bp.add(b_strides[0] * i_k + b_strides[1] * i_n);
                    let c = cp.add(c_strides[0] * i_m + c_strides[1] * i_n);
                    *c += a * b;
                }
            }
        }
    }
}
Which is extremely slow. I'm not sure what the other alternatives are.
Additional context
I'm the author of dfdx. dfdx has both CUDA/CPU support, and CUDA does have hardware acceleration for f16. It'd just be nice if f16 matmul on CPU was a bit faster!
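As a stopgap until a tuned f16 kernel exists, simply blocking the triple loop above for cache locality already recovers a good chunk of performance. Here is a self-contained sketch of the blocking idea in f32 (the block size is illustrative, not tuned; the same structure applies to f16 after upcasting each tile):

```rust
/// Cache-blocked matrix multiply `C += A * B` for row-major matrices.
/// Working one BLOCK x BLOCK tile at a time keeps the operands resident
/// in cache, unlike the naive triple loop that streams the whole matrix.
const BLOCK: usize = 32;

fn blocked_matmul(m: usize, k: usize, n: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    for i0 in (0..m).step_by(BLOCK) {
        for k0 in (0..k).step_by(BLOCK) {
            for j0 in (0..n).step_by(BLOCK) {
                for i in i0..(i0 + BLOCK).min(m) {
                    for p in k0..(k0 + BLOCK).min(k) {
                        // Hoist A(i, p) out of the innermost loop.
                        let aip = a[i * k + p];
                        for j in j0..(j0 + BLOCK).min(n) {
                            c[i * n + j] += aip * b[p * n + j];
                        }
                    }
                }
            }
        }
    }
}

fn main() {
    // 2x2 sanity check: [[1,2],[3,4]] * [[5,6],[7,8]] = [[19,22],[43,50]].
    let a = [1.0f32, 2.0, 3.0, 4.0];
    let b = [5.0f32, 6.0, 7.0, 8.0];
    let mut c = [0.0f32; 4];
    blocked_matmul(2, 2, 2, &a, &b, &mut c);
    assert_eq!(c, [19.0, 22.0, 43.0, 50.0]);
}
```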
Currently my PyO3 bindings that call some Rust routines using faer look something like:
use numpy::{IntoPyArray, PyArray2, PyReadonlyArray2};
use pyo3::{exceptions::PyRuntimeError, pymodule, types::PyModule, PyResult, Python};
use faer::prelude::*;
// ...
#[pymodule]
fn rust_lib<'py>(_py: Python<'py>, m: &'py PyModule) -> PyResult<()> {
    #[pyfn(m)]
    fn rpca<'py>(py: Python<'py>, a_py: PyReadonlyArray2<'py, f64>, n_rank: usize, n_iters: usize, n_oversamples: usize) -> &'py PyArray2<f64> {
        let a_ndarray = a_py.as_array();
        let a_faer = a_ndarray.view().into_faer();
        // ... faer-rs math
        ndarray_result.into_pyarray(py)
    }
    Ok(())
}
I'm unsure if I am using the best approach.
// ...
#[pyfn(m)]
fn rpca<'py>(py: Python<'py>, a_py: PyReadonlyArray2<'py, f64>, n_rank: usize, n_iters: usize, n_oversamples: usize) -> &'py PyArray2<f64> {
    let a_faer = a_py.into_faer();
    // ... faer-rs math
    faer_result.into_pyarray(py)
}
Is your feature request related to a problem? Please describe.
Some problems need a sparse Cholesky solver, such as GraphSLAM.
Describe the solution you'd like
Implement a Sparse Cholesky Solver
Describe alternatives you've considered
In Rust, I used Russell, which wraps SuiteSparse.
Raising this issue as a bookmark for making an initial contribution, as suggested by @sarah-ek.
It would be great if every function had a low-level API that exposes the needed workspace, so the function can run without performing any allocations.
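The requested shape could look like the following pure-Rust sketch (all names here are hypothetical, not faer's actual API): a `*_scratch` function reports the required workspace size, and the computation borrows that buffer from the caller instead of allocating internally.

```rust
/// Number of scratch elements needed to square an `n x n` matrix in place.
fn square_in_place_scratch(n: usize) -> usize {
    n * n // one temporary copy of the matrix
}

/// Computes `a = a * a` for a column-major `n x n` matrix, using
/// caller-provided scratch instead of allocating a temporary.
fn square_in_place(a: &mut [f64], n: usize, scratch: &mut [f64]) {
    assert_eq!(a.len(), n * n);
    assert!(scratch.len() >= square_in_place_scratch(n));
    let tmp = &mut scratch[..n * n];
    tmp.copy_from_slice(a);
    for j in 0..n {
        for i in 0..n {
            let mut acc = 0.0;
            for k in 0..n {
                acc += tmp[i + k * n] * tmp[k + j * n];
            }
            a[i + j * n] = acc;
        }
    }
}

fn main() {
    let n = 2;
    // [[1, 3], [2, 4]] in column-major order.
    let mut a = vec![1.0, 2.0, 3.0, 4.0];
    // The scratch buffer can be allocated once and reused across many
    // calls in a hot loop, which is the point of the request.
    let mut scratch = vec![0.0; square_in_place_scratch(n)];
    square_in_place(&mut a, n, &mut scratch);
    assert_eq!(a, vec![7.0, 10.0, 15.0, 22.0]);
}
```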
The work is impressive. Being competitive with the big guys is nothing short of amazing, considering this is a single-person show.
Although the IntoNdarray trait itself depends on the ndarray feature, all of the implementations depend on the nalgebra feature.
I'm frequently using matrix-diagonal and diagonal-diagonal multiplication (which is a component-wise vector mul). Since populating a full matrix with a diagonal leads to wasted performance, I'm using a modified copy of div_by_s that I found in the faer source.
Does it make sense to flesh out a Diagonal type?
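For reference, right-multiplying by a diagonal reduces to scaling each column, so it can be expressed without materializing the diagonal as a full matrix. A minimal column-major sketch (the helper name is made up for illustration, not faer's API):

```rust
/// Computes `A * D` in place, where `D = diag(d)`, without materializing `D`:
/// column `j` of the column-major `m x n` matrix is scaled by `d[j]`.
fn scale_cols(a: &mut [f64], m: usize, n: usize, d: &[f64]) {
    assert_eq!(a.len(), m * n);
    assert_eq!(d.len(), n);
    for j in 0..n {
        for i in 0..m {
            a[i + j * m] *= d[j];
        }
    }
}

fn main() {
    // 2x2 matrix [[1, 3], [2, 4]] in column-major order.
    let mut a = vec![1.0, 2.0, 3.0, 4.0];
    scale_cols(&mut a, 2, 2, &[10.0, 100.0]);
    // First column scaled by 10, second by 100.
    assert_eq!(a, vec![10.0, 20.0, 300.0, 400.0]);
}
```

Diagonal-diagonal multiplication is likewise just a component-wise product of the two diagonal vectors, which is another argument for a dedicated Diagonal type.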
Is your feature request related to a problem? Please describe.
It would be great to have an equivalent to DSYRK in faer, to do the operations
C = alpha * A * A^T + beta * C
and
C = alpha * A^T * A + beta * C
This is useful for example when accumulating normal equations.
Describe the solution you'd like
A symmetric rank k routine.
Describe alternatives you've considered
The operation can be done with GEMM, but is significantly slower.
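Absent a dedicated routine, the symmetry can still be exploited by computing only the lower triangle, which halves the flops relative to a full GEMM. A naive column-major sketch of C ← alpha·A·Aᵀ + beta·C (illustrative only, not faer's API):

```rust
/// Naive symmetric rank-k update: writes the lower triangle of
/// `alpha * A * A^T + beta * C` into `c`, where `A` is `n x k` and `C` is
/// `n x n`, both column-major. The upper triangle is left untouched.
fn syrk_lower(n: usize, k: usize, alpha: f64, a: &[f64], beta: f64, c: &mut [f64]) {
    assert_eq!(a.len(), n * k);
    assert_eq!(c.len(), n * n);
    for j in 0..n {
        // Only i >= j: the lower triangle.
        for i in j..n {
            let mut acc = 0.0;
            for p in 0..k {
                acc += a[i + p * n] * a[j + p * n];
            }
            c[i + j * n] = alpha * acc + beta * c[i + j * n];
        }
    }
}

fn main() {
    // A = [[1, 2], [3, 4]] (2x2, column-major), C starts at zero.
    let a = vec![1.0, 3.0, 2.0, 4.0];
    let mut c = vec![0.0; 4];
    syrk_lower(2, 2, 1.0, &a, 0.0, &mut c);
    // A*A^T = [[5, 11], [11, 25]]; only the lower triangle is written.
    assert_eq!(c[0], 5.0);  // (0, 0)
    assert_eq!(c[1], 11.0); // (1, 0)
    assert_eq!(c[3], 25.0); // (1, 1)
}
```

The A^T·A variant is the same loop with the roles of rows and columns of A swapped.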
Apache Arrow is an aligned, chunked array representation of columns. If my dense matrix is encoded as a chunked array of float64 numbers, is it possible to use faer-rs on top of it in a zero-copy way? I assume further constraints (the chunk size has to be a multiple of some batch size) will be required. Or will we have to combine the (relatively large) chunks into one contiguous aligned array before feeding it to faer?
I have just seen your crate and ran a quick benchmark against intel mkl.
I got the following result
test bench_faer_rs_n ... bench: 33,479,057 ns/iter (+/- 9,121,197)
test bench_faer_rs_t ... bench: 32,854,984 ns/iter (+/- 7,885,723)
test bench_mkl_n ... bench: 35,916,818 ns/iter (+/- 9,305,052)
test bench_mkl_t ... bench: 35,681,294 ns/iter (+/- 10,449,638)
I tried it several times just to make sure it was not a lucky result, or a bug. Having on-par performance with the Intel MKL library is amazing!
Maybe you can confirm this yourself and add a comparison with Intel MKL to the documentation, since it is one of the fastest BLAS implementations, if not the fastest.
Describe the bug
For many graph adjacency matrices with small eigenvalues, faer crashes on an i32 overflow when computing eigenvalues as a self-adjoint matrix. The precise line is iter += 1;.
My current workaround to avoid selfadjoint_eigenvalues() is:
let eigs: Vec<faer::complex_native::c64> = a.eigenvalues();
To Reproduce
let mat: [[f64; 20]; 20] = [[0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]];
let mut matt = faer::Mat::zeros(20, 20);
for i in 0..20 {
    for j in 0..20 {
        matt[(i, j)] = mat[i][j];
    }
}
let eigs = matt.selfadjoint_eigenvalues(faer::Side::Lower);
Expected behavior
Expected eigs to return a vector of eigenvalues.
For example,
# `bad_mat.py`
# run with `python bad_mat.py`
import numpy as np
mat = [[0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]]
mat = np.array(mat)
eigs = np.linalg.eig(mat)[0]
print(eigs)
real_parts = [np.real(_) for _ in eigs]
print(real_parts)
''' output:
[ 3.69390168e+00+0.00000000e+00j -3.23272209e+00+0.00000000e+00j
-2.52597629e+00+0.00000000e+00j 2.29766159e+00+0.00000000e+00j
2.19984492e+00+0.00000000e+00j -2.25650189e+00+0.00000000e+00j
-1.98531299e+00+0.00000000e+00j -1.48123145e+00+0.00000000e+00j
-1.10237216e+00+0.00000000e+00j 1.47508155e+00+0.00000000e+00j
-6.62467970e-01+0.00000000e+00j -5.32411898e-01+0.00000000e+00j
-2.28816715e-01+0.00000000e+00j 4.97954067e-01+0.00000000e+00j
7.40821370e-01+0.00000000e+00j 1.10254826e+00+0.00000000e+00j
1.00000000e+00+0.00000000e+00j 1.00000000e+00+0.00000000e+00j
-1.88640048e-17+9.57840098e-17j -1.88640048e-17-9.57840098e-17j]
[3.6939016833383165, -3.2327220859059014, -2.52597628501941, 2.2976615943259593, 2.199844923144114, -2.2565018926765257, -1.9853129913610488, -1.4812314534620308, -1.102372157787216, 1.4750815501253782, -0.6624679696406541, -0.5324118977580056, -0.22881671515516241, 0.49795406722182745, 0.7408213702790264, 1.1025482603313292, 1.0000000000000004, 1.0000000000000004, -1.886400483817679e-17, -1.886400483817679e-17]
'''
Additional context
Feel free to suggest a different method to use!
Since these matrices have a few very small eigenvalues, perhaps it is appropriate to have the high-level API calculate a better guess for consider_zero_threshold.
In this instance, the value equals 2.2250738585072014e-308.
pub fn compute_tridiag_real_evd_qr_algorithm<E: RealField>(
    diag: &mut [E],
    offdiag: &mut [E],
    u: Option<MatMut<'_, E>>,
    epsilon: E,
    consider_zero_threshold: E,
) {
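For context, that threshold value is exactly the smallest positive normal f64, which suggests it is being passed through as an unscaled default rather than derived from the matrix's magnitude. A quick check:

```rust
fn main() {
    // 2.2250738585072014e-308 is exactly f64::MIN_POSITIVE, the smallest
    // positive *normal* double, i.e. an absolute (unscaled) default.
    let threshold = 2.2250738585072014e-308_f64;
    assert_eq!(threshold, f64::MIN_POSITIVE);
}
```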
Is it possible to make the rayon dependency optional? If so, would you consider accepting a PR that does this?