Comments (18)
In that case, the third option works well. I'll implement and make a PR.
from krylov.jl.
Shame on me. It failed because I used a right-hand side with an integer element type. I will investigate further, though.
What doesn't work exactly?
It looks like `BLAS.dot` only accepts floats. I see three solutions:
- Change `BLAS.dot` to `dot`;
- Change the type of `b` from `Array{T,1}` to `Array{Float64,1}`, so that it really won't work for `Array{Int,1}`, but it will be clearer where it fails;
- Change `r = copy(b)` to `r = 1.0*b`.
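A minimal sketch of the failure and of options 1 and 3, assuming Julia 1.x, where `dot` and `BLAS` live in the `LinearAlgebra` standard library (this thread predates that move). The vector `b` here is just illustrative data:

```julia
using LinearAlgebra

b = [1, 2, 3]               # Vector{Int}: BLAS has no integer kernels
# BLAS.dot(b, b)            # no BLAS method for Int, so this throws a MethodError

d_generic = dot(b, b)       # option 1: the generic dot works for any element type
r = 1.0 * b                 # option 3: multiplying by 1.0 promotes to Vector{Float64}
d_blas = BLAS.dot(r, r)     # the BLAS kernel now applies
println((d_generic, d_blas))
```

Option 3 allocates a new vector, just as `copy(b)` already did, so it costs nothing extra.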
Yes, BLAS only supports `Float32`, `Float64`, `Complex64` and `Complex128`. In my tests, `BLAS.dot` can be much faster than `dot`, but things might have changed. I think `b` should be declared `Vector{T}` with `T <: Real`. If the user's `b` is integer, we should convert it to `Float64`.
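One way to do that conversion, sketched with a hypothetical helper `prepare_rhs` (not part of Krylov.jl). It keeps element types BLAS handles and promotes every other real type to `Float64`:

```julia
using LinearAlgebra: BlasReal   # Union{Float32, Float64}

# Hypothetical helper (not Krylov.jl API): accept any real right-hand side and
# return a vector whose element type BLAS supports. Float32/Float64 are kept;
# other real types (Int, Rational, Float16, ...) are converted to Float64.
function prepare_rhs(b::AbstractVector{T}) where {T<:Real}
    S = T <: BlasReal ? T : Float64
    return convert(Vector{S}, b)   # no copy if b is already a Vector{S}
end

b_int = prepare_rhs([1, 2, 3])
b_f32 = prepare_rhs(Float32[1, 2, 3])
println(typeof(b_int), " ", typeof(b_f32))
```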
Note (for the future) that there are disadvantages to using `BLAS.dot`:
- vectors must be contiguous; you can't use non-contiguous slices;
- the implementation doesn't generalize to multiple right-hand sides.

Maybe it's time to benchmark `dot` again.
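To make both points concrete, here is a small sketch (Julia 1.x) of cases the generic `dot` handles: a view indexed by a vector, which has no fixed stride, and column-wise inner products for several right-hand sides. The column loop is just one possible layout for multiple right-hand sides, not the approach used in Krylov.jl:

```julia
using LinearAlgebra

A = collect(reshape(1.0:24.0, 6, 4))   # 6x4 matrix holding 1.0 to 24.0 column by column

# Indexing with a vector gives a view with no fixed stride, which falls outside
# the strided-memory model BLAS relies on; the generic dot accepts any AbstractVector.
x = view(A, [1, 3, 4], 1)
y = view(A, [1, 3, 4], 2)
d_view = dot(x, y)

# Several right-hand sides stored as columns: one inner product per column pair.
X = [1.0 2.0; 3.0 4.0]
Y = [1.0 0.0; 0.0 1.0]
d_cols = [dot(view(X, :, j), view(Y, :, j)) for j in axes(X, 2)]
println(d_view, " ", d_cols)
```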
Code: https://gist.github.com/abelsiqueira/23148a392462c2d7a07fbdb1f943119f
[Plot: minimum, mean and maximum times, shown as the ratio `dot` time / `BLAS.dot` time. A ratio greater than 1 means that `BLAS.dot` is that much faster.]
Looks like `BLAS.dot` is much better for small problems.
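The actual benchmark is in the gist linked above. For readers without it, here is a rough sketch of the same kind of measurement using only `@elapsed`. The helper name `avg_time` and the sizes are assumptions. On recent Julia versions, `dot` on `Vector{Float64}` may itself forward to BLAS, so the ratio may land close to 1:

```julia
using LinearAlgebra

# Average time per call of f(x, y) over `reps` calls, after one warm-up call so
# that compilation is not measured. Rough sketch; BenchmarkTools.jl gives more
# reliable statistics (minimum, mean, maximum).
function avg_time(f, x, y; reps = 1_000)
    f(x, y)
    t = @elapsed for _ in 1:reps
        f(x, y)
    end
    return t / reps
end

for n in (10, 100, 1_000, 10_000)
    x, y = rand(n), rand(n)
    ratio = avg_time(dot, x, y) / avg_time(BLAS.dot, x, y)
    println("n = ", n, ": dot time / BLAS.dot time = ", round(ratio; digits = 2))
end
```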
Many thanks for that! So `BLAS.dot` remains very much relevant.
What BLAS are you using?
The default. I think that's OpenBLAS.
It must be better tuned than mine, because when I try to reproduce your experiment, `BLAS.dot` and `dot` perform about the same at size 100 or even smaller.
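When comparing timings across machines, it helps to record which BLAS is loaded. A minimal sketch, assuming Julia 1.7 or later (older versions, like the one in this thread, lack `BLAS.get_config`):

```julia
using LinearAlgebra

# Requires Julia >= 1.7, where BLAS is loaded through libblastrampoline.
cfg = BLAS.get_config()          # lists the loaded BLAS/LAPACK libraries
println(cfg)

# The number of BLAS threads may also matter when comparing machines.
nthreads = BLAS.get_num_threads()
println("BLAS threads: ", nthreads)
```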
Fixed in #18.
In the same vein, I'm noticing (with my BLAS) that `BLAS.dot(x, y)` (as opposed to `BLAS.dot(n, x, 1, y, 1)`) has the same performance as `dot(x, y)`. `BLAS.nrm2(x)` is similar to `BLAS.nrm2(n, x, 1)`, and both are faster than `norm(x)`.
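For reference, here are the three call forms being compared, sketched in Julia 1.x with small illustrative vectors. All three compute the same values; only the calling overhead differs:

```julia
using LinearAlgebra

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
n = length(x)

d_full   = BLAS.dot(n, x, 1, y, 1)  # explicit length and unit strides
d_simple = BLAS.dot(x, y)           # wrapper derives length and stride from x and y
d_gen    = dot(x, y)                # generic LinearAlgebra function

nrm_full   = BLAS.nrm2(n, x, 1)
nrm_simple = BLAS.nrm2(x)
nrm_gen    = norm(x)
println((d_full, d_simple, d_gen), " ", (nrm_full, nrm_simple, nrm_gen))
```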
I modified your script: https://gist.github.com/272c500b0e0a8557682378b6924aeb48
I observe the same with `nrm2`. My statement above was incorrect. I fixed it.
Looking at
https://github.com/JuliaLang/julia/blob/v0.5.0/base/linalg/blas.jl
and
https://github.com/JuliaLang/julia/blob/v0.5.0/base/linalg/generic.jl
I found that:
- `dot` is significantly more complicated than `BLAS.dot`;
- the simple form `BLAS.dot(x, y)` should be slightly more complicated than `BLAS.dot(n, x, 1, y, 1)`;
- `norm` is significantly more complicated than `BLAS.nrm2`;
- the simple form `BLAS.nrm2(x)` should be slightly more complicated than `BLAS.nrm2(n, x, 1)`;
- `BLAS.nrm2` is simpler than `BLAS.dot`;
- the simple `BLAS.dot` uses parametric types, while the simple `BLAS.nrm2` doesn't.
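The files linked above are from Julia v0.5.0. Since Julia 0.7, this code lives in the `LinearAlgebra` standard library, so a comparison like this has to be redone against current sources. A quick way to see which method a call actually dispatches to on your own version:

```julia
using LinearAlgebra
using InteractiveUtils   # provides @which; loaded automatically in the REPL

x = rand(100)
y = rand(100)

# Show which method each call dispatches to on the current Julia version.
m_dot  = @which dot(x, y)
m_bdot = @which BLAS.dot(x, y)
m_norm = @which norm(x)
foreach(println, (m_dot, m_bdot, m_norm))
```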
A straightforward comparison of `BLAS.scal!` with an equivalent simple loop shows that `BLAS.scal!` always loses! I wonder if there's room for a package that would benchmark those things and export only the best versions (it would export, say, `BLAS1.dot`, which would correspond to either `BLAS.dot` or Julia's `dot`, depending on which is faster on a given machine with the BLAS used to build Julia). We should keep in mind that the BLAS functions will only be efficient with contiguous arrays of type `<: BlasReal`.
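Here is a sketch of both ideas under stated assumptions. `scal_loop!` is one possible plain-loop equivalent of `BLAS.scal!`, and `pick_faster` is a hypothetical helper, not an existing package API, that picks between two interchangeable functions in the spirit of the proposed `BLAS1.dot`. The selection is only as good as the representative arguments it is timed on:

```julia
using LinearAlgebra

# Plain-Julia equivalent of BLAS.scal!(n, a, x, 1): scales x in place by a.
# Not restricted to BLAS element types or to strided arrays.
function scal_loop!(a::Number, x::AbstractVector)
    @inbounds @simd for i in eachindex(x)
        x[i] *= a
    end
    return x
end

# Hypothetical helper: time two interchangeable functions on representative,
# non-mutating arguments and return the faster one.
function pick_faster(f, g, args...; reps = 1_000)
    f(args...); g(args...)                            # warm-up / compile
    tf = @elapsed for _ in 1:reps; f(args...); end
    tg = @elapsed for _ in 1:reps; g(args...); end
    return tf <= tg ? f : g
end

x = rand(1_000)
y = copy(x)
BLAS.scal!(length(x), 2.0, x, 1)
scal_loop!(2.0, y)

best_dot = pick_faster(dot, BLAS.dot, rand(100), rand(100))
println("faster dot here: ", best_dot)
```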
I think nobody has gone to the trouble of doing this (or other upgrades to BLAS) because the performance gain is relatively small compared with the effort involved. But it's at least worth investigating, or mentioning on Discourse or in the issues.