Comments (18)

abelsiqueira commented on May 18, 2024

In that case, the third option works well. I'll implement and make a PR.

from krylov.jl.

abelsiqueira commented on May 18, 2024

Shame on me. It failed because I used a right-hand side with integer values stored in an integer type. Will investigate further, though.

dpo commented on May 18, 2024

What doesn't work exactly?

abelsiqueira commented on May 18, 2024

It looks like BLAS.dot only accepts floating-point arrays. I see three solutions:

  • Change BLAS.dot to dot;
  • Change the type of b from Array{T,1} to Array{Float64,1}, so that it still won't work for Array{Int,1}, but the failure point will be clearer;
  • Change r = copy(b) to r = 1.0*b.
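
A minimal sketch of the failure and the three options above, assuming a recent Julia 1.x where BLAS lives in the LinearAlgebra standard library (the thread itself dates from Julia 0.5). The vector b and the name residual_norm2 are made up for illustration:

```julia
using LinearAlgebra

b = [1, 2, 3]                      # Vector{Int}, like the failing right-hand side

# BLAS.dot has no method for integer arrays, so this call throws.
blas_failed = try
    BLAS.dot(length(b), b, 1, b, 1)
    false
catch
    true
end

# Option 1: the generic dot accepts any numeric element type.
d_generic = dot(b, b)

# Option 2: require floating-point input in the signature, so an integer
# vector is rejected at the call site (residual_norm2 is a hypothetical name).
residual_norm2(b::Array{Float64,1}) = BLAS.dot(length(b), b, 1, b, 1)

# Option 3: promote on copy; 1.0 * b is a new Vector{Float64}.
r = 1.0 * b
d_blas = BLAS.dot(length(r), r, 1, r, 1)
```

Option 3 costs nothing extra when b is already floating point, since a copy was being made anyway.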

dpo commented on May 18, 2024

Yes, BLAS only supports Float32, Float64, Complex64 and Complex128. In my tests, BLAS.dot can be much faster than dot, but things might have changed. I think b should be declared Vector{T} with T <: Real. If the user's b is integer, we should convert it to Float64.
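
One way the suggested conversion could look, as a sketch only: as_blas_vector is a hypothetical helper, not something from Krylov.jl, and LinearAlgebra.BlasReal is assumed to be the Float32/Float64 union of current Julia 1.x.

```julia
using LinearAlgebra

# Hypothetical helper: leave BLAS-compatible real vectors untouched,
# convert every other real element type to Float64.
as_blas_vector(b::Vector{T}) where {T <: LinearAlgebra.BlasReal} = b
as_blas_vector(b::Vector{T}) where {T <: Real} = convert(Vector{Float64}, b)

b_int = [1, 2, 3]
b_f32 = Float32[1, 2, 3]

r_int = as_blas_vector(b_int)      # converted to Vector{Float64}
r_f32 = as_blas_vector(b_f32)      # returned as is, still Vector{Float32}

d = BLAS.dot(length(r_int), r_int, 1, r_int, 1)
```

Note that the first method returns b itself rather than a copy, so a solver that overwrites its working vector would still need copy(b).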

dpo commented on May 18, 2024

Note (for the future) that there are disadvantages to using BLAS.dot:

  1. vectors must be contiguous; you can't use non-contiguous slices
  2. the implementation doesn't generalize to multiple right-hand sides.
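
To illustrate the two points above, here is a small sketch using the generic dot from LinearAlgebra on data that is not stored contiguously, and on several right-hand sides stored as matrix columns. The arrays are made up for illustration and a recent Julia 1.x is assumed:

```julia
using LinearAlgebra

A = [1.0 2.0 3.0;
     4.0 5.0 6.0]

# A matrix row is not contiguous in Julia's column-major storage.
row = view(A, 1, :)
d_row = dot(row, row)

# A view through an arbitrary index vector has no fixed stride at all.
picked = view(vec(A), [1, 4, 6])   # elements 1.0, 5.0, 6.0
d_picked = dot(picked, picked)

# Multiple right-hand sides: one dot product per column of B.
B = [1.0 2.0;
     3.0 4.0]
col_dots = [dot(view(B, :, j), view(B, :, j)) for j in 1:size(B, 2)]
```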

Maybe it's time to benchmark dot again.
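
A rough sketch of such a benchmark, using only Base's @elapsed. The helper mintime, the sizes and the repetition counts are assumptions for illustration and are not taken from the scripts linked in this thread. The printed ratio is dot time divided by BLAS.dot time, so values above 1 mean BLAS.dot was faster in that run:

```julia
using LinearAlgebra

# Hypothetical helper: best time per call of f(), taken over `reps` samples
# of `inner` calls each, so that very short calls are still measurable.
function mintime(f; reps = 20, inner = 1_000)
    f()                             # warm-up so compilation is not timed
    best = Inf
    for _ in 1:reps
        t = @elapsed for _ in 1:inner
            f()
        end
        best = min(best, t / inner)
    end
    return best
end

for n in (10, 100, 1_000, 10_000)
    x, y = rand(n), rand(n)
    t_blas = mintime(() -> BLAS.dot(n, x, 1, y, 1))
    t_dot = mintime(() -> dot(x, y))
    println("n = ", n, ": dot time / BLAS.dot time = ", t_dot / t_blas)
end
```

Results depend heavily on the machine and on the BLAS library Julia was built with, so the numbers are only meaningful locally.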

abelsiqueira commented on May 18, 2024

Code: https://gist.github.com/abelsiqueira/23148a392462c2d7a07fbdb1f943119f
[Figure: blas_bench]

Minimum, mean and maximum times

[Figure: blas_bench2]

BLAS.dot time/dot time. Greater than 1 means that BLAS.dot is that much faster.

Looks like BLAS.dot is much better for small problems.

dpo commented on May 18, 2024

Many thanks for that! So BLAS.dot remains very much relevant.

dpo commented on May 18, 2024

What BLAS are you using?

abelsiqueira commented on May 18, 2024

The default. I think that's OpenBLAS.

dpo commented on May 18, 2024

It must be better tuned than mine because if I try to reproduce your experiment, BLAS.dot and dot are pretty much the same at size 100 or even less.

dpo commented on May 18, 2024

Fixed in #18.

dpo commented on May 18, 2024

In the same vein, I'm noticing (with my BLAS) that

  • BLAS.dot(x, y) (as opposed to BLAS.dot(n, x, 1, y, 1)) has the same performance as dot(x, y)
  • BLAS.nrm2(x) is similar to BLAS.nrm2(n, x, 1) and both are faster than norm(x).

I modified your script: https://gist.github.com/272c500b0e0a8557682378b6924aeb48
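
For reference, the call forms being compared compute the same quantities. A short sketch, assuming a recent Julia 1.x with the LinearAlgebra standard library; the vectors are made up for illustration:

```julia
using LinearAlgebra

x = [3.0, 4.0]
y = [1.0, 2.0]
n = length(x)

# Three ways to compute the same dot product.
d_short = BLAS.dot(x, y)              # wrapper that works out n and the strides
d_long = BLAS.dot(n, x, 1, y, 1)      # explicit length and increments
d_generic = dot(x, y)                 # generic LinearAlgebra function

# Three ways to compute the same Euclidean norm.
nrm_short = BLAS.nrm2(x)
nrm_long = BLAS.nrm2(n, x, 1)
nrm_generic = norm(x)
```

Only the timings differ, which is what the modified script measures.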

abelsiqueira commented on May 18, 2024

My results (changed to 15)
[Figure: blas dot_blas dot simple_dot_ratio]
[Figure: blas dot_vs_blas dot simple_vs_dot]
[Figure: blas nrm2_blas nrm2 simple_norm_ratio]
[Figure: blas nrm2_vs_blas nrm2 simple_vs_norm]

dpo commented on May 18, 2024

I observe the same with nrm2. My statement above was incorrect. I fixed it.

abelsiqueira commented on May 18, 2024

Looking at
https://github.com/JuliaLang/julia/blob/v0.5.0/base/linalg/blas.jl
and
https://github.com/JuliaLang/julia/blob/v0.5.0/base/linalg/generic.jl
I found that

  • dot is considerably more complicated than BLAS.dot
  • BLAS.dot simple should be slightly more complicated than BLAS.dot
  • norm is considerably more complicated than BLAS.nrm2
  • BLAS.nrm2 simple should be slightly more complicated than BLAS.nrm2
  • BLAS.nrm2 is simpler than BLAS.dot simple
  • BLAS.dot simple uses parametric types, while BLAS.nrm2 simple doesn't.

dpo commented on May 18, 2024

A straightforward comparison of BLAS.scal! with a simple loop equivalent shows that BLAS.scal! always loses! I wonder if there's room for a package that benchmarks those things and exports only the best versions. For example, it would export BLAS1.dot, which would correspond to either BLAS.dot or Julia's dot, depending on which is faster on a given machine with the BLAS used to build Julia. We should keep in mind that the BLAS functions will only be efficient with contiguous arrays of type <: BlasReal.
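
A sketch of what such a comparison and selection could look like for scal!, assuming a recent Julia 1.x. The names loop_scal!, blas_scal!, time_scal and best_scal! are hypothetical, and the timing is deliberately crude:

```julia
using LinearAlgebra

# Plain-Julia equivalent of scal!: x <- a * x, in place.
function loop_scal!(a, x)
    @inbounds @simd for i in eachindex(x)
        x[i] *= a
    end
    return x
end

# Thin wrapper so both variants share the signature f(a, x).
blas_scal!(a, x) = BLAS.scal!(length(x), a, x, 1)

# Time `reps` calls of f(1.0, x); scaling by 1.0 leaves x unchanged.
function time_scal(f, x; reps = 1_000)
    f(1.0, x)                       # warm-up so compilation is not timed
    return @elapsed for _ in 1:reps
        f(1.0, x)
    end
end

x = rand(1_000)
t_loop = time_scal(loop_scal!, x)
t_blas = time_scal(blas_scal!, x)

# Keep whichever variant was faster on this machine.
best_scal! = t_loop <= t_blas ? loop_scal! : blas_scal!
```

A real package would need to repeat the measurement across vector sizes and element types, since the winner may change with either.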

abelsiqueira commented on May 18, 2024

I think nobody has gone to the trouble of doing this (or other upgrades on BLAS) because the performance gain is relatively small compared to the effort involved. But it's at least worth investigating or mentioning on Discourse or in the issues.
