
Comments (10)

pkofod commented on June 28, 2024

Hi @Magalame !

Can you elaborate on where the bottleneck is? Is your suspicion that if you could thread the evaluation of the model to calculate the residuals on multiple cores, your performance issues would be mitigated?

If you can post some code that illustrates the issue, we can try to experiment and locate the problem.

PS: is support for lazy structures (like iterators, or lazy arrays) planned?

What kind of lazy arrays and where in the chain? Do you mean as the parameter vector, or the residual vector?

from lsqfit.jl.

Magalame commented on June 28, 2024

Hi!

My project basically consists of a simulation of a physical model, and then extracting some properties from it. Running the simulation itself is fine; extracting the properties I want is the bottleneck, and I need to fit a curve in order to do so.

I have fairly large input/output arrays (~ 1 million cells), and applying curve_fit to them constitutes about 72% of the run time. I got some good improvements by providing the analytical jacobian, and then optimizing it as much as I could.

Here's the relevant portion of the code:

function numerical(ttL::Array{Complex{Float64},1},params::Params)
    
    
    GtimeL2,ttn = fourier1d(ttL,params.dE)
    tind::Array{Int64,1}=findall(ttn .> 0)
    
    tfft::Array{Float64,1} = ttn[tind]
    Gfft::Array{Complex{Float64},1} = GtimeL2[tind]
    
    logabs::Array{Float64} = log.(abs.(Gfft))
    
    taulin = 0
    durbinlin = 0
    
    let 
        top =  logabs[1]
        down = logabs[end]
        a0 = (down-top)/(tfft[end]-tfft[1])
        b0 = logabs[1]
        fit = curve_fit(linmodel, jacobianlin, tfft, logabs,[a0,b0])   #This takes about 12% of the run time
        taulin = fit.param
        durbinlin = durbinwatson(fit.resid)
        
    end
   
    taupiece = 0
    durbinpiece = 0
   
    let
        
        # [...] lots of preliminary computations go there, mainly to obtain a good estimate on the guess parameters 

        fit = curve_fit(piecemodel, jacobianpiece, tfft, logabs,[a,b,m,c,x0]) #Takes about 50% of run time
        
        taupiece = fit.param
        durbinpiece = durbinwatson(fit.resid)
        
        end
    
    #[...] some postprocessing computations
      
    
end

The functions fitted (a linear function, and a piecewise composed of two linear functions) and their jacobian are the following:

@. linmodel(x, p) = p[1]*x+p[2]

function jacobianlin(x,p)
    J = Array{Float64}(undef, length(x),length(p))
    J[:,1] .= x    #dmodel/dp[1]
    J[:,2] .= 1  #dmodel/dp[2]
    J
end

@. piecemodel(x,p) = piecewise(x,p...)

function piecewise(x::Float64,a,b,m,c,x0)
    if x <= x0
        b + a*x;
    else
        c + m*x;
    end
end

function jacobianpiece(x,p)::Array{Float64,2}
    J::Array{Float64,2} = zeros(Float64,length(x),length(p))
    dx = x[2]-x[1]
    
    i::Int = 1
    while i <= length(x) && x[i] <= p[end]
        J[i,1] = x[i]
        i += 1
    end
    
    if x[end] + dx >= p[end] #otherwise no x[end] dependence
        if i <= length(x) #checks that we are within data range
            J[i,5] = (piecewise(x[i], p[1],p[2],p[3],p[4],p[5]+dx)-piecewise(x[i], p[1],p[2],p[3],p[4],p[5]-dx))/(2*dx)
        end
        if i-1 >= 1
            J[i-1,5] = (piecewise(x[i-1], p[1],p[2],p[3],p[4],p[5]+dx)-piecewise(x[i-1], p[1],p[2],p[3],p[4],p[5]-dx))/(2*dx)
        end
        if i-2 >= 1
            J[i-2,5] = (piecewise(x[i-2], p[1],p[2],p[3],p[4],p[5]+dx)-piecewise(x[i-2], p[1],p[2],p[3],p[4],p[5]-dx))/(2*dx)
        end
    end
    
    
    j::Int = 1
    while j <= length(x) && x[j] <= p[end]
        J[j,2] = 1
        j += 1
    end
    
    while i <= length(x) && x[i] > p[end]
        J[i,3] = x[i]
        i += 1
    end
   
    while j <= length(x) && x[j] > p[end]
        J[j,4] = 1
        j += 1
    end
    
    J
end
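As a quick sanity check of the branch logic, the piecewise model can be evaluated directly on either side of the breakpoint (the parameter values below are arbitrary, not the fitted ones):

```julia
# Same piecewise definition as above, restated so this snippet is self-contained:
# left branch b + a*x for x <= x0, right branch c + m*x otherwise.
piecewise(x::Float64, a, b, m, c, x0) = x <= x0 ? b + a*x : c + m*x

# Arbitrary parameters: a = 2, b = 1, m = -1, c = 4, breakpoint x0 = 1.
piecewise(0.5, 2.0, 1.0, -1.0, 4.0, 1.0)  # left branch:  1 + 2*0.5 = 2.0
piecewise(2.0, 2.0, 1.0, -1.0, 4.0, 1.0)  # right branch: 4 - 1*2   = 2.0
```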

Thanks for your help!

Also, about the lazy structures: I was thinking that at the level of the residuals array, they could help reduce memory consumption when the input vector is large.


pkofod commented on June 28, 2024

Okay, so did you actually profile this? I could imagine that re-allocating the Jacobian all the time is expensive. This is something we allow for in Optim and NLsolve: you just pre-allocate the Jacobian once and reuse it. I should add that possibility to LsqFit as well. What happens if you check out the master branch and use

curve_fit(f, p0; autodiff=:forward)

?


Magalame commented on June 28, 2024

Indeed I did. Reusing the Jacobian would probably be great, and actually the curve_fit function accounts for 99% of the memory allocation of the whole program.

With automatic differentiation I get an error:
MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{getfield(NLSolversBase, Symbol("#ff!#1")){getfield(LsqFit, Symbol("#f#10")){typeof(piecemodel),Array{Float64,1},Array{Float64,1}}},Float64},Float64,5}) ...

It breaks when it comes to fitting the piecewise function; it would make sense if automatic differentiation doesn't support functions that aren't differentiable on the entire real axis.

I then tried using autodiff only for the linear fits, but it makes the memory allocation skyrocket (from 25GB to 190GB) and it largely slows down the execution (~4.7 times slower).


pkofod commented on June 28, 2024

It's actually complaining about your use of ::Float64, because ForwardDiff wants to pass in a special number type (that's the Dual part). You just need to make that parametric. Instead of

function f(x)
    cache = Array{Float64, 3}(undef, 3, 3, 3)
end

You'd put

function f(x::AbstractArray{T}) where T
    cache = Array{T, 3}(undef, 3, 3, 3)
end

or

function f(x)
    cache = Array{eltype(x), 3}(undef, 3, 3, 3)
end
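Applied to the code from earlier in the thread, the likely culprit is the `x::Float64` annotation on `piecewise` (the error's tag mentions `piecemodel`): widening it to `Real` lets ForwardDiff's Dual numbers through. A sketch, using `BigFloat` as a stand-in for the Dual type:

```julia
# Widened annotation: any Real subtype (including ForwardDiff.Dual) is accepted.
piecewise(x::Real, a, b, m, c, x0) = x <= x0 ? b + a*x : c + m*x

# Still works with plain floats...
piecewise(0.5, 2.0, 1.0, -1.0, 4.0, 1.0)       # 2.0

# ...and with any other Real subtype, as a Dual would be:
piecewise(big"0.5", 2.0, 1.0, -1.0, 4.0, 1.0)  # 2.0 (as a BigFloat)
```

The same idea applies to hard-coded caches: `zeros(Float64, length(x), length(p))` in `jacobianpiece` would become `zeros(eltype(x), length(x), length(p))`, as in pkofod's `eltype` example above.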


pkofod commented on June 28, 2024

Until I get around to enabling in-place operations, you can use a closure or a callable type.

Xdata = ...
p0 = ...

struct LinJacobian{T}
   J::T
end
LinJacobian(x, p) = LinJacobian(zeros(eltype(x), length(x), length(p)))

function (lj::LinJacobian)(x,p)
    J = lj.J
    J[:,1] .= x  #dmodel/dp[1]
    J[:,2] .= 1  #dmodel/dp[2]
    J
end


struct PieceJacobian{T}
    J::T
end
PieceJacobian(x, p) = PieceJacobian(zeros(eltype(x),length(x), length(p)))

function (pj::PieceJacobian)(x, p)
    J = pj.J
    dx = x[2]-x[1]
    
    i::Int = 1
    while i <= length(x) && x[i] <= p[end]
        J[i,1] = x[i]
        i += 1
    end
    
    if x[end] + dx >= p[end] #otherwise no x[end] dependence
        if i <= length(x) #checks that we are within data range
            J[i,5] = (piecewise(x[i], p[1],p[2],p[3],p[4],p[5]+dx)-piecewise(x[i], p[1],p[2],p[3],p[4],p[5]-dx))/(2*dx)
        end
        if i-1 >= 1
            J[i-1,5] = (piecewise(x[i-1], p[1],p[2],p[3],p[4],p[5]+dx)-piecewise(x[i-1], p[1],p[2],p[3],p[4],p[5]-dx))/(2*dx)
        end
        if i-2 >= 1
            J[i-2,5] = (piecewise(x[i-2], p[1],p[2],p[3],p[4],p[5]+dx)-piecewise(x[i-2], p[1],p[2],p[3],p[4],p[5]-dx))/(2*dx)
        end
    end
    
    
    j::Int = 1
    while j <= length(x) && x[j] <= p[end]
        J[j,2] = 1
        j += 1
    end
    
    while i <= length(x) && x[i] > p[end]
        J[i,3] = x[i]
        i += 1
    end
   
    while j <= length(x) && x[j] > p[end]
        J[j,4] = 1
        j += 1
    end
    
    J
end

and then you create

jacobianlin = LinJacobian(Xdata, p0)
jacobianpiece = PieceJacobian(Xdata, p0)

and pass those to curve_fit. I think it'll be a big performance improvement.
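The closure variant mentioned above is analogous: capture the preallocated matrix in a local function instead of a struct field. A sketch for the linear Jacobian (`make_jacobianlin` is a hypothetical helper name, not part of LsqFit):

```julia
# Build a jacobian function that reuses one preallocated matrix across calls.
function make_jacobianlin(x, p)
    J = zeros(eltype(x), length(x), length(p))
    return function (x, p)
        J[:,1] .= x   # dmodel/dp[1]
        J[:,2] .= 1   # dmodel/dp[2]
        J
    end
end

xdata = collect(0.0:0.1:1.0)
p0 = [1.0, 0.0]
jacobianlin = make_jacobianlin(xdata, p0)
J = jacobianlin(xdata, p0)
jacobianlin(xdata, p0) === J  # true: the same matrix is reused, no new allocation
```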


Magalame commented on June 28, 2024

It seems like the slowdown I observed from switching to autodiff is actually due to moving to master alone: if I go back to the initial version of the code but stay on master, the code is just as slow. I should have been more careful.

Thanks for the tip on callable types! I think I'm not using it correctly though:

> jacobianlin = LinJacobian(xdata, p0)
> fit = curve_fit(linmodel, jacobianlin, xdata, ydata,p0)
MethodError: no method matching curve_fit(::typeof(linmodel), ::LinJacobian{Array{Float64,2}}, ::Array{Float64,1}, ::Array{Float64,1}, ::Array{Float64,1})


pkofod commented on June 28, 2024

Ah, I see. This is because curve_fit specifies that they have to be <:Function - that is something we need to change. You can probably do

> jacobianlin = LinJacobian(xdata, p0)
> fit = curve_fit(linmodel, (x,p)->jacobianlin(x,p), xdata, ydata,p0)

such that it's an anonymous function instead.

It bothers me that you say that master is slower than the official version. That is something I have to investigate. Thanks.

I don't know if you've noticed, I'm trying to get this package up to speed again so I can tag a new version :)


pkofod commented on June 28, 2024

One thing I did do was to increase maxIter, so maybe it just capped out at 100 iterations every time before, and now it runs for longer?


Magalame commented on June 28, 2024

The maxIter certainly seems to play a role; with it capped, performance goes back to similar levels. Running some small tests with maxIter=100:

31.835962 seconds (451.36 k allocations: 33.421 GiB, 12.11% gc time) # master
27.380634 seconds (438.64 k allocations: 25.311 GiB, 9.40% gc time) # release
21.280030 seconds (423.66 k allocations: 11.331 GiB, 7.31% gc time) # release + callable type

The run time has some variability, but memory consumption, allocations, and GC time are essentially constant.
The callable type is a clear improvement, particularly memory-wise, thanks!

Also thanks a lot for working on this project, curve fitting is hugely important in the field I work in! If you need any help with anything, I'll be glad to contribute.

