
Comments (10)

pkofod commented on June 28, 2024

Hi @Magalame !

Can you elaborate on where the bottleneck is? Is your suspicion that if you could thread the evaluation of the model to calculate the residuals on multiple cores, your performance issues would be mitigated?

If you can post some code that illustrates the issue, we can try to experiment and locate the problem.

PS: is support for lazy structures (like iterators, or lazy arrays) planned?

What kind of lazy arrays and where in the chain? Do you mean as the parameter vector, or the residual vector?

from lsqfit.jl.

Magalame commented on June 28, 2024

Hi!

My project basically consists of a simulation of a physical model, and then extracting some properties from it. Running the simulation itself is fine; extracting the properties I want is the bottleneck, and I need to fit a curve in order to do so.

I have fairly large input/output arrays (~ 1 million cells), and applying curve_fit to them constitutes about 72% of the run time. I got some good improvements by providing the analytical jacobian, and then optimizing it as much as I could.

Here's the relevant portion of the code:

function numerical(ttL::Array{Complex{Float64},1},params::Params)
    
    
    GtimeL2,ttn = fourier1d(ttL,params.dE)
    tind::Array{Int64,1}=findall(ttn .> 0)
    
    tfft::Array{Float64,1} = ttn[tind]
    Gfft::Array{Complex{Float64},1} = GtimeL2[tind]
    
    logabs::Array{Float64} = log.(abs.(Gfft))
    
    taulin = 0
    durbinlin = 0
    
    let 
        top =  logabs[1]
        down = logabs[end]
        a0 = (down-top)/(tfft[end]-tfft[1])
        b0 = logabs[1]
        fit = curve_fit(linmodel, jacobianlin, tfft, logabs,[a0,b0])   #This takes about 12% of the run time
        taulin = fit.param
        durbinlin = durbinwatson(fit.resid)
        
    end
   
    taupiece = 0
    durbinpiece = 0
   
    let
        
        # [...] lots of preliminary computations go there, mainly to obtain a good estimate on the guess parameters 

        fit = curve_fit(piecemodel, jacobianpiece, tfft, logabs,[a,b,m,c,x0]) #Takes about 50% of run time
        
        taupiece = fit.param
        durbinpiece = durbinwatson(fit.resid)
        
        end
    
    #[...] some postprocessing computations
      
    
end

The functions fitted (a linear function, and a piecewise composed of two linear functions) and their jacobian are the following:

@. linmodel(x, p) = p[1]*x+p[2]

function jacobianlin(x,p)
    J = Array{Float64}(undef, length(x),length(p))
    J[:,1] .= x    #dmodel/dp[1]
    J[:,2] .= 1  #dmodel/dp[2]
    J
end

@. piecemodel(x,p) = piecewise(x,p...)

function piecewise(x::Float64,a,b,m,c,x0)
    if x <= x0
        b + a*x;
    else
        c + m*x;
    end
end

function jacobianpiece(x,p)::Array{Float64,2}
    J::Array{Float64,2} = zeros(Float64,length(x),length(p))
    dx = x[2]-x[1]
    
    i::Int = 1
    while i <= length(x) && x[i] <= p[end]
        J[i,1] = x[i]
        i += 1
    end
    
    if x[end] + dx >= p[end] #otherwise no x[end] dependence
        if i <= length(x) #checks that we are within data range
            J[i,5] = (piecewise(x[i], p[1],p[2],p[3],p[4],p[5]+dx)-piecewise(x[i], p[1],p[2],p[3],p[4],p[5]-dx))/(2*dx)
        end
        if i-1 >= 1
            J[i-1,5] = (piecewise(x[i-1], p[1],p[2],p[3],p[4],p[5]+dx)-piecewise(x[i-1], p[1],p[2],p[3],p[4],p[5]-dx))/(2*dx)
        end
        if i-2 >= 1
            J[i-2,5] = (piecewise(x[i-2], p[1],p[2],p[3],p[4],p[5]+dx)-piecewise(x[i-2], p[1],p[2],p[3],p[4],p[5]-dx))/(2*dx)
        end
    end
    
    
    j::Int = 1
    while j <= length(x) && x[j] <= p[end]
        J[j,2] = 1
        j += 1
    end
    
    while i <= length(x) && x[i] > p[end]
        J[i,3] = x[i]
        i += 1
    end
   
    while j <= length(x) && x[j] > p[end]
        J[j,4] = 1
        j += 1
    end
    
    J
end
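As a quick sanity check of the branch logic, the piecewise model can be evaluated directly on either side of the breakpoint (the parameter values below are arbitrary, not the fitted ones):

```julia
# Same piecewise definition as above, restated so this snippet is self-contained:
# left branch b + a*x for x <= x0, right branch c + m*x otherwise.
piecewise(x::Float64, a, b, m, c, x0) = x <= x0 ? b + a*x : c + m*x

# Arbitrary parameters: a = 2, b = 1, m = -1, c = 4, breakpoint x0 = 1.
piecewise(0.5, 2.0, 1.0, -1.0, 4.0, 1.0)  # left branch:  1 + 2*0.5 = 2.0
piecewise(2.0, 2.0, 1.0, -1.0, 4.0, 1.0)  # right branch: 4 - 1*2   = 2.0
```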

Thanks for your help!

Also, about the lazy structures: I was thinking that at the level of the residuals array, they could help reduce memory consumption when the input vector is large.


pkofod commented on June 28, 2024

Okay, so did you actually profile this? I could imagine that re-allocating the Jacobian all the time is expensive. This is something we allow for in Optim and NLsolve: you just pre-allocate the Jacobian once and reuse it. I should add that possibility to LsqFit as well. What happens if you check out the master branch and use

curve_fit(f, p0; autodiff=:forward)

?


Magalame commented on June 28, 2024

Indeed I did. Reusing the Jacobian would probably be great, and actually the curve_fit function accounts for 99% of the memory allocation of the whole program.

With automatic differentiation I get an error:
MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{getfield(NLSolversBase, Symbol("#ff!#1")){getfield(LsqFit, Symbol("#f#10")){typeof(piecemodel),Array{Float64,1},Array{Float64,1}}},Float64},Float64,5}) ...

It breaks when it comes to fitting the piecewise function; it would make sense if automatic differentiation doesn't support functions that aren't differentiable on the entire real axis.

I then tried using autodiff only for the linear fits, but it makes the memory allocation skyrocket (from 25GB to 190GB) and it largely slows down the execution (~4.7 times slower).


pkofod commented on June 28, 2024

It's actually complaining about your use of ::Float64, because ForwardDiff wants to pass in a special number type (that's the Dual part). You just need to make that parametric. Instead of

function f(x)
    cache = Array{Float64, 3}(undef, 3, 3, 3)
end

You'd put

function f(x::AbstractArray{T}) where T
    cache = Array{T, 3}(undef, 3, 3, 3)
end

or

function f(x)
    cache = Array{eltype(x), 3}(undef, 3, 3, 3)
end
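Applied to the code from earlier in the thread, the likely culprit is the `x::Float64` annotation on `piecewise` (the error's tag mentions `piecemodel`): widening it to `Real` lets ForwardDiff's Dual numbers through. A sketch, using `BigFloat` as a stand-in for the Dual type:

```julia
# Widened annotation: any Real subtype (including ForwardDiff.Dual) is accepted.
piecewise(x::Real, a, b, m, c, x0) = x <= x0 ? b + a*x : c + m*x

# Still works with plain floats...
piecewise(0.5, 2.0, 1.0, -1.0, 4.0, 1.0)       # 2.0

# ...and with any other Real subtype, as a Dual would be:
piecewise(big"0.5", 2.0, 1.0, -1.0, 4.0, 1.0)  # 2.0 (as a BigFloat)
```

The same idea applies to hard-coded caches: `zeros(Float64, length(x), length(p))` in `jacobianpiece` would become `zeros(eltype(x), length(x), length(p))`, as in pkofod's `eltype` example above.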


pkofod commented on June 28, 2024

Until I get around to enabling in-place operations, you can use a closure or a callable type.

Xdata = ...
p0 = ...

struct LinJacobian{T}
   J::T
end
LinJacobian(x, p) = LinJacobian(zeros(eltype(x), length(x), length(p)))

function (lj::LinJacobian)(x,p)
    J = lj.J
    J[:,1] .= x  #dmodel/dp[1]
    J[:,2] .= 1  #dmodel/dp[2]
    J
end


struct PieceJacobian{T}
    J::T
end
PieceJacobian(x, p) = PieceJacobian(zeros(eltype(x),length(x), length(p)))

function (pj::PieceJacobian)(x, p)
    J = pj.J
    dx = x[2]-x[1]
    
    i::Int = 1
    while i <= length(x) && x[i] <= p[end]
        J[i,1] = x[i]
        i += 1
    end
    
    if x[end] + dx >= p[end] #otherwise no x[end] dependence
        if i <= length(x) #checks that we are within data range
            J[i,5] = (piecewise(x[i], p[1],p[2],p[3],p[4],p[5]+dx)-piecewise(x[i], p[1],p[2],p[3],p[4],p[5]-dx))/(2*dx)
        end
        if i-1 >= 1
            J[i-1,5] = (piecewise(x[i-1], p[1],p[2],p[3],p[4],p[5]+dx)-piecewise(x[i-1], p[1],p[2],p[3],p[4],p[5]-dx))/(2*dx)
        end
        if i-2 >= 1
            J[i-2,5] = (piecewise(x[i-2], p[1],p[2],p[3],p[4],p[5]+dx)-piecewise(x[i-2], p[1],p[2],p[3],p[4],p[5]-dx))/(2*dx)
        end
    end
    
    
    j::Int = 1
    while j <= length(x) && x[j] <= p[end]
        J[j,2] = 1
        j += 1
    end
    
    while i <= length(x) && x[i] > p[end]
        J[i,3] = x[i]
        i += 1
    end
   
    while j <= length(x) && x[j] > p[end]
        J[j,4] = 1
        j += 1
    end
    
    J
end

and then you create

jacobianlin = LinJacobian(Xdata, p0)
jacobianpiece = PieceJacobian(Xdata, p0)

and pass those to curve_fit. I think it'll be a big performance improvement.
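The closure variant mentioned above is analogous: capture the preallocated matrix in a local function instead of a struct field. A sketch for the linear Jacobian (`make_jacobianlin` is a hypothetical helper name, not part of LsqFit):

```julia
# Build a jacobian function that reuses one preallocated matrix across calls.
function make_jacobianlin(x, p)
    J = zeros(eltype(x), length(x), length(p))
    return function (x, p)
        J[:,1] .= x   # dmodel/dp[1]
        J[:,2] .= 1   # dmodel/dp[2]
        J
    end
end

xdata = collect(0.0:0.1:1.0)
p0 = [1.0, 0.0]
jacobianlin = make_jacobianlin(xdata, p0)
J = jacobianlin(xdata, p0)
jacobianlin(xdata, p0) === J  # true: the same matrix is reused, no new allocation
```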


Magalame commented on June 28, 2024

It seems like the slowdown I observed from switching to autodiff is actually due to moving to master alone: if I go back to the initial version of the code but stay on master, the code is just as slow. I should have been more careful.

Thanks for the tip on callable types! I think I'm not using it correctly though:

> jacobianlin = LinJacobian(xdata, p0)
> fit = curve_fit(linmodel, jacobianlin, xdata, ydata,p0)
MethodError: no method matching curve_fit(::typeof(linmodel), ::LinJacobian{Array{Float64,2}}, ::Array{Float64,1}, ::Array{Float64,1}, ::Array{Float64,1})


pkofod commented on June 28, 2024

Ah, I see. This is because curve_fit specifies that they have to be <:Function - that is something we need to change. You can probably do

> jacobianlin = LinJacobian(xdata, p0)
> fit = curve_fit(linmodel, (x,p)->jacobianlin(x,p), xdata, ydata,p0)

such that it's an anonymous function instead.

It bothers me that you say that master is slower than the official version. That is something I have to investigate. Thanks.

I don't know if you've noticed, I'm trying to get this package up to speed again so I can tag a new version :)


pkofod commented on June 28, 2024

One thing I did do was to increase maxIter, so maybe it just capped out at 100 iterations every time before, and now it runs for longer?


Magalame commented on June 28, 2024

The maxIter certainly seems to play a role; with it capped, performance goes back to similar levels. Running some small tests with maxIter=100:

31.835962 seconds (451.36 k allocations: 33.421 GiB, 12.11% gc time) # master
27.380634 seconds (438.64 k allocations: 25.311 GiB, 9.40% gc time) # release
21.280030 seconds (423.66 k allocations: 11.331 GiB, 7.31% gc time) # release + callable type

The run time has some variability, but memory consumption, allocations, and GC time are essentially constant.
The callable type is a clear improvement, particularly memory-wise, thanks!

Also thanks a lot for working on this project, curve fitting is hugely important in the field I work in! If you need any help with anything, I'll be glad to contribute.

