Comments (16)
@tcovert, there have been some big changes to the linesearch code in 3131af4, it's possible this has been fixed if you check out the master branch of Optim. Any chance you can check? (Does your error happen frequently enough to be reproducible?)
from optim.jl.
Is the master branch what I get when I Pkg.update()? If so, then no, the problem is still there: I get the same error whenever my calculated gradient includes Infs of opposite signs, or any NaNs.
You need Pkg.checkout, not Pkg.update.
Still getting the same error, for (apparently) the same reason: gradients with NaNs in them. Judging by the scale of the non-NaN gradient entries (on the order of 10^200), I'm guessing that "c" gets to be really large for some reason.
Thanks for checking! What line number are you getting the error for now? Have you tried implementing a fix to see if it works?
Haven't tried a fix for myself yet but might give it a try later.
Here is an example that happens when I use l_bfgs instead of full bfgs on a particular optimization problem (not quite a least squares minimization, but pretty darn close - panel data random effects negative log-likelihood minimization). Why bfgs works fine while l_bfgs doesn't is a separate mystery, but I think this error should not be happening either way:
ERROR: assertion failed: lsr.slope[ib] < 0
in bisect! at /Users/tcovert/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:561
in hz_linesearch! at /Users/tcovert/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:261
in hz_linesearch! at /Users/tcovert/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:188
in l_bfgs at /Users/tcovert/.julia/v0.3/Optim/src/l_bfgs.jl:165
in optimize at /Users/tcovert/.julia/v0.3/Optim/src/optimize.jl:113
in SOMEFUNCTION at /PATH_TO_CODE/SOMECODE.jl:1273
in include at /usr/local/Cellar/julia/0.3.0/lib/julia/sys.dylib
in include_from_node1 at loading.jl:128
in process_options at /usr/local/Cellar/julia/0.3.0/lib/julia/sys.dylib
in _start at /usr/local/Cellar/julia/0.3.0/lib/julia/sys.dylib (repeats 2 times)
while loading /PATH_TO_CODE/SOME_OTHER_CODE.jl, in expression starting on line 36
I've obscured some of the above (i.e. SOMEFUNCTION, PATH_TO_CODE) for privacy reasons but the base Julia and Optim.jl specific stuff is still there. If you really think it will help I can include it.
Again, the assertion error is happening because the function value at the line search result point "ib" is finite and indeed higher than at the candidate point, but the gradient contains NaN entries (so dphic is NaN, which fails the >= 0 check).
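This failure mode is easy to reproduce: one NaN entry in the gradient makes the directional derivative NaN, and NaN fails every comparison, so an assertion like lsr.slope[ib] < 0 trips. A minimal Python sketch (the helper name is illustrative, not Optim.jl's code):

```python
import math

def directional_derivative(grad, direction):
    """Slope of phi(alpha) = f(x + alpha*d): dot(grad, direction)."""
    return sum(g * d for g, d in zip(grad, direction))

grad = [float("nan"), 2.0]       # a single NaN entry poisons the dot product
slope = directional_derivative(grad, [1.0, 1.0])

print(math.isnan(slope))   # True: the slope is NaN
print(slope < 0)           # False: NaN fails every comparison, so the assert fires
```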
I know that Ipopt explicitly checks in its line search for NaNs and Infs, and if detected it will cut the step size and try again.
Cool, is it looking for NaNs in the function value or in the gradients? If it's not looking for NaNs/Infs in the gradients, then maybe what I'm asking for is unusual and not necessarily appropriate optimizer behavior.
I haven't said much about this yet because I don't have a good intuition for how we should handle this situation, but what would we do after inserting a check for NaN values in the gradient? My intuition is that a single non-finite value in a gradient is cause for a fatal error -- except perhaps in cases where non-finiteness is used to encode constraint violations.
I can't speak for optimization problems in general, but at least for the problem I am working with, Optim.jl gets to this point because it is trying to take too large a step. I guess the step is small enough that my objective function still produces a finite value, but big enough that the gradients are non-finite.
As it stands, when large steps result in non-finite function values, the line search routine scales back the step size (i.e., the "while !isfinite(phic) && iterfinite < iterfinitemax" block in the hz_linesearch code). Similarly, I would propose that when a step results in a non-finite gradient evaluation (i.e., any Inf or NaN entries), the line search should also scale back until the gradient is finite.
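The proposed extension can be sketched as a backtracking loop. This is a hypothetical illustration in Python, not Optim.jl's actual hz_linesearch code; the names (backtrack_to_finite, shrink) are made up:

```python
import math

def all_finite(values):
    return all(math.isfinite(v) for v in values)

def backtrack_to_finite(f, g, x, d, alpha, shrink=0.5, max_iter=60):
    """Shrink the step until both f and its gradient are finite at the trial point."""
    for _ in range(max_iter):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if math.isfinite(f(trial)) and all_finite(g(trial)):
            return alpha
        alpha *= shrink  # step too big: cut it back and try again
    raise RuntimeError("no finite step found")

# toy problem: f is finite everywhere, but the gradient is NaN for x <= 0
f = lambda x: x[0] ** 2
g = lambda x: [2.0 * x[0]] if x[0] > 0 else [float("nan")]

alpha = backtrack_to_finite(f, g, x=[1.0], d=[-1.0], alpha=4.0)
print(alpha)  # 0.5: the first shrunken step whose trial point has a finite gradient
```

The existing isfinite(phic) loop handles the first condition; the second condition (a finite gradient) is the new check being proposed.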
Why it is the case that the line search routine wants to start searching from such a big step (in my problem, anyway) is still somewhat of a mystery to me. I understand that the H&Z paper does suggest starting from the optimal step size from the previous iteration, but I have no intuition about (mathematically) why that would be a good idea.
@tcovert, is your function returning a finite value while the gradient is returning NaNs? That seems pretty strange. Anyway, functions should be interpreted as having an implicit domain: the set of values at which they return a finite value. I think the right choice for numerical stability in most cases is to just roll back the line search when non-finite values are detected, instead of blowing up in front of the user.
Yes, it does have a finite value even when some gradient entries are NaN or +/-Inf. It is indeed strange, but I can construct (obviously cooked up) examples where it happens, at least to floating point precision. For example: suppose you have a cumulative distribution function F that is (nearly) a perfect step function, with a step near a point X (i.e., the CDF of a normal random variable with mean X but extremely small variance). The range of F is obviously the unit interval, but around X the derivative of F should be very large, and depending on how small a variance you choose, the computer might call it +Inf.
Now imagine trying to minimize over (x,y) the bivariate function G(x,y) = F(x) + 0.9 * (1 - F(y)). Around x = X and y = X, the gradient is going to be [+Inf, -Inf] even though the function is obviously well defined and finite. Then the derivative of the "phi" function in a line search routine is going to be dot([+Inf, -Inf], [X, X]) = NaN.
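This construction checks out numerically. Below, an overflowing product stands in for the near-step-function CDF's derivative at X; this is a Python sketch of the floating-point arithmetic only, not the actual objective:

```python
import math

# F(X) = 0.5 is perfectly finite, but F'(X) = 1/(sigma*sqrt(2*pi))
# overflows once sigma is small enough. An overflowing product models that:
overflowed = 1e308 * 10          # IEEE overflow: +Inf
print(overflowed)                # inf

# gradient of G(x, y) = F(x) + 0.9*(1 - F(y)) near (X, X):
grad = [overflowed, -0.9 * overflowed]   # [+Inf, -Inf]

# directional derivative along [1, 1]: Inf + (-Inf) = NaN
slope = grad[0] * 1.0 + grad[1] * 1.0
print(math.isnan(slope))         # True
```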
The function I'm optimizing doesn't work this way, but the principle of a finite function value plus a NaN gradient still applies. In my case the objective function is a multinomial logit negative log-likelihood where the logit "deltas" (the stuff I pass to logsumexp) are definitely finite at the parameter values that generate this error. Instead, what's happening is that the deltas are big (not too big to generate a non-finite logsumexp, so the function value is still fine), but big enough that the gradient of the logit likelihood is +/- infinity, depending on which parameter I'm looking at.
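One plausible mechanism for this: logsumexp itself can be computed stably by shifting by the max delta, while a naive softmax in the gradient overflows. A hedged Python sketch of the arithmetic (not the poster's actual likelihood code); Inf stand-ins replace exp() calls that would overflow, since Python's math.exp raises OverflowError instead of returning Inf:

```python
import math

deltas = [1000.0, 999.0]

# Stable logsumexp: shift by the max delta, so the function value is finite.
m = max(deltas)
lse = m + math.log(sum(math.exp(d - m) for d in deltas))  # about 1000.313

# A naive softmax in the gradient would compute exp(1000), which overflows
# to +Inf in IEEE arithmetic; we write the Inf stand-ins directly.
inf = float("inf")
num = [inf, inf]                    # stand-ins for exp(1000), exp(999)
den = num[0] + num[1]               # still Inf
naive_probs = [n / den for n in num]        # Inf / Inf = NaN, entry by entry

# The shifted computation stays finite:
stable = [math.exp(d - m) for d in deltas]
probs = [s / sum(stable) for s in stable]   # roughly [0.73, 0.27]
```

So the objective (stable logsumexp) is fine while the gradient (naive softmax) is all NaN, exactly the finite-value/non-finite-gradient combination reported above.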
There are actually a lot of checks for finiteness in hz_linesearch, with fallbacks to bisection. But if you have a finite function value and non-finite gradient values, we might need more checks.
I just pushed something to master that stands a chance of fixing this. Please do another Pkg.checkout("Optim"). If this doesn't fix it, frankly your only real hope is to (1) share your code, or better, (2) fix it yourself and submit a PR. There's nothing like having a test that reproduces the problem for being sure you have the right fix.
That seems to have fixed it, at least on the small problems I've tested it on. Thanks a lot for working through this with me!
Glad to hear it!
@johnmyleswhite, I'll let you decide when to tag a new version. But when you do, it should presumably be a minor-version update, not a patch, since it's a breaking change.
Closed by 6195116