Comments (16)
@tcovert, there have been some big changes to the linesearch code in 3131af4, it's possible this has been fixed if you check out the master branch of Optim. Any chance you can check? (Does your error happen frequently enough to be reproducible?)
from optim.jl.
Is the master branch what I get when I Pkg.update()? If so, then no, the problem is still there: I get the same error whenever my calculated gradient includes Infs of opposite signs, or any NaNs.
You need Pkg.checkout, not Pkg.update.
Still getting the same error, for (apparently) the same reason: gradients with NaNs in them. Judging by the scale of the non-NaN gradient entries (on the order of 10^200), I'm guessing that "c" gets to be really large for some reason.
Thanks for checking! What line number are you getting the error for now? Have you tried implementing a fix to see if it works?
Haven't tried a fix for myself yet but might give it a try later.
Here is an example that happens when I use l_bfgs instead of full bfgs on a particular optimization problem (not quite a least squares minimization, but pretty darn close - panel data random effects negative log-likelihood minimization). Why bfgs works fine while l_bfgs doesn't is a separate mystery, but I think this error should not be happening either way:
ERROR: assertion failed: lsr.slope[ib] < 0
in bisect! at /Users/tcovert/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:561
in hz_linesearch! at /Users/tcovert/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:261
in hz_linesearch! at /Users/tcovert/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:188
in l_bfgs at /Users/tcovert/.julia/v0.3/Optim/src/l_bfgs.jl:165
in optimize at /Users/tcovert/.julia/v0.3/Optim/src/optimize.jl:113
in SOMEFUNCTION at /PATH_TO_CODE/SOMECODE.jl:1273
in include at /usr/local/Cellar/julia/0.3.0/lib/julia/sys.dylib
in include_from_node1 at loading.jl:128
in process_options at /usr/local/Cellar/julia/0.3.0/lib/julia/sys.dylib
in _start at /usr/local/Cellar/julia/0.3.0/lib/julia/sys.dylib (repeats 2 times)
while loading /PATH_TO_CODE/SOME_OTHER_CODE.jl, in expression starting on line 36
I've obscured some of the above (i.e. SOMEFUNCTION, PATH_TO_CODE) for privacy reasons but the base Julia and Optim.jl specific stuff is still there. If you really think it will help I can include it.
Again, the assertion error is happening because the function value at the line search result point "ib" is finite and indeed higher than at the candidate point, but the gradient contains NaN entries (so dphic is NaN, which fails the >= 0 check).
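This failure mode is easy to reproduce: one NaN entry in the gradient makes the directional derivative NaN, and NaN fails every comparison, so an assertion like lsr.slope[ib] < 0 trips. A minimal Python sketch (the helper name is illustrative, not Optim.jl's code):

```python
import math

def directional_derivative(grad, direction):
    """Slope of phi(alpha) = f(x + alpha*d): dot(grad, direction)."""
    return sum(g * d for g, d in zip(grad, direction))

grad = [float("nan"), 2.0]       # a single NaN entry poisons the dot product
slope = directional_derivative(grad, [1.0, 1.0])

print(math.isnan(slope))   # True: the slope is NaN
print(slope < 0)           # False: NaN fails every comparison, so the assert fires
```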
I know that Ipopt explicitly checks in its line search for NaNs and Infs, and if detected it will cut the step size and try again.
Cool, is it looking for NaNs in the function value or in the gradients? If it's not looking for NaNs/Infs in the gradients, then maybe what I'm asking for is unusual and not necessarily appropriate optimizer behavior.
I haven't said much about this yet because I don't have a good intuition for how we should handle this situation, but what would we do after inserting a check for NaN values in the gradient? My intuition is that a single non-finite value in a gradient is cause for a fatal error -- except perhaps in cases where non-finiteness is used to encode constraint violations.
I can't speak for optimization problems in general, but at least for the problem I am working with, Optim.jl gets to this point because it is trying to take too large a step. I guess the step is small enough that my objective function still produces a finite value, but big enough that the gradients are non-finite.
As it stands, when large steps result in non-finite function values, the line search routine scales back the step size (i.e., the "while !isfinite(phic) && iterfinite < iterfinitemax" block in the hz_linesearch code). Similarly, I would propose that when a step results in a non-finite gradient evaluation (i.e., any Inf or NaN entries), the line search should also scale back until the gradient is finite.
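The proposed extension can be sketched as a backtracking loop. This is a hypothetical illustration in Python, not Optim.jl's actual hz_linesearch code; the names (backtrack_to_finite, shrink) are made up:

```python
import math

def all_finite(values):
    return all(math.isfinite(v) for v in values)

def backtrack_to_finite(f, g, x, d, alpha, shrink=0.5, max_iter=60):
    """Shrink the step until both f and its gradient are finite at the trial point."""
    for _ in range(max_iter):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if math.isfinite(f(trial)) and all_finite(g(trial)):
            return alpha
        alpha *= shrink  # step too big: cut it back and try again
    raise RuntimeError("no finite step found")

# toy problem: f is finite everywhere, but the gradient is NaN for x <= 0
f = lambda x: x[0] ** 2
g = lambda x: [2.0 * x[0]] if x[0] > 0 else [float("nan")]

alpha = backtrack_to_finite(f, g, x=[1.0], d=[-1.0], alpha=4.0)
print(alpha)  # 0.5: the first shrunken step whose trial point has a finite gradient
```

The existing isfinite(phic) loop handles the first condition; the second condition (a finite gradient) is the new check being proposed.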
Why it is the case that the line search routine wants to start searching from such a big step (in my problem, anyway) is still somewhat of a mystery to me. I understand that the H&Z paper does suggest starting from the optimal step size from the previous iteration, but I have no intuition about (mathematically) why that would be a good idea.
@tcovert, is your function returning a finite value while the gradient is returning NaNs? That seems pretty strange. Anyway, functions should be interpreted as having an implicit domain: the set of values at which they return a finite value. I think the right choice for numerical stability in most cases is to just roll back the line search when non-finite values are detected, instead of blowing up in front of the user.
Yes, it does have a finite value even when some gradient entries are NaN or +/-Inf. It is indeed strange, but I can construct (obviously cooked up) examples where it happens, at least to floating point precision. For example: suppose you have a cumulative distribution function F that is (nearly) a perfect step function, with a step near a point X (i.e., the CDF of a normal random variable with mean X but extremely small variance). The range of F is obviously the unit interval, but around X the derivative of F should be very large, and depending on how small a variance you choose, the computer might call it +Inf.
Now imagine trying to minimize over (x,y) the bivariate function G(x,y) = F(x) + 0.9 * (1 - F(y)). Around x = X and y = X, the gradient is going to be [+Inf, -Inf] even though the function is obviously well defined and finite. Then the derivative of the "phi" function in a line search routine is going to be dot([+Inf, -Inf], [X, X]) = NaN.
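This construction checks out numerically. Below, an overflowing product stands in for the near-step-function CDF's derivative at X; this is a Python sketch of the floating-point arithmetic only, not the actual objective:

```python
import math

# F(X) = 0.5 is perfectly finite, but F'(X) = 1/(sigma*sqrt(2*pi))
# overflows once sigma is small enough. An overflowing product models that:
overflowed = 1e308 * 10          # IEEE overflow: +Inf
print(overflowed)                # inf

# gradient of G(x, y) = F(x) + 0.9*(1 - F(y)) near (X, X):
grad = [overflowed, -0.9 * overflowed]   # [+Inf, -Inf]

# directional derivative along [1, 1]: Inf + (-Inf) = NaN
slope = grad[0] * 1.0 + grad[1] * 1.0
print(math.isnan(slope))         # True
```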
The function I'm optimizing doesn't work this way, but the principle of a finite function value plus a NaN gradient still applies. In my case the objective function is a multinomial logit negative log-likelihood where the logit "deltas" (the stuff I pass to logsumexp) are definitely finite at the parameter values that generate this error. Instead, what's happening is that the deltas are big (not too big to generate a non-finite logsumexp, so the function value is still fine), but big enough that the gradient of the logit likelihood is +/- infinity, depending on which parameter I'm looking at.
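One plausible mechanism for this: logsumexp itself can be computed stably by shifting by the max delta, while a naive softmax in the gradient overflows. A hedged Python sketch of the arithmetic (not the poster's actual likelihood code); Inf stand-ins replace exp() calls that would overflow, since Python's math.exp raises OverflowError instead of returning Inf:

```python
import math

deltas = [1000.0, 999.0]

# Stable logsumexp: shift by the max delta, so the function value is finite.
m = max(deltas)
lse = m + math.log(sum(math.exp(d - m) for d in deltas))  # about 1000.313

# A naive softmax in the gradient would compute exp(1000), which overflows
# to +Inf in IEEE arithmetic; we write the Inf stand-ins directly.
inf = float("inf")
num = [inf, inf]                    # stand-ins for exp(1000), exp(999)
den = num[0] + num[1]               # still Inf
naive_probs = [n / den for n in num]        # Inf / Inf = NaN, entry by entry

# The shifted computation stays finite:
stable = [math.exp(d - m) for d in deltas]
probs = [s / sum(stable) for s in stable]   # roughly [0.73, 0.27]
```

So the objective (stable logsumexp) is fine while the gradient (naive softmax) is all NaN, exactly the finite-value/non-finite-gradient combination reported above.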
There are actually a lot of checks for finiteness in hz_linesearch, with fallbacks to bisection. But if you have a finite function value and non-finite gradient values, we might need more checks.
I just pushed something to master that stands a chance of fixing this. Please do another Pkg.checkout("Optim"). If this doesn't fix it, frankly your only real hope is to (1) share your code, or better, (2) fix it yourself and submit a PR. There's nothing like having a test that reproduces the problem for being sure you have the right fix.
That seems to have fixed it, at least on the small problems I've tested it on. Thanks a lot for working through this with me!
Glad to hear it!
@johnmyleswhite, I'll let you decide when to tag a new version. But when you do, it should presumably be a minor-version update, not a patch, since it's a breaking change.
Closed by 6195116