algorithmsbooks / optimization
Errata for Algorithms for Optimization book
Erratum refers to algorithm 4.4 on page 74
I might be completely missing something, but on page 160 of the printed book, the last paragraph says it's desirable to avoid equation (9.8) due to a singularity at
In eq. (8.3) in section 8.2, defining the set from which the lower triangular components of L are sampled, there is no description of the {+1, +2, ..., -1} constants that appear. If left undefined, this pattern could be any positive integer z added to -sqrt(alpha^(k)), e.g. -sqrt(alpha^(k)) + z, or subtracted from sqrt(alpha^(k)), e.g. sqrt(alpha^(k)) - z.
I was browsing your book and came across this minor error.
Section 10.3, p. 171: You are not dividing by c_n when writing x_n as a function of the other x. For brevity, you can define c'_i := c_i / c_n. Then you'll have x_n = -c'_1 x_1 - c'_2 x_2 - ... - c'_{n-1} x_{n-1}.
Cheers!
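For what it's worth, the proposed substitution checks out numerically. Here is a quick sketch in Python; the example coefficients and the c^T x = 0 form of the constraint are my own assumptions about the setting:

```python
# Hypothetical check: given a linear equality constraint c^T x = 0 (my
# assumption), solving for x_n requires dividing by c_n, as reported above.
c = [2.0, -1.0, 4.0]                   # example coefficients, c_n = 4
x12 = [3.0, 5.0]                       # free components x_1, x_2
cp = [ci / c[-1] for ci in c[:-1]]     # c'_i = c_i / c_n
xn = -sum(cpi * xi for cpi, xi in zip(cp, x12))
x = x12 + [xn]
residual = sum(ci * xi for ci, xi in zip(c, x))
print(abs(residual) < 1e-12)           # constraint is satisfied
```

Dropping the division by c_n (i.e., using c_i directly instead of c'_i) leaves a nonzero residual in this example.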
The final two lines in the for loop of Algorithm 10.2 augmented_lagrange_method on page 183 should be swapped. That is, the final two lines should be:
\lambda -= \rho * h(x)
\rho *= \gamma
The Lagrange multiplier estimate \lambda should be updated before the penalty factor \rho is increased.
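To make the corrected ordering concrete, here is a minimal Python sketch of the loop; the golden-section subproblem solver and the demo problem are placeholders of mine, not the book's code:

```python
def golden_min(p, lo=-10.0, hi=10.0, iters=200):
    """Crude bracketing minimizer for the convex subproblems (stand-in for
    whatever unconstrained solver is used)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if p(m1) < p(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

def augmented_lagrange_method(f, h, x, k_max=10, rho=1.0, gamma=2.0):
    lam = 0.0
    for _ in range(k_max):
        # penalized unconstrained subproblem for the current rho and lambda
        p = lambda x_, lam=lam, rho=rho: f(x_) + rho/2*h(x_)**2 - lam*h(x_)
        x = golden_min(p, x - 10, x + 10)
        lam -= rho * h(x)   # update the multiplier with the current rho first...
        rho *= gamma        # ...and only then increase the penalty
    return x, lam

# demo: minimize f(x) = x^2 subject to h(x) = x - 1 = 0; optimum is x* = 1
x, lam = augmented_lagrange_method(lambda x: x**2, lambda x: x - 1.0, 0.0)
```

With the updates in this order, lam converges to the true multiplier (2 for the demo problem) as rho grows.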
I have a question regarding the notation used in the Wolfe conditions. The Armijo condition is also written as f(x^(k+1)) <= f(x^(k)) + beta * alpha * grad_f(x^(k))^T d^(k), and this is consistent with algorithm 4.2. But the first thing that comes to my mind when reading nabla_(d^(k)) f(x^(k)) in equation (4.4) is the gradient of f(.) with respect to d^(k). Is this equivalent to the dot product of grad_f(x^(k)) and d^(k)?
Thanks!
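For what it's worth, yes: nabla_(d) f(x) there denotes the directional derivative, which equals the dot product grad_f(x)^T d. A quick finite-difference check in Python (the example f, x, and d are my own):

```python
f = lambda x: x[0]**2 + 3*x[0]*x[1]          # example function
grad = lambda x: [2*x[0] + 3*x[1], 3*x[0]]   # its gradient
x, d = [1.0, 2.0], [0.6, 0.8]

# directional derivative via central differences: (f(x+h*d) - f(x-h*d)) / 2h
h = 1e-6
fd = (f([xi + h*di for xi, di in zip(x, d)]) -
      f([xi - h*di for xi, di in zip(x, d)])) / (2*h)

dot = sum(gi*di for gi, di in zip(grad(x), d))
print(abs(fd - dot) < 1e-6)   # the two agree
```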
The equations (18.17) and (18.18) on page 325 are identical. As I cannot find an error in the surrounding transformations, I suspect a simple copy-paste duplication.
As far as I can tell, I have the first printing (at least there is no mention of printing at all).
Awesome book btw, I'm positively blown away by the very approachable presentation of the material!
Also, 👍 on using Julia for the examples, that definitely was influencing my choice, as I wanted to get a feel for the language for quite some time.
The only thing I found myself wishing for was some material on the relative merits/advantages/drawbacks of the different approaches for different applications (maybe as a final chapter?). I'm not an expert in the field like you authors are, and if I'm doing optimisation in the future some basis for deciding which approach to investigate first would surely come in handy!
Page 175, figure 10.7: Not an error, but since (10.23) is motivated as a lower bound for (10.22), it might be more appropriate to visualize in Fig. 10.7 not the plain linear function but rather a linear function truncated to zero for \mu < 0.
I am using Julia 1.4, with ] status reporting Convex.jl as [f65535da] Convex v0.13.3.
I tried to code up and run Algorithm 4.4 as written in the second printing, but it throws an argument error for the solve! function. Apparently the Convex.jl API changed in a recent version. From ? solve!, both methods of solve! now require an optimizer:
# 2 methods for generic function "solve!":
[1] solve!(problem::Problem{T}, optimizer::MathOptInterface.ModelLike; check_vexity, verbose, warmstart) where T in Convex at /Users/rba/.julia/packages/Convex/v9Ehz/src/solution.jl:204
[2] solve!(problem::Problem{T}, optimizer; kwargs...) where T in Convex at /Users/rba/.julia/packages/Convex/v9Ehz/src/solution.jl:192
They have updated the syntax to solve!(p, solver)
according to this commit. The suggested solver interface using SCS.jl is given in Convex.jl's news.
I changed the following in Algorithm 4.4 and everything seems to work as expected: using Convex becomes using Convex, SCS, and solve!(p) becomes solve!(p, SCS.Optimizer(verbose=false)).
This is very minor! The final sentence of p. 418 mentions using "the supertype and subtype functions"; however, subtype is undefined -- it should be subtypes.
(pardon the spam of minor issues)
Chapter 8, page 140, equation (8.23): should \delta^(i) be boldface on the right hand side of the "for" in eq. (8.23)?
I think we have the second argument of univariate normal distributions as the variance, not standard deviation. Hence, I think eq. 16.5 should have hat sigma squared. Do you agree @tawheeler ?
In alg. 7.4 on line 102, the second instance of line_search should probably have x' instead of x.
It doesn't matter for fig. 7.4 since we are doing pretty much exact line search, but I think the real algorithm should probably do the search starting from the end point. Right @tawheeler ?
My sincere apologies if I've simply missed this somewhere, but is there a github repo or other way to access the source code helpfully printed throughout the book without copy-pasting from the PDFs?
I think the last terms in equations 10.16 and 10.17 should have a minus sign.
The second to last sentence of Sec. 7.6.2 says: "The lower bound for each hyper-rectangle can be computed based on the longest side length and center value." However, I think it should be the distance to the vertex (the diagonal distance). See Def. 4.1 in the original 1993 paper, which says "let d_i denote the distance from the center point to the vertices". Interestingly, fig. 7.12 in the book correctly shows the lower bound decaying as it approaches the vertices.
Hi!
I noticed a couple of minor issues (formatting etc.):
- (x_7 and x_9) have too much space before the equals sign, probably due to a stretching hspace or something like that.
- \mathcal instead of \mathbb is what I would suspect in LaTeX.
- the c^{(d)} argument
I have the second printing. In the upper bound in Equation 3.14, and with respect to how the lower bound is computed, the leading sign is flipped, but the y_min and y^(i) terms inside the parentheses have also been flipped in sign, so the lower and upper bounds appear to be equal.
Based on the algorithm, I think it should be:
$ \left[ x^{(i)} - \frac{1}{\ell}(y_{min}-y^{(i)}), x^{(i)} + \frac{1}{\ell}(y_{min}-y^{(i)}) \right] $
Figure 3.6, page 38:
In the caption it is claimed that for n queries I_{n-1} = F_2 I_n. In agreement with (3.1), F_2=1, therefore, it must hold that I_{n-1} = I_n.
However, in the figure n=5 and it is explicitly written that I_4 = 2 I_5.
Related to the caption – I think all the directions in the positive spanning sets in Example 8.1 have unit l-infinity norms rather than unit l-one norms, right?
This is related to the Julia code (both in the book and in the corresponding Jupyter notebook): in the chapter on descent methods, there is a code snippet:
struct GradientDescent <: DescentMethod
α
end
init!(M::GradientDescent, f, ∇f, x) = M
function step!(M::GradientDescent, f, ∇f, x)
α, g = M.α, ∇f(x)
return x - α*g
end
The step! function does not seem to modify its input arguments, so why is there an exclamation mark in the name? True, it is just a matter of recommended style, but while trying to understand the code I found this confusing.
Page 248, Algorithm 13.11.
Both the algorithm and the comment in the margin/caption (on the bound for the kth prime) appear correct but, unless I am very confused, the correctness of the algorithm does not follow very directly from the bound. Since the bound is valid for k > 6, the algorithm should be testing n (and not the bound(n)) against 6.
Also, I believe the bound is valid for k = 6 as well, based on ["Sharper bounds...", P. Dusart, 1998].
The implementation will fail for m = 1, but that is a somewhat perverse input.
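An empirical check of this in Python; the exact form of the bound, p_k <= k(ln k + ln ln k), is my reading of the margin note:

```python
import math

def primes_upto(n):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n**0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = [False] * len(sieve[i*i::i])
    return [i for i, is_p in enumerate(sieve) if is_p]

primes = primes_upto(100_000)
bound = lambda k: k * (math.log(k) + math.log(math.log(k)))

# the bound holds for every tested k >= 6 ...
print(all(primes[k-1] <= bound(k) for k in range(6, len(primes) + 1)))
# ... but fails for 2 <= k <= 5, so the k-versus-6 guard matters
print(all(primes[k-1] > bound(k) for k in range(2, 6)))
```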
On page 35 it is stated "f is monotonically decreasing for x ≤ x∗ and monotonically increasing for x ≥ x∗" for unimodal function. Isn't it monotonically increasing for x ≤ x∗ as well as for x ≥ x∗ ?
The solution reads "Unfortunately, no known analytical solutions for fitting multivariate normal distributions exist." Shouldn't this be "mixture distributions"?
It looks like in eqs. 4.7-4.8, the use or non-use of indices is inconsistent. I think they can just all be dropped (e.g. as shown in the plots). Acknowledge Remy Zawislak. Assigning Tim to confirm.
Exercise 21.4 specifies theta_max as 10 rad, but I think it should be 10 degrees instead, given the wing-deformation context.
I know very little about airplane design but I don't think I want to be in a plane whose wing deforms 10 radians. :-) Besides, 10 degrees gives an answer that matches the solution in Appendix D.
On page 141, Eq 8.30 uses c_c but right after, in Eq 8.31, it is referred to as c_Σ.
The 5th sentence on pg 287 is missing a period. "The likelihood of the data is the probability that the observed points were drawn from the model Equivalently, we can maximize ..."
First printing, pg. 313, example 17.3: shouldn't the function be f(x,z)=x^2 + 6*exp(-x^2)? The value at x=0 is 6.
Alg. 8.2 seems to have an issue with the construction of D. There seem to be some no-op assignments going on here:
D = L[randperm(n),:]
D = L[:,randperm(n)]
The code for Algorithm B.1 on page 426 seems to be missing an exp(1) after the last + sign.
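For reference, a Python sketch of Ackley's function with the trailing exp(1) in place; the constants a = 20, b = 0.2, c = 2π are the usual defaults and my assumption about the book's version. Without the final term, the minimum value would be -e rather than 0:

```python
import math

def ackley(x, a=20.0, b=0.2, c=2*math.pi):
    d = len(x)
    s1 = sum(xi**2 for xi in x) / d
    s2 = sum(math.cos(c*xi) for xi in x) / d
    # the trailing math.e is the exp(1) term; dropping it shifts the
    # global minimum value from 0 down to -e
    return -a*math.exp(-b*math.sqrt(s1)) - math.exp(s2) + a + math.e

print(abs(ackley([0.0, 0.0])) < 1e-12)  # global minimum at the origin is 0
```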
Algorithm 3.2 on page 39.
Inside the body of the function fibonacci_search, the φ parameter is used on the second line, but it is neither defined in the code nor supplied as an input argument to the function.
MvNormal is called without using Distributions appearing anywhere earlier in the chapter, and will therefore result in an UndefVarError.
This problem minimizes sin(x) over an interval. So the minimizer is either an endpoint or a stationary point. In fact, x* = -pi/2, p* = -1.
In the example's plot, u starts from 0.1. If we include u = 0, we would recover x*.
A slightly more interesting example may be to consider the constraint x^2 <= 1. Now x* is found at a nontrivial u (u = cos(1)/2, if I got this correct).
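A quick numeric check of the claimed multiplier for the x^2 <= 1 variant, in Python; the sign convention L(x, u) = sin(x) + u*(x^2 - 1) is my assumption:

```python
import math

# For min sin(x) s.t. x^2 <= 1: sin is increasing on [-1, 1], so the
# constraint is active at x* = -1, and stationarity of
# L(x, u) = sin(x) + u*(x^2 - 1) requires cos(x*) + 2*u*x* = 0.
x_star = -1.0
u_star = math.cos(1.0) / 2          # the value claimed above

stationarity = math.cos(x_star) + 2 * u_star * x_star
print(abs(stationarity) < 1e-12)    # KKT stationarity holds
print(u_star > 0)                   # multiplier is nonnegative, as required
```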
I think line 13 in algorithm 8.1 is missing the * operator:
σ = M.σ(M.k)
Thanks for the book. I have a simple entry-level question: how do I assemble a FactorTable correctly?
julia-1.5.4:
struct Variable
name::Symbol
m::Int # number of possible values
end
const Assignment = Dict{Symbol,Int}
const FactorTable = Dict{Assignment,Float64}
struct Factor
vars::Vector{Variable}
table::FactorTable
end
B = Variable(:b, 2);
FactorTable((b = 1,) => 0.99, (b = 2,) => 0.01);
The code on page 26 of chapter 2 (Representation) throws an exception:
ERROR: LoadError: MethodError: Cannot `convert` an object of type NamedTuple{(:b,), Tuple{Int64}} to an object of type Dict{Symbol, Int64}
Closest candidates are:
convert(::Type{T}, ::T) where T<:AbstractDict at abstractdict.jl:520
convert(::Type{T}, ::AbstractDict) where T<:AbstractDict at abstractdict.jl:522
convert(::Type{T}, ::T) where T at essentials.jl:205
...
It reads "The inner maximization in the dual problem...". Do you mean the inner minimization?
The x in Eq. 9.1 is boldface, indicating a vector, but I think it should not be boldface. The x_i's are indexed components of the vector - each drawn from a scalar distribution.
Algorithm B.1: Ackley's function is missing an exp(1) in the return statement.
This could potentially be a Unicode rendering issue if you're using the \euler character in Julia; I've run into this specifically with \euler and the Tufte-algorithms class before.
On page 76 in Chapter 5, the citation for the Nesterov momentum paper should have k² (superscript 2) instead of k₂ (subscript 2).
Corrected citation:
Y. Nesterov, ``A Method of Solving a Convex Programming Problem with Convergence Rate O(1/k²),'' Soviet Mathematics Doklady, vol. 27, no. 2, pp. 543–547, 1983.
Chapter 8, page 125, the use of a footnote superscript makes it look like an exponent in the inline equation:
...typically a decreasing sequence σ⁽ᵏ⁾ such as 1/k.³
This makes it look like it's 1/k³ (cubed), when it's just 1/k.
Hello,
In the final constraint in the optimization, all of the coupling variables c are updated with the current y values, except for c^{(d)}. Should a y replace the c?
There is a typo/formatting error in example 18.6, page 336, where the vector q is given as
q = [0.577, 0.4500.450, 0.417, 0.417]
when I think it should read
q = [0.577, 0.450, 0.450, 0.417, 0.417]
I did not do the calculations manually to confirm, and sadly have not found time (yet?) to implement the Julia code for the calculations.
As far as I can tell, I have the first printing (at least there is no mention of printing at all).
Chapter 7, page 122, example 7.2 for DIRECT: the values for the center column in the table show 0.25 and 0.75, when the actual centers are at 1/6 and 5/6, respectively (unless I'm misinterpreting something).
E.g., the center for interval 1 says [0.25, 0.50] when the actual center is [1/6, 1/2].
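A tiny arithmetic check in Python, assuming the first DIRECT division splits [0, 1] into thirds along that coordinate:

```python
from fractions import Fraction

# splitting [0, 1] into thirds gives subintervals with these centers
thirds = [(Fraction(i, 3), Fraction(i + 1, 3)) for i in range(3)]
centers = [(a + b) / 2 for a, b in thirds]
print(centers)  # [Fraction(1, 6), Fraction(1, 2), Fraction(5, 6)]
```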
There is an issue in the step from eq. 16.11 to 16.12.
Hence, the first appearance of sigma hat in eq. 16.12 should be squared. This correction should be propagated to Alg. 16.2. The sigma on line 4 should be squared.
Thanks to Philippe Weingertner! @tawheeler , if you agree, I can implement this fix.
I just purchased your book and it is fantastic. I am working through it now.
I read the erratum: "p. 47: Change for clarity: updated algorithm to use y_min and replaced j with i. (thanks to Ross Alexander)"
Can you please post the updated listing for algorithm 3.5 on p. 47? I'm not sure what the changes are supposed to be.
Hi! I don't know if it has been reported and corrected yet, since it might be a small typo (it doesn't appear in the errata), but isn't there a letter o instead of the number 0 at the end of section 17.3.4, page 316?