algorithmsbooks / optimization
Errata for Algorithms for Optimization book
Erratum refers to algorithm 4.4 on page 74
I might be completely missing something, but on page 160 of the printed book, the last paragraph says it's desirable to avoid equation (9.8) due to a singularity at
In eq. (8.3) in section 8.2, defining the set from which the lower triangular components of L are sampled, there is no description of the {+1, +2, ..., -1} constants that appear. If left undefined, this pattern could be any positive integer z added to -sqrt(alpha^(k)), e.g. -sqrt(alpha^(k)) + z, or subtracted from sqrt(alpha^(k)), e.g. sqrt(alpha^(k)) - z.
I was browsing your book and came across this minor error.
Section 10.3, p. 171: You are not dividing by c_n when writing x_n as a function of the other x. For brevity, you can define c'_i := c_i / c_n. Then you'll have x_n = -c'_1 x_1 - c'_2 x_2 - ... - c'_{n-1} x_{n-1}.
Cheers!
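For what it's worth, the proposed substitution checks out numerically. Here is a quick sketch in Python; the example coefficients and the c^T x = 0 form of the constraint are my own assumptions about the setting:

```python
# Hypothetical check: given a linear equality constraint c^T x = 0 (my
# assumption), solving for x_n requires dividing by c_n, as reported above.
c = [2.0, -1.0, 4.0]                   # example coefficients, c_n = 4
x12 = [3.0, 5.0]                       # free components x_1, x_2
cp = [ci / c[-1] for ci in c[:-1]]     # c'_i = c_i / c_n
xn = -sum(cpi * xi for cpi, xi in zip(cp, x12))
x = x12 + [xn]
residual = sum(ci * xi for ci, xi in zip(c, x))
print(abs(residual) < 1e-12)           # constraint is satisfied
```

Dropping the division by c_n (i.e., using c_i directly instead of c'_i) leaves a nonzero residual in this example.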
The final two lines in the for loop of Algorithm 10.2 augmented_lagrange_method on page 183 should be swapped. That is, the final two lines should be:
\lambda -= \rho * h(x)
\rho *= \gamma
The Lagrange multiplier estimate \lambda should be updated before the penalty factor \rho is increased.
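To make the corrected ordering concrete, here is a minimal Python sketch of the loop; the golden-section subproblem solver and the demo problem are placeholders of mine, not the book's code:

```python
def golden_min(p, lo=-10.0, hi=10.0, iters=200):
    """Crude bracketing minimizer for the convex subproblems (stand-in for
    whatever unconstrained solver is used)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if p(m1) < p(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

def augmented_lagrange_method(f, h, x, k_max=10, rho=1.0, gamma=2.0):
    lam = 0.0
    for _ in range(k_max):
        # penalized unconstrained subproblem for the current rho and lambda
        p = lambda x_, lam=lam, rho=rho: f(x_) + rho/2*h(x_)**2 - lam*h(x_)
        x = golden_min(p, x - 10, x + 10)
        lam -= rho * h(x)   # update the multiplier with the current rho first...
        rho *= gamma        # ...and only then increase the penalty
    return x, lam

# demo: minimize f(x) = x^2 subject to h(x) = x - 1 = 0; optimum is x* = 1
x, lam = augmented_lagrange_method(lambda x: x**2, lambda x: x - 1.0, 0.0)
```

With the updates in this order, lam converges to the true multiplier (2 for the demo problem) as rho grows.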
I have a question regarding the notation used in the Wolfe conditions. The Armijo condition is also written as f(x^(k+1)) <= f(x^(k)) + beta * alpha * grad_f(x^(k))^T d^(k), and this is consistent with algorithm 4.2. But the first thing that comes to my mind when reading nabla_(d^(k)) f(x^(k)) in equation (4.4) is the gradient of f(.) with respect to d^(k). Is this equivalent to the dot product of grad_f(x^(k)) and d^(k)?
Thanks!
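For what it's worth, yes: nabla_(d) f(x) there denotes the directional derivative, which equals the dot product grad_f(x)^T d. A quick finite-difference check in Python (the example f, x, and d are my own):

```python
f = lambda x: x[0]**2 + 3*x[0]*x[1]          # example function
grad = lambda x: [2*x[0] + 3*x[1], 3*x[0]]   # its gradient
x, d = [1.0, 2.0], [0.6, 0.8]

# directional derivative via central differences: (f(x+h*d) - f(x-h*d)) / 2h
h = 1e-6
fd = (f([xi + h*di for xi, di in zip(x, d)]) -
      f([xi - h*di for xi, di in zip(x, d)])) / (2*h)

dot = sum(gi*di for gi, di in zip(grad(x), d))
print(abs(fd - dot) < 1e-6)   # the two agree
```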
The equations (18.17) and (18.18) on page 325 are identical. As I cannot find an error in the surrounding transformations, I suspect a simple copy-paste duplication.
As far as I can tell, I have the first printing (at least there is no mention of printing at all).
Awesome book btw, I'm positively blown away by the very approachable presentation of the material!
Also, 👍 on using Julia for the examples, that definitely was influencing my choice, as I wanted to get a feel for the language for quite some time.
The only thing I found myself wishing for was some material on the relative merits/advantages/drawbacks of the different approaches for different applications (maybe as a final chapter?). I'm not an expert in the field like you authors are, and if I'm doing optimisation in the future some basis for deciding which approach to investigate first would surely come in handy!
Page 175, figure 10.7: Not an error, but since (10.23) is motivated as a lower bound for (10.22), it might be more appropriate to visualize in Fig. 10.7 not the plain linear function but rather a linear function truncated to zero for \mu < 0.
I am using Julia 1.4, with ] status reporting Convex.jl as [f65535da] Convex v0.13.3.
I tried to code up and run Algorithm 4.4 as written in the second printing, but it throws an argument error for the solve! function. Apparently the Convex.jl API changed in a recent version. From ? solve!, both methods of solve! now require an optimizer:
# 2 methods for generic function "solve!":
[1] solve!(problem::Problem{T}, optimizer::MathOptInterface.ModelLike; check_vexity, verbose, warmstart) where T in Convex at /Users/rba/.julia/packages/Convex/v9Ehz/src/solution.jl:204
[2] solve!(problem::Problem{T}, optimizer; kwargs...) where T in Convex at /Users/rba/.julia/packages/Convex/v9Ehz/src/solution.jl:192
They have updated the syntax to solve!(p, solver)
according to this commit. The suggested solver interface using SCS.jl is given in Convex.jl's news.
I changed the following in Algorithm 4.4 and everything seems to work as expected: using Convex becomes using Convex, SCS, and solve!(p) becomes solve!(p, SCS.Optimizer(verbose=false)).
This is very minor! The final sentence of p. 418 mentions using "the supertype and subtype functions"; however, subtype is undefined -- it should be subtypes.
(pardon the spam of minor issues)
Chapter 8, page 140, equation (8.23): should \delta^(i) be boldface on the right hand side of the "for" in eq. (8.23)?
I think we have the second argument of univariate normal distributions as the variance, not standard deviation. Hence, I think eq. 16.5 should have hat sigma squared. Do you agree @tawheeler ?
In alg. 7.4 on line 102, the second instance of line_search should probably have x' instead of x.
It doesn't matter for fig. 7.4 since we are doing pretty much exact line search, but I think the real algorithm should probably do the search starting from the end point. Right @tawheeler ?
My sincere apologies if I've simply missed this somewhere, but is there a github repo or other way to access the source code helpfully printed throughout the book without copy-pasting from the PDFs?
I think the last terms in equations 10.16 and 10.17 should have a minus sign.
The second to last sentence of Sec. 7.6.2 says: "The lower bound for each hyper-rectangle can be computed based on the longest side length and center value." However, I think it should be the distance to the vertex (the diagonal distance). See Def. 4.1 in the original 1993 paper, which says "let d_i denote the distance from the center point to the vertices". Interestingly, fig. 7.12 in the book correctly shows the lower bound decaying as it approaches the vertices.
Hi!
I noticed a couple of minor issues (formatting etc.):
- (x_7 and x_9) have too much space before the equals sign, probably due to a stretching hspace or something like that.
- \mathcal instead of \mathbb is what I would suspect in LaTeX.
- the c^{(d)} argument
I have the second printing. In the upper bound in Equation 3.14, and with respect to how the lower bound is computed, the leading sign is flipped, but the y_min and y^(i) terms inside the parentheses have also been flipped in sign, so the lower and upper bounds appear to be equal.
Based on the algorithm, I think it should be:
$ \left[ x^{(i)} - \frac{1}{\ell}(y_{min}-y^{(i)}), x^{(i)} + \frac{1}{\ell}(y_{min}-y^{(i)}) \right] $
Figure 3.6, page 38:
In the caption it is claimed that for n queries I_{n-1} = F_2 I_n. In agreement with (3.1), F_2=1, therefore, it must hold that I_{n-1} = I_n.
However, in the figure n=5 and it is explicitly written that I_4 = 2 I_5.
Related to the caption – I think all the directions in the positive spanning sets in Example 8.1 have unit l-infinity norms rather than unit l-one norms, right?
This is related to the Julia code (both in the book and in the corresponding Jupyter notebook): in the chapter on descent methods, there is a code snippet:
struct GradientDescent <: DescentMethod
α
end
init!(M::GradientDescent, f, ∇f, x) = M
function step!(M::GradientDescent, f, ∇f, x)
α, g = M.α, ∇f(x)
return x - α*g
end
The step! function does not seem to modify its input arguments, so why is there an exclamation mark in the name? True, it is just a matter of recommended style, but while trying to understand the code I found this confusing.
Page 248, Algorithm 13.11.
Both the algorithm and the comment in the margin/caption (on the bound for the kth prime) appear correct but, unless I am very confused, the correctness of the algorithm does not follow very directly from the bound. Since the bound is valid for k > 6, the algorithm should be testing n (and not the bound(n)) against 6.
Also, I believe the bound is valid for k = 6 as well, based on ["Sharper bounds...", P. Dusart, 1998].
The implementation will fail for m = 1, but that is a somewhat perverse input.
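An empirical check of this in Python; the exact form of the bound, p_k <= k(ln k + ln ln k), is my reading of the margin note:

```python
import math

def primes_upto(n):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n**0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = [False] * len(sieve[i*i::i])
    return [i for i, is_p in enumerate(sieve) if is_p]

primes = primes_upto(100_000)
bound = lambda k: k * (math.log(k) + math.log(math.log(k)))

# the bound holds for every tested k >= 6 ...
print(all(primes[k-1] <= bound(k) for k in range(6, len(primes) + 1)))
# ... but fails for 2 <= k <= 5, so the k-versus-6 guard matters
print(all(primes[k-1] > bound(k) for k in range(2, 6)))
```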
On page 35 it is stated "f is monotonically decreasing for x ≤ x∗ and monotonically increasing for x ≥ x∗" for unimodal function. Isn't it monotonically increasing for x ≤ x∗ as well as for x ≥ x∗ ?
The solution reads "Unfortunately, no known analytical solutions for fitting multivariate normal distributions exist." Shouldn't this be "mixture distributions"?
It looks like in eqs. 4.7-4.8, the use or non-use of indices is inconsistent. I think they can just all be dropped (e.g. as shown in the plots). Acknowledge Remy Zawislak. Assigning Tim to confirm.
Exercise 21.4 specifies theta_max as 10 rad, but I think it should be 10 degrees instead, given the wing-deformation context.
I know very little about airplane design but I don't think I want to be in a plane whose wing deforms 10 radians. :-) Besides, 10 degrees gives an answer that matches the solution in Appendix D.
On page 141, Eq 8.30 uses c_c but right after, in Eq 8.31, it is referred to as c_Σ.
The 5th sentence on pg 287 is missing a period. "The likelihood of the data is the probability that the observed points were drawn from the model Equivalently, we can maximize ..."
First printing, pg. 313, example 17.3: shouldn't the function be f(x,z)=x^2 + 6*exp(-x^2)? The value at x=0 is 6.
Alg. 8.2 seems to have an issue with the construction of D. There seem to be some no-op assignments going on here:
D = L[randperm(n),:]
D = L[:,randperm(n)]
The code for Algorithm B.1 on page 426 seems to be missing an exp(1) after the last + sign.
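For reference, a Python sketch of Ackley's function with the trailing exp(1) in place; the constants a = 20, b = 0.2, c = 2π are the usual defaults and my assumption about the book's version. Without the final term, the minimum value would be -e rather than 0:

```python
import math

def ackley(x, a=20.0, b=0.2, c=2*math.pi):
    d = len(x)
    s1 = sum(xi**2 for xi in x) / d
    s2 = sum(math.cos(c*xi) for xi in x) / d
    # the trailing math.e is the exp(1) term; dropping it shifts the
    # global minimum value from 0 down to -e
    return -a*math.exp(-b*math.sqrt(s1)) - math.exp(s2) + a + math.e

print(abs(ackley([0.0, 0.0])) < 1e-12)  # global minimum at the origin is 0
```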
Algorithm 3.2 on page 39.
Inside the body of the function fibonacci_search, the φ parameter is used on the second line, but it is neither defined in the code nor supplied as an input argument to the function.
MvNormal is called without using Distributions appearing anywhere earlier in the chapter, and will therefore result in an UndefVarError.
This problem minimizes sin(x) over an interval. So the minimizer is either an endpoint or a stationary point. In fact, x* = -pi/2, p* = -1.
In the example's plot, u starts from 0.1. If we include u = 0, we would recover x*.
A slightly more interesting example may be to consider the constraint x^2 <= 1. Now x* is found at a nontrivial u (u = cos(1)/2, if I got this correct).
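A quick numeric check of the claimed multiplier for the x^2 <= 1 variant, in Python; the sign convention L(x, u) = sin(x) + u*(x^2 - 1) is my assumption:

```python
import math

# For min sin(x) s.t. x^2 <= 1: sin is increasing on [-1, 1], so the
# constraint is active at x* = -1, and stationarity of
# L(x, u) = sin(x) + u*(x^2 - 1) requires cos(x*) + 2*u*x* = 0.
x_star = -1.0
u_star = math.cos(1.0) / 2          # the value claimed above

stationarity = math.cos(x_star) + 2 * u_star * x_star
print(abs(stationarity) < 1e-12)    # KKT stationarity holds
print(u_star > 0)                   # multiplier is nonnegative, as required
```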
I think line 13 in algorithm 8.1 is missing the * operator:
σ = M.σ(M.k)
Thanks for the book. I have a simple entry-level question: how do I assemble a FactorTable correctly?
julia-1.5.4:
struct Variable
name::Symbol
m::Int # number of possible values
end
const Assignment = Dict{Symbol,Int}
const FactorTable = Dict{Assignment,Float64}
struct Factor
vars::Vector{Variable}
table::FactorTable
end
B = Variable(:b, 2);
FactorTable((b = 1,) => 0.99, (b = 2,) => 0.01);
The code on page 26 of chapter 2 (Representation) throws an exception:
ERROR: LoadError: MethodError: Cannot `convert` an object of type NamedTuple{(:b,), Tuple{Int64}} to an object of type Dict{Symbol, Int64}
Closest candidates are:
convert(::Type{T}, ::T) where T<:AbstractDict at abstractdict.jl:520
convert(::Type{T}, ::AbstractDict) where T<:AbstractDict at abstractdict.jl:522
convert(::Type{T}, ::T) where T at essentials.jl:205
...
It reads "The inner maximization in the dual problem...". Do you mean the inner minimization?
The x in Eq. 9.1 is boldface, indicating a vector, but I think it should not be boldface. The x_i's are indexed components of the vector - each drawn from a scalar distribution.
Algorithm B.1: Ackley's function is missing an exp(1) in the return statement.
This could potentially be a Unicode rendering issue if you're using the \euler character in Julia; I've run into this specifically with \euler and the Tufte-algorithms class before.
On page 76 in Chapter 5, the citation for the Nesterov momentum paper should have k² (superscript 2) instead of k₂ (subscript 2).
Corrected citation:
Y. Nesterov, ``A Method of Solving a Convex Programming Problem with Convergence Rate O(1/k²),'' Soviet Mathematics Doklady, vol. 27, no. 2, pp. 543–547, 1983.
Chapter 8, page 125, the use of a footnote superscript makes it look like an exponent in the inline equation:
...typically a decreasing sequence σ⁽ᵏ⁾ such as 1/k.³
This makes it look like it's 1/k³ (cubed), when it's just 1/k.
Hello,
In the final constraint in the optimization, all of the coupling variables c are updated with the current y values, except for c^{(d)}. Should a y replace the c?
There is a typo/formatting error in example 18.6, page 336, where the vector q is given as
q = [0.577, 0.4500.450, 0.417, 0.417]
when I think it should read
q = [0.577, 0.450, 0.450, 0.417, 0.417]
I did not do the calculations manually to confirm, and sadly have not found time (yet?) to implement the Julia code for the calculations.
As far as I can tell, I have the first printing (at least there is no mention of printing at all).
Chapter 7, page 122, example 7.2 for DIRECT: the values for the center column in the table show 0.25 and 0.75, when the actual centers are at 1/6 and 5/6, respectively (unless I'm misinterpreting something).
E.g., the center for interval 1 says [0.25, 0.50] when the actual center is [1/6, 1/2].
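A tiny arithmetic check in Python, assuming the first DIRECT division splits [0, 1] into thirds along that coordinate:

```python
from fractions import Fraction

# splitting [0, 1] into thirds gives subintervals with these centers
thirds = [(Fraction(i, 3), Fraction(i + 1, 3)) for i in range(3)]
centers = [(a + b) / 2 for a, b in thirds]
print(centers)  # [Fraction(1, 6), Fraction(1, 2), Fraction(5, 6)]
```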
There is an issue in the step from eq. 16.11 to 16.12.
Hence, the first appearance of sigma hat in eq. 16.12 should be squared. This correction should be propagated to Alg. 16.2. The sigma on line 4 should be squared.
Thanks to Philippe Weingertner! @tawheeler , if you agree, I can implement this fix.
I just purchased your book and it is fantastic. I am working through it now.
I read the erratum: "p. 47: Change for clarity: updated algorithm to use y_min and replaced j with i. (thanks to Ross Alexander)"
Can you please post the updated listing for algorithm 3.5 on p. 47? I'm not sure what the changes are supposed to be.
Hi! I don't know if it has been reported and corrected yet, since it might be a small typo (it doesn't appear in the errata), but isn't there a letter o instead of the number 0 at the end of section 17.3.4, page 316?