
Comments (4)

jermwatt avatar jermwatt commented on May 20, 2024 2

You're absolutely right - I apologize. The wording of this problem is confusing - after our conversation has converged I'll be updating the text and errata with better vocabulary.

In particular - we've used the phrase "uniform" too loosely here. We use uniform grids (e.g., Figure 2.3) as a simple set of points over which we can evaluate a function in search of its minimum. The point we want to communicate with its usage here is that evaluating a function systematically - e.g., on a uniform grid - in search of a minimum / maximum very quickly becomes inefficient as the input dimension N increases.

We then try to contrast that systematic approach with a random one - sampling uniformly at random from the input domain of a function and evaluating those points in search of minima / maxima. Our point here is that this search strategy also fails as the input dimension N increases.

So in short we've played too fast and loose with the phrase "uniform" and it's confusing. We have two naive optimization strategies:

  1. input sampling via a uniformly spaced discrete grid --> function evaluation (in search of minima/maxima)
  2. input sampling at random via a uniform distribution --> function evaluation (in search of minima/maxima)

In both cases the strategy of simply sampling and evaluating a function to echo-locate its minima / maxima fails very quickly as the input dimension N increases.

OK - now onto the problem itself - 2.1 a)

Here we would like you to show yourself that strategy 1 above - sampling points off a uniform grid (even when the global minimum of the function lies on the grid itself) - is a poor strategy.

To do this we must first define a grid on [-1,1]^N, which involves choosing a distance parameter d (as illustrated in Figure 2.3). This is just the distance between each pair of coordinatewise gridlines. Such a distance parameter d implies a corresponding set of possible coordinate values. For example, if d = 0.2 the possible values for each coordinate are

[-1. , -0.8, -0.6, -0.4, -0.2, 0. , 0.2, 0.4, 0.6, 0.8, 1. ]

Of course any point on the uniform grid defined by d = 0.2, in any input dimension N, has coordinates drawn from these values.

One way to then sample an N dimensional point from this grid is to place a discrete distribution over these coordinate values and draw each coordinate of the input vector from it - N times in all - as illustrated below.

### for given N generate random point on uniform grid over [-1,1]^N where d = 0.2 ###
import numpy as np
N = 10  # input dimension (example value)

## we first set up a discrete distribution for sampling possible coordinate values
# create possible coordinate values - for given settings --> [-1. , -0.8, -0.6, -0.4, -0.2,  0. ,  0.2,  0.4,  0.6,  0.8,  1. ]
d = 0.2
uniform_grid_vals = np.linspace(-1, 1, int((1 - (-1))/d) + 1)

# create uniform discrete probabilities for selecting these coordinate grid values
probabilities = [1/len(uniform_grid_vals) for i in range(len(uniform_grid_vals))]

# choose a random point on the N dimensional uniform grid
w = np.random.choice(uniform_grid_vals, N, p=probabilities)[:,np.newaxis] # np.newaxis ensures our output is shaped Nx1

If we define our quadratic as follows

import numpy as np
def g(w: "Nx1 numpy array - input to quadratic function") -> "1x1 numpy array - output of quadratic for input w":
    return np.dot(w.T,w)

then question 2.1 a) involves evaluating this function for various values of N, for each N doing so P times, and for each N returning the minimum function evaluation achieved.
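Putting the pieces together, the experiment can be sketched roughly as follows (variable names and the particular values of N and P here are my own illustration, not prescribed by the text):

```python
import numpy as np

def g(w):
    """Simple quadratic g(w) = w^T w, minimized at w = 0 (which lies on the grid)."""
    return float(np.dot(w.T, w))

d = 0.2  # grid spacing
grid_vals = np.linspace(-1, 1, int(round(2 / d)) + 1)  # possible coordinate values
P = 100  # number of grid points sampled for each input dimension N

for N in [1, 5, 10, 50]:
    # sample P points off the uniform grid and record the best (lowest) evaluation
    evals = []
    for _ in range(P):
        w = np.random.choice(grid_vals, N)[:, np.newaxis]  # Nx1 random grid point
        evals.append(g(w))
    print(N, min(evals))
```

As N grows, the minimum value found this way drifts further from the true minimum g(0) = 0, which is the point of the exercise.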

Does this make more sense?

from machine_learning_refined.

jermwatt avatar jermwatt commented on May 20, 2024

Great question! Let me know if I'm tracing out our thinking correctly.

In this context sampling on [-1,1] means choosing any real point on the interval starting at -1 and ending at 1, and the 'uniform' part means producing this sample with respect to the uniform distribution on that interval. A (randomly) chosen point could be any real value between -1 and 1 inclusive (e.g., -0.23218, 0.38482, ...).

The N dimensional hypercube

[-1,1] X [-1,1] X ... X [-1,1]

is the higher dimensional analog of the interval [-1,1]. Choosing a point in the cube means selecting a real valued N dimensional vector like

[0.2322, -0.7327,...,0.998]

To do this uniformly just means to sample points like this according to a uniform distribution. Because the desired distribution is uniform, you can sample each coordinate independently according to a uniform distribution on [-1,1]. In other words, to produce an N dimensional point chosen uniformly from the hypercube described above, you can create an N length array where each coordinate is sampled uniformly on the interval [-1,1].

Here a phrase like "sample $K$ points uniformly from $[-1,1]^N$" means to do this $K$ times - producing $K$ such vectors.
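Concretely, under this reading, sampling K points uniformly from the hypercube can be sketched as follows (the variable names and values are my own illustration):

```python
import numpy as np

N = 10   # input dimension
K = 500  # number of sampled points

# each row is one point; every coordinate is drawn independently
# from the uniform distribution on [-1, 1]
samples = np.random.uniform(low=-1, high=1, size=(K, N))
```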

Does that make sense / did I interpret you correctly?


theundergroundsorcerer avatar theundergroundsorcerer commented on May 20, 2024

Thank you for your reply.

I am not sure that I accept your explanation, because it doesn't match the description of the uniform sampling strategy in the text, and it is reasonable to assume that the definitions in the text and the exercises coincide unless explicitly stated otherwise.

What do you mean here by "random", then? Other than that, on page 24 when discussing global optimization the last paragraph says:

We can take two approaches here to choosing our (finite) set of input points to test: either sample (i.e., guess) them uniformly over an evenly spaced grid (uniform sampling), or pick the same number of input points at random (random sampling).

It doesn't mention a uniform distribution, but creating a grid of appropriate dimension (the discussion in section 2.3.1 - The curse of dimensionality) seems to match this interpretation of "uniform sampling" (unless I've got something wrong when reading it).

Other than that, if we accept your explanation, how should part (c) of the question be interpreted?

Part (c) of the question says:

(c) Repeat parts (a) and (b), this time replacing uniformly chosen samples with
randomly selected ones.

The procedure you've just described for sections (a) and (b) as "uniform" sampling is just what one would do when trying to generate a random vector in $[-1, 1]^N$. (I think that's what np.random.rand(1, dimension) does - generates a vector with $N$ coordinates that are independently uniformly distributed on $[0, 1]$.)
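For reference, since np.random.rand draws coordinates from [0, 1), one common way to get uniform samples on [-1, 1) instead is to rescale its output (a small sketch of my own, not from the text):

```python
import numpy as np

N = 10  # input dimension

# np.random.rand gives coordinates uniform on [0, 1);
# scaling by 2 and shifting by -1 yields coordinates uniform on [-1, 1)
w = 2 * np.random.rand(1, N) - 1
```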


theundergroundsorcerer avatar theundergroundsorcerer commented on May 20, 2024

It makes more sense now.

Thanks.

