
Comments (4)

jermwatt avatar jermwatt commented on May 20, 2024 2

You're absolutely right - I apologize. The wording of this problem is confusing - after our conversation has converged I'll be updating the text and errata with better vocabulary.

In particular - we've used the phrase "uniform" too loosely here. We use uniform grids (e.g., Figure 2.3) as a simple set of points over which we can evaluate a function in search of its minimum. The point we want to communicate with its usage here is that evaluating a function systematically - e.g., on a uniform grid - in search of a minimum / maximum very quickly becomes inefficient as the input dimension N increases.

We then try to contrast that systematic approach with a random one - sampling uniformly at random from the input domain of a function and evaluating those points in search of minima / maxima. Our point here is that this search strategy also fails as the input dimension N increases.

So in short we've played too fast and loose with the phrase "uniform" and it's confusing. We have two naive optimization strategies:

  1. input sampling via a uniformly spaced discrete grid --> function evaluation (in search of minima/maxima)
  2. input sampling at random via a uniform distribution --> function evaluation (in search of minima/maxima)

In both cases the strategy of simply sampling and evaluating a function to echo-locate its minima / maxima fails very quickly as the input dimension N increases.

OK - now onto the problem itself - 2.1 a)

Here we would like you to show yourself that strategy 1 above - sampling points off a uniform grid (even when the global minimum of the function lies on the grid itself) - is a poor strategy.

To do this we must first define a grid on [-1,1]^N, which involves choosing a distance parameter d (as illustrated in Figure 2.3). This is just the distance between each pair of coordinatewise gridlines. Such a distance parameter d implies a corresponding set of possible coordinate values. For example, if d = 0.2 the possible values for each coordinate are

[-1. , -0.8, -0.6, -0.4, -0.2, 0. , 0.2, 0.4, 0.6, 0.8, 1. ]

Of course any point on the uniform grid defined by d = 0.2, in any input dimension N, has coordinates drawn from these values.

One way to then sample an N dimensional point from this grid is to place a discrete distribution over these coordinate values and draw each coordinate of the input vector from it - N times in all - as illustrated below.

### for given N generate random point on uniform grid over [-1,1]^N where d = 0.2 ###
import numpy as np
N = 10  # input dimension (example value)

## we first set up a discrete distribution for sampling possible coordinate values
# create possible coordinate values - for given settings --> [-1. , -0.8, -0.6, -0.4, -0.2,  0. ,  0.2,  0.4,  0.6,  0.8,  1. ]
d = 0.2
uniform_grid_vals = np.linspace(-1, 1, int((1 - (-1))/d) + 1)

# create uniform discrete probabilities for selecting these coordinate grid values
probabilities = [1/len(uniform_grid_vals) for i in range(len(uniform_grid_vals))]

# choose a random point on the N dimensional uniform grid
w = np.random.choice(uniform_grid_vals, N, p=probabilities)[:,np.newaxis] # np.newaxis ensures our output is shaped Nx1

If we define our quadratic as follows

import numpy as np
def g(w: "Nx1 numpy array - input to quadratic function") -> "1x1 numpy array - output of quadratic for input w":
    return np.dot(w.T,w)

then question 2.1 a) involves evaluating this function for various values of N, for each N doing so P times, and for each N returning the minimum function evaluation achieved.
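Putting the pieces together, the experiment can be sketched roughly as follows (variable names and the particular values of N and P here are my own illustration, not prescribed by the text):

```python
import numpy as np

def g(w):
    """Simple quadratic g(w) = w^T w, minimized at w = 0 (which lies on the grid)."""
    return float(np.dot(w.T, w))

d = 0.2  # grid spacing
grid_vals = np.linspace(-1, 1, int(round(2 / d)) + 1)  # possible coordinate values
P = 100  # number of grid points sampled for each input dimension N

for N in [1, 5, 10, 50]:
    # sample P points off the uniform grid and record the best (lowest) evaluation
    evals = []
    for _ in range(P):
        w = np.random.choice(grid_vals, N)[:, np.newaxis]  # Nx1 random grid point
        evals.append(g(w))
    print(N, min(evals))
```

As N grows, the minimum value found this way drifts further from the true minimum g(0) = 0, which is the point of the exercise.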

Does this make more sense?

from machine_learning_refined.

jermwatt avatar jermwatt commented on May 20, 2024

Great question! Let me know if I'm tracing out our thinking correctly.

In this context sampling on [-1,1] means choosing any real point on the interval starting at -1 and ending at 1, and the 'uniform' part means producing this sample with respect to the uniform distribution on that interval. A (randomly) chosen point could be any real value between -1 and 1 inclusive (e.g., -0.23218, 0.38482, ...).

The N dimensional hypercube

[-1,1] X [-1,1] X ... X [-1,1]

is the higher dimensional analog of the interval [-1,1]. Choosing a point in the cube means selecting a real valued N dimensional vector like

[0.2322, -0.7327,...,0.998]

To do this uniformly just means to sample points like this according to a uniform distribution. Because the desired distribution is uniform, you can sample each coordinate independently according to a uniform distribution on [-1,1]. In other words, to produce an N dimensional point chosen uniformly from the hypercube described above, you can create an N length array where each coordinate is sampled uniformly on the interval [-1,1].

Here a phrase like "sample $K$ points uniformly from $[-1,1]^N$" means to do this $K$ times - producing $K$ such vectors.
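Concretely, under this reading, sampling K points uniformly from the hypercube can be sketched as follows (the variable names and values are my own illustration):

```python
import numpy as np

N = 10   # input dimension
K = 500  # number of sampled points

# each row is one point; every coordinate is drawn independently
# from the uniform distribution on [-1, 1]
samples = np.random.uniform(low=-1, high=1, size=(K, N))
```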

Does that make sense / did I interpret you correctly?


theundergroundsorcerer avatar theundergroundsorcerer commented on May 20, 2024

Thank you for your reply.

I am not sure that I accept your explanation, because it doesn't match the description of the uniform sampling strategy in the text, and it is reasonable to assume that the definitions in the text and the exercises coincide unless explicitly stated otherwise.

What do you mean here by "random", then? Other than that, on page 24 when discussing global optimization the last paragraph says:

We can take two approaches here to choosing our (finite) set of input points to test: either sample (i.e., guess) them uniformly over an evenly spaced grid (uniform sampling), or pick the same number of input points at random (random sampling).

It doesn't mention a uniform distribution, but creating a grid of appropriate dimension (the discussion in section 2.3.1 - The curse of dimensionality) seems to match this interpretation of "uniform sampling" (unless I've got something wrong when reading it).

Other than that, if we accept your explanation, how should part (c) of the question be interpreted?

Part (c) of the question says:

(c) Repeat parts (a) and (b), this time replacing uniformly chosen samples with
randomly selected ones.

The procedure you've just described for sections (a) and (b) as "uniform" sampling is just what one would do when trying to generate a random vector in $[-1, 1]^N$. (I think that's what np.random.rand(1, dimension) does - generates a vector with $N$ coordinates that are independently uniformly distributed on $[0, 1]$.)
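For reference, since np.random.rand draws coordinates from [0, 1), one common way to get uniform samples on [-1, 1) instead is to rescale its output (a small sketch of my own, not from the text):

```python
import numpy as np

N = 10  # input dimension

# np.random.rand gives coordinates uniform on [0, 1);
# scaling by 2 and shifting by -1 yields coordinates uniform on [-1, 1)
w = 2 * np.random.rand(1, N) - 1
```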


theundergroundsorcerer avatar theundergroundsorcerer commented on May 20, 2024

It makes more sense now.

Thanks.

