MTH594 Advanced data mining: theory and applications
The materials for the course MTH 594 Advanced data mining: theory and applications, taught by Dmitry Efimov at the American University of Sharjah, UAE, in the Spring 2016 semester.
The program of the course can be downloaded from the syllabus folder.
To compose these lectures I mainly used ideas from three sources:
It might be useful to explain that we do not consider $p(x^{(i)} | \theta)$: it actually equals $p(x^{(i)})$, since $x^{(i)}$ does not depend on $\theta$, and therefore this term plays no role in the maximization problem.
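For example, in a discriminative model where $\theta$ parameterizes only $p(y | x; \theta)$, the log-likelihood splits as

$$\ell(\theta) = \sum_{i=1}^{m} \log p\left( y^{(i)}, x^{(i)}; \theta \right) = \sum_{i=1}^{m} \log p\left( y^{(i)} | x^{(i)}; \theta \right) + \sum_{i=1}^{m} \log p\left( x^{(i)} \right),$$

and the second sum is constant in $\theta$, so it can be dropped from the maximization (a sketch of the argument; notation follows the inline formulas above).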
It would be very nice to introduce the so-called $\odot$ operation of elementwise multiplication, since formulas like step 9 in Algorithm 3 currently look informal.
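As one possible way to formalize it (a sketch only; step 9 of Algorithm 3 is not reproduced here), one could define $(a \odot b)_j = a_j b_j$ and then write the usual backward-pass step as

$$\delta^{(l)} = \left( (W^{(l+1)})^T \delta^{(l+1)} \right) \odot g'\!\left( z^{(l)} \right).$$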
A remark about the vectorization of the activation function $g(z)$ and its derivative would also be helpful.
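A minimal sketch of what such a remark might boil down to, assuming a sigmoid activation (the actual $g$ used in the lectures may differ):

```python
import numpy as np

def g(z):
    """Sigmoid activation, applied elementwise to a scalar, vector or matrix z."""
    return 1.0 / (1.0 + np.exp(-z))

def g_prime(z):
    """Elementwise derivative of the sigmoid: g(z) * (1 - g(z))."""
    s = g(z)
    return s * (1.0 - s)

# numpy broadcasting makes both functions work on whole layers / mini-batches
# without explicit loops over units or examples.
z = np.array([-1.0, 0.0, 2.0])
print(g(z))        # [0.26894142 0.5        0.88079708]
print(g_prime(z))  # [0.19661193 0.25       0.10499359]
```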
When we calculate the number of parameters for the case of 2 classes and 2 features, we forget that $\Sigma_0$ and $\Sigma_1$ are symmetric. So the real number of parameters equals $1 + 2 + 2 + 3 + 3 = 11$ (for $\phi$, $\mu_0$, $\mu_1$, $\Sigma_0$, $\Sigma_1$ respectively).
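For reference, a symmetric $d \times d$ matrix has $d(d+1)/2$ free entries, so with $d = 2$ the count is

$$\underbrace{1}_{\phi} + \underbrace{2}_{\mu_0} + \underbrace{2}_{\mu_1} + \underbrace{3}_{\Sigma_0} + \underbrace{3}_{\Sigma_1} = 11.$$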
When the weighted sum of squares is considered, it is actually unclear what $x$ is in the $\omega^{(i)}$ expression.
If we fit $\theta$ to minimize over the training sample, there is no $x$ without a superscript.
So the following statement about the disadvantage of loess would be better placed before stating the optimization problem.
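To make the role of $x$ explicit (a sketch using the common Gaussian weighting; the exact form in the lectures may differ), the weights depend on the query point $x$:

$$\omega^{(i)}(x) = \exp\left( -\frac{\left( x^{(i)} - x \right)^2}{2\tau^2} \right), \qquad \hat{\theta}(x) = \arg\min_{\theta} \sum_{i=1}^{m} \omega^{(i)}(x) \left( y^{(i)} - \theta^T x^{(i)} \right)^2,$$

so $x$ without a superscript is the point at which we predict, and $\theta$ has to be refit for every new query, which is precisely the disadvantage of loess mentioned above.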
When we formulate the first optimization problem, the margin $\gamma$ is not included in the set of variables we solve the problem over. The same concerns the equivalent (second) formulation.
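A sketch of how the formulation might look with $\gamma$ included among the variables (assuming the usual geometric-margin setup; the lecture's exact notation may differ):

$$\max_{\gamma,\, w,\, b} \ \gamma \quad \text{s.t.} \quad y^{(i)} \left( w^T x^{(i)} + b \right) \ge \gamma, \ i = 1, \dots, m, \qquad \|w\| = 1,$$

and the equivalent second problem maximizes $\hat{\gamma}/\|w\|$ over $(\hat{\gamma}, w, b)$ subject to $y^{(i)} \left( w^T x^{(i)} + b \right) \ge \hat{\gamma}$.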