
mth594_machinelearning's Introduction

MTH594 Advanced data mining: theory and applications

Materials for the course MTH 594 Advanced data mining: theory and applications, taught by Dmitry Efimov at the American University of Sharjah, UAE, in the Spring 2016 semester. The course program can be downloaded from the syllabus folder.

To compose these lectures, I mainly used ideas from three sources:

  1. Stanford lectures by Andrew Ng on YouTube: https://www.youtube.com/watch?v=UzxYlbK2c7E&list=PLA89DCFA6ADACE599
  2. The book "The Elements of Statistical Learning" by T. Hastie, R. Tibshirani and J. Friedman: http://statweb.stanford.edu/~tibs/ElemStatLearn
  3. Lectures by Andrew Ng on Coursera: https://www.coursera.org/learn/machine-learning

All uploaded PDF lectures are adapted to help students understand the material.

The supplementary files in the ipython folder are meant to teach students how to use built-in methods to train the models in Python 2.7.
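
For illustration, here is a minimal sketch (not taken from the repository) of the kind of built-in-method workflow the notebooks follow; it assumes NumPy and scikit-learn are available:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: four one-dimensional points with binary labels
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

model = LogisticRegression()      # built-in logistic regression
model.fit(X, y)                   # train on the toy data
print(model.predict([[1.5]]))     # predict the class of a new point
```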

If you find any mistakes or typos, please email me at [email protected]; this course is new for me, so there are probably some :)

The content of the lectures:

Supervised learning

Linear and logistic regressions, perceptrons

Linear regression

Analytical minimization: normal equations

Statistical interpretation

Logistic regression

Perceptron

Bayesian interpretation and regularization

Python implementation

Linear regression
Logistic regression
Perceptron
Regularization

Methods of optimization

Gradient descent

Examples of gradient descent

Newton's method

Python implementation

Batch gradient descent
Stochastic gradient descent

Generalized linear models (GLM)

Exponential family

Generalized Linear Models (GLM)

Python implementation

Softmax regression

Generative learning algorithms

General idea of generative algorithms

Gaussians

Gaussian discriminant analysis

Generative vs discriminative comparison

Naive Bayes

Laplace smoothing

Event models

Python implementation

Gaussians
Gaussian discriminant analysis
Naive Bayes

Neural networks

Definition

Backpropagation

Python implementation

Support vector machines

Support vector machines: intuition

Primal/dual optimization problem and KKT

SVM dual

Kernels

Kernel examples

Kernel testing

SVM with kernels

Soft margin

SMO algorithm

Python implementation

Coordinate ascent
SVM
SMO algorithm

Nonparametric methods

Locally weighted regression

Generalized additive models (GAM)

GAM for regression

GAM for classification

Tree-based methods

Regression trees

Classification trees

Boosting

Exponential loss
AdaBoost
Gradient boosting
Gradient tree boosting

Python implementation

Locally weighted regression
GAM for regression
GAM for classification
Regression decision trees
Classification decision trees
Gradient tree boosting

Learning theory

Bias / variance

Empirical risk minimization (ERM)

Union bound / Hoeffding inequality

Uniform convergence

VC dimension

Model selection

Feature selection

Python implementation

Cross validation

Online learning

Advice on applying ML algorithms

Unsupervised learning

Clustering

K-means

Python implementation

Mixture of Gaussians and EM algorithm

Mixture of Gaussians

Jensen's inequality

General EM algorithm

EM algorithm for the mixture of Gaussians

EM algorithm for the mixture of Naive Bayes

Python implementation

Mixture of Gaussians
EM algorithm for mixture of Gaussians

Factor analysis

Intuition

Marginal and conditionals for Gaussians

Factor analysis model

EM steps for factor analysis

Python implementation

Principal component analysis

PCA algorithm

Latent semantic indexing (LSI)

Python implementation

Independent component analysis (ICA)


mth594_machinelearning's Issues

Explanation of (1.3)

Page 14 of lectures, formula (1.3).

Possibly it would be useful to explain that we do not consider $p(x^{(i)} \mid \theta)$ because it actually equals $p(x^{(i)})$: since $x^{(i)}$ does not depend on $\theta$, this term plays no role in the maximization problem.
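
For concreteness, if (1.3) is the log-likelihood of the joint distribution (an assumption about the notation on page 14), the constant term separates out:

$$\ell(\theta) = \sum_i \log\big( p(y^{(i)} \mid x^{(i)}; \theta)\, p(x^{(i)}) \big) = \sum_i \log p(y^{(i)} \mid x^{(i)}; \theta) + \sum_i \log p(x^{(i)}),$$

and the second sum does not depend on $\theta$, so it does not affect $\arg\max_\theta$.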

Elementwise vector multiplication

Section 5.2 (page 52)

It would be very nice to introduce the so-called $\odot$ operation of elementwise multiplication, since formulas like step 9 in Algorithm 3 currently look informal.

A remark about the vectorization of the activation function $g(z)$ and its derivative would also be helpful.
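
Below is a minimal NumPy sketch (not taken from the notes) of the elementwise (Hadamard) product and a vectorized activation with its derivative; using the sigmoid for $g$ is an assumption made purely for illustration:

```python
import numpy as np

def g(z):
    """Vectorized sigmoid: np.exp works elementwise, so g accepts arrays."""
    return 1.0 / (1.0 + np.exp(-z))

def g_prime(z):
    """Elementwise derivative g'(z) = g(z) * (1 - g(z))."""
    s = g(z)
    return s * (1.0 - s)  # '*' on NumPy arrays is the elementwise (Hadamard) product

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
print(a * b)        # [  4.  10.  18.]  -- the "odot" (elementwise) product
print(g_prime(a))   # elementwise derivative values
```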

Parameters in Gaussian discriminant analysis

Section 4.3 (page 38)

When we calculate the number of parameters for the case of 2 classes and 2 features, we forget that $\Sigma_0$ and $\Sigma_1$ are symmetric. So the real number of parameters equals $1 + 2 + 2 + 3 + 3 = 11$ (for $\phi$, $\mu_0$, $\mu_1$, $\Sigma_0$, $\Sigma_1$, respectively).
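
For reference, a symmetric $2 \times 2$ covariance matrix has three free entries (two on the diagonal plus one off-diagonal), which gives the count above:

$$\underbrace{1}_{\phi} + \underbrace{2}_{\mu_0} + \underbrace{2}_{\mu_1} + \underbrace{3}_{\Sigma_0} + \underbrace{3}_{\Sigma_1} = 11.$$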

Second idea of generative algorithms

Section 4.1 (page 36)

The true numerator of the fraction in the second idea of generative algorithms seems to be $p(x \mid y=1)\, p(y=1)$, not $p(x \mid y=1)\, p(y)$.

The same issue appears in Section 4.4 (page 40).
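
For reference, Bayes' rule with the class prior written out, which the numerator should match:

$$p(y=1 \mid x) = \frac{p(x \mid y=1)\, p(y=1)}{p(x \mid y=1)\, p(y=1) + p(x \mid y=0)\, p(y=0)}.$$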

Unclear argument usage

Section 7.1, page 88.

When the weighted sum of squares is considered, it is actually unclear what the $x$ in the $\omega^{(i)}$ expression refers to. If we fit $\theta$ by minimizing over the training sample, there is no $x$ without a superscript.

So the following statement about the disadvantage of loess would be better placed before the optimization problem is stated.
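
Assuming the lectures use the standard locally weighted regression weights from the Stanford notes, the $x$ in question is the query point at which we want to predict:

$$\omega^{(i)} = \exp\!\left( -\frac{(x^{(i)} - x)^2}{2\tau^2} \right),$$

which is why $\theta$ must be re-fit for every new query point $x$.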

Set of parameters for optimization

Section 6.1, page 64.

When we formulate the first optimization problem, the margin $\gamma$ would not belong to the set over which we solve the problem. The same concern applies to the equivalent (second) formulation.
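
For reference, one standard way to state the two formulations (an assumption about how the lecture writes them) is

$$\max_{\gamma,\, w,\, b}\ \gamma \quad \text{s.t.}\quad y^{(i)}\big(w^{T} x^{(i)} + b\big) \ge \gamma,\ \ \|w\| = 1,$$

and the equivalent

$$\min_{w,\, b}\ \tfrac{1}{2}\|w\|^{2} \quad \text{s.t.}\quad y^{(i)}\big(w^{T} x^{(i)} + b\big) \ge 1.$$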
