MTH594 Advanced data mining: theory and applications
The materials for the course MTH 594 Advanced data mining: theory and applications, taught by Dmitry Efimov at the American University of Sharjah, UAE, in the Spring 2016 semester.
The program of the course can be downloaded from the syllabus folder.
To compose these lectures I mainly used ideas from three sources:
It might be useful to explain that we do not consider $p(x^{(i)} | \theta)$: it actually equals $p(x^{(i)})$, since $x^{(i)}$ does not depend on $\theta$, and therefore this term plays no role in the maximization problem.
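For example, in a discriminative model where $\theta$ parameterizes only $p(y | x; \theta)$, the log-likelihood splits as

$$\ell(\theta) = \sum_{i=1}^{m} \log p\left( y^{(i)}, x^{(i)}; \theta \right) = \sum_{i=1}^{m} \log p\left( y^{(i)} | x^{(i)}; \theta \right) + \sum_{i=1}^{m} \log p\left( x^{(i)} \right),$$

and the second sum is constant in $\theta$, so it can be dropped from the maximization (a sketch of the argument; notation follows the inline formulas above).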
It would be very nice to introduce the so-called $\odot$ operation of elementwise multiplication, since formulas like step 9 in Algorithm 3 currently look informal.
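As one possible way to formalize it (a sketch only; step 9 of Algorithm 3 is not reproduced here), one could define $(a \odot b)_j = a_j b_j$ and then write the usual backward-pass step as

$$\delta^{(l)} = \left( (W^{(l+1)})^T \delta^{(l+1)} \right) \odot g'\!\left( z^{(l)} \right).$$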
A remark about the vectorization of the activation function $g(z)$ and its derivative would also be helpful.
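A minimal sketch of what such a remark might boil down to, assuming a sigmoid activation (the actual $g$ used in the lectures may differ):

```python
import numpy as np

def g(z):
    """Sigmoid activation, applied elementwise to a scalar, vector or matrix z."""
    return 1.0 / (1.0 + np.exp(-z))

def g_prime(z):
    """Elementwise derivative of the sigmoid: g(z) * (1 - g(z))."""
    s = g(z)
    return s * (1.0 - s)

# numpy broadcasting makes both functions work on whole layers / mini-batches
# without explicit loops over units or examples.
z = np.array([-1.0, 0.0, 2.0])
print(g(z))        # [0.26894142 0.5        0.88079708]
print(g_prime(z))  # [0.19661193 0.25       0.10499359]
```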
When we calculate the number of parameters for the case of 2 classes and 2 features, we forget that $\Sigma_0$ and $\Sigma_1$ are symmetric. So the real number of parameters equals $1 + 2 + 2 + 3 + 3 = 11$ (for $\phi$, $\mu_0$, $\mu_1$, $\Sigma_0$, $\Sigma_1$ respectively).
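For reference, a symmetric $d \times d$ matrix has $d(d+1)/2$ free entries, so with $d = 2$ the count is

$$\underbrace{1}_{\phi} + \underbrace{2}_{\mu_0} + \underbrace{2}_{\mu_1} + \underbrace{3}_{\Sigma_0} + \underbrace{3}_{\Sigma_1} = 11.$$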
When the weighted sum of squares is considered, it is actually unclear what $x$ is in the $\omega^{(i)}$ expression.
If we fit $\theta$ to minimize over the training sample, there is no $x$ without a superscript.
So the following statement about the disadvantage of loess would be better placed before stating the optimization problem.
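To make the role of $x$ explicit (a sketch using the common Gaussian weighting; the exact form in the lectures may differ), the weights depend on the query point $x$:

$$\omega^{(i)}(x) = \exp\left( -\frac{\left( x^{(i)} - x \right)^2}{2\tau^2} \right), \qquad \hat{\theta}(x) = \arg\min_{\theta} \sum_{i=1}^{m} \omega^{(i)}(x) \left( y^{(i)} - \theta^T x^{(i)} \right)^2,$$

so $x$ without a superscript is the point at which we predict, and $\theta$ has to be refit for every new query, which is precisely the disadvantage of loess mentioned above.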
When we formulate the first optimization problem, the margin $\gamma$ is not included in the set of variables we solve the problem over. The same concerns the equivalent (second) formulation.
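A sketch of how the formulation might look with $\gamma$ included among the variables (assuming the usual geometric-margin setup; the lecture's exact notation may differ):

$$\max_{\gamma,\, w,\, b} \ \gamma \quad \text{s.t.} \quad y^{(i)} \left( w^T x^{(i)} + b \right) \ge \gamma, \ i = 1, \dots, m, \qquad \|w\| = 1,$$

and the equivalent second problem maximizes $\hat{\gamma}/\|w\|$ over $(\hat{\gamma}, w, b)$ subject to $y^{(i)} \left( w^T x^{(i)} + b \right) \ge \hat{\gamma}$.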