A brief talk about machine learning in Python
-
Linear regression
- which classifier is best when many decision boundaries separate the data perfectly?
- maximal margin => SVM
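
A minimal sketch of the maximal-margin idea, assuming scikit-learn is available. The six toy points are made up for illustration, and a very large `C` is used to approximate a hard-margin SVM.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (made up for illustration).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)  # width of the band between the two classes

print("w =", w, "b =", b)
print("margin width =", margin)
print("support vectors:\n", clf.support_vectors_)
```

Many lines separate these two groups, but the SVM picks the one whose distance to the closest points (the support vectors) is largest.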
-
SVM
- Mathematical formulation
- relation to logistic regression in terms of the loss function (hinge loss vs. log loss)
- kernel trick (linear, poly, rbf)
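
A small sketch comparing the three kernels listed above, assuming scikit-learn. The concentric-ring dataset and all parameter values (`degree=2`, `C=1.0`) are illustrative choices, not taken from the talk.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line can separate them (synthetic data).
X, y = make_circles(n_samples=200, noise=0.05, factor=0.4, random_state=0)

scores = {}
for kernel, params in [("linear", {}),
                       ("poly", {"degree": 2}),
                       ("rbf", {"gamma": "scale"})]:
    clf = SVC(kernel=kernel, C=1.0, **params).fit(X, y)
    scores[kernel] = clf.score(X, y)  # training accuracy, for illustration only
    print(f"{kernel:>6}: {scores[kernel]:.2f}")
```

The linear kernel struggles here because the classes differ by radius, while the RBF kernel implicitly maps the points into a space where a separating hyperplane exists.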
-
RandomForest
- How it works
- Decision Tree
- Pros and Cons
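
A hedged sketch contrasting a single decision tree with a random forest, assuming scikit-learn. The two-moons dataset, the split ratio, and `n_estimators=100` are illustrative assumptions.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data, chosen so that a single deep tree tends to overfit.
X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# One fully grown tree vs. an average over 100 trees, each trained on a
# bootstrap sample with a random subset of features considered per split.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

tree_acc = tree.score(X_test, y_test)
forest_acc = forest.score(X_test, y_test)
print(f"single tree test accuracy:   {tree_acc:.3f}")
print(f"random forest test accuracy: {forest_acc:.3f}")
```

Averaging many decorrelated trees usually lowers variance compared with one tree, at the cost of interpretability and training time.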
-
XGBoost
- How it works
- differences from GBDT
- Gradient Boosting Method
- Ensemble of decision trees built sequentially with gradient boosting (each tree corrects the errors of the previous ones)
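
A minimal sketch of plain gradient boosting with squared loss, assuming scikit-learn and NumPy. This is not XGBoost itself, which adds regularization and second-order gradient information on top of this idea. The data, `learning_rate`, `n_rounds`, and tree depth are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression problem: y = sin(x) + noise.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, size=200)).reshape(-1, 1)
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_rounds = 100

pred = np.full_like(y, y.mean())          # start from a constant model
trees = []
mse_history = [np.mean((y - pred) ** 2)]

for _ in range(n_rounds):
    residual = y - pred                   # negative gradient of 0.5 * (y - pred)^2
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += learning_rate * tree.predict(X)   # small step toward the residuals
    trees.append(tree)
    mse_history.append(np.mean((y - pred) ** 2))

print(f"training MSE: {mse_history[0]:.4f} -> {mse_history[-1]:.4f}")
```

Each shallow tree is fitted to what the current ensemble still gets wrong, so the trees are added one after another rather than trained independently as in a random forest.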
-
Dimension reduction
- why do we need dimension reduction?
- how do we find the principal components from observed data?
-
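How PCA works
- (1) each component is a linear combination of the original features, (2) each component explains as much of the remaining variance in the data as possible
- what is explained variance: how much of the data variance is captured when the data is projected onto a given axis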
- PCA example and visualization
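
A minimal PCA sketch using only NumPy, via an eigendecomposition of the covariance matrix. The 2-D Gaussian data and its covariance are made up for illustration.

```python
import numpy as np

# Correlated 2-D synthetic data.
rng = np.random.default_rng(0)
cov = np.array([[3.0, 1.5], [1.5, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=500)

# 1. Center the data.
Xc = X - X.mean(axis=0)

# 2. Eigendecomposition of the sample covariance matrix.
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)      # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]         # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 3. Each column of eigvecs is a principal axis (a linear combination of the
#    original features); its eigenvalue is the variance along that axis.
explained_ratio = eigvals / eigvals.sum()

# 4. Project the data onto the principal axes.
Z = Xc @ eigvecs

print("principal axes (columns):\n", eigvecs)
print("explained variance ratio:", explained_ratio)
```

The variance of the data projected onto the first axis equals the largest eigenvalue, which is what the explained variance ratio normalizes.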
-
Other feature selection methods
- Lasso regression
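
A hedged sketch of Lasso as a feature selector, assuming scikit-learn. The synthetic data (10 features, only the first 3 informative) and `alpha=0.1` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only features 0, 1 and 2 influence the target.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 10
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + rng.normal(scale=0.1, size=n_samples)

# The L1 penalty pushes coefficients of uninformative features to exactly zero.
lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_)

print("coefficients:", np.round(lasso.coef_, 2))
print("selected feature indices:", selected)
```

Unlike PCA, which builds new axes, Lasso keeps a subset of the original features, so the result stays directly interpretable.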
-
Clustering
- K-means clustering
- Mini-batch K-means clustering
- GMM (Gaussian Mixture Model)
- Kernel Approximation
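
A small sketch running K-means, mini-batch K-means, and a Gaussian mixture model on the same data, assuming scikit-learn. The blob centers, `cluster_std`, and `batch_size` are illustrative. The true labels are used only to score the result.

```python
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

# Three well-separated synthetic blobs.
X, y_true = make_blobs(n_samples=600, centers=[[0, 0], [5, 5], [0, 5]],
                       cluster_std=0.6, random_state=0)

models = {
    "KMeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "MiniBatchKMeans": MiniBatchKMeans(n_clusters=3, n_init=3,
                                       batch_size=100, random_state=0),
    "GMM": GaussianMixture(n_components=3, random_state=0),
}

ari = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    # Adjusted Rand index: 1.0 means the clustering matches the true grouping.
    ari[name] = adjusted_rand_score(y_true, labels)
    print(f"{name:>16}: ARI = {ari[name]:.3f}")
```

Mini-batch K-means trades a little accuracy for speed on large datasets, while the GMM gives soft assignments through `predict_proba` and can model elliptical clusters.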