
Theoretic Fundamentals of Machine and Deep Learning

Materials of the lecture course that I taught in Winter/Spring 2023 at two of the largest technical universities in Russia:

Course overview

Deep learning is a young (it originated around 2011-2012) but actively developing area of machine learning, characterized primarily by the use of neural networks with a large (hence the word "deep" in the name) number of layers. Initially, deep learning was a predominantly empirical field of knowledge in which new findings were obtained mainly by experiment. Subsequently, many of those findings received theoretical justification, and in some areas theory is now even ahead of practice. The course covers the basic concepts used in the theoretical analysis of empirical methods of machine and deep learning: the reasoning behind loss functions, working with data distributions, the theory of generative models, adversarial learning, stability of neural networks, and limit theorems.

Course content

  • Empirical risk and its approximation
    • The basic concepts for measuring the quality of a machine learning algorithm: empirical risk and its approximation. Differentiability. Stochastic gradient descent. Regularization. The probabilistic meaning of loss functions, with maximum likelihood and maximum a posteriori estimation as examples. (A minimal SGD sketch follows this list.)
  • Basic loss functions and their evolution, based on the problem of face recognition
    • The main classification losses: logistic loss and cross-entropy. Entropy and the Gibbs inequality. Functions on distributions. Kullback-Leibler divergence. Neural Collapse. The evolution of loss functions, with face recognition as the running example. (See the cross-entropy/KL sketch below.)
  • Theoretical justification of adversarial learning methods
    • The mechanism of adversarial learning as a minimax game. Derivation of the formulas behind the practical training procedure. Connection with the Wasserstein metric. (See the GAN training sketch below.)
  • Variational Inference
    • Bayes' theorem and the posterior probability. Approximation with a parametric family of distributions. The evidence lower bound (ELBO). Bayesian neural networks. (The ELBO derivation is reproduced below.)
  • AE, VAE and CVAE
    • The concepts of the autoencoder, variational autoencoder and conditional variational autoencoder, and the differences between them. Architectural implementation in practice. (See the reparameterization sketch below.)
  • Markov Chain Monte Carlo
    • The problem of sampling from an empirical distribution in a high-dimensional space. Detailed balance equations. Gibbs, Metropolis and Metropolis-Hastings samplers. Relationship with Langevin dynamics. Metropolis adjustment. (See the Metropolis-Hastings sketch below.)
  • Diffusion Models
    • The forward and reverse processes as analogues of physical diffusion. Derivation of the formulas and their architectural implementation in practice. Three interpretations of diffusion models. Classifier and classifier-free guidance. (See the forward-process sketch below.)
  • Adversarial examples and defenses against them
    • The surprising instability of neural networks under input perturbations. Examples of adversarial perturbations. Basic methods for constructing adversarial examples and defending against them. Classification of adversarial examples. Adversarial examples implementable in the real world and their common features. (See the FGSM sketch below.)
  • Certified Robustness
    • The concepts of a certificate and certified robustness. The classical approach via randomized smoothing. The Neyman-Pearson lemma. Smoothing distribution vs. norm of the perturbation. The curse of dimensionality in computer vision problems. Semantic transformations. (See the smoothing sketch below.)
  • Limit theorems for the training process
    • Limit (existence) theorems for approximation, and the dynamics of convergence of the training process.
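
A minimal sketch of the first topic: minimizing an approximated empirical risk with stochastic gradient descent, here for L2-regularized logistic regression on synthetic data. All names and hyperparameters are illustrative assumptions, not taken from the course materials.

```python
# Minimal sketch: empirical risk minimization with stochastic gradient descent.
# L2-regularized logistic regression on synthetic data (illustrative choices).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                     # features
y = (X @ rng.normal(size=5) + 0.1 * rng.normal(size=1000) > 0).astype(float)

w = np.zeros(5)
lr, lam, batch = 0.1, 1e-3, 32                     # step size, L2 weight, batch size

for step in range(2000):
    idx = rng.integers(0, len(X), size=batch)      # sample a mini-batch
    p = 1.0 / (1.0 + np.exp(-X[idx] @ w))          # sigmoid predictions
    # Gradient of the batch log-loss (negative log-likelihood) plus the L2 term
    grad = X[idx].T @ (p - y[idx]) / batch + lam * w
    w -= lr * grad

print("train accuracy:", ((1 / (1 + np.exp(-X @ w)) > 0.5) == y).mean())
```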
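
A small numeric sketch of the identity H(p, q) = H(p) + KL(p ‖ q) behind the cross-entropy loss and the Gibbs inequality; the two discrete distributions are made up for illustration.

```python
# Minimal sketch: entropy, cross-entropy, and KL divergence between two
# discrete distributions, illustrating H(p, q) = H(p) + KL(p || q).
import numpy as np

p = np.array([0.7, 0.2, 0.1])                 # "true" distribution
q = np.array([0.5, 0.3, 0.2])                 # model distribution

entropy = -(p * np.log(p)).sum()              # H(p)
cross_entropy = -(p * np.log(q)).sum()        # H(p, q)
kl = (p * np.log(p / q)).sum()                # KL(p || q) >= 0 (Gibbs' inequality)

assert np.isclose(cross_entropy, entropy + kl)
print(entropy, cross_entropy, kl)
```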
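
A hedged sketch of the GAN minimax game on 1-D data, assuming PyTorch is available. The architectures and hyperparameters are toy choices, and the generator uses the common non-saturating variant of its objective rather than the literal minimax loss.

```python
# Minimal sketch of GAN training on 1-D data. The discriminator D ascends
# log D(x) + log(1 - D(G(z))); the generator G minimizes -log D(G(z)).
import torch
import torch.nn as nn

real_sampler = lambda n: torch.randn(n, 1) * 0.5 + 2.0   # target: N(2, 0.25)
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(3000):
    # --- discriminator step (ascend the value function) ---
    x, z = real_sampler(64), torch.randn(64, 8)
    d_loss = bce(D(x), torch.ones(64, 1)) + bce(D(G(z).detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # --- generator step (non-saturating objective) ---
    z = torch.randn(64, 8)
    g_loss = bce(D(G(z)), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

sample = G(torch.randn(1000, 8))
print("generated mean/std:", sample.mean().item(), sample.std().item())
```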
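
For reference, the standard ELBO derivation mentioned under Variational Inference, via Jensen's inequality:

```latex
\log p(x) = \log \int p(x, z)\,dz
          = \log \mathbb{E}_{q(z \mid x)}\!\left[\frac{p(x, z)}{q(z \mid x)}\right]
          \ge \mathbb{E}_{q(z \mid x)}\!\left[\log \frac{p(x, z)}{q(z \mid x)}\right]
          = \underbrace{\mathbb{E}_{q(z \mid x)}\left[\log p(x \mid z)\right]}_{\text{reconstruction}}
          - \underbrace{\mathrm{KL}\!\left(q(z \mid x)\,\|\,p(z)\right)}_{\text{regularizer}}
```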
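
A minimal sketch of the two VAE ingredients the AE/VAE/CVAE topic relies on: the reparameterization trick and the negative ELBO with a closed-form Gaussian KL term. The encoder and decoder are single linear layers purely for brevity; PyTorch is assumed.

```python
# Minimal sketch: VAE reparameterization and negative-ELBO loss (toy networks).
import torch
import torch.nn as nn

enc = nn.Linear(784, 2 * 16)          # outputs mean and log-variance of q(z|x)
dec = nn.Linear(16, 784)

x = torch.rand(32, 784)               # a dummy batch of "images" in [0, 1]
mu, logvar = enc(x).chunk(2, dim=1)
z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization
recon = torch.sigmoid(dec(z))

# Negative ELBO = reconstruction term + KL(q(z|x) || N(0, I)) in closed form
rec_loss = nn.functional.binary_cross_entropy(recon, x, reduction="sum")
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
loss = (rec_loss + kl) / x.shape[0]
loss.backward()
```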
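
A minimal random-walk Metropolis sampler targeting an unnormalized 1-D density, illustrating the acceptance rule that enforces detailed balance; the target and proposal scale are arbitrary choices for the example.

```python
# Minimal sketch: random-walk Metropolis targeting an unnormalized density.
import numpy as np

log_target = lambda x: -0.5 * (x - 3.0) ** 2          # N(3, 1) up to a constant
rng = np.random.default_rng(0)

x, samples = 0.0, []
for _ in range(20000):
    prop = x + rng.normal(scale=1.0)                  # symmetric proposal
    # Accept with probability min(1, pi(prop) / pi(x)); the symmetric
    # proposal cancels in the Metropolis-Hastings ratio.
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        x = prop
    samples.append(x)

print("posterior mean ~", np.mean(samples[5000:]))    # close to 3.0
```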
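
A sketch of the closed-form forward (noising) process of DDPM-style diffusion models, q(x_t | x_0) = N(√ᾱ_t x_0, (1 − ᾱ_t) I). The linear schedule values are illustrative, not tied to any specific paper.

```python
# Minimal sketch: sampling x_t directly from x_0 in the DDPM forward process.
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)            # linear noise schedule (illustrative)
alpha_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, rng):
    """Sample x_t from q(x_t | x_0) without iterating the Markov chain."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps, eps

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8))                  # toy "images"
xt, eps = q_sample(x0, t=500, rng=rng)        # a network would learn to predict eps
print(xt.shape)
```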
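
A minimal sketch of the Fast Gradient Sign Method from the adversarial-examples topic: one gradient-sign step inside an l_∞ ball. The model is an untrained toy classifier, so a prediction flip is not guaranteed; the point is the attack mechanics.

```python
# Minimal sketch: FGSM perturbs the input along the sign of the loss gradient,
# staying within an l_inf ball of radius eps.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(1, 10, requires_grad=True)
y = torch.tensor([1])

loss = loss_fn(model(x), y)
loss.backward()

eps = 0.1
x_adv = x + eps * x.grad.sign()               # one-step l_inf attack
print(model(x).argmax().item(), model(x_adv).argmax().item())
```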
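
A sketch of the Monte-Carlo voting step behind randomized smoothing: the smoothed classifier returns the class most probable under Gaussian input noise. A real certificate additionally needs the Neyman-Pearson-based radius and confidence bounds, which are omitted here; the base model is a toy stand-in.

```python
# Minimal sketch: prediction of a smoothed classifier via Monte-Carlo voting
# over Gaussian input noise (certification step omitted).
import torch
import torch.nn as nn

base = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))

def smoothed_predict(x, sigma=0.25, n=1000):
    noise = torch.randn(n, x.shape[-1]) * sigma
    votes = base(x + noise).argmax(dim=1)
    return torch.bincount(votes, minlength=3)   # class counts under noise

x = torch.randn(10)
print(smoothed_predict(x))                      # argmax gives the smoothed class
```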
Lectures

| N | Lecture (in English) | Description | Video (in Russian) |
|----|---------------------|-------------|--------------------|
| 01 | Empirical Risk and Loss | Empirical risk and its approximation. Loss function. (Stochastic) gradient descent. MLE and MAP. Kullback-Leibler divergence and cross-entropy. | record01 |
| 02 | Representation Learning and FaceID Losses | The task of representation learning. Neural Collapse. FaceID: evolution of the loss function. SoftMax-, contrastive- and angular-based losses. | record02 |
| 03 | GANs | Discriminative vs. generative models. Generative Adversarial Networks. Deep Convolutional GAN. Wasserstein GAN. Gradient-Penalty WGAN. Conditional GAN. | record03 |
| 04 | Bayes, VI, AE, VAE, cVAE, BNN | Bayesian inference, Bayesian neural networks, variational inference, autoencoder, variational autoencoder, conditional variational autoencoder. | record04 |
| 05 | Markov Chain Monte Carlo | Recap of Markov chains. Markov chain Monte Carlo. Gibbs sampler. Metropolis-Hastings sampler. Langevin dynamics and Metropolis-Adjusted Langevin. Stochastic Gradient Langevin Dynamics. | record05 |
| 06 | Diffusion Models | Recap of the variational autoencoder. Markovian hierarchical VAE. Diffusion models: Variational Diffusion Models, Denoising Diffusion Probabilistic Models, Denoising Diffusion Implicit Models, classifier and classifier-free guidance, three interpretations. | record06 |
| 07 | Adversarial Robustness I: Digital Domain | The great success of CNNs, the robustness phenomenon, a taxonomy of adversarial attacks, l_p norms, the digital domain, the Fast Gradient Sign Method and its variants, universal attacks, top-k attacks, l_0 attacks. | record07 |
| 08 | Adversarial Robustness II: Real World | Adversarial examples in the real world, adversarial attacks on face detection and Face ID systems, defenses against real-world adversarial examples, black-box face restoration. | record08 |
| 09 | Certified Robustness I: Randomized Smoothing | Definitions of certified robustness, connection to Lipschitzness, randomized smoothing and its variants. | record09 |
| 10 | Certified Robustness II: High Dimensions and Semantic Transformations | Recap of certified robustness, ablations on the base classifier, perturbation norm and smoothing distribution, certification in the high-dimensional case, certification of semantic perturbations, application to different computer vision tasks. | record10 |
| 11 | Neural Tangent Kernel | The lazy training regime, gradient descent as a PDE, NTK and CNTK, NTK convergence rates (see the sketch after the table). | record11 |
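
A minimal sketch of the empirical neural tangent kernel from lecture 11, K(x, x′) = ⟨∇_θ f(x), ∇_θ f(x′)⟩, computed with plain PyTorch autograd on a toy network; in NTK theory the width would be taken large.

```python
# Minimal sketch: empirical NTK of a small network via parameter gradients.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(3, 512), nn.Tanh(), nn.Linear(512, 1))

def grad_vec(x):
    """Flattened gradient of the scalar output w.r.t. all parameters."""
    net.zero_grad()
    net(x).sum().backward()
    return torch.cat([p.grad.flatten() for p in net.parameters()])

x1, x2 = torch.randn(1, 3), torch.randn(1, 3)
g1 = grad_vec(x1)
g2 = grad_vec(x2)
print("empirical NTK K(x1, x2):", torch.dot(g1, g2).item())
```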
Recommended literature

  1. K.V. Vorontsov. Machine Learning lecture course, http://www.machinelearning.ru.
  2. Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning, 2nd edition. Springer, 2009.
  3. Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
  4. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
  5. Matus Telgarsky. Deep Learning Theory lecture notes, 2021.
  6. Sanjeev Arora et al. Theory of Deep Learning, book draft, 2020.

Introduction to machine learning

Theoretic Courses

  • Foundations of Deep Learning: Course at UCLA
  • Deep learning theory: Course at UIUC
  • Theoretical Deep Learning: Course at Princeton

License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
