Cross Validated Questions for DSI Reading

This document collates many questions and answers from Cross Validated that are relevant reading for the Galvanize data science curriculum.

Linear Algebra

What justifies this calculation of the derivative of a matrix function?. The great whuber on matrix derivatives.

Probability

How is Pr(X=x|Y=y) defined when Y is continuous and X discrete?. Xian discusses the need for rigorous measure theory in this circumstance. Not for everyone.

Sample Statistics

Correlations with categorical variables. Options for calculating sample correlations between categorical variables.

Is there a serious problem with dropping observations with missing values when computing correlation matrix?. Comments on the common practice of dropping missing data when running basic sample statistics in EDA.

Parameter Estimation

When is a biased estimator preferable to unbiased one?

Sampling Theory

Central limit theorem for sample medians. Is there an analogue of the CLT for sample medians?
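A quick way to see the answer is by simulation. The sketch below assumes the standard asymptotic result that, for a density f that is continuous and positive at the median m, sqrt(n) * (sample median - m) is approximately normal with variance 1 / (4 f(m)^2). For N(0, 1) data that variance is pi / 2. The function name and the sample sizes are illustrative choices, not taken from the linked answer.

```python
import math
import random
import statistics

def scaled_median_variance(n=201, reps=2000, seed=0):
    """Variance of sqrt(n) * (sample median) over repeated N(0, 1) samples."""
    rng = random.Random(seed)
    scaled = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        scaled.append(math.sqrt(n) * statistics.median(sample))
    return statistics.variance(scaled)

# Asymptotic variance 1 / (4 f(m)^2), with f(0) = 1 / sqrt(2 pi) for N(0, 1).
asymptotic_variance = math.pi / 2
simulated_variance = scaled_median_variance()
print(simulated_variance, asymptotic_variance)
```

The two printed numbers should be close but not identical, since both n and the number of repetitions are finite.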

Null Hypothesis Significance Testing

What is the meaning of p values and t values in statistical tests?. High level discussion of what the hell all this stuff means.

Why is it wrong to stop an A/B test before optimal sample size is reached?
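The effect of early stopping is easy to reproduce in a small simulation. The sketch below is a simplified setup, not the one from the linked question: each observation stands in for a standardized difference between the two arms, the null hypothesis is true by construction, and we compare testing once at the end against stopping at the first "significant" peek. The names, the ten evenly spaced peeks, and the 1.96 cutoff are all assumptions chosen for illustration.

```python
import math
import random

def false_positive_rates(n_experiments=1000, n_obs=500, n_peeks=10,
                         z_crit=1.96, seed=0):
    """Simulate A/A tests and compare one final look against repeated peeking.

    Each observation is drawn from N(0, 1), so there is no real effect and
    every rejection is a false positive.
    """
    rng = random.Random(seed)
    # Evenly spaced peek points; the last one is the planned final sample size.
    peek_points = {n_obs * (k + 1) // n_peeks for k in range(n_peeks)}
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(n_experiments):
        total = 0.0
        rejected_at_some_peek = False
        for i in range(1, n_obs + 1):
            total += rng.gauss(0.0, 1.0)
            # z statistic for the running mean, with known unit variance.
            if i in peek_points and abs(total / math.sqrt(i)) > z_crit:
                rejected_at_some_peek = True
        fixed_hits += abs(total / math.sqrt(n_obs)) > z_crit
        peeking_hits += rejected_at_some_peek
    return fixed_hits / n_experiments, peeking_hits / n_experiments

fixed_rate, peeking_rate = false_positive_rates()
print(f"single look: {fixed_rate:.3f}, stop at first significant peek: {peeking_rate:.3f}")
```

The single-look rate should sit near the nominal 5%, while the peeking rate comes out several times larger.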

How to understand degrees of freedom?

Linear Models

Why is the squared difference so commonly used?. Whuber gives a detailed answer from a decision theoretic point of view.

Regressors with low variance. Comments and warnings on the practice of dropping regression predictors with low variance.

Is R^2 useful or dangerous?. Is it?

Correct way to use polynomial regression in Python. A quick answer on how to use Pipelines and FeatureUnions to do polynomial regression in Python.

Simple linear regression, p-values and the AIC. A nice discussion around specifying a good regression model for a simple dataset.

Why is a T distribution used for hypothesis testing a linear regression coefficient?

Proof that the coefficients in an OLS model follow a t-distribution with (n-k) degrees of freedom. Similar to the above, but with more mathematical details.

Can there be multiple local optimum solutions when we solve a linear regression?. Whuber gives a nuanced answer.

GLM Categorical Variable Level grouping / simplification. Why grouping variables based on estimated coefficients is a bad idea.

What is the difference between fixed effect, random effect and mixed effect models?. Confusing terminology.

Logistic Models

Regularization methods for logistic regression. A clear description of how to regularize logistic regression with great visuals.

Is there any intuitive explanation of why logistic regression will not work for perfect separation case? And why adding regularization will fix it?. A clear explanation with good visuals.

Should sampling for logistic regression reflect the real ratio of 1's and 0's?. Comments on Logistic Regression's ability to accurately estimate probabilities vs. estimating marginal effects of predictors.

Logistic regression: maximum likelihood vs misclassification. Why we use likelihood instead of classification error as a loss function.

Regularization Methods

The origin of the term “regularization”. Where does that word come from?

Why use regularisation in polynomial regression instead of lowering the degree?. A defense of regularization methods over discrete feature selection.

Is standardisation before Lasso really necessary?.

How can I estimate coefficient standard errors when using ridge regression?. A discussion of the meaningfulness of standard error estimates in regularized regression.

What is the smallest λ that gives a 0 component in lasso?. Math heavy but interesting.

Is iterating LASSO a reasonable idea?. A question about the motivation for the iterated/adaptive LASSO.

Elastic net: How to improve unstable cross validation of lambda. Measuring the sensitivity of results with respect to the regularization parameter.

Variable coefficient rises, then falls as lambda decreases (LASSO). This phenomenon seems to be surprising to some.

Derivation of closed form lasso solution. Solving the LASSO equations in the one variable case.
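For reference, the one-variable solution is the soft-thresholding rule. The sketch below assumes the objective is scaled as 0.5 * sum((y_i - x_i * b)^2) + lam * |b|; the linked derivation may use a different scaling of the penalty, which changes the threshold by a constant factor. The function names and toy data are made up for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning 0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_one_variable(x, y, lam):
    """Minimize 0.5 * sum((y_i - x_i * b)**2) + lam * |b| over a scalar b."""
    xy = sum(xi * yi for xi, yi in zip(x, y))
    xx = sum(xi * xi for xi in x)
    return soft_threshold(xy, lam) / xx

def objective(b, x, y, lam):
    """The penalized least-squares objective, used here only as a check."""
    return 0.5 * sum((yi - xi * b) ** 2 for xi, yi in zip(x, y)) + lam * abs(b)

# Toy data (assumed for illustration).
x = [1.0, 2.0, 3.0]
y = [1.1, 1.9, 3.2]

b_ols = lasso_one_variable(x, y, lam=0.0)      # no penalty: ordinary least squares
b_shrunk = lasso_one_variable(x, y, lam=2.0)   # moderate penalty: shrunk toward zero
b_zero = lasso_one_variable(x, y, lam=20.0)    # penalty exceeds |x . y|: exactly zero
print(b_ols, b_shrunk, b_zero)
```

The third case is the interesting one: once lam is at least |x . y|, the coefficient is exactly zero rather than merely small.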

Closed form solution to lasso problem when data matrix is diagonal. How the LASSO works out in the simple case of uncorrelated data.

The proof of shrinking coefficients using ridge regression through “spectral decomposition”. Using the SVD to justify ridge regression shrinking coefficients.
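The core of the argument can be written in a few lines. This is a sketch under the assumption that X is n × p with n ≥ p and full column rank, with thin SVD X = U D Vᵀ and singular values d_1, ..., d_p; the notation is chosen here and may differ from the linked answer.

```latex
\begin{aligned}
\hat\beta_{\text{ridge}}(\lambda)
  &= (X^\top X + \lambda I)^{-1} X^\top y
   = V (D^2 + \lambda I)^{-1} D \, U^\top y
   = \sum_{j=1}^{p} \frac{d_j}{d_j^2 + \lambda} \, (u_j^\top y) \, v_j , \\
\hat\beta_{\text{OLS}}
  &= \sum_{j=1}^{p} \frac{1}{d_j} \, (u_j^\top y) \, v_j , \\
\frac{d_j}{d_j^2 + \lambda}
  &= \frac{1}{d_j} \cdot \frac{d_j^2}{d_j^2 + \lambda} ,
  \qquad 0 < \frac{d_j^2}{d_j^2 + \lambda} < 1 \ \text{ for } \lambda > 0 .
\end{aligned}
```

So along each right singular direction v_j the ridge solution is the least-squares solution multiplied by a factor strictly between 0 and 1, and the factor is smallest for the directions with the smallest singular values.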

Evaluation Metrics

What does AUC stand for and what is it?

When is it appropriate to use an improper scoring rule?. A fascinating example from Cagdas.

What are variable importance rankings useful for?

ROC curve drawbacks.

Principal Components

What can cause PCA to worsen results of a classifier?.

Mathematical Optimization

How does the L-BFGS work?. Mark Stone gives a high level description of Quasi-Newton methods.

Boosting

How is gradient boosting like gradient descent?. What is the connection between gradient boosting and gradient descent?

Bayesian Methods

PyMC modeling drug testing with truthiness. Bayesian modeling to detect drug abuse when people lie about their consumption.

When (if ever) is a frequentist approach substantively better than a Bayesian?

Time Series

Intuitive explanation of stationarity

Intuitive explanation of unit root. A difficult concept illuminated with good storytelling.

Neural Networks

What does the hidden layer in a neural network compute?.

How to choose the number of hidden layers and nodes in a feedforward neural network?

Support Vector Machines

How does a Support Vector Machine (SVM) work?. High level discussion of the intuition and mathematics underlying SVMs.

Visualization

What are essential rules for designing and producing plots?. Good advice is always welcome here.

Variety Pack

Recommendations for non-technical yet deep articles in statistics

How do R and Python complement each other in data science?. Very opinion based, but good discussion.

How did scientists figure out the shape of the normal distribution probability density function?. Fun history.

Maximum gap between samples drawn without replacement from a discrete uniform distribution. Fun probability theory.

Consider the sum of n uniform distributions on [0, 1]. Why does the cusp in the PDF disappear for n≥3?. More fun probability theory.

What is “reject inferencing” and how can it be used to increase the accuracy of a model?. Dealing with censored data in credit rating models.

How to sample from Cantor distribution?. How does that even work?
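One way it can work, sketched below: a Cantor-distributed number is one whose base-3 digits are each 0 or 2 with equal probability, so you can build a draw digit by digit from fair coin flips. The truncation at 40 digits and the function name are choices made here for illustration; the linked answers may take a different route.

```python
import random

def sample_cantor(rng, n_digits=40):
    """Draw one approximate Cantor variate from its base-3 expansion.

    Every ternary digit is 0 or 2 with equal probability; truncating after
    n_digits digits leaves an error of at most 3 ** -n_digits.
    """
    value = 0.0
    scale = 1.0
    for _ in range(n_digits):
        scale /= 3.0
        value += 2 * rng.getrandbits(1) * scale
    return value

rng = random.Random(0)
draws = [sample_cantor(rng) for _ in range(10_000)]
print(min(draws), max(draws), sum(draws) / len(draws))
```

No draw lands in the open middle third, or in the middle third of either remaining piece, and so on down, which is exactly the Cantor set construction.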

Simulate a Bernoulli variable with probability a/b using a biased coin. Simulating coin flips of a given bias using a coin with a different bias.
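One standard construction, which is not necessarily the one in the linked answers, has two steps. First turn the biased coin into a fair one with the von Neumann trick (flip twice and keep the result only when the flips differ). Then use the fair bits as the binary digits of a uniform number U and report whether U falls below the target probability, here written as a ratio of integers a / b. The function names and the 0.3 bias in the usage lines are assumptions for illustration.

```python
import random

def fair_bit(biased_flip):
    """Von Neumann trick: flip twice, keep the first flip if the two differ."""
    while True:
        first, second = biased_flip(), biased_flip()
        if first != second:
            return first

def bernoulli_a_over_b(a, b, biased_flip):
    """Return 1 with probability a / b (0 <= a <= b) using only biased_flip."""
    remainder = a
    while True:
        # Next binary digit of a / b, computed by long division in base 2.
        remainder *= 2
        digit = 1 if remainder >= b else 0
        if digit:
            remainder -= b
        # The fair bits spell out a uniform number U, one binary digit at a
        # time; stop at the first digit where U and a / b differ.
        bit = fair_bit(biased_flip)
        if bit < digit:
            return 1  # U < a / b
        if bit > digit:
            return 0  # U > a / b

rng = random.Random(0)
biased_flip = lambda: int(rng.random() < 0.3)  # a coin that lands 1 about 30% of the time
draws = [bernoulli_a_over_b(2, 7, biased_flip) for _ in range(20_000)]
print(sum(draws) / len(draws), 2 / 7)
```

Note that the code never needs to know the bias of the coin, only that it is strictly between 0 and 1.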

Is it possible to have a pair of Gaussian random variables for which the joint distribution is not Gaussian?. Copulas and pretty visualizations.

Approximate e using Monte Carlo Simulation
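A classic estimator, which may or may not be the one used in the linked question, rests on this fact: if you keep adding independent Uniform(0, 1) draws until the running sum exceeds 1, the expected number of draws is e. The sketch below averages that count over many trials; the function name, trial count, and seed are illustrative.

```python
import math
import random

def estimate_e(n_trials=200_000, seed=0):
    """Average number of Uniform(0, 1) draws needed for the sum to exceed 1."""
    rng = random.Random(seed)
    total_draws = 0
    for _ in range(n_trials):
        running_sum = 0.0
        while running_sum <= 1.0:
            running_sum += rng.random()
            total_draws += 1
    return total_draws / n_trials

e_hat = estimate_e()
print(e_hat, math.e)
```

Convergence is slow, as with most Monte Carlo estimates: the error shrinks like one over the square root of the number of trials.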
