- +/- s.d. 68%
- +/- 2*s.d. 95%
- +/- 3*s.d. 99.7%
- ROC: Receiver operating characteristic curve
- AUC: Area under the ROC curve
ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.
- Y - TPR, X - FPR
- TPR = TP / (TP + FN) = TP / P
- FPR = FP / (FP + TN) = FP / N
Note:
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- Recall = TPR
Binary Classification Table
Predicted Positive | Predicted Negative | |
---|---|---|
Actual Positive | TP | FN |
Actual Negative | FP | TN |
- z = (p - p_0) / sqrt((p_0 * (1 - p_0)) / n)
- z = (mu - mu_0) / sqrt(sigma^2 / n)
Note:
- Power of test: 1 - beta
- Analysis of Variance (ANOVA) is a statistical method used to test differences between two or more means.
- A hypothesis test that is used to compare the means of two populations is called t-test. A statistical technique that is used to compare the means of more than two populations is known as Analysis of Variance or ANOVA.
- Test Statistic / F statistics: Between Group Variance/Within Group Variance
- Signal: between group variance
- Noise: within group variance
- F statistics = Signal-to-Noise-Ratio = Between Group Variance/Within Group Variance
- Multiply t-test will increase the type I (false positive) rate: a null hypothesis is rejected although it is true.
- If there are only two conditions of the independent variable, doing ANOVA is the same as running a (two-tailed) two-sample t-test.
Examples:
-
Situations
- Categorical variables
- Data is frequency or counts
- Compare the observed frequency with the expected frequency
-
chi-square = sum( (observed_frequency_i - expected_frequency_i)^2 / expected_frequency_i )
-
Use case examples:
- Whether a dice is a fair dice
- Whether gender (Male or Female) is independent of voting preference (Republican or Democrat)
-
This lesson explains how to conduct a chi-square test for independence. The test is applied when you have two categorical variables from a single population. It is used to determine whether there is a significant association between the two variables.
In statistics, the Kolmogorov–Smirnov test (K–S test or KS test) is a nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K–S test), or to compare two samples (two-sample K–S test).
Simpson’s Paradox is a paradox in probability and statistics, in which a trend appears in different groups of data but disappears or reverses when these groups are combined.
- By adding Gaussian noise to the input, the learning model will behave like an L2-penalty regularizer.
- By adding Laplace noise to the input, the learning model will behave like an L1-penalty regularizer.
- Linear relationship between Y and X1, …, Xp
- Error Term i.i.d (Independent and identically distributed) 均值为0,近normal 分布(normally distributed residuals),constant variance和independently distributed.
- Error has normal distribution.
- The errors have zero mean.
- The errors have the same (but unknown) variance. When this assumption does not hold, the problem is called the heterogeneity or the heteroscedasticity problem. (191/391 Regression Analysis By Example)
- Weighted Least Squares sum(w_i * (y_i - beta_0 - beta_1 * x_i)^2)
- w_i = 1/sigma_i^2, sigma_i^2 is the variance of epsilon_i.
- Ordinary Least Squared sum( (y_i - beta_0 - beta_1 * x_i)^2 )
- The errors are independent of each other. When this assumption does not hold, we have the autocorrelation problem. (213/391 Regression Analysis By Example)
- The first type is only autocorrelation in appearance. It is due to the omission of a variable that should be in the model, such as seasonality.
- The second type of autocorrelation may be referred to as pure autocorrelation. The methods of correcting for pure autocorrelation involve a transformation of the data.
- The predictor variables X1, . . . , Xp, are assumed to be linearly independent of each other; variables are nonrandom; variable values are all measured without error.
- All observations are equally reliable and have approximately equal role in determining the regression results and in influencing conclusions.
With Replacement
- Sampling with replacement is used to find probability with replacement. In other words, you want to find the probability of some event where there’s a number of balls, cards or other objects, and you replace the item each time you choose one. Let’s say you had a population of 7 people, and you wanted to sample 2. Their names are: John, Jack, Qiu, Tina, Hatty, Jacques, Des. You could put their names in a hat. If you sample with replacement, you would choose one person’s name, put that person’s name back in the hat, and then choose another name. The possibilities for your two-name sample are: [John, John], [John, Jack], [John, Qui], … and so on.
Without Replacement
- Sampling without Replacement is a way to figure out probability without replacement. In other words, you don’t replace the first item you choose before you choose a second.
Autoregression
- x(t) = c + sum( beta_i * x(t - i)) + epsilon(t)
Moving Average
- x(t) = mu + epsilon(t) + sum( q_i * epsilon(t - i))
- ARIMA(p,d,q)
- Differencing in statistics is a transformation applied to time-series data in order to make it stationary.
- If d=0: y(t) = Y(t)
- If d=1: y(t) = Y(t) - Y(t-1)
- If d=2: y(t) = (Y(t) - Y(t-1)) - (Y(t-1) - Y(t-2)) = Y(t) - 2Y(t-1) + Y(t-2)
- regularization
- dropout
- cross validation
- early stopping
- ...
Linear Regression
- Y = a + b * X
- b = Cov(X, Y) / Variance(X) = Cov(X, Y) / Sigma(X)^2
Pearson Correlation
- rho = Cov(X, Y) / Sigma(X) * Sigma(Y)
- Pearson correlation coefficient formula: rho = Cov(X, Y) / Sigma(X) * Sigma(Y)
- Cov(X, Y) = E((X - E(X))*(Y - E(Y)))
- Corr(X) = E((X - E(X)) * (X - E(X))) = E(X^2) - (E(X))^2
- X ~ N(0, 1), E(X) = 0, Corr(X) = Sigma(X)^2 = E(X^2) - (E(X))^2 = E(X^2) = 1
Answer:
- sqrt(2) / 2
Pick up a coin from C1 and C2 with probability of trials p(h1)=.7, p(h2)=.6 and doing 10 trials. And what is the probability that the given coin you picked is C1 given you have 7 heads and 3 tails?
- p(C1|7H) = p(7H|C1) * p(C1) / p(7H)
- p(7H) = p(7H|C1)*p(C1) + p(7H|C2)*p(C2)
- assume p(C1) = p(C2) = 0.5, pick either w/ equal prob.
- p(7H | C1) = C(10,7)0.7^70.3^3
- ans = C(10,7)*0.7^7 * 0.3^3 * 0.5 / [ C(10,7)*0.7^7 * 0.3^3 * 0.5 + C(10,7)*0.6^7 * 0.4^3 * 0.5]
A box contains a red ball, an orange ball, a yellow ball, a green ball, a blue ball and an indigo ball. Billy randomly selects 7 balls from the box (with replacement). What is the expected value for the number of distinct colored balls Billy will select?
- Let P(i), 0 < i <= 7 be the probability of having i distinct balls after drawing 7 balls.
- P(1) = C(6,1) * (1/6)^7
- P(2) = C(6,2) * (1/6)^7 * (C(7,1) + C(7,2) + C(7,3) + C(7,4) + C(7,5) + C(7,6))
- ...
- roll once and get #1: 1/6
- roll twice and get #1: 5/6*1/6
- roll 3 times and get #1: (5/6)^2*1/6
- ...
- E(N) = 1/6 + 21/65/6 + 31/6(5/6)^2 + ...
- 5/6 * E(N) = 1/6 * 5/6 + ...
- 1/6 * E(N) = 1/6 + 1/6*5/6 + = 1
- E(N) = 1/p
- 1th: a, 1
- 2nd: b, 5/6
- 3rd: c, 4/6
- 4th: d, 3/6
- 5th: e, 2/6
- 6th: f, 1/6
- To get (1, 2, 3, 4, 5, 6): 1 + 6/5 + 6/4 + 6/3 + 6/2 + 6/1
- First flip once to get H or T, accept
- Then we need to get the other T for H (1st), or H for T (1st), the expected flip times is 2.
- So, finally expected times = 1 + 2 = 3
- Flip once, H (or T)
- Flip twice, 1/2 to get H, stop; remain 1/2 need to continue, get T
- Flip third, 1/2 to get T, stop; remain 1/2 need to continue, get H
- ...
- expected times = 1/2 * 2 + 1/2 * 1/2 * 3 + 1/2 * 1/2 * 1/2 * 4 + ... = 3
- In sampling without replacement, focus on how many iterations are needed. That number will be any count from 1 through 100, each equally likely.
- So sampling without replacement, it will be 1/100*(1+2+...+100) = 50.5.
1000 people, each time select 10, (w/ replacement), 问每个人on average多少次会被第一次抽到(1000个人,你每次抽10个人。 每次抽完以后再把这10个人放回去。这样有些人会被抽中1次,有些人会被抽中2次,etc。 比如你抽了10次之后,有一个人Mike, 他第一次被你抽中。现在问,平均下来,一个人被第一次抽中时, 你抽了几次。)
- p = 1 / 100
- Expected times to be selected 1/p = 100
- p = 1/3
- Expected times to get 红球: 1/p = 3
- 蓝色和白色are symmetric, so the expected time to get 蓝色 is 1.
If the probability a car get at least one ticket within an hour is 84%, what is the probability of getting at least one ticket in half an hour?
- Say the probability of not get a ticket within half an hour is 1-x, then one hour without a ticket is (1-x)*(1-x)=1-0.84. x=0.6
or,
- It really depends on what model is assumed. However, if the idea is that no matter how long you leave your car there, you have a 16% chance of getting through any given hour unscathed, you can treat it as an exponential decay problem. Let p(t) be the probability that you do not get a ticket in the first t hours. Then p(1)=0.16, p(2)=0.16*0.16 (a 16% chance of making it through the first hour times a 16% chance of making it through the second), and in general p(t)=0.16^t. The probability of not getting a ticket in the first half hour is then p(1/2)=0.16^(1/2)=0.4, and the probability that you do get a ticket in the first half hour is about 1−0.4=0.6.
I throw three (n) darts along a circle, what is the probability all three(n) points are in a semi-circle?
-
Suppose that point i has angle 0 (angle is arbitrary in this problem) -- essentially this is the event that point i is the "first" or "leading" point in the semicircle. Then we want the event that all of the points are in the same semicircle -- i.e., that the remaining points end up all in the upper half plane.
-
That's a coin-flip for each remaining point, so you end up with 1 / 2^(n−1). There's n points, and the event that any point i is the "leading" point is disjoint from the event that any other point j is, so the final probability is n / 2^(n−1) (i.e. we can just add them up).
-
A sanity check for this answer is to notice that if you have either one or two points, then the probability must be 1, which is true in both cases.
1) A and B have a game. There are 7 games in total, who wins 4 games first will succeed the whole game and then the game ends. Given that A has a probability P to win a single game, and A already lost the first 2 games, what's the probability that A still wins the whole game. 2) Given that someone said A has 80% probability to win the whole game, what is the posterior that A win the whole game.'
- C(5, 1) * P^4 * (1-P)
Consider a game with 2 players, A and B. Player A has 8 stones, player B has 6. Game proceeds as follows. First, A rolls a fair 6-sided die, and the number on the die determines how many stones A takes over from B. Next, B rolls the same die, and the exact same thing happens in reverse. This concludes the round. Whoever has more stones at the end of the round wins and the game is over. If players end up with equal # of stones at the end of the round, it is a tie and another round ensues. What is the probability that B wins in 1, 2, ..., n rounds?
- On the first round, B can win if (A,B) rolls: (1,4), (1,5), (1,6), (2,5), (2,6), (3,6) - so there are 6 out of 36 possibilities where B wins. P(B1) [read, probability that B wins in round 1] = 1/6 On the first round, B can tie if (A,B) rolls: (1,3), (2,4), (3,5), (4,6) - so there are 4 out of 36 possibilities where B ties. If B ties with A, there is a second round, where B wins with probability 15/36, A wins with probability 15/36 = 5/12, and they tie with probability 6/36. So P(B2) = 1/9 * (5/12) After the second round, the game only continues if both players have equal number of points, and the probability of a tie in each game is 1/6, so P(B3) = (1/9) * (1/6) * (5/12) Generally, P(Bn) = (1/9) * (1/6)^(n-2) * (5/12)
- 一个放1个红球。另一个99个红球,100个白球。这样概率不就是0.5 + 0.5 * 99/199了吗~
Bobo the amoeba has a 25%, 25%, and 50% chance of producing 0, 1, or 2 offspring, respectively. Each of Bobo's descendants also have the same probabilities. What is the probability that Bobo's lineage dies out?
- The probability of lineage dies out for Bob = probability of lineage dies out for each of Bob’s son’s.
- Let’s assume the probability is P, then p = 0.25 + 0.25 * p + 0.5 * (p * p) => p = 0.5
- That is, Bob’lineage dies out = Bob has not child + Bob has one child * this child’s lineage dies out + Bob has two children * both children’s lineage dies out
There is a building with 100 floors. You are given 2 identical eggs. How do you use 2 eggs to find the threshold floor, where the egg will definitely break from any floor above floor N, including floor N itself.
- UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits
- 广告bid, 出价最高的公司可以拿到那个广告位置,但是价格是第二高的那个价格。比如说有3个出价,100,50,30。那么出价100的赢了,但只需要出50块。 问题比较简单,如果只有两个人bid,这两人的出价都是Uniform(0,1),那么expected的广告收入说多少?
- 我说一个价格x_1, 一个x_2, 本质求 min(x_1, x_2)的期望。
- From StackExchange
- By definition probability, F(y) = P(Y ≤ y) = 1 − P(Y > y) = 1 − P(min(X1,…,Xn) > y).
- Of course, min(X1,…Xn) > y exactly when Xi > y for all i.
- Since these variables are i.i.d., we have F(y) = 1 − P(X1 > y)P(X2 > y)…P(Xn > y) = 1 − P(X1 > y)^n.
- From F(y), derivative to get the density function f(y)
- The expect value E(Y) = integral(y * f(y) * dy)
- Another Solution
计算Var(x1, x2, ..., xn), 但是所有xi分布在k个不同的服务器上。服务器运算力很强,服务器之间不能传数据,只能和本地一台很差的机器相互传输。怎么尽量减少传输数据量。
- var(X) = E(X^2)-(E(X))^2
- E(X) & E(X^2)都可以先在服务器上计算一部分,传3个数 (sum(Xi), sum(Xi^2), count(Xi)) 回本地,最后本地再合成。
In a class you have 30 students, what is the probability of that we have exact two people who have the same birthday?
- C(30, 2) * C(365, 1) * 364 * 363 * ... * (364 - 27) / 365 ^ 30