In this lesson, we'll learn about the problems that can arise from doing multiple comparisons in a single experiment.
You will be able to:
- Understand and explain the concept of spurious correlation
- Understand and explain why multiple comparisons increase the likelihood of misleading results
- Understand and use corrections such as the Bonferroni Correction to deal with multiple comparisons
By now, we've learned about the concept of correlation. We've also learned some important adages, such as "correlation does not equal causation". Correlation tells us that there seems to be some sort of mathematical relationship between the values of two different things. We say "seems" because sometimes two things that appear to be correlated aren't actually related at all: it just happens to look that way due to the random nature of our dataset. The data may suggest that the two things are correlated even though there is no actual relationship between them. We call these sorts of "false" correlations spurious correlations.
These are always fun--let's take a look at some examples!
All of the data in the example graphs below are real. These are real spurious correlations that can be found in the real world.
The number of letters in the winning word of the spelling bee correlates very closely with the number of people killed by venomous spiders that year.
The number of people who drown by falling in a pool each year correlates very closely with the number of films Nicolas Cage appears in each year.
The number of songs from a given year that make the list of "Top 500 rock songs" by Rolling Stone correlates very closely with US Crude Oil Production.
As we can see, although these graphs show that each of these pairs seems to be very strongly correlated, we know pretty intuitively that this can only be a coincidence. Regardless of what the statistics tell us, there is no mechanism through which the length of the winning spelling bee word affects the number of people killed by venomous spiders.
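To see how easily coincidences like these arise, here is a minimal sketch that generates many short, completely unrelated random series and reports the strongest correlation found between any two of them. The function names, the number of series, and the series length are illustrative assumptions, not part of the lesson's data.

```python
import itertools
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_series=50, n_points=10, seed=0):
    """Build unrelated random series and return the largest pairwise |r|."""
    rng = random.Random(seed)
    # Each series is independent noise, so any correlation is pure chance.
    series = [
        [rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)
    ]
    return max(
        abs(pearson_r(a, b)) for a, b in itertools.combinations(series, 2)
    )


best = strongest_chance_correlation()
# 50 series give 50 * 49 / 2 = 1225 distinct pairs to compare.
print(f"Strongest |r| among 1225 unrelated pairs: {best:.2f}")
```

With only ten points per series and over a thousand pairs to search through, at least one pair will typically look strongly correlated even though every series is pure noise.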
Spurious correlation is a kind of Type 1 Error, meaning that it's a False Positive. We think we've found something important when really we haven't. With each comparison we make in an experiment, we set a low significance threshold (alpha) for the p-value to limit our exposure to Type 1 errors. When we only reject the null hypothesis when p < 0.05, for example, we are effectively saying "I'm only going to accept these results if data this extreme would turn up less than 5% of the time through randomness alone, with no real effect behind it". However, when we make Multiple Comparisons by checking for many things at once, the small risks of a Type 1 Error from each comparison become cumulative!
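Here's another easy way to phrase this: a p-value threshold of .05 means that, when there is really nothing to find, we will still make a Type 1 error about 1 in every 20 times. This means that statistically, if I run 20 comparisons at that threshold at the same time, I should expect about 1 of them to come up as a Type 1 error (False Positive) even if none of the effects are real. For 20 independent comparisons, the chance of at least one false positive is 1 - 0.95^20, or roughly 64%, and I have no idea which finding it is!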
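The cumulative risk can be made concrete with a short calculation. The sketch below assumes the comparisons are independent and that every null hypothesis is true, in which case the chance of at least one false positive across m tests is 1 - (1 - alpha)^m. The function names are illustrative, and the simulation is included only as a sanity check on the formula.

```python
import random


def family_wise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent true-null tests."""
    return 1 - (1 - alpha) ** n_tests


def simulated_error_rate(alpha, n_tests, n_experiments=20_000, seed=1):
    """Estimate the same quantity by simulation.

    Under a true null hypothesis a p-value is uniform on [0, 1], so each
    simulated test is a false positive with probability alpha.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # An experiment "fails" if any one of its tests is a false positive.
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_experiments


for m in (1, 5, 20, 100):
    exact = family_wise_error_rate(0.05, m)
    approx = simulated_error_rate(0.05, m)
    print(f"{m:>3} tests: formula {exact:.3f}, simulation {approx:.3f}")
```

The risk grows quickly with the number of comparisons: at a threshold of 0.05, a single test carries a 5% chance of a false positive, while 20 independent tests together carry roughly a 64% chance of producing at least one.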
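The main way we can avoid the cumulative effect of Type 1 errors is through the use of statistical corrections such as the Bonferroni Correction. To do this, we just divide our significance threshold (alpha) by the number of comparisons being made, and only reject a null hypothesis when its p-value falls below that smaller threshold.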
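For instance, if we have a threshold of 0.05 and make 20 comparisons, each individual comparison must reach p < 0.05 / 20 = 0.0025 to count as significant.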
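As a rough sketch of how the Bonferroni Correction might be applied in practice, the code below divides the overall threshold by the number of comparisons and checks each p-value against the result. The function names and the p-values are hypothetical and chosen purely for illustration.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison threshold: the overall alpha split evenly across tests."""
    return alpha / n_comparisons


def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values remain significant after the correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Hypothetical p-values from five comparisons in one experiment.
p_values = [0.003, 0.04, 0.012, 0.20, 0.049]

threshold = bonferroni_threshold(0.05, len(p_values))
print(f"Corrected threshold: {threshold:.4f}")
print(bonferroni_significant(p_values))
```

Three of these hypothetical p-values fall below the uncorrected 0.05 threshold, but only 0.003 stays below the corrected one. The correction is deliberately conservative: it lowers the risk of false positives at the cost of making real effects harder to detect. Libraries such as statsmodels offer this and related corrections, for example `statsmodels.stats.multitest.multipletests` with `method='bonferroni'`.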
In this lesson, you learned about the problems that can arise from doing multiple comparisons in a single experiment.