This lesson summarizes the topics we'll be covering in section 20 and why they'll be important to you as a data scientist.
You will be able to:
- Understand and explain what is covered in this section
- Understand and explain why the section will help you to become a data scientist
In this section, we'll be looking at experimental design and hypothesis testing.
Without good experimental design, it's very easy to draw the wrong conclusions from your experiments. Because of that, we kick this section off by looking at the scientific method and the key elements of good experimental design: forming null and alternative hypotheses, conducting an experiment, analyzing the results for statistical significance, and drawing conclusions.
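The hypothesis-testing workflow above can be sketched with a minimal example. Everything here is hypothetical (the 60-second historical mean and the simulated sample are made up for illustration); the point is the flow from hypotheses, to test, to conclusion:

```python
import numpy as np
from scipy import stats

# Hypothetical question: does a new landing page change average session
# time from the historical mean of 60 seconds?
# H0 (null):        mean session time == 60
# H1 (alternative): mean session time != 60
rng = np.random.default_rng(42)
sessions = rng.normal(loc=64, scale=10, size=50)  # simulated sample data

# Run a one-sample t-test against the historical mean
t_stat, p_value = stats.ttest_1samp(sessions, popmean=60)

alpha = 0.05  # significance threshold chosen before the experiment
if p_value < alpha:
    conclusion = "reject the null hypothesis"
else:
    conclusion = "fail to reject the null hypothesis"
print(f"t = {t_stat:.2f}, p = {p_value:.4f} -> {conclusion}")
```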
We then look at how to calculate and interpret the size of the difference between control and test groups. We'll see how the "effect size" can be used to communicate the practical significance of experimental results, to perform meta-analyses of multiple studies, and to perform power analysis to determine the number of participants a study would require to achieve a certain probability of finding a true effect. We'll also look at t-tests and how they can be used to compare two averages and assess how significant the difference between two sets of results is.
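As a sketch of these two ideas together, the snippet below computes Cohen's d (one common effect size measure) alongside an independent two-sample t-test. The control and test group data are simulated, hypothetical numbers:

```python
import numpy as np
from scipy import stats

def cohens_d(group1, group2):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * np.var(group1, ddof=1) +
                  (n2 - 1) * np.var(group2, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(group1) - np.mean(group2)) / np.sqrt(pooled_var)

rng = np.random.default_rng(0)
control = rng.normal(100, 15, size=40)     # hypothetical control group
treatment = rng.normal(108, 15, size=40)   # hypothetical test group

d = cohens_d(treatment, control)                  # practical significance
t_stat, p_value = stats.ttest_ind(treatment, control)  # statistical significance
print(f"Cohen's d = {d:.2f}, t = {t_stat:.2f}, p = {p_value:.4f}")
```

Note that the effect size and the p-value answer different questions: the p-value tells you whether a difference is likely real, while Cohen's d tells you how large it is in practical terms.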
From there, we introduce the concepts of type I (false positive) and type II (false negative) errors and the inherent trade-off between them.
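The trade-off can be made concrete with a small simulation: lowering the significance threshold alpha makes false positives rarer but false negatives more common. The effect size and sample size used here are arbitrary illustrative choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n = 2000, 30
true_effect = 0.5  # mean under the alternative (the null mean is 0)

def error_rates(alpha):
    """Estimate type I and type II error rates for a one-sample t-test by simulation."""
    type1 = type2 = 0
    for _ in range(n_sims):
        # Under H0 the null is true, so rejecting it is a type I error
        null_sample = rng.normal(0, 1, n)
        if stats.ttest_1samp(null_sample, 0).pvalue < alpha:
            type1 += 1
        # Under H1 a real effect exists, so failing to reject is a type II error
        alt_sample = rng.normal(true_effect, 1, n)
        if stats.ttest_1samp(alt_sample, 0).pvalue >= alpha:
            type2 += 1
    return type1 / n_sims, type2 / n_sims

for alpha in (0.10, 0.05, 0.01):
    t1, t2 = error_rates(alpha)
    print(f"alpha={alpha:.2f}: type I rate ~{t1:.3f}, type II rate ~{t2:.3f}")
```

As alpha shrinks, the type I rate falls (it tracks alpha itself) while the type II rate climbs: you can't minimize both simultaneously without more data or a larger effect.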
We then introduce the concept of the power of a statistical test - the test's ability to detect a difference when one exists. We look at how it relates to p-values and effect size for hypothesis testing, and get some practice calculating statistical power using SciPy. We then pull together all of the previous ideas and ask you to design an experiment for a political campaign.
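One way to estimate power with SciPy is by simulation: generate many experiments in which a real effect exists, and count how often the test detects it. This sketch (with an arbitrary effect size of d = 0.5) also shows how power grows with sample size:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def simulated_power(effect_size, n_per_group, alpha=0.05, n_sims=2000):
    """Estimate power as the fraction of simulated experiments where p < alpha."""
    rejections = 0
    for _ in range(n_sims):
        control = rng.normal(0, 1, n_per_group)
        treatment = rng.normal(effect_size, 1, n_per_group)
        if stats.ttest_ind(treatment, control).pvalue < alpha:
            rejections += 1
    return rejections / n_sims

# For a fixed effect size, power increases with the number of participants
for n in (20, 50, 100):
    print(f"n per group = {n:3d}: power ~ {simulated_power(0.5, n):.2f}")
```

Running the loop in reverse - fixing a target power (say 0.80) and searching over n - is exactly the power analysis described above for choosing how many participants a study needs.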
From there, we look at some of the issues that arise when performing multiple comparisons - from the risks of spurious correlations to the importance of corrections, such as the Bonferroni correction, for controlling those risks.
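The Bonferroni correction itself is simple: divide the significance threshold by the number of comparisons. The p-values below are hypothetical, chosen to show how the correction prunes away likely-spurious "discoveries":

```python
import numpy as np

# Hypothetical p-values from 10 independent comparisons
p_values = np.array([0.001, 0.008, 0.012, 0.041, 0.049,
                     0.090, 0.210, 0.340, 0.620, 0.880])
alpha = 0.05

# Uncorrected: every p-value under alpha looks "significant"
uncorrected = p_values < alpha

# Bonferroni correction: divide alpha by the number of comparisons
bonferroni_alpha = alpha / len(p_values)   # 0.05 / 10 = 0.005
corrected = p_values < bonferroni_alpha

print(f"Significant without correction: {uncorrected.sum()}")  # 5
print(f"Significant with Bonferroni:    {corrected.sum()}")    # 1
```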
Next up is A/B testing. We start by introducing the concept of an A/B test, and then, building on our recent experience of experimental design, we go through the process of designing, structuring, and running an A/B test.
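Analyzing an A/B test often comes down to comparing two conversion rates. One common approach is a two-proportion z-test, sketched below with made-up traffic and conversion numbers:

```python
import numpy as np
from scipy import stats

# Hypothetical A/B test: conversions out of visitors for each variant
conversions_a, visitors_a = 200, 4000   # control:  5.0% conversion
conversions_b, visitors_b = 248, 4000   # variant:  6.2% conversion

p_a = conversions_a / visitors_a
p_b = conversions_b / visitors_b

# Pooled proportion under the null hypothesis (no real difference)
p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))

z = (p_b - p_a) / se
p_value = 2 * stats.norm.sf(abs(z))  # two-sided p-value
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Just as with the earlier experiments, the sample size per variant should be chosen up front via a power analysis, not by peeking at the results as they come in.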
We then take a little bit of time to consider the implications of misusing metrics - even if our experiments are initially sound.
Finally, we spend some time introducing the ANOVA (Analysis of Variance) method for generalizing the previous discussion of statistical tests to multiple groups.
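Where a t-test compares two means, a one-way ANOVA asks whether any of several group means differ. A minimal sketch, using made-up test scores for three hypothetical teaching methods:

```python
from scipy import stats

# Hypothetical test scores for three teaching methods
method_a = [82, 85, 88, 75, 90, 84]
method_b = [78, 74, 80, 72, 77, 75]
method_c = [88, 92, 85, 94, 89, 91]

# One-way ANOVA: is at least one group mean different from the others?
f_stat, p_value = stats.f_oneway(method_a, method_b, method_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

A significant result only says that some difference exists among the groups; follow-up pairwise comparisons (with an appropriate multiple-comparison correction) are needed to say which groups differ.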
Without a good understanding of experimental design, it's easy to end up mistaking spurious correlations for meaningful results, or placing too much (or too little) weight on the results of any given test. In this section, we cover a range of tools and techniques to ensure that you design your experiments rigorously and interpret them thoughtfully.