Introduction

This lesson summarizes the topics we'll be covering in section 20 and why they'll be important to you as a data scientist.

Objectives

You will be able to:

  • Understand and explain what is covered in this section
  • Understand and explain why the section will help you to become a data scientist

Hypothesis and A/B Testing

In this section, we'll be looking at .....

Experimental Design

Without good experimental design, it's very easy to draw the wrong conclusions from your experiments. For that reason, we kick this section off by looking at the scientific method and the key elements of good experimental design: forming null and alternative hypotheses, conducting an experiment, analyzing the results for statistical significance, and drawing conclusions.

Effect Size

We then look at how to calculate and interpret the size of the difference between control and test groups. We'll see how effect size can be used to communicate the practical significance of experimental results, to perform meta-analyses of multiple studies, and to perform power analysis to determine the number of participants that a study would require to achieve a certain probability of finding a true effect. We'll also look at t-tests and how they can be used to compare two group means to judge whether the difference between two sets of results is statistically significant.
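As a rough illustration of the two ideas above, the sketch below computes Cohen's d (one common effect size measure, chosen here as an assumption since the lesson does not name a specific one) and runs a two-sample t-test with SciPy. The helper name `cohens_d` and the simulated control and test data are hypothetical.

```python
import numpy as np
from scipy import stats


def cohens_d(a, b):
    """Standardized mean difference between two samples, using the pooled SD."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


# Simulated (made-up) measurements for a control group and a test group.
rng = np.random.default_rng(0)
control = rng.normal(loc=100, scale=15, size=50)
test = rng.normal(loc=108, scale=15, size=50)

d = cohens_d(test, control)
# Welch's t-test does not assume equal variances in the two groups.
t_stat, p_value = stats.ttest_ind(test, control, equal_var=False)

print(f"Cohen's d = {d:.2f}, t = {t_stat:.2f}, p = {p_value:.4f}")
```

The p-value says how surprising the observed difference would be if there were no true effect, while d expresses how large the difference is in units of standard deviation.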

Type 1 and Type 2 Errors

From there, we introduce the concept of type 1 (false positive) and type 2 (false negative) errors and the inherent trade-off between them.
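One way to make these two error types concrete is to simulate many experiments and count how often a t-test gets it wrong. The sketch below is only an illustration: the sample size, the shift of 0.5 standard deviations, and the helper `rejection_rate` are all assumptions chosen for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05                      # significance threshold
n_sims, n_per_group = 2000, 30    # number of simulated experiments and group size


def rejection_rate(true_shift):
    """Fraction of simulated experiments in which the null hypothesis is rejected."""
    a = rng.normal(0.0, 1.0, size=(n_sims, n_per_group))
    b = rng.normal(true_shift, 1.0, size=(n_sims, n_per_group))
    # One t-test per row, i.e. per simulated experiment.
    p_values = stats.ttest_ind(a, b, axis=1).pvalue
    return float(np.mean(p_values < alpha))


# No true difference: every rejection is a false positive (type 1 error).
type_1_rate = rejection_rate(0.0)
# A true difference exists: every failure to reject is a false negative (type 2 error).
type_2_rate = 1.0 - rejection_rate(0.5)

print(f"type 1 rate: {type_1_rate:.3f}, type 2 rate: {type_2_rate:.3f}")
```

Lowering `alpha` in this simulation reduces the type 1 rate but raises the type 2 rate, which is the trade-off described above.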

Statistical Power

We then introduce the concept of the power of a statistical test: the test's ability to detect a difference when one exists. We look at how it relates to p-values and effect size in hypothesis testing, and get some practice calculating statistical power using SciPy. We then pull together all of the previous ideas and ask you to design an experiment for a political campaign.
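A minimal sketch of a power calculation with SciPy follows. It assumes a two-sided, two-sample t-test with equal group sizes and uses the noncentral t distribution; the function names `ttest_power` and `required_n` are hypothetical helpers, not part of SciPy, and the lesson itself may take a different route (for example, simulation).

```python
import numpy as np
from scipy import stats


def ttest_power(effect_size, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample t-test with equal group sizes."""
    df = 2 * n_per_group - 2
    # Noncentrality parameter for a standardized effect (Cohen's d).
    nc = effect_size * np.sqrt(n_per_group / 2)
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # Probability that the test statistic lands in either rejection region.
    return float(1 - stats.nct.cdf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc))


def required_n(effect_size, target_power=0.8, alpha=0.05):
    """Smallest per-group sample size that reaches the target power."""
    n = 2
    while ttest_power(effect_size, n, alpha) < target_power:
        n += 1
    return n


power = ttest_power(effect_size=0.5, n_per_group=64)
n_needed = required_n(effect_size=0.5)
print(f"power = {power:.3f}, participants needed per group = {n_needed}")
```

Smaller effects need more participants to reach the same power, which is why effect size and power analysis are usually considered together when planning a study.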

Multiple Comparisons

From there, we look at some of the issues that arise when performing multiple comparisons, from the risk of spurious correlations to the importance of corrections, such as the Bonferroni correction, for dealing with the risks inherent in multiple comparisons.
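The arithmetic behind this risk can be sketched in a few lines. Assuming independent tests, the chance of at least one false positive across m tests is 1 - (1 - alpha)^m, and the Bonferroni correction compares each p-value to alpha / m instead of alpha. The function names and the p-values below are made up for illustration.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


def bonferroni_reject(p_values, alpha=0.05):
    """Flag which hypotheses are rejected at the Bonferroni-adjusted threshold."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# With 20 tests at alpha = 0.05, a false positive somewhere is more likely than not.
fwer = familywise_error_rate(0.05, 20)

# Hypothetical p-values from four comparisons; the adjusted threshold is 0.05 / 4.
p_values = [0.001, 0.02, 0.04, 0.3]
rejected = bonferroni_reject(p_values)

print(f"family-wise error rate: {fwer:.3f}")
print(rejected)
```

Three of the four hypothetical p-values fall below 0.05, but only the first survives the adjusted threshold of 0.0125.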

A/B Testing

Next up is A/B testing. We start by introducing the concept of an A/B test and then, building on our recent experience of experimental design, go through the process of designing, structuring, and running one.
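As one possible way to analyze an A/B test on conversion rates, the sketch below runs a two-sided, two-proportion z-test. The choice of test is an assumption (the lesson does not say which test it uses), the function `two_proportion_ztest` is a hypothetical helper, and the visitor and conversion counts are invented.

```python
import math

from scipy import stats


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # Under the null hypothesis both variants share one pooled conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    p_value = 2 * stats.norm.sf(abs(z))
    return z, p_value


# Invented results: variant A converts 200 of 5000 visitors, variant B 250 of 5000.
z, p_value = two_proportion_ztest(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

In practice the sample size and significance threshold would be fixed before the test starts, using the experimental design and power ideas from earlier in the section.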

Goodhart's Law and Metric Tracking

We then take a little bit of time to consider the implications of misusing metrics - even if our experiments are initially sound.

ANOVA Testing

Finally, we spend some time introducing the ANOVA (Analysis of Variance) method, which generalizes the previous discussions of statistical tests to more than two groups.
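A small sketch of a one-way ANOVA is shown below. It computes the F statistic by hand as the ratio of between-group to within-group variance, then checks it against SciPy's `f_oneway`. The three groups are made-up numbers chosen so the arithmetic is easy to follow.

```python
import numpy as np
from scipy import stats

# Made-up measurements for three groups.
groups = [
    np.array([1.0, 2.0, 3.0]),
    np.array([2.0, 3.0, 4.0]),
    np.array([5.0, 6.0, 7.0]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, n_total = len(groups), len(all_values)

# Between-group sum of squares: how far each group mean sits from the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))

# SciPy performs the same test and also returns a p-value.
f_stat, p_value = stats.f_oneway(*groups)
print(f"F (manual) = {f_manual:.2f}, F (SciPy) = {f_stat:.2f}, p = {p_value:.4f}")
```

A large F means the group means differ by more than the spread within groups would suggest. ANOVA alone does not say which groups differ, so follow-up comparisons run into the multiple comparisons issue described earlier.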

Summary

Without a good understanding of experimental design, it's easy to end up confusing spurious correlations with meaningful results or placing too much (or too little) weight on the results of any given test. In this section, we cover a range of tools and techniques to ensure that you design your experiments rigorously and interpret them thoughtfully.

Contributors

loredirick, peterbell
