This lesson summarizes the topics we'll be covering in section 10 and why they'll be important to you as a data scientist.
You will be able to:
- Understand and explain what is covered in this section
- Understand and explain why the section will help you to become a data scientist
In this section we're going to introduce our first machine learning model - linear regression. It's really just a fancy way of saying "(straight) line of best fit", but it will introduce a number of concepts that will be important as we continue to explore more sophisticated models in modules 2 and 3.
We start the section by covering covariance and correlation, both of which describe how strongly two variables tend to change together. For example, with houses, it wouldn't be too surprising if the number of rooms and the price of a house were correlated (in general, more rooms == more expensive).
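To make the rooms-and-price idea concrete, here is a minimal sketch of sample covariance and Pearson correlation using only the standard library. The five houses, the variable names, and the helper functions are made up for illustration; they are not taken from the lessons in this section.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance: average product of deviations, dividing by n - 1."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Pearson correlation: covariance rescaled to lie between -1 and 1."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical data: number of rooms and price (in $1000s) for five houses
rooms = [2, 3, 4, 5, 6]
price = [150, 200, 240, 310, 350]

cov_rp = covariance(rooms, price)
corr_rp = correlation(rooms, price)
print(f"covariance:  {cov_rp:.1f}")
print(f"correlation: {corr_rp:.4f}")
```

Covariance depends on the units of both variables, while correlation is unitless, which is why correlation is usually the easier of the two to interpret.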
We then explore statistical learning theory and how dependent and independent variables relate to it.
Next, we look into a simple linear regression and figure out how to calculate the "line of best fit".
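As a preview, the least squares "line of best fit" for one predictor has a closed form: the slope is the sum of the products of deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The sketch below assumes a small made-up data set; `fit_line` is a hypothetical helper name, not a function from the lessons.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least squares line y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of products of deviations, and sum of squared deviations of x
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return slope, intercept


# Hypothetical data: number of rooms and price (in $1000s) for five houses
rooms = [2, 3, 4, 5, 6]
price = [150, 200, 240, 310, 350]

slope, intercept = fit_line(rooms, price)
print(f"price = {intercept:.1f} + {slope:.1f} * rooms")
```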
We're then going to introduce the idea of "R squared", the coefficient of determination, to quantify how well a particular line fits a particular data set.
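One common way to compute R squared is to compare the squared error left over after fitting the line with the squared error of simply predicting the mean every time. The sketch below assumes made-up data and a line with intercept 46 and slope 51, which is the least squares fit for this particular toy data; `r_squared` is a hypothetical helper name.

```python
def r_squared(y_actual, y_predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    y_bar = sum(y_actual) / len(y_actual)
    ss_res = sum((ya - yp) ** 2 for ya, yp in zip(y_actual, y_predicted))
    ss_tot = sum((ya - y_bar) ** 2 for ya in y_actual)
    return 1 - ss_res / ss_tot


# Hypothetical data: number of rooms and price (in $1000s) for five houses
rooms = [2, 3, 4, 5, 6]
price = [150, 200, 240, 310, 350]

# Predictions from the line price = 46 + 51 * rooms (assumed fit for this toy data)
predicted = [46 + 51 * r for r in rooms]

r2 = r_squared(price, predicted)
print(f"R squared: {r2:.4f}")
```

A value close to 1 means the line explains most of the variation in price; a value close to 0 means it does little better than always predicting the mean.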
From there we look at calculating a complete linear regression using just code and cover some of the assumptions that must hold for a "least squares regression". We introduce Ordinary Least Squares in Statsmodels, along with some tools for diagnosing your linear regression, such as Q-Q plots, the Jarque-Bera test for normal distribution of residuals, and the Goldfeld-Quandt test for heteroscedasticity. We then look at the interpretation of significance and p-values, and finish up by building a regression model of the Boston Housing data set.
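To give a feel for what a residual diagnostic measures, here is a hand-rolled sketch of the Jarque-Bera statistic, which combines the skewness and kurtosis of the residuals: JB = n/6 * (S^2 + (K - 3)^2 / 4). The residual values and the function name `jarque_bera_statistic` are assumptions for illustration only; in practice a library routine such as `statsmodels.stats.stattools.jarque_bera` would normally be used, and it also reports a p-value.

```python
def jarque_bera_statistic(residuals):
    """Jarque-Bera statistic from sample skewness (S) and kurtosis (K).

    Uses the biased (divide by n) central moments, as in the usual definition.
    """
    n = len(residuals)
    r_bar = sum(residuals) / n
    m2 = sum((r - r_bar) ** 2 for r in residuals) / n  # variance
    m3 = sum((r - r_bar) ** 3 for r in residuals) / n  # third central moment
    m4 = sum((r - r_bar) ** 4 for r in residuals) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)


# Hypothetical residuals from some fitted regression (made up for illustration)
residuals = [-2, -1, 0, 1, 2]

jb = jarque_bera_statistic(residuals)
print(f"Jarque-Bera statistic: {jb:.4f}")
```

For normally distributed residuals the skewness is near 0 and the kurtosis is near 3, so the statistic stays small; large values suggest the normality assumption is questionable. With only five points, as here, the statistic is not meaningful as a test and only illustrates the arithmetic.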
Congratulations! You've made it through much of the introductory data, and we've finally got enough context to take a look at our first machine learning model. Along the way we'll broaden our experience of both coding and math, so we'll be able to introduce more sophisticated machine learning models as the course progresses.