A Complete Data Science Project Using Multiple Regression - Introduction

Introduction

In this section, you'll get a chance to synthesize your skills and work through the entire Data Science workflow. To start, you'll extract appropriate data from a SQL database. From there, you'll continue exploring and cleaning your data, modeling the data, and conducting statistical analyses!

Data Science Processes

You'll take a look at three general frameworks for conducting Data Science processes using the skills you've learned thus far:

CRoss-Industry Standard Process for Data Mining - CRISP-DM
Knowledge Discovery in Databases - KDD
Obtain, Scrub, Explore, Model, iNterpret - OSEMN

Note: OSEMN is pronounced "OH-sum" and rhymes with "possum"

From there, the lessons follow a similar structure, mirroring the first four OSEMN steps:

Obtaining Data

You'll review SQL and practice importing data from a relational database using the ETL (Extract, Transform and Load) process.
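As a rough illustration of the extract, transform, and load steps, here is a minimal sketch using Python's built-in sqlite3 module together with pandas. The in-memory database, the table names (products, sales, sales_clean), and the columns are all hypothetical and exist only to keep the example self-contained; the lessons use their own database.

```python
import sqlite3

import pandas as pd

# Build a throwaway in-memory database so the example runs on its own.
# All table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, price TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, units INTEGER);
    INSERT INTO products VALUES (1, 'widget', '2.50'), (2, 'gadget', '10.00');
    INSERT INTO sales VALUES (1, 1, 4), (2, 2, 1), (3, 1, 6);
""")

# Extract: join the tables in SQL and pull the result into a DataFrame.
query = """
    SELECT s.sale_id, p.name, p.price, s.units
    FROM sales AS s
    JOIN products AS p ON p.product_id = s.product_id
    ORDER BY s.sale_id;
"""
df = pd.read_sql_query(query, conn)

# Transform: price was stored as text, so cast it before doing arithmetic.
df["price"] = df["price"].astype(float)
df["revenue"] = df["price"] * df["units"]

# Load: write the transformed table back to the database.
df.to_sql("sales_clean", conn, index=False, if_exists="replace")
conn.close()
```

Doing the join in SQL rather than in pandas keeps the amount of data pulled into memory small, which is one common reason to push work into the query.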

Scrubbing Data

From there, you'll practice cleaning data:

Casting columns to the appropriate data types
Identifying and dealing with null values appropriately
Removing columns that aren't required for modeling
Checking for and dealing with multicollinearity
Normalizing the data
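The cleaning steps listed above can be sketched with pandas roughly as follows. The toy dataset, the column names, the 0.75 correlation cutoff, the median imputation, and the choice of z-score standardization as the normalization method are all assumptions made for illustration; the right choices depend on the data at hand.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: numbers stored as strings, a missing value,
# an identifier column, and two nearly redundant predictors.
raw = pd.DataFrame({
    "id": [101, 102, 103, 104, 105],
    "sqft": ["1000", "1500", "2000", "2500", "3000"],
    "sqm": [92.9, 139.4, 185.8, 232.3, 278.7],  # sqft in other units
    "age": [10.0, np.nan, 30.0, 5.0, 20.0],
    "price": [200.0, 290.0, 410.0, 480.0, 610.0],
})
df = raw.copy()

# 1. Cast columns to the appropriate data types.
df["sqft"] = df["sqft"].astype(float)

# 2. Deal with nulls (here: impute the median; dropping rows is another option).
df["age"] = df["age"].fillna(df["age"].median())

# 3. Remove columns that are not needed for modeling.
df = df.drop(columns=["id"])

# 4. Check the predictors for multicollinearity via pairwise correlations.
predictors = df.drop(columns=["price"])
corr = predictors.corr().abs()
threshold = 0.75  # assumed cutoff, not a universal rule
high_pairs = [
    (a, b)
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if corr.loc[a, b] > threshold
]
# sqft and sqm carry the same information, so keep only one of them.
df = df.drop(columns=["sqm"])

# 5. Normalize the remaining predictors (z-score standardization).
features = ["sqft", "age"]
df[features] = (df[features] - df[features].mean()) / df[features].std()
```

Here high_pairs flags the sqft and sqm pair, which is why one of the two is dropped before normalizing.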

Exploring Data

Once you've cleaned the data, you'll do some further EDA (Exploratory Data Analysis): checking the distributions of the various columns, examining the descriptive statistics for the dataset, and creating some initial visualizations to better understand the dataset.
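A first pass at this kind of exploration might look like the sketch below. The data is synthetic and the column names are hypothetical. The histogram is computed as bin counts with NumPy so the example has no plotting dependency; in practice you would usually draw it with a plotting library instead.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a cleaned dataset (hypothetical columns).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "sqft": rng.normal(2000, 400, size=200),
    "age": rng.exponential(15, size=200),
})
df["price"] = 0.2 * df["sqft"] - 1.5 * df["age"] + rng.normal(0, 20, size=200)

# Descriptive statistics: count, mean, std, min, quartiles, max per column.
summary = df.describe()

# Skewness hints at which columns are far from symmetric.
skewness = df.skew()

# Correlation of each predictor with the target.
target_corr = df.corr()["price"].drop("price")

# Histogram of one column as raw bin counts (the data behind a plot).
counts, bin_edges = np.histogram(df["age"], bins=10)

print(summary.round(2))
print(skewness.round(2))
print(target_corr.round(2))
```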

Modeling Data

Finally, you'll create a definitive model. This will include fitting an initial regression model and then conducting statistical analyses of the results. You'll take a look at the p-values of the various features and perform some feature selection. You'll test the regression assumptions, including normality of the residuals, homoscedasticity (constant residual variance), and independence. From these tests, you'll then refine and improve the model, not just for performance, but for interpretability as well.
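To make the modeling loop concrete, here is a minimal sketch that fits ordinary least squares with NumPy, computes coefficient p-values, drops features above a significance cutoff, and then runs one check per assumption. The synthetic data, the helper fit_ols, the 0.05 cutoff, and the specific tests chosen (Jarque-Bera for normality, a Breusch-Pagan style LM statistic for heteroscedasticity, Durbin-Watson for independence) are assumptions for illustration. A statistics library would normally report all of these for you.

```python
import numpy as np
from scipy import stats

# Synthetic data: y depends on x1 and x2 but not on noise_feature.
rng = np.random.default_rng(42)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
noise_feature = rng.normal(size=n)
y = 3.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)


def fit_ols(X, y):
    """Hypothetical helper: OLS with an intercept.

    Returns the coefficients, their two-sided p-values, and the residuals.
    """
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    p_values = 2 * stats.t.sf(np.abs(beta / se), dof)
    return beta, p_values, resid


# Initial model with every candidate feature.
X_full = np.column_stack([x1, x2, noise_feature])
beta, p_values, resid = fit_ols(X_full, y)

# Feature selection: keep features whose p-value is below the cutoff.
alpha = 0.05  # assumed significance level
keep = p_values[1:] < alpha  # skip the intercept
beta_refit, p_refit, resid_refit = fit_ols(X_full[:, keep], y)

# Normality of residuals: Jarque-Bera test.
jb_stat, jb_p = stats.jarque_bera(resid_refit)

# Heteroscedasticity: regress squared residuals on the predictors and
# use LM = n * R^2, compared against a chi-squared distribution.
Z = np.column_stack([np.ones(n), X_full[:, keep]])
u2 = resid_refit ** 2
gamma, *_ = np.linalg.lstsq(Z, u2, rcond=None)
r2_aux = 1 - np.sum((u2 - Z @ gamma) ** 2) / np.sum((u2 - u2.mean()) ** 2)
bp_stat = n * r2_aux
bp_p = stats.chi2.sf(bp_stat, Z.shape[1] - 1)

# Independence: Durbin-Watson statistic (values near 2 suggest little
# first-order autocorrelation in the residuals).
dw = np.sum(np.diff(resid_refit) ** 2) / np.sum(resid_refit ** 2)

print("coefficients:", beta_refit.round(3))
print("p-values:", p_refit.round(4))
print("Jarque-Bera p:", round(float(jb_p), 4))
print("Breusch-Pagan p:", round(float(bp_p), 4))
print("Durbin-Watson:", round(float(dw), 3))
```

Small p-values on the Jarque-Bera or Breusch-Pagan checks would suggest revisiting the model, for example by transforming a variable, before trusting the coefficient p-values.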

Summary

In this section, you'll conduct an end-to-end review of the Data Science process!
