
A Complete Data Science Project Using Multiple Regression - Introduction

Introduction

In this section, you'll get a chance to synthesize your skills and work through the entire Data Science workflow. To start, you'll extract appropriate data from a SQL database. From there, you'll clean and explore the data, model it, and conduct statistical analyses!

Data Science Processes

You'll look at three general frameworks for structuring a Data Science project with the skills you've learned so far:

  • CRoss-Industry Standard Process for Data Mining - CRISP-DM
  • Knowledge Discovery in Databases - KDD
  • Obtain Scrub Explore Model iNterpret - OSEMN

Note: OSEMN is pronounced "OH-sum" and rhymes with "possum".

From there, the lessons follow a structure that mirrors OSEMN:

Obtaining Data

You'll review SQL and practice importing data from a relational database using the ETL (Extract, Transform and Load) process.
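This is a minimal sketch of the three ETL steps in Python. It uses the standard-library `sqlite3` module with an in-memory database. The `houses` and `houses_clean` tables, their columns, and the rows are hypothetical stand-ins for whatever database the lessons actually use.

```python
import sqlite3

# Hypothetical source table, created in memory purely for illustration.
# Numbers are stored as TEXT and one value is missing, as in messy real data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE houses (id INTEGER PRIMARY KEY, sqft TEXT, bedrooms INTEGER, price TEXT);
INSERT INTO houses (sqft, bedrooms, price) VALUES
    ('1500', 3, '300000'),
    ('2100', 4, '410000'),
    (NULL,   2, '199000'),
    ('900',  2, '180000');
""")

# Extract: pull only the columns the analysis needs.
rows = conn.execute("SELECT sqft, bedrooms, price FROM houses ORDER BY id").fetchall()

# Transform: cast text columns to numbers, dropping rows with a missing sqft.
clean = [
    {"sqft": float(sqft), "bedrooms": bedrooms, "price": float(price)}
    for sqft, bedrooms, price in rows
    if sqft is not None
]

# Load: write the cleaned records into a new table ready for analysis.
conn.execute("CREATE TABLE houses_clean (sqft REAL, bedrooms INTEGER, price REAL)")
conn.executemany("INSERT INTO houses_clean VALUES (:sqft, :bedrooms, :price)", clean)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM houses_clean").fetchone()[0])
```

Dropping the incomplete row is only one choice. The scrubbing step below shows imputation as an alternative.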

Scrubbing Data

From there, you'll practice cleaning data:

  • Casting columns to the appropriate data types
  • Identifying and dealing with null values appropriately
  • Removing columns that aren't required for modeling
  • Checking for and dealing with multicollinearity
  • Normalizing the data

Exploring Data

Once the data is clean, you'll do further EDA (Exploratory Data Analysis). You'll check the distributions of the various columns, examine the descriptive statistics for the dataset, and create some initial visualizations to better understand it.
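As a small illustration of the descriptive-statistics step, the sketch below summarizes one hypothetical column of sale prices with the standard-library `statistics` module. The `describe` helper is hypothetical. It loosely imitates the count, mean, spread, and quartile summary that pandas' `DataFrame.describe()` prints.

```python
import statistics

# Hypothetical cleaned column of sale prices, including one unusually high value.
prices = [180000, 250000, 300000, 350000, 410000, 1200000]

def describe(values):
    """Return a small summary similar in spirit to pandas' DataFrame.describe()."""
    # "inclusive" interpolates between data points, treating the data as the full population.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }

summary = describe(prices)
for name, value in summary.items():
    print(f"{name:>5}: {value:,.1f}")

# A mean well above the median hints at right skew that is worth plotting.
print("right-skewed?", summary["mean"] > summary["50%"])
```

A gap like this between the mean and the median is the kind of signal that an initial histogram or box plot would then make visible.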

Modeling Data

Finally, you'll create a final model. This will include fitting an initial regression model and then conducting statistical analyses of the results. You'll look at the p-values of the various features and perform some feature selection. You'll check the regression assumptions, testing the residuals for normality, heteroscedasticity, and independence. Based on these tests, you'll refine the model to improve not just its performance but also its interpretability.
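To show what "fitting a regression model" computes, here is a minimal ordinary least squares sketch in plain Python. It solves the normal equations with Gaussian elimination on hypothetical data, then reports R² and the Durbin-Watson statistic as one check of residual independence. It does not compute p-values or the normality and heteroscedasticity tests. In practice a library such as statsmodels reports those directly, for example via `sm.OLS(y, sm.add_constant(X)).fit().summary()`.

```python
# Hypothetical data: price explained by square footage and bedroom count.
X = [[1500, 3], [2100, 4], [1650, 2], [900, 2], [1800, 3], [1200, 2]]
y = [300000, 410000, 250000, 180000, 350000, 230000]

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    n, k = len(A), len(A[0])
    # Build the augmented matrix [X^T X | X^T y].
    M = [
        [sum(A[r][i] * A[r][j] for r in range(n)) for j in range(k)]
        + [sum(A[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (M[i][k] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    return b

coef = fit_ols(X, y)  # [intercept, sqft coefficient, bedrooms coefficient]
pred = [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]
resid = [yi - pi for yi, pi in zip(y, pred)]

# R^2: share of the variance in y explained by the model.
mean_y = sum(y) / len(y)
ss_res = sum(e * e for e in resid)
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

# Durbin-Watson: ranges from 0 to 4; values near 2 suggest little autocorrelation.
dw = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid))) / ss_res

print("coefficients:", [round(c, 2) for c in coef])
print("R^2:", round(r_squared, 3), "Durbin-Watson:", round(dw, 3))
```

Solving the normal equations directly is fine for a handful of well-scaled features. Libraries use more numerically stable decompositions, such as QR, for larger or nearly collinear problems.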

Summary

In this section, you'll conduct an end-to-end review of the Data Science process!

Contributors

cheffrey2000, fpolchow, loredirick, mas16, mathymitchell, peterbell, sumedh10