Inference versus Prediction

Introduction

In the last few lessons, you have seen how to deal with categorical variables and why multicollinearity can be an issue with regression analysis. You also learned about log transformations, feature scaling, and normalization in order to accurately determine the coefficients of your features and to improve the model accuracy. Before we proceed further, it is important to discuss two different modeling approaches that you should keep in mind when working with data: modeling for inference and modeling for prediction. Are you asking yourself "aren't they the same thing"? Well, no! In this lesson you will see why and how.

Objectives

  • Explain the difference between modeling for inference and prediction

Inference

When you are modeling for inference, you are asking the question "How does X (independent variables or features) affect Y (dependent or target or outcome variable)?" In essence, you are trying to figure out which features affect your outcome and how your outcome changes when these features change.

When modeling for inference, you are typically focused on only a subset of features because you are trying to understand how the outcome changes when you vary these features. As a result, great emphasis is given to the coefficients of these features as opposed to the overall accuracy of the model.

Hence, when you are modeling for inference, you typically choose simpler models, that is, models that are interpretable. Linear regression is a very good example of an interpretable model. With some basic training, anyone can understand how the features affect the outcome by observing the coefficients of these features. Other interpretable models that you will learn about later include logistic regression, decision trees, and linear SVMs.
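As a minimal sketch of what "observing the coefficients" can look like, the snippet below fits a one-feature linear regression by ordinary least squares in plain Python. The data values are hypothetical and chosen only for illustration; they do not come from this lesson. The quantity of interest is the slope and its standard error, not how well the line predicts new points.

```python
import math

# Hypothetical toy data: one feature x and an outcome y (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Ordinary least squares for a single feature:
# slope = S_xy / S_xx, intercept = y_mean - slope * x_mean
s_xx = sum((xi - x_mean) ** 2 for xi in x)
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

# Standard error of the slope, using the residual variance
# with n - 2 degrees of freedom.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
residual_variance = sum(r ** 2 for r in residuals) / (n - 2)
slope_se = math.sqrt(residual_variance / s_xx)

print(f"intercept = {intercept:.3f}")
print(f"slope = {slope:.3f} (standard error {slope_se:.3f})")
```

Read for inference, the slope says how much Y is expected to change for a one-unit increase in X, and the standard error indicates how uncertain that estimate is.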

Prediction

When you are modeling for prediction, you are asking the question "How well can I use X (independent variables or features) to predict Y (dependent or target or outcome variable)?" In this case, you care less about which features impact Y, or how they do so, and more about how effectively you can use them to predict Y.

When modeling for prediction, you typically use all available features (and most likely engineer new features) because you are trying to accurately predict Y, at all costs. As a result, you are less concerned about the coefficients of these features and instead focus on the overall accuracy of the model.

Hence, when you are modeling for prediction, you typically choose more complex models. In the upcoming modules, as you learn about various Machine Learning models, you will notice that your sole focus is on improving the predictive accuracy of your models. That is, given some data, your job will be to build a model that best predicts the future (your target variable). This can often mean you will be dealing with black box models, that is, models that are difficult to interpret. Given the independent variables, these models can do a great job of predicting the target, but their inner workings will be very difficult (almost impossible) to understand. Such models include SVMs with radial kernels, random forests, and neural networks, and they are often combined with techniques such as regularization, cross-validation, and grid search.
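The prediction workflow can be sketched in plain Python as follows: hold out part of the data, fit on the rest, and judge the model only by its error on the held-out points. This is an illustration under stated assumptions. The data is hypothetical, and a small k-nearest-neighbours regressor is swapped in as a stand-in for the more complex models named above, because it has no coefficients to interpret and fits in a few lines. The helper names `knn_predict` and `mean_squared_error` are made up for this example.

```python
import math

# Hypothetical toy data: a nonlinear relationship (illustration only).
xs = [i / 10 for i in range(40)]
ys = [math.sin(v) for v in xs]

# Hold out every fourth point as a test set; fit on the rest.
pairs = list(zip(xs, ys))
train = [p for i, p in enumerate(pairs) if i % 4 != 0]
test = [p for i, p in enumerate(pairs) if i % 4 == 0]


def knn_predict(x_new, training_pairs, k=3):
    """Average the outcomes of the k training points closest to x_new."""
    nearest = sorted(training_pairs, key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(target for _, target in nearest) / k


def mean_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


test_targets = [target for _, target in test]
knn_predictions = [knn_predict(feature, train) for feature, _ in test]

# Baseline for comparison: always predict the mean of the training outcomes.
train_mean = sum(target for _, target in train) / len(train)
baseline_predictions = [train_mean] * len(test)

knn_mse = mean_squared_error(test_targets, knn_predictions)
baseline_mse = mean_squared_error(test_targets, baseline_predictions)

print(f"k-NN test MSE:     {knn_mse:.4f}")
print(f"baseline test MSE: {baseline_mse:.4f}")
```

Nothing in this sketch explains how X affects Y. The only question asked is whether the held-out error is low, which is the mindset of modeling for prediction.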

Summary

Remember that how you build your models depends on what question you are asking of your data:

  • Are you solely interested in how you can use the data to predict the future? If so, you are most likely modeling for prediction.
  • Or are you interested in understanding how a given set of features affects your outcome? If so, you are most likely modeling for inference.

Depending on what questions you ask, your modeling approach will vary significantly. It is therefore very important to first understand the context of your problem and ask yourself what the end goal of your analysis is before you set out to build any models.
