Ames-House-Price-Prediction-with-Advanced-Regression-Techniques

This competition is very important to me, as it helped me begin my Kaggle journey a few months ago. Competition link: https://www.kaggle.com/c/house-prices-advanced-regression-techniques

Competition Banner

About the dataset: compared to the Boston Housing Dataset (506 observations, only 14 variables), the Ames Housing Dataset is large: roughly 2,900 observations packed with 80 feature variables. The dataset comprises:

  • 20 continuous variables (relate to various area dimensions for each observation),
  • 14 discrete variables (quantify the number of items occurring within the house),
  • 46 categorical variables (23 nominal, 23 ordinal)

Read more about the dataset here: http://jse.amstat.org/v19n3/decock.pdf

The features span every type and exhibit a wide range of quirks and data-quality problems, so getting a good score in this competition requires advanced EDA & modelling skills.
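
To ground the discussion, here is a minimal loading sketch; it assumes the competition's `train.csv` and `test.csv` files sit in the working directory:

```python
import pandas as pd

# Kaggle ships the competition data as train.csv and test.csv
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

print(train.shape)  # (1460, 81): Id + 79 features + SalePrice
print(test.shape)   # (1459, 80): Id + 79 features
```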

I have applied the following feature engineering techniques here (each is sketched briefly after the list):

  • Imputing missing values by analysing each feature separately, without dropping any of them
  • Removing extreme outliers from the training data using both univariate & bivariate analysis
  • Transforming ordinal features to numerical (i.e. label encoding) and numerical variables to categorical where necessary
  • Adding new features, both numerical and discrete/boolean
  • Analysing skewness of the numerical features
  • Box-Cox transformation of skewed features (chosen over log transformation because Box-Cox yielded slightly better models)
  • One-hot encoding the nominal categorical variables
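
A minimal imputation sketch, assuming the Kaggle column names. Per the competition's data description, NA in many categorical columns means the feature is absent, so `'None'`/0 are sensible fills, while LotFrontage can borrow the neighbourhood-wise median:

```python
# Categorical NA means "feature absent" per the data description
for col in ['PoolQC', 'MiscFeature', 'Alley', 'Fence', 'FireplaceQu',
            'GarageType', 'GarageFinish', 'GarageQual', 'GarageCond',
            'BsmtQual', 'BsmtCond']:
    train[col] = train[col].fillna('None')

# Numeric counterparts of absent features default to 0
for col in ['GarageYrBlt', 'GarageArea', 'GarageCars',
            'BsmtFinSF1', 'BsmtFinSF2', 'TotalBsmtSF']:
    train[col] = train[col].fillna(0)

# LotFrontage tracks the neighbourhood, so impute its median per group
train['LotFrontage'] = train.groupby('Neighborhood')['LotFrontage'] \
    .transform(lambda s: s.fillna(s.median()))
```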
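
For the bivariate outlier check, one well-known case (recommended in De Cock's paper linked above) is dropping the handful of sales with very large living areas; the exact threshold below is illustrative:

```python
import matplotlib.pyplot as plt

# GrLivArea vs. SalePrice exposes a few extreme, low-priced large houses
plt.scatter(train['GrLivArea'], train['SalePrice'], alpha=0.5)
plt.xlabel('GrLivArea (sq ft)')
plt.ylabel('SalePrice ($)')
plt.show()

# De Cock suggests removing sales above 4000 sq ft of living area
train = train[train['GrLivArea'] < 4000].reset_index(drop=True)
```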
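
Label encoding the ordinal features can reuse a single mapping, since many columns share the Ex/Gd/TA/Fa/Po vocabulary; conversely, a numeric code like MSSubClass is really categorical. A sketch, with the column list as an assumption rather than the exact set used here:

```python
# Shared ordinal vocabulary; 0 marks an absent feature
qual_map = {'None': 0, 'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}
for col in ['ExterQual', 'ExterCond', 'BsmtQual', 'BsmtCond',
            'KitchenQual', 'FireplaceQu', 'GarageQual', 'GarageCond']:
    train[col] = train[col].map(qual_map)

# MSSubClass is a dwelling-type code, not a quantity
train['MSSubClass'] = train['MSSubClass'].astype(str)
```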
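
Skewness analysis, Box-Cox, and one-hot encoding chain together naturally; the 0.75 skew cutoff and λ = 0.15 below are common illustrative choices, not necessarily the exact values used here:

```python
import numpy as np
from scipy.stats import skew
from scipy.special import boxcox1p

numeric_cols = train.select_dtypes(include=[np.number]) \
                    .columns.drop(['SalePrice', 'Id'])

# Flag heavily skewed features
skews = train[numeric_cols].apply(lambda s: skew(s.dropna()))
skewed = skews[skews.abs() > 0.75].index

# boxcox1p transforms 1 + x, so zero-valued areas/counts are safe
for col in skewed:
    train[col] = boxcox1p(train[col], 0.15)

# One-hot encode the remaining nominal categoricals
train = pd.get_dummies(train)
```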

I have tried several ML algorithms and, in the end, stacking & ensembling approaches (illustrative sketches follow the list):

  • Lasso (L1) Regression
  • Ridge (L2) Regression
  • ElasticNet
  • Gradient Boosting (xgboost, lightgbm, etc.)
  • Averaging all/selected base models
  • Model Stacking with a meta regressor
  • Model blending by assigning different weights to different models
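
A sketch of the regularised linear baselines, assuming all missing values have already been imputed; the alphas are illustrative, and since the competition scores RMSE on log(SalePrice), the target is log-transformed up front:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

X = train.drop(columns=['SalePrice', 'Id'])
y = np.log1p(train['SalePrice'])  # metric is RMSE on the log price

# RobustScaler keeps the penalised models stable despite residual outliers
models = {
    'lasso': make_pipeline(RobustScaler(), Lasso(alpha=0.0005)),
    'ridge': make_pipeline(RobustScaler(), Ridge(alpha=10.0)),
    'enet':  make_pipeline(RobustScaler(),
                           ElasticNet(alpha=0.0005, l1_ratio=0.9)),
}

for name, model in models.items():
    rmse = -cross_val_score(model, X, y, cv=5,
                            scoring='neg_root_mean_squared_error')
    print(f'{name}: {rmse.mean():.4f} +/- {rmse.std():.4f}')
```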
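
Averaging, blending, and stacking can be sketched on top of those base models (reusing `models`, `X`, and `y` from the previous snippet). `WeightedBlend` is a hypothetical helper, where equal weights recover plain averaging; the stack uses scikit-learn's StackingRegressor with a meta-regressor:

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Lasso

class WeightedBlend(BaseEstimator, RegressorMixin):
    """Hypothetical helper: fit clones of base models, blend with fixed weights."""
    def __init__(self, models, weights):
        self.models = models
        self.weights = weights

    def fit(self, X, y):
        self.fitted_ = [clone(m).fit(X, y) for m in self.models]
        return self

    def predict(self, X):
        preds = np.column_stack([m.predict(X) for m in self.fitted_])
        return preds @ np.asarray(self.weights)

# Equal weights = simple averaging; tune the weights on a holdout to blend
blend = WeightedBlend(list(models.values()), weights=[1/3, 1/3, 1/3])
blend.fit(X, y)

# Stacking: out-of-fold base predictions feed a meta-regressor
stack = StackingRegressor(estimators=list(models.items()),
                          final_estimator=Lasso(alpha=0.0005), cv=5)
stack.fit(X, y)
```

Since the models are trained on the log target, predictions would be mapped back with `np.expm1` before submission.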

After experimenting with all of the above techniques, my final LB score was 0.11994, which was in the top 2% of the leaderboard at my last submission (September 2020).

Leaderboard screenshot
