Code Monkey home page Code Monkey logo

real-estate's Introduction

2.Real Estate(Regression Problem)

1. Business Problem

1.1 Problem Context

Our client is a large Real Estate Investment Trust (REIT).

  • They invest in houses, apartments, and condos(complex of buildings) within a small county in New York state.
  • As part of their business, they try to predict the fair transaction price of a property before it's sold.
  • They do so to calibrate their internal pricing models and keep a pulse on the market.

1.2 Problem Statement

The REIT has hired us to find a data-driven approach to valuing properties.

  • They currently have an untapped dataset of transaction prices for previous properties on the market.
  • The data was collected in 2016.
  • Our task is to build a real-estate pricing model using that dataset.
  • If we can build a model to predict transaction prices with an average error of under US Dollars 70,000, then our client will be very satisfied with the our resultant model.

1.3 Business Objectives and Constraints

  • Deliverable: Trained model file
  • Win condition: Avg. prediction error < $70,000
  • Model Interpretability will be useful
  • No latency requirement

2. Machine Learning Problem

2.1 Data Overview

For this project:

  1. The dataset has 1883 observations in the county where the REIT operates.
  2. Each observation is for the transaction of one property only.
  3. Each transaction was between $200,000 and $800,000.

Target Variable

  • 'tx_price' - Transaction price in USD

Features of the data:

Public records:

  • 'tx_year' - Year the transaction took place
  • 'property_tax' - Monthly property tax
  • 'insurance' - Cost of monthly homeowner's insurance

Property characteristics:

  • 'beds' - Number of bedrooms
  • 'baths' - Number of bathrooms
  • 'sqft' - Total floor area in squared feet
  • 'lot_size' - Total outside area in squared feet
  • 'year_built' - Year property was built
  • 'active_life' - Number of gyms, yoga studios, and sports venues within 1 mile
  • 'basement' - Does the property have a basement?
  • 'exterior_walls' - The material used for constructing walls of the house
  • 'roof' - The material used for constructing the roof

Location convenience scores:

  • 'restaurants' - Number of restaurants within 1 mile
  • 'groceries' - Number of grocery stores within 1 mile
  • 'nightlife' - Number of nightlife venues within 1 mile
  • 'cafes' - Number of cafes within 1 mile
  • 'shopping' - Number of stores within 1 mile
  • 'arts_entertainment' - Number of arts and entertainment venues within 1 mile
  • 'beauty_spas' - Number of beauty and spa locations within 1 mile
  • 'active_life' - Number of gyms, yoga studios, and sports venues within 1 mile

Neighborhood demographics:

  • 'median_age' - Median age of the neighborhood
  • 'married' - Percent of neighborhood who are married
  • 'college_grad' - Percent of neighborhood who graduated college

Schools:

  • 'num_schools' - Number of public schools within district
  • 'median_school' - Median score of the public schools within district, on the range 1 - 10

2.2 Mapping business problem to ML problem

2.2.1 Type of Machine Learning Problem

It is a regression problem, where given the above set of features, we need to predict the transaction price of the house.

2.2.2 Performance Metric (KPI)

Since it is a regression problem, we will use the following regression metrics:

  • Root Mean Squared Error (RMSE)
  • R-squared
  • Mean Absolute Error

3.Feature Engineering

3.1.Features Created:

two_and_two:
  • An indicator variable to flag properties with 2 beds and 2 baths and name it 'two_and_two'.
old_properties:
  • People might also not take much interest in old properties.
  • so, this feature gives properties built before 1980.
tax_and _insurance:
  • Both tax and insurance are monthly quantities property holder needs to pay monthly
during_recession:
  • A new feature called 'during_recession' is created to indicate if a transaction falls between 2010 and 2013.
propery_age:
  • property_age' denotes the age of the property when it was sold and not how old it is today, since we want to predict the price at the time when the property is sold.
school_score:
  • A school score feature is created as num_schools * median_school
One hot Encoding:
  • Machine learning algorithms cannot directly handle categorical features. Specifically, they cannot handle text values.
  • Therefore, we need to create dummy variables for our categorical features.
  • Dummy variables are a set of binary (0 or 1) features that each represent a single class from a categorical feature.

3.2.Features Removed:

  • Features such as property_tax, year_built, tx_year, insurance are removed.

real-estate's People

Contributors

sanikavt avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.