used_car_price_predictor's Introduction

Used Car Price Estimator: Project Overview

  • Created a tool that estimates the price of a used car (MAE ~ $4K) to help buyers negotiate while shopping and to help individuals who want to list their car on platforms like Facebook Marketplace.
  • Scraped over 2,000 car listings in California and New York from cargurus.com (a used-car listing website) using Selenium.
  • Cleaned the data and engineered features (such as model, build year, mileage, details, and description) from the raw data.
  • Performed EDA (exploratory data analysis) to handle missing values, remove outliers, transform variables, check correlations, and shortlist variables for building the machine learning models.
  • Tuned Linear, Lasso, Ridge, Random Forest, and XGBoost regressors using GridSearchCV to find the best model.
  • Deployed on Heroku behind a Flask API endpoint so that users can get an approximate price for a car online. The link is given in the Productionization section below.

Code and Resources Used

Web Scraping (webscraper.ipynb)

Scraped over 2,000 car listings from cargurus.com using Selenium. The following information was extracted from each listing:

  • Price of car
  • Build year
  • Model of the car (Ford, Toyota, Honda, etc.)
  • Gas Mileage
  • Features (Bluetooth, Alloy wheels, Android Auto, Backup Camera, Heated seats, etc.)
  • Transmission (Automatic and Manual)
  • Color
  • Engine
  • Drivetrain
  • Fuel type
  • Description

Data Cleaning (data_cleaning.ipynb)

Cleaned the raw data and created new columns, assigning each variable to its own column.

  • Parsed numeric and categorical variables out of text.
  • Created columns for the price, model, and build year of the car.
  • Split details such as Transmission, Color, Engine, Drivetrain, Fuel_type, and Gas mileage out of a single column into separate columns.
  • Added a column for description length.
  • Counted the number of features (Bluetooth, keyless entry, type of wheel, heated seats, etc.) per listing and stored the count in one column.
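The parsing steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the `Key: value | Key: value` layout of the raw details string, the sample values, and the helper names `parse_details`, `parse_mpg`, and `count_features` are all assumptions made for the example.

```python
import re

# Hypothetical raw "details" string; the real scraped layout may differ.
RAW_DETAILS = (
    "Transmission: Automatic | Color: Silver | Engine: 2.5L I4 | "
    "Drivetrain: FWD | Fuel_type: Gasoline | Gas mileage: 28 MPG"
)


def parse_details(raw):
    """Split a 'Key: value | Key: value' string into a dict of columns."""
    fields = {}
    for part in raw.split("|"):
        key, sep, value = part.partition(":")
        if sep:  # skip fragments that have no key/value separator
            fields[key.strip()] = value.strip()
    return fields


def parse_mpg(text):
    """Return the first integer found in a mileage string, or None."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


def count_features(feature_text):
    """Count comma-separated feature names, ignoring empty entries."""
    return sum(1 for item in feature_text.split(",") if item.strip())


row = parse_details(RAW_DETAILS)
row["Gas mileage"] = parse_mpg(row["Gas mileage"])  # text -> numeric
row["feature_count"] = count_features("Bluetooth, Keyless entry, Heated seats")
row["description_length"] = len("One owner, clean title, no accidents.")
print(row)
```

In a DataFrame workflow the same helpers could be applied row by row to the raw details column, with missing values left as `None` for the imputation step in EDA.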

EDA (EDA.ipynb)

Plotted the distributions of continuous and categorical variables. Imputed missing values and removed outliers. Highlights from the EDA notebook are as follows:

[Figures: four plots from the EDA notebook]

Model Building (model_building.ipynb)

  • Transformed the categorical variables into binary columns using one-hot encoding.
  • Split the dataset 80:20 into train and test sets respectively.
  • Used MAE (Mean Absolute Error) to evaluate the models. MAE is less sensitive to outliers than squared-error metrics such as RMSE.
  • Used five models:
    • Multilinear regression - Baseline model.
    • Ridge regression - To prevent overfitting.
    • Lasso regression - Effective because the data is sparse for many categorical variables.
    • Random Forest - Could be a good fit for sparse data.
    • XGBoost - Gradient-boosted trees that handle sparse data well. Also tuned its hyperparameters using GridSearchCV.
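To illustrate why MAE was a reasonable choice when outliers are present, here is a small sketch comparing MAE with RMSE on made-up prices. The numbers are invented for the example and are not taken from the scraped dataset.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average size of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: squaring weights large errors more heavily."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Illustrative prices only: every prediction is off by exactly $1,000.
actual = [10000, 15000, 20000, 25000]
predicted = [11000, 14000, 21000, 24000]
mae_clean, rmse_clean = mae(actual, predicted), rmse(actual, predicted)

# Same predictions, but the last listing is an outlier (error of $21,000).
actual_outlier = [10000, 15000, 20000, 45000]
mae_out, rmse_out = mae(actual_outlier, predicted), rmse(actual_outlier, predicted)

print(f"no outlier:   MAE={mae_clean:.0f}  RMSE={rmse_clean:.0f}")
print(f"with outlier: MAE={mae_out:.0f}  RMSE={rmse_out:.0f}")
```

With equal errors the two metrics agree. Once a single outlier is added, RMSE grows much faster than MAE, so a few unusually priced listings would dominate an RMSE-based comparison of models.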

Model performance

XGBoost performed the best:

  • Multilinear regression: MAE = $4968
  • Ridge regression: MAE = $4968
  • Lasso regression: MAE = $4914
  • Random Forest: MAE = $4596
  • XGBoost: MAE = $4185
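As a quick way to read these numbers, the sketch below ranks the models and expresses each MAE as a percentage reduction relative to the multilinear baseline. It uses only the MAE values listed above; the variable names are illustrative.

```python
# MAE values (in dollars) as reported in the list above.
mae_by_model = {
    "Multilinear regression": 4968,
    "Ridge regression": 4968,
    "Lasso regression": 4914,
    "Random Forest": 4596,
    "XGBoost": 4185,
}

baseline = mae_by_model["Multilinear regression"]

# Percentage reduction in MAE relative to the baseline model.
reduction_pct = {
    name: 100 * (baseline - value) / baseline
    for name, value in mae_by_model.items()
}

# Model with the lowest MAE.
best_model = min(mae_by_model, key=mae_by_model.get)

for name, value in sorted(mae_by_model.items(), key=lambda kv: kv[1]):
    print(f"{name}: MAE ${value:,} ({reduction_pct[name]:.1f}% below baseline)")
```

On these figures XGBoost lowers the MAE by $783, roughly 16% relative to the baseline, while Ridge shows no improvement over plain multilinear regression.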

Feature Importance

[Figure: feature importance plot]

Productionization

Built a Flask API endpoint and hosted it on a web server using Heroku. The page takes user inputs and estimates the price of the car. The deployment page is available at: https://carpricechecker.herokuapp.com/

[Figure: deployment page]

Conclusion and Future Recommendation

  • The average price of used cars listed on cargurus.com is ~$20K in California and New York.
  • The major factors that affect the price of a used car, in decreasing order of importance: mileage > build year > gas mileage > features > color > engine > drivetrain.
  • A pipeline can be implemented for data cleaning, EDA, and model building to streamline the process.
  • Model accuracy can be improved by selecting better features, transforming skewed variables toward a normal distribution, and removing more outliers.
  • NLP can be used to extract insights from the descriptions.

GitHub rendering problem with Jupyter notebooks?

Open the link below and paste the Git repository link into it for a smoother viewing experience: https://nbviewer.jupyter.org/

Contributors

ajay-rai