meysamraz / laptop-price-prediction-end-to-end-project-using-ecommerce-website-data

Using machine learning, feature engineering, and web scraping, I created an end-to-end laptop price prediction website from data scraped from a popular Iranian source. It gives users accurate price estimates and model comparisons.

License: GNU General Public License v3.0

Jupyter Notebook 100.00%
datavisualization machine-learning mageai streamlit webscraping workflow

Laptop-price-prediction-end-to-end-project-using-ecommerce-website-data

In this project, I set out to build a model that predicts the price of a laptop from its specifications, using data on available laptops that I collected from Digikala, the largest e-commerce site in Iran.

This project is a part of my portfolio, showcasing my skills. The main code isn't public, but I'm open to collaboration! Interested? Email me at [email protected]

Note: Because laptop prices in Iran change constantly, I used Mage, a modern tool for building and orchestrating data pipelines, to fetch and prepare the data and retrain the model on the newest data.

Watch the demo here:

https://meysamraz-laptop-price-prediction-project.streamlit.app/

Watch the demo here (Heroku version):

Update: Heroku may shut down its free hosting, so if this link doesn't work, use the link above.

https://loptop-price-prediction.herokuapp.com/

The predicted price is in Rial (Iran's currency).

Project Overview :

1 - Collect Data

To collect my data, I used Digikala's undocumented API. A simple filter was enough to retrieve the data I wanted (available laptops with prices).

  • Collect Laptop Main Data

    • id : ID registered for laptop in Digikala
    • title_fa : The name of the laptop in Farsi
    • title_en : The name of the laptop in English
    • price : Laptop price in Rial (Iranian currency)
    • image_url : Laptop photo
    • brand : Laptop brand
  • Collect Laptop Details Data

    • cpu manufacturer : Laptop cpu manufacturer
    • cpu series : The cpu series used in the laptop
    • cpu model : The cpu model used in the laptop
    • ram : Laptop RAM capacity
    • ram type : The type of RAM used in the laptop
    • internal storage : Internal storage capacity of the laptop
    • internal storage type : The type of internal storage in the laptop
    • gpu manufacturer : Laptop gpu manufacturer
    • gpu model : The gpu model used in the laptop
    • screen resolution : Laptop screen resolution
    • ports : Ports used in laptops
  • Merge Collected data

  • Remove duplicated rows

  • Save data into csv file
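
The merge, de-duplication, and save steps above could look roughly like the following pandas sketch. The two small DataFrames stand in for the scraped main and detail records; their values, the file name `laptops.csv`, and the use of `id` as the join key are assumptions made for illustration, not details taken from the project code.

```python
import pandas as pd

# Hypothetical stand-ins for the two scraped tables (values are made up).
main_df = pd.DataFrame({
    "id": [101, 102, 102],
    "title_en": ["Laptop A", "Laptop B", "Laptop B"],
    "price": [350_000_000, 520_000_000, 520_000_000],
    "brand": ["asus", "lenovo", "lenovo"],
})
details_df = pd.DataFrame({
    "id": [101, 102],
    "ram": ["8", "16"],
    "cpu manufacturer": ["Intel", "AMD"],
})

# Join main and detail records on the product id (assumed to be the shared key).
merged = main_df.merge(details_df, on="id", how="inner")

# Drop rows that are exact duplicates, then rebuild a clean index.
merged = merged.drop_duplicates().reset_index(drop=True)

# Save the result to CSV; the file name is illustrative.
merged.to_csv("laptops.csv", index=False)
```

An inner join keeps only laptops that have both main and detail records; a left join would keep laptops with missing details as rows containing nulls.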

2 - Take a Look at Data :

After collecting the data, I checked it to make sure it had been collected correctly.

  • Check shape of data

  • Check for null values

  • Check data types

  • Check number of unique values in each column
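
As a rough illustration, the four checks above map onto four pandas calls. The sample DataFrame is hypothetical; in practice the data would be loaded from the saved CSV file.

```python
import pandas as pd

# Hypothetical sample (values are made up for illustration).
df = pd.DataFrame({
    "brand": ["asus", "lenovo", "asus", None],
    "ram": ["8", "16", "8", "4"],
    "price": [350_000_000, 520_000_000, 410_000_000, 200_000_000],
})

print(df.shape)           # (number of rows, number of columns)
print(df.isnull().sum())  # null count per column
print(df.dtypes)          # data type of each column
print(df.nunique())       # unique values per column (nulls are not counted)
```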

3 - Cleaning Data

Like all machine learning projects, the data doesn't arrive perfect and ready for prediction. At this point, I started cleaning the collected data.

  • Convert brand names from Persian to English

  • Convert RAM from Persian to English digits

  • Clean internal storage and convert it to English

  • Clean and convert laptop screen size

  • Clean laptop screen resolution
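
A minimal sketch of the Persian-to-English conversions, using only the standard library. The function names and the contents of `BRAND_MAP` are hypothetical; the real mapping depends on the values that actually appear in the scraped data.

```python
# Persian digits are U+06F0..U+06F9; Arabic-Indic digits are U+0660..U+0669.
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))
ARABIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))
DIGIT_TABLE = str.maketrans(PERSIAN_DIGITS + ARABIC_DIGITS, "0123456789" * 2)


def to_english_digits(text: str) -> str:
    """Replace Persian and Arabic-Indic digits with ASCII digits."""
    return text.translate(DIGIT_TABLE)


# Hypothetical brand lookup; extend it with the brands found in the data.
BRAND_MAP = {"ایسوس": "asus", "لنوو": "lenovo"}


def to_english_brand(brand_fa: str) -> str:
    """Return the English brand name, or the input unchanged if unknown."""
    return BRAND_MAP.get(brand_fa.strip(), brand_fa)


print(to_english_digits("\u06f1\u06f6"))  # Persian "16" becomes "16"
```

Handling both digit ranges is a defensive choice, since Persian text on the web sometimes mixes the two.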

4 - EDA

Feature engineering, the next step, requires a good understanding of the data, so in this step I explored and analyzed it.

  • Laptops price distribution

  • Number of laptops of each brand

  • Number of CPUs from each CPU manufacturer

  • Number of laptops for each ram group

  • Number of laptops for each ram type group

  • Number of laptops with different internal storage

  • Number of laptops for each internal storage group

  • Number of laptops with different screen sizes

  • Number of laptops with different screen resolutions

5 - Feature Engineering

In this step, I prepared the features for training the model.

  • Remove outliers based on laptop price using the z-score

  • Convert screen resolution to a number

  • Extract gaming brands from the title (Asus ROG, Acer Nitro, ...)

  • Remove brands with only one laptop

  • Extract a clean GPU model from the gpu model column

  • Remove laptops whose GPU model appears fewer than 3 times

  • Label encode the cleaned GPU models

  • Convert internal storage from TB and GB to MB

  • Label encoding internal storage type

  • Convert ram from str to int

  • Extract port count

  • Label encoding ram type

  • Label encoding cpu series

  • One hot encoding brand - cpu manufacturer - gpu manufacturer (nominal categorical variables)
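
A few of the steps above (z-score outlier removal, storage unit conversion, label encoding, and one-hot encoding) can be sketched as follows. The sample data, the z-score threshold of 2.0, the `storage_to_mb` helper, and the 1 GB = 1024 MB convention are all assumptions; the source does not state the values the project actually uses.

```python
import re

import pandas as pd

# Hypothetical sample rows (values are made up; prices in millions of Rial).
df = pd.DataFrame({
    "brand": ["asus", "lenovo", "asus", "hp", "hp", "lenovo", "asus"],
    "price": [300, 320, 350, 310, 330, 340, 3000],
    "internal storage": ["512 GB", "1 TB", "256 GB", "512 GB", "1 TB", "512 GB", "2 TB"],
    "ram type": ["DDR4", "DDR5", "DDR4", "DDR4", "DDR5", "DDR4", "DDR5"],
})

# Remove price outliers: keep rows whose z-score is within the threshold.
Z_THRESHOLD = 2.0  # assumed; 3.0 is a common choice on larger datasets
z = (df["price"] - df["price"].mean()) / df["price"].std()
df = df[z.abs() <= Z_THRESHOLD].copy()


def storage_to_mb(text: str) -> int:
    """Convert strings such as '512 GB' or '1 TB' to megabytes."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*(tb|gb)", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized storage value: {text!r}")
    value, unit = float(match.group(1)), match.group(2).lower()
    factor = 1024 * 1024 if unit == "tb" else 1024
    return int(value * factor)


df["internal storage"] = df["internal storage"].map(storage_to_mb)

# Label encode an ordered-looking category with integer codes.
df["ram type"] = df["ram type"].astype("category").cat.codes

# One-hot encode a nominal category so no artificial order is implied.
df = pd.get_dummies(df, columns=["brand"])
```

With only a handful of rows, a z-score can never get very large, which is why this toy example uses a lower threshold than would be typical on the full dataset.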

6 - Feature Selection

In this step, I chose the features needed to train the model.

  • Check correlation

  • Mutual information regression
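
Both checks can be sketched with pandas and scikit-learn. The synthetic features and target below are assumptions used only to make the example runnable; they are not the project's data.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

# Synthetic data: price depends strongly on ram, weakly on storage, not on noise.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "ram": rng.choice([4, 8, 16, 32], size=n),
    "storage_mb": rng.choice([262144, 524288, 1048576], size=n),
    "noise": rng.normal(size=n),
})
y = 10 * X["ram"] + X["storage_mb"] / 100_000 + rng.normal(scale=5, size=n)

# Linear (Pearson) correlation of each feature with the target.
corr = X.corrwith(y)

# Mutual information also captures non-linear dependence.
mi = mutual_info_regression(X, y, random_state=0)
mi_scores = pd.Series(mi, index=X.columns).sort_values(ascending=False)

print(corr)
print(mi_scores)
```

Features that score near zero on both measures are natural candidates to drop before training.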

7 - Model training

8 - Hyperparameter tuning

9 - Cross validation
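
Steps 7 to 9 could be combined roughly as in the sketch below. The source does not name the model or the search space, so `RandomForestRegressor`, the parameter grid, the R² metric, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic regression data standing in for the engineered features.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = 100 * X[:, 0] + 20 * X[:, 1] + rng.normal(scale=2, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hyperparameter tuning: exhaustive search over a small grid with 5-fold CV.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=5,
    scoring="r2",
)
search.fit(X_train, y_train)

# Cross-validate the best configuration, then score it on held-out data.
scores = cross_val_score(search.best_estimator_, X_train, y_train, cv=5, scoring="r2")
print(search.best_params_)
print(scores.mean(), search.score(X_test, y_test))
```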

10 - Save model

I pickled the model for use in the GUI environment.
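
A minimal sketch of saving and reloading a model with the standard `pickle` module. The dictionary is a stand-in for the trained model, and the file name `model.pkl` is hypothetical.

```python
import pickle
from pathlib import Path

# Stand-in for the trained model; any picklable estimator is saved the same way.
model = {"weights": [1.5, -0.3], "intercept": 42.0}

model_path = Path("model.pkl")  # hypothetical file name

# Serialize the model to disk.
with model_path.open("wb") as f:
    pickle.dump(model, f)

# Later, for example inside the web app, load it back.
with model_path.open("rb") as f:
    loaded_model = pickle.load(f)
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.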

10 - Create Data Pipeline

I used Mage (a modern, easier alternative to Airflow) to extract, transform, and load data from Digikala every day, retrain the model on the newest data, and export the model.

11 - Website

To create the website, I used the Streamlit framework, a powerful framework that let me build the desired user interface entirely in Python.

12 - Deploy

I deployed the app on Heroku, a cloud platform as a service that offered free hosting. It's an amazing platform that gave me a lot of flexibility in deploying the app.

Libraries and Frameworks Used in the Project
