abhinav-bhardwaj / walmart-sales-time-series-forecasting-using-machine-learning

Time Series Forecasting of Walmart Sales Data using Deep Learning and Machine Learning

Home Page: https://now-its-abhi.medium.com/walmart-sales-time-series-forecasting-using-deep-learning-e7a5d47c448b?source=friends_link&sk=60a520d4cd7960a26114d39731eabb0b

License: GNU General Public License v3.0

Jupyter Notebook 100.00%
time-series-forecasting time-series-analysis walmart-sales-forecasting data-science deep-learning machine-learning linear-regression random-forest xgboost-regression regression

Walmart Sales Time Series Forecasting Using Machine and Deep Learning

Blog of this Project

Datasets

Walmart Recruiting - Store Sales Forecasting downloaded from https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting

  • train.csv - CSV data file containing the following attributes
    • Store
    • Dept
    • Date
    • Weekly_Sales
    • IsHoliday

421570 Data rows

  • stores.csv - CSV data file containing the following attributes
    • Store
    • Type
    • Size

45 Data rows

  • features.csv - CSV data file containing the following attributes
    • Store
    • Date
    • Temperature
    • Fuel_Price
    • MarkDown1
    • MarkDown2
    • MarkDown3
    • MarkDown4
    • MarkDown5
    • CPI
    • Unemployment
    • IsHoliday

8190 Data rows

Machine Learning Models

  • Linear Regression Model
  • Random Forest Regression Model
  • K Neighbors Regression Model
  • XGBoost Regression Model
  • Custom Deep Learning Neural Network

Data Preprocessing

  • Handling Missing Values

    • CPI and Unemployment in the features dataset each had 585 null values
    • MarkDown1 had 4158 null values
    • MarkDown2 had 5269 null values
    • MarkDown3 had 4577 null values
    • MarkDown4 had 4726 null values
    • MarkDown5 had 4140 null values
    • All missing values were filled using the median of the respective column
  • Merging Datasets

    • Main Dataset merged with stores dataset
    • Resulting Dataset merged with features dataset
    • Total data rows 421570 and 15 attributes
    • Date column converted into datetime data type
    • Date attribute set as index of combined dataset
  • Splitting Date Column

    • Split the Date column into Year, Month, Week
  • Aggregate Weekly Sales

    • Create max, min, mean, median, std of weekly sales
  • Outlier Detection and Other abnormalities

    • Markdowns were summed into Total_MarkDown
    • Outliers were removed using z-score, leaving 375438 data rows and 20 columns
    • Negative weekly sales were removed, leaving 374247 data rows and 20 columns
  • One-hot-encoding

    • Store, Dept, Type columns were one-hot-encoded using get_dummies method
    • After one-hot encoding, the number of columns becomes 145
  • Data Normalization

    • Numerical columns normalized using MinMaxScaler in the range 0 to 1
  • Recursive Feature Elimination

    • Random Forest Regressor used to calculate feature ranks and importance with 23 estimators
    • Features selected to retain
      • mean, median, Week, Temperature, max, CPI, Fuel_Price, min, std, Unemployment, Month, Total_MarkDown, Dept_16, Dept_18, IsHoliday, Dept_3, Size, Dept_9, Year, Dept_11, Dept_1, Dept_5, Dept_56
    • No. of attributes after feature elimination - 24 (the 23 features above plus the target, Weekly_Sales)
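The preprocessing steps above can be sketched end to end on a tiny synthetic dataset. This is a minimal illustration, not the notebook's code: the helper names `preprocess` and `select_features`, the z-score threshold of 3, the (Store, Dept) grouping used for the sales statistics, and the toy data are all assumptions. "23 estimators" is read here as `n_estimators=23` on the random forest wrapped by scikit-learn's `RFE`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.preprocessing import MinMaxScaler


def preprocess(train, stores, features, z_thresh=3.0):
    """Hypothetical helper mirroring the preprocessing list above."""
    # Fill missing numeric values with each column's median.
    features = features.copy()
    num = features.select_dtypes("number").columns
    features[num] = features[num].fillna(features[num].median())

    # Merge train with stores, then with features (IsHoliday is in both).
    df = train.merge(stores, on="Store", how="left")
    df = df.merge(features, on=["Store", "Date", "IsHoliday"], how="left")

    # Parse Date, derive Year/Month/Week, and index by Date.
    df["Date"] = pd.to_datetime(df["Date"])
    df["Year"] = df["Date"].dt.year
    df["Month"] = df["Date"].dt.month
    df["Week"] = df["Date"].dt.isocalendar().week.astype(int)
    df = df.set_index("Date")

    # Sales statistics per (Store, Dept); the grouping keys are an assumption.
    grouped = df.groupby(["Store", "Dept"])["Weekly_Sales"]
    for stat in ["max", "min", "mean", "median", "std"]:
        df[stat] = grouped.transform(stat).to_numpy()

    # Sum the five markdown columns into Total_MarkDown.
    markdowns = [f"MarkDown{i}" for i in range(1, 6)]
    df["Total_MarkDown"] = df[markdowns].sum(axis=1)
    df = df.drop(columns=markdowns)

    # Drop z-score outliers (threshold assumed), then negative sales.
    sales = df["Weekly_Sales"]
    z = (sales - sales.mean()) / sales.std()
    df = df[(z.abs() <= z_thresh).to_numpy()]
    df = df[(df["Weekly_Sales"] >= 0).to_numpy()]

    # One-hot encode the categoricals, then scale the rest to [0, 1].
    categorical = ["Store", "Dept", "Type"]
    numeric = [c for c in df.columns if c not in categorical]
    df = pd.get_dummies(df, columns=categorical, dtype=int)
    df = df.astype({c: float for c in numeric})
    df[numeric] = MinMaxScaler().fit_transform(df[numeric])
    return df


def select_features(df, n_features, target="Weekly_Sales"):
    """Keep n_features columns using RFE around a 23-tree random forest."""
    X, y = df.drop(columns=target), df[target]
    forest = RandomForestRegressor(n_estimators=23, random_state=0)
    rfe = RFE(forest, n_features_to_select=n_features).fit(X, y)
    return list(X.columns[rfe.support_])


# Tiny synthetic inputs, only to show the call pattern.
dates = [d.strftime("%Y-%m-%d")
         for d in pd.date_range("2012-01-06", periods=8, freq="W-FRI")]
keys = [(s, d, dt) for s in (1, 2) for d in (1, 2) for dt in dates]
train = pd.DataFrame(keys, columns=["Store", "Dept", "Date"])
train["Weekly_Sales"] = [10000.0 + 100.0 * i for i in range(len(train))]
train.loc[0, "Weekly_Sales"] = -50.0  # one bad row to be filtered out
train["IsHoliday"] = False

stores = pd.DataFrame({"Store": [1, 2], "Type": ["A", "B"],
                       "Size": [150000, 200000]})

features = pd.DataFrame([(s, dt) for s in (1, 2) for dt in dates],
                        columns=["Store", "Date"])
features["Temperature"] = np.linspace(40.0, 70.0, len(features))
features["Fuel_Price"] = 3.5
for i in range(1, 6):
    features[f"MarkDown{i}"] = float(i)
features.loc[::2, "MarkDown1"] = np.nan  # missing values to be median-filled
features["CPI"] = 210.0
features["Unemployment"] = 8.0
features["IsHoliday"] = False

clean = preprocess(train, stores, features)
selected = select_features(clean, n_features=5)
print(clean.shape, selected)
```

On the real data the three frames would come from `pd.read_csv` on train.csv, stores.csv and features.csv, and `n_features` would be 23 to match the list above.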

Splitting Dataset

  • Dataset was split into 80% for training and 20% for testing
  • Target feature - Weekly_Sales
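An 80/20 split with Weekly_Sales as the target might look like the sketch below. The toy frame, its feature column names and the `random_state` value are assumptions for illustration; the README does not say whether the split was shuffled.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the preprocessed frame (feature names are illustrative).
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.random((100, 4)),
                  columns=["mean", "Week", "CPI", "Weekly_Sales"])

X = df.drop(columns="Weekly_Sales")  # predictors
y = df["Weekly_Sales"]               # target feature

# test_size=0.2 yields the 80/20 split; random_state is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
```

`train_test_split` shuffles rows by default. For time-ordered data, passing `shuffle=False` keeps the last 20% of rows as the test set, which is an alternative worth comparing.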

Linear Regression Model

  • Linear Regressor Accuracy - 92.28
  • Mean Absolute Error - 0.030057
  • Mean Squared Error - 0.0034851
  • Root Mean Squared Error - 0.059
  • R2 - 0.9228
  • LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
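The metrics listed above can be computed as in the following sketch, which uses synthetic data rather than the Walmart features. The reported accuracy (92.28) matches R2 (0.9228) times 100, so the sketch assumes "accuracy" means `model.score(...) * 100`; that reading is an inference, not something the README states. The `normalize` argument shown in the printed estimator was removed in scikit-learn 1.2, so it is left out here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data in the 0-1 range, standing in for the scaled features.
rng = np.random.default_rng(0)
X = rng.random((500, 3))
y = X @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 0.05, 500)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

mae = mean_absolute_error(y_test, pred)
mse = mean_squared_error(y_test, pred)
rmse = np.sqrt(mse)                           # RMSE is the square root of MSE
r2 = r2_score(y_test, pred)
accuracy = model.score(X_test, y_test) * 100  # score() returns R^2 for regressors

print(f"MAE={mae:.4f} MSE={mse:.4f} RMSE={rmse:.4f} R2={r2:.4f} acc={accuracy:.2f}")
```

Because the target was scaled to the 0 to 1 range, the error values above are in scaled units, not in dollars.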

Random Forest Regression Model

  • Random Forest Regressor Accuracy - 97.889
  • Mean Absolute Error - 0.015522
  • Mean Squared Error - 0.000953
  • Root Mean Squared Error - 0.03087
  • R2 - 0.9788
  • n_estimators - 100
  • RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse', max_depth=None, max_features='auto', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=None, oob_score=False, random_state=None, verbose=0, warm_start=False)

K Neighbors Regression Model

  • K Neighbors Regressor Accuracy - 91.9726
  • Mean Absolute Error - 0.0331221
  • Mean Squared Error - 0.0036242
  • Root Mean Squared Error - 0.060202
  • R2 - 0.919921
  • Neighbors - 1
  • KNeighborsRegressor(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=None, n_neighbors=1, p=2, weights='uniform')
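With `n_neighbors=1` each prediction copies the target of the single closest training row, so the model reproduces its training data exactly and only held-out metrics are informative. The sketch below illustrates this on synthetic data; the data and `random_state` are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data in the 0-1 range, standing in for the scaled features.
rng = np.random.default_rng(0)
X = rng.random((400, 3))
y = X.sum(axis=1) + rng.normal(0, 0.1, 400)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# One neighbor, Euclidean distance (minkowski with p=2), uniform weights.
knn = KNeighborsRegressor(n_neighbors=1, p=2, weights="uniform")
knn.fit(X_train, y_train)

train_r2 = knn.score(X_train, y_train)  # each row is its own nearest neighbor
test_r2 = knn.score(X_test, y_test)     # the number that matters
print(f"train R2={train_r2:.3f} test R2={test_r2:.3f}")
```

Since the model is distance based, the MinMaxScaler step matters here: unscaled columns with large ranges would dominate the neighbor search.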

XGBoost Regression Model

  • XGBoost Regressor Accuracy - 94.21152
  • Mean Absolute Error - 0.0267718
  • Mean Squared Error - 0.0026134
  • Root Mean Squared Error - 0.051121
  • R2 - 0.942115235
  • Learning Rate - 0.1
  • n_estimators - 100
  • XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, gamma=0, importance_type='gain', learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=None, n_estimators=100, n_jobs=1, nthread=None, objective='reg:linear', random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None, silent=None, subsample=1, verbosity=1)

Custom Deep Learning Neural Network Model

  • Deep Neural Network accuracy - 90.50328
  • Mean Absolute Error - 0.033255
  • Mean Squared Error - 0.003867
  • Root Mean Squared Error - 0.06218
  • R2 - 0.9144106
  • Kernel Initializer - normal
  • Optimizer - adam
  • Input layer taking 23 features and feeding 64 units, with relu as the activation function
  • 1 hidden layer with 32 nodes
  • Output layer with 1 node
  • Batch Size - 5000
  • Epochs - 100
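The layer sizes above (23 inputs, 64 units, 32 units, 1 output) can be sketched with scikit-learn's `MLPRegressor`. This is a stand-in, not the notebook's network: the initializer and optimizer names suggest the original was built with Keras, and `MLPRegressor` does not expose a kernel initializer. The synthetic data and the reduced batch size are assumptions made so the example runs quickly.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic data with 23 input features, matching the input dimension above.
rng = np.random.default_rng(0)
X = rng.random((2000, 23))
y = X[:, :5].mean(axis=1) + rng.normal(0, 0.01, 2000)

model = MLPRegressor(
    hidden_layer_sizes=(64, 32),  # 64 units, then 32 units; 1 output is implied
    activation="relu",
    solver="adam",
    batch_size=500,               # README uses 5000; reduced for the toy data
    max_iter=100,                 # epochs
    random_state=0,               # assumption, for reproducibility
)
model.fit(X, y)

shapes = [w.shape for w in model.coefs_]
n_params = sum(w.size for w in model.coefs_) + sum(b.size for b in model.intercepts_)
print(shapes, n_params)  # [(23, 64), (64, 32), (32, 1)] 3649
```

The parameter count follows from the layer sizes: 23*64 + 64 for the first layer, 64*32 + 32 for the second, and 32 + 1 for the output.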

Comparing Models

  • Linear Regressor Accuracy - 92.280797
  • Random Forest Regressor Accuracy - 97.889071
  • K Neighbors Regressor Accuracy - 91.972603
  • XGBoost Accuracy - 94.211523
  • DNN Accuracy - 90.503287
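The comparison above can be ranked with a few lines of Python. The numbers are copied from the list; the short model labels are just illustrative keys.

```python
# Accuracy values as reported in the comparison list above.
scores = {
    "Linear Regression": 92.280797,
    "Random Forest": 97.889071,
    "K Neighbors": 91.972603,
    "XGBoost": 94.211523,
    "DNN": 90.503287,
}

# Sort from best to worst accuracy.
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for rank, (name, acc) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {acc:.2f}")
```

Random Forest ranks first and the DNN last, with XGBoost, Linear Regression and K Neighbors in between.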
