Code Monkey home page Code Monkey logo

food-sales-predictions's Introduction

GROCERY-SALES-PROJECTIONS

PREDICTING SALES VOLUME AT VARIOUS STORES

Jonathan Jones

22.04.29

This project aims to help the retailer understand the properties of their products and outlets that play a key role in predicting sales.

DATA

This data set encompasses various features associated with revenue including, but not limited to, the type and quantity of items sold, the type, size, and location of the outlets or stores that the items are sold to, the maximum retail price of each item, and the individual outlet sales for each item. It is important to separate, regroup and isolate these elements so that they can collated, correlated and analyzed against current sales. The insights from the evaluation can then be used to determine a strategy for revenue growth.

image

VISUAL ANALYSIS

Once the data is processed and free of duplicates, missing, values and other anomalies it can be used for visual analysis. Graphs and heatmaps are used to organize, and simply huge quantities of data and illuminate trends.

image

When comparing the Item Type Counts with Item Type by Store Sales we can readily see that the average sales per item are similar despite vast differences in item quantity. A focus on the sale of specific items, particularly items with higher retail prices, may be a good place to concentrate resources.

image

Heat maps can help with concentrating our focus on elements that share a relationship. They can also be used to affirm our initial insights; there is significant positive correlation between Max Retail Price and Item Outlet sales.

MACHINE LEARNING & MODELING

Machine learning is used to streamline, expedite and enhance the forecasting process by removing repetitive manual procedures. Automating algorithms or programs allow for higher efficiency, deeper insights, and higher profitability. In the Machine Learning section, we use a copy of the unadulterated data as a basis and instructive training tool for our models. Two different models are used with this training data to draw inferences and detail elemental relationships between features or columns so that they will be able to interact with, and perform similar analysis on testing or unseen data.

INSIGHTS & EVALUATION

Both the Linear Regression and Regression Tree models exhibit high bias or underfitting, meaning that they have difficulty formulating predictions on training and testing data regardless of which featrues are used from the current matrix. Their respective test and training scores are close in value but ultimately too low to inspire confidence in their performance.

BUSINESS RECOMMENDATIONS

I would recommend the Regression Tree model over the Linear Regression model even though the difference in their performance, as measured by Coefficient of Determination or R2, is slight. However, I would ask the stakeholders to prioritize the production and collection of more data for either models training. Both models performed poorly, and I believe that the main issue is the lack of data and/or the quantity and types of features used to predict future values for our target, Item Outlet Sales.

SUMMARY

Observation: It is evident from the correlation matrix and model test scores that there are few features that are positively correlated with our target. After Item Maximum Retail Price, with a positive correlation value of 0.57, the strongest correlation with our target is Item weight with a value of 0.012 โ€“ this is an insignificant correlation. Without 3 or more strongly correlated features the performance of most models would be in question. Several of the columns provided in the dataset are inherently ineffective in efforts to predict sales outcomes. Item Identifier, Outlet Identifier and Outlet Establishment Year are such columns.

Proposed Solution: I would suggest that the stakeholders explore metrics related to foot traffic, neighboring/anchor tenants, and demographical metrics (populations size, average age, marital status, number of children, education level, income, occupation, etc.). These metrics would help the models and stakeholders learn more about the habits and preferences of their local customers. The addition of these new features in conjunction with more data for the current feature columns would drastically improve the complexity, and predictive capacity of the current models, laying the foundation for definitive decision making and confidence capital re-allocation.

REFERENCES

food-sales-predictions's People

Contributors

starkjones avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.