BIKE-SHARING-DATASET-Analysis

Bike Sharing Dataset Analysis

Abstract

The Washington D.C. bike share service began operating in 2008 and was the first such service in North America. The corporation that runs the service needs a projection of the number of bike rentals per hour. This report addresses that need by using historical data to test a number of regression models and determine the best algorithm for forecasting hourly bike rentals. A total of five scenarios were modeled, and based on the success criterion the Random Forest model achieved the required results. The report describes the methodology that was followed and the other models that were tested, and it includes a portion of the code and of the visualizations that were created. Attached to this report is an HTML file exported from Jupyter Notebook, which contains all the code used for data understanding, feature engineering and modelling. The report, the Jupyter Notebook file and the data can be found on GitHub at https://github.com/Salampop/BIKE-SHARING-DATASET-Analysis

Introduction

“Life connected by pedal strokes.” (Press Kit, 2020) This is the vision of Capital Bikeshare, a bike sharing service in Washington DC that provides enjoyable and environmentally friendly transport from point A to point B. Washington DC was the first city in North America to offer a bike sharing service. The service started in August 2008 with 120 bikes and 10 stations. Since then it has grown and spread into seven localities in and around Washington DC, and it now has 4,300 bikes at 500 stations, which represents 50% growth year over year (Press Kit, 2020). One of the many challenges the operator faces is predicting bike usage, both for future capital and operating budgets and for deciding where to locate future bike racks and the bikes themselves.
The algorithm developed for Capital Bikeshare will help determine how the bike sharing service may expand or contract in the future. The members of the team involved in this study are Sam Fawzi, Jinping Bai, Krishna Kiruba, Leolein Paouchi, and Paul Flemming. The team brings together a wide variety of experience from business, finance, logistics, engineering and IT, which enables it to analyze the project from different perspectives.

Background

The datasets provided for this study (the Bike Sharing Dataset Data Set) contain usage statistics in two files: one records usage per hour and the other usage per day. Both files include fields for weather, working days, type of user (registered or casual) and the total usage count. The data covers the years 2011 and 2012. One issue with this dataset is that it is eight years old: to provide an algorithm that predicts usage for the rest of 2020 and into 2021, the existing dataset should be replaced with more recent data from 2018 and 2019. This report also does not take into account the year-over-year increase in usage; accounting for it would require a much larger dataset covering more years.
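The hourly file and the planned data-understanding checks (missing values, outliers, independence of variables) are described only in prose. A minimal pandas sketch of loading the file and running those checks might look as follows. It assumes the hourly file is the UCI `hour.csv` with its standard columns (`season`, `hr`, `weathersit`, `temp`, `cnt` and so on); the helper names, the 1.5×IQR outlier rule and the 0.8 correlation cut-off are illustrative choices, not taken from the report's notebook.

```python
import pandas as pd

# Assumption: the hourly file is the UCI "hour.csv" with these standard columns.
FEATURES = ["season", "yr", "mnth", "hr", "holiday", "weekday",
            "workingday", "weathersit", "temp", "atemp", "hum", "windspeed"]
TARGET = "cnt"  # total rentals in the hour (casual + registered)


def load_hourly(source):
    """Read the hourly usage file (path or file-like) and parse the date column."""
    return pd.read_csv(source, parse_dates=["dteday"])


def missing_counts(df):
    """Number of missing values in each column."""
    return df.isna().sum()


def iqr_outliers(series, k=1.5):
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


def high_correlations(df, columns, threshold=0.8):
    """Column pairs whose absolute Pearson correlation exceeds the threshold.

    Pairwise Pearson correlation only detects linear dependence between two
    variables, so it is a first check on independence, not a complete one.
    """
    corr = df[columns].corr().abs()
    pairs = []
    for i, a in enumerate(columns):
        for b in columns[i + 1:]:
            if corr.loc[a, b] > threshold:
                pairs.append((a, b, round(float(corr.loc[a, b]), 3)))
    return pairs
```

For example, `df = load_hourly("hour.csv")` followed by `missing_counts(df)`, `iqr_outliers(df["cnt"]).sum()` and `high_correlations(df, FEATURES)` gives a first pass over the three checks; a flagged pair suggests keeping only one of the two predictors before modelling.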

Objective

The objective of this study is to develop an algorithm that predicts the number of system users per hour. The process includes examining the variables and deciding whether each one is required for modelling, looking for outliers in the data, checking for missing data and checking whether the variables are independent of each other. A variety of models will initially be tested to determine the best way to develop the desired algorithm: Decision Tree, Random Forest and Linear Regression, evaluated with a train/test split. A model with a score greater than 0.90 will be considered successful. All code for data understanding, data preparation and modelling is provided in a separate HTML file generated from a Jupyter Notebook, using Python as the programming language. Selected tables and graphs are included in this report.
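The Objective names the candidate models and a 0.90 success threshold but not how they are compared. A minimal scikit-learn sketch of such a comparison follows. It assumes scikit-learn is the library used (the report only says Python), default hyperparameters, an 80/20 train/test split, and R² on the held-out split as the "score"; the function names are hypothetical, and this is not the notebook's actual code, nor does it reproduce the report's results.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# From the Objective: a score greater than 0.90 counts as a successful model.
SUCCESS_THRESHOLD = 0.90


def compare_models(X, y, test_size=0.2, random_state=42):
    """Fit each candidate on the training split and return its held-out R^2."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    candidates = {
        "Linear Regression": LinearRegression(),
        "Decision Tree": DecisionTreeRegressor(random_state=random_state),
        "Random Forest": RandomForestRegressor(
            n_estimators=100, random_state=random_state),
    }
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        # For scikit-learn regressors, .score() is the coefficient of determination R^2.
        scores[name] = model.score(X_test, y_test)
    return scores


def successful(scores, threshold=SUCCESS_THRESHOLD):
    """Names of the models whose held-out score is strictly above the threshold."""
    return [name for name, s in scores.items() if s > threshold]
```

With the hourly data, `X` would hold the calendar and weather columns and `y` the hourly `cnt`; `successful(compare_models(X, y))` then lists the models that clear the threshold. Fixing `random_state` keeps the split and the tree-based models reproducible across runs, so differences in score reflect the models rather than the random draw.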
