
BIKE-SHARING-DATASET-Analysis

Bike sharing dataset analysis

Abstract

The Washington D.C. bike share service began operating in 2008 and was the first such service in North America. One of the things the corporation that runs the service requires is a projection of the number of bike rentals per hour. This report addresses that need by using historical data to test a number of regression models and determine the best algorithm for forecasting hourly bike rentals. A total of five scenarios were modeled and, based on the success criteria, the Random Forest model achieved the required results. The report discusses the methodology that was followed and the other models that were tested, and it includes a portion of the code and of the visualizations that were created. Attached to this report is an HTML file created with Jupyter Notebook, which includes all the code used for data understanding, feature engineering and modelling. This report, the Jupyter Notebook file and the data can be found on GitHub at the following web address: https://github.com/Salampop/BIKE-SHARING-DATASET-Analysis

Introduction

“Life connected by pedal strokes.” (Press Kit, 2020) This is the vision of Capital Bikeshare, a bike sharing service in Washington DC that provides enjoyable and environmentally friendly transport from point A to point B. Washington DC was the first city in North America to offer a bike sharing service. The service started in August 2008 with 120 bikes and 10 stations. Since then it has grown and spread into seven localities in and around Washington DC, and it now has 4,300 bikes at 500 stations. This is a 50% growth year over year. (Press Kit, 2020) One of the many challenges the service faces is predicting bike usage, both for future capital and operating budgets and for deciding where to place future bike racks and bikes.
The algorithm developed for Capital Bikeshare will help in determining how the bike sharing service will expand or contract in the future.

The participants of the team involved in this study are Sam Fawzi, Jinping Bai, Krishna Kiruba, Leolein Paouchi, and Paul Flemming. The team brings together a wide variety of experience from the fields of business, finance, logistics, engineering and IT. This combined experience enables the team to analyze the project from different perspectives.

Background

The datasets (Bike Sharing Dataset Data Set) provided for this study are usage statistics on a daily and an hourly basis. Both files include fields for weather, working days and type of user (registered or not). One file gives the total usage count per hour and the other gives the total usage count per day. The data cover the years 2011 and 2012.

One issue with this dataset is that it is eight years old. To provide an algorithm that will predict usage for the rest of 2020 and into 2021, the existing dataset should be replaced with more recent data from 2018 and 2019. This report also does not take into account the year-over-year increase in usage. Accounting for that increase would require a much larger dataset covering a greater number of years.

Objective

The objective of this study is to develop an algorithm that predicts the number of system users per hour. The process includes reviewing the variables and deciding whether each is required for modelling, looking for outliers, checking for missing data, and checking whether the variables are independent of one another. A variety of models will initially be tested to determine the best way to develop the desired algorithm: Decision Tree, Random Forest and Linear Regression, each evaluated with a train/test split. A model with a score greater than 0.90 will be considered successful. All code for data understanding, data preparation and modelling is provided in a separate HTML file generated from a Jupyter Notebook, using Python as the programming language. Select tables and graphs are included in this report.
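The workflow described in the Objective can be sketched roughly as follows. This is a minimal illustration and not the code from the project's notebook. It assumes scikit-learn, pandas and NumPy are available, it uses synthetic data with invented column names (`hour`, `temp`, `workingday`, `count`) instead of the real hourly file, and it assumes that the "score" in the success criterion is the R² value returned by scikit-learn's `score` method.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

SUCCESS_THRESHOLD = 0.90  # success criterion from the Objective (assumed to be R^2)


def make_synthetic_hourly_data(n_rows=2000, seed=0):
    """Build a stand-in for the hourly file; columns and values are invented."""
    rng = np.random.default_rng(seed)
    hour = rng.integers(0, 24, n_rows)
    temp = rng.uniform(0.0, 30.0, n_rows)
    workingday = rng.integers(0, 2, n_rows)
    # Two commuter peaks (08:00 and 17:00), stronger on working days.
    peaks = 200 * np.exp(-((hour - 8) ** 2) / 8) + 200 * np.exp(-((hour - 17) ** 2) / 8)
    count = 50 + peaks * (0.5 + 0.5 * workingday) + 5 * temp + rng.normal(0, 10, n_rows)
    return pd.DataFrame(
        {"hour": hour, "temp": temp, "workingday": workingday, "count": count.clip(min=0)}
    )


def data_checks(df, target="count"):
    """Count missing values and IQR outliers in the target; correlate the features."""
    missing = int(df.isna().sum().sum())
    q1, q3 = df[target].quantile([0.25, 0.75])
    iqr = q3 - q1
    is_outlier = (df[target] < q1 - 1.5 * iqr) | (df[target] > q3 + 1.5 * iqr)
    feature_corr = df.drop(columns=[target]).corr()
    return missing, int(is_outlier.sum()), feature_corr


def compare_models(df, target="count", seed=0):
    """Fit each candidate on a training split and score it on the held-out split."""
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    models = {
        "linear_regression": LinearRegression(),
        "decision_tree": DecisionTreeRegressor(random_state=seed),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    # For regressors, .score() returns R^2 on the data passed in.
    return {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }


df = make_synthetic_hourly_data()
missing, outliers, feature_corr = data_checks(df)
print(f"missing values: {missing}, target outliers (IQR rule): {outliers}")
print(feature_corr.round(2))

scores = compare_models(df)
for name, r2 in scores.items():
    status = "meets" if r2 > SUCCESS_THRESHOLD else "misses"
    print(f"{name}: R^2 = {r2:.3f} ({status} the {SUCCESS_THRESHOLD} threshold)")
```

The scores printed here say nothing about the real dataset. They only show how the 0.90 criterion could be applied uniformly to each candidate once the actual hourly data is loaded in place of the synthetic frame.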
