Rapids Blog Post Repository

This repo contains all of the code used in our comparison of RAPIDS, scikit-learn, Spark, and pandas, which is featured here: https://medium.com/sfu-cspmp/rapids-the-future-of-gpu-data-science-9e0524563019

The Code

ETL Timing Viz.ipynb \ ETL Cost.ipynb \ ML Timing Viz.ipynb \ ML Cost.ipynb

These notebooks create the graphs used in the Blog Post.

Rapids ETL Timing.ipynb \ Rapids ML Timing.ipynb \ Pandas Timing.ipynb \ Scikit-Learning Timing.ipynb

These notebooks run the experiments and record the timings. The data for these notebooks is linked below.
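
As a rough illustration of how a timing like this could be recorded, the sketch below runs a function several times with `time.perf_counter` and returns a small record. The helper name `time_call`, the record fields, and the example workload are hypothetical and are not taken from the notebooks.

```python
import statistics
import time


def time_call(label, func, *args, repeats=3, **kwargs):
    """Run func(*args, **kwargs) `repeats` times and return a timing record."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return {
        "label": label,
        "repeats": repeats,
        "mean_seconds": statistics.mean(durations),
        "min_seconds": min(durations),
    }


if __name__ == "__main__":
    # Hypothetical workload standing in for an ETL or ML step.
    record = time_call("sum_of_squares", lambda n: sum(i * i for i in range(n)), 100_000)
    print(record)
```

Reporting both the mean and the minimum makes it easier to spot runs that were skewed by warm-up or other one-off costs.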

create_ml_data.py

This script creates the data for the ML experiments. The generated data is also available below (see ml_data.zip).

spark_ml_data_subsets_creation.py \ spark_etl_subsets_creation.py

These PySpark scripts split the data into subsets ahead of time so that the Spark experiments run faster.
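
The idea of preparing subsets ahead of time can be sketched without Spark. The example below is a standard-library stand-in, not the PySpark code in this repo: it writes one CSV per requested size, each holding the first rows of a source file. The function name `write_subsets`, the `subset_<size>.csv` naming, and the "first N rows" rule are assumptions made for illustration.

```python
import csv
from pathlib import Path


def write_subsets(source_csv, out_dir, sizes):
    """Write one CSV per requested size containing the first `size` data rows."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Read the header and all data rows once.
    with open(source_csv, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        rows = list(reader)

    paths = []
    for size in sizes:
        # Hypothetical naming scheme; the repo's scripts may differ.
        path = out_dir / f"subset_{size}.csv"
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows[:size])
        paths.append(path)
    return paths
```

Writing the subsets once means each experiment only has to read a file of the right size, rather than sampling from the full data set on every run.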

spark_ml_tests.py \ spark_etl_tests.py

These PySpark scripts run the experiments using Spark. Submit them as follows:

  • spark-submit spark_etl_tests.py bc_air_monitoring_stations.csv spark_etl_test_subsets spark_etl_results
  • spark-submit spark_ml_tests.py spark_ml_test_subsets spark_ml_results
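
The ETL command above passes three positional arguments. A minimal sketch of how a script could read them is shown below. The argument names (`stations_csv`, `subsets_dir`, `results_dir`) and their meanings are guesses based on the file names in the command, and `parse_etl_args` is a hypothetical helper, not the parsing code in `spark_etl_tests.py`.

```python
import argparse


def parse_etl_args(argv):
    """Parse the three positional arguments shown in the ETL command."""
    parser = argparse.ArgumentParser(prog="spark_etl_tests.py")
    # All three meanings below are assumptions inferred from the names.
    parser.add_argument("stations_csv", help="assumed: monitoring stations CSV")
    parser.add_argument("subsets_dir", help="assumed: directory of prepared subsets")
    parser.add_argument("results_dir", help="assumed: directory for result output")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_etl_args([
        "bc_air_monitoring_stations.csv",
        "spark_etl_test_subsets",
        "spark_etl_results",
    ])
    print(args.stations_csv, args.subsets_dir, args.results_dir)
```

The ML command takes two arguments in the same style, so the same pattern would apply with the first argument dropped.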

The Data

All data used can be found here: https://1sfu-my.sharepoint.com/:f:/g/personal/avickars_sfu_ca/Erj8utK-OatOiN9aOpWZZGABFWtGZyYPm29KrTQuc8_gWw?e=k5DXO9

The results for AWS are located in "AWS Results" in this repo.

Blog post shared Word document

https://1sfu-my.sharepoint.com/:w:/g/personal/avickars_sfu_ca/EQZLCAxGg7tLg7NDNfnv8kMBfDZkvtPgg_VzSj_BRKiP0A?e=gbJOZQ

Contributors

avickars, karthiksrinatha
