Docker for Data Science

Materials for "Docker for Data Science" tutorial presented at PyCon 2018 in Cleveland, OH.

YouTube / Slides


Description

Jupyter notebooks simplify the process of developing and sharing Data Science projects across groups and organizations. However, when we want to deploy our work into production, we need to extract the model from the notebook and package it up with the required artifacts (data, dependencies, configurations, etc.) to ensure it works in other environments. Containerization technologies such as Docker can be used to streamline this workflow.

This hands-on tutorial presents Docker in the context of Reproducible Data Science - from idea to application deployment. You will get a thorough introduction to the world of containers; learn how to incorporate Docker into various Data Science projects; and walk through the process of building a Machine Learning model in Jupyter and deploying it as a containerized Flask REST API.

Audience

This session is geared towards Data Scientists who are interested in learning about Docker and want to understand how to incorporate it in their projects. No prior knowledge of Docker is assumed. Proficiency with Git and the Command Line is not a prerequisite, but will make it easier to follow along.

Upon completion of this tutorial, students will be able to:

  • Navigate the Docker ecosystem with ease
  • Leverage containers as part of their data science workflow
  • Productionize & deploy a Machine Learning model wrapped in an API

Learn how to become a Full-Stack Data Scientist!

Installation Instructions

Step 1: Install Docker and Docker-Compose

Mac

  1. Download Docker for Mac, which includes both Docker and Docker-Compose.

  2. Run the installer.

Linux

  1. Update your package manager.

  2. Use your package manager to install Docker.

  3. Use your package manager to install Docker-Compose.

You might need to add your user account to the docker group.
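
Once the packages are installed, a quick check along the following lines can confirm that both tools are on your PATH and whether your account belongs to the docker group. This is a minimal sketch, not part of the tutorial materials: the helper names check_tool and in_docker_group are hypothetical, and it assumes the group is named docker, as on most Linux distributions.

```shell
# Hypothetical helpers (not part of the tutorial repositories).

# Print whether a command is available on PATH; return non-zero if it is not.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# Succeed if the given user (default: the current user) is in the docker group.
in_docker_group() {
    id -nG "${1:-$(id -un)}" 2>/dev/null | tr ' ' '\n' | grep -qx docker
}

check_tool docker || true
check_tool docker-compose || true

if in_docker_group; then
    echo "docker group: member"
else
    # Assumption: a group named "docker" exists on this system.
    echo "docker group: not a member (try: sudo usermod -aG docker \$USER)"
fi
```

If you add yourself to the group, log out and back in so the new membership takes effect.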

Windows

Note: Windows 10 users can use the Linux subsystem to install Docker and Docker-Compose. Instructions are available in a post we found on Medium.

Please also make sure to install Docker-Compose when you are installing Docker. Then proceed to Step 2.

Otherwise, we have created a VM image. USB sticks with the image will be available at the tutorial.

  1. Download VirtualBox for Windows Hosts.

  2. Download the VirtualBox image containing all required files and containers. We also have USB sticks containing this image to reduce strain on the conference WiFi.

  3. Open VirtualBox Manager.

  4. Go to File > Import Appliance, point to the file you just downloaded, and import it.

  5. Double-click the VM to start an instance.

  6. Login: osboxes | Password: osboxes.org | Root password: osboxes.org

The VM image you download contains the Docker images as well as the repositories, which were cloned to ~/docker-for-data-science.

  1. Update the cloned repos by going into each folder and running git pull. Skip Steps 2 and 3.
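
The update can also be done in one pass with a small loop. The sketch below is illustrative only: update_repos is a hypothetical helper name, and it assumes the repositories sit directly under ~/docker-for-data-science, as described above.

```shell
# Hypothetical helper: run "git pull" in every Git repository directly under a folder.
update_repos() {
    base="${1:-$HOME/docker-for-data-science}"
    for repo in "$base"/*/; do
        # Skip anything that is not a Git working copy.
        if [ -d "${repo}.git" ]; then
            echo "Updating ${repo}"
            git -C "$repo" pull
        fi
    done
}

update_repos "$HOME/docker-for-data-science"
```

Folders without a .git directory are skipped, so the loop does nothing if the base folder is missing or contains no repositories.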

Step 2: Clone Git Repositories

  1. Create a folder for this tutorial. We recommend ~/docker-for-data-science, as this is the folder we use in all of our examples.

  2. cd into the folder.

  3. Download both repositories:

git clone https://github.com/docker-for-data-science/docker-for-data-science-tutorial.git
git clone https://github.com/docker-for-data-science/talkvoter.git
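
If you expect to rerun the setup, the three steps above can be wrapped in a script that skips repositories already on disk. This is a sketch under stated assumptions, not the tutorial's own tooling: clone_if_missing and setup_tutorial are hypothetical names, and the default folder is the recommended ~/docker-for-data-science.

```shell
# Hypothetical helper: clone a repository into a folder unless it is already there.
clone_if_missing() {
    url="$1"
    target="$2"
    # Derive the folder name Git would use, e.g. ".../talkvoter.git" -> "talkvoter".
    name="$(basename "$url" .git)"
    if [ -d "$target/$name" ]; then
        echo "Skipping $name (already present)"
    else
        git clone "$url" "$target/$name"
    fi
}

# Hypothetical wrapper: create the tutorial folder and fetch both repositories.
setup_tutorial() {
    dest="${1:-$HOME/docker-for-data-science}"
    mkdir -p "$dest"
    clone_if_missing https://github.com/docker-for-data-science/docker-for-data-science-tutorial.git "$dest"
    clone_if_missing https://github.com/docker-for-data-science/talkvoter.git "$dest"
}
```

Calling setup_tutorial with no arguments uses the recommended folder. Pass a different path as the first argument to clone elsewhere.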

Step 3: Download Docker Images

Please pre-download Docker images to reduce the strain on the conference WiFi.

  1. cd ~/docker-for-data-science/docker-for-data-science-tutorial/installation_files

  2. Run the shell script: ./download_docker_images.sh

  3. Build images for Talk Recommendation application:

cd ~/docker-for-data-science/talkvoter
docker-compose build
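
Before running Step 3, it can help to confirm that the paths it relies on are in place. The check below is a hedged sketch: check_step3_paths is a hypothetical helper, and the paths are simply the ones named in the steps above, relative to the recommended folder.

```shell
# Hypothetical pre-flight check: confirm the paths used in Step 3 exist.
check_step3_paths() {
    base="${1:-$HOME/docker-for-data-science}"
    status=0
    for path in \
        "$base/docker-for-data-science-tutorial/installation_files/download_docker_images.sh" \
        "$base/talkvoter"; do
        if [ -e "$path" ]; then
            echo "ok: $path"
        else
            echo "missing: $path"
            status=1
        fi
    done
    return "$status"
}

check_step3_paths "$HOME/docker-for-data-science" || echo "Complete Step 2 before running Step 3."
```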

Contributors

alysivji, joejasinski, mandliya, tathagata
