mbrukman / pipeline

This project forked from pipelineai/pipeline

Real-time, End-to-End, Advanced Analytics and Machine Learning Recommendation Pipeline

Home Page: http://pipeline.io

License: Apache License 2.0

Shell 0.44% Vim Script 0.09% ApacheConf 0.02% Python 0.70% XSLT 0.01% C 0.02% Makefile 0.01% C++ 0.35% Scala 1.16% Java 0.16% HTML 0.10% CSS 1.10% JavaScript 1.38% Jupyter Notebook 94.50%

pipeline's Introduction

Follow the Wiki to Set Up the Docker-based Environment

End-to-End, Real-time ML Reference Data Pipeline

Gitter Chat Room

Architecture Overview

Pipeline Architecture Overview

Mapped to Code

Workshop Architecture Overview

Powered by the PANCAKE STACK!

Upcoming Workshops

Title

Building an End-to-End Streaming Analytics and Recommendations Pipeline with Spark, Kafka, and TensorFlow

Agenda (Full Day)

Part 1 (Analytics and Visualizations)

  • Analytics and Visualizations Overview (Live Demo!)
  • Verify Environment Setup (Docker, Cloud Instance)
  • Notebooks (Zeppelin, Jupyter/iPython)
  • Interactive Data Analytics (Spark SQL, Hive, Presto)
  • Graph Analytics (Spark, Elastic, NetworkX, TitanDB)
  • Time-series Analytics (Spark, Cassandra)
  • Visualizations (Kibana, Matplotlib, D3)
  • Approximate Queries (Spark SQL, Redis, Algebird)
  • Workflow Management (Airflow)

Part 2 (Streaming and Recommendations)

  • Streaming and Recommendations (Live Demo!)
  • Streaming (NiFi, Kafka, Spark Streaming, Flink)
  • Cluster-based Recommendation (Spark ML, Scikit-Learn)
  • Graph-based Recommendation (Spark ML, Spark Graph)
  • Collaborative-based Recommendation (Spark ML)
  • NLP-based Recommendation (CoreNLP, NLTK)
  • Geo-based Recommendation (ElasticSearch)
  • Hybrid On-Premise+Cloud Auto-scale Deploy (Docker)
  • Save Workshop Environment for Your Use Cases

Locations and Dates

  • San Francisco: Saturday, April 23rd (SOLD OUT)
  • San Francisco: Saturday, June 4th (SOLD OUT)
  • Washington DC: Saturday, June 18th (SOLD OUT)
  • Los Angeles: Sunday, July 10th (SOLD OUT)
  • Seattle: Saturday, July 30th (SOLD OUT)
  • Santa Clara: Saturday, August 6th (SOLD OUT)
  • Chicago: Saturday, August 27th
  • New York: Saturday, September 24th
  • Barcelona: Saturday, October 1st
  • Munich: Saturday, October 15th
  • London: Saturday, October 22nd
  • Brussels: Saturday, October 29th
  • Oslo: Monday, October 31st
  • Madrid: Saturday, November 19th
  • Tokyo: Saturday, December 3rd
  • Shanghai: Saturday, December 10th
  • Beijing: Saturday, December 17th
  • Hyderabad: Saturday, December 24th
  • Bangalore: Saturday, December 31st
  • Sydney: Saturday, January 7th, 2017
  • Melbourne: Saturday, January 14th, 2017
  • Sao Paulo: Saturday, February 11th, 2017
  • Rio de Janeiro: Saturday, February 18th, 2017

Suggest a City and Date

Description

The goal of this workshop is to build an end-to-end, streaming data analytics and recommendations pipeline on your local machine using Docker and the latest streaming analytics tools.

  • First, we create a data pipeline to interactively analyze, approximate, and visualize streaming data using modern tools such as Apache Spark, Kafka, Zeppelin, iPython, and ElasticSearch.
  • Next, we extend our pipeline to use streaming data to generate personalized recommendation models using popular machine learning, graph, and natural language processing techniques such as collaborative filtering, clustering, and topic modeling.
  • Last, we productionize our pipeline and serve live recommendations to our users!
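To make the recommendation step more concrete, here is a minimal sketch of item-based collaborative filtering, one of the techniques named above. It is written in plain Python for readability and is not the workshop's own code, which uses Spark ML. The toy ratings and the helper names (`item_vectors`, `cosine`, `recommend`) are assumptions made for illustration only.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not workshop data.
ratings = {
    "alice": {"spark": 5.0, "kafka": 4.0},
    "bob": {"spark": 4.0, "kafka": 5.0, "tensorflow": 4.0},
    "carol": {"kafka": 2.0, "tensorflow": 5.0, "zeppelin": 4.0},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating}."""
    vectors = defaultdict(dict)
    for user, items in ratings.items():
        for item, score in items.items():
            vectors[item][user] = score
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=2):
    """Rank unseen items by similarity-weighted ratings of the user's items."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, cand_vec in vectors.items():
        if candidate in seen:
            continue
        score = sum(
            cosine(cand_vec, vectors[item]) * rating
            for item, rating in seen.items()
        )
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("alice", ratings))  # ['tensorflow', 'zeppelin']
```

Each unseen item is scored by how similar its rating pattern is to the items the user has already rated, weighted by those ratings. A production pipeline would more likely rely on a distributed, model-based approach, but the ranking idea is the same.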

Screenshots

Apache Zeppelin Notebooks

Stanford CoreNLP Sentiment Analysis

Jupyter/iPython Notebooks

SparkR Notebooks

TensorFlow Notebooks

Deploy Spark ML and TensorFlow Models into Production with Netflix OSS

Hystrix Dashboard

Apache NiFi Data Flows

Airflow Workflows

Presto Queries

Tableau Integration

Beeline Command-line Hive Client

Log Visualization with Kibana & Logstash

Spark, Spark Streaming, and Spark SQL Admin UIs

Vector Host and Guest (Docker) System Metric UIs

Ganglia System and JVM Metrics Monitoring UIs

Tools Overview

  • Apache Spark
  • Redis
  • Apache Cassandra
  • Apache Kafka
  • NiFi
  • ElasticSearch
  • Logstash
  • Kibana
  • Apache Zeppelin
  • Ganglia
  • Hadoop HDFS
  • iPython Notebook
  • Docker

pipeline's People

Contributors

cfregly, retroryan, velvia, mistobaan, uover82, andyzeli, andypetrella
