Simple Spark Lib

  1. Lets you use the capabilities of Spark without writing Spark code yourself.
  2. Includes several workflows that help you get results in just a few lines of code.
  3. For power users, it lets you tweak every step in the flow.

Prerequisite:

This assumes that you have access to an Apache Spark cluster (and a Cassandra cluster if you are working with the Cassandra workflow).

Installation:

Clone the repo and build with the command:

python setup.py install

Uninstallation:

sudo pip uninstall simple_spark_lib

Usage:

Cassandra Workflow example:

# First, import your libraries
from simple_spark_lib import SimpleSparkCassandraWorkflow

# Define connection configuration for cassandra
cassandra_connection_config = {
  'host':     '192.168.56.101',
  'username': 'cassandra',
  'password': 'cassandra'
}

# Define Cassandra Schema information
cassandra_config = {
  'cluster': 'rootCSSCluster',
  'tables': {
    'api_events': 'simpl_events_production.api_events',
    # <alias of table> : <keyspace>.<table_name>
    # (Spark's temporary table name) : Cassandra's config
  }
}

# Initiate your workflow
workflow = SimpleSparkCassandraWorkflow(appName="Simple Example Worker")

# Setup the workflow with configurations
workflow.setup(cassandra_connection_config, cassandra_config)

# Run your favourite query
df = workflow.process(query="SELECT * FROM api_events")

# Print the first 10 rows (the call form works on both Python 2 and Python 3)
print(df.take(10))

Run this example with the command:

simple-runner filename.py -d cassandra
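
The two configuration dictionaries in the example can be read as follows. This is only a sketch, not the library's actual implementation: the helper names `split_table_config` and `connector_options` are hypothetical, and the assumption that the connection settings end up as Spark Cassandra Connector properties (`spark.cassandra.connection.host`, `spark.cassandra.auth.username`, `spark.cassandra.auth.password`) is ours rather than something this README states.

```python
def split_table_config(tables):
    """Split each '<keyspace>.<table_name>' value into its parts.

    Returns a list of (alias, keyspace, table_name) tuples, sorted by alias.
    Hypothetical helper; not part of simple_spark_lib.
    """
    parsed = []
    for alias, qualified_name in sorted(tables.items()):
        keyspace, sep, table = qualified_name.partition('.')
        if not sep or not keyspace or not table:
            raise ValueError(
                "expected '<keyspace>.<table_name>', got %r" % qualified_name)
        parsed.append((alias, keyspace, table))
    return parsed


def connector_options(connection_config):
    """Map the connection dict onto Spark Cassandra Connector property names.

    Assumption: the workflow passes these on to Spark; the README does not say.
    """
    return {
        'spark.cassandra.connection.host': connection_config['host'],
        'spark.cassandra.auth.username': connection_config['username'],
        'spark.cassandra.auth.password': connection_config['password'],
    }


tables = {'api_events': 'simpl_events_production.api_events'}
print(split_table_config(tables))
# [('api_events', 'simpl_events_production', 'api_events')]
```

In this reading, the alias (`api_events`) is the name you use in the SQL query, while the keyspace and table name identify where the data lives in Cassandra.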

Contributors:

rootcss
