Code Monkey home page Code Monkey logo

pdk's Introduction

pdk

CircleCI Go Report Card GoDoc

Pilosa Dev Kit - implementation tooling and use case examples are here!

Documentation is here: https://www.pilosa.com/docs/pdk/

Requirements

Installing

We assume you are on a UNIX-like operating system. Otherwise adapt the following instructions for your platform. We further assume that you have a Go development environment set up. You should have $GOPATH/bin on your $PATH for access to installed binaries.

  • go get github.com/pilosa/pdk
  • cd $GOPATH/src/github.com/pilosa/pdk
  • make install

Running Tests

To run unit tests make test

To run unit and integration tests, first, install and start the Confluent stack:

  1. Download tarball here: https://www.confluent.io/download
  2. Decompress, enter directory, then,
  3. Run ./bin/confluent start kafka-rest Now that's running, you can do make test TESTFLAGS="-tags=integration"

Taxi usecase

To get started immediately, run this:

pdk taxi

This will create and fill an index called taxi, using the short url list in usecase/taxi/urls-short.txt.

If you want to try out the full data set, run this:

pdk taxi -i taxi-big -f usecase/taxi/greenAndYellowUrls.txt

There are a number of other options you can tweak to affect the speed and memory usage of the import (or point it to a remote pilosa instance). Use pdk taxi --help to see all the options.

Note that this url file represents 1+ billion columns of data - depending on your hardware this will probably take well over 3 hours, and consume quite a bit of memory (and CPU). You can make a file with fewer URLs if you just want to get a sample.

After importing, you can try a few example queries at https://github.com/alanbernstein/pilosa-notebooks/blob/master/taxi-use-case.ipynb .

Net usecase

To get started immediately, run this:

pdk net -i en0

which will capture traffic on the interface en0 (see available interfaces with ifconfig).

SSB

The Star Schema Benchmark is a benchmark based on TPC-H but tweaked for a somewhat difference use case. It has been implemented by some big data projects such as https://hortonworks.com/blog/sub-second-analytics-hive-druid/ .

To execute the star schema benchmark with Pilosa, you must.

  1. Generate the SSB data at a particular scale factor.
  2. Import the data into Pilosa.
  3. Run the demo-ssb application for convenience which has all of the SSB queries pre-written.

Generating SSB data

Use https://github.com/electrum/ssb-dbgen.git to generate the raw SSB data. This can be a bit finicky to work with - hit up @tgruben for tips (or maybe he'll update this section ๐Ÿ˜‰.

When generating the data, you have to select a particular scale factor - the size of the generated data will be about 600MB * SF(scale factor), so SF=100 will generate about 60GB of data.

Import data into Pilosa.

Use pdk ssb to import the data into Pilosa. You must specify the directory containing the .tbl files generated in the first step as well as the location of your pilosa cluster. There are a few other options which you can tweak which may help import performance. See pdk ssb -h for more information.

Run demo-ssb

This repo https://github.com/pilosa/demo-ssb.git contains a small Go program which packages up the different queries which comprise the benchmark. Running demo-ssb starts a web server which executes queries against pilosa on your behalf. You can simply run (e.g.) curl localhost:8000/query/1.1 to run an SSB query.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.