
Burst

The Burst Behavioral Analysis Engine

Burst was designed to support fast, rich, and flexible behavioral study of the enormous and noisy real-world event datasets generated by mobile applications as they are used day to day by their end users over long periods of time. It was developed at a small mobile application analytics startup called Flurry, later bought by Yahoo and then Verizon Media, and has been serving production-scale request workloads (for free!) 24x7 to its customers for many years. It is very good at what it does and is now available to you as an open source platform.

is Burst for you?

We suggest you start any effort to understand Burst by first taking a look at how we define Behavioral Analysis. That overview is there to help you quickly decide whether your data, and the questions you want to ask of it, match what Burst does well. Then we suggest turning to the overviews of the Burst Data Model, the Burst Execution Model, and finally the Burst Runtime Model. These high level presentations should give you a cleaner and deeper sense of what Burst is, how it works, and how you might envision it working for you. For extra credit, dig into a unique approach that Burst takes called the Single Pass Scan, as well as the high level discussions of Performance, Security, and Sampling.

what Burst is not

Equally important is to spend a moment clarifying what Burst is not...

a database

Burst is not a general purpose query engine, nor is it a persistent, authoritative, or transactional database. It is an online analysis engine that scans imported data snapshots. To analyze your data you must first import a dataset from your data storage system into the memory/disk cache of a suitable Burst compute Cell, where you can then run one or more analysis requests across that data snapshot.
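
To make that lifecycle concrete, here is a minimal sketch of the import-then-analyze flow. The client type, method names, host, port, and dataset identifiers below are hypothetical placeholders for illustration only, not the actual Burst API.

```scala
// Hypothetical sketch: import a dataset snapshot into a cell's cache, then
// run analyses against it. `CellClient`, `importView`, and `executeAnalysis`
// are illustrative names only, not the real Burst client API.
object SnapshotLifecycleSketch {

  final case class CellClient(host: String, port: Int) {
    // 1) trigger a (re)load of the dataset snapshot from the remote
    //    datasource into this cell's memory/disk cache
    def importView(domain: String, view: String): Boolean = {
      println(s"importing $domain/$view into cell cache at $host:$port")
      true
    }

    // 2) run an analysis request against the cached snapshot
    def executeAnalysis(domain: String, view: String, source: String): String = {
      println(s"scanning cached snapshot $domain/$view")
      "<result-set>"
    }
  }

  def main(args: Array[String]): Unit = {
    val cell = CellClient("burst-master.example.com", 8080) // hypothetical endpoint
    if (cell.importView("myDomain", "last30Days"))
      println(cell.executeAnalysis("myDomain", "last30Days", "select count(user) ..."))
  }
}
```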

'real time'

Burst does not support what could be considered real time or streaming data access. It does, however, provide services and protocols that can be used to build efficient, massively parallel import pipelines that fetch up-to-date data quickly. The data Burst analyzes will only be as current as the last import. Burst has features that allow you to control the time (lookback) window of the 'view' you import...
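
As a simple illustration of a lookback window, the sketch below computes the earliest event timestamp a 30-day view would cover. The 30-day figure is an arbitrary example, and wiring this value into an actual Burst view definition is not shown here.

```scala
import java.time.{Duration, Instant}

// Illustration only: a 'lookback' window limits an imported view to events
// newer than some cutoff. The duration here is an arbitrary example value.
object LookbackSketch {
  def main(args: Array[String]): Unit = {
    val lookback      = Duration.ofDays(30)            // how far back the view reaches
    val earliestEvent = Instant.now().minus(lookback)  // cutoff for imported events
    println(s"the view would import events with timestamps >= $earliestEvent")
  }
}
```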

conformant SQL

Burst has a rich front end language called EQL, and where possible we have tried to make that language conform to, and look the same as, SQL. However, though the world of behavioral data and questions significantly overlaps the world of relational data models and relational calculus, EQL and its underlying semantics are simply not the same as SQL and its underlying semantics, nor are they intended to be. If you need, or simply prefer, true and full ANSI SQL, then Burst will disappoint.
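
To give a flavor of that overlap, the sketch below embeds the sort of SQL-looking analysis source you might submit. The exact EQL grammar and the 'Unity' schema name are assumptions based on the description above, not verified syntax; note how the query navigates a nested user -> sessions -> events object tree rather than joining flat relational tables.

```scala
// Illustration only: an SQL-flavored, EQL-style query held as a plain string.
// The grammar and the 'Unity' schema name are assumptions; consult the EQL
// documentation for the real syntax.
object EqlFlavorSketch {
  val exampleSource: String =
    """select count(user) as users,
      |       count(user.sessions) as sessions,
      |       count(user.sessions.events) as events
      |from schema Unity
      |""".stripMargin

  def main(args: Array[String]): Unit = println(exampleSource)
}
```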

prerequisites

To stand up a Burst compute cell and use it to analyze your data, you will need these basics:

  1. A Burst Compute Cell: One or more Linux server nodes, each with a Java runtime, on which to set up a suitable Burst Master/Worker process or container topology (e.g. Kubernetes). The Burst runtime deployment is at its core a small number of uber jar binary artifacts. These can be placed into virtual containers or any other packaging/deployment environment appropriate to your needs. Within reasonable limits, you can scale Burst up to service larger datasets and provide faster computations by adding more, or more capable, Worker hardware. For Worker node hardware it is generally helpful to have many cores to speed up analysis, plus a disk system fast and large enough for the Burst Worker data cache. The cache works best with SSD hardware, though Burst will stripe data across multiple magnetic disk spindles quite handily.
  2. Metadata Catalog: A SQL database, reachable over JDBC, that serves as the Burst Catalog for metadata storage. For most scenarios this DB does not need to be particularly high performance, though for high or concurrent analysis request rates it should provide low latency indexed table lookups (a minimal connectivity check is sketched just after this list).
  3. Remote Datasource: A datasource system/cluster, with access to your data, where the Burst Java remote data import endpoint can be stood up. This can be colocated on the Burst compute cell. If you have a parallel (multi-node) data storage system such as HBase, the Burst data import system is quite good at spreading remote data feed endpoints across numerous data nodes.
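
As mentioned in item 2 above, the Catalog only needs an ordinary JDBC connection. Below is a minimal connectivity check, assuming a MySQL-compatible catalog database; the URL, database name, and credentials are hypothetical and should be replaced with whatever your deployment actually uses.

```scala
import java.sql.DriverManager

// Minimal sketch: verify the Catalog database is reachable over JDBC.
// The URL, schema name, user, and password source are hypothetical.
object CatalogConnectivityCheck {
  def main(args: Array[String]): Unit = {
    val url  = "jdbc:mysql://catalog-db.example.com:3306/burst_catalog"
    val user = "burst"
    val pass = sys.env.getOrElse("CATALOG_PASSWORD", "")

    val conn = DriverManager.getConnection(url, user, pass)
    try {
      // a trivial round trip; for concurrent analysis workloads the catalog
      // mostly needs low latency on small indexed lookups, not raw throughput
      val rs = conn.createStatement().executeQuery("select 1")
      if (rs.next()) println(s"catalog reachable, got ${rs.getInt(1)}")
    } finally conn.close()
  }
}
```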

digging deeper

If you are still with us, and you want to understand and/or vet the implementation, we suggest you take a look at the individual subsystem documentation, as well as become familiar with our external dependencies.

next steps

If you want to get up close and personal, we have a few more steps for you to take...

