
Plywood

Plywood is a JavaScript library that simplifies building interactive visualizations and applications for large data sets. Plywood acts as a middle-layer between data visualizations and data stores.

Plywood is architected around the principles of nested Split-Apply-Combine, a powerful divide-and-conquer algorithm that can be used to construct all types of data visualizations. Plywood comes with its own expression language where a single Plywood expression can translate to multiple database queries, and where results are returned in a nested data structure so they can be easily consumed by visualization libraries such as D3.js.
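The nested Split-Apply-Combine idea above can be illustrated in plain JavaScript (this is just an illustrative sketch; Plywood itself expresses this declaratively via its expression language):

```javascript
// Minimal plain-JavaScript illustration of Split-Apply-Combine:
// split rows by a key, apply an aggregate to each group, combine results.
function splitApplyCombine(rows, key, apply) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row[key])) groups.set(row[key], []);
    groups.get(row[key]).push(row);
  }
  return [...groups].map(([value, group]) => ({ [key]: value, ...apply(group) }));
}

const rows = [
  { page: 'home', visits: 3 },
  { page: 'home', visits: 2 },
  { page: 'about', visits: 1 },
];

const result = splitApplyCombine(rows, 'page', group => ({
  totalVisits: group.reduce((sum, r) => sum + r.visits, 0),
}));
// result: [{ page: 'home', totalVisits: 5 }, { page: 'about', totalVisits: 1 }]
```

Plywood applies this same pattern recursively, so a split inside a split produces the nested data structures that visualization libraries consume.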

You can use Plywood in the browser and/or in node.js to easily create your own visualizations and applications.

Plywood also acts as an advanced query planner for Druid, determining the optimal way to execute Druid queries.

Installation

To use Plywood from npm, simply run: npm install plywood.

Plywood can also be used in the browser.

Documentation

To learn more, see http://plywood.imply.io

Questions & Support

For updates about new and upcoming features follow @implydata on Twitter.

Please file bugs and feature requests by opening an issue on GitHub, and direct all questions to our user groups.

plywood's People

Contributors

asherbitter, cheddar, chrismclaughlin55, evasomething, fjy, gianm, jgoz, longweiquan, lorem--ipsum, mattallty, mcbrewster, mujinss, pengz-imply, qsss, tylerreece22, vogievetsky, wylieallen-i


plywood's Issues

Time range: accept an Array instead of an object

Right now, if I want to compare two different timeframes for the same object id, I have to use a hack: duplicate the query and send it twice with Pivot. Is there any way to do it by sending just an array of two (or more) time ranges and letting Plywood do its magic?

something like:

{
   "action":"in",
   "expression":{
      "op":"literal",
      "value":[{
         "start":"2015-12-26T00:01:00.000Z",
         "end":"2015-12-27T00:01:00.000Z"
      },{
         "start":"2015-12-26T00:01:00.000Z",
         "end":"2015-12-27T00:01:00.000Z"
      }],
      "type":"TIME_RANGE"
   }
}

Subquery Equality Filtering on Computed Column Failing

Plywood.ply()
      .apply("my_datasource", $("my_datasource")
          .filter(
            $("timestamp").in({
              start: new Date("2018-01-01"),
              end: new Date("2018-02-16")
            })
      ))
      .apply('visitorTypes', $("my_datasource")
             .split({ UserId: '$user__id' })
             .apply('user__is_new', $("my_datasource").max('$user__is_first_session')))
      .apply('data', $("my_datasource")
             .filter($('user__id').in(
                $('visitorTypes').filter('$user__is_new == 0').collect($('UserId'))
              ))
             .count())

Created a POST request of

{  
   "method":"POST",
   "url":"https://example.com/druid/v2/",
   "body":{  
      "queryType":"timeseries",
      "dataSource":"my_datasource",
      "intervals":"2018-01-01T00Z/2018-02-18T19:35:30.768Z",
      "granularity":"all",
      "context":{  
         "timeout":10000
      },
      "filter":{  
         "type":"or",
         "fields":[  

         ]
      },
      "aggregations":[  
         {  
            "name":"__VALUE__",
            "type":"count"
         }
      ]
   },
   "headers":{  
      "Content-type":"application/json"
   }
}

Which makes Druid throw

Error: Unknown exception: Instantiation of [simple type, class io.druid.query.filter.OrDimFilter] value failed: OR operator requires at least one field (through reference chain: io.druid.query.filter.OrDimFilter["fields"])

Since the dataset that $user__is_new is a member of is in-memory, Plywood should filter in-memory instead of passing the filter through to Druid.

Also of note: if I use a quantile filter instead of an equality filter, e.g.

 .filter($('ad__id').in(
    $('adsByCTR').filter('$ad__ctr <= $adsByCTR.quantile($ad__ctr, 0.25)').collect($('AdId'))
))

it works as expected.

Usage within Java

Are there any examples of usage within the JVM? I have seen that currently only MySQL is supported; how could I create a generic SQL layer for other databases?

create multiline graph with expression in Pivot config.yaml

Hello!

I really need to understand whether it's possible to aggregate a few measures into one multiline graph via an expression in my Pivot's config.yaml. It appears to be a Plywood expression, so I'm asking the question here.

So I have those 2 dimensions, like "dimension1" and "dimension2", and I need something like:
- name: combine
  title: Combine
  formula: [$main.sum($dimension1), $main.sum($dimension2)]

Thanks in advance!

Passing DATASET to .in

Is it possible to pass DATASET type to .in() operator?
I'm trying to filter on an attribute and then pass the output of that into another filter.

UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): Type Error: in expression has a bad type combination STRING IN DATASET

Native Plywood performance slower than Pivot

Hello,
I was experimenting with Plywood and Pivot separately: I ran a query from Pivot, and I also wrote Node.js code that uses Plywood to run the same query against Druid. The query takes about 1-2 seconds in Pivot, but my Node.js code takes around 10 seconds for exactly the same query and results. I don't understand what the problem is. The Plywood expression my Node.js code generates is as follows:

$src.split($__time.timeBucket('PT1M', 'Etc/UTC'), 'Time').apply('count', $src.sum($count)).sort($count,'descending').limit(20).apply('GroupBy', $src.split($dim1,dim1).apply('count', $src.sum($count)).sort($count,'descending').limit(5).apply('GroupBy', $src.split($dim2,dim2).apply('count', $src.sum($count)).sort($count,'descending').limit(5)))

where "src" is my datasource and "dim1" and "dim2" are my dimensions.

I don't understand why, given that Pivot also uses Plywood under the hood, using Plywood natively as the query client gives slower results than Pivot. Are there some settings I have been missing?

-Sundaram

Filtering on Parseable String Filter Failing

Plywood.ply()
.apply("my_datasource", $("my_datasource")
      .filter(
        $("timestamp").in({
          start: new Date("2018-01-01"),
          end: new Date("2018-02-16")
        })
      )
      .apply('visitorTypes', $("my_datasource")
             .split({ UserId: '$user__id' })
             .apply('user__is_new', $("my_datasource").max('$user__is_first_session'))
       )
      .filter('$user__id in [$visitorTypes.filter($user__is_new == 1).collect($UserId)]')
)
.apply("count", $("my_datasource").count())

Fails with

Expression parse error: Expected "$", "+", "-", "false", "i$", "null", "ply", "true", "|", (, Name, Number, NumberSet, String, or StringSet but "[" found. on '$user__id in [$visitorTypes.collect($UserId)]'

While

.filter($('user__id').in(
    $('visitorTypes').filter('$user__is_new == 1').collect('$UserId')
))

Doesn't throw that error (but still fails due to #166)

Columns with spaces

Any way to get the apply function to work with spaces in column names?

var context = {
"dataset": dataset,
"log_count": "Log Count"
}

.apply('Log Count', '$dataset.sum($log_count)')

Error: sum must have expression of type NUMBER (is STRING)
at SumAction.Action._checkExpressionTypes (/root/druid/node_modules/plywood/build/plywood.js:7579:27)
at new SumAction (/root/druid/node_modules/plywood/build/plywood.js:10306:18)
at SumAction.Action._substituteHelper (/root/druid/node_modules/plywood/build/plywood.js:7750:20)
at /root/druid/node_modules/plywood/build/plywood.js:7247:76
at Array.map (native)
at ChainExpression._substituteHelper (/root/druid/node_modules/plywood/build/plywood.js:7247:38)
at ApplyAction.Action._substituteHelper (/root/druid/node_modules/plywood/build/plywood.js:7744:44)
at /root/druid/node_modules/plywood/build/plywood.js:7247:76
at Array.map (native)
at ChainExpression._substituteHelper (/root/druid/node_modules/plywood/build/plywood.js:7247:38)

Add support for new Druid Math Expressions

Druid recently released math expressions: powerful functionality common in SQL queries that would be very useful in Plywood.

Some of the key ones include things like

  • CASE statements

  • NVL statements

  • LIKE statements

among many others. The full list is available at http://druid.io/docs/latest/misc/math-expr.html.

Many use cases would benefit from this, and it would add a lot of power to Plywood's existing functionality.
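For reference, Druid exposes these math expressions through expression virtual columns; assuming a hypothetical category column, an NVL-style expression looks roughly like:

```json
{
  "type": "expression",
  "name": "v:category",
  "expression": "nvl(\"category\", 'unknown')",
  "outputType": "STRING"
}
```

The column name and default value here are illustrative; the full function list is in the linked Druid docs.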

Filtering with Computed Columns Hanging

Plywood.ply()
.apply("my_datasource", $("my_datasource")
      .filter(
        $("timestamp").in({
          start: new Date("2018-01-01"),
          end: new Date("2018-02-16")
        })
      )
      .apply('visitorTypes', $("my_datasource")
             .split({ UserId: '$user__id' })
             .apply('user__is_new', $("my_datasource").max('$user__is_first_session'))
       )
      .filter($('user__id').in(
        $('visitorTypes').filter('$user__is_new == 1').collect('$UserId')
      ))
)
.apply("count", $("my_datasource").count())

Never resolves (within the 2-minute wait time I gave it) on a 100k-row dataset.

How to let plywood use listFiltering?

Hi!

Druid has listFiltered filtering.

Example:
{
    "type": "listFiltered",
    "delegate": {
       "type": "default",
       "dimension": "tags",
       "outputName": "tags"
    },
    "values": ["t3"]
}

Its effect is similar to that of a SQL HAVING clause.

I already checked https://plywood.imply.io/expressions, but I couldn't find any clues.

Could you please help me with that?

Thanks.

feature request: filter / split on nested json data (or request documentation if it's already there)

I'm not sure whether plywood already has nested JSON data support or not. http://imply.io/docs/latest/tutorial-batch suggests it does not:

Let's use a small pageviews dataset as an example. Druid supports TSV, CSV, and JSON out of the box. Note that nested JSON objects are not supported, so if you do use JSON, you should provide a file containing flattened objects.

but when I give it nested JSON data with Dataset.fromJS(nested_jsondata), it says the nested bidderTiming is filterable and splitable; however, I don't know how to write the plywood expression.


extractVersion should work for x.y version formats

Presto's non-semantic versioning (currently 0.148) doesn't work with the current extractVersion().

here's a test case:

    it("works with super basic versions", () => {
      expect(External.extractVersion('0.1')).to.equal('0.1');
    });
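One way the lenient behavior could look is sketched below (a hypothetical implementation, not plywood's actual extractVersion): match x.y and optionally a third component.

```javascript
// Hypothetical lenient version extractor (illustrative only):
// accepts plain x.y versions like Presto's 0.148 as well as full x.y.z semver.
function extractVersion(str) {
  const match = /(\d+\.\d+(?:\.\d+)?)/.exec(String(str));
  return match ? match[1] : null; // null when no version-like token is found
}

const a = extractVersion('0.1');       // '0.1'
const b = extractVersion('0.148');     // '0.148'
const c = extractVersion('0.9.2-mmx'); // '0.9.2'
```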

Lookups

Hi, is there any way to work with Druid Lookups?

Druid Lookups Docs: https://druid.apache.org/docs/latest/querying/lookups.html

I have a project where I'm considering using plywood, but this Lookups feature is really important to me.

Multi-Reference Concatenation Fails

A query that does some string manipulation to concatenate two or more columns fails, since the expression contains many free references.

An example would be something like $column1 ++ 'exampleBaseString' ++ $column2

Error compile in v0.16.5

Compiling TypeScript
node_modules/plywood-base-api/index.d.ts(1,32): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.
src/datatypes/attributeInfo.ts(20,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/datatypes/common.ts(20,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/datatypes/dataset.ts(18,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/datatypes/set.ts(18,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/datatypes/valueStream.ts(17,27): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.
src/expressions/baseExpression.ts(19,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/expressions/baseExpression.ts(22,45): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.
src/expressions/joinExpression.ts(20,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/expressions/literalExpression.ts(23,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/expressions/numberBucketExpression.ts(17,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/expressions/refExpression.ts(18,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/expressions/splitExpression.ts(17,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/expressions/timePartExpression.ts(17,25): error TS7016: Could not find a declaration file for module 'moment-timezone'. '/Users/xxx/Downloads/plywood-master/node_modules/moment-timezone/index.js' implicitly has an 'any' type.
src/external/baseExternal.ts(18,69): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.
src/external/baseExternal.ts(21,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/external/baseExternal.ts(550,29): error TS7006: Parameter 'encoding' implicitly has an 'any' type.
src/external/baseExternal.ts(550,39): error TS7006: Parameter 'callback' implicitly has an 'any' type.
src/external/baseExternal.ts(554,15): error TS7006: Parameter 'callback' implicitly has an 'any' type.
src/external/baseExternal.ts(564,37): error TS7006: Parameter 'encoding' implicitly has an 'any' type.
src/external/baseExternal.ts(564,47): error TS7006: Parameter 'callback' implicitly has an 'any' type.
src/external/baseExternal.ts(581,23): error TS7006: Parameter 'callback' implicitly has an 'any' type.
src/external/baseExternal.ts(1444,25): error TS7006: Parameter 'chunk' implicitly has an 'any' type.
src/external/baseExternal.ts(1444,32): error TS7006: Parameter 'encoding' implicitly has an 'any' type.
src/external/baseExternal.ts(1444,42): error TS7006: Parameter 'callback' implicitly has an 'any' type.
src/external/baseExternal.ts(1517,21): error TS7006: Parameter 'chunk' implicitly has an 'any' type.
src/external/baseExternal.ts(1517,28): error TS7006: Parameter 'enc' implicitly has an 'any' type.
src/external/baseExternal.ts(1517,33): error TS7006: Parameter 'callback' implicitly has an 'any' type.
src/external/druidExternal.ts(20,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/external/druidExternal.ts(22,27): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.
src/external/druidExternal.ts(23,26): error TS7016: Could not find a declaration file for module 'stream-to-array'. '/Users/xxx/Downloads/plywood-master/node_modules/stream-to-array/index.js' implicitly has an 'any' type.
src/external/druidExternal.ts(168,14): error TS7006: Parameter 'sourcesArray' implicitly has an 'any' type.
src/external/druidExternal.ts(181,14): error TS7006: Parameter 'res' implicitly has an 'any' type.
src/external/druidExternal.ts(189,29): error TS7006: Parameter 'encoding' implicitly has an 'any' type.
src/external/druidExternal.ts(189,39): error TS7006: Parameter 'callback' implicitly has an 'any' type.
src/external/druidSqlExternal.ts(19,26): error TS7016: Could not find a declaration file for module 'stream-to-array'. '/Users/xxx/Downloads/plywood-master/node_modules/stream-to-array/index.js' implicitly has an 'any' type.
src/external/druidSqlExternal.ts(81,14): error TS7006: Parameter 'sources' implicitly has an 'any' type.
src/external/druidSqlExternal.ts(93,14): error TS7006: Parameter 'res' implicitly has an 'any' type.
src/external/mySqlExternal.ts(20,26): error TS7016: Could not find a declaration file for module 'stream-to-array'. '/Users/xxx/Downloads/plywood-master/node_modules/stream-to-array/index.js' implicitly has an 'any' type.
src/external/mySqlExternal.ts(73,14): error TS7006: Parameter 'sources' implicitly has an 'any' type.
src/external/mySqlExternal.ts(84,14): error TS7006: Parameter 'res' implicitly has an 'any' type.
src/external/postgresExternal.ts(19,26): error TS7016: Could not find a declaration file for module 'stream-to-array'. '/Users/xxx/Downloads/plywood-master/node_modules/stream-to-array/index.js' implicitly has an 'any' type.
src/external/postgresExternal.ts(87,14): error TS7006: Parameter 'sources' implicitly has an 'any' type.
src/external/postgresExternal.ts(95,14): error TS7006: Parameter 'res' implicitly has an 'any' type.
src/external/sqlExternal.ts(19,27): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.
src/external/utils/druidAggregationBuilder.ts(17,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/helper/concurrentLimitRequester.ts(18,29): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.
src/helper/retryRequester.ts(19,29): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.
src/helper/streamBasics.ts(17,26): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.
src/helper/streamBasics.ts(27,12): error TS2339: Property 'emit' does not exist on type 'ReadableError'.
src/helper/streamConcat.ts(17,37): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.
src/helper/streamConcat.ts(39,12): error TS2339: Property 'push' does not exist on type 'StreamConcat'.
src/helper/streamConcat.ts(42,55): error TS2339: Property 'emit' does not exist on type 'StreamConcat'.
src/helper/utils.ts(17,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/helper/utils.ts(18,48): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.

Question about supporting escape in string parsing rule

Hi, experts!

I'm currently using the plywood library to mediate queries to a Druid broker.
Yesterday I hit an expression parse error, so I dug deeper inside plywood and found the following.

String "String"
= "'" chars:NotSQuote "'" _ { return chars; }
/ "'" chars:NotSQuote { error("Unmatched single quote"); }
/ '"' chars:NotDQuote '"' _ { return chars; }
/ '"' chars:NotDQuote { error("Unmatched double quote"); }

In my case I construct the plywood string expression from a client query, like below.

const expressionStr = `$dim.in(['${val1}', '${val2}'])`

and one of the parameter values was 'banana" (containing both quote characters), so
the plywood expression would be $dim.in([''banana"']) or $dim.in(["'banana""]).
Unfortunately, neither of these is accepted.

By the parsing rule, a string's quote characters must match: either 'some' or "some".
For now I avoided this by directly using plywood.LiteralExpression, so no more exceptions are thrown.

So, I wonder: are there any plans to support an escape rule?

Thanks.
Best regards.
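Until an escape rule exists, one workaround is to pick whichever quote character the value does not contain; this helper is a sketch, not part of plywood:

```javascript
// Hypothetical helper (not a plywood API): choose a quote style the grammar
// can parse, since the grammar has no escape sequences.
function quoteForPlywood(value) {
  if (!value.includes("'")) return "'" + value + "'"; // safe to single-quote
  if (!value.includes('"')) return '"' + value + '"'; // safe to double-quote
  // Contains both quote kinds: no string literal parses; the caller should
  // fall back to building a plywood.LiteralExpression programmatically.
  return null;
}

const q = quoteForPlywood('banana"'); // the problem value, single-quoted
```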

Add additional filter on top of Expression

Hi all,

I have a question that is hopefully a simple one: I have an Expression object (that may or may not have a filter in it), and I would like to add another filter on top of it, to be AND'ed with any existing filter.
The context: I'd like to patch Swiv to add a filter on its server side that limits the scope of the query to what the specific user is allowed to see.
I tried several things, my best guess was to add:

ex = ex.filter('$myField == "someValue"');

before the call to compute, but that fails with Error: could not resolve $myField (even though the field exists on all my Druid data sources).
Any guidance will be appreciated.

Thank you

Eran

Expression.some does not work

Expression.some does not work properly.

Examples:

const e1 = Expression.parse("$main.countDistinct($user)");
const e2 = Expression.parse("$main.countDistinct($user) * 100");

Check:

e1.some(e => e instanceof CountDistinctExpression); // returns true
e2.some(e => e instanceof CountDistinctExpression); // returns false

That's because here:

return (v == null) ? null : !v;

we always return a boolean, and inside:

if (pass != null) {

we return early if we get a non-null value. So everyHelper doesn't recurse and can't find the correct nested expression.
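The pattern can be reproduced in plain JavaScript (names here are illustrative, not plywood's actual internals): `some` is built as "not every(not pred)", but the helper stops recursing as soon as its iterator returns any non-null value, and here it always does.

```javascript
// Sketch of the bug: the iterator always maps pred's boolean to a boolean,
// so everyHelper's early return fires at the root and children go unvisited.
function everyHelper(node, iter) {
  const pass = iter(node);
  if (pass != null) return pass; // early return: children are never visited
  return (node.children || []).every(child => everyHelper(child, iter));
}

function someBuggy(root, pred) {
  return !everyHelper(root, node => {
    const v = pred(node);
    return (v == null) ? null : !v; // always a boolean, since pred never returns null
  });
}

const leaf = { type: 'countDistinct', children: [] };
const wrapped = { type: 'multiply', children: [leaf] };

const direct = someBuggy(leaf, n => n.type === 'countDistinct');    // true
const nested = someBuggy(wrapped, n => n.type === 'countDistinct'); // false: leaf never visited
```

Mapping a false predicate result to null instead of true would let the recursion continue into children.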

Postgres support

Are there any plans to add support for Postgres (and Redshift) style SQL? If you think this is something you would like to support, I'd be happy to try my hand at a PR.

feature request: ability to split by TIME_FORMAT druid sql equivalent.

Hello,

I currently use timePart('HOUR_OF_DAY') to group by hour of day across many years.
I would also like to be able to group by hour of day and year together across many years.
For example:

select SUM("value") as val, TIME_FORMAT(__time, 'YYYY-HH') AS "date" FROM "datasource" GROUP BY TIME_FORMAT(__time, 'YYYY-HH')

The above works in the Druid SQL console. I would love to have this ability in plywood as well.

Plywood and Druid zero-fill on timeseries queries

Hi! I'm new to Plywood framework. I'm using it to query a Druid database.

Suppose I want to count how many clicks a certain link had from 2019-10-01 until 2019-10-07, but some days don't have any data to show. How could I fill in those missing days? I initially thought operator.timeRange() would achieve that, but I guess I was wrong.

Any help is appreciated. Thanks in advance.

Edit: as far as I can see, zero-filling is done by Druid, but I can't get those values returned from Plywood.
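If Plywood won't return the zero-filled buckets, one fallback is to fill them in after the fact. This is a hypothetical post-processing sketch (the row shape and names are assumptions, not a plywood API):

```javascript
// Fill missing day buckets with zero after receiving sparse result rows.
function zeroFillDays(rows, start, end) {
  const byDay = new Map(rows.map(r => [r.day, r.clicks]));
  const out = [];
  for (let t = new Date(start); t < end; t.setUTCDate(t.getUTCDate() + 1)) {
    const day = t.toISOString().slice(0, 10); // 'YYYY-MM-DD'
    out.push({ day, clicks: byDay.get(day) || 0 });
  }
  return out;
}

const filled = zeroFillDays(
  [{ day: '2019-10-03', clicks: 5 }],
  new Date('2019-10-01T00:00:00Z'),
  new Date('2019-10-08T00:00:00Z')
);
// 7 entries, one per day from 2019-10-01 through 2019-10-07
```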

Incomplete results for deeply nested queries

For deeply nested queries, Plywood by default spawns up to 500 queries, and this is a root cause of incomplete results that is really hard to spot. From the N-th row on, the most deeply nested dimension(s) contain only "undefined" due to missing/skipped results.

return null; // Query limit reached, don't do any more queries.

I fully understand that the limit is pretty high and can be changed via the API, but when the limit is reached I would expect an exception from Plywood rather than silently skipped queries/results; that would give me a clue that something is wrong with my query.

Also, I'm considering switching to a regular "group by" query instead of tons of "top N" queries for deeply nested Plywood expressions. Is that possible with the current Plywood version?

Negative values in NUMBER_RANGE

Working with negative numbers is broken in NUMBER_RANGE filters.

$main.filter($time.in([2015-12-17T00:01:00.000Z,2015-12-18T00:01:00.000Z)).and($Latitude.in([-1,12]))).split($Region,SEGMENT,main).apply(count,$main.count()).sort($count,descending).limit(101)

gives an empty result, while the same request with $Latitude.in([0,12]) does not. Such requests worked in version 0.8.12, for example.

Getting incorrect count distinct value in Plywood

I am getting an incorrect count distinct value in the plywood response. The "useApproximateCountDistinct" option is set to false in Druid, and I get an exact count in the Druid UI, but with Plywood I only get an approximate count.
I am using Plywood version 0.22.10 and Druid version 0.20.1.
Is there any option to get an exact count distinct with Plywood?

test.html doesn't seem to contain valid code

Hi,

I'm trying to understand and play with plywood, so I tried the test.html file, but I'm getting the following error right away:

Uncaught TypeError: name must be a string
    $ @ plywood.js:4788
    (anonymous function) @ test.html:27

This seems to come from $() being called with no argument.

Configure Plywood to not throw exception on unknown dimension

Hello,

I'm using Plywood (great lib!) for querying Druid with the plywood-druid-requester component.

I get could not resolve $some_dimension_name exceptions when I query dimensions/metrics that have not been ingested into Druid yet (but are expected to come).

Is there a way to configure plywood, or the plywood-druid-requester, to not throw an exception?
The normal Druid behaviour (no exception; unknown dimensions return null and unknown metrics return 0) would be expected.

Date equality works on Druid but not locally

I noticed this while writing a unit test.

const someDataset = plywood.Dataset.fromJS([
  { cut: 'Good',  price: 400, time: new Date('2015-10-01T00:00:00Z') },
]);
const ex = $('data').filter($('time').is(new Date('2015-10-01T00:00:00Z'))).count();
const r = await ex.compute({ data: someDataset });
console.log(r); // --> 0

I tried sharing the date object between the query and the dataset, but the result is the same.

const d = new Date('2015-10-01T00:00:00Z');
const someDataset = plywood.Dataset.fromJS([
  { cut: 'Good',  price: 400, time: d },
]);
const ex = $('data').filter($('time').is(d)).count();
const r = await ex.compute({ data: someDataset });
console.log(r); // --> 0

I get the same with equals(new Date(...)) or in([new Date(...)]). But in() works if you share the Date object!

const d = new Date('2015-10-01T00:00:00Z');
const someDataset = plywood.Dataset.fromJS([
  { cut: 'Good',  price: 400, time: d },
]);
const ex = $('data').filter($('time').in([d])).count();
const r = await ex.compute({ data: someDataset });
console.log(r); // --> 1

I think these all work with a Druid backend. I'm not sure what the right way to do this in unit tests is. Thanks for any advice!
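The symptom is consistent with reference-identity comparison of Date objects somewhere in the local compute path (whether that is actually what plywood does internally is an assumption); a plain-JavaScript illustration:

```javascript
// Two Date objects for the same instant are distinct objects, so any
// identity-based comparison (===, Array.prototype.indexOf, Set membership)
// misses them even though their values are equal.
const d1 = new Date('2015-10-01T00:00:00Z');
const d2 = new Date('2015-10-01T00:00:00Z');

const sameObject = d1 === d2;                      // false: different objects
const sameInstant = d1.valueOf() === d2.valueOf(); // true: same millisecond
const foundByRef = [d1].indexOf(d2);               // -1: indexOf uses ===
const foundSelf = [d1].indexOf(d1);                // 0: same reference
```

That would explain why in([d]) succeeds only when the exact same Date object is shared between the dataset and the filter.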

Bucketing Time splits requires floorable Duration

Hi!

When creating the bucketing action for time splits, there's a check for the floorability of the Duration: https://github.com/implydata/plywood/blob/master/src/expressions/timeBucketExpression.ts#L47

So some of the more interesting Durations are off-limits (P2D, for example). We generated a Druid query with a floorable Duration, changed the Duration value inside the query, and sent it to Druid; it worked fine.

Is it possible to loosen this requirement where Druid supports it?

Granularity builders should support origin parameter

Description

Queries should support origin in granularity builders, as defined in http://druid.io/docs/latest/querying/granularities.html. Right now the response from Druid starts from the nearest year, month, or week depending on the granularity. It should start from the query's start time.

Use Case

  • Users with time range filters on their UI want to make an area chart from the user's selected start time to their end time, with a defined granularity.
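In Druid's period granularity syntax (from the linked docs), the requested behavior corresponds to passing an origin alongside the period; the timestamp below is an example value standing in for the query's start time:

```json
{
  "type": "period",
  "period": "P1W",
  "timeZone": "Etc/UTC",
  "origin": "2018-01-01T00:00:00Z"
}
```

A granularity builder that accepts origin could simply thread this value through to the generated query.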

Split Query Limitation

Is there a limitation in plywood queries such that we can only split on up to 5 dimensions? I'm getting an error if I add more than 5 splits to the query. Please help.

Parent Dataset Attribute Not Resolving

In

Plywood.ply()
      .apply("my_datasource", $("my_datasource")
            .filter($("timestamp").in({start: new Date("2018-01-01"), end: new Date() }))
            .split({UserId: '$user__id'}, 'visitorTypes')
            .apply('user__is_new', $("my_datasource").max('$user__is_first_session'))
            .filter($('user__id').in($('visitorTypes').filter($('user__is_new').is(0)).collect($('UserId'))))
      )
      .apply('data', $("my_datasource").count())

Plywood throws

Error: could not resolve $user__id

on

if (!myTypeContext) {

How to define JavaScript post-aggregator with plywood ?

Hello Plywood team,

I would like to know how to define a JavaScript post-aggregator with plywood.

In my use case, I need to apply the formula similar to following one:

$data.filter($type == "A").sum($count) * 1 + $data.filter($type == "B").sum($count) * 2 + $data.filter($type == "C").sum($count) * 3 + ...

Plywood generates a query with many filtered aggregators executed on the Druid side, which is very slow.

To improve performance, I could split on the type dimension and apply a post-aggregator for the final result, but I can't find a way to add custom post-aggregators with plywood.
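For reference, Druid does ship a JavaScript post-aggregator (it must be explicitly enabled in the Druid configuration). Assuming filtered aggregators named count_A, count_B, and count_C (hypothetical names for this example), it looks roughly like:

```json
{
  "type": "javascript",
  "name": "weighted_count",
  "fieldNames": ["count_A", "count_B", "count_C"],
  "function": "function(a, b, c) { return a * 1 + b * 2 + c * 3; }"
}
```

The open question in this issue is how to get plywood to emit such a post-aggregator; the snippet only shows the Druid-side target.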

How can I modify the start day of the week after selecting time shift


When selecting a time shift, I want the week to start from Sunday.

query:

{ queryType: 'topN',
  dataSource: 'dataSource1',
  intervals: '2020-12-13T00Z/2020-12-27T00Z',
  granularity: 'all',
  context: { timeout: 600000 },
  virtualColumns:
   [ { type: 'expression',
       name: 'v:***__time',
       expression:
        'timestamp_floor(nvl(if((1608422400000<=__time && __time<1609027200000),__time,\'\'),timestamp_shift(__time,\'P1W\',1,\'Etc\\u002fUTC\')),\'P1W\',\'\',\'Etc\\u002fUTC\')',
       outputType: 'LONG' } ],
  dimension:
   { type: 'default',
     dimension: 'v:***__time',
     outputName: '***__time',
     outputType: 'LONG' },
  aggregations:
   [ { type: 'filtered',
       name: '!T_0',
       filter: [Object],
       aggregator: [Object] },
     { type: 'filtered',
       name: '!T_1',
       filter: [Object],
       aggregator: [Object] },
     { type: 'filtered',
       name: '!T_2',
       filter: [Object],
       aggregator: [Object] },
     { type: 'filtered',
       name: '!T_3',
       filter: [Object],
       aggregator: [Object] },
     { type: 'filtered',
       name: '!T_4',
       filter: [Object],
       aggregator: [Object] },
     { type: 'filtered',
       name: '!T_5',
       filter: [Object],
       aggregator: [Object] } ],
  postAggregations:
   [ { type: 'expression',
       expression: '604800000',
       name: 'MillisecondsInInterval' },
     { type: 'expression',
       expression:
        '((cast((("!T_0"+"!T_1")-"!T_2"),\'DOUBLE\')/604800000)*86400000)',
       name: 'elte_d' },
     { type: 'expression',
       expression:
        '((cast((("!T_3"+"!T_4")-"!T_5"),\'DOUBLE\')/604800000)*86400000)',
       name: '_previous__elte_d' },
     { type: 'expression',
       expression:
        '(((cast((("!T_0"+"!T_1")-"!T_2"),\'DOUBLE\')/604800000)*86400000)-((cast((("!T_3"+"!T_4")-"!T_5"),\'DOUBLE\')/604800000)*86400000))',
       name: '_delta__elte_d' } ],
  metric: { type: 'dimension', ordering: 'lexicographic' },
  threshold: 100 }

After modifying the dimension in the request parameters, it can start from Sunday.

Value of the modified dimension:

dimension:
   { type: 'extraction',
     dimension: '__time',
     outputName: '***__time',
     extractionFn:
      { type: 'timeFormat',
        granularity: { type: 'period', period: 'P1W', timeZone: 'Etc/UTC', origin: '1970-01-04T00Z' },  // add origin
        format: 'yyyy-MM-dd\'T\'HH:mm:ss\'Z',
        timeZone: 'Etc/UTC' } },

Changing the default dimensionSpec to an extraction dimensionSpec makes the week start from Sunday, but the result set is then incorrect. How can I make the week start from Sunday while keeping the results correct? Thanks!
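One alternative worth trying (an untested sketch, not verified against this query): keep the default dimensionSpec and instead pass an origin to `timestamp_floor` inside the virtual column, since Druid's `timestamp_floor(expr, period, [origin], [timezone])` accepts an optional origin and 1970-01-04 was a Sunday. The time-shift `nvl`/`if` wrapping from the original expression is omitted here for brevity.

```js
virtualColumns: [{
  type: 'expression',
  name: 'v:***__time',
  // origin '1970-01-04T00:00:00Z' is a Sunday, so P1W buckets floor to Sunday
  expression: "timestamp_floor(__time, 'P1W', '1970-01-04T00:00:00Z', 'Etc/UTC')",
  outputType: 'LONG'
}]
```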

concurrent limit requester getting stuck

We recently started observing Plywood seemingly getting stuck. After some investigation we realized the concurrent limit requester was getting stuck: from its perspective, the number of in-flight requests never decreased.

I am not very familiar with Node's streaming API, but we patched the requester to listen to the close event instead of end and error, and we haven't observed a similar issue since. However, I am not 100% sure this actually fixed the issue, since it only popped up sporadically, nor whether there are any drawbacks to using close. The Node documentation states that the end event will not be emitted unless all data is read from the stream, and my assumption is that we are somehow hitting that edge case.

Timezone not applied to timeBucket without any split

Hello Team Plywood,

When I use timeBucket to split the results by P1H with a timezone and without any other splits, the timezone is not applied to the timestamp.

When I use timeBucket to split the results by P1H with the same timezone and with other splits, the timezone is applied to the timestamp.

So the behavior of timeBucket differs depending on whether there is another split. Could you please look into this problem?
