
Plywood

Plywood is a JavaScript library that simplifies building interactive visualizations and applications for large data sets. Plywood acts as a middle-layer between data visualizations and data stores.

Plywood is architected around the principles of nested Split-Apply-Combine, a powerful divide-and-conquer algorithm that can be used to construct all types of data visualizations. Plywood comes with its own expression language where a single Plywood expression can translate to multiple database queries, and where results are returned in a nested data structure so they can be easily consumed by visualization libraries such as D3.js.
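The nested Split-Apply-Combine idea above can be illustrated in plain JavaScript (this is just an illustrative sketch; Plywood itself expresses this declaratively via its expression language):

```javascript
// Minimal plain-JavaScript illustration of Split-Apply-Combine:
// split rows by a key, apply an aggregate to each group, combine results.
function splitApplyCombine(rows, key, apply) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row[key])) groups.set(row[key], []);
    groups.get(row[key]).push(row);
  }
  return [...groups].map(([value, group]) => ({ [key]: value, ...apply(group) }));
}

const rows = [
  { page: 'home', visits: 3 },
  { page: 'home', visits: 2 },
  { page: 'about', visits: 1 },
];

const result = splitApplyCombine(rows, 'page', group => ({
  totalVisits: group.reduce((sum, r) => sum + r.visits, 0),
}));
// result: [{ page: 'home', totalVisits: 5 }, { page: 'about', totalVisits: 1 }]
```

Plywood applies this same pattern recursively, so a split inside a split produces the nested data structures that visualization libraries consume.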

You can use Plywood in the browser and/or in node.js to easily create your own visualizations and applications.

Plywood also acts as an advanced query planner for Druid, determining the optimal way to execute Druid queries.

Installation

To use Plywood from npm, simply run: npm install plywood.

Plywood can also be used in the browser.

Documentation

To learn more, see http://plywood.imply.io

Questions & Support

For updates about new and upcoming features follow @implydata on Twitter.

Please file bugs and feature requests by opening an issue on GitHub, and direct all questions to our user groups.

plywood's People

Contributors

asherbitter, cheddar, chrismclaughlin55, evasomething, fjy, gianm, jgoz, longweiquan, lorem--ipsum, mattallty, mcbrewster, mujinss, pengz-imply, qsss, tylerreece22, vogievetsky, wylieallen-i


plywood's Issues

Time range: accept an Array instead of an object

Right now, if I want to compare two different timeframes for the same object id, I have to use a hack: duplicate the query and send it twice with Pivot. Is there any way to do it by sending just an array of two (or more) time ranges and letting Plywood do its magic?

something like:

{
   "action":"in",
   "expression":{
      "op":"literal",
      "value":[{
         "start":"2015-12-26T00:01:00.000Z",
         "end":"2015-12-27T00:01:00.000Z"
      },{
         "start":"2015-12-26T00:01:00.000Z",
         "end":"2015-12-27T00:01:00.000Z"
      }],
      "type":"TIME_RANGE"
   }
}

Subquery Equality Filtering on Computed Column Failing

Plywood.ply()
      .apply("my_datasource", $("my_datasource")
          .filter(
            $("timestamp").in({
              start: new Date("2018-01-01"),
              end: new Date("2018-02-16")
            })
      ))
      .apply('visitorTypes', $("my_datasource")
             .split({ UserId: '$user__id' })
             .apply('user__is_new', $("my_datasource").max('$user__is_first_session')))
      .apply('data', $("my_datasource")
             .filter($('user__id').in(
                $('visitorTypes').filter('$user__is_new == 0').collect($('UserId'))
              ))
             .count())

Created a POST request of

{  
   "method":"POST",
   "url":"https://example.com/druid/v2/",
   "body":{  
      "queryType":"timeseries",
      "dataSource":"my_datasource",
      "intervals":"2018-01-01T00Z/2018-02-18T19:35:30.768Z",
      "granularity":"all",
      "context":{  
         "timeout":10000
      },
      "filter":{  
         "type":"or",
         "fields":[  

         ]
      },
      "aggregations":[  
         {  
            "name":"__VALUE__",
            "type":"count"
         }
      ]
   },
   "headers":{  
      "Content-type":"application/json"
   }
}

Which makes Druid throw

Error: Unknown exception: Instantiation of [simple type, class io.druid.query.filter.OrDimFilter] value failed: OR operator requires at least one field (through reference chain: io.druid.query.filter.OrDimFilter["fields"])

Since the dataset that $user__is_new is a member of is in-memory, Plywood should filter in-memory instead of passing the filter through to Druid.

Also of note: if I use a quantile filter instead of an equality filter, e.g.

 .filter($('ad__id').in(
    $('adsByCTR').filter('$ad__ctr <= $adsByCTR.quantile($ad__ctr, 0.25)').collect($('AdId'))
))

it works as expected.

Usage within Java

Are there any examples of usage within the JVM? I have seen that currently only MySQL is supported; how could I create a generic SQL layer for other databases?

create multiline graph with expression in Pivot config.yaml

Hello!

I really need to understand whether it's possible to aggregate a few measures into one multiline graph via an expression in my Pivot's config.yaml. It appears to be a Plywood expression, so I'm asking the question here.

So I have those 2 dimensions, like "dimension1" and "dimension2", and I need something like:
- name: combine
  title: Combine
  formula: [$main.sum($dimension1), $main.sum($dimension2)]

Thanks in advance!

Passing DATASET to .in

Is it possible to pass DATASET type to .in() operator?
I'm trying to filter on an attribute and then pass the output of that into another filter.

UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): Type Error: in expression has a bad type combination STRING IN DATASET

Native Plywood performance slower than Pivot

Hello,
I was experimenting with Plywood and Pivot separately: I ran a query from Pivot, and I also wrote Node.js code that uses Plywood to run the same query against Druid. The query takes about 1-2 seconds in Pivot, but my Node.js code takes around 10 seconds for exactly the same query and results. I don't understand what the problem is. The Plywood expression my Node.js code generates is as follows:

$src.split($__time.timeBucket('PT1M', 'Etc/UTC'), 'Time').apply('count', $src.sum($count)).sort($count,'descending').limit(20).apply('GroupBy', $src.split($dim1,dim1).apply('count', $src.sum($count)).sort($count,'descending').limit(5).apply('GroupBy', $src.split($dim2,dim2).apply('count', $src.sum($count)).sort($count,'descending').limit(5)))

where "src" is my datasource and "dim1" and "dim2" are my dimensions.

I don't understand why, given that Pivot also uses Plywood under the hood, using Plywood natively as the query client gives slower results than Pivot. Are there some settings I have been missing?

-Sundaram

Filtering on Parseable String Filter Failing

Plywood.ply()
.apply("my_datasource", $("my_datasource")
      .filter(
        $("timestamp").in({
          start: new Date("2018-01-01"),
          end: new Date("2018-02-16")
        })
      )
      .apply('visitorTypes', $("my_datasource")
             .split({ UserId: '$user__id' })
             .apply('user__is_new', $("my_datasource").max('$user__is_first_session'))
       )
      .filter('$user__id in [$visitorTypes.filter($user__is_new == 1).collect($UserId)]')
)
.apply("count", $("my_datasource").count())

Fails with

Expression parse error: Expected "$", "+", "-", "false", "i$", "null", "ply", "true", "|", (, Name, Number, NumberSet, String, or StringSet but "[" found. on '$user__id in [$visitorTypes.collect($UserId)]'

While

.filter($('user__id').in(
    $('visitorTypes').filter('$user__is_new == 1').collect('$UserId')
))

Doesn't throw that error (but still fails due to #166)

Columns with spaces

Any way to get the apply function to work with spaces in column names?

var context = {
"dataset": dataset,
"log_count": "Log Count"
}

.apply('Log Count', '$dataset.sum($log_count)')

Error: sum must have expression of type NUMBER (is STRING)
at SumAction.Action._checkExpressionTypes (/root/druid/node_modules/plywood/build/plywood.js:7579:27)
at new SumAction (/root/druid/node_modules/plywood/build/plywood.js:10306:18)
at SumAction.Action._substituteHelper (/root/druid/node_modules/plywood/build/plywood.js:7750:20)
at /root/druid/node_modules/plywood/build/plywood.js:7247:76
at Array.map (native)
at ChainExpression._substituteHelper (/root/druid/node_modules/plywood/build/plywood.js:7247:38)
at ApplyAction.Action._substituteHelper (/root/druid/node_modules/plywood/build/plywood.js:7744:44)
at /root/druid/node_modules/plywood/build/plywood.js:7247:76
at Array.map (native)
at ChainExpression._substituteHelper (/root/druid/node_modules/plywood/build/plywood.js:7247:38)

Add support for new Druid Math Expressions

Druid recently released math expressions: powerful functionality common in SQL queries that would be very useful in Plywood.

Some of the key ones include things like

  • CASE statements

  • NVL statements

  • LIKE statements

among many others. The full list is available at http://druid.io/docs/latest/misc/math-expr.html.

Many use cases would benefit from this, and it would add a lot of power to Plywood's existing functionality.
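For reference, Druid exposes these math expressions through expression virtual columns; assuming a hypothetical category column, an NVL-style expression looks roughly like:

```json
{
  "type": "expression",
  "name": "v:category",
  "expression": "nvl(\"category\", 'unknown')",
  "outputType": "STRING"
}
```

The column name and default value here are illustrative; the full function list is in the linked Druid docs.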

Filtering with Computed Columns Hanging

Plywood.ply()
.apply("my_datasource", $("my_datasource")
      .filter(
        $("timestamp").in({
          start: new Date("2018-01-01"),
          end: new Date("2018-02-16")
        })
      )
      .apply('visitorTypes', $("my_datasource")
             .split({ UserId: '$user__id' })
             .apply('user__is_new', $("my_datasource").max('$user__is_first_session'))
       )
      .filter($('user__id').in(
        $('visitorTypes').filter('$user__is_new == 1').collect('$UserId')
      ))
)
.apply("count", $("my_datasource").count())

Never resolves (within the 2-minute wait time I gave it) on a 100k-row dataset.

How to let plywood use listFiltering?

Hi!

Druid has listFiltered filtering.

Example:
{
    "type": "listFiltered",
    "delegate": {
       "type": "default",
       "dimension": "tags",
       "outputName": "tags"
    },
    "values": ["t3"]
}

Its effect is similar to that of a SQL HAVING clause.

I already checked https://plywood.imply.io/expressions, but I couldn't find any clues.

Could you please help me with that?

Thanks.

feature request: filter / split on nested json data (or request documentation if it's already there)

I'm not sure whether plywood already has nested JSON data support or not. http://imply.io/docs/latest/tutorial-batch suggests it does not:

Let's use a small pageviews dataset as an example. Druid supports TSV, CSV, and JSON out of the box. Note that nested JSON objects are not supported, so if you do use JSON, you should provide a file containing flattened objects.

but when I give it nested JSON data with Dataset.fromJS(nested_jsondata), it says the nested bidderTiming is filterable and splitable; however, I don't know how to write the plywood expression.


extractVersion should work for x.y version formats

Presto's non-semantic versioning (currently 0.148) doesn't work with the current extractVersion().

here's a test case:

    it("works with super basic versions", () => {
      expect(External.extractVersion('0.1')).to.equal('0.1');
    });
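One way the lenient behavior could look is sketched below (a hypothetical implementation, not plywood's actual extractVersion): match x.y and optionally a third component.

```javascript
// Hypothetical lenient version extractor (illustrative only):
// accepts plain x.y versions like Presto's 0.148 as well as full x.y.z semver.
function extractVersion(str) {
  const match = /(\d+\.\d+(?:\.\d+)?)/.exec(String(str));
  return match ? match[1] : null; // null when no version-like token is found
}

const a = extractVersion('0.1');       // '0.1'
const b = extractVersion('0.148');     // '0.148'
const c = extractVersion('0.9.2-mmx'); // '0.9.2'
```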

Lookups

Hi, is there any way to work with Druid Lookups?

Druid Lookups Docs: https://druid.apache.org/docs/latest/querying/lookups.html

I have a project where I'm considering using plywood, but this Lookups feature is really important to me.

Multi-Reference Concatenation Fails

A query that does some string manipulation to concatenate two or more columns fails, since the expression contains many free references.

An example would be something like $column1 ++ 'exampleBaseString' ++ $column2

Error compile in v0.16.5

Compiling TypeScript
node_modules/plywood-base-api/index.d.ts(1,32): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.
src/datatypes/attributeInfo.ts(20,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/datatypes/common.ts(20,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/datatypes/dataset.ts(18,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/datatypes/set.ts(18,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/datatypes/valueStream.ts(17,27): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.
src/expressions/baseExpression.ts(19,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/expressions/baseExpression.ts(22,45): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.
src/expressions/joinExpression.ts(20,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/expressions/literalExpression.ts(23,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/expressions/numberBucketExpression.ts(17,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/expressions/refExpression.ts(18,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/expressions/splitExpression.ts(17,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/expressions/timePartExpression.ts(17,25): error TS7016: Could not find a declaration file for module 'moment-timezone'. '/Users/xxx/Downloads/plywood-master/node_modules/moment-timezone/index.js' implicitly has an 'any' type.
src/external/baseExternal.ts(18,69): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.
src/external/baseExternal.ts(21,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/external/baseExternal.ts(550,29): error TS7006: Parameter 'encoding' implicitly has an 'any' type.
src/external/baseExternal.ts(550,39): error TS7006: Parameter 'callback' implicitly has an 'any' type.
src/external/baseExternal.ts(554,15): error TS7006: Parameter 'callback' implicitly has an 'any' type.
src/external/baseExternal.ts(564,37): error TS7006: Parameter 'encoding' implicitly has an 'any' type.
src/external/baseExternal.ts(564,47): error TS7006: Parameter 'callback' implicitly has an 'any' type.
src/external/baseExternal.ts(581,23): error TS7006: Parameter 'callback' implicitly has an 'any' type.
src/external/baseExternal.ts(1444,25): error TS7006: Parameter 'chunk' implicitly has an 'any' type.
src/external/baseExternal.ts(1444,32): error TS7006: Parameter 'encoding' implicitly has an 'any' type.
src/external/baseExternal.ts(1444,42): error TS7006: Parameter 'callback' implicitly has an 'any' type.
src/external/baseExternal.ts(1517,21): error TS7006: Parameter 'chunk' implicitly has an 'any' type.
src/external/baseExternal.ts(1517,28): error TS7006: Parameter 'enc' implicitly has an 'any' type.
src/external/baseExternal.ts(1517,33): error TS7006: Parameter 'callback' implicitly has an 'any' type.
src/external/druidExternal.ts(20,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/external/druidExternal.ts(22,27): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.
src/external/druidExternal.ts(23,26): error TS7016: Could not find a declaration file for module 'stream-to-array'. '/Users/xxx/Downloads/plywood-master/node_modules/stream-to-array/index.js' implicitly has an 'any' type.
src/external/druidExternal.ts(168,14): error TS7006: Parameter 'sourcesArray' implicitly has an 'any' type.
src/external/druidExternal.ts(181,14): error TS7006: Parameter 'res' implicitly has an 'any' type.
src/external/druidExternal.ts(189,29): error TS7006: Parameter 'encoding' implicitly has an 'any' type.
src/external/druidExternal.ts(189,39): error TS7006: Parameter 'callback' implicitly has an 'any' type.
src/external/druidSqlExternal.ts(19,26): error TS7016: Could not find a declaration file for module 'stream-to-array'. '/Users/xxx/Downloads/plywood-master/node_modules/stream-to-array/index.js' implicitly has an 'any' type.
src/external/druidSqlExternal.ts(81,14): error TS7006: Parameter 'sources' implicitly has an 'any' type.
src/external/druidSqlExternal.ts(93,14): error TS7006: Parameter 'res' implicitly has an 'any' type.
src/external/mySqlExternal.ts(20,26): error TS7016: Could not find a declaration file for module 'stream-to-array'. '/Users/xxx/Downloads/plywood-master/node_modules/stream-to-array/index.js' implicitly has an 'any' type.
src/external/mySqlExternal.ts(73,14): error TS7006: Parameter 'sources' implicitly has an 'any' type.
src/external/mySqlExternal.ts(84,14): error TS7006: Parameter 'res' implicitly has an 'any' type.
src/external/postgresExternal.ts(19,26): error TS7016: Could not find a declaration file for module 'stream-to-array'. '/Users/xxx/Downloads/plywood-master/node_modules/stream-to-array/index.js' implicitly has an 'any' type.
src/external/postgresExternal.ts(87,14): error TS7006: Parameter 'sources' implicitly has an 'any' type.
src/external/postgresExternal.ts(95,14): error TS7006: Parameter 'res' implicitly has an 'any' type.
src/external/sqlExternal.ts(19,27): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.
src/external/utils/druidAggregationBuilder.ts(17,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/helper/concurrentLimitRequester.ts(18,29): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.
src/helper/retryRequester.ts(19,29): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.
src/helper/streamBasics.ts(17,26): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.
src/helper/streamBasics.ts(27,12): error TS2339: Property 'emit' does not exist on type 'ReadableError'.
src/helper/streamConcat.ts(17,37): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.
src/helper/streamConcat.ts(39,12): error TS2339: Property 'push' does not exist on type 'StreamConcat'.
src/helper/streamConcat.ts(42,55): error TS2339: Property 'emit' does not exist on type 'StreamConcat'.
src/helper/utils.ts(17,29): error TS7016: Could not find a declaration file for module 'has-own-prop'. '/Users/xxx/Downloads/plywood-master/node_modules/has-own-prop/index.js' implicitly has an 'any' type.
src/helper/utils.ts(18,48): error TS7016: Could not find a declaration file for module 'readable-stream'. '/Users/xxx/Downloads/plywood-master/node_modules/readable-stream/readable.js' implicitly has an 'any' type.

Question about supporting escape in string parsing rule

Hi, experts!

I'm currently using the plywood library to mediate queries to a Druid broker.
Yesterday I hit an expression parse error, so I dug deeper inside plywood and found the following.

String "String"
= "'" chars:NotSQuote "'" _ { return chars; }
/ "'" chars:NotSQuote { error("Unmatched single quote"); }
/ '"' chars:NotDQuote '"' _ { return chars; }
/ '"' chars:NotDQuote { error("Unmatched double quote"); }

In my case I construct the plywood string expression from a client query, like below.

const expressionStr = `$dim.in(['${val1}', '${val2}'])`

and one of the parameter values was 'banana" (containing both quote characters), so
the plywood expression would be $dim.in([''banana"']) or $dim.in(["'banana""]).
Unfortunately, neither of these is accepted.

By the parsing rule, a string's quote characters must match: either 'some' or "some".
For now I avoided this by directly using plywood.LiteralExpression, so no more exceptions are thrown.

So, I wonder: are there any plans to support an escape rule?

Thanks.
Best regards.
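Until an escape rule exists, one workaround is to pick whichever quote character the value does not contain; this helper is a sketch, not part of plywood:

```javascript
// Hypothetical helper (not a plywood API): choose a quote style the grammar
// can parse, since the grammar has no escape sequences.
function quoteForPlywood(value) {
  if (!value.includes("'")) return "'" + value + "'"; // safe to single-quote
  if (!value.includes('"')) return '"' + value + '"'; // safe to double-quote
  // Contains both quote kinds: no string literal parses; the caller should
  // fall back to building a plywood.LiteralExpression programmatically.
  return null;
}

const q = quoteForPlywood('banana"'); // the problem value, single-quoted
```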

Add additional filter on top of Expression

Hi all,

I have a question that is hopefully a simple one: I have an Expression object (that may or may not have a filter in it), and I would like to add another filter on top of it, to be AND'ed with any existing filter.
The context: I'd like to patch Swiv to add a filter on its server side that limits the scope of the query to what the specific user is allowed to see.
I tried several things, my best guess was to add:

ex = ex.filter('$myField == "someValue"');

before the call to compute, but that fails with Error: could not resolve $myField (even though the field exists on all my Druid data sources).
Any guidance will be appreciated.

Thank you

Eran

Expression.some does not work

Expression.some does not work properly.

Examples:

const e1 = Expression.parse("$main.countDistinct($user)");
const e2 = Expression.parse("$main.countDistinct($user) * 100");

Check:

e1.some(e => e instanceof CountDistinctExpression); // returns true
e2.some(e => e instanceof CountDistinctExpression); // returns false

That's because here:

return (v == null) ? null : !v;

we always return a boolean, and inside:

if (pass != null) {

we return early if we get a non-null value. So everyHelper doesn't recurse and can't find the correct nested expression.
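The pattern can be reproduced in plain JavaScript (names here are illustrative, not plywood's actual internals): `some` is built as "not every(not pred)", but the helper stops recursing as soon as its iterator returns any non-null value, and here it always does.

```javascript
// Sketch of the bug: the iterator always maps pred's boolean to a boolean,
// so everyHelper's early return fires at the root and children go unvisited.
function everyHelper(node, iter) {
  const pass = iter(node);
  if (pass != null) return pass; // early return: children are never visited
  return (node.children || []).every(child => everyHelper(child, iter));
}

function someBuggy(root, pred) {
  return !everyHelper(root, node => {
    const v = pred(node);
    return (v == null) ? null : !v; // always a boolean, since pred never returns null
  });
}

const leaf = { type: 'countDistinct', children: [] };
const wrapped = { type: 'multiply', children: [leaf] };

const direct = someBuggy(leaf, n => n.type === 'countDistinct');    // true
const nested = someBuggy(wrapped, n => n.type === 'countDistinct'); // false: leaf never visited
```

Mapping a false predicate result to null instead of true would let the recursion continue into children.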

Postgres support

Are there any plans to add support for Postgres (and Redshift) style SQL? If you think this is something you would like to support, I'd be happy to try my hand at a PR.

feature request: ability to split by TIME_FORMAT druid sql equivalent.

Hello,

I currently use timePart('HOUR_OF_DAY') to group by hour of day across many years.
I would also like to be able to group by hour of day and year together across many years.
For example:

select SUM("value") as val, TIME_FORMAT(__time, 'YYYY-HH') AS "date" FROM "datasource" GROUP BY TIME_FORMAT(__time, 'YYYY-HH')

The above works in the Druid SQL console. I would love to have this ability in plywood as well.

Plywood and Druid zero-fill on timeseries queries

Hi! I'm new to Plywood framework. I'm using it to query a Druid database.

Suppose I want to count how many clicks a certain link had from 2019-10-01 until 2019-10-07, but some days don't have any data to show. How could I fill in those missing days? I initially thought operator.timeRange() would achieve that, but I guess I was wrong.

Any help is appreciated. Thanks in advance.

Edit: as far as I can see, zero-filling is done by Druid, but I can't get those values returned from Plywood.
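If Plywood won't return the zero-filled buckets, one fallback is to fill them in after the fact. This is a hypothetical post-processing sketch (the row shape and names are assumptions, not a plywood API):

```javascript
// Fill missing day buckets with zero after receiving sparse result rows.
function zeroFillDays(rows, start, end) {
  const byDay = new Map(rows.map(r => [r.day, r.clicks]));
  const out = [];
  for (let t = new Date(start); t < end; t.setUTCDate(t.getUTCDate() + 1)) {
    const day = t.toISOString().slice(0, 10); // 'YYYY-MM-DD'
    out.push({ day, clicks: byDay.get(day) || 0 });
  }
  return out;
}

const filled = zeroFillDays(
  [{ day: '2019-10-03', clicks: 5 }],
  new Date('2019-10-01T00:00:00Z'),
  new Date('2019-10-08T00:00:00Z')
);
// 7 entries, one per day from 2019-10-01 through 2019-10-07
```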

Incomplete results for deeply nested queries

For deeply nested queries, Plywood by default spawns up to 500 queries, and this is a root cause of incomplete results that is really hard to spot. From the N-th row on, the most deeply nested dimension(s) contain only "undefined" due to missing/skipped results.

return null; // Query limit reached, don't do any more queries.

I fully understand that the limit is pretty high and can be changed via the API, but when the limit is reached I would expect an exception from Plywood rather than silently skipped queries/results; that would give me a clue that something is wrong with my query.

Also, I'm considering switching to a regular "group by" query instead of tons of "top N" queries for deeply nested Plywood expressions. Is that possible with the current Plywood version?

Negative values in NUMBER_RANGE

Working with negative numbers is broken in NUMBER_RANGE filters.

$main.filter($time.in([2015-12-17T00:01:00.000Z,2015-12-18T00:01:00.000Z)).and($Latitude.in([-1,12]))).split($Region,SEGMENT,main).apply(count,$main.count()).sort($count,descending).limit(101)

gives an empty result, while the same request with $Latitude.in([0,12]) does not. Such requests worked in version 0.8.12, for example.

Getting incorrect count distinct value in Plywood

I am getting an incorrect count distinct value in the plywood response. The "useApproximateCountDistinct" option is set to false in Druid, and I get an exact count in the Druid UI, but with Plywood I only get an approximate count.
I am using Plywood version 0.22.10 and Druid version 0.20.1.
Is there any option to get an exact count distinct with Plywood?

test.html doesn't seem to contain valid code

Hi,

I'm trying to understand and play with plywood, so I tried the test.html file, but I'm getting the following error right away:

Uncaught TypeError: name must be a string
    $ @ plywood.js:4788
    (anonymous function) @ test.html:27

This seems to come from $() being called with no argument.

Configure Plywood to not throw exception on unknown dimension

Hello,

I'm using Plywood (great lib!) for querying Druid with the plywood-druid-requester component.

I get could not resolve $some_dimension_name exceptions when I query dimensions/metrics that have not been ingested into Druid yet (but are expected to come).

Is there a way to configure plywood, or the plywood-druid-requester, to not throw an exception?
The normal Druid behaviour (no exception; unknown dimensions return null and unknown metrics return 0) would be expected.

Date equality works on Druid but not locally

I noticed this while writing a unit test.

const someDataset = plywood.Dataset.fromJS([
  { cut: 'Good',  price: 400, time: new Date('2015-10-01T00:00:00Z') },
]);
const ex = $('data').filter($('time').is(new Date('2015-10-01T00:00:00Z'))).count();
const r = await ex.compute({ data: someDataset });
console.log(r); // --> 0

I tried sharing the date object between the query and the dataset, but the result is the same.

const d = new Date('2015-10-01T00:00:00Z');
const someDataset = plywood.Dataset.fromJS([
  { cut: 'Good',  price: 400, time: d },
]);
const ex = $('data').filter($('time').is(d)).count();
const r = await ex.compute({ data: someDataset });
console.log(r); // --> 0

I get the same with equals(new Date(...)) or in([new Date(...)]). But in() works if you share the Date object!

const d = new Date('2015-10-01T00:00:00Z');
const someDataset = plywood.Dataset.fromJS([
  { cut: 'Good',  price: 400, time: d },
]);
const ex = $('data').filter($('time').in([d])).count();
const r = await ex.compute({ data: someDataset });
console.log(r); // --> 1

I think these all work with a Druid backend. I'm not sure what the right way to do this in unit tests is. Thanks for any advice!
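The symptom is consistent with reference-identity comparison of Date objects somewhere in the local compute path (whether that is actually what plywood does internally is an assumption); a plain-JavaScript illustration:

```javascript
// Two Date objects for the same instant are distinct objects, so any
// identity-based comparison (===, Array.prototype.indexOf, Set membership)
// misses them even though their values are equal.
const d1 = new Date('2015-10-01T00:00:00Z');
const d2 = new Date('2015-10-01T00:00:00Z');

const sameObject = d1 === d2;                      // false: different objects
const sameInstant = d1.valueOf() === d2.valueOf(); // true: same millisecond
const foundByRef = [d1].indexOf(d2);               // -1: indexOf uses ===
const foundSelf = [d1].indexOf(d1);                // 0: same reference
```

That would explain why in([d]) succeeds only when the exact same Date object is shared between the dataset and the filter.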

Bucketing Time splits requires floorable Duration

Hi!

When creating the bucketing action for time splits, there's a check for the floorability of the Duration: https://github.com/implydata/plywood/blob/master/src/expressions/timeBucketExpression.ts#L47

So some of the more interesting Durations are off-limits (P2D, for example). We generated a Druid query with a floorable Duration, changed the Duration value inside the query, and sent it to Druid; it worked fine.

Is it possible to loosen this requirement where Druid supports it?

Granularity builders should support origin parameter

Description

Queries should support origin in granularity builders, as defined in http://druid.io/docs/latest/querying/granularities.html. Right now the response from Druid starts from the nearest year, month, or week depending on the granularity. It should start from the query's start time.

Use Case

  • Users with time range filters on their UI want to make an area chart from the user's selected start time to their end time, with a defined granularity.
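In Druid's period granularity syntax (from the linked docs), the requested behavior corresponds to passing an origin alongside the period; the timestamp below is an example value standing in for the query's start time:

```json
{
  "type": "period",
  "period": "P1W",
  "timeZone": "Etc/UTC",
  "origin": "2018-01-01T00:00:00Z"
}
```

A granularity builder that accepts origin could simply thread this value through to the generated query.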

Split Query Limitation

Is there a limitation in plywood queries such that we can only split on up to 5 dimensions? I'm getting an error if I add more than 5 splits to the query. Please help.

Parent Dataset Attribute Not Resolving

In

Plywood.ply()
      .apply("my_datasource", $("my_datasource")
            .filter($("timestamp").in({start: new Date("2018-01-01"), end: new Date() }))
            .split({UserId: '$user__id'}, 'visitorTypes')
            .apply('user__is_new', $("my_datasource").max('$user__is_first_session'))
            .filter($('user__id').in($('visitorTypes').filter($('user__is_new').is(0)).collect($('UserId'))))
      )
      .apply('data', $("my_datasource").count())

Plywood throws

Error: could not resolve $user__id

on

if (!myTypeContext) {

How to define JavaScript post-aggregator with plywood ?

Hello Plywood team,

I would like to know how to define a JavaScript post-aggregator with plywood.

In my use case, I need to apply the formula similar to following one:

$data.filter($type == "A").sum($count) * 1 + $data.filter($type == "B").sum($count) * 2 + $data.filter($type == "C").sum($count) * 3 + ...

Plywood generates a query with many filtered aggregators executed on the Druid side, which is very slow.

To improve performance, I could split on the type dimension and apply a post-aggregator for the final result, but I can't find a way to add custom post-aggregators with plywood.
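For reference, Druid does ship a JavaScript post-aggregator (it must be explicitly enabled in the Druid configuration). Assuming filtered aggregators named count_A, count_B, and count_C (hypothetical names for this example), it looks roughly like:

```json
{
  "type": "javascript",
  "name": "weighted_count",
  "fieldNames": ["count_A", "count_B", "count_C"],
  "function": "function(a, b, c) { return a * 1 + b * 2 + c * 3; }"
}
```

The open question in this issue is how to get plywood to emit such a post-aggregator; the snippet only shows the Druid-side target.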

How can I modify the start day of the week after selecting time shift


When selecting a time shift, I want the week to start from Sunday.

query:

{ queryType: 'topN',
  dataSource: 'dataSource1',
  intervals: '2020-12-13T00Z/2020-12-27T00Z',
  granularity: 'all',
  context: { timeout: 600000 },
  virtualColumns:
   [ { type: 'expression',
       name: 'v:***__time',
       expression:
        'timestamp_floor(nvl(if((1608422400000<=__time && __time<1609027200000),__time,\'\'),timestamp_shift(__time,\'P1W\',1,\'Etc\\u002fUTC\')),\'P1W\',\'\',\'Etc\\u002fUTC\')',
       outputType: 'LONG' } ],
  dimension:
   { type: 'default',
     dimension: 'v:***__time',
     outputName: '***__time',
     outputType: 'LONG' },
  aggregations:
   [ { type: 'filtered',
       name: '!T_0',
       filter: [Object],
       aggregator: [Object] },
     { type: 'filtered',
       name: '!T_1',
       filter: [Object],
       aggregator: [Object] },
     { type: 'filtered',
       name: '!T_2',
       filter: [Object],
       aggregator: [Object] },
     { type: 'filtered',
       name: '!T_3',
       filter: [Object],
       aggregator: [Object] },
     { type: 'filtered',
       name: '!T_4',
       filter: [Object],
       aggregator: [Object] },
     { type: 'filtered',
       name: '!T_5',
       filter: [Object],
       aggregator: [Object] } ],
  postAggregations:
   [ { type: 'expression',
       expression: '604800000',
       name: 'MillisecondsInInterval' },
     { type: 'expression',
       expression:
        '((cast((("!T_0"+"!T_1")-"!T_2"),\'DOUBLE\')/604800000)*86400000)',
       name: 'elte_d' },
     { type: 'expression',
       expression:
        '((cast((("!T_3"+"!T_4")-"!T_5"),\'DOUBLE\')/604800000)*86400000)',
       name: '_previous__elte_d' },
     { type: 'expression',
       expression:
        '(((cast((("!T_0"+"!T_1")-"!T_2"),\'DOUBLE\')/604800000)*86400000)-((cast((("!T_3"+"!T_4")-"!T_5"),\'DOUBLE\')/604800000)*86400000))',
       name: '_delta__elte_d' } ],
  metric: { type: 'dimension', ordering: 'lexicographic' },
  threshold: 100 }

After modifying the dimension in the request parameters, it can start from Sunday.

Value of the modified dimension:

dimension:
   { type: 'extraction',
     dimension: '__time',
     outputName: '***__time',
     extractionFn:
      { type: 'timeFormat',
        granularity: { type: 'period', period: 'P1W', timeZone: 'Etc/UTC', origin: '1970-01-04T00Z' },  // add origin
        format: 'yyyy-MM-dd\'T\'HH:mm:ss\'Z',
        timeZone: 'Etc/UTC' } },

Changing the default dimensionSpec to an extraction dimensionSpec makes the week start from Sunday, but the result set is then incorrect. How can I make the week start from Sunday while keeping the results correct? Thanks!
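One alternative worth trying (an untested sketch, not verified against this query): keep the default dimensionSpec and instead pass an origin to `timestamp_floor` inside the virtual column, since Druid's `timestamp_floor(expr, period, [origin], [timezone])` accepts an optional origin and 1970-01-04 was a Sunday. The time-shift `nvl`/`if` wrapping from the original expression is omitted here for brevity.

```js
virtualColumns: [{
  type: 'expression',
  name: 'v:***__time',
  // origin '1970-01-04T00:00:00Z' is a Sunday, so P1W buckets floor to Sunday
  expression: "timestamp_floor(__time, 'P1W', '1970-01-04T00:00:00Z', 'Etc/UTC')",
  outputType: 'LONG'
}]
```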

concurrent limit requester getting stuck

We recently started observing Plywood seemingly getting stuck. After some investigation we realized the concurrent limit requester was getting stuck: from its perspective, the number of in-flight requests never decreased.

I am not very familiar with Node's streaming API, but we patched the requester to listen to the close event instead of end and error, and we haven't observed a similar issue since. However, I am not 100% sure this actually fixed the issue, since it only popped up sporadically, nor whether there are any drawbacks to using close. The Node documentation states that the end event will not be emitted unless all data is read from the stream, and my assumption is that we are somehow hitting that edge case.

Timezone not applied to timeBucket without any split

Hello Team Plywood,

When I use timeBucket to split the results by P1H with a timezone and without any other splits, the timezone is not applied to the timestamp.

When I use timeBucket to split the results by P1H with the same timezone and with other splits, the timezone is applied to the timestamp.

So the behavior of timeBucket differs depending on whether there is another split. Could you please look into this problem?
