kuseman / payloadbuilder

SQL query engine

License: Apache License 2.0

Java 99.60% ANTLR 0.40%

payloadbuilder's Issues

ESCatalog: Add support for nested predicates

Today, if a mapping contains a nested type, a simple where clause on a nested field fails because the predicate is not wrapped in a nested filter.

{
  "nestedType": {
    "type": "nested",
    "properties": {
      "value": {
        "type": "integer"
      }
    }
  }
}

select *
from _doc
where nestedType.value = 10

This should work and should produce a (partial) body:

{
  "nested": {
    "path": "nestedType",
    "filter": {
      "term": {
        "value": 10
      }
    }
  }
}
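
As a rough illustration of the wrapping described above, the following sketch builds the partial body for a single term predicate. The class and method names are hypothetical and not part of ESCatalog; the sketch assumes the nested path and the field name have already been split apart from the qualified name nestedType.value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of ESCatalog
final class NestedFilterSketch
{
    /** Wraps a term predicate (field = value) in a nested filter for the given path. */
    static Map<String, Object> nestedTerm(String nestedPath, String field, Object value)
    {
        Map<String, Object> term = new LinkedHashMap<>();
        term.put(field, value);

        Map<String, Object> filter = new LinkedHashMap<>();
        filter.put("term", term);

        Map<String, Object> nested = new LinkedHashMap<>();
        nested.put("path", nestedPath);
        nested.put("filter", filter);

        // Outer object mirrors the partial body shown in the issue
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("nested", nested);
        return body;
    }
}
```

Calling nestedTerm("nestedType", "value", 10) yields a map with the same structure as the body above; serializing it to JSON is left to whatever JSON library the catalog already uses.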

Core: Add support for extracting select values in to variables

select top 1 @var = t.col
from table t

Should be a runtime error if not all select items are assignments, i.e. one cannot mix select items and assignment items
Should not yield any result set
If multiple rows are returned, use the last tuple's values as the assignment values
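
The three rules above could be sketched roughly as follows. All types here are hypothetical stand-ins (rows are plain maps, a select item is a variable/column pair); the real engine works on tuples and expressions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of assignment-select semantics, not the engine's actual classes
final class SelectAssignmentSketch
{
    /** One select item; variable is null for a plain (non-assignment) item. */
    static final class Item
    {
        final String variable;
        final String column;

        Item(String variable, String column)
        {
            this.variable = variable;
            this.column = column;
        }
    }

    /** Returns the variable values after processing all rows; no result set is produced. */
    static Map<String, Object> assign(List<Item> items, List<Map<String, Object>> rows)
    {
        long assignments = items.stream().filter(i -> i.variable != null).count();
        if (assignments > 0 && assignments != items.size())
        {
            throw new IllegalStateException("Cannot mix select items and assignment items");
        }

        Map<String, Object> variables = new HashMap<>();
        for (Map<String, Object> row : rows)
        {
            for (Item item : items)
            {
                if (item.variable != null)
                {
                    // Later rows overwrite earlier ones, so the last row wins
                    variables.put(item.variable, row.get(item.column));
                }
            }
        }
        return variables;
    }
}
```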

Add support for elasticsearch async requests

Without narrow filters you often get big, heavy requests that take a long time to fully load in the PLB. Async requests could possibly be used to alleviate that.

Core: Stricter table source alias policy

If a query like this is executed:

Select * From purchase p Where s.amount > 10

i.e. with a misspelling or similar, the query runs to completion without any hits. The intention here was clearly p.amount > 10.

To avoid such mistakes, introduce these rules:

1. The only time an alias is optional is a single select without joins
2. All column references must reference a table source with an alias.

This way, in the example above, rule number 2 would kick in and we would get an "Invalid table source reference 's'" error.
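
A minimal sketch of such a check, assuming column references are available as dotted strings and table sources as a set of aliases. The class name, the method and the message for unqualified references are hypothetical; only the "Invalid table source reference" message comes from the text above.

```java
import java.util.List;
import java.util.Set;

// Hypothetical validation step, not the engine's actual analyzer
final class AliasPolicySketch
{
    /**
     * Validates column references such as "p.amount" against the declared aliases.
     * An empty alias set models a single select without joins and without alias.
     */
    static void validate(Set<String> tableAliases, List<String> columnReferences)
    {
        if (tableAliases.isEmpty())
        {
            return; // rule 1: alias is optional, nothing to check
        }
        for (String reference : columnReferences)
        {
            int dot = reference.indexOf('.');
            if (dot < 0)
            {
                throw new IllegalArgumentException("Column reference '" + reference + "' must be qualified with a table source alias");
            }
            String alias = reference.substring(0, dot);
            if (!tableAliases.contains(alias))
            {
                // rule 2: the first qualifier part must be a known alias
                throw new IllegalArgumentException("Invalid table source reference '" + alias + "'");
            }
        }
    }
}
```

For the query above, validate(Set.of("p"), List.of("s.amount")) would throw instead of silently returning no hits.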

Editor: Asterisk selects return wrong columns

When selecting asterisk data from an alias and the columns change between rows, null is returned for a column whose actual value is not null.

The problem is located in PayloadBuilderService, where a change of columns between rows is not correctly detected.

Core: Add support to cache table rows

To avoid as much IO as possible it would be preferable to be able to cache as much as possible, even between different queries.

Add support for a new kind of table option(s)

with (cacheKey = <cache-expression>, cacheTTL = 10)

This only makes sense when the cached table source has an index; otherwise caching is hard because a scan is the only alternative.

So for example (assuming an index on tableB)

select * from tableA a inner join tableB b with (cacheKey = listOf(a.id, @constant), cacheTTL=10) on b.id = a.id and b.active

This will put a cache operator in front of the index operator for tableB.

BatchHashJoin
- scan(tableA)
- cache (key = listOf(a.id, @constant), TTL=10)
  - index(tableB)

The cache operator will collect the keys that are missing from the cache, fetch those from downstream and put them in the cache.
The cache framework should be configurable from QuerySession
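
The behaviour of the cache operator could look roughly like the sketch below. Everything here is an assumption for illustration: the cache is a plain Map (TTL handling and the QuerySession wiring are left out), and the downstream index operator is modelled as a function from missing keys to rows per key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache operator; K is the evaluated cacheKey, V a row
final class CacheOperatorSketch<K, V>
{
    private final Map<K, List<V>> cache; // stand-in for a pluggable cache, TTL omitted
    private final Function<List<K>, Map<K, List<V>>> downstream;

    CacheOperatorSketch(Map<K, List<V>> cache, Function<List<K>, Map<K, List<V>>> downstream)
    {
        this.cache = cache;
        this.downstream = downstream;
    }

    Map<K, List<V>> get(List<K> keys)
    {
        Map<K, List<V>> result = new LinkedHashMap<>();
        List<K> missing = new ArrayList<>();
        for (K key : keys)
        {
            List<V> rows = cache.get(key);
            if (rows == null)
            {
                missing.add(key);
            }
            else
            {
                result.put(key, rows);
            }
        }
        if (!missing.isEmpty())
        {
            // Only the missing keys are sent to the downstream (index) operator
            Map<K, List<V>> fetched = downstream.apply(missing);
            for (K key : missing)
            {
                List<V> rows = fetched.getOrDefault(key, Collections.emptyList());
                cache.put(key, rows); // empty results are cached too, to avoid refetching
                result.put(key, rows);
            }
        }
        return result;
    }
}
```

Caching empty results is a design choice of this sketch, not something the issue specifies.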

Predicate push down is wrong in some plans

Predicates are wrongly pushed down on left joins

select *
from tableA a
left join tableB b
 on b.col = a.col
where b.value <> ''

Here b.value <> '' is pushed down to tableB which is wrong.

In some plans we could rewrite the left join into an inner join if there is a predicate that checks for non-null values, but that will be for another time. For now I think it is best to never push anything down to a left joined table source and let the user handle that.
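
To see why the push down changes the result, here is a small self-contained comparison. It is only a sketch with hypothetical names: tableA is a list of keys, tableB maps col to value, and the join is simplified to at most one match per key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of the two plans, not the engine's operators
final class LeftJoinPushDownSketch
{
    /** Correct plan: left join first, then apply where b.value <> ''. */
    static List<String> filterAfterJoin(List<Integer> tableA, Map<Integer, String> tableB)
    {
        List<String> out = new ArrayList<>();
        for (Integer key : tableA)
        {
            String value = tableB.get(key); // null when there is no match
            // null <> '' is not true in SQL, so unmatched rows are filtered out as well
            if (value != null && !value.isEmpty())
            {
                out.add(key + ":" + value);
            }
        }
        return out;
    }

    /** Wrong plan: the predicate is applied to tableB before the left join. */
    static List<String> filterBeforeJoin(List<Integer> tableA, Map<Integer, String> tableB)
    {
        List<String> out = new ArrayList<>();
        for (Integer key : tableA)
        {
            String value = tableB.get(key);
            if (value != null && value.isEmpty())
            {
                value = null; // row was removed from tableB by the pushed down predicate
            }
            out.add(key + ":" + value); // left join keeps every tableA row
        }
        return out;
    }
}
```

With tableA containing the keys 1 and 2 and tableB containing the values "x" for 1 and "" for 2, the correct plan returns one row while the pushed down plan returns two, the second with a null value from tableB.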

ESCatalog: Refactor

Fetch the ES version to be able to build better abstractions regarding query building etc.

Detect single type version (_doc type at version 7.x approx)
Be able to differentiate between type string and keyword etc.

JsonOutputWriter doesn't write closing array

When having a setting like:

JsonSettings settings = new JsonSettings();
settings.setRowSeparator("\n");
settings.setResultSetsAsArrays(true);
return settings;

and only one result set, no end-array is written.
This is because it's handled inside initResult and should be handled in endResult

CsvOutputWriter/JsonOutputWriter doesn't support iterators

When a projection contains a lambda, e.g. p.map(x -> x.name), the output will be the toString of the iterator
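
One possible way to handle this in a JSON writer is to detect iterators and write their elements as an array. The sketch below uses a hypothetical helper and deliberately omits string escaping; it is not the actual JsonOutputWriter code.

```java
import java.util.Iterator;

// Hypothetical value rendering, not the actual JsonOutputWriter
final class IteratorOutputSketch
{
    /** Renders a value as JSON text; iterators become arrays. String escaping is omitted. */
    static String toJson(Object value)
    {
        if (value instanceof Iterator)
        {
            Iterator<?> it = (Iterator<?>) value;
            StringBuilder sb = new StringBuilder("[");
            while (it.hasNext())
            {
                sb.append(toJson(it.next()));
                if (it.hasNext())
                {
                    sb.append(',');
                }
            }
            return sb.append(']').toString();
        }
        if (value instanceof CharSequence)
        {
            return "\"" + value + "\"";
        }
        return String.valueOf(value);
    }
}
```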

Editor: Remember recent files

Quality of life improvement, either:

Preload recent files
Recent files easily accessible in UI

Catalog: Add Jmx catalog

Query a Jmx host for values
Multi host query (wildcard like ES?) Host group config?

Table alias on TableFunctions is a bit off

Having a sub query expression like:

select
(
   select x.col
   from open_rows(a) x
   for object
)
from table a

doesn't work today because the TVF is being resolved to the destination alias => a,
and having an alias and trying to use that alias yields a syntax error.

This can be a bit difficult to solve because the framework today lets TVFs resolve the alias to
properly resolve qualifiers, so there needs to be some link between x and the destination a above when resolving.

SessionBatchCache isn't following interface contract

The resulting map should contain all input keys.

        for (TKey key : keys)
        {
            CacheEntry<List<Tuple>> entry = cache.get(key);
            if (entry != null)     // <---- remove
            {
                result.put(key, entry.value);
            }
        }
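
A sketch of a lookup that honours the contract is shown below. It assumes that a cache miss is represented by a null value in the result map, which the issue does not state; the class and method names are hypothetical and the cache is a plain Map rather than the real SessionBatchCache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lookup; assumes a miss is reported as a null value
final class BatchCacheContractSketch
{
    /** Returns a map that contains every input key, with null for keys not in the cache. */
    static <K, V> Map<K, V> getAll(Map<K, V> cache, Iterable<K> keys)
    {
        Map<K, V> result = new LinkedHashMap<>();
        for (K key : keys)
        {
            // No null check: missing keys are put with a null value
            result.put(key, cache.get(key));
        }
        return result;
    }
}
```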

Add support for notifying missing indices

When a HashJoin is chosen while building the operator tree, only a missing index prevented a BatchHashJoin from being chosen. This information could be printed to the session printer.

Editor: Split application config and user prefs

Today both catalog extensions and user prefs (recent files etc.) reside in the same config.json.
This is problematic when releasing new versions with a bundled config file for extensions, since that would overwrite the user prefs.

Fix!

Catalog: Add JdbcCatalog

ESCatalog: Add searchable fields as index candidates

Today only __id is an index candidate, but every indexed/analyzed field in a mapping is a potential index candidate.
Those would need a query and not an mget to fetch.

Add support for this.

Aggregate functions are a bit off when having group bys

Today, when a query has a group by, an aggregate such as count doesn't know that it's contained in a group by. Hence count(1) counts the scalar value 1 rather than the rows of the group for the expression 1

From TSQL-doc:

COUNT(*) returns the number of items in a group. This includes NULL values and duplicates.
COUNT(ALL expression) evaluates expression for each row in a group, and returns the number of nonnull values.
COUNT(DISTINCT expression) evaluates expression for each row in a group, and returns the number of unique, nonnull values.

So COUNT(1) should be treated as COUNT([expr]), i.e. count the scalar 1 for each row in the group
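
The COUNT(expression) semantics quoted above can be sketched as follows, with a group modelled as a list of rows and the expression as a function on a row. The names are hypothetical and rows are plain maps rather than tuples.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical aggregate, not the engine's count function
final class GroupCountSketch
{
    /** Evaluates the expression for each row in the group and counts the non-null results. */
    static <R> long count(List<R> group, Function<R, Object> expression)
    {
        long count = 0;
        for (R row : group)
        {
            if (expression.apply(row) != null)
            {
                count++;
            }
        }
        return count;
    }
}
```

With this reading, count(group, row -> 1) equals the size of the group because the constant 1 is never null, while an expression that reads a column skips rows where that column is null.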

Editor: Misc

Always start with a blank file when opening application
Open multiple files in open dialog
Remember last open directory
Add menu alternative File -> Recent files (10 last opened files)

Grouping on multi qualified names doesn't work

This should work but doesn't

select obj.value
from table
group by obj.value

There is an issue in OperatorBuilderUtils#createGroupBy which doesn't handle multi qualifiers correctly

Don't allow multi TableAlias functions etc.

The resolving today is a bit messy because the framework allows functions to return multiple aliases,
e.g. unionall(alias1, alias2). These kinds of constructs need to go since they make the code too complex.

Instead we need to implement proper support for UNION operators etc.

ESCatalog: Cannot combine wildcard index with mget

A query to ES that uses an index yields an mget query to ES with IDs.
However, if the index property is of wildcard type (customer-*), an invalid query is made
and ES responds

{
    "docs": [
        {
            "_index": "customer-*",
            "_type": "type",
            "_id": "id-to-doc",
            "error": "[customer-*] missing"
        }
    ]
}

Detect wildcard index and switch mget to a regular query instead.
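
A minimal sketch of the detection, assuming the only wildcard that needs handling is an asterisk in the index name (comma separated index lists or aliases are not considered). The enum and method are hypothetical names.

```java
// Hypothetical strategy selection, not the actual ESCatalog code
final class EsFetchStrategySketch
{
    enum Strategy
    {
        MGET,
        QUERY
    }

    /** Chooses a regular query for wildcard indices since mget needs a concrete index. */
    static Strategy choose(String index)
    {
        return index.contains("*") ? Strategy.QUERY : Strategy.MGET;
    }
}
```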

Better parse errors

Add some logic in the query parser to produce better messages for common errors.

Also move the existing parser errors into a new class so it can be reused in Queryeer.

Core: Add support for select fields in subquery

Today it's only supported to query full row information in subqueries:

select *
from
(
  from table a
  inner join table b
    on b.col = a.col
) x

To fully comply with ANSI SQL we need:

select *
from
(
  select a.value, b.value    -- <------- THIS
  from table a
  inner join table b
    on b.col = a.col
) x