Right now we are using the whole space for querying.

Filter Pushdown on indexed columns about qbeast-spark HOT 6 CLOSED

osopardo1 commented on June 30, 2024

Filter Pushdown on indexed columns

from qbeast-spark.

Comments (6)

osopardo1 commented on June 30, 2024

Based on recent findings, we can skip the filter pushdown step, since the filters we need are already passed to the matchingFiles method of OTreeIndex:

qbeast-spark/src/main/scala/io/qbeast/spark/sql/files/OTreeIndex.scala

Line 64 in 7edbfc6

dataFilters: Seq[Expression]): Seq[AddFile] = {

The result of running:

val df = spark.read.format("qbeast").load("/tmp/qbeast_table")
df.filter("user_id >= 537764969").explain(true)

Shows the user_id filter as part of the DataFilters on the last step:

== Physical Plan ==
*(1) Filter (isnotnull(user_id#1762) AND (user_id#1762 >= 537764969))
+- *(1) ColumnarToRow
   +- FileScan parquet [event_time#1755,event_type#1756,product_id#1757,category_id#1758L,category_code#1759,brand#1760,price#1761,user_id#1762,user_session#1763] Batched: true, ...
DataFilters: [isnotnull(user_id#1762), (user_id#1762 >= 537764969)],...
Format: Parquet, Location: OTreeIndex[file:/tmp/qb-testing6614606061063411331], PartitionFilters: [], PushedFilters: [IsNotNull(user_id), GreaterThanOrEqual(user_id,537764969)], ReadSchema: struct<event_time:string,event_type:string,product_id:int,category_id:bigint,category_code:string...

alexeiakimov commented on June 30, 2024

Just out of curiosity, what will the DataFilters collection contain if the original filter uses OR instead of AND?

osopardo1 commented on June 30, 2024

Good question! My guess is that they would not appear in DataFilters, but I will check it.

osopardo1 commented on June 30, 2024

They are all present as a single Expression:

DataFilters: [(((user_id#1762 >= 537764969) OR ((user_id#1762 < 666666666) AND (product_id#1757 >= 6789009)))

alexeiakimov commented on June 30, 2024

So if we want precise filtering, we still need to work with the AST, right?

osopardo1 commented on June 30, 2024

Correct. For now, I think we should work only with conjunctions. Spark has functions for splitting predicates that we can reproduce, as Delta and other partition-aware formats do.
But maybe @cugni has other hints.

Edited: In fact, for a query with only conjunctive predicates, Spark itself already separates them into different Expressions.
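
To make the conjunction-only idea concrete, here is a minimal sketch. It does not use Spark or qbeast-spark code: Expr and its case classes, Interval, WholeSpace and rangesFor are all hypothetical stand-ins for Catalyst's Expression tree and for whatever the index uses as a query space. In real code, I believe the splitting step could rely on splitConjunctivePredicates from Spark's PredicateHelper trait instead of the hand-written splitConjuncts below.

```scala
// Hypothetical, Spark-free model of the idea; none of these types exist in qbeast-spark.
object ConjunctionSketch {
  sealed trait Expr
  final case class And(left: Expr, right: Expr) extends Expr
  final case class Or(left: Expr, right: Expr) extends Expr
  final case class GreaterThanOrEqual(column: String, value: Long) extends Expr
  final case class LessThan(column: String, value: Long) extends Expr

  // Flatten nested ANDs into individual conjuncts.
  // Anything else, including an OR, is returned as a single opaque expression.
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => Seq(other)
  }

  // Half-open range [from, to); Long.MinValue / Long.MaxValue stand for "unbounded".
  final case class Interval(from: Long, to: Long)
  val WholeSpace: Interval = Interval(Long.MinValue, Long.MaxValue)

  // Narrow the range of each indexed column using only simple conjuncts.
  def rangesFor(filters: Seq[Expr], indexedColumns: Set[String]): Map[String, Interval] = {
    val start = indexedColumns.map(c => c -> WholeSpace).toMap
    filters.flatMap(splitConjuncts).foldLeft(start) {
      case (acc, GreaterThanOrEqual(c, v)) if acc.contains(c) =>
        acc.updated(c, acc(c).copy(from = math.max(acc(c).from, v)))
      case (acc, LessThan(c, v)) if acc.contains(c) =>
        acc.updated(c, acc(c).copy(to = math.min(acc(c).to, v)))
      // ORs and filters on non-indexed columns are ignored here:
      // the range stays as wide as before, which is safe but not precise.
      case (acc, _) => acc
    }
  }
}
```

With this model, a filter such as user_id >= 537764969 AND user_id < 666666666 narrows the user_id range, while the OR expression from the earlier comment leaves it at WholeSpace. Spark would still apply the full Filter on top of the scan, so ignoring a predicate here only costs reading extra files, not correctness.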
