andygrove / how-query-engines-work

This is the companion repository for the book How Query Engines Work.

License: Apache License 2.0

Shell 1.25% Dockerfile 0.18% Kotlin 63.81% Java 20.72% Scala 14.04%

how-query-engines-work's Introduction

Hi there 👋

I'm Andy Grove, a software engineer specializing in distributed systems and query engines. I am the author of the book How Query Engines Work.

Open Source

I am a PMC member of the Apache Arrow and Apache DataFusion projects. I have made a number of code donations to these projects:

In 2018, I donated the Rust implementation of Apache Arrow
In 2019, I donated the DataFusion in-process SQL query engine
In 2021, I donated the Ballista distributed SQL query engine

I am also the original author of the sqlparser-rs crate.

how-query-engines-work's People

zznq jfz lvheyang mattyw harsha2010 codingcat fabianmurariu liliu88 mhatrep fedomn chenliu0831 rustin170506 xiaohan2013 abhishek-das-gupta waitingkuo wususu sel-fish fengys1996 chaojun-zhang at15 huaxiangsun adedayoominiyi yuan-yuan-jia duanmeng brucekellan huymq1710 risyomei yutiansut asad-awadia zen0fpy llama90 cyyeh tanvn jtt9340 lor-ela dborchard jalpan-randeri xuxiaotuan macohen zhangjiashen khalefa-ow manick02 ntileio4 rpmartz luyou-2023 dharanad

how-query-engines-work's Issues

If the sql statement contains (), an error will occur in SqlParser

If the WHERE clause of a SQL statement contains parentheses, SqlParser throws an error, e.g.: "select a from t where (a=1 or b = 2) and c =3"
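
For context, here is a minimal sketch of how a precedence-climbing (Pratt-style) expression parser can treat "(" as a prefix token: it parses the inner expression from the lowest precedence and then expects ")". This is not the repository's SqlParser. The names `Expr`, `Ident`, `Num`, `Binary`, `ExprParser`, `tokenize`, and `parseExpr` are hypothetical, and the sketch only handles identifiers, integers, `=`, `and`, `or`, and parentheses.

```kotlin
// Hypothetical mini-parser for illustration only; not the repository's SqlParser.
sealed class Expr
data class Ident(val name: String) : Expr()
data class Num(val value: Long) : Expr()
data class Binary(val op: String, val left: Expr, val right: Expr) : Expr()

// Split the input into identifiers, integers, "(", ")" and "=".
fun tokenize(sql: String): List<String> =
    Regex("""[A-Za-z_][A-Za-z0-9_]*|\d+|[()=]""").findAll(sql).map { it.value }.toList()

class ExprParser(private val tokens: List<String>) {
    private var pos = 0

    private fun peek(): String? = tokens.getOrNull(pos)
    private fun next(): String = tokens.getOrNull(pos++) ?: error("unexpected end of input")

    // Binding power of each infix operator; higher binds tighter.
    // Anything else (including ")" and end of input) returns 0 and ends the loop.
    private fun precedence(token: String?): Int = when (token?.lowercase()) {
        "or" -> 1
        "and" -> 2
        "=" -> 3
        else -> 0
    }

    fun parse(minPrecedence: Int = 0): Expr {
        var left = parsePrefix()
        while (true) {
            val p = precedence(peek())
            if (p <= minPrecedence) break
            val op = next().lowercase()
            left = Binary(op, left, parse(p))
        }
        return left
    }

    private fun parsePrefix(): Expr {
        val token = next()
        return when {
            // A left parenthesis starts a nested expression that must end with ")".
            token == "(" -> {
                val inner = parse(0)
                val close = next()
                check(close == ")") { "expected ')' but found '$close'" }
                inner
            }
            token.first().isDigit() -> Num(token.toLong())
            else -> Ident(token)
        }
    }
}

fun parseExpr(sql: String): Expr = ExprParser(tokenize(sql)).parse()
```

Calling `parseExpr("(a=1 or b = 2) and c =3")` on the WHERE condition from the report yields an `and` node whose left child is the parenthesized `or` expression, so the grouping survives even though `and` binds tighter than `or`.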

Typo in the online book example

An example under https://howqueryengineswork.com/05-logical-plan.html#math-expressions has a small typo: the field name should be name instead of "mult":

override fun toField(input: LogicalPlan): Field {
      return Field("mult", l.toField(input).dataType)
}
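
A self-contained sketch of what the corrected version might look like follows. The types here are simplified stand-ins and are assumptions: `dataType` is a plain string rather than an Arrow type, and the `MathExpr` constructor shape is a guess for illustration, not the book's exact code.

```kotlin
// Simplified stand-ins for the book's types (assumption: dataType is a String here).
data class Field(val name: String, val dataType: String)
class LogicalPlan(val fields: List<Field>)

interface LogicalExpr {
    fun toField(input: LogicalPlan): Field
}

// Reference to a named column in the input plan.
class Column(val name: String) : LogicalExpr {
    override fun toField(input: LogicalPlan): Field =
        input.fields.first { it.name == name }
}

// Base class for binary math expressions; each subclass supplies its own `name`.
abstract class MathExpr(
    val name: String,
    val op: String,
    val l: LogicalExpr,
    val r: LogicalExpr
) : LogicalExpr {
    // Uses `name` instead of a hard-coded "mult", so Add reports "add", and so on.
    override fun toField(input: LogicalPlan): Field =
        Field(name, l.toField(input).dataType)
}

class Add(l: LogicalExpr, r: LogicalExpr) : MathExpr("add", "+", l, r)
class Multiply(l: LogicalExpr, r: LogicalExpr) : MathExpr("mult", "*", l, r)
```

With the hard-coded string, every math expression would report a field named "mult"; with `name`, `Add(...)` reports "add" and only `Multiply(...)` reports "mult".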

null while reading parquet file

I'm trying to use the engine from Scala.
Pretty simple setup. Using example parquet file from testdata folder.
Code looks like this:

  val ctx = new ExecutionContext(Map.empty[String, String].asJava)
  val pqtSource = new ParquetDataSource("data/alltypes_plain.parquet")

  println(pqtSource.schema().toString)

  ctx.registerDataSource("pdata", pqtSource)
  val df2 = ctx.sql("select id,bool_col from pdata")
  val c2 = ctx.execute(df2).iterator().asScala.toList.map(r => println(r))

The first println statement works as expected and prints the schema:

Schema(fields=[Field(name=id, dataType=Int(32, true)), Field(name=bool_col, dataType=Bool), Field(name=tinyint_col, dataType=Int(32, true)), Field(name=smallint_col, dataType=Int(32, true)), Field(name=int_col, dataType=Int(32, true)), Field(name=bigint_col, dataType=Int(64, true)), Field(name=float_col, dataType=FloatingPoint(SINGLE)), Field(name=double_col, dataType=FloatingPoint(DOUBLE)), Field(name=date_string_col, dataType=Binary), Field(name=string_col, dataType=Binary), Field(name=timestamp_col, dataType=Binary)])

The second prints only nulls:

Reading 8 rows
null,null
null,null
null,null
null,null
null,null
null,null
null,null
null,null

What am I doing wrong? Or might it be a Scala incompatibility?

Small question/clarification about whether something would fit into Logical Plans or Physical Plans

To cement my understanding of the book, I'm working through it a second time, this time changing the implementation from using Arrow vectors + types (columnar) to row-based data (Map<String, Any>) and JDBC types:

data class Field(val name: String, val type: JDBCType)
data class Schema(val fields: List<Field>)

interface DataSource {
    fun schema(): Schema
    fun scan(projection: List<String>): Sequence<Map<String, Any?>>
}

// Physical plans return iterators over rows.
interface PhysicalPlan {
    fun schema(): Schema
    fun children(): List<PhysicalPlan>
    fun execute(): Sequence<Map<String, Any?>>
}

// Physical expression interface
interface Expression {
    fun evaluate(rows: Map<String, Any?>): Any?
}
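
To make the row-based interfaces above concrete, here is a minimal sketch of an in-memory data source and a scan operator built on them. `InMemoryDataSource` and `ScanExec` are hypothetical names, and treating an empty projection as "all columns" is an assumption of this sketch.

```kotlin
import java.sql.JDBCType

data class Field(val name: String, val type: JDBCType)
data class Schema(val fields: List<Field>)

interface DataSource {
    fun schema(): Schema
    fun scan(projection: List<String>): Sequence<Map<String, Any?>>
}

interface PhysicalPlan {
    fun schema(): Schema
    fun children(): List<PhysicalPlan>
    fun execute(): Sequence<Map<String, Any?>>
}

// Hypothetical source that keeps all rows in a list.
class InMemoryDataSource(
    private val schema: Schema,
    private val rows: List<Map<String, Any?>>
) : DataSource {
    override fun schema(): Schema = schema

    // Assumption: an empty projection means "return every column".
    override fun scan(projection: List<String>): Sequence<Map<String, Any?>> =
        rows.asSequence().map { row ->
            if (projection.isEmpty()) row else row.filterKeys { it in projection }
        }
}

// Leaf operator: lazily streams rows from the data source.
class ScanExec(
    private val ds: DataSource,
    private val projection: List<String>
) : PhysicalPlan {
    override fun schema(): Schema {
        val all = ds.schema()
        return if (projection.isEmpty()) all
        else Schema(all.fields.filter { it.name in projection })
    }

    override fun children(): List<PhysicalPlan> = emptyList()

    override fun execute(): Sequence<Map<String, Any?>> = ds.scan(projection)
}
```

Because `execute()` returns a `Sequence`, rows are only produced when a downstream operator or the caller iterates over them.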

My question is this:

If I have different "translation" strategies for converting Logical Plans to SQL, which then gets executed as, e.g., a JDBC query, would these be considered different Physical Plans, or would they be implementation details of the same Physical Plan?

For example, translating a Logical Plan to a SQL string with StringBuilder, versus using an ORM/query builder like jOOQ.
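
One way to frame the second reading in code, as a hypothetical sketch rather than an authoritative answer: a single physical operator receives the translation strategy as an injected dependency. `LogicalQuery`, `SqlTranslator`, `stringBuilderTranslator`, and `JdbcQueryExec` are all invented for illustration, the logical plan is flattened into one data class, and the `runQuery` function stands in for an actual JDBC call.

```kotlin
// Hypothetical, flattened logical plan: projection over an optional filter over a table.
data class LogicalQuery(
    val table: String,
    val columns: List<String>,
    val filter: String? = null
)

// Strategy for turning a logical plan into SQL text.
fun interface SqlTranslator {
    fun translate(plan: LogicalQuery): String
}

// Strategy 1: build the SQL string by hand with StringBuilder.
val stringBuilderTranslator = SqlTranslator { plan ->
    val sb = StringBuilder("SELECT ")
    sb.append(if (plan.columns.isEmpty()) "*" else plan.columns.joinToString(", "))
    sb.append(" FROM ").append(plan.table)
    plan.filter?.let { sb.append(" WHERE ").append(it) }
    sb.toString()
}

// One physical operator; how the SQL text is produced is an injected detail.
// `runQuery` is a stand-in for executing the SQL over JDBC.
class JdbcQueryExec(
    private val plan: LogicalQuery,
    private val translator: SqlTranslator,
    private val runQuery: (String) -> Sequence<Map<String, Any?>>
) {
    fun sql(): String = translator.translate(plan)
    fun execute(): Sequence<Map<String, Any?>> = runQuery(sql())
}
```

A query-builder-based strategy would be another `SqlTranslator` implementation passed to the same operator. Under this framing the operator, and therefore the physical plan, stays the same and only the translator changes; whether that is the right boundary is exactly what the question asks.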

Example should be 12 months instead of 2

I think there is a typo ;)

https://github.com/ballista-compute/ballista-jvm/blob/ec55780c9985ddbc3a23d64cce575522e923ac8b/jvm/examples/src/main/kotlin/ParallelQuery.kt#L31

Page 43, `createPhysicalExpr` uses `PhysicalExpr` as return type but updated type name is just `Expression`

how-query-engines-work/jvm/query-planner/src/main/kotlin/QueryPlanner.kt

Line 62 in 9c27d47

fun createPhysicalExpr(expr: LogicalExpr, input: LogicalPlan): Expression =
