Code Monkey home page Code Monkey logo

hyrise's People

Contributors

adi64 avatar apopiak avatar bensk1 avatar beta-alf avatar bouncner avatar carstenwalther avatar cmfcmf avatar dukeharris avatar engelfa avatar f4lco avatar fabianwiebe avatar hendraet avatar j-beyer avatar jfrohnhofen avatar johannalatt avatar jstriebel avatar klauck avatar lanice avatar lawben avatar maxifischer avatar michael-janke avatar mjendruk avatar mrks avatar nilsthamm avatar normanrz avatar svenlehmann avatar tbjoern avatar timzimmermann avatar tonista avatar torpedro avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar  avatar

hyrise's Issues

Optimize sub-queries for <, <=, >, >=, =, <>

The LQP optimizer should use a join for queries like

SELECT *
FROM customer
WHERE c_acctbal < (
  SELECT MIN(o_totalprice)
  FROM orders
  WHERE o_custkey = c_custkey
)

(Returns all customers who cannot afford any of their orders.)

  • Implement this feature on a separate branch

Implement IN-reformulation for simple correlated queries

The LQP optimizer should use a join for a query like

SELECT *
FROM customer
WHERE c_custkey IN (
  SELECT o_custkey
  FROM orders
  WHERE o_totalprice > c_acctbal
)

(Returns all customers who have orders that they cannot afford.)

Incorrect reformulation with multiple aggregates

Example:

SELECT *
FROM customer
WHERE c_acctbal > (
  SELECT SUM(a.x)
  FROM (
    SELECT AVG(o_totalprice) AS x
    FROM orders
    WHERE o_custkey = c_custkey
    GROUP BY o_orderpriority
  ) AS a
)

(copy-paste for hyriseConsole: SELECT * FROM customer WHERE c_acctbal > (SELECT SUM(a.x) FROM (SELECT AVG(o_totalprice) AS x FROM orders WHERE o_custkey = c_custkey GROUP BY o_orderpriority) AS a))

Test SubqueryToJoinRule

  • Add test that rule creates anti-join instead of join with !=
  • Add red test that rule cannot handle duplicates yet
  • Adjust sqlite_testrunner_queries.sql, and add a new table if necessary

Tests for helper methods

extract_input_lqp_info

  • Success cases: (NOT) IN, (NOT) EXISTS, comparisons
  • (NOT) IN with constant list (NoRewriteConstantIn)
  • Correlated NOT IN (NoRewriteCorrelatedNotIn)
  • (NOT) IN and comparison: Left operand must be a column expression (NoRewriteIfLeftOperandIsNotAColumn, None that check for column expressions that are not table columns)
  • Uncorrelated (NOT) EXISTS not changed (NoRewriteUncorrelatedExists)

uses_correlated_parameters/assess_correlated_parameter_usage

  • Correlated parameter used in subquery
  • Correlated parameter used in join (NoRewriteIfJoinUsesCorrelatedParameter)
  • Correlated parameter used in projection (NoRewriteIfCorrelatedParameterInProjection)
  • Correlated parameter of outer subquery used
  • Counts correlated nodes and not number of correlated uses

try_to_extract_join_predicate

  • Predicate of unsupported type (NoRewriteCorrelatedNestedIn, NoRewriteCorrelatedNestedExists, ...)
  • Non-equals predicate below aggregate (NoRewriteIfCorrelatedParameterInPredicateOtherThanEqualsBelowAggregate)
  • Other side is not a column expression
  • Correlated parameter is from outer subquery/a placeholder

adapt_aggregate_node

  • No duplicate group by columns
  • Preserves aggregates

adapt_alias_node

  • No added duplicates
  • Preserves original duplicates
  • Preserves aliases of originals

adapt_projection_node

  • No added duplicates
  • Preserves original duplicates

find_pullable_predicate_nodes

  • Does not consider non-equals predicate below aggregate as pullable (overlaps with try_to_extract_join_predicate, basically just tests that the correct value for is_below_aggregate is passed to that method)
  • Finds nodes in certain input sides of joins:
    • inner/cross
    • outer join variants
    • semi/anti
  • Does not find nodes in the NULL-producing sides of outer joins
  • Does not find nodes in the right side of semi-/anti-joins
  • Does not find nodes below limits (NoRewriteIfCorrelatedParameterIsUsedBelowLimitNode)

copy_and_adapt_lqp

  • Does not change nodes from original LQP (important for multi output nodes)
  • Removes pullable predicates from the LQP
  • Note: Many cases are already covered by the tests for the adapt_* methods

Fix issue with recursive optimizations

Sample query:

SELECT s_suppkey
FROM supplier
WHERE s_suppkey IN (
  SELECT c_custkey
  FROM customer
  WHERE c_custkey IN (
    SELECT o_custkey
    FROM orders
    WHERE o_totalprice > c_acctbal AND o_totalprice > c_custkey AND o_totalprice > s_suppkey
  )
)

Single line version for pasting into hyriseConsole:

SELECT s_suppkey FROM supplier WHERE s_suppkey IN (SELECT c_custkey FROM customer WHERE c_custkey IN (SELECT o_custkey FROM orders WHERE o_totalprice > c_acctbal AND o_totalprice > c_custkey AND o_totalprice > s_suppkey))

Handle Alias nodes

Probably best done by extending their node_expressions to include all columns (since we join with a semi/anti join, we don't care about removing these columns again later).

Potential Hyrise bugs to report

  • SELECT COUNT(*) FROM orders WHERE o_orderkey < 150000000000000 overflows and returns 0 for TPCH (scale factor 0.1)
  • SELECT o_totalprice AS o, o_totalprice FROM orders returns two columns named o (is this expected SQL behavior?)

Run benchmarks

Test for different sizes for the subquery (1, 5, 150000)

  • Compare scan with hash join
  • TPC-H
    • Run compare_benchmarks.py
  • Microbenchmarks (generate and execute them with Google Benchmark Framework)
  • Post results in Telegram group

Further optimization of subqueries with comparators

SELECT c_custkey
FROM customer
WHERE c_acctbal < (
  SELECT MAX(o_totalprice)
  FROM orders
  WHERE o_custkey = c_custkey
)

returns all customers who have at least one order but cannot afford any of them.
We currently optimize it to an inner join that has c_acctbal < o_totalprice as its predicate.
However, the query could also be written in an even more efficient way that uses an equi join:

SELECT c_custkey
FROM customer
JOIN orders ON o_custkey = c_custkey
GROUP BY c_custkey, c_acctbal
HAVING c_acctbal < MAX(o_totalprice)

Specifically for this case, it is possible to boost it even further:

SELECT DISTINCT c_custkey
FROM customer
JOIN orders ON o_custkey = c_custkey
WHERE c_acctbal < o_totalprice

The SubqueryToJoinRule should apply these reformulation techniques when possible.

Implement optimization heuristic

Only optimize if it is worth it (e.g., only replace NL-JOIN with Hash-JOIN if at least 100 iterations are expected to be performed)

Optimize EXISTS

The LQP optimizer should use a join for queries like

SELECT *
FROM customer
WHERE EXISTS (
  SELECT o_custkey
  FROM orders
  WHERE o_custkey = c_custkey AND o_totalprice > c_acctbal
)

(Returns all customers who have orders that they cannot afford.)

Implement reformulation for multi-correlated queries

  • Find branch in Hyrise repo for multi-predicate JOINs (only supports Equi-JOINs)
  • Use it in IN-reformulation rule
  • Use it in EXISTS-reformulation rule
  • Add TODO as comment to reformulate OR as soon as the multi-predicate join supports this (Issue)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.