queryproc / optimizing-subgraph-queries-combining-binary-and-worst-case-optimal-joins

Code for the paper titled "Optimizing Subgraph Queries by Combining Binary and Worst-Case Optimal Joins". VLDB'19

Home Page: http://amine.io/papers/wco-optimizer-vldb19.pdf

Shell 0.01% Python 3.04% ANTLR 0.90% Java 96.04%

optimizing-subgraph-queries-combining-binary-and-worst-case-optimal-joins's Introduction

Hi there 👋

I am an assistant professor at Polytechnique Montréal leading the Data Systems Group.

optimizing-subgraph-queries-combining-binary-and-worst-case-optimal-joins's People

Contributors

Stargazers

Watchers

Forkers

curiosityyy lmatz qsguo zhengyi-yang yuchen-ecnu kangfei zhengtongyan fabianmurariu g31pranjal danhlephuoc avudzor edison0521 tonyyxliu hongtaicao lxhq anhlt18vn

optimizing-subgraph-queries-combining-binary-and-worst-case-optimal-joins's Issues

No such file or directory

After building Graphflow and converting the SNAP file to edges.csv, I ran the following command:
python serialize_dataset.py /absolute/dataset/edges.csv /absolute/data
but I got the following error:
No such file or directory: '/GRAPHFLOW_HOME/build/install/graphflow/bin/dataset-serializer': '/GRAPHFLOW_HOME/build/install/graphflow/bin/dataset-serialize
It seems that the build step did not generate all of Graphflow.
I also cloned the graphflow repo with git. That build does generate '/GRAPHFLOW_HOME/build/install/graphflow/bin/', but not the scripts to load data.
Is something wrong with the build?

Looking forward to your reply!
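
A quick way to narrow this down is to list what the build actually placed in the install directory. The following is only a sketch: the class name LauncherCheck and the default path are hypothetical and not part of the repository, and the real directory should be passed as the first argument.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper: reports which entries exist in a build's bin directory. */
final class LauncherCheck {

    /** Returns the sorted entry names directly inside binDir, or an empty list if binDir is not a directory. */
    static List<String> listLaunchers(Path binDir) {
        List<String> names = new ArrayList<>();
        if (!Files.isDirectory(binDir)) {
            return names;
        }
        try (Stream<Path> entries = Files.list(binDir)) {
            entries.forEach(p -> names.add(p.getFileName().toString()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Assumption: the bin directory is given as the first argument;
        // the fallback below is a relative guess, not a documented location.
        Path binDir = Paths.get(args.length > 0 ? args[0] : "build/install/graphflow/bin");
        List<String> names = listLaunchers(binDir);
        System.out.println("Entries in " + binDir + ": " + names);
        // The script in the error message looks for a launcher with this name.
        System.out.println("dataset-serializer present: " + names.contains("dataset-serializer"));
    }
}
```

If dataset-serializer is missing from the listing, the Python script has nothing to invoke, which would be consistent with the error above.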

Query results do not match

The number of output tuples reported is often 1 or 2 fewer than the actual number of matches.

Reproduce:

edges.csv:

9,1,3
9,4,3
12,1,0
9,1,2
9,6,4
2,1,3
5,1,2
9,6,3
12,1,2
11,2,4

vertices.csv:

0,1
1,1
2,0
3,1
4,2
5,1
6,1
7,1
8,1
9,1
10,2
11,0
12,2

commands:

root@dc124d4957e7:~# rm -r /root/data/graphflow/
root@dc124d4957e7:~# mkdir /root/data/graphflow/
root@dc124d4957e7:~# python3 eva_graphflow_stream/scripts/serialize_dataset.py /root/data/edges.csv /root/data/graphflow/ -v /root/data/vertices.csv
[INFO ][2023-07-04 16:11:17.845] KeyStore: Serializing the types and labels key store.
[INFO ][2023-07-04 16:11:17.853] Graph: Serializing the data graph.
root@dc124d4957e7:~# JAVA_OPTS='-Xmx500G' python3 eva_graphflow_stream/scripts/serialize_catalog.py /root/data/graphflow/ -v 2
[INFO ][2023-07-04 16:12:07.315] Catalog: serializing the data graph's catalog.
root@dc124d4957e7:~# python3 eva_graphflow_stream/scripts/execute_query.py "(a:1)-[3]->(b:1)" /root/data/graphflow/
(a:1)-[3]->(b:1)
[INFO ][2023-07-04 16:12:18.357] OptimizerExecutor: Dataset loading run time: 115.204859 (ms)
[INFO ][2023-07-04 16:12:18.370] OptimizerExecutor: Optimizer run time: 10.196823 (ms)
[INFO ][2023-07-04 16:12:18.372] OptimizerExecutor: Plan initialization before exec run time: 10.196823 (ms)
[INFO ][2023-07-04 16:12:18.374] OptimizerExecutor: Query execution run time: 0.0371 (ms)
[INFO ][2023-07-04 16:12:18.374] OptimizerExecutor: Number output tuples: 2
[INFO ][2023-07-04 16:12:18.375] OptimizerExecutor: Number intermediate tuples: 0
[INFO ][2023-07-04 16:12:18.375] OptimizerExecutor: Plan: SCAN (a)->(b)

number of output tuples:
expected 3
actual 2.

About the size of the adjacency list

In the class SortedAdjList, there is a function named "size". However, I think it should use "-" in the return statement, because offset[i] is the number of all entries whose type (label) is not bigger than i.
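
To illustrate the reasoning, here is a minimal sketch of a size computation over cumulative counts. It is not the repository's SortedAdjList: the class name CumulativeLabelCounts and the array layout are assumptions taken only from the description above, where entry i holds the number of neighbours whose label is not bigger than i.

```java
/** Hypothetical stand-in for an adjacency list whose neighbours are grouped by label. */
final class CumulativeLabelCounts {
    // cumulative[i] = number of neighbours whose label is <= i (assumed layout).
    private final int[] cumulative;

    CumulativeLabelCounts(int[] cumulative) {
        this.cumulative = cumulative.clone();
    }

    /** Number of neighbours carrying exactly this label. */
    int size(int label) {
        int upTo = cumulative[label];
        int before = (label == 0) ? 0 : cumulative[label - 1];
        return upTo - before; // a difference of two running totals, not a sum
    }

    public static void main(String[] args) {
        // Two neighbours with label 0, none with label 1, three with label 2.
        CumulativeLabelCounts adj = new CumulativeLabelCounts(new int[] {2, 2, 5});
        for (int label = 0; label < 3; label++) {
            System.out.println("label " + label + ": " + adj.size(label));
        }
    }
}
```

With running totals, adding two adjacent entries would count the earlier labels twice, so only the difference gives the per-label size.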

OutOfMemoryError:

Hello, thank you for the open-source code. I tried setting JAVA_OPTS='-Xmx50G', but it does not seem to take effect. Could you please help me fix this?
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at ca.waterloo.dsg.graphflow.query.QueryGraph.addQEdgeToQGraph(QueryGraph.java:107)
at ca.waterloo.dsg.graphflow.query.QueryGraph.addEdge(QueryGraph.java:95)
at ca.waterloo.dsg.graphflow.query.QueryGraph$$Lambda$63/0x00000008001b7c40.accept(Unknown Source)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
at ca.waterloo.dsg.graphflow.query.QueryGraph.addEdges(QueryGraph.java:53)
at ca.waterloo.dsg.graphflow.query.QueryGraph.copy(QueryGraph.java:234)
at ca.waterloo.dsg.graphflow.planner.catalog.CatalogPlans.setNoops(CatalogPlans.java:277)
at ca.waterloo.dsg.graphflow.planner.catalog.CatalogPlans.setNextOperators(CatalogPlans.java:163)
at ca.waterloo.dsg.graphflow.planner.catalog.CatalogPlans.setNextOperators(CatalogPlans.java:168)
at ca.waterloo.dsg.graphflow.planner.catalog.Catalog.populate(Catalog.java:233)
at ca.waterloo.dsg.graphflow.runner.dataset.CatalogSerializer.main(CatalogSerializer.java:62)
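
One way to check whether the -Xmx setting reaches the JVM at all is to print the heap limit from inside a Java process started in the same environment. The sketch below uses only standard JDK calls. The class name HeapCheck is hypothetical, and running it separately only shows what a plain java process sees, not what the Graphflow launcher scripts pass on.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical diagnostic: shows the heap limit and JVM flags of the current process. */
final class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // The environment variable as this process sees it (null if unset).
        System.out.println("JAVA_OPTS: " + System.getenv("JAVA_OPTS"));

        // Flags the JVM was actually started with, e.g. -Xmx50G if it was applied.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);

        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

If the JVM arguments list does not contain the -Xmx flag, the option was not forwarded to the process, and the heap stays at the JVM default regardless of the value in JAVA_OPTS.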
