gcore-spark's People

Contributors

aldanadiego, dianageo93, peterboncz, renzoar, rgarcias

gcore-spark's Issues

CONSTRUCT (v1 GROUP foo), (v1 GROUP bar IN v2)

Introduce an optional IN subclause for GROUP in CONSTRUCT (for example, "IN v1"). With it, v1 and v2 belong to the same domain, so v1 and v2 are the same vertex whenever the expressions foo and bar evaluate to the same value. This makes it possible to construct a self-referencing edge. The use case for which Hannes Voigt championed this feature is graph summarization.

  • Change in spoofax parser
  • Change in CONSTRUCT translation
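
To make the intended effect concrete, here is a minimal sketch in plain Python. It is not the gcore-spark translation. It assumes the binding table is a list of dicts, that foo and bar are column names, and that "same domain" means both grouping values are looked up in one shared identity map; the names construct_grouped, src_city and dst_city are hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple


def construct_grouped(
    bindings: List[Dict[str, Hashable]], foo: str, bar: str
) -> Tuple[Dict[Hashable, int], Set[Tuple[int, int]]]:
    """Build summary vertices and edges for (v1 GROUP foo)->(v2 GROUP bar IN v1)."""
    ids: Dict[Hashable, int] = {}   # shared domain: grouping value -> vertex id
    edges: Set[Tuple[int, int]] = set()
    for row in bindings:
        # Both endpoints draw their identity from the same map, so equal
        # grouping values yield the same constructed vertex.
        src = ids.setdefault(row[foo], len(ids))
        dst = ids.setdefault(row[bar], len(ids))
        edges.add((src, dst))
    return ids, edges


rows = [
    {"src_city": "Lima", "dst_city": "Lima"},
    {"src_city": "Lima", "dst_city": "Oslo"},
]
vertex_ids, summary_edges = construct_grouped(rows, "src_city", "dst_city")
```

In this example the first row produces a self-referencing edge on the "Lima" vertex, which is the summarization case mentioned above.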

Add specific functions to make it easier to work with paths

This work should start, and the functions should be defined, once the system works stably and we begin testing it for trajectory storage and analysis applications.

For example, if we want to select paths from a repository of stored paths that pass through two nodes, cut them out, and then feed them into GROUP BY paths, we would need at least:

  • path_intersects(path,vertex) : bool
  • path_cut(path,vertex,vertex) : path
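
A minimal sketch of these two functions in plain Python follows. It assumes a path is an ordered list of vertex ids (the stored-path representation in gcore-spark may differ) and that path_cut returns the segment from the first occurrence of the first vertex to the next occurrence of the second; both are assumptions, not agreed semantics.

```python
from typing import List, Optional

Path = List[int]  # assumption: a path is an ordered list of vertex ids


def path_intersects(path: Path, vertex: int) -> bool:
    """Return True if the path passes through the given vertex."""
    return vertex in path


def path_cut(path: Path, start: int, end: int) -> Optional[Path]:
    """Return the sub-path from the first `start` to the next `end`, inclusive.

    Returns None when the path does not visit `start` and later `end`.
    """
    if start not in path:
        return None
    i = path.index(start)
    try:
        j = path.index(end, i)
    except ValueError:
        return None
    return path[i:j + 1]


stored_paths = [[1, 2, 3, 4], [5, 2, 6], [4, 3, 2, 1]]
# Keep only the paths through both 2 and 3, then cut out the 2..3 segment.
cuts = [
    path_cut(p, 2, 3)
    for p in stored_paths
    if path_intersects(p, 2) and path_intersects(p, 3)
]
```

Note that the third path visits 3 before 2, so under this direction-sensitive reading its cut is None; whether cuts should respect direction is an open design choice.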

Construct with empty nodes

Describe the bug
The following query does not work: Construct () match ()
The problem seems to be the empty node.

Unsupported expression: property p.prop = ""

For an expression of the form: construct (n) match (n) where n.name = "";
The parser assumes that n.name = n.""

| + MatchClause
| | + CondMatchClause
| | | + SimpleMatchClause
| | | | + GraphPattern
| | | | | + Vertex
| | | | | | + Reference [n]
| | | | | | + ObjectPattern
| | | | | | | + True$ [true, GcoreBoolean$]
| | | | | | | + True$ [true, GcoreBoolean$]
| | | | + DefaultGraph$
| | | + PropertyRef [n.""]

Error saving new graph

When a graph is saved, the application shows an error and does not save the graph to disk.

Support creating stored paths

This means creating the path dataframes, which contain a src_id, a dst_id and an edge_list. Currently, these are not yet created in CONSTRUCT.

CONSTRUCT .. ()-[e HAVING …]-()

This feature applies a selection after grouping on the binding table for a construct pattern.

  • spoofax parser needs to be extended
  • Translation needs to be created (in CONSTRUCT)
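
The "selection after grouping" can be illustrated with a small plain-Python sketch. It assumes the binding table is a list of dicts and models the HAVING condition as a predicate over the rows of one group; the function name group_having and the sample columns are hypothetical, and this is not the gcore-spark translation.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def group_having(
    binding_table: List[Row],
    group_keys: Tuple[str, ...],
    having: Callable[[List[Row]], bool],
) -> Dict[tuple, List[Row]]:
    """Group the binding table, then keep only groups that satisfy `having`."""
    groups: Dict[tuple, List[Row]] = defaultdict(list)
    for row in binding_table:
        groups[tuple(row[k] for k in group_keys)].append(row)
    # The selection runs on whole groups, i.e. after grouping.
    return {key: rows for key, rows in groups.items() if having(rows)}


bindings = [
    {"src": "a", "dst": "b", "amount": 10},
    {"src": "a", "dst": "b", "amount": 5},
    {"src": "a", "dst": "c", "amount": 1},
]
# Construct an edge only for (src, dst) pairs backed by at least two bindings.
edges = group_having(bindings, ("src", "dst"), lambda rows: len(rows) >= 2)
```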

Support the FROM table T clause

  • adds T as a binding table variable,
  • creates the new binding table as cartesian product between original binding table and T
  • support the columns of T as single-valued properties (this entails usage of properties in SELECT, MATCH and CONSTRUCT)
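
A minimal sketch of the binding-table step in plain Python, assuming rows are dicts and that the columns of T are exposed under a "T.column" naming scheme (that naming scheme and the function name from_table are assumptions for illustration only):

```python
from itertools import product
from typing import Dict, List

Row = Dict[str, object]


def from_table(binding_table: List[Row], table: List[Row], name: str) -> List[Row]:
    """Cartesian product of the binding table with table T.

    Each column of T becomes a single-valued property `name.column`.
    """
    return [
        {**binding, **{f"{name}.{col}": val for col, val in t_row.items()}}
        for binding, t_row in product(binding_table, table)
    ]


bindings = [{"n": 1}, {"n": 2}]
T = [{"city": "Lima"}, {"city": "Oslo"}]
result = from_table(bindings, T, "T")
```

The result has one row for every combination of an original binding and a row of T, so its size is the product of the two input sizes.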

Implement missing semantic checks

  1. Expression types match for binary expressions (if possible to check)

  2. (?) An edge is between two distinct vertices

  3. Variable bindings in SimpleMatchClause have different names. Note: Bindings can be re-used across multiple SimpleMatchClauses. HOWEVER, edges should not be reused.

  4. Ambiguous labeling of entities. For example, in a query such as (v1:L1)->(v2)<-(v3), (v1:L2) the variable v1 is labeled differently in the two patterns.

  5. Eliminate similar queries? For example: (v1:L1)->(v2), (v3:L1)->(v4). Here v1 and v3 are the same vertex, v2 and v4 are the same vertex, and their edges are the same too. It is a repeated query, which we currently translate into a join over the two edges.

  6. Validate that all keys in edgeRestriction (GraphSchema) are present in the graph. Also validate that all values in edgeRestrictions are present in the graph.

  7. ALL PATHS can only be used with stored paths

  8. (?) Throw error or warn the user if, after label inference, an entity can have more than one label. This translates into a UNION ALL of all labels for that entity.

  9. Each match variable must be matched on only one graph. Validation should go in MapBindingToGraph.

  10. All variables in an edge or path pattern should be part of the same graph. Validation should go in MapBindingToGraph.

  11. (?) An EXISTS subquery must have at least one common variable with the main MATCH clause.

  12. UnionAll operator is applied on two relations with the same header.

  13. Property exists for given label, or exists for given entity type.

  14. Property types match with Expression types.

  15. (?) No two Tables contain the same id.

  16. A constructed entity must be of the same type as its matched counterpart, if they are the same variable (this can be checked in CreateGroupingSets).

  17. A specific GROUP clause can only be used with unbound variables. This can be checked in CreateGroupingSets.

  18. Check that each variable in CONSTRUCT ends up with at most one label. If the label is missing, we can create a new one in VertexCreate. This check can be done in CreateGroupingSets.

  19. Are aggregate expressions allowed in the MATCH’s WHERE clause?

  20. An entity can only be GROUP-ed once, or else the GROUP-ings must be combined.
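
As an illustration of how one of these checks could look, here is a plain-Python sketch of check 4 (ambiguous labeling). It assumes the patterns have already been flattened into (variable, label) pairs, with None for unlabeled occurrences; that representation and the function name ambiguous_labels are hypothetical and do not reflect the gcore-spark algebra tree.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Assumption: each pattern occurrence is a (variable name, label or None) pair.
Binding = Tuple[str, Optional[str]]


def ambiguous_labels(bindings: Iterable[Binding]) -> Dict[str, Set[str]]:
    """Return the variables that are given more than one distinct label."""
    labels: Dict[str, Set[str]] = {}
    for var, label in bindings:
        if label is not None:
            labels.setdefault(var, set()).add(label)
    return {var: found for var, found in labels.items() if len(found) > 1}


# Occurrences taken from (v1:L1)->(v2)<-(v3), (v1:L2)
conflicts = ambiguous_labels(
    [("v1", "L1"), ("v2", None), ("v3", None), ("v1", "L2")]
)
```

A non-empty result would be reported as a semantic error (or a warning, for the checks marked with "(?)").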

WHERE for MATCH

The current grammar does not allow a WHERE clause that applies to the entire MATCH clause when OPTIONAL patterns are included.

Correct use of parentheses RPQ

For some RPQ expressions that use parentheses, the query crashes.

To Reproduce
Steps to reproduce the behavior:

  1. Open the console
  2. Run a query like this: CONSTRUCT (n)-/@p:reach/->(m) MATCH (n)-/p<(:HasInterest |:IsLocatedIn)! :Knows>/->(m)
  3. Error: "Key not found"

Expected behavior
A successful query execution.

GCORE sparkSession support

Wrap the code in a gcore-spark module such that a single import statement initializes the gcore-spark subsystem and reads the default catalog. After that, it is ready to execute queries through a gcore(string) : Graph method added to the sparkSession.

  • If the query is a SELECT query, we should just return a dataframe
  • If the query is a CONSTRUCT, we should return some SparkGraph object, and maybe offer some simple basic methods to look into graphs (like returning a list of V, E, P dataframes, or even coupling with a graph visualization library).

Implement “full graph” operations

  • g1 OP g2 : g3, where OP in { UNION, INTERSECT, MINUS }
  • Essentially, we need to pair all (vertex,edge,path) dataframes (df1i,df2i) of both g1 and g2, with the same label. Use an empty dataframe if the dataframe does not exist in either graph. Then apply df3i = df1i OP df2i
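
The pairing rule can be sketched in plain Python as follows. The sketch uses a frozenset of row tuples as a stand-in for a dataframe and a single label-to-table map per graph; in the real system the vertex, edge and path dataframes would each be paired this way. All names here are hypothetical.

```python
from typing import Callable, Dict, FrozenSet

Table = FrozenSet[tuple]   # stand-in for a dataframe: a set of rows
Graph = Dict[str, Table]   # label -> table

OPS: Dict[str, Callable[[Table, Table], Table]] = {
    "UNION": lambda a, b: a | b,
    "INTERSECT": lambda a, b: a & b,
    "MINUS": lambda a, b: a - b,
}


def graph_op(g1: Graph, g2: Graph, op: str) -> Graph:
    """Compute g3 = g1 OP g2 by pairing tables that share a label."""
    empty: Table = frozenset()
    combine = OPS[op]
    return {
        # A label missing from one graph is treated as an empty table.
        label: combine(g1.get(label, empty), g2.get(label, empty))
        for label in g1.keys() | g2.keys()
    }


g1 = {"Person": frozenset({(1,), (2,)})}
g2 = {"Person": frozenset({(2,)}), "City": frozenset({(9,)})}
g_union = graph_op(g1, g2, "UNION")
g_inter = graph_op(g1, g2, "INTERSECT")
g_minus = graph_op(g1, g2, "MINUS")
```

This sketch keeps labels whose resulting table is empty; dropping them from g3 would be an equally reasonable choice.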

Path operations

Define syntax and semantics for path operations.

  • Operations between sets of paths: join, union, difference, intersection
  • Filter functions for paths: the WHERE clause could contain functions to filter paths, e.g. path.contains(node)
  • Path construction: the CONSTRUCT clause could contain operators to add labels and properties to the paths returned by the MATCH

CREATE and DROP GRAPH

Support CREATE GRAPH x, which indicates that graph x is persistent, and that the catalog has to be changed also in a persistent way.

Also support DROP GRAPH x, which indicates that a graph x that is persistent has to be deleted from Spark and the catalog.

CREATE GRAPH x should have default semantics (for example, a default rule that determines where graph x is stored and in which format).

CREATE GRAPH x should also let the user specify parameters such as the directory where x is going to be stored, the format for x, …

It could be something like CREATE GRAPH x (directory="/foo", format="parquet")

Support multi-valued properties

  • Spark dataframes can support lists of literal values, so this is easily added on the storage level.
  • We should support lists of literal values (a list type for each of the literal types) in the schema languages and catalog
  • This means we should then also support the functionality of binding a property value to a variable. This has the effect of “unrolling” the multi-valued value into individual rows of the new binding table. If the multi-valued property was in fact unbound (NULL), the binding is not lost, but will consist of a single row in the binding table (with the variable taking NULL in that row).
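
The unrolling behaviour described in the last bullet can be sketched in plain Python, assuming the binding table is a list of dicts and an unbound property is represented by None. The function name bind_property and the sample columns are hypothetical. Spark's explode_outer function has similar NULL-preserving behaviour on array columns and may be a natural fit for the actual implementation.

```python
from typing import Dict, List

Row = Dict[str, object]


def bind_property(binding_table: List[Row], prop: str, var: str) -> List[Row]:
    """Bind each value of a multi-valued property to `var`, one row per value."""
    out: List[Row] = []
    for row in binding_table:
        values = row.get(prop)
        if values is None:
            # Unbound property: keep the binding as a single row with NULL.
            out.append({**row, var: None})
        else:
            # Note: an empty list yields no rows in this sketch; whether it
            # should instead behave like NULL is left open.
            out.extend({**row, var: v} for v in values)
    return out


table = [
    {"n": 1, "emails": ["a@x.org", "b@x.org"]},
    {"n": 2, "emails": None},
]
unrolled = bind_property(table, "emails", "e")
```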

Support SELECT

  • Add in spoofax parser
  • Translation: basically implement x.prop expressions on the binding table

Review the use of CREATE

Describe the bug
The following query is allowed: "CREATE 'nuevo' CONSTRUCT (n) MATCH (n)"

To Reproduce

  1. Execute: CREATE 'nuevo' CONSTRUCT (n) MATCH (n);

Expected behavior
Show a parser error, because it must be "CREATE GRAPH"

RPQ with KleeneBounds

Implement RPQ with KleeneBounds

Ex. MATCH (n:Person)-/ALL p<:knows*{2}>/->(m:Person)

Change syntax of unbound test

  • support use of NULL as a constant symbol indicating an unbound property
  • introduce “e IS NULL” (like SQL) and remove the exist(e) notation (affects both parser and translation)

PATH expressions

  • basic PATH pat = (src)-...pattern..-(dst),... MATCH ... -/ pat* /- … ON g
    This can be implemented as a rewrite:

MATCH … -/ pat*/- … ON (
CONSTRUCT g,(src)-[pat]-(dst)
MATCH (src)-..pattern…-(dst),.. )

  • Add weighted paths to PATH expressions by adding a “weight” property on the newly created edges. In the translation, and specifically in the generation of the GraphX code, GraphX needs to sum the “weight” properties to calculate the path length (as opposed to taking the hop count). It is probably better not to expose this at the G-CORE syntax level, but to put it as an annotation on the GraphX operator during the translation.

The rewrite for weighted PATH pat = (src)-...pattern..-(dst),...COST ..Y.. used in MATCH … -/ pat* COST x/- … is therefore:

MATCH … -/ pat* COST x/- … ON (
CONSTRUCT g,(src)-[pat {weight:=..Y..}]-(dst)
MATCH (src)-..pattern…-(dst),.. )

Again, we need to ensure that COST x is filled not with the hop count but with SUM(weight). The GraphX implementation already has some support for this, but it needs to be triggered. This extra information is probably best stored as a property attached to the algebra tree nodes, so the GraphX code generation can pick it up and generate the appropriate code.
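
The difference between the two cost modes can be shown with a small plain-Python sketch. It uses Dijkstra's algorithm over an adjacency list rather than GraphX, assumes non-negative weights, and models the annotation as a boolean flag; the names path_cost and use_weight are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Edges = Dict[str, List[Tuple[str, float]]]  # src -> [(dst, weight)]


def path_cost(edges: Edges, src: str, dst: str, use_weight: bool) -> Optional[float]:
    """Cheapest path cost: SUM(weight) if use_weight is set, else hop count."""
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, weight in edges.get(node, []):
            # The flag plays the role of the annotation on the operator.
            step = weight if use_weight else 1.0
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return None  # dst is not reachable from src


graph = {"a": [("b", 5.0), ("c", 20.0)], "b": [("c", 5.0)]}
hops = path_cost(graph, "a", "c", use_weight=False)
weighted = path_cost(graph, "a", "c", use_weight=True)
```

Here the direct edge a->c wins on hop count, while the two-edge route through b wins on summed weight, which is the behaviour COST x should expose for weighted PATH expressions.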

GRAPH VIEW x AS CONSTRUCT ...

  • This is a lightweight version of CREATE GRAPH
  • Keep the result dataframes of the CONSTRUCT query around in the Spark session (currently they get de-allocated), but do not save them persistently as in CREATE GRAPH
  • Temporarily add the metadata of the CONSTRUCT query to the catalog under graph name x (the new catalog should not be stored on disk)
