Code Monkey home page Code Monkey logo

ucfs's Introduction

UCFS

Build Test

Note: project under heavy development!

About

UCFS is an Unuversal Context-Free Solver: a tool to solve problems related to context-free and regular language intersection. Examples of such problems:

  • Parsing
  • Context-free path querying (CFPQ)
  • Context-free language reachability (CFL-r)

Usage

Command Line Interface

Usage: kotgll options_list
Options: 
    --input -> Input format (always required) { Value should be one of [string, graph] }
    --grammar -> Grammar format (always required) { Value should be one of [cfg, rsm] }
    --sppf [ON] -> Sppf mode { Value should be one of [on, off] }
    --inputPath -> Path to input txt file (always required) { String }
    --grammarPath -> Path to grammar txt file (always required) { String }
    --outputPath -> Path to output txt file (always required) { String }
    --help, -h -> Usage info

From sources

Step 1. Clone repository

git clone https://github.com/FormalLanguageConstrainedPathQuerying/kotgll.git

or

git clone [email protected]:FormalLanguageConstrainedPathQuerying/kotgll.git

or

gh repo clone FormalLanguageConstrainedPathQuerying/kotgll

Step 2. Go to the folder

cd kotgll

Step 3. Run the help command

gradle run --args="--help"

You will see the "Options list" message.

Example

gradle run --args="--input graph --grammar rsm --sppf off --inputPath src/test/resources/cli/TestGraphReadWriteCSV/dyck.csv --grammarPath src/test/resources/cli/TestRSMReadWriteTXT/dyck.txt --outputPath ./result.txt"

Using JAR

Step 1. Download the latest JAR

curl -L -O https://github.com/FormalLanguageConstrainedPathQuerying/kotgll/releases/download/v1.0.0/kotgll-1.0.0.jar

Step 2. Run JAR with Java

java -jar kotgll-1.0.0.jar --input graph --grammar rsm --sppf off --inputPath src/test/resources/cli/TestGraphReadWriteCSV/dyck.csv --grammarPath src/test/resources/cli/TestRSMReadWriteTXT/dyck.txt --outputPath ./result.txt

CFG Format Example

StartNonterminal("S")
Nonterminal("S") -> Terminal("subClassOf_r") Nonterminal("S") Terminal("subClassOf")
Nonterminal("S") -> Terminal("subClassOf_r") Terminal("subClassOf")
Nonterminal("S") -> Terminal("type_r") Nonterminal("S") Terminal("type")
Nonterminal("S") -> Terminal("type_r") Terminal("type")

RSM Format Example

StartState(id=0,nonterminal=Nonterminal("S"),isStart=true,isFinal=false)
State(id=0,nonterminal=Nonterminal("S"),isStart=true,isFinal=false)
State(id=1,nonterminal=Nonterminal("S"),isStart=false,isFinal=false)
State(id=4,nonterminal=Nonterminal("S"),isStart=false,isFinal=false)
State(id=3,nonterminal=Nonterminal("S"),isStart=false,isFinal=true)
State(id=2,nonterminal=Nonterminal("S"),isStart=false,isFinal=false)
State(id=6,nonterminal=Nonterminal("S"),isStart=false,isFinal=true)
State(id=5,nonterminal=Nonterminal("S"),isStart=false,isFinal=false)
TerminalEdge(tail=0,head=1,terminal=Terminal("subClassOf_r"))
TerminalEdge(tail=0,head=4,terminal=Terminal("type_r"))
TerminalEdge(tail=1,head=3,terminal=Terminal("subClassOf"))
NonterminalEdge(tail=1,head=2,nonterminal=Nonterminal("S"))
TerminalEdge(tail=4,head=6,terminal=Terminal("type"))
NonterminalEdge(tail=4,head=5,nonterminal=Nonterminal("S"))
TerminalEdge(tail=2,head=3,terminal=Terminal("subClassOf"))
TerminalEdge(tail=5,head=6,terminal=Terminal("type"))

Performance

The GLL algorithm has been modified to support graph input. The proposed modification has been evaluated on several real graphs for the scenario of finding all pairs of reachability.

Machine configuration: PC with Ubuntu 20.04, Intel Core i7-6700 3.40GHz CPU, DDR4 64Gb RAM.

Enviroment configuration:

  • Java HotSpot(TM) 64-Bit server virtual machine (build 15.0.2+7-27, mixed mode, sharing).
  • JVM heap configuration: 55Gb both xms and xmx.

Graphs

The graph data is selected from CFPQ_Data dataset.

A detailed description of the graphs is listed bellow.

RDF analysis graphs

Graph name |V| |E| #subClassOf #type #broaderTransitive
Enzyme 48 815 86 543 8 163 14 989 8 156
Eclass 239 111 360 248 90 962 72 517 0
Go hierarchy 45 007 490 109 490 109 0 0
Go 582 929 1 437 437 94 514 226 481 0
Geospecies 450 609 2 201 532 0 89 065 20 867
Taxonomy 5 728 398 14 922 125 2 112 637 2 508 635 0

Grammars

All queries used in evaluation are variants of same-generation query. The inverse of an x relation and the respective edge is denoted as x_r.


Grammars used for RDF graphs:

G1

S -> subClassOf_r S subClassOf | subClassOf_r subClassOf 
     | type_r S type | type_r type

The representation of G1 context-free grammar in the repository can be found here.

The representation of G1 context-free grammar as recursive automaton in the repository can be found here.

Results

The results of the all pairs reachability queries evaluation on graphs related to RDF analysis are listed below.

In each row, the best mean time in seconds is highlighted in bold.

Graph CFG RSM GLL4Graph
Enzyme 0.107 0.044 0.22
Eclass 0.94 0.43 1.5
Go hierarchy 4.1 3.0 3.6
Go 3.2 1.86 5.55
Geospecies 0.97 0.34 2.89
Taxonomy 31.2 14.8 45.4

More results

More results, but in raw form, can be found in the repository kotgll_benchmarks

ucfs's People

Contributors

vadyushkins avatar dependabot[bot] avatar gsvgit avatar

Forkers

pertsevpv ane1y

ucfs's Issues

Separate incrementality logic

Separate incrementality logic into different interfaces for

  • DescriptorStorage
    • fun removeFromHandled
    • fun restoreDescriptors

and more

Speed up generator tests

The parser file is generated for each test run. Regeneration should only be performed if the generator or grammar is changed.

Build a hierarchy for RsmState class

Technically, the RsmState class contains from 3 entities:

  • simple state of Rsm
  • state with error recovery information
  • state with string view of path in Rsm

Some things are unnecessary and can be omitted under the right conditions (e.g. adding recovery information if recovery mode if OF).

Therefore, it is worth dividing them into different entities (classes, interfaces) and calling the appropriate instance in each case.

RSM Edges

Replace union of terminal and nonterminal edges in RSM with corresponding iterator

Nonterminals serialization

Now in serialization methods readRSMFromTXT and writeRSMToTXT nonterminals are defined by their name.

It is necessary to compare Nonterminals as objects, for example, store them with id and compare by it.

Parser generator: access to rules API

add to the API generator to access an alternative with the given name. And also inside the product -- access to a character by a given index/to all characters in the chain

Add information about path length

Right now we can get information only about reachable vertexes. But it would be very usefull, if we can get path length to each reachable vertex. For example, we could make assumptions about the correct declaration based on the path if there are several reachable vertices

RSM modification

Develop an interface of the following types

  • Add(nonterm, rightSide), which computes the union of automata if there was such a nonterm, otherwise adds an automaton.

  • Remove(nonterm, rightSide), which calculates the difference.

It is important that the states are preserved, because they are used to reuse the already calculated results. It is wrong to create a new automaton every time.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.