digitalheir / java-probabilistic-earley-parser
🎲 Efficient Java implementation of the probabilistic Earley algorithm to parse Stochastic Context Free Grammars (SCFGs)
License: MIT License
Suppose you have a grammar and a set of parsed sentences. We want to use the inside-outside algorithm to estimate the most likely probability distribution for the grammar rules.
Ensure that the probabilities in an SCFG are proper and consistent as defined in Booth and Thompson (1973), and that the grammar contains no useless nonterminals (ones that can never appear in a derivation).
Check that no rule appears twice with different probabilities (in which case we either have undefined behaviour or conflate the rules?).
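A minimal sketch of two of these checks, assuming the rules are available as (left-hand side, right-hand side, probability) triples. The `GrammarSanity` class and its `Rule` holder are hypothetical and not part of the library. It covers only properness (the probabilities of all rules sharing a left-hand side sum to 1) and conflicting duplicates; consistency and useless nonterminals are not checked here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class GrammarSanity {
    /** Hypothetical rule holder, not the library's own rule type. */
    static final class Rule {
        final String lhs;
        final String rhs;
        final double probability;

        Rule(String lhs, String rhs, double probability) {
            this.lhs = lhs;
            this.rhs = rhs;
            this.probability = probability;
        }
    }

    /** Left-hand sides whose rule probabilities do not sum to 1 (within eps). */
    static List<String> improperNonterminals(List<Rule> rules, double eps) {
        Map<String, Double> sums = new TreeMap<>();
        for (Rule r : rules) {
            sums.merge(r.lhs, r.probability, Double::sum);
        }
        List<String> improper = new ArrayList<>();
        for (Map.Entry<String, Double> e : sums.entrySet()) {
            if (Math.abs(e.getValue() - 1.0) > eps) {
                improper.add(e.getKey());
            }
        }
        return improper;
    }

    /** Rules that occur more than once with different probabilities. */
    static Set<String> conflictingDuplicates(List<Rule> rules) {
        Map<String, Double> seen = new HashMap<>();
        Set<String> conflicts = new TreeSet<>();
        for (Rule r : rules) {
            String key = r.lhs + " -> " + r.rhs;
            Double previous = seen.putIfAbsent(key, r.probability);
            if (previous != null && previous.doubleValue() != r.probability) {
                conflicts.add(key);
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("S", "a", 0.4));
        rules.add(new Rule("S", "S a", 0.6));
        rules.add(new Rule("N", "b", 0.5));
        rules.add(new Rule("N", "b", 0.3)); // same rule, different probability
        System.out.println("Improper: " + improperNonterminals(rules, 1e-9));
        System.out.println("Conflicting: " + conflictingDuplicates(rules));
    }
}
```

Whether a conflicting duplicate should be rejected or merged is left open, as in the question above; the sketch only reports it.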
Could you turn this code into a project that I can run from the terminal on macOS? I am not familiar with Maven.
Thanks a lot in advance.
P.S. If you make this example project, please tell me how to run it. Thanks!
I tried the following grammar:
S -> a
S -> S a
Reading it like this:
Grammar<String> grammar = Grammar.parse(
Paths.get("/some/path/test.cfg"), Charset.forName("UTF-8"));
Results in:
java.lang.RuntimeException: Matrix is singular.
at org.leibnizcenter.cfg.algebra.matrix.LUDecomposition.solve(LUDecomposition.java:140)
at org.leibnizcenter.cfg.algebra.matrix.Matrix.solve(Matrix.java:346)
at org.leibnizcenter.cfg.algebra.matrix.Matrix.inverse(Matrix.java:357)
at org.leibnizcenter.cfg.grammar.Grammar.getReflexiveTransitiveClosure(Grammar.java:134)
at org.leibnizcenter.cfg.grammar.Grammar.<init>(Grammar.java:102)
at org.leibnizcenter.cfg.grammar.Grammar$Builder.build(Grammar.java:416)
at org.leibnizcenter.cfg.grammar.Grammar.parse(Grammar.java:183)
at org.leibnizcenter.cfg.grammar.Grammar.parse(Grammar.java:166)
at com.vision4j.internal.cli.PlayTest.cfg(PlayTest.java:48)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Converting the grammar to right-recursive avoids this issue:
S -> a
S -> a S
I am using the latest version in Maven: 0.9.12
Did I misunderstand something about the behaviour of the grammar, or is this a bug?
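One possible explanation, offered as an assumption rather than a confirmed diagnosis: the stack trace fails inside getReflexiveTransitiveClosure, which inverts a matrix. In Stolcke's formulation the left-corner closure is R_L = (I - P_L)^-1, where P_L[X][Y] is the total probability of rules for X whose first symbol is Y. If both rules above end up with probability 1.0 (an assumption about how rules without an explicit probability are read), then S -> S a gives P_L[S][S] = 1, so I - P_L is the 1x1 zero matrix, which has no inverse. In the right-recursive version the first symbol is a terminal, so P_L[S][S] = 0. The sketch below shows the 1x1 case only; `LeftCornerClosure` is a hypothetical name, not library code.

```java
final class LeftCornerClosure {
    /**
     * Closure of the 1x1 left-corner matrix [p]: the geometric series
     * 1 + p + p^2 + ... which equals 1 / (1 - p) when 0 <= p < 1.
     */
    static double closure(double p) {
        double det = 1.0 - p; // determinant of (I - P_L) in the 1x1 case
        if (det == 0.0) {
            throw new ArithmeticException("Matrix is singular.");
        }
        return 1.0 / det;
    }

    public static void main(String[] args) {
        // Left recursion with probability 0.5: the series converges.
        System.out.println(closure(0.5));
        try {
            // Left recursion with probability 1.0: nothing to invert.
            closure(1.0);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this is indeed the cause, giving the two rules probabilities that sum to 1 (for example 0.5 each) should make the left-recursive grammar invertible.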
Use streams/lambdas to automatically parallelize the parse functions; report the results.
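A minimal sketch of what this could look like at the level of whole sentences, assuming each sentence can be parsed independently and that the parse function is thread-safe. `ParallelParse` and `parseAll` are hypothetical names, and the lambda in `main` is a stand-in for a real parse call, not the library's API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class ParallelParse {
    /**
     * Applies parseFn to every sentence on the common fork-join pool.
     * The result list keeps the order of the input list.
     */
    static <T, R> List<R> parseAll(List<T> sentences, Function<T, R> parseFn) {
        return sentences.parallelStream()
                .map(parseFn)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("a a a", "a a", "a");
        // Stand-in for a real parse call: just count the tokens.
        List<Integer> results = parseAll(sentences, s -> s.split(" ").length);
        System.out.println(results);
    }
}
```

Parallelizing inside a single parse (for example across chart states) is a different and harder problem that this sketch does not address.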
Is it possible to write a grammar to parse the following pattern?
...anything RULE1 anything RULE2 anything...
What I want is to match the rules that occur in the sentence and ignore the noise ("anything" may be any characters).
Dear colleagues,
I'm looking for a good Earley parser for writing context-free grammars, and this one seems friendly to me. Could you please tell me whether this parser allows writing context-free grammars without setting probabilities?
P.S. I need a Java parser like Lark (Python) for writing rules directly.
Thanks,
Daria
Hi again,
The example of how to use JPEP as a library covers only "Parser.recognize". It would be nice to add a "println" of a parse tree, just like the command-line app does.
PS: in a previous issue, I mentioned I'm struggling with CommandLine's "argument magic". What I meant is that CommandLine draws the parse by calling "System.out.println(parse.parseTree);", where "parse" is an object of class "ParseTreeWithScore", and it builds that object from the command-line arguments in a somewhat complex way (to me, at least). So I guess the question is how to build an object of type "ParseTreeWithScore" when using JPEP as a library, given a particular grammar and a set of tokens (as you would from the command line).
Again, thanks and regards!
Easy addition. User might want to mess with the chart some.
Can you implement error handling?
In case of an error, we could insert the correct token and use the synchronizing-token method.
7-parsing-error.pdf
The parser currently can't handle rules of the form
X → ε (p)
where ε is the empty string.
See section 4.7 Null Productions on page 19 of Stolcke's paper.
We have the choice of extending prediction and completion to work with ε-rules, but this is a bit complicated. Another possibility is to rewrite the grammar to eliminate these productions, described at the end of page 20, 4.7.4 Eliminating null productions.
Best to implement the simpler solution first, and implement the philosophically correct version later.
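As a starting point for the simpler solution, the first step of eliminating null productions is finding which nonterminals can derive ε at all. Below is a minimal, non-probabilistic sketch of that step using a fixed-point iteration. The `Nullable` class and its map-based grammar representation are hypothetical; the probabilistic rewrite in Stolcke's section 4.7.4 additionally needs the probability with which each nonterminal derives ε, which is not computed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class Nullable {
    /**
     * rules maps each nonterminal to its right-hand sides;
     * an empty right-hand side stands for X -> ε.
     */
    static Set<String> nullableNonterminals(Map<String, List<List<String>>> rules) {
        Set<String> nullable = new HashSet<>();
        boolean changed = true;
        while (changed) { // repeat until no new nullable nonterminal is found
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : rules.entrySet()) {
                if (nullable.contains(e.getKey())) {
                    continue;
                }
                for (List<String> rhs : e.getValue()) {
                    // True for an empty RHS, or when every symbol is already
                    // known to be nullable. Terminals are never in the set.
                    if (nullable.containsAll(rhs)) {
                        nullable.add(e.getKey());
                        changed = true;
                        break;
                    }
                }
            }
        }
        return nullable;
    }

    public static void main(String[] args) {
        Map<String, List<List<String>>> rules = new HashMap<>();
        rules.put("X", Arrays.asList(new ArrayList<String>()));   // X -> ε
        rules.put("Y", Arrays.asList(Arrays.asList("X", "X")));   // Y -> X X
        rules.put("Z", Arrays.asList(Arrays.asList("a", "X")));   // Z -> a X
        System.out.println(nullableNonterminals(rules));
    }
}
```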
E.g., N -> /(wo)?Man/i
Implementation almost done
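For illustration, a terminal defined by a regular expression could decide whether a token belongs to it roughly as follows. This is a sketch using only java.util.regex; `RegexTerminal` is a hypothetical class and does not reflect how the library actually implements this feature. The trailing /i of the example is expressed with Pattern.CASE_INSENSITIVE.

```java
import java.util.regex.Pattern;

final class RegexTerminal {
    private final Pattern pattern;

    RegexTerminal(String regex, int flags) {
        this.pattern = Pattern.compile(regex, flags);
    }

    /** True when the whole token matches the pattern, not just a substring. */
    boolean matches(String token) {
        return pattern.matcher(token).matches();
    }

    public static void main(String[] args) {
        // Corresponds to N -> /(wo)?Man/i
        RegexTerminal n = new RegexTerminal("(wo)?Man", Pattern.CASE_INSENSITIVE);
        System.out.println(n.matches("Woman"));
        System.out.println(n.matches("man"));
        System.out.println(n.matches("human"));
    }
}
```

Matching the whole token rather than a substring keeps "human" from being accepted by a pattern meant for "man" and "woman".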
Hello,
I'm trying to use java-probabilistic-earley-parser as a library. Following the instructions:
You can parse .cfg files as follows:
Grammar<String> g = Grammar.parse(Paths.get("path", "to", "grammar.cfg"), Charset.forName("UTF-8"));
I get (in Eclipse) the error:
Error: "The method parse(Path, Charset) is undefined for the type Grammar"
I'm not using the Maven dependency, I'm just adding the latest jar to my project.
From the command line, everything works and I get a nice parse tree based on my grammar file, but the CommandLine class does some "magic" with the arguments and I'm struggling to figure out how to do the equivalent thing without command-line arguments.
Thanks in advance!