nlptools's People

Contributors

afader, bhadramani, gmjabs, harrysethi, jgilme1, jstnhuang, schmmd

nlptools's Issues

NullPointerException

I get a NullPointerException when running the code below. How can I fix it?

public class OpenIETest {
    public static void main(String[] args) {
        OpenIE openie = new OpenIE(
            new ClearParser(new ClearPostagger(new ClearTokenizer())),
            new ClearSrl(), false, false);
        System.out.println(openie.extract("The whales will not eat the otters"));
    }
}

Extend Stemmer interface to take PostaggedTokens in addition to just Strings

In some cases (particularly with Morpha), adding a POS tag can improve stemming accuracy. For example, "Reye's Syndrome" is incorrectly stemmed (after tokenization) to "reye ' syndrome" unless postags are included. The original TextRunner demo passed postags to Morpha roughly like so:

val wordtag = word + "_" + tag
val morpha = new Morpha(new StringReader(wordtag))
morpha.yybegin(Morpha.scan)
return morpha.next()
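
One way the proposed extension could look is sketched below. Everything here is an assumption for illustration: `Stemmer`, `TaggedToken`, `stemToken`, and the toy stemming rule are hypothetical stand-ins, not the nlptools API and not what Morpha actually does. The point is only that a tag-aware overload can default to the string-only method, so existing stemmers keep working.

```scala
// Hypothetical sketch; these names are not the nlptools API.
object StemmerSketch {
  // Minimal stand-in for a postagged token.
  case class TaggedToken(string: String, postag: String)

  trait Stemmer {
    // String-only entry point.
    def stem(word: String): String

    // Proposed overload. By default it ignores the tag, so stemmers
    // that cannot use postags need no changes.
    def stemToken(token: TaggedToken): String = stem(token.string)
  }

  // Toy stemmer: blindly strips a trailing "s" unless the tag says the
  // token is a possessive marker (Penn Treebank tag "POS").
  object ToyStemmer extends Stemmer {
    def stem(word: String): String = {
      val lower = word.toLowerCase
      if (lower.length > 1 && lower.endsWith("s")) lower.dropRight(1) else lower
    }

    override def stemToken(token: TaggedToken): String =
      if (token.postag == "POS") token.string.toLowerCase
      else stem(token.string)
  }
}
```

With this toy rule, the untagged call mangles the possessive marker in the same way as the "Reye's Syndrome" example, while the tagged call leaves it alone.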

LogisticRegression should throw an immediate error when a feature is missing

Otherwise, when you run it, you get this:

java.util.NoSuchElementException: key not found: which|who|that before rel
    at scala.collection.MapLike$class.default(MapLike.scala:228)
    at scala.collection.AbstractMap.default(Map.scala:58)
    at scala.collection.MapLike$class.apply(MapLike.scala:141)
    at scala.collection.AbstractMap.apply(Map.scala:58)
    at edu.knowitall.tool.conf.impl.LogisticRegression$$anonfun$1.apply(LogisticRegression.scala:41)
    at edu.knowitall.tool.conf.impl.LogisticRegression$$anonfun$1.apply(LogisticRegression.scala:40)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.sum(TraversableOnce.scala:203)
    at scala.collection.AbstractIterator.sum(Iterator.scala:1157)
    at edu.knowitall.tool.conf.impl.LogisticRegression.getConf(LogisticRegression.scala:44)
    at edu.knowitall.tool.conf.impl.LogisticRegression.apply(LogisticRegression.scala:32)
    at edu.knowitall.chunkedextractor.Relnoun$$anonfun$main$2$$anonfun$apply$4.apply(Relnoun.scala:664)
    at edu.knowitall.chunkedextractor.Relnoun$$anonfun$main$2$$anonfun$apply$4.apply(Relnoun.scala:663)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at edu.knowitall.chunkedextractor.Relnoun$$anonfun$main$2.apply(Relnoun.scala:663)
    at edu.knowitall.chunkedextractor.Relnoun$$anonfun$main$2.apply(Relnoun.scala:659)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at edu.knowitall.chunkedextractor.Relnoun$.main(Relnoun.scala:659)
    at edu.knowitall.chunkedextractor.Relnoun.main(Relnoun.scala)
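
A minimal sketch of the fail-fast behaviour this issue asks for, assuming the classifier holds a set of feature names and a map of learned weights. The class name, constructor parameters, and `confidence` method are hypothetical and do not mirror the real `LogisticRegression` class. The check runs once at construction, so a missing weight is reported with all offending names instead of surfacing later as a `NoSuchElementException`.

```scala
// Hypothetical sketch; not the actual edu.knowitall LogisticRegression.
object ConfidenceSketch {
  class LogisticRegressionSketch(
      featureNames: Set[String],     // features the classifier evaluates
      weights: Map[String, Double],  // learned weight per feature name
      intercept: Double) {

    // Fail at construction time if any feature has no weight.
    private val missing = featureNames.filterNot(weights.contains)
    require(missing.isEmpty,
      "No weight for feature(s): " + missing.toSeq.sorted.mkString(", "))

    // Logistic function over the weighted sum of feature values.
    // Features absent from `values` are treated as 0.0 (an assumption).
    def confidence(values: Map[String, Double]): Double = {
      val z = intercept +
        featureNames.toSeq.map(n => weights(n) * values.getOrElse(n, 0.0)).sum
      1.0 / (1.0 + math.exp(-z))
    }
  }
}
```

Constructing the sketch with a feature such as `"which|who|that before rel"` but no matching weight throws an `IllegalArgumentException` that names the feature.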

Token/PostaggedToken/ChunkedToken has poor serialization support

There's half-completed code to serialize these as a tab-separated list of whitespace-separated token aspects. We also want a serialization that keeps all the aspects of each token together.

I@0/PRP/B-NP rode@5/VB/B-VP

vs.

I@0 rode@5  \t  PRP VB  \t  B-NP B-VP

In the tab format, should the offsets be separated from the tokens?

I rode  \t  0 5  \t  PRP VB  \t  B-NP B-VP
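
The first two formats above can be sketched as follows. `ChunkedTok` and both method names are hypothetical stand-ins rather than the nlptools token classes, and the tab format is emitted with a bare tab character (the extra spaces around `\t` in the examples are assumed to be for readability only).

```scala
// Hypothetical sketch; ChunkedTok is a stand-in, not the nlptools class.
object TokenFormatSketch {
  case class ChunkedTok(string: String, offset: Int, postag: String, chunk: String)

  // Per-token format: I@0/PRP/B-NP rode@5/VB/B-VP
  def together(tokens: Seq[ChunkedTok]): String =
    tokens
      .map(t => t.string + "@" + t.offset + "/" + t.postag + "/" + t.chunk)
      .mkString(" ")

  // Column format: one tab-separated column per aspect, with
  // whitespace-separated entries inside each column.
  def columns(tokens: Seq[ChunkedTok]): String =
    Seq(
      tokens.map(t => t.string + "@" + t.offset),
      tokens.map(_.postag),
      tokens.map(_.chunk)
    ).map(_.mkString(" ")).mkString("\t")
}
```

One design consideration: a token whose string itself contains "@" or "/" makes the per-token format ambiguous unless a deserializer splits on the last occurrences of those separators.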

srl deserialization fails

Running RemoteSRL on this sentence:

"Thus a natural hazard will not result in a natural disaster in areas without vulnerability, e. g. strong earthquakes in uninhabited areas."

fails with this exception:

scala.MatchError: Could not deserialize relation: g._17.01 (of class java.lang.String)
        at edu.knowitall.tool.srl.Relation$.deserialize(Frame.scala:51)
        at edu.knowitall.tool.srl.Frame$.deserialize(Frame.scala:19)
        at Test$RemoteSrl$$anonfun$apply$1.apply(Test.scala:14)
        at Test$RemoteSrl$$anonfun$apply$1.apply(Test.scala:14)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
...

Curling the parse for that sentence returns this:

result_6.01:[AM-DIS=Thus_0, A1=hazard_3, AM-MOD=will_4, AM-NEG=not_5, A2=in_7]
g._17.01:[A0=e._16, A1=earthquakes_19, AM-LOC=in_20]

I'm not sure what those things mean, but it looks like the second line has an extra "." (in "g.") that the deserialization matcher does not expect.
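
A minimal sketch of a more tolerant matcher, assuming the serialized relation has the shape `name_tokenIndex.sense` (an interpretation of the two lines above, not something confirmed by the nlptools source). `RelationSketch` and `parse` are hypothetical names. The greedy `(.+)` lets the name contain dots or underscores, because only the final `_digits.digits` suffix is treated as structure.

```scala
// Hypothetical sketch; not the real Relation.deserialize.
object RelationParseSketch {
  case class RelationSketch(name: String, tokenIndex: Int, sense: String)

  // Greedy name, then "_<index>.<sense>" at the very end of the string.
  private val RelationPattern = """(.+)_(\d+)\.(\d+)""".r

  def parse(s: String): Option[RelationSketch] = s match {
    case RelationPattern(name, index, sense) =>
      Some(RelationSketch(name, index.toInt, sense))
    case _ => None
  }
}
```

Returning an `Option` instead of throwing a `MatchError` also lets the caller decide how to report a relation string that cannot be parsed.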

Parsers should have method `parsePostagged` and `parseTokenized`

This is presently a problem because it's difficult to use a thread-safe instance of ClearParser, for example with OpenNlpTokenizer.

Rob, would you be interested in looking into this after the present project is over? I'd like for you and John each to have a small project that digs into nlptools a little deeper. I realize you need to move on to work with Tony, but I think this would only take a small amount of your time.
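
The layered entry points proposed in this issue could be sketched like this. All traits, classes, and the toy implementations are hypothetical and much simpler than the real nlptools types; the sketch only shows how `parse` can delegate to `parseTokenized`, which delegates to `parsePostagged`, so a caller can plug in its own tokenizer or postagger output at either level.

```scala
// Hypothetical sketch; these are not the nlptools parser traits.
object ParserLayersSketch {
  case class Tok(string: String)
  case class TaggedTok(string: String, postag: String)

  trait Tokenizer { def tokenize(sentence: String): Seq[Tok] }
  trait Postagger { def postagTokenized(tokens: Seq[Tok]): Seq[TaggedTok] }

  // P stands in for whatever parse result type a parser produces.
  trait LayeredParser[P] {
    def tokenizer: Tokenizer
    def postagger: Postagger

    // For callers that already hold postagged tokens.
    def parsePostagged(tokens: Seq[TaggedTok]): P

    // For callers that tokenized with their own tokenizer.
    def parseTokenized(tokens: Seq[Tok]): P =
      parsePostagged(postagger.postagTokenized(tokens))

    // Convenience method that runs the whole pipeline.
    def parse(sentence: String): P =
      parseTokenized(tokenizer.tokenize(sentence))
  }

  // Toy implementations so the sketch runs end to end.
  object WhitespaceTokenizer extends Tokenizer {
    def tokenize(sentence: String): Seq[Tok] =
      sentence.split("\\s+").filter(_.nonEmpty).map(s => Tok(s)).toSeq
  }
  object ConstantPostagger extends Postagger {
    def postagTokenized(tokens: Seq[Tok]): Seq[TaggedTok] =
      tokens.map(t => TaggedTok(t.string, "X"))
  }
  // "Parses" a sentence into its token count.
  object CountingParser extends LayeredParser[Int] {
    val tokenizer: Tokenizer = WhitespaceTokenizer
    val postagger: Postagger = ConstantPostagger
    def parsePostagged(tokens: Seq[TaggedTok]): Int = tokens.size
  }
}
```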

`Token` should not be able to have whitespace

Unfortunately, BreezeSentencer uses `Tokenizer.computeOffsets` to compute offsets from the resulting sentences, so simply adding `require(string.forall(!_.isWhitespace))` breaks BreezeSentencer.
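
One possible way around the conflict, sketched under assumptions: keep the whitespace check on the token class, and let offset computation work on plain strings so that sentence segments never have to become tokens. `StrictToken` and this `computeOffsets` are hypothetical; the real `Tokenizer.computeOffsets` signature is not shown in this issue and may differ.

```scala
// Hypothetical sketch; not the nlptools Token or Tokenizer.computeOffsets.
object OffsetSketch {
  // Strict token: rejects any whitespace, as the issue proposes.
  case class StrictToken(string: String, offset: Int) {
    require(string.forall(!_.isWhitespace),
      "token contains whitespace: " + string)
  }

  // Finds each segment in the source text, left to right, and returns
  // (segment, startOffset) pairs. Segments may contain whitespace.
  def computeOffsets(segments: Seq[String], source: String): Seq[(String, Int)] = {
    var from = 0
    segments.toList.map { seg =>
      val start = source.indexOf(seg, from)
      require(start >= 0, "segment not found in source: " + seg)
      from = start + seg.length
      (seg, start)
    }
  }
}
```

A sentencer could then call `computeOffsets` directly on its sentences, while a tokenizer wraps each resulting pair in a `StrictToken`.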
