Comments (7)

linas commented on May 22, 2024

Hmm. The default for islands_ok should not have changed. That seems like a bug.

The downcasing of "But" is acceptable. The splitting of "Corp." is ... odd. It would be better if it were not split.

from link-grammar.

ampli commented on May 22, 2024

From the recent change log:

  • Change default setting of 'islands_ok' to true.

I didn't like this change...

Regarding the splitting of "Corp.": both "Corp" and "." are in the dict, which supports splitting them.

Should the rule be the following one?
If 2 tokens together are a string that is found in the dict, then they should not get split.
Edit: if 2 or more...

I fear it may not be general enough (for arbitrary languages), but I can try to add it (conditionally compiled by default, so we can experiment with it).

Another edit: it is surely not correct in general, so maybe the rule should be this:
If potential tokens, together, are a word in the dict, and at least one of them is LPUNC or RPUNC, then they should not get split.

I'm not sure that even this is general enough...
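
A minimal sketch of the second rule, only to make it concrete. The word set and the LPUNC/RPUNC sets below are hypothetical stand-ins for the real dictionary lookup and affix classes, and `should_keep_joined` is an illustrative name, not an existing link-grammar function.

```python
# Hypothetical stand-ins: the real dictionary lookup and the LPUNC/RPUNC
# affix classes live inside link-grammar; these sets are illustration only.
DICT_WORDS = {"Corp", "Corp.", ".", "test", "(", ")"}
LPUNC = {"(", '"'}
RPUNC = {".", ",", ")", '"'}


def should_keep_joined(tokens):
    """Return True if the candidate tokens should stay one word.

    Rule sketched here: the joined string is a dictionary word and at
    least one of the pieces is LPUNC or RPUNC.
    """
    if len(tokens) < 2:
        return False
    joined = "".join(tokens)
    has_punct = any(t in LPUNC or t in RPUNC for t in tokens)
    return joined in DICT_WORDS and has_punct


print(should_keep_joined(["Corp", "."]))  # joined form "Corp." is in the dict
print(should_keep_joined(["test", "."]))  # "test." is not in the dict
```

With these sets, "Corp" + "." stays joined, while "test" + "." is still split because "test." is not a dictionary word.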

linas commented on May 22, 2024

Gahhh. Yes, I now vaguely remember changing islands_ok to true, after reading the source code for it and deciding it was a neat thing that we should always have enabled. I do not quite remember why I thought it would be a good idea to change it, but it did seem like a good idea. I guess I should have written down a justification ... Do you see bad things happening as a result?

Re "corp" vs "corp.": OK, do not do anything. The right fix would be to give the LG rules that link to "corp" a higher cost than those that link to "corp." That way, the form with the period would be preferred.
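
A rough sketch of that idea, assuming alternatives are ranked lowest-cost first. The cost values and the `preferred` helper are hypothetical and only illustrate the preference; the real costs would be set in the dictionary entries.

```python
# Hypothetical costs, chosen only to illustrate the idea; the real values
# would be set on the relevant link-grammar dictionary rules.
ALTERNATIVES = [
    {"tokens": ["corp", "."], "cost": 1.0},  # split form, penalized
    {"tokens": ["corp."], "cost": 0.0},      # unsplit form, preferred
]


def preferred(alternatives):
    """Pick the alternative with the lowest cost (ties keep the first one)."""
    return min(alternatives, key=lambda alt: alt["cost"])


print(preferred(ALTERNATIVES)["tokens"])
```

Both forms remain available; the penalty only changes which one wins when both are possible.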

ampli commented on May 22, 2024

Consider the sentence:
*This is a test what

linkparser> !islands-ok=1
islands-ok set to 1
linkparser> this is a test what
No complete linkages found.
Found 8 linkages (8 had no P.P. violations) at null count 1
    Linkage 1, cost vector = (UNUSED=0 DIS= 2.00 LEN=10)
    +--------------Ww-------------+
    |              +---Osm--+     |
    |        +-Ss*b+  +Ds**c+     |
    |        |     |  |     |     |
LEFT-WALL this.p is.v a  test.n what 
linkparser> !islands-ok=0
islands-ok set to 0
linkparser> this is a test what
No complete linkages found.
Found 6 linkages (6 had no P.P. violations) at null count 1
    Linkage 1, cost vector = (UNUSED=1 DIS= 0.00 LEN=7)

    +----->WV----->+---Osm--+       
    +---Wd---+-Ss*b+  +Ds**c+       
    |        |     |  |     |       
LEFT-WALL this.p is.v a  test.n [what] 

In the parse with islands-ok=1, it is harder to see that this is a broken sentence. With a more complex sentence, it may be hard even to find the discontinuities at a glance. It is also less clear that the sentence would be correct without the word "what", and the output does not show how the sentence could be parsed correctly without it.
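
To illustrate the difference, here is a small sketch that scans the word line of each diagram above for bracketed (unlinked) words. The helper name `null_words` is made up for this example; it only reads the printed text and does not call link-grammar.

```python
import re


def null_words(word_line):
    """Return the words printed in square brackets on a diagram's word line."""
    return re.findall(r"\[([^\]\s]+)\]", word_line)


# Word lines copied from the two linkparser outputs above.
with_islands = "LEFT-WALL this.p is.v a  test.n what"
without_islands = "LEFT-WALL this.p is.v a  test.n [what]"

print(null_words(with_islands))
print(null_words(without_islands))
```

With islands-ok=0 the problem word is marked directly in the word line; with islands-ok=1 nothing is marked there, and the only hint is the "null count 1" message.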

linas commented on May 22, 2024

OK. I'll try to deal with this next week

linas commented on May 22, 2024

Fixed the Corp. thing in commit 40940eb

linas commented on May 22, 2024

Reverted islands_ok in 2f97e3c
