Comments (7)
Hmmm. The default islands_ok should not have changed. That seems like a bug.
The downcasing of But is acceptable. The splitting of Corp. is ... odd. It would be better if it were not split.
from link-grammar.
From the recent change log:
- Change default setting of 'islands_ok' to true.
I didn't like this change...
Regarding the splitting of Corp.: both Corp and . are in the dict, which is what allows them to be split.
Should the rule be the following one?
If 2 tokens together are a string that is found in the dict, then they should not get split.
Edit: if 2 or more...
I fear it may not be general enough (for arbitrary languages), but I can try to add it (conditionally compiled by default, so we can experiment with it).
Another edit: it is surely not correct in general, so maybe the rule should be this:
If potential tokens, together, are a word in the dict, and at least one of them is LPUNC or RPUNC, then they should not get split.
I'm not sure that even this is general enough...
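To make the last version of the rule concrete, here is a minimal sketch of the check. It is not code from link-grammar: the function name `should_keep_joined`, the toy dictionary, and the punctuation set (standing in for LPUNC/RPUNC) are all assumptions made for illustration.

```python
def should_keep_joined(parts, dictionary, punctuation):
    """Return True if the candidate tokens should NOT be split.

    Hypothetical helper: `parts` is a proposed split of one input word,
    `dictionary` is a set of known words, and `punctuation` is a set of
    strippable punctuation tokens (a stand-in for LPUNC/RPUNC).
    """
    if len(parts) < 2:
        return False
    joined = "".join(parts)
    # Rule: the joined string is itself a dict word, and at least one
    # of the would-be tokens is a punctuation token.
    return joined in dictionary and any(p in punctuation for p in parts)


# Toy data, invented for this example only.
TOY_DICT = {"Corp", ".", "Corp.", "in", "to", "into"}
TOY_PUNC = {".", ",", "(", ")"}

keep_corp = should_keep_joined(["Corp", "."], TOY_DICT, TOY_PUNC)
keep_into = should_keep_joined(["in", "to"], TOY_DICT, TOY_PUNC)
keep_unknown = should_keep_joined(["Foo", "."], TOY_DICT, TOY_PUNC)
```

With this toy data, `Corp` + `.` stays joined because `Corp.` is in the dict and `.` is punctuation. The `in` + `to` case is not affected by the rule since neither part is punctuation, and `Foo` + `.` can still be split because `Foo.` is not a dict word.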
from link-grammar.
Gahhh. Yes, I now vaguely remember changing islands_ok to true, after reading the source code for it, and deciding it was a neat thing that we should always have enabled. I do not quite remember why I thought it would be a good idea to change it, but it did seem like a good idea. I guess I should have written down a justification ... Do you see bad things happening as a result?
Re corp vs corp.: OK, do not do anything. The right fix would be to give the LG rules that link to "corp" a higher cost than those that link to "corp." That way, the form with the period would be preferred.
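The cost idea can be sketched roughly as follows. This is a simplification and not how link-grammar stores costs: real costs are attached to disjuncts in the dict, while here a made-up per-token cost table and the hypothetical function `preferred_tokenization` are used only to show why a higher cost on "corp" makes the unsplit "corp." win.

```python
def preferred_tokenization(candidates, cost):
    """Pick the candidate token sequence with the lowest total cost.

    Hypothetical helper: `candidates` is a list of token lists and
    `cost` maps a token to an illustrative cost (0.0 if missing).
    """
    return min(candidates, key=lambda toks: sum(cost.get(t, 0.0) for t in toks))


# Invented costs: "corp" is made more expensive than "corp.".
TOY_COST = {"corp": 1.0, "corp.": 0.0, ".": 0.0}

best = preferred_tokenization([["corp", "."], ["corp."]], TOY_COST)
```

Both forms remain available, so a sentence that really ends with "corp" followed by a period can still parse; the split form is just ranked lower.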
from link-grammar.
Consider the sentence:
*This is a test what
linkparser> !islands-ok=1
islands-ok set to 1
linkparser> this is a test what
No complete linkages found.
Found 8 linkages (8 had no P.P. violations) at null count 1
Linkage 1, cost vector = (UNUSED=0 DIS= 2.00 LEN=10)

    +--------------Ww-------------+
    |              +---Osm--+     |
    |        +-Ss*b+  +Ds**c+     |
    |        |     |  |     |     |
LEFT-WALL this.p is.v a  test.n what

linkparser> !islands-ok=0
islands-ok set to 0
linkparser> this is a test what
No complete linkages found.
Found 6 linkages (6 had no P.P. violations) at null count 1
Linkage 1, cost vector = (UNUSED=1 DIS= 0.00 LEN=7)

    +----->WV----->+---Osm--+
    +---Wd---+-Ss*b+  +Ds**c+
    |        |     |  |     |
LEFT-WALL this.p is.v a  test.n [what]

In the parse with islands-ok=1, it is harder to see that this is a broken sentence. In a more complex sentence, it may be hard to find the discontinuity problem(s) at a glance. It is also less clear that the sentence is correct without the word what, and it is not shown how the sentence could be correctly parsed without it.
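One way to make the discontinuity explicit is to compute the connected components of the linkage graph. The sketch below is self-contained and does not use the link-grammar API; the function `find_islands` is hypothetical, and the link lists are copied by hand from the two linkages shown above.

```python
def find_islands(words, links):
    """Group words into connected components of the linkage graph."""
    parent = {w: w for w in words}

    def find(w):
        # Follow parent pointers to the component root, compressing the path.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for left, right in links:
        parent[find(left)] = find(right)

    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return list(groups.values())


WORDS = ["LEFT-WALL", "this.p", "is.v", "a", "test.n", "what"]

# Links read off the islands-ok=1 linkage (Ww, Osm, Ss*b, Ds**c).
LINKS_ISLANDS_OK = [("LEFT-WALL", "what"), ("is.v", "test.n"),
                    ("this.p", "is.v"), ("a", "test.n")]

# Links read off the islands-ok=0 linkage (WV, Wd, Ss*b, Osm, Ds**c).
LINKS_NO_ISLANDS = [("LEFT-WALL", "is.v"), ("LEFT-WALL", "this.p"),
                    ("this.p", "is.v"), ("is.v", "test.n"), ("a", "test.n")]

islands_on = find_islands(WORDS, LINKS_ISLANDS_OK)
islands_off = find_islands(WORDS, LINKS_NO_ISLANDS)
```

For the islands-ok=1 linkage this yields two components, one holding LEFT-WALL and what and another holding the rest of the sentence. For the islands-ok=0 linkage, everything except the null word what is in a single component attached to LEFT-WALL.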
from link-grammar.
OK. I'll try to deal with this next week.
from link-grammar.
Fixed the Corp. thing in commit 40940eb
from link-grammar.
Reverted islands_ok in 2f97e3c
from link-grammar.