/* XXX FIXME in the future, linkage->num_words might not match sent->length */ about link-grammar HOT 22 CLOSED

opencog commented on May 22, 2024

/* XXX FIXME in the future, linkage->num_words might not match sent->length */

from link-grammar.

Comments (22)

linas commented on May 22, 2024

For that particular comment, it's case 2 that I had in mind.

Note, by the way, that even right now, the linkage can contain fewer words than the sentence, because compute_chosen_words can mash together the morphemes, resulting in fewer words.

Also, today, for Russian, the sentence will contain null words, while the linkage will have the null words removed, and thus be shorter.

linas commented on May 22, 2024

btw, the last round of changes removed many or most uses of sent->length.

ampli commented on May 22, 2024

My current Wordgraph code produces two word paths:

sane_linkage_morphism() produces a Wordgraph path that matches chosen_disjuncts, but it skips the empty words. Thus the index of a word in this path cannot be used to index chosen_disjuncts in order to find the disjunct of a word. For that purpose I included a word-array pointer (i.e. a pointer to the corresponding sent->word[i]) in the Wordgraph words. The offset of this pointer in sent->word can be used to index chosen_disjuncts. This depends on sent->word[i] being in 1-to-1 correspondence with chosen_disjuncts[i], hence my comment about the expected equal lengths.

If we would eventually like to get rid of the sent->word array, I can store a numerical offset into the word-array instead of the address of the word in the word-array. However, this has a problem in the case of null (not empty) words. With word-array pointers as I implemented them now (as described above), when a Wordgraph word corresponds to a null word, one can look at the x-nodes of the word-array and immediately find the potential subwords that had no linkage (an API can be provided, for example). With a numerical offset and no sent->word array this is harder, but it can still be done by analyzing the Wordgraph. Maybe this is not a real problem, because an API can be provided for that too.

compute_chosen_words(), when !morphology=0, produces in my new code a contracted path, with Wordgraph words that match the combined morphemes. I'm not sure this is very useful. The said word offset can still be used to find the disjuncts, but it will be that of only one of the morphemes. Instead, maybe an offset vector could be produced, indexed by the displayed words, which in turn indexes the full Wordgraph path.
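
The pointer-offset lookup described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the link-grammar code: the types `Word`, `Disjunct` and `PathWord` and the function `disjunct_of()` are hypothetical, reduced stand-ins, and the sketch assumes that the word array and the chosen-disjunct array have the same length and correspond 1-to-1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real structures. */
typedef struct { const char *string; } Word;        /* one slot of the word array */
typedef struct { const char *label; } Disjunct;     /* a chosen disjunct */
typedef struct { const Word *sent_word; } PathWord; /* a word on the Wordgraph path */

/* Find the disjunct of a path word by turning its word-array pointer
 * into an offset. This is valid only while words[i] corresponds to
 * chosen_disjuncts[i] for every i < length. */
const Disjunct *disjunct_of(const PathWord *pw, const Word *words,
                            size_t length,
                            const Disjunct *const *chosen_disjuncts)
{
    ptrdiff_t offset = pw->sent_word - words;

    assert(offset >= 0 && (size_t)offset < length);
    return chosen_disjuncts[offset];
}
```

Because the path skips empty words, the position of a word in the path and its offset in the word array can differ, which is why the sketch goes through the pointer rather than the path index.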

linas commented on May 22, 2024

Removal of sent->word is in the distant future.

An eventual design goal was to have the length of a linkage equal the length of wordgraph-path. Thus, if there are no empty words in the wordgraph-path, they shouldn't be in the linkage, either.

After compute_chosen_disjuncts() but before sane_linkage_morphism(), linkage->num_words will equal sent->length. After this, the linkage could/should be shortened to omit empty words, so that a given linkage corresponds to only one wordgraph-path. After sane_linkage_morphism() returns, I do not know of any reason why sent->words would ever have to be touched again. Right? Am I missing something?

compute_chosen_words probably should be altered to remove the HIDE_MORPHO stuff, and the HIDE_MORPHO should be a distinct step, and moved to print.c. I guess I should open a distinct issue for that.

linas commented on May 22, 2024

Well, maybe not moved to print.c ... Anyway, I opened issue #39 for the HIDE_MORPHO work, and it would probably be better to create patches to fix that first, before doing other things.

ampli commented on May 22, 2024

After compute_chosen_disjuncts() but before sane_linkage_morphism(), linkage->num_words will equal sent->length. After this, the linkage could/should be shortened to omit empty words, so that a given linkage corresponds to only one wordgraph-path. After sane_linkage_morphism() returns, I do not know of any reason why sent->words would ever have to be touched again. Right? Am I missing something?

It is actually sane_linkage_morphism() first and then compute_chosen_disjuncts(). (And I think this order is how it should be, i.e. after the linkage step, the bogus linkages should be first removed, as if they have never existed, so the rest of the code will not process them at all.)

You are right that the empty words should be omitted first. In my Wordgraph version right now, it is still compute_chosen_words() which omits them. This makes it complicated and prone to errors. Today I found a bug in it: an empty word got between morphemes which it tried to combine, and it combined the empty word instead. There is another complication regarding that too (when I try to combine null-word morphemes to show them as one string inside [], and they have an empty word between them).

So to make the code clearer, the empty words should indeed first be removed from chosen_disjuncts[] of the linkage before invoking compute_chosen_words(), and then linkage->num_words will be <= sent->length. Since the wordgraph-path will then correspond exactly to chosen_disjuncts[], it will be trivial to find the disjuncts of a word in the wordgraph-path, and this will make the word-array pointer field in the Wordgraph words redundant (I will remove it).

Regarding sent->words, I think it can, as you see, be just neglected after the linkage (in my current code, even sane_linkage_morphism() doesn't touch it at all). The only reason to look at it is if you would like an easy way to provide an API to query null-word positions and get the list of morphemes that didn't get a linkage. But this can be of course found from the Wordgraph.

So I am going to implement these changes:

Removing empty words from chosen_disjuncts[], so that it matches the wordgraph-path 1-to-1.
Removing the word-array pointer from Gword.
Simplifying (the new) compute_chosen_words(). I will try to remove from it any reference to sent->word and sent->length.
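
As a rough illustration of the first item, the compaction could look something like the sketch below. Everything in it is an assumption made for the example: `Disjunct` and `Link` are reduced stand-ins, the `is_empty` flag replaces the real morpheme-type test, a NULL entry stands for a null word (which is kept), and `remove_empty_words_sketch()` is a hypothetical name, not the function in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the real structures. */
typedef struct { bool is_empty; } Disjunct; /* flag replaces the MT_EMPTY test */
typedef struct { size_t lw, rw; } Link;     /* left and right word indices */

/* Compact cdj[] in place by dropping the empty-word slots, and renumber
 * the word indices of the links to match. Links that touch an empty word
 * are dropped. NULL entries (null words) are kept. Returns the new number
 * of words and updates *num_links. If the scratch allocation fails, the
 * arrays are left unchanged. */
size_t remove_empty_words_sketch(const Disjunct **cdj, size_t num_words,
                                 Link *links, size_t *num_links)
{
    const size_t DROPPED = (size_t)-1;
    size_t *remap;
    size_t kept = 0, kept_links = 0;

    assert(cdj != NULL && links != NULL && num_links != NULL);
    if (num_words == 0) return 0;

    remap = malloc(num_words * sizeof *remap);
    if (remap == NULL) return num_words;

    for (size_t i = 0; i < num_words; i++)
    {
        if (cdj[i] != NULL && cdj[i]->is_empty)
        {
            remap[i] = DROPPED;
            continue;
        }
        remap[i] = kept;
        cdj[kept++] = cdj[i];
    }

    for (size_t j = 0; j < *num_links; j++)
    {
        assert(links[j].lw < num_words && links[j].rw < num_words);
        size_t lw = remap[links[j].lw];
        size_t rw = remap[links[j].rw];

        if (lw == DROPPED || rw == DROPPED) continue;
        links[kept_links].lw = lw;
        links[kept_links].rw = rw;
        kept_links++;
    }

    *num_links = kept_links;
    free(remap);
    return kept;
}
```

After such a pass the i-th word of the path and the i-th chosen disjunct line up, so no separate word-array pointer is needed to go from one to the other.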

linas commented on May 22, 2024

How much of this can you do before merging the word-graph code? I would be happier if you were able to feed me small changes, small pull requests, so that I could review each, rather than a giant snowball of stuff. (do you have snow over there?)

Anyway, right now, sane morphism runs after, not before, chosen disjuncts. I don't know if it matters.

If you modify compute_chosen_words, don't forget issue #39.

ampli commented on May 22, 2024

I can indeed backport many things that are not directly related to the Wordgraph (like I started to do regarding fixed-buffers). This will indeed make the patches more manageable, and if something malfunctions it will be easier to bisect.

You are right about compute_chosen_disjuncts(). Somehow I was actually referring to compute_chosen_words() (the one that I totally changed...), which runs after...

So removing the empty words can be done just before compute_chosen_disjuncts() returns. Am I right?

(This will remove several lines from sane_linkage_morphism(), and more lines from compute_chosen_words().)

BTW, currently the new sane_linkage_morphism() has this code to neglect empty words (cdj is the chosen disjunct, word is a pointer to a Gword word corresponding to the disjunct, which in this case is a pointer to a detached static Gword that represents the empty word):

        if (MT_EMPTY == cdj->word[0]->morpheme_type)
        {
            lgdebug(4, "- Empty word\n");
            wp_new = wp_old;
            continue; /* totally disregard it */
        }

linas commented on May 22, 2024

Yes, removing empty words can be done any time after select_linkages().

From what I can tell, it should be safe to call compute_chosen_disjuncts() after sane_morphism() but I haven't tried.

I am thinking that perhaps it might be better to release a version 5.2.0 soon, and save the word-path code for 5.3.0 -- a lot of fixes and changes have gone in, and they mostly all seem to work, for now, so perhaps I need to do a full regression test and then package up 5.2.0 and ship it ..

ampli commented on May 22, 2024

I am thinking that perhaps it might be better to release a version 5.2.0 soon, and save the word-path code for 5.3.0 -- a lot of fixes and changes have gone in, and they mostly all seem to work, for now, so perhaps I need to do a full regression test and then package up 5.2.0 and ship it ..

I also think so. I have some more code that could be backported, like the "quotation marks as tokens". But when the work is in progress there is always more code than can be included, and we need to cut somewhere.

After you release 5.2.0 I can continue backporting some more independent features from my new code before including the bulk of the Wordgraph code.

linas commented on May 22, 2024

it will take me a few days or a week. Need to run the full regression tests, etc. and fix whatever comes up. SAT solver stopped working, although I don't think anyone cares.

ampli commented on May 22, 2024

I wrote a function remove_empty_words(Linkage), which is invoked at the end of compute_chosen_disjuncts():

    for (in=0; in < N_linkages_alloced; in++)
    {
        Linkage lkg = &sent->lnkages[in];
        Linkage_info *lifo = &lkg->lifo;
        if (lifo->discarded || lifo->N_violations) continue;

        partial_init_linkage(lkg, pi->N_words);
        extract_links(lkg, pi);
        /* Because the empty words are used only in the parsing stage, they are
         * removed here along with their links so from now on we will not need to
         * consider them. */
        remove_empty_words(lkg);
    }

BTW, I noted an old inconsistency in free_linkages() (it appeared when HIDE_MORPHO was introduced):

        exfree((void *) linkage->word, sizeof(const char *) * linkage->num_words);
        exfree(linkage->link_array, sizeof(Link) * linkage->lasz);
        exfree(linkage->chosen_disjuncts, linkage->num_words * sizeof(Disjunct *));

The code above assumes that the number of chosen_disjuncts elements is the same as the number of linkage->word elements, which may not be so when HIDE_MORPHO is true. It doesn't cause a problem because exfree() is defined as free(). However, it is misleading. The question is how to fix it.

Use free() directly.
Define a field linkage->cdsz to hold its initial size.
Leave it as is...
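
To make the second option concrete, here is a small sketch of recording the allocated length separately from the current word count. It is an illustration only: `LinkageSketch` is a reduced, hypothetical structure, and `sized_alloc()` / `sized_free()` are made-up stand-ins for a deallocator that, unlike a plain free(), really uses its size argument. The `cdsz` field name follows the option above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Disjunct_s Disjunct; /* opaque for this sketch */

/* Hypothetical, reduced linkage: only the fields relevant to sizing. */
typedef struct
{
    size_t num_words;            /* may shrink after allocation */
    size_t cdsz;                 /* allocated length of chosen_disjuncts */
    Disjunct **chosen_disjuncts;
} LinkageSketch;

/* Made-up size-tracking allocator, so that a wrong size is observable. */
size_t bytes_in_use = 0;

void *sized_alloc(size_t size)
{
    void *p = calloc(1, size);
    if (p != NULL) bytes_in_use += size;
    return p;
}

void sized_free(void *p, size_t size)
{
    if (p == NULL) return;
    bytes_in_use -= size;
    free(p);
}

/* Allocate the array and remember its length in cdsz. Returns 0 on success. */
int linkage_alloc_disjuncts(LinkageSketch *lkg, size_t n_words)
{
    assert(lkg != NULL && n_words > 0);
    lkg->chosen_disjuncts = sized_alloc(n_words * sizeof(Disjunct *));
    if (lkg->chosen_disjuncts == NULL) return -1;
    lkg->num_words = n_words;
    lkg->cdsz = n_words;
    return 0;
}

/* Free using cdsz, not num_words, which may have changed in between. */
void linkage_free_disjuncts(LinkageSketch *lkg)
{
    assert(lkg != NULL);
    sized_free(lkg->chosen_disjuncts, lkg->cdsz * sizeof(Disjunct *));
    lkg->chosen_disjuncts = NULL;
    lkg->cdsz = 0;
}
```

With this layout, shrinking num_words after allocation no longer affects the size passed to the deallocator.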

I checked this remove_empty_words() in the Wordgraph version (of course I had to change the remaining after-linkage sent->length usages to linkage->num_words). If desired, I can backport it and send a pull request.

linas commented on May 22, 2024

If it is easy to backport, and if you feel like doing so, then yes, do it. I'm guessing that it might make the overall conceptual structure of the word-graph stuff clearer if you backport. I can't tell if it's urgent or not.

linas commented on May 22, 2024

Oh, and yes, chosen_disjuncts should have its size indicated.

ampli commented on May 22, 2024

It turned out that a direct backport of this was very difficult, due to the complications it creates in both sane_linkage_morphism() and compute_chosen_words(). So I aborted this backport attempt.

On the other hand, in the Wordgraph version code, an early remove_empty_words() actually simplifies sane_linkage_morphism() and compute_chosen_words() (and as I said, it already works fine there).

So for the current code, I will send you a pull request consisting of:

linkage->cdsz for chosen_disjuncts.
Two new debug print functions, unused for now but used in the Wordgraph version.
Convenience display of EMPTY_WORD.zzz in debug messages, as "ø".
Suppress printing of the empty word in the output of the "!!" command (e.g. as happens with !!let's).
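
For the "ø" item, the idea is a simple substitution at display time. A minimal sketch, assuming a hypothetical helper `debug_word_display()` (the macro names are also made up for this example; only the token string and the "ø" come from the list above):

```c
#include <assert.h>
#include <string.h>

/* The empty-word token as named in the list above. */
#define EMPTY_WORD_TOKEN "EMPTY_WORD.zzz"
/* UTF-8 encoding of "ø" (U+00F8), written as bytes so that the result
 * does not depend on the encoding of the source file. */
#define EMPTY_WORD_DISPLAY "\xC3\xB8"

/* Hypothetical helper: return the string to show in debug output. */
const char *debug_word_display(const char *word)
{
    assert(word != NULL);
    return (0 == strcmp(word, EMPTY_WORD_TOKEN)) ? EMPTY_WORD_DISPLAY : word;
}
```

A debug print would then pass each word through this helper before formatting it, leaving all other words untouched.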

linas commented on May 22, 2024

ok, got it, thanks

linas commented on May 22, 2024

Whoops ... I just accidentally merged something into master that should have waited until after 5.2.5.

If your wordgraph code is ready, it can now be merged. I notice that you have not created a pull request for it.

Meanwhile, fixes for 5.2.5 will have to go into a different branch, 5.2.x-maintenance. Just FYI. ... I made a mistake which will complicate 5.2.5 ...

ampli commented on May 22, 2024

After the recent changes, the pull request now is not clean.
Hence I would like to rebase it on your current master.
I guess nobody based any work on it yet...

I currently have a branch wordgraph, which has messy commits, and a fixed one, wordgraph1.
So I will rebase wordgraph1 and rename it to wordgraph, and send you a pull request.

linas commented on May 22, 2024

It's not urgent. I just wanted to tell you I made the mistake. If you have any fixes for 5.2.5, they should apply to the branch called 5.2.x-maintenance.

ampli commented on May 22, 2024

I have fixes for 5.2.5. I installed MSVC13 in order to test their Windows compatibility. I am currently trying to install a 64-bit regex library... If you know of one that I can just download and extract, in binary form, this may save me much work.

linas commented on May 22, 2024

No clue about Windows regex.

ampli commented on May 22, 2024

I think the following is the only open issue here (besides this

Removal of sent->word is in the distant future,

which can be deferred until a further notice).

ampli commented on Jan 23, 2015

I am currently trying to install a 64-bit regex library

In order to close this issue, here is the solution for that, from my post to the LG group:

[...] I generated regex.lib from libtre-0.7.5 as follows, using dlltool from cygwin (I don't know if there is a better way):

    dlltool -l regex.lib -d libtre/win32/tre.def -D regex.dll libtre/win32/bin/x64_release/libtre_dll.dll
