Comments (36)

nschneid avatar nschneid commented on August 18, 2024 1

for fixed and most other cases this is already determined by the deprel: that is, ExtPos = expected POS of the deprel.

Indeed, the ExtPos information is already implied if the deprel is correct and is a functional relation (cc, case, mark, or advmod). But there are cases where fixed is used and the deprel is something less specific (e.g. pronouns can attach in a variety of ways), and in general, making ExtPos explicit highlights on the same line as the first word the fact that its UPOS does not control its deprel.

While I would be interested in discussing something similar when it is tied to an effective morphological strategy, e.g. in relation to VerbForm.

Could you elaborate?

from docs.

nschneid avatar nschneid commented on August 18, 2024 1

I am somewhat worried that a feature like ExtPos could get out of hand and be very much misinterpreted by new annotators, as already happens with fixed.

Some treebanks are already using ExtPos. Treebanks are free to innovate with MISC attributes. As far as the validator is concerned, the only change will be for fixed expressions (and it will be a warning not an error). If there is enthusiasm for a broader definition of ExtPos down the road, that might lead to new guidelines, but I think that would be premature at this point.

I suspect requiring ExtPos on fixed expressions might actually encourage treebanks to reduce their use of fixed, because they will realize that most semantic multiword expressions can be accommodated by syntactically regular deprels (but we'll see).

It seems to me that most of your objections above are actually objections to the fixed analysis in the first place. I don't want to bog down this thread with debates about particular expressions, but given that the relation exists to capture grammatical words-with-spaces, it doesn't seem like there is much harm in assigning those a holistic tag (even if it is sometimes inferable from the deprel, just as ADP, ADV, CCONJ, SCONJ are usually inferable from the deprel for single words). Explicitly flagging, e.g. for "rather" in "rather than", that it is an ADV internally and part of a CCONJ expression externally (rather than some other anomaly leading to ADV/cc) seems like it would help treebank users see what is going on.

sylvainkahane avatar sylvainkahane commented on August 18, 2024 1

I don't even understand why there is a discussion about the relevancy of ExtPos. ExtPos is just as relevant as upos, not more, not less. @Stormur if you said that ExtPos can be inferred from the syntactic relation, the same could be said about upos. (I don't think it is true that the POS can be inferred from the syntactic relation but that's not the point.) And even if it could be inferred, what is the problem with adding ExtPos? I really don't understand the point.

One of the reasons we introduced ExtPos (apart from the fact that in SUD our syntactic relations are less redundant with upos) is that it was difficult to track down annotation errors or to find strange constructions because we had many unexpected upos-relation pairs. With Grew-match it is possible to search for elements that have ExtPos=ADV or, if there is no ExtPos, upos=ADV, and thus to get all the ADVs consisting of one or several tokens (if you put ExtPos on all fixed expressions, as in the French treebanks).
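
The "ExtPos if present, otherwise upos" lookup described here can be sketched in a few lines of Python. This is only an illustration, not Grew-match syntax: the helper names `parse_feats` and `effective_upos` and the sample tokens are made up for the example and are not taken from any treebank.

```python
def parse_feats(feats: str) -> dict:
    """Parse a CoNLL-U FEATS string such as 'Degree=Pos|ExtPos=ADV' into a dict."""
    if feats in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def effective_upos(upos: str, feats: str) -> str:
    """Return ExtPos when it is present, otherwise fall back to the token's own UPOS."""
    return parse_feats(feats).get("ExtPos", upos)


# Hypothetical (form, upos, feats) triples; the tags are illustrative only.
tokens = [
    ("vice", "NOUN", "ExtPos=ADV"),
    ("versa", "ADJ", "_"),
    ("often", "ADV", "_"),
    ("rather", "ADV", "ExtPos=CCONJ"),
]

# Heads that count as ADV either by their own UPOS or through ExtPos.
adverbs = [form for form, upos, feats in tokens if effective_upos(upos, feats) == "ADV"]
print(adverbs)
```

Note that "rather" is excluded even though its own UPOS is ADV, because its ExtPos says the expression as a whole behaves as a CCONJ.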

dan-zeman avatar dan-zeman commented on August 18, 2024 1

if you said that ExtPos can be inferred from the syntactic relation, the same could be said about upos

I think it is slightly different in that I do not envision ExtPos for fixed being other than contextual, more or less by definition given its "externality".

I think it does not have to be that way. If I have to add ExtPos to all fixed expressions in a treebank using a script, the script will not look at the context and make inferences like "the incoming deprel is advmod, hence ExtPos=ADV". Instead, the script will have a list of the fixed expressions in the language and a "dictionary" UPOS for each of them. I may discover expressions that are currently fixed but I do not want them on the list, so I will change their annotation. And after I apply the script, I may ask the validator whether some of them occurred in a context that is not compatible with its new ExtPos, and fix the annotation if it does.
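
A script of the kind described above might look roughly like the following sketch. It assumes each sentence is a list of dicts with `id`, `form`, `feats`, `head` and `deprel` keys; the lexicon entries, the function name `add_extpos` and the sample sentence are hypothetical and only illustrate the lookup-by-list idea, not any existing tool.

```python
# Hypothetical lexicon: fixed expression (lowercased forms) -> "dictionary" UPOS.
FIXED_LEXICON = {
    ("rather", "than"): "CCONJ",
    ("each", "other"): "PRON",
}


def add_extpos(sentence):
    """Add ExtPos to the head of every fixed expression found in the lexicon.

    Returns the expressions that are annotated as fixed but are not on the list,
    so that their annotation can be reviewed by hand.
    """
    unknown = []
    for head in sentence:
        children = [t for t in sentence
                    if t["head"] == head["id"] and t["deprel"] == "fixed"]
        if not children:
            continue
        # The first word heads a fixed expression, so the children follow it.
        words = [head] + sorted(children, key=lambda t: t["id"])
        expr = tuple(t["form"].lower() for t in words)
        extpos = FIXED_LEXICON.get(expr)
        if extpos is None:
            unknown.append(expr)
            continue
        feats = [] if head["feats"] == "_" else head["feats"].split("|")
        feats = [f for f in feats if not f.startswith("ExtPos=")]
        feats.append("ExtPos=" + extpos)
        # Keep the features in alphabetical order, ignoring case.
        head["feats"] = "|".join(sorted(feats, key=str.lower))
    return unknown


# Made-up sentence for illustration.
sentence = [
    {"id": 1, "form": "They", "feats": "_", "head": 2, "deprel": "nsubj"},
    {"id": 2, "form": "helped", "feats": "_", "head": 0, "deprel": "root"},
    {"id": 3, "form": "each", "feats": "_", "head": 2, "deprel": "obj"},
    {"id": 4, "form": "other", "feats": "_", "head": 3, "deprel": "fixed"},
]
unknown = add_extpos(sentence)
print(sentence[2]["feats"], unknown)
```

The script never looks at the incoming deprel; checking whether the assigned ExtPos fits the context is left to a later validation pass.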

nschneid avatar nschneid commented on August 18, 2024 1

Today the Core Group discussed FEATS vs. MISC and voted that FEATS would be a better home for ExtPos. Most MISC attributes are optional and unregulated at the universal level; putting ExtPos in FEATS gives it greater visibility and is in keeping with existing practice by the SUD group. Another practical advantage is a clear home in the docs for universal + language-specific pages (e.g. https://universaldependencies.org/en/feat/ExtPos.html). The encouragement to document the different values of ExtPos with examples in each language may have the effect of promoting discussion of the appropriate scope of fixed.

sylvainkahane avatar sylvainkahane commented on August 18, 2024 1

Two answers to @nschneid.

  1. Yes, "de la" is the partitive article. I don't like this notion; in fact, it is just the indefinite article for mass nouns. Note that the plural indefinite article "des" is also a portmanteau of "de+les".

  2. Using an indefinite article in the subject position is not very felicitous in French: https://universal.grew.fr/?custom=668424e695391. When the subject is indefinite, we have a special construction. Rather than saying S V, we prefer "il y a S qui V" 'there is S that V', especially in spoken French: https://universal.grew.fr/?custom=668429b9e5348.

nschneid avatar nschneid commented on August 18, 2024 1

@AngledLuffa these are great questions/observations about fixed consistency. Could you please move them to separate issues as I'm sure some will require discussion?

nschneid avatar nschneid commented on August 18, 2024

N.B. Currently I have an EWT instance that is triggering a validator error: it is for the adpositional expression "due to" attaching as case, only the "to" is omitted so there is no fixed dependency. It would make sense to tag "due" as ADJ and ExtPos=ADP, but the validator needs to be updated to recognize the latter because it is not allowing an ADJ to attach as case.

Stormur avatar Stormur commented on August 18, 2024

I sincerely do not see much utility in this, as for fixed and most other cases this is already determined by the deprel: that is, ExtPos = expected POS of the deprel.

While I would be interested in discussing something similar when it is tied to an effective morphological strategy, e.g. in relation to VerbForm.

amir-zeldes avatar amir-zeldes commented on August 18, 2024

But there are cases where fixed is used and the deprel is something less specific (e.g. pronouns can attach in a variety of ways)

Exactly, for example "each other" has ExtPos=PRON but a variety of deprels.

Stormur avatar Stormur commented on August 18, 2024

for fixed and most other cases this is already determined by the deprel: that is, ExtPos = expected POS of the deprel.

Indeed, the ExtPos information is already implied if the deprel is correct and is a functional relation (cc, case, mark, or advmod). But there are cases where fixed is used and the deprel is something less specific (e.g. pronouns can attach in a variety of ways), and in general, making ExtPos explicit highlights on the same line as the first word the fact that its UPOS does not control its deprel.

But there are cases where fixed is used and the deprel is something less specific (e.g. pronouns can attach in a variety of ways)

Exactly, for example "each other" has ExtPos=PRON but a variety of deprels.

Annotation practices of course interfere with what would be the "expected" POS (ExpPos 😬 ) for a dependency relation. But let's take each other as a specific example.

If I am not mistaken, this sequence would be labelled with ExtPos=PRON because it is considered a MWE behaving as a whole as a reciprocal pronoun. This means that we expect it to get relations obj, nsubj, obl, iobj: all of these entail a nominal part of speech, so either NOUN (+ PROPN) or PRON. The fact that this MWE is ascribable to PRON rather than NOUN derives from the fact that its "head" (and actually both elements) are of a synsemantic nature. But anyway, this would be an internal distinction to the fact of "behaving nominally". There are also other possible relations like conj, orphan, parataxis... which are neutral with respect to parts of speech, so they are not relevant here.

From the data, there appear to be just these relations in English treebanks for each other. Now, imagine that it were annotated with relation advmod. I am quite confident that in this case ExtPos would be set to ADV; if not, the correctness of advmod would be very doubtful (and in fact I think it would not be correct). This goes to show that ExtPos is a case of contextual annotation, as it is mechanically determined by the dependency relation: it is redundant and not useful. (Incidentally, I am very much against enforcing warnings from the validator if this feature is to be annotated under MISC.)


UPDATE: I recognise the following interpretation is faulty, I am sorry for this. I am toning it down but I am leaving it here for the more general points.

Now, still more specifically to each other is why it should be annotated as fixed. It seems transparent: you have a contrastive element other modified by a distributive each, and this is a determinantal (or it might be argued, pronominal) phrase which behaves as any other nominal argument. I see that in some treebanks the "head" each gets the feature PronType=Rcp, which is problematic: if annotated at all, this should also go into MISC, exactly as it has been proposed for ExtPos. I think however that here we need to refer to a MWE annotation level and not let it percolate onto the morphosyntactic one.

By the way, the English case is quite different from the more or less corresponding Latin one, where we have a reciprocal element invicem: while this has transparent etymology in + vicem 'in [smb's/the other's] turn', it really looks crystallised and it does not appear where you would expect an oblique nominal phrase: you have it used as an obj, or you have things like ab invicem 'from each+other', ad invicem 'to each+other', etc. (i.e., here you would have two adpositions). No reason to split it to have it again annotated as fixed: this might appear on a derivational annotation layer, but it does not seem appropriate to the morphosyntactic one anymore.


So really I cannot see what ExtPos would add.


While I would be interested in discussing something similar when it is tied to an effective morphological strategy, e.g. in relation to VerbForm.

Could you elaborate?

Here allow me to refer to my article Formae reformandae (UDW5). Traditionally, we have labels like participle, infinitive, supine, masdar, etc. to refer to particular forms in verbal paradigms whereby a verb gets to be used as a different part of speech, as it were. So, the participle is a verbal adjective whereby I can say (examples in Latin):

  • scriptura poetissa 'going-to-write poetess', which might also be expressed as
  • poetissa quae scriptura est/erat/erit 'poetess who is/was/will be going to write', or similar, with a "finite" verb form (i.e., a "verby verb")

The form scriptura behaves in all respects like an adjective: inflection for gender/number/case, possibility of degree (scripturior, scripturissima), possibility of adverbialisation (scripture); but it also behaves as a verb, in that it can have the same argument structure: scriptura librum 'going to write a book', with accusative, instead of a nominal strategy like genitive *scriptura libri.

So, in the end using VerbForm=Part would be equivalent to ExtPos=ADJ (in fact, I have proposed a notation like Transposed=ADJ), but in this case this is tied effectively to morphology, and not to an invisible "global property" of a MWE.

nschneid avatar nschneid commented on August 18, 2024

Now, still more specifically to each other is why it [each other] should be annotated as fixed.

Discussions of English pronouns are at #517, and docs at https://universaldependencies.org/en/pos/PRON.html. While it might be nice to show the historical origin of the expression with a relation other than fixed, it seemed our best option to express the reciprocal slot of the pronoun paradigm was to use fixed and treat the whole thing as PRON.

So really I cannot see what ExtPos would add.

Without ExtPos, how would one search a treebank for all expressions acting as pronouns? The rule would need to specify individual lexical items like "one another". But with ExtPos, it is easy to find the ones that are not PRON at the individual word level.

You mentioned conj etc.: these are cases where it is not always trivial to detect the UPOS from the deprel. From English-GUM: "husbands are likely to laugh at jokes about wives and vice versa"—ExtPos is necessary to express that "vice versa" functions as an ADV (coordinated with an ADJ).

There also may be languages with fixed expressions functioning as PART, for example. PART is idiosyncratic and not necessarily predictable from the deprel.

So, in the end using VerbForm=Part would be equivalent to ExtPos=ADJ (in fact, I have proposed a notation like Transposed=ADJ), but in this case this is tied effectively to morphology, and not to an invisible "global property" of a MWE.

The line between VERB and ADJ can be tricky and I don't know enough about Latin to weigh in here (VerbForm=Part as used in English is NOT equivalent to occurring in ADJ-like environments), but yes, there may be many good uses of ExtPos beyond fixed expressions.

Stormur avatar Stormur commented on August 18, 2024

UPDATE: I know that in the haste of writing I put forth a faulty interpretation of English each other, I am sorry (but I am leaving it there). This however does not invalidate the other points.

Anyway, this is yet another case where, if each other is indeed a unique word like Latin invicem, written separately just for the vagaries of orthography, I think a token with spaces could be welcome.

Stormur avatar Stormur commented on August 18, 2024

Without ExtPos, how would one search a treebank for all expressions acting as pronouns? The rule would need to specify individual lexical items like "one another". But with ExtPos, it is easy to find the ones that are not PRON at the individual word level.

One would look for all elements with nominal relations (nsubj, obj, nmod, ...) and select those whose head falls into a synsemantic word class. If the head is not synsemantic, I would doubt the pronominality of the expression. Conversely, I don't think that we want to assign ExtPos=ADV to phrases like German pro Kopf ~ 'each', lit. 'per head', or to any other oblique.

Something similar already has to be done to retrieve predicates: a word receiving advcl, csubj, etc. can well be a non-verb with an auxiliary. But I do not think that we want to assign ExtPos=VERB to those occurrences. The relation already tells us that. On the other hand, it is interesting to know if a csubj is headed by a verb form "mimicking" a NOUN or an ADJ.

I am somewhat worried that a feature like ExtPos could get out of hand and be very much misinterpreted by new annotators, as already happens with fixed.


You mentioned conj etc.: these are cases where it is not always trivial to detect the UPOS from the deprel. From English-GUM: "husbands are likely to laugh at jokes about wives and vice versa"—ExtPos is necessary to express that "vice versa" functions as an ADV (coordinated with an ADJ).

This is a general problem which goes beyond the appropriateness of annotating ExtPos.

In this specific case, the issue has to be solved by addressing how to mark the presence of an ellipsis and/or the nature of vice versa: the annotation as ADV is a confusing factor here (in the sense that it does not look like the right solution, at least not to me). Annotating ExtPos here does not add anything; if anything, it makes it even more confusing (I would immediately go look into the data to understand what justifies this asymmetry).

There also may be languages with fixed expressions functioning as PART, for example. PART is idiosyncratic and not necessarily predictable from the deprel.

We would need some examples to discuss this. Anyhow, PART is rather restricted in what it can be associated with. Another point is that this idiosyncrasy of PART annotation is the very problem we have to address.

So, in the end using VerbForm=Part would be equivalent to ExtPos=ADJ (in fact, I have proposed a notation like Transposed=ADJ), but in this case this is tied effectively to morphology, and not to an invisible "global property" of a MWE.

The line between VERB and ADJ can be tricky and I don't know enough about Latin to weigh in here (VerbForm=Part as used in English is NOT equivalent to occurring in ADJ-like environments), but yes, there may be many good uses of ExtPos beyond fixed expressions.

It really is the same in any Indo-European language (and beyond). What are non-ADJ-like environments of English VerbForm=Part (which should at the same time be non-VERB-like)? If it were so, could I dare to suggest that this annotation might need some revision from a typological point of view?

But the point is, transposition exists and a unified way to mark it could be useful.

Stormur avatar Stormur commented on August 18, 2024

I have no problem with using it if one sees fit to do so, only with making it more or less mandatory through warnings from the validator. I am opposed to that.


Then, my personal considerations about its utility still stand.

if you said that ExtPos can be inferred from the syntactic relation, the same could be said about upos

I think it is slightly different in that I do not envision ExtPos for fixed being other than contextual, more or less by definition given its "externality".

In general, it is true that we are interested in seeing whether, say, an nmod is realised by a NOUN/PROPN, PRON, ADJ, DET, NUM, VERB with a VerbForm... But in those cases, we have a syntactic word which does show characteristics of that word class.

It seems to me that most of your objections above are actually objections to the fixed analysis in the first place.

This is for sure a very big problem.

Stormur avatar Stormur commented on August 18, 2024

I may discover expressions that are currently fixed but I do not want them on the list, so I will change their annotation. And after I apply the script, I may ask the validator whether some of them occurred in a context that is not compatible with its new ExtPos, and fix the annotation if it does.

I understand, but this is independent from ExtPos and based just on a query for fixed...

dan-zeman avatar dan-zeman commented on August 18, 2024

I may discover expressions that are currently fixed but I do not want them on the list, so I will change their annotation. And after I apply the script, I may ask the validator whether some of them occurred in a context that is not compatible with its new ExtPos, and fix the annotation if it does.

I understand, but this is independent from ExtPos and based just on a query for fixed...

Yes, there is definitely extra work required. But if the validator is modified to take ExtPos into account, some of its current tests can be applied. The current state is that if the validator sees a fixed child, it will turn off many of its UPOS-DEPREL compatibility tests.
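
To make the idea concrete, here is a minimal sketch of how a UPOS-DEPREL compatibility test could use ExtPos instead of being switched off. It is not the actual validator code: the `EXPECTED_UPOS` table is a deliberately simplified assumption, and `check_token` is a hypothetical helper written for this illustration.

```python
# Simplified, illustrative expectations only; the real validator's rules differ.
EXPECTED_UPOS = {
    "case": {"ADP"},
    "cc": {"CCONJ"},
    "mark": {"SCONJ"},
    "advmod": {"ADV"},
}


def check_token(upos, feats, deprel, has_fixed_child):
    """Return a list of warnings for one token, testing ExtPos in place of UPOS."""
    feat_dict = {} if feats == "_" else dict(f.split("=", 1) for f in feats.split("|"))
    extpos = feat_dict.get("ExtPos")
    if has_fixed_child and extpos is None:
        # Without ExtPos there is nothing reliable to test the deprel against.
        return ["fixed expression without ExtPos"]
    effective = extpos or upos
    base_deprel = deprel.split(":")[0]  # ignore subtypes such as advmod:emph
    expected = EXPECTED_UPOS.get(base_deprel)
    if expected is not None and effective not in expected:
        return [f"{effective} is unexpected for deprel '{deprel}'"]
    return []


# "due" tagged ADJ with ExtPos=ADP attaching as case passes the test.
print(check_token("ADJ", "ExtPos=ADP", "case", False))
# The same token without ExtPos is flagged.
print(check_token("ADJ", "_", "case", False))
```

Relations that are neutral with respect to part of speech (conj, parataxis, ...) are simply absent from the table, so no test is applied to them.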

Stormur avatar Stormur commented on August 18, 2024

I may discover expressions that are currently fixed but I do not want them on the list, so I will change their annotation. And after I apply the script, I may ask the validator whether some of them occurred in a context that is not compatible with its new ExtPos, and fix the annotation if it does.

I understand, but this is independent from ExtPos and based just on a query for fixed...

Yes, there is definitely extra work required. But if the validator is modified to take ExtPos into account, some of its current tests can be applied. The current state is that if the validator sees a fixed child, it will turn off many of its UPOS-DEPREL compatibility tests.

But this would be an extra test created from nothing, from the addition of this feature, which itself can only be added on contextual grounds, as by definition it cannot depend on the characteristics of the single components. Because if it did, then why fixed? And so it all boils down again to just checking all fixed combinations, whatever their dependency relations.

There is circularity here. I also fear that making ExtPos de facto mandatory would lead to an increase of fixed expressions in new annotation endeavours, as in a sense this would justify the use of fixed more than it is warranted (while we actually need the opposite, I think).

Now I will sit silent because I think I have already insisted too much on these points (sorry) and I am becoming grumpy and repetitive. But do not get me wrong, I can understand the implementation of tests like the ones you describe. However, all in all, I believe that these possible benefits are extremely marginal at best and that drawbacks on the contrary are too many. I would like to see a different "angle of attack" to the issues that we are confronting here.

nschneid avatar nschneid commented on August 18, 2024

@dan-zeman has drafted a universal guidelines page: https://universaldependencies.org/u/feat/ExtPos.html

A couple of questions about French examples:

[Two images, not preserved]

dan-zeman avatar dan-zeman commented on August 18, 2024

I took the French examples from the French documentation but I did not verify them in the French treebanks.

nschneid avatar nschneid commented on August 18, 2024

I switched the "plutôt que" example to a "bien que" example from one of the treebanks.

@sylvainkahane or @bguil, maybe you could confirm the "de la" example of ExtPos=DET? Why would that not just be an ordinary ADP + DET combination?

sylvainkahane avatar sylvainkahane commented on August 18, 2024

Here are all the values of ExtPos in the French GSD treebank: https://universal.grew.fr/?custom=66841dc7423dd.
If you look at the DET value you find "de la" (and its variant "de l'"). Note that "de la" is not always an indefinite determiner; it can also be the combination of ADP "de" and the definite determiner "la".

dan-zeman avatar dan-zeman commented on August 18, 2024

The de la example occurs 9 times in Sequoia.

nschneid avatar nschneid commented on August 18, 2024

Ah I was querying for "la" as the lemma when it should be "le". OK I guess this is the partitive article construction. (Curious: Can "de la" ever be used on a subject? I mainly see it following a verb or preposition, where historically "de" might have acted as a preposition.)

AngledLuffa avatar AngledLuffa commented on August 18, 2024

In terms of implementing this in English treebanks such as PUD, are we at the point of labeling sort of etc, or not there yet?

amir-zeldes avatar amir-zeldes commented on August 18, 2024

FEATS would be a better home for ExtPos

Sounds good, will implement for GUM as well

nschneid avatar nschneid commented on August 18, 2024

In terms of implementing this in English treebanks such as PUD, are we at the point of labeling sort of etc, or not there yet?

This ExtPos policy applies to all fixed expressions, if that's what you're asking. If there are questions about what counts as fixed that should go in other issues.

AngledLuffa avatar AngledLuffa commented on August 18, 2024

Actually I just mean - are we now ready to label fixed expressions in PUD, or is there a reason to wait for the standard to be finalized and/or the validator to be updated?

nschneid avatar nschneid commented on August 18, 2024

We're ready to implement! The validator is not updated yet (once it is there will be an official announcement of the new policy), but I've already implemented in EWT.

amir-zeldes avatar amir-zeldes commented on August 18, 2024

GUM is implemented too, just moved it to FEATS, should update with the next push

AngledLuffa avatar AngledLuffa commented on August 18, 2024

Found some cases of up to which may need a fixed relation in EWT

Train section:

bundling together cheques of up to $1,000 from friends and family

but not up to the standards that I was told I should expect

the food was not up to par with the price tag

Test:

# text = I'll pay up to 200-250 for it if I have to.

AngledLuffa avatar AngledLuffa commented on August 18, 2024

Where is the line to draw for as X as expressions? There are some marked in EWT, such as

**as well as** the fun filled social dance evening held every Saturday evening
I will often have **as many as** one per kitten

but then many others are not marked, such as

We should know **as much as** we can

AngledLuffa avatar AngledLuffa commented on August 18, 2024

There are several fixed expressions marked in PUD which are not marked in EWT. Here are a few:

not marked in EWT:

after all
After all, the internet is not a luxury

as if
photographs that looked **as if** they were from the 1970s

at best
**At best** it is naive and at worst it would yet again...

close to  ... similar to "approximately"
Cairo had a population of **close to** half a million

in addition   ... "furthermore"
**In addition**, statute determines the election of assembly of regions

Marked in PUD but not existing in EWT:

more or less:
The working time undertaken in this first hour is more or less equal to 45 minutes.

AngledLuffa avatar AngledLuffa commented on August 18, 2024

What about down to in a phrase such as

# text = The horse I had posted about a couple weeks ago with the atrophied cheek muscles is down to his last resort for life.

Incidentally, I am happy that, as humans, we have surgical options other than "shotgun" for dealing with atrophied cheek muscles

AngledLuffa avatar AngledLuffa commented on August 18, 2024

Cases of next to in EWT which possibly match other instances of next to with ExtPos:

If sites next to you don't have what you want
the sea next to you
I throw a treat across the floor or even right next to her paw
place it next to the couch
the fish look better next to them
right next to the ice machine

I don't see much difference between those and the following:

First room had used tissues next to the bed
It is next to Gare du Nord

although certainly there might be some subtle differences

dev, not marked:

# text = We are staying next to the airport which is located next to BARTrail.

test, not marked:

# text = Place is next to carval and walmart.

AngledLuffa avatar AngledLuffa commented on August 18, 2024

Done
