Comments (8)

nschneid commented on July 18, 2024

If there is general agreement I would be open to adding a sentence along the lines of "If the word is deemed a 'real' word of the language, then another tag should be used, even if that word's morphosyntactic behavior is unusual."

from docs.

sylvainkahane commented on July 18, 2024

For spoken corpora, X is used for unfinished words (scraps? false starts? I am not sure what that is called in English). But we are inconsistent: most of the time we can figure out what the complete word would be, and we use the POS of the repair. We hesitated between two strategies:

  1. using the POS of the corrected word (the repair) when we can figure it out.
  2. using X every time and putting the POS of the corrected word in ExtPos when we can figure it out.

I think I prefer Solution 2 because, even if "a~" is repaired by "after" and I know that "a~" was used here as the start of an ADP, I don't want to have "a~" among the ADPs of my corpus.

In our corpora of spoken French the annotation is inconsistent, and we should make a clear decision.
See https://universal.grew.fr/?custom=657ad60d136b4. We use "~" to indicate unfinished words, because "-" is used in orthographic words. It would be easy to change the annotation with a Grew rule as soon as we have decided what to do.

Stormur commented on July 18, 2024

I would like the definition to stress more that this POS (non-)tag should really be a last resort and that it is actually a non-lexical one, similar to dep.

nschneid commented on July 18, 2024

@sylvainkahane For words truncated/unfinished due to a dysfluency, my gut feeling is that X would make sense, falling under the word fragment subcase. There are also uses of reparandum where a word is repeated, and there I would expect the regular tag to apply on both tokens.

@Stormur "It should be used very restrictively." seems to say that... are you seeing places where it is overused?

Stormur commented on July 18, 2024

Maybe I am nitpicking, but it seems to leave room for creating one's own restrictions, which might be arbitrarily broad as we know, instead of specifying that it is really the last thing you should do if there is no other possibility.

nschneid commented on July 18, 2024

Thanks @Stormur: the group agreed to emphasize that it should be used narrowly. Updated https://universaldependencies.org/u/pos/X.html

nschneid commented on July 18, 2024

And @sylvainkahane it now mentions truncated words. I think I agree with you about ExtPos being the right place for the intended word POS if it can be determined.

sylvainkahane commented on July 18, 2024

Thanks @nschneid. I will adopt the POS X for all truncated words in our spoken corpora and add an ExtPos feature with the POS of the expected word.
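
For anyone who would rather script this conversion than write a Grew rule, here is a minimal Python sketch of the retagging. It is only an illustration under stated assumptions: the function name `retag_truncated` is hypothetical, truncated forms are assumed to end in "~", and the existing UPOS on such a token is assumed to be the POS of the repair (strategy 1 above). It works on one CoNLL-U token line at a time, without the trailing newline.

```python
def retag_truncated(conllu_line: str, marker: str = "~") -> str:
    """Retag a truncated token as X and move its old UPOS into ExtPos.

    Assumptions (not from the UD guidelines): truncated forms end with
    `marker`, and the current UPOS is the POS of the intended word.
    """
    line = conllu_line.rstrip("\n")
    # Leave blank lines and sentence-level comments untouched.
    if not line.strip() or line.startswith("#"):
        return line
    cols = line.split("\t")
    # Skip malformed lines and multiword-token ranges such as "3-4".
    if len(cols) != 10 or "-" in cols[0]:
        return line
    form, upos, feats = cols[1], cols[3], cols[5]
    # Only forms that end with the marker (and are longer than it) qualify.
    if not form.endswith(marker) or len(form) <= len(marker):
        return line
    if upos not in ("X", "_"):
        # Drop any stale ExtPos, then record the intended word's POS.
        pairs = [] if feats == "_" else [
            f for f in feats.split("|") if not f.startswith("ExtPos=")
        ]
        pairs.append(f"ExtPos={upos}")
        # CoNLL-U expects features sorted alphabetically, ignoring case.
        cols[5] = "|".join(sorted(pairs, key=str.lower))
    cols[3] = "X"
    return "\t".join(cols)
```

Applied to a line where "a~" is currently tagged ADP, this yields UPOS X with ExtPos=ADP in the FEATS column; tokens already tagged X are left without an ExtPos, since the intended POS is unknown in that case.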
