Comments (8)
If there is general agreement I would be open to adding a sentence along the lines of "If the word is deemed a 'real' word of the language, then another tag should be used, even if that word's morphosyntactic behavior is unusual."
For spoken corpora, X is used for unfinished words (scraps? false starts? I am not sure what they are called in English). But we are inconsistent: most of the time we can figure out what the complete word would be, and we use the POS of the repair. We hesitated between two strategies:
- using the POS of the corrected word (the repair) when we can figure it out.
- using X every time and putting the POS of the corrected word in ExtPos when we can figure it out.
I think I prefer Solution 2 because, even if "a~" is repaired by "after" and I know that "a~" was used here as the start of an ADP, I don't want to have "a~" among the ADPs of my corpus.
In our corpora of spoken French the annotation is inconsistent, and we should make a clear decision.
See https://universal.grew.fr/?custom=657ad60d136b4. We use "~" to indicate unfinished words, because "-" is used in orthographic words. It would be easy to change the annotation with a Grew rule as soon as we have decided what to do.
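For illustration only, here is a minimal Python sketch of what the second strategy would do to a single CoNLL-U token line. It is not the Grew rule itself. The function name `retag_truncated` and the sample token (its ID, head, and relation) are hypothetical; the sketch assumes truncated forms end in "~" and that the token currently carries the POS of the repair.

```python
def retag_truncated(line, marker="~"):
    """Retag one CoNLL-U token line (hypothetical helper).

    If the FORM ends with the truncation marker, set UPOS to X and
    record the previous UPOS as ExtPos in FEATS. Other lines are
    returned unchanged.
    """
    cols = line.split("\t")
    # Leave comments, blank lines, and malformed lines untouched.
    if line.startswith("#") or len(cols) != 10:
        return line
    form, upos, feats = cols[1], cols[3], cols[5]
    # Skip non-truncated forms, a bare marker, multiword-token ranges
    # (UPOS "_"), and tokens already tagged X (repair POS unknown).
    if not form.endswith(marker) or form == marker or upos in ("_", "X"):
        return line
    feat_list = [] if feats == "_" else feats.split("|")
    if not any(f.startswith("ExtPos=") for f in feat_list):
        feat_list.append("ExtPos=" + upos)
    # Keep features in case-insensitive alphabetical order.
    feat_list.sort(key=str.lower)
    cols[3] = "X"
    cols[5] = "|".join(feat_list)
    return "\t".join(cols)


# Hypothetical token: "a~" currently tagged ADP because it is repaired by "after".
before = "3\ta~\ta~\tADP\t_\t_\t5\treparandum\t_\t_"
after = retag_truncated(before)
```

With this, "a~" no longer shows up among the ADPs of the corpus, but the intended POS can still be queried through ExtPos.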
I would like the definition to stress more that this POS (non-)tag should really be a last resort and that it is actually a non-lexical one, similar to `dep`.
@sylvainkahane For words truncated/unfinished due to a dysfluency, my gut feeling is that X would make sense, falling under the word fragment subcase. There are also uses of `reparandum` where a word is repeated, and there I would expect the regular tag to apply on both tokens.
@Stormur "It should be used very restrictively." seems to say that...are you seeing places where it is overused?
Maybe I am nitpicking, but it seems to leave room for people to define their own restrictions, which, as we know, might be arbitrarily broad, instead of specifying that it is really the last thing you should do when there is no other possibility.
Thanks @Stormur: the group agreed to emphasize that it should be used narrowly. Updated https://universaldependencies.org/u/pos/X.html
And @sylvainkahane it now mentions truncated words. I think I agree with you about ExtPos being the right place for the intended word POS if it can be determined.
Thanks @nschneid. I will adopt the POS X for all truncated words in our spoken corpora and add an ExtPos feature with the POS of the expected word.