ud_italian-isdt's Issues

Fine Grain POS tags documentation

Hi! @dan-zeman @msimi @fginter

I am using spaCy's Italian model for research, and according to spaCy's documentation it was trained on this dataset.

Could you please provide some clarification (or documentation) on the meaning of the following fine-grained POS tags?

A, AP, B, BN, B_PC, CC, CS, DD, DE, DI, DQ, DR, E, E_RD, FB, FC, FF, FS, I, N, NO, PART, PC, PC_PC, PD, PE, PI, PP, PQ, PR, RD, RI, S, SP, SW, SYM, Sw, T, V, VA, VA_PC, VM, VM_PC, VM_PC_PC, V_B, V_PC, V_PC_PC, X

Inspecting the dataset, I can see that they are used to tag words, but I cannot work out their meaning.

I tried to use spaCy's explain() function, but unfortunately it only works with the English and German models.

It would really help me to have something like the following explanation of the Universal POS tags (which are coarse-grained).
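As a stopgap while documentation is missing, one way to narrow down what each fine-grained tag means is to tabulate which UPOS tags it co-occurs with. The following is a minimal sketch, assuming the treebank is read as tab-separated CoNLL-U text; the function name `xpos_by_upos` and the sample rows (adapted from a sentence in this treebank) are only illustrative.

```python
from collections import Counter, defaultdict


def xpos_by_upos(conllu_text):
    """Map each XPOS tag to a Counter of the UPOS tags it appears with."""
    table = defaultdict(Counter)
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank separator or comment line
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        tok_id, upos, xpos = cols[0], cols[3], cols[4]
        # Skip multiword-token ranges (e.g. "5-6") and empty nodes (e.g. "8.1").
        if "-" in tok_id or "." in tok_id:
            continue
        table[xpos][upos] += 1
    return table


# Illustrative rows only; in practice, pass the contents of a .conllu file.
sample_rows = [
    ["1", "I", "il", "DET", "RD", "Definite=Def|Gender=Masc|Number=Plur|PronType=Art", "2", "det", "2:det", "_"],
    ["2", "soldati", "soldato", "NOUN", "S", "Gender=Masc|Number=Plur", "4", "nsubj", "4:nsubj", "_"],
    ["5-6", "nel", "_", "_", "_", "_", "_", "_", "_", "_"],
    ["5", "in", "in", "ADP", "E", "_", "7", "case", "7:case", "_"],
    ["6", "il", "il", "DET", "RD", "Definite=Def|Gender=Masc|Number=Sing|PronType=Art", "7", "det", "7:det", "_"],
]
sample = "\n".join("\t".join(row) for row in sample_rows)

table = xpos_by_upos(sample)
for xpos in sorted(table):
    print(xpos, dict(table[xpos]))
```

This does not explain the tags, but seeing that a tag such as RD only ever appears with DET already constrains what it can mean.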

ADJ: adjective
ADP: adposition
ADV: adverb
AUX: auxiliary
CCONJ: coordinating conjunction
DET: determiner
INTJ: interjection
NOUN: noun
NUM: numeral
PART: particle
PRON: pronoun
PROPN: proper noun
PUNCT: punctuation
SCONJ: subordinating conjunction
SYM: symbol
VERB: verb
X: other

Thank you very much in advance!

Stray xpos tag: "Sw"

sent_id = 2Parole_4-180
word 19

The tag here is Sw instead of SW:

17      film    film    NOUN    S       Gender=Masc     14      nmod    14:nmod:di      _
18      Les     Les     PROPN   SW      Foreign=Yes     17      nmod    17:nmod _
19      invasions       invasions       PROPN   Sw      _       18      flat:name       18:flat:name    _
20      barbares        barbares        PROPN   SW      Foreign=Yes     18      flat:name       18:flat:name    SpaceAfter=No
21      .       .       PUNCT   FS      _       5       punct   5:punct _
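Stray tags of this kind can be searched for mechanically. The sketch below groups XPOS tags that differ only in letter case and reports the groups with more than one spelling. It assumes tab-separated CoNLL-U input; the function name `case_variant_xpos` is hypothetical, and the sample rows mirror the excerpt above.

```python
from collections import Counter, defaultdict


def case_variant_xpos(conllu_text):
    """Return {UPPERCASED_TAG: {spelling: count}} for XPOS tags that
    occur with more than one capitalisation."""
    counts = Counter()
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Keep only ordinary token lines: 10 columns and an integer ID.
        if len(cols) == 10 and cols[0].isdigit():
            counts[cols[4]] += 1

    groups = defaultdict(list)
    for tag in counts:
        groups[tag.upper()].append(tag)

    return {
        upper: {tag: counts[tag] for tag in tags}
        for upper, tags in groups.items()
        if len(tags) > 1
    }


sample_rows = [
    ["17", "film", "film", "NOUN", "S", "Gender=Masc", "14", "nmod", "14:nmod:di", "_"],
    ["18", "Les", "Les", "PROPN", "SW", "Foreign=Yes", "17", "nmod", "17:nmod", "_"],
    ["19", "invasions", "invasions", "PROPN", "Sw", "_", "18", "flat:name", "18:flat:name", "_"],
    ["20", "barbares", "barbares", "PROPN", "SW", "Foreign=Yes", "18", "flat:name", "18:flat:name", "SpaceAfter=No"],
    ["21", ".", ".", "PUNCT", "FS", "_", "5", "punct", "5:punct", "_"],
]
sample = "\n".join("\t".join(row) for row in sample_rows)

print(case_variant_xpos(sample))
```

The rarer spelling in each reported group is the likely typo, although that should be confirmed by hand before changing the data.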

Sentence where the period should be split

I believe the period at the end of the following sentence from the training set should be split off into a separate token. I can make that change quite easily if you agree it is correct.

# sent_id = 2Parole_2-176
# text = I soldati sono entrati nel teatro e hanno ucciso tutti i ceceni con un gas velenoso.
1       I       il      DET     RD      Definite=Def|Gender=Masc|Number=Plur|PronType=Art       2       det     2:det   _
2       soldati soldato NOUN    S       Gender=Masc|Number=Plur 4       nsubj   4:nsubj|10:nsubj        _
3       sono    essere  AUX     VA      Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin   4       aux     4:aux   _
4       entrati entrare VERB    V       Gender=Masc|Number=Plur|Tense=Past|VerbForm=Part        0       root    0:root  _
5-6     nel     _       _       _       _       _       _       _       _
5       in      in      ADP     E       _       7       case    7:case  _
6       il      il      DET     RD      Definite=Def|Gender=Masc|Number=Sing|PronType=Art       7       det     7:det   _
7       teatro  teatro  NOUN    S       Gender=Masc|Number=Sing 4       obl     4:obl:in        _
8       e       e       CCONJ   CC      _       10      cc      10:cc   _
9       hanno   avere   AUX     VA      Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin   10      aux     10:aux  _
10      ucciso  uccidere        VERB    V       Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part        4       conj    4:conj:e        _
11      tutti   tutto   DET     T       Gender=Masc|Number=Plur|PronType=Tot    13      det:predet      13:det:predet   _
12      i       il      DET     RD      Definite=Def|Gender=Masc|Number=Plur|PronType=Art       13      det     13:det  _
13      ceceni  ceceni  NOUN    S       Gender=Masc|Number=Plur 10      obj     10:obj  _
14      con     con     ADP     E       _       16      case    16:case _
15      un      uno     DET     RI      Definite=Ind|Gender=Masc|Number=Sing|PronType=Art       16      det     16:det  _
16      gas     gas     NOUN    S       Gender=Masc     10      obl     10:obl:con      _
17      velenoso.       velenoso.       ADJ     A       Gender=Masc|Number=Sing 16      amod    16:amod _
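The proposed fix could look roughly like the sketch below. It is not the maintainers' procedure: the helper name `split_final_period` is hypothetical, and the choices for the new token (UPOS PUNCT, XPOS FS, attached to the sentence root as punct, with SpaceAfter=No added to the preceding word) are assumptions modelled on how final periods are annotated elsewhere in this treebank. Abbreviations that legitimately end in a period would need manual review.

```python
def split_final_period(rows):
    """rows: the token lines of one sentence, each a list of the 10
    CoNLL-U columns. Returns a new list in which a period glued to the
    last word has been split off into its own token."""
    last = rows[-1]
    form = last[1]
    # Nothing to do if the sentence already ends in punctuation, the last
    # form does not end in ".", or the ID is not a plain integer.
    if (last[3] == "PUNCT" or len(form) < 2 or not form.endswith(".")
            or not last[0].isdigit()):
        return rows

    root_id = next((r[0] for r in rows if r[6] == "0"), None)
    if root_id is None:
        return rows  # malformed sentence: leave it for manual inspection

    fixed = list(last)
    fixed[1] = form[:-1]
    if len(fixed[2]) > 1 and fixed[2].endswith("."):
        fixed[2] = fixed[2][:-1]  # strip the period from the lemma too
    # Assumption: no space separated the word from the period.
    fixed[9] = "SpaceAfter=No" if fixed[9] == "_" else fixed[9] + "|SpaceAfter=No"

    period = [str(int(last[0]) + 1), ".", ".", "PUNCT", "FS", "_",
              root_id, "punct", root_id + ":punct", "_"]
    return rows[:-1] + [fixed, period]


sentence = [
    ["4", "entrati", "entrare", "VERB", "V", "Gender=Masc|Number=Plur|Tense=Past|VerbForm=Part", "0", "root", "0:root", "_"],
    ["16", "gas", "gas", "NOUN", "S", "Gender=Masc", "10", "obl", "10:obl:con", "_"],
    ["17", "velenoso.", "velenoso.", "ADJ", "A", "Gender=Masc|Number=Sing", "16", "amod", "16:amod", "_"],
]

for row in split_final_period(sentence):
    print("\t".join(row))
```

Because the period is the last token, no existing IDs or heads need renumbering; only the final word changes and one row is appended.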

xpos and upos meaning

Hi,

I would like to know where I can find the meaning of each acronym in the tagsets used for the upos and xpos fields.

Thank you
