albanian-pos's Introduction

Albanian-POS

This is a treebank for Standard Albanian. The trained models will be uploaded as soon as we have finished some improvements.

Acknowledgments

.......

References

Morphological Tagging and Lemmatization of Albanian: A Manually Annotated Corpus and Neural Models. Nelda Kote, Marenglen Biba, Jenna Kanerva, Samuel Rönnqvist, Filip Ginter.

Cite as: arXiv:1912.00991 [cs.CL] (or arXiv:1912.00991v1 [cs.CL] for this version)

albanian-pos's People

Contributors

neldakote

albanian-pos's Issues

Typos in POS and morphological tagging

The POS tag distribution is:

NOUN 25612
DET 354
ADJ 7635
PRON 31600
VERB 37705
ADV 3647
ADP 665
AUX 13091
PROPN 4427
NUM 4006
PART 2
ACC 1
AADV 1
NUON 1
NOM 1
PROPM 1

I suspect the rare ones are typos: ACC, AADV, NUON, NOM and PROPM are not Universal Dependencies UPOS tags.
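Counts like these can be computed with a short script. The following is a minimal sketch, assuming the files use the standard 10-column CoNLL-U layout with UPOS in the 4th column; the function names `upos_counts` and `suspicious_upos` are hypothetical and not part of this repository. It also flags any tag that falls outside the 17-tag Universal Dependencies UPOS set.

```python
from collections import Counter

# The 17 Universal Dependencies v2 UPOS tags.
UD_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def upos_counts(lines):
    """Count the UPOS column over the word lines of a CoNLL-U file (assumed layout)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        counts[cols[3]] += 1
    return counts


def suspicious_upos(counts):
    """Return the tags (with counts) that are not in the UD UPOS set."""
    return {tag: n for tag, n in counts.items() if tag not in UD_UPOS}
```

For example, `upos_counts(open("train.conllu", encoding="utf-8"))` (hypothetical file name) returns a `Counter`, so `.most_common()` lists the tags from most to least frequent, and `suspicious_upos` on the result would isolate entries such as `NUON`.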

I also compiled the distribution of feature-value pairs for each POS tag, and there seem to be many typos:

POS: NOUN
  tag: Gender, value distribution: {'Fem': 6011, 'Masc': 6167, 'Mas': 1, 'Plur': 1, 'Sing': 1, 'Neu': 1, 'Nom': 1}
  tag: Number, value distribution: {'Sing': 8184, 'Plur': 4040, 'plur': 1, 'SIng': 3}
  tag: Definite, value distribution: {'Def': 8492, 'Ind': 3752, 'Nom': 1, 'Acc': 1}
  tag: Case, value distribution: {'Acc': 5183, 'Nom': 4631, 'Abl': 627, 'Dat': 637, 'Gen': 2302, 'Det': 1, 'ACC': 3}
  tag: VerbForm, value distribution: {'Part': 1}
  tag: Foreign, value distribution: {'Yes': 24}
  tag: Foreing, value distribution: {'Yes': 4}
  tag: Degre, value distribution: {'Pos': 1}
  tag: Numer, value distribution: {'Plur': 1}
  tag: PronType, value distribution: {'Dem': 1}
POS: DET
  tag: Gender, value distribution: {'Masc': 6, 'Fem': 7}
  tag: Number, value distribution: {'Sing': 8, 'Plur': 4}
  tag: Case, value distribution: {'Abl': 5, 'Acc': 6, 'Nom': 4, 'Dat': 1, 'Gen': 5}
  tag: PronType, value distribution: {'Art': 11, 'Prs': 1}
  tag: Person, value distribution: {'3': 1}
  tag: NumType, value distribution: {'Card': 1}
POS: ADJ
  tag: Degre, value distribution: {'Pos': 2797, 'Cmp': 33, 'Sup': 76, 'SUP': 1}
  tag: Degree, value distribution: {'Pos': 125, 'Po': 1}
  tag: Gender, value distribution: {'Masc': 4}
  tag: Number, value distribution: {'Plur': 2, 'Sing': 2}
  tag: Case, value distribution: {'Acc': 4}
  tag: Definite, value distribution: {'Ind': 4}
  tag: Foreign, value distribution: {'Yes': 1}
POS: PRON
  tag: PronType, value distribution: {'Prs': 125, 'Dem': 56, 'Int': 24, 'Ind': 102, 'Rel': 25, 'PRS': 1, 'Prd': 1}
  tag: Person, value distribution: {'3': 41, '2': 13, '1': 21}
  tag: Case, value distribution: {'Acc': 81, 'Nom': 66, 'Abl': 37, 'Dat': 36, 'Det': 1, 'Gen': 16}
  tag: Number, value distribution: {'Sing': 98, 'Plur': 72}
  tag: Gender, value distribution: {'Fem': 82, 'Masc': 72}
  tag: Poss, value distribution: {'Yes': 45}
  tag: Reflex, value distribution: {'Yes': 9}
  tag: Relfex, value distribution: {'Yes': 15}
  tag: Definite, value distribution: {'Def': 2, 'Ind': 1}
  tag: PronTYpe, value distribution: {'Ind': 1}
  tag: Prontype, value distribution: {'Rel': 2}
  tag: CCase, value distribution: {'Acc': 1}
POS: VERB
  tag: Mood, value distribution: {'Ind': 6072, 'Imp': 468, 'Cnd': 179, 'Sub': 1886, 'Opt': 1216, 'Adm': 2429, 'Int': 1, 'Def': 1, 'Indi*': 1, 'Nom': 1, 'Des': 3, 'Ipm': 1}
  tag: Tense, value distribution: {'Fut': 112, 'Pres': 5317, 'Imp': 2719, 'Past': 2871, 'Pqp': 2, 'Prs': 1, 'Impt': 1}
  tag: Person, value distribution: {'2': 2999, '1': 2808, '3': 5266}
  tag: Number, value distribution: {'Sing': 5840, 'Plur': 5037, 'Sign': 2}
  tag: Polarity, value distribution: {'Neg': 2}
  tag: VerbForm, value distribution: {'Part': 615, 'PART': 1, 'Inf': 1, 'Past': 1}
  tag: MMood, value distribution: {'Ind': 1}
  tag: Case, value distribution: {'Nom': 1, 'Acc': 1}
  tag: Definite, value distribution: {'Def': 2}
  tag: Gender, value distribution: {'Fem': 1, 'Masc': 1}
  tag: MVerbForm, value distribution: {'Part': 1}
POS: ADV
  tag: AdvType, value distribution: {'Loc': 73, 'Man': 302, 'Deg': 79, 'Tim': 114, 'Cau': 13}
  tag: Abbr, value distribution: {'Yes': 3}
  tag: Degree, value distribution: {'Pos': 1}
POS: ADP
  tag: Case, value distribution: {'Acc': 20, 'Abl': 26, 'Nom': 10, 'Dat': 1, 'Gen': 1}
  tag: AdvType, value distribution: {'Loc': 4}
  tag: Gender, value distribution: {'Masc': 1}
  tag: Number, value distribution: {'Sing': 2}
  tag: Definite, value distribution: {'Ind': 1}
  tag: PronType, value distribution: {'Prs': 2}
  tag: Person, value distribution: {'1': 1}
POS: AUX
  tag: Tense, value distribution: {'Pres': 50, 'Imp': 28, 'Past': 12}
  tag: Person, value distribution: {'2': 25, '3': 37, '1': 26}
  tag: Number, value distribution: {'Sing': 48, 'Plur': 39}
  tag: Mood, value distribution: {'Ind': 42, 'Adm': 25, 'Sub': 12, 'Opt': 12}
  tag: VerbForm, value distribution: {'Part': 2}
  tag: N0umber, value distribution: {'Sing': 1}
POS: PROPN
  tag: Definite, value distribution: {'Def': 2410, 'Ind': 579, 'Acc': 1, 'DEF': 1}
  tag: Gender, value distribution: {'Masc': 1810, 'Fem': 1141, 'Mac': 3}
  tag: Number, value distribution: {'Sing': 2714, 'Plur': 217, 'Sin': 1}
  tag: Case, value distribution: {'Nom': 1347, 'Dat': 91, 'Acc': 776, 'Gen': 810, 'Abl': 55, 'Det': 1, 'GenDat': 1, 'Geb': 1}
  tag: Degree, value distribution: {'Pos': 10}
  tag: Foreign, value distribution: {'Yes': 178}
  tag: Degre, value distribution: {'Pos': 66}
  tag: Foreugn, value distribution: {'Yes': 3}
  tag: NumType, value distribution: {'Card': 2, 'Ord': 1}
  tag: Foreing, value distribution: {'Yes': 6}
  tag: Defenite, value distribution: {'Def': 1}
  tag: Deegre, value distribution: {'Pos': 1}
  tag: Foreig, value distribution: {'Yes': 2}
POS: NUM
  tag: NumType, value distribution: {'Ord': 53, 'Card': 596}
  tag: Gender, value distribution: {'Fem': 15, 'Masc': 9}
POS: PART
  tag: PronType, value distribution: {'Prs': 1}
  tag: Person, value distribution: {'3': 1}
  tag: Number, value distribution: {'Sing': 2}
  tag: Gender, value distribution: {'Fem': 2}
  tag: Case, value distribution: {'Acc': 1, 'Nom': 1}
  tag: Definite, value distribution: {'Ind': 1}
POS: ACC
  tag: Case, value distribution: {'Acc': 1}
  tag: Gender, value distribution: {'Masc': 1}
  tag: Number, value distribution: {'Sing': 1}
  tag: Definite, value distribution: {'Ind': 1}
POS: AADV
  tag: AdvType, value distribution: {'Tim': 1}
POS: NUON
  tag: Gender, value distribution: {'Fem': 1}
  tag: Case, value distribution: {'Nom': 1}
  tag: Definite, value distribution: {'Def': 1}
  tag: Number, value distribution: {'Sing': 1}
POS: NOM
  tag: Definite, value distribution: {'Ind': 1}
  tag: Number, value distribution: {'Plur': 1}
  tag: Gender, value distribution: {'Masc': 1}
  tag: Case, value distribution: {'Acc': 1}
POS: PROPM
  tag: Gender, value distribution: {'Masc': 1}
  tag: Definite, value distribution: {'Def': 1}
  tag: Number, value distribution: {'Sing': 1}
  tag: Case, value distribution: {'Nom': 1}
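A per-POS breakdown like the one above could be produced along these lines. This is a minimal sketch, assuming the standard CoNLL-U layout (UPOS in the 4th column, FEATS in the 6th as `Name=Value|Name=Value` or `_`); `feature_distribution` and `rare_pairs` are hypothetical helper names, and the rarity threshold is an arbitrary assumption.

```python
from collections import Counter, defaultdict


def feature_distribution(lines):
    """Map UPOS -> feature name -> Counter of values, read from the FEATS column."""
    dist = defaultdict(lambda: defaultdict(Counter))
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comments
        cols = line.split("\t")
        if len(cols) != 10 or "-" in cols[0] or "." in cols[0]:
            continue  # skip malformed lines, multiword ranges, empty nodes
        upos, feats = cols[3], cols[5]
        if feats == "_":
            continue  # token has no features
        for pair in feats.split("|"):
            # A pair without '=' yields an empty value, which keeps it visible.
            name, _, value = pair.partition("=")
            dist[upos][name][value] += 1
    return dist


def rare_pairs(dist, max_count=3):
    """List (UPOS, feature, value, count) tuples seen at most max_count times."""
    out = []
    for upos, feats in dist.items():
        for name, values in feats.items():
            for value, n in values.items():
                if n <= max_count:
                    out.append((upos, name, value, n))
    return sorted(out)
```

A low count is only a hint: a rare but legitimate value is flagged too, while a misspelled feature name used consistently (such as `Degre` on 2,797 ADJ tokens above) is not, so the feature names themselves still need a manual check.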

How to interpret the short sentences in train-short-new.conllu

Thank you for this corpus.

How should I interpret the train file with extremely short sentences? What is the source of this file and why aren't longer sentences included in the train file?
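One way to characterize such a file is to look at its sentence-length distribution. The following is a minimal sketch, assuming standard CoNLL-U (a blank line after each sentence, `#` comment lines); `sentence_lengths` and `summarize` are hypothetical helper names. Multiword-token ranges (e.g. `1-2`) and empty nodes (e.g. `1.1`) are skipped so that only syntactic words are counted.

```python
from statistics import mean, median


def sentence_lengths(lines):
    """Return the number of syntactic words in each sentence of a CoNLL-U file."""
    lengths, current = [], 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            if current:
                lengths.append(current)  # blank line closes a sentence
            current = 0
            continue
        if line.startswith("#"):
            continue  # comment line, e.g. "# text = ..."
        token_id = line.split("\t", 1)[0]
        if "-" in token_id or "." in token_id:
            continue  # multiword-token range or empty node
        current += 1
    if current:  # the file may not end with a blank line
        lengths.append(current)
    return lengths


def summarize(lengths):
    """Basic statistics over sentence lengths."""
    if not lengths:
        return {"sentences": 0}
    return {
        "sentences": len(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "max": max(lengths),
    }
```

Comparing `summarize(sentence_lengths(f))` for this file and for the other training file would show how much shorter its sentences are.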

How to use this?

Could you add a usage example to the README? This is the first time I have heard of CoNLL-U.
Also, since the model is publicly available and described in a paper, would you publish the sources used to gather the corpus?
Cheers!
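For readers new to the format: CoNLL-U stores one token per line in 10 tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), with `#` comment lines and a blank line after each sentence. Below is a minimal reader sketch under those assumptions; `read_conllu` is a hypothetical name, and the third-party `conllu` package on PyPI offers a more complete parser.

```python
# Lower-case names for the 10 CoNLL-U columns, in file order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def read_conllu(lines):
    """Yield each sentence as a list of token dicts keyed by column name."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            if sentence:
                yield sentence  # blank line ends the current sentence
            sentence = []
        elif not line.startswith("#"):
            sentence.append(dict(zip(FIELDS, line.split("\t"))))
    if sentence:  # last sentence when the file lacks a trailing blank line
        yield sentence
```

For example, iterating over `read_conllu(open("train.conllu", encoding="utf-8"))` (hypothetical file name) and printing `" ".join(t["form"] for t in sent)` shows each sentence, while `t["lemma"]`, `t["upos"]` and `t["feats"]` give the lemma, POS tag and morphological features of each token.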
