Alksnis-v.3.0 (The Lithuanian dependency treebank)

Summary

The Lithuanian dependency treebank ALKSNIS v3.0 (Vytautas Magnus University). Version 3.0 was developed from v2.1 during the project "Semantika2" (Nr. 02.3.1-CPVA-V-527-01-0002).

Introduction

This is a corrected and enhanced version of the ALKSNIS Lithuanian treebank. It is annotated in a style derived from the Prague Dependency Treebank of Czech. The previous version, ALKSNIS v2.1, consists of 2,355 syntactically annotated sentences. Each node of a tree corresponds to a word, a punctuation mark, or another text element (symbol, digit, etc.) within a sentence. ALKSNIS v2.1 is published in the CLARIN-LT repository at http://hdl.handle.net/20.500.11821/10. (Some users experience DNS errors when trying to access the repository; configuring the client machine to use 8.8.8.8 as the DNS server may help. See also http://clarin-lt.lt/?page_id=86.) ALKSNIS v2.1 uses a version of the MULTEXT-East tag set (http://nl.ijs.si/ME/V4/msd/html/index.html). The following information is given for each node: 1) the word form as used in the text; 2) a lemma; 3) a morphological tag; and 4) a syntactic function (subject, object, etc.). Dependencies are represented by links between words. ALKSNIS v3.0 was developed from v2.1 during the Vytautas Magnus University project “Semantika2” (Nr. 02.3.1-CPVA-V-527-01-0002). It consists of 3,643 syntactically annotated sentences.
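
The per-node information described above can be pictured as a small record type. What follows is only a minimal sketch, not the treebank's actual schema: the class name `Node`, its field names, the helper `children`, and the example sentence with its tag and function labels are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One tree node: a word, punctuation mark, or other text element."""
    index: int            # 1-based position in the sentence
    form: str             # word form as it appears in the text
    lemma: str            # dictionary form
    tag: str              # morphological tag (placeholder values below)
    function: str         # syntactic function (placeholder labels below)
    head: Optional[int]   # index of the governing node, None for the root


def children(nodes: List[Node], head_index: Optional[int]) -> List[Node]:
    """Return the nodes that depend directly on the node with head_index."""
    return [n for n in nodes if n.head == head_index]


# Hypothetical sentence; tags and function labels are illustrative placeholders,
# not values taken from the treebank.
sentence = [
    Node(1, "Vaikas", "vaikas", "noun", "subject", 2),
    Node(2, "skaito", "skaityti", "verb", "predicate", None),
    Node(3, "knygą", "knyga", "noun", "object", 2),
    Node(4, ".", ".", "punct", "punctuation", 2),
]

root = children(sentence, None)[0]
dependents = [n.form for n in children(sentence, root.index)]
print(root.form, dependents)
```

Storing only the index of the governing node is enough to recover the whole tree, since every node other than the root has exactly one head.
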
Modifications from v2.1 to 3.0 (2019-07-08)

  • The older version underwent a full review of its syntactic information, based on improved guidelines, to enhance annotation quality.
  • New layer added: non-compositional multiword expressions (light verbs and idioms).
  • Added new data: scientific abstracts and reviews, additional administrative texts.
  • Schema version changed to 3.0.
  • The human-friendly Jablonskis tagset is used instead of the MULTEXT-East tagset.
  • Some syntactic relations were corrected or modified (details to be published in the improved guidelines).
  • CoNLL-U files are provided together with the PML files (the RMQ CoNLL-U files do not keep the mwe field).
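
For readers who want to work with the CoNLL-U files, the sketch below shows one way to read the standard ten-column CoNLL-U layout using only the Python standard library. It is an illustration under assumptions, not a tool shipped with the treebank: the function name `read_conllu` and the two-token sample are hypothetical, and multiword-token ranges and empty nodes are kept as ordinary rows rather than handled specially.

```python
from typing import Dict, Iterable, Iterator, List

# Standard CoNLL-U column names, in order.
COLUMNS = ["id", "form", "lemma", "upos", "xpos", "feats",
           "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of token dicts keyed by column name."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():          # blank line closes a sentence
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):    # sentence-level comment, skipped here
            continue
        else:
            fields = line.split("\t")
            if len(fields) != len(COLUMNS):
                raise ValueError(f"expected 10 columns, got {len(fields)}")
            sentence.append(dict(zip(COLUMNS, fields)))
    if sentence:                      # input may lack a trailing blank line
        yield sentence


# Hypothetical sample, not taken from the treebank; "_" marks unspecified columns.
sample = (
    "# text = Lyja.\n"
    "1\tLyja\tlyti\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(sample.splitlines()))
print(len(sentences), [tok["form"] for tok in sentences[0]])
```

To read a real file, pass an open text handle (opened with `encoding="utf-8"`) to `read_conllu` instead of the in-memory sample.
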

Content:

  • ALKSNIS-3.0.ZIP - The Lithuanian dependency treebank files.
  • Jablonskis-LT.pdf - Morphological annotation standard used in ALKSNIS.
  • ALksnio-3.0_sandara.docx - the structure of ALKSNIS v.3.0 files

Acknowledgments

ALKSNIS v3.0 was developed from v2.1 during the project "Semantika2" (Nr. 02.3.1-CPVA-V-527-01-0002). The project was funded by the European Structural Funds.

References

For ALKSNIS v.2.1:

  • Agnė Bielinskienė, Loïc Boizou, Jolanta Kovalevskaitė, Erika Rimkutė (2016): Lithuanian Dependency Treebank ALKSNIS. In: I. Skadiņa and R. Rozis (Eds.): Human Language Technologies – The Baltic Perspective, pp. 107–114. Amsterdam: IOS Press. doi:10.3233/978-1-61499-701-6-107. http://fcim.vdu.lt/~erika_rimkute/straipsniai/Alksnis_HLT.pdf, http://ebooks.iospress.nl/volumearticle/45523

For v.3.0 (2019-10-07):

  • License: CC BY-SA 4.0;
  • Includes text: yes;
  • Genre: news nonfiction legal scientific;
  • Lemmas: manual native;
  • UPOS: converted from manual;
  • XPOS: manual native;
  • Features: converted from manual;
  • Relations: converted from manual;
  • Contributors: Utka, Andrius; Rimkutė, Erika; Bielinskienė, Agnė; Kovalevskaitė, Jolanta; Boizou, Loïc; Aleksandravičiūtė, Gabrielė; Brokaitė, Kristina;
  • Contact: [email protected], [email protected].
