dan-zeman commented on August 18, 2024

> I can make a change in the sa treebank pages using ISO 15924 for a start. Let me know if and where this needs to be discussed and documented (templatized) for future work.

I will raise this at a future meeting of the core group. Assuming there won't be objections, these are the next steps:

  • Document it next to other metadata in the Release checklist.
  • Add it to the validation infrastructure (the line will be required and must contain valid values, e.g. Script: Latn for Latin-based alphabets).
  • Announce it to the mailing list and make sure that all treebanks have the new line in their READMEs. Also add it to the template used when creating new repositories.
  • Make sure that the script that generates treebank hub pages (at release time) copies this information to the pages.
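A minimal sketch of the validation step from the second bullet, in Python. It assumes the README carries a plain "Script: Xxxx" line like the example above, that several codes may be listed separated by commas, and it checks them against a small hypothetical subset of ISO 15924 (KNOWN_SCRIPTS); a real check would load the complete ISO 15924 code list. The name validate_script_line is illustrative only and is not part of the existing UD validation tools.

```python
import re

# Hypothetical subset of ISO 15924 codes, for illustration only; a real
# validator would load the complete list from the ISO 15924 registry.
KNOWN_SCRIPTS = {"Arab", "Cyrl", "Deva", "Grek", "Hebr", "Latn"}

# Assumed README format: a single "Script:" line, codes separated by commas.
SCRIPT_LINE = re.compile(r"^Script:\s*(.*?)\s*$")
ISO_FORM = re.compile(r"[A-Z][a-z]{3}")  # four letters, first one uppercase


def validate_script_line(readme_text: str) -> list:
    """Return error messages for the Script line; an empty list means valid."""
    values = [m.group(1) for m in map(SCRIPT_LINE.match, readme_text.splitlines()) if m]
    if not values:
        return ["missing required 'Script:' line"]
    if len(values) > 1:
        return ["more than one 'Script:' line"]
    codes = [c.strip() for c in values[0].split(",") if c.strip()]
    if not codes:
        return ["'Script:' line has no value"]
    errors = []
    for code in codes:
        if not ISO_FORM.fullmatch(code):
            errors.append(f"'{code}' is not a four-letter ISO 15924 code (e.g. Latn)")
        elif code not in KNOWN_SCRIPTS:
            errors.append(f"'{code}' is not in the list of accepted script codes")
    return errors


if __name__ == "__main__":
    readme = "License: CC BY-SA 4.0\nScript: Latn\n"
    print(validate_script_line(readme))           # []
    print(validate_script_line("Script: latin"))  # reports that 'latin' is not a valid code
```

Returning a list of messages rather than raising on the first problem lets a validator report every issue in a README in one pass, which fits a release-time check over many treebanks.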

dan-zeman commented on August 18, 2024

ISO 15924 provides codes suitable for such a metadata item. There are probably finer distinctions that could be made about the spelling rules in the treebank, but those would be difficult to capture systematically, and ISO script codes would still be an improvement over no information at all (the current status).

dan-zeman commented on August 18, 2024

> I did a quick check of the treebank comparison pages (those linked from the home page); sa is the only case I found of treebanks of the same language using different scripts.

This is true at the moment as far as I know, but there are other languages that could use multiple writing systems, so it is definitely a property of the treebank rather than the language.

Abhishek-P commented on August 18, 2024

I can make a change in the sa treebank pages using ISO 15924 for a start. Let me know if and where this needs to be discussed and documented (templatized) for future work.

Abhishek-P commented on August 18, 2024

I did a quick check of the treebank comparison pages (those linked from the home page); sa is the only case I found of treebanks of the same language using different scripts.

amir-zeldes commented on August 18, 2024

This is true at the moment as far as I know, but there are other languages that could use multiple writing systems, so it is definitely a property of the treebank rather than the language.

Another candidate is UD_Egyptian, which uses Schenkel transcription rather than hieroglyphs or Gardiner codes, either of which would be conceivable for Egyptian.

robvanderg commented on August 18, 2024

If this is to be automated, it can be tricky to find code for it (searching for "script" is ambiguous, since the word also refers to program code). Here are some existing solutions (the last one is mine; it is optimized for speed, not RAM):

https://github.com/cisnlp/GlotScript
https://robvanderg.github.io/scripts/scripts/
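For a rough idea of what such automation involves, one heuristic that needs only the Python standard library is to look up the Unicode character name of each letter and map its first word to an ISO 15924 code. This is a sketch under stated assumptions, not how GlotScript or the linked scripts tool work: the mapping table NAME_PREFIX_TO_ISO and the function guess_script are hypothetical, only a handful of scripts are covered, and character names are only a proxy for the Unicode Script property (a CJK ideograph's name starts with "CJK", for example, so it is simply ignored here). A production tool would use the Script property from the Unicode Character Database instead.

```python
import unicodedata
from collections import Counter
from typing import Optional

# Hypothetical mapping from the first word of a Unicode character name to an
# ISO 15924 code; only a handful of scripts are covered here.
NAME_PREFIX_TO_ISO = {
    "LATIN": "Latn",
    "CYRILLIC": "Cyrl",
    "GREEK": "Grek",
    "DEVANAGARI": "Deva",
    "ARABIC": "Arab",
    "HEBREW": "Hebr",
}


def guess_script(text: str) -> Optional[str]:
    """Return the ISO 15924 code covering most letters in text, or None."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, spaces and combining marks
        prefix = unicodedata.name(ch, "").split(" ")[0]
        code = NAME_PREFIX_TO_ISO.get(prefix)
        if code is not None:
            counts[code] += 1
    if not counts:
        return None  # no letter from a covered script
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    print(guess_script("नमस्ते"))  # Deva
    print(guess_script("rāmaḥ"))   # Latn (IAST-style transliteration)
```

Returning only the majority code suits a single metadata value; for a treebank that genuinely mixes scripts, one could instead report every code whose share of letters exceeds some threshold.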
