jasima's Introduction

ARCHIVED

nimi Linku (the Google sheet) and jasima Linku (this repo) have been archived, replaced by sona Linku (the dataset) and nimi Linku (the localization platform Crowdin). If you are looking to contribute, please see those projects or join kulupu Linku (the Discord server).


data.json schema

Structure

  • All fields are stored as strings, even if an integer or float representation would be appropriate.
  • All immediate children of the four top-level keys (documented below) are sorted alphabetically.
  • If a key exists but has no value in nimi Linku, it will not be in jasima Linku.
  • If a field's name contains a slash /, the word(s) before the slash is a parent key, and the word(s) after the slash is a child key. For example, the Words sheet or data parent key has def/[language_code] for all word definitions.
  • A parent key cannot have the same name as a normal key; the updater script would fail, as it would attempt to insert into a string as though it were a dictionary.
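
The slash convention can be illustrated with a short Python sketch. The function name `insert_field` and the sample values are hypothetical and are not taken from the actual updater script; the sketch only shows how a `parent/child` field name could map to nested dictionaries, and why a parent key that collides with a normal key fails.

```python
def insert_field(entry: dict, field_name: str, value: str) -> None:
    """Insert one sheet field into an entry, expanding `parent/child` names."""
    if not value:
        # A key with no value in the sheet is left out entirely.
        return
    *parents, leaf = field_name.split("/")
    node = entry
    for parent in parents:
        node = node.setdefault(parent, {})
        if not isinstance(node, dict):
            # The parent name is already used by a normal (string) key.
            raise TypeError(f"{parent!r} already holds a string, not a dictionary")
    node[leaf] = value


entry = {}
insert_field(entry, "word", "akesi")
insert_field(entry, "def/en", "illustrative definition")  # made-up value
insert_field(entry, "def/fr", "")  # empty in the sheet, so it is skipped
print(entry)
```

Under these assumptions, `entry` ends up with a string under `word` and a nested dictionary under `def`, while a field named `word/anything` would raise a `TypeError`.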

Reading the documentation

  • A key is indicated by its name followed by a colon: key_name:
  • A variable key is indicated in brackets: [key_name]:
  • If a bracketed key name has any formatting information, it will be documented in the first child key.
  • Machine-reading instructions, if applicable, are indicated at the end of the description with **surrounding asterisks**.

languages

Derived from the Languages sheet.

[language_code]:
  id_long: The full ID of the language according to ISO.
  name_endonym: The name of the language according to its speakers.
  name_english: The name of the language in English.
  name_toki_pona: The name of the language in Toki Pona (generally derived from endonym).
  credits: List of those who contributed to this translation. **Split on `,` to get each name.**
  completeness_percent:
    [usage_category]:
      The integer percentage of words with definitions translated into this language in this usage category.
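
Because every field is stored as a string, a consumer has to split and convert values itself. The following is a minimal sketch of reading one `languages` entry; the language code `xx`, the contributor names, and the helper `parse_language` are placeholders invented for illustration, and the sketch assumes names in `credits` may be followed by a space after the comma.

```python
import json

# Hypothetical excerpt in the documented shape; not real data.
sample = json.loads("""
{
  "languages": {
    "xx": {
      "name_english": "Example",
      "credits": "jan Sample, jan Example",
      "completeness_percent": {"core": "100", "common": "87"}
    }
  }
}
""")


def parse_language(lang: dict) -> dict:
    """Split `credits` on commas and convert percentages to integers."""
    credits = [n.strip() for n in lang.get("credits", "").split(",") if n.strip()]
    completeness = {
        category: int(percent)
        for category, percent in lang.get("completeness_percent", {}).items()
    }
    return {"credits": credits, "completeness": completeness}


parsed = parse_language(sample["languages"]["xx"])
print(parsed)
```

Using `.get` with a default accounts for keys that are omitted when they have no value.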

credits

Derived from the Credits sheet.

[contributor]:
  description: A human-readable description of the contribution made.

data

Derived from the Words sheet.

Some static files are derived from ijo Linku, such as the luka pona and audio files.

All children of etymology_data have the same list length once split on ;.
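
Since the lists have equal length, they can be read as parallel columns with one row per source word. The sketch below is one possible way to do that in Python; the `Origin` type, the `parse_etymology` helper, and the sample values are hypothetical, and it assumes a missing child key (omitted because it had no value) should be treated as empty strings.

```python
from typing import NamedTuple

FIELDS = ("langs", "words", "alts", "defs")


class Origin(NamedTuple):
    lang: str
    word: str
    alt: str
    definition: str


def parse_etymology(etymology_data: dict) -> list:
    """Split each child on `;` and combine the columns row by row."""
    present = [f for f in FIELDS if f in etymology_data]
    if not present:
        return []
    columns = {
        f: [part.strip() for part in etymology_data[f].split(";")]
        for f in present
    }
    lengths = {len(column) for column in columns.values()}
    if len(lengths) != 1:
        raise ValueError("etymology_data children differ in length")
    (length,) = lengths
    return [
        Origin(*(columns[f][i] if f in columns else "" for f in FIELDS))
        for i in range(length)
    ]


# Made-up values; `alts` is absent to show the missing-key case.
origins = parse_etymology({
    "langs": "Lang A; Lang B",
    "words": "foo; bar",
    "defs": "first meaning; second meaning",
})
print(origins)
```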

[word_id]: A unique identifier for the word which is often the word, but may have an integer suffix if the word has been coined multiple times.
  word: The word as it would be written in Toki Pona using sitelen Lasina.
  sitelen_pona: A list of Latin character strings that convert to all alternates for a given word. Usually identical to [word]; see "akesi".
  ucsur: The Unicode code point assigned to the word.
  sitelen_pona_etymology: Human-readable description of the origin of the sitelen pona.
  sitelen_sitelen: URL to an image of the sitelen sitelen for the word.
  sitelen_emosi: The emoji corresponding to the word in sitelen Emosi.
  luka_pona:
    [format]: URL to the luka pona sign being demonstrated in [format].
  audio:
    [author]: URL to the audio of the word being spoken by [author].
  coined_era: One of [pre-pu, post-pu, post-ku] indicating the "era" the word was created in, relative to the publishing of the Toki Pona books.
  coined_year: The year a word was coined.
  book: One of [pu, ku suli, ku lili, none] indicating what Toki Pona book the word was first documented in.
  usage_category: One of [core, widespread, common, uncommon, rare, obscure] indicating the word's popularity.
  source_language: The language(s) the word derives from.
  etymology: A human-readable description of the word's etymology(ies), including the original word(s), definition(s), and other metadata.
  etymology_data:
    langs: List of languages the word derives from. **Split on `;`.**
    words: List of words the word derives from. **Split on `;`.**
    alts: List of alternate writings or indicated pronunciations for the words in `words`. **Split on `;`.**
    defs: List of definitions for the words in `words`. **Split on `;`.**
  creator: The name of the word's creator.
  ku_data: Usage data from Toki Pona Dictionary (ku), indicated with a superscript number. **Split on `,`.**
  recognition:
    [date]:
      Integer percentage of survey respondents who recognize and use the word as of [date]. [date] is YYYY-MM format.
  author_verbatim: Definition of the word as written by its original author. Defer to `pu_verbatim` if that is defined.
  pu_verbatim:
    [language_code]:
      "Definition of the word in [language_code] as written in the corresponding translation of Toki Pona: The Language of Good."
  see_also: A list of words related to [word]. **Split on `,`.**
  commentary: Human-readable extra information about the word, such as historical usage, replacement, or clarifications.
  def:
    [language_code]:
      Definition of the word in [language_code]. [language_code] is an entry in the `languages` key.
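
As a rough sketch of consuming a word entry, the helpers below pick the most recent `recognition` value and split `see_also`. The function names and the sample entry are invented for illustration. The sketch relies on the documented YYYY-MM date format, which sorts chronologically as plain text, and assumes the entries in `see_also` may have surrounding spaces.

```python
def latest_recognition(word: dict):
    """Return (date, percent) for the newest survey, or None if absent."""
    recognition = word.get("recognition", {})
    if not recognition:
        return None
    latest = max(recognition)  # YYYY-MM strings compare in date order
    return latest, int(recognition[latest])


def related_words(word: dict) -> list:
    """Split `see_also` on commas, dropping surrounding whitespace."""
    return [w.strip() for w in word.get("see_also", "").split(",") if w.strip()]


# Made-up entry in the documented shape.
sample_word = {
    "word": "example",
    "recognition": {"2021-09": "40", "2022-08": "55"},
    "see_also": "foo, bar",
}
print(latest_recognition(sample_word))
print(related_words(sample_word))
```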

fonts

Derived from the Fonts sheet.

Also used to fill out nasin sitelen Linku.

Some fonts are recorded here but are not distributed in nasin sitelen Linku or in ilo Linku. This is generally due to licensing conflicts, as we can only distribute fonts whose licenses allow us to do so. Commonly applicable licenses include OFL (Open Font License) and variations of CC (Creative Commons).

See also: license IDs according to SPDX.

[font_name]:
  name_short: The name of the font. Usually the same as [font_name], but not always.
  writing_system: The writing system of the font (alphabet, syllabary, sitelen pona, sitelen sitelen, ...).
  links:
    fontfile: A direct download link for the font. Usually downloadable without authentication.
    repo: The repository of the font on GitHub or GitLab.
    webpage: The homepage of the font.
  creator: The name(s) of the font creator(s). TODO: inconsistently split by `/` and `&`.
  license: The code of the license according to SPDX.
  version: Version number of the font as provided by the author(s). NOT GLOBALLY CONSISTENT.
  last_updated: The last time the font file was updated in nasin sitelen Linku. YYYY-MM format.
  filename: The name of the file in nasin sitelen Linku. Provided even if license is incompatible.
  style: One-word description of font appearance.
  features: Human-readable list of capabilities of the font (supported glyphs, UCSUR support, combining glyphs, etc).
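
Because `creator` is split inconsistently, a consumer may want to accept both separators. The following is a minimal sketch; `split_creators` and the sample names are hypothetical, and the approach would wrongly split any creator name that itself contains `/` or `&`.

```python
import re


def split_creators(creator: str) -> list:
    """Split a creator string on either `/` or `&`, trimming whitespace."""
    return [name.strip() for name in re.split(r"[/&]", creator) if name.strip()]


print(split_creators("jan A / jan B & jan C"))
print(split_creators("jan A"))
```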

jasima's People

Contributors

acipensersturio, cubedhuang, gregdan3

jasima's Issues

Error in definition of "alente"

The definition on lipu Linku states that alente is:

the set of every possible human concept subtracted from the set of concepts already covered by established toki pona words weighted by their relative usage

Symbolically, let C be the set of every possible human concept and T the set of concepts already covered by established Toki Pona words. Then, because T is a subset of C:

alente := T \ C = {}

I believe the intent here was to define alente like this instead:

alente := C \ T

Or in natural language, the definition should be:

the set of concepts already covered by established toki pona words subtracted from the set of every possible human concept weighted by their relative usage

The database should be updated to reflect the proposed change.
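
The argument can be checked with a small Python sketch using built-in sets. The concept names are placeholders chosen only for illustration, and the weighting by relative usage is left out.

```python
# C: every possible concept; T: concepts already covered (placeholder values).
C = {"water", "fire", "nostalgia", "bureaucracy"}
T = {"water", "fire"}

assert T <= C  # T is a subset of C

current = T - C   # definition as currently worded: T \ C
proposed = C - T  # proposed definition: C \ T

print(current)           # empty, because every element of T is also in C
print(sorted(proposed))  # the concepts not yet covered
```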
