
Comments (14)

spyysalo commented on August 18, 2024

A related question regarding the expected level of detail: English and Finnish currently have documentation for dependency relations that fully mirrors that for universal relations, with a separate page including language-specific examples for every relation:
http://universaldependencies.github.io/docs/en-dep-index.html and http://universaldependencies.github.io/docs/fi-dep-index.html .

Is (some, minimal) per-relation documentation expected for each language, or would a (single-document) "diff" summarizing language-specific aspects be enough? How about the same for tags and features?

from docs.

dan-zeman commented on August 18, 2024

I would not necessarily require a full mirror. I would expect readers of the language-specific part to have read the universal part first.

On the other hand, a simple diff (e.g. "we do not use iobj and we define a new subtype of nsubj, called nsubj-erg and defined as ...") will not always be enough. Showing language-specific examples for the universal dependencies will often be helpful, especially if people may be uncertain where a phenomenon belongs. Okay, that could actually result in the full mirror :-)


dan-zeman commented on August 18, 2024

As for tags, it will mostly be about borderline words and renaming traditional categories of the language.

As for features, it will be the same as for tags, plus defining new features or values if necessary.


jnivre commented on August 18, 2024

I agree. Language-specific examples are very useful, but I don’t think we should force people to repeat all universal definitions.

Joakim


spyysalo commented on August 18, 2024

This issue is likely to become increasingly relevant now. Some related work was already done in documenting style guidelines (#72) and the way the templates got structured provides a frame for this, but more detailed guidelines wouldn't hurt. What's your opinion on what we've been doing for features in Finnish (http://universaldependencies.github.io/docs/fi/feat/all.html)? Could something like this serve as an example (hopefully in a positive sense ;-))?


jnivre commented on August 18, 2024

Looks good to me. Crossing out features that are not used is a very nice visualization. Perhaps we can add language-specific features to the same table but with a different color. This would provide a nice, intuitive overview of how the language-specific inventory relates to the universal one. (Perhaps there just aren’t any language-specific features for Finnish.)

Joakim


dan-zeman commented on August 18, 2024

Looks good to me too. A different color may be a problem because the feature names are hyperlinks and their color is determined by the hyperlink style. What about preceding them with a plus sign?


spyysalo commented on August 18, 2024

Great! I'll assign myself to write a first brief draft of instructions for language-specific feature documentation based on this.

+1 for a plus sign for language-specific features :-)
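
The convention discussed here (unused universal features crossed out, language-specific features preceded by a plus sign) could be sketched roughly as below. This is only an illustration and not the actual docs build code: the function name `render_feature_index`, the `~~...~~` strike-through markup, and the sample feature sets are all assumptions made for the example.

```python
def render_feature_index(universal, used, language_specific):
    """Return one index line per feature.

    Universal features not in `used` are struck out (Markdown-style ~~...~~,
    an assumed markup); language-specific features are prefixed with '+'.
    """
    lines = []
    for name in universal:
        if name in used:
            lines.append(name)
        else:
            lines.append(f"~~{name}~~")
    # Language-specific additions are listed after the universal inventory.
    for name in sorted(language_specific):
        lines.append(f"+{name}")
    return lines


# Hypothetical sample data, not the real inventory of any treebank.
universal = ["Case", "Gender", "Number"]
used = {"Case", "Number"}
language_specific = {"Clitic"}

index_lines = render_feature_index(universal, used, language_specific)
print("\n".join(index_lines))
```

Keeping the plus sign in the link text rather than in the styling sidesteps the hyperlink color problem mentioned above.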


jnivre commented on August 18, 2024

We still don't have any general instructions for the language-specific documentation, right? I think it would be very useful to add this in preparation for the second release, where the goal is to have complete documentation for the first ten languages. I could have a first go at drafting this.


spyysalo commented on August 18, 2024


jnivre commented on August 18, 2024

Here is a proposal for guidelines for the language-specific documentation, organized by sections. I have tried to make the Swedish documentation conform to these guidelines as an example.

overview/introduction.html
Short description of the treebank, size, types of texts, annotation/conversion process, etc.
Optional subheadings: "Acknowledgments", "References"

overview/tokenization.html
Short description of the principles for word segmentation with references to additional documentation and/or standard tokenizers as appropriate. The description must state whether the treebank contains multiword tokens and, if so, describe the major cases in a separate subsection (see the Czech documentation for an example of this).
Optional subheadings: "Multiword tokens", "References"

overview/morphology.html
Short description of the use of universal tags and features making clear whether language-specific features have been added (but without describing them). Short description of language-specific tags if relevant.
Optional subheading: "References"

pos/index.html
List of universal tags with unused tags crossed out (see Finnish for an example). Each tag is linked to a subpage with a language-specific definition and examples. Minimally, this should be a restatement of the universal definition with at least one language-specific example added. Preferably, it should add more information, for example, about inflectional categories of a part of speech in the specific language (see Swedish ADJ documentation for an example).

feat/index.html
List of morphological features with unused universal features crossed out and language-specific features preceded by a + (see Finnish for an example). Each feature is linked to a subpage with a language-specific definition and examples. Minimally, this should be a restatement of the universal definition with at least one language-specific example added. It should also make clear which values are used for a given feature, since this may vary across languages (see Swedish Case documentation for an example).

overview/syntax.html
Short description of the use of universal dependency relations making clear whether language-specific subtypes have been added (but without describing them).

overview/specific-syntax.html
Optional description of specific constructions (see Finnish for examples). Ideally, this should be structured in the same way as the universal documentation, but since the latter is not very developed, it seems hard to enforce this. If possible, use the main headings "Elements of a clause", "Elements of a nominal", "Adjectival and adverbial constructions", and add additional headings as needed (for example, "Coordination", "Compounding", etc.). Use any subheadings that are appropriate.

dep/index.html
List of universal dependencies with unused relations crossed out and language-specific subtypes indented below their supertype (see Finnish for an example). Each relation is linked to a subpage with a language-specific definition and examples. Minimally, this should be a restatement of the universal definition with at least one language-specific example added.
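
As a rough illustration of how the page list in this proposal could be checked for one language, here is a minimal sketch. The page paths are taken from the proposal above; everything else is an assumption for the example, including the helper name `missing_pages`, the idea that each language's pages sit under one directory, and the temporary `sv` directory used in the demo.

```python
from pathlib import Path
import tempfile

# Pages named in the proposal; specific-syntax is described there as optional.
REQUIRED_PAGES = [
    "overview/introduction.html",
    "overview/tokenization.html",
    "overview/morphology.html",
    "pos/index.html",
    "feat/index.html",
    "overview/syntax.html",
    "dep/index.html",
]
OPTIONAL_PAGES = ["overview/specific-syntax.html"]


def missing_pages(lang_dir):
    """Return the required pages that do not exist under lang_dir."""
    root = Path(lang_dir)
    return [page for page in REQUIRED_PAGES if not (root / page).is_file()]


# Demo on a throwaway directory that contains only the first three pages.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp) / "sv"  # hypothetical per-language directory
    for page in REQUIRED_PAGES[:3]:
        target = demo_root / page
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("placeholder")
    demo_missing = missing_pages(demo_root)

print(demo_missing)
```

A check like this only confirms that the pages exist; whether each page restates the universal definition and adds a language-specific example still needs a human reader.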


jnivre commented on August 18, 2024

I forgot to say that known discrepancies from the guidelines should be mentioned under the most relevant heading.


jnivre commented on August 18, 2024

One more thing I forgot: The owners of each language should also make sure that their language-specific features and relations are added to:
http://universaldependencies.github.io/docs/ext-feat-index.html
http://universaldependencies.github.io/docs/ext-dep-index.html

Ideally, this should happen automagically given the language-specific documentation to ensure consistency.
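
One way the automatic step might work is to derive the extension index from per-language inventories, so the two cannot drift apart. The sketch below is a guess at the idea, not an existing script: `build_extension_index`, the language codes `xx` and `yy`, and the feature names `Foo` and `Bar` are all hypothetical.

```python
def build_extension_index(universal, per_language):
    """Map each non-universal name to the sorted list of languages using it.

    `universal` is the set of universal feature (or relation) names;
    `per_language` maps a language code to the names that language documents.
    """
    index = {}
    for lang, names in per_language.items():
        for name in names:
            if name not in universal:
                index.setdefault(name, []).append(lang)
    return {name: sorted(langs) for name, langs in sorted(index.items())}


# Hypothetical inventories for two made-up language codes.
universal = {"Case", "Number"}
per_language = {
    "xx": {"Case", "Foo"},
    "yy": {"Number", "Foo", "Bar"},
}

ext_index = build_extension_index(universal, per_language)
print(ext_index)
```

The same function would apply to relations if language-specific subtypes are listed per language in the same way.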


dan-zeman commented on August 18, 2024

Great, thanks for the draft! I would also suggest that these guidelines adopt the way treebank-specific diffs are described (issue #116 proposal by @manning).
