Comments (14)
A related question regarding the expected level of detail: English and Finnish currently have documentation for dependency relations that fully mirrors that for universal relations, with a separate page including language-specific examples for every relation:
http://universaldependencies.github.io/docs/en-dep-index.html and http://universaldependencies.github.io/docs/fi-dep-index.html.
Is (some, minimal) per-relation documentation expected for each language, or would a (single-document) "diff" summarizing the language-specific aspects be enough? What about tags and features?
I would not necessarily require a full mirror. I would expect readers of the language-specific part to have read the universal part first.
On the other hand, a simple diff (e.g. "we do not use iobj, and we define a new subtype of nsubj, called nsubj-erg and defined as ...") will not always be enough. Showing language-specific examples for the universal dependencies will often be helpful, especially when people may be unsure where a phenomenon belongs. Okay, that could actually result in the full mirror :-)
As for tags, it will mostly be about borderline words and renaming traditional categories of the language.
As for features, it will be the same as for tags, plus defining new features or values if necessary.
I agree. Language-specific examples are very useful, but I don’t think we should force people to repeat all universal definitions.
Joakim
(Email reply to Dan Zeman's comment of 20 Sep 2014, 12:08.)
This issue is likely to become increasingly relevant now. Some related work was already done in documenting style guidelines (#72), and the way the templates are structured provides a framework for this, but more detailed guidelines wouldn't hurt. What's your opinion on what we've been doing for features in Finnish (http://universaldependencies.github.io/docs/fi/feat/all.html)? Could something like this serve as an example (hopefully in a positive sense ;-))?
Looks good to me. Crossing out features that are not used is a very nice visualization. Perhaps we can add language-specific features to the same table but with a different color. This would provide a nice, intuitive overview of how the language-specific inventory relates to the universal ones. (Perhaps there just aren’t any language-specific features for Finnish.)
Joakim
Looks good to me too. A different color may be a problem because the feature names are hyperlinks and their color is determined by the hyperlink style. What about prefixing them with a plus sign?
Great! I'll assign myself to write a brief first draft of instructions for language-specific feature documentation based on this.
+1 for a plus sign for language-specific features :-)
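A minimal Python sketch of how a Finnish-style feature index could be generated under this convention. Assumptions not taken from the thread: each feature page is named `<Feature>.html`, crossing out uses the HTML `<del>` element, and `render_feature_index` and the feature names in the usage line (including "Clitic") are hypothetical. Unused universal features get no link because they have no language-specific page; language-specific features get a plain-text `+` outside the link, so the marker is visible regardless of the hyperlink color.

```python
from html import escape


def render_feature_index(universal, used, language_specific):
    """Return HTML <li> items for a feature index page (hypothetical helper).

    universal: universal feature names, in display order.
    used: set of universal features the treebank actually uses.
    language_specific: names of features added for this language.
    """
    items = []
    for feat in universal:
        name = escape(feat)
        if feat in used:
            # Used universal feature: link to its language-specific page.
            items.append(f'<li><a href="{name}.html">{name}</a></li>')
        else:
            # Unused universal feature: crossed out, nothing to link to.
            items.append(f"<li><del>{name}</del></li>")
    for feat in language_specific:
        name = escape(feat)
        # Language-specific feature: "+" sits outside the link, so its
        # color does not depend on the hyperlink style.
        items.append(f'<li>+<a href="{name}.html">{name}</a></li>')
    return "\n".join(items)


print(render_feature_index(["Case", "Gender", "Number"], {"Case", "Number"}, ["Clitic"]))
```

The same function would serve pos/index.html by passing an empty `language_specific` list.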
We still don't have any general instructions for the language-specific documentation, right? I think it would be very useful to add this in preparation for the second release, where the goal is to have complete documentation for the first ten languages. I could have a first go at drafting this.
Here is a proposal for guidelines for the language-specific documentation, organized by sections. I have tried to make the Swedish documentation conform to these guidelines as an example.
overview/introduction.html
Short description of the treebank, size, types of texts, annotation/conversion process, etc.
Optional subheadings: "Acknowledgments", "References"
overview/tokenization.html
Short description of the principles for word segmentation with references to additional documentation and/or standard tokenizers as appropriate. The description must state whether the treebank contains multiword tokens and, if so, describe the major cases in a separate subsection (see the Czech documentation for an example of this).
Optional subheadings: "Multiword tokens", "References"
overview/morphology.html
Short description of the use of universal tags and features making clear whether language-specific features have been added (but without describing them). Short description of language-specific tags if relevant.
Optional subheading: "References"
pos/index.html
List of universal tags with unused tags crossed out (see Finnish for an example). Each tag is linked to a subpage with a language-specific definition and examples. Minimally, this should be a restatement of the universal definition with at least one language-specific example added. Preferably, it should add more information, for example, about inflectional categories of a part of speech in the specific language (see Swedish ADJ documentation for an example).
feat/index.html
List of morphological features with unused universal features crossed out and language-specific features preceded by a + (see Finnish for an example). Each feature is linked to a subpage with a language-specific definition and examples. Minimally, this should be a restatement of the universal definition with at least one language-specific example added. It should also make clear which values are used for a given feature, since this may vary across languages (see the Swedish Case documentation for an example).
overview/syntax.html
Short description of the use of universal dependency relations making clear whether language-specific subtypes have been added (but without describing them).
overview/specific-syntax.html
Optional description of specific constructions (see Finnish for examples). Ideally, this should be structured in the same way as the universal documentation, but since the latter is not yet very developed, it seems hard to enforce this. If possible, use the main headings "Elements of a clause", "Elements of a nominal", and "Adjectival and adverbial constructions", and add further headings as needed (for example, "Coordination", "Compounding", etc.). Use any subheadings that are appropriate.
dep/index.html
List of universal dependencies with unused relations crossed out and language-specific subtypes indented below their supertype (see Finnish for an example). Each relation is linked to a subpage with a language-specific definition and examples. Minimally, this should be a restatement of the universal definition with at least one language-specific example added.
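The dep/index.html layout (unused relations crossed out, subtypes indented below their supertype) can be derived mechanically from the relation labels found in the treebank. A sketch under stated assumptions: `dep_index_entries` is a hypothetical helper, subtypes are assumed to be written as supertype, separator, subtype (`:` by default; the `nsubj-erg` example earlier in this thread would need `sep="-"`), and the output is a list of `(indent_level, label, crossed_out)` tuples, leaving the actual HTML to the site templates.

```python
def dep_index_entries(universal, used, sep=":"):
    """Return (indent_level, label, crossed_out) tuples for a dep index.

    universal: universal relation names, in display order.
    used: set of relation labels occurring in the treebank, possibly
          including language-specific subtypes like "nsubj:xyz".
    """
    # Group subtype labels under their supertype.
    subtypes = {}
    for label in used:
        sup, found, _ = label.partition(sep)
        if found:
            subtypes.setdefault(sup, []).append(label)
    entries = []
    for rel in universal:
        subs = sorted(subtypes.get(rel, []))
        # A supertype counts as used if it occurs bare or via any subtype,
        # so it is never crossed out while subtypes are listed under it.
        crossed_out = rel not in used and not subs
        entries.append((0, rel, crossed_out))
        entries.extend((1, sub, False) for sub in subs)
    # Note: subtypes of non-universal supertypes are silently dropped here;
    # a real script should report them as errors.
    return entries


for level, label, crossed in dep_index_entries(
    ["nsubj", "dobj", "iobj"], {"nsubj", "nsubj-erg", "dobj"}, sep="-"
):
    print("    " * level + (f"[crossed out] {label}" if crossed else label))
```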
I forgot to say that known discrepancies from the guidelines should be mentioned under the most relevant heading.
One more thing I forgot: The owners of each language should also make sure that their language-specific features and relations are added to:
http://universaldependencies.github.io/docs/ext-feat-index.html
http://universaldependencies.github.io/docs/ext-dep-index.html
Ideally, this should happen automagically given the language-specific documentation to ensure consistency.
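One way the "automagic" update could work is to invert per-language lists into a single cross-language index. A minimal sketch, assuming each language's language-specific features (or relation subtypes) have already been extracted from its documentation; `build_ext_index` and the sample language codes and feature names are hypothetical.

```python
def build_ext_index(lang_specific):
    """Invert {language: names} into {name: sorted languages}, sorted by name."""
    index = {}
    for lang, names in lang_specific.items():
        for name in names:
            index.setdefault(name, set()).add(lang)
    return {name: sorted(langs) for name, langs in sorted(index.items())}


print(build_ext_index({"xx": ["Foo", "Bar"], "yy": ["Foo"]}))
# {'Bar': ['xx'], 'Foo': ['xx', 'yy']}
```

Running this once over features and once over relations, from the same per-language sources as the language pages, is what would keep ext-feat-index.html and ext-dep-index.html consistent with them.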
Great, thanks for the draft! I would also suggest that these guidelines adopt the way treebank-specific diffs are described (the proposal by @manning in issue #116).