loanpydatahub / streitberggothic

A CLDF dataset of Wilhelm Streitberg's 1910 Gothic dictionary

License: Creative Commons Attribution 4.0 International

TeX 18.83% Python 80.45% Shell 0.72%

streitberggothic's Introduction

CLDF dataset derived from Streitberg's "Gotische Bibel" from 1910

How to cite

If you use these data, please cite

the original source:
Streitberg, Wilhelm. (1910). Die Gotische Bibel [The Gothic Bible]. Heidelberg: Carl Winter.
the derived dataset, using the DOI of the particular released version you were using.

Description

This dataset is licensed under a CC-BY-4.0 license

Available online at http://www.wulfila.be/lib/streitberg/1910/text/html/

Conceptlists in Concepticon:

Dellert-2018-1016

Notes

For a walkthrough, visit our blog post and check out my YouTube tutorial.

Statistics

Varieties: 1
Concepts: 443
Lexemes: 635
Sources: 1
Synonymy: 1.43
Invalid lexemes: 0
Tokens: 3,682
Segments: 44 (0 BIPA errors, 0 CLTS sound class errors, 44 CLTS modified)
Inventory size (avg): 44.00
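
The two derived figures above can be reproduced from the raw counts. The sketch below assumes that synonymy means the average number of lexemes per concept and that average inventory size means distinct segments per variety. Both definitions are assumptions, chosen because they match the reported numbers, not something this README states.

```python
# Figures copied from the statistics above.
varieties = 1
concepts = 443
lexemes = 635
segments = 44

# Assumption: synonymy is the mean number of lexemes per concept.
synonymy = lexemes / concepts

# Assumption: average inventory size is distinct segments per variety.
inventory_avg = segments / varieties

print(f"Synonymy: {synonymy:.2f}")
print(f"Inventory size (avg): {inventory_avg:.2f}")
```

With a single variety, the average inventory size is simply the total number of segments.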

Contributors

Name	GitHub user	Description	Role
Wilhelm Streitberg		Published the dictionary in 1910	Author
Jozef Van Loon		Ph.D. in Germanic Languages K.U.Leuven 1979. Teaches German and Dutch Language and Linguistics at the University of Antwerp; member of the Royal Academy of Dutch Language and Literature and of the Royal Historical Commission. Author of the books: Principles of Historical Morphology (2005), De ontstaansgeschiedenis van het begrip ‘stad’ (1999), Endogene factoren in de diachrone morfologie van de Germaanse talen (1996), Historische fonologie van het Nederlands (1986), Morfeemgeschiedenis en -geografie der Nederlandse toenamen (1981). — Director of the project since 1998.	Project Coordinator
Tom De Herdt		Research assistant at the University of Antwerp on the projects mentioned above (1998-2008); currently developer at the university library of the VUB. — Webmaster. Started the project in 1997 by posting fragments of the Gothic Bible to a student website; created the database, TEI edition, digital facsimile editions and morphological software; text entry of Streitberg's dictionary.	Contributor
Frank Kinnaer		Historian, currently doing archeological work in Mechelen, Belgium. — Tagged the names, dates and other content elements in the diary of Christiaan Munters.	Contributor
David Landau		Department of Information Technology, Tampere University of Technology, Finland. His own website, the Database of the Gothic Language, focuses on digitizing the manuscripts and text heritage in general. — Contributed to the text entry of the Gothic Bible (John, Nehemias).	Contributor
Robert Tannert		Software developer (Oak Ridge National Laboratory). Completed his Ph.D. in Germanic philology at the University of Tennessee; produced several electronic editions of Middle High German works in his dissertation project. — Contributed to the text entry of the Gothic Bible (Epistles, Luke, parts of John, Skeireins: in other words the greater part of the entire corpus).	Contributor
Steven Van Assche		Civil Engineer. Completed his Ph.D. at Ghent University, with a thesis on lossless image compression. Now doing research on new technologies in multimedia for the Flemish Radio and Television Company (VRT). — Developed a C++ prototype of the morphological software and entered a significant part of Streitberg's dictionary.	Contributor
Viktor Martinović	@martino-vik	CLDF conversion	Other
Johann-Mattis List	@LinguList	CLDF conversion	Other

CLDF Datasets

The following CLDF datasets are available in cldf:

CLDF Wordlist at cldf/cldf-metadata.json
CLDF Dictionary at cldf/Dictionary-metadata.json
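
A CLDF metadata file is plain JSON, so its module type and table files can be listed with the standard library alone. The following is a minimal sketch: the helper name `describe_cldf` and the stand-in metadata are made up for illustration, and the stand-in lets the example run without the repository checked out. For real work, a dedicated CLDF reader is probably the better choice.

```python
import json
import tempfile
from pathlib import Path


def describe_cldf(metadata_path):
    """Return the CLDF module name and the table files listed in a metadata file."""
    metadata = json.loads(Path(metadata_path).read_text(encoding="utf-8"))
    # "dc:conformsTo" points at a term such as ...terms.rdf#Wordlist
    module = metadata.get("dc:conformsTo", "").rsplit("#", 1)[-1]
    tables = [table["url"] for table in metadata.get("tables", [])]
    return module, tables


# Hypothetical stand-in metadata, not the content of this repository's files.
stand_in = {
    "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#Wordlist",
    "tables": [{"url": "forms.csv"}, {"url": "languages.csv"}],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cldf-metadata.json"
    path.write_text(json.dumps(stand_in), encoding="utf-8")
    module, tables = describe_cldf(path)

print(module, tables)
```

Pointing `describe_cldf` at cldf/cldf-metadata.json or cldf/Dictionary-metadata.json in a local checkout should work the same way.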

streitberggothic's Issues

Transfer to Lexibank or Submit to Zenodo

We have two possibilities to publish the repo with Zenodo:

transfer to the lexibank organization on Zenodo; we will then take care of publication (authorship etc. won't change, of course)
submit directly to Zenodo, using Zenodo's option to submit repositories along with a new release. This means you have to follow Zenodo's guidelines and set this up yourself; we then make a release, so it will be automatically linked on Zenodo.

orthography

The current transcription with orthography.tsv is very good, but there are some mistakes that I don't see how to overcome without a rule-based transcriber, e.g.:

Line 1243 is the only line that does not conform to BIPA: in gagáhaftjan > ^ g agá h a f t j a n $ > g a ɣ á h a ɸ t j a n, the rule á > a never gets applied.
In line 10, abrs > a β r s doesn't mark the syllabic <r̥>, because the r gets clustered into <abr>, so the rule for <brs> doesn't apply.

Not only do these rules cancel each other out; if I list all these transformations explicitly in orthography.tsv, i.e. hardcode every word (which I tried), the CLTS validation fails, since then 1 word = 1 grapheme.
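
The clash in line 10 can be reproduced with a toy segmenter. The sketch below assumes that the profile is applied greedily, left to right, preferring the longest matching grapheme. That is a simplification, and both the function `segment` and the six-entry profile are made up for illustration; they are not the contents of orthography.tsv.

```python
def segment(word, profile):
    """Greedy left-to-right, longest-match segmentation (simplified stand-in)."""
    max_len = max(len(grapheme) for grapheme in profile)
    out, i = [], 0
    while i < len(word):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in profile:
                out.append(profile[chunk])
                i += size
                break
        else:
            out.append("<?>")  # no grapheme matched at this position
            i += 1
    return " ".join(out)


# Hypothetical mini profile: two context rules that overlap on "br".
profile = {
    "a": "a", "b": "b", "r": "r", "s": "s",
    "abr": "a β r",    # b is a fricative after a vowel
    "brs": "β r̥ s",   # r is syllabic between b and s
}

clustered = segment("abrs", profile)

# Without the <abr> cluster, the <brs> rule gets its chance.
without_abr = {k: v for k, v in profile.items() if k != "abr"}
unclustered = segment("abrs", without_abr)

print(clustered)
print(unclustered)
```

Because <abr> is consumed first, the remaining input is just <s>, so <brs> can never match. Dropping <abr> fixes this word but removes the rule for every other word that needs it, which is the sense in which the two rules cancel each other out.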

I think it would be nice if I could somehow plug in a third-party transcriber and check its results for CLTS conformity. For now, I've added the column IPA in a post-processing step. Those transcriptions are generated with the epitran library, and I've included the transcription rules in the folder etc for more transparency. I like those transcriptions more, but of course that's not worth much if they can't be checked against the CLTS data.

Move all datasets to a new raw folder

Instructions can be found also specifically here: https://calc.hypotheses.org/2954

This explains the structure you need to convert data to CLDF. Note that the conversion is done programmatically and transparently: the repository does not just store a dataset in converted form, but uses code to convert the spreadsheet data to CLDF.

inspect data through edictor

@martino-vic, if you want to inspect the data through edictor, I recommend pyedictor (pip install pyedictor).

$ edictor wordlist --name streitberggothic

A tutorial on the library is pending (maybe next month as blog post). This will convert the data to the TSV you need for lingpy / edictor. Edictor is here: https://lingulist.de/edev/ (dev version), official version: https://digling.org/edictor . May be interesting to check the data in this form. I can explain another time how.

Getting started

@martino-vic, do you prefer that I add the first code, or would you rather follow the workflow and write this Python script yourself, as I indicated in issue #1?

Remove Private Files

files ending in *.bat
files like cmd.txt

They also reveal your local desktop structure, which I consider a bit private ;)

make a dictionary with a wordlist from the dataset

The workflow is:

make a dictionary
make some code to map the senses to concepticon in a restricted concept list
make some precedence how the wordlist can be manually refined
make a combined lexibank script
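
The second step, mapping senses to a restricted concept list, could start from something as simple as an exact gloss lookup. This is only a sketch under stated assumptions: the function `map_senses`, the three toy entries, and the two-item concept list are invented for illustration, and the real senses in the dictionary will need more than exact matching.

```python
def map_senses(entries, concept_list):
    """Keep only entries whose sense matches a gloss in the restricted concept list."""
    lookup = {gloss.lower(): gloss for gloss in concept_list}
    wordlist = []
    for headword, sense in entries:
        key = sense.strip().lower()
        if key in lookup:
            wordlist.append((lookup[key], headword))
    return wordlist


# Toy dictionary entries (headword, sense) and a toy concept list.
entries = [("wato", "water"), ("handus", "hand"), ("jah", "and")]
concept_list = ["WATER", "HAND"]

wordlist = map_senses(entries, concept_list)
for concept, form in wordlist:
    print(concept, form)
```

Entries that fall outside the concept list, like the conjunction here, stay in the dictionary but are left out of the wordlist, which leaves room for the manual refinement in the third step.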
