bdproto / bdproto
BDPROTO: a database of phonological inventories from ancient and reconstructed languages
License: Creative Commons Attribution Share Alike 4.0 International
add TimeDepthSource and HomelandSource to bib-file
Check if it's Proto-Hlai or Proto-Hlaic.
Is this just Goidelic?
Asked Harald for additional Glottocodes for the different stages of Ancient Egyptian.
We could have more datapoints for Akkadian if we want, with good datings. Basically there are a few main dialects that are attested over a few thousand years. Do we want that?
Check all the Uralic entries, there's some fishy stuff there.
Some reconstructed languages are called "Common X" rather than "Proto-X." Keep or find a better term?
Hi, can we make a unified list of proto-languages, as well as a list of proto-languages (by family) to be hunted down? The latter can be made manually and added to.
The structure of the three proto-tones is not reconstructable, so how should the tonal phonemes be represented?
Which is the right group for this inventory?
Add homelands
No Glottocode, maybe not a node. Check.
Ask Matthias Urban about the smaller areas he uses.
Check to see which the source refers to.
I think it would be good to have a summary of the coverage per area, so we can know what areas we should target in searches, rather than focusing on particular areas.
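A coverage summary like this could be sketched as a simple tally. This is a minimal sketch, assuming each inventory is a dict with an `Area` key; that key name and the `UNKNOWN` placeholder are illustrative assumptions, not the actual BDPROTO column names.

```python
from collections import Counter


def coverage_by_area(rows):
    """Count inventories per area, least-covered areas first.

    `rows` is an iterable of dicts with an "Area" key (an assumed
    field name); empty or missing areas are tallied as "UNKNOWN".
    """
    counts = Counter((row.get("Area") or "UNKNOWN") for row in rows)
    # Sort by count, then by name, so under-covered areas surface first.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))
```

Sorting in ascending order puts the areas with the fewest inventories at the top, which is where searches would presumably be targeted.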
This could be done by adding a LanguageName column with a standardized way of naming the inventories (so that duplicate entries are easy to identify) and renaming the current LanguageName to something like SourceLanguageName, or by assigning a Glottocode to each and every inventory.
This should be done in the g-spreadsheet for the time being, so that we can fill in missing fields and re-merge into the CSV file later.
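The standardized-name option might look roughly like the following. This is a sketch only: the normalization rule (lowercase, strip diacritics, collapse separators to hyphens) is an assumption rather than an agreed convention, and the `ID` key is a hypothetical field name.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_name(name):
    """Build a comparison key: no diacritics, lowercase, hyphen-separated."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", "-", stripped.lower()).strip("-")


def find_duplicates(rows):
    """Group inventory IDs whose source names normalize to the same key.

    `rows` are dicts with "ID" (hypothetical) and "SourceLanguageName".
    Only keys shared by more than one inventory are returned.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[normalize_name(row["SourceLanguageName"])].append(row["ID"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

A match on the normalized key only flags candidates; two sources can legitimately give different inventories for the same language, so the result is a list to review by hand rather than rows to delete.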
I think we need a special annotation for low-confidence segments, with two possibilities: low confidence in the phoneme as a distinctive unit, and low confidence in its phonological ID. This can be useful because if people think there is a distinctive phoneme but don't agree on its interpretation, we can use it for total inventory counts but might want to exclude it from other analyses. If people don't know if it is necessary at all, we can exclude it or include it, depending on whether we want to be conservative or not.
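One way to picture the two-way annotation is as a pair of flags per segment. This is a minimal sketch; the class name, flag names, and helper functions are all hypothetical and not part of the current BDPROTO schema.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    symbol: str
    # Low confidence that this is a distinctive unit at all.
    uncertain_existence: bool = False
    # The unit is accepted, but its phonological identity is disputed.
    uncertain_identity: bool = False


def inventory_size(segments, conservative=True):
    """Count segments; a conservative count drops doubtful-existence segments.

    Segments with a disputed identity are always counted, since they
    are still assumed to be distinctive phonemes.
    """
    return sum(1 for s in segments if not (conservative and s.uncertain_existence))


def secure_segments(segments):
    """Return only segments that are secure on both counts."""
    return [s for s in segments if not (s.uncertain_existence or s.uncertain_identity)]
```

Keeping the two kinds of uncertainty as separate flags lets total counts and other analyses apply different inclusion rules to the same data.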
make a docs folder for scratch stuff and dump everything there. root folder should be:
Check if it's Proto-Mon or Old Mon.
This is to distinguish reconstructed from ancient languages, because we might want to exclude one type for some analyses.
No Glottocode, no node, check to see if valid classification.
Todo: automatically detect whether an inventory is all Cs or Vs.
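A possible shape for this check, assuming each segment already carries a class label of `"consonant"`, `"vowel"`, or `"tone"`. Those labels and the returned category names are assumptions made for illustration.

```python
def inventory_type(segment_classes):
    """Classify an inventory from the class labels of its segments.

    Tones are ignored, so a consonant-only inventory that also lists
    tones is still reported as consonants-only.
    """
    classes = set(segment_classes) - {"tone"}
    if classes == {"consonant"}:
        return "consonants_only"
    if classes == {"vowel"}:
        return "vowels_only"
    if classes == {"consonant", "vowel"}:
        return "full"
    # Empty inventories or unexpected labels need a manual look.
    return "unknown"
```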
What's the difference between LanguageFamily and Classification? Do we need both?
Is this a real grouping? No Glottocode. Check source.
Is Lakkia a language or a family? Check source
There are a few inventories of proto-languages that are prob too controversial/fringe to keep in the db.
This isn't a final list, but right now I'd get rid of BDPROTO ID numbers:
1055 - Nostratic
1053 - Proto-Altaic -- does Robeets have a sound inventory yet?
1059 - Proto-Australian -- not sure but prob to be got rid of
1061 - Proto-Afroasiatic -- v skeptical of any proposed concrete inventory; Nichols says it looks like a pseudo-phylum.
148 - Proto-Nilo-Saharan - not sure about this but looks like the evidence is for the splitters.
97/1089 - Ob-Ugric is sketchy, but I will check out the whole Uralic story, so maybe leave this for now.
20 - Is Proto-TNG really a thing?
1114 - Uralo-Siberian
14 - Proto-Dene-Caucasian
Probably better to find another source.
There isn't a single connected text in Kassite, so the phonology might be extremely iffy. Suggest deleting.
Each segment in BDPROTO should conform to phoible conventions:
http://phoible.github.io/conventions/
This includes:
ID - We should merge them all, but later. UZ has 1-15, we started new ones from 16. Since we should go through the original data too, we can merge them once they've been vetted.
LanguageFamilyRoot/Family/Classification: we should drop all but Family unless there is a reason to keep them all. This should be taken automatically from Glottolog if possible.
TimeDepth etc - I think that we only need one field for this, and another for the reference source. As for how to proceed, I suggest we ask experts for a reliable ref or pc. I can handle this, as I bother people regularly by mail.
Homeland - same.
BibTexKey, FileName, Squib - why not a single entry?
LanguageCode - why do we need this if we have the Glottocode? Can we drop it?
Syllable structure - I don't think we'll have this for many languages, so I suggest we drop it or else restrict entries to a fixed set of choices; otherwise the data is likely to be messy.
Region - how is this different from homeland? Can we drop it? If it is different, can we move it next to homeland?
Allophone - in many cases, some variation is given (g~gh). Could this field be used for that, and if so, how does one select the variant? The issue here isn't really allophony, though; it's generally uncertainty as to which variant is to be reconstructed.
Another - we probably need a tag for marginal phonemes (i.e., phonemes in parentheses in the doculect).
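The last two points (variation such as g~gh, and phonemes in parentheses) could be captured while parsing raw segment strings. This is a sketch under assumptions: it takes `~` as the only variant separator and outer parentheses as the only marginal marker, and the function and key names are hypothetical.

```python
def parse_segment(raw):
    """Split a raw segment string into variants plus two tags.

    "g~gh" -> two variants, tagged uncertain (no variant is chosen here).
    "(x)"  -> one variant, tagged marginal.
    """
    text = raw.strip()
    marginal = len(text) >= 2 and text.startswith("(") and text.endswith(")")
    if marginal:
        text = text[1:-1].strip()
    variants = [part.strip() for part in text.split("~") if part.strip()]
    return {
        "variants": variants,
        "marginal": marginal,
        "uncertain": len(variants) > 1,
    }
```

The sketch deliberately keeps every variant instead of picking one, since how to select the variant is still an open question above.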
Add missing squibs
No Glottocode, no node, check to see whether the classification is solid.
Are 2013 and 2014 duplicates? Looks like the same source.
No Glottocode. Prob Glottolog doesn't believe this is a valid grouping, check it out.