bdproto's People

Contributors

bambooforest, eitangrossman, hallerp

bdproto's Issues

bib-file

Add TimeDepthSource and HomelandSource to the bib file.

Ancient Egyptian

Asked Harald for additional Glottocodes for the different stages of Ancient Egyptian.

Akkadian

We could have more datapoints for Akkadian if we want, with good datings. Basically there are a few main dialects that are attested over a few thousand years. Do we want that?

Uralic

Check all the Uralic entries, there's some fishy stuff there.

time depth

  • time depth for geographic area
    • general minimum time depth (8000 BP/BC in the Fertile Crescent and China; more recent developments, 1000-2000 BC, in the Americas)
  • we need dates for the proto-languages

Make family checklist?

Hi, can we make a unified list of proto-languages, as well as a list of proto-languages (by family) to be hunted down? The latter can be made manually and added to.

Coverage?

I think it would be good to have a summary of the coverage per area, so we can know what areas we should target in searches, rather than focusing on particular areas.
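
A minimal sketch of such a coverage summary, assuming the metadata rows are available as dictionaries (for example from csv.DictReader over bdproto-metadata.tsv with delimiter="\t") and that an area column exists. The column name "Region" and the sample rows are placeholders for illustration, not actual data.

```python
from collections import Counter


def coverage_by_area(rows, area_field="Region"):
    """Count inventories per area; rows with no area are counted as UNKNOWN."""
    counts = Counter()
    for row in rows:
        area = (row.get(area_field) or "").strip() or "UNKNOWN"
        counts[area] += 1
    return counts


# Made-up rows for illustration only (not real BDPROTO entries).
sample_rows = [
    {"ID": "1", "Region": "Area A"},
    {"ID": "2", "Region": "Area A"},
    {"ID": "3", "Region": "Area B"},
    {"ID": "4", "Region": ""},
]

coverage = coverage_by_area(sample_rows)
# List the least-covered areas first, since those are the ones to target.
for area, n in sorted(coverage.items(), key=lambda item: item[1]):
    print(f"{area}\t{n}")
```

Sorting by ascending count puts the thinnest areas at the top of the list.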

Need a way to distinguish duplicate entries

  • this could be by adding a column LanguageName with a standardized way of naming the inventories (so as to be able to easily identify duplicate entries) and replacing the current LanguageName with something like SourceLanguageName

  • or it could be by assigning a Glottocode to each and every inventory

this should be done in the g-spreadsheet for the time being
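
Either option above comes down to grouping inventories by a shared key. The sketch below shows both variants under stated assumptions: the normalization rule (case-folding and collapsing hyphens and spaces) is only one possible standardization, and the rows and Glottocode value are invented placeholders.

```python
import re
from collections import defaultdict


def normalize_name(name):
    """Hypothetical standardization: case-fold and collapse hyphens/underscores/spaces."""
    return re.sub(r"[\s\-_]+", " ", name.strip().casefold())


def find_duplicates(rows, key_func):
    """Group inventory IDs by key and keep only keys shared by more than one ID."""
    groups = defaultdict(list)
    for row in rows:
        key = key_func(row)
        if key:  # skip rows with no usable key (e.g. missing Glottocode)
            groups[key].append(row["ID"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Made-up rows for illustration only.
rows = [
    {"ID": "1", "LanguageName": "Proto-Foo", "Glottocode": "abcd1234"},
    {"ID": "2", "LanguageName": "proto foo", "Glottocode": "abcd1234"},
    {"ID": "3", "LanguageName": "Proto-Bar", "Glottocode": ""},
]

dupes_by_name = find_duplicates(rows, lambda r: normalize_name(r["LanguageName"]))
dupes_by_code = find_duplicates(rows, lambda r: r.get("Glottocode", ""))
```

Keying on the Glottocode only works where every inventory has one, so the name-based key may still be needed as a fallback.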

Low-confidence

I think we need a special annotation for low-confidence segments, with two possibilities: low confidence in the phoneme as a distinctive unit, and low confidence in its phonological ID. This can be useful because if people think there is a distinctive phoneme but don't agree on its interpretation, we can use it for total inventory counts but might want to exclude it from other analyses. If people don't know if it is necessary at all, we can exclude it or include it, depending on whether we want to be conservative or not.
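
One possible way to encode the two kinds of low confidence, sketched under assumptions: the names Confidence, Segment, inventory_size and segments_for_analysis are hypothetical, and the example segments are placeholders rather than a real inventory.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    OK = "ok"
    LOW_IDENTITY = "low_identity"    # phoneme accepted, interpretation disputed
    LOW_EXISTENCE = "low_existence"  # unclear whether the phoneme is needed at all


@dataclass(frozen=True)
class Segment:
    symbol: str
    confidence: Confidence = Confidence.OK


def inventory_size(segments, conservative=True):
    """Total count: disputed-identity segments always count;
    doubtful-existence segments count only when not conservative."""
    return sum(
        1
        for s in segments
        if s.confidence is not Confidence.LOW_EXISTENCE or not conservative
    )


def segments_for_analysis(segments, conservative=True):
    """Segments usable for analyses that depend on the phonological ID."""
    keep = {Confidence.OK}
    if not conservative:
        keep.add(Confidence.LOW_EXISTENCE)
    return [s for s in segments if s.confidence in keep]


# Placeholder inventory for illustration only.
inventory = [
    Segment("p"),
    Segment("t"),
    Segment("H", Confidence.LOW_IDENTITY),
    Segment("q", Confidence.LOW_EXISTENCE),
]
```

With this split, a disputed segment still contributes to the total count but drops out of analyses that rely on its interpretation.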

Clean up directory

Make a docs folder for scratch material and move everything else there. The root folder should contain only:

  • bdproto.bib
  • bdproto-inventories.tsv
  • bdproto-metadata.tsv
  • README.md

New column for Type

This is to distinguish reconstructed from ancient languages, because we might want to exclude one type for some analyses.

Inventory types

Todo: automatically detect whether an inventory is all Cs or Vs.
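
A rough sketch of such a check, assuming segments are IPA strings and that a segment can be classed by its first base character once diacritics are decomposed. The vowel set below is an assumption and would need adjusting for sources that use, say, "y" for a glide.

```python
import unicodedata

# Assumed set of IPA vowel base characters; adjust to the transcription in use.
IPA_VOWELS = set("iyɨʉɯuɪʏʊeøɘɵɤoəɛœɜɞʌɔæɐaɶɑɒ")


def is_vowel(segment):
    """Class a segment by its first base character after NFD decomposition."""
    base = unicodedata.normalize("NFD", segment)[:1]
    return base in IPA_VOWELS


def inventory_type(segments):
    """Return 'vowels_only', 'consonants_only', 'mixed', or 'empty'."""
    flags = [is_vowel(s) for s in segments]
    if not flags:
        return "empty"
    if all(flags):
        return "vowels_only"
    if not any(flags):
        return "consonants_only"
    return "mixed"
```

Segments that start with a modifier letter (for example a preaspiration mark) are treated as consonants by this heuristic, so the result is best used as a flag for manual review.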

Duplicate columns?

What's the difference between LanguageFamily and Classification? Do we need both?

Add columns to metadata spreadsheet

  • a column in g-sheets that bins the proto-languages into rough age ranges
  • if we can't get complete glottocode coverage, we'll need to add some rough geo-coordinates
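
The age binning in the first point could be computed rather than entered by hand. This is a minimal sketch assuming time depth is stored as a number of years before present; the function name age_bin and the 2000-year bin width are arbitrary choices for illustration.

```python
def age_bin(years_bp, width=2000):
    """Map a time depth in years before present to a rough bin label."""
    if years_bp is None or years_bp < 0:
        return "unknown"
    lower = (int(years_bp) // width) * width
    return f"{lower}-{lower + width} BP"


# Example depths are arbitrary numbers, not dates from the database.
labels = [age_bin(depth) for depth in (500, 4500, None)]
```

Entries without a usable date fall into an explicit "unknown" bin so they stay visible in summaries.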

Delete inventories

There are a few inventories of proto-languages that are prob too controversial/fringe to keep in the db.

This isn't a final list, but right now I'd get rid of BDPROTO ID numbers:

1055 - Nostratic
1053 - Proto-Altaic -- does Robeets have a sound inventory yet?
1059 - Proto-Australian -- not sure but prob to be got rid of
1061 - Proto-Afroasiatic -- v skeptical of any proposed concrete inventory; Nichols says it looks like a pseudo-phylum.
148 - Proto-Nilo-Saharan - not sure about this but looks like the evidence is for the splitters.
97/1089 - Ob-Ugric is sketchy, but I will check out the whole Uralic story, so maybe leave this for now.
20 - Is Proto-TNG really a thing?
1114 - Uralo-Siberian
14 - Proto-Dene-Caucasian

2010 Kassite

There isn't a single connected text in Kassite, so the phonology might be extremely iffy. Suggest deleting.

Fields for BDPROTO

ID - We should merge them all, but later. UZ has 1-15, we started new ones from 16. Since we should go through the original data too, we can merge them once they've been vetted.
LanguageFamilyRoot/Family/Classification: we should drop all but Family unless there is a reason to keep them all. This should be taken automatically, if possible, from Glottolog.
TimeDepth etc - I think that we only need one field for this, and another for the reference source. As for how to proceed, I suggest we ask experts for a reliable ref or pc. I can handle this, as I bother people regularly by mail.
Homeland - same.
BibTexKey, FileName, Squib - why not a single entry?
LanguageCode - why do we need this if we have the Glottocode? Can we drop it?
Syllable structure - I don't think we'll have this for many languages, so I suggest we drop it, or otherwise have a fixed choice of data to be entered, otherwise the data is likely to be messy.
Region - how is this different from homeland? Can we drop it? If it is different, can we move it next to homeland?
Allophone - in many cases, some variation is given (g~gh), could this be used for this, and if so, how does one select the variant? Here the issue isn't really allophony, though, it's generally uncertainty as to which is to be reconstructed.
Another - we probably need a tag for marginal phonemes (i.e., phonemes in parentheses in the doculect).
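
For the last two points, one option is to parse the source notation into structured fields instead of choosing a variant up front. The sketch below assumes variants are separated by "~" and marginal phonemes are wrapped in parentheses, as in the examples above. ParsedSegment and parse_segment are hypothetical names, and the question of which variant to select is deliberately left open.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParsedSegment:
    variants: List[str]      # all alternatives given in the source, in order
    marginal: bool = False   # True if the source put the segment in parentheses


def parse_segment(raw):
    """Split 'g~gh' into variants and flag '(h)' as marginal."""
    text = raw.strip()
    marginal = text.startswith("(") and text.endswith(")")
    if marginal:
        text = text[1:-1]
    variants = [v.strip() for v in text.split("~") if v.strip()]
    return ParsedSegment(variants=variants, marginal=marginal)
```

Keeping every variant preserves the uncertainty in the source, so a selection rule can be applied later and changed without re-entering data.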

49 Central Chadic

No Glottocode, no node, check to see whether the classification is solid.

Duplicate?

Are 2013 and 2014 duplicates? Looks like the same source.

11/135 Tibeto-Burman

No Glottocode. Prob Glottolog doesn't believe this is a valid grouping, check it out.
