Comments (7)
I thought it should be possible on https://phoible.org/parameters to simply press a "download" button to get that table, but looking at the site now I don't see any download button (cc @xrotwang - am I just mistaken that there should be a download button for the full segments table?)
As for why the table in https://github.com/phoible/dev/blob/master/raw-data/FEATURES/phoible-segments-features.tsv has fewer segments than the table at https://phoible.org/parameters --- I'm not sure off the top of my head. The website should reflect the state of this repo as of 862bec9 (the 2.0 release tag) but if I look at that file from the 2.0 release commit it has 2163 lines, not 3183 like on the website. Maybe @bambooforest or @xrotwang have ideas? The files in https://github.com/clld/phoible/tree/master/phoible/static/data have suspicious-looking filenames/dates, making me wonder if the live data is in fact out of date?
from dev.
The process of feeding PHOIBLE 2.0 into the web app wasn't particularly streamlined :) This should be a lot simpler for PHOIBLE 3.0, I'd hope.
So, the data from https://github.com/phoible/dev/ was converted to a CLDF dataset using scripts in https://github.com/bambooforest/phoible-scripts . The process is described in https://github.com/bambooforest/phoible-scripts/blob/master/to_cldf/to_cldf.md and here we already see the figure 3,183 show up. This CLDF data then served as input to https://github.com/cldf-datasets/phoible/ , which basically copies the CLDF data but adds metadata, and which was eventually loaded into the web app database.
As far as I can tell, the primary data source in the phoible/dev repo is the RData object https://github.com/phoible/dev/blob/master/data/phoible.RData , but @bambooforest might know more about this.
> (cc @xrotwang - am I just mistaken that there should be a download button for the full segments table?)
I did away with the per-table download buttons when I moved to the new paradigm that clld apps only serve data from released CLDF datasets. Thus, rather than downloading individual collections of rows (filtered, sorted, or otherwise manipulated, and without any provenance information), users are encouraged to work from the full CLDF dataset, which includes metadata regarding provenance, etc.
I realize that the PHOIBLE app still advertises the per-table download feature, though. Should be changed (see clld/phoible#32).
@tang-kevin Looking at your particular example, maybe phoible-segments-features.tsv isn't supposed to be the full list of segments appearing in any inventories? Just grepping for the segment reveals that it appears elsewhere:
$ grep "ɚ" raw-data/*/*
raw-data/FEATURES/component-feature-table.csv:ɚ,025A,0,-,+,-,-,-,+,+,0,+,-,-,-,-,-,0,0,+,+,+,+,+,-,-,-,-,-,-,-,+,-,-,-,0,-,-,0
raw-data/UZ/UZ_inventories.tsv: "ɚː" "ɚː" """vowels are lowered and centralised before [ɹ] and many contrasts are lost"""
raw-data/UZ/UZ_inventories.tsv: "ɚ" "ɚ"
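The same check can be done for all segments at once rather than one grep at a time. Below is a minimal Python sketch, not anything from the PHOIBLE tooling: the helper names are hypothetical, the sample data is made up, and it assumes the feature table is tab-separated with a header row and the segment in its first column.

```python
import csv
import io


def first_column(text, delimiter="\t"):
    """Collect the first field of every data row (header row skipped)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(reader, None)  # skip the header row
    return {row[0] for row in reader if row}


def missing_segments(all_segments, feature_segments):
    """Segments seen in inventories that have no row in the feature table."""
    return sorted(set(all_segments) - set(feature_segments))


# Made-up stand-ins for the real files (assumption: layout as described above).
features_tsv = "segment\tsyllabic\nə\t+\na\t+\n"
inventory_segments = ["ə", "ɚ", "a", "ɚː"]

missing = missing_segments(inventory_segments, first_column(features_tsv))
print(missing)
```

With the real files, the text would come from reading the TSV and the inventory files from disk instead of the inline samples.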
@tang-kevin As far as I can tell, https://github.com/cldf-datasets/phoible/blob/v2.0.1/cldf/parameters.csv is exactly the complete list of all sounds encountered in any of the inventories covered in PHOIBLE.
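To confirm the count locally, something like the following Python sketch could be used. It is only an illustration: the column name `Name` is an assumption about the layout of parameters.csv, the helper name is hypothetical, and the inline sample is made up so the snippet runs without downloading anything.

```python
import csv
import io


def segment_names(csv_text, column="Name"):
    """Return the set of distinct values found in `column` of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column] for row in reader}


# Made-up sample standing in for cldf/parameters.csv (assumed columns: ID, Name).
sample = "ID,Name\nA1,ɚ\nA2,ə\nA3,ɚː\n"

names = segment_names(sample)
print(len(names))
```

For the real file, pass the contents of a local copy of parameters.csv (read with `encoding="utf-8"`) in place of `sample`.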
@xrotwang Thank you. It does appear to have all 3183 sounds! It solves my personal problem for sure.
I would suggest that the PHOIBLE website direct readers to this file instead of advertising a download button.