@DanielSWolf -- indeed. This is a problem that we are aware of (hence the "should"). The problem is also pervasive.
library(tidyverse)
# Load the PHOIBLE data from the dev repository.
df <- read_csv(url('https://raw.githubusercontent.com/phoible/dev/master/data/phoible.csv'))
# Keep Phoneme plus the feature columns, one row per distinct phoneme/feature-vector pair.
df <- df %>% select(7:48, -Allophones, -Source, -Marginal, -SegmentClass) %>% distinct()
# Drop tones, which have no feature specifications (no row-name round trip is needed for this).
df <- df %>% filter(tone != "+")
# Group on the full feature vector and list the phonemes that share it.
out <- df %>% group_by(tone, stress, syllabic, short, long, consonantal, sonorant, continuant, delayedRelease, approximant, tap, trill, nasal, lateral, labial, round, labiodental, coronal, anterior, distributed, strident, dorsal, high, low, front, back, tense, retractedTongueRoot, advancedTongueRoot, periodicGlottalSource, epilaryngealSource, spreadGlottis, constrictedGlottis, fortis, raisedLarynxEjective, loweredLarynxImplosive, click) %>%
summarize(phonemes = paste0(Phoneme, collapse = ', '), count = n()) %>% ungroup()
# Show only the feature vectors shared by more than one phoneme, largest groups first.
out %>% select(phonemes, count) %>% filter(count > 1) %>% arrange(desc(count))
phonemes count
<chr> <int>
1 t, t͉, t̠, t̺, t̟, d̥, t̪̺, d̺̥, t̺͉ 9
2 t̻s̻, t̪s̪, ts̪, t̪s, t̪̻s̪̻, t̟ʃ̟, ts̻, t̻s̪̻ 8
3 t̠ʃ, t̠ʃ͉, t̠͉ʃ, d̥ʒ̥, t̻ʃ̻, d̥ʒ̊, ʈ̻ʂ̻ 7
4 ts, t͉s, t̺s̺, t̟s̟, d̥z̥, d̺̥z̺̥, ts̺ 7
5 d̻z̻, d̪z̪, dz̪, d̪ʒ, d̟ʒ̟, d̪z, dz̻ 7
6 ʃ, ʃ͉, ʒ̊, s̠, s̺̠, s̻̠, ʂ̻ 7
There are several reasons for this, including but probably not limited to:
- no features for tones (that's why I filter them out above)
- some segments that different sources transcribe differently end up with identical feature vectors, e.g., ʃ vs ʒ̊
- some clicks are difficult to specify with the current feature set
- plain mistakes that we need to revisit
- the feature set itself requires some updates
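The collapsing of feature vectors can be reproduced on a small table without downloading the data. What follows is only a sketch: it assumes dplyr 1.0 or later (for `across()` and `.groups`), and the `toy` table with its three feature columns is made up for illustration, not copied from PHOIBLE. It groups on every column except `Phoneme` instead of naming each feature.

```r
library(dplyr)

# Toy feature table. The values are illustrative assumptions, not PHOIBLE data.
toy <- tibble(
  Phoneme               = c("ʃ", "ʒ̊", "ʒ", "s"),
  periodicGlottalSource = c("-", "-", "+", "-"),
  anterior              = c("-", "-", "-", "+"),
  strident              = c("+", "+", "+", "+")
)

# Group on all columns except Phoneme, then keep the feature vectors
# that are shared by more than one phoneme.
collisions <- toy %>%
  group_by(across(-Phoneme)) %>%
  summarize(
    phonemes = paste0(Phoneme, collapse = ", "),
    count = n(),
    .groups = "drop"
  ) %>%
  filter(count > 1)

print(collisions)
```

In this toy table only the voiceless postalveolar fricative and the devoiced voiced one share a vector, so `collisions` has a single row. Grouping with `across(-Phoneme)` also means the call does not need to change if the feature set is updated.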
@drammock anything else?
We should make this clearer in the FAQ and on the FEATURES page.