bambooforest commented on June 23, 2024

@DanielSWolf -- indeed. This is a problem that we are aware of (hence the "should"). The problem is also pervasive.

# Load the PHOIBLE data and keep one row per distinct phoneme / feature-vector pair
library(tidyverse)
df <- read_csv(url('https://raw.githubusercontent.com/phoible/dev/master/data/phoible.csv'))
df <- df %>% select(7:48, -Allophones, -Source, -Marginal, -SegmentClass) %>% distinct()
# Drop the tones, for which there are no features
df <- df %>% remove_rownames %>% column_to_rownames(var="Phoneme")
df <- df %>% filter(tone != "+")
df <- rownames_to_column(df, "Phoneme")
# Group by every feature column and list the phonemes that share each feature vector
out <- df %>% group_by(tone, stress, syllabic, short, long, consonantal, sonorant, continuant, delayedRelease, approximant, tap, trill, nasal, lateral, labial, round, labiodental, coronal, anterior, distributed, strident, dorsal, high, low, front, back, tense, retractedTongueRoot, advancedTongueRoot, periodicGlottalSource, epilaryngealSource, spreadGlottis, constrictedGlottis, fortis, raisedLarynxEjective, loweredLarynxImplosive, click) %>%
  summarize(phonemes = paste0(Phoneme, collapse = ', '), count = n()) %>% ungroup()
# Show only the feature vectors shared by more than one phoneme, largest groups first
out %>% select(phonemes, count) %>% filter(count > 1) %>% arrange(desc(count))

   phonemes                       count
   <chr>                          <int>
 1 t, t͉, t̠, t̺, t̟, d̥, t̪̺, d̺̥, t̺͉          9
 2 t̻s̻, t̪s̪, ts̪, t̪s, t̪̻s̪̻, t̟ʃ̟, ts̻, t̻s̪̻     8
 3 t̠ʃ, t̠ʃ͉, t̠͉ʃ, d̥ʒ̥, t̻ʃ̻, d̥ʒ̊, ʈ̻ʂ̻         7
 4 ts, t͉s, t̺s̺, t̟s̟, d̥z̥, d̺̥z̺̥, ts̺         7
 5 d̻z̻, d̪z̪, dz̪, d̪ʒ, d̟ʒ̟, d̪z, dz̻         7
 6 ʃ, ʃ͉, ʒ̊, s̠, s̺̠, s̻̠, ʂ̻                7

There are several reasons for this, including but probably not limited to:

  • no features for tones (that's why I filter them out above)
  • some phoneme specifications in different documents collapse the feature vectors across phonemes, e.g., ʃ vs ʒ̊
  • some clicks are difficult to specify with the current feature set
  • plain mistakes that we need to revisit
  • the feature set itself requires some updates
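
To make the collapsing concrete, here is a minimal sketch on a made-up toy table. The three feature columns and their values are assumptions chosen for illustration, not PHOIBLE's actual specifications. It shows how two segments that differ only in a diacritic end up in the same group once their feature vectors are identical.

```r
library(dplyr)

# Hypothetical toy feature table: the values are illustrative only
# and are NOT taken from PHOIBLE.
toy <- tibble::tribble(
  ~Phoneme, ~consonantal, ~continuant, ~periodicGlottalSource,
  "ʃ",      "+",          "+",         "-",
  "ʒ̊",      "+",          "+",         "-",
  "ʒ",      "+",          "+",         "+",
  "t",      "+",          "-",         "-"
)

# Group by every column except Phoneme, then keep only the feature
# vectors that are shared by more than one phoneme.
collisions <- toy %>%
  group_by(across(-Phoneme)) %>%
  summarize(phonemes = paste0(Phoneme, collapse = ", "),
            count = n(), .groups = "drop") %>%
  filter(count > 1)

print(collisions)
```

In this toy table the only colliding group is the one containing ʃ and ʒ̊, because nothing in the three assumed features separates a devoiced ʒ from a plain ʃ.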

@drammock anything else?

We should make this clearer in the FAQ and on the FEATURES page.
