
Comments (12)

LinguList commented on July 21, 2024

Okay, this is useful. I can later, when I find time, add the code to immediately compute lingpy evaluation scores (as we use them for the rest), as well as Nexus output.

PhyloStar commented on July 21, 2024

@LinguList
python3 online_pmi.py -i ../AutoCogPhylo/data/data-an-45-210.tsv --prune --eval -o data-an-45-210

The -o switch allows the program to output the judgments to a custom file. Otherwise, the judgments are written to a temp file.

I updated the online repo.

The ".cognates" file contains the judgments.

Did you find any difference in the scores I reported?

PhyloStar commented on July 21, 2024

OnlinePMI gives low results on the Pama-Nyungan dataset. I am looking into the reason for this.

LinguList commented on July 21, 2024

One ad-hoc explanation: in general, all algos have low precision, but the lexstat algos score higher. Online-PMI does not capture language-specific sound correspondences; it updates the scorer, but not for individual language pairs, right? This means those algos will be trapped in cases where lookalikes sound similar. For example, when languages have a simple sound structure with few distinct sounds, the algos may capture the major signal of similarity, but linguists who have investigated the correspondence patterns won't follow this and will label those cases as lookalikes.

In order to debug, it is useful to export the data to wordlist format and look at the differences in detail. This might easily tell you where the major problems lie (i.e., whether they are, as I suppose, due to a lack of pairwise correspondences).

With lingpy, you can inspect the data using the diff-command:

>>> from lingpy.evaluate.acd import diff
>>> diff(wordlist, 'cogid', 'yourid')

This will create a file in which you can see the concepts listed in a convenient way, also pointing to false positives and negatives...
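
To also get the evaluation scores, a minimal sketch along the same lines (assuming the judgments sit in a hypothetical file judgments.tsv with the gold cognates in a "cogid" column and the inferred ones in a "yourid" column):

    >>> from lingpy import Wordlist
    >>> from lingpy.evaluate.acd import bcubes, diff
    >>> wl = Wordlist('judgments.tsv')  # file name is a placeholder
    >>> p, r, f = bcubes(wl, 'cogid', 'yourid')  # B-cubed precision, recall, F-score
    >>> diff(wl, 'cogid', 'yourid')  # writes the per-concept diff file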

LinguList commented on July 21, 2024

In general, I can give an ad-hoc impression of the scores, judging from my experience with lexstat:

  • we have high coverage, so apart from ST, this should not be a problem this time
  • we have in part problematic orthographies (IE has some bugs, etc.)
  • we can see that the low precision for AA and ST results from partial cognates which are not properly handled in the gold standard (and our algos pick similarities that the scholars may ignore, due to morpheme boundaries)
  • where cognates are well-known and readily adjusted, we expect lexstat to have a high recall, which is the case for IE, but not for AN, where we know that many good cognates have often not been assigned (we reach higher scores for re-cleaned datasets where experts adjust the readings)
  • in PN, we may have to deal with instances of inconsistent cognate coding (inconsistent from the perspective of the algos, as I assume that Claire usually has an idea why she labels things cognate or not, but this may involve decisions based on morphology which are unpredictable unless you have an idea for some proto-language)
  • we can see that the selection of the data according to coverage changes the chances for lexstat: while our svm paper reports 0.8, we are now at 0.84, and both precision and recall have increased, but especially precision is much higher now

Differences may also result from the threshold selected, but I consider it best to stick to the thresholds we inferred from training in the PLOS paper. Otherwise, we would have to re-train on datasets with a similar number of languages, but these are not available so far.

PhyloStar commented on July 21, 2024

You are right that it does not update the scorer for language pairs. OnlinePMI only comes up with a single scorer for all languages in general; the scorer would have to be updated per language pair, and I don't know of any immediate solution to this problem. Even Gerhard's PMI matrix gives a similar F-score of 0.75 with InfoMap at a 0.5 threshold. I found a test set where OnlinePMI does not perform so well.
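
For the pairwise scorer, something like the following could serve as a starting point (a rough sketch; the data structures and the update routine are hypothetical, not the actual OnlinePMI code):

    from collections import defaultdict

    # one scorer per language pair instead of a single global one;
    # keys are (language1, language2) tuples, values map segment pairs to scores
    pair_scorers = defaultdict(lambda: defaultdict(float))

    def update_pair_scorer(lang1, lang2, segment_pairs, learning_rate=0.1):
        """Online update of a language-pair-specific PMI scorer.
        `segment_pairs` is a hypothetical list of (segment_a, segment_b,
        pmi_estimate) tuples from the current batch of alignments."""
        scorer = pair_scorers[lang1, lang2]
        for seg_a, seg_b, pmi in segment_pairs:
            # interpolate between the old and the new estimate, as in online PMI
            old = scorer[seg_a, seg_b]
            scorer[seg_a, seg_b] = (1 - learning_rate) * old + learning_rate * pmi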

PhyloStar commented on July 21, 2024

One thing about ST is that I got a 0.79 F-score with the previous dataset that Mei-Shin gave, using the SCA alphabet. Are you removing tonal information in the SCA transcription?

LinguList commented on July 21, 2024

Yes, for consistency: the transcriptions were so bad that we had things like "ka_⁵", which does not make any sense, so I decided to delete tonal information in general in all datasets. Also because this is fairer to SVM, as Gerhard's PMI usually does not have tones, right? So I deleted ALL morpheme boundaries, all tones, and all spaces (that is, word boundaries) to be as consistent as possible across all datasets.
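
The cleaning step amounts to something like this (a minimal sketch; the exact symbol sets here are assumptions and were more extensive in practice):

    # symbols to strip; both sets are illustrative, not the full inventories
    TONE_CHARS = set("⁰¹²³⁴⁵")      # superscript tone numbers
    BOUNDARY_CHARS = set("+_#· ")    # morpheme and word boundaries, spaces

    def clean_segments(segments):
        """Remove tones, morpheme boundaries, and spaces from a list of
        segment strings, so all datasets are coded consistently."""
        cleaned = []
        for seg in segments:
            seg = "".join(c for c in seg if c not in TONE_CHARS | BOUNDARY_CHARS)
            if seg:
                cleaned.append(seg)
        return cleaned

    print(clean_segments(["k", "a", "_", "⁵"]))  # -> ['k', 'a']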

PhyloStar commented on July 21, 2024

Okay, this is great. Gerhard's PMI does not have tones. Very nice.

I have an update about getting language pairwise scorer matrices starting from OnlinePMI. I implemented a LexStat version of Online PMI. The code is available here.

https://github.com/PhyloStar/OnlinePMI/blob/master/online_pmi_lexstat.py#L407

Here are the main results I got after running on some of the datasets; I put precision and recall in brackets.

Dataset        Online PMI F (P, R)        Online PMI + LexStat F (P, R)
ObUgrian       0.9166 (0.9571, 0.8793)    0.8708 (0.9870, 0.7791)
Japanese       0.9166 (0.9468, 0.888)     0.8618 (0.97, 0.7752)
st-64-110      0.5552 (0.746, 0.442)      0.4691 (0.8559, 0.3230)
Huon           0.8652 (0.8115, 0.9265)    0.8789 (0.9134, 0.8468)
Chinese_1964   0.7469 (0.6163, 0.9478)    0.8233 (0.7570, 0.9024)

As you mentioned, the precision is very high but the recall is low at the 0.5 threshold for InfoMap. I will add more results on PN, IE, AA, and AN soon.

LinguList commented on July 21, 2024

Cool. As a further idea worth reconsidering, here are two factors that lingpy uses to smooth the rigidity of pairwise sound correspondences:

  • use of a simpler sound class system that allows for more matches, along with position-based symbols (prosodic strings), which essentially enlarge the alphabet afterwards, but based on position rather than on how something sounds
  • use of a combined scoring function that retains a certain portion of the initial SCA score (or ASJP score), currently set to 1:2 (one point SCA score + two points pure correspondence probability); see the sketch after this list
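
In code, the combination amounts to something like this (a sketch of the idea, not lingpy's actual implementation):

    def combined_score(sca_score, correspondence_score, ratio=(1, 2)):
        """Combine the language-independent SCA (or ASJP) score with the
        inferred pairwise correspondence score in the 1:2 ratio mentioned
        above."""
        w_base, w_corr = ratio
        return (w_base * sca_score + w_corr * correspondence_score) / (w_base + w_corr)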

A general idea that would be really easy to implement for Online PMI is the following:

  • construct a new alphabet that is based on sound positions (i.e., instead of writing SCA sound classes + prosodic string, use one symbol only) and convert the data accordingly (this conversion happens before the computation anyway, so why not add another sound class model in which the position is accounted for?); see the sketch after this list
  • recompute as if nothing happened
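
A minimal sketch of the conversion, assuming a recent lingpy version (joining class and position with "." is just my convention here):

    from lingpy.sequence.sound_classes import tokens2class, prosodic_string

    def positional_alphabet(tokens):
        """Map each segment to a single symbol that combines its SCA sound
        class with its prosodic position, making position part of the alphabet."""
        classes = tokens2class(tokens, 'sca')
        positions = prosodic_string(tokens)
        return [c + '.' + p for c, p in zip(classes, positions)]

    # example: a segmented word as a list of IPA tokens
    print(positional_alphabet(['t', 'o', 'x', 't', 'ə']))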

The point I was actually not really making clear before (also to myself) is: we can easily deal with context, without the algos actually knowing about context, simply by changing the alphabets. This comes close to our other spin-off idea where we wanted to look into improved sound class systems. In fact, we should generalize this to improved sound class models (inferred from feature bundles or whatever) plus enhanced context models. The goal would be to have maximal predictability for regular words in a given language based on simple Markov chains.
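
For the predictability part, a toy bigram Markov model over the converted symbols might look like this (a sketch; none of this is implemented yet):

    from collections import Counter
    from math import log

    def train_bigram_model(words):
        """Count bigrams over symbol sequences (e.g. the output of a
        positional alphabet), padding with '#' as a word boundary."""
        bigrams, unigrams = Counter(), Counter()
        for word in words:
            padded = ['#'] + word + ['#']
            unigrams.update(padded[:-1])
            bigrams.update(zip(padded[:-1], padded[1:]))
        return bigrams, unigrams

    def log_predictability(word, bigrams, unigrams, smoothing=1.0):
        """Add-one-smoothed log-probability of a word under the model,
        a crude proxy for how 'regular' the word is in its language."""
        padded = ['#'] + word + ['#']
        vocab = len(unigrams) + 1
        return sum(
            log((bigrams[a, b] + smoothing) / (unigrams[a] + smoothing * vocab))
            for a, b in zip(padded[:-1], padded[1:])
        )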

PhyloStar commented on July 21, 2024
  1. I added a Chinese Restaurant Process to the lexstat calculation. Can you please test it on your side? (A bare-bones sketch of the CRP follows after this list.)
    python3 online_pmi_lexstat.py -i uniform_data/ob_ugrian.tsv.uniform --eval -c crp
    By default, I use infomap.
  2. My impression about weighting is that onlinePMI behaves the opposite of SCA when the algorithm uses all synonymous word pairs to compute the PMI matrix.
  3. About memory size for LexStat: is there a memory issue when storing scorers for each language pair? I am facing such an issue for the recent datasets on my laptop (8 GB RAM only). As of now, I am caching the alignments when calculating the language-pair correspondences on the fly.
  4. Is the new alphabet a mapping from a unique tuple of (SCA alphabet, prosodic string) --> something else? Just clarifying.
  5. Online PMI is performing like SCA. :)
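
To illustrate item 1, here is a bare-bones sketch of the CRP seating process itself (the actual code presumably conditions on the PMI similarities as well; this shows only the prior):

    import random

    def crp_partition(items, alpha=1.0, seed=42):
        """Chinese Restaurant Process: each item joins an existing cluster
        with probability proportional to its size, or opens a new cluster
        with probability proportional to alpha."""
        rng = random.Random(seed)
        clusters = []
        for item in items:
            weights = [len(c) for c in clusters] + [alpha]
            r = rng.uniform(0, sum(weights))
            acc = 0.0
            for i, w in enumerate(weights):
                acc += w
                if r <= acc:
                    break
            if i == len(clusters):
                clusters.append([item])   # open a new table
            else:
                clusters[i].append(item)  # join an existing table
        return clusters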

PhyloStar commented on July 21, 2024

I fixed the memory issue by caching alignments for only a single language pair at a time. The program now takes less than 1 GB of RAM, and it takes 10 minutes to calculate both PMI and lexstat scores for the st-64-110 dataset.
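
Schematically, the fix looks like this (a sketch; align_pair and update_scorer stand in for the real alignment and PMI-update routines):

    from itertools import combinations

    def pairwise_scorers(words_by_language, align_pair, update_scorer):
        """Build the alignment cache for one language pair at a time and
        let it be garbage-collected before moving on, keeping peak memory low."""
        scorers = {}
        for lang1, lang2 in combinations(sorted(words_by_language), 2):
            cache = {}  # alignments for this single pair only
            for i, w1 in enumerate(words_by_language[lang1]):
                for j, w2 in enumerate(words_by_language[lang2]):
                    cache[i, j] = align_pair(w1, w2)
            scorers[lang1, lang2] = update_scorer(cache)
            # `cache` is dropped here before the next pair is processed
        return scorers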

The other issue is the threshold for Online_LexStat. The precision is very high due to the current cutoff. I suppose I have to tune the threshold and also the weight for the base scorer.
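
For the tuning, a tiny grid search should do (a sketch; run_pipeline is a hypothetical callable returning the B-cubed F-score for a given setting):

    from itertools import product

    def tune(thresholds, base_weights, run_pipeline):
        """Exhaustive grid search over clustering threshold and base-scorer
        weight; returns the best setting and all scores."""
        scores = {
            (t, w): run_pipeline(threshold=t, base_weight=w)
            for t, w in product(thresholds, base_weights)
        }
        return max(scores, key=scores.get), scores

    # e.g.: tune([0.45, 0.5, 0.55, 0.6], [1, 2, 3], run_pipeline)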
