Turchin-based Nexus file for all the datasets. I think …


Turchin about autocogphylo HOT 11 CLOSED

phylostar commented on July 21, 2024

Turchin

from autocogphylo.

Comments (11)

LinguList commented on July 21, 2024

here's the code to run it:

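from lingpy import LexStat

files = ['ABVD_full.txt', 'IELex-2016.tsv']
for f in files[::-1]:  # reversed order: IELex first, then ABVD
    print(f)
    lex = LexStat('data/'+f, check=True, transcription='transcription')
    # cluster with the Turchin method and store the cognate IDs in column 'cogs'
    lex.cluster(method='turchin', ref='cogs')
    # write the Nexus file, named after the input file
    lex.output('paps.nex', filename=f, ref='cogs', missing='?')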

But I just checked: this yields some errors in both IELex and ABVD. Not many, but it may still be worth checking them or having them checked.


LinguList commented on July 21, 2024

here's the nexus for IELex:

IELex-2016.tsv.paps.nex.txt
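For anyone who has not worked with this kind of file: as far as I understand it, the "paps" Nexus holds a binary presence-absence matrix with one character per cognate set, and `missing='?'` marks concepts for which a language has no word. The sketch below only illustrates that idea and is not LingPy's implementation. The function name `presence_absence` and the toy data are made up, and it assumes at most one word per language and concept.

```python
def presence_absence(cognates, missing="?"):
    """Turn {(language, concept): cognate_id} into {language: character string}."""
    languages = sorted({lang for lang, _ in cognates})
    # One column per cognate set; keep the concept so gaps can be detected.
    columns = sorted({(concept, cog) for (_, concept), cog in cognates.items()})
    matrix = {}
    for lang in languages:
        row = []
        for concept, cog in columns:
            if (lang, concept) not in cognates:
                row.append(missing)   # no word for this concept at all
            elif cognates[lang, concept] == cog:
                row.append("1")       # the word belongs to this cognate set
            else:
                row.append("0")       # the word belongs to another set
        matrix[lang] = "".join(row)
    return matrix


# Hypothetical toy data: three languages, two concepts, four cognate sets.
toy = {
    ("A", "hand"): 1, ("B", "hand"): 1, ("C", "hand"): 2,
    ("A", "foot"): 3, ("B", "foot"): 4,
}
for language, characters in presence_absence(toy).items():
    print(language, characters)
```

Language C has no word for "foot", so both "foot" columns are `?` rather than `0`.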


LinguList commented on July 21, 2024

@PhyloStar, tell me how you'd prefer to handle this: should I submit simple scripts that take care of things, so you can call them from the command line, or rather just point to code solutions and let you do the lifting, so you can see that it is a more integrated framework?


PhyloStar commented on July 21, 2024

@LinguList Simple scripts might be sufficient. Running LexStat on the whole of ABVD might take some time.

I am using ASJP sound classes for running LDN and PMI. Do you think we need to stick to SCA or ASJP or test with both?


LinguList commented on July 21, 2024

Testing with both or with only one should not make a big difference, I'd say...


PhyloStar commented on July 21, 2024

Okay. Let's say I test LDN and PMI with ASJP. Using ASJP for those is part of the tradition. :) Then again, SCA + LexStat/Turchin is also part of the tradition. :)


PhyloStar commented on July 21, 2024

I get a memory error when running Turchin on the ABVD full dataset.


LinguList commented on July 21, 2024

In my case it worked without a problem, but took some time. But you can run your own Turchin: it is really simple and a linear algorithm, so there is no need to run it in LingPy. I'll add it later (I wanted to do that anyway). Remind me if I forget. It's a ten-liner, nothing more ;-)


LinguList commented on July 21, 2024

Here's how you compute Turchin:

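from lingpy import *  # as in the LexStat script above; assumed to provide Wordlist, iter_rows and tokens2class

dolgos = {}
wl = Wordlist('ABVDxxx.tsv')
for idx, tokens in iter_rows(wl, 'tokens'):
    # sound classes of the word, padded with 'H' in case there is only one consonant
    dolgo = tokens2class(tokens, 'dolgo') + ['H']
    # vowel-initial words get 'H' as their first consonant class
    if dolgo[0] == 'V':
        dolgo = ['H'] + dolgo
    # keep the first two non-vowel classes
    dstring = ''.join([d for d in dolgo if d != 'V'])[:2]
    # words match only if they share both the class string and the concept
    dolgos[idx] = dstring+'-'+wl[idx, 'concept']
# for a dict source, add_entries applies the function to each value, so the labels are stored as they are
wl.add_entries('cog', dolgos, lambda x: x)
wl.renumber('cog', 'turchinid')
wl.output('paps.nex', ref='turchinid', missing='?')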


LinguList commented on July 21, 2024

The basic idea is: just convert each word's segments to a two-letter Dolgopolsky class string (its first two consonant classes), and then use LingPy's functions for renumbering cognate sets to convert those strings to numbers. Much faster and easier than invoking the LexStat model.
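To make the idea concrete without LingPy, here is a minimal standalone sketch. It is not the code from the repository. The class table `TOY_CLASSES` is a made-up stand-in for the real Dolgopolsky model, mapping unknown segments to 'H' is an assumption, and the names `turchin_key` and `turchin_cluster` are hypothetical.

```python
# Toy sound-class table (hypothetical, NOT the real Dolgopolsky model).
# 'V' marks vowels; 'H' is used as the filler class, as in the snippet above.
TOY_CLASSES = {
    "p": "P", "b": "P", "f": "P",
    "t": "T", "d": "T",
    "k": "K", "g": "K",
    "s": "S", "z": "S",
    "m": "M", "n": "N", "r": "R", "l": "R",
    "a": "V", "e": "V", "i": "V", "o": "V", "u": "V",
}


def turchin_key(segments, classes=TOY_CLASSES):
    """Return the first two consonant classes of a segmented word."""
    # Assumption: segments missing from the table fall back to 'H'.
    codes = [classes.get(seg, "H") for seg in segments] + ["H"]
    if codes[0] == "V":
        codes = ["H"] + codes  # vowel-initial words get a placeholder onset
    return "".join(c for c in codes if c != "V")[:2]


def turchin_cluster(words):
    """Map word IDs to cognate IDs; words is an iterable of (id, concept, segments)."""
    label_ids = {}
    clusters = {}
    for word_id, concept, segments in words:
        label = (concept, turchin_key(segments))
        # Same concept and same key: reuse the ID, otherwise take the next free one.
        clusters[word_id] = label_ids.setdefault(label, len(label_ids) + 1)
    return clusters


# Hypothetical toy wordlist.
words = [
    (1, "father", list("pater")),
    (2, "father", list("fadar")),
    (3, "father", list("atta")),
    (4, "foot", list("pod")),
    (5, "foot", list("fut")),
]
print(turchin_cluster(words))
```

Each word is visited once and matched through a dictionary lookup, which is why the procedure is linear in the number of words and needs no pairwise comparison.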


LinguList commented on July 21, 2024

Just added the code as "turchin.py". Note that this is much faster and closer to the original description.

