Comments (11)
Here's the code to run it:
from lingpy import *

files = ['ABVD_full.txt', 'IELex-2016.tsv']
# process the files in reverse order, so IELex comes first
for f in files[::-1]:
    print(f)
    # check=True asks LexStat to check the input for errors first
    lex = LexStat('data/'+f, check=True, transcription='transcription')
    lex.cluster(method='turchin', ref='cogs')
    lex.output('paps.nex', filename=f, ref='cogs', missing='?')
But I just checked: this yields some errors in both IELex and ABVD. Not many, but it may still be useful to check them or have them checked.
from autocogphylo.
here's the nexus for IELex:
@PhyloStar, tell me how you prefer to handle this: should I submit simple scripts that take care of things and that you call from the command line, or should I just point to code solutions and have you do the lifting, so you can see that it is a more integrated framework?
@LinguList Simple scripts might be sufficient. Running LexStat on the whole of ABVD might take some time.
I am using ASJP sound classes for running LDN and PMI. Do you think we need to stick to SCA or ASJP or test with both?
Testing with both or with only one should not make a big difference, I'd say...
Okay. Let's say I test LDN and PMI with ASJP. Using ASJP is part of the tradition. :) And SCA+LexStat/Turchin is part of the tradition as well. :)
I get a memory error when running Turchin on the ABVD full dataset.
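Here's how you compute Turchin:
from lingpy import Wordlist, tokens2class

dolgos = {}
wl = Wordlist('ABVDxxx.tsv')
for idx, tokens in wl.iter_rows('tokens'):
    # convert segments to Dolgopolsky classes; 'H' pads one-consonant words
    dolgo = tokens2class(tokens, 'dolgo') + ['H']
    # prepend 'H' when the word starts with a vowel
    if dolgo[0] == 'V':
        dolgo = ['H'] + dolgo
    # drop the vowels and keep the first two consonant classes
    dstring = ''.join([d for d in dolgo if d != 'V'])[:2]
    dolgos[idx] = dstring+'-'+wl[idx, 'concept']
# the function is applied to the dictionary values, so identity is enough
wl.add_entries('cog', dolgos, lambda x: x)
wl.renumber('cog', 'turchinid')
wl.output('paps.nex', ref='turchinid', missing='?')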
The basic idea is: convert the segments of each word to a two-letter string of Dolgopolsky sound classes, and then use lingpy's functions for renumbering cognate sets to convert these strings to numbers. This is much faster and easier than invoking the LexStat model.
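To make the idea concrete without lingpy, here is a minimal sketch in plain Python. It assumes the words are already converted to lists of Dolgopolsky class symbols; the toy class lists below are written by hand and are not lingpy output, and the helper names `turchin_key` and `renumber` are made up for illustration.

```python
def turchin_key(classes, concept):
    """Build a Turchin-style key from a list of sound-class symbols.

    `classes` is assumed to be a list of Dolgopolsky class symbols,
    with 'V' standing for any vowel.
    """
    # pad with 'H' so that one-consonant words still yield two symbols
    padded = list(classes) + ['H']
    # prepend 'H' when the word starts with a vowel
    if padded[0] == 'V':
        padded = ['H'] + padded
    # drop the vowels and keep the first two consonant classes
    consonants = [c for c in padded if c != 'V']
    return ''.join(consonants[:2]) + '-' + concept


def renumber(keys):
    """Map each distinct key to an integer ID, in order of first appearance."""
    ids = {}
    return {idx: ids.setdefault(key, len(ids) + 1) for idx, key in keys.items()}


# hand-written toy data: word ID -> (class symbols, concept)
words = {
    1: (['P', 'V', 'T', 'V'], 'hand'),
    2: (['P', 'V', 'T'], 'hand'),
    3: (['K', 'V'], 'hand'),
    4: (['V', 'T'], 'eye'),
}
keys = {idx: turchin_key(cls, concept) for idx, (cls, concept) in words.items()}
cogids = renumber(keys)
print(keys)
print(cogids)
```

Words 1 and 2 share the same first two consonant classes and the same concept, so they end up in the same cognate set; the concept suffix keeps words for different meanings apart.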
Just added the code "turchin.py". Note that this is much faster and closer to the original description.