Comments (26)
I'm still working through the blog post. Being able to create these datasets on my own would be a useful skill, but I have only just started, so I'll see how it goes and post an update here as soon as I can (probably still today). Thanks for the great support!
Edit: Connected issues:
from streitberggothic.
Closing this, because the code is running without error now.
from streitberggothic.
Still getting the same error. If I run
cd C:\Users\Viktor\OneDrive\PhD\concepticon
concepticon --repos=C:\Users\Viktor\OneDrive\PhD\concepticon\concepticon-data map_concepts C:\Users\Viktor\OneDrive\Git\streitberggothic\raw\Van_Loon-2004-3659.tsv
the following error is written to test.tsv:
On Windows you must specify an output file since printing to the terminal may not work
--
usage: concepticon map_concepts [-h] [--reference-list REFLIST]
[--full-search] [--language LANGUAGE]
[--skip_multimatch] [--output OUTPUT]
CONCEPTLIST
Attempt an automatic mapping for a new concept list.
--
Notes
-----
In order for the automatic mapping to work, the new list has to be
well-formed, i.e. in line with the requirments of Concepticon
(GLOSS/ENGLISH column, see also CONTRIBUTING.md).
positional arguments:
CONCEPTLIST Path to (or ID of) concept list in TSV format
options:
-h, --help show this help message and exit
--reference-list REFLIST
Another concept list to be used as reference for the
gloss mapping (default: None)
--full-search select between approximate search (default) and full
search (default: False)
--language LANGUAGE specify your desired language for mapping (default:
en)
--skip_multimatch
--output OUTPUT specify output file (default: None)
from streitberggothic.
So would you not just have to specify an output file via --output ?
from streitberggothic.
Yes, now it worked, thanks :) I had tried the --output flag earlier but had put it in the wrong place: concepticon --repos --output map_concepts instead of concepticon --repos map_concepts --output
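For anyone hitting the same thing: the usage message above looks like argparse output, and with argparse subcommands an option that belongs to the subcommand has to come after the subcommand name. Below is a minimal sketch of that behaviour. The parser is a hypothetical toy (toycli), not the real concepticon parser, and only the option names are borrowed from the help text above.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for a CLI with a global option and one subcommand;
    # this is NOT the real concepticon parser.
    parser = argparse.ArgumentParser(prog="toycli")
    parser.add_argument("--repos", help="global option, given before the subcommand")
    sub = parser.add_subparsers(dest="command", required=True)
    mapper = sub.add_parser("map_concepts")
    mapper.add_argument("conceptlist")
    mapper.add_argument("--output", default=None, help="option of the subcommand")
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects the arguments."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints a usage message and exits on invalid arguments.
        return None


# Subcommand option placed after the subcommand: accepted.
good = try_parse(["--repos", "data", "map_concepts", "list.tsv", "--output", "out.tsv"])

# Subcommand option placed before the subcommand: rejected by the top-level parser.
bad = try_parse(["--repos", "data", "--output", "out.tsv", "map_concepts", "list.tsv"])
```

In this toy setup the top-level parser only knows --repos, so --output is only understood once the subcommand has been named.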
I have now uploaded concepts.tsv, but I don't know how to resolve the conflicts. For the multiple mappings, I cannot tell which one is correct, so should I pick a random one in those cases? And do I really have to delete all the ones with question marks? I can see that some of the meanings look nonsensical (like Aai -> ???), but others are perfectly fine and simply weren't recognised (like Nachkomme -> ???). Do I have to add those manually? See concepticon/concepticon-data#1157
from streitberggothic.
Still experimenting. Current update:
C:\Users\Viktor\OneDrive\Git\streitberggothic>cldfbench lexibank.makecldf lexibank_streitgerggothic.py --concepticon=C:\Users\Viktor\OneDrive\PhD\lexibank\concepticon\concepticon\concepticon-data --glottolog=C:\Users\Viktor\OneDrive\PhD\lexibank\glottolog --clts=C:\Users\Viktor\OneDrive\PhD\lexibank\clts --concepticon-version=v2.5.0 --glottolog-version=v4.5 --clts-version=v2.2.0
Traceback (most recent call last):
  File "C:\Users\Viktor\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Viktor\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Viktor\AppData\Local\Programs\Python\Python310\Scripts\cldfbench.exe\__main__.py", line 7, in <module>
  File "C:\Users\Viktor\AppData\Local\Programs\Python\Python310\lib\site-packages\cldfbench\__main__.py", line 68, in main
    stack.enter_context(
  File "C:\Users\Viktor\AppData\Local\Programs\Python\Python310\lib\contextlib.py", line 492, in enter_context
    result = _cm_type.__enter__(cm)
  File "C:\Users\Viktor\AppData\Local\Programs\Python\Python310\lib\site-packages\cldfcatalog\catalog.py", line 69, in __enter__
    self.checkout(self.tag)
  File "C:\Users\Viktor\AppData\Local\Programs\Python\Python310\lib\site-packages\cldfcatalog\repository.py", line 105, in checkout
    return self.repo.git.checkout(spec)
  File "C:\Users\Viktor\AppData\Local\Programs\Python\Python310\lib\site-packages\git\cmd.py", line 638, in <lambda>
    return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
  File "C:\Users\Viktor\AppData\Local\Programs\Python\Python310\lib\site-packages\git\cmd.py", line 1183, in _call_process
    return self.execute(call, **exec_kwargs)
  File "C:\Users\Viktor\AppData\Local\Programs\Python\Python310\lib\site-packages\git\cmd.py", line 983, in execute
    raise GitCommandError(redacted_command, status, stderr_value, stdout_value)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(1)
cmdline: git checkout v4.5
stderr: 'error: Your local changes to the following files would be overwritten by checkout:
languoids/tree/atla1278/volt1241/kwav1236/nato1234/lele1262/likp1239/sekp1241/md.ini
languoids/tree/atla1278/volt1241/nort3149/buak1234/adam1257/goul1243/goul1244/zank1234/kula1285/fani1244/md.ini
languoids/tree/aust1307/mala1545/cele1242/grea1299/east2488/sout2928/bung1268/east2489/east2490/baho1237/md.ini
languoids/tree/aust1307/mala1545/cele1242/grea1299/east2488/sout2928/muna1246/nucl1573/muni1256/buso1238/md.ini
languoids/tree/aust1307/mala1545/cele1242/grea1299/tomi1242/sout2925/damp1237/md.ini
languoids/tree/aust1307/mala1545/cele1242/kail1255/nort2898/kail1254/comm1248/bara1371/md.ini
languoids/tree/aust1307/mala1545/cent2237/cent2245/cent2254/east2466/east2741/seti1249/beng1287/md.ini
languoids/tree/aust1307/mala1545/cent2237/east2712/ocea1241/admi1239/east2459/manu1262/east2460/koro1314/bowa1234/papi1254/md.ini
languoids/tree/aust1307/mala1545/cent2237/east2712/ocea1241/admi1239/east2459/sout2879/loup1244/balu1257/md.ini
languoids/tree/aust1307/mala1545/cent2237/east2712/ocea1241/sout3173/newc1243/main1286/sout3313/extr1245/nume1242/md.ini
languoids/tree/aust1307/mala1545/cent2237/east2712/ocea1241/west2818/nort3206/sarm1241/sarm1242/anus1238/anus1237/md.ini
languoids/tree/aust1307/mala1545/cent2237/east2712/sout2850/sout3229/bomb1263/bedo1237/md.ini
languoids/tree/aust1307/mala1545/cent2237/east2712/sout2850/sout3229/morm1235/md.ini
languoids/tree/aust1307/mala1545/cent2237/east2712/sout2850/sout3229/raja1255/ambe1249/biga1238/md.ini
languoids/tree/aust1307/mala1545/cent2237/east2712/sout2850/sout3229/raja1255/asss1237/md.ini
languoids/tree/aust1307/mala1545/cham1312/md.ini
languoids/tree/aust1307/mala1545/nort3238/meso1254/sout2905/md.ini
languoids/tree/aust1307/mala1545/nort3253/sara1342/puna1279/puna1280/aput1239/sian1255/md.ini
languoids/tree/aust1307/mala1545/sout2923/nort2894/pitu1237/dakk1238/md.ini
languoids/tree/aust1307/mala1545/sout2923/ramp1244/bada1260/beso1237/md.ini
languoids/tree/aust1307/mala1545/sout2923/ramp1244/seko1241/pana1302/budo1241/md.ini
languoids/tree/coch1271/yuma1250/gene1244/paii1252/paip1241/md.ini
languoids/tree/geel1240/bura1294/bura1276/md.ini
languoids/tree/geel1240/demi1242/md.ini
languoids/tree/inan1242/duri1243/md.ini
languoids/tree/indo1319/clas1257/indo1320/indo1321/indo1324/chit1278/kala1372/md.ini
languoids/tree/iroq1247/nort2947/moha1257/moha1258/md.ini
languoids/tree/japo1237/ryuk1243/ryuk1244/miya1259/md.ini
languoids/tree/kawe1237/nort1506/qawa1238/md.ini
languoids/tree/lake1255/east2500/tawo1244/md.ini
languoids/tree/lake1255/tari1255/east2502/dout1239/dout1240/md.ini
languoids/tree/nakh1245/dagh1238/avar1255/andi1254/botl1243/botl1242/md.ini
languoids/tree/pama1250/dese1234/ngum1251/ngum1256/jaru1256/jaru1254/md.ini
languoids/tree/sali1297/sali1298/md.ini
languoids/tree/sape1238/md.ini
languoids/tree/sino1245/kuki1245/kuki1246/oldk1252/chir1298/chir1283/md.ini
languoids/tree/sino1245/mish1241/idum1241/md.ini
languoids/tree/taik1256/kamt1241/daic1238/daic1237/cent2251/wenm1239/sapa1255/sout3184/sout2743/blac1269/thai1259/md.ini
languoids/tree/taik1256/kamt1241/daic1238/daic1237/cent2251/wenm1239/sapa1255/sout3184/sout2743/shan1276/assa1264/aito1238/md.ini
languoids/tree/taik1256/kamt1241/daic1238/daic1237/cent2251/wenm1239/sapa1255/sout3184/sout2743/shan1276/assa1264/kham1290/md.ini
languoids/tree/toro1256/tora1268/coas1312/bone1255/md.ini
languoids/tree/tupi1275/mawe1252/awet1245/tupi1276/tupi1281/waya1271/zoee1241/zoee1240/md.ini
languoids/tree/tuuu1241/kwii1241/nuuu1241/md.ini
languoids/tree/ural1272/saam1281/west2390/cent2240/nort2671/md.ini
languoids/tree/west2604/kara1499/md.ini
Please commit your changes or stash them before you switch branches.
Aborting'
and if I rerun it I get:
warning: templates not found in C:/Users/Viktor/.git-template
error: remote origin already exists.
Everything up-to-date
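The git message above says the checkout of v4.5 was refused because tracked files in the local Glottolog clone have uncommitted changes. A minimal sketch for listing such files before running cldfbench is given below. It assumes git is on the PATH; the function names parse_porcelain and dirty_paths are made up for this illustration and are not part of cldfbench or cldfcatalog.

```python
import subprocess


def parse_porcelain(output):
    """Return paths of tracked files with uncommitted changes.

    `output` is the text printed by `git status --porcelain` (version 1 format):
    two status characters, a space, then the path.
    """
    paths = []
    for line in output.splitlines():
        if len(line) < 4:
            continue
        status, path = line[:2], line[3:]
        if status == "??":
            # Untracked files are ignored here; the error above concerns tracked files.
            continue
        paths.append(path)
    return paths


def dirty_paths(repo_dir):
    """Run `git status --porcelain` in `repo_dir` and return the changed tracked paths."""
    result = subprocess.run(
        ["git", "-C", str(repo_dir), "status", "--porcelain"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_porcelain(result.stdout)
```

If dirty_paths(...) returns a non-empty list for the clone passed via --glottolog, the changes would need to be committed, stashed, or discarded before the versioned checkout can succeed, as the git message itself suggests.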
from streitberggothic.
Running cldfbench lexibank.makecldf lexibank_streitberggothic.py --concepticon=C:\Users\Viktor\OneDrive\PhD\lexibank\concepticon\concepticon\concepticon-data --glottolog=C:\Users\Viktor\OneDrive\PhD\lexibank\glottolog --clts=C:\Users\Viktor\OneDrive\PhD\lexibank\clts --concepticon-version=v2.5.0 --glottolog-version=v4.5 --clts-version=v2.2.0
I'm getting the output:
ERROR:
Invalid dataset spec: <lexibank.dataset> lexibank_streitberggothic.py
This is the Python script that seems to cause the error, but I'm not quite sure how.
Maybe I should also mention that I IPA-transcribed and cleaned the data during preprocessing with this script from this XML file (cf. the HTML version)
Update: I'm currently reading through the cldfbench documentation to understand the mechanics, and I got an error during the tutorial: cldf/cldfbench#73
from streitberggothic.
Update: Managed to push the repo via GitHub Desktop. I assume the validation part with the yml-file is not part of my todo-list anymore, right? Otherwise next up for me would be:
- taking out df.explode from xml2csv.py since pylexibank covers that part
- adding parameters.csv
- adding replacements through form_spec
- transcribing to IPA
- creating an orthography profile, improving transcriptions, checking the data
- maybe try pysem one day to add links to concepticon
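To make the "replacements through form_spec" item more concrete, here is a rough plain-Python sketch of the kind of cleaning meant: apply string replacements, split a cell on separators, and optionally keep only the first form. This only illustrates the idea and is not pylexibank's implementation; the function clean_form and the sample strings are hypothetical.

```python
def clean_form(raw, replacements=(), separators=",", first_form_only=True):
    """Illustrative approximation of form cleaning; NOT pylexibank's own code.

    replacements: iterable of (old, new) string pairs applied in order.
    separators: string whose characters each act as a form separator.
    """
    for old, new in replacements:
        raw = raw.replace(old, new)
    forms = [raw]
    for sep in separators:
        # Split every chunk found so far on the current separator.
        forms = [part for chunk in forms for part in chunk.split(sep)]
    # Drop surrounding whitespace and empty leftovers.
    forms = [form.strip() for form in forms if form.strip()]
    return forms[:1] if first_form_only else forms


# Placeholder strings, not real data from the dataset.
first_only = clean_form("*abc, def", replacements=[("*", "")])
all_forms = clean_form("abc; def,ghi", separators=",;", first_form_only=False)
```

Doing this through the dataset's form specification rather than in the preprocessing script keeps the raw data untouched and the cleaning steps visible in one place.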
from streitberggothic.
Nice. For the orthoprofile, you may not need to create IPA first, but can use the orthoprofile to convert to IPA.
from streitberggothic.
If you check this website, which provides a JS implementation of orthoprofiles: https://digling.org/calc/profile/ you will find that we already have one Gothic profile there. You could compare it with your data. The profiles are here: https://github.com/orthograpy/orthograpy
from streitberggothic.
I got stuck again while creating parameters.csv. My first idea was to simply modify the current script by adding args.writer.objects['ParameterTable'].append({...}) to the loop, but that gave me the error pycldf.dataset.SchemaError: 'Dataset has no table "ParameterTable"'. So I decided to abandon the cldfbench tutorial and try the pylexibank one from the blog post again. But add_languages() doesn't work: it just gets stuck and throws no error. This is the corresponding script:
import pathlib

from pylexibank import Dataset as BaseDataset
from pylexibank import Language


class CustomLanguage(Language):
    pass


class Dataset(BaseDataset):
    dir = pathlib.Path(__file__).parent
    id = "streitberggothic"
    language_class = CustomLanguage

    def cmd_makecldf(self, args):
        # add bib
        args.writer.add_sources()
        args.log.info("added sources")
        # add language
        print(self.languages)
        args.writer.add_languages()
        print("hi")
It does write sources.bib correctly to the cldf folder and prints INFO running _cmd_makecldf on streitberggothic ... INFO added sources to the console. Then it prints [OrderedDict([('ID', 'goth1244'), ('Name', 'Gothic'), ('Macroarea', 'Eurasia'), ('Latitude', '46.9304'), ('Longitude', '29.9786'), ('Glottocode', 'goth1244'), ('ISO639P3code', 'got'), ('Countries', 'UA'), ('Family_ID', 'indo1319'), ('Language_ID', '0')])] and then it gets stuck, i.e. "hi" never gets printed.
I tried looking into the source code to understand what exactly is happening, but it's difficult for me to follow, and I guess that's not the intended way to fix this anyway, so I thought I'd write yet another comment here. From what I see, add_languages() is a loop that applies add_language() to multiple languages. Since I have only one language in my data, I thought add_language() might work instead, but then I got the following error: ValueError: invalid CLDF identifier LanguageTable-ID:. I found the line in the source code that triggers this but can't figure out what is happening. I also tried keeping the wrapper from the tutorial around class CustomLanguage, and I also tried deleting class CustomLanguage altogether, but it all leads to the same result. Sorry for writing so many comments, and thanks for bearing with me.
from streitberggothic.
from clldutils.misc import slug

concepts = {}
for i, concept in enumerate(self.raw_dir.read_csv("concepts.tsv", delimiter="\t", dicts=True)):
    idx = str(i+1)+"_"+slug(concept["gloss"])
    args.writer.add_concept(ID=idx, Name=concept["gloss"])
    concepts[concept["gloss"]] = idx
The dictionary concepts maps each gloss to its concept ID, so you can look up the respective ID and pass it as Parameter_ID to args.writer.add_form or args.writer.add_forms_with_segments.
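As a rough, library-free sketch of that lookup: the gloss-to-ID dictionary is built once and then consulted for every form row. The helper simple_slug is a hypothetical stand-in for clldutils.misc.slug (its output may differ for non-ASCII input), and the rows are placeholders, not real data.

```python
import re


def simple_slug(text):
    # Hypothetical stand-in for clldutils.misc.slug: lowercase, ASCII letters and digits only.
    return re.sub(r"[^a-z0-9]", "", text.lower())


def build_concept_ids(glosses):
    """Map each gloss to an ID of the form '<running number>_<slug>'."""
    return {gloss: f"{i + 1}_{simple_slug(gloss)}" for i, gloss in enumerate(glosses)}


def attach_parameter_ids(rows, concepts):
    """Return copies of the form rows with Parameter_ID added.

    Rows whose gloss is not in `concepts` are skipped in this sketch.
    """
    linked = []
    for row in rows:
        parameter_id = concepts.get(row["gloss"])
        if parameter_id is None:
            continue
        linked.append({**row, "Parameter_ID": parameter_id})
    return linked


concepts = build_concept_ids(["the hand", "water"])
rows = [
    {"gloss": "water", "form": "form_a"},    # placeholder form
    {"gloss": "unknown", "form": "form_b"},  # gloss without a concept entry
]
linked = attach_parameter_ids(rows, concepts)
```

In the actual dataset script the linked Parameter_ID would be passed on to the writer call instead of being collected in a list.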
from streitberggothic.
Adding the concepts (= parameters) should be done with the lexibank commands, that is, add_concept (as is also done in the example file for Vietic languages).
from streitberggothic.
I pushed the current state of affairs to this repository. The good news is that "added concept" does get printed to my console now, which is an improvement, but then again only empty files with headers get written to the folder cldf if I run the following:
import pathlib

from clldutils.misc import slug
from pylexibank import Dataset as BaseDataset
from pylexibank import Language
from pylexibank import FormSpec
import attr


class CustomLanguage(Language):
    pass


class Dataset(BaseDataset):
    dir = pathlib.Path(__file__).parent
    id = "streitberggothic"
    language_class = CustomLanguage
    form_spec = FormSpec(separators=",", first_form_only=True)

    def cmd_makecldf(self, args):
        # add bib
        args.writer.add_sources()
        args.log.info("added sources")
        # add concept
        concepts = {}
        for i, concept in enumerate(self.concepts):
            idx = f"{i}_{slug(concept['sense'])}"
            concepts[concept["sense"]] = idx
            args.writer.add_concept(
                ID=idx,
                Name=concept["sense"]
            )
        args.log.info("added concepts")
I wish I could run the entire script as shown in the tutorial, but adding just one line, namely args.writer.add_languages(), already causes it to get stuck without throwing an error, which makes debugging hard for me. I tried to describe the details as well as I could in the previous comment. I followed the instructions of the blog post as closely as possible, only making changes where my own data differs from the tutorial data, for example omitting the column "source" in languages.tsv. I still have the feeling it must be something trivial that I'm overlooking, but I've been playing around with this the whole day and just can't find out what it is.
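For a call that hangs without raising anything, one generic option is Python's standard faulthandler module, which can dump the current stack of all threads after a timeout and so shows where the program is stuck. The sketch below is a general debugging aid, not something taken from the tutorial; the helper name hang_watchdog is made up.

```python
import faulthandler
import sys
import time
from contextlib import contextmanager


@contextmanager
def hang_watchdog(seconds, stream=None):
    """Dump the tracebacks of all threads every `seconds` while the block is running.

    `stream` must be a real file object (it needs a file descriptor);
    it defaults to sys.stderr.
    """
    if stream is None:
        stream = sys.stderr
    faulthandler.dump_traceback_later(seconds, repeat=True, file=stream)
    start = time.perf_counter()
    try:
        yield
    finally:
        # Stop the watchdog and report how long the block took.
        faulthandler.cancel_dump_traceback_later()
        elapsed = time.perf_counter() - start
        print(f"block finished after {elapsed:.1f}s", file=stream)


# Possible use inside cmd_makecldf (commented out because it needs pylexibank):
# with hang_watchdog(30):
#     args.writer.add_languages()
```

If the wrapped call is still running after the timeout, the dumped traceback names the file and line currently being executed, which narrows down the slow or blocking step without stepping through the library source by hand.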
from streitberggothic.
You are missing the decorator on your language class:
@attr.s
class CustomLanguage
But maybe uncomment it and also uncomment the assignment in the Dataset?
from streitberggothic.
"I also tried to keep the wrapper from the tutorial around class CustomLanguage"
Sorry, I meant to say decorator. Adding the decorator doesn't fix the problem. And I left it out on purpose because my data has no extra column "Sources" to add, unlike the Vietic one.
"But maybe uncomment it and also uncomment the assignment in the Dataset?"
Ah, I think you're looking at the cldfbench_streitberggothic.py script. I'm sorry, that's my bad: I had left it in the folder, and it's from the cldfbench tutorial, where it's still commented out. Currently I am running the other script, lexibank_streitberggothic.py
Looks like the problem has something to do with line 279 in cldf.py:
if (not getattr(self.args, 'dev', False)) and 'Glottocode' in kw \
        and hasattr(self.args, 'glottolog') \
        and kw['Glottocode'] in self.args.glottolog.api.cached_languoids:
This if clause never gets entered, and execution doesn't reach the return statement that comes after it. I'm still in the process of figuring out why.
Update: It takes 6 minutes to reach the return statement after this if clause, and the condition itself still evaluates to false. The files written to the cldf folder are still empty.
from streitberggothic.
Yes yes, I'm doing all of those: "language_class = CustomLanguage" is uncommented, the decorator is added, I'm following the script for Vietic as closely as possible. But the if-clause in line 279 in cldf.py takes 6-7 minutes to run on my computer - even if I truncate my data - which I guess shouldn't be the case. Anyways, I'm still in the process of playing around and figuring out why exactly this happens
from streitberggothic.
Ahh okay this explains a lot, thanks :)
Update: the conversion seems to be correct now and the new repo has been pushed 🎉
from streitberggothic.
Okay, now I finished the Vietic tutorial from A-Z and I think the repository is ready. I assume there's still room for improvement, as always, e.g. the CLTS-validation-badge looks a bit sad, so ideas and suggestions for improvement are very welcome; but in general I think that's it, right? Because if yes then I could slowly start writing the blog entry and I'll also keep an eye on pysem, so that I can add the more eclectic links to concepticon once that part is ready :) Thank you again for the great tutorial and all the help and patience with my questions.
from streitberggothic.
Did you push your changes? Then I'll review them in the next few days.
from streitberggothic.
Yes, I pushed them. Thank you.
from streitberggothic.
Ah, sorry, I only realised now that the pysem tutorial has already come out. So I'll tackle that next!
from streitberggothic.
Alright, I added the links to concepticon - 31% is the max that I could squeeze out without manual curation, which is quite good I think. Also improved the orthography by the procedure suggested in #3. pep8 stylechecked all scripts. I think the repo should be ready.
from streitberggothic.