I imagine this has to do with the new tag parsing system. I noticed this mainly be

Text in brackets being removed from Edict definitions about spark-reader HOT 7 CLOSED

laurensweyn commented on June 6, 2024

Text in brackets being removed from Edict definitions

from spark-reader.

Comments (7)

wareya commented on June 6, 2024

I think some words had blank definitions when I tried to look them up because of this. Yeah, it's annoying: you can't tell whether something in brackets is a tag, a restriction to a particular spelling/reading, or just a note without having the code check each possibility one at a time, and I never got around to doing that because the code had already gotten a little messy.
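As a rough illustration of the "check one at a time" idea, here is a minimal sketch in Java. Everything in it is an assumption for illustration: `BracketClassifier`, its `KNOWN_TAGS` sample, and the rule that a restriction is a comma-separated list of the entry's own spellings/readings are hypothetical and not taken from spark-reader's code or the EDICT format documentation.

```java
import java.util.Set;

// Hypothetical helper, not part of spark-reader: classifies the text found
// inside one pair of brackets in an EDICT-style definition.
final class BracketClassifier {
    enum Kind { TAG, RESTRICTION, NOTE }

    // Assumed small sample of tags; a real tag list would be much longer.
    private static final Set<String> KNOWN_TAGS =
            Set.of("n", "adj-i", "v1", "v5r", "vs", "uk", "exp", "adv", "P");

    // content: the text between the brackets, without the brackets themselves.
    // entryForms: every spelling and reading of the entry being parsed.
    static Kind classify(String content, Set<String> entryForms) {
        boolean allTags = true;
        boolean allForms = true;
        for (String raw : content.split(",")) {
            String part = raw.trim();
            if (!KNOWN_TAGS.contains(part)) allTags = false;
            if (!entryForms.contains(part)) allForms = false;
        }
        if (allTags) return Kind.TAG;          // check 1: every item is a known tag
        if (allForms) return Kind.RESTRICTION; // check 2: every item is a spelling/reading
        return Kind.NOTE;                      // fallback: keep the text in the definition
    }
}
```

The point of the fallback is that anything not positively identified as a tag or a restriction stays in the definition text, so an unrecognised bracket is never silently dropped.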


LaurensWeyn commented on June 6, 2024

Working on the JMDict conversion, I'm surprised I didn't notice the missing brackets sooner; there's a lot of useful information there.

This also broke line splitting in definition export, and stripped all bracketed text from the user dictionary file after saving. Instead of trying to fix it, I decided to get going on the JMDict parser.

I've made a new branch for this since it changes a lot of stuff (is it a good idea to upload nearly 100MB files to GitHub? Probably not...). It has all the metadata loaded and stored properly this time, though it's not all in use yet. I have yet to port things like your FrequencySink, though.


wareya commented on June 6, 2024

Oh wow, I had no idea it broke so much stuff. Sorry about that.

The FrequencySink stuff just tries to find a valid combination of spelling and reading (katakana) in the frequency data sink. It's a simple idea, but the code is gross. I can fix it once you decide the JMDict functionality is ready.
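For anyone following along, the lookup described above could be sketched roughly like this in Java. This is not the actual FrequencySink code: `FrequencyLookup`, its methods, and the tab-separated key layout are hypothetical, and it assumes the frequency data is keyed by spelling plus katakana reading.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical stand-in for the frequency data, not the real FrequencySink.
final class FrequencyLookup {
    // Assumed key layout: spelling + TAB + katakana reading -> frequency rank.
    private final Map<String, Integer> ranks = new HashMap<>();

    void add(String spelling, String katakanaReading, int rank) {
        ranks.put(spelling + '\t' + katakanaReading, rank);
    }

    // Converts hiragana to katakana; all other characters pass through unchanged.
    static String toKatakana(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Hiragana U+3041..U+3096 sit exactly 0x60 below their katakana forms.
            sb.append(c >= 0x3041 && c <= 0x3096 ? (char) (c + 0x60) : c);
        }
        return sb.toString();
    }

    // Tries every spelling/reading pair and returns the first rank found.
    OptionalInt find(List<String> spellings, List<String> readings) {
        for (String spelling : spellings) {
            for (String reading : readings) {
                Integer rank = ranks.get(spelling + '\t' + toKatakana(reading));
                if (rank != null) return OptionalInt.of(rank);
            }
        }
        return OptionalInt.empty();
    }
}
```

Returning the first match keeps the sketch short; a real implementation might prefer the best rank among all valid pairs instead.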


LaurensWeyn commented on June 6, 2024

Don't worry, it's not too big a deal. It's mostly my fault for being too lazy to set up tests for anything except the simplest of things.


wareya commented on June 6, 2024

Regression testing is hard.


wareya commented on June 6, 2024

I think there should be an option for the definition export feature to prefer exporting kanji. This was one of the ideas behind associating individual spellings and readings with definitions. It can be done once the JMDict functionality is all done, since you could then get the most preferred (first) kanji that is valid for a given reading when the word was written in kana. With the old EDict functionality there's no way to make sure the exporter is only looking at kanji that are okay for that reading, since parsing the brackets turned out to break things.
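A minimal sketch of that export preference, in Java. The names here are hypothetical (`ExportSpellingPicker`, `preferredSpelling`), and the sketch assumes the dictionary can supply, for each reading, the kanji spellings valid for it in dictionary order; it is not how spark-reader actually stores entries.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of spark-reader.
final class ExportSpellingPicker {
    // wordAsWritten: the word as it appeared in the text (possibly kana only).
    // validKanjiByReading: assumed map from a reading to the kanji spellings
    // allowed for that reading, most preferred first.
    static String preferredSpelling(String wordAsWritten,
                                    Map<String, List<String>> validKanjiByReading) {
        List<String> candidates = validKanjiByReading.get(wordAsWritten);
        if (candidates == null || candidates.isEmpty()) {
            // Not a known reading, or no kanji is valid for it: export as written.
            return wordAsWritten;
        }
        return candidates.get(0); // first entry = most preferred kanji
    }
}
```

Because the map only lists kanji valid for each reading, the exporter can never pick a spelling that does not match how the word was read.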


LaurensWeyn commented on June 6, 2024

I think the JMDict implementation is fairly stable now and ready for the master branch. The biggest issue with it right now is the relevance/sorting system, which needs some tweaking. I could've emulated the EDict2 'P' tag approach but I went for a new scoring system that should be better in the long run.

With that, this bug is mostly fixed, except for my frustrations with the user dictionary editor not updating internally, or not saving to a file, or both. On the bright side, this has given me motivation to get working on that VNDB importer, which I hope to start on this Wednesday or sooner.

