Comments (7)

wareya commented on June 6, 2024

I think some word had blank definitions when I tried to look it up because of this. Yeah, it's annoying: you can't tell whether something in brackets is a tag, a restriction to a particular spelling/reading, or just a note without having the code check each possibility one at a time, and I never got around to doing that because the code had already gotten a little messy.
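A minimal sketch of that one-at-a-time check, under loud assumptions: the tag set below is a tiny hypothetical subset, restrictions are assumed to be written as a known spelling or reading optionally followed by "only", and the names `KNOWN_TAGS`, `classify_bracket`, and `classify_all` are made up for illustration rather than taken from the project.

```python
import re

# Hypothetical subset of tags; a real dictionary defines many more.
KNOWN_TAGS = {"n", "adj-i", "v1", "v5r", "uk", "P"}

# Matches one non-nested "(...)" group and captures its contents.
BRACKET = re.compile(r"\(([^()]*)\)")


def classify_bracket(content, spellings):
    """Return 'tag', 'restriction', or 'note' for one bracketed chunk."""
    parts = [p.strip() for p in content.split(",")]
    # Check 1: every comma-separated part is a known tag.
    if all(p in KNOWN_TAGS for p in parts):
        return "tag"
    # Check 2 (assumed convention): the chunk lists known spellings or
    # readings of this entry, optionally followed by the word "only".
    body = content[: -len("only")].strip() if content.endswith("only") else content
    names = [p.strip() for p in body.split(",")]
    if names and all(n in spellings for n in names):
        return "restriction"
    # Otherwise treat it as a free-form note.
    return "note"


def classify_all(definition, spellings):
    """Classify every bracketed chunk in a definition string, in order."""
    return [
        (m.group(1), classify_bracket(m.group(1), spellings))
        for m in BRACKET.finditer(definition)
    ]
```

The order of the checks matters here: a chunk only counts as a restriction if it names something the entry actually contains, so anything unrecognized falls through to a note instead of being dropped.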

from spark-reader.

LaurensWeyn commented on June 6, 2024

Working on the JMDict conversion, I'm surprised I didn't notice the missing brackets sooner; there's a lot of useful information there.

This also broke line splitting in definition export, as well as all bracketed text in the user dictionary file after saving. Instead of trying to fix it, I decided to get going on the JMDict parser.

I've made a new branch for this since it changes a lot of stuff (is it a good idea to upload nearly 100 MB files to GitHub? Probably not...). It has all the metadata loaded and stored properly this time, though not all of it is in use yet. I have yet to port things like your FrequencySink, though.


wareya commented on June 6, 2024

Oh wow, I had no idea it broke so much stuff. Sorry about that.

The FrequencySink stuff just tries to find a valid combination of spelling and reading (katakana) in the frequency data sink. It's a simple idea, but the code is gross. I can fix it once you decide the JMDict functionality is ready.
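Roughly, the idea looks like the sketch below. This is not the actual code: it assumes the frequency data is a plain dict keyed by `(spelling, katakana_reading)`, and `to_katakana` and `find_frequency` are hypothetical names.

```python
def to_katakana(text):
    """Convert hiragana to katakana; other characters pass through unchanged."""
    # Hiragana U+3041..U+3096 maps onto katakana by adding 0x60.
    return "".join(
        chr(ord(c) + 0x60) if "\u3041" <= c <= "\u3096" else c for c in text
    )


def find_frequency(spellings, readings, freq_table):
    """Return (spelling, katakana_reading, entry) for the first pair found.

    freq_table is assumed to map (spelling, katakana_reading) to whatever
    frequency record the data provides. Returns None if no pair matches.
    """
    for spelling in spellings:
        for reading in readings:
            key = (spelling, to_katakana(reading))
            if key in freq_table:
                return spelling, key[1], freq_table[key]
    return None
```

Spellings are tried in their listed order, so an earlier (more preferred) spelling wins when several combinations exist in the data.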


LaurensWeyn commented on June 6, 2024

Don't worry, it's not too big a deal. It's mostly my fault for being too lazy to set up tests for anything except the simplest of things.


wareya commented on June 6, 2024

Regression testing is hard.


wareya commented on June 6, 2024

I think there should be an option for the definition export feature to prefer exporting kanji. This was one of the ideas behind associating individual spellings and readings with definitions. It can be done once the JMDict functionality is all done, since you could then get the most preferred (first) kanji that is valid for a given reading when the word was written in kana. With the old EDict functionality there's no way to make sure the exporter is only looking at kanji that are okay for that reading, since parsing the brackets turned out to break things.
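As a rough sketch of that lookup, assuming each reading carries an optional list of the spellings it is restricted to (similar in spirit to JMdict's `re_restr` element) and that spellings are stored most preferred first. The `Reading` and `Entry` classes and `preferred_kanji` are hypothetical names, not the project's actual types.

```python
from dataclasses import dataclass, field


@dataclass
class Reading:
    kana: str
    # Spellings this reading is valid for; empty means "all spellings".
    restricted_to: list = field(default_factory=list)


@dataclass
class Entry:
    kanji: list     # spellings, assumed ordered most preferred first
    readings: list  # list of Reading


def preferred_kanji(entry, kana):
    """Return the first kanji spelling valid for the given kana reading."""
    for reading in entry.readings:
        if reading.kana != kana:
            continue
        for spelling in entry.kanji:
            if not reading.restricted_to or spelling in reading.restricted_to:
                return spelling
    # Unknown reading, or an entry with no kanji spelling at all.
    return None
```

When this returns `None`, an exporter would presumably fall back to the kana the word was written in.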


LaurensWeyn commented on June 6, 2024

I think the JMDict implementation is fairly stable now and ready for the master branch. The biggest issue with it right now is the relevance/sorting system, which needs some tweaking. I could've emulated the EDict2 'P' tag approach but I went for a new scoring system that should be better in the long run.

With that, this bug is mostly fixed, except for my frustrations with the user dictionary editor not updating internally, or not saving to a file, or both. On the bright side, this has given me motivation to get working on that VNDB importer, which I should hopefully start on this Wednesday or sooner.

