Comments (7)
I think some words had blank definitions when I tried to look them up because of this. Yeah, it's annoying: you can't tell whether something in brackets is a tag, a restriction to a particular spelling/reading, or just a note without having the code check each case one at a time, and I didn't bother doing that because it was already getting a little messy.
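To illustrate the one-at-a-time check, here is a minimal sketch in Python. It assumes that a bracket group counts as a tag when every comma-separated part is in a known tag list, as a sense number when it is all digits, as a restriction when every part is one of the entry's own spellings or readings, and as a note otherwise. `KNOWN_TAGS` and `classify_brackets` are hypothetical names, and the tag set is only a small sample, not the full EDict list.

```python
import re

# Hypothetical sample of EDict-style tags; the real list is much longer.
KNOWN_TAGS = {"n", "vs", "v5r", "vt", "uk", "P", "adj-na", "exp"}

# Matches one non-nested parenthesised group and captures its contents.
BRACKET = re.compile(r"\(([^()]*)\)")


def classify_brackets(gloss, spellings, readings, known_tags=KNOWN_TAGS):
    """Label each parenthesised group in a gloss as tag, sense, restriction or note."""
    forms = set(spellings) | set(readings)
    labels = []
    for match in BRACKET.finditer(gloss):
        content = match.group(1)
        parts = [part.strip() for part in content.split(",")]
        if all(part in known_tags for part in parts):
            kind = "tag"          # e.g. (n,vs)
        elif content.isdigit():
            kind = "sense"        # e.g. (1)
        elif all(part in forms for part in parts):
            kind = "restriction"  # group names one of this entry's own forms
        else:
            kind = "note"         # free-form text such as (esp. of letters)
        labels.append((content, kind))
    return labels


print(classify_brackets("(n,vs) (1) (書く) writing (esp. of letters)",
                        ["書く", "描く"], ["かく"]))
```

Checking the entry's own forms before falling back to "note" is what keeps a restriction like (書く) from being mistaken for a gloss comment. The trade-off is that any note that happens to match a spelling would be misread, which is part of why the bracket format is awkward to parse.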
from spark-reader.
While working on the JMDict conversion, I'm surprised I didn't notice the missing brackets sooner; there's a lot of useful information in them.
This also broke line splitting in the definition export, as well as all bracketed text in the user dictionary file after saving. Instead of trying to fix it, I decided to get going on the JMDict parser.
I've made a new branch for this since it changes a lot of stuff (is it a good idea to upload nearly 100 MB files to GitHub? Probably not...). It also loads and stores all the metadata properly this time, though not all of it is in use yet. I have yet to port things like your FrequencySink, though.
from spark-reader.
Oh wow, I had no idea it broke so much stuff. Sorry about that.
The FrequencySink code just tries to find a valid combination of spelling and reading (in katakana) in the frequency data. It's a simple idea, but the code is gross. I can fix it once you decide the JMDict functionality is ready.
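The lookup described above can be sketched as follows. This assumes the frequency data is a plain dict keyed by `(spelling, katakana reading)` pairs, and that readings are converted from hiragana by shifting code points U+3041 to U+3096 up by 0x60 into the katakana block. `to_katakana` and `find_frequency` are hypothetical names, not spark-reader's actual FrequencySink API.

```python
def to_katakana(text):
    """Shift hiragana (U+3041..U+3096) into the katakana block; leave other characters alone."""
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


def find_frequency(frequencies, spellings, readings):
    """Return the frequency of the first (spelling, katakana reading) pair present, else None.

    `frequencies` is assumed to be a dict mapping (spelling, katakana_reading) -> int.
    Spellings and readings are tried in the order given, so earlier (preferred) forms win.
    """
    for spelling in spellings:
        for reading in readings:
            key = (spelling, to_katakana(reading))
            if key in frequencies:
                return frequencies[key]
    return None


freqs = {("食べる", "タベル"): 1200}
print(find_frequency(freqs, ["食べる"], ["たべる"]))
```

Iterating spellings in the outer loop means a match on the preferred spelling is found before any match on a rarer one.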
from spark-reader.
Don't worry, it's not too big a deal. It's mostly my own fault for being too lazy to set up tests for anything except the simplest things.
from spark-reader.
Regression testing is hard.
from spark-reader.
I think there should be an option for the definition export feature to prefer exporting kanji. This was one of the ideas behind associating individual spellings and readings with definitions. This can be done once the JMDict functionality is finished: if the word was written in kana, you could get the exact most preferred (first) kanji spelling that is valid for that reading. With the old EDict functionality there's no way to make sure the exporter only considers kanji spellings that are valid for that reading, since parsing the brackets turned out to break things.
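A minimal sketch of picking the first valid kanji for a reading, assuming the JMDict semantics of `<re_restr>` (the reading applies only to the listed kanji spellings; no restrictions means it applies to all) and `<re_nokanji/>` (the reading is not a true reading of any kanji spelling). It also assumes the kanji spellings are already in the entry's preferred order, as the comment above suggests. `Reading` and `preferred_kanji` are hypothetical names, not spark-reader classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Reading:
    text: str
    # Values of <re_restr>; an empty list means the reading applies to every spelling.
    restrictions: List[str] = field(default_factory=list)
    # True when the entry marks this reading with <re_nokanji/>.
    no_kanji: bool = False


def preferred_kanji(kanji_spellings: List[str], reading: Reading) -> Optional[str]:
    """Return the first kanji spelling valid for `reading`, or None if there is none."""
    if reading.no_kanji:
        return None
    for spelling in kanji_spellings:  # assumed to be in preference order
        if not reading.restrictions or spelling in reading.restrictions:
            return spelling
    return None


spellings = ["相方", "合方"]
print(preferred_kanji(spellings, Reading("あいかた", ["合方"])))
```

An exporter built on this would fall back to the kana form whenever `preferred_kanji` returns None.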
from spark-reader.
I think the JMDict implementation is fairly stable now and ready for the master branch. Its biggest issue right now is the relevance/sorting system, which needs some tweaking. I could have emulated the EDict2 'P' tag approach, but I went with a new scoring system that should be better in the long run.
With that, this bug is mostly fixed, apart from my frustrations with the user dictionary editor not updating internally, not saving to a file, or both. On the bright side, this has motivated me to get working on that VNDB importer, which I hope to start this Wednesday or sooner.
from spark-reader.