Comments (4)
Thanks, Cutano. I am running both people and company names through it, so it is hitting other cases that are not fixed yet, but that fix did work for a lot of the names in the batch I was testing on.
Below are a few more that are failing. It is not all of them, but I hope it is enough to give a good idea of the issue. The first fix seems to have reduced the errors by over 50%.
日本コンクリート工業(株)
東京コスモス電機(株)
日本アジア投資(株)
日精エー・エス・ビー機械(株)
(株)三ッ星
三ッ矢産業
三ツ川工業所
三ツ川浩一
白神自然学校一ッ森校
from kawazu.
I think I located the issue. Hopefully this makes sense, since I can't read or write Japanese myself. The Utilities.GetTextType method returns the wrong result in cases where a kana character is actually treated as a kanji character, as in the name "袖ケ浦港運". In that case the "ケ" should have been treated as kanji and Utilities.GetTextType should have returned PureKanji, but it returns KanjiKanaMixed instead, which then breaks the conversion. If you force it to PureKanji, the conversion seems to work.
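To make the classification problem concrete, here is a minimal Python sketch (not Kawazu's actual C# code). The names `PureKanji` and `KanjiKanaMixed` come from this thread; `PURE_KANA`, `OTHERS`, the `KANJI_LIKE_KANA` set, and the "kana sandwiched between two kanji" rule are assumptions made only for illustration.

```python
from enum import Enum


class TextType(Enum):
    PURE_KANJI = "PureKanji"
    KANJI_KANA_MIXED = "KanjiKanaMixed"
    PURE_KANA = "PureKana"   # hypothetical name
    OTHERS = "Others"        # hypothetical name


# Assumption: kana that can act as part of a kanji compound in names.
KANJI_LIKE_KANA = frozenset({"ケ", "ヶ"})


def is_kanji(ch: str) -> bool:
    # CJK Unified Ideographs block only; enough for this sketch.
    return "\u4e00" <= ch <= "\u9fff"


def is_kana(ch: str) -> bool:
    # Hiragana and katakana blocks.
    return "\u3040" <= ch <= "\u30ff"


def get_text_type(text: str, kanji_like: frozenset = frozenset()) -> TextType:
    """Classify text; kana in `kanji_like` count as kanji when between two kanji."""
    has_kanji = has_kana = False
    for i, ch in enumerate(text):
        if is_kanji(ch):
            has_kanji = True
        elif is_kana(ch):
            between_kanji = (
                0 < i < len(text) - 1
                and is_kanji(text[i - 1])
                and is_kanji(text[i + 1])
            )
            if ch in kanji_like and between_kanji:
                has_kanji = True
            else:
                has_kana = True
        else:
            return TextType.OTHERS
    if has_kanji and has_kana:
        return TextType.KANJI_KANA_MIXED
    if has_kanji:
        return TextType.PURE_KANJI
    if has_kana:
        return TextType.PURE_KANA
    return TextType.OTHERS


# Plain classification reproduces the reported behaviour.
print(get_text_type("袖ケ浦港運"))
# Treating the sandwiched ケ as kanji gives the result the conversion needs.
print(get_text_type("袖ケ浦港運", KANJI_LIKE_KANA))
```

The positional check keeps ordinary katakana words such as "ケーキ" classified as kana, so the special case only applies inside a kanji compound.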
Yes, I checked the call stack and found that the problem is caused by the ambiguity of the kana "ケ". Normally it is pronounced "ke", but in the example you gave it is read "ga" (袖ケ浦 is "Sodegaura"), which directly causes the mismatch in the IndexOf() call.
I'm working on this problem, but I don't yet know how to solve it in a clean way.
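The lookup failure can be illustrated with a short Python sketch. This is not Kawazu's code: the `align` helper is hypothetical, the readings are hand-written assumptions rather than dictionary output, and `str.find` stands in for `IndexOf()` (both return -1 when the substring is absent).

```python
def is_katakana(ch: str) -> bool:
    return "\u30a0" <= ch <= "\u30ff"


def align(surface: str, reading: str) -> list:
    """Split a word into (surface, reading) pairs around its first katakana.

    If that kana does not occur in the reading, keep the word as a single
    unit instead of slicing with an invalid index.
    """
    for i, ch in enumerate(surface):
        if is_katakana(ch):
            idx = reading.find(ch)  # -1 when the kana is read differently
            if idx == -1:
                return [(surface, reading)]
            parts = [
                (surface[:i], reading[:idx]),
                (ch, ch),
                (surface[i + 1:], reading[idx + 1:]),
            ]
            # Drop empty pieces, e.g. when the kana is the first character.
            return [p for p in parts if p[0]]
    return [(surface, reading)]


# ツ appears in the reading as-is, so the split works.
print(align("三ツ矢", "ミツヤ"))
# ケ is not in the reading, so the lookup misses; fall back to one unit.
print(align("袖ケ浦", "ソデガウラ"))
```

Checking for -1 before slicing avoids the out-of-range style failures, at the cost of coarser output for words like 袖ケ浦.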
I updated the NuGet package and worked around the problem for now by filtering out the ケ-related results, but there could be other problems. This is only a temporary solution.