cutano / kawazu
A C# library for converting Japanese sentences to Hiragana, Katakana or Romaji, with furigana and okurigana modes supported. Inspired by the Kuroshiro project.
License: MIT License
Hi,
First, thanks for the great work on Kawazu, it is very helpful!
But sometimes it throws an exception.
Steps to reproduce:
var converter = new KawazuConverter();
var inputH = converter.Convert("だれでうどうんづしますか", To.Hiragana, Mode.Normal).Result;
Stacktrace:
at Kawazu.Division..ctor(MeCabIpaDicNode node, TextType type, RomajiSystem system)
at Kawazu.KawazuConverter.<>c__DisplayClass6_0.<Convert>b__1(MeCabIpaDicNode node)
at System.Linq.Enumerable.SelectArrayIterator`2.ToList()
at Kawazu.KawazuConverter.<>c__DisplayClass6_0.<Convert>b__0()
at System.Threading.Tasks.Task`1.InnerInvoke()
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Kawazu.KawazuConverter.<Convert>d__6.MoveNext()
The current version of Kawazu works only with .NET Core 1-3. I tested .NET 5, and it works well only if I copy matrix.bin and the other such files to the build output folder.
So it should be easy to support .NET 5 by modifying the csproj file.
BTW, it would be helpful if you could also support the older .NET Framework 4+. Many apps still run on .NET Framework, but I'm not sure whether Kawazu can run on .NET Framework 4 without code changes; I haven't tested it yet.
I could be doing things incorrectly, but I am basically trying to do 2 things given almost any input in a Japanese text field.
The Romaji conversion throws a heap of errors most of the time, like the one below from a unit test.
[Theory]
[InlineData("袖ケ浦港運", "Sodegaura-kō un")]
public async Task RomajiTransliterationTest(string input, string expectedOutput)
{
    KawazuConverter converter = new();
    var output = await converter.Convert(input, To.Romaji, Mode.Okurigana, RomajiSystem.Hepburn);
    Assert.Equal(expectedOutput, output);
}
System.ArgumentOutOfRangeException: Length cannot be less than zero. (Parameter 'length')
at System.Text.StringBuilder.ToString(Int32 startIndex, Int32 length)
at Kawazu.Division..ctor(MeCabIpaDicNode node, TextType type, RomajiSystem system)
at Kawazu.KawazuConverter.<>c__DisplayClass6_0.<Convert>b__1(MeCabIpaDicNode node)
at System.Linq.Enumerable.SelectArrayIterator`2.ToList()
at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
at Kawazu.KawazuConverter.<>c__DisplayClass6_0.<Convert>b__0()
at System.Threading.Tasks.Task`1.InnerInvoke()
at System.Threading.Tasks.Task.<>c.<.cctor>b__277_0(Object obj)
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
Hi,
Not sure if this is a Kawazu or LibNMeCab issue, but the kanji conversion ignores irregular readings.
For example, when converting 300 written as 三百, the output is さんひゃく (sanhyaku), but the correct reading is さんびゃく (sanbyaku).
Same for 600 and 900.
Currently working on a workaround on my fork: lasyan3@156bf7e
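One possible shape for such a workaround is a post-processing pass over the converted hiragana. The sketch below is only an illustration and is not taken from the fork mentioned above: `NumberReadingFixer` and its table are hypothetical names, and it assumes the converter emits the regular concatenated reading (for example さんひゃく) that then needs patching.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Kawazu or LibNMeCab.
// Patches irregular number readings in an already converted hiragana string.
public static class NumberReadingFixer
{
    // Regular concatenated reading -> irregular reading.
    // Only two illustrative entries; a real table would need more.
    private static readonly Dictionary<string, string> Irregular =
        new Dictionary<string, string>
        {
            { "さんひゃく", "さんびゃく" }, // 300
            { "ろくひゃく", "ろっぴゃく" }, // 600
        };

    public static string Fix(string hiragana)
    {
        if (hiragana == null) throw new ArgumentNullException(nameof(hiragana));

        // Plain substring replacement: simple, but it cannot tell a number
        // from an unrelated word that happens to contain the same kana.
        foreach (var pair in Irregular)
        {
            hiragana = hiragana.Replace(pair.Key, pair.Value);
        }
        return hiragana;
    }
}
```

Calling `NumberReadingFixer.Fix` on the converter output would rewrite the two listed readings and leave everything else untouched. Fixing the reading at the token level, before the strings are joined, would be more robust than patching the final text.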
Hi Cutano. This is more of a request than an issue. Would you add a PartsOfSpeech property to Division.cs? I believe you can pull it from MeCabIpaDicNode.
https://github.com/Cutano/Kawazu/blob/master/Kawazu/Division.cs#L131-L154
This fails to correctly segment the readings for words that have kana, rather than kanji, in the middle. E.g. 言い方 and 当たり前 both fail.
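One way to handle kana in the middle of a word is to turn the surface form into a regular expression, where each run of kanji becomes a capture group and each run of kana is matched literally, and then match it against the whole reading. The sketch below is an assumption about how this could be done, not the code at the link above: `ReadingAligner` is a hypothetical name, both the kana in the surface form and the reading are assumed to be hiragana, and kanji detection covers only the basic CJK block plus 々.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Kawazu.
public static class ReadingAligner
{
    // Basic CJK Unified Ideographs block plus the iteration mark.
    private static bool IsKanji(char c) =>
        (c >= '\u4E00' && c <= '\u9FFF') || c == '々';

    // Splits the surface into alternating kanji / kana runs and pairs each
    // run with its part of the reading. Returns null if alignment fails.
    public static List<(string Surface, string Reading)> Align(string surface, string reading)
    {
        var chunks = new List<string>();
        var pattern = new StringBuilder("^");

        int i = 0;
        while (i < surface.Length)
        {
            int start = i;
            bool kanji = IsKanji(surface[i]);
            while (i < surface.Length && IsKanji(surface[i]) == kanji) i++;

            string chunk = surface.Substring(start, i - start);
            chunks.Add(chunk);
            // Kanji run: unknown reading, captured lazily.
            // Kana run: must appear verbatim in the reading.
            pattern.Append(kanji ? "(.+?)" : "(" + Regex.Escape(chunk) + ")");
        }
        pattern.Append("$");

        Match match = Regex.Match(reading, pattern.ToString());
        if (!match.Success) return null;

        var result = new List<(string Surface, string Reading)>();
        for (int g = 0; g < chunks.Count; g++)
        {
            result.Add((chunks[g], match.Groups[g + 1].Value));
        }
        return result;
    }
}
```

For 言い方 with the reading いいかた, this pairs 言 with い, the literal い with い, and 方 with かた. Lazy matching picks the shortest reading for each kanji run, so it can still choose the wrong split when the same kana occurs both inside a kanji reading and as okurigana.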
Hi, thanks for your nice work, but I'd like to get the pronunciations one by one.
Take "今晩" as an example:
var result = await converter.Convert("今晩は", To.Romaji, Mode.Normal, RomajiSystem.Hepburn, "(", ")");
Kawazu returns "komban".
What I want instead is a list {"kom", "ban"}, for search-matching purposes.
Is there any way to get this list?
Hi Cutano. Sorry to bother you again. I was just wondering: is it possible to convert Romaji to Hiragana?
I think you should add a Dispose method to the KawazuConverter class that disposes the MeCab tagger and releases its unmanaged memory, because the garbage collector cannot reclaim unmanaged memory on its own.
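A minimal sketch of what this request amounts to, using the standard `IDisposable` pattern. The class names here are hypothetical stand-ins, not Kawazu's actual types, and the sketch assumes the tagger held by the converter itself implements `IDisposable`.

```csharp
using System;

// Stand-in for a tagger that owns unmanaged memory (hypothetical).
public sealed class FakeTagger : IDisposable
{
    public int DisposeCalls { get; private set; }
    public void Dispose() => DisposeCalls++;
}

// Hypothetical converter wrapper that owns a disposable tagger.
public sealed class DisposableConverter : IDisposable
{
    private readonly IDisposable _tagger;
    private bool _disposed;

    public DisposableConverter(IDisposable tagger)
    {
        _tagger = tagger ?? throw new ArgumentNullException(nameof(tagger));
    }

    public bool IsDisposed => _disposed;

    // Releases the tagger exactly once; later calls are no-ops.
    public void Dispose()
    {
        if (_disposed) return;
        _tagger.Dispose();
        _disposed = true;
    }
}
```

With this in place, callers can wrap the converter in a `using` statement so the tagger is released deterministically instead of waiting for finalization.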
Hi Cutano. I was playing around with the KawazuConverter and found that when converting to To.Hiragana in Mode.Furigana it sometimes returns inaccurate results. For example, with the word:
あの方
the result was: あのほう
and it should have been: あのかた
Hi. This is more of a question than an issue. If I include this library in a commercial project, what do I need to do to satisfy this condition in your license?
"The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software."