languagetool-org / languagetool
Style and Grammar Checker for 25+ Languages
Home Page: https://languagetool.org
License: GNU Lesser General Public License v2.1
Right now the tests for rules in the disambiguation rules run all the rules on the sentence that precede the tested rule. But there is a bug: if the rule is a part of a rule group, then previous members of the rule group are not run, so any changes to tokens are not visible in the test (though they would be visible in the LanguageTool output). Make a correction in DisambiguationRuleTest to run the preceding rules from the same rule group as well.
LT 2.4 and 2.4.1
In the GUI, when I click 'Ignore' on a misspelled word, the 'Possible spelling mistake' rule is deactivated. I expected 'Ignore' to apply only to the misspelled word on which I right-click. I suggest that if possible, you change 'Ignore' to 'Deactivate spelling check' or something similar.
I just noticed that the command line is very slow when I use the -e option,
and I can't think why it would take so long. That looks like a bug to me.
The following 2 commands take about 1.8 seconds without -e and about
12.6 seconds with -e APOS_TYP (elapsed time as reported by "time"):
$ echo "C'est-à-dire." | \
time java -jar /home/pel/sb/languagetool/languagetool-standalone/target/LanguageTool-2.4-SNAPSHOT/LanguageTool-2.4-SNAPSHOT/languagetool-commandline.jar \
-l fr -
Expected text language: French
Working on STDIN...
Time: 524ms for 1 sentences (1.9 sentences/sec)
4.58user 0.11system 0:01.84elapsed 254%CPU (0avgtext+0avgdata 97836maxresident)k
0inputs+0outputs (0major+37347minor)pagefaults 0swaps
$ echo "C'est-à-dire." | \
time java -jar /home/pel/sb/languagetool/languagetool-standalone/target/LanguageTool-2.4-SNAPSHOT/LanguageTool-2.4-SNAPSHOT/languagetool-commandline.jar \
-l fr -e APOS_TYP -
Expected text language: French
Working on STDIN...
1.) Line 1, column 2, Rule ID: APOS_TYP[1]
Message: Employez l’apostrophe typographique '’'.
Suggestion: ’
C'est-à-dire.
^
Time: 76ms for 1 sentences (13.2 sentences/sec)
15.69user 0.26system 0:12.62elapsed 126%CPU (0avgtext+0avgdata 371400maxresident)k
0inputs+0outputs (0major+106806minor)pagefaults 0swaps
Besides the difference with and without -e, the timing given by
LT is very different than the timing given by the "time" command
(15.69 sec vs 76ms in the last example). Obviously they measure
something different.
It would be nice if the standalone GUI had options to navigate from an issue (in the lower part of the window) to the issue in the text. Also, the ability to go to the next/previous issues in the text would be very nice to have.
Right now, the disambiguation rules have only one action applied only to one token (only sometimes to a group of tokens, when the action is "unify" or "filterall"). Make it possible to apply the same action to multiple tokens, and apply multiple actions (for example, filter and immunize).
All code that loads resources currently uses the classloader as resource loader. This allows packaging all resources within the JARs or somewhere else on the classpath. It's good for standalone applications or plugins, because they can be delivered as packages without any external references. It would be good to abstract the resource loading from the classloader into its own small API/interface to allow API users custom loading mechanisms, e.g. from a database or a network filesystem instead of the classpath. Additionally, the API user may cache the content of resources, e.g. dictionaries or other huge files, and reduce disk I/O.
The benefit is that in such cases, most likely in server environments, it's possible to change resources, like the grammar.xml or dictionaries, at runtime without the need for a server restart to update the languagetool JARs.
Idea
A new interface ResourceLoader will be introduced and provided to both the Language and the JLanguageTool in a separate constructor. All default constructors will create a default implementation ClasspathResourceLoader that implements loading via the classloader. All code that needs to load resources must have access to the resource loader, most likely by requiring it in a constructor. There should be no static references to a default constructor or the like, to keep it as flexible as possible.
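The idea above could be sketched roughly as follows. Only the names ResourceLoader and ClasspathResourceLoader come from the proposal; the method name open, the InMemoryResourceLoader class and the paths are assumptions made for illustration, not existing LanguageTool API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical abstraction over resource loading (method name is an assumption). */
interface ResourceLoader {
  /** Opens the resource at the given path or throws if it cannot be found. */
  InputStream open(String path) throws IOException;
}

/** Default behaviour: load from the classpath, as the current code does. */
class ClasspathResourceLoader implements ResourceLoader {
  @Override
  public InputStream open(String path) throws IOException {
    InputStream in = ClasspathResourceLoader.class.getResourceAsStream(path);
    if (in == null) {
      throw new IOException("Resource not found on classpath: " + path);
    }
    return in;
  }
}

/** Hypothetical custom loader: serves resources from memory (could be a database). */
class InMemoryResourceLoader implements ResourceLoader {
  private final Map<String, byte[]> resources = new ConcurrentHashMap<>();

  /** Registers or replaces a resource at runtime, without any restart. */
  void put(String path, String content) {
    resources.put(path, content.getBytes(StandardCharsets.UTF_8));
  }

  @Override
  public InputStream open(String path) throws IOException {
    byte[] data = resources.get(path);
    if (data == null) {
      throw new IOException("Resource not registered: " + path);
    }
    return new ByteArrayInputStream(data);
  }
}
```

With such an interface, code that needs a resource would receive a ResourceLoader in its constructor instead of calling the classloader directly, so a server could swap in a loader that re-reads grammar.xml on demand.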
Constraints
Can't mount the extension on Version: 4.1.3.2
The disambiguation rules do not support phrases but phrases are simply discarded silently so the user may be quite surprised to see that they don't work. What is required is to bring PatternRule and DisambiguationPatternRule in sync (and PatternRuleMatcher and DisambiguationPatternReplacer as well). One needs to use the element list to ensure that markers are supported correctly in disambiguation rules.
On your demo site there are no suggestions for French misspellings :'(
Example from the web:
https://languagetool.org:8081/?language=fr&text=saluut%20comment%20vass%20tu%20?
The response is:
...
<error fromy="0" fromx="0" toy="0" tox="6" ruleId="HUNSPELL_NO_SUGGEST_RULE" msg="Faute de frappe possible trouvée" replacements="" context="saluut comment vass tu ?" contextoffset="0" offset="0" errorlength="6" category="Faute de frappe possible" locqualityissuetype="misspelling"/>
...
And when we try with https://languagetool.org:8081/?language=fr-FR&text=saluut%20comment%20vass%20tu%20?
there is an error saying the language is not valid :s
The command-line documentation says: -l, --language Text language as a character code (e.g. en for English). Note that you need to specify a variant (e.g. en-US) if you want spell checking to be active (spell checking is supported since LanguageTool 1.8).
But for French it is impossible to specify the variant.
Sorry for my bad English!
It happened to me once that the test got stuck with 200% CPU load. The stacktrace was this (quoting only the part that matters):
"pool-3-thread-6" prio=10 tid=0x00007f790c2d4800 nid=0x6a93 runnable [0x00007f7950c87000]
java.lang.Thread.State: RUNNABLE
at java.util.HashMap.getEntry(HashMap.java:446)
at java.util.HashMap.containsKey(HashMap.java:434)
at org.languagetool.rules.patterns.UnifierConfiguration.setEquivalence(UnifierConfiguration.java:60)
at org.languagetool.rules.patterns.XMLRuleHandler.finalizeTokens(XMLRuleHandler.java:569)
at org.languagetool.rules.patterns.PatternRuleHandler.endElement(PatternRuleHandler.java:275)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:606)
at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.endNamespaceScope(XMLDTDValidator.java:2054)
at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.handleEndElement(XMLDTDValidator.java:2005)
at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.endElement(XMLDTDValidator.java:879)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1742)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2900)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:607)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:489)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:835)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1210)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:568)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:302)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
at org.languagetool.rules.patterns.PatternRuleLoader.getRules(PatternRuleLoader.java:71)
at org.languagetool.JLanguageTool.loadPatternRules(JLanguageTool.java:311)
at org.languagetool.JLanguageTool.activateDefaultPatternRules(JLanguageTool.java:345)
at org.languagetool.server.LanguageToolHttpHandler.getLanguageToolInstance(LanguageToolHttpHandler.java:296)
at org.languagetool.server.LanguageToolHttpHandler.checkText(LanguageToolHttpHandler.java:230)
at org.languagetool.server.LanguageToolHttpHandler.handle(LanguageToolHttpHandler.java:108)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:80)
at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:677)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:649)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
"pool-3-thread-5" prio=10 tid=0x00007f790c2d2800 nid=0x6a92 runnable [0x00007f7950d88000]
java.lang.Thread.State: RUNNABLE
at java.util.HashMap.getEntry(HashMap.java:446)
at java.util.HashMap.containsKey(HashMap.java:434)
at org.languagetool.rules.patterns.UnifierConfiguration.setEquivalence(UnifierConfiguration.java:60)
at org.languagetool.rules.patterns.XMLRuleHandler.finalizeTokens(XMLRuleHandler.java:569)
at org.languagetool.rules.patterns.PatternRuleHandler.endElement(PatternRuleHandler.java:275)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:606)
at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.endNamespaceScope(XMLDTDValidator.java:2054)
at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.handleEndElement(XMLDTDValidator.java:2005)
at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.endElement(XMLDTDValidator.java:879)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1742)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2900)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:607)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:489)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:835)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1210)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:568)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:302)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
at org.languagetool.rules.patterns.PatternRuleLoader.getRules(PatternRuleLoader.java:71)
at org.languagetool.JLanguageTool.loadPatternRules(JLanguageTool.java:311)
at org.languagetool.JLanguageTool.activateDefaultPatternRules(JLanguageTool.java:345)
at org.languagetool.server.LanguageToolHttpHandler.getLanguageToolInstance(LanguageToolHttpHandler.java:296)
at org.languagetool.server.LanguageToolHttpHandler.checkText(LanguageToolHttpHandler.java:230)
at org.languagetool.server.LanguageToolHttpHandler.handle(LanguageToolHttpHandler.java:108)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:80)
at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:677)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:649)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
ANREDE_KOMMA triggers on: „Sehr geehrter Herr Dr. Dings,“, but shouldn’t.
Words like Unschuldiger, Vorgesetzter, Suchender are marked as incorrect because they're capitalized.
I assume this error applies to all nouns derived from adjectives.
I interpreted this to mean I could use the standalone jar anywhere and could forget about everything else extracted from the zipfile because the jar is, well, standalone. However, when moving the jar elsewhere I get the following exception:
Exception in thread "main" java.lang.NoClassDefFoundError: org/languagetool/language/RuleFilenameException
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2521)
at java.lang.Class.getMethod0(Class.java:2764)
at java.lang.Class.getMethod(Class.java:1653)
at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: org.languagetool.language.RuleFilenameException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
Is this expected?
The text I checked was: Ich ein neue Fehler gefunden. It showed no errors.
The correct sentence:
Ich (1) einen neuen Fehler gefunden
(1) could be: habe or hatte
Second error: "neue", because "der Fehler" is masculine, therefore:
Ich habe einen neuen Fehler gefunden
Another example:
Ich habe einen neue Feder gefunden
Correct: Ich habe eine neue Feder gefunden.
As "Feder" is feminine.
Trivial but irritating.
Libreoffice grammar check: 'trash' with UK English selected produces the following message
"trash is a common American expression, in British English it is more common to use: rubbis."
The final 'h' is missing.
I've noticed that quite a few tokens in the grammar.xml (at least in the German one) use regexp tokens but just define a list of possible values, e.g. like this:
<token regexp="yes">an|auf|durch|für|gegen|über</token>
It would be nice to introduce multivalue-tokens for such cases, maybe like this:
<token multivalue="yes">an|auf|durch|für|gegen|über</token>
Either multivalue or regexp may be defined, but not both.
I think it would be best to store the token values as a Set because Element.isStringTokenMatched() can then simply use Set.contains(), which is much faster than a regular expression that just holds a list of possible values.
It's not clear if existing rules should be changed or if there's a mechanism that checks the token value and interprets it as a multivalue token instead of a regular expression.
It's already known that most of the time is spent in the method for matching rules (see New Pattern Matching @ http://wiki.languagetool.org/missing-features), so this improvement will not change the world, but will slightly improve the performance.
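A minimal sketch of the proposal, assuming a hypothetical class MultiValueTokenSketch that is not part of LanguageTool: the pipe-separated values are stored in a Set, and a simple heuristic decides whether an existing regexp token is really just a list of words. The sketch ignores case handling and inflection.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

/** Illustration only; class and method names are hypothetical. */
class MultiValueTokenSketch {
  private final Set<String> values;

  MultiValueTokenSketch(String pipeSeparated) {
    // Pattern.quote makes "|" a literal separator instead of regex alternation.
    values = new HashSet<>(Arrays.asList(pipeSeparated.split(Pattern.quote("|"))));
  }

  /** Hash lookup instead of running a regular expression. */
  boolean matches(String token) {
    return values.contains(token);
  }

  /**
   * Heuristic for existing rules: true if the regexp consists only of
   * letters separated by "|", so it could be treated as a multivalue token.
   */
  static boolean isPlainAlternation(String regexp) {
    return regexp.matches("\\p{L}+(\\|\\p{L}+)*");
  }

  public static void main(String[] args) {
    MultiValueTokenSketch token = new MultiValueTokenSketch("an|auf|durch|gegen");
    System.out.println(token.matches("auf"));
    System.out.println(isPlainAlternation("an.*|auf"));
  }
}
```

A check like isPlainAlternation would allow the loader to convert suitable regexp tokens automatically, so existing rules would not have to be rewritten by hand.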
Constraints:
Daniel, you forgot to add new functionality to the DisambiguatorPatternRule. I consider this a bug because the format of both files is kept in sync...
I guess skipMaxTokens should go to AbstractPatternRulePerformer?
I need a visualisation of false friends in the same way as
https://github.com/languagetool-org/languagetool/blob/master/languagetool-core/src/main/resources/org/languagetool/rules/false-friends.css
but showing only the rule groups in which a certain language is used.
The list of false friends is getting longer and longer so being able to filter on rulegroups only related to the language you are working on is practical.
Ideas for implementation are (one or more):
Jaume, I dumped the Polish dictionary, used the frequency list to encode it. But then I cannot dump the dictionary again as there is an error:
d:\download\LanguageTool-2.4-SNAPSHOT>java -cp languagetool.jar org.languagetool.dev.DictionaryExporter pl_PL.dict >pl_PL.src
Unhandled program error occurred. Invoke with '--help' for help.
java.lang.RuntimeException: Invalid dictionary entry format (missing separator).
at morfologik.stemming.DictionaryIterator.next(DictionaryIterator.java:59)
at morfologik.stemming.DictionaryIterator.next(DictionaryIterator.java:15)
at morfologik.tools.FSADumpTool.dump(FSADumpTool.java:171)
at morfologik.tools.FSADumpTool.go(FSADumpTool.java:75)
at morfologik.tools.Tool.go(Tool.java:45)
at morfologik.tools.FSADumpTool.main(FSADumpTool.java:285)
at org.languagetool.dev.DictionaryExporter.main(DictionaryExporter.java:40)
I think this is an omission on our part in morfologik-speller but it shows also in LT code.
Hi,
I get the following error when I try to install LanguageTool:
(com.sun.star.uno.RuntimeException) { { Message = "[jni_uno bridge error]
UNO calling Java method writeRegistryInfo: non-UNO exception occurred:
java.lang.UnsupportedClassVersionError: org/languagetool/openoffice/Main :
Unsupported major.minor version 51.0\X000ajava stack
trace:\X000ajava.lang.UnsupportedClassVersionError:
org/languagetool/openoffice/Main : Unsupported major.minor version
51.0\X000a\X0009at java.lang.ClassLoader.defineClass1(Native
Method)\X000a\X0009at
java.lang.ClassLoader.defineClass(ClassLoader.java:643)\X000a\X0009at
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)\X000a\X0009at
java.net.URLClassLoader.defineClass(URLClassLoader.java:277)\X000a\X0009at
java.net.URLClassLoader.access$000(URLClassLoader.java:73)\X000a\X0009at
java.net.URLClassLoader$1.run(URLClassLoader.java:212)\X000a\X0009at
java.security.AccessController.doPrivileged(Native Method)\X000a\X0009at
java.net.URLClassLoader.findClass(URLClassLoader.java:205)\X000a\X0009at
java.lang.ClassLoader.loadClass(ClassLoader.java:323)\X000a\X0009at
java.lang.ClassLoader.loadClass(ClassLoader.java:316)\X000a\X0009at
java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:615)\X000a\X0009at
java.lang.ClassLoader.loadClass(ClassLoader.java:268)\X000a\X0009at
com.sun.star.comp.loader.RegistrationClassFinder.find(RegistrationClassFinder.java:55)\X000a\X0009at
com.sun.star.comp.loader.JavaLoader.writeRegistryInfo(JavaLoader.java:399)\X000a",
Context = (com.sun.star.uno.XInterface) @0 } }
My versions:
Ubuntu Saucy 13.10
LibreOffice 4.1.4
libreoffice-java-common is installed
Right now only individual rules or rule groups can be enabled and disabled on the commandline. Find a way to disable and enable whole categories: this is more difficult because categories have only names and types (which also could be used). But names may contain spaces, which means that they need to be escaped on the commandline, or we need to introduce IDs.
In verbose mode, the output for disabled/enabled categories and types should also list the rule IDs that were enabled/disabled.
For a rule:
<token max="2">abc</token>
<token>dfg</token>
there will be two matches in the following sequence:
abc abc dfg
The first match:
abc abc dfg
The second match:
abc dfg
The second match should not occur, and we should not allow matching the rule as soon as the longest match is already found. The file that needs to be changed is PatternRuleMatcher.java, and there are tests that should pass in languagetool-core under resources for the demo (xx) language in the grammar.xml (the test with the first FIXME).
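The intended behaviour can be illustrated with a small greedy matcher. This is a sketch under simplifying assumptions (literal tokens only, no backtracking), and the class LongestMatchSketch is hypothetical; it is not the code in PatternRuleMatcher.java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustration only; not the actual LanguageTool matcher. */
class LongestMatchSketch {

  /** One pattern element: a literal token that may repeat 1..max times. */
  static final class Elem {
    final String text;
    final int max;
    Elem(String text, int max) {
      this.text = text;
      this.max = max;
    }
  }

  /** Returns the end index (exclusive) of the greedy match at 'start', or -1. */
  static int matchAt(List<String> tokens, List<Elem> pattern, int start) {
    int pos = start;
    for (Elem e : pattern) {
      int count = 0;
      // Consume as many repetitions as allowed by max.
      while (count < e.max && pos < tokens.size() && tokens.get(pos).equals(e.text)) {
        pos++;
        count++;
      }
      if (count == 0) {
        return -1;
      }
    }
    return pos;
  }

  /** Finds matches as {start, end} pairs, never restarting inside a found match. */
  static List<int[]> findMatches(List<String> tokens, List<Elem> pattern) {
    List<int[]> matches = new ArrayList<>();
    int start = 0;
    while (start < tokens.size()) {
      int end = matchAt(tokens, pattern, start);
      if (end > start) {
        matches.add(new int[] {start, end});
        start = end; // skip positions already covered by the longest match
      } else {
        start++;
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    List<Elem> pattern = Arrays.asList(new Elem("abc", 2), new Elem("dfg", 1));
    for (int[] m : findMatches(Arrays.asList("abc", "abc", "dfg"), pattern)) {
      System.out.println(m[0] + ".." + m[1]);
    }
  }
}
```

For the sequence abc abc dfg this yields a single match covering all three tokens, because the scan resumes after the end of the longest match instead of at the second abc.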
Requires: knowledge of Java
Rules loaded by JLanguageTool.activateDefaultPatternRules() and JLanguageTool.activateDefaultFalseFriendRules() should be cached outside the JLanguageTool and be provided to it on creation.
The goal of this issue is to increase the speed and reduce the memory consumption of creating new JLanguageTool instances.
One choice is to move both methods to Language. The caller is then able to cache the different variants of a Language, e.g. "english with enabled pattern rules" and "english without pattern rules".
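As a rough illustration of caching loaded rules outside of JLanguageTool, the following sketch keeps one immutable rule list per language code. The class PatternRuleCacheSketch is hypothetical, and plain strings stand in for rule objects; per-instance state such as disabled rules would have to live elsewhere.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Illustration only; strings stand in for real rule objects. */
class PatternRuleCacheSketch {
  private final Map<String, List<String>> rulesByLanguage = new ConcurrentHashMap<>();
  private final Function<String, List<String>> loader;

  /** The loader does the expensive work, e.g. parsing a grammar.xml file. */
  PatternRuleCacheSketch(Function<String, List<String>> loader) {
    this.loader = loader;
  }

  /** Loads the rules on first access and returns the shared, read-only list afterwards. */
  List<String> rulesFor(String languageCode) {
    return rulesByLanguage.computeIfAbsent(languageCode,
        code -> Collections.unmodifiableList(loader.apply(code)));
  }
}
```

Because the cached list is shared between instances, it is wrapped as unmodifiable; any per-instance changes would need to be stored separately from the cached rules.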
Various things must be taken care of:
… JLanguageTool.addRule, JLanguageTool.disableRule, etc.) stored?
… JLanguageTool.addIgnoreWords adds them on the rule)?
Constraints:
… JLanguageTool must still work. The new mechanism just enables more caching possibilities for an API user.
Add a real clickable link to LanguageTool's About dialog. Probably easy if you know Swing programming.
Requires: knowledge of Java/Swing
The code for is not complete in disambiguation rules, and they support this construct only partially. Bring PatternRule and DisambiguationRule in sync (move some methods from PatternRuleMatcher to AbstractPatternRulePerformer to use them in PatternRuleReplacer).
Stand-alone LT, version 2.3. AutoCheck not selected.
The text area has 1 line of text: Test the data.
Select 'CheckText' (either the icon, or from Text Checking).
The message area shows 3 messages, which are for text that was previously in the text area.
Select 'CheckText' again, and the correct messages appear in the message area.
LanguageTool-20131226-snapshot on Windows.
It is not possible to copy text from the Tagger Result dialog.
Right now only the first token is matched in suggestions. For example, if tokens are:
<token max="2">abc</token>
<token>dfg</token>
Then the suggestion tag:
<suggestion>\1\2</suggestion>
will return:
abcdfg
for the input text "abc abc dfg" (note that only one abc is returned: the first occurrence).
We should have a way to return both matched tokens. In the first case, we need two tokens with whitespace, as they are in the text, the same way it would work with:
<suggestion><match include_skipped="yes" no="1"/><match no="2"/></suggestion>
In such a case, the rule should format the suggestion:
abc abcdfg
In the second case, we need to have a new attribute for the match element, for example:
<suggestion><match whitespace="no" include_skipped="yes" no="1"/><match no="2"/></suggestion>
The text of the suggestion would be:
abcabcdfg
Requires: knowledge of Java
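The three outputs discussed above can be reproduced with a small sketch. The class SuggestionSketch and its parameters are hypothetical; includeAll stands for the effect of include_skipped="yes" in this issue, and whitespace for the proposed whitespace attribute.

```java
import java.util.Arrays;
import java.util.List;

/** Illustration only; not the actual suggestion code in LanguageTool. */
class SuggestionSketch {

  /**
   * Renders the text for one match element.
   * groupTokens: all text tokens matched by the pattern token (one or more if max > 1).
   * includeAll:  false reproduces the current behaviour (first occurrence only).
   * whitespace:  whether repeated tokens are joined with a space.
   */
  static String render(List<String> groupTokens, boolean includeAll, boolean whitespace) {
    if (!includeAll) {
      return groupTokens.get(0);
    }
    return String.join(whitespace ? " " : "", groupTokens);
  }

  public static void main(String[] args) {
    List<String> first = Arrays.asList("abc", "abc");
    List<String> second = Arrays.asList("dfg");
    System.out.println(render(first, false, true) + render(second, false, true));
    System.out.println(render(first, true, true) + render(second, false, true));
    System.out.println(render(first, true, false) + render(second, false, true));
  }
}
```

For the input "abc abc dfg" the three lines correspond to the current result, the variant that keeps whitespace, and the variant with whitespace="no".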
If you check a text on http://languagetool.org and there's no error in it (or LT doesn't find one), you'll get a popup that you need to click. Instead, the message that no error was found should be displayed somewhere near the check form using HTML, not as a popup that needs to be clicked.
Requires knowledge of Javascript/HTML
Any error detection rule in LanguageTool (i.e. the rules in grammar.xml) can have a URL attached which provides more information about the error. Find a rule that could benefit from such a URL and add it. It should point to some reputable site where we can assume the URL will still be there in a few years.
Requires: knowledge of XML
The em dashes from the page:
http://en.wikipedia.org/wiki/August_22
are converted to Chinese (?) characters:
1777 舑 American Revolutionary War: British forces abandon the Siege of Fort Stanwix...
Some Unicode encoding failure?
The tagged text should also contain the disambiguation log (maybe optionally), which is now displayed in the verbose mode by the command line version.
Requires: knowledge of Java
On http://community.languagetool.org/ruleEditor/expert, you can press Ctrl-Return to submit the form. This is quite convenient if you re-check your texts a lot. Add the same feature for the main check form at http://languagetool.org/ and the language-specific sub pages.
Requires: a bit of knowledge of JavaScript
There may be typos in messages and rule titles, in particular in XML pattern rules. Create a check that would run LanguageTool on these messages. Note: some rule titles quote the error that they match, so the match of that very rule should be ignored.
Requires: knowledge of Java
Type some text and then immediately press Ctrl-T: the text will be tagged, but as the checking timer is still running, the tagging result will soon be overwritten by the text checking result.
Perhaps it would be possible to extrapolate XML rules out from a single mass into individual XML files?
We regularly create new XML rules, and it would speed up the process of submitting new rules, as it could be done through github, individually.
It also means people could host their own custom XML rule repositories, that can be bolted onto LanguageTool ad-hoc.
LT thinks 'The dog bark' is correct. Why is that?
LT thinks 'The dogs barks' is wrong but 'The red dogs barks' is correct. Why is that?
I just had another case of HTTPServerLoadTest hanging, like #13. Unfortunately I made a mistake so I don't have a stacktrace this time. I'm now running the test again for a longer time, hopefully I can reproduce the issue.
Combining Unicode characters are not correctly handled by LanguageTool, at least not by the command line version of LT.
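For example, the accented character in the French word "Café" can be written either as the single precomposed character U+00E9 or as the base letter U+0065 followed by the combining accent U+0301.
They should both be treated as equivalent. However, the Hunspell rule of LT only accepts the U+00E9 character and not the U+0065 + U+0301.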
Example:
$ echo "Café. Café." |
java -jar LanguageTool-2.4-SNAPSHOT/languagetool-commandline.jar -l fr -
Expected text language: French
Working on STDIN...
1.) Line 1, column 7, Rule ID: HUNSPELL_NO_SUGGEST_RULE
Message: Faute de frappe possible trouvée
Café. Café.
^^^^
Notice that the first word "Café" is recognized by LT, but the second word "Café" is signaled as a typo, yet it should also be correct just like the first word.
Furthermore, words highlighted after combining characters are highlighted at the wrong location:
Example:
$ echo "Café. Café. Foobar." |
java -jar ~/sb/languagetool/languagetool-standalone/target/LanguageTool-2.4-SNAPSHOT/LanguageTool-2.4-SNAPSHOT/languagetool-commandline.jar -l fr -
Expected text language: French
Working on STDIN...
1.) Line 1, column 7, Rule ID: HUNSPELL_NO_SUGGEST_RULE
Message: Faute de frappe possible trouvée
Café. Café. Foobar.
^^^^
2.) Line 1, column 14, Rule ID: HUNSPELL_NO_SUGGEST_RULE
Message: Faute de frappe possible trouvée
Café. Café. Foobar.
^^^^^^
Notice that the underline ^^^^^^ under the word "Foobar" is at the wrong location, and LT also reports the error at line 1, column 14, but it should be at line 1, column 13.
I'm using LT-2.4-SNAPSHOT (git rev dedc30d, from Fri Oct 11, 2013)
I suppose that this could be fixed by normalizing Unicode text before
checking it.
More info on combining Unicode characters and normalization:
http://en.wikipedia.org/wiki/Combining_characters
http://www.fileformat.info/info/unicode/char/301/index.htm
http://en.wikipedia.org/wiki/Unicode_normalization
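The normalization suggested above can be done with the standard java.text.Normalizer class. The following is a minimal sketch with a hypothetical wrapper class; where exactly LanguageTool would call it, and how reported positions would be mapped back to the original text, is left open.

```java
import java.text.Normalizer;

/** Illustration only; the wrapper class is hypothetical. */
class UnicodeNormalizationSketch {

  /** Converts text to NFC, so that e + combining acute becomes the precomposed character. */
  static String toNfc(String text) {
    return Normalizer.normalize(text, Normalizer.Form.NFC);
  }

  public static void main(String[] args) {
    String decomposed = "Cafe\u0301"; // U+0065 followed by U+0301
    String composed = "Caf\u00e9";    // U+00E9
    System.out.println(decomposed.equals(composed));
    System.out.println(toNfc(decomposed).equals(composed));
  }
}
```

Note that normalization changes the length of the text, so column numbers computed on the normalized text would still need to be translated if they are reported against the original input.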
License is mentioned on the website but should be included with the code.
In GUI and our commandline interface, we assume the platform's encoding. In Windows, it may not be UTF-8, but UTF-8 files are easy to detect (with BOM or without BOM). Add a simple fix to detect them properly.
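One possible way to detect UTF-8 input is to check for the byte order mark and otherwise try a strict decode. This is only a sketch with a hypothetical class name; note that pure ASCII also passes the check, which is harmless because ASCII is a subset of UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Illustration only; not existing LanguageTool code. */
class Utf8DetectionSketch {

  /** True if the data starts with the UTF-8 byte order mark EF BB BF. */
  static boolean hasUtf8Bom(byte[] data) {
    return data.length >= 3
        && (data[0] & 0xFF) == 0xEF
        && (data[1] & 0xFF) == 0xBB
        && (data[2] & 0xFF) == 0xBF;
  }

  /** True if the bytes decode as UTF-8 without any malformed sequence. */
  static boolean isValidUtf8(byte[] data) {
    try {
      StandardCharsets.UTF_8.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(data));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  /** Falls back to the platform encoding only if this returns false. */
  static boolean looksLikeUtf8(byte[] data) {
    return hasUtf8Bom(data) || isValidUtf8(data);
  }
}
```

Text in a single-byte encoding such as ISO-8859-1 that contains accented characters will usually fail the strict decode, which is what makes the check useful on Windows.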
Using the rule system, I'm under the impression that LanguageTool could be used for other purposes. By scaffolding XML rules in such a way, LanguageTool could parse any sort of string and apply all sorts of other rules to it. Is this correct?
For example could LanguageTool use XML rules to parse code style, sentence structure, basic mathematics etc?
Hello!
I dealt with the LT for Slovenian in my master's thesis and was advised to make a bug report regarding some of its issues so here it is:
Regards
mario
Extend the pattern rule test to make sure that the short message (<short> in the grammar XML file), if any, is actually shorter than the other error message (<message>).
Requires: knowledge of Java
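The core of such a check is small. A sketch, assuming a hypothetical helper that the pattern rule test would call once per rule with the two message strings:

```java
/** Illustration only; the helper class is hypothetical. */
class ShortMessageCheckSketch {

  /**
   * Returns true if there is no short message, or if the short message
   * is strictly shorter than the full message.
   */
  static boolean isShortMessageValid(String shortMessage, String message) {
    if (shortMessage == null || shortMessage.isEmpty()) {
      return true; // rules without a short message are fine
    }
    return shortMessage.length() < message.length();
  }
}
```

The test would fail for a rule, naming its ID, whenever this method returns false.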
Include sentences like:
She wants you to goes there.
I've been trying to called you.
She is manipulating her father to gets her way.
He tried very hard to lifted the rock.
But there are similar correct sentences:
The calendar her eyes kept straying to said it was December sixth.
But everyone I talked to said it was too risky.
Whatever college he wants to go to is fine with me.
It should be possible to add words to a dictionary stored in the same directory where the configuration resides (i.e., the user's home directory). The words would be added to the ignored words when initializing LT, also in command-line mode. The file should be a simple text file, just like ignore.txt, and the same reading routine can be used.
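Reading such a file could look like the sketch below. The class name is hypothetical, and treating lines that start with "#" as comments is an assumption about the file format, not something stated above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustration only; not the existing ignore.txt reading routine. */
class UserDictionarySketch {

  /** Reads one word per line, skipping blank lines and (assumed) "#" comment lines. */
  static Set<String> readWords(Reader reader) throws IOException {
    Set<String> words = new LinkedHashSet<>();
    BufferedReader buffered = new BufferedReader(reader);
    String line;
    while ((line = buffered.readLine()) != null) {
      String word = line.trim();
      if (word.isEmpty() || word.startsWith("#")) {
        continue;
      }
      words.add(word);
    }
    return words;
  }
}
```

The resulting set would then be passed to whatever mechanism already handles ignored words, both in the GUI and on the command line.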
The outstanding issue with Java WebStart is that it takes artifact languagetool-standalone with the default classpath while the compiled jars from this project have a custom layout.