There may be typos in messages and rule titles, in particular in XML pattern rules. Cr


Create a JUnit test that would check if the messages and titles of rules in LanguageTool are correct (languagetool-org/languagetool)

Comments (18)

czojo26 commented on May 22, 2024 1

Hi, I am Michał and I would like to contribute to this project. This is the first problem I would like to solve,
but I am not sure where this JUnit test should be placed. In which module would such a test fit?

from languagetool.

danielnaber commented on May 22, 2024

Thanks for your interest in LT. You could place the test in languagetool-standalone, like this test: https://github.com/languagetool-org/languagetool/blob/master/languagetool-standalone/src/test/java/org/languagetool/JLanguageToolTest.java#L40


janschreiber commented on May 22, 2024

Maybe this excellent idea can be expanded to run checks on the (supposedly) correct example sentences as well?


EgorNemchinov commented on May 22, 2024

Hello! May I take this issue?


danielnaber commented on May 22, 2024

> Hello! May I take this issue?

Sure! Let us know here or on the forum if you have questions.


EgorNemchinov commented on May 22, 2024

@danielnaber, my questions are mostly about understanding LT basics.
There are rules in languagetool-core which extend the "Rule" class. What is the title of a rule here? What does "in particular in XML pattern rules" mean? I want to understand what title and message mean when applied to the Rule class.
And the task is to apply all rules to titles of each of these rules?

There's a lot of documentation; it would be wonderful if you could tell me what to explore first.
Any information would be appreciated. Thank you.


danielnaber commented on May 22, 2024

title is a string shown in the configuration dialog where users can enable/disable rules. message is the string (often with a variable) that is shown to the user when a potential error is found.

> And the task is to apply all rules to titles of each of these rules?

Basically yes, but as titles are not complete sentences, we need to see if it makes sense. Maybe some rules are just not useful for this use case.


EgorNemchinov commented on May 22, 2024

All right, I can share some intermediate results. I checked the descriptions of the English rules using the rules themselves.

There are cases where this approach finds errors, for example:
Phrase: Space character at the begin of paragraph
Rule: a/the + infinitive

Yes, rules that must be applied to a whole sentence often don't make sense here (like UPPERCASE_SENTENCE_START, SENTENCE_FRAGMENT). But let's suppose we can filter the rules by some category (I haven't explored the existing categories yet).

There is a problem with the inconsistent use of brackets, and with the overall pattern of writing bad and good samples in titles.
Sometimes brackets are used to explain the rule:

Example №1: Who + verb (who know's/knows)
Example №2: whos NN (possessive)

But mostly brackets are used to show the right way:

Example №1: could of (could have)
Example №2: must be do (done)

A similar thing applies to quotes; here are different use cases:

Example №1: we'Re' (we're) etc
Example №2: Replace '12 pm' with 'noon'
Example №3: Agreement: 'I is / you is / ... ' (at sentence start only)

I'll continue exploring this, but for now it's clear there are a lot of cases to consider.
By the way, it seems to me that some rules are incorrectly applied, but I'll look into that more carefully.
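
One way to handle the bracket and quote conventions listed above is to strip those spans from a title before running any rules on it. The following is a minimal Java sketch, not LanguageTool code: `TitleCleaner` and `stripAnnotations` are hypothetical names, and the two regular expressions are only assumptions about what counts as an explanatory span.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of LanguageTool): removes explanatory spans from a rule title. */
final class TitleCleaner {

    // Text in round brackets, e.g. "(could have)" or "(possessive)".
    private static final Pattern BRACKETED = Pattern.compile("\\([^)]*\\)");

    // Text in straight single quotes that is not glued to a letter or digit,
    // e.g. "'12 pm'". An apostrophe inside a word ("know's") does not open a span.
    private static final Pattern QUOTED =
            Pattern.compile("(?<![\\p{L}\\p{N}])'[^']*'(?![\\p{L}\\p{N}])");

    /** Returns the title without bracketed or quoted spans, with whitespace normalized. */
    static String stripAnnotations(String title) {
        String result = BRACKETED.matcher(title).replaceAll(" ");
        result = QUOTED.matcher(result).replaceAll(" ");
        return result.replaceAll("\\s+", " ").trim();
    }
}
```

With this heuristic, `could of (could have)` becomes `could of` and `Replace '12 pm' with 'noon'` becomes `Replace with`, while `we'Re' (we're) etc` keeps `we'Re'` because its quotes sit inside a word. That last case suggests a single filter probably won't cover every title.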


danielnaber commented on May 22, 2024

Thanks for the update. It probably makes sense to focus on messages first, and care about titles later.


EgorNemchinov commented on May 22, 2024

Hey! Sorry, I've been at a hackathon for the last few days.
May I ask for advice? How should I extract the message from a Rule object?
RuleMatch has a .getMessage() method, but Rule doesn't.
Am I missing something obvious? Should I look into the rules in XML format?


danielnaber commented on May 22, 2024

The message indeed doesn't depend on the rule but on the specific match. You can load the rules using org.languagetool.rules.patterns.PatternRuleLoader, each rule has at least one incorrect example which you can run to get a match with its message. (Not sure now if you even need PatternRuleLoader or whether you can iterate the rules of a language.)
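
The flow described here (run each rule's incorrect example, then read the message from the resulting match) can be sketched without depending on LanguageTool itself. In this Java sketch the checker is passed in as a plain function, so `MessageCollector`, `collectMessages`, and `stripMarkers` are hypothetical names. In a real test the function would presumably wrap a LanguageTool check and return `RuleMatch.getMessage()` for each match. The `<marker>` tags are an assumption about how examples highlight the error span.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper: gathers the messages produced by a rule's incorrect examples. */
final class MessageCollector {

    /** Removes the (assumed) <marker> tags that highlight the error in an example sentence. */
    static String stripMarkers(String example) {
        return example.replace("<marker>", "").replace("</marker>", "");
    }

    /**
     * Runs every incorrect example through the checker and collects the distinct messages.
     * The checker stands in for "check this text and return the message of each match".
     */
    static List<String> collectMessages(List<String> incorrectExamples,
                                        Function<String, List<String>> checker) {
        List<String> messages = new ArrayList<>();
        for (String example : incorrectExamples) {
            for (String message : checker.apply(stripMarkers(example))) {
                if (!messages.contains(message)) {
                    messages.add(message); // keep each message only once
                }
            }
        }
        return messages;
    }
}
```

The collected messages are then the texts to proofread, which is the actual goal of the test.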


EgorNemchinov commented on May 22, 2024

Thanks, will try!


EgorNemchinov commented on May 22, 2024

Even though there is a lot of noise, i.e. many of the RuleMatches found aren't caused by real mistakes but by the way rule messages are written, there is some signal.
There are also a lot of repeated whitespaces, unpaired brackets, and suggestions to replace simple quotes with smart ones.
What I did was run all rules on a Rule message and exclude matches that were in the . I also need to exclude the ones in single quotation marks.

For example

The term 'Anglo-Saxon' is generally used to describe 'a member of any of the West Germanic tribes
Message: Consider simply using of instead
- in the English speaking world*
  Message: Did you mean the adjective English-speaking?

So if we disable some rules and apply some conditions, the results might be reasonably useful. I can try.
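
A minimal Java sketch of that filtering step, assuming each match is reduced to a rule ID and character offsets: it drops matches from a caller-supplied set of noisy rule IDs and matches that lie entirely inside single quotation marks. `MatchFilter` and its `Match` class are hypothetical stand-ins, not LanguageTool types, and the quote detection is deliberately naive (an apostrophe in a word would confuse it).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: removes noisy matches found in a rule message. */
final class MatchFilter {

    /** Minimal stand-in for a match: rule ID plus [from, to) offsets in the checked text. */
    static final class Match {
        final String ruleId;
        final int from;
        final int to;

        Match(String ruleId, int from, int to) {
            this.ruleId = ruleId;
            this.from = from;
            this.to = to;
        }
    }

    // Naive: any run of text between two straight single quotes.
    private static final Pattern QUOTED = Pattern.compile("'[^']*'");

    /** True if the span [from, to) lies completely inside a single-quoted span. */
    static boolean insideSingleQuotes(String text, int from, int to) {
        Matcher m = QUOTED.matcher(text);
        while (m.find()) {
            if (from >= m.start() && to <= m.end()) {
                return true;
            }
        }
        return false;
    }

    /** Keeps only matches that are neither from an ignored rule nor inside quotes. */
    static List<Match> filter(String text, List<Match> matches, Set<String> ignoredRuleIds) {
        List<Match> kept = new ArrayList<>();
        for (Match match : matches) {
            if (ignoredRuleIds.contains(match.ruleId)) {
                continue; // e.g. whitespace or quote-style rules
            }
            if (insideSingleQuotes(text, match.from, match.to)) {
                continue; // quoted terms are examples, not prose
            }
            kept.add(match);
        }
        return kept;
    }
}
```

Which rule IDs go into the ignored set would have to be decided by looking at the actual noise; the sketch leaves that choice to the caller.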


danielnaber commented on May 22, 2024

Sounds useful. This is probably not something that will run on every test run, but maybe every few months, or before release. And someone will need to look at it anyway.


EgorNemchinov commented on May 22, 2024

Yeah, I agree. So, how do you see it? Should I just write a test? Then how do I make sure it's not run each time?


danielnaber commented on May 22, 2024

Write a test and use the @Ignore annotation.


ales-blaze commented on May 22, 2024

May I work on this issue?


danielnaber commented on May 22, 2024

@ales-blaze Sure, feel free to give it a try.

