Comments (18)
Hi, I am Michał and I would like to contribute to this project. This is the first problem I would like to solve,
but I am not sure where this JUnit test should be placed. Which module would such a test fit in?
from languagetool.
Thanks for your interest in LT. You could place the test in languagetool-standalone, like this test: https://github.com/languagetool-org/languagetool/blob/master/languagetool-standalone/src/test/java/org/languagetool/JLanguageToolTest.java#L40
from languagetool.
Maybe this excellent idea can be expanded to run checks on the (supposedly) correct example sentences as well?
from languagetool.
Hello! May I take this issue?
from languagetool.
Sure! Let us know here or on the forum if you have questions.
from languagetool.
@danielnaber, my questions are mostly about the basics of LT.
There are rules in languagetool-core that extend the Rule class. What is the title of a rule here? What does "in particular in XML pattern rules" mean? I want to understand what title and message mean when applied to the Rule class.
So the task is to apply all rules to the title of each of these rules?
There's a lot of documentation; it would be wonderful if you could point me to what to explore first.
Any information would be appreciated. Thank you!
from languagetool.
The title is a string shown in the configuration dialog where users can enable/disable rules. The message is the string (often with a variable) that is shown to the user when a potential error is found.
And the task is to apply all rules to titles of each of these rules?
Basically yes, but as titles are not complete sentences, we need to see if it makes sense. Maybe some rules are just not useful for this use case.
from languagetool.
All right, here are some intermediate results. I checked the descriptions of the English rules against the rules themselves.
There are cases where this approach finds errors, for example:
Phrase: Space character at the begin of paragraph
Rule: a/the + infinitive
Of course, rules that must be applied to a whole sentence often don't make sense here (like UPPERCASE_SENTENCE_START, SENTENCE_FRAGMENT). But let's suppose we can filter the rules by category (I haven't explored the embedded categories yet).
One problem is that brackets, and the overall pattern for writing bad and good examples in titles, are used inconsistently.
Sometimes brackets are used to explain the rule:
- Example №1: Who + verb (who know's/knows)
- Example №2: whos NN (possessive)
But mostly brackets are used to show the correct form:
- Example №1: could of (could have)
- Example №2: must be do (done)
The same applies to quotes; here are the different use cases:
- Example №1: we'Re' (we're) etc.
- Example №2: Replace '12 pm' with 'noon'
- Example №3: Agreement: 'I is / you is / ... ' (at sentence start only)
I'll continue exploring this, but for now it's clear that there are a lot of cases to consider.
By the way, it seems to me that some rules are applied incorrectly, but I'll look into that more carefully.
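A crude way to reduce the bracket noise described above, offered only as an assumption, is to drop parenthesized fragments from a title before checking it. The TitleCleaner class below is a hypothetical helper, not part of LanguageTool, and it throws information away in the cases where the brackets explain the rule (e.g. whos NN (possessive)).

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not LanguageTool API): strips "(...)" fragments from a rule title. */
final class TitleCleaner {
    // A parenthesized fragment without nested parentheses, plus any whitespace before it.
    private static final Pattern PARENS = Pattern.compile("\\s*\\([^()]*\\)");

    private TitleCleaner() {}

    static String stripParenthesized(String title) {
        String withoutParens = PARENS.matcher(title).replaceAll("");
        // Collapse leftover runs of whitespace and trim both ends.
        return withoutParens.replaceAll("\\s+", " ").trim();
    }
}
```

With this, could of (could have) becomes could of and must be do (done) becomes must be do, while titles without brackets pass through unchanged. Quoted examples such as Replace '12 pm' with 'noon' are deliberately left alone, since removing the quoted parts there would leave no usable phrase.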
from languagetool.
Thanks for the update. It probably makes sense to focus on messages first, and care about titles later.
from languagetool.
Hey! Sorry, I spent the last few days at a hackathon.
May I ask for some advice? How should I extract the message from a Rule object? RuleMatch has a .getMessage() method, but Rule doesn't.
Am I missing something obvious? Should I look into the rules in XML format?
from languagetool.
The message indeed doesn't depend on the rule but on the specific match. You can load the rules using org.languagetool.rules.patterns.PatternRuleLoader; each rule has at least one incorrect example, which you can run to get a match with its message. (I'm not sure now whether you even need PatternRuleLoader or whether you can just iterate over the rules of a language.)
from languagetool.
Thanks, will try!
from languagetool.
Even though there is a lot of noise (i.e. many RuleMatches aren't caused by real mistakes but by properties of the rule messages themselves), there is some signal.
There are also a lot of repeated whitespaces, unpaired brackets, and suggestions to replace simple quotes with smart ones.
What I did was run all rules on each rule message and exclude matches that were in the . I also need to exclude the ones in single quotation marks.
For example:
- The term 'Anglo-Saxon' is generally used to describe 'a member of any of the West Germanic tribes
Message: Consider simply using of instead -
- in the English speaking world*
Message: Did you mean the adjective English-speaking?
- in the English speaking world*
So if we disable some rules and apply some conditions, the results might be reasonably useful. I can try.
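One of those conditions, excluding matches inside single quotation marks, could look roughly like the sketch below. It assumes each match is reported as a [fromPos, toPos) character range in the message (LanguageTool's RuleMatch exposes getFromPos() and getToPos()); the QuotedSpanFilter class itself is hypothetical. Straight quotes are paired left to right, and a quote with a letter on both sides is treated as an apostrophe (don't) rather than a delimiter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical filter: detects matches that fall inside single-quoted text of a message. */
final class QuotedSpanFilter {
    private QuotedSpanFilter() {}

    /** Returns [start, end) ranges enclosed in straight single quotes, quotes excluded. */
    static List<int[]> quotedSpans(String message) {
        List<int[]> spans = new ArrayList<>();
        int open = -1;
        for (int i = 0; i < message.length(); i++) {
            if (message.charAt(i) != '\'') {
                continue;
            }
            boolean letterBefore = i > 0 && Character.isLetter(message.charAt(i - 1));
            boolean letterAfter = i + 1 < message.length() && Character.isLetter(message.charAt(i + 1));
            if (letterBefore && letterAfter) {
                continue; // apostrophe inside a word, e.g. "don't"
            }
            if (open < 0) {
                open = i + 1;                   // opening quote: span starts after it
            } else {
                spans.add(new int[] {open, i}); // closing quote: span ends before it
                open = -1;
            }
        }
        return spans; // an unclosed quote at the end is ignored
    }

    /** True if the match range [fromPos, toPos) lies entirely inside one quoted span. */
    static boolean insideQuotes(String message, int fromPos, int toPos) {
        for (int[] span : quotedSpans(message)) {
            if (fromPos >= span[0] && toPos <= span[1]) {
                return true;
            }
        }
        return false;
    }
}
```

The pairing heuristic is deliberately simple: a plural possessive such as rules' is still read as a quote boundary, so the output would need manual review anyway.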
from languagetool.
Sounds useful. This is probably not something that will run on every test run, but maybe every few months, or before release. And someone will need to look at it anyway.
from languagetool.
Yeah, I agree. So, how do you see it? Should I just write a test? Then how do I make sure it isn't run every time?
from languagetool.
Write a test and use the @Ignore annotation.
from languagetool.
May I work on this issue?
from languagetool.
@ales-blaze Sure, feel free to give it a try.
from languagetool.