Comments (18)

czojo26 commented on May 22, 2024

Hi, I am Michał and I would like to contribute to this project. This is the first problem I would like to solve,
but I am not sure where this JUnit test should be placed. Which module would such a test fit in?

from languagetool.

danielnaber commented on May 22, 2024

Thanks for your interest in LT. You could place the test in languagetool-standalone, like this test: https://github.com/languagetool-org/languagetool/blob/master/languagetool-standalone/src/test/java/org/languagetool/JLanguageToolTest.java#L40

janschreiber commented on May 22, 2024

Maybe this excellent idea can be expanded to run checks on the (supposedly) correct example sentences as well?

EgorNemchinov commented on May 22, 2024

Hello! May I take this issue?

danielnaber commented on May 22, 2024

> Hello! May I take this issue?

Sure! Let us know here or on the forum if you have questions.

EgorNemchinov commented on May 22, 2024

@danielnaber, my questions are mostly about understanding the basics of LT.
There are rules in languagetool-core that extend the "Rule" class. What is the title of a rule here? What does "in particular in XML pattern rules" mean? I want to understand what title and message mean when applied to the Rule class.
And the task is to apply all rules to titles of each of these rules?

There's a lot of documentation, so it would be wonderful if you could point me to what to explore first.
Any information would be appreciated. Thank you.

danielnaber commented on May 22, 2024

title is a string shown in the configuration dialog where users can enable/disable rules. message is the string (often with a variable) that is shown to the user when a potential error is found.

> And the task is to apply all rules to titles of each of these rules?

Basically yes, but as titles are not complete sentences, we need to see if it makes sense. Maybe some rules are just not useful for this use case.

EgorNemchinov commented on May 22, 2024

All right, here are some intermediate results. I checked the descriptions of the English rules against the rules themselves.

There are cases where this approach finds errors, for example:
Phrase: Space character at the begin of paragraph
Rule: a/the + infinitive

Yes, rules that must be applied to a whole sentence often don't make sense here (like UPPERCASE_SENTENCE_START, SENTENCE_FRAGMENT). But let's suppose we can filter the rules by some category (I haven't explored the built-in categories yet).
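
A minimal sketch of that filtering step, assuming each match has already been reduced to the ID of the rule that produced it. `FragmentRuleFilter` and `keepRelevant` are hypothetical names for illustration, not part of LanguageTool, and the ID list only contains the two rules named above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not LanguageTool API: drops matches from rules
// that only make sense on complete sentences.
final class FragmentRuleFilter {

    // The two rule IDs mentioned in this thread; extend the set as more are found.
    static final Set<String> SENTENCE_LEVEL_RULES =
            new HashSet<>(Arrays.asList("UPPERCASE_SENTENCE_START", "SENTENCE_FRAGMENT"));

    /** Returns the rule IDs that are not in the sentence-level set, in input order. */
    static List<String> keepRelevant(List<String> matchedRuleIds) {
        List<String> kept = new ArrayList<>();
        for (String id : matchedRuleIds) {
            if (!SENTENCE_LEVEL_RULES.contains(id)) {
                kept.add(id);
            }
        }
        return kept;
    }
}
```

Filtering after the check keeps the sketch independent of how the rules are loaded. Disabling the rules on the checker before running it would probably be the more direct route, if that turns out to be convenient.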

There is a problem with the inconsistent use of brackets and, more generally, with how bad and good samples are written in titles.
Sometimes brackets are used to explain the rule:

  • Example №1: Who + verb (who know's/knows)
  • Example №2: whos NN (possessive)

But mostly brackets are used to show the right way:

  • Example №1: could of (could have)
  • Example №2: must be do (done)

A similar thing applies to quotes; here are different use cases:

  • Example №1: we'Re' (we're) etc
  • Example №2: Replace '12 pm' with 'noon'
  • Example №3: Agreement: 'I is / you is / ... ' (at sentence start only)

I'll continue exploring this, but for now it's clear there are a lot of cases to consider.
By the way, it seems to me that some rules are applied incorrectly, but I'll look into that more carefully.
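
One possible way to cope with the bracket and quote conventions listed above is to check only the descriptive part of a title. The sketch below is a rough heuristic under that assumption: it drops parenthesised parts and spans in straight single quotes, while leaving word-internal apostrophes alone. `TitleCleaner` and `proseOnly` are made-up names, and the approach deliberately gives up on brackets that explain the rule rather than show a correction:

```java
import java.util.regex.Pattern;

// Hypothetical helper, not LanguageTool API: reduces a rule title to the
// prose around its bracketed and quoted samples.
final class TitleCleaner {

    // A parenthesised part such as "(could have)" or "(possessive)".
    private static final Pattern PARENS = Pattern.compile("\\([^)]*\\)");

    // A span in straight single quotes whose quotes do not touch a letter
    // on the outside, so apostrophes inside words are not treated as quotes.
    private static final Pattern QUOTED = Pattern.compile("(?<!\\p{L})'[^']*'(?!\\p{L})");

    private static final Pattern SPACES = Pattern.compile("\\s+");

    /** Removes bracketed and quoted samples, then collapses whitespace. */
    static String proseOnly(String title) {
        String s = PARENS.matcher(title).replaceAll(" ");
        s = QUOTED.matcher(s).replaceAll(" ");
        return SPACES.matcher(s).replaceAll(" ").trim();
    }
}
```

With this, `could of (could have)` becomes `could of` and `Replace '12 pm' with 'noon'` becomes `Replace with`, while `we'Re' (we're) etc` only loses its bracketed part. Titles that consist almost entirely of samples end up nearly empty, so they would still need a manual look.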

danielnaber commented on May 22, 2024

Thanks for the update. It probably makes sense to focus on messages first, and care about titles later.

EgorNemchinov commented on May 22, 2024

Hey! Sorry, I've been at a hackathon for the last few days.
May I ask for some advice? How should I extract the message from a Rule object?
RuleMatch has a .getMessage() method, but Rule doesn't.
Am I missing something obvious? Should I look into the rules in XML format?

danielnaber commented on May 22, 2024

The message indeed doesn't depend on the rule but on the specific match. You can load the rules using org.languagetool.rules.patterns.PatternRuleLoader, each rule has at least one incorrect example which you can run to get a match with its message. (Not sure now if you even need PatternRuleLoader or whether you can iterate the rules of a language.)
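
The shape of that loop can be sketched without committing to a particular loading mechanism. Everything below is a stand-in written for illustration: `MessageHarvester`, its `Match` class, and the `checker` function are hypothetical and do not mirror LanguageTool's real classes. The only idea taken from above is to run each rule's incorrect examples and keep the messages of the matches that belong to that rule:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-ins, not LanguageTool API.
final class MessageHarvester {

    /** Minimal model of a match: which rule fired and what message it showed. */
    static final class Match {
        final String ruleId;
        final String message;

        Match(String ruleId, String message) {
            this.ruleId = ruleId;
            this.message = message;
        }
    }

    /**
     * Runs every incorrect example through the checker and collects, per rule ID,
     * the messages of the matches produced by that same rule.
     */
    static Map<String, List<String>> collectMessages(
            Map<String, List<String>> incorrectExamplesByRule,
            Function<String, List<Match>> checker) {
        Map<String, List<String>> messages = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : incorrectExamplesByRule.entrySet()) {
            for (String example : entry.getValue()) {
                for (Match match : checker.apply(example)) {
                    // Other rules may fire on the same sentence; skip their messages.
                    if (match.ruleId.equals(entry.getKey())) {
                        messages.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                                .add(match.message);
                    }
                }
            }
        }
        return messages;
    }
}
```

In a real test the `checker` would presumably wrap a LanguageTool check of the example text, and the example map would be filled from whatever rule iteration turns out to work.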

EgorNemchinov commented on May 22, 2024

Thanks, will try!

EgorNemchinov commented on May 22, 2024

Even though there is a lot of noise (i.e. the RuleMatches found aren't caused by real mistakes but stem from properties of the rules' messages), there is some signal.
There are also a lot of repeated whitespaces, unpaired brackets, and suggestions to replace simple quotes with smart ones.
What I did was run all rules on each Rule message and exclude matches that were in the . I also need to exclude the ones in single quotation marks.

For example:

  1. The term 'Anglo-Saxon' is generally used to describe 'a member of any of the West Germanic tribes
     Message: Consider simply using of instead
  2. in the English speaking world
     Message: Did you mean the adjective English-speaking?

So, if we disable some rules and apply some conditions, the results might be reasonably useful. I can try.
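
The single-quote condition could look roughly like this. It is only a sketch: it assumes a match is described by character offsets into the message text, and `QuotedSpanFilter` and `insideQuotes` are hypothetical names rather than anything from LanguageTool:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not LanguageTool API: tells whether a match lies
// inside a quoted sample of a rule message.
final class QuotedSpanFilter {

    // A span in straight single quotes; quotes touching a letter on the
    // outside are treated as apostrophes and ignored.
    private static final Pattern QUOTED = Pattern.compile("(?<!\\p{L})'[^']*'(?!\\p{L})");

    /** Returns true if the range [from, to) lies entirely within a quoted span. */
    static boolean insideQuotes(String text, int from, int to) {
        Matcher m = QUOTED.matcher(text);
        while (m.find()) {
            if (from >= m.start() && to <= m.end()) {
                return true;
            }
        }
        return false;
    }
}
```

A match for which `insideQuotes` returns true would be dropped from the report. An opening quote that is never closed, as in the first example above, simply yields no quoted span, so matches after it are kept.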

danielnaber commented on May 22, 2024

Sounds useful. This is probably not something that will run on every test run, but maybe every few months, or before release. And someone will need to look at it anyway.

EgorNemchinov commented on May 22, 2024

Yeah, I agree. So, how do you see it? Should I just write a test? Then how do I make sure it isn't run every time?

danielnaber commented on May 22, 2024

Write a test and use the @Ignore annotation.

ales-blaze commented on May 22, 2024

May I work on this issue?

danielnaber commented on May 22, 2024

@ales-blaze Sure, feel free to give it a try.
