Comments (5)

dimus commented on June 3, 2024

Thanks @Adafede for the feedback. It is great to know that you are monitoring the results and helping me fix problems!

Do you mean the space after the hybrid sign? There is no such space because I want to provide results "according to" a source, and in the vast majority of cases I do not "improve" the verbatim name-strings that data sources provide. For example, as you know, IndexFungorum provides years with fungal names, and I return those names as they are, even if that is not according to the ICN. I feel this gives me a better basis for saying that a match happened as the author of the source intended. I do change names when I know they would otherwise not be matched at all. For example, I decapitalize specific epithets that are capitalized in the source, because the parser would otherwise chop such epithets off into the unparsed tail or interpret them as an author.
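To illustrate the decapitalization step, here is a minimal sketch in Python. It is not the code gnverifier uses. The helper name `decapitalize_epithet` is hypothetical, and it assumes the first word is the genus and the second word is the specific epithet, which does not hold for every name-string (a uninomial followed by an author would be mangled).

```python
def decapitalize_epithet(name: str) -> str:
    """Lower-case the first letter of the second word of a name-string.

    Hypothetical helper. Assumption: word 1 is the genus and word 2 is
    the specific epithet. Everything else is returned unchanged.
    """
    parts = name.split(" ")
    # Only touch the second word, and only if it starts with a capital.
    if len(parts) >= 2 and parts[1][:1].isupper():
        parts[1] = parts[1][0].lower() + parts[1][1:]
    return " ".join(parts)


if __name__ == "__main__":
    print(decapitalize_epithet("Quercus Alba L."))
```

With the epithet lower-cased, a parser has a chance to treat it as an epithet rather than as an author or unparsed tail.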

from gnverifier.

Adafede commented on June 3, 2024

Hi,

I mean the additional space after the × (hybrid sign).
I totally understand your desire and would also prefer it that way (not modifying the source).
Currently, in the example I gave at least, it does not seem to be the case.

      "matchedName": "Citrus ×aurantiifolia (Christm.) Swingle Swingle (Christm.)", # OK, what I expect
      "matchedCanonicalSimple": "Citrus aurantiifolia", # OK, what I expect
      "matchedCanonicalFull": "Citrus × aurantiifolia", # Here, there is an unexpected additional space
      "currentName": "Citrus ×aurantiifolia (Christm.) Swingle Swingle (Christm.)", # OK, what I expect
      "currentCanonicalSimple": "Citrus aurantiifolia", # OK, what I expect
      "currentCanonicalFull": "Citrus × aurantiifolia", # Here, there is an unexpected additional space

dimus commented on June 3, 2024

I do have a problem with answering before I fully understand a question,
sorry about that :)

The purpose of "canonical forms" is to normalize a name, so these forms make quite a lot of "improvements" to make different lexical variants of names comparable and searchable. One obvious problem with writing the hybrid sign together with the specific epithet is that retyping or OCR errors can turn the sign into an 'x', "creating" the species Citrus xaurantiifolia, so normalization adds a space to avoid such interpretations.
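As a rough illustration of that normalization step, the sketch below inserts exactly one space between a hybrid sign and the word that follows it. It is written in Python for brevity and is not the parser's actual implementation. The function name `space_hybrid_sign` is hypothetical, and it only handles the multiplication sign U+00D7, not a plain letter 'x'.

```python
import re

HYBRID_SIGN = "\u00d7"  # the multiplication sign used as the hybrid sign


def space_hybrid_sign(name: str) -> str:
    """Return name with a single space after each hybrid sign.

    Hypothetical helper: any whitespace (including none) between the
    hybrid sign and the following word character becomes one space.
    """
    return re.sub(HYBRID_SIGN + r"\s*(?=\w)", HYBRID_SIGN + " ", name)


if __name__ == "__main__":
    print(space_hybrid_sign("Citrus \u00d7aurantiifolia"))
```

Because the sign is always separated from the epithet, a later mis-typed or mis-recognized 'x' stays a separate token instead of becoming the first letter of the epithet.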

Is there a "non-lexical" difference between Citrus × aurantiifolia and Citrus ×aurantiifolia that makes the normalization wrong? If so, I need to change the parser to reflect it.

Adafede commented on June 3, 2024

Don't worry, now we are on the same page and everything is clear, thank you!

No "non-lexical" difference that I know of. I simply use Wikidata a lot, and the two guidelines I found (https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Taxonomy/Archive/2015/07#Hybrids and https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Taxonomy/Archive/2020/10#Spaces_in_hybrid_name) on Citrus × aurantiifolia versus Citrus ×aurantiifolia agreed on the second, while you seem to favor the first for the canonical form.

As said, I am not arguing that one is more correct; I just wanted to ask whether you had a reason for it, and your OCR reason is understandable.
I can keep changing "× " to "×" on my side, it is no issue; I just wanted to modify the sources as little as possible, as you also mentioned. When I need to implement something downstream I took the habit of reporting upstream to see if it eventually benefits other people or would deserve attention. 😊
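For anyone who needs the same downstream adjustment, a minimal sketch of it in Python could look like the following. The function name `attach_hybrid_sign` is made up for this example, and it assumes the canonical form uses the multiplication sign U+00D7 followed by whitespace.

```python
import re


def attach_hybrid_sign(name: str) -> str:
    """Remove whitespace between a hybrid sign and the following word.

    Hypothetical downstream helper: turns the spaced canonical form into
    the unspaced form, and leaves names without a hybrid sign unchanged.
    """
    return re.sub("\u00d7\\s+(?=\\w)", "\u00d7", name)


if __name__ == "__main__":
    print(attach_hybrid_sign("Citrus \u00d7 aurantiifolia"))
```

Applied to the matchedCanonicalFull and currentCanonicalFull values above, this gives the unspaced spelling that the Wikidata discussions settled on.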

dimus commented on June 3, 2024

When I need to implement something downstream I took the habit of reporting upstream to see if it eventually benefits other people or would deserve attention.

Great habit! Your issues are always very helpful and insightful.
