Comments (5)
Thanks @Adafede for the feedback. It is great to know that you are monitoring the results and helping me fix problems!
Do you mean the space after the hybrid sign? There is no such space because I want to provide results "according to" a source, and in the vast majority of cases I do not "improve" the verbatim name-strings provided by data sources. For example, as you know, IndexFungorum provides years with fungal names, and I provide the names as they are, even if that is not according to the ICN. I feel it gives me a better argument that a match happened as the author of the source intended. I do change names when I know that otherwise they won't be matched at all. For example, I decapitalize specific epithets if I see them capitalized in the source, because the parser would chop such epithets off into the unparsed tail or interpret them as an author.
from gnverifier.
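The decapitalization step mentioned in the comment above can be illustrated with a small sketch. It is not gnverifier's actual code: the function name `decapitalize_epithet` and the heuristic (lower-case the second word when it is purely alphabetic) are assumptions made for illustration, and the function is only safe to apply to a source already known to capitalize epithets, since a bare capitalized word after the genus could equally be an author.

```python
def decapitalize_epithet(name: str) -> str:
    """Lower-case the first letter of the second word of a name-string.

    Hypothetical helper: assumes the caller already knows that the source
    capitalizes specific epithets.
    """
    parts = name.split(" ", 2)
    if len(parts) < 2:
        return name  # uninomial, nothing to change
    epithet = parts[1]
    # Skip tokens that are not plain words, e.g. "L." or "(Christm.)".
    if epithet.isalpha() and epithet[0].isupper():
        parts[1] = epithet[0].lower() + epithet[1:]
    return " ".join(parts)


fixed = decapitalize_epithet("Carex Davalliana Sm.")
```

With this input, `fixed` is `"Carex davalliana Sm."`, while a string such as `"Citrus L."` is left untouched because `"L."` is not purely alphabetic.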
Hi,
I mean the additional space after the × (hybrid sign).
I totally understand your desire and would also prefer it that way (not modifying the source).
Currently, in the example I gave at least, it does not seem to be the case.
"matchedName": "Citrus ×aurantiifolia (Christm.) Swingle Swingle (Christm.)", # OK, what I expect
"matchedCanonicalSimple": "Citrus aurantiifolia", # OK, what I expect
"matchedCanonicalFull": "Citrus × aurantiifolia", # Here, there is an unexpected additional space
"currentName": "Citrus ×aurantiifolia (Christm.) Swingle Swingle (Christm.)", # OK, what I expect
"currentCanonicalSimple": "Citrus aurantiifolia", # OK, what I expect
"currentCanonicalFull": "Citrus × aurantiifolia", # Here, there is an unexpected additional space
I do have a problem of answering before I fully understand a question, sorry about that :)
The purpose of "canonical forms" is to normalize a name, so these forms make quite a lot of "improvements" to make different lexical variants of names comparable and searchable. One obvious problem with writing the hybrid sign joined to the specific epithet is that retyping or OCR errors can turn it into an 'x', "creating" the species Citrus xaurantiifolia, so normalization adds a space to avoid such interpretations.
Is there a "non-lexical" difference between Citrus × aurantiifolia and Citrus ×aurantiifolia that would make the normalization wrong? If so, I need to change the parser to reflect it.
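As a rough illustration of the normalization described above (not the parser's real implementation), the following sketch puts exactly one space after every hybrid sign. The function name `space_after_hybrid_sign` is hypothetical.

```python
import re

HYBRID_SIGN = "\u00d7"  # the multiplication sign "×"


def space_after_hybrid_sign(name: str) -> str:
    """Return the name with exactly one space after each hybrid sign."""
    # Replace the sign plus any whitespace that follows it with "× ".
    spaced = re.sub(HYBRID_SIGN + r"\s*", HYBRID_SIGN + " ", name)
    return spaced.strip()


canonical_full = space_after_hybrid_sign("Citrus \u00d7aurantiifolia")
```

Both "Citrus ×aurantiifolia" and "Citrus × aurantiifolia" come out as "Citrus × aurantiifolia", so the two lexical variants become comparable.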
Don't worry, now we are on the same page. Everything is clear, thank you!
There is no "non-lexical" difference that I know of. I simply use Wikidata a lot, and the two guidelines I found (https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Taxonomy/Archive/2015/07#Hybrids and https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Taxonomy/Archive/2020/10#Spaces_in_hybrid_name) about Citrus × aurantiifolia or Citrus ×aurantiifolia agreed on the second, while you seem to favor the first for the canonical form.
As I said, I am not arguing that one is more correct. I just wanted to ask whether you had a reason for it, and your OCR one is understandable.
I can keep changing "× " to "×" on my side, it is no issue. I just wanted to modify the sources as little as possible, as you also mentioned. When I need to implement something downstream I took the habit of reporting upstream to see if it eventually benefits other people or would deserve attention. 😊
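A minimal sketch of that downstream step, assuming the result record is a plain dictionary that uses the `matchedCanonicalFull` field shown earlier in this thread. The helper name `compact_hybrid_sign` is made up for illustration.

```python
import re


def compact_hybrid_sign(name: str) -> str:
    """Remove whitespace between a hybrid sign and the word that follows it."""
    return re.sub("\u00d7" + r"\s+", "\u00d7", name)


# Hypothetical record holding one field of a verification result.
record = {"matchedCanonicalFull": "Citrus \u00d7 aurantiifolia"}
record["matchedCanonicalFull"] = compact_hybrid_sign(record["matchedCanonicalFull"])
```

After this, the field holds "Citrus ×aurantiifolia", the form the Wikidata discussions settled on.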
"When I need to implement something downstream I took the habit of reporting upstream to see if it eventually benefits other people or would deserve attention."
Great habit! Your issues are always very helpful/insightful.