Comments (18)

dimus avatar dimus commented on June 12, 2024

Updating gnverifier via brew or a GitHub download should also do the trick.

from gnverifier.

Adafede avatar Adafede commented on June 12, 2024

I think what you are looking for is: https://github.com/gnames/gnverifier#advanced-search-query-language

You can limit the parents, or look for genera or lower taxa, but not filter for a specific rank afaik

larsgw avatar larsgw commented on June 12, 2024

You can [...] not filter for a specific rank afaik

Yes, that's what I meant. I was hoping there would be some (hidden) way to do so but I guess rank matching is only used when the rank is implied (with binomial names & subspecific ranks)?

larsgw avatar larsgw commented on June 12, 2024

Could I help with adding support for this in some way?

dimus avatar dimus commented on June 12, 2024

Hi @larsgw @Adafede, I am back from vacation now. @larsgw you are right, the rank match is implicit.

Adding a constraint by rank should be possible, but I do wonder how often such a use case would come up. I can see your example, but I would assume it is quite a rare situation? Let me know if I am wrong.

Could I help with adding support for this in some way?

Filing bug reports, suggesting new features, letting us know about useful public data sources, discussing ideas for further development, and citing the app are all helpful. And for those adventurous enough, contributing code to "scratch an itch" is the best help :)

larsgw avatar larsgw commented on June 12, 2024

Adding a constraint by rank should be possible, but I do wonder how often such a use case would come up. I can see your example, but I would assume it is quite a rare situation? Let me know if I am wrong.

I have encountered other situations that could be solved by the same solution:

  • Errors in GBIF and CoL taxa, like the "genera" Asilidae or Scarabaeoidea returning before the respective family and superfamily. Ideally those data errors should just be fixed but this is blocking my work.
  • Subgenera with the same name as the parent genus (i.e. Genus s.s.)

And for those adventurous enough, contributing code to "scratch an itch" is the best help :)

That's what I was suggesting, but I haven't had any luck navigating the various repositories involved in this. (I also don't think I can set up a testing environment at the moment due to limited disk space)

larsgw avatar larsgw commented on June 12, 2024

My use case is that I have manually extracted lists of taxa (with their taxonomic ranks) from older and newer sources, and I want to match these taxa to GBIF. https://twitter.com/larswillighagen/status/1557875955301056512

(I have the feeling that the results now are a bit worse than a few months ago, sometimes adding the author & year does not seem to change the top result even if there's an exact match, but that's only for a few taxa)

dimus avatar dimus commented on June 12, 2024

And for those adventurous enough, contributing code to "scratch an itch" is the best help :)

That's what I was suggesting, but I haven't had any luck navigating the various repositories involved in this. (I also don't think I can set up a testing environment at the moment due to limited disk space)

I'll be happy to help set up the gnresolver development env. when/if you are ready to help with the code. There is also a way to help that works with limited disk space: modifying the advanced search query library https://github.com/gnames/gnquery to include rank.

dimus avatar dimus commented on June 12, 2024

(I have the feeling that the results now are a bit worse than a few months ago, sometimes adding the author & year does not seem to change the top result even if there's an exact match, but that's only for a few taxa)

I'd be curious to learn more about this. I work with the CoL guys @yroskov and @gdower, and I alert them about possible issues. CoL went through big changes recently and became more integrated with the GBIF backbone taxonomy.

larsgw avatar larsgw commented on June 12, 2024

(I have the feeling that the results now are a bit worse than a few months ago, sometimes adding the author & year does not seem to change the top result even if there's an exact match, but that's only for a few taxa)

I'd be curious to learn more about this. I work with the CoL guys @yroskov and @gdower, and I alert them about possible issues. CoL went through big changes recently and became more integrated with the GBIF backbone taxonomy.

Hm, I looked into it a bit more (I was rushing a bit when I worked on it last week), and that aspect might be my fault.

larsgw avatar larsgw commented on June 12, 2024

Hm, I looked into it a bit more (I was rushing a bit when I worked on it last week), and that aspect might be my fault.

Yep, I hadn't set up unit tests for the custom name parsing code and accidentally introduced a regression that seems to have discarded all author names in processing. I have the input still so it's fine, but oops.

gdower avatar gdower commented on June 12, 2024

It's possible that Systema Dipterorum updated Asilidae cristatus recently and the changes just haven't made it into CoL and GBIF yet. In the Systema Dipterorum V3.8 data it is Asilidae cristatus.

dimus avatar dimus commented on June 12, 2024

@larsgw, I thought more about the rank constraint, and I do not think it is a good idea. Ranks are not normalized: they are a mess of varied strings, sometimes given and sometimes not, so such an option would lead to confusion and to misreading of the results. Sorry about that.

I guess a postprocessing of the results is the best we can do at this point.

larsgw avatar larsgw commented on June 12, 2024

Ah I see as well that the JSON output does (kind of, in the classificationRanks field) show the rank of the result, while this isn't available in the CSV. That's enough for me to continue, thank you very much.
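
As a rough illustration of the postprocessing route, here is a minimal Python sketch that reads the rank off a result record and filters a list of results by it. It assumes that `classificationRanks` is a pipe-separated string whose last entry describes the result itself; that layout and the sample records are assumptions for illustration, not confirmed gnverifier output.

```python
def result_rank(result):
    """Return the last entry of the pipe-separated classificationRanks, or None."""
    ranks = result.get("classificationRanks") or ""
    parts = [p.strip() for p in ranks.split("|")]
    # An empty last entry means no rank was supplied for the result itself.
    return parts[-1].lower() if parts[-1] else None


def filter_by_rank(results, wanted_rank):
    """Keep only the results whose own rank equals wanted_rank (case-insensitive)."""
    wanted = wanted_rank.lower()
    return [r for r in results if result_rank(r) == wanted]


# Hypothetical records, shaped like the fields discussed in this thread.
sample = [
    {
        "matchedCanonicalSimple": "Asilidae",
        "classificationPath": "Animalia|Arthropoda|Insecta|Diptera|Asilidae",
        "classificationRanks": "kingdom|phylum|class|order|genus",
    },
    {
        "matchedCanonicalSimple": "Asilidae",
        "classificationPath": "Animalia|Arthropoda|Insecta|Diptera|Asilidae",
        "classificationRanks": "kingdom|phylum|class|order|family",
    },
]

families = filter_by_rank(sample, "family")
```

Because ranks are not normalized across data sources, a real script would probably also need a small mapping of rank spellings before comparing.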

dimus avatar dimus commented on June 12, 2024

I have moved gnverifier and gnfinder to v1.0.0. If you use the API, you need to change /api/v0/ to /api/v1/ in the API URL for your scripts to work again.
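
For scripts that keep the API URL in a variable or config value, the change could be applied with a small helper like the one below. This is only a sketch: the function name `upgrade_api_url` and the `example.org` host are placeholders, and the only fact taken from this thread is the `/api/v0/` to `/api/v1/` path change.

```python
import re


def upgrade_api_url(url):
    """Rewrite an /api/v0 path segment to /api/v1; other URLs pass through unchanged."""
    # The lookahead ensures only a complete "v0" segment matches, not e.g. "v01".
    return re.sub(r"/api/v0(?=/|$)", "/api/v1", url)


# Placeholder host and endpoint, for illustration only.
old_url = "https://example.org/api/v0/verifications"
new_url = upgrade_api_url(old_url)
```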

larsgw avatar larsgw commented on June 12, 2024

Thank you for letting me know. I noticed that the CLI broke for me as well until I uncommented the API URL setting in the config file.

larsgw avatar larsgw commented on June 12, 2024

Actually classificationRanks does not work because it contains the classification of the accepted name, so a species name that is now a synonym for a subspecies is seen as a rank mismatch. Some other examples that I am running into:

  • "Diptera" matches the former genus Diptera Borkh., synonym of Saxifraga L.
  • "Nomada cypria Mavromoustakis" matches Nomada (the genus) first. This can be solved with the cardinality score though, and I cannot reproduce it anymore today.
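
One way to guard against the synonym problem is to trust the rank from `classificationRanks` only when the matched name is itself the accepted name. The sketch below assumes each result carries `matchedCanonicalSimple` and `currentCanonicalSimple` fields; those field names, and the simplified sample records, are assumptions for illustration.

```python
def reliable_rank(result):
    """Return the result's rank, or None when the match is a synonym or has no rank."""
    matched = result.get("matchedCanonicalSimple")
    current = result.get("currentCanonicalSimple")
    # For a synonym, the classification describes the accepted name instead,
    # so the last rank says nothing about the name that was matched.
    if not matched or matched != current:
        return None
    ranks = (result.get("classificationRanks") or "").split("|")
    return ranks[-1].strip().lower() or None


# Hypothetical, simplified records based on the "Diptera" example above.
synonym_hit = {
    "matchedCanonicalSimple": "Diptera",
    "currentCanonicalSimple": "Saxifraga",
    "classificationRanks": "kingdom|genus",
}
accepted_hit = {
    "matchedCanonicalSimple": "Diptera",
    "currentCanonicalSimple": "Diptera",
    "classificationRanks": "kingdom|phylum|class|order",
}
```

Results for which this returns `None` would still need another check, such as comparing classification paths.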

larsgw avatar larsgw commented on June 12, 2024

I've improved my algorithm to detect short common prefixes. Before it already noticed that Diptera Borkh. was unusual because of the short common prefix of the classificationPath, but now it can figure out that (although "Diptera" is listed first) the actual most likely intended common prefix is Animalia|Arthropoda|Insecta|Diptera, and grab the Diptera result that matches it from the list of results.
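
The actual implementation is not shown in this thread, but the general idea can be sketched as follows: score each candidate by how much of its `classificationPath` it shares with the paths of other names from the same list, then take the best-scoring candidate. The function names and sample records are hypothetical, and the scoring rule is one possible choice rather than the algorithm described above.

```python
def split_path(path):
    """Split a pipe-separated classification path into its non-empty parts."""
    return [p for p in path.split("|") if p]


def shared_prefix_len(a, b):
    """Count the leading elements that two paths have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_by_context(candidates, context_paths):
    """Return the candidate whose path shares the most leading taxa with the context."""
    if not candidates:
        return None
    context = [split_path(p) for p in context_paths]

    def score(candidate):
        path = split_path(candidate.get("classificationPath") or "")
        return sum(shared_prefix_len(path, ctx) for ctx in context)

    # max() returns the first best item, so ties keep the original ranking.
    return max(candidates, key=score)


# Hypothetical candidates for "Diptera", in the order they were returned.
candidates = [
    {"matchedName": "Diptera Borkh.",
     "classificationPath": "Plantae|Tracheophyta|Magnoliopsida|Saxifragales|Saxifragaceae|Saxifraga"},
    {"matchedName": "Diptera",
     "classificationPath": "Animalia|Arthropoda|Insecta|Diptera"},
]
# Paths of other names from the same input list, used as context.
context_paths = [
    "Animalia|Arthropoda|Insecta|Diptera|Asilidae",
    "Animalia|Arthropoda|Insecta|Diptera|Mycetophilidae",
]
best = pick_by_context(candidates, context_paths)
```

Here the second candidate wins because it shares four leading taxa with each context path, while the plant genus shares none.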

There are still some other issues: "Mycetophilidae" first matches Mycetophilidae MyceoIntGen [sic] instead of the family. Because the genus is not a synonym it's possible to reliably get the rank and compare to my own data, but that probably won't always be the case.
