During the last weeks I opened many issues and questions about gnverifier / gnames / r

repeatability of results: online (gnames apis) vs standalone tools (gndiff & gnparser), gnverifier issue (open)

abubelinha commented on June 12, 2024

repeatability of results: online (gnames apis) vs standalone tools (gndiff & gnparser)


Comments (10)

dimus commented on June 12, 2024

Usually I use the formula: work/users_num

I think it is something that everybody who publishes their results would need, so I think it is not so much work in the end. I'll keep it open and close it when the system is in place.


dimus commented on June 12, 2024

It is indeed a problem, and not only because of the code: the database evolves as well, although so far it has mostly stayed backward compatible. However, nothing prevents a situation where an important feature breaks that backward compatibility. So I guess a solution would be:

Figure out how to monitor database versioning (the database is actually defined by this internal package https://github.com/gnames/gnidump, which is the equivalent of walking around the house alone in pajamas: no docs, bad architecture, no versions). So it would need to be improved: it would need to reach v1, and every time there is a breaking change in the database, the major version number would be increased to v2, v3, etc.
Add a version number to the SQL dump files at http://opendata.globalnames.org/dumps/
The gnames version should return its own version plus the version of gnmatcher.
Each major version of the database dumps would have one latest file (something like dump-v1.3.6, dump-v2.0.2).
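The last two points could be automated. Below is a minimal Python sketch that picks the latest dump for each major database version. It assumes dump files are named exactly like the examples above (dump-vMAJOR.MINOR.PATCH), which is an assumption, not the actual naming used on opendata.globalnames.org; the helper latest_dump_per_major is hypothetical. Versions are compared as integer tuples, because as plain strings "1.10.1" would sort before "1.3.6".

```python
import re

# Hypothetical naming scheme "dump-vMAJOR.MINOR.PATCH", taken from the
# examples in the list above; real dump file names may differ.
DUMP_RE = re.compile(r"^dump-v(\d+)\.(\d+)\.(\d+)$")


def latest_dump_per_major(names):
    """Return {major: latest dump name} for every major database version."""
    best = {}  # major -> (version tuple, file name)
    for name in names:
        m = DUMP_RE.match(name)
        if m is None:
            continue  # skip files that do not follow the naming scheme
        version = tuple(int(part) for part in m.groups())
        major = version[0]
        # Compare numerically, so 1.10.1 is newer than 1.3.6.
        if major not in best or version > best[major][0]:
            best[major] = (version, name)
    return {major: name for major, (_, name) in best.items()}


dumps = ["dump-v1.2.0", "dump-v1.3.6", "dump-v1.10.1", "dump-v2.0.0", "dump-v2.0.2"]
print(latest_dump_per_major(dumps))  # {1: 'dump-v1.10.1', 2: 'dump-v2.0.2'}
```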

That gives a theoretical possibility to put together a verification system using a particular version of gnames + gnmatcher + database.
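A minimal sketch of what such a pinned combination could look like as a record, assuming plain version strings. The class VerificationStack and its methods are hypothetical, not part of gnames, and the version numbers are illustrative only. Exact equality of all three versions means replaying the same system; matching database major versions only means that, under the scheme proposed above, no breaking change separates the two dumps.

```python
from dataclasses import dataclass


def major(version: str) -> int:
    """Major number of a version string such as "v1.3.6"."""
    return int(version.lstrip("v").split(".")[0])


@dataclass(frozen=True)
class VerificationStack:
    """Hypothetical record of the pinned components behind one result."""
    gnames: str
    gnmatcher: str
    database: str

    def database_compatible(self, other: "VerificationStack") -> bool:
        # Assumption from the proposal above: only a major version bump
        # marks a breaking database change.
        return major(self.database) == major(other.database)

    def citation(self) -> str:
        """One line that could go into a methods section."""
        return (f"gnames {self.gnames}, gnmatcher {self.gnmatcher}, "
                f"database dump {self.database}")


# Illustrative version numbers only.
published = VerificationStack("v1.1.0", "v1.2.0", "v1.3.6")
rerun = VerificationStack("v1.1.0", "v1.2.0", "v1.4.0")
print(published == rerun)                    # False: not an exact replay
print(published.database_compatible(rerun))  # True: same database major
print(published.citation())
```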

It does not solve the problem of data changing all the time, but I think that for most data sources the changes are cumulative, so results should be close, although sometimes not identical.


abubelinha commented on June 12, 2024

Quite a lot of work.

So to be realistic, I think we are much closer to the day when I can create a replicable protocol using this combination:

my own draft list of problematic names
my own set of checklist data sources (e.g., a DwC dump of my preferred sources, from which I extract the columns needed to feed gndiff/gnparser)
the gndiff + gnparser CLIs

All of these are versionable, downloadable, easily citable, and executable standalone.
I will closely follow gndiff evolution ;)
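One way to make such a protocol replicable is to publish, next to the results, a manifest with a checksum of every input file and the versions of the tools used. A minimal Python sketch; sha256_of and build_manifest are hypothetical helpers, the file name is illustrative, and the "vX.Y.Z" strings are placeholders for whatever gndiff and gnparser versions were actually used.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large DwC dumps need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(inputs, tool_versions):
    """Map each input file to its checksum and record the tool versions."""
    return {
        "inputs": {str(p): sha256_of(Path(p)) for p in sorted(inputs)},
        "tools": dict(sorted(tool_versions.items())),
    }


# Usage: checksum a tiny illustrative names file.
with tempfile.TemporaryDirectory() as tmp:
    names = Path(tmp) / "names.csv"
    names.write_text("Acer campestre L.\n")
    manifest = build_manifest([names], {"gndiff": "vX.Y.Z", "gnparser": "vX.Y.Z"})
    print(json.dumps(manifest, indent=2))
```

Anyone rerunning the protocol can recompute the checksums and compare them with the published manifest before comparing results.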


abubelinha commented on June 12, 2024

"I think it is something that everybody who publishes their results would need, so I think it is not so much work in the end. I'll keep it open and close it when the system is in place."

Great. I am not sure whether you mean the gnverifier option, the gndiff option, or both. But any advances would be good for "theoretical" repeatability.

As for real, practical repeatability, I think the gndiff approach is the only good one (it would be easy to replicate something as long as you use the same offline tools; but hardly anybody would take on the task of replicating the whole gnames services as they were at some point in the past, just to review the quality of a small experiment or checklist).


dimus commented on June 12, 2024

For gndiff it should be easy: it has no remote dependencies, so its version alone defines the result.


abubelinha commented on June 12, 2024

Yes I agree.
Its version plus a given combination of request parameters, since it would be best to let users define as much of the matching behaviour as possible (of course with default values for everything, to avoid unnecessary CLI complexity).

Either that, or an editable default config file, so users can see the default values and modify them as needed.
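A minimal sketch of the "version + parameters" idea combined with editable defaults: user overrides are merged into the defaults, and a hash of the resulting configuration identifies the run. The parameter names and the "vX.Y.Z" placeholder below are hypothetical and are not gndiff's actual options; the point is only that the same version plus the same parameters always yields the same fingerprint.

```python
import hashlib
import json

# Hypothetical matching parameters; gndiff's real options may differ.
DEFAULTS = {
    "tool_version": "vX.Y.Z",
    "fuzzy_matching": True,
    "max_edit_distance": 1,
    "match_authors": False,
}


def effective_config(overrides=None):
    """Start from the defaults and apply only the keys the user changed."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


def run_fingerprint(config):
    """Stable ID: same version + same parameters -> same fingerprint."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


a = effective_config({"max_edit_distance": 2})
b = effective_config({"max_edit_distance": 2})
print(run_fingerprint(a) == run_fingerprint(b))                   # True
print(run_fingerprint(a) == run_fingerprint(effective_config()))  # False
```

Rejecting unknown keys keeps a typo in a config file from silently falling back to a default and changing the result.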


abubelinha commented on June 12, 2024

Somehow related, but a bit off-topic.
I have seen some Zenodo links related to your work (e.g. https://doi.org/10.5281/zenodo.5111543). A couple of questions:

As that links back to GitHub, I understand you prefer the Zenodo link to be cited. Is that correct?
Does Zenodo contain a full backup of the GitHub project files as of that time?
I wonder whether you needed to upload them all to Zenodo yourself (or perhaps there is some "auto-zenoding" tool for GitHub projects that you could tell me about?)
When a project is not yet in Zenodo, what would you say is the best way to cite a GitHub project?
I am a bit lost because the above Zenodo URL links back to https://github.com/gnames/gnverifier/tree/v0.3.3 , but I am not sure what "tree" and "v0.3.3" mean in this context. What is the difference between tree v0.3.3 and release v0.3.3? https://github.com/gnames/gnverifier/releases/tag/v0.3.3

I am just looking for advice so I can decide whether to use GitHub and/or Zenodo for versioning a checklist in the future.

Thanks a lot in advance.


dimus commented on June 12, 2024

Someone wanted to cite gnames, so I created a Zenodo link for that purpose. Being lazy, I prefer to avoid unnecessary work, so I decided not to update these links until someone requests a change again :)


abubelinha commented on June 12, 2024

OK. I thought you used some kind of automatic backup from GitHub to Zenodo.

As for the difference between github tree v.xxx and release v.xxx, do you have any opinion?


dimus commented on June 12, 2024

I think these tree/vx.x.x and vx.x.x mean the same thing. For GitHub links I usually use something like
https://github.com/gnames/gnverifier/releases/tag/v0.8.2
