

dimus commented on June 12, 2024

Usually I use formula: work/users_num

I think it is something that everybody who publishes their results would need, so I think it is not so much work in the end. I'll keep it open and close when the system is in place

from gnverifier.

dimus commented on June 12, 2024

It is indeed a problem, and it is not only the code: the database evolves as well, although so far it has mostly stayed backward compatible. However, nothing prevents a situation where an important feature breaks that backward compatibility. So I guess a solution would be:

  1. Figure out how to track database versioning (the database is actually defined by this internal package: https://github.com/gnames/gnidump), which is the equivalent of walking around the house alone in pajamas (no docs, bad architecture, no versions). So it would need to be improved. It would need to reach v1, and every time there is a breaking change in the database, the major version number would increase to v2, v3, etc.
  2. Add a version number to the SQL dump files at http://opendata.globalnames.org/dumps/
  3. The gnames version information should include its own version plus the version of gnmatcher.
  4. Each major version of the database dumps has one latest file (something like dump-v1.3.6, dump-v2.0.2).

That would make it theoretically possible to reassemble a verification system from a particular version of gnames + gnmatcher + database.

It does not solve the problem of data changing all the time, but I think that for most data sources the changes are cumulative, so results should be close, although sometimes not identical.
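Point 4 (one latest dump file per major version) is simple enough to sketch. This is a minimal Go sketch, assuming dumps are identified by bare names of the form `dump-vMAJOR.MINOR.PATCH`, as in the examples above; real files would likely carry an extension that must be stripped first. The names `parseDump` and `LatestPerMajor` are hypothetical and not part of gnidump or any gnames package.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// version holds a parsed MAJOR.MINOR.PATCH triple.
type version struct {
	Major, Minor, Patch int
}

// parseDump extracts the version from a name such as "dump-v1.3.6".
// The "dump-v" prefix is a hypothetical naming scheme taken from the discussion.
func parseDump(name string) (version, bool) {
	if !strings.HasPrefix(name, "dump-v") {
		return version{}, false
	}
	parts := strings.Split(strings.TrimPrefix(name, "dump-v"), ".")
	if len(parts) != 3 {
		return version{}, false
	}
	var nums [3]int
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return version{}, false
		}
		nums[i] = n
	}
	return version{nums[0], nums[1], nums[2]}, true
}

// less compares versions numerically, so 1.10.0 sorts after 1.3.6.
func less(a, b version) bool {
	if a.Major != b.Major {
		return a.Major < b.Major
	}
	if a.Minor != b.Minor {
		return a.Minor < b.Minor
	}
	return a.Patch < b.Patch
}

// LatestPerMajor maps each major version to the name of its newest dump,
// skipping names that do not follow the assumed pattern.
func LatestPerMajor(names []string) map[int]string {
	best := map[int]version{}
	out := map[int]string{}
	for _, n := range names {
		v, ok := parseDump(n)
		if !ok {
			continue
		}
		if cur, seen := best[v.Major]; !seen || less(cur, v) {
			best[v.Major] = v
			out[v.Major] = n
		}
	}
	return out
}

func main() {
	names := []string{"dump-v1.2.0", "dump-v1.3.6", "dump-v2.0.1", "dump-v2.0.2", "notes.txt"}
	latest := LatestPerMajor(names)
	majors := make([]int, 0, len(latest))
	for m := range latest {
		majors = append(majors, m)
	}
	sort.Ints(majors)
	for _, m := range majors {
		fmt.Printf("v%d -> %s\n", m, latest[m])
	}
}
```

Comparing the parsed numbers rather than the raw strings matters here: a plain string sort would put `dump-v1.10.0` before `dump-v1.3.6`.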


abubelinha commented on June 12, 2024

Quite a lot of work.

So, to be realistic, I think we are much closer to the day when I can create a replicable protocol using this combination:

  • my own draft list of problematic names
  • my own set of checklist data sources (e.g., a DwC dump of my preferred sources, with the needed columns extracted to feed gndiff/gnparser)
  • gndiff+gnparser CLI

All these are versionable, downloadable, easily citable and standalone executable.
I will closely follow gndiff evolution ;)


abubelinha commented on June 12, 2024

> I think it is something that everybody who publishes their results would need, so I think it is not so much work in the end. I'll keep it open and close when the system is in place

Great. I'm not sure whether you mean the gnverifier option, the gndiff option, or both, but any progress toward "theoretical" repeatability would be good.

As for real practicality, I think the gndiff approach is the only good one (it would be easy to replicate something as long as you use the same offline tools, but hardly anybody would take on the task of replicating the whole gnames services as they were at some point in the past just to review the quality of a small experiment or checklist).


dimus commented on June 12, 2024

For gndiff it should be easy: it has no remote dependencies, so its version alone defines the result.


abubelinha commented on June 12, 2024

Yes, I agree.
The version plus a given combination of request parameters, since it would be best to let users define as much of the matching behaviour as possible (with default values for everything, of course, to avoid unnecessary CLI complexity).

Either that, or an editable default config file, so users can see the default values and modify them as needed.
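The idea of pinning a run to "version plus request parameters", with defaults exposed through an editable config file, could look roughly like this Go sketch. Everything here is hypothetical: the field names (`FuzzyMatching`, `MaxEditDistance`) are illustrative and are not real gndiff options, and the version string is a placeholder rather than an actual gndiff release.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// MatchConfig holds hypothetical matching parameters; none of these
// field names correspond to actual gndiff flags.
type MatchConfig struct {
	Tool            string `json:"tool"`
	Version         string `json:"version"`
	FuzzyMatching   bool   `json:"fuzzy_matching"`
	MaxEditDistance int    `json:"max_edit_distance"`
}

// DefaultConfig returns defaults that a user could write to a file,
// inspect, and edit before a run.
func DefaultConfig(version string) MatchConfig {
	return MatchConfig{Tool: "gndiff", Version: version, FuzzyMatching: true, MaxEditDistance: 1}
}

// Fingerprint hashes the JSON form of the config, so two runs with the
// same tool version and parameters share one identifier.
func Fingerprint(c MatchConfig) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	cfg := DefaultConfig("v0.0.0") // placeholder version, not a real release
	out, err := json.MarshalIndent(cfg, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	fp, err := Fingerprint(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println("run fingerprint:", fp)
}
```

Because `encoding/json` writes struct fields in declaration order, the same config always produces the same bytes and therefore the same fingerprint. Citing the fingerprint (or the config file itself) alongside the tool version would let someone else rerun the exact setup.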


abubelinha commented on June 12, 2024

Somewhat related, but a bit off-topic.
I have seen some Zenodo links related to your work (e.g. https://doi.org/10.5281/zenodo.5111543). A couple of questions:

  • As that links back to GitHub, I understand you prefer the Zenodo link to be cited. Is that correct?
  • Does Zenodo contain a full backup of the GitHub project files as of that time?
    I wonder whether you needed to upload them all to Zenodo yourself (perhaps there is some "auto-zenoding" tool for GitHub projects that you could tell me about?)
  • When a project is not yet on Zenodo, what would you say is the best way to cite a GitHub project?
    I am a bit lost because the above Zenodo URL links back to https://github.com/gnames/gnverifier/tree/v0.3.3 , but I am not sure what "tree" and "v0.3.3" mean in this context. What's the difference between tree v0.3.3 and release v0.3.3? https://github.com/gnames/gnverifier/releases/tag/v0.3.3

I'm just looking for advice so I can decide whether to use GitHub and/or Zenodo for versioning a checklist in the future.

Thanks a lot in advance


dimus commented on June 12, 2024

Someone wanted to cite gnames, so I created a Zenodo link for that purpose. Being lazy, I prefer to avoid unnecessary work, so I decided not to update these links until someone requests a change again :)


abubelinha commented on June 12, 2024

OK. I thought you used some kind of automatic backup from GitHub to Zenodo.

As for the difference between GitHub tree/vx.x.x and release vx.x.x, do you have any opinion?


dimus commented on June 12, 2024

I think tree/vx.x.x and vx.x.x mean the same thing. For GitHub links I usually use something like
https://github.com/gnames/gnverifier/releases/tag/v0.8.2
