
Comments (14)

olemke avatar olemke commented on July 23, 2024 1

@simonpf I'm in the process of downloading the v1.1.0 from Zenodo and putting it on the ftp.

from arts.

olemke avatar olemke commented on July 23, 2024 1

> @olemke, I cannot find any official data volume policy on GitHub CI. Is there something I am missing? It's about 1 GB extra to download per merge/push/PR to add this data. I think this is fine, but it's getting to the level (i.e., a round number) where knowing the policy is good.

I couldn't find anything in the GitHub docs on CI bandwidth limits either. However, setting up a CI environment already downloads quite a lot of data (packages, Docker containers, or conda environments), so a few more data files are probably not an issue. We should still keep in mind that every file is downloaded for every single CI job, which multiplies the bandwidth demand, and make an effort to keep the number of files we download to the minimum necessary.
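
For a rough sense of how per-job downloads multiply, the arithmetic can be sketched as below. The file count and sizes follow figures mentioned elsewhere in this thread (about ten files of 6 to 10 MB each); the number of CI jobs per run is a made-up placeholder, not the actual ARTS CI configuration.

```python
def ci_download_mb(file_sizes_mb, jobs_per_run):
    """Total download volume in MB for one CI run.

    Every job fetches every file, so the volume scales with the job count.
    """
    return sum(file_sizes_mb) * jobs_per_run


# Hypothetical numbers: ten files of 8 MB each, six CI jobs per push.
file_sizes_mb = [8] * 10
print(ci_download_mb(file_sizes_mb, jobs_per_run=6), "MB per CI run")
```

Halving the number of files halves the total, which is why trimming the file list matters more than trimming any single job.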


erikssonpatrick avatar erikssonpatrick commented on July 23, 2024

Not sure why you feel that you need to ask. The test data in ARTS is already a mess. Some is in controlfiles/testdata, but many folders still contain local test data. So you will not ruin anything!
Or do you worry about file sizes? A bit of TRO data is, I assume, no problem. But what about ARO? I assume you want that as well.
Or shall we make ARTS dependent on the database? It is already a bit hard to use ARTS without arts-cat-data and arts-xml-data, so one more data repository does not feel like a big deal.
Sorry if more questions than answers.


simonpf avatar simonpf commented on July 23, 2024

Thanks, @erikssonpatrick , I included you in the discussion mainly to provide more substance to my request. But also the discussion regarding the relation between ARTS and the SSDB is probably worth having.

However, what I need right now is the ability to efficiently and flexibly load scattering data in a way that is non-obsolete and reproducible. I think the easiest way to achieve this would be to add some data from the SSDB to the ARTS testdata.


riclarsson avatar riclarsson commented on July 23, 2024

I would prefer this data to be downloaded on the fly for the tests. Is there a server holding it with reasonably fine granularity of files and a really good chance of being up at all times? For the XML and CAT data, we already have the mechanisms in place to download on the fly. For SVN stuff, give me a link if this is how the SSDB stuff is stored and I can arrange for it to work there as well. Otherwise, Python provides download tools we can use.
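
As an illustration of the "Python download tools" route, here is a minimal sketch using only the standard library. It is not the existing ARTS testdata mechanism; the function name `fetch_test_file`, the cache layout, and the URL in the usage comment are all hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_test_file(url: str, cache_dir: Path) -> Path:
    """Download `url` into `cache_dir` unless a copy is already there."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        # Write to a temporary name first so an interrupted download
        # never leaves a truncated file under the final name.
        partial = target.with_name(target.name + ".part")
        with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        partial.replace(target)
    return target


# Hypothetical usage; the server and file name are placeholders:
# path = fetch_test_file("https://example.org/testdata/habit.xml", Path("testdata"))
```

Skipping files that are already cached keeps repeated local test runs from hitting the server again; a fresh CI job would still download each file once.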


erikssonpatrick avatar erikssonpatrick commented on July 23, 2024

We distribute the data via Zenodo (https://doi.org/10.5281/zenodo.1175572). Only the code behind the data is kept in repositories; the data are too large for that. I don't know if there is a solution. Let's discuss at the next ARTS meeting.
@simonpf Go ahead and add some testdata, so this does not delay you. Progress in the coding is priority 1, 2 and 3. How to handle the data we can sort out later.


simonpf avatar simonpf commented on July 23, 2024

@erikssonpatrick This is what I am trying to do, but since this is not a 1-man-project, it probably makes sense to coordinate with the other developers.

@riclarsson @olemke Can you comment on what you consider the best way of including test data? I see at least two different folders with test data. I thought that the approach you took with the catalog data in tests/testdata looked quite nice. Then it would be enough to just replace the scattering data in the ARTS XML data with the SSDB data.


olemke avatar olemke commented on July 23, 2024

Replacing the scattering data in arts-xml-data (or, for now, adding some set of SSDB data to it) sounds good to me. I'm just worried about the size of the scattering data. How big are the ~10 files you would need to add?


olemke avatar olemke commented on July 23, 2024

We also do have an unpacked version of the scattering data on our projects FTP server. The testdata framework could be used to download selected files from there. See lftp ftp://ftp-projects.cen.uni-hamburg.de:/arts/ArtsScatDbase/v1.0.0/StandardHabits/FullSet


simonpf avatar simonpf commented on July 23, 2024

The files are 6 to 10 MB each, so the FTP server sounds like a good option. Can you update it to include the latest version?

Nonetheless, it probably makes sense to get rid of the scattering data in the ARTS XML data, as it will be obsolete.


riclarsson avatar riclarsson commented on July 23, 2024

> The files are 6 to 10 MB. So the FTP sound like a good option. Can you update it to include the latest version?

@olemke, I cannot find any official data volume policy on GitHub CI. Is there something I am missing? It's about 1 GB extra to download per merge/push/PR to add this data. I think this is fine, but it's getting to the level (i.e., a round number) where knowing the policy is good.

I think the amount of data sounds reasonable since it is testing a quite adaptive system.


olemke avatar olemke commented on July 23, 2024

@simonpf v1.1.0 is now available at ftp://ftp-projects.cen.uni-hamburg.de:/arts/ArtsScatDbase/v1.1.0/StandardHabits/FullSet


simonpf avatar simonpf commented on July 23, 2024

Thank you, @olemke for updating the SSDB data.

Currently the downloaded data amounts to 40 MB. It can probably be reduced further, but that would require modifying the SSDB data, which increases the risk of the test data becoming obsolete with the next update of the SSDB.

Ultimately it may not be necessary to test the SSDB compatibility in the CI; on-the-fly generated scattering data could be used instead. Nonetheless, I think it is useful to have reproducible tests for this available.
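
One way to notice when downloaded test data no longer matches what the tests were written against is to pin a checksum per file. The following is only a sketch of that idea, not something the ARTS test framework is known to do; `sha256_of` and `verify_test_file` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_test_file(path: Path, expected_sha256: str) -> None:
    """Raise ValueError if the file differs from the pinned checksum."""
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise ValueError(f"{path.name}: expected {expected_sha256}, got {actual}")
```

A mismatch then fails loudly at download time instead of showing up later as an unexplained change in test results.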


olemke avatar olemke commented on July 23, 2024

40 MB should be no problem.

