Comments (11)

jeremyh commented on June 7, 2024

Licensing etc. feels like something that should go in the source dataset metadata, as it's ultimately a property of the source dataset (and will presumably remain consistent when output to new formats). The previous EODS dataset metadata format included similar licensing information, but it hasn't been added to the eodatasets structure so far.

We might be able to assume that a collection has a single, consistent license and so add it there, but it feels like something that should go in each dataset's metadata regardless, and we can match datasets to collections based on that property if needed. We'll need input from others (including Ops).
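To make the per-dataset approach concrete, here is a minimal Python sketch. It assumes each dataset document is a plain dict with a top-level `licence` key; the key name, the document shape and the helpers `group_by_licence` and `collection_licence` are hypothetical, not the eodatasets schema. The licence lives on each dataset, and a collection-level licence is only derived when every member agrees.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

# Hypothetical dataset documents: each one carries its own "licence" property.
datasets: List[Dict[str, Any]] = [
    {"id": "ls5_nbar_1", "licence": "CC-BY-4.0"},
    {"id": "ls5_nbar_2", "licence": "CC-BY-4.0"},
    {"id": "third_party_1", "licence": "https://example.com/licence"},
]


def group_by_licence(docs: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group dataset ids by their licence value ("unknown" if the key is absent)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for doc in docs:
        groups[doc.get("licence", "unknown")].append(doc["id"])
    return dict(groups)


def collection_licence(docs: List[Dict[str, Any]]) -> Optional[str]:
    """Return the single licence shared by all datasets; raise if there isn't exactly one."""
    licences = {doc.get("licence") for doc in docs}
    if len(licences) != 1:
        raise ValueError(f"expected exactly one licence, found: {sorted(map(str, licences))}")
    return licences.pop()
```

With this split, matching datasets to a collection is just a lookup on the grouped result, and a collection with mixed licences is reported rather than silently assigned one.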

from datacube-core.

omad commented on June 7, 2024

Okay, agreed that it does belong as a property of the source dataset.

However, I'm concerned about making the pre-ingestion process overly complicated and prescriptive for someone looking to use a new dataset.

cet900 commented on June 7, 2024

Remember that you don't have to include the full licence: the licence field can simply hold a URL pointing to the licence (or a relative path on the filesystem, but that's less robust) if CC-BY 4.0 is insufficient. -Claire

mpaget commented on June 7, 2024

I think Jeremy (and Claire) are on the right track here. That's not to say you're wrong, Damian - not at all!

Rather, what do you mean by "pre-ingestion process overly complicated and prescriptive for someone looking to use a new dataset"? The metadata elements you refer to are basic and should effectively be mandatory across all ingest-ready datasets. I think we have the "authority" to define mandatory and optional metadata elements, and the ones listed above, for my money, should be mandatory.

BTW: I'd include a point of contact (creator and/or custodian) in your list above.

omad commented on June 7, 2024

This stuff is essential for a shared, public/semi-public deployment, but for a private/testing datacube it could create hurdles and delays, especially for someone trying to ingest a third-party dataset. I want to make it as easy as possible for them.

For example:
The Landsat datasets we're experimenting with don't currently include this type of information, but creating compliant Storage Units requires that information to come from somewhere. The options are:

  1. Get the datasets fixed properly before using them (lots of time/effort required).
  2. Provide the metadata in an existing config file we already use.
  3. (Proposal) Add an option to add extra metadata to datasets when ingesting, either by:
     • rewriting the dataset configs into a temporary directory, or
     • specifying extra data to be added in the ingestion phase.
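The second variant of option 3 can be sketched as an ingestion-time step that applies the extra metadata to copies of the dataset documents, so the source files are never rewritten. This is a minimal sketch assuming dataset documents are plain dicts and the extra metadata is a flat dict of top-level fields; `with_extra_metadata` is a hypothetical name, not an existing datacube-core function.

```python
import copy
from typing import Any, Dict, Iterable, Iterator


def with_extra_metadata(
    docs: Iterable[Dict[str, Any]], extra: Dict[str, Any]
) -> Iterator[Dict[str, Any]]:
    """Yield copies of each dataset document with the `extra` fields applied.

    The source documents are never modified, so nothing has to be written
    back to (or rewritten into a temporary copy of) the original configs.
    """
    for doc in docs:
        augmented = copy.deepcopy(doc)  # leave the caller's document untouched
        augmented.update(extra)  # top-level keys in `extra` replace existing ones
        yield augmented


# Hypothetical usage: inject a licence for datasets that lack one.
source = [{"id": "ls5_1", "platform": "LANDSAT_5"}]
ingested = list(with_extra_metadata(source, {"licence": "CC-BY-4.0"}))
```

Doing this in memory avoids managing a temporary directory of rewritten configs, at the cost of the extra metadata living only in the ingestion config rather than with the source data.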

mpaget commented on June 7, 2024

I'll leave the implementation details to you for now :)

But if it helps, I generally consider three levels of metadata specificity: (1) collection level (static until a major change; the same for all files and variables); (2) variable level (captures differences between bands/variables); and (3) file level (time/space dims). The ingest will likely add a couple more (e.g., ingest or storage unit params) or re-scope the above as appropriate.

In any case, if the set of metadata files is kept and maintained somewhere central/useful, then we can build on the core ones as/when required. I agree that you don't need the full set of metadata fields at this time. But a mechanism to add/join/append metadata files together, either on the fly in the code (knowing which file a metadata element resides in) or by joining multiple metadata files together (forming one big dict in memory), could provide the extensibility you're after.
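The "big dict in memory" approach might look like this minimal sketch, assuming each level (collection, variable, file) is loaded as a nested dict and more specific levels override broader ones. The example level contents and the helpers `deep_merge` and `combine_levels` are illustrative assumptions.

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `override` merged into `base`, recursing into nested dicts.

    Inputs are not mutated; nested dicts that are not overridden are shared, not copied.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def combine_levels(*levels: Dict[str, Any]) -> Dict[str, Any]:
    """Merge metadata levels in order; later (more specific) levels win."""
    result: Dict[str, Any] = {}
    for level in levels:
        result = deep_merge(result, level)
    return result


# Hypothetical level contents, for illustration only.
collection_level = {"licence": "CC-BY-4.0", "attrs": {"institution": "GA"}}
variable_level = {"attrs": {"units": "reflectance"}}
file_level = {"time": "2010-01-01"}
combined = combine_levels(collection_level, variable_level, file_level)
```

Keeping each level in its own file and merging at load time means a collection-wide change (such as a licence) is edited in one place.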

jeremyh commented on June 7, 2024

@omad Since the current test data is generated from eods packages, in the short term we can update Greg's eodataset packager driver to include the licensing fields (there are other fields I'd like to change in them, so I think we need to regenerate the test data anyway).

Longer term, I think it's worth exploring options such as collection "default properties", which are included in a dataset when none have been specified.
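Collection "default properties" could reduce to a simple precedence rule: a dataset's own values win, and collection defaults only fill keys the dataset leaves out. This is a minimal sketch assuming flat, top-level metadata dicts; `apply_collection_defaults` is a hypothetical helper, and nested keys would need a recursive merge instead.

```python
from typing import Any, Dict


def apply_collection_defaults(
    dataset: Dict[str, Any], defaults: Dict[str, Any]
) -> Dict[str, Any]:
    """Return dataset metadata with collection defaults filled in for missing keys.

    Values the dataset specifies itself always take precedence over the defaults.
    Neither input dict is modified.
    """
    return {**defaults, **dataset}


# Hypothetical collection defaults.
defaults = {"licence": "CC-BY-4.0"}
filled = apply_collection_defaults({"id": "a"}, defaults)
kept = apply_collection_defaults({"id": "b", "licence": "https://example.com/licence"}, defaults)
```

Because the dataset still carries the resolved value after ingestion, the licence stays a property of each dataset, as suggested earlier in the thread.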

woodcockr commented on June 7, 2024

@omad Your note about the different public, private and temporary datacubes and their differing needs for metadata "hurdles" is important. Whilst the use case patterns haven't been fully fleshed out (a certain lion we know is working on them), the three formal names coming out of the design workshop are:

  • Custodial collections - which would require the gold metadata standard, NERDIP-style compliance, etc.
  • Analysis and Temporary collections - which seem to be the equivalent of your private/testing datacubes, with minimal hurdles. I would add, though, that AGDC should encourage these to carry enough information that, when a final product is produced (e.g. WOfS), the Custodial level could be built from it, should that be where the result is destined.
  • Private collections - which will be whatever level is required for that private use.

There would be room to add these metadata characteristics to the Use-case-Patterns wiki page at: http://www.datacube.org.au/wiki/Patterns_of_Use#Ingestion_Use_Case_Patterns. I'm sure input from the development team on these design goals and requirements would be welcomed, and it's a more useful record of the outcome than the GitHub chat.
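One way these collection types could translate into metadata "hurdles" is a required-field list per tier, checked at ingestion. This is a minimal sketch: the tier names follow the list above, but the field names (`licence`, `contact`, `lineage`) and the helper `missing_fields` are hypothetical, chosen only to show Custodial collections demanding more than Analysis/Temporary ones, with Private collections demanding nothing by default.

```python
from enum import Enum
from typing import Any, Dict, FrozenSet, List


class CollectionTier(Enum):
    CUSTODIAL = "custodial"
    ANALYSIS_TEMPORARY = "analysis_temporary"
    PRIVATE = "private"


# Hypothetical required fields per tier, for illustration only.
REQUIRED_FIELDS: Dict[CollectionTier, FrozenSet[str]] = {
    CollectionTier.CUSTODIAL: frozenset({"licence", "contact", "lineage"}),
    CollectionTier.ANALYSIS_TEMPORARY: frozenset({"lineage"}),
    CollectionTier.PRIVATE: frozenset(),
}


def missing_fields(doc: Dict[str, Any], tier: CollectionTier) -> List[str]:
    """Return the fields required for `tier` that `doc` lacks (absent, None or ""), sorted."""
    return sorted(f for f in REQUIRED_FIELDS[tier] if doc.get(f) in (None, ""))


doc = {"licence": "CC-BY-4.0"}
custodial_gaps = missing_fields(doc, CollectionTier.CUSTODIAL)
private_gaps = missing_fields(doc, CollectionTier.PRIVATE)
```

Reporting the gaps rather than failing outright would let an Analysis/Temporary collection be ingested while still showing what it would need before promotion to Custodial.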

omad commented on June 7, 2024

@woodcockr Thanks for the use case/terminology pointers, it's really helpful in getting everyone on the same page.

woodcockr commented on June 7, 2024

@omad No problem. Your additions to the docs are welcome as well, for the same reason.

omad commented on June 7, 2024

This type of information can now be specified in the storage mapping files (e.g. GA Landsat 5 NBAR Mapping, line 18).
