Licensing etc. feels like something that should go in the source dataset metadata, as it's ultimately a property of the source dataset (and will presumably remain consistent when output to new formats). The previous EODS dataset metadata format included similar licensing information, but it hasn't been added to the eodatasets structure so far.
We might be able to assume a collection has a single, consistent license, and so be able to add it there. But it feels like something that should go in each dataset's metadata regardless, and we can match to collections based on that property if needed. We'll need input from others (including Ops).
from datacube-core.
Okay, agreed that it does belong as a property of the source dataset.
However, I'm concerned about making the pre-ingestion process overly complicated and prescriptive for someone looking to use a new dataset.
Remember you don't have to include the full licence, you can have in the licence field a link that simply points to a URL for the licence (or a relative path on the filesystem but that's less robust) if CC-BY4.0 is insufficient. -Claire
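For illustration, a dataset document could then carry either a short identifier or a link (the `licence` field name and layout here are assumptions for discussion, not a settled schema):

```yaml
# Hypothetical dataset metadata fragment -- field names are illustrative only
licence:
    href: https://creativecommons.org/licenses/by/4.0/

# or simply a short identifier, when a standard licence applies:
# licence: CC-BY-4.0
```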
I think Jeremy (and Claire) are on the right track here. That's not to say you're wrong, Damian - not at all!
Rather, what do you mean by "pre-ingestion process overly complicated and prescriptive for someone looking to use a new dataset". The metadata elements you refer to are basic and should be, effectively, mandatory across all ingest-ready data sets. I think we have the "authority" to define mandatory and optional metadata elements - and the ones listed above, for my money, should be mandatory.
BTW: I'd include a point of contact (creator and/or custodian) in your list above.
This stuff is essential for a shared, public/semi-public deployment, but for a private/testing datacube it could create hurdles and delays. Especially for someone trying to ingest a third-party dataset. I want to make it as easy for them to solve as possible.
For example:
The Landsat datasets we're experimenting with don't currently include this type of information. Creating compliant Storage Units requires this information from somewhere. The options are:
- Get the datasets fixed properly before using them (lots of time/effort required)
- Provide the metadata in an existing config file we already use.

Proposal: Have an option to add extra metadata into datasets when ingesting. Either:
- Rewrite the dataset configs into a temporary directory
- Specify extra data to be added in the ingestion phase.
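The second option above could be sketched as follows. This is not datacube-core API; all names here are illustrative, showing the idea of overlaying an "extra metadata" section from an ingest config onto each dataset document before it is indexed.

```python
# Sketch only: overlay ingest-config metadata onto a dataset document.
# None of these names are real datacube-core APIs.

def apply_extra_metadata(dataset_doc, extra):
    """Return a copy of the dataset document with missing fields filled in
    from `extra`; values already present in the dataset are kept."""
    patched = dict(extra)
    patched.update(dataset_doc)  # the dataset's own values take precedence
    return patched

# Hypothetical documents:
dataset_doc = {"id": "abc-123", "platform": {"code": "LANDSAT_5"}}
extra = {"licence": "CC-BY-4.0", "provider": "Geoscience Australia"}

patched = apply_extra_metadata(dataset_doc, extra)
# `patched` carries the licence/provider without rewriting the source files
```

This avoids touching the original dataset configs on disk, which is the main attraction over the temporary-directory approach.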
I'll leave the implementation details to you for now :)
But if it helps, I generally consider three levels of metadata specificity:

1. Collection level (static until a major change, same for all files and variables)
2. Variable level (captures differences between bands/variables)
3. File level (time/space dims)

The ingest will likely have a couple more (e.g. ingest or storage unit params) or re-scope the above as appropriate.
In any case, if the set of metadata files is kept and maintained somewhere central/useful, then we can build on the core ones as/when required. I agree that you don't need the full sets of metadata fields at this time. But a mechanism to add/join/append metadata files together - either on the fly in the code (knowing in which file a metadata element resides) or joining multiple metadata files together (forming a big dict in memory) - could provide the extensibility you're after.
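The "big dict in memory" approach could be sketched like this. It is a minimal illustration, not datacube-core code: the three metadata levels are merged recursively, with the more specific level overriding the more general one. Field names are illustrative assumptions.

```python
# Sketch: merge layered metadata (collection -> variable -> file) into one
# dict in memory. Not a real datacube-core function.

def deep_merge(base, override):
    """Merge nested mappings from `override` into `base`; scalar values in
    `override` win. Returns a new dict, leaving both inputs untouched."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Hypothetical contents of three metadata files:
collection_level = {"licence": "CC-BY-4.0", "platform": {"code": "LANDSAT_5"}}
variable_level = {"band": {"name": "nbar_red", "dtype": "int16"}}
file_level = {"extent": {"center_dt": "2010-01-01T00:00:00"}}

dataset_doc = deep_merge(deep_merge(collection_level, variable_level),
                         file_level)
# `dataset_doc` now holds the collection licence plus the file-level extent
```

In practice each level would likely be loaded from its own YAML file before merging; the point is that the layering keeps collection-wide facts (like licensing) in one place.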
@omad Since the current test data is generated from eods packages, in the short-term we can update Greg's eodataset packager driver to include the licensing fields (there are other fields I'd like to change in them, so I think we need to regenerate the test data anyway).
Longer term, I think it's worth exploring options such as collection "default properties", which are included in a dataset when none have been specified.
@omad your noting of the different public, private and temporary datacubes and the differing needs for metadata "hurdles" is important. Whilst the use case patterns haven't been fully fleshed out (a certain lion we know is working on them), the three formal names coming out of the design workshop are:
- Custodial collections - which would require the gold meta-data standard, NERDIP style compliance, etc.
- Analysis and Temporary collections - which seems to be the equivalent of your private/testing with minimal hurdles. I would add though that AGDC should encourage these to have sufficient information so that when a final product is produced (e.g. WOfS) there is enough information to be able to build the Custodial level, should that be where the result is destined.
- Private collections: which will be whatever level is required for that private use
There would be room to add these metadata characteristics to the Use-case-Patterns wiki page at: http://www.datacube.org.au/wiki/Patterns_of_Use#Ingestion_Use_Case_Patterns. I'm sure input from the development team on these design goals and requirements would be welcomed, and it's a more useful record of the outcome than the GitHub chat.
@woodcockr Thanks for the use case/terminology pointers, it's really helpful in getting everyone on the same page.
@omad no problem. Your additions to the docs are welcome as well, for the same reason.
This type of information can now be specified in the storage mapping files (e.g. GA Landsat 5 NBAR Mapping, line 18).
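For readers without the linked file handy, the relevant block looks roughly like the following. This is a paraphrased sketch, not the exact file contents; the section name and values are assumptions, so consult the linked mapping for the real schema:

```yaml
# Illustrative only -- see the linked GA Landsat 5 NBAR mapping file
# for the actual field names and values.
global_attributes:
    title: Landsat 5 NBAR 25 metre
    institution: Commonwealth of Australia (Geoscience Australia)
    license: CC BY Attribution 4.0 International License
```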