Licensing etc. feels like something that should go in the source dataset metadata, as it's ultimately a property of the source dataset (and will presumably remain consistent when output to new formats). The previous EODS dataset metadata format included similar licensing information, but it hasn't been added to the eodatasets structure so far.
We might be able to assume a collection has a single, consistent license, and so be able to add it there. But it feels like something that should go in each dataset's metadata regardless, and we can match to collections based on that property if needed. We'll need input from others (including Ops).
from datacube-core.
Okay, agreed that it does belong as a property of the source dataset.
However, I'm concerned about making the pre-ingestion process overly complicated and prescriptive for someone looking to use a new dataset.
Remember you don't have to include the full licence, you can have in the licence field a link that simply points to a URL for the licence (or a relative path on the filesystem but that's less robust) if CC-BY4.0 is insufficient. -Claire
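For illustration, a dataset document could then carry either a short identifier or a link (the `licence` field name and layout here are assumptions for discussion, not a settled schema):

```yaml
# Hypothetical dataset metadata fragment -- field names are illustrative only
licence:
    href: https://creativecommons.org/licenses/by/4.0/

# or simply a short identifier, when a standard licence applies:
# licence: CC-BY-4.0
```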
I think Jeremy (and Claire) are on the right track here. That's not to say you're wrong, Damian - not at all!
Rather, what do you mean by "pre-ingestion process overly complicated and prescriptive for someone looking to use a new dataset". The metadata elements you refer to are basic and should be, effectively, mandatory across all ingest-ready data sets. I think we have the "authority" to define mandatory and optional metadata elements - and the ones listed above, for my money, should be mandatory.
BTW: I'd include a point of contact (creator and/or custodian) in your list above.
This stuff is essential for a shared, public/semi-public deployment, but for a private/testing datacube it could create hurdles and delays. Especially for someone trying to ingest a third-party dataset. I want to make it as easy for them to solve as possible.
For example:
The Landsat datasets we're experimenting with don't currently include this type of information. Creating compliant Storage Units requires this information from somewhere. The options are:
- Get the datasets fixed properly before using them (lots of time/effort required)
- Provide the metadata in an existing config file we already use.

Proposal: Have an option to add extra metadata into datasets when ingesting. Either:
- Rewrite the dataset configs into a temporary directory
- Specify extra data to be added in the ingestion phase.
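The second option above could be sketched as follows. This is not datacube-core API; all names here are illustrative, showing the idea of overlaying an "extra metadata" section from an ingest config onto each dataset document before it is indexed.

```python
# Sketch only: overlay ingest-config metadata onto a dataset document.
# None of these names are real datacube-core APIs.

def apply_extra_metadata(dataset_doc, extra):
    """Return a copy of the dataset document with missing fields filled in
    from `extra`; values already present in the dataset are kept."""
    patched = dict(extra)
    patched.update(dataset_doc)  # the dataset's own values take precedence
    return patched

# Hypothetical documents:
dataset_doc = {"id": "abc-123", "platform": {"code": "LANDSAT_5"}}
extra = {"licence": "CC-BY-4.0", "provider": "Geoscience Australia"}

patched = apply_extra_metadata(dataset_doc, extra)
# `patched` carries the licence/provider without rewriting the source files
```

This avoids touching the original dataset configs on disk, which is the main attraction over the temporary-directory approach.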
I'll leave the implementation details to you for now :)
But if it helps, I generally consider three levels of metadata specificity:

1. Collection level (static until a major change, same for all files and variables)
2. Variable level (captures differences between bands/variables)
3. File level (time/space dims)

The ingest will likely have a couple more (e.g. ingest or storage unit params) or re-scope the above as appropriate.
In any case, if the set of metadata files is kept and maintained somewhere central/useful, then we can build on the core ones as/when required. I agree that you don't need the full sets of metadata fields at this time. But a mechanism to add/join/append metadata files together - either on the fly in the code (knowing in which file a metadata element resides) or joining multiple metadata files together (forming a big dict in memory) - could provide the extensibility you're after.
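The "big dict in memory" approach could be sketched like this. It is a minimal illustration, not datacube-core code: the three metadata levels are merged recursively, with the more specific level overriding the more general one. Field names are illustrative assumptions.

```python
# Sketch: merge layered metadata (collection -> variable -> file) into one
# dict in memory. Not a real datacube-core function.

def deep_merge(base, override):
    """Merge nested mappings from `override` into `base`; scalar values in
    `override` win. Returns a new dict, leaving both inputs untouched."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Hypothetical contents of three metadata files:
collection_level = {"licence": "CC-BY-4.0", "platform": {"code": "LANDSAT_5"}}
variable_level = {"band": {"name": "nbar_red", "dtype": "int16"}}
file_level = {"extent": {"center_dt": "2010-01-01T00:00:00"}}

dataset_doc = deep_merge(deep_merge(collection_level, variable_level),
                         file_level)
# `dataset_doc` now holds the collection licence plus the file-level extent
```

In practice each level would likely be loaded from its own YAML file before merging; the point is that the layering keeps collection-wide facts (like licensing) in one place.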
@omad Since the current test data is generated from eods packages, in the short-term we can update Greg's eodataset packager driver to include the licensing fields (there are other fields I'd like to change in them, so I think we need to regenerate the test data anyway).
Longer term, I think it's worth exploring options such as collection "default properties", which are included in a dataset when none have been specified.
@omad your noting of the different public, private and temporary datacubes and the differing needs for metadata "hurdles" is important. Whilst the use case patterns haven't been fully fleshed out (a certain lion we know is working on them), the three formal names coming out of the design workshop are:
- Custodial collections - which would require the gold meta-data standard, NERDIP style compliance, etc.
- Analysis and Temporary collections - which seems to be the equivalent of your private/testing with minimal hurdles. I would add though that AGDC should encourage these to have sufficient information so that when a final product is produced (e.g. WOfS) there is enough information to be able to build the Custodial level, should that be where the result is destined.
- Private collections: which will be whatever level is required for that private use
There would be room to add these metadata characteristics to the Use-case-Patterns wiki page at: http://www.datacube.org.au/wiki/Patterns_of_Use#Ingestion_Use_Case_Patterns. I'm sure input from the development team on these design goals and requirements would be welcomed, and it's a more useful record of the outcome than the GitHub chat.
@woodcockr Thanks for the use case/terminology pointers, it's really helpful in getting everyone on the same page.
@omad no problem. Your additions to the docs are welcome as well, for the same reason.
This type of information can now be specified in the storage mapping files (e.g. GA Landsat 5 NBAR Mapping, line 18).
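For readers without the linked file handy, the relevant block looks roughly like the following. This is a paraphrased sketch, not the exact file contents; the section name and values are assumptions, so consult the linked mapping for the real schema:

```yaml
# Illustrative only -- see the linked GA Landsat 5 NBAR mapping file
# for the actual field names and values.
global_attributes:
    title: Landsat 5 NBAR 25 metre
    institution: Commonwealth of Australia (Geoscience Australia)
    license: CC BY Attribution 4.0 International License
```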