Comments (7)
@wood-chris there is a scheming mapping for DCAT-AP Switzerland (which is a slightly adapted version of DCAT-AP EU) here: https://github.com/opendata-swiss/ckanext-switzerland/blob/master/ckanext/switzerland/dcat-ap-switzerland_scheming.json
from ckanext-dcat.
I am using ckanext-scheming to define schemas for special dataset types.
Link to example schema: https://github.com/ckan/ckanext-scheming/blob/master/ckanext/scheming/camel_photos.json
I would like to have the metadata of these data types automatically represented in DCAT via this extension. I'm pretty sure that is the scope described by this issue, but just clarifying.
Currently, when I append .jsonld or .rdf to the dataset URL, it displays only the standard metadata and ignores the other fields. Do those fields need to have a namespace defined for them somehow?
Is there a recommended way to achieve this functionality?
from ckanext-dcat.
I think this is what I've been looking for (and hoping someone else would have done). When I first saw that this extension was compliant with dcat-ap, I assumed that meant it would automagically create the relevant fields (with the required label for properties that are mandatory in the AP), but it seems like I was a bit optimistic!
Does a scheming schema for this already exist? If not, it looks like I'll need to create one. If it's helpful to add it to a repo of common schemas (that may or may not already exist), I'd be happy to do so once it's done.
from ckanext-dcat.
Dumping some thoughts here on scheming support.
At the time the processors (parsers/serializers) that map between CKAN and DCAT were written, ckanext-scheming was only starting to become widespread, so custom DCAT fields that didn't map directly to standard CKAN fields were stored as extras (see all the ones marked extra: here). So the DCAT version_notes field would be stored as:
{
"name": "test_dataset_dcat",
"title": "Test dataset DCAT",
// ....
"extras": [
{"key": "version_notes", "value": "Some version notes"}
// ....
]
}
The pattern nowadays is to create custom fields in a scheming schema, which internally handles the conversion to / from extras:
{
"name": "test_dataset_dcat",
"title": "Test dataset DCAT",
"version_notes": "Some version notes"
// ....
}
That's the goal of creating a DCAT scheming schema: all properties become custom fields of the CKAN dataset, aligned with the dcat:Dataset ones.
The difficulty here is how to offer support for existing sites using previous versions of the extension.
- For the CKAN -> DCAT (Serialization) direction it should be fine. The serializers support both forms, and for a given field (like version_notes) will check first for a root level field in the dataset_dict and, if it's not there, look for an extra with that key (here)
- For the DCAT -> CKAN (Parsing) direction, we will find issues. As I said, the current parsers will store the custom fields in extras, which will make the resulting dataset dict incompatible with a scheming schema that defines the field as a dataset field. When creating or updating, we will get the following error:
File "/home/adria/dev/pyenvs/ckan-py3.9/ckan/ckan/logic/action/create.py", line 185, in package_create
raise ValidationError(errors)
ckan.logic.ValidationError: None - {'extras': [{'key': ['There is a schema field with the same name']}]}
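To make the serializer behaviour described above concrete, here is a minimal sketch of a root-first lookup with an extras fallback. The helper name get_dataset_value is hypothetical and is not the extension's actual function; it only illustrates the lookup order.

```python
def get_dataset_value(dataset_dict, key, default=None):
    """Return a field value, preferring a root-level field over an extra.

    Hypothetical helper for illustration, not ckanext-dcat's own code.
    """
    # Root-level field (the scheming-style layout).
    if key in dataset_dict:
        return dataset_dict[key]
    # Fall back to the legacy layout, where the value lives in `extras`.
    for extra in dataset_dict.get("extras", []):
        if extra.get("key") == key:
            return extra.get("value")
    return default


legacy = {
    "name": "test_dataset_dcat",
    "extras": [{"key": "version_notes", "value": "Some version notes"}],
}
scheming = {
    "name": "test_dataset_dcat",
    "version_notes": "Some version notes",
}

# Both layouts resolve to the same value.
legacy_value = get_dataset_value(legacy, "version_notes")
scheming_value = get_dataset_value(scheming, "version_notes")
```

Because the lookup accepts both layouts, serialization keeps working for existing sites; the incompatibility only shows up in the parsing direction.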
The second case is relevant to CKAN sites that import DCAT RDF representations from other systems (I'd imagine that in most cases through the DCAT harvester) and create CKAN datasets from them.
My current thinking on how to approach this is:
- Change the default parsers so they store DCAT fields as root fields rather than extras, keeping the old behaviour via config option [1]
- This will go into a new major ckanext-dcat version (2.0.0), with clear documentation:
  - New sites using scheming will just need to use the included schema and everything should work
  - New sites that don't want to use scheming (unlikely) will have to set the config option, otherwise the custom fields will be ignored
  - Existing sites that have custom DCAT profiles and want to use scheming will have to store custom fields as root level fields (or drop any custom validators/logic they are currently using to make dcat and scheming work together)
  - Existing sites that don't want to use scheming and want to keep the old behaviour just need to set the config option [1]
[1] At first I thought about being clever and inspecting the dataset schema (if scheming was being used) to see if there were DCAT fields defined, and not storing the values in extras if so, but that seemed brittle, and sites could have different schemas in use, potentially even for different DCAT versions. I think it's better to be explicit and make site maintainers
Sorry if this is a bit convoluted, here's a TLDR:
In the next major version of ckanext-dcat I want to change the RDF DCAT parsers so they store custom DCAT fields as first level CKAN dataset fields instead of dataset extras, but keep the old behaviour via a config option for backwards compatibility.
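As a rough illustration of the proposed parser change, the sketch below stores a parsed value either as a root-level field or as an extra. The function store_dcat_field and its use_extras flag are hypothetical stand-ins; the thread does not name the actual config option.

```python
def store_dcat_field(dataset_dict, key, value, use_extras=False):
    """Store a parsed DCAT value as a root field or as an extra.

    `use_extras` is a hypothetical stand-in for the backwards
    compatibility config option mentioned in the thread.
    """
    if not use_extras:
        # Proposed default: first-level dataset field.
        dataset_dict[key] = value
        return dataset_dict
    # Old behaviour: keep the value in the `extras` list,
    # replacing any existing extra with the same key.
    extras = [e for e in dataset_dict.get("extras", []) if e.get("key") != key]
    extras.append({"key": key, "value": value})
    dataset_dict["extras"] = extras
    return dataset_dict


new_style = store_dcat_field(
    {"name": "test_dataset_dcat"}, "version_notes", "Some version notes"
)
old_style = store_dcat_field(
    {"name": "test_dataset_dcat"},
    "version_notes",
    "Some version notes",
    use_extras=True,
)
```

Only one of the two locations is written for a given key, which avoids the "There is a schema field with the same name" validation error when the schema defines the field at the root level.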
from ckanext-dcat.
Chatting with @wardi about this, actually using scheming_dataset_schema_show to check if there is a schema that contains DCAT fields could be a really good approach, as besides detecting if values need to be stored at the root level, we could use it to mark DCAT fields with certain keys in the schema (like dcat_validators, etc). These keys are available in the output of scheming_dataset_schema_show, and to snippets, validators etc.
To the point of knowing what schema to use, a good approach might be:
- One explicitly provided when creating the Profile class: for instance, for sites with harvesters that have more than one dataset schema, that could be a harvester config option
- If scheming is loaded, default to the dataset schema
- If scheming is not loaded or there is no schema defined, fall back to store things in extras
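The resolution order above could be sketched roughly as follows. This is only an illustration under stated assumptions: resolve_schema and its parameters are hypothetical names, and a real implementation would presumably query scheming (for example via scheming_dataset_schema_show) rather than take a plain list of schema names.

```python
def resolve_schema(explicit_schema=None, scheming_loaded=False, defined_schemas=()):
    """Return (schema_name, store_in_extras) following the order in the list.

    All names here are hypothetical; the inputs stand in for whatever the
    Profile class and the scheming plugin would actually provide.
    """
    # 1. A schema explicitly provided when creating the profile wins.
    if explicit_schema is not None:
        return explicit_schema, False
    # 2. If scheming is loaded, default to the `dataset` schema.
    if scheming_loaded and "dataset" in defined_schemas:
        return "dataset", False
    # 3. No scheming or no schema defined: store values in extras.
    return None, True


explicit = resolve_schema("camel-photos", scheming_loaded=True,
                          defined_schemas=("dataset", "camel-photos"))
default = resolve_schema(scheming_loaded=True, defined_schemas=("dataset",))
fallback = resolve_schema(scheming_loaded=False)
```

Returning the storage decision together with the schema name keeps the extras fallback explicit for callers such as a harvester.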
from ckanext-dcat.
PR with summary of work done so far here: #281
from ckanext-dcat.