Comments (6)
- Endpoints on the KNB should use the DataONE MN.get() REST endpoint, so for example, for doi:10.5063/AA/nceas.912.9:
https://knb.ecoinformatics.org/knb/d1/mn/v1/object/doi:10.5063%2FAA%2Fnceas.912.9
However, note that we also recommend using the DataONE CN.resolve() service to find the nodes that both hold a copy of an object and are currently available on the network. The resolve() call returns a list of those nodes, along with the REST URL for retrieving the object from each. So, for example:
$ curl -s https://cn.dataone.org/cn/v1/resolve/doi%3A10.5063%2FAA%2Fnceas.912.9 | xmlstarlet fo
<?xml version="1.0" encoding="UTF-8"?>
<d1:objectLocationList xmlns:d1="http://ns.dataone.org/service/types/v1">
<identifier>doi:10.5063/AA/nceas.912.9</identifier>
<objectLocation>
<nodeIdentifier>urn:node:KNB</nodeIdentifier>
<baseURL>https://knb.ecoinformatics.org/knb/d1/mn</baseURL>
<version>v1</version>
<url>https://knb.ecoinformatics.org/knb/d1/mn/v1/object/doi:10.5063%2FAA%2Fnceas.912.9</url>
</objectLocation>
<objectLocation>
<nodeIdentifier>urn:node:CN</nodeIdentifier>
<baseURL>https://cn.dataone.org/cn</baseURL>
<version>v1</version>
<url>https://cn.dataone.org/cn/v1/object/doi:10.5063%2FAA%2Fnceas.912.9</url>
</objectLocation>
</d1:objectLocationList>
- Regarding IDs, the EML spec leaves it open other than saying they must be unique in the document. The point is to provide an unambiguous identifier to reference <attribute> definitions in EML. These can then be used in other places to refer to those attribute definitions.
- EML doesn't specify how to define an attribute beyond using the natural language definition. That said, for OBOE we have come up with an annotation syntax that could be used in the additionalMetadata section to provide a linkage between the attribute definition in EML and an ontology. Some examples of its use are in SVN (https://code.ecoinformatics.org/code/semtools/trunk/dev/sms/examples). This is probably more complicated than you are looking for, as it maps several different semantic aspects of the data set, including the Characteristic being measured (what you are looking for I think), as well as the Entity being measured, the MeasurementStandard used (redundant with other fields in EML), and the Context. This is the mapping we've been experimenting with in Semtools and is the basis of the figure that you included in issue #8. There is an XML Schema for the annotation syntax in the directory above the examples. The annotation is in XML, but it could also be done in RDF, which would merge better with the OBOE OWL ontology. In addition, we debated over whether it's better to include the annotation inline in the EML document (which nicely packages them together), or to provide a separate annotation file (which allows people other than the EML owner to provide annotations, and lets us annotate metadata files other than EML, such as FGDC). Which is best is still under discussion in our group. We have built out a prototype extension of Morpho that produces these annotations as separate files, and then a Metacat search service that knows how to use them to do semantic-driven searches and data integration tasks. It would be great to discuss how this relates to what you are trying to do in R, and what we might adapt for compatibility.
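To make the CN.resolve() lookup from the first point concrete, here is a minimal Python sketch. It only builds the resolve URL and parses a response of the shape shown above; it makes no network call. The function names `resolve_url` and `parse_locations` are made up for illustration, and the embedded sample is a trimmed copy of the response quoted above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Base URL taken from the curl example in this thread.
CN_RESOLVE = "https://cn.dataone.org/cn/v1/resolve/"

def resolve_url(pid):
    """Build the CN.resolve() URL, percent-encoding ':' and '/' in the identifier."""
    return CN_RESOLVE + quote(pid, safe="")

def parse_locations(xml_text):
    """Return (nodeIdentifier, url) pairs from an objectLocationList document."""
    root = ET.fromstring(xml_text)
    # In the sample response only the root element carries the d1 prefix,
    # so the child elements are looked up without a namespace.
    return [
        (loc.findtext("nodeIdentifier"), loc.findtext("url"))
        for loc in root.findall("objectLocation")
    ]

# Trimmed copy of the response shown above (hypothetical stand-in for a live call).
SAMPLE = """<d1:objectLocationList xmlns:d1="http://ns.dataone.org/service/types/v1">
  <identifier>doi:10.5063/AA/nceas.912.9</identifier>
  <objectLocation>
    <nodeIdentifier>urn:node:KNB</nodeIdentifier>
    <url>https://knb.ecoinformatics.org/knb/d1/mn/v1/object/doi:10.5063%2FAA%2Fnceas.912.9</url>
  </objectLocation>
  <objectLocation>
    <nodeIdentifier>urn:node:CN</nodeIdentifier>
    <url>https://cn.dataone.org/cn/v1/object/doi:10.5063%2FAA%2Fnceas.912.9</url>
  </objectLocation>
</d1:objectLocationList>"""

if __name__ == "__main__":
    print(resolve_url("doi:10.5063/AA/nceas.912.9"))
    for node, url in parse_locations(SAMPLE):
        print(node, url)
```

A client could then try each returned URL in turn until one node responds.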
from eml.
Re 1. This is great, can definitely implement this kind of call.
I am curious about what we can offer, if anything, by way of search interfaces for EML data through the reml R package. Initially I was thinking about querying across large sets of EML files for matching column types, for data integration etc. Though EML files are generally pretty small, still, downloading and parsing large numbers of them might not be the best way to go. Thoughts?
Anyway, something to think about down the line at least.
from eml.
Okay, I'm now thinking that adding RDF to the additionalMetadata section and using describes references (as discussed above and more in #9) is the best way to go about adding semantic definitions, rather than relying on the external semtools schema for this (as we considered in issue #8). When asked about using the semtools schema, Ben makes the case for this approach quite eloquently:
While we did use the sms annotation schema in the Semtools project, I can't say that I think you should also use it. I'd be more interested in seeing a "purer" semantic approach to storing those types of annotations (e.g., "this column of this data table is measured in Gram"). Basically, these are all RDF triples. I'm not sure if Shawn Bowers - the one who first drew up the sms annotation schema - is still advocating its use, but it was experimental even in the heyday of Semtools. One of the major issues with this annotation approach is that it is another independent file that describes the EML file. This gets annoying when you try to have tools work with the many files. You could potentially embed the annotation - or any XML - in EML's additionalMetadata section.
I think the really clever thing here is that the metadata tag is flexible enough for us to just add RDF directly, as Ben illustrates like this:
<eml>
…
<dataTable id="http://some.namespace#myUniqueEntityId1">
<attribute id="http://some.namespace#myUniqueAttributeId1"/>
<attribute id="http://some.namespace#myUniqueAttributeId2"/>
</dataTable>
…
<additionalMetadata>
<describes>http://some.namespace#myUniqueAttributeId1</describes>
<metadata>
<!-- RDF stuff here that annotates http://some.namespace#myUniqueAttributeId1 -->
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:o="http:/oboe-core#">
<rdf:Description rdf:about="http://some.namespace#myUniqueAttributeId1">
<o:entity>Air</o:entity>
<o:characteristic>Temperature</o:characteristic>
<o:unit>Celsius</o:unit>
</rdf:Description>
</rdf:RDF>
</metadata>
</additionalMetadata>
<additionalMetadata>
<describes>http://some.namespace#myUniqueAttributeId2</describes>
<metadata>
<!-- RDF stuff here that annotates http://some.namespace#myUniqueAttributeId2 -->
</metadata>
</additionalMetadata>
</eml>
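As a rough illustration of how a tool might read annotations embedded this way, here is a minimal Python sketch that pulls (subject, predicate, value) triples out of the rdf:Description nodes. It is not how reml or Metacat does it. The function name `embedded_triples` is hypothetical, the sample is a cut-down copy of the example above, and the unit is written as `o:unit` so that every prefix in the sample is declared.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def embedded_triples(eml_text):
    """Collect (subject, predicate URI, literal) triples from embedded rdf:Description nodes."""
    root = ET.fromstring(eml_text)
    triples = []
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # ElementTree spells qualified names as "{namespace}local".
            namespace, _, local = prop.tag[1:].partition("}")
            triples.append((subject, namespace + local, prop.text))
    return triples

# Cut-down copy of the example above; the namespace URIs are the thread's placeholders.
SAMPLE = """<eml>
  <dataTable id="http://some.namespace#myUniqueEntityId1">
    <attribute id="http://some.namespace#myUniqueAttributeId1"/>
  </dataTable>
  <additionalMetadata>
    <describes>http://some.namespace#myUniqueAttributeId1</describes>
    <metadata>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:o="http:/oboe-core#">
        <rdf:Description rdf:about="http://some.namespace#myUniqueAttributeId1">
          <o:entity>Air</o:entity>
          <o:characteristic>Temperature</o:characteristic>
          <o:unit>Celsius</o:unit>
        </rdf:Description>
      </rdf:RDF>
    </metadata>
  </additionalMetadata>
</eml>"""

if __name__ == "__main__":
    for triple in embedded_triples(SAMPLE):
        print(triple)
```

Because the subject comes from rdf:about rather than from describes, the same function would work whether the RDF sits in one additionalMetadata block or in several.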
A few questions:
- Would it be preferable to use RDFa (like NeXML does) instead of RDF, which would presumably allow us to extract the RDF data into a pure RDF file using standard tools (e.g. http://www.w3.org/2012/pyRdfa/#distill_by_uri)? Or is there a good reason to prefer embedded RDF, as above?
- Ben points out that one option would be, rather than a separate additionalMetadata for each attribute, to have one additionalMetadata referencing the root EML id in describes, since the rdf:Description node points to the attribute already anyhow. Any reason to prefer one approach over the other?
- Presumably we could generate this automatically for standard units. We could also generate this automatically for species names, along with adding the appropriate EML version of coverage? Or would it be better to have a single coverage node with all the taxonomic coverage, etc? (Basically a question of how other tools are using the coverage nodes. Since it sounds like they are just using them at the aggregate level to identify EML files containing certain coverage, rather than at the attribute level to give semantic meaning to columns, maybe there is no point in doing the latter? This issue was already touched upon in #9, though undecided.)
- What namespace do we put the attribute ids under (both in the <rdf:Description rdf:about="http://some.namespace#myUniqueAttributeId1"> and in the describes nodes)?
- Obviously we simply don't have ontological meaning for lots of terms. For a first pass, I imagine reml adding this annotation 'silently' in the above cases where we can probably automatically interpret (or infer from the schema) the semantic meaning. The harder challenge is thinking how a user might specify additional semantic annotations of elements without expert knowledge of the schema, the relevant ontology, and lots of hand-crafting. Maybe that's an impossible problem.
from eml.
@mbjones Just brainstorming about adding semantics here, since Ben wasn't enthusiastic about the semtools XSD route. Would love to hear what you think about this approach when you get back.
I've just added an example in which semantic metadata is included using RDFa. Building on Ben's suggestions, the additionalMetadata node looks like:
<additionalMetadata>
<describes>1838</describes>
<metadata>
<subject about="http://some.namespace#1838" xmlns:o="http:/oboe-core#"
xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:skos="http://www.w3.org/2004/02/skos/core#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:nex="http://www.nexml.org/2009">
<meta property="o:entity" content="Air" datatype="xsd:string"/>
<meta property="o:characteristic" content="Temperature" datatype="xsd:string"/>
<meta property="o:unit" content="Celsius" datatype="xsd:string"/>
</subject>
</metadata>
</additionalMetadata>
I believe this has a few advantages over the (potentially deprecated?) semtools XML annotations or RDF nodes:
- A dumb parser (e.g. without any knowledge of the schema) could still extract the triples, in any desired format (RDF, turtle, etc). For instance, w3c's pyRdfa gives
@prefix o: <http:/oboe-core#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<http://some.namespace#1838> o:characteristic "Temperature"^^xsd:string;
o:entity "Air"^^xsd:string;
o:unit "Celsius"^^xsd:string .
- we have semantics embedded in the EML file in a language natural for the expression of semantic data.
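To illustrate the "dumb parser" point, here is a minimal Python sketch that extracts the triples from the meta elements using only the standard library. It is a stand-in for illustration, not a conforming RDFa processor and not what pyRdfa does internally. The function names `rdfa_meta_triples` and `to_turtle` are hypothetical, and the sample keeps only the two namespace declarations that the meta elements actually use.

```python
import io
import xml.etree.ElementTree as ET

def rdfa_meta_triples(xml_text):
    """Return (prefix map, triples) read from <subject about=...><meta .../></subject>."""
    prefixes = {}
    triples = []
    subject = None
    stream = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events expose the xmlns declarations that a plain parse discards.
    for event, item in ET.iterparse(stream, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item
            prefixes[prefix] = uri
        elif item.tag == "subject":
            subject = item.get("about")
        elif item.tag == "meta" and subject is not None:
            triples.append(
                (subject, item.get("property"), item.get("content"), item.get("datatype"))
            )
    return prefixes, triples

def to_turtle(prefixes, triples):
    """Write the triples as simple Turtle, declaring only the prefixes actually used."""
    used = sorted({curie.split(":")[0] for _, p, _, d in triples for curie in (p, d)})
    lines = [f"@prefix {name}: <{prefixes[name]}> ." for name in used]
    for subject, prop, content, datatype in triples:
        lines.append(f'<{subject}> {prop} "{content}"^^{datatype} .')
    return "\n".join(lines)

# Cut-down copy of the additionalMetadata node shown above.
SAMPLE = """<additionalMetadata>
  <describes>1838</describes>
  <metadata>
    <subject about="http://some.namespace#1838" xmlns:o="http:/oboe-core#"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
      <meta property="o:entity" content="Air" datatype="xsd:string"/>
      <meta property="o:characteristic" content="Temperature" datatype="xsd:string"/>
      <meta property="o:unit" content="Celsius" datatype="xsd:string"/>
    </subject>
  </metadata>
</additionalMetadata>"""

if __name__ == "__main__":
    print(to_turtle(*rdfa_meta_triples(SAMPLE)))
```

The sketch writes one statement per line instead of grouping predicates with `;` as pyRdfa does, but the triples are the same.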
One concern is that the contents of our additionalMetadata node are not very human-readable in this form. Nonetheless, it is still reasonably easy to understand when we render the EML file as a "plain text" format by coercing it into yaml:
describes: '1838'
metadata:
subject:
meta:
- o:entity
- Air
- xsd:string
meta:
- o:characteristic
- Temperature
- xsd:string
meta:
- o:unit
- Celsius
- xsd:string
.attrs: http://some.namespace#1838
Though perhaps "Air Temperature Celsius" would be the preferred human version. In any event, that can be added directly to the text. (Yeah, the example EML attribute at 1838 isn't actually about temperature, this is just a quick demo of what adding semantics might be about).
We still have the design consideration questions above to address. Semantics could be added automatically for Dublin Core terms (things like title, creators, publication date, etc.), and for cases like standard units or taxonomic names (at least when stated in coverage nodes if not in attributes) that we can resolve from the schema logic.
In the long run, ideally additional functions will allow the user to add arbitrary annotations for EML elements through reml.
from eml.
@mbjones One quick related issue: for some reason, my example file does not validate against the online validator. I get the error:
> doc <- saveXML(xmlParse("rdfa_example.xml"))
> eml_validate(doc)
$`EML specific tests`
[1] "Error processing keyrefs: //additionalMetadata/describes : Error in xml document. This EML instance is invalid because referenced id 1838 does not exist in the given keys."
even though there is indeed a node with id="1838", so I'm not sure what I did wrong.
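For a quick local sanity check before hitting the online validator, the keyref idea behind this error can be sketched in Python as below. This is only an approximation of the rule the message describes (every describes value must match some id in the document), not the validator's actual logic. The function name `dangling_describes` and the second describes value `9999` are made up for the example.

```python
import xml.etree.ElementTree as ET

def dangling_describes(eml_text):
    """Return describes values that do not match any id attribute in the document."""
    root = ET.fromstring(eml_text)
    known_ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    missing = []
    for describes in root.iter("describes"):
        ref = (describes.text or "").strip()
        if ref not in known_ids:
            missing.append(ref)
    return missing

# Skeleton document: one describes points at an existing attribute id,
# the other (9999) deliberately points at nothing.
SAMPLE = """<eml>
  <dataset>
    <dataTable>
      <attributeList>
        <attribute id="1838"/>
      </attributeList>
    </dataTable>
  </dataset>
  <additionalMetadata>
    <describes>1838</describes>
    <metadata/>
  </additionalMetadata>
  <additionalMetadata>
    <describes>9999</describes>
    <metadata/>
  </additionalMetadata>
</eml>"""

if __name__ == "__main__":
    print(dangling_describes(SAMPLE))
```

If a check like this reports nothing for a file that the online parser still rejects, that points at the parser's configuration rather than at the document.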
from eml.
The EML parser was not actually configured to parse attribute@id values as valid references in the additionalMetadata/describes field. I've fixed this and will deploy it soon. Parsing errors aside, the sample EML+RDF looks pretty workable as it stands, but the more I think about it, you should probably just use a single additionalMetadata/describes block for all the RDF instead of little bits for each attribute. This will be easier for parsing the RDF in one go and, as you mention, the RDF explicitly references the attribute@id values anyway as the subject.
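A single-block layout along these lines could be generated roughly as in the Python sketch below. It is only one way the suggestion might look. The function name `build_annotation_block`, the root id `eml-root`, and the second attribute id are hypothetical, and the OBOE namespace URI is the placeholder used earlier in this thread.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OBOE = "http:/oboe-core#"  # placeholder URI copied from the examples in this thread

def build_annotation_block(root_id, annotations):
    """Build one additionalMetadata element holding an rdf:Description per attribute."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("o", OBOE)
    block = ET.Element("additionalMetadata")
    ET.SubElement(block, "describes").text = root_id
    metadata = ET.SubElement(block, "metadata")
    rdf = ET.SubElement(metadata, f"{{{RDF}}}RDF")
    for about, properties in annotations.items():
        description = ET.SubElement(
            rdf, f"{{{RDF}}}Description", {f"{{{RDF}}}about": about}
        )
        for name, value in properties.items():
            ET.SubElement(description, f"{{{OBOE}}}{name}").text = value
    return block

if __name__ == "__main__":
    block = build_annotation_block(
        "eml-root",  # hypothetical id of the root EML element
        {
            "http://some.namespace#1838": {
                "entity": "Air", "characteristic": "Temperature", "unit": "Celsius",
            },
            "http://some.namespace#1839": {"entity": "Air"},  # hypothetical second attribute
        },
    )
    print(ET.tostring(block, encoding="unicode"))
```

Each rdf:Description still names its attribute through rdf:about, so nothing is lost by dropping the per-attribute describes entries.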
from eml.