Comments (22)
@mbjones Is this a standard thing to do? Recommendation for how we encode it?
from eml.
I think the best place to add it is in /eml/dataset/methods/methodStep/software, and in the sibling description element describe the role that reml played in generating the metadata. You might also want to add the citation in that subtree to REML. EML is pretty flexible, so there are other options as well, but I think this is the most appropriate.
from eml.
Excellent. Since this will create a software node, we may as well write eml_software first, and eml_R_software, then we can create the software node with eml_software("reml"), see #32
@mbjones Um, I'm not spotting the documentation for how a sibling description element should be constructed?
from eml.
Schema diagram is here:
http://knb.ecoinformatics.org/software/eml/eml-2.1.1/eml-methods.png
I'm referring to the three sibling elements:
/eml/dataset/methods/methodStep/software
/eml/dataset/methods/methodStep/description
/eml/dataset/methods/methodStep/citation
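As a rough illustration of how these three siblings sit under one methodStep, here is a minimal sketch in Python (standard library only, not the R code reml uses). The element contents are placeholders, the description wording is invented for the example, and nothing here is validated against the EML schema. The children are created in the order given by the content model that the validator reports later in this thread (description first, then citation, then software).

```python
import xml.etree.ElementTree as ET

# Build the three sibling elements under methods/methodStep.
# Contents are placeholders; only the placement of the siblings is illustrated.
methods = ET.Element("methods")
step = ET.SubElement(methods, "methodStep")

description = ET.SubElement(step, "description")
description.text = "Metadata generated with the reml R package."  # placeholder wording

citation = ET.SubElement(step, "citation")  # left empty here; would hold the citation to reml
software = ET.SubElement(step, "software")  # left empty here; would describe reml itself

# Reconstruct the paths listed above from the tree that was just built.
sibling_paths = ["/eml/dataset/methods/methodStep/" + child.tag for child in step]
for path in sibling_paths:
    print(path)
```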
from eml.
@mbjones Thanks, this makes sense. First two are done, but trying to wrap my head around EML citation objects.
I assume we cite software as <generic>?
I'm a bit confused why I don't see things like title and author listed under fields such as Article: http://knb.ecoinformatics.org/software/eml/eml-2.1.1/eml-literature.html#Article, I guess that's because we have <title> and <creator> defined elsewhere?
I see that the citation object is built around the endnote format. R's citation tools (and lots of other tools) can return citations in bibtex format; I'm wondering if there's anything clever that can be done here in place of just mapping each term by hand...
from eml.
@cboettig Yeah, software is probably best listed as <generic>. At the time we did this, EndNote was massively predominant, and the likes of Mendeley and Zotero were far off on the horizon yet. In retrospect I wish I had known more about BibTeX, as it seems to have survived the test of time, but in 1998-2000 there simply weren't any XML-based citation schemas available. So, we ported EndNote. There must be a decent mapping to convert to more modern standards like BibTeX or Bibo, but I haven't looked carefully for it. I think Dryad uses a subset of Bibo, but they had to define their own XML schema for that too, as Bibo doesn't have a schema doc.
Regarding why <title> and <creator> are not shown in the spec, it's because they are part of another module that is included by reference, specifically res:ResourceGroup. In XSD, you can include schema portions by reference to a group of elements, and as we need the bibliographic fields that describe resources in many places, we created res:ResourceGroup to be the common group for these fields. So, near the top of the CitationType definition (http://knb.ecoinformatics.org/software/eml/eml-2.1.1/eml-literature.html#CitationType), you will see this:
A sequence of (
    res:ResourceGroup
    contact optional unbounded
    ...
)
which is a group inclusion. Follow the link to res:ResourceGroup and you'll see all of the fields. If you look at the diagram for eml-literature, you'll see that those fields in the group have been included by parsing the XSD (http://knb.ecoinformatics.org/software/eml/eml-2.1.1/eml-literature.png).
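To make the group-inclusion idea concrete, here is a small Python sketch (standard library only). The schema in it is a toy written for this example, not the real EML XSD: the real res:ResourceGroup has many more fields and lives in its own namespace, and the helper name expanded_fields is made up. It only shows how a group reference inside a type's sequence expands into the group's own elements.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Toy schema (an assumption for illustration, NOT the real EML schema):
# CitationType pulls in the fields of ResourceGroup by reference.
TOY_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:group name="ResourceGroup">
    <xs:sequence>
      <xs:element name="title"/>
      <xs:element name="creator"/>
    </xs:sequence>
  </xs:group>
  <xs:complexType name="CitationType">
    <xs:sequence>
      <xs:group ref="ResourceGroup"/>
      <xs:element name="contact" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

def expanded_fields(schema, type_name):
    """List a type's element names, replacing group references by the group's elements."""
    # Top-level group definitions, keyed by name.
    groups = {g.get("name"): g for g in schema.findall(f"{{{XS}}}group")}
    ctype = next(t for t in schema.findall(f"{{{XS}}}complexType")
                 if t.get("name") == type_name)
    fields = []
    for item in ctype.find(f"{{{XS}}}sequence"):
        if item.tag == f"{{{XS}}}group":
            # A group reference: splice in the referenced group's elements.
            group_sequence = groups[item.get("ref")].find(f"{{{XS}}}sequence")
            fields.extend(e.get("name") for e in group_sequence)
        else:
            fields.append(item.get("name"))
    return fields

schema = ET.fromstring(TOY_XSD)
fields = expanded_fields(schema, "CitationType")
print(fields)  # ['title', 'creator', 'contact']
```

This is why fields such as title and creator do not appear in the CitationType listing itself: they only show up once the group reference is followed.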
from eml.
Sounds good. Thanks for explaining the group inclusion with res:ResourceGroup; clearly I'm brand new to XSD, so these pointers help me get up to speed.
Yeah, Dryad uses the custom schema:
https://raw.github.com/datadryad/dryad-repo/dryad-master/dspace/modules/xmlui/src/main/webapp/themes/Dryad/meta/schema/v3.1/bibo.xsd
I was hoping Shotton's group might be persuaded to make an XSD file for FaBiO, an alternative to Bibo with some advantages, including being OWL 2 DL instead of OWL Full; see
http://semanticpublishing.wordpress.com/2011/06/29/comparison-of-bibo-and-fabio/
(I have only a fuzzy understanding of the differences, but it probably means more to you). It's a moot point since they don't seem to have an XSD file either at this time, and even if they did, would this mean changing the EML schema? Or is it trivial to extend if you had such a bibliographic schema?
from eml.
Definitely not trivial to extend -- many groups around the world use the EML schema and have written software to generate it and consume it -- any schema changes, especially backwards incompatible ones, have a ripple effect on the community. So, we try to avoid changes that break existing EML documents. Adding something as an optional new field is more acceptable and can generally get approved by the EML community fairly quickly.
from eml.
@mbjones Okay, I failed to write this methodsStep (which states that reml created the EML) correctly.
My R code creates a node that looks like this:
<methods>
  <methodsStep>
    <software>
      <license>CC0</license>
      <version>0.0-1</version>
      <implementation>
        <distribution>
          <online>
            <url>https://github.com/ropensci/reml</url>
          </online>
        </distribution>
      </implementation>
    </software>
    <description>An R package for reading, writing, integrating and publishing data
    using the Ecological Metadata Language (EML) format.</description>
  </methodsStep>
</methods>
And the validator complains:
[1] "cvc-complex-type.2.4.a: Invalid content starting with element 'methodsStep'. The content must match '(((\"\":methodStep){1-UNBOUNDED},(\"\":sampling){0-1}),(\"\":qualityControl){0-UNBOUNDED}){1-UNBOUNDED}'."
Um, does this mean I need sampling and qualityControl in a methods step? I'm confused.
from eml.
> My R code creates a node that looks like this
What function generated that metadata?
from eml.
eml_write, which called eml_dataset, which calls
methodsStep <- newXMLNode("methodsStep", parent = methods_node)
addChildren(methodsStep, eml_R_software("reml"))
addChildren(methodsStep,
            newXMLNode("description",
                       packageDescription("reml", fields="Description")))
where eml_R_software creates that software node (using eml_software).
from eml.
@cboettig -- Just pushed a fix -- "methodsStep" should have been "methodStep".
from eml.
thanks!
from eml.
oh, validator still unhappy:
[1] "cvc-complex-type.2.4.a: Invalid content starting with element 'software'. The content must match '((((((\"\":description),((\"\":citation)|(\"\":protocol)){0-UNBOUNDED}),(\"\":instrumentation){0-UNBOUNDED}),(\"\":software){0-UNBOUNDED}),(\"\":subStep){0-UNBOUNDED}),(\"\":dataSource){0-UNBOUNDED})'."
from eml.
What does the methodStep snippet look like now? The error message is just relating the schema rules, which are somewhat easier to grok in this image:
http://knb.ecoinformatics.org/software/eml/eml-2.1.1/eml-methods.png
from eml.
now it is:
<methods>
  <methodStep>
    <software>
      <license>CC0</license>
      <version>0.0-1</version>
      <implementation>
        <distribution>
          <online>
            <url>https://github.com/ropensci/reml</url>
          </online>
        </distribution>
      </implementation>
    </software>
    <description>An R package for reading, writing, integrating and publishing data
    using the Ecological Metadata Language (EML) format.</description>
  </methodStep>
</methods>
from eml.
The elements need to be in a different order to be valid: description has to come before software. In addition, you are missing required fields from the software module, including title and creator. See:
http://knb.ecoinformatics.org/software/eml/eml-2.1.1/eml-software.png
Something like this might validate (I didn't try it, just tried to follow the schema):
<methods>
  <methodStep>
    <description>An R package for reading, writing, integrating and publishing data
    using the Ecological Metadata Language (EML) format.</description>
    <software>
      <title>reml</title>
      <creator>
        <individualName>
          <givenName>Carl</givenName><surName>Boettiger</surName>
        </individualName>
      </creator>
      <creator>
        <individualName>
          <givenName>Karthik</givenName><surName>Ram</surName>
        </individualName>
      </creator>
      <implementation>
        <distribution>
          <online>
            <url>https://github.com/ropensci/reml</url>
          </online>
        </distribution>
      </implementation>
      <license>CC0</license>
      <version>0.0-1</version>
    </software>
  </methodStep>
</methods>
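The ordering rule behind the second validator message can be sketched as a quick pre-check. The following Python snippet (standard library only) is an illustration, not part of reml and not a substitute for schema validation: the slot table is read off the content model quoted in the error message above, the function name order_problems is made up, and the check looks only at the order of methodStep's direct children (it does not check the required children of software such as title and creator).

```python
import xml.etree.ElementTree as ET

# Child order for methodStep, taken from the validator message in this thread:
# description, (citation | protocol)*, instrumentation*, software*, subStep*, dataSource*
# citation and protocol share a slot because either may appear at that position.
SLOT = {
    "description": 0,
    "citation": 1,
    "protocol": 1,
    "instrumentation": 2,
    "software": 3,
    "subStep": 4,
    "dataSource": 5,
}

def order_problems(method_step):
    """Return a list of ordering problems among direct children (empty if none found)."""
    tags = [child.tag for child in method_step]
    problems = []
    if tags.count("description") != 1:
        problems.append("expected exactly one <description>")
    highest = -1  # highest slot seen so far
    for tag in tags:
        slot = SLOT.get(tag)
        if slot is None:
            problems.append(f"unexpected element <{tag}>")
            continue
        if slot < highest:
            problems.append(f"<{tag}> is out of order")
        highest = max(highest, slot)
    return problems

# Reduced versions of the two snippets from the thread (children left empty).
rejected = ET.fromstring("<methodStep><software/><description/></methodStep>")
accepted = ET.fromstring("<methodStep><description/><software/></methodStep>")

print(order_problems(rejected))  # ['<description> is out of order']
print(order_problems(accepted))  # []
```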
from eml.
Thanks for clarifying, sorry I still haven't got the hang of reading the spec. I keep forgetting resourceGroup and forgetting to pay attention to node ordering. Once the XMLSchema package is fully running, we will be able to automate the creation of S4 objects corresponding to the schema, so it will just be a matter of writing coercion methods (e.g. like eml_R_software) that can extract information from native R formats (e.g. the R package DESCRIPTION) into the S4 object, which will reduce errors like this!
from eml.
I don't have a complete understanding of the spec either. @mbjones can you suggest some readings that will allow me to get up to speed?
from eml.
@karthikram The pngs are pretty handy once you get the hang of them, e.g. http://knb.ecoinformatics.org/software/eml/eml-2.1.1/eml-software.png
I think ordering of nodes matters, top down as shown. Dashed lines are optional nodes. I don't get the symbols with boxes on lines (either linear ones or stacked). And of course a vector graphic would be easier to read without squinting...
Otherwise I find the descriptions in the 'normative technical documents' reasonably readable:
http://knb.ecoinformatics.org/software/eml/eml-2.1.1/eml-software.html
As I commented above, a working XMLSchema will be a big help in automating a lot of this, and letting us focus on the UI. But writing a few nodes out by hand is pretty instructive.
from eml.
The diagrams are the best way to understand the spec, although note that the diagrams do not show XML attributes. This was a shortcoming of the software used to generate the schemas. There is a nice explanation of the diagrams here: http://www.diversitycampus.net/projects/tdwg-sdd/minutes/SchemaDocu/SchemaDesignElements.html
I'm not sure what you are referring to with the 'symbols with boxes on lines' comment. Sorry. Getting up to speed on the EML spec (or any other) just takes time -- the EML schema itself is the most useful. We wrote a paper describing use of the spec a while ago targeted at ecologists, but it doesn't get into the technical details of the spec -- see Fegraus et al. 2005: http://www.mendeley.com/download/public/1825821/4445891665/2d411d3da6a51fecf34b4f0061a68d250c486eaa/dl.pdf
These diagrams were built with XML Spy, an XML editor that can display XML Schema. There are several others that will produce diagrams. If you are trying to understand the EML schema, it can be useful to open the eml.xsd schema in one of these editors so that you can explore the schema tree more dynamically -- the images are just static screenshots of the diagrams for certain subtrees.
from eml.
reml-generated methods node now exists, so I think we can close this issue. software and citation nodes will be written/improved upon once we get this S4 thing nailed down, and then we can revisit this.
from eml.