Comments (22)

cboettig avatar cboettig commented on June 13, 2024

@mbjones Is this a standard thing to do? Recommendation for how we encode it?

from eml.

mbjones avatar mbjones commented on June 13, 2024

I think the best place to add it is in /eml/dataset/methods/methodStep/software, and in the sibling description element describe the role that reml played in generating the metadata. You might also want to add a citation to reml in that subtree. EML is pretty flexible, so there are other options as well, but I think this is the most appropriate.

cboettig avatar cboettig commented on June 13, 2024

Excellent. Since this will create a software node, we may as well write eml_software first, and eml_R_software, then we can create the software node with eml_software("reml"), see #32

@mbjones Um, I'm not spotting the documentation for how a sibling description element should be constructed?

mbjones avatar mbjones commented on June 13, 2024

Schema diagram is here:
http://knb.ecoinformatics.org/software/eml/eml-2.1.1/eml-methods.png
I'm referring to the three sibling elements:
/eml/dataset/methods/methodStep/software
/eml/dataset/methods/methodStep/description
/eml/dataset/methods/methodStep/citation

cboettig avatar cboettig commented on June 13, 2024

@mbjones Thanks, this makes sense. First two are done, but trying to wrap my head around EML citation objects.

I assume we cite software as <generic>?

I'm a bit confused why I don't see things like title and author listed under fields such as Article: http://knb.ecoinformatics.org/software/eml/eml-2.1.1/eml-literature.html#Article, I guess that's because we have <title> and <creator> defined elsewhere?

I see that the citation object is built around the EndNote format. R's citation tools (and lots of other tools) can return citations in BibTeX format; I'm wondering if there's anything clever that can be done here in place of just mapping each term by hand...

mbjones avatar mbjones commented on June 13, 2024

@cboettig Yeah, software is probably best listed as <generic>. At the time we did this, EndNote was massively predominant, and the likes of Mendeley and Zotero were far off on the horizon yet. In retrospect I wish I had known more about BibTeX, as it seems to have survived the test of time, but in 1998-2000 there simply weren't any XML-based citation schemas available. So, we ported EndNote. There must be a decent mapping to convert to more modern standards like BibTeX or Bibo, but I haven't looked carefully for it. I think Dryad uses a subset of Bibo, but they had to define their own XML schema for that too, as Bibo doesn't have a schema doc.

Regarding why <title> and <creator> are not shown in the spec, it's because they are part of another module that is included by reference, specifically res:ResourceGroup. In XSD, you can include schema portions by reference to a group of elements, and as we need the bibliographic fields that describe resources in many places, we created res:ResourceGroup to be the common group for these fields. So, near the top of the CitationType definition (http://knb.ecoinformatics.org/software/eml/eml-2.1.1/eml-literature.html#CitationType), you will see this:

A sequence of (
res:ResourceGroup
contact optional unbounded
...
)

which is a group inclusion. Follow the link to res:ResourceGroup and you'll see all of the fields. If you look at the diagram for eml-literature, you'll see that those fields in the group have been included by parsing the XSD (http://knb.ecoinformatics.org/software/eml/eml-2.1.1/eml-literature.png).

cboettig avatar cboettig commented on June 13, 2024

Sounds good. Thanks for explaining the group inclusion with
res:ResourceGroup, clearly I'm brand new to XSD so these pointers help me
get up to speed.

Yeah, Dryad uses a custom schema:
https://raw.github.com/datadryad/dryad-repo/dryad-master/dspace/modules/xmlui/src/main/webapp/themes/Dryad/meta/schema/v3.1/bibo.xsd

I was hoping Shotton's group might be persuaded to make an XSD file for FaBiO, an alternative to Bibo with some advantages, including being OWL 2 DL instead of OWL Full; see
http://semanticpublishing.wordpress.com/2011/06/29/comparison-of-bibo-and-fabio/
(I have only a fuzzy understanding of the differences, but it probably means more to you). It's a moot point, since they don't seem to have an XSD file either at this time, and even if they did, this would mean changing the EML schema? Or is it trivial to extend if you had such a bibliographic schema?

Carl Boettiger
UC Santa Cruz
http://carlboettiger.info/

mbjones avatar mbjones commented on June 13, 2024

Definitely not trivial to extend -- many groups around the world use the EML schema and have written software to generate it and consume it -- any schema changes, especially backwards incompatible ones, have a ripple effect on the community. So, we try to avoid changes that break existing EML documents. Adding something as an optional new field is more acceptable and can generally get approved by the EML community fairly quickly.

cboettig avatar cboettig commented on June 13, 2024

@mbjones Okay, I failed to write this methodsStep (which states that reml created the EML) correctly:

My R code creates a node that looks like this:

<methods>
  <methodsStep>
    <software>
      <license>CC0</license>
      <version>0.0-1</version>
      <implementation>
        <distribution>
          <online>
            <url>https://github.com/ropensci/reml</url>
          </online>
        </distribution>
      </implementation>
    </software>
    <description>An R package for reading, writing, integrating and publishing data
    using the Ecological Metadata Language (EML) format.</description>
  </methodsStep>
</methods> 

And the validator complains:

[1] "cvc-complex-type.2.4.a: Invalid content starting with element 'methodsStep'. The content must match '(((\"\":methodStep){1-UNBOUNDED},(\"\":sampling){0-1}),(\"\":qualityControl){0-UNBOUNDED}){1-UNBOUNDED}'."

Um, does this mean I need sampling and qualityControl in a methods step? I'm confused.

karthik avatar karthik commented on June 13, 2024

My R code creates a node that looks like this

What function generated that metadata?

cboettig avatar cboettig commented on June 13, 2024

eml_write, which calls eml_dataset, which calls

  methodsStep <- newXMLNode("methodsStep", parent = methods_node)
  addChildren(methodsStep, eml_R_software("reml"))
  addChildren(methodsStep,
              newXMLNode("description",
                         packageDescription("reml", fields="Description")))

where eml_R_software creates that node (using eml_software).

mbjones avatar mbjones commented on June 13, 2024

@cboettig -- Just pushed a fix -- "methodsStep" should have been "methodStep".

cboettig avatar cboettig commented on June 13, 2024

thanks!

cboettig avatar cboettig commented on June 13, 2024

oh, validator still unhappy:

[1] "cvc-complex-type.2.4.a: Invalid content starting with element 'software'. The content must match '((((((\"\":description),((\"\":citation)|(\"\":protocol)){0-UNBOUNDED}),(\"\":instrumentation){0-UNBOUNDED}),(\"\":software){0-UNBOUNDED}),(\"\":subStep){0-UNBOUNDED}),(\"\":dataSource){0-UNBOUNDED})'."

mbjones avatar mbjones commented on June 13, 2024

What does the methodStep snippet look like now? The error message is just relating the schema rules, which are somewhat easier to grok in this image:

http://knb.ecoinformatics.org/software/eml/eml-2.1.1/eml-methods.png

cboettig avatar cboettig commented on June 13, 2024

now it is:

<methods>
  <methodStep>
    <software>
      <license>CC0</license>
      <version>0.0-1</version>
      <implementation>
        <distribution>
          <online>
            <url>https://github.com/ropensci/reml</url>
          </online>
        </distribution>
      </implementation>
    </software>
    <description>An R package for reading, writing, integrating and publishing data
    using the Ecological Metadata Language (EML) format.</description>
  </methodStep>
</methods> 

mbjones avatar mbjones commented on June 13, 2024

Elements need to be in a different order to be valid: description has to come before software in a methodStep. In addition, you are missing required fields from the software module, including title and creator. See:
http://knb.ecoinformatics.org/software/eml/eml-2.1.1/eml-software.png

Something like this might validate (I didn't try it, just tried to follow the schema):

<methods>
  <methodStep>
    <description>An R package for reading, writing, integrating and publishing data
    using the Ecological Metadata Language (EML) format.</description>
    <software>
      <title>reml</title>
      <creator>
        <individualName>
          <givenName>Carl</givenName>
          <surName>Boettiger</surName>
        </individualName>
      </creator>
      <creator>
        <individualName>
          <givenName>Karthik</givenName>
          <surName>Ram</surName>
        </individualName>
      </creator>
      <implementation>
        <distribution>
          <online>
            <url>https://github.com/ropensci/reml</url>
          </online>
        </distribution>
      </implementation>
      <license>CC0</license>
      <version>0.0-1</version>
    </software>
  </methodStep>
</methods> 
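
As a rough illustration (not part of reml), the ordering rule from the validator message quoted earlier in this thread can be checked in a few lines of base R. The names `method_step_order` and `check_method_step` are hypothetical; the sequence is copied from that error message, and the check only looks at the direct children of a methodStep.

```r
# Child element order for methodStep, as listed in the validator message
# quoted earlier in this thread (EML 2.1.1).
method_step_order <- c("description", "citation", "protocol",
                       "instrumentation", "software", "subStep", "dataSource")

# Hypothetical helper: TRUE when `children` (element names in document
# order) respects that sequence, FALSE otherwise.
check_method_step <- function(children) {
  pos <- match(children, method_step_order)
  if (anyNA(pos)) {
    stop("unknown child element: ",
         paste(children[is.na(pos)], collapse = ", "))
  }
  # citation and protocol may be interleaved, so give them the same rank
  pos[children == "protocol"] <- match("citation", method_step_order)
  length(children) > 0 &&
    children[1] == "description" &&      # description comes first ...
    sum(children == "description") == 1 &&  # ... and appears exactly once
    !is.unsorted(pos)                    # the rest follow schema order
}

check_method_step(c("software", "description"))  # the order that failed above
check_method_step(c("description", "software"))  # the order suggested above
```

The first call returns FALSE and the second TRUE, which matches what the validator reported for the two snippets in this thread.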

cboettig avatar cboettig commented on June 13, 2024

Thanks for clarifying, sorry I still haven't got the hang of reading the spec. I keep forgetting ResourceGroup and forgetting to pay attention to node ordering. Once the XMLSchema package is fully running, we will be able to automate the creation of S4 objects corresponding to the schema, so it will just be a matter of writing coercion methods (like eml_R_software) that extract information from native R formats (e.g. the R package DESCRIPTION) into the S4 object, which will reduce errors like this!
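
A minimal sketch of what the extraction half of such a coercion could look like, in base R only. `software_fields` is a hypothetical helper, not a reml function; it uses `stats` as a stand-in for any installed package, and it leaves out creator because the Author field of a DESCRIPTION file is free text that would need separate parsing.

```r
# Hypothetical helper: collect the DESCRIPTION fields that map onto the
# <software> children discussed in this thread (title, version, license)
# plus the text used for the sibling <description> element.
software_fields <- function(pkg) {
  # packageDescription() returns NA with a warning if pkg is not installed
  desc <- suppressWarnings(utils::packageDescription(pkg))
  if (!inherits(desc, "packageDescription")) {
    stop("package not installed: ", pkg)
  }
  list(
    title       = desc$Package,
    version     = desc$Version,
    license     = desc$License,
    description = desc$Description
  )
}

fields <- software_fields("stats")  # any installed package would do
str(fields)
```

Keeping the extraction separate from node construction means the same list could feed either hand-written XML nodes or, later, the slots of a schema-derived S4 object.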

karthik avatar karthik commented on June 13, 2024

I don't have a complete understanding of the spec either. @mbjones can you suggest some readings that will allow me to get up to speed?

cboettig avatar cboettig commented on June 13, 2024

@karthikram The pngs are pretty handy once you get the hang of them, e.g.
http://knb.ecoinformatics.org/software/eml/eml-2.1.1/eml-software.png
I think the ordering of nodes matters, top down as shown. Dashed lines are optional nodes. I don't get the symbols with boxes on lines (either linear ones or stacked). And of course a vector graphic would be easier to read without squinting...

Otherwise I find the descriptions in the 'normative technical documents'
reasonably readable...
http://knb.ecoinformatics.org/software/eml/eml-2.1.1/eml-software.html

As I commented above, a working XMLSchema will be a big help in automating
a lot of this, and letting us focus on the UI. But writing a few nodes out
by hand is pretty instructive.

mbjones avatar mbjones commented on June 13, 2024

The diagrams are the best way to understand the spec, although note that the diagrams do not show XML attributes. This was a shortcoming of the software used to generate the schemas. There is a nice explanation of the diagrams here: http://www.diversitycampus.net/projects/tdwg-sdd/minutes/SchemaDocu/SchemaDesignElements.html

I'm not sure what you are referring to with the 'symbols with boxes on lines' comment. Sorry. Getting up to speed on the EML spec (or any other) just takes time -- the EML schema itself is the most useful. We wrote a paper describing use of the spec a while ago targeted at ecologists, but it doesn't get into the technical details of the spec -- see Fegraus et al. 2005: http://www.mendeley.com/download/public/1825821/4445891665/2d411d3da6a51fecf34b4f0061a68d250c486eaa/dl.pdf

These diagrams were built with XML Spy, an XML editor that can display XML Schema. There are several others that will produce diagrams. If you are trying to understand the EML schema, it can be useful to open the eml.xsd schema in one of these editors so that you can explore the schema tree more dynamically -- the images are just static screenshots of the diagrams for certain subtrees.

cboettig avatar cboettig commented on June 13, 2024

reml-generated methods node now exists, so I think we can close this issue. software and citation nodes will be written/improved upon once we get this S4 thing nailed down, and then we can revisit this.
