biocaddie / wg3-metadataspecifications
WG3 Metadata Specification
Related to the 'intensity' property of Treatment, add another property to indicate its unit.
Identified in the OMOP CDM mapping.
It should also include Material and StudyGroup (e.g. when the dataset refers to a patient or a cohort, respectively). Consider making the range very general, including any other entity.
intensity is currently represented as:
"intensity": {
"description": "a property used to specify how acute the perturbation is",
"type": "array",
"items": {
"oneOf": [
"string",
"number"
]
}
},
We believe it should, instead, read:
"intensity": {
"description": "a property used to specify how acute the perturbation is",
"type": "array",
"items": {
"oneOf": [
{"type": "string"},
{"type": "number"}
]
}
},
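A minimal sketch of how this kind of mistake could be caught automatically, assuming the schemas are loaded as Python dictionaries and target JSON Schema draft-04 (where every member of oneOf, anyOf and allOf must be an object). The helper name find_non_schema_members is hypothetical and not part of any existing DATS tooling.

```python
def find_non_schema_members(schema, path="#"):
    """Return paths of oneOf/anyOf/allOf members that are not JSON objects."""
    problems = []
    if isinstance(schema, dict):
        for key, value in schema.items():
            child = f"{path}/{key}"
            if key in ("oneOf", "anyOf", "allOf") and isinstance(value, list):
                for i, member in enumerate(value):
                    if not isinstance(member, dict):
                        problems.append(f"{child}/{i}")
            # Keep walking so nested schemas are checked as well.
            problems.extend(find_non_schema_members(value, child))
    elif isinstance(schema, list):
        for i, item in enumerate(schema):
            problems.extend(find_non_schema_members(item, f"{path}/{i}"))
    return problems


# The two representations quoted in this issue, without the description text.
current = {"intensity": {"type": "array", "items": {"oneOf": ["string", "number"]}}}
proposed = {"intensity": {"type": "array",
                          "items": {"oneOf": [{"type": "string"}, {"type": "number"}]}}}

print(find_non_schema_members(current))   # two offending members
print(find_non_schema_members(proposed))  # no offending members
```

The check is deliberately shallow: it only tests the shape of the combinator lists and does not validate the schemas themselves.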
Reported by Anu:
suggestion: 'within the index it may be better to keep this a generic type field (e.g. URL) as most items will have a URL and may not be called homepage'
Following the requirement levels and cardinality restrictions in the specification.
In some repositories, dates are only provided at the dataset level rather than the distribution level. So, we need to relax the requirement level and cardinality for DatasetDistribution.dates from MUST/1..n to MAY/0..n.
Shouldn't there be a relation like publication > 'basedOn' > dataset or publication > 'evidence' > dataset?
Publication > cites > dataset is ambiguous as to the actual relationship.
Reported by Anu:
"Within the index the information for the resource should be stored with each record"
transform column "$.'Study'.'Configuration'.'StudyNameReportPage'.'_$'" to "Study.Configuration.StudyNameReportPage";
unable to find metadata model
transform column "$.'Study'.'Configuration'.'StudyProjects'.'Project'" to "Study.StudyProjects.Project"
unable to find metadatamodel
transform column "$.'Study'.'Configuration'.'Publications'.'Publication'[*].'Journal'.'@notes'" to "Study.Configuration.Publications.Publication[].Journal.@notes";
unable to find metadatamodel
transform column "$.'Study'.'Configuration'.'Diseases'.'Disease'[*].'@vocab_source'" to "Disease[].@vocab_source";
unable to find metadatamodel
transform column "$.'Study'.'Configuration'.'DisplayPublicSummary'.'_$'" to "Study.Configuration.DisplayPublicSummary"
unable to find metadatamodel
transform column "$.'Study'.'AuthorizedAccess'.'DacInfo'.'DacEmail'.'_$'" to "Organization.emailAddress";
unable to find metadatamodel
transform column "$.'Study'.'AuthorizedAccess'.'DacInfo'.'DacPhone'" to "Organization.phoneNumber";
unable to find metadatamodel
transform column "$.'Study'.'AuthorizedAccess'.'Policy'.'@ref_ssDacId'" to "Study.AuthorizedAccess.Policy.@ref_ssDacId";
unable to find metadatamodel
transform column "$.'Study'.'AuthorizedAccess'.'Policy'.'DisplayResearchStatement'.'_$'" to "Study.AuthorizedAccess.Policy.DisplayResearchStatement";
unable to find metadatamodel
transform column "$.'Study'.'AuthorizedAccess'.'Policy'.'DisplayPublicSummary'.'_$'" to "Study.AuthorizedAccess.Policy.DisplayPublicSummary";
unable to find metadatamodel
transform column "$.'Study'.'AuthorizedAccess'.'Policy'.'EmbargoLength'.'_$'" to "Study.AuthorizedAccess.Policy.EmbargoLength";
unable to find metadatamodel
transform column "$.'Study'.'AuthorizedAccess'.'Policy'.'YearsUntilRenewal'.'_$'" to "Study.AuthorizedAccess.Policy.YearsUntilRenewal";
unable to find metadatamodel
transform column "$.'Study'.'AuthorizedAccess'.'Policy'.'WeeksCancelRequest'.'$'" to "Study.AuthorizedAccess.Policy.WeeksCancelRequest";
unable to find metadatamodel
transform column "$.'Study'.'AuthorizedAccess'.'Policy'.'PdfSupplementReqired'.'$'" to "Study.AuthorizedAccess.Policy.PdfSupplementReqired";
unable to find metadatamodel
transform column "$.'Study'.'AuthorizedAccess'.'Policy'.'AcknowledgementText'.'para'.'_$'" to "Study.AuthorizedAccess.Policy.AcknowledgementText.para";
unable to find metadatamodel
transform column "$.'Study'.'AuthorizedAccess'.'Policy'.'@ref_ssDacId'" to "Study.AuthorizedAccess.Policy.@ref_ssDacId";
unable to find metadatamodel
transform column "$.'Study'.'AuthorizedAccess'.'Policy'.'DisplayResearchStatement'.'_$'" to "Study.AuthorizedAccess.Policy.DisplayResearchStatement";
unable to find metadatamodel
transform column "$.'Study'.'AuthorizedAccess'.'Policy'.'DisplayPublicSummary'.'_$'" to "Study.AuthorizedAccess.Policy.DisplayPublicSummary";
unable to find metadatamodel
transform column "$.'Study'.'AuthorizedAccess'.'Policy'.'EmbargoLength'.'_$'" to "Study.AuthorizedAccess.Policy.EmbargoLength";
unable to find metadatamodel
transform column "$.'Study'.'AuthorizedAccess'.'Policy'.'YearsUntilRenewal'.'_$'" to "Study.AuthorizedAccess.Policy.YearsUntilRenewal";
unable to find metadatamodel
transform column "$.'Study'.'AuthorizedAccess'.'Policy'.'WeeksCancelRequest'.'_$'" to "Study.AuthorizedAccess.Policy.WeeksCancelRequest";
unable to find metadatamodel
transform column "$.'Study'.'AuthorizedAccess'.'Policy'.'PdfSupplementReqired'.'_$'" to "Study.AuthorizedAccess.Policy.PdfSupplementReqired";
unable to find metadatamodel
transform column "$.'Study'.'AuthorizedAccess'.'Policy'.'AcknowledgementText'.'para'.'_$'" to "Study.AuthorizedAccess.Policy.AcknowledgementText.para";
unable to find metadatamodel
transform column "$.'Study'.'AuthorizedAccess'.'Policy'.'DocumentSet'.'DataUseCertificate'.'@filename'" to "Study.AuthorizedAccess.Policy.DocumentSet.DataUseCertificate.@filename";
unable to find metadatamodel
transform column "$.'Study'.'AuthorizedAccess'.'ConsentGroups'.'ParticipantSet'[*].'@groupNum-REF'" to "Study.AuthorizedAccess.ConsentGroups.ParticipantSet[].@groupNum-REF";
similar to consent group groupnum above
transform column "$.'Study'.'AuthorizedAccess'.'ConsentGroups'.'ParticipantSet'[*].'ConsentName'.'_$'" to "Study.AuthorizedAccess.ConsentGroups.ParticipantSet[].ConsentName";
similar to consent group longname
transform column "$.'Study'.'AuthorizedAccess'.'ConsentGroups'.'ParticipantSet'[*].'ConsentAbbrev'.'_$'" to "Study.AuthorizedAccess.ConsentGroups.ParticipantSet[].ConsentAbbrev";
similar to consent group shortname
transform column "$.'Study'.'AuthorizedAccess'.'ConsentGroups'.'ParticipantSet'[*].'UseLimitation'.'_$'" to "Study.AuthorizedAccess.ConsentGroups.ParticipantSet[].UseLimitation";
unable to find metadata model
transform column "$.'Study'.'AuthorizedAccess'.'ConsentGroups'.'ParticipantSet'[*].'IrbRequired'.'_$'" to "Study.AuthorizedAccess.ConsentGroups.ParticipantSet[].IrbRequired";
unable to find metadata model
transform column "$.'Study'.'Documents'.'Document'[*].'@phd'" to "Study.Documents.Document[].@phd";
unable to find metadata model
transform column "$.'Study'.'Documents'.'Document'[*].'@type'" to "Study.Documents.Document[].@type";
unable to find metadata model
transform column "$.'Study'.'Documents'.'Document'[*].'@Createdate'" to "Study.Documents.Document[].@Createdate";
unable to find metadata model
transform column "$.'Study'.'Documents'.'Document'[*].'@modDate'" to "Study.Documents.Document[].@modDate";
unable to find metadata model
transform column "$.'Study'.'Documents'.'Document'[*].'@UrltoXML'" to "Study.Documents.Document[].@UrltoXML";
unable to find metadata model
transform column "$.'Study'.'Documents'.'Document'[*].'@urlToHtml'" to "Study.Documents.Document[].@urlToHtml";
unable to find metadata model
transform column "$.'Study'.'Documents'.'Document'[*].'OrigName'.'_$'" to "Study.Documents.Document[].OrigName";
unable to find metadata model
transform column "$.'Study'.'Documents'.'Document'[*].'DisplayName'.'_$'" to "Study.Documents.Document[].DisplayName";
unable to find metadata model
transform column "$.'Study'.'Documents'.'Document'[*].'Description'.'_$'" to "Study.Documents.Document[].Description";
unable to find metadata model
transform column "$.'Study'.'Documents'.'Document'[*].'displayStatus'.'_$'" to "Study.Documents.Document[].displayStatus";
unable to find metadata model
transform column "$.'Study'.'Documents'.'Document'[*].'xmlStatus'.'_$'" to "Study.Documents.Document[].xmlStatus";
unable to find metadata model
transform column "$.'Study'.'Annotations'.'DocumentPart'[*].'@phd'" to "DataAnalysis.identifier.DocumentPart[].@phd";
unable to find metadata model
transform column "$.'Study'.'Annotations'.'DocumentPart'[*].'@sectionId'" to "Study.Annotations.DocumentPart[].@sectionId";
unable to find metadata model
transform column "$.'Study'.'Annotations'.'DocumentPart'[*].'phvList'.'phvRef'.'@variableId'" to "Study.Annotations.DocumentPart[].phvList.phvRef.@variableId";
unable to find metadata model
transform column "$.'Study'.'Annotations'.'DocumentPart'[].'phvList'.'phvRef'[].'@variableId'" to "Study.Annotations.DocumentPart[].phvList.phvRef[].@variableId";
unable to find metadata model
transform column "$.'Study'.'Annotations'.'DocumentPart'[*].'phvList'" to "Study.Annotations.DocumentPart[].phvList"
unable to find metadata model
transform column "$.'Study'.'Analyses'.'Analysis'[*].'@genomeBuild'" to "Study.Analyses.Analysis[].@genomeBuild";
transform column "$.'Study'.'Analyses'.'Analysis'[*].'@snpBuild'" to "Study.Analyses.Analysis[].@snpBuild";
transform column "$.'Study'.'Analyses'.'Analysis'[*].'@analysisType'" to "Study.Analyses.Analysis[].@analysisType";
transform column "$.'Study'.'Analyses'.'Analysis'[*].'Description'.'_$'" to "DataAnalyses.description[]".@Analysis;
transform column "$.'Study'.'Analyses'.'Analysis'[*].'Method'.'_$'" to "Study.Analyses.Analysis[].Method";
transform column "$.'Study'.'Analyses'.'Analysis'[*].'GtyPlatform'.'@probeNum'" to "Study.Analyses.Analysis[].GtyPlatform.@probeNum";
Unsure what GtyPlatform is referring to
transform column "$.'Study'.'Analyses'.'Analysis'[*].'GtyPlatform'.'@snpBatchId'" to "Study.Analyses.Analysis[].GtyPlatform.@snpBatchId";
Unsure what GtyPlatform is referring to
transform column "$.'Study'.'Analyses'.'Analysis'[*].'GtyPlatform'.'Vendor'.'_$'" to "Study.Analyses.Analysis[].GtyPlatform.Vendor";
Unsure what GtyPlatform is referring to
transform column "$.'Study'.'Analyses'.'Analysis'[*].'GtyPlatform'.'VendorURL'" to "Study.Analyses.Analysis[].GtyPlatform.VendorURL";
Unsure what GtyPlatform is referring to
transform column "$.'Study'.'Analyses'.'Analysis'[*].'GtyPlatform'.'Platform'.'_$'" to "Study.Analyses.Analysis[].GtyPlatform.Platform";
Unsure what GtyPlatform is referring to
transform column "$.'Study'.'Analyses'.'Analysis'[*].'Comment'" to "Study.Analyses.Analysis[].Comment";
unable to find metadata model
Unsure what GtyPlatform is referring to
transform column "$.'Study'.'Analyses'.'Analysis'[*].'GtyPlatform'.'VendorURL'.'_$'" to "Study.Analyses.Analysis[].GtyPlatform.VendorURL";
Unsure what GtyPlatform is referring to
To make things consistent, each entity which has a type will have a 'type' property without referring to the entity name again (e.g. Dataset has a type, rather than a 'datasetType').
Some ask for more information, such as 'Mid Initial'.
Others would rather have all the information lumped into one attribute, owing to the difficulty of assigning the result of a string split to either 'first name' or 'last name'.
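To illustrate the second point, here is a small sketch of why a naive split is unreliable. The function naive_split and the key names firstName and lastName are hypothetical, chosen only for illustration.

```python
def naive_split(full_name):
    """Treat the last whitespace-separated token as the last name."""
    parts = full_name.split()
    return {"firstName": " ".join(parts[:-1]), "lastName": parts[-1]}


print(naive_split("Ada Lovelace"))
# A family name with a particle is cut in the wrong place:
# "van" ends up in firstName although it belongs to the family name.
print(naive_split("Ludwig van Beethoven"))
```

Any fixed splitting rule will misassign some names, which is the argument for also allowing a single full-name attribute.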
Below is an example from Dryad repository:
"record": {
"header": {
"identifier": {"$": "oai:datadryad.org:10255/dryad.149"},
"setSpec": {"$": "hdl_10255_dryad.148"},
"repository": {"$": "Dryad Data Repository"},
"setName": {"$": "BIRDD"}
}
}
BIRDD (Beagle Investigations Return with Darwinian Data) is a collection of data relating to Galapagos finches. It spans multiple publications from multiple researchers, but all data has been converted into standardized formats for easy comparison.
Link to the dataset: http://datadryad.org/handle/10255/dryad.149
Link to the set: http://datadryad.org/handle/10255/dryad.148
This set information isn't in the current metadata model.
reported by Anu.
The current attribute 'structure' is probably too narrow and its definition seems to imply either Protein or Nucleic Acid
Distribution can only be linked to licenses via an Access object and the accessModalities relations.
I am inclined to regularize the representation by moving the 'license' element out of Access and up to Distribution, to be consistent with DataRepository, Publication, Software and Standards.
The alternative (which results in more involved navigation) would be to add an Access element to the DataRepository, Publication, Software and Standards objects.
As discussed with @jgrethe, in order to differentiate between primary publication(s) (those publications from which the dataset originated or in which it was first described --- for example for structures in PDB) and publications in which the dataset is cited, we will replace the property 'isCitedBy' in Dataset with two other properties:
We will also add a 'citationsCount' property.
The first element in the "partOf" definition should read:
"partOf" : {
----> "description" : "a property used to...
rather than
-----> "dimension" : "a property used to ...
While the relationships DatasetDistribution/DataRepository and DatasetDistribution/Access were first defined with cardinality 0..n, they need to be modified to be 1:1 so that the Access information corresponds to the specific repository. Thus, if a form of a Dataset is available in two different repositories and/or with two different access modalities, it will define different DatasetDistributions.
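A minimal sketch of what the 1:1 constraint implies, assuming a simplified in-memory representation. The class layout, the field names stored_in and access, and the sample repository names are all hypothetical and are not taken from the DATS schemas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetDistribution:
    dataset: str
    stored_in: str  # exactly one DataRepository (hypothetical field name)
    access: str     # exactly one Access description (hypothetical field name)


def expand_distributions(dataset, availability):
    """Create one distribution per (repository, access) combination."""
    return [DatasetDistribution(dataset, repo, access) for repo, access in availability]


# The same dataset offered by two repositories yields two distributions.
distributions = expand_distributions(
    "example-dataset",
    [("Repository A", "download"), ("Repository B", "remote API")],
)
for distribution in distributions:
    print(distribution)
```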
Following the approach by DCAT, which includes the dct:format property to the Distribution class. Format is also available in DataCite.
Discussion about this addition started following an issue identified at the workshop in June.
We believe that the allOf in studySchema should read:
"allOf": [
{
"$ref": "activity_schema.json#"
},
---> { "type": "object",
---> "properties": {
"schedulesActivity": {
...
}
----> }
Since the intention of relatedIdentifiers is to provide links to related resources (correct me if I'm wrong), wouldn't 'relatedResources' be a clearer key name?
As discussed with CDT, we are adding the following qualifiers for Dataset:
The CVs for these qualifiers are being defined.
To report how the data is protected (relevant for human/clinical data).
Since WG7 is finished with their work on the accessibility metadata - this should be included at the dataset level.
According to the JSON-Schema specification, "allOf" references a list of objects, "each of which is a schema". The "uses", "input" and "output" elements are objects. We believe that the schema should read:
"allOf": [
{
"$ref": "activity_schema.json#"
},
{
---> "type": "object",
---> "properties": {
"uses": {
...
"minItems" : 1
}
---> }
This would be consistent with other uses of "allOf"
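A minimal sketch of the proposed repair, assuming the schema is loaded as a Python dictionary. The helper wrap_bare_properties and its keyword list are hypothetical heuristics: an allOf member with none of the listed keywords is treated as a bare property map. The body of "uses" below is a placeholder, since the original is elided.

```python
# Keywords whose presence suggests the member is already a schema (heuristic).
SCHEMA_KEYWORDS = {"$ref", "type", "properties", "allOf", "anyOf", "oneOf",
                   "items", "required"}


def wrap_bare_properties(member):
    """Wrap a bare property map in {"type": "object", "properties": ...}."""
    if isinstance(member, dict) and member and not SCHEMA_KEYWORDS.intersection(member):
        return {"type": "object", "properties": member}
    return member


all_of = [
    {"$ref": "activity_schema.json#"},
    {"uses": {"type": "array", "minItems": 1}},  # placeholder body
]
fixed = [wrap_bare_properties(member) for member in all_of]
print(fixed)
```

A property that happens to be named like a schema keyword would defeat the heuristic, so the output should be reviewed by hand.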
If we understand the goal correctly, we believe that:
"agent": {
"description": "a property used to specify the nature of the perturbation or intervention used in the study",
"oneOf": [
{
"type": {
"$ref": "molecular_entity_schema.json#"
}
},
{
"type": {
"$ref": "material_schema.json#"
}
},
{
"type": {
"$ref": "activity_schema.json#"
}
},
{
"type": "string",
"format": "uri"
}
should instead be:
"agent": {
"description": "a property used to specify the nature of the perturbation or intervention used in the study",
"oneOf": [
{"$ref": "molecular_entity_schema.json#"},
{"$ref": "material_schema.json#"},
{"$ref": "activity_schema.json#"},
{
"type": "string",
"format": "uri"
}
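A minimal sketch of the rewrite applied mechanically, assuming the schema is loaded as a Python dictionary. The helper unwrap_type_refs is hypothetical; it only replaces objects whose sole key is "type" and whose value is an object containing only "$ref".

```python
def unwrap_type_refs(schema):
    """Replace {"type": {"$ref": X}} with {"$ref": X} throughout a schema."""
    if isinstance(schema, list):
        return [unwrap_type_refs(item) for item in schema]
    if isinstance(schema, dict):
        inner = schema.get("type")
        if set(schema) == {"type"} and isinstance(inner, dict) and set(inner) == {"$ref"}:
            return {"$ref": inner["$ref"]}
        return {key: unwrap_type_refs(value) for key, value in schema.items()}
    return schema


agent = {
    "oneOf": [
        {"type": {"$ref": "molecular_entity_schema.json#"}},
        {"type": {"$ref": "material_schema.json#"}},
        {"type": {"$ref": "activity_schema.json#"}},
        {"type": "string", "format": "uri"},
    ]
}
print(unwrap_type_refs(agent))
```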
Line 50 should read
{"$ref": "dataset_schema.json" ...
rather than
{"$ref": "dateset_schema.json" ...
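Misspelled "$ref" targets like this one could be caught by comparing every reference against the list of schema files that actually exist. A minimal sketch, assuming the schemas are loaded as Python dictionaries; the helper find_dangling_refs and the sample set of file names are hypothetical.

```python
def find_dangling_refs(schema, known_files, path="#"):
    """Return (path, ref) pairs whose file part is not in known_files."""
    dangling = []
    if isinstance(schema, dict):
        for key, value in schema.items():
            if key == "$ref" and isinstance(value, str):
                target = value.split("#", 1)[0]  # drop the fragment part
                if target and target not in known_files:
                    dangling.append((path, value))
            else:
                dangling.extend(find_dangling_refs(value, known_files, f"{path}/{key}"))
    elif isinstance(schema, list):
        for i, item in enumerate(schema):
            dangling.extend(find_dangling_refs(item, known_files, f"{path}/{i}"))
    return dangling


known = {"dataset_schema.json", "activity_schema.json"}  # illustrative subset
schema = {"oneOf": [{"$ref": "dateset_schema.json#"}, {"$ref": "activity_schema.json#"}]}
print(find_dangling_refs(schema, known))
```

References that consist only of a fragment (for example "#/definitions/x") are skipped, because they point inside the same file.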
rationale:
nomenclature consistency with CategoryValuesPair object which uses categoryIRI and valueIRI
The current resource descriptions start with: "JSON-schema representing ..." (e.g. activity_schema.json -- description: JSON-schema representing an activity/process in the DDI model....). This looks odd when the schema is translated into other representations -- OWL, XML Schema, UML Models, etc. We would recommend that the descriptions be changed to identify what an instance of the schema represents, rather than describing the artifact we are looking at. As an example, the activity description might read
"description": "An activity/process in the DDI model. A type of process scheduled in a study". Similarly, the "title" attribute would make more sense described as:
"description": "The title/name of the process" instead of
"description": "a property to specify the title/name of the process",
Related datasets will be addressed through related identifiers.
Seems like the license would be scoped to a dataset, not how it is distributed.
There should be a createdBy relation from dataset to person and an inverse 'created' relation from person to dataset.
the "type" property for the "values" key on dimension is "array", but the type for the objects in the array is not defined. Is this valid JSON schema v4?
As discussed in CDT call 22 Nov 2016.
Since these are both arrays, there needs to be some clear guidance on what to represent as an Identifier and what to represent as an alternateIdentifier (I'm a newbie, and haven't studied all the docs...)
Our understanding of json schema is that "allOf" has a list of objects, each of which is a schema. We believe that the schema should read:
{"$ref": "activity_schema.json#"},
{
---> "type": "object",
---> "properties": {
"agent": {
"description": "a property used to specify the nature of the perturbation or intervention used in the study",
...
"type": "number"
}
---> }
"A physical entity, part of collection or used in a study." It's not clear what the second part is supposed to mean; it needs clarification.
...to enable direct representation of the software that may have been used to generate the dataset.
The category_values_pair allows an array of value strings and an array of valueIRI, which does not make an explicit binding between a string value (what I'd think of as a label) and an IRI for the value. Why not use the annotation object, which does allow binding between a string and an IRI:
"values": {
"description": "A set of values associated with the category.",
"type": "array",
"items": {
"$ref" : "annotation_schema.json#"
}
}
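A minimal sketch of the difference, assuming annotation objects carry "value" and "valueIRI" keys. The helper to_annotations and the example.org IRIs are hypothetical and used only for illustration.

```python
def to_annotations(values, value_iris):
    """Bind each label to its IRI; fail when the arrays cannot be aligned."""
    if len(values) != len(value_iris):
        raise ValueError("cannot bind labels to IRIs: arrays differ in length")
    return [{"value": value, "valueIRI": iri} for value, iri in zip(values, value_iris)]


# Parallel arrays: the binding exists only by position.
values = ["liver", "kidney"]
value_iris = ["http://example.org/liver", "http://example.org/kidney"]

# Annotation objects: each label travels with its IRI.
print(to_annotations(values, value_iris))
```

With parallel arrays, a missing or reordered entry silently shifts every later binding; with annotation objects that cannot happen.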
In trying to imagine what an implementation of DataSet.isAbout would look like, I can only guess that it would be an array of Identifiers (a URI) or Annotations (a label (aka value) or a URI). Perhaps it would make more sense to make the data type something like Annotation
reported by Anu:
(i.e. the data that does not fit within the base model)
The logic of why material.characteristic can have a value whose data type is a dimension or a material is not clear. Dimension makes sense, it's what I'd think of as a property of the material, but I don't get a material as a characteristic of a material. Is this confusing attribute-of with part-of relationships?
I was just looking around the specifications trying to figure out how to map ImmPort studies to bioCADDIE and I noticed what may be misspellings in the DATS-sdo-context.jsonld file. Not sure if it matters but it looks like in a couple of places (Study and Disease) "Medica" should be "Medical"
Thank you,
John Campbell
Study: {
@id: "lifescisdo:MedicaStudy",
@type: "@id"
},
StudyGroup: {
@id: "lifescisdo:MISSING",
@type: "@id"
},
Treatment: {
@id: "lifescisdo:MedicalProcedure",
@type: "@id"
},
Disease: {
@id: "lifescisdo:MedicaCondition",
@type: "@id"
},
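A quick way to list the suspicious terms, sketched under the assumption that the context file is read as plain text. The pattern flags lifescisdo terms that are literally MISSING or that start with "Medica" not followed by "l"; whether "Medical..." is the intended spelling in every case still needs to be confirmed by the maintainers.

```python
import re

# Excerpt of the context as quoted above, reduced to the @id lines.
context_text = """
Study: { @id: "lifescisdo:MedicaStudy" }
StudyGroup: { @id: "lifescisdo:MISSING" }
Treatment: { @id: "lifescisdo:MedicalProcedure" }
Disease: { @id: "lifescisdo:MedicaCondition" }
"""

# Flag placeholder terms and "Medica" that is not followed by "l".
SUSPECT = re.compile(r"lifescisdo:(MISSING|Medica(?!l)\w*)")

suspects = SUSPECT.findall(context_text)
print(suspects)
```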
Another extension to DATS relates to the provenance information maintained within the index. This should include: ingestMethod, ingestTarget, filePattern, ingestTime/timestamp, transformationFile (considering version) used.
Following from issue #48, we need an entity to cover the location for a dataset, but also for activities and organization.
We looked at:
DataCite schema (http://schema.datacite.org) and its use of GeoLocation (see example here: http://schema.datacite.org/meta/kernel-4.0/example/datacite-example-GeoLocation-v4.0.xml)
schema.org and its specification of Place (https://schema.org/Place) used in CreativeWork/Dataset for spatialCoverage
GeoJSON specification (http://geojson.org/, RFC7946 https://tools.ietf.org/html/rfc7946)
and as a result we are adding the Place entity with properties: name, description, postalAddress, geometry (point, line, polygon) and coordinates (list of pairs for lat/lon)
This entity will be the range for Dataset.spatialCoverage, but also all the activities location and Organization.location (instead of postalAddress)
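A minimal sketch of what a Place instance might look like, assuming a flat layout of the properties listed above. The sample values, the helper check_place, and the minimum number of coordinate pairs per geometry are illustrative assumptions, not part of the specification. The pair below is written as [lon, lat], the order GeoJSON (RFC 7946) uses; the order for DATS is not fixed by this issue.

```python
import json

# Assumed minimum number of coordinate pairs per geometry kind (illustrative only).
MIN_PAIRS = {"point": 1, "line": 2, "polygon": 3}


def check_place(place):
    """Return a list of problems found in a Place-like dictionary."""
    problems = []
    geometry = place.get("geometry")
    pairs = place.get("coordinates", [])
    if geometry not in MIN_PAIRS:
        problems.append(f"unknown geometry: {geometry!r}")
    elif len(pairs) < MIN_PAIRS[geometry]:
        problems.append(f"{geometry} needs at least {MIN_PAIRS[geometry]} coordinate pair(s)")
    if any(len(pair) != 2 for pair in pairs):
        problems.append("every coordinate must be a pair")
    return problems


place = {
    "name": "Example field site",  # hypothetical values throughout
    "description": "Sampling location used only for illustration",
    "geometry": "point",
    "coordinates": [[0.0, 51.5]],  # [lon, lat], the GeoJSON order
}
print(json.dumps(place, indent=2))
print(check_place(place))
```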
/* Not in the metadata model */
/* Data monitoring committee */
transform column "$.'clinical_study'.'oversight_info'.'authority'.'$'" to "clinicalStudy.oversight_info.authority";
transform column "$.'clinical_study'.'oversight_info'.'has_dmc'.'$'" to "clinicalStudy.oversight_info.has_dmc";
transform column "$.'clinical_study'.'overall_status'.'$'" to "study.status";
transform column "$.'clinical_study'.'phase'.'$'" to "study.phase";
transform column "$.'clinical_study'.'arm_group'[].'arm_group_type'.'$'" to "studyGroup.type[]";
transform column "$.'clinical_study'.'arm_group'[].'description'.'$'" to "studyGroup.description[]";
transform column "$.'clinical_study'.'verification_date'.'$'" to "dataset.verificationDate";
transform column "$.'clinical_study'.'is_fda_regulated'.'$'" to "dataset.is_fda_regulated";
transform column "$.'clinical_study'.'has_expanded_access'.'_$'" to "dataset.has_expanded_access";
/* In first iteration we can leave this - however, this should be added to the WG2 issue tracker as something to discuss with WG3 */
transform column "$.'clinical_study'.'arm_group'[].'description'.'$'" to "studyGroup.description[]";
transform column "$.'clinical_study'.'intervention'.'intervention_type'.'$'" to "treatment.title";
transform column "$.'clinical_study'.'intervention'.'intervention_name'.'$'" to "treatment.agent";
transform column "$.'clinical_study'.'intervention'.'description'.'$'" to "treatment.description";
transform column "$.'clinical_study'.'eligibility'.'criteria'.'textblock'.'$'" to "study.recruits.criteria";
transform column "$.'clinical_study'.'eligibility'.'gender'.'$'" to "study.recruits.gender";
transform column "$.'clinical_study'.'eligibility'.'minimum_age'.'$'" to "study.recruits.minimum_age";
transform column "$.'clinical_study'.'eligibility'.'maximum_age'.'$'" to "study.recruits.maximum_age";
/* In first iteration we can leave this - however, this should be added to the WG3 issue tracker as something to discuss with WG3 */
/* For other countries we need to turn that into a multi value array */
transform column "$.'clinical_study'.'location'.'facility'.'name'.'$'" to "study.location.name";
transform column "$.'clinical_study'.'location'.'facility'.'address'.'city'.'$'" to "study.location.city";
transform column "$.'clinical_study'.'location'.'facility'.'address'.'zip'.'$'" to "study.location.zip";
transform column "$.'clinical_study'.'location'.'facility'.'address'.'country'.'$'" to "study.location.country";
transform column "$.'clinical_study'.'location_countries'.'country'.'_$'" to "study.location.othercountries[0]";
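To review mappings like the ones above in bulk, the statements can be split into source path and target field. A minimal sketch, assuming each statement sits on one line in the form shown here; the pattern and the helper parse_transform are hypothetical and are not part of the ingestion pipeline.

```python
import re

# One statement per line: transform column "<source>" to "<target>";
TRANSFORM = re.compile(
    r'^transform column "(?P<source>[^"]+)" to "(?P<target>[^"]+)";?\s*$'
)


def parse_transform(line):
    """Return (source, target) for a transform statement, else None."""
    match = TRANSFORM.match(line.strip())
    if match is None:
        return None
    return match.group("source"), match.group("target")


statement = """transform column "$.'clinical_study'.'phase'.'$'" to "study.phase";"""
print(parse_transform(statement))
print(parse_transform("/* Not in the metadata model */"))  # comments are skipped
```

Collecting the targets this way makes it easy to list which fields (for example everything under study.recruits) have no counterpart in the metadata model.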
Line 36 should read:
"$ref": "study_group_schema.json#"
instead of
"$ref": "study_group.json#"
it is unclear from the current definition how it should be used and what is expected