
neuroshapes's Introduction


Welcome to Neuroshapes

The goal of Neuroshapes is the development of open, use-case-driven, shared and validatable data models (schemas, vocabularies) to enable the FAIR principles (Findable, Accessible, Interoperable and Reusable) for basic, computational and clinical neuroscience (meta)data. The data models developed thus far cover entities for in vitro electrophysiology, neuron morphology, brain atlases and computational modeling. Future developments could include brain imaging, transcriptomic and clinical form data, as determined by community interests.


Goal

The main goal is to promote:

  • the use of standard semantic markups and linked data principles as ways to structure metadata and related data: the W3C RDF format is leveraged, specifically its developer friendly JSON-LD serialization. The adoption of linked data principles and JSON-LD will ease federated access and discoverability of distributed neuroscience (meta)data over the web.

  • the use of the W3C SHACL (Shapes Constraint Language) recommendation as a rich metadata schema language which is formal and expressive; interoperable; machine interpretable; and domain agnostic. With SHACL, (meta)data quality can be enforced based on schemas and vocabularies (easily discoverable and searchable) rather than being fully encoded in procedural code. SHACL also provides key interoperability capabilities to ensure the evolution of standard data models and data longevity. It allows standard data models to be built incrementally, in terms of both semantics and sophistication.

  • the reuse of existing schemas and semantic markups (like schema.org) and existing ontologies and controlled vocabularies (including NIFSTD - NIF Standard Ontologies)

  • the use of W3C PROV-O recommendation as a format to record (meta)data provenance: a SHACL version of the W3C PROV-O is created.

Also, Neuroshapes aims at creating a community for an open and use case driven development of not only data models (schemas and vocabularies) and tools around them but also guidelines for FAIR neuroscience (meta)data.
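The combination of JSON-LD, schema.org terms and W3C PROV-O described above can be illustrated with a minimal record. All identifiers and values below are made up for illustration; they are not part of the Neuroshapes models.

```python
# A minimal, illustrative JSON-LD record combining a schema.org term with
# W3C PROV-O provenance, in the spirit of the principles above. All
# identifiers and values are made up for illustration.
import json

record = {
    "@context": {
        "schema": "http://schema.org/",
        "prov": "http://www.w3.org/ns/prov#",
        "name": "schema:name",
        "wasGeneratedBy": {"@id": "prov:wasGeneratedBy", "@type": "@id"},
    },
    "@id": "https://example.org/traces/001",
    "@type": "prov:Entity",
    "name": "Example electrophysiology trace",
    "wasGeneratedBy": "https://example.org/activities/recording-42",
}

print(json.dumps(record, indent=2))
```

Because the context maps `name` and `wasGeneratedBy` to IRIs, the same record can be consumed both as plain JSON by developers and as RDF by linked data tooling.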

Tutorials

A set of tutorials from the Blue Brain Nexus Forge project is available; they use the schemas defined in Neuroshapes as data models to create and validate datasets, as well as to register them in Blue Brain Nexus.

Try them in Binder.

Adoption

The following projects have adopted Neuroshapes:

Formats and standards

All schemas in this repository conform to the W3C SHACL recommendation and are serialized using JSON-LD.

Testing shapes with examples

Two different tests are executed in the test suite. The first validates that the schemas conform to the SHACL specification. The second runs valid and invalid data samples against the modeled shapes. These examples are placed in the examples directory and follow the directory structure of the shape they should be tested against.

|-- examples
|   |-- neurosciencegraph
|   |   |-- datashapes
|   |   `-- commons
|   |       `-- list
|   |           |-- schema.json
|   |           `-- examples
|   |               |-- datashapes.json
|   |               |-- valid
|   |               |   `-- recipe_ingredients_list.json
|   |               `-- invalid
|   |                   `-- recipe_missing_ingredients.json
|   `-- prov
`-- ...
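The pairing of each schema with its valid and invalid samples can be sketched with a small discovery walk. This is an illustrative sketch of the convention, not the repository's actual test code; the `collect_cases` function is an assumption.

```python
# Hypothetical sketch of how example files can be discovered: for every
# schema, samples under "valid/" are expected to conform and samples under
# "invalid/" are expected to fail validation. The directory layout mirrors
# the tree shown above; collect_cases itself is an assumption, not the
# repository's actual test code.
import tempfile
from pathlib import Path

def collect_cases(examples_root):
    """Yield (schema_path, sample_path, should_conform) for every sample."""
    for schema in Path(examples_root).rglob("schema.json"):
        examples_dir = schema.parent / "examples"
        for subdir, should_conform in (("valid", True), ("invalid", False)):
            for sample in sorted((examples_dir / subdir).glob("*.json")):
                yield schema, sample, should_conform

# Recreate a miniature copy of the documented layout and walk it.
root = Path(tempfile.mkdtemp())
base = root / "examples" / "neurosciencegraph" / "commons" / "list"
(base / "examples" / "valid").mkdir(parents=True)
(base / "examples" / "invalid").mkdir(parents=True)
(base / "schema.json").write_text("{}")
(base / "examples" / "valid" / "recipe_ingredients_list.json").write_text("{}")
(base / "examples" / "invalid" / "recipe_missing_ingredients.json").write_text("{}")

cases = list(collect_cases(root))
for _, sample, should_conform in cases:
    print(sample.name, "expected to conform:", should_conform)
```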

Tests require Python >= 3.6 and pytest. To run them:

# create your virtual environment and activate it
python3 -m venv env
source env/bin/activate
# install requirements
pip install pytest pyshacl
# run tests
pytest

To test only a subset of shapes inside the shapes directory, an optional argument can be used:

pytest --testdir=shapes/neurosciencegraph/datashapes/atlas

Roadmap

  • Creation of an INCF/neuroshapes Special Interest Group
  • INCF endorsement as a standard and best practice that support FAIR neuroscience data
  • Extension of the current data model specifications

License

The license for all schemas and data is CC-BY-4.0.

neuroshapes's People

Contributors

alegrm, annakristinkaufmann, apdavison, genric, huanxiang, jonathanlurie, mfsy, olinux, pafonta


neuroshapes's Issues

InvalidSchemaIds for long schema names

When trying to upload the schemas to nexus, long schema names cause a 400 error ("invalidSchemaIds")
Example: neurosciencegraph/experiment/intracellularsharpelectroderecordedcell/v0.1.0

Create a protocol taxonomy

The idea is to classify every protocol with the activity type for which it can be used. Protocols are then classified by activity taxonomy concepts.

Recursive context inclusion exception

A recursive context exception is thrown when loading schemas that import neurosciencegraph/commons/labeledontologyentity/v0.1.0, commons/activity/v0.1.0 and commons/activity/v0.1.1.

Update the protocol shape

Currently, a protocol made of sub-protocols is not handled.
One requirement of the protocol shape is to be able to pull protocols from protocols.io and to push them back. Within protocols.io, the protocols can be tagged with the activity taxonomy concepts (see #126)

Create an experimental protocol

The current Protocol schema mostly describes experimental (lab) protocols. Protocols followed during simulation activities, for example, don't require most of the properties present in the current protocol schema.

A potential solution would be to:

  • create a generic protocol schema that can be used to publish protocol data independently of the domain of activities.
  • create an experimentalprotocol schema which specifically contains properties relevant for experimental protocols. It extends the generic protocol schema.
  • update the activity schema to use the new protocol schema

Extend typedlabeledontologyterm schema

Extend the typedlabeledontologyterm schema with additional shapes which do not enforce typed labeled ontology terms (e.g. a BrainRegionShape, in addition to the existing BrainRegionOntologyTermShape, which allows a string to be provided).

Consider using multiple Github repositories for the different modules

The shapes are currently organized in modules (atlas, electrophysiology, morphology,...) but are managed in one github repository. It is quite obvious that their life cycles will be different and there is no need to release a new version of the atlas related shapes if only the morphology ones changed.

Initialized electrophysiology related shapes

electrophysiology module related shapes list includes:

  • entities:
    ** trace
    ** wholecellpatchclamp

  • activities:
    ** tracegeneration
    ** stimulusexperiment

This is obviously not the complete list of all electrophysiology related schemas. Note that some of them are defined in dependent modules:

  • experiment
  • core
  • commons

These shapes were designed and built with the following wholecellpatchclamp-recording provenance pattern in mind.

The shapes are taken from previous work that happened in the following BBP repository.

Define an approach for enforcing ontologies in the different shapes

The usage of controlled vocabularies (instead of plain text) as values of properties is a highly recommended best practice.
The reasons are multiple but the most important ones are:

  • Consistent (far less ambiguous) annotation of datasets across organizations and within communities => they can speak the same language (use the same vocabulary), so to say.

  • The vocabulary used in a given domain of application to describe entities, agents, activities,...
    becomes an explicit and normalised artefact that can be shared and maintained.

Now, how do we enforce the usage of ontologies/vocabularies in the data models defined in Neuroshapes?

Let us first recognise that it is unlikely that one can identify upfront ontologies/vocabularies covering all possible use cases users may have. A good approach would therefore be to provide a set of default ontologies/vocabularies that can be enforced, while allowing users to enforce other ontologies/vocabularies (with better coverage, for example) and/or to just provide plain text when no ontology/vocabulary is available for a given property value.

Currently, some ontologies/vocabularies needed are referenced in the typedlabeledontologyterm schema. Clearly it needs to be extended to support the above approach.

How, then?
Let us assume a shape for enforcing the usage of a brain region ontology needs to be created:

  • let us give it a (fragment) name: BrainRegionShape
  • The BrainRegionShape can be defined as follows:
 {
      "@id": "this:BrainRegionShape",
      "@type": "sh:NodeShape",
      "label": "A brain region shape.",
      "or": [
        {
          "label": "The expected brain region value is an ontology/vocabulary term: identified by an IRI.",
          "node": "{{base}}/schemas/neurosciencegraph/commons/labeledontologyentity/v0.1.0/shapes/LabeledOntologyEntityShape"
        },{
          "label": "The expected brain region value is from a specific brain region ontology: identified by an IRI.",
          "node": "{{base}}/schemas/neurosciencegraph/commons/typedlabeledontologyterm/v0.1.0/shapes/BrainRegionOntologyTermShape"
        },{
         "label": "The expected brain region value is a free plain text.",
          "datatype": "xsd:string"
        }
      ]
    }
  • and a property (nsg:brainRegion for example) value can be constrained in the following way:
{
          "path": "nsg:brainRegion",
          "name": "Brain region",
          "description": "A brain region shape.",
          "node": "{{base}}/schemas/neurosciencegraph/commons/typedlabeledontologyterm/v0.1.0/shapes/BrainRegionShape"
}

Related #42

Initialized core neurosciencegraph shapes

Core neurosciencegraph shapes list includes:

  • prov extension:
    ** activity
    ** agent
    ** collection
    ** emptycollection
    ** person
    ** softwareagent
    ** organization
    ** entity

  • dataset publication:
    ** dataset

  • neuroscience related core entities:
    ** slice
    ** subject
    ** protocol

This list can be updated later.

The shapes in this module contain target declarations (mainly targetClass), unlike the shapes in neurosciencegraph/commons (which are only reusable shapes).

The schemas are taken from previous work that happened in the following BBP repository

Initialize MINDS data model

MINDS is about enabling neuroscience dataset discovery through minimal metadata. It is made of a set of shapes and vocabularies that specific data types (morphologies, traces, ...) can specialise to add specific properties.

Further define model simulation input data

'In addition to the Model and the Configuration, a simulation may use input data files, e.g. containing the stimulus to be applied to the model.' dixit @apdavison.

We should come up with a reasonable default set of input data for a model simulation activity.

Initialized morphology related shapes

morphology module related shapes list includes:

  • entities:
    ** annotatedslice
    ** fixedstainedslice
    ** labeledcell
    ** labeledcellcollection
    ** reconstructedcell

  • activities:
    ** acquisitionannotation
    ** fixationstainingmounting
    ** reconstruction

This is obviously not the complete list of all morphology related shapes. Note that some of them are defined in dependent modules:

  • experiment
  • core
  • commons

These shapes were designed and built with the following morphology reconstruction provenance pattern in mind.

The shapes are taken from previous work that happened in the following BBP repository.

Document the MINDS data model

Document the prov pattern design pattern so that, when followed, data can be mapped to MINDS.
Write down the MINDS data model's targeted competency questions. Implement them through a Python script.

To be repurposed: Create an activity taxonomy

The activity taxonomy can be built out of the activity types referenced within the different prov patterns. The taxonomy should define and implement a naming pattern for the activity types (SingleCellStimulusResponse for example)

Initialized experiment related shapes

experiment related shapes list includes:

  • activities:
    ** brainslicing
    ** wholecellpatchclamp

  • entities:
    ** patchedcell
    ** patchedcellcollection
    ** patchedslice
    ** subjectcollection

This list can be updated later.

The schemas are taken from previous work that happened in the following BBP repository

Allow a range of ages in AgeShape

(as defined in core/subject)

AgeShape requires a single float value, but some datasets only specify a range of ages, e.g. "21-30 days", rather than providing the precise age of each subject.
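One possible direction, sketched here as a Python dict mirroring the JSON schema style used in the repository, is an sh:or that accepts either a single value or a range. The nsg:minValue/nsg:maxValue property names are assumptions for illustration, not existing schema terms.

```python
# Sketch of an AgeShape that accepts either a single age value or a range
# such as "21-30 days". The structure and the nsg:minValue/nsg:maxValue
# names are illustrative assumptions, not the repository's actual schema.
age_shape = {
    "@id": "this:AgeShape",
    "@type": "sh:NodeShape",
    "or": [
        {
            "label": "A single age value.",
            "property": [{"path": "schema:value", "minCount": 1, "maxCount": 1}],
        },
        {
            "label": "An age range, e.g. '21-30 days'.",
            "property": [
                {"path": "nsg:minValue", "minCount": 1, "maxCount": 1},
                {"path": "nsg:maxValue", "minCount": 1, "maxCount": 1},
            ],
        },
    ],
}
print(len(age_shape["or"]))
```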

Create schema for type RecordedCell

Since the schemas for the type PatchedCell and IntracellularSharpElectrodeRecordedCell share many property shapes, a 'parent' schema for type RecordedCell should be created which can then be imported by the schemas for PatchedCell and IntracellularSharpElectrodeRecordedCell.
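The parent/child relationship could be expressed with sh:node, as sketched below in the repository's JSON style. The schema path and the property names are assumptions for illustration.

```python
# Illustrative sketch (not the actual schemas) of a parent RecordedCellShape
# holding the shared property shapes, with PatchedCellShape extending it via
# sh:node. The schema path and property names are assumptions.
recorded_cell = {
    "@id": "this:RecordedCellShape",
    "@type": "sh:NodeShape",
    "property": [{"path": "nsg:brainLocation"}, {"path": "nsg:eType"}],
}
patched_cell = {
    "@id": "this:PatchedCellShape",
    "@type": "sh:NodeShape",
    # Inherit the shared constraints from the parent shape.
    "node": "{{base}}/schemas/neurosciencegraph/experiment/recordedcell/v0.1.0/shapes/RecordedCellShape",
    "property": [{"path": "nsg:pipetteResistance"}],
}
```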

Create dedicated properties and shapes for specific data identifiers (doi, pmid)

Since multiple identifiers are needed (doi, pubmed id,...) it may be useful to create dedicated properties for them just like schema.org does. For example an "isbn" property is provided as an identifier sub-property.

The goal is to be able to assert that an entity (an article) can have a doi and/or pubmedID as identifiers in the following way:

 {
     "path": "nsg:doi",
     "name": "DOI",
     "description": "A doi identifier",
     "node": "{{base}}/schemas/neurosciencegraph/commons/identifier/v0.1.0/shapes/DOIShape",
     "maxCount": 1
}, {
     "path": "nsg:pmid",
     "name": "PMID",
     "description": "A PubMed identifier",
     "node": "{{base}}/schemas/neurosciencegraph/commons/identifier/v0.1.0/shapes/PMIDShape",
     "maxCount": 1
}

The shapes {{base}}/schemas/neurosciencegraph/commons/identifier/v0.1.0/shapes/DOIShape and {{base}}/schemas/neurosciencegraph/commons/identifier/v0.1.0/shapes/PMIDShape can extend the IdentifierShape.
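A dedicated DOIShape could additionally constrain the value's syntax with sh:pattern. The regex below matches the common "10.&lt;registrant&gt;/&lt;suffix&gt;" DOI form and is an illustrative assumption, not the repository's actual shape.

```python
# Sketch of how a DOIShape could constrain values with sh:pattern. The regex
# matches the common "10.<registrant>/<suffix>" DOI form; it is an
# illustrative assumption, not the repository's actual shape.
import re

doi_shape = {
    "@id": "this:DOIShape",
    "@type": "sh:NodeShape",
    "datatype": "xsd:string",
    "pattern": r"^10\.\d{4,9}/\S+$",
}

doi_pattern = re.compile(doi_shape["pattern"])
print(bool(doi_pattern.match("10.1371/journal.pcbi.1003542")))  # → True
```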

Register 'nsg' to prefix/curie registers

nsg is the preferred prefix for neurosciencegraph models (shapes, ontologies as well as taxonomies). It may be useful to enable compact identifier resolution for nsg identifiers, especially if we consider using nsg as a prefix for some data.
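Compact identifier (CURIE) resolution amounts to expanding the prefix into its namespace IRI, as in this minimal sketch. The namespace IRI used below is an assumption for illustration.

```python
# A minimal sketch of compact identifier (CURIE) expansion for the proposed
# nsg prefix. The namespace IRI is an assumption for illustration.
PREFIXES = {"nsg": "https://neuroshapes.org/"}

def expand_curie(curie):
    """Expand a prefix:localname CURIE into a full IRI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local

print(expand_curie("nsg:brainRegion"))
```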

typedlabeledontologyterm v0.1.0 missing

The schema neurosciencegraph/core/subject/v0.1.0 references /neurosciencegraph/core/typedlabeledontologyterm/v0.1.0 which seems not to be present in the repository

A data model for describing parameters taken from literature

Three specific models are needed here:

  • a data model to describe scientific publication
    • valuable vocabulary sources (schema.org)
  • a data model to describe parameters:
    - shape of a parameter
    - numeric parameters with units
    - others ?
  • a data model to describe the provenance of a parameter
    • from which publication it comes, and where?
      • this relates to how the parameter is linked to the paper it helps annotate
    • who did the annotation, when, with which evidence sources, ... ?
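An illustrative record combining the three models listed above might look as follows. All property names, types and values here are assumptions, not existing Neuroshapes terms.

```python
# Illustrative record (all names and values are assumptions) combining the
# three models above: a numeric parameter with a unit, linked to the
# publication it was extracted from and to the annotation activity.
parameter = {
    "@type": "nsg:Parameter",
    "name": "membrane capacitance",
    "value": {"value": 1.0, "unitCode": "uF/cm2"},
    "wasDerivedFrom": {
        "@type": "schema:ScholarlyArticle",
        "name": "An example publication",
    },
    "wasGeneratedBy": {
        "@type": "prov:Activity",
        "wasAssociatedWith": "a curator",
    },
}
```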

Create an object-of-study taxonomy

In the MINDS data model, we would like to have an object of study property. Possible values are brain region, single cell, ... We need a taxonomy organising objects of study.

"Date of surgery" should allow multiple values

In core/subject/v0.1.0.json, nsg:dateOfSurgery has "maxCount": 1

For subjects with chronically implanted electrodes, or multiple electrode arrays, it is possible to have more than one surgery, on different days.

Similar remarks apply to nsg:disease and nsg:treatment. These could in principle have multiple values.

Add schemas for whole-brain cell reconstruction from slice collection

Whole-brain cell reconstructions from slice collections are modelled with the following provenance pattern:

https://github.com/annakristinkaufmann/neuroshapes/blob/bb3a2bf73752a69a2594dff36f365c2e73532c83/provpatterns/assets/whole-brain-morphology-reconstruction.svg

The following schemas are required for this pattern:

  • BrainSlicing generating a SliceCollection
  • FixationStainingMounting using a SliceCollection and generating a StainedSliceCollection
  • SliceCollection

Write a script that transforms to MINDS any data conformant to one of the prov patterns defined within Neuroshapes

The context of generation (provenance) for different data types is captured through prov patterns within Neuroshapes.

This results in the coexistence of many different dataset descriptions (for morphologies, traces, models, ...).

The goal here is to write a script that can normalize and unify the description of those different data types using the MINDS data model.

The description of the prov patterns should follow a design pattern which should be documented. The script to be developed will suppose that all prov patterns follow the described design pattern.
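A skeleton of such a normalisation script could look like the sketch below. The record layout, the property names and the `to_minds` function are assumptions of approach, not the actual script or the actual MINDS fields.

```python
# Skeleton (an assumption of approach, not the actual script) for normalising
# prov-pattern-conformant records into a minimal MINDS-like description:
# only the minimal discovery metadata is kept, everything else is dropped.
def to_minds(record):
    """Project a prov-annotated record onto a minimal dataset description."""
    activity = record.get("wasGeneratedBy", {})
    return {
        "@type": "nsg:Dataset",
        "name": record.get("name"),
        "contributors": activity.get("wasAssociatedWith", []),
        "methods": activity.get("@type"),
    }

# Example input following a (hypothetical) electrophysiology prov pattern.
trace = {
    "@type": "nsg:Trace",
    "name": "cell-001 voltage trace",
    "wasGeneratedBy": {
        "@type": "nsg:StimulusExperiment",
        "wasAssociatedWith": ["Jane Doe"],
    },
}
print(to_minds(trace)["methods"])  # → nsg:StimulusExperiment
```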
