
neuroshapes's Introduction


Welcome to Neuroshapes

The goal of Neuroshapes is the development of open, use-case-driven, shared and validatable data models (schemas, vocabularies) to enable the FAIR principles (Findable, Accessible, Interoperable and Reusable) for basic, computational and clinical neuroscience (meta)data. The data models developed thus far cover entities for in vitro electrophysiology, neuron morphology, brain atlases and computational modeling. Future developments could include brain imaging, transcriptomic and clinical form data, as determined by community interests.


Goal

The main goal is to promote:

  • the use of standard semantic markups and linked data principles as ways to structure metadata and related data: the W3C RDF format is leveraged, specifically its developer friendly JSON-LD serialization. The adoption of linked data principles and JSON-LD will ease federated access and discoverability of distributed neuroscience (meta)data over the web.

  • the use of the W3C SHACL (Shapes Constraint Language) recommendation as a rich metadata schema language which is formal and expressive; interoperable; machine interpretable; and domain agnostic. With SHACL, (meta)data quality can be enforced based on schemas and vocabularies (easily discoverable and searchable) rather than being fully encoded in procedural code. SHACL also provides key interoperability capabilities to ensure the evolution of standard data models and data longevity. It allows standard data models to be built incrementally, in terms of both semantics and sophistication.

  • the reuse of existing schemas and semantic markups (like schema.org) and existing ontologies and controlled vocabularies (including NIFSTD - NIF Standard Ontologies)

  • the use of W3C PROV-O recommendation as a format to record (meta)data provenance: a SHACL version of the W3C PROV-O is created.

Also, Neuroshapes aims at creating a community for an open and use case driven development of not only data models (schemas and vocabularies) and tools around them but also guidelines for FAIR neuroscience (meta)data.
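The combination of JSON-LD, schema.org terms and W3C PROV-O described above can be illustrated with a minimal record. All identifiers and values below are made up for illustration; they are not part of the Neuroshapes models.

```python
# A minimal, illustrative JSON-LD record combining a schema.org term with
# W3C PROV-O provenance, in the spirit of the principles above. All
# identifiers and values are made up for illustration.
import json

record = {
    "@context": {
        "schema": "http://schema.org/",
        "prov": "http://www.w3.org/ns/prov#",
        "name": "schema:name",
        "wasGeneratedBy": {"@id": "prov:wasGeneratedBy", "@type": "@id"},
    },
    "@id": "https://example.org/traces/001",
    "@type": "prov:Entity",
    "name": "Example electrophysiology trace",
    "wasGeneratedBy": "https://example.org/activities/recording-42",
}

print(json.dumps(record, indent=2))
```

Because the context maps `name` and `wasGeneratedBy` to IRIs, the same record can be consumed both as plain JSON by developers and as RDF by linked data tooling.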

Tutorials

A set of tutorials from the Blue Brain Nexus Forge project is available; they use the schemas defined in Neuroshapes as data models to create and validate datasets, as well as to register them in Blue Brain Nexus.

Try them in Binder.

Adoption

The following projects have adopted Neuroshapes:

Formats and standards

All schemas in this repository conform to the W3C SHACL recommendation and are serialized using JSON-LD.

Testing shapes with examples

Two different tests are executed in the test suite. The first validates that the schemas conform to the SHACL specification. The second runs valid and invalid data samples against the modeled shapes. These examples are placed in the examples directory and follow the directory structure of the shape they should be tested against.

|-- examples
|   |-- neurosciencegraph
|   |   |-- datashapes
|   |   `-- commons
|   |       `-- list
|   |           |-- schema.json
|   |           `-- examples
|   |               |-- datashapes.json
|   |               |-- valid
|   |               |   `-- recipe_ingredients_list.json
|   |               `-- invalid
|   |                   `-- recipe_missing_ingredients.json
|   `-- prov
`-- ...
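The pairing of each schema with its valid and invalid samples can be sketched with a small discovery walk. This is an illustrative sketch of the convention, not the repository's actual test code; the `collect_cases` function is an assumption.

```python
# Hypothetical sketch of how example files can be discovered: for every
# schema, samples under "valid/" are expected to conform and samples under
# "invalid/" are expected to fail validation. The directory layout mirrors
# the tree shown above; collect_cases itself is an assumption, not the
# repository's actual test code.
import tempfile
from pathlib import Path

def collect_cases(examples_root):
    """Yield (schema_path, sample_path, should_conform) for every sample."""
    for schema in Path(examples_root).rglob("schema.json"):
        examples_dir = schema.parent / "examples"
        for subdir, should_conform in (("valid", True), ("invalid", False)):
            for sample in sorted((examples_dir / subdir).glob("*.json")):
                yield schema, sample, should_conform

# Recreate a miniature copy of the documented layout and walk it.
root = Path(tempfile.mkdtemp())
base = root / "examples" / "neurosciencegraph" / "commons" / "list"
(base / "examples" / "valid").mkdir(parents=True)
(base / "examples" / "invalid").mkdir(parents=True)
(base / "schema.json").write_text("{}")
(base / "examples" / "valid" / "recipe_ingredients_list.json").write_text("{}")
(base / "examples" / "invalid" / "recipe_missing_ingredients.json").write_text("{}")

cases = list(collect_cases(root))
for _, sample, should_conform in cases:
    print(sample.name, "expected to conform:", should_conform)
```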

Tests require Python >= 3.6 and pytest. To run them:

# create your virtual environment and activate it
python3 -m venv env
source env/bin/activate
# install requirements
pip install pytest pyshacl
# run tests
pytest

To test only a subset of shapes inside the shapes directory, an optional argument can be used:

pytest --testdir=shapes/neurosciencegraph/datashapes/atlas

Roadmap

  • Creation of an INCF/neuroshapes Special Interest Group
  • INCF endorsement as a standard and best practice that support FAIR neuroscience data
  • Extension of the current data model specifications

License

The license for all schemas and data is CC-BY-4.0.

neuroshapes's People

Contributors

alegrm, annakristinkaufmann, apdavison, genric, huanxiang, jonathanlurie, mfsy, olinux, pafonta


neuroshapes's Issues

InvalidSchemaIds for long schema names

When trying to upload the schemas to nexus, long schema names cause a 400 error ("invalidSchemaIds")
Example: neurosciencegraph/experiment/intracellularsharpelectroderecordedcell/v0.1.0

Create a protocol taxonomy

The idea is to classify every protocol with the activity type for which it can be used. Protocols are then classified by activity taxonomy concepts.

Recursive context inclusion exception

A recursive context exception is thrown when loading schemas that import neurosciencegraph/commons/labeledontologyentity/v0.1.0, commons/activity/v0.1.0 and commons/activity/v0.1.1.

Update the protocol shape

Currently, a protocol made of sub-protocols is not handled.
One requirement of the protocol shape is to be able to pull protocols from protocols.io and to push them back. Within protocols.io, the protocols can be tagged with the activity taxonomy concepts (see #126)

Create an experimental protocol

The current Protocol schema mostly describes experimental (lab) protocols. Protocols followed during simulation activities, for example, don't require most of the properties present in the current protocol schema.

A potential solution would be to:

  • create a generic protocol schema that can be used to publish protocol data independently of the domain of activities.
  • create an experimentalprotocol schema which specifically contains properties relevant for experimental protocols. It extends the generic protocol schema.
  • update the activity schema to use the new protocol schema

Extend typedlabeledontologyterm schema

Extend the typedlabeledontologyterm schema with additional shapes which do not enforce typed labeled ontology terms (e.g. a BrainRegionShape, in addition to the existing BrainRegionOntologyTermShape, which allows a string to be provided).

Consider using multiple Github repositories for the different modules

The shapes are currently organized in modules (atlas, electrophysiology, morphology,...) but are managed in one github repository. It is quite obvious that their life cycles will be different and there is no need to release a new version of the atlas related shapes if only the morphology ones changed.

Initialized electrophysiology related shapes

electrophysiology module related shapes list includes:

  • entities:
    ** trace
    ** wholecellpatchclamp

  • activities:
    ** tracegeneration
    ** stimulusexperiment

This is obviously not the complete list of all electrophysiology related schemas. Note that some of them are defined in dependent modules:

  • experiment
  • core
  • commons

These shapes were designed and built with the following wholecellpatchclamp-recording provenance pattern in mind.

The shapes are taken from previous work that happened in the following BBP repository.

Define an approach for enforcing ontologies in the different shapes

The usage of controlled vocabularies (instead of plain text) as values of properties is a highly recommended best practice.
The reasons are multiple but the most important ones are:

  • Consistent (far less ambiguous) annotation of datasets across organizations and within communities => they can speak the same language (use the same vocabulary), so to say.

  • The vocabulary used in a given domain of application to describe entities, agents, activities,...
    becomes an explicit and normalised artefact that can be shared and maintained.

Now, how do we enforce the usage of ontologies/vocabularies in the data models defined in Neuroshapes?

Let us first recognise that it is unlikely that one can identify upfront ontologies/vocabularies covering all possible use cases users may have. A good approach would therefore be to provide a set of default ontologies/vocabularies that can be enforced, while allowing users to enforce other ontologies/vocabularies (with better coverage, for example) and/or to just provide plain text when no ontology/vocabulary is available for a given property value.

Currently, some ontologies/vocabularies needed are referenced in the typedlabeledontologyterm schema. Clearly it needs to be extended to support the above approach.

How, then?
Let us assume a shape for enforcing the usage of a brain region ontology needs to be created:

  • let us give it a (fragment) name: BrainRegionShape
  • The BrainRegionShape can be defined as follows:
 {
      "@id": "this:BrainRegionShape",
      "@type": "sh:NodeShape",
      "label": "A brain region shape.",
      "or": [
        {
          "label": "The expected brain region value is an ontology/vocabulary term: identified by an IRI.",
          "node": "{{base}}/schemas/neurosciencegraph/commons/labeledontologyentity/v0.1.0/shapes/LabeledOntologyEntityShape"
        },{
          "label": "The expected brain region value is from a specific brain region ontology: identified by an IRI.",
          "node": "{{base}}/schemas/neurosciencegraph/commons/typedlabeledontologyterm/v0.1.0/shapes/BrainRegionOntologyTermShape"
        },{
         "label": "The expected brain region value is a free plain text.",
          "datatype": "xsd:string"
        }
      ]
    }
  • and a property (nsg:brainRegion for example) value can be constrained in the following way:
{
          "path": "nsg:brainRegion",
          "name": "Brain region",
          "description": "A brain region shape.",
          "node": "{{base}}/schemas/neurosciencegraph/commons/typedlabeledontologyterm/v0.1.0/shapes/BrainRegionShape"
}

Related #42

Initialized core neurosciencegraph shapes

Core neurosciencegraph shapes list includes:

  • prov extension:
    ** activity
    ** agent
    ** collection
    ** emptycollection
    ** person
    ** softwareagent
    ** organization
    ** entity

  • dataset publication:
    ** dataset

  • neuroscience related core entities:
    ** slice
    ** subject
    ** protocol

This list can be updated later.

The shapes in this module contain target declarations (mainly targetClass), unlike the shapes in neurosciencegraph/commons (which are only reusable shapes).

The schemas are taken from previous work that happened in the following BBP repository

Initialize MINDS data model

MINDS is about enabling neuroscience dataset discovery through minimal metadata. It is made of a set of shapes and vocabularies that specific data types (morphologies, traces, ...) can specialise to add specific properties.

Further define model simulation input data

'In addition to the Model and the Configuration, a simulation may use input data files, e.g. containing the stimulus to be applied to the model.' dixit @apdavison.

We should come up with a reasonable default set of input data for a model simulation activity.

Initialized morphology related shapes

morphology module related shapes list includes:

  • entities:
    ** annotatedslice
    ** fixedstainedslice
    ** labeledcell
    ** labeledcellcollection
    ** reconstructedcell

  • activities:
    ** acquisitionannotation
    ** fixationstainingmounting
    ** reconstruction

This is obviously not the complete list of all morphology related shapes. Note that some of them are defined in dependent modules:

  • experiment
  • core
  • commons

These shapes were designed and built with the following morphology reconstruction provenance pattern in mind.

The shapes are taken from previous work that happened in the following BBP repository.

Document the MINDS data model

Document the prov pattern design pattern so that, when followed, data can be mapped to MINDS.
Write down the MINDS data model's targeted competency questions. Implement them through a Python script.

To be repurposed: Create an activity taxonomy

The activity taxonomy can be built out of the activity types referenced within the different prov patterns. The taxonomy should define and implement a naming pattern for the activity types (SingleCellStimulusResponse for example)

Initialized experiment related shapes

experiment related shapes list includes:

  • activities:
    ** brainslicing
    ** wholecellpatchclamp

  • entities:
    ** patchedcell
    ** patchedcellcollection
    ** patchedslice
    ** subjectcollection

This list can be updated later.

The schemas are taken from previous work that happened in the following BBP repository

Allow a range of ages in AgeShape

(as defined in core/subject)

AgeShape requires a single float value, but some datasets only specify a range of ages, e.g. "21-30 days", rather than providing the precise age of each subject.
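One possible direction, sketched here as a Python dict mirroring the JSON schema style used in the repository, is an sh:or that accepts either a single value or a range. The nsg:minValue/nsg:maxValue property names are assumptions for illustration, not existing schema terms.

```python
# Sketch of an AgeShape that accepts either a single age value or a range
# such as "21-30 days". The structure and the nsg:minValue/nsg:maxValue
# names are illustrative assumptions, not the repository's actual schema.
age_shape = {
    "@id": "this:AgeShape",
    "@type": "sh:NodeShape",
    "or": [
        {
            "label": "A single age value.",
            "property": [{"path": "schema:value", "minCount": 1, "maxCount": 1}],
        },
        {
            "label": "An age range, e.g. '21-30 days'.",
            "property": [
                {"path": "nsg:minValue", "minCount": 1, "maxCount": 1},
                {"path": "nsg:maxValue", "minCount": 1, "maxCount": 1},
            ],
        },
    ],
}
print(len(age_shape["or"]))
```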

Create schema for type RecordedCell

Since the schemas for the type PatchedCell and IntracellularSharpElectrodeRecordedCell share many property shapes, a 'parent' schema for type RecordedCell should be created which can then be imported by the schemas for PatchedCell and IntracellularSharpElectrodeRecordedCell.
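The parent/child relationship could be expressed with sh:node, as sketched below in the repository's JSON style. The schema path and the property names are assumptions for illustration.

```python
# Illustrative sketch (not the actual schemas) of a parent RecordedCellShape
# holding the shared property shapes, with PatchedCellShape extending it via
# sh:node. The schema path and property names are assumptions.
recorded_cell = {
    "@id": "this:RecordedCellShape",
    "@type": "sh:NodeShape",
    "property": [{"path": "nsg:brainLocation"}, {"path": "nsg:eType"}],
}
patched_cell = {
    "@id": "this:PatchedCellShape",
    "@type": "sh:NodeShape",
    # Inherit the shared constraints from the parent shape.
    "node": "{{base}}/schemas/neurosciencegraph/experiment/recordedcell/v0.1.0/shapes/RecordedCellShape",
    "property": [{"path": "nsg:pipetteResistance"}],
}
```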

Create dedicated properties and shapes for specific data identifiers (doi, pmid)

Since multiple identifiers are needed (doi, pubmed id,...) it may be useful to create dedicated properties for them just like schema.org does. For example an "isbn" property is provided as an identifier sub-property.

The goal is to be able to assert that an entity (an article) can have a doi and/or pubmedID as identifiers in the following way:

 {
     "path": "nsg:doi",
     "name": "DOI",
     "description": "A doi identifier",
     "node": "{{base}}/schemas/neurosciencegraph/commons/identifier/v0.1.0/shapes/DOIShape",
     "maxCount": 1
}, {
     "path": "nsg:pmid",
     "name": "PMID",
     "description": "A PubMed identifier",
     "node": "{{base}}/schemas/neurosciencegraph/commons/identifier/v0.1.0/shapes/PMIDShape",
     "maxCount": 1
}

The shapes {{base}}/schemas/neurosciencegraph/commons/identifier/v0.1.0/shapes/DOIShape and {{base}}/schemas/neurosciencegraph/commons/identifier/v0.1.0/shapes/PMIDShape can extend the IdentifierShape.
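A dedicated DOIShape could additionally constrain the value's syntax with sh:pattern. The regex below matches the common "10.&lt;registrant&gt;/&lt;suffix&gt;" DOI form and is an illustrative assumption, not the repository's actual shape.

```python
# Sketch of how a DOIShape could constrain values with sh:pattern. The regex
# matches the common "10.<registrant>/<suffix>" DOI form; it is an
# illustrative assumption, not the repository's actual shape.
import re

doi_shape = {
    "@id": "this:DOIShape",
    "@type": "sh:NodeShape",
    "datatype": "xsd:string",
    "pattern": r"^10\.\d{4,9}/\S+$",
}

doi_pattern = re.compile(doi_shape["pattern"])
print(bool(doi_pattern.match("10.1371/journal.pcbi.1003542")))  # → True
```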

Register 'nsg' to prefix/curie registers

nsg is the preferred prefix for neurosciencegraph models (shapes, ontologies as well as taxonomies). It may be useful to enable compact identifier resolution for nsg identifiers, especially if we consider using nsg as a prefix for some data.
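Compact identifier (CURIE) resolution amounts to expanding the prefix into its namespace IRI, as in this minimal sketch. The namespace IRI used below is an assumption for illustration.

```python
# A minimal sketch of compact identifier (CURIE) expansion for the proposed
# nsg prefix. The namespace IRI is an assumption for illustration.
PREFIXES = {"nsg": "https://neuroshapes.org/"}

def expand_curie(curie):
    """Expand a prefix:localname CURIE into a full IRI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local

print(expand_curie("nsg:brainRegion"))
```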

typedlabeledontologyterm v0.1.0 missing

The schema neurosciencegraph/core/subject/v0.1.0 references /neurosciencegraph/core/typedlabeledontologyterm/v0.1.0 which seems not to be present in the repository

A data model for describing parameters taken from literature

Three specific models are needed here:

  • a data model to describe scientific publication
    • valuable vocabulary sources (schema.org)
  • a data model to describe parameters:
    - shape of a parameter
    - numeric parameters with units
    - others ?
  • a data model to describe the provenance of a parameter
    • from which publication it comes, and where?
      • this relates to how the parameter is linked to the paper it helps annotate
    • who did the annotation, when, with which evidence sources, ... ?
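An illustrative record combining the three models listed above might look as follows. All property names, types and values here are assumptions, not existing Neuroshapes terms.

```python
# Illustrative record (all names and values are assumptions) combining the
# three models above: a numeric parameter with a unit, linked to the
# publication it was extracted from and to the annotation activity.
parameter = {
    "@type": "nsg:Parameter",
    "name": "membrane capacitance",
    "value": {"value": 1.0, "unitCode": "uF/cm2"},
    "wasDerivedFrom": {
        "@type": "schema:ScholarlyArticle",
        "name": "An example publication",
    },
    "wasGeneratedBy": {
        "@type": "prov:Activity",
        "wasAssociatedWith": "a curator",
    },
}
```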

Create an object-of-study taxonomy

In the MINDS data model, we would like to have an object of study property. Possible values are brain region, single cell, ... We need a taxonomy organising objects of study.

"Date of surgery" should allow multiple values

In core/subject/v0.1.0.json, nsg:dateOfSurgery has "maxCount": 1

For subjects with chronically implanted electrodes, or multiple electrode arrays, it is possible to have more than one surgery, on different days.

Similar remarks apply to nsg:disease and nsg:treatment. These could in principle have multiple values.

Add schemas for whole-brain cell reconstruction from slice collection

Whole-brain cell reconstructions from slice collections are modelled with the following provenance pattern:

https://github.com/annakristinkaufmann/neuroshapes/blob/bb3a2bf73752a69a2594dff36f365c2e73532c83/provpatterns/assets/whole-brain-morphology-reconstruction.svg

The following schemas are required for this pattern:

  • BrainSlicing generating a SliceCollection
  • FixationStainingMounting using a SliceCollection and generating a StainedSliceCollection
  • SliceCollection

Write a script that transforms to MINDS any data conformant to one of the prov patterns defined within Neuroshapes

The context of generation (provenance) for different data types is captured through prov patterns within Neuroshapes.

This results in the coexistence of many different dataset descriptions (for morphologies, traces, models, ...).

The goal here is to write a script that can normalize and unify the description of those different data types using the MINDS data model.

The description of the prov patterns should follow a design pattern which should be documented. The script to be developed will suppose that all prov patterns follow the described design pattern.
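A skeleton of such a normalisation script could look like the sketch below. The record layout, the property names and the `to_minds` function are assumptions of approach, not the actual script or the actual MINDS fields.

```python
# Skeleton (an assumption of approach, not the actual script) for normalising
# prov-pattern-conformant records into a minimal MINDS-like description:
# only the minimal discovery metadata is kept, everything else is dropped.
def to_minds(record):
    """Project a prov-annotated record onto a minimal dataset description."""
    activity = record.get("wasGeneratedBy", {})
    return {
        "@type": "nsg:Dataset",
        "name": record.get("name"),
        "contributors": activity.get("wasAssociatedWith", []),
        "methods": activity.get("@type"),
    }

# Example input following a (hypothetical) electrophysiology prov pattern.
trace = {
    "@type": "nsg:Trace",
    "name": "cell-001 voltage trace",
    "wasGeneratedBy": {
        "@type": "nsg:StimulusExperiment",
        "wasAssociatedWith": ["Jane Doe"],
    },
}
print(to_minds(trace)["methods"])  # → nsg:StimulusExperiment
```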
