neurodatawithoutborders / nwb-schema
Data format specification schema for the NWB neurophysiology data format
Home Page: http://nwb-schema.readthedocs.io
License: Other
Can someone please clarify what this required attribute is supposed to be?
manifold (Iterable): Physical position of each pixel. height, weight, x, y, z.
Is this supposed to be a 5 x n_pixels array? What is the ordering of the pixels?
GH-29 deployed a change that added a default_value for TimeSeries.description and TimeSeries.comments. They have been set to default to the following:
The changelog will need to be updated to reflect this.
Originally reported by: Oliver Ruebel (Bitbucket: oruebel, GitHub: oruebel)
The current specification of OpticalSeries.field_of_view is
For the shape to be consistent with the dims, the shape should change to:
shape:
Users would like to store timestamps in both absolute and relative time. To achieve this, the following modification is proposed:
Proposal
Add an additional top-level field called timestamps_offset: the time difference by which timestamps are offset from being relative to the file's global starting time, i.e. session_start_time. This value is 0 by default, i.e. timestamps are directly relative to session_start_time. If this value is not 0, then to obtain timestamps that are directly relative to session_start_time, one must subtract this offset.
Advantages
This allows users to store timestamps in absolute POSIX time without having to set session_start_time to "00:00:00 1 January 1970", i.e. the Unix epoch. Saving timestamps as absolute POSIX timestamps can be done by setting timestamps_offset to the POSIX timestamp equivalent of session_start_time. Saving timestamps as relative to session_start_time (the current specification) can be achieved by setting timestamps_offset to 0.
This also makes the specification easier to develop against, since this will explicitly specify the offset to obtain relative timestamps, eliminating the need for APIs to guess based on range.
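The proposed arithmetic can be sketched in a few lines. This is a minimal illustration of the proposal's semantics only; the `TimeSeries` fields are represented here as plain variables, not the pynwb API:

```python
# Sketch of the proposed timestamps_offset semantics (plain variables,
# not the pynwb API; session_start_time value is made up for the demo).
session_start_time = 1503955487.0        # POSIX timestamp of session start

# Current specification: timestamps stored relative to session_start_time
relative_timestamps = [0.0, 0.5, 1.0]
timestamps_offset = 0.0                  # default: directly relative

# Proposed alternative: store absolute POSIX timestamps instead
absolute_timestamps = [t + session_start_time for t in relative_timestamps]
timestamps_offset = session_start_time   # offset equals the session start

# A reader recovers relative times by subtracting the offset:
recovered = [t - timestamps_offset for t in absolute_timestamps]
assert recovered == relative_timestamps
```

Either storage convention round-trips to the same relative times, which is the point: the offset makes the convention explicit rather than something an API must guess from the timestamp range.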
Disadvantages
Modifications
[Hackathon]
Proposed Changes
This ticket is in response to NeurodataWithoutBorders/pynwb#101 originally filed by @neuromusic. The proposal is to make "source" optional on NWBContainer.
[Hackathon] The ElectrodeGroup should be restructured. Once this effort is complete, the release notes of the format need to be updated to describe those changes, i.e., the current description of the changes to ElectrodeGroup needs to be updated accordingly.
When NWBContainer (formerly Interface) became the base type for all types, it brought along the required help attribute.
The following types need this attribute "hardcoded", i.e., an attribute with its value set.
Originally reported by: Oliver Ruebel (Bitbucket: oruebel, GitHub: oruebel)
[Hackathon] Check/clarify in the documentation that only SoftLinks and ExternalLinks are allowed in NWB-N, and clarify why HardLinks are not allowed in the format (i.e., they cannot be distinguished from regular datasets/groups; hence the primary location of the data is unclear and the target of links cannot be determined).
Originally reported by: Loren Frank (Bitbucket: lmfrnk, GitHub: Unknown)
There are a number of cases where a user would need to know exactly when a particular SpikeWaveform or ElectrodeGroup is valid. As an example, a given unit might be clusterable for only part of the time that its ElectrodeGroup is valid, and those times may be distributed into multiple, disjoint periods. Similarly, an ElectrodeGroup might be valid for only part of a defined Epoch, and right now there is no way to indicate that this is the case.
Note that as there might be multiple intervals where one of these objects was valid, we would need an integrated IntervalSeries, not an integrated Epoch object.
Originally reported by: Oliver Ruebel (Bitbucket: oruebel, GitHub: oruebel)
[Hackathon] At the hackathon it was decided that the datasets created by autogen in the original version should be removed. Users will "raise their hand" in case any of these specific datasets are needed.
Need
List of relevant autogen datasets
Redundant storage of the path of links stored in the same group:
A is path: List groups/datasets (of a given type or property) in the current group
/missing_fields: List of required/recommended fields that are missing in the time series (i.e., the file is not NWB compliant)
/epochs/epoch_x/tags : A sorted list of the different tags used by epochs
/num_samples: Number of samples in data (scalar)
/cluster_num: Unique values of the dataset num (i.e., cluster indexes used)
Update release notes to explicitly list the autogen datasets that have been removed:
NWBFile/epochs.tags
NWBFile/epochs/epoch_X.links
TimeSeries.timestamp_link
TimeSeries.extern_fields
TimeSeries.data_link
TimeSeries.missing_fields
Module.interfaces
TimeSeries/num_samples
ClusterWaveforms/clustering_interface_path
Clustering/cluster_nums
EventDetection/source_electricalseries_path
ImageMaskSeries/masked_imageseries_path
IndexSeries/indexed_timeseries_path
RoiResponseSeries/segmentation_interface_path
ImageSegmentation/image_plane/roi_list
MotionCorrection/image_stack_name/original_path
UnitTimes/unit_list
The intro page of the docs http://nwb-overview.readthedocs.io/en/latest/nwbintro.html needs to be updated to point to the GitHub repos instead of Bitbucket.
When including the nested list with the type hierarchy in LaTeX, Sphinx generates a list that is nested too deeply, resulting in the following error:
! LaTeX Error: Too deeply nested.
See the LaTeX manual or LaTeX Companion for explanation.
Type H <return> for immediate help.
...
l.647 \item
{}
I have already added "enumitem" to the preamble as part of the conf.py of the format docs; however, that does not seem to fix the issue. To address the issue for now, I have modified the generate_format_docs.py script to include the type hierarchy only in the HTML version of the docs via:
def render_namespace(...):
    ...
    if type_hierarchy_include:
        if type_hierarchy_include_in_html_only:
            # Wrap the include in an "only:: html" directive so LaTeX skips it
            ns_desc_doc.add_text('.. only:: html %s%s' % (ns_desc_doc.newline, ns_desc_doc.newline))
            ns_desc_doc.add_include(type_hierarchy_include, indent=' ')
            ns_desc_doc.add_text(ns_desc_doc.newline)
    ...
However, this does not really fix the problem but rather avoids it by excluding the problematic parts from the docs. Ideally we would find a workaround that allows LaTeX to deal with the nested lists properly, or modify render_type_hierarchy(...) to render the hierarchy in a way that works in both HTML and LaTeX.
Moving here from NeurodataWithoutBorders/pynwb#106 since this is a schema issue.
The refactoring of the NWB infrastructure has brought a much-needed separation of the concerns of the API, schema, and backend. In particular, the schema is now ostensibly agnostic to the specifics of the backend implementation. Practically, the only supported backend is HDF5 via the HDF5IO object in the form.backends.hdf5 module; however, this can now be replaced with, e.g., a database or filestore backend.
Despite this, the schema maintains the now-archaic "File" semantics.
I propose that we rename NWBFile to be NWBSession (or NWBExperiment or NWBDataset) & update descriptions of various attributes that reference the NWB "file" accordingly.
Change: Change small metadata datasets to attributes where appropriate.
Reason: Storing small metadata as datasets (compared to attributes) can: i) clutter the file hierarchy, making it harder for users to navigate files, ii) make metadata appear as core data, and iii) cause poor performance when extracting metadata from files (reading attributes is more efficient in many cases).
Specific Changes: Change the top-level datasets identifier, nwb_version, session_description, and session_start_time from datasets to attributes. @t-b suggests in #45 that the top-level dataset file_create_date should remain a dataset, as this dataset may need to be updated/extended repeatedly rather than being a static piece of metadata.
The NWB spec says "Date + time, Use ISO format (eg, ISO 8601) or a format that is easy to read and unambiguous."
Line 11 and Line 31 in 4dfe947.
This definition is too vague for interaction across different APIs.
I therefore propose to change the wording to: "Strict ISO 8601 format including timezone information and T separator: 2017-08-28T23:24:47+02:00". Using a 64-bit POSIX timestamp has the drawback that the reader needs to account for leap seconds, but would be equally fine for me.
Note: This issue is different than #49 as it does not change the specification but just clarifies it.
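As a quick sanity check of the proposed wording, a strict ISO 8601 string with a T separator and timezone offset round-trips unambiguously using only the Python standard library (shown here as an illustration, not as a prescribed API):

```python
# Check that the proposed "strict ISO 8601 with timezone and T separator"
# wording yields an unambiguous instant, using only the standard library.
from datetime import datetime, timezone

stamp = "2017-08-28T23:24:47+02:00"
dt = datetime.fromisoformat(stamp)   # parses offset-aware timestamps

assert dt.tzinfo is not None         # timezone information is mandatory here
posix = dt.timestamp()               # unambiguous POSIX equivalent
roundtrip = datetime.fromtimestamp(posix, tz=timezone.utc)
assert roundtrip == dt               # same instant, different offset notation
```

With the vaguer current wording ("a format that is easy to read and unambiguous"), two APIs can legally write mutually unparseable dates; pinning the format to strict ISO 8601 with an offset removes that ambiguity.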
Moved prior ticket from pynwb to nwb-schema issue tracker.
A common analysis is the factorization of timeseries matrices via PCA, NMF, SVD, CUR, and similar matrix factorizations. These decompositions typically result in 2-3 matrices: one for the row-space, one for the column-space, and one for the weights. It is currently unclear where and how to store this kind of analysis.
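To make the storage question concrete, here is what such a decomposition produces for a timeseries matrix, using SVD as the example (numpy sketch; the shapes and the random data are illustrative only, and where the three arrays would live in NWB is exactly what this issue leaves open):

```python
# Example of the three matrices a factorization produces for a
# (time x channels) matrix; storage location in NWB is the open question.
import numpy as np

rng = np.random.default_rng(0)
ts = rng.standard_normal((1000, 32))        # time x channels

U, s, Vt = np.linalg.svd(ts, full_matrices=False)
# U:  (1000, 32) row-space  (temporal components)
# s:  (32,)      weights    (singular values)
# Vt: (32, 32)   column-space (channel loadings)

reconstructed = U @ np.diag(s) @ Vt
assert np.allclose(reconstructed, ts)       # factors recover the original
```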
[Hackathon] To ease intuition, Module should be renamed to more directly identify the purpose of a Module.
Proposed new name
Currently ReadTheDocs does not automatically generate the sources for the schema docs but we have to run make apidoc and check in the sources. This should be automated so that ReadTheDocs automatically generates the sources from the YAML specs directly.
Update documentation of Module to reflect the new name "Processing Module"
Changes to the schema (specifically the renaming of Interface and Module, and the fact that all types now inherit from NWBContainer) have broken the sorting of core format types into sections. The function that needs to be fixed is sort_type_hierarchy_to_sections in docs/utils/generate_format_docs.py.
I've implemented an API for Igor Pro 1 for reading and writing NWB files. It implements the subset of the NWB specification language which we use.
In the course of implementing this API, the dataset file_create_date was introduced and modification_time was deprecated due to my intervention.
See 2 for the changelog entry.
You now seem to partly undo that with the proposed NWB changes.
My original argumentation was:
The definition of modification_time as an attribute is suboptimal.
* Attributes can only be read/written in full. This means I always have to read the whole array, append to it, and then write it back. It also means I cannot simply append to it, as is possible with datasets.
* Attributes should be smaller than 64 kB; this can be worked around with some effort [1]. This limit restricts NWB files to fewer than ~3000 modifications, as my ISO 8601 timestamps are 20 characters long.
Therefore I would propose to instead use the dataset "/modification_time". Datasets allow partial I/O and don't have a size restriction. The attribute "modification_time" would only be allowed for backward compatibility.
In addition, there are some limitations on how often you can overwrite an attribute 2.
So I would be in favour of keeping file_create_date as a dataset.
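The append-without-rewrite behaviour the argument relies on can be demonstrated with h5py (a sketch using an in-memory file; the dataset name matches the spec, everything else is illustrative):

```python
# Why a dataset suits file_create_date better than an attribute:
# datasets support partial I/O and resizing, so a new timestamp can be
# appended without rewriting the existing ones. In-memory file for the demo.
import h5py

with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "file_create_date",
        shape=(1,), maxshape=(None,),   # unlimited first dimension
        dtype=h5py.string_dtype(),
    )
    dset[0] = "2017-08-28T23:24:47+02:00"

    # Appending a modification time touches only the new element:
    dset.resize((2,))
    dset[1] = "2017-08-29T08:00:00+02:00"

    n_dates = dset.shape[0]

assert n_dates == 2
```

An attribute, by contrast, would have to be read in full, extended, and written back on every modification, which is the core of the argument quoted above.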
Update release notes of the format specification to mention that the neurodata_type: custom has been removed without replacement. We may also want to mention this in the specification language release notes.
[Hackathon] The /general should be restructured to ease addition of metadata via extensions. Once this effort is complete, the release notes of the format need to be updated accordingly.
Originally reported by: Loren Frank (Bitbucket: lmfrnk, GitHub: Unknown)
Right now timestamps are optional for a SpikeEventSeries. The list of timestamps should be a required argument, as otherwise the times for each spike could get lost.
Originally reported by: Loren Frank (Bitbucket: lmfrnk, GitHub: Unknown)
It will be important for a user (and for the query framework) to be able to easily determine when a given data set is valid. In some cases this can be determined from the TimeStamp data in an object field, but in other cases, and in particular for SpikeWaveForm/EventDetection objects, the first and last times of events cannot be used for this purpose. As such, a general method that knows to look in, for example, a locally stored IntervalSeries that is part of each data object would be very helpful.
Port of NeurodataWithoutBorders/pynwb#9 to nwb-schema issue tracker.
A typical analysis is to perform a frequency decomposition (or another form of signal decomposition), e.g., of electrical recordings of the brain. This kind of analysis results in a timeseries with an additional dimension representing the bands/features into which the signal is decomposed. A possible target for this is FilteredEphys, but, e.g., ElectricalSeries only allows 2D data.
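The shape problem can be made concrete with a small sketch (scipy/numpy; the band choices and filter settings are illustrative only): band-filtering a 2D (time x channels) recording naturally yields a 3D (time x channels x bands) array, which is what ElectricalSeries cannot currently hold.

```python
# Frequency decomposition adds a band dimension: (time, channels) input
# becomes (time, channels, bands) output. Bands/filter order are examples.
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 1000.0
data = np.random.default_rng(1).standard_normal((2000, 8))  # time x channels

bands = [(4, 8), (8, 12), (30, 80)]      # e.g. theta, alpha, gamma
decomposed = np.empty(data.shape + (len(bands),))
for i, (lo, hi) in enumerate(bands):
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    decomposed[:, :, i] = sosfiltfilt(sos, data, axis=0)

assert decomposed.shape == (2000, 8, 3)  # the extra dimension in question
```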
Interface classes have been updated to use default_name instead of fixed name.
Need to update the release notes and rerun the autogen for the docs (requires the fix for issue #11).
Originally reported by: Oliver Ruebel (Bitbucket: oruebel, GitHub: oruebel)
The from_hdf5 function of doc.utils.render.HierarchyDescription currently does not add links to the hierarchy, nor the relationships that the links define. The reason for this behavior is in large part that the h5py.visititems function does not visit links, so we need a separate mechanism to discover links.
#6 and @t-b's questions have me thinking.
Is there currently a mechanism for validating one NWB file (or experiment dataset) against an arbitrary schema?
For example...
Hi everyone,
I would like to make quite a large number of suggested additions to the nwb data model.
Short story: we have developed a relational database system for keeping track of data in our lab. We have been using it for several months now and are happy with it. We based the schema on NWB 1.0 but there were many additions needed in order to capture all the data we need to store. While doing this we also had in mind the forthcoming IBL project, and I believe that these additions to the data model will cover most of what is needed there also.
The full data model is at:
http://alyx.readthedocs.io/en/latest/models.html
It's formatted as a Django specification.
All the best,
Kenneth.
[Hackathon] To support processing modules with multiple Interfaces of the same type, all Interfaces should be changed to use a default_name instead of a fixed name (i.e., move the value from the name key to the default_name key). As such, all Interfaces should also have a neurodata_type_def (this should already be the case).
Note Need to also add a note to the release notes of the format to describe this change.
Originally reported by: Loren Frank (Bitbucket: lmfrnk, GitHub: Unknown)
[Hackathon]
Need: The ability to reference arbitrary subsets of electrodes, to allow processing of subsets of electrodes while being able to describe and access the data of the electrodes that were used.
Path: Andrew/Oliver will create a proposal for restructuring ElectrodeGroup to address this issue and send it out to decide on a specific solution.
Goal: This item should be completed before SFN.
[Use case example: Loren Frank]
As it stands now, the electrode group does not have a list of the individual electrode objects that make it up. It would be much easier to work with if one created an Electrode object for each electrode, grouped those together in an ElectrodeGroup where appropriate, and then saved a link / index / identifier for either the electrode or the ElectrodeGroup in the associated LFP or spike waveform series.
Here's an example use case from our previously collected data: A four-wire tetrode consists of four electrodes that are part of one ElectrodeGroup. From this tetrode, one electrode is selected to record LFP data at a sampling rate of 1500 Hz, while spike snippets are saved at 30 kHz from all four channels. It would be helpful to have a single electrode object that could be associated with the LFP data, and the same electrode object (in association with the other three electrodes of the tetrode) as part of an ElectrodeGroup associated with the spike waveforms.
As for the association, ideally things would be stored so the electrode / electrodegroup could be accessed directly from the data.
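The structure this use case asks for can be sketched with plain Python classes (a hypothetical illustration only, not the eventual NWB schema): per-electrode objects, a group that contains them, and data series that reference either one electrode or the whole group.

```python
# Hypothetical sketch of the requested structure (not the NWB schema):
# individual Electrode objects, grouped, and referenced from the data.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Electrode:
    id: int
    impedance: float = 0.0

@dataclass
class ElectrodeGroup:
    name: str
    electrodes: List[Electrode] = field(default_factory=list)

# A four-wire tetrode as one group of four electrode objects:
tetrode = ElectrodeGroup("tetrode01", [Electrode(i) for i in range(4)])

# LFP is recorded from one selected electrode of the tetrode...
lfp_electrode = tetrode.electrodes[0]
# ...while spike snippets reference the whole group:
spike_group = tetrode

assert lfp_electrode in spike_group.electrodes
```

The key property is that the same Electrode object is reachable both directly (from the LFP series) and via its group (from the spike waveforms), which is what the current schema cannot express.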
Interface has been renamed to NWBContainer. Need to update the docs to reflect this change.
NOTES from #29
This change addresses the following issues:
You can view, comment on, or merge this pull request online at:
#29
Commit Summary
File Changes
M bin/reformat_spec.py (128)
M core/nwb.base.yaml (177)
M core/nwb.behavior.yaml (36)
M core/nwb.ecephys.yaml (88)
M core/nwb.epoch.yaml (33)
M core/nwb.file.yaml (201)
M core/nwb.icephys.yaml (70)
M core/nwb.image.yaml (87)
M core/nwb.misc.yaml (50)
M core/nwb.ogen.yaml (8)
M core/nwb.ophys.yaml (106)
M core/nwb.retinotopy.yaml (136)
Patch Links:
https://github.com/NeurodataWithoutBorders/nwb-schema/pull/29.patch
https://github.com/NeurodataWithoutBorders/nwb-schema/pull/29.diff
The following attributes are required to define an ROI:
pix_mask (Iterable): List of pixels (x,y) that compose the mask.
pix_mask_weight (Iterable): Weight of each pixel listed in pix_mask.
img_mask (Iterable): ROI mask, represented in 2D ([y][x]) intensity image.
img_mask is redundant with pix_mask + pix_mask_weight; it should be required to define either img_mask OR pix_mask + pix_mask_weight.
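The redundancy is easy to see in code: img_mask can be reconstructed from pix_mask plus pix_mask_weight (a numpy sketch; mask size and pixel values are made up for the example):

```python
# Demonstrates the redundancy: img_mask is fully determined by
# pix_mask + pix_mask_weight. Example coordinates/weights are made up.
import numpy as np

pix_mask = [(1, 2), (3, 4), (0, 0)]     # (x, y) pixel coordinates
pix_mask_weight = [0.5, 1.0, 0.25]      # weight of each listed pixel

img_mask = np.zeros((5, 5))             # [y][x] intensity image
for (x, y), w in zip(pix_mask, pix_mask_weight):
    img_mask[y, x] = w

assert img_mask[2, 1] == 0.5 and img_mask[4, 3] == 1.0
```

Going the other way (nonzero pixels of img_mask back to pix_mask + weights) is equally mechanical, which is why requiring only one representation would suffice.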
Originally reported by: Loren Frank (Bitbucket: lmfrnk, GitHub: Unknown)
We often have multiple pieces of information for location (e.g. hippocampus CA1 proximal), so it would be useful if the location and channel_location fields could be lists instead of strings. We could of course parse the strings, but I would think searching would be easier on lists.
[Hackathon] To help with issues of multiple inheritance and ease customization of /general metadata we should
currently "string", this should be a float or int with a separate "unit"
See #31
Originally reported by: Andrew Tritt (Bitbucket: ajtritt, GitHub: ajtritt)
Goal: enhance dataset specification language to allow specification of table datasets
Proposed solution: Allow dtype key of the dataset specification language to take the following options:
HDF5 Storage: In HDF5, such a data type would be represented as a compound data type.
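As an illustration of what the compound mapping would look like, a table row can be modeled as a numpy structured dtype, which is exactly what h5py writes out as an HDF5 compound type (the column names below are illustrative, not part of the proposal):

```python
# Sketch of a table dataset as a compound type: a numpy structured dtype,
# which h5py stores as an HDF5 compound type. Column names are examples.
import numpy as np

row_dtype = np.dtype([
    ("id", np.int64),
    ("location", "S32"),       # fixed-length string column
    ("impedance", np.float64),
])

table = np.zeros(2, dtype=row_dtype)
table[0] = (0, b"CA1", 1.2e6)
table[1] = (1, b"CA3", 0.9e6)

assert table["impedance"][0] == 1.2e6  # columns are addressable by name
```

Each dtype entry becomes one named, typed column, so a spec-language list of (name, dtype) pairs maps one-to-one onto the stored compound type.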
Recent changes to the inheritance structure mean that objects (such as NWBFile) which previously did not have "source" and "help" attributes (because they were not particularly relevant) now require them.
See, for example, NeurodataWithoutBorders/pynwb#101.
In order to (a) minimize required attributes that may not be relevant for all containers and (b) facilitate the migration of earlier NWB files to the most recent version, I propose that we minimally make "source" and "help" optional attributes of NWBContainer.