The hdmf-common-schema from hdmf-dev

Render docs for experimental namespace

The docs for the new experimental namespace are currently not being rendered on ReadTheDocs. This will likely require some changes to the setup of how the docs are generated and/or the hdmf-docutils.

Specify schema language version

Add a comment to all of the YAML files saying which version of the schema language is in use, e.g.,
# hdmf-schema-language version 2.0.2
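A check for this comment could be automated. The following is a minimal sketch, assuming the comment follows exactly the format shown above; the function name `find_schema_language_version` and the sample YAML strings are hypothetical and not part of any existing tool.

```python
import re

# Assumed comment format, taken from the example in this issue.
VERSION_COMMENT = re.compile(
    r"^#\s*hdmf-schema-language version (\d+\.\d+\.\d+)\s*$"
)


def find_schema_language_version(yaml_text):
    """Return the declared schema language version, or None if no comment is found."""
    for line in yaml_text.splitlines():
        match = VERSION_COMMENT.match(line)
        if match:
            return match.group(1)
    return None


# Hypothetical file contents used only for illustration.
with_comment = "# hdmf-schema-language version 2.0.2\ndatasets:\n- data_type_def: Data\n"
without_comment = "datasets:\n- data_type_def: Data\n"

print(find_schema_language_version(with_comment))
print(find_schema_language_version(without_comment))
```

A test suite could run such a function over every YAML file in the repository and fail when it returns None.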

Add schema validation test

This will require modifying https://github.com/NeurodataWithoutBorders/nwb-schema/blob/dev/nwb.schema.json. See also https://github.com/NeurodataWithoutBorders/nwb-schema/pull/379/files

Make tagged release of version 1.8.0

First we should test to make sure everything works in HDMF.

Support NWBv2 files created by IPNWB

Branch: https://github.com/t-b/hdmf-common-schema/tree/use-text-as-encoding-for-dynamic-table-column-names

Good day ✨

When creating NWBv2 files from Igor Pro I'm facing a general problem with the text encoding. I can only write files with UTF8 encoded strings.

The easy fix would be the following change

$ git diff .
diff --git a/common/table.yaml b/common/table.yaml
index 49c8b6c..73a5ad3 100644
--- a/common/table.yaml
+++ b/common/table.yaml
@@ -87,7 +87,7 @@ groups:
     of usability.
   attributes:
   - name: colnames
-    dtype: ascii
+    dtype: text
     dims:
     - num_columns
     shape:

That would undo the change from NeurodataWithoutBorders/pynwb@b0939429 but I don't know why that was done.

Rename `VocabData` to `EnumText`

From discussion with @oruebel and @ajtritt

VocabData is restricted to string mappings and the name may be confusing in the ontologies world.

Rename some resources fields for consistency

In https://github.com/hdmf-dev/hdmf-common-schema/blob/master/common/resources.yaml:

First:

the keys table contains a field 'key_name'
the entities table contains fields 'entity_id' and 'entity_uri'
the resources table contains fields 'name' and 'resource_uri'

I think the prefixes on these field names (all except "resources" > "name") are redundant and can be removed for readability and for consistency with "resources" > "name". This would result in:

the keys table contains a field 'name'
the entities table contains fields 'id' and 'uri'
the resources table contains fields 'name' and 'uri'

Would this be too confusing? If so, then for consistency, the resources table "name" field should be renamed "resource_name".

Second:

the entities table contains fields "keytable_idx" and "resource_table_idx"
the object_keys table contains fields "objecttable_idx" and "keytable_idx"

Either they should all have an underscore before "table" or none of them should. I personally prefer that they have an underscore before "table".

I think the status of the schema language is somewhat unclear here. I do not see any documentation about the HDMF schema language. We do have the schema-language readthedocs, but that page states that it is for NWB and contains the NWB-specific keys neurodata_type_def and neurodata_type_inc. If we want a more general schema language, we would need:

A page describing the HDMF schema language. It may be preferable to just modify the existing NWB specification language.
A formal specification for changing the names of keys. I suppose this should be added to the schema language as well.
A json schema validator for the HDMF schema language.

Add new dtype for external file

Add a new dtype for external files. It would be essentially the same as dtype: text, except that its value is passed to a resolver, if present, during the build process, e.g., for externally stored images and video files in NWB.
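To make the proposal concrete, here is a minimal sketch of how such a value might be handled at build time. Everything in it is an assumption for illustration: the dtype name `external_file`, the function `build_value`, and the resolver signature are hypothetical and do not exist in HDMF.

```python
def build_value(dtype, value, resolver=None):
    """Return the value to store, passing external-file values through a resolver.

    'external_file' is a hypothetical dtype name; all other dtypes are
    returned unchanged, exactly as 'text' would be.
    """
    if dtype == "external_file" and resolver is not None:
        return resolver(value)
    return value


def example_resolver(path):
    """Hypothetical resolver that prefixes a relative path with a base directory."""
    return "/data/root/" + path


print(build_value("external_file", "images/frame0.png", resolver=example_resolver))
print(build_value("text", "images/frame0.png", resolver=example_resolver))
```

Without a resolver, an external-file value would behave the same as plain text, which matches the "if present" wording above.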

Doc for `VectorIndex` is wrong

Based on the graphic in the NWB preprint which is also here:
https://github.com/hdmf-dev/hdmf-common-schema/blob/master/docs/source/figures/ragged-array.png

the docstring for VectorIndex is wrong:

hdmf-common-schema/common/table.yaml

Lines 11 to 13 in b22b352

    of the DynamicTable by indexing into this VectorData. The first
    vector is at VectorData[0:VectorIndex(0)+1]. The second vector is at
    VectorData[VectorIndex(0)+1:VectorIndex(1)+1], and so on.

I think I wrote this incorrectly months ago. It should say:
The first vector is at VectorData[0:VectorIndex[0]]. The second vector is at VectorData[VectorIndex[0]:VectorIndex[1]].

example:

>>> from hdmf.common import VectorData, VectorIndex
>>> foo = VectorData(name='foo', description='foo column', data=['a', 'b', 'c', 'd'])
>>> foo_ind = VectorIndex(name='foo_index', target=foo, data=[2, 4])
>>> foo_ind[0]
['a', 'b']
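The corrected indexing rule can also be checked without HDMF. This is a minimal pure-Python sketch in which plain lists stand in for VectorData and VectorIndex; the helper name `get_vector` is hypothetical.

```python
def get_vector(data, index, i):
    """Return the i-th vector of a ragged array.

    `data` is the flat list of values and `index` holds the end offset
    (exclusive) of each vector, as in the corrected docstring.
    """
    start = 0 if i == 0 else index[i - 1]
    end = index[i]
    return data[start:end]


data = ['a', 'b', 'c', 'd']
index = [2, 4]

print(get_vector(data, index, 0))  # ['a', 'b']
print(get_vector(data, index, 1))  # ['c', 'd']
```

Each index entry is the exclusive end of its vector and the start of the next one, so no "+1" adjustment is needed.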

Document how ExternalResources `field` values should be written

When adding an object to the ExternalResources objects table, you supply a container object ID and a field. In most cases, the field is the name of a dataset or attribute, but it could be a little more complicated.

Let's say the container is a group data type (e.g., TimeSeries) and it has a dataset (e.g., data) without a data type, and that dataset has a string attribute (e.g., unit). The 'field' value then needs to signal that the field is the 'unit' attribute on the 'data' dataset. This could be done using '/' as a separator, e.g., field='data/unit'.

Now let's say the attribute is not a string but a compound data type with columns/fields 'x', 'y', and 'z', and each column/field is associated with different ontologies. The 'field' value also needs to account for this. This could also be done using '/' as a separator, e.g., field='data/unit/x'.

Whatever string formatting scheme we choose should be explicitly described in the docs and then handled by the API.

This came up with @oruebel and @rightbower when ontologizing a column/field of a compound data type column in an ICEPhys table.
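As a starting point for discussion, the '/' separator idea could be handled roughly as follows. This is only a sketch of one possible scheme; the function name `split_field` is hypothetical and nothing here reflects an existing HDMF API.

```python
def split_field(field):
    """Split a '/'-separated field string into its components.

    Raises ValueError if any component is empty, e.g. for 'data//x'
    or for an empty string.
    """
    parts = field.split("/")
    if not all(parts):
        raise ValueError(f"empty component in field {field!r}")
    return parts


print(split_field("data"))         # ['data']
print(split_field("data/unit"))    # ['data', 'unit']
print(split_field("data/unit/x"))  # ['data', 'unit', 'x']
```

Note that the string alone does not say whether a component is a dataset, an attribute, or a compound column; the API would have to resolve that against the container's spec.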

Refactor Data and Container types to separate yaml

The types Data and Container are much more general than their use in a DynamicTable.

Suggestion: extract them out into a base.yaml file.

Add missing ragged array figures

According to #8, there should be figures in https://hdmf-common-schema.readthedocs.io/en/latest/format_description.html but they do not show up.

Remove the Resources table

Based on feedback from the LinkML developers and ontology experts, the Resources table is redundant and would rarely be used by the community. It adds extra overhead for adding entries to the ExternalResources "database". So, we decided to remove it. cc @oruebel @mavaylon1

The HDMF API will also need to be updated. rly/ndx-external-resources#6 may also need to be updated.

Make `VectorIndex` inherit from `VectorData` instead of `Index`

See NeurodataWithoutBorders/nwb-schema#448

Add URIs for external resources / ontologies

Currently, with the new ExternalResources data type, each term is associated with an external resource (e.g., ontology), a unique identifier at the resource, and the URI for the resource entity.

Users would also like to associate a URI for the external resource. For example, the external resource "NCBITaxon" would have the URI "https://www.ncbi.nlm.nih.gov/taxonomy" associated with it. To normalize these data in the "resources" table, we should add a new table with fields (name, uri) and change the "resource_name" field in the "resources" table to be an index (foreign key) into the new table.

This change requires further feedback and coordination with ontology users.
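The proposed normalization can be sketched with plain Python lists standing in for the two tables. The table name `resource_info`, the column name `resource_idx`, and the placeholder entity values are assumptions for illustration only; just the (name, uri) fields and the NCBITaxon example come from the text above.

```python
# Proposed new table: one row per external resource, with fields (name, uri).
resource_info = [
    {"name": "NCBITaxon", "uri": "https://www.ncbi.nlm.nih.gov/taxonomy"},
]

# Existing "resources" table, with "resource_name" replaced by an index
# (foreign key) into the new table. Entity values are placeholders.
resources = [
    {"resource_idx": 0, "entity_id": "placeholder-id", "entity_uri": "placeholder-uri"},
]


def resolve_resource(row):
    """Look up the external resource (name and URI) that a resources row refers to."""
    return resource_info[row["resource_idx"]]


info = resolve_resource(resources[0])
print(info["name"], info["uri"])
```

With this layout, the resource name and URI are stored once, no matter how many entities refer to the same resource.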

Add dtype to VectorIndex

VectorIndex has no dtype, but it should always be an unsigned integer type, since a VectorIndex always holds indices into a VectorData type.

hdmf-common-schema/common/table.yaml

Lines 53 to 67 in 49e5fcb

    - data_type_def: VectorIndex
      data_type_inc: Index
      doc: Used with VectorData to encode a ragged array. An array of indices
        into the first dimension of the target VectorData, and forming a map
        between the rows of a DynamicTable and the indices of the VectorData.
      dims:
      - num_rows
      shape:
      - null
      attributes:
      - name: target
        dtype:
          target_type: VectorData
          reftype: object
        doc: Reference to the target dataset that this index applies to.

VectorData, VectorIndex, DynamicTableRegion missing shape

The VectorData, VectorIndex, and DynamicTableRegion types are missing the 'shape' key and are thus interpreted as scalar, when they should be 1-D/2-D/3-D/4-D, 1-D, and 1-D, respectively.

See hdmf-dev/hdmf#269

"object_type" column in ExternalResources may not be sufficient

We added "object_type" in the objects table in ExternalResources to make queries easier.

But in DynamicTables, the "object_type" would be "VectorData", which is very generic. Querying on it would pick up a lot of false positives, so it does not make queries for annotations of table columns any easier.

Documentation for hdmf-common types missing

With the move of data types from NWB to hdmf-common, the hdmf-common types may no longer have documentation. They might get copied over into NWB, but it is not clear. Regardless, the hdmf-common types should have their own documentation on their own readthedocs page.

