hdmf-schema-language's People

Contributors

bendichter, mavaylon1, rly


hdmf-schema-language's Issues

Remove "default_value" option for datasets

As far as we know, "default_value" is not used in NWB or HDMF, it does not quite make sense for datasets, and I do not think it is supported by NWB/HDMF or MatNWB. "value" is already not an option for datasets.
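For context, a minimal sketch (type and attribute names are hypothetical) of the distinction: "default_value" remains meaningful on attribute specs, while the proposal is only to drop it for dataset specs:

```yaml
# Hypothetical spec sketch; names are illustrative.
- data_type_def: MyData
  doc: an example dataset type
  dtype: text
  # default_value: foo    # the dataset-level key proposed for removal
  attributes:
  - name: unit
    dtype: text
    doc: unit of measurement
    default_value: meters  # default_value on attributes is unaffected
```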

Consider removing the 'linkable' key from the schema

The 'linkable' key for group and dataset specifications is a holdover from NWB 1.0. To my knowledge, it is not used by the NWB 2 schema, the HDMF common schema, or any extensions. The only known use in the official APIs is in the HDMF validator, which raises an IllegalLinkError if a link is made to a group/dataset with linkable=False. I also do not understand why a user would set linkable=False.

@oruebel and I think it could be safely deprecated.
Pro: Removes complexity from schema language. Reduces edge cases and required support in the official APIs.
Con: It is unlikely, but there may be some extensions in the wild that are using 'linkable', and this change would remove support for that functionality.
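For reference, a sketch (hypothetical type name) of the key under discussion; removing it would make all groups/datasets implicitly linkable:

```yaml
# Hypothetical group spec; 'linkable' is the key proposed for deprecation.
- data_type_def: MyGroup
  doc: an example group type
  linkable: false  # currently only enforced by the HDMF validator (IllegalLinkError)
```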

Add best practices section

The schema language supports some flexibility in how data types are defined, and some methods are encouraged over others for clarity and consistency. These best practices should be added to the schema language documentation:

  1. Define new data types at the root of the schema rather than nested within another data type definition. Nested type definitions may in some cases lead to errors in HDMF. See hdmf-dev/hdmf#511, hdmf-dev/hdmf#316, and hdmf-dev/hdmf#73
  2. Set the quantity key not in the data type definition but in the group/dataset spec where the type is included. When a data type is defined at the root of the schema (as opposed to nested), then to use the data type, a new group (subgroup) spec is defined where the quantity key is set to a value (or, if omitted, defaults to 1). This makes a quantity set in the data type definition itself meaningless and confusing. See also NeurodataWithoutBorders/nwb-schema#472
  3. Set the name key not in the data type definition but in the group/dataset spec where the type is included. A mismatch between the name defined in the data type definition and the name used where it is included can lead to confusion about the expected behavior and may lead to errors in HDMF. See hdmf-dev/hdmf#582
  4. Create a new data type when adding attributes/datasets/groups/links to an existing data type. See #13. -- Make this a rule (stop allowing new specs with this)
  5. When using data_type_inc, modify the dtype, shape, or quantity of a data type only to restrict the values from their original definitions. For example, if type A has dtype: text and type B extends type A (data_type_def: B, data_type_inc: A), then type B should not redefine dtype to be int, which is incompatible with the dtype of type A. The same applies if type A is included without defining a new type (just data_type_inc: A). In other words, all child types should be valid against the parent type. This is not yet checked in HDMF, but see progress in hdmf-dev/hdmf#321.
  6. Non-scalar values for the value and default_value keys are not yet supported by the official APIs, so these are discouraged until support is added.
  7. Don't allow spaces in any names. NeurodataWithoutBorders/pynwb#1421
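Practices 1–3 above can be sketched as follows (type names are hypothetical): the new type is defined at the root of the schema, and name and quantity are set only where it is included:

```yaml
# Define the new data type at the root of the schema (practice 1);
# no name or quantity here (practices 2 and 3).
- data_type_def: MySeries
  doc: an example dataset type
  dtype: float

# Include the type inside a group spec, setting name and quantity there.
- data_type_def: MyContainer
  doc: an example group type
  datasets:
  - data_type_inc: MySeries
    name: series
    quantity: '?'
    doc: an optional included series
```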

@bendichter @oruebel @ajtritt Can you think of other best practices to add? Do you agree with the above?

Make tag releases of this repo

We should make tagged releases of this repo so that each schema language version tag used in HDMF when writing schema files corresponds to a tagged release.

Add 'version' key for included namespaces

As new versions of schema A are released, schema A may no longer be compatible with a schema B that includes it. Older versions of schema A may likewise be incompatible with schema B.

For example, if the hdmf-common schema has a type X but changes it in a compatibility-breaking way from version 2 to version 3, then my extension will break if hdmf-common schema version 3 is loaded. It should be restricted to versions <3.

If the hdmf-common schema introduces a new type X in version 1.4 and my extension schema depends on that new type, then my extension will break if hdmf-common schema version 1.3 is loaded. It should be restricted to versions >=1.4.

These should be combinable and support >, >=, ==, !=, <, and <=, just like in pip requirements specs and conda requirements specs. See https://www.python.org/dev/peps/pep-0440/ and https://www.python.org/dev/peps/pep-0508/

This would be the value for a key "version" under namespaces[i] > schema[i] alongside "namespace" and "data_types", like the following:

namespaces:
- author:
  - ...
  contact:
  - ...
  doc: ...
  name: ndx-my-ext
  schema:
  - namespace: core
    version: ">=2,<3"
    neurodata_types:
    - NWBDataInterface
    - DynamicTable
    - VectorData
    - VectorIndex
  - source: ndx-my-ext.extensions.yaml
  version: 0.2.0

Clarify whether fields can be added to groups/dsets with only `data_type_inc`

Raised in hdmf-dev/hdmf#542 in the context of how extra fields are treated.

It is not clear in the documentation whether the schema language allows new fields for groups/datasets defined with only a data_type_inc. We should clarify this in the documentation.

It is my understanding that it is allowed (though perhaps not recommended) for fields to be added to groups/datasets defined with only a data_type_inc.

For example: a new group may contain a dataset with:

- data_type_def: MyTable
  datasets:
  - name: new_column
    data_type_inc: VectorData
    attributes:
    - name: new_attr
      dtype: text
      doc: a new attribute

A concrete example is in the Units table in NWB: https://github.com/NeurodataWithoutBorders/nwb-schema/blob/dev/core/nwb.misc.yaml#L187-L199

The resolution attribute is added to the base VectorData definition.

Or a new group may be defined as:

- data_type_def: MyContainer
  groups:
  - name: new_group
    data_type_inc: Container
    groups:
    - name: new_subgroup
      data_type_inc: Container
      doc: a new subgroup
    # also add new datasets, new links, and new attributes

A related but not fully matching example is the 'electrodes' table in the NWBFile: https://github.com/NeurodataWithoutBorders/nwb-schema/blob/dev/core/nwb.file.yaml#L269

The 'electrodes' table lacks a data_type_def but specifies many named VectorData types. However, this differs from the above because DynamicTable explicitly allows quantity: * of data_type_inc: VectorData without a name specified.

Note that a group/dataset defined with only a data_type_inc may have a different doc, quantity, dims, shape, and sometimes even dtype from the data_type_def definition, as long as these are compatible with the included spec (e.g., a square is a special case of a rectangle). This should also be clarified in the documentation and fully supported in HDMF.
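As an illustration, a sketch (hypothetical type names) of an inclusion that compatibly restricts the original spec, narrowing an unconstrained shape to a fixed number of columns:

```yaml
# Original definition: shape left flexible.
- data_type_def: Position
  doc: position data
  dtype: float
  dims:
  - num_times
  - num_dims
  shape:
  - null
  - null

# Inclusion that compatibly restricts shape to exactly 2 columns (x, y).
- data_type_def: PlanarPosition
  doc: 2D position data
  datasets:
  - data_type_inc: Position
    name: data
    doc: 2D position samples
    shape:
    - null
    - 2
```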

If adding fields to types with only data_type_inc is not allowed, then the NWB schema should be amended.
If the above is allowed, then the HDMF validator should be amended to validate against the spec with any additions to the included type.

Docs build on Read the Docs fails

The build of the hdmf-schema-language docs on Read the Docs currently fails with:

Running Sphinx v1.8.5

Traceback (most recent call last):
  File "/home/docs/checkouts/readthedocs.org/user_builds/hdmf-schema-language/envs/latest/lib/python3.7/site-packages/sphinx/config.py", line 368, in eval_config_file
    execfile_(filename, namespace)
  File "/home/docs/checkouts/readthedocs.org/user_builds/hdmf-schema-language/envs/latest/lib/python3.7/site-packages/sphinx/util/pycompat.py", line 150, in execfile_
    exec_(code, _globals)
  File "/home/docs/checkouts/readthedocs.org/user_builds/hdmf-schema-language/checkouts/latest/source/conf.py", line 16, in <module>
    from ruamel import yaml
ModuleNotFoundError: No module named 'ruamel'
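A likely fix, assuming the docs build installs dependencies from a requirements file (the file name here is an assumption), is to list ruamel.yaml in that file and point the Read the Docs configuration at it:

```yaml
# .readthedocs.yaml (sketch; assumes requirements-doc.txt lists
# sphinx and ruamel.yaml among the docs build dependencies)
version: 2
build:
  os: ubuntu-22.04
  tools:
    python: "3.11"
python:
  install:
  - requirements: requirements-doc.txt
```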

No support for arrays in compound data types

HDF5 allows vectors of up to 4 dimensions within compound datatypes (ref), but our DTypeSpec does not allow a shape parameter, so we cannot use this feature. I propose that we extend the schema language so that vectors can be placed inside compound data types.
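A sketch of what the proposed extension might look like, adding an optional shape to each DTypeSpec entry (this syntax is a proposal, not currently supported; type and field names are hypothetical):

```yaml
# Proposed (not yet supported): per-field shape inside a compound dtype.
- data_type_def: LabeledVectors
  doc: an example compound dataset
  dtype:
  - name: label
    dtype: text
    doc: label for this element
  - name: vector
    dtype: float
    doc: a fixed-length 3-vector per element
    shape:  # proposed new key within DTypeSpec
    - 3
```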

Update docs to say that time can be missing from isodatetime/datetime dtype?

Useful for NWB: NeurodataWithoutBorders/nwb-schema#542

Current docs: https://hdmf-schema-language.readthedocs.io/en/latest/description.html#dtype

Proposal:

  • Allow date in the current isodatetime / datetime definition. Example: date of birth (time is rarely relevant and rarely tracked)
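Under the proposal, a spec like the following (attribute name is hypothetical) would accept a date-only value such as 1970-01-01 in addition to a full timestamp:

```yaml
# Hypothetical attribute spec; under the proposal, date-only ISO 8601
# values (e.g., 1970-01-01) would be valid for dtype isodatetime.
- name: date_of_birth
  dtype: isodatetime
  doc: date of birth of the subject; time of day may be omitted
```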

Alternative:
