hdmf-schema-language's People

Contributors

bendichter, mavaylon1, rly


hdmf-schema-language's Issues

Remove "default_value" option for datasets

As far as we know, "default_value" is not used in NWB or HDMF, it does not quite make sense for datasets, and I do not think it is supported by NWB/HDMF or MatNWB. "value" is already not an option for datasets.
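For context, a minimal sketch (type and attribute names are hypothetical) of the distinction: "default_value" remains meaningful on attribute specs, while the proposal is only to drop it for dataset specs:

```yaml
# Hypothetical spec sketch; names are illustrative.
- data_type_def: MyData
  doc: an example dataset type
  dtype: text
  # default_value: foo    # the dataset-level key proposed for removal
  attributes:
  - name: unit
    dtype: text
    doc: unit of measurement
    default_value: meters  # default_value on attributes is unaffected
```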

Consider removing the 'linkable' key from the schema

The 'linkable' key for group and dataset specifications is a holdover from NWB 1.0. To my knowledge, it is not used by the NWB 2 schema, the HDMF common schema, or any extensions. The only known use in the official APIs is in the HDMF validator, which raises an IllegalLinkError if a link is made to a group/dataset with linkable=False. I also do not understand why a user would set linkable=False.

@oruebel and I think it could be safely deprecated.
Pro: Removes complexity from schema language. Reduces edge cases and required support in the official APIs.
Con: It is unlikely, but there may be some extensions in the wild that are using 'linkable', and this change would remove support for that functionality.
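For reference, a sketch (hypothetical type name) of the key under discussion; removing it would make all groups/datasets implicitly linkable:

```yaml
# Hypothetical group spec; 'linkable' is the key proposed for deprecation.
- data_type_def: MyGroup
  doc: an example group type
  linkable: false  # currently only enforced by the HDMF validator (IllegalLinkError)
```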

Add best practices section

The schema language supports some flexibility in how data types are defined, and some methods are encouraged over others for clarity and consistency. These best practices should be added to the schema language documentation:

  1. Define new data types at the root of the schema rather than nested within another data type definition. Nested type definitions may in some cases lead to errors in HDMF. See hdmf-dev/hdmf#511, hdmf-dev/hdmf#316, and hdmf-dev/hdmf#73
  2. Set the quantity key not in the data type definition but in the group/dataset spec where the type is included. When a data type is defined at the root of the schema (as opposed to nested), then to use the data type, a new group (subgroup) spec is defined where the quantity key is set to a value (or, if omitted, defaults to 1). This makes a quantity set in the data type definition itself meaningless and confusing. See also NeurodataWithoutBorders/nwb-schema#472
  3. Set the name key not in the data type definition but in the group/dataset spec where the type is included. A mismatch between the name defined in the data type definition and the name used where it is included can lead to confusion about the expected behavior and may lead to errors in HDMF. See hdmf-dev/hdmf#582
  4. Create a new data type when adding attributes/datasets/groups/links to an existing data type. See #13. -- Make this a rule (stop allowing new specs with this)
  5. When using data_type_inc, modify the dtype, shape, or quantity of a data type only to restrict the values from their original definitions. For example, if type A has dtype: text and type B extends type A (data_type_def: B, data_type_inc: A), then type B should not redefine dtype to be int, which is incompatible with the dtype of type A. The same applies if type A is included without defining a new type (just data_type_inc: A). In other words, all child types should be valid against the parent type. This is not yet checked in HDMF, but see progress in hdmf-dev/hdmf#321.
  6. Non-scalar values for the value and default_value keys are not yet supported by the official APIs, so these are discouraged until support is added.
  7. Don't allow spaces in any names. NeurodataWithoutBorders/pynwb#1421
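Practices 1–3 above can be sketched as follows (type names are hypothetical): the new type is defined at the root of the schema, and name and quantity are set only where it is included:

```yaml
# Define the new data type at the root of the schema (practice 1);
# no name or quantity here (practices 2 and 3).
- data_type_def: MySeries
  doc: an example dataset type
  dtype: float

# Include the type inside a group spec, setting name and quantity there.
- data_type_def: MyContainer
  doc: an example group type
  datasets:
  - data_type_inc: MySeries
    name: series
    quantity: '?'
    doc: an optional included series
```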

@bendichter @oruebel @ajtritt Can you think of other best practices to add? Do you agree with the above?

Make tag releases of this repo

We should make tagged releases of this repo so that each schema language version tag used in HDMF when writing schema files corresponds to a tagged release.

Add 'version' key for included namespaces

As new versions of schema A are released, schema A may no longer be compatible with a schema B that includes it. Older versions of schema A may likewise be incompatible with schema B.

For example, if the hdmf-common schema has a type X but changes it in a compatibility-breaking way from version 2 to version 3, then my extension will break if hdmf-common schema version 3 is loaded. It should be restricted to versions <3.

If the hdmf-common schema introduces a new type X in version 1.4 and my extension schema depends on that new type, then my extension will break if hdmf-common schema version 1.3 is loaded. It should be restricted to versions >=1.4.

These should be combinable and support >, >=, ==, !=, <, and <=, just like in pip requirements specs and conda requirements specs. See https://www.python.org/dev/peps/pep-0440/ and https://www.python.org/dev/peps/pep-0508/

This would be the value for a key "version" under namespaces[i] > schema[i] alongside "namespace" and "data_types", like the following:

namespaces:
- author:
  - ...
  contact:
  - ...
  doc: ...
  name: ndx-my-ext
  schema:
  - namespace: core
    version: ">=2,<3"
    neurodata_types:
    - NWBDataInterface
    - DynamicTable
    - VectorData
    - VectorIndex
  - source: ndx-my-ext.extensions.yaml
  version: 0.2.0

Clarify whether fields can be added to groups/dsets with only `data_type_inc`

Raised in hdmf-dev/hdmf#542 in the context of how extra fields are treated.

It is not clear in the documentation whether the schema language allows new fields for groups/datasets defined with only a data_type_inc. We should clarify this in the documentation.

It is my understanding that it is allowed (though perhaps not recommended) for fields to be added to groups/datasets defined with only a data_type_inc.

For example: a new group may contain a dataset with:

- data_type_def: MyTable
  datasets:
  - name: new_column
    data_type_inc: VectorData
    attributes:
    - name: new_attr
      dtype: text
      doc: a new attribute

A concrete example is in the Units table in NWB: https://github.com/NeurodataWithoutBorders/nwb-schema/blob/dev/core/nwb.misc.yaml#L187-L199

The resolution attribute is added to the base VectorData definition.

Or a new group may be defined as:

- data_type_def: MyContainer
  groups:
  - name: new_group
    data_type_inc: Container
    groups:
    - name: new_subgroup
      data_type_inc: Container
      doc: a new subgroup
    # also add new datasets, new links, and new attributes

A related but not fully matching example is the 'electrodes' table in the NWBFile: https://github.com/NeurodataWithoutBorders/nwb-schema/blob/dev/core/nwb.file.yaml#L269

The 'electrodes' table lacks a data_type_def but specifies many named VectorData types. However, this differs from the above because DynamicTable explicitly allows quantity: * of data_type_inc: VectorData without a name specified.

Note that a group/dataset defined with only a data_type_inc may have a different doc, quantity, dims, shape, and sometimes even dtype from the data_type_def definition, as long as these are compatible with the included spec (e.g., a square is a special case of a rectangle). This should also be clarified in the documentation and fully supported in HDMF.
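As an illustration, a sketch (hypothetical type names) of an inclusion that compatibly restricts the original spec, narrowing an unconstrained shape to a fixed number of columns:

```yaml
# Original definition: shape left flexible.
- data_type_def: Position
  doc: position data
  dtype: float
  dims:
  - num_times
  - num_dims
  shape:
  - null
  - null

# Inclusion that compatibly restricts shape to exactly 2 columns (x, y).
- data_type_def: PlanarPosition
  doc: 2D position data
  datasets:
  - data_type_inc: Position
    name: data
    doc: 2D position samples
    shape:
    - null
    - 2
```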

If adding fields to types with only data_type_inc is not allowed, then the NWB schema should be amended.
If the above is allowed, then the HDMF validator should be amended to validate against the spec with any additions to the included type.

Docs build on Read the Docs fails

The build of the hdmf-schema-language docs on Read the Docs currently fails with:

Running Sphinx v1.8.5

Traceback (most recent call last):
  File "/home/docs/checkouts/readthedocs.org/user_builds/hdmf-schema-language/envs/latest/lib/python3.7/site-packages/sphinx/config.py", line 368, in eval_config_file
    execfile_(filename, namespace)
  File "/home/docs/checkouts/readthedocs.org/user_builds/hdmf-schema-language/envs/latest/lib/python3.7/site-packages/sphinx/util/pycompat.py", line 150, in execfile_
    exec_(code, _globals)
  File "/home/docs/checkouts/readthedocs.org/user_builds/hdmf-schema-language/checkouts/latest/source/conf.py", line 16, in <module>
    from ruamel import yaml
ModuleNotFoundError: No module named 'ruamel'
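A likely fix, assuming the docs build installs dependencies from a requirements file (the file name here is an assumption), is to list ruamel.yaml in that file and point the Read the Docs configuration at it:

```yaml
# .readthedocs.yaml (sketch; assumes requirements-doc.txt lists
# sphinx and ruamel.yaml among the docs build dependencies)
version: 2
build:
  os: ubuntu-22.04
  tools:
    python: "3.11"
python:
  install:
  - requirements: requirements-doc.txt
```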

No support for arrays in compound data types

HDF5 allows vectors of up to 4 dimensions within compound datatypes (ref), but our DTypeSpec does not allow a shape parameter, so we cannot use this feature. I propose that we extend the schema language so that vectors can be placed inside compound data types.
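A sketch of what the proposed extension might look like, adding an optional shape to each DTypeSpec entry (this syntax is a proposal, not currently supported; type and field names are hypothetical):

```yaml
# Proposed (not yet supported): per-field shape inside a compound dtype.
- data_type_def: LabeledVectors
  doc: an example compound dataset
  dtype:
  - name: label
    dtype: text
    doc: label for this element
  - name: vector
    dtype: float
    doc: a fixed-length 3-vector per element
    shape:  # proposed new key within DTypeSpec
    - 3
```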

Update docs to say that time can be missing from isodatetime/datetime dtype?

Useful for NWB: NeurodataWithoutBorders/nwb-schema#542

Current docs: https://hdmf-schema-language.readthedocs.io/en/latest/description.html#dtype

Proposal:

  • Allow date in the current isodatetime / datetime definition. Example: date of birth (time is rarely relevant and rarely tracked)
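Under the proposal, a spec like the following (attribute name is hypothetical) would accept a date-only value such as 1970-01-01 in addition to a full timestamp:

```yaml
# Hypothetical attribute spec; under the proposal, date-only ISO 8601
# values (e.g., 1970-01-01) would be valid for dtype isodatetime.
- name: date_of_birth
  dtype: isodatetime
  doc: date of birth of the subject; time of day may be omitted
```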

Alternative:
