hdmf-common-schema's People

Contributors

ajtritt, bendichter, mavaylon1, oruebel, rly, t-b, yarikoptic

hdmf-common-schema's Issues

Render docs for experimental namespace

The docs for the new experimental namespace are currently not being rendered on ReadTheDocs. This will likely require changes to how the docs are generated and/or to hdmf-docutils.

Specify schema language version

Add a comment to all of the YAML files saying which version of the schema language is in use, e.g.,
# hdmf-schema-language version 2.0.2
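A header like this could also be checked or added mechanically. The sketch below is hypothetical: the function names are illustrative, and it assumes the version comment must be the very first line of each YAML file.

```python
import re

# Assumed header format: exactly one comment line at the top of the file.
HEADER_PATTERN = re.compile(r"^# hdmf-schema-language version (\d+\.\d+\.\d+)\s*$")


def get_schema_language_version(yaml_text):
    """Return the version declared on the first line, or None if absent."""
    first_line = yaml_text.splitlines()[0] if yaml_text else ""
    match = HEADER_PATTERN.match(first_line)
    return match.group(1) if match else None


def add_version_header(yaml_text, version):
    """Prepend the version comment unless the file already declares one."""
    if get_schema_language_version(yaml_text) is not None:
        return yaml_text
    return f"# hdmf-schema-language version {version}\n" + yaml_text
```

Because add_version_header leaves already-tagged files untouched, it could be run repeatedly over the whole schema directory.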

Support NWBv2 files created by IPNWB

Branch: https://github.com/t-b/hdmf-common-schema/tree/use-text-as-encoding-for-dynamic-table-column-names

Good day ✨

When creating NWBv2 files from Igor Pro, I'm facing a general problem with text encoding: I can only write files with UTF-8 encoded strings.

The easy fix would be the following change:

$ git diff .
diff --git a/common/table.yaml b/common/table.yaml
index 49c8b6c..73a5ad3 100644
--- a/common/table.yaml
+++ b/common/table.yaml
@@ -87,7 +87,7 @@ groups:
     of usability.
   attributes:
   - name: colnames
-    dtype: ascii
+    dtype: text
     dims:
     - num_columns
     shape:

That would undo the change from NeurodataWithoutBorders/pynwb@b0939429 but I don't know why that was done.
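In the schema language, dtype: ascii restricts strings to ASCII, while dtype: text allows UTF-8. The hypothetical helper below (not part of HDMF or IPNWB) shows which column names would fail under the current ascii dtype but be accepted as UTF-8 text.

```python
def fits_ascii(name):
    """True if the string can be stored with dtype: ascii."""
    try:
        name.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def colnames_needing_text(colnames):
    """Return the column names that would require dtype: text (UTF-8)."""
    return [name for name in colnames if not fits_ascii(name)]


# Any str can be encoded as UTF-8, so dtype: text accepts all of these names.
names = ["id", "Δt", "größe"]
print(colnames_needing_text(names))  # ['Δt', 'größe']
```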

Rename some resources fields for consistency

In https://github.com/hdmf-dev/hdmf-common-schema/blob/master/common/resources.yaml:

First:

  • the keys table contains a field 'key_name'
  • the entities table contains fields 'entity_id' and 'entity_uri'
  • the resources table contains fields 'name' and 'resource_uri'

I think the prefixes on these field names (except for "resources" > "name") are redundant and can be removed for readability and for consistency with "resources" > "name". This would result in:

  • the keys table contains a field 'name'
  • the entities table contains fields 'id' and 'uri'
  • the resources table contains fields 'name' and 'uri'

Would this be too confusing? If so, then for consistency, the resources table "name" field should be renamed "resource_name".

Second:

  • the entities table contains fields "keytable_idx" and "resource_table_idx"
  • the object_keys table contains fields "objecttable_idx" and "keytable_idx"

Either they should all have an underscore before "table" or none of them should. I personally prefer that they have an underscore before "table".
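To make the proposal concrete, here is a hypothetical rename map, assuming the first option for the prefixes and the underscore-before-"table" convention for the index fields. It only restates the proposal above and does not reflect the current schema or API.

```python
# Hypothetical renames per table; fields not listed keep their names.
FIELD_RENAMES = {
    "keys": {"key_name": "name"},
    "entities": {
        "entity_id": "id",
        "entity_uri": "uri",
        "keytable_idx": "key_table_idx",
    },
    "resources": {"resource_uri": "uri"},
    "object_keys": {
        "objecttable_idx": "object_table_idx",
        "keytable_idx": "key_table_idx",
    },
}


def rename_fields(table_name, fields):
    """Apply the proposed renames for one table."""
    renames = FIELD_RENAMES.get(table_name, {})
    return [renames.get(field, field) for field in fields]


def inconsistent_idx_fields(fields):
    """Index fields that lack an underscore before 'table'."""
    return [f for f in fields if f.endswith("table_idx") and not f.endswith("_table_idx")]
```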

hdmf schema language

I think there is some lack of clarity regarding the schema language. I do not see any documentation about the HDMF schema language. We do have the schema-language ReadTheDocs page, but it states that it is for NWB, and it contains the NWB-specific keys neurodata_type_def and neurodata_type_inc. If we want a more general schema language, we would need:

  1. A page describing the HDMF schema language. It may be preferable to just modify the existing NWB specification language.
  2. A formal specification for changing the names of keys. I suppose this should be added to the schema language as well.
  3. A JSON Schema validator for the HDMF schema language.
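As a rough sketch of item 2, key renaming can be treated as a recursive mapping over a parsed spec. This assumes specs are already loaded into plain dicts and lists (e.g., from YAML); the mapping covers only the two NWB-specific keys named above and is not an existing HDMF mechanism.

```python
# Hypothetical mapping from NWB-specific keys to generic HDMF keys.
NWB_TO_HDMF_KEYS = {
    "neurodata_type_def": "data_type_def",
    "neurodata_type_inc": "data_type_inc",
}
# The inverse mapping lets the same function translate in the other direction.
HDMF_TO_NWB_KEYS = {v: k for k, v in NWB_TO_HDMF_KEYS.items()}


def rename_spec_keys(spec, key_map):
    """Recursively rename dict keys in a parsed spec; values are left unchanged."""
    if isinstance(spec, dict):
        return {key_map.get(key, key): rename_spec_keys(value, key_map)
                for key, value in spec.items()}
    if isinstance(spec, list):
        return [rename_spec_keys(item, key_map) for item in spec]
    return spec
```

A formal version of this mapping would presumably live in the namespace file so that a validator knows which key names to expect.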

Add new dtype for external file

Add a new dtype for external files. It would be essentially the same as dtype: text, except that its value would be passed to a resolver, if one is present, during the build process. This would be useful, e.g., for externally stored images and video files in NWB.
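A minimal sketch of the intended build behavior, assuming a hypothetical dtype name external_file and a resolver that is simply a function from the stored string to a resolved value; neither exists in HDMF today.

```python
import posixpath
from typing import Callable, Optional


def build_value(value: str, dtype: str,
                resolver: Optional[Callable[[str], str]] = None) -> str:
    """Pass external_file values through the resolver; treat everything else as text."""
    if dtype == "external_file" and resolver is not None:
        return resolver(value)
    return value


def make_base_dir_resolver(base_dir: str) -> Callable[[str], str]:
    """Hypothetical resolver: interpret stored paths relative to base_dir."""
    return lambda path: posixpath.join(base_dir, path)


resolver = make_base_dir_resolver("/data/session1")
print(build_value("video.mp4", "external_file", resolver))  # /data/session1/video.mp4
```

Without a resolver, an external_file value behaves exactly like dtype: text.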

Doc for `VectorIndex` is wrong

Based on the graphic in the NWB preprint, which is also here:
https://github.com/hdmf-dev/hdmf-common-schema/blob/master/docs/source/figures/ragged-array.png

the docstring for VectorIndex is wrong:

of the DynamicTable by indexing into this VectorData. The first
vector is at VectorData[0:VectorIndex(0)+1]. The second vector is at
VectorData[VectorIndex(0)+1:VectorIndex(1)+1], and so on.

I think I wrote this incorrectly months ago. It should say:
The first vector is at VectorData[0:VectorIndex[0]]. The second vector is at VectorData[VectorIndex[0]:VectorIndex[1]].

example:

>>> from hdmf.common import VectorData, VectorIndex
>>> foo = VectorData(name='foo', description='foo column', data=['a', 'b', 'c', 'd'])
>>> foo_ind = VectorIndex(name='foo_index', target=foo, data=[2, 4])
>>> foo_ind[0]
['a', 'b']
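The corrected semantics can be checked without HDMF using plain lists. get_row is a hypothetical helper that mirrors the corrected docstring: row i spans from the previous index (or 0) up to, but not including, VectorIndex[i].

```python
def get_row(data, index, i):
    """Return row i of a ragged array stored as (VectorData, VectorIndex)."""
    start = index[i - 1] if i > 0 else 0
    return data[start:index[i]]


def all_rows(data, index):
    """Expand the ragged array into a list of rows."""
    return [get_row(data, index, i) for i in range(len(index))]


data = ['a', 'b', 'c', 'd']   # VectorData
index = [2, 4]                # VectorIndex
print(all_rows(data, index))  # [['a', 'b'], ['c', 'd']]
# The old docstring's data[0:index[0] + 1] would give ['a', 'b', 'c'], one element too many.
```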

Document how ExternalResources `field` values should be written

When adding an object to the ExternalResources objects table, you supply a container object ID and a field. In most cases, the field is the name of a dataset or attribute, but it could be a little more complicated.

Let's say the container is a group data type (e.g., TimeSeries) and it has a dataset (e.g., data) without a data type, and that dataset has a string attribute (e.g., unit). The 'field' value then needs to signal that the field is the 'unit' attribute on the 'data' dataset. This could be done using '/' as a separator, e.g., field='data/unit'.

Now let's say the attribute is not a string but a compound data type with columns/fields 'x', 'y', and 'z', and each column/field is associated with different ontologies. The 'field' value also needs to account for this. This could also be done using '/' as a separator, e.g., field='data/unit/x'.

Whatever string formatting scheme we choose should be explicitly described in the docs and then handled by the API.

This came up when @oruebel and @rightbower were ontologizing a column/field of a compound data type column in an ICEPhys table.
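One possible shape for the '/'-separated scheme, sketched with hypothetical helper names. It assumes that dataset, attribute, and compound field names never contain '/' themselves, which the helpers enforce.

```python
FIELD_SEPARATOR = "/"


def make_field(*parts):
    """Join a path such as dataset -> attribute -> compound field into one 'field' value."""
    if not parts:
        raise ValueError("at least one part is required")
    for part in parts:
        if not part or FIELD_SEPARATOR in part:
            raise ValueError(f"invalid field part: {part!r}")
    return FIELD_SEPARATOR.join(parts)


def split_field(field):
    """Inverse of make_field."""
    return field.split(FIELD_SEPARATOR)


print(make_field("data", "unit"))       # data/unit
print(make_field("data", "unit", "x"))  # data/unit/x
```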

Remove the Resources table

Based on feedback from the LinkML developers and ontology experts, the Resources table is redundant and would rarely be used by the community. It adds extra overhead for adding entries to the ExternalResources "database". So, we decided to remove it. cc @oruebel @mavaylon1

The HDMF API will also need to be updated. rly/ndx-external-resources#6 may also need to be updated.

Add URIs for external resources / ontologies

Currently, with the new ExternalResources data type, each term is associated with an external resource (e.g., ontology), a unique identifier at the resource, and the URI for the resource entity.

Users would also like to associate a URI for the external resource. For example, the external resource "NCBITaxon" would have the URI "https://www.ncbi.nlm.nih.gov/taxonomy" associated with it. To normalize these data in the "resources" table, we should add a new table with fields (name, uri) and change the "resource_name" field in the "resources" table to be an index (foreign key) into the new table.

This change requires further feedback and coordination with ontology users.
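A minimal sketch of the proposed normalization, with hypothetical class and field names (ResourceName, resource_name_idx). It only shows how a row in the resources table would reach its resource URI through the new table; other fields of the resources table are omitted.

```python
from dataclasses import dataclass


@dataclass
class ResourceName:
    """Row of the proposed new table: one entry per external resource."""
    name: str
    uri: str


@dataclass
class Resource:
    """Row of the "resources" table after the change (other fields omitted)."""
    resource_name_idx: int  # index (foreign key) into the new table


def resource_uri(resources, resource_names, row):
    """Follow the foreign key from a resources row to the resource URI."""
    return resource_names[resources[row].resource_name_idx].uri


resource_names = [ResourceName("NCBITaxon", "https://www.ncbi.nlm.nih.gov/taxonomy")]
resources = [Resource(resource_name_idx=0)]
print(resource_uri(resources, resource_names, 0))  # https://www.ncbi.nlm.nih.gov/taxonomy
```

Storing each resource URI once avoids repeating it in every row that refers to the same resource.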

add dtype to VectorIndex

VectorIndex has no dtype, but it should always be an unsigned integer type, since a VectorIndex always holds indices into a VectorData:

- data_type_def: VectorIndex
  data_type_inc: Index
  doc: Used with VectorData to encode a ragged array. An array of indices
    into the first dimension of the target VectorData, and forming a map
    between the rows of a DynamicTable and the indices of the VectorData.
  dims:
  - num_rows
  shape:
  - null
  attributes:
  - name: target
    dtype:
      target_type: VectorData
      reftype: object
    doc: Reference to the target dataset that this index applies to.
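To illustrate why an unsigned integer dtype fits, the sketch below checks the properties a VectorIndex should have and picks the narrowest unsigned type that can hold its largest value. Both helpers are hypothetical, and choosing the narrowest width is only one possible policy.

```python
def validate_vector_index(index, target_len):
    """Raise ValueError unless index is valid for a target VectorData of target_len elements."""
    previous = 0
    for value in index:
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"index values must be non-negative integers, got {value!r}")
        if value < previous:
            raise ValueError("index values must be non-decreasing")
        if value > target_len:
            raise ValueError("index value points past the end of the target VectorData")
        previous = value


def smallest_uint_dtype(max_value):
    """Return the narrowest schema-language unsigned dtype that can hold max_value."""
    for bits in (8, 16, 32, 64):
        if max_value < 2 ** bits:
            return f"uint{bits}"
    raise ValueError("value does not fit in uint64")


validate_vector_index([2, 4], target_len=4)  # passes silently
print(smallest_uint_dtype(4))                # uint8
```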

"object_type" column in ExternalResources may not be sufficient

We added "object_type" in the objects table in ExternalResources to make queries easier.

But for DynamicTable columns, the "object_type" would be "VectorData", which is very generic; filtering on it would pick up many false positives, so it does not make queries for annotations of table columns any easier.
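A toy illustration of the problem, with made-up object IDs and a hypothetical ObjectRow type: filtering the objects table by object_type alone cannot tell two unrelated table columns apart.

```python
from typing import NamedTuple


class ObjectRow(NamedTuple):
    object_id: str
    object_type: str


# Made-up rows: two unrelated DynamicTable columns and one non-column object.
objects = [
    ObjectRow("uuid-1", "VectorData"),  # e.g., a column of one table
    ObjectRow("uuid-2", "VectorData"),  # e.g., a column of a different table
    ObjectRow("uuid-3", "TimeSeries"),
]


def find_by_type(rows, object_type):
    """Return the IDs of all objects whose object_type matches exactly."""
    return [row.object_id for row in rows if row.object_type == object_type]


# Both columns match, so the type alone cannot narrow the query to one of them.
print(find_by_type(objects, "VectorData"))  # ['uuid-1', 'uuid-2']
```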

Documentation for hdmf-common types missing

With the move of data types from NWB to hdmf-common, the hdmf-common types may no longer have documentation. They might get copied over into the NWB docs, but that is not clear. Regardless, the hdmf-common types should be documented on their own ReadTheDocs page.
