cf-conventions's Issues

Please improve description for direction_of_sea_water_velocity standard name

The description for the direction_of_sea_water_velocity standard name doesn't clearly state that it's a "to" direction. I assume it follows oceanographic convention, but think it needs to be stated explicitly. Wind and wave direction variables have _from and _to options, with clear descriptions of what this means. I think we can do better for direction_of_sea_water_velocity.
Thanks!

Update repo description

It currently states "[UNOFFICIAL] Work-in-progress demo of using AsciiDoc", which happily is no longer accurate. 😄

Bold or not bold?

Hi,

There is some confusion as to when monospaced text (blah) should be bold (blah) or not - sometimes it is, sometimes not. Do we need a scheme, like attribute names (and some other things) = bold, attribute values (and some different things) = not bold?

What do you think?

All the best,

David

Support for enum type in CF 2.0

CF 1.x uses an integer type with "flag_values" and "flag_meanings" attributes to create enumerated variables with self-describing flag values.

netCDF4 has added the enum type, which also allows the user to create enumerated variables with self-describing flag values, but with a simpler syntax and greater re-usability. For instance, a boolean enum type may be defined once at the root level and then used consistently throughout the netCDF dataset.
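A minimal CDL sketch of the two encodings, for illustration (the type, variable, and dimension names here are invented):

netcdf flag_example {
types:
  byte enum cloud_t { clear = 0, cloudy = 1, missing = 2 } ;  // netCDF-4 enum, defined once
dimensions:
  time = 100 ;
variables:
  // CF 1.x style: plain integer data with flag attributes
  byte cloud_flag_v1(time) ;
    cloud_flag_v1:flag_values = 0b, 1b, 2b ;
    cloud_flag_v1:flag_meanings = "clear cloudy missing" ;
  // netCDF-4 style: the enum type carries the meanings itself
  cloud_t cloud_flag_v2(time) ;
}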

Will CF 2.0 allow/support the use of the enum data type?

  • Tim Patterson
    (working on netCDF formats for EUMETSAT/MTG products).

Preserve deleted sections?

Should deleted sections 5.4, "Timeseries of Station Data" and 5.5, "Trajectories" be preserved in the AsciiDoc version?

If we are preserving sections 5.4 and 5.5, presumably the "This section has been superseded by ..." text should also be kept.

If we are not preserving sections 5.4 and 5.5 then do we still want to keep the following section numbers as 5.6 and 5.7?

For what it's worth, my suggestion would be to keep 5.4 and 5.5 and their "This section has been superseded..." text. This is what I've done in #16.

Allow a standard name `alias` to have more than one `entry_id`

[This issue was originally entitled "TRAC #155: Invalid "id" values in CF Standard Name aliasses"]

Running an XML schema check on the CF standard name list, I found the following minor issues (minor because they relate to aliases, not the standard name definitions):

There are spurious spaces in these ids:

  • rate_of_ hydroxyl_radical_destruction_due_to_reaction_with_nmvoc
  • mole_fraction_of_hypochlorous acid_in_air
  • mole_fraction_of_dichlorine peroxide_in_air
  • mole_fraction_of_chlorine monoxide_in_air
  • mole_fraction_of_chlorine dioxide_in_air
    https://cfconventions.org/Data/Trac-tickets/155.html
    The standard name surface_carbon_dioxide_mole_flux has two aliases, surface_upward_mole_flux_of_carbon_dioxide and surface_downward_mole_flux_of_carbon_dioxide, which is intentional (the definitions of the two newer names indicate that the deprecated name was too imprecise). The problem here is that the XSD schema does not allow for two aliases with the same id. Having unique id values for each element is useful, so I suggest we change the schema and the document to replace
  <alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_upward_mole_flux_of_carbon_dioxide</entry_id>
  </alias>

  <alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_downward_mole_flux_of_carbon_dioxide</entry_id>
  </alias>

with

<alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_upward_mole_flux_of_carbon_dioxide</entry_id>
    <entry_id>surface_downward_mole_flux_of_carbon_dioxide</entry_id>
</alias>

EDIT 2024-01-19: Changed the top link to correctly point to the Trac ticket /@larsbarring

Clarify the use of compressed dimensions in related variables

The problem

When a data variable has dimensions that have been compressed by gathering (http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#compression-by-gathering), the conventions currently make no statement on whether any related auxiliary coordinate, cell measure or ancillary variables that span those dimensions must also be compressed, or whether such attached variables may be compressed across a different list of dimensions.

The situation is, however, clarified in the conformance document (http://cfconventions.org/Data/cf-documents/requirements-recommendations/requirements-recommendations-1.7.html), which states that attached variables must span a subset of the actual dimensions spanned by the data variable (sections 5 and 7.2). There are exceptions to this rule for auxiliary coordinate variables, for character arrays and DSG ragged arrays, but they do not apply to compression by gathering.

Proposed solution

I propose that the conformance document is correct, and that we would be fixing a defect in the conventions by clarifying this in the main text.

Backwards compatibility

I see no problems, because if there are existing datasets that apply compression differently to related variables than to the data variable, then these datasets would already fail the tests from the conformance document.

Changes required

A. Add a third paragraph to section 8.2

Any auxiliary coordinate, cell measure or ancillary variables related
to a compressed data variable must not be compressed unless they can be
compressed using the same list variable indices as used by the data
variable, in which case compression must be applied. This will occur
for a related variable for which, when uncompressed, the individual
compressed axes appear adjacently and in the same order as the data
variable.

B. Update Example 8.1 to include a 2-d auxiliary coordinate variable:

Example 8.1. Horizontal compression of a three-dimensional array

dimensions:
  lat=73;
  lon=96;
  landpoint=2381;
  depth=4;
variables:
  int landpoint(landpoint);
    landpoint:compress="lat lon";
  float landsoilt(depth,landpoint);
    landsoilt:long_name="soil temperature";
    landsoilt:units="K";
    landsoilt:coordinates="soil_type";
  float depth(depth);
  float lat(lat);
  float lon(lon);
  int soil_type(landpoint);
     soil_type:long_name="integer code defining the soil type";
data:
  landpoint=363, 364, 365, ...;

List of tables/examples

The "List of Tables" and "List of Examples" are missing. In the original they occur immediately after the "Table of Contents".

Cannot assign labels to issues

CONTRIBUTING.md says to assign labels to issues, but I (and other users) don't have permission to do so, unless I'm missing something.

Preface revision history

In the preface, the existing description of the revision history is no longer accurate.

In particular, if we are no longer using the pink, yellow, green highlighting scheme the following phrase no longer applies:

Changes with provisional status use the following mark-up style: ...

Also, we need to consider if the following phrase is still accurate:

See Appendix G, Revision History for the full revision history.

Perhaps the following wording would be more appropriate?

This document will be updated to reflect agreed changes to the standard and to correct mistakes according to the rules of CF governance. See Revision History for the revision history prior to the move to GitHub. See GitHub for the subsequent revision history.

Simple Geometry Contribution and github test case

Preface:
This issue follows the conversation in PR #109 and is purposefully a test case working on the migration described in issue #106. #109 is still open so we can kick the tires on both approaches.

For discussion of the text of this proposal, copy and paste content into a comment below and use strikethrough (~~strikethrough~~) and bold (**bold**) text to indicate removed and added text. E.g. change this to this. An alternative could be: strikethrough (new text in parentheses in the new expected style).

Note that this is probably a very long submission in comparison to typical change requests that would be vetted using github issues. Comments here will likely get long, but it seems that should be OK as long as we remember that it is probably an outlier and is about as long as these would ever get.


Summary
This proposal has been vetted on the CF email list extensively and has gone through a number of iterations. The structure and semantics of the proposed addition below should be close to complete, but this is the first review of proposed text to be added to the CF 1.8 specification. This is entirely new text (section 7.5) to be added just after section 7.4. There is also text to be added as Example E1 in Appendix E. The text should more or less speak for itself, but much more information about the proposal can be seen in the readme here, on the wiki about the specification here, and in the poster here.

The proposed text follows first with a suggested section 7.5 then a suggested example to be added to appendix E.


Section 7.5 Spatial Geometries

For many geospatial applications, data values are associated with a spatial geometry (e.g., the average monthly rainfall in the UK). Although cells with an arbitrary number of vertices can be described using Section 7.1, "Cell Boundaries", spatial geometries contain an arbitrary number of nodes for each geometry and include line and multipart geometries (e.g., the different islands of the UK). The approach described here specifies how to encode such geometries following the pattern in 9.3.3 Contiguous ragged array representation, and how to attach them to variables in a way that is consistent with the cell bounds approach.

A geometry is usually thought to be a spatial representation of a real-world feature. It can be disjoint, having multiple parts. Geometry types are limited to point, multipoint, line, multiline, polygon and multipolygon types. Other types exist and may be introduced in a later version of the specification. Similar to other geospatial data formats, geometries are encoded as ordered sets of geospatial nodes. The connection between nodes is assumed to be linear in the coordinate reference system the nodes are defined in. Parametric geometries or otherwise curved features may be supported in the future.

All geometries are made up of one or more nodes. The geometry type specifies the set of topological assumptions to be applied to relate the nodes. For example, multipoint and line geometries are nearly the same except that nodes are interpreted as being connected for lines. Lines and polygons are also nearly the same except that the first and last nodes must be identical for polygons. Polygons that have holes, such as waterbodies in a land unit, are encoded as a collection of polygon ring parts, each identified as an exterior or interior ring. Multipart geometries, such as multiple lines representing the same river or multiple islands representing the same jurisdiction, are encoded as collections of unconnected points, lines, or polygons that are logically grouped into a single geometry.

While this geometry encoding is applicable to any variable that shares a dimension with a set of geometries, the application it was originally designed for requires that the geometry be joined to the instance dimension of a Discrete Sampling Geometry timeSeries featureType. In this case, any data variable can be given a geometry attribute that is to be interpreted as the representative geometry for the quantity held in the variable. An example of this is areal average precipitation over a watershed. An example of line geometry with time series data is given in Appendix E: Cell Methods.

Geometry Variables and Attributes

A set of geometries can be added to a file by inserting all required data variables and a geometry container variable that acts as a container for attributes that describe a set of geometries. A geometry attribute containing the name of a geometry container variable can be added to any variable that shares a dimension with the geometries. The geometry container must hold geometry_type and node_coordinates attributes. Depending on the geometry_type, the geometry container may also need to contain a node_count, part_node_count, and interior_ring attribute. These attributes are described in detail below.

The geometry_type attribute must be carried by a geometry container variable and indicates the type of geometry present. Its allowable values are: point, multipoint, line, multiline, polygon, multipolygon. The node_coordinates attribute must be carried by a geometry container variable and contains the space delimited names of the x and y (and z) variables that contain geometry node coordinates.

For all geometry types except point, the geometry container variable must have a node_count attribute that contains the name of a variable indicating the count of nodes per geometry. Note that the node count may span multiple geometry parts. For multiline, multipolygon, and polygons with holes, the geometry container variable must have a part_node_count attribute that contains the name of a variable indicating the count of nodes per geometry part. Note that because multipoint geometries always have a single node per part, the part_node_count is not required.

For polygon and multipolygon geometries with holes, the geometry container variable must have an interior_ring attribute that contains the name of a variable that indicates if the polygon parts are interior rings (i.e., holes) or not. The variable indicated by the interior_ring attribute should contain the value 0 to indicate an exterior ring polygon and 1 to indicate an interior ring polygon. Note that single part polygons can have interior rings; multipart polygons are distinct in that they have more than one exterior ring.

The variables that contain geometry node coordinate data, indicated by the node_coordinates attribute on the geometry container variable, are also identifiable through the use of a required cf_role attribute. Allowable values are geometry_x_node, geometry_y_node, and geometry_z_node.

Encoding Geometries

Geometry encoding follows a similar pattern to the contiguous ragged array approach in 9.3.3 Contiguous ragged array representation, with some modification to suit the spatial geometry use case rather than observational time series. All spatial data are encoded in the variables indicated by the node_coordinates and appropriate cf_role attributes. These node variables should be one-dimensional, with length equal to the total number of nodes. There are three one-dimensional variables that are used to break up and interpret the node variables: node_count, part_node_count, and interior_ring.

For geometry types requiring a node_count attribute, the node count variable should have length equal to the number of geometries and indicate the number of nodes per geometry. For geometry types requiring a part_node_count attribute, the part node count variable should have length equal to the number of geometry parts and indicate the number of nodes per geometry part. For geometry types requiring an interior_ring attribute, the interior ring variable should have length equal to the number of geometry parts and contain 0s and 1s to indicate exterior or interior rings.

The ecosystem of polygon specifications and software implementations of those specifications varies in how polygons are encoded. Nodes within a polygon's exterior and interior rings are typically encoded in opposite directions (clockwise or anticlockwise) around the polygon. This is important for operations such as calculating area. CF requires that outer rings be encoded in anticlockwise order and interior rings in clockwise order. CF also requires that the first and last node in a polygon ring be identical, to ensure polygon rings are complete.

A coordinate reference system (CRS), referred to as a grid mapping elsewhere in the CF convention, is strictly required for geometries. The normal CF practice of attaching a grid_mapping attribute (containing the name of a CRS container variable) to a data variable can be used, and that grid_mapping CRS should be assumed to apply to the geometry. However, the normal grid_mapping, which typically applies to auxiliary coordinate variables and remains optional for use with geometries, can be overridden by attaching a crs attribute, containing the name of a CRS container variable, to the geometry container variable. If a grid_mapping is not present on a data variable linked to a geometry, a crs attribute is required.

Example 7.14. A multipolygon with holes

This example demonstrates the use of all potential attributes and variables for encoding geometries.

dimensions:
  node = 25 ;
  instance = 1 ;
  part = 6 ;
variables:
  double x(node) ;
    x:units = "degrees_east" ;
    x:standard_name = "longitude" ;
    x:cf_role = "geometry_x_node" ;
  double y(node) ;
    y:units = "degrees_north" ;
    y:standard_name = "latitude" ;
    y:cf_role = "geometry_y_node" ;
  float geometry_container ;
    geometry_container:geometry_type = "multipolygon" ;
    geometry_container:node_count = "node_count" ;
    geometry_container:node_coordinates = "x y" ;
    geometry_container:crs = "crs" ;
    geometry_container:part_node_count = "part_node_count" ;
    geometry_container:interior_ring = "interior_ring" ;
  int node_count(instance) ;
    node_count:long_name = "count of coordinates in each instance geometry" ;
  int part_node_count(part) ;
    part_node_count:long_name = "count of nodes in each geometry part" ;
  int interior_ring(part) ;
    interior_ring:long_name = "type of each geometry part" ;
  float crs ;
    crs:grid_mapping_name = "latitude_longitude" ;
    crs:semi_major_axis = 6378137. ;
    crs:inverse_flattening = 298.257223563 ;
    crs:longitude_of_prime_meridian = 0. ;
// global attributes:
  :Conventions = "CF-1.8" ;
data:
 x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7, 5, 
    11, 15, 13, 11 ;
 y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29, 
    25, 25, 25, 29, 25 ;
 geometry_container = 0. ;
 node_count = 25 ;
 part_node_count = 5, 4, 4, 4, 4, 4 ;
 interior_ring = 0, 1, 1, 1, 0, 0 ;
 crs = 0. ;

Example E.1. Timeseries with geometry.

dimensions:
  instance = 2 ;
  node = 5 ;
  time = 4 ;
variables:
  int time(time) ;
    time:units = "days since 2000-01-01" ;
  double lat(instance) ;
    lat:units = "degrees_north" ;
    lat:standard_name = "latitude" ;
    lat:geometry = "geometry_container" ;
  double lon(instance) ;
    lon:units = "degrees_east" ;
    lon:standard_name = "longitude" ;
    lon:geometry = "geometry_container" ;
  int crs ;
    crs:grid_mapping_name = "latitude_longitude" ;
    crs:longitude_of_prime_meridian = 0.0 ;
    crs:semi_major_axis = 6378137.0 ;
    crs:inverse_flattening = 298.257223563 ;
  int geometry_container ;
    geometry_container:geometry_type = "line" ;
    geometry_container:node_count = "node_count" ;
    geometry_container:node_coordinates = "x y" ;
  int node_count(instance) ;
  double x(node) ;
    x:units = "degrees_east" ;
    x:standard_name = "longitude" ;
    x:cf_role = "geometry_x_node" ;
  double y(node) ;
    y:units = "degrees_north" ;
    y:standard_name = "latitude" ;
    y:cf_role = "geometry_y_node" ;
  double someData(instance, time) ;
    someData:coordinates = "time lat lon" ;
    someData:grid_mapping = "crs" ;
    someData:geometry = "geometry_container" ;
// global attributes:
  :Conventions = "CF-1.8" ;
  :featureType = "timeSeries" ;
data:
  time = 1, 2, 3, 4 ;
  lat = 30, 50 ;
  lon = 10, 60 ;
  someData =
    1, 2, 3, 4,
    1, 2, 3, 4 ;
  node_count = 3, 2 ;
  x = 30, 10, 40, 50, 50 ;
  y = 10, 30, 40, 60, 50 ;

The time series variable, someData, is associated with line geometries via the geometry attribute. The first line geometry comprises three nodes, while the second has two. Client applications unaware of CF geometries can fall back to the lat and lon variables to locate feature instances in space. In this example, the lat and lon coordinates are identical to the first node in each line geometry, though any representative point could be used.

GitHub Contribution Guidelines

Dear All,

As a next step toward the CF community using GitHub tools to discuss and refine the specification, we need contribution guidelines for this repository. For background and interesting reading, this issue follows #106 and #112 and is more or less governed by the CF community conversation in Trac ticket 160. The consensus in that ticket has led us here.

The outcome of this issue should be an initial CONTRIBUTING.md and modifications to the pull request template per the requirements described below. We should use this issue to discuss and evolve these requirements; then I will submit a pull request with content according to the consensus.

Please review the use case and guidelines below. I will begin work on the CONTRIBUTING.md document in a few weeks, based on the consensus we have sometime in May.

Top level use case: As a contributor to the CF specification, I need to know the rules and instructions for how GitHub should be used, so I know the right way to submit my suggested addition or changes.

Guidelines: (derived from correspondence between @JonathanGregory, @dblodgett-usgs and others.)

  • Some simple instructions are needed so that anyone who does not use GitHub for any other purpose can follow them successfully. GitHub can be used in a variety of ways and we need clear guidance regarding how the CF community wants to use the tools.

  • A given proposal should be discussed as one issue. It shouldn't fork or be superseded by another one, unless that reflects what has happened to the proposal, in the same way that we continue discussion under one trac ticket for a given proposal. This is so that it's easy to trace the discussion which led to a given agreed proposal.

  • It has to be easy to see what is being proposed in the context of the
    discussion. In Trac we say:

    I propose to do X, Y and Z, with the result that the text will read as follows
    in Section N
    blah blah blah
    and as follows in Section M
    rhubarb rhubarb

    Thus it's possible to see the reasoning and the effect on all the sections, all gathered together, and understand it. In GitHub, there is no relation between the argument and the modified text, unless the argument is put as annotations in the text, but in that case the bits of argument are all over the place. Somehow you need to get an overview of the changes all together.

  • In the case of short changes, I think that excerpting the existing spec in a GitHub issue with suggested new content in the body of a GitHub issue would work well. Once agreed to in the issue, the change can be implemented and submitted as a pull request referencing the issue.

  • In the case of long changes (i.e. where the body of an issue isn't long enough or changes are in many parts of the specification) the significant addition would likely come in the form of added content in a person's fork of the specification. They might register the contribution and their intent with the community in an issue for general feedback, work on the content in their fork, then submit a pull request when the consensus is that it's ready to be submitted. If the community finds utility in line-by-line comments in a pull request, that's great, but I don't think we need to force that issue. Line-by-line comments are really great for code. They can be great for text, but they can also be very cumbersome.

  • Regarding "long changes" Maybe we could recommend that the discussion begins with proposed changes in the issue itself, but if the proposal involves changes in many places in the document, so it's awkward to spell them all out, it would be better to do this in a fork, but it's not worth doing that until it seems that the proposal is likely to be agreed. Moreover, if it does look like a proposal is going to be agreed, or when it has been, it would definitely be useful for the proposer to fork and update the documents, because otherwise someone else will have to do it later (as for trac tickets). So we should encourage that. However, to enable it, we need really simple instructions about how to do it!

  • Regarding a hybrid approach of GitHub and trac, it seems that part of that ship sailed when we started managing the text in GitHub. That said, I see no reason not to allow changes to be vetted in trac until we have enough success using GitHub to track modifications that the community is ready to leave trac behind. We are already referencing trac from GitHub. Opening the option to not use Trac seems like the next step to me.

Best,

Dave Blodgett

Incorporating the CF data model into the conventions

Title: Incorporating the CF data model into the conventions
Moderator: @JonathanGregory
Requirement Summary: A data model allows for a more structured development of the CF conventions.
Technical Proposal Summary: A new document describing the CF data model is incorporated into CF, and new rules for keeping it up to date.
Benefits: Provides structure to proposers of enhancements to CF, and creators of CF-compliant software
Status Quo: CF continues to not benefit from having a formal data model.

Detailed Proposal:
In this ticket we propose that the CF data model for data and metadata corresponding to the existing standard (version 1.7) is incorporated into the CF conventions.

This follows many years of on-line discussion, the external publication of the data model (GMD, https://doi.org/10.5194/gmd-10-4619-2017) and the consensus to incorporate the data model that was reached at the 2018 CF-netCDF meeting in Reading.

The full description of the data model, as it would be included in CF, may be found at https://github.com/davidhassell/cf-data-model/blob/master/cf-data-model.adoc. This document is almost entirely a cut down version of the GMD paper.

It is proposed that this description of the data model resides in a new repository called https://github.com/cf-convention/cf-data-model (which does not yet exist).

The CF data model document contains background information on what the data model aims to achieve, its design criteria, and relationship to the current netCDF encoding.

Note that the data model published in GMD was for CF-1.6, but it is consistent with all of the changes to the conventions arising from the 28 tickets that contributed to CF-1.7. Therefore we may consider this also to be the data model for CF version 1.7.

Governance:
The CF data model will guide the development of CF by providing a framework for ensuring that proposed changes fit into CF in a logical way, rather than just a pragmatic one.

To ensure this occurs, some additional rules for conventions changes have been drafted. These are included in pull request cf-convention/cf-convention.github.io#72

In brief, a proposed change should be assessed for data model compatibility. If it is not compatible, and the proposal cannot be acceptably modified so that it is, then the data model will be changed to accommodate the change. The proposer is not required to have detailed data model knowledge - the new rules state that "The assessment will be carried out by a member of the conventions committee or another suitably qualified person. If no-one volunteers, the chairman of the committee will ask someone to do it".

test

Please ignore this.

timeSeries featureType with a forecast / reference time dimension?

Dear All,

There's been some question as to whether a timeSeries featureType is allowed to have a "reference time" dimension in addition to a "valid time" dimension as in a forecast model run collection. See: Unidata/thredds#1080

Is "Mandatory space-time coordinates for a collection of these features" of x(i) y(i) t(i,o) in the table here to be interpreted as limited to t(i,o), or not limited to t(i,o)? @cofinoa and I agree that it should be "not limited to", but I wanted to poll the community and get the take of those with a bit more background on the CF convention language.
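For concreteness, a hypothetical CDL sketch of the kind of file in question, with a reference time dimension alongside the valid time (all names and sizes here are invented):

dimensions:
  station = 3 ;
  reftime = 2 ;
  time = 24 ;
variables:
  float lon(station) ;
    lon:standard_name = "longitude" ;
  float lat(station) ;
    lat:standard_name = "latitude" ;
  double reftime(reftime) ;
    reftime:standard_name = "forecast_reference_time" ;
    reftime:units = "hours since 2018-01-01" ;
  double time(reftime, time) ;
    time:standard_name = "time" ;
    time:units = "hours since 2018-01-01" ;
  float temperature(station, reftime, time) ;
    temperature:standard_name = "air_temperature" ;
    temperature:units = "K" ;
    temperature:coordinates = "lat lon" ;
// global attributes:
  :featureType = "timeSeries" ;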

Thanks!!

  • Dave

Closure Criteria:
This issue should be closed when this question is answered. It may generate an additional issue to clarify the text in the CF spec, but I would consider that an additional issue with its own description.

Support of groups in CF

As discussed in CF meeting on 20 June 2018 in Reading, UK, we'd like to add support for the use of groups in CF files. Charlie and I have drafted an appropriate pull request containing the suggestions we'd like to implement.

Basically, the idea is to allow elements in CF files to refer to others which are not in the same group. The proposal defines ways of locating variables in this situation and tries to capture current ways of doing so "in the wild".

I'll update this issue with further material on this (Pull Request reference, summary presentation discussed at the meeting last June, and the proposal as drafted up till that point) as soon as the PR has a URL.

We haven't updated the conformance rules and checker yet, as we want to agree on the content first.
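As a rough illustration of the kind of cross-group reference at stake, here is a hypothetical CDL sketch; the path syntax shown is only one possibility, not necessarily the form the proposal specifies:

netcdf example {
group: forecast {
  dimensions:
    time = 24 ;
  variables:
    double time(time) ;
      time:units = "hours since 2018-01-01" ;
  group: model {
    variables:
      float temperature(time) ;   // dimension inherited from the parent group
        temperature:coordinates = "/forecast/time" ;   // refers outside the local group
    }
  }
}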

Re-enable automatic HTML/PDF builds

The secure values in .travis.yml need updating to work in the new location.

The three values are:

  • GH_TOKEN: Defines a personal access token to authenticate against this repo. This needs to be an access token for a user with write access to this repo.
  • GIT_EMAIL & GIT_NAME: Defines the git identity used for the automated commits. Should perhaps be set to something like "auto@cf-conventions" and "auto-generated".

synda metric by job

"synda metric" reports overall performance information for each server. Often I want to see performance for each recent download job. That is, for each time I issue "synda install -s selection-file", I would like to be able to see separate performance figures. For a big job (some may take over a month to complete!), I would like to be able to see just the recent performance, as well as the overall performance figures.

Add attribute citation_id

THIS IS OUTDATED. I'm editing this proposal to reflect the discussions so far, but I'll save a copy of this original proposal.

Title: DOI attribute

Moderator: to be defined

Requirement Summary: Optional DOI attribute in section Description of file contents (2.6.2).

Technical Proposal Summary: Add a new optional attribute to designate the Digital Object Identifier (DOI) of the data contained in the CF data object.

Benefits: DOIs allow easy automation for tracking the scientific impact of the data in the same fashion that scientific publications are tracked with DOIs. Anyone involved in producing the data can be recognized, including funding agencies.

Status Quo: An increasing number of scientific journals are starting to require a DOI for the dataset used in a publication. Many groups already include a DOI as an attribute in their NetCDF-CF datasets, but without a standard this is hard to automate.

Detailed Proposal: The only modification required would be in section 2.6.2: Description of file contents. At the bottom, after the comment item, the following would be added:

doi: Digital object identifier (DOI) of the dataset. For simplicity, the proxy part
       of the DOI is dropped, so it is composed of the prefix plus the suffix only,
       e.g. "10.21238/S8SPRAY1618".

As mentioned in the 2.6.2 section, all attributes are optional, and the doi would follow the same rule.

This proposal was developed with the help of @kenkehoe.

Reasoning:

DOI is a de facto standard to track academic publications, thus providing the foundation for some measurement of scientific impact. There is a clear intention by the scientific community to also track the scientific impact of data and software, thus giving proper credit to those who make them available. The strategy adopted by AMS journals, and more recently by AGU, was to require citation of the dataset DOI used in any publication in the references list (https://www.ametsoc.org/ams/index.cfm/publications/authors/journal-and-bams-authors/formatting-and-manuscript-components/references/dataset-references/).
The use of DOIs for datasets will increase. A few groups already include the dataset DOI in their NetCDF-CF data files, but without a standard it is hard for a machine to keep track of it.

Justification:

  • It should be optional: nobody should be forced to adopt DOIs, and this guarantees backward compatibility.
  • Although other standards allow the use of a DOI, for example the id attribute recommended by ACDD, this conflicts with possible uses of DOIs. For instance, while id is stated to be unique, the same dataset DOI could be used in multiple files, each holding a chunk of the dataset assigned to the DOI.
  • The CF reserved attribute 'references' is typically tied to free-form text that can list a publication-style reference or a URL: "Published or web-based references that describe the data or methods used to produce it." (CF-1.7). A DOI functions differently, and the use of a dedicated attribute will enable automated tracking.
  • It is not being argued that DOI is the best option to link data, nor that it is a perfect solution, but it is the current standard for tracking the scientific impact of articles, and this will not change soon.
  • A variable-level attribute should be allowed to accommodate cases of distinct contexts, so that datasets could be split or aggregated.

Tiny background on DOI:

  • A DOI is not a URL, despite having a similar syntax and being commonly resolved by web services called proxies, like https://doi.org/. Once created, its records are replicated and preserved by libraries with a commitment to long-term archiving.
  • Associated with a DOI are several fields of information. We could compare a DOI to a unique id for a record in a database with many columns. Some of those columns could be funding agencies, creators, or even references to other DOIs, such as related publications or datasets.
  • Independent of how many fields are associated with a DOI, in the NetCDF-CF dataset file it always goes as a single text string, the DOI itself, so it has a tiny footprint in the file.
  • One of the metadata fields of a DOI is a URL, also called the landing page. Although this URL could point directly to the data file, the landing page is usually a human-readable page about the data, including a link on how to download the data file itself.
    A dataset DOI can have a record referring to the DOI of the NetCDF-CF documentation, thus giving a metric of the impact of the cf-conventions document. Adding metadata to the DOI, or additional DOIs, in the future does not change the DOI and allows for future updates without needing to update or reprocess the netCDF data file.
  • Although it is not common practice, the DOI metadata can be modified. This allows information about the data to be updated without the need to update the reference in the netCDF data file. This would eliminate the need for organizations to reprocess datasets when organizational or contact information changes.

Details

  • the attribute name would be "doi"
  • multiple DOIs are allowed, space-delimited, in a character array

Example

// global attributes:
	:Conventions = "CF-1.7, ACDD-1.3" ;
	:title = "California Underwater Glider Network" ;
	:featureType = "trajectoryProfile" ;
	:id = "CUGN_90" ;
	:standard_name_vocabulary = "CF Standard Name Table v62" ;
	:doi = "10.21238/S8SPRAY1618" ;

GitHub Migration Plan

Dear All,

@rsignell-usgs has urged me to comment on the thread related to GitHub and its use in place of trac. I don't have a trac account and couldn't figure out how to sign up, so I've decided to respond in github to demonstrate what it's about.

As someone who uses GitHub extensively for project planning/management, as a source code repository, and as a registry for development of an in-process OGC standard, I don't think it's worth debating the merits of github's community facilitation model. Rather, the discussion should be how this community wants to migrate its existing activities to GitHub and how the community wants to leverage the github infrastructure.

A few points to note about github's functionality that may be of use to the community.

  1. The CF email list should probably live on near term, at some level, and repeating GitHub notifications through the list is fine. That said, this is the last email list I'm on and I REALLY wish it would move to a searchable indexed list of issues, as I'd like to get the conversations out in the open and not buried in email formatting and archived inboxes. Subscribing is really easy! Joining github is too!
  2. Github issues work just like email if you want to use them that way. Once you've watched a repository, you can respond directly to an issue email and your comment shows up in the issue's discussion.
  3. Using GitHub is easy if you don't care to use all the software repository features, e.g. branches. There's super simple wiki functionality, and forking a project and editing documents in the browser are super simple; you don't need to know all the complexities behind it.
  4. A lot more cool stuff can be done... and things can get kind of out of hand... peruse the back issues here or just check out this cherry bomb of a 60-comment thread!

On and on... Like I said above though, the discussion should be how the community wants to use this system: what the tagging scheme will be, things like repository ownership raised by @marqh in #63, how to deal with stale old pull requests like #35, etc. etc.

Finally, regarding sequencing, I hope we could get 1.7 done and dusted prior to suggesting a full stop change to the infrastructure underlying CF governance. It would make a lot of sense to move 1.7+ into the new space though.

Regards,

Dave

p.s. It's always good practice to finish a new issue with closure criteria so its original intent is clear. This issue can be closed once planning of a process to decide how the community wants to use github has started.

Review repository "labels"

The labels list is good but could use some modifications?


Suggestion:
Remove: asciidoctor mod?, bug, invalid, simple.
Add: typo, style

This issue is related to #130

Place CF, UGRID and SGRID docs on ftp://ftp.unidata.ucar.edu/pub/netcdf/Conventions

In the CF conventions document section 2.6.1 it says:

The conventions directory name is currently interpreted relative to the directory pub/netcdf/Conventions/ on the host machine ftp.unidata.ucar.edu

This does not seem to be the case, however. Looking at ftp://ftp.unidata.ucar.edu/pub/netcdf/Conventions, we do not see CF listed there.

We should put CF, UGRID and SGRID there as additional conventions.

This clearly needs to be done by a Unidata person.

@ethanrd, can you make this happen?

Merge rights

I think there is a need to agree which users have merge rights to this repository and implement the permissions.

I provided review and merge services on this repository's predecessor, but I am unable to provide this service now, as my permissions do not allow it.

Please may we create a team to manage merging of content onto this repository and agree its membership?

thank you
mark

Require PROJ.4 compatible horizontal and vertical projection declarations.

For any software to accurately interoperate with a geospatial dataset it must be given or make an assumption about the datum and projection used for the geospatial content. It is unacceptable to omit this information regardless of the scale or intended use of the data. Specification of the reference datum (horizontal and vertical) and projection (as applicable to the dimensionality of the data) should be a requirement akin to inclusion of units for coordinate variables. If the requirement for a dataset to include such metadata is considered too onerous for data producers who are unfamiliar with the datum their data uses, the CF community should adopt a default lat/lon/elevation datum and encourage software producers to standardize on that datum to foster consistency across the community. What default to use should be determined in consultation with the National Geodetic Survey and their counterparts internationally.

Proj.4 has been the de facto implementation of coordinate transformations, more or less, since the beginning of digital geospatial data. The ability to integrate CF-described geospatially referenced data with tools that implement the Proj.4 projection libraries is important.

Conversion of geospatial data into CF-described files requires CF support for the prevailing set of projections and reference datums.

Use of identifiers from the EPSG naming authority and conventions consistent with OGC-WKT should be supported. The issue that forces this assertion is the need for 'shift grids' to convert to/from non-parametric datums. This is of particular importance for vertical datums but is also important for the common NADCON conversion to/from the NAD27 datum.

In practice, codes defined by the EPSG naming authority, encoded either alone or as part of a WKT datum/projection declaration, are necessary for integration of data with web services and for conversion to and from other formats. Geospatial applications that desire to interoperate with CF should not be forced to construct utilities like this one. This leads to the conclusion that proj.4 strings, EPSG codes, or WKT projections should be allowed for specification of projections.
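As a sketch of how such a declaration might sit alongside CF's parametric description, here is a grid mapping variable carrying a crs_wkt attribute; the projection values shown are the standard EPSG:27700 (British National Grid) parameters, included purely for illustration:

dimensions:
  y = 100 ;
  x = 100 ;
variables:
  int crs ;
    crs:grid_mapping_name = "transverse_mercator" ;
    // CF parametric description of the projection
    crs:longitude_of_central_meridian = -2. ;
    crs:latitude_of_projection_origin = 49. ;
    crs:scale_factor_at_central_meridian = 0.9996012717 ;
    crs:false_easting = 400000. ;
    crs:false_northing = -100000. ;
    // equivalent well-known-text declaration carrying the EPSG identity
    crs:crs_wkt = "PROJCS[\"OSGB 1936 / British National Grid\", ..., AUTHORITY[\"EPSG\",\"27700\"]]" ;
  float data(y, x) ;
    data:grid_mapping = "crs" ;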

Add support for attributes of type string

Attributes with a type of string are now possible with netCDF-4, and many examples of attributes with this type are "in the wild". As an example of how this is happening, IDL creates an attribute with this type if you select its version of string type instead of char type. It seems that people often assume that string is the correct type to use because they wish to store strings, not characters.

I propose to add verbiage to the Conventions to allow attributes that have a type of string. There are two ramifications to allowing attributes of this type, the second of which impacts string variables as well.

  1. A string attribute can contain 1D atomic string arrays. We need to decide whether or not we want to allow these or limit them (at least for now) to atomic string scalars. Attributes with arrays of strings could allow for cleaner delimiting of multiple parts than spaces or commas do now (e.g. flag_values and flag_meanings could both be arrays), but this would be a significant stretch for current software packages.
  2. A string attribute (and a string variable) can contain UTF-8 Unicode strings. UTF-8 uses variable-length characters, with the standard ASCII characters as the 1-byte subset. According to the Unicode standard, a UTF-8 string can be signaled by the presence of a special non-printing three byte sequence known as a Byte Order Mark (BOM) at the front of the string, although this is not required. IDL (again, for example) writes this BOM sequence at the beginning of every attribute or variable element of type string.

Allowing attributes containing arrays of strings may open up useful future directions, but it will be more of a break from the past than attributes that have only single strings. Allowing attributes (and variables) to contain UTF-8 will free people to store non-English content, but it might pose headaches for software written in older languages such as C and FORTRAN.

To finalize the change to support string type attributes, we need to decide:

  1. Do we explicitly forbid string array attributes?
  2. Do we place any restrictions on the content of string attributes and (by extension) variables?

Now that I have the background out of the way, here's my proposal.

Allow string attributes. Specify that the attributes defined by the current CF Conventions must be scalar (contain only one string).

Allow UTF-8 in attribute and variable values. Specify that the current CF Conventions use only ASCII characters (which are a subset of UTF-8) for all terms defined within. That is, the controlled vocabulary of CF (standard names and extensions, cell_methods terms other than free-text elements of comments(?), area type names, time units, etc) is composed entirely of ASCII characters. Free-text elements (comments, long names, flag_meanings, etc) may use any UTF-8 character.
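A minimal CDL sketch of what the proposal would permit (the variable and attribute contents here are invented):

dimensions:
  time = 10 ;
variables:
  float temperature(time) ;
    // netCDF-4 atomic string attribute; scalar, as the proposal requires
    // for attributes defined by the current Conventions
    string temperature:comment = "Sensor recalibrated on 2017-06-01." ;
    // free-text elements may use any UTF-8 character
    string temperature:long_name = "température de l'air à 2 m" ;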

Trac ticket: #176

Add calendars gregorian_tai and gregorian_utc

Introduction

The current CF time system does not address the presence or absence of leap seconds in data with a standard name of time. This is not an issue for model runs or data with time resolutions on the order of hours, days, etc, but it can be an issue for modern satellite swath data and other systems with time resolutions of tens of seconds or finer.

I have written a background section for this proposal, but I have put it at the end so that people don't have to scroll through it in order to get to the proposal itself. If something about the proposal seems unclear, I hope the background will help resolve your question.

Proposal

After past discussions with @JonathanGregory, and again with him and @marqh at the 2018 CF Workshop, I propose the new calendars listed below and a change to existing calendar definitions.

  • gregorian_tai - When this calendar is called out, the epoch date and time stated in the units attribute are required to be Coordinated Universal Time (UTC) and the time values in the variable are required to be fully metric, representing the advance in International Atomic Time (TAI) since that epoch. Conversion of a time value in the variable to a UTC date and time must account for any leap seconds between the epoch date and the time being converted.
  • gregorian_utc - When this calendar is called out, the epoch date and time stated in the units attribute are required to be in UTC and the time values in the variable are assumed to be conversions from UTC dates and times that did not account for leap seconds. As a consequence, the time values may not be fully metric. Conversion of a time value in the variable to a UTC date and time must not use leap seconds.
  • gregorian - When this calendar is called out, the epoch date stated in the units attribute is required to be in mixed Gregorian/Julian form. The epoch date and time have an unknown relationship to UTC. The time values in the variable may not be fully metric, and conversion of a time value in the variable to a date and time produces results of unknown precision.
  • the others - The other calendars all have an unknown relationship to UTC, similar to the gregorian calendar above.

The large majority of existing files (past and future) are based on artificial model time or don't need to record time precisely enough to require either of the new calendars (gregorian_tai or gregorian_utc). The modified definition of the gregorian calendar won't pose any problem for them. For users that know exactly how they obtained their times and how they processed them to get time values in a variable, the two new calendars allow them to tell users how to handle (and not handle) those time values.

Once we come to an agreement on the proposal, we can work out wording for Section 4.4 to reflect these new/changed calendar definitions.
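For illustration, a time coordinate under the proposed gregorian_tai calendar might look like this (a sketch only, assuming the proposed calendar name is adopted):

dimensions:
  time = 1000 ;
variables:
  double time(time) ;
    time:standard_name = "time" ;
    time:units = "seconds since 2010-01-01 00:00:00" ;  // epoch stated in UTC
    time:calendar = "gregorian_tai" ;  // values advance in TAI (SI) seconds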

Background

There are three parts to the way people deal with time. The first part is the counting of the passing of time, the second part is the representation of time for human consumption, and the third is the relationship between the representation of time and the orbital and rotational cycles of the earth. This won't be a deep discussion, but I want to define a few terms here in the hopes that it will help make things clearer. For gory details, please feel free to consult Google and visit places such as the NIST and US Naval Observatory websites. I'm glossing over some things here, and many of my definitions are not precise. My goal is to provide a common framework for thinking about the proposal, as opposed to writing a textbook on the topic.

The first part is the simplest. This is time as a scalar quantity that grows at a fixed rate. This, precisely measured, is what people refer to as 'atomic time' - a count of cycles of an oscillator tuned to resonate with an electron level transition in a sample of super-cooled atoms. The international standard atomic time is known as International Atomic Time (TAI). So time in this sense is a counter that advances by one every SI second. (For simplicity, I am going to speak in terms of counts of seconds throughout this proposal.) No matter how you may represent time, whether with or without leap days or seconds, this time marches on at a fixed pace. This time is metric. You can do math operations on pairs or other groups of these times and get consistently correct results. In the rest of this proposal I'm going to refer to this kind of time as 'metric time'.

The second part, the representation of time, is all about how we break time up into minutes, hours, days, months, and years. Astronomy, culture, and history have all affected the way we represent time. When we display a time as YYYY-MM-DD HH:MM:SS, we are representing a point in time with a label. In the rest of this proposal I'm going to refer to this labeling of a point in time as a time stamp.

The third part, the synchronization of time stamps with the cycles of the planet, is where calendars come into play, and this is where things get ugly. Reaching way back in time, there were three basic units for time - the solar year, the lunar month, and the solar day. Unfortunately, these three units of time are not compatible with each other or with counts of seconds. A solar day is not (despite our definitions) an integer number of seconds in length, a lunar month is not an integer number of solar days (and we pretty much abandoned them in Western culture), and a solar year is not an integer number of solar days or lunar months in length. If you attempt to count time by incrementing a time stamp like an odometer - having a given element increment once each time the element below it has 'rolled over', you find that the time stamps pretty quickly get out of synchronization with the sun and the seasons.

The first attempts to address this asynchrony were leap days. The Julian calendar specified that every four years February would wait an extra day to roll over to March. The Gregorian calendar addressed a remaining asynchrony by specifying that this only happens on the last year of a century (when it normally would) every fourth century. That was close enough for the technology of those days. Clocks weren't accurate enough at counting seconds to worry about anything else. But the addition of leap days (as well as months with random lengths) means that time stamps aren't metric. You can't do straightforward math with them.

In more recent times technology and science have advanced to the point that we can count seconds quite accurately, and we found that keeping the time stamp hours, minutes, and seconds sufficiently aligned with the rising of the sun each day requires the addition (or subtraction) of leap seconds. On an irregular, potentially bi-yearly, basis, the last minute of a day is allowed to run to 60 before rolling over instead of 59 (or rolls over after 58, though it's lately been only additions). Coordinated Universal Time (UTC) is the standard for time stamps that include both leap days and leap seconds.

UTC time stamps represent the time in a human-readable form that is precise and synchronized with the cycles of the earth. But they aren't metric. It's not hard to deal with the leap days part because they follow a fixed pattern. But the leap seconds don't. If you try to calculate the interval between 2018-01-01 00:00:00 and 1972-01-01 00:00:00 without consulting a table of leap seconds and when they were applied, you will have a difference of 27 seconds between the time you get from your calculation and the time that has actually elapsed between those two time stamps. This isn't enough of a discrepancy to worry about for readings from rain gauges or measurements of daily average temperature, but an error of even one second can make a big difference for data from a polar-orbiting satellite moving at a rate of 7 km/second.

The clocks in our computers can add further complexity to measuring time. The vast majority of computers don't handle leap seconds. We typically attempt to address this by using time servers to keep our computer clocks synchronized, but this is done by altering the metric time count in the computer rather than modifying the time stamps by updating a table of leap seconds.

Furthermore, most computer software doesn't have 'leap second aware' libraries. When you take a perfectly exact UTC time stamp (perhaps taken from a GPS unit) and convert it to a count of seconds since an epoch using a time calculation function in your software, you are highly likely to have introduced an error of however many leap seconds that have been added between your epoch and the time represented by the time stamp.

As a result of all this, many of the times written in netCDF files are not metric times, and there is no good way to know how to produce accurate time stamps from them. They may be perfectly metric within a given file or dataset, they may include skips or repeats, or they may harbor non-linearities where there are one or more leap seconds between two time values.

We have another minor issue for times prior to 1972-01-01. There's not much way to relate times prior to that epoch to times since - not to the tens of seconds or better level. I'd be surprised if this would ever be a significant problem in our domain.

To summarize, we have TAI, which is precise metric time. We have UTC, which is a precise, non-metric sequence of time stamps that are tied to TAI, and we have a whole host of ways that counts time since epoch stored in netCDF files can be inaccurate to a level as high as 37 seconds (the current leap seconds offset between TAI and UTC).

Most uses of time in netCDF aren't concerned with this level of accuracy, but for those that are, it can be critical.

Typos in Appendix H

Charlie Zender noticed some anomalies in Appendix H, section H.6.3. He saw them in version 1.7, but they have been around for some time. I will fix them. Here is his email on the cf-metadata list:

These appear to be typos near the end of the current 1.7 draft:

  1. 'projectory' should be 'trajectory'
  2. omit 'section = 3;'
  3. 'section:standard_namecf_role = "trajectory_id" ;' should be 'cf_role = "trajectory_id" ;'
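Applying fix 3, the corrected attribute line would read as below; the enclosing variable declaration in H.6.3 is omitted here for brevity:

  // corrected form of the mistyped attribute
  section:cf_role = "trajectory_id" ;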

Specify Geometry Dimension on Geometry Container

Title: Specify Geometry Dimension on Geometry Container

Moderator: @davidhassell

Requirement Summary:
This is in regard to geometries in Chapter 7. For single part point geometries, the node_count variable is omitted since it is not needed. However, node_count is dimensioned using the instance dimension of the geometries, so without node_count, we don't know the geometry instance dimension.

Example of ambiguous geometry instance dimension (could be foo or instance):

dimensions:
    foo = 3 ;
    instance = 3 ;
    node = 3 ;
variables:
    int foo(foo);
        foo:long_name = "bar" ;
    int geometry_container ;
        geometry_container:geometry_type = "point" ;
        geometry_container:node_coordinates = "x y z" ;
    double x(node) ;
        x:units = "degrees_east" ;
        x:standard_name = "longitude" ;
        x:axis = "X" ;
    double y(node) ;
        y:units = "degrees_north" ;
        y:standard_name = "latitude" ;
        y:axis = "Y" ;
    double z(node) ;
        z:units = "m" ;
        z:standard_name = "altitude" ;
        z:axis = "Z" ;
    double someData(instance, foo) ;
        someData:geometry = "geometry_container" ;

Technical Proposal Summary:
Add a geometry_dimension attribute to geometry container, which indicates the geometry dimension, as in:

dimensions:
    foo = 3 ;
    instance = 3 ;
    node = 3 ;
variables:
    int foo(foo);
        foo:long_name = "bar" ;
    int geometry_container ;
        geometry_container:geometry_type = "point" ;
        geometry_container:node_coordinates = "x y z" ;
        geometry_container:geometry_dimension = "instance" ;
    double x(node) ;
        x:units = "degrees_east" ;
        x:standard_name = "longitude" ;
        x:axis = "X" ;
    double y(node) ;
        y:units = "degrees_north" ;
        y:standard_name = "latitude" ;
        y:axis = "Y" ;
    double z(node) ;
        z:units = "m" ;
        z:standard_name = "altitude" ;
        z:axis = "Z" ;
    double someData(instance, foo) ;
        someData:geometry = "geometry_container" ;

Benefits: Who or what will benefit from this proposal?
Software developers can now properly parse the file; they know to which dimension the x, y, and z nodes apply.

Status Quo: Discussion of the current state CF and other standards.
CF already has an instance_dimension, but the existence of that attribute normally identifies the variable as a DSG index variable. Therefore, we use geometry_dimension.

Detailed Proposal: Complete proposal
Require a geometry_dimension attribute on geometry container, which identifies the instance dimension of the geometries. This will require updating Ch7 text and examples.
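
To illustrate the benefit for software, a reader might resolve the instance dimension along these lines (a sketch against the netCDF4-python API; geometry_instance_dimension is a hypothetical helper):

import netCDF4

def geometry_instance_dimension(ds, container_name):
    # Resolve the instance dimension of a geometry container.
    gc = ds.variables[container_name]
    if hasattr(gc, "geometry_dimension"):
        return gc.geometry_dimension   # proposed: named directly
    if hasattr(gc, "node_count"):
        nc = ds.variables[gc.node_count]
        return nc.dimensions[0]        # current: inferred from node_count
    return None                        # single-part points: ambiguous

With the first CDL example above, geometry_instance_dimension(ds, "geometry_container") returns None; with the second it returns "instance".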

Vertical coordinates when only the bounds of the cells are of interest

In the current CF the following is needed to specify vertical profiles with boundary information:

dimensions:
    lon = 360;
    lat = 180;
    layer = 18;
    vertices = 2;

variables:
    float lat(lat);
        lat:long_name = "latitude";
        lat:units = "degrees_north";
        lat:bounds = "lat_bnds";
    float lon(lon);
        lon:long_name = "longitude";
        lon:units = "degrees_east";
        lon:bounds = "lon_bnds";
    float pressure(layer, lon, lat);
        pressure:long_name = "pressure grid";
        pressure:units = "hPa";
        pressure:bounds = "pressure_bnds";
    float lat_bnds(lat,vertices);
    float lon_bnds(lon,vertices);
    float pressure_bnds(layer,lon,lat,vertices);
    float O3(layer, lon, lat);
        O3:units = "1e-9";
        O3:coordinates = "pressure";

In this example one of the vertical interfaces is the tropopause pressure, which doesn't fit nicely into a compression scheme or parametrization. However, the only useful vertical grid is the set of interfaces; the layer pressures themselves are not needed. Furthermore, the pressure grid is complete, i.e. the boundaries fill the vertical completely from the surface to the top pressure without gaps. An extra dimension would suffice to describe the vertical grid in a much more compact way:

dimensions:
    lon = 360;
    lat = 180;
    layer = 18;
    interfaces = 19;
    vertices = 2;

variables:
    float lat(lat);
        lat:long_name = "latitude";
        lat:units = "degrees_north";
        lat:bounds = "lat_bnds";
    float lon(lon);
        lon:long_name = "longitude";
        lon:units = "degrees_east";
        lon:bounds = "lon_bnds";
    float pressure(interfaces, lon, lat);
        pressure:long_name = "pressure interface grid";
        pressure:units = "hPa";
    float lat_bnds(lat,vertices);
    float lon_bnds(lon,vertices);
    float O3(layer, lon, lat);
        O3:units = "1e-9";
        O3:coordinates = "pressure";

In the current CF conventions the pressure information is almost three times as large as the data itself, which seems inefficient. The n+1 method of storing the interfaces is rather common; for instance, it is used by ECMWF in many of its datasets.

I propose to allow an 'off by one' method of storing vertical information, as a pressure grid containing only the interfaces rather than the layer pressures.
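
For a gap-free grid the two representations are interconvertible; a minimal sketch with NumPy, using a hypothetical 1-D column of interface pressures:

import numpy as np

pressure_interfaces = np.linspace(1000.0, 10.0, 19)  # hypothetical interfaces, hPa

# Reconstruct the CF-style bounds array of shape (layer, vertices) = (18, 2):
pressure_bnds = np.stack([pressure_interfaces[:-1],
                          pressure_interfaces[1:]], axis=-1)

assert pressure_bnds.shape == (18, 2)  # contiguous layers, no gaps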

Moderation of proposals?

In the discussion of #148, the issue of a moderator was brought up:

#148 (comment)

which refers to the Rules for CF Conventions Changes:

http://cfconventions.org/rules.html

The question at hand was whether the "moderator" could be the proposer / a proponent of the proposal, or whether that is a conflict of interest.

I suggest that we should modify the rules to make this clearer, and maybe make a change/addition:

In that doc, the moderator's role is described as follows:

"""
The moderator periodically summarises discussion on github, keeps it moving forward and tries to achieve a consensus. It is expected that everyone with an interest will contribute to the discussion and to achieving a consensus during this stage. During the discussion, if an objection is raised, answered and not reasserted, the moderator will assume the objection has been dropped. However, since consensus is the best outcome, it will be helpful if anyone who expresses an objection explicitly withdraws it on changing their mind or deciding to accept the majority view.

The moderator is encouraged to organize conference calls and/or webex-type interactions if this might help resolve an issue more quickly.
"""

This is a LOT of work. I expect that we will be most likely to get someone to do it well if they have some "skin in the game" -- granted, lots of folks have skin in the game in the sense that they want CF to be as good as it can be, but I think someone who really wants the new feature is going to be more invested. And anyone who knows about and cares about CF will probably form an opinion anyway, so a truly "unbiased" moderator is kind of impossible.

So I suggest that the role of the moderator be divided (though it could still be one person, I suppose)

Role 1 (Proponent?):

The moderator periodically summarises discussion on github, keeps it moving forward and tries to achieve a consensus.

The end result is a document that summarises both the proposal and the discussion: rejected alternatives, objections, etc.

Role 2 (Moderator):

"attempt to move toward a decision on the proposal by summarising the discussion and indicating the outcome as consensus, near consensus, or not near consensus"

I guess what I'm suggesting is that the development of a final proposal and the guiding of a decision on that proposal be handled a bit differently.

I also think that the document that summarises both the proposal and the discussion (e.g. rejected alternatives, objections, etc.) should be maintained in a single place -- it could be the initial issue, or better yet, somewhere else in the repo to be preserved for posterity. See #130 for more on that idea.

Proof read

Compare the original v1.6 with the AsciiDoc version and raise issues and/or submit PRs as required.

NB. A section should be ticked off once any necessary issues and/or PRs have been created. It does not need to wait for those issues/PRs to be resolved.

Reference UGRID conventions in CF

As discussed in Trac ticket 171 we would like to associate a specific version of UGRID with each version of CF.

We propose to simply add a section 1.5 to the Conventions Document called "Relationship to the UGRID Conventions" which would say:

UGRID is a convention for unstructured (e.g. triangular) grids that supplements the CF Conventions, including specification of grid topology and location of data on grid elements. Each version of CF is associated with a particular version of UGRID through the Conventions attribute in 2.6.1.

Then in Section 2.6.1, modify the beginning to read:

We recommend that netCDF files that follow these conventions indicate this by setting the NUG-defined global attribute Conventions to the string value "CF-1.8", which also implies "UGRID-1.0".
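
A minimal sketch of the proposed usage with netCDF4-python (the file name is hypothetical):

import netCDF4

ds = netCDF4.Dataset("example.nc", "w")
ds.Conventions = "CF-1.8"  # under this proposal, CF-1.8 implies UGRID-1.0
ds.close()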

Nested sections in bibliography

When I try to build the HTML as described in README.md, I get the following error:

asciidoctor: ERROR: bibliography.adoc: line 4: bibliography sections do not support nested sections

I see that there is indeed a nested section in bibliography.adoc, but this is nothing new. Am I doing something wrong?

first changes for version 1.7

These Trac tickets have been implemented for the DocBook version of the CF Conventions version 1.7. They need to be re-implemented in the AsciiDoc version.
ticket 61
ticket 62
ticket 64
ticket 65
ticket 69
For these tickets, there is no need to make corresponding changes in the conformance document.

Workflow?

In #130, we discussed and developed a CONTRIBUTING.md doc.

But that doc was aimed at contributors -- and a number of issues came up in the discussion about the workflow that weren't decided, and that also weren't about things we should be putting in a doc designed for outside contributors. But I don't think we ever did nail down those issues, or at least I don't see them documented anywhere.

So I also propose we start a new discussion and document for the workflow: how we are going to use branches, etc.

I also propose that we create the concept of a "CF enhancement proposal" (CEP), where we document the pros and cons and the final decision about a significant CF change. The Workflow doc could be the first of these.

This idea was inspired by the long discussion in #148, and by other projects use of Enhancement proposals, at least in the Python community:

https://www.python.org/dev/peps/

https://www.numpy.org/neps/index.html

https://matplotlib.org/devel/MEP/index.html

The idea is that when there is a significant (and perhaps contentious) addition or change to CF, our primary goal is an update to the convention doc. The previous discussion in #130 captured a fair bit about that process. But, in fact, we also need:

  • a better way to manage the discussion -- one central place where the current proposal, the pros and cons, etc. are written out.

  • a way to capture that discussion, so that when folks re-visit it in the future, they will see not just the convention, but why it is the way it is.

So I propose that we create a new section in the docs in this repo for enhancement proposals -- we can start with an index and a draft of a GitHub workflow doc (and maybe one for #148, too).

Suggested corrections in "OGC WKT Coordinate System Issues" wiki page

The OGC WKT Coordinate System Issues page is slightly outdated (it says nothing about ISO 19162, a.k.a. "WKT 2") and sometimes inexact (in the line "While this text clearly has an error in it", the original author seems to have missed that GEOGCS and GEOCCS are not the same WKT element). I can volunteer to edit this page with the following content:

  • Add ISO 19162 in the discussion.
  • Fix the confusion in the paragraph about PRIMEM units.
  • The paragraph about TOWGS84 becomes irrelevant in WKT 2. While we should keep this discussion for historical purposes, I suggest adding a paragraph explaining why TOWGS84 is deprecated and no longer supported in WKT 2.

Is there any objection if I edit the wiki page accordingly?

Alternate grid mappings for geometry containers

Title: Alternate grid mappings for geometry containers
Moderator: @davidhassell
Requirement Summary:
Allow geometries to have a grid_mapping different from that of the data variable. For example, data variable coordinates could be in geographic coordinates, while coordinates for watershed polygons could be in a projected coordinate system.

Technical Proposal Summary:
Allow the grid_mapping attribute value on the geometry container to differ from the grid_mapping attribute on the data variable, provided the auxiliary coordinates do not have the nodes as bounds.

Benefits: This benefits those who have defined their geometries in a coordinate system different than the coordinates used for the data variable.

Status Quo:
According to Chapter 7, a data variable's coordinate variables can have a nodes attribute which identifies the equivalent variable from a geometry. Thus, the nodes act as bounds for the coordinates, and bounds are considered metadata of the coordinates. Therefore, following this logic, bounds can't have a different coordinate system.

Detailed Proposal:
Allow the different grid mapping on the geometry container variable, provided the auxiliary coordinates do not have the nodes as bounds, e.g.

dimensions:
  instance = 2 ;
  node = 5 ;
  time = 4 ;
variables:
  int time(time) ;
    time:units = "days since 2000-01-01" ;
  double lat(instance) ;                     
    lat:units = "degrees_north" ;
    lat:standard_name = "latitude" ;
  double lon(instance) ;                  
    lon:units = "degrees_east" ;
    lon:standard_name = "longitude" ;
  int datum ;
    datum:grid_mapping_name = "latitude_longitude" ;
    datum:longitude_of_prime_meridian = 0.0 ;
    datum:semi_major_axis = 6378137.0 ;
    datum:inverse_flattening = 298.257223563 ;
  int Lambert_Conformal;
    Lambert_Conformal:grid_mapping_name = "lambert_conformal_conic";
    Lambert_Conformal:standard_parallel = 25.0;
    Lambert_Conformal:longitude_of_central_meridian = 265.0;
    Lambert_Conformal:latitude_of_projection_origin = 25.0;
  int geometry_container ;
    geometry_container:geometry_type = "line" ;
    geometry_container:node_count = "node_count" ;
    geometry_container:node_coordinates = "x y" ;
    geometry_container:grid_mapping = "Lambert_Conformal" ;
  int node_count(instance) ;
  double x(node) ;
    x:units = "m" ;
    x:standard_name = "projection_x_coordinate" ;
    x:axis = "X" ;
  double y(node) ;
    y:units = "m" ;
    y:standard_name = "projection_y_coordinate" ;
    y:axis = "Y" ;
  double someData(instance, time) ;
    someData:coordinates = "time lat lon" ;
    someData:grid_mapping = "datum" ;
    someData:geometry = "geometry_container" ;
// global attributes:
  :Conventions = "CF-1.8" ;
  :featureType = "timeSeries" ;

"lat" and "lon" could have bounds, so long as they not "y" and "x".
