Incorporating the CFA convention for aggregated datasets into CF about cf-conventions HOT 4 OPEN

davidhassell commented on August 16, 2024 1

Incorporating the CFA convention for aggregated datasets into CF

from cf-conventions.

Comments (4)

larsbarring commented on August 16, 2024 1

David,
This will be an excellent and very useful addition to the CF Conventions! I have not yet wrapped my head around the technical details. There is one thing I do not quite understand, first you write:

The aggregation variables contain no data but instead record instructions on both how to find the data in their original files, and how to combine the data into an aggregated data array

And below the Figure 1 you write:

Note that this proposal does not cover how to decide whether or not the data arrays of two existing variables could or should be aggregated into a single larger array.

Probably I am missing something here, but to me this seems contradictory? Anyway, that is a detail, and I think the more important questions are the one you raise in the Technical Proposal Summary:

... incorporate CFA into CF ... that this is a good idea, ...

To me this is no doubt a good idea, which already has a strong community backing.

... how the new content should be structured (e.g. a new section, a new appendix, both, or something else).

Perhaps an outline somewhere in the main text: end of Chapter 2 regarding aggregation files and their relation to the fragment files, somewhere in Chapter 3 regarding aggregation variables? And then an exhaustive description in an Appendix?

This, brings me a more general thought that I have been thinking about for some time:
I think that the CF Conventions document is getting increasingly long and complex/difficult to get an overview of. The Table of Content takes 8 full screens (5 pdf pages), then 5 screens of Tables of tables/figures/examples (3 pdf pages). I have no idea how to improve upon this, but it becomes more and more of a concern as we add new features to the Conventions. However, this is not something to discuss and solve here in this enhancement proposal, but I wanted to bring it up here anywaay.

from cf-conventions.

davidhassell commented on August 16, 2024

Thank you for you comments, Lars, and sorry that it has taken me some time to respond.

Even though you are the only person to have commented here (and in support), this proposal has been scrutinised carefully at two CF workshops, with a group decision being reached in 2023 to work towards incorporating CFA into CF. I'm therefore minded to move to writing the PR, now that Lars has made a good suggestion of how and where the content could go into the existing CF conventions. This shouldn't take too long, because it will largely be a "cut and paste" job from the existing CFA description, which was deliberately written in a CF-ish style in anticipation of this :).

The aggregation variables contain no data but instead record instructions on both how to find the data in their original files, and how to combine the data into an aggregated data array
...
Note that this proposal does not cover how to decide whether or not the data arrays of two existing variables could or should be aggregated into a single larger array.

Good point. The first statement applies to the reading of the data, and the second to the writing of the data. The CFA conventions do not give any guidance on the decision of how fragment files can be combined prior to creating an aggregation variable, rather once you have an aggregation in mind, they provide a framework in which you can encode it in such a way that other people can decode it.

If I give you two datasets (A and B) then the CFA conventions won't give you any help in working out if A and B can be sensibly combined into a single larger dataset (C). There are various ways in which you could work this out yourself - you could inspect the metadata and apply an aggregation algorithm (e.g. this one, or by visual inspection), or base it on files names (e.g. I know that model outputs from March.nc and April.nc are safe to combine into a 2-month dataset), etc.

Perhaps an outline somewhere in the main text: end of Chapter 2 regarding aggregation files and their relation to the fragment files, somewhere in Chapter 3 regarding aggregation variables? And then an exhaustive description in an Appendix?

I like the idea of a Chapter 2 outline. I might suggest content from Introduction, Terminology, Aggregation variables, and Aggregation instructions (without its subsections) for Chapter 2, and everything else - which is most of the existing CFA document - (Standardized aggregation instructions, Non-standardized terms, Fragment Storage and examples) for the appendix.

The Table of Content takes 8 full screens (5 pdf pages), then 5 screens of Tables of tables/figures/examples (3 pdf pages).

Just a thought - the TOC currently shows all subⁿsections - maybe it could be restricted to just one level of subsection, so for instance Chapter 7 would go from

[7. Data Representative of Cells](https://cfconventions.org/cf-conventions/cf-conventions.html#_data_representative_of_cells)
    [7.1. Cell Boundaries](https://cfconventions.org/cf-conventions/cf-conventions.html#cell-boundaries)
    [7.2. Cell Measures](https://cfconventions.org/cf-conventions/cf-conventions.html#cell-measures)
    [7.3. Cell Methods](https://cfconventions.org/cf-conventions/cf-conventions.html#cell-methods)
        [7.3.1. Statistics for more than one axis](https://cfconventions.org/cf-conventions/cf-conventions.html#statistics-more-than-one-axis)
        [7.3.2. Recording the spacing of the original data and other information](https://cfconventions.org/cf-conventions/cf-conventions.html#recording-spacing-original-data)
        [7.3.3. Statistics applying to portions of cells](https://cfconventions.org/cf-conventions/cf-conventions.html#statistics-applying-portions)
        [7.3.4. Cell methods when there are no coordinates](https://cfconventions.org/cf-conventions/cf-conventions.html#cell-methods-no-coordinates)
    [7.4. Climatological Statistics](https://cfconventions.org/cf-conventions/cf-conventions.html#climatological-statistics)
    [7.5. Geometries](https://cfconventions.org/cf-conventions/cf-conventions.html#geometries)

[7. Data Representative of Cells](https://cfconventions.org/cf-conventions/cf-conventions.html#_data_representative_of_cells)
    [7.1. Cell Boundaries](https://cfconventions.org/cf-conventions/cf-conventions.html#cell-boundaries)
    [7.2. Cell Measures](https://cfconventions.org/cf-conventions/cf-conventions.html#cell-measures)
    [7.3. Cell Methods](https://cfconventions.org/cf-conventions/cf-conventions.html#cell-methods)
    [7.4. Climatological Statistics](https://cfconventions.org/cf-conventions/cf-conventions.html#climatological-statistics)
    [7.5. Geometries](https://cfconventions.org/cf-conventions/cf-conventions.html#geometries)

That alone would remove 71 lines from the TOC! But as you say, any more on that should be discussed elsewhere, which I would welcome.

from cf-conventions.

taylor13 commented on August 16, 2024

I think this is generally a good idea and have been meaning to go over the details.

A quick thought about the table of contents: Would it be easy in the web view to collapse the subsection hierarchy to 1 or 2 levels, then click on an upper level to display its subsections? That might give a newbie a more accessible overview. On the other hand, I usually just execute "find" for some key word I know is relevant to what I want to look up, and if that word becomes hidden (in a hidden low level subsection), then I may have a harder time navigating quickly to the relevant section. So I can see arguments for the current expanded table of contents.

from cf-conventions.

Incorporating the CFA convention for aggregated datasets into CF about cf-conventions HOT 4 OPEN

Comments (4)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent