
Comments (4)

larsbarring commented on August 16, 2024

David,
This will be an excellent and very useful addition to the CF Conventions! I have not yet wrapped my head around the technical details. There is one thing I do not quite understand. First you write:

The aggregation variables contain no data but instead record instructions on both how to find the data in their original files, and how to combine the data into an aggregated data array

And below Figure 1 you write:

Note that this proposal does not cover how to decide whether or not the data arrays of two existing variables could or should be aggregated into a single larger array.

Probably I am missing something here, but to me this seems contradictory? Anyway, that is a detail, and I think the more important questions are the ones you raise in the Technical Proposal Summary:

... incorporate CFA into CF ... that this is a good idea, ...

To me this is no doubt a good idea, which already has a strong community backing.

... how the new content should be structured (e.g. a new section, a new appendix, both, or something else).

Perhaps an outline somewhere in the main text: end of Chapter 2 regarding aggregation files and their relation to the fragment files, somewhere in Chapter 3 regarding aggregation variables? And then an exhaustive description in an Appendix?

This brings me to a more general thought that I have been thinking about for some time:
I think that the CF Conventions document is getting increasingly long and complex, and difficult to get an overview of. The Table of Contents takes 8 full screens (5 pdf pages), then 5 screens of Tables of tables/figures/examples (3 pdf pages). I have no idea how to improve upon this, but it becomes more and more of a concern as we add new features to the Conventions. However, this is not something to discuss and solve here in this enhancement proposal, but I wanted to bring it up here anyway.

from cf-conventions.

davidhassell commented on August 16, 2024

Thank you for your comments, Lars, and sorry that it has taken me some time to respond.

Even though you are the only person to have commented here (and in support), this proposal has been scrutinised carefully at two CF workshops, with a group decision being reached in 2023 to work towards incorporating CFA into CF. I'm therefore minded to move to writing the PR, now that Lars has made a good suggestion of how and where the content could go into the existing CF conventions. This shouldn't take too long, because it will largely be a "cut and paste" job from the existing CFA description, which was deliberately written in a CF-ish style in anticipation of this :).

The aggregation variables contain no data but instead record instructions on both how to find the data in their original files, and how to combine the data into an aggregated data array
...
Note that this proposal does not cover how to decide whether or not the data arrays of two existing variables could or should be aggregated into a single larger array.

Good point. The first statement applies to the reading of the data, and the second to the writing of the data. The CFA conventions do not give any guidance on deciding how fragment files can be combined prior to creating an aggregation variable. Rather, once you have an aggregation in mind, they provide a framework in which you can encode it in such a way that other people can decode it.
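To make the reading side concrete, here is a minimal sketch of a variable that holds no data itself, only instructions for finding and combining fragments. The dictionary layout, the `read_fragment` helper, and the in-memory "files" are all assumptions made for illustration; this is not the CFA encoding, and it handles only one dimension.

```python
# Illustrative only: this layout is invented for the example and is not the
# CFA encoding. Fragment "files" are stood in for by in-memory lists.
FRAGMENT_STORE = {
    "March.nc": [1.0, 2.0, 3.0],
    "April.nc": [4.0, 5.0],
}

# The "aggregation variable" holds no data of its own, only instructions:
# which fragment to open and where its data sits in the aggregated array.
aggregation_instructions = [
    {"file": "March.nc", "start": 0},
    {"file": "April.nc", "start": 3},
]


def read_fragment(name):
    """Hypothetical reader; a real one would open the named file."""
    return FRAGMENT_STORE[name]


def decode(instructions, size):
    """Build the aggregated 1-D array by following the instructions."""
    aggregated = [None] * size
    for item in instructions:
        data = read_fragment(item["file"])
        start = item["start"]
        # Place this fragment's values at its recorded position.
        aggregated[start:start + len(data)] = data
    return aggregated


aggregated = decode(aggregation_instructions, size=5)
print(aggregated)
```

Note that `decode` never asks whether the fragments ought to be combined; it only follows instructions that someone has already written down, which is the distinction drawn above.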

If I give you two datasets (A and B) then the CFA conventions won't give you any help in working out if A and B can be sensibly combined into a single larger dataset (C). There are various ways in which you could work this out yourself: you could inspect the metadata and apply an aggregation algorithm (e.g. this one, or by visual inspection), or base it on file names (e.g. I know that model outputs from March.nc and April.nc are safe to combine into a 2-month dataset), etc.
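As a rough illustration of the "inspect the metadata" route, the sketch below compares two dataset summaries before deciding to combine them along time. It is not the aggregation algorithm referred to above, and it sits outside anything the CFA conventions define. The `DatasetSummary` fields, the chosen checks, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class DatasetSummary:
    """Hypothetical summary of metadata you might otherwise inspect by hand."""
    standard_name: str
    units: str
    spatial_shape: tuple  # sizes of the non-time dimensions
    time_start: int       # first time value, here a day of year
    time_end: int         # last time value (inclusive)


def can_combine_in_time(a, b):
    """Return True if b can plausibly follow a along the time axis."""
    same_quantity = a.standard_name == b.standard_name and a.units == b.units
    same_grid = a.spatial_shape == b.spatial_shape
    no_overlap = a.time_end < b.time_start
    return same_quantity and same_grid and no_overlap


# Made-up values standing in for March.nc and April.nc.
march = DatasetSummary("air_temperature", "K", (73, 96), 60, 90)
april = DatasetSummary("air_temperature", "K", (73, 96), 91, 120)
other = DatasetSummary("air_temperature", "degC", (73, 96), 91, 120)

print(can_combine_in_time(march, april))
print(can_combine_in_time(march, other))
```

A real check would likely need to compare much more (calendars, coordinate values, cell methods), so this only shows the shape of the decision that has to be made before an aggregation is encoded.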

Perhaps an outline somewhere in the main text: end of Chapter 2 regarding aggregation files and their relation to the fragment files, somewhere in Chapter 3 regarding aggregation variables? And then an exhaustive description in an Appendix?

I like the idea of a Chapter 2 outline. I might suggest content from Introduction, Terminology, Aggregation variables, and Aggregation instructions (without its subsections) for Chapter 2, and everything else - which is most of the existing CFA document - (Standardized aggregation instructions, Non-standardized terms, Fragment Storage and examples) for the appendix.

The Table of Content takes 8 full screens (5 pdf pages), then 5 screens of Tables of tables/figures/examples (3 pdf pages).

Just a thought - the TOC currently shows all subsections - maybe it could be restricted to just one level of subsection, so for instance Chapter 7 would go from

[7. Data Representative of Cells](https://cfconventions.org/cf-conventions/cf-conventions.html#_data_representative_of_cells)
    [7.1. Cell Boundaries](https://cfconventions.org/cf-conventions/cf-conventions.html#cell-boundaries)
    [7.2. Cell Measures](https://cfconventions.org/cf-conventions/cf-conventions.html#cell-measures)
    [7.3. Cell Methods](https://cfconventions.org/cf-conventions/cf-conventions.html#cell-methods)
        [7.3.1. Statistics for more than one axis](https://cfconventions.org/cf-conventions/cf-conventions.html#statistics-more-than-one-axis)
        [7.3.2. Recording the spacing of the original data and other information](https://cfconventions.org/cf-conventions/cf-conventions.html#recording-spacing-original-data)
        [7.3.3. Statistics applying to portions of cells](https://cfconventions.org/cf-conventions/cf-conventions.html#statistics-applying-portions)
        [7.3.4. Cell methods when there are no coordinates](https://cfconventions.org/cf-conventions/cf-conventions.html#cell-methods-no-coordinates)
    [7.4. Climatological Statistics](https://cfconventions.org/cf-conventions/cf-conventions.html#climatological-statistics)
    [7.5. Geometries](https://cfconventions.org/cf-conventions/cf-conventions.html#geometries)

to

[7. Data Representative of Cells](https://cfconventions.org/cf-conventions/cf-conventions.html#_data_representative_of_cells)
    [7.1. Cell Boundaries](https://cfconventions.org/cf-conventions/cf-conventions.html#cell-boundaries)
    [7.2. Cell Measures](https://cfconventions.org/cf-conventions/cf-conventions.html#cell-measures)
    [7.3. Cell Methods](https://cfconventions.org/cf-conventions/cf-conventions.html#cell-methods)
    [7.4. Climatological Statistics](https://cfconventions.org/cf-conventions/cf-conventions.html#climatological-statistics)
    [7.5. Geometries](https://cfconventions.org/cf-conventions/cf-conventions.html#geometries)

That alone would remove 71 lines from the TOC! But as you say, any more on that should be discussed elsewhere, which I would welcome.
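For what it's worth, the restriction can be sketched as a filter on section-number depth. This is only an illustration using the Chapter 7 titles; the `section_depth` and `restrict_toc` helpers are hypothetical and say nothing about how the CF document build actually generates its TOC.

```python
def section_depth(title):
    """Depth of a numbered heading: '7.' -> 1, '7.3.' -> 2, '7.3.1.' -> 3."""
    number = title.split(" ", 1)[0]
    return len([part for part in number.split(".") if part])


def restrict_toc(titles, max_depth):
    """Keep only the entries at or above the given depth."""
    return [t for t in titles if section_depth(t) <= max_depth]


chapter_7 = [
    "7. Data Representative of Cells",
    "7.1. Cell Boundaries",
    "7.2. Cell Measures",
    "7.3. Cell Methods",
    "7.3.1. Statistics for more than one axis",
    "7.3.2. Recording the spacing of the original data and other information",
    "7.3.3. Statistics applying to portions of cells",
    "7.3.4. Cell methods when there are no coordinates",
    "7.4. Climatological Statistics",
    "7.5. Geometries",
]

# max_depth=2 keeps the chapter and one level of subsection.
short_toc = restrict_toc(chapter_7, max_depth=2)
for entry in short_toc:
    print(entry)
```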

taylor13 commented on August 16, 2024

I think this is generally a good idea and have been meaning to go over the details.

A quick thought about the table of contents: Would it be easy in the web view to collapse the subsection hierarchy to 1 or 2 levels, then click on an upper level to display its subsections? That might give a newbie a more accessible overview. On the other hand, I usually just execute "find" for some key word I know is relevant to what I want to look up, and if that word becomes hidden (in a hidden low level subsection), then I may have a harder time navigating quickly to the relevant section. So I can see arguments for the current expanded table of contents.
