essdive-csv-structure's People

Contributors

dylanporyan, tvelliquette

Forkers

tvelliquette

essdive-csv-structure's Issues

N/A values in R

Submitter: @robcrystalornelas and @JEDamerow

I suggest the following changes:
The CSV format recommends "N/A" for missing values, which requires an additional step in R.

R uses "NA" for missing values, not "N/A".
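The extra step amounts to telling the reader which token means "missing". In R this is typically done through the `na.strings` argument of `read.csv`. The same idea is sketched below in Python using only the standard library; the function name `read_rows` and the sample data are hypothetical and are not part of the reporting format.

```python
import csv
import io

# Missing-value token recommended by the CSV reporting format.
MISSING_TOKEN = "N/A"


def read_rows(text, missing=MISSING_TOKEN):
    """Parse CSV text and convert the missing-value token to None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (None if value == missing else value) for key, value in row.items()}
        for row in reader
    ]


# Hypothetical sample data for illustration only.
sample = "site,chloride\nA,1.2\nB,N/A\n"
rows = read_rows(sample)
```

Here `rows[1]["chloride"]` is `None`, while the other cells stay as strings.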

Other options for characters and symbols in variable names

Submitter: Rob Crystal-Ornelas

I suggest the following changes: Currently, the reporting format says that variable names should only include letters, numbers, hyphens, and underscores.

With these restrictions, we would have to reformat the variable name shown in the attached image:
[image]

Per recommendation of @vchendrix, keeping variable names restricted to letters, numbers, hyphen and underscore helps ensure that files are portable to as many languages as possible.

Options to consider for removing the slash from the variable name:

  • chloride_umol_per_l (this follows recommendations from: https://units-of-measurement.org/umol)
  • chloride_micromol_per_l (the mu symbol is not an ASCII character, so it should be swapped out)
  • chloride_micromol-per-l (in this case the underscore separates the variable being measured from the unit)
  • Another option is to write a script that separates the unit from the variable being measured, so that we fully comply with the CSV guidelines.

With all the options listed above, we need to consider how much it may disrupt researchers already ingesting and using data that have variables in their original format.
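The first renaming option above could be automated along these lines. This is a minimal sketch, not an agreed convention: the function name `sanitize_name`, the choice of "u" for the micro sign, "_per_" for the slash, and lowercasing are all assumptions, and the input name used below is hypothetical.

```python
import re

# Characters the reporting format allows in variable names.
ALLOWED = re.compile(r"^[A-Za-z0-9_-]+$")


def sanitize_name(name):
    """Rewrite a variable name so it uses only letters, numbers, hyphens, underscores."""
    # Swap the micro sign (U+00B5) and Greek mu (U+03BC) for ASCII "u" (assumption).
    name = name.replace("\u00b5", "u").replace("\u03bc", "u")
    # Spell out the slash as "_per_" (assumption, matching the first option).
    name = name.replace("/", "_per_")
    # Replace any remaining disallowed characters with an underscore.
    name = re.sub(r"[^A-Za-z0-9_-]+", "_", name)
    return name.strip("_").lower()


# Hypothetical original column name for illustration.
cleaned = sanitize_name("chloride_\u00b5mol/L")
```

Researchers already ingesting the original names would still need a mapping from old to new names, which a script like this could also emit.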

CSV Quick Guide File Orientation Correction

I suggest the following changes to the Reporting Format Description for Column or Row Name Orientation:

Tabular data can be organized: 1) Horizontally, meaning that data are organized in columns and there is a new name describing the data at the top of every column; 2) Vertically, meaning that data are organized in rows and there is a new row name describing the data at the start of every row. (This swaps "rows" and "columns" relative to the current wording.)
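Whichever label is attached to each orientation, converting between a file with names at the start of every row and a file with names at the top of every column is a transpose. A minimal sketch with the Python standard library, using hypothetical sample data:

```python
import csv
import io

# Hypothetical file with a name at the start of every row.
names_in_rows = "site,A,B\nchloride,1.2,3.4\n"

rows = list(csv.reader(io.StringIO(names_in_rows)))

# Transpose so each name sits at the top of its own column.
names_in_columns = [list(column) for column in zip(*rows)]
```

Note that `zip` stops at the shortest row, so this sketch assumes every row has the same number of cells.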

Time reporting format

I suggest the following changes: Could we agree on an "ESS-DIVE central" format for how to report date and time? Looking through the currently uploaded instructions (CSV, Sample ID, Soil Respiration, Hydrologic Monitoring, Leaf Gas Exchange), there are as many ways of reporting date and time as there are reporting formats, although all recommend UTC and request the UTC offset if using local time. I personally would prefer to keep delimiters out of it, as proposed in the soil respiration reporting format, and simply use up to 14 digits (YYYYMMDDHHMMSS) at whatever resolution is appropriate for the data. I think this would also make things more script-friendly, since we are trying to limit the use of special characters. As long as we agree on what we should require or recommend, I think that would help a lot.
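To make the delimiter-free proposal concrete, here is a minimal sketch of writing and reading YYYYMMDDHHMMSS timestamps in UTC with Python's standard `datetime` module. The helper names `to_compact_utc` and `from_compact_utc` are hypothetical, and the sketch handles only the full-resolution form, not shorter prefixes.

```python
from datetime import datetime, timedelta, timezone

# strftime/strptime pattern for YYYYMMDDHHMMSS.
COMPACT_FORMAT = "%Y%m%d%H%M%S"


def to_compact_utc(moment):
    """Format a timezone-aware datetime as YYYYMMDDHHMMSS in UTC."""
    return moment.astimezone(timezone.utc).strftime(COMPACT_FORMAT)


def from_compact_utc(text):
    """Parse a YYYYMMDDHHMMSS string, interpreting it as UTC."""
    return datetime.strptime(text, COMPACT_FORMAT).replace(tzinfo=timezone.utc)


# A local time with a UTC offset of -08:00 (hypothetical example).
local = datetime(2021, 3, 3, 21, 6, 7, tzinfo=timezone(timedelta(hours=-8)))
stamp = to_compact_utc(local)
```

One trade-off of this format is that the UTC offset is no longer visible in the value itself, so the convention would have to be stated in the file-level metadata.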

Possible to use semicolon in list?

Is there a reason why we shouldn't use semicolon in (for example) a list of terms within a CSV cell? I reviewed notes on this reporting format and did not find a reason why semicolons shouldn't be used to separate words in a list.

I suggest the following changes:
Right now, the CSV reporting format suggests: "For commas not meant to be a delimiter (e.g. used within a cell), use a vertical bar"

This could be changed to: "For commas not meant to be a delimiter (e.g. used within a cell), use a vertical bar or semicolon"

@charuleka and @vchendrix Is there a use case we can think of where semicolons should not be used as part of a list of terms?
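For a comma-delimited file, a reader can treat both separators the same way inside a cell. A minimal sketch using the Python standard library; the function name `split_terms`, the column names, and the sample rows are hypothetical:

```python
import csv
import io
import re


def split_terms(cell):
    """Split a cell on vertical bars or semicolons and trim whitespace."""
    return [term.strip() for term in re.split(r"[|;]", cell) if term.strip()]


# Hypothetical comma-delimited data with in-cell lists.
text = "sample_id,keywords\nS1,soil; carbon; flux\nS2,water|chloride\n"
rows = list(csv.DictReader(io.StringIO(text)))
terms = [split_terms(row["keywords"]) for row in rows]
```

Neither separator requires quoting here, because the file delimiter is a comma.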

Possible additions

Hi all, in my experience with CSV files, there are two issues that come up that can affect portability/reading that this reporting format might want to address:

  1. Last line in file. Should files end with the last line of data or have a single empty line at the end? I'd suggest the latter, which is the POSIX standard and generally recommended. Note, however, that Excel saves CSVs without such an empty line. 😞
  2. Line separator. Windows, for example, separates lines with CR+LF, whereas Unix-like systems use LF. I'm not sure whether the format wants to get this far into the weeds, but if so, I'd recommend the latter.

Anyway for your consideration. The first is probably more important than the second.
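Both suggestions could be applied by a small normalization step before upload. This is a rough sketch under stated assumptions: the function name `normalize_csv_text` is hypothetical, and the naive replacement also rewrites line breaks inside quoted cells, which may or may not be wanted.

```python
def normalize_csv_text(text):
    """Use LF line separators and end the file with exactly one newline."""
    # Convert Windows (CR+LF) and bare CR line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop any trailing newlines, then add a single terminating newline.
    return text.rstrip("\n") + "\n"


# Hypothetical file content with CR+LF separators and no final newline.
normalized = normalize_csv_text("site,chloride\r\nA,1.2")
```

When writing the result with Python's `open`, passing `newline=""` prevents the platform from translating LF back to CR+LF.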
