The cdms.products from idemsinternational

Create timeseries function

This will have a similar specification to the histogram function #12

Below is an example.

Include the same facet options as for histogram, and other common options.

Some of these options from the R-Instat dialog could also be sensible to include as options:

Incorrect column names from export_geoclim_month.R

In the generated file, the station names are labelled "id" instead of "station name".
@lilyclements could you please look at this.

Bugs in the inventory_plot function

These might not be bugs, but confusion as to why they're not working. @dannyparsons if you could explain this to me, that would be great, then I can add it into the documentation.

x_scale_from

If any one of x_scale_from, x_scale_to or x_scale_by is given, then all three have to be given; supplying only one or two of them does not work.
Should we set defaults in this case, with x_scale_by defaulting to 1 and x_scale_to to the maximum year value in the date variable?

data(daily_niger)
# this does not work
inventory_plot(data = daily_niger, station = "station_name", elements = c("tmax", "tmin"),
               date = "date", x_scale_from = 1940, x_scale_by = 5)

# this does work
inventory_plot(data = daily_niger, station = "station_name", elements = c("tmax", "tmin"),
               date = "date", x_scale_from = 1940, x_scale_to = 1950, x_scale_by = 5)
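One possible way to fill in the missing scale arguments is sketched below. The helper name resolve_x_scale is hypothetical and is not part of inventory_plot; defaulting x_scale_from to the minimum year is an extra assumption beyond the two defaults suggested above.

```r
# Hypothetical helper, not part of cdms.products: fill in missing
# x-axis scale arguments from the years present in the date column.
resolve_x_scale <- function(dates, x_scale_from = NULL, x_scale_to = NULL,
                            x_scale_by = NULL) {
  years <- as.integer(format(as.Date(dates), "%Y"))
  # No scale arguments supplied: leave the axis to the plotting defaults.
  if (is.null(x_scale_from) && is.null(x_scale_to) && is.null(x_scale_by)) {
    return(NULL)
  }
  # Assumed defaults: first year, last year, and a step of one year.
  if (is.null(x_scale_from)) x_scale_from <- min(years, na.rm = TRUE)
  if (is.null(x_scale_to)) x_scale_to <- max(years, na.rm = TRUE)
  if (is.null(x_scale_by)) x_scale_by <- 1
  seq(from = x_scale_from, to = x_scale_to, by = x_scale_by)
}

dates <- as.Date(c("1940-01-01", "1945-06-15", "1952-12-31"))
# x_scale_to is missing here, so it falls back to the last year in the data.
breaks <- resolve_x_scale(dates, x_scale_from = 1940, x_scale_by = 5)
```

With this approach the first call above (x_scale_from and x_scale_by only) would produce breaks up to the last year in the data rather than failing.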

x_scale_from with year_doy_plot = TRUE

If x_scale_... is given with year_doy_plot = TRUE, then the maximum year does not seem to work correctly.

inventory_plot(data = daily_niger, station = "station_name", elements = c("tmax", "tmin"),
               date = "date", year_doy_plot = TRUE,
               x_scale_from = 1940, x_scale_to = 1950, x_scale_by = 5)

facet_dir is a parameter in the list of parameters in the function but is never used in the function.
Was this removed, but left in the list of parameters accidentally, or should it be in the function somewhere?

rain_cats

There is 0, 0.85, and an upper bound value. Is this essentially setting the threshold value? What are the lower and upper boundaries for?

facet_by with year_doy_plot = TRUE

I can't seem to get elements-stations to work when year_doy_plot = TRUE. Is this intentional? If so, why? I'll update the documentation.

inventory_plot(data = daily_niger, station = "station_name",
               year_doy_plot = TRUE,
               elements = c("tmax", "tmin"), date = "date",
               facet_by = "elements-stations")

In addition there are some parameters in the documentation where I'm unsure of their purpose. Can you have a look, or briefly outline their aim so I can update the documentation?

Add test for climatic_summary that contains a `summaries_params` parameter

climatic_summary() has a summaries_params parameter.
This parameter is a list with one named item per summary function; each item is itself a list of parameters for that function, e.g. 'list(mean = list(trim = 0.5))'.
In the Python wrapper, this parameter is translated from Python to R using some special code. I would like to test this translation.
Please could we add some climatic_summary() tests that have meaningful values for summaries_params (i.e. values that affect the output)?
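To illustrate what "values that affect the output" could mean, here is a minimal base R sketch. The function apply_summaries is a hypothetical stand-in and is not how climatic_summary() is implemented; it only assumes that each summary's parameter list is passed on to the corresponding summary function.

```r
# Hypothetical stand-in for how per-summary parameters might be applied;
# this is not the climatic_summary() implementation.
apply_summaries <- function(x, summaries, summaries_params = list()) {
  vapply(names(summaries), function(name) {
    extra_args <- summaries_params[[name]]
    if (is.null(extra_args)) extra_args <- list()
    # Call the summary function with the data plus its extra parameters.
    do.call(summaries[[name]], c(list(x), extra_args))
  }, numeric(1))
}

x <- c(1, 2, 3, 4, 100)
# Without extra parameters the outlier dominates the mean.
plain <- apply_summaries(x, summaries = c(mean = "mean"))
# With trim = 0.5 the mean reduces to the median, so the result changes.
trimmed <- apply_summaries(x, summaries = c(mean = "mean"),
                           summaries_params = list(mean = list(trim = 0.5)))
```

A test built along these lines would compare the result with and without summaries_params and expect them to differ, which confirms the parameter actually reached the summary function.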

The signature of `export_geoclim_month` is inconsistent with Python

In the function below, the parameters station_id, latitude, longitude come after parameters with default values. This is illegal in Python.

export_geoclim_month <- function(data, year, month, element, metadata = NULL,
                                 join_by = NULL, station_id,
                                 latitude, longitude, add_cols = NULL, 
                                 file_path = paste0("GEOCLIM-", element, ".csv"),
                                 ...)

Climsoft Products to implement

Data

1.1 Mean or Total of minute/hourly/daily/pentad/dekadal/monthly/annual data
1.2 Long term mean
1.3 Max and min, with or without the corresponding date

Graphics

2.1 Windrose
2.2 Timeseries chart
2.3 Histograms

Inventory

3.1 Details of Data Records
3.2 Inventory of Missing Data

Messages

4.1 CLIMAT Messages

Output for other applications

Use `checkmate` to validate inputs in all functions

The checkmate package seems very nice for easily checking the types of arguments and giving helpful error messages for wrong types https://cloud.r-project.org/web/packages/checkmate/vignettes/checkmate.html. I suggest we use this throughout the package if it works well.

@lilyclements Could you try to implement it for one function and check that it is sensible for us to use?
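As a rough idea of what this could look like, here is a small sketch. The function check_plot_inputs and its parameters are hypothetical and chosen only for illustration; the checkmate assertions themselves are called with the checkmate:: prefix so the function can be defined without attaching the package.

```r
# Hypothetical function used only to illustrate checkmate assertions.
check_plot_inputs <- function(data, date, elements, year_doy_plot = FALSE) {
  checkmate::assert_data_frame(data)
  checkmate::assert_string(date)
  checkmate::assert_character(elements, min.len = 1, any.missing = FALSE)
  checkmate::assert_flag(year_doy_plot)
  # Every named column must exist in the data.
  checkmate::assert_subset(c(date, elements), choices = names(data))
  invisible(TRUE)
}

# Only run the usage example when checkmate is installed.
if (requireNamespace("checkmate", quietly = TRUE)) {
  df <- data.frame(date = as.Date("2000-01-01"), tmax = 30)
  check_plot_inputs(df, date = "date", elements = "tmax")
}
```

Each assertion stops with a message naming the offending argument, which is the main benefit over hand-written if/stop checks.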

Create extremes function

This is almost a special case of climatic_summary and will call this internally to calculate extreme values. A slightly different summary with a filter is needed to get date and occurrence of extreme events.
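The "filter" step could look roughly like the base R sketch below. The helper station_maxima is hypothetical and does not call climatic_summary; it only shows the idea of keeping the row where the extreme occurs so that the date comes along with the value.

```r
# Hypothetical helper: the maximum value of one element per station, with the
# date on which it occurred (first occurrence if there are ties).
station_maxima <- function(data, date, station, element) {
  pieces <- split(data, data[[station]])
  rows <- lapply(pieces, function(piece) {
    # which.max() skips NA values and returns the first maximum.
    piece[which.max(piece[[element]]), c(station, date, element)]
  })
  result <- do.call(rbind, rows)
  rownames(result) <- NULL
  result
}

obs <- data.frame(
  station_name = c("A", "A", "B", "B"),
  date = as.Date(c("2000-01-01", "2000-01-02", "2000-01-01", "2000-01-02")),
  tmax = c(31, 35, 29, NA)
)
maxima <- station_maxima(obs, date = "date", station = "station_name",
                         element = "tmax")
```

Counting occurrences of an extreme event would need a different filter (for example, rows above a threshold), which this sketch does not cover.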

Add test for climatic_summary when there's no summaries provided

@HawardKetoyoMsatsi Lily found a bug here #67 which wasn't picked up by the tests I had written, because I didn't include a test example where the summaries parameter is not provided, e.g. the example below.

climatic_summary(data = daily_niger, date_time = "date", station = "station_name",
                              elements = c("rain", "tmax"))

Could you add a test for this case? It can be any call to climatic_summary where summaries is not provided. In this case, it uses the default summary which is dplyr::n().

Create function to export data for CDT

Export data in format for CDT https://iri.columbia.edu/our-expertise/climate/tools/cdt/

Daily and dekadal output should be possible.

Example format for daily output attached.

daily_CDT_PRECIP-2000-2022.csv
daily_CDT_TMPMIN-2000-2022.csv

"element" not recognized in Input (file_path = paste0("CDT-", element, ".csv")

@lilyclements I put this error description earlier in PR #78 but I will be opening it as a new issue.

The files export_cdt.R, export_cdt_daily.R and export_cdt_dekad.R do not recognise "element" in the input (file_path = paste0("CDT-", element, ".csv")). This should be fixed for the tests to run.
Whenever I run the functions I get the error:

Error in paste0("CDT-", element, ".csv") : object 'element' not found
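One way this kind of error can arise is sketched below; whether it is the actual cause here is an assumption. In R, a default argument is evaluated lazily inside the function, so a default that mentions element only works if the function has a parameter (or another visible object) with exactly that name. The two functions are hypothetical and only mimic the signature pattern.

```r
# Hypothetical functions that mimic the signature pattern; they are not the
# cdms.products code.
export_broken <- function(data, file_path = paste0("CDT-", element, ".csv")) {
  # Forcing the default looks up `element`, which is not a parameter here.
  file_path
}

export_fixed <- function(data, element,
                         file_path = paste0("CDT-", element, ".csv")) {
  # `element` is now a parameter, so the default can be evaluated.
  file_path
}

broken_message <- tryCatch(export_broken(data = NULL),
                           error = function(e) conditionMessage(e))
fixed_path <- export_fixed(data = NULL, element = "rain")  # "CDT-rain.csv"
```

If the real functions take a differently named parameter, the default for file_path would need to refer to that name instead.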

Create function to export data for GeoCLIM

Export data in format for GeoCLIM https://earlywarning.usgs.gov/fews/software-tools/20

Daily, dekadal and monthly output should be possible.

Example format for monthly output attached.

GEOCLM-PRECIP.csv
GEOCLM-TMPMIN.csv

Parameters ignored in `export_geoclim_dekad()` and `export_geoclim_pentad()`

The export_geoclim_dekad() function is shown below.
Should the file_path and ... parameters also be passed to export_geoclim()?
The export_geoclim_pentad() function has a similar problem.

export_geoclim_dekad <- function(data, year, dekad, element, metadata = NULL,
                                 join_by = NULL, station_id,
                                 latitude, longitude, add_cols = NULL,
                                 file_path = paste0("GEOCLIM-", 
                                                    element, 
                                                    ".csv"),
                                 ...) {
  export_geoclim(data = data, year = year, type = "dekad", type_col = dekad, 
                 element = element, metadata = metadata, join_by = join_by,
                 station_id = station_id, latitude = latitude, 
                 longitude = longitude, add_cols = add_cols)
}
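A minimal sketch of forwarding the two parameters is given below. It uses a hypothetical stand-in, export_stub, because the real export_geoclim() signature is not shown here; it is an assumption that export_geoclim() accepts file_path and ... in the same way.

```r
# Hypothetical stand-in for export_geoclim(); it only reports what it received.
export_stub <- function(data, type, file_path = "GEOCLIM-default.csv", ...) {
  list(type = type, file_path = file_path, extra = list(...))
}

# Wrapper that forwards file_path and ... instead of dropping them.
export_dekad_stub <- function(data, element,
                              file_path = paste0("GEOCLIM-", element, ".csv"),
                              ...) {
  export_stub(data = data, type = "dekad", file_path = file_path, ...)
}

received <- export_dekad_stub(data = NULL, element = "rain", na = "-999")
```

Here received$file_path holds the wrapper's default and received$extra holds the additional argument, so neither is silently ignored.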

R functions to write tests on

Check if we can remove the following R files without it impacting elsewhere:

convert_to_dec_deg.R
get_lon_from_data.R
get_lat_from_data.R
lat_lon_dataframe.R

Need to create tests on:

Create histogram function

The histogram function should have similar main parameters to inventory_plot: data, date, elements, station

Below are examples that should be possible to produce.

I suggest we have a facet option which could be either none (as below), station, elements or station-elements. There should also be a nrow/ncol option which is used when faceting by a single variable.

If there are multiple histograms on one plot e.g. two elements given but facet set to none, then colour should be used to distinguish elements. Similarly for multiple stations.

Also include some of the common options like title, axis titles, bar colour/fill, axis break specification

General theme options don't need to be included.

There could also be an option, like in the R-Instat dialog to produce either a histogram, density plot, ridge plot or frequency polygon.

Create function for Inventory of Missing Data

Example to follow.

Create climate summary function

This will be a general summary function, being able to summarise to different time periods e.g. hourly, daily, monthly, annual, using specified standard summaries e.g. mean, min, max, sum.

The functionality will be similar to the Climate Summaries dialog in R-Instat.

Flexible options for dealing with missing values should also be available.

Add an example dataset to the package

Add a sample of data from the DARE project, which recovers data from West Africa, as a dataset included with the package. The sample should be data that has been agreed to be freely available. Use the guidance here https://r-pkgs.org/data.html

Best practice for ggplot2 wrapper functions

I have been looking into best practice for writing functions that produce a ggplot2 graph:

https://rpubs.com/hadley/97970 - suggestions on functions and parameters
https://fishandwhistle.net/slides/rstudioconf2020/#1 - how to call things correctly to pass R package checks
https://icydk.com/how-to-write-functions-to-make-plots-with-ggplot2-in-r/ - idea of using glue package

I haven't quite found what I want which is how to make a function as flexible as possible without having 100s of parameters. The first link suggests optional parameters that can be lists of parameters to be passed to different bits of the plot e.g. bar.params = list(), errorbar.params = list().
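The list-of-parameters idea could be sketched as follows. The names merge_params, histogram_sketch, histogram_params and labs_params are all hypothetical; this shows one way to pass optional lists to individual layers, not a settled design.

```r
# Hypothetical helper: user-supplied parameters override the defaults.
merge_params <- function(defaults, user = list()) {
  utils::modifyList(defaults, user)
}

# Hypothetical wrapper: each part of the plot takes an optional list.
histogram_sketch <- function(data, x, histogram_params = list(),
                             labs_params = list()) {
  hist_args <- merge_params(list(bins = 30), histogram_params)
  ggplot2::ggplot(data, ggplot2::aes(x = .data[[x]])) +
    do.call(ggplot2::geom_histogram, hist_args) +
    do.call(ggplot2::labs, labs_params)
}

merged <- merge_params(list(bins = 30, fill = "grey"), list(bins = 10))

# Only build the example plot when ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- histogram_sketch(data.frame(tmin = c(18, 20, 21, 25)), x = "tmin",
                        histogram_params = list(bins = 5, fill = "steelblue"),
                        labs_params = list(title = "Minimum temperature"))
}
```

This keeps the main signature short, at the cost of one list parameter per layer that users need to know about.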

I haven't yet found a suggestion on how to do this in general to cover all aspects of the plot.

Discussion on using "tidy evaluation" or not for column names

We generally do not use tidy evaluation but should have a discussion on this soon.

@dannyparsons in some functions, such as climatic_missing, we use the tidy evaluation method.

I'll list here the functions that use tidy evaluation, so that we can later either change these functions or update the functions not listed here to use tidy evaluation.
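For the discussion, the two styles can be compared side by side in a small sketch. Both functions are hypothetical examples written with dplyr and are not taken from the package.

```r
# Column name passed as a string and looked up with the .data pronoun.
mean_by_string <- function(data, element) {
  dplyr::summarise(data, mean = mean(.data[[element]], na.rm = TRUE))
}

# Column passed unquoted and forwarded with the embrace operator.
mean_by_tidy_eval <- function(data, element) {
  dplyr::summarise(data, mean = mean({{ element }}, na.rm = TRUE))
}

# Only run the usage example when dplyr is installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  obs <- data.frame(rain = c(0, 2.5, NA, 7.5))
  from_string <- mean_by_string(obs, "rain")
  from_tidy <- mean_by_tidy_eval(obs, rain)
}
```

Both calls give the same result; the difference is only in how the caller names the column, which matters for consistency across functions and for the Python layer, where string names are the more natural fit.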

Errors in prepare_geoclim function

@dannyparsons I am adding documentation to more functions and so running examples.

prepare_geoclim_month
Perhaps this is an error, or perhaps I'm doing something wrong. If I run the following code I get the error:

object 'month_abb_english' not found

I cannot find in the data where month_abb_english is defined, so am not sure how to fix this one!

prepare_geoclim_month(data = daily_niger, year = "year", month = "month",
                   station_id = "station_name",
                   element = "rain", metadata = stations_niger, 
                   join_by = "station_name",
                   latitude = "lat", longitude = "long")

Also

Should this have an optional date parameter which year and month can be created from?
Should this be station, not station_id?
What length argument can be given to add_cols? (character(1)?)
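On the first question, deriving year and month from a date column could be as simple as the base R sketch below. The helper add_year_month is hypothetical and is not part of prepare_geoclim_month.

```r
# Hypothetical helper: add year and month columns from a date column when they
# are not already present in the data.
add_year_month <- function(data, date, year = "year", month = "month") {
  dates <- as.Date(data[[date]])
  if (!year %in% names(data)) {
    data[[year]] <- as.integer(format(dates, "%Y"))
  }
  if (!month %in% names(data)) {
    data[[month]] <- as.integer(format(dates, "%m"))
  }
  data
}

obs <- data.frame(date = as.Date(c("2001-03-15", "2002-11-01")),
                  rain = c(0, 4.2))
with_parts <- add_year_month(obs, date = "date")
```

Existing year and month columns are left untouched, so the date parameter would stay optional.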

I'm also getting an error in prepare_geoclim:
If I run the following then I get an error

dekad_data <- daily_niger %>%
  dplyr::mutate(dekad = dekad(date))

prepare_geoclim(data = dekad_data, year = "year",
                station_id = "station_name",
                type_col = "dekad",
                element = "rain", metadata = stations_niger, 
                join_by = "station_name",
                latitude = "lat", longitude = "long")

Create CLIMAT Messages function

Data export function to export data in CLIMAT Messages format as described here https://www.ncdc.noaa.gov/monitoring-references/dyk/climat and with detailed specification here https://library.wmo.int/doc_num.php?explnum_id=9253

Create function for Details of Data Records (Inventory)

This has details of available and missing data for the station elements selected.

Example to follow.

Make tests accessible on installed package

Use suggestion in https://stackoverflow.com/a/36575128/6424231 to make the tests accessible on an installed package

Invalid tests in `test-histogram_plot.R`

test-histogram_plot.R contains the tests below (currently commented out):

#   t1_points <- histogram_plot(data = agades, date_time = "date", 
#                                facet_by = "none",
#                                elements = "tmin", add_points = TRUE)
#   t1_lobf <- histogram_plot(data = agades, date_time = "date",
#                              facet_by = "none",
#                              elements = "tmin", add_line_of_best_fit = TRUE)
#   t1_path <- histogram_plot(data = agades, date_time = "date",
#                              facet_by = "none",
#                              elements = "tmin", add_path = TRUE)
#   t1_step <- histogram_plot(data = agades, date_time = "date",
#                              facet_by = "none",
#                              elements = "tmin", add_step = TRUE)

When I try to execute these tests in RStudio, I get the following errors:

>    t1_points <- histogram_plot(data = agades, date_time = "date", 
+                                 facet_by = "none",
+                                 elements = "tmin", add_points = TRUE)
Error in histogram_plot(data = agades, date_time = "date", facet_by = "none",  : 
  unused argument (add_points = TRUE)
>    t1_lobf <- histogram_plot(data = agades, date_time = "date",
+                               facet_by = "none",
+                               elements = "tmin", add_line_of_best_fit = TRUE)
Error in histogram_plot(data = agades, date_time = "date", facet_by = "none",  : 
  unused argument (add_line_of_best_fit = TRUE)
>    t1_path <- histogram_plot(data = agades, date_time = "date",
+                               facet_by = "none",
+                               elements = "tmin", add_path = TRUE)
Error in histogram_plot(data = agades, date_time = "date", facet_by = "none",  : 
  unused argument (add_path = TRUE)
>    t1_step <- histogram_plot(data = agades, date_time = "date",
+                               facet_by = "none",
+                               elements = "tmin", add_step = TRUE)
Error in histogram_plot(data = agades, date_time = "date", facet_by = "none",  : 
  unused argument (add_step = TRUE)

Parameter names - consistency

Seen a few inconsistent parameter names between functions.

E.g.

date_time vs date
element vs elements
station vs stations

I suggest we use: date, element, station.

@dannyparsons what do you think?

Create initial documentation for all functions

Decide on copyright

At the moment there's a GPL-3 and an MIT licence document in the repo. These are the two most common so I think we want one of these and then remove the other.

@volloholic Do you have a view on this?

@isedwards This is the repo for the R package that will have climatic functionalities from R-Instat/Climsoft products. I don't think the licence choice has any implications for use in OpenCDMS Processes since (from https://r-pkgs.org/license.html):

Note that simply using a package or R itself doesn’t require that you comply with the license; this is why you can write proprietary R code and why R packages can have any license you choose.

Package name

Any strong views on the package name? @isedwards @lilyclements

RInstatClmatic was just a placeholder. My current ideas are:

climate_products
cdms_products

Any other ideas?

Probably leaning towards cdms_products as it's shorter and fits with these being functions written for data from CDMS but not being too specifically tied to any one software package.

Only use names and function signatures that are also valid in Python

The Python layer uses the 'rpy2' Python package. This package allows Python code to refer directly to R functions, parameters and other objects. In order to make the Python code simple and readable, we should minimise the transformations needed between the Python and R layers.

However, R allows some things that are illegal in Python. For example, the climatic_summary() function uses the following practices that are illegal in Python:

Some function parameters have a dot ('.') as part of their names. For example, na.rm and summaries.params. To be consistent with Python, they should be named na_rm and summaries_params (or something similar).
Some function parameters without default values are listed after parameters with default values. For example, the elements and summaries parameters are in the middle of the list; they should come before the first parameter with a default value.

The Python rpy2 package provides workarounds for the above but they add complexity and reduce readability.

Please would it be possible to avoid the above and only use names and function signatures that are also valid in Python?
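Both practices can be detected automatically from a function's formals. The checker below is a hypothetical sketch in base R, not an existing tool; it treats ... as allowed anywhere, on the assumption that it maps to Python's variable arguments.

```r
# Hypothetical check: report parameter names containing a dot and required
# parameters that appear after a parameter with a default value.
python_signature_issues <- function(fn) {
  args <- formals(fn)
  arg_names <- names(args)
  # A parameter has no default when its formal is the empty symbol.
  has_default <- vapply(seq_along(args), function(i) {
    !identical(args[[i]], quote(expr = ))
  }, logical(1))
  is_dots <- arg_names == "..."
  # TRUE from the first parameter with a default onwards.
  after_default <- cumsum(has_default) > 0
  list(
    dotted_names = arg_names[grepl(".", arg_names, fixed = TRUE) & !is_dots],
    required_after_default = arg_names[!has_default & after_default & !is_dots]
  )
}

example_fn <- function(data, na.rm = FALSE, elements, ...) NULL
issues <- python_signature_issues(example_fn)
```

For example_fn this reports na.rm as a dotted name and elements as a required parameter after a default. Run over all exported functions, a check like this could be added to the test suite.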

Note: If I find new inconsistencies while I implement the Python layer, then I will add extra check boxes to the list above.

Bug when using elements in the climatic_summary function

@dannyparsons I'm getting an error when running the climatic_summary function, coming from this line:

cdms.products/R/climatic_summary.R

Line 216 in b0ecc5d

dplyr::summarise(dplyr::across(dplyr::all_of(elements),

I'm not sure how to fix it. It seems to occur if I put an element in the elements parameter.

climatic_summary(data = daily_niger, date_time = "date", station = "station_name",
                              elements = c("rain", "tmax"))

Two examples are attached

MN005050_Rclimdex.txt
NP002051_Rclimdex.txt

Set up unit testing with the `testthat` package

Follow the steps here https://r-pkgs.org/tests.html to set up the testing system.
Add some initial tests

Export CDT Dekadal data

This is very similar to the function for #18 but for dekadal data. The format is essentially the same. Examples attached.

dekadal_CDT_PRECIP-2000-2022.csv
dekadal_CDT_TMPMIN-2000-2022.csv
