idemsinternational / cdms.products
An R package for manipulating and analysing historical climatic data.
License: Other
This will have similar specification to the histogram function #12
Below is an example.
Include the same facet options as for histogram, and other common options.
Some of these options from the R-Instat dialog could also be sensible to include as options:
In the generated file, the station names are labelled "id" instead of "station name".
@lilyclements could you please look at this.
These might not be bugs, but confusion as to why they're not working. @dannyparsons if you could explain this to me, that would be great, then I can add it into the documentation.
x_scale_from
If x_scale_from is given, then the x_scale_to and x_scale_by arguments also have to be given (and similarly if at least one of the three is given, but not all three).
Should we set defaults in this case: x_scale_by to be 1 and x_scale_to to be the maximum year value in the date variable?
data(daily_niger)
# this does not work
inventory_plot(data = daily_niger, station = "station_name", elements = c("tmax", "tmin"),
date = "date", x_scale_from = 1940, x_scale_by = 5)
# this does work
inventory_plot(data = daily_niger, station = "station_name", elements = c("tmax", "tmin"),
date = "date", x_scale_from = 1940, x_scale_to = 1950, x_scale_by = 5)
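A minimal sketch of the defaulting behaviour proposed above, assuming a hypothetical helper resolve_x_scale() that is not part of the package and that receives the years present in the data. The default of the minimum year for x_scale_from is an extra assumption added for symmetry; the proposal only mentions x_scale_by and x_scale_to.

```r
# Hypothetical helper, for illustration only: fill in missing x-scale
# arguments when at least one of them has been supplied.
resolve_x_scale <- function(years, x_scale_from = NULL, x_scale_to = NULL,
                            x_scale_by = NULL) {
  # If none are given, return NULL so the axis keeps its default breaks
  if (is.null(x_scale_from) && is.null(x_scale_to) && is.null(x_scale_by)) {
    return(NULL)
  }
  # Assumed default (not in the proposal): start at the first year in the data
  if (is.null(x_scale_from)) x_scale_from <- min(years, na.rm = TRUE)
  # Proposed defaults: end at the last year in the data, step by 1
  if (is.null(x_scale_to)) x_scale_to <- max(years, na.rm = TRUE)
  if (is.null(x_scale_by)) x_scale_by <- 1
  seq(from = x_scale_from, to = x_scale_to, by = x_scale_by)
}

years <- 1940:1980
# Only two of the three arguments are supplied; x_scale_to is filled in
breaks <- resolve_x_scale(years, x_scale_from = 1940, x_scale_by = 5)
```

With a helper along these lines, the first call above would no longer need all three arguments, while calls that supply none of them would be unaffected.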
x_scale_from with year_doy_plot = TRUE
If x_scale_... is given with year_doy_plot = TRUE, then the maximum year does not seem to work correctly.
inventory_plot(data = daily_niger, station = "station_name", elements = c("tmax", "tmin"),
date = "date", year_doy_plot = TRUE,
x_scale_from = 1940, x_scale_to = 1950, x_scale_by = 5)
facet_dir
facet_dir is a parameter in the list of parameters in the function but is never used in the function.
Was this removed, but left in the list of parameters accidentally, or should it be in the function somewhere?
rain_cats
There is 0, 0.85, and an upper bound value. Is this essentially setting the threshold value? What are the lower and upper boundaries for?
facet_by with year_doy_plot = TRUE
I can't seem to get elements-stations to work when year_doy_plot = TRUE. Is this intentional? If so, why? I'll update the documentation.
inventory_plot(data = daily_niger, station = "station_name",
year_doy_plot = TRUE,
elements = c("tmax", "tmin"), date = "date",
facet_by = "elements-stations")
climatic_summary() has a summaries_params parameter.
This parameter has one list item for each summary function, and one list of parameters for each summary function e.g. 'list(mean = list(trim = 0.5))'.
In the Python wrapper, this parameter is translated from Python to R using some special code. I would like to test this translation.
Please could we add some climatic_summary() tests that have meaningful values for summaries_params (i.e. values that affect the output)?
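To illustrate what "values that affect the output" could look like, here is a small base R sketch. The helper apply_summaries() is hypothetical and only mimics the idea of one parameter list per summary function; it is not how climatic_summary() is implemented.

```r
# Hypothetical helper: apply each named summary function to x, passing any
# extra arguments supplied for that function in summaries_params.
apply_summaries <- function(x, summaries, summaries_params = list()) {
  vapply(names(summaries), function(nm) {
    extra <- summaries_params[[nm]]
    if (is.null(extra)) extra <- list()
    do.call(summaries[[nm]], c(list(x), extra))
  }, numeric(1))
}

x <- c(1, 2, 3, 4, 100)

# Without extra parameters the outlier dominates the mean
plain <- apply_summaries(x, summaries = c(mean = "mean"))

# trim = 0.5 makes mean() return the median, so the result changes
trimmed <- apply_summaries(x, summaries = c(mean = "mean"),
                           summaries_params = list(mean = list(trim = 0.5)))
```

A test of the real function could follow the same pattern: run it once with and once without the parameter list and check that the two results differ in the expected way.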
In the function below, the parameters station_id, latitude, longitude come after parameters with default values. This is illegal in Python.
export_geoclim_month <- function(data, year, month, element, metadata = NULL,
join_by = NULL, station_id,
latitude, longitude, add_cols = NULL,
file_path = paste0("GEOCLIM-", element, ".csv"),
...)
The checkmate package seems very nice for easily checking the types of arguments and giving helpful error messages for wrong types https://cloud.r-project.org/web/packages/checkmate/vignettes/checkmate.html. I suggest we use this throughout the package if it works well.
@lilyclements Could you try to implement it for one function and check that it is sensible for us to use?
This is almost a special case of climatic_summary and will call this internally to calculate extreme values. A slightly different summary with a filter is needed to get date and occurrence of extreme events.
@HawardKetoyoMsatsi Lily found a bug here #67 which wasn't picked up by the tests I had written because I didn't include a test example where the summaries parameter is not provided e.g. this example below.
climatic_summary(data = daily_niger, date_time = "date", station = "station_name",
elements = c("rain", "tmax"))
Could you add a test for this case? It can be any call to climatic_summary where summaries is not provided. In this case, it uses the default summary which is dplyr::n().
Export data in format for CDT https://iri.columbia.edu/our-expertise/climate/tools/cdt/
Daily and dekadal output should be possible.
Example format for daily output attached.
daily_CDT_PRECIP-2000-2022.csv
daily_CDT_TMPMIN-2000-2022.csv
@lilyclements I put this error description earlier in PR #78 but I will be opening it as a new issue.
The files export_cdt.R, export_cdt_daily.R and export_cdt_dekad.R do not recognise "element" in the input (file_path = paste0("CDT-", element, ".csv")). This should be fixed for the tests to run.
Whenever I run the functions I get the error:
Error in paste0("CDT-", element, ".csv") : object 'element' not found
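This error is consistent with how R handles default arguments: they are evaluated lazily, inside the function's own environment, so a default that mentions element can only be resolved if element is visible there (for example as another argument). Whether this is the actual cause in the export_cdt files is an assumption. The two functions below are hypothetical stand-ins, not the package's code.

```r
# Hypothetical exporter WITHOUT an `element` argument: the default for
# file_path refers to a name that does not exist inside the function.
export_without_element <- function(data,
                                   file_path = paste0("CDT-", element, ".csv")) {
  file_path
}

# Hypothetical exporter WITH an `element` argument: the default resolves.
export_with_element <- function(data, element,
                                file_path = paste0("CDT-", element, ".csv")) {
  file_path
}

# The first call fails when the default is evaluated; capture the message
failed <- tryCatch(export_without_element(data = NULL),
                   error = function(e) conditionMessage(e))

# The second call builds the file name from the supplied element
ok <- export_with_element(data = NULL, element = "PRECIP")
```

If this is the cause, either adding an element argument to the affected functions or requiring file_path explicitly would avoid the error.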
Export data in format for GeoCLIM https://earlywarning.usgs.gov/fews/software-tools/20
Daily, dekadal and monthly output should be possible.
Example format for monthly output attached.
The export_geoclim_dekad() function is shown below.
Should the file_path and ... parameters also be passed to export_geoclim()?
The export_geoclim_pentad() function has a similar problem.
export_geoclim_dekad <- function(data, year, dekad, element, metadata = NULL,
join_by = NULL, station_id,
latitude, longitude, add_cols = NULL,
file_path = paste0("GEOCLIM-",
element,
".csv"),
...) {
export_geoclim(data = data, year = year, type = "dekad", type_col = dekad,
element = element, metadata = metadata, join_by = join_by,
station_id = station_id, latitude = latitude,
longitude = longitude, add_cols = add_cols)
}
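A sketch of what forwarding file_path and ... could look like. Both functions are hypothetical stand-ins (the inner one records its inputs instead of writing a file), and the sketch assumes export_geoclim() accepts file_path and ... itself.

```r
# Stand-in for export_geoclim(): records what it received instead of
# writing a file. Hypothetical, for illustration only.
export_geoclim_stub <- function(data, type, file_path, ...) {
  list(type = type, file_path = file_path, extra = list(...))
}

# Wrapper that forwards file_path and ... to the inner function
export_dekad_stub <- function(data, element,
                              file_path = paste0("GEOCLIM-", element, ".csv"),
                              ...) {
  export_geoclim_stub(data = data, type = "dekad", file_path = file_path, ...)
}

# row.names = FALSE stands for any extra argument a caller might pass on
res <- export_dekad_stub(data = NULL, element = "rain", row.names = FALSE)
```

Without the forwarding, the wrapper's file_path default and any extra arguments would be silently dropped, which is the behaviour questioned above.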
Check if we can remove the following R files without it impacting elsewhere:
Need to create tests on:
make_factor
prepare_geoclim_* functions
spells function
summarise_inventory_data function
wwr_export (the output text file should match a known text, so we can write it and compare)
export_climat_messages function
export_* functions contain ... parameters. Please add tests that pass ... parameters that have an observable impact on the converted file
The histogram function should have similar main parameters to inventory_plot: data, date, elements, station
Below are examples that should be possible to produce.
I suggest we have a facet option which could be either none (as below), station, elements or station-elements. There should also be a nrow/ncol option which is used when facetting by a single variable.
If there are multiple histograms on one plot e.g. two elements given but facet set to none, then colour should be used to distinguish elements. Similarly for multiple stations.
Also include some of the common options like title, axis titles, bar colour/fill, axis break specification
General theme options don't need to be included.
There could also be an option, like in the R-Instat dialog to produce either a histogram, density plot, ridge plot or frequency polygon.
Example to follow.
This will be a general summary function, being able to summarise to different time periods e.g. hourly, daily, monthly, annual, using specified standard summaries e.g. mean, min, max, sum.
The functionality will be similar to the Climate Summaries dialog in R-Instat.
Flexible options for dealing with missing values should also be available.
Add a sample of data from the DARE project to recover data from West Africa that has agreed to be freely available as a dataset included with the package. Use the guidance here https://r-pkgs.org/data.html
I have been looking into best practice for writing functions that produce a ggplot2 graph:
https://rpubs.com/hadley/97970 - suggestions on functions and parameters
https://fishandwhistle.net/slides/rstudioconf2020/#1 - how to call things correctly to pass R package checks
https://icydk.com/how-to-write-functions-to-make-plots-with-ggplot2-in-r/ - idea of using glue package
I haven't quite found what I want which is how to make a function as flexible as possible without having 100s of parameters. The first link suggests optional parameters that can be lists of parameters to be passed to different bits of the plot e.g. bar.params = list(), errorbar.params = list().
I haven't yet found a suggestion on how to do this in general to cover all aspects of the plot.
We generally do not use tidy evaluation but should have a discussion on this soon.
@dannyparsons in some functions, such as climatic_missing, we use the tidy evaluation method.
I'll list here the functions we use tidy evaluation in, so that later we can either change these functions or update the functions not listed here to use tidy evaluation.
@dannyparsons I am adding documentation to more functions and so running examples.
prepare_geoclim_month
Perhaps this is an error, or perhaps I'm doing something wrong. If I run the following code I get an error that object 'month_abb_english' not found
I cannot find in the data where month_abb_english is defined, so am not sure how to fix this one!
prepare_geoclim_month(data = daily_niger, year = "year", month = "month",
station_id = "station_name",
element = "rain", metadata = stations_niger,
join_by = "station_name",
latitude = "lat", longitude = "long")
Also
date parameter which year and month can be created from
station not station_id
add_cols? (character(1)?)
I'm also getting an error in prepare_geoclim:
If I run the following then I get an error
dekad_data <- daily_niger %>%
dplyr::mutate(dekad = dekad(date))
prepare_geoclim(data = dekad_data, year = "year",
station_id = "station_name",
type_col = "dekad",
element = "rain", metadata = stations_niger,
join_by = "station_name",
latitude = "lat", longitude = "long")
Data export function to export data in CLIMAT Messages format as described here https://www.ncdc.noaa.gov/monitoring-references/dyk/climat and with detailed specification here https://library.wmo.int/doc_num.php?explnum_id=9253
This has details of available and missing data for the station elements selected.
Example to follow.
Use suggestion in https://stackoverflow.com/a/36575128/6424231 to make the tests accessible on an installed package
test-histogram_plot.R contains the tests below (currently commented out):
# t1_points <- histogram_plot(data = agades, date_time = "date",
# facet_by = "none",
# elements = "tmin", add_points = TRUE)
# t1_lobf <- histogram_plot(data = agades, date_time = "date",
# facet_by = "none",
# elements = "tmin", add_line_of_best_fit = TRUE)
# t1_path <- histogram_plot(data = agades, date_time = "date",
# facet_by = "none",
# elements = "tmin", add_path = TRUE)
# t1_step <- histogram_plot(data = agades, date_time = "date",
# facet_by = "none",
# elements = "tmin", add_step = TRUE)
When I try to execute these tests in RStudio, I get the following errors:
> t1_points <- histogram_plot(data = agades, date_time = "date",
+ facet_by = "none",
+ elements = "tmin", add_points = TRUE)
Error in histogram_plot(data = agades, date_time = "date", facet_by = "none", :
unused argument (add_points = TRUE)
> t1_lobf <- histogram_plot(data = agades, date_time = "date",
+ facet_by = "none",
+ elements = "tmin", add_line_of_best_fit = TRUE)
Error in histogram_plot(data = agades, date_time = "date", facet_by = "none", :
unused argument (add_line_of_best_fit = TRUE)
> t1_path <- histogram_plot(data = agades, date_time = "date",
+ facet_by = "none",
+ elements = "tmin", add_path = TRUE)
Error in histogram_plot(data = agades, date_time = "date", facet_by = "none", :
unused argument (add_path = TRUE)
> t1_step <- histogram_plot(data = agades, date_time = "date",
+ facet_by = "none",
+ elements = "tmin", add_step = TRUE)
Error in histogram_plot(data = agades, date_time = "date", facet_by = "none", :
unused argument (add_step = TRUE)
Seen a few inconsistent parameter names between functions.
E.g.
date_time vs date
element vs elements
station vs stations
I suggest we use: date, element, station.
@dannyparsons what do you think?
At the moment there's a GPL-3 and an MIT licence document in the repo. These are the two most common so I think we want one of these and then remove the other.
@volloholic Do you have a view on this?
@isedwards This is the repo for the R package that will have climatic functionalities from R-Instat/Climsoft products. I don't think the licence choice has any implications for use in OpenCDMS Processes since (from https://r-pkgs.org/license.html):
Note that simply using a package or R itself doesn’t require that you comply with the license; this is why you can write proprietary R code and why R packages can have any license you choose.
Any strong views on the package name? @isedwards @lilyclements
RInstatClmatic was just a placeholder. My current ideas are:
climate_products
cdms_products
Any other ideas?
Probably leaning towards cdms_products as it's shorter and fits with these being functions written for data from CDMS but not being too specifically tied to any one software package.
The Python layer uses the 'rpy2' Python package. This package allows Python code to refer directly to R functions, parameters and other objects. In order to make the Python code simple and readable, we should minimise the transformations needed between the Python and R layers.
However, R allows some things that are illegal in Python. For example, the climatic_summary() function uses the following practices that are illegal in Python:
The parameter names na.rm and summaries.params contain dots, which are not valid in Python names. To be consistent with Python, they should be named na_rm and summaries_params (or something similar).
The elements and summaries parameters are in the middle of the list; they should be before the first parameter with a default value.
The Python rpy2 package provides workarounds for the above but they add complexity and reduce readability.
Please would it be possible to avoid the above and only use names and function signatures that are also valid in Python?
Note: If I find new inconsistencies while I implement the Python layer, then I will add extra check boxes to the list above.
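The two practices above can be checked mechanically from a function's formals. The sketch below is a hypothetical helper, not part of the package, and example_fun only imitates the kind of signature described; it is not the real climatic_summary() signature.

```r
# Hypothetical check: report parameter names containing dots, and required
# parameters that appear after a parameter with a default value.
python_signature_issues <- function(fun) {
  args <- as.list(formals(fun))
  args <- args[names(args) != "..."]
  # A parameter with no default is stored as the empty symbol
  required <- vapply(args, function(a) identical(a, quote(expr = )),
                     logical(1))
  # Position of the first parameter that has a default, if any
  first_default <- if (all(required)) Inf else which(!required)[1]
  list(
    dotted_names = names(args)[grepl(".", names(args), fixed = TRUE)],
    required_after_default =
      names(args)[required & seq_along(args) > first_default]
  )
}

# Made-up signature using the practices described above
example_fun <- function(data, date_time, station = NULL, elements, summaries,
                        na.rm = FALSE, summaries.params = list()) NULL

issues <- python_signature_issues(example_fun)
```

A check like this could be run over all exported functions in a test, so that new signatures that are not valid in Python are caught early.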
@dannyparsons I'm getting an error when running the climatic_summary function, coming from this line:
cdms.products/R/climatic_summary.R
Line 216 in b0ecc5d
I'm not sure how to fix it. It seems to occur if I put an element in the elements parameter.
climatic_summary(data = daily_niger, date_time = "date", station = "station_name",
elements = c("rain", "tmax"))
I think dekad is the correct spelling
The naflex package for handling missing values in summary functions is required for this package.
I have now published it on CRAN so it can be referred to as a package dependency in the usual way.
Function should export data for use in https://github.com/ECCC-CDAS/RClimDex
The format is specified in Appendix B here https://github.com/ECCC-CDAS/RClimDex/blob/master/inst/doc/manual.pdf
Two examples are attached
This is very similar to the function for #18 but for dekadal data. The format is essentially the same. Examples attached.
dekadal_CDT_PRECIP-2000-2022.csv
dekadal_CDT_TMPMIN-2000-2022.csv