cdms.products's People

Contributors

dannyparsons, hawardketoyomsatsi, lilyclements

cdms.products's Issues

Create timeseries function

This will have a similar specification to the histogram function (#12).

Below is an example.

[Image: timeseries_graph]

Include the same facet options as for histogram, and other common options.

Some of these options from the R-Instat dialog could also be sensible to include as options:

[Image: options in the R-Instat dialog]

Bugs in the inventory_plot function

These might not be bugs, but confusion as to why they're not working. @dannyparsons if you could explain this to me, that would be great, then I can add it into the documentation.

  1. x_scale_from

If any one of x_scale_from, x_scale_to and x_scale_by is given, then all three have to be given; supplying only one or two of them does not work.
Should we set defaults in this case, with x_scale_by defaulting to 1 and x_scale_to defaulting to the maximum year in the date variable?

data(daily_niger)
# this does not work
inventory_plot(data = daily_niger, station = "station_name", elements = c("tmax", "tmin"),
               date = "date", x_scale_from = 1940, x_scale_by = 5)

# this does work
inventory_plot(data = daily_niger, station = "station_name", elements = c("tmax", "tmin"),
               date = "date", x_scale_from = 1940, x_scale_to = 1950, x_scale_by = 5)

  2. x_scale_from with year_doy_plot = TRUE

If x_scale_... is given with year_doy_plot = TRUE, then the maximum year does not seem to work correctly.

inventory_plot(data = daily_niger, station = "station_name", elements = c("tmax", "tmin"),
               date = "date", year_doy_plot = TRUE,
               x_scale_from = 1940, x_scale_to = 1950, x_scale_by = 5)

[Image: plot produced by the call above]

  3. facet_dir appears in the function's list of parameters but is never used in the function.
    Was this removed, but left in the list of parameters accidentally, or should it be in the function somewhere?

  4. rain_cats
    There is 0, 0.85, and an upper bound value. Is this essentially setting the threshold value? What are the lower and upper boundaries for?

  5. facet_by with year_doy_plot = TRUE
    I can't seem to get elements-stations to work when year_doy_plot = TRUE. Is this intentional? If so, why? I'll update the documentation.

inventory_plot(data = daily_niger, station = "station_name",
               year_doy_plot = TRUE,
               elements = c("tmax", "tmin"), date = "date",
               facet_by = "elements-stations")

  6. In addition, there are some parameters in the documentation where I'm unsure of their purpose. Can you have a look, or briefly outline their aim, so I can update the documentation?
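
For the first item (x_scale_from), one way the suggested defaults could work is sketched below. resolve_x_scale() is a hypothetical helper and not part of inventory_plot(); it assumes the three arguments default to NULL, and the fallback for x_scale_from (the minimum year) is an extra assumption beyond what is proposed above.

```r
# Hypothetical helper: fill in whichever x_scale_* arguments were not supplied.
resolve_x_scale <- function(years, x_scale_from = NULL, x_scale_to = NULL,
                            x_scale_by = NULL) {
  # If none of the three is given, leave the axis breaks to the plotting code.
  if (is.null(x_scale_from) && is.null(x_scale_to) && is.null(x_scale_by)) {
    return(NULL)
  }
  # Assumed defaults: data range for from/to, and a step of 1 year.
  if (is.null(x_scale_from)) x_scale_from <- min(years, na.rm = TRUE)
  if (is.null(x_scale_to)) x_scale_to <- max(years, na.rm = TRUE)
  if (is.null(x_scale_by)) x_scale_by <- 1
  seq(x_scale_from, x_scale_to, by = x_scale_by)
}

# The call that currently fails (x_scale_to missing) would then resolve to breaks.
breaks <- resolve_x_scale(years = 1940:1980, x_scale_from = 1940, x_scale_by = 5)
no_breaks <- resolve_x_scale(years = 1940:1980)
```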

Add test for climatic_summary that contains a `summaries_params` parameter

climatic_summary() has a summaries_params parameter.
This parameter has one list item for each summary function, and one list of parameters for each summary function e.g. 'list(mean = list(trim = 0.5))'.
In the Python wrapper, this parameter is translated from Python to R using some special code. I would like to test this translation.
Please could we add some climatic_summary() tests that have meaningful values for summaries_params (i.e. values that affect the output)?
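
As an illustration of what "values that affect the output" could look like, here is a small base R sketch. apply_summaries() is a hypothetical stand-in, not the climatic_summary() implementation, and it assumes summaries are given as a named character vector of function names. It shows that trim = 0.5 changes the result of mean on skewed data, so a test using it would detect a lost or mistranslated parameter.

```r
# Hypothetical helper: apply each named summary with its own extra arguments.
apply_summaries <- function(x, summaries, summaries_params = list()) {
  vapply(names(summaries), function(name) {
    extra_args <- summaries_params[[name]]
    if (is.null(extra_args)) extra_args <- list()
    # Call the summary function by name, passing x plus any extra arguments.
    do.call(summaries[[name]], c(list(x), extra_args))
  }, numeric(1))
}

x <- c(1, 2, 3, 100)

# Without extra parameters: the ordinary mean.
plain <- apply_summaries(x, summaries = c(mean = "mean"))

# With trim = 0.5 the trimmed mean reduces to the median, so the value differs.
trimmed <- apply_summaries(x, summaries = c(mean = "mean"),
                           summaries_params = list(mean = list(trim = 0.5)))
```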

The signature of `export_geoclim_month` is inconsistent with Python

In the function below, the parameters station_id, latitude, longitude come after parameters with default values. This is illegal in Python.

export_geoclim_month <- function(data, year, month, element, metadata = NULL,
                                 join_by = NULL, station_id,
                                 latitude, longitude, add_cols = NULL, 
                                 file_path = paste0("GEOCLIM-", element, ".csv"),
                                 ...)
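
One possible reordering is sketched below, with the parameters that have no default moved ahead of those that do. This is only a suggestion: the body is a placeholder rather than the real implementation, and moving parameters would break any existing calls that pass these arguments by position.

```r
# Sketch of a Python-compatible signature: required parameters come first.
export_geoclim_month <- function(data, year, month, element, station_id,
                                 latitude, longitude, metadata = NULL,
                                 join_by = NULL, add_cols = NULL,
                                 file_path = paste0("GEOCLIM-", element, ".csv"),
                                 ...) {
  # Placeholder body: the real function would prepare and write the data.
  invisible(file_path)
}

arg_order <- names(formals(export_geoclim_month))
default_path <- export_geoclim_month(data = NULL, year = "year", month = "month",
                                     element = "rain", station_id = "id",
                                     latitude = "lat", longitude = "lon")
```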

Climsoft Products to implement

  1. Data
  • 1.1 Mean or Total of minute/hourly/daily/pentad/dekadal/monthly/annual data
  • 1.2 Long term mean
  • 1.3 Max and min, with or without the corresponding date
  2. Graphics
  • 2.1 Windrose
  • 2.2 Timeseries chart
  • 2.3 Histograms
  3. Inventory
  • 3.1 Details of Data Records
  • 3.2 Inventory of Missing Data
  4. Messages
  • 4.1 CLIMAT Messages
  5. Output for other applications
  • 5.1 Instat Edit: Old Instat format no longer needed
  • 5.2 Rclimdex
  • 5.3 CPT
  • 5.4 GeoCLIM Monthly
  • 5.5 GeoCLIM Dekadal
  • 5.6 GeoCLIM Pentad
  • 5.7 CDT Daily
  • 5.8 CDT Dekad

Create extremes function

This is almost a special case of climatic_summary and will call this internally to calculate extreme values. A slightly different summary with a filter is needed to get date and occurrence of extreme events.
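
The "summary with a filter" idea can be sketched in base R as follows. extreme_with_date() and the example data are hypothetical; the real function is expected to build on climatic_summary() rather than on this code. The sketch computes the extreme per station and then filters the rows that attain it, so the date comes along with the value.

```r
# Illustrative data: two stations, three days each.
daily <- data.frame(
  station = c("A", "A", "A", "B", "B", "B"),
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03",
                   "2020-01-01", "2020-01-02", "2020-01-03")),
  tmax = c(30.1, 35.4, 33.0, 28.2, 27.9, 31.5)
)

# Hypothetical helper: per station, keep the rows where the element equals its extreme.
extreme_with_date <- function(data, station, date, element, fun = max) {
  pieces <- lapply(split(data, data[[station]]), function(d) {
    extreme <- fun(d[[element]], na.rm = TRUE)
    # Filter step: rows that attain the extreme (ties give several rows).
    d[!is.na(d[[element]]) & d[[element]] == extreme, c(station, date, element)]
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

maxima <- extreme_with_date(daily, "station", "date", "tmax")
minima <- extreme_with_date(daily, "station", "date", "tmax", fun = min)
```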

Add test for climatic_summary when there's no summaries provided

@HawardKetoyoMsatsi Lily found a bug here (#67) which wasn't picked up by the tests I had written, because I didn't include a test example where the summaries parameter is not provided, e.g. the example below.

climatic_summary(data = daily_niger, date_time = "date", station = "station_name",
                              elements = c("rain", "tmax"))

Could you add a test for this case? It can be any call to climatic_summary where summaries is not provided. In this case, it uses the default summary which is dplyr::n().

"element" not recognized in input (file_path = paste0("CDT-", element, ".csv"))

@lilyclements I put this error description earlier in PR #78 but I will be opening it as a new issue.

The files export_cdt.R, export_cdt_daily.R and export_cdt_dekad.R do not recognise "element" in the default input file_path = paste0("CDT-", element, ".csv"). This should be fixed for the tests to run.
Whenever I run the functions I get the error:

Error in paste0("CDT-", element, ".csv") : object 'element' not found
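
A likely cause, assuming these functions have no element parameter of their own, is that a default argument is evaluated lazily inside the function, so any name it uses must be a parameter or otherwise visible there. The two functions below are hypothetical minimal reproductions, not the real export_cdt code.

```r
# No `element` parameter: the default for file_path cannot be evaluated.
export_without_element <- function(data, file_path = paste0("CDT-", element, ".csv")) {
  file_path
}

# With an `element` parameter the same default evaluates without error.
export_with_element <- function(data, element,
                                file_path = paste0("CDT-", element, ".csv")) {
  file_path
}

# Capture the error message instead of stopping the script.
failure <- tryCatch(export_without_element(data = NULL),
                    error = function(e) conditionMessage(e))
success <- export_with_element(data = NULL, element = "rain")
```

If the element name really is not needed by these functions, the alternative would be a default that does not refer to it.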

Parameters ignored in `export_geoclim_dekad()` and `export_geoclim_pentad()`

The export_geoclim_dekad() function is shown below.
Should the file_path and ... parameters also be passed to export_geoclim()?
The export_geoclim_pentad() function has a similar problem.

export_geoclim_dekad <- function(data, year, dekad, element, metadata = NULL,
                                 join_by = NULL, station_id,
                                 latitude, longitude, add_cols = NULL,
                                 file_path = paste0("GEOCLIM-", 
                                                    element, 
                                                    ".csv"),
                                 ...) {
  export_geoclim(data = data, year = year, type = "dekad", type_col = dekad, 
                 element = element, metadata = metadata, join_by = join_by,
                 station_id = station_id, latitude = latitude, 
                 longitude = longitude, add_cols = add_cols)
}
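
If the answer is yes, the forwarding could look like the sketch below. The export_geoclim() defined here is only a stand-in that returns what it receives, and it assumes the real export_geoclim() accepts file_path and ... arguments; that has not been checked against the package.

```r
# Stand-in for the real export_geoclim(): reports the arguments it was given.
export_geoclim <- function(data, year, type, type_col, element, metadata = NULL,
                           join_by = NULL, station_id, latitude, longitude,
                           add_cols = NULL,
                           file_path = paste0("GEOCLIM-", element, ".csv"), ...) {
  list(type = type, file_path = file_path, extra = list(...))
}

export_geoclim_dekad <- function(data, year, dekad, element, metadata = NULL,
                                 join_by = NULL, station_id,
                                 latitude, longitude, add_cols = NULL,
                                 file_path = paste0("GEOCLIM-", element, ".csv"),
                                 ...) {
  export_geoclim(data = data, year = year, type = "dekad", type_col = dekad,
                 element = element, metadata = metadata, join_by = join_by,
                 station_id = station_id, latitude = latitude,
                 longitude = longitude, add_cols = add_cols,
                 # Forward the two parameters that are currently dropped.
                 file_path = file_path, ...)
}

res <- export_geoclim_dekad(data = NULL, year = "year", dekad = "dekad",
                            element = "rain", station_id = "id",
                            latitude = "lat", longitude = "lon",
                            file_path = "out.csv", row.names = FALSE)
```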

R functions to write tests on

Check if we can remove the following R files without it impacting elsewhere:

  • convert_to_dec_deg.R
  • get_lon_from_data.R
  • get_lat_from_data.R
  • lat_lon_dataframe.R

Need to create tests on:

  • Export functions (possibly: call the prepare function and export the result as CSV; run the export function, which creates a second CSV; read in both CSVs and check that they give the same data frame. Use short data to test this.)
  • Plot functions
  • make_factor
  • prepare_geoclim_* functions
  • spells function
  • CPT (perhaps split this into a prepare and export function)
  • summarise_inventory_data function
  • wwr_export (the output text file should match a known expected text, so we can write that text out and compare)
  • export_climat_messages function
  • export_* functions contain ... parameters. Please add tests that pass ... parameters that have an observable impact on the converted file
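
The round-trip idea for the export functions, and the check on ... parameters, might be sketched like this. prepare_example() and export_example() are hypothetical stand-ins for a real prepare_*/export_* pair, and the checks use plain base R rather than a specific test framework.

```r
# Hypothetical stand-ins for a prepare_*/export_* pair.
prepare_example <- function(data) {
  data[order(data$station, data$year), ]
}
export_example <- function(data, file_path, ...) {
  utils::write.csv(prepare_example(data), file = file_path, row.names = FALSE, ...)
}

short_data <- data.frame(station = c("B", "A", "A"),
                         year = c(2001, 2002, 2001),
                         rain = c(10.5, NA, 3.2))

# 1. Write the prepared data directly, then via the export function.
expected_file <- tempfile(fileext = ".csv")
actual_file <- tempfile(fileext = ".csv")
utils::write.csv(prepare_example(short_data), expected_file, row.names = FALSE)
export_example(short_data, actual_file)

# 2. Read both files back and compare the data frames.
round_trip_ok <- identical(utils::read.csv(expected_file),
                           utils::read.csv(actual_file))

# 3. Pass a `...` argument with an observable effect on the file.
na_file <- tempfile(fileext = ".csv")
export_example(short_data, na_file, na = "-999")
na_code_written <- any(grepl("-999", readLines(na_file), fixed = TRUE))
```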

Create histogram function

The histogram function should have similar main parameters to inventory_plot: data, date, elements, station

Below are examples that should be possible to produce.

I suggest we have a facet option which could be either none (as below), station, elements or station-elements. There should also be a nrow/ncol option which is used when faceting by a single variable.

If there are multiple histograms on one plot e.g. two elements given but facet set to none, then colour should be used to distinguish elements. Similarly for multiple stations.

Also include some of the common options like title, axis titles, bar colour/fill and axis break specification.

General theme options don't need to be included.

There could also be an option, like in the R-Instat dialog to produce either a histogram, density plot, ridge plot or frequency polygon.

[Image: histogram_graph_1_station]
[Image: histogram_graph_2_stations]

Create climate summary function

This will be a general summary function, being able to summarise to different time periods e.g. hourly, daily, monthly, annual, using specified standard summaries e.g. mean, min, max, sum.

The functionality will be similar to the Climate Summaries dialog in R-Instat.

Flexible options for dealing with missing values should also be available.
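
To make the missing-value idea concrete, here is a base R sketch of a monthly summary that returns NA when too much of a month is missing. summarise_monthly() and its max_prop_missing argument are hypothetical names for illustration, not the planned interface.

```r
# Hypothetical helper: monthly summary with a cap on the proportion of missing values.
summarise_monthly <- function(data, date, element, fun = mean,
                              max_prop_missing = 0.2) {
  month_id <- format(data[[date]], "%Y-%m")
  values <- tapply(data[[element]], month_id, function(x) {
    # Too many missing values in this month: report NA instead of a summary.
    if (mean(is.na(x)) > max_prop_missing) NA_real_ else fun(x, na.rm = TRUE)
  })
  data.frame(month = names(values), value = as.numeric(values))
}

daily <- data.frame(
  date = c(seq(as.Date("2020-01-01"), by = "day", length.out = 10),
           seq(as.Date("2020-02-01"), by = "day", length.out = 4)),
  rain = c(0, 5, NA, 2, 0, 0, 1, 0, 0, 1,   # January: 10% missing
           NA, NA, 3, 4)                     # February: 50% missing
)

monthly <- summarise_monthly(daily, date = "date", element = "rain")
```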

Best practice for ggplot2 wrapper functions

I have been looking into best practice for writing functions that produce a ggplot2 graph:

https://rpubs.com/hadley/97970 - suggestions on functions and parameters
https://fishandwhistle.net/slides/rstudioconf2020/#1 - how to call things correctly to pass R package checks
https://icydk.com/how-to-write-functions-to-make-plots-with-ggplot2-in-r/ - idea of using glue package

I haven't quite found what I want, which is how to make a function as flexible as possible without having 100s of parameters. The first link suggests optional parameters that can be lists of parameters to be passed to different bits of the plot, e.g. bar.params = list(), errorbar.params = list().

I haven't yet found a suggestion on how to do this in general to cover all aspects of the plot.
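
As a rough illustration of the list-of-parameters idea from the first link, the sketch below merges user-supplied lists into defaults and passes them on with do.call(). histogram_sketch(), hist_params and labs_params are hypothetical names, and this covers only two parts of the plot rather than solving the general problem.

```r
library(ggplot2)

# Hypothetical wrapper: each *_params list is forwarded to one part of the plot.
histogram_sketch <- function(data, element, hist_params = list(),
                             labs_params = list()) {
  # User-supplied values override the defaults; other defaults are kept.
  hist_defaults <- list(bins = 10, fill = "grey40")
  hist_layer <- do.call(geom_histogram,
                        utils::modifyList(hist_defaults, hist_params))
  ggplot(data, aes(x = .data[[element]])) +
    hist_layer +
    do.call(labs, labs_params)
}

example_data <- data.frame(tmin = c(18.2, 19.5, 21.0, 20.3, 17.8, 22.1))

p <- histogram_sketch(example_data, element = "tmin",
                      hist_params = list(bins = 5, colour = "black"),
                      labs_params = list(title = "Minimum temperature"))
```

The function keeps a short signature, but the accepted names inside each list are only documented by the underlying ggplot2 functions, which is part of the trade-off.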

Discussion on using "tidy evaluation" or not for column names

We generally do not use tidy evaluation but should have a discussion on this soon.

@dannyparsons in some functions, such as climatic_missing, we use the tidy evaluation method.

I'll list here the functions where we use tidy evaluation, so that later we can either change these functions or update the functions not listed here to use tidy evaluation.
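
For reference in the discussion, the two styles can be compared side by side. Both functions below are hypothetical examples written with dplyr, not functions from this package: the first takes column names as strings, the second takes bare column names using tidy evaluation.

```r
library(dplyr)

# Style 1: column names passed as strings.
mean_by_station_string <- function(data, station, element) {
  data %>%
    group_by(across(all_of(station))) %>%
    summarise(mean_value = mean(.data[[element]], na.rm = TRUE),
              .groups = "drop")
}

# Style 2: tidy evaluation, bare column names forwarded with {{ }}.
mean_by_station_tidy <- function(data, station, element) {
  data %>%
    group_by({{ station }}) %>%
    summarise(mean_value = mean({{ element }}, na.rm = TRUE),
              .groups = "drop")
}

example_data <- data.frame(
  station_name = c("Agades", "Agades", "Niamey", "Niamey"),
  rain = c(0, 4, 2, 6)
)

by_string <- mean_by_station_string(example_data, "station_name", "rain")
by_tidy <- mean_by_station_tidy(example_data, station_name, rain)
```

The string style is easier to call from other languages, since a column name is just a character value, which may matter for the Python layer.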

Errors in prepare_geoclim function

@dannyparsons I am adding documentation to more functions and so running examples.

prepare_geoclim_month
Perhaps this is an error, or perhaps I'm doing something wrong. If I run the code below, I get the error:

object 'month_abb_english' not found

I cannot find in the data where month_abb_english is defined, so am not sure how to fix this one!

prepare_geoclim_month(data = daily_niger, year = "year", month = "month",
                   station_id = "station_name",
                   element = "rain", metadata = stations_niger, 
                   join_by = "station_name",
                   latitude = "lat", longitude = "long")

Also

  1. Should this have an optional date parameter from which year and month can be created?
  2. Should this be station, not station_id?
  3. What length argument can be given to add_cols? (character(1)?)

I'm also getting an error in prepare_geoclim:
If I run the following then I get an error

dekad_data <- daily_niger %>%
  dplyr::mutate(dekad = dekad(date))

prepare_geoclim(data = dekad_data, year = "year",
                station_id = "station_name",
                type_col = "dekad",
                element = "rain", metadata = stations_niger, 
                join_by = "station_name",
                latitude = "lat", longitude = "long")

Invalid tests in `test-histogram_plot.R`

test-histogram_plot.R contains the tests below (currently commented out):

#   t1_points <- histogram_plot(data = agades, date_time = "date", 
#                                facet_by = "none",
#                                elements = "tmin", add_points = TRUE)
#   t1_lobf <- histogram_plot(data = agades, date_time = "date",
#                              facet_by = "none",
#                              elements = "tmin", add_line_of_best_fit = TRUE)
#   t1_path <- histogram_plot(data = agades, date_time = "date",
#                              facet_by = "none",
#                              elements = "tmin", add_path = TRUE)
#   t1_step <- histogram_plot(data = agades, date_time = "date",
#                              facet_by = "none",
#                              elements = "tmin", add_step = TRUE)

When I try to execute these tests in RStudio, I get the following errors:

>    t1_points <- histogram_plot(data = agades, date_time = "date", 
+                                 facet_by = "none",
+                                 elements = "tmin", add_points = TRUE)
Error in histogram_plot(data = agades, date_time = "date", facet_by = "none",  : 
  unused argument (add_points = TRUE)
>    t1_lobf <- histogram_plot(data = agades, date_time = "date",
+                               facet_by = "none",
+                               elements = "tmin", add_line_of_best_fit = TRUE)
Error in histogram_plot(data = agades, date_time = "date", facet_by = "none",  : 
  unused argument (add_line_of_best_fit = TRUE)
>    t1_path <- histogram_plot(data = agades, date_time = "date",
+                               facet_by = "none",
+                               elements = "tmin", add_path = TRUE)
Error in histogram_plot(data = agades, date_time = "date", facet_by = "none",  : 
  unused argument (add_path = TRUE)
>    t1_step <- histogram_plot(data = agades, date_time = "date",
+                               facet_by = "none",
+                               elements = "tmin", add_step = TRUE)
Error in histogram_plot(data = agades, date_time = "date", facet_by = "none",  : 
  unused argument (add_step = TRUE)

Parameter names - consistency

Seen a few inconsistent parameter names between functions.

E.g.

  1. date_time vs date
  2. element vs elements
  3. station vs stations

I suggest we use: date, element, station.

@dannyparsons what do you think?

Decide on copyright

At the moment there's a GPL-3 and an MIT licence document in the repo. These are the two most common so I think we want one of these and then remove the other.

@volloholic Do you have a view on this?

@isedwards This is the repo for the R package that will have climatic functionalities from R-Instat/Climsoft products. I don't think the licence choice has any implications for use in OpenCDMS Processes since (from https://r-pkgs.org/license.html):

Note that simply using a package or R itself doesn’t require that you comply with the license; this is why you can write proprietary R code and why R packages can have any license you choose.

Package name

Any strong views on the package name? @isedwards @lilyclements

RInstatClmatic was just a placeholder. My current ideas are:

  • climate_products
  • cdms_products

Any other ideas?

Probably leaning towards cdms_products as it's shorter and fits with these being functions written for data from CDMS but not being too specifically tied to any one software package.

Only use names and function signatures that are also valid in Python

The Python layer uses the 'rpy2' Python package. This package allows Python code to refer directly to R functions, parameters and other objects. In order to make the Python code simple and readable, we should minimise the transformations needed between the Python and R layers.

However, R allows some things that are illegal in Python. For example, the climatic_summary() function uses the following practices that are illegal in Python:

  • Some function parameters have a dot ('.') as part of their names. For example, na.rm and summaries.params. To be consistent with Python, they should be named na_rm and summaries_params (or something similar).
  • Some function parameters without default values are listed after parameters with default values. For example, the elements and summaries parameters are in the middle of the list; they should come before the first parameter with a default value.

The Python rpy2 package provides workarounds for the above but they add complexity and reduce readability.

Please would it be possible to avoid the above and only use names and function signatures that are also valid in Python?

Note: If I find new inconsistencies while I implement the Python layer, then I will add extra check boxes to the list above.
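
A small check along these lines could help spot both problems automatically. python_signature_issues() is a hypothetical helper, and example_fn only imitates the kind of signature described above; it is not the real climatic_summary() signature.

```r
# Hypothetical helper: report parameter names and orderings that are invalid in Python.
python_signature_issues <- function(fn) {
  args <- formals(fn)
  arg_names <- names(args)
  is_dots <- arg_names == "..."
  # A parameter without a default is stored as the empty symbol.
  has_default <- !vapply(args, function(a) {
    is.symbol(a) && !nzchar(as.character(a))
  }, logical(1))

  first_default <- which(has_default & !is_dots)[1]
  late_required <- if (is.na(first_default)) {
    character(0)
  } else {
    arg_names[!has_default & !is_dots & seq_along(arg_names) > first_default]
  }

  list(
    dotted_names = arg_names[grepl(".", arg_names, fixed = TRUE) & !is_dots],
    required_after_default = late_required
  )
}

# Imitation signature with both problems.
example_fn <- function(data, date_time, to = "annual", elements,
                       na.rm = FALSE, ...) NULL

issues <- python_signature_issues(example_fn)
```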

Refer to `naflex` package on CRAN

The naflex package for handling missing values in summary functions is required for this package.

I have now published it on CRAN so it can be referred to as a package dependency in the usual way.
