
quitte's Introduction

Bits and pieces of code to use with quitte-style data frames

R package quitte, version 0.3128.4


Purpose and Functionality

A collection of functions for easily dealing with quitte-style data frames, doing multi-model comparisons and plots.

Installation

To install the most recent package version, an additional repository has to be added in R:

options(repos = c(CRAN = "@CRAN@", pik = "https://rse.pik-potsdam.de/r/packages"))

The additional repository can be made available permanently by adding the line above to a file called .Rprofile stored in the home folder of your system (Sys.glob("~") in R returns the home directory).
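As a minimal sketch of the step above, the following helper appends the repository line to a profile file only if the line is not already there. The function name `add_pik_repo` and its `profile` argument are made up for this illustration and are not part of quitte; the demonstration writes to a temporary file rather than the real profile.

```r
# Hypothetical helper (not part of quitte): append the repository option to an
# R profile file unless the exact line is already present.
# Assumes an existing profile file ends with a newline.
add_pik_repo <- function(profile) {
  repo_line <- paste0(
    'options(repos = c(CRAN = "@CRAN@", ',
    'pik = "https://rse.pik-potsdam.de/r/packages"))'
  )
  existing <- if (file.exists(profile)) {
    readLines(profile, warn = FALSE)
  } else {
    character(0)
  }
  if (!repo_line %in% existing) {
    cat(repo_line, "\n", file = profile, sep = "", append = TRUE)
  }
  invisible(repo_line)
}

# Demonstration on a temporary file instead of the real "~/.Rprofile"
demo_profile <- tempfile(fileext = ".Rprofile")
add_pik_repo(demo_profile)
add_pik_repo(demo_profile)  # the second call leaves the file unchanged
readLines(demo_profile)
```

To modify the real profile, the same function could be called with "~/.Rprofile" as its argument.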

After that the most recent version of the package can be installed using install.packages:

install.packages("quitte")

Package updates can be installed using update.packages (make sure that the additional repository has been added before running that command):

update.packages()

Tutorial

The package comes with a vignette describing the basic functionality of the package and how to use it. You can load it with the following command (the package needs to be installed):

vignette("quitte-data-analysis") # REMIND/IAM Data Analysis Using quitte

Questions / Problems

In case of questions / problems please contact Michaja Pehl [email protected].

Citation

To cite package quitte in publications use:

Pehl M, Bauer N, Hilaire J, Levesque A, Luderer G, Schultes A, Dietrich J, Richters O (2024). quitte: Bits and pieces of code to use with quitte-style data frames. R package version 0.3128.4, <URL: https://github.com/pik-piam/quitte>.

A BibTeX entry for LaTeX users is

@Manual{,
 title = {quitte: Bits and pieces of code to use with quitte-style data frames},
 author = {Michaja Pehl and Nico Bauer and Jérôme Hilaire and Antoine Levesque and Gunnar Luderer and Anselm Schultes and Jan Philipp Dietrich and Oliver Richters},
 year = {2024},
 note = {R package version 0.3128.4},
 url = {https://github.com/pik-piam/quitte},
}

quitte's People

Contributors

0umfhxcvx5j7joaohfss5mncnistjj6q, aodenweller, fbenke-pik, giannou, johanneskoch94, mikapfl, nicobauer, orichters, pfuehrlich-pik, piklev, pre-commit-ci[bot], tscheypidi


quitte's Issues

calc_addVariable(): problem with "completeMissing = TRUE"

library(dplyr)
library(tibble)
library(quitte)

bla <-
  tibble(model = "M",
         scenario = c("A", "A", "B"),
         region = "R",
         variable = c("X", "Y", "X"),
         unit = "U",
         period = 2020,
         value = c(1, 2, 4)) %>%
  as.quitte()

# works
bla %>%
  calc_addVariable("Z" = "X + Y", units = "U",
                   na.rm = FALSE, completeMissing = FALSE)

# doesn't work
bla %>%
  calc_addVariable("Z" = "X + Y", units = "U",
                   na.rm = FALSE, completeMissing = TRUE)
# Error: Join columns must be present in data.
# x Problem with `cols` and `fill`.

Occurred after a package update, so presumably related to upstream changes in dplyr, tidyr, or similar packages.
Package versions:

dplyr_1.0.1
tidyr_1.1.1
quitte_0.3084.3

warning from replace_column with ambiguous mask

replace_column works silently even if provided with an ambiguous mask that maps values to multiple other values (i.e. columns of mask contain duplicates). In all my applications so far, this would be a mistake, and the result, with more rows than data, was not what I intended to calculate.
Do you think it makes sense to give a warning or even throw an error if this kind of mask is not explicitly allowed via a new argument? Or are my use cases just a very limited fraction of what this function should be able to do?
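To make the proposal concrete, here is a minimal base R sketch of such a check. The function name `ambiguous_keys` and the column names `from` and `to` are assumptions for this illustration; nothing here is part of quitte or reflects how replace_column is implemented.

```r
# Hypothetical check (not part of quitte): a mask is ambiguous for a key column
# if, after dropping identical rows, a key value still appears more than once,
# i.e. it maps to several different targets.
ambiguous_keys <- function(mask, key) {
  keys <- unique(mask)[[key]]
  unique(keys[duplicated(keys)])
}

mask <- data.frame(
  from = c("a", "b", "b"),
  to   = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# "b" maps to both "y" and "z", so it is reported as ambiguous
dup <- ambiguous_keys(mask, "from")
print(dup)
```

A caller could pass a non-empty result to warning() or stop(), depending on whether ambiguous masks were explicitly allowed.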

bug in calc_addVariable causing wrong additional output

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q There is a bug in calc_addVariable. Using only.new = F and setting variable to something other than "variable" might lead to additional unintended rows in the output data frame.
Example:

test <- data.frame(
  "efScen" = c("efCO2-00", "efCO2-00", "efCO2-05", "efCO2-05"),
  "variable" = c("Emi|CO2|Land-Use Change", "PE|Biomass|Energy Crops",
                 "Emi|CO2|Land-Use Change", "PE|Biomass|Energy Crops"),
  "unit" = c("Mt CO2/yr", "EJ/yr", "Mt CO2/yr", "EJ/yr"),
  "value" = c(1952.2740, 383.4084, 1443.3521, 330.4681)
)

test_add_var <- test %>%
  calc_addVariable("efCO2-diff" = "`efCO2-00` - `efCO2-05`",
                   variable = efScen)

I would expect the following:

> test
    efScen                variable      unit     value
1 efCO2-00 Emi|CO2|Land-Use Change Mt CO2/yr 1952.2740
2 efCO2-00 PE|Biomass|Energy Crops     EJ/yr  383.4084
3 efCO2-05 Emi|CO2|Land-Use Change Mt CO2/yr 1443.3521
4 efCO2-05 PE|Biomass|Energy Crops     EJ/yr  330.4681

> test_add_var
       efScen                variable      unit     value
1    efCO2-00 Emi|CO2|Land-Use Change Mt CO2/yr 1952.2740
4    efCO2-00 PE|Biomass|Energy Crops     EJ/yr  383.4084
5    efCO2-05 Emi|CO2|Land-Use Change Mt CO2/yr 1443.3521
8    efCO2-05 PE|Biomass|Energy Crops     EJ/yr  330.4681
9  efCO2-diff Emi|CO2|Land-Use Change      <NA>  508.9219
10 efCO2-diff PE|Biomass|Energy Crops      <NA>   52.9403

But this is what happens:

> test_add_var
       efScen                variable      unit     value
1    efCO2-00 Emi|CO2|Land-Use Change Mt CO2/yr 1952.2740
2    efCO2-00 Emi|CO2|Land-Use Change     EJ/yr 1952.2740
3    efCO2-00 PE|Biomass|Energy Crops Mt CO2/yr  383.4084
4    efCO2-00 PE|Biomass|Energy Crops     EJ/yr  383.4084
5    efCO2-05 Emi|CO2|Land-Use Change Mt CO2/yr 1443.3521
6    efCO2-05 Emi|CO2|Land-Use Change     EJ/yr 1443.3521
7    efCO2-05 PE|Biomass|Energy Crops Mt CO2/yr  330.4681
8    efCO2-05 PE|Biomass|Energy Crops     EJ/yr  330.4681
9  efCO2-diff Emi|CO2|Land-Use Change      <NA>  508.9219
10 efCO2-diff PE|Biomass|Energy Crops      <NA>   52.9403

That is, there are additional rows that did not exist before, where "Emi|CO2|Land-Use Change" is combined with "EJ/yr" and "PE|Biomass|Energy Crops" with "Mt CO2/yr". The value is equal to the row with the respective entry in "variable". If only.new = T, this does not happen.

vignette review

We have collected a list of minor recommendations for changes, but none of these are important:

  1. Move loading of exemplary data further up and make it clearer how to read in exemplary data from quitte (i.e. write "data <- quitte_example_data")
  2. Always put the explanation in front of the code for a function, e.g.
     • inline.data.frame()
     • order.levels()
  3. Add the explanation "By setting only.new = TRUE" to the text "It is also possible to only keep the newly calculated values for further manipulation."
  4. Format the line "(Note: You don't have to call the mip-functions via the double colon (::) operator. This is only needed in this vignette, since quitte can't import mip.)" as a Tip

One thing we noticed about the interpolation is that the results are the same for linear and spline interpolation. Is this how it is supposed to be?

New variable in calc_addVariable cannot have the same name as an existing variable

The function calc_addVariable does not allow a new variable to have the same name as an existing one. Is this intended or a bug? I would actually like the function to overwrite the old variable by the newly calculated one. Unit conversions would be a classic application of this functionality.

> df <- tibble(period = rep(2000:2002, 2),
+              variable = rep(c("a", "b"), each = 3),
+              value = 1:6)
> calc_addVariable(df, a = "a")
Error in `mutate()`:
! Problem while computing `a = f_eval(f = .dots[[i]], data = .)`.
Caused by error:
! object 'a' not found
Run `rlang::last_error()` to see where the error occurred.
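For illustration only, the overwrite behaviour requested above can be sketched in base R on the same data. The helper name `overwrite_variable` and the conversion factor of 1000 are assumptions made for this example; the sketch bypasses calc_addVariable entirely and says nothing about how the function works internally.

```r
# Hypothetical helper (not part of quitte): replace the values of one variable
# in a long-format data frame by a function of its current values.
overwrite_variable <- function(df, name, fun) {
  rows <- df$variable == name
  df$value[rows] <- fun(df$value[rows])
  df
}

df <- data.frame(period   = rep(2000:2002, 2),
                 variable = rep(c("a", "b"), each = 3),
                 value    = 1:6,
                 stringsAsFactors = FALSE)

# Example: convert "a" with an assumed factor of 1000, leaving "b" untouched
converted <- overwrite_variable(df, "a", function(x) x * 1000)
print(converted)
```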

[Feature request] fast way to read subset of variables into quitte df

Would it be possible to have a function to read only a subset of variables from one or multiple mif/csv files into a quitte dataframe? To make it fast, the best way is probably to pre-filter the file(s) using command line tools behind the scenes, and then to read only the filtered temporary file? (as suggested in previous discussions on Mattermost)
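A rough sketch of the pre-filtering idea, written in base R instead of command line tools so that it stays self-contained. The function name `read_variables` is hypothetical, the demo file is invented, and the sketch assumes a semicolon-separated file with a header line in which each variable name is enclosed by semicolons. It returns the wide table as read; conversion to a quitte data frame is not shown.

```r
# Hypothetical helper (not part of quitte): read only selected variables from a
# semicolon-separated file by filtering the raw lines before parsing them.
read_variables <- function(path, variables) {
  lines  <- readLines(path, warn = FALSE)
  header <- lines[1]
  body   <- lines[-1]
  # keep a line if it contains ";<variable>;" for any requested variable
  # (a simplification: the same text in another column would also match)
  keep <- Reduce(`|`, lapply(variables, function(v) {
    grepl(paste0(";", v, ";"), body, fixed = TRUE)
  }), rep(FALSE, length(body)))
  read.table(text = c(header, body[keep]), sep = ";", header = TRUE,
             quote = "", comment.char = "", check.names = FALSE,
             stringsAsFactors = FALSE)
}

# Invented demo file in a mif-like layout
demo_file <- tempfile(fileext = ".mif")
writeLines(c(
  "Model;Scenario;Region;Variable;Unit;2005;2010",
  "M;S;R;Consumption;billion US$2005/yr;1;2",
  "M;S;R;Emi|CO2;Mt CO2/yr;3;4",
  "M;S;R;PE|Biomass;EJ/yr;5;6"
), demo_file)

selected <- read_variables(demo_file, c("Emi|CO2", "PE|Biomass"))
print(selected)
```

Filtering with fixed strings avoids having to escape the "|" characters that are common in variable names.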

build fail - deprecation warning from forcats

Building piamInterfaces fails with the following:

══ Warnings ════════════════════════════════════════════════════════════════════
1. test summationFile without errors using AR6 ('test-checkSummations.R:22') - `fct_explicit_na()` was deprecated in forcats 1.0.0.
ℹ Please use `fct_na_value_to_level()` instead.
ℹ The deprecated feature was likely used in the quitte package.
  Please report the issue at <https://github.com/pik-piam/quitte/issues>.

══ DONE ════════════════════════════════════════════════════════════════════════
Before submission you need to take care of the following warnings:

test "test summationFile without errors using AR6": `fct_explicit_na()` was deprecated in forcats 1.0.0.
ℹ Please use `fct_na_value_to_level()` instead.
ℹ The deprecated feature was likely used in the quitte package.
  Please report the issue at <https://github.com/pik-piam/quitte/issues>.

You can find solutions to common problems at https://github.com/pik-piam/discussions/discussions/18
Error in lucode2::verifyTests(...) : The package tests produced warnings.
 Error in check(cran = cran, config = cfg) : lucode2::check failed

sum_total() subsets wrong groups

structure(
  list(
    region = c("US", "US", "US", "US", "EU28", "EU28", "EU28", "EU28", "EU28", "EU28"), 
    iso3c = c("PRI", "PRI", "USA", "USA", "ALA", "ALA", "AND", "AND", "AUT", "AUT"), 
    name = c("EEK", "kap", "EEK", "kap", "EEK", "kap", "EEK", "kap", "EEK", "kap"), 
    value = c(7.28297787369661e-07, 1.4825575525539, 5.47830041372924e-05, 
              196.700540269622, 7.1732435863781e-09, 0.00494520584427543, 
              1.98196459756622e-08, 0.0099356972852833, 9.80993573948498e-07, 
              0.955472798121879)), 
  row.names = c(NA, -10L), 
  class = c("tbl_df", "tbl", "data.frame")
) %>% 
  sum_total(group = iso3c)

Too much filling by completeMissing in calc_addVariable?

I am still struggling with calc_addVariable if completeMissing == TRUE (related to this comment). The function will fill much more "missing data combinations" with zero than I expected.

> data
  model variable period value
1    m1        A      1     1
2    m1        A      2     2
3    m1        A      3     3
4    m1        B      1     4
5    m1        B      2     5
6    m2        A      1     6
7    m2        A      2     7
8    m2        B      1     8
9    m2        B      2     9
> calc_addVariable(data, C = "A + B", completeMissing = TRUE)
   model variable period value
 1 m1    A             1     1
 2 m1    B             1     4
 3 m1    C             1     5
 4 m1    A             2     2
 5 m1    B             2     5
 6 m1    C             2     7
 7 m1    A             3     3
 8 m1    B             3     0
 9 m1    C             3     3
10 m2    A             1     6
11 m2    B             1     8
12 m2    C             1    14
13 m2    A             2     7
14 m2    B             2     9
15 m2    C             2    16
16 m2    A             3     0
17 m2    B             3     0
18 m2    C             3     0

In the case above, I don't see the need to fill values for period 3 in model m2 because none of the variables A and B are defined for this combination. So I would expect the result without rows 16-18. Row 8 on the other hand should be filled in order to calculate row 9.
If the current behavior should remain, is it possible to introduce a reduced version of the argument, like completeGaps, that would only fill missing variables for data combinations that exist for at least one of the variables used in the calculation?
I might have a special use case, but in EDGE-Buildings we often have the scenario history with values for historic periods and the scenarios SSP* for projected periods. calc_addVariable with completeMissing == TRUE introduces many unwanted zeros.
I can imagine this requires a bit more effort. So if there is no time for this feature (or my suggestion makes no sense at all), this issue is just to flag the behaviour I described. Thanks in any case!
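The reduced filling described above could look roughly like the following base R sketch. The function name `fill_gaps` is hypothetical, and the sketch only illustrates the requested behaviour on the example data; it is not how calc_addVariable is implemented, and it returns only the variables passed to it.

```r
# Sketch of the proposed behaviour (the name fill_gaps is hypothetical):
# add fill values only for combinations of the remaining columns that exist
# for at least one of the given variables.
fill_gaps <- function(data, variables, fill = 0) {
  others <- setdiff(names(data), c("variable", "value"))
  # combinations of the remaining columns that occur for any of the variables
  present <- unique(data[data$variable %in% variables, others, drop = FALSE])
  # every requested variable for every such combination (Cartesian product)
  grid <- merge(present,
                data.frame(variable = variables, stringsAsFactors = FALSE),
                by = NULL)
  out <- merge(grid, data, all.x = TRUE)
  out$value[is.na(out$value)] <- fill
  out[do.call(order, unname(as.list(out[c(others, "variable")]))), ]
}

data <- data.frame(
  model    = c(rep("m1", 5), rep("m2", 4)),
  variable = c("A", "A", "A", "B", "B", "A", "A", "B", "B"),
  period   = c(1, 2, 3, 1, 2, 1, 2, 1, 2),
  value    = 1:9,
  stringsAsFactors = FALSE
)

filled <- fill_gaps(data, c("A", "B"))
print(filled)
```

On the example data this adds a zero for variable B in model m1, period 3, and creates no rows for model m2 in period 3.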

Requirement for forcats >= 1.0.0 missing

Hi,

Since 0.3110.0, quitte is using forcats::fct_na_value_to_level, which is only available in forcats since version 1.0.0 according to the changelog. If my analysis is correct, it should be enough to include this requirement in the DESCRIPTION.
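In a DESCRIPTION file, such a requirement is normally written as a version constraint on the dependency, for example forcats (>= 1.0.0) in the Imports field. As a small sketch of the comparison behind that constraint, the helper below checks a version string against a minimum; the name `meets_minimum` and the older version string are assumptions for illustration.

```r
# Hypothetical helper: TRUE if an installed version string satisfies a minimum.
meets_minimum <- function(installed, required) {
  utils::compareVersion(installed, required) >= 0
}

old_ok <- meets_minimum("0.5.2", "1.0.0")  # an assumed older version string
new_ok <- meets_minimum("1.0.0", "1.0.0")
print(c(old = old_ok, new = new_ok))

# With forcats installed, the actual check would be:
# meets_minimum(as.character(utils::packageVersion("forcats")), "1.0.0")
```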

With an earlier version of forcats, installing quitte fails:

> install.packages("quitte")
Installing package into ‘/home/pflueger/R/x86_64-pc-linux-gnu-library/4.2’
(as ‘lib’ is unspecified)
trying URL 'https://pik-piam.r-universe.dev/src/contrib/quitte_0.3110.0.tar.gz'
Content type 'application/x-gzip' length 989055 bytes (965 KB)
==================================================
downloaded 965 KB

* installing *source* package ‘quitte’ ...
** using staged installation
** R
** data
*** moving datasets to lazyload DB
** inst
** byte-compile and prepare package for lazy loading
Error:
! object ‘fct_na_value_to_level’ is not exported by 'namespace:forcats'
Backtrace:
    ▆
 1. └─tools:::makeLazyLoading(...)
 2.   └─tools:::code2LazyLoadDB(...)
 3.     ├─base::suppressPackageStartupMessages(...)
 4.     │ └─base::withCallingHandlers(expr, packageStartupMessage = function(c) tryInvokeRestart("muffleMessage"))
 5.     └─base::loadNamespace(...)
 6.       └─base::namespaceImportFrom(...)
 7.         └─base::importIntoEnv(impenv, impnames, ns, impvars)
Execution halted
ERROR: lazy loading failed for package ‘quitte’
* removing ‘/home/pflueger/R/x86_64-pc-linux-gnu-library/4.2/quitte’
* restoring previous ‘/home/pflueger/R/x86_64-pc-linux-gnu-library/4.2/quitte’

The downloaded source packages are in
	‘/tmp/RtmpF4Ccfq/downloaded_packages’
Warning message:
In install.packages("quitte") :
  installation of package ‘quitte’ had non-zero exit status

Cheers,

Mika

default handling of Inf in read.quitte

Currently read.quitte("scenario.mif") throws a (non-helpful) warning if the mif file contains Inf or -Inf as values. This can be prevented by adding "Inf", "-Inf" to the argument na.strings, e.g., read.quitte("scenario.mif", na.strings = c("UNDF", "NA", "N/A", "n_a", "Inf", "-Inf")). It would be nice if either this would be the default or if "Inf" would be correctly parsed as the value Inf.
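The second option, parsing "Inf" as the value Inf, can be sketched in base R, since as.numeric() already understands the strings "Inf" and "-Inf". The function name `parse_values` is hypothetical and the sketch is independent of how read.quitte parses files.

```r
# Hypothetical helper (not part of quitte): turn value strings into numbers,
# mapping the listed NA strings to NA while keeping Inf and -Inf as such.
parse_values <- function(x, na.strings = c("UNDF", "NA", "N/A", "n_a")) {
  x[x %in% na.strings] <- NA_character_
  as.numeric(x)
}

parsed <- parse_values(c("1.5", "Inf", "-Inf", "N/A", "UNDF"))
print(parsed)
```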

add a comment header in mif file with a few lines of metadata (if mif is produced on the cluster)

  • the idea is to be able to link a mif file to a specific run path on the cluster, so when mif is created the run path is automatically written to a comment header after #
  • R has a function to skip hash comment line: https://stackoverflow.com/questions/28433328/skip-comment-line-in-csv-file-using-r
    it just needs to be made compatible with e.g. "read.report" type of magclass functions (or similar quitte function)
  • Besides the run path, it would also be nice to include library version and model version from the reporting and run, and maybe a pointer to file name indicating the existence of a reporting documentation (if there is one in the future)
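A minimal sketch of the idea in base R, assuming metadata lines start with "#" and precede the header line. The file contents, the metadata keys, and the function name `read_metadata` are invented for this example; compatibility with read.report or read.quitte is not shown.

```r
# Invented demo file: two metadata comment lines followed by mif-like data
demo_file <- tempfile(fileext = ".mif")
writeLines(c(
  "# run path: /path/to/run",   # hypothetical metadata
  "# model version: x.y.z",     # hypothetical metadata
  "Model;Scenario;Region;Variable;Unit;2005",
  "M;S;R;Consumption;billion US$2005/yr;1"
), demo_file)

# Hypothetical helper: return the comment lines without the leading "#"
read_metadata <- function(path) {
  lines <- readLines(path, warn = FALSE)
  sub("^#\\s*", "", lines[grepl("^#", lines)])
}

metadata <- read_metadata(demo_file)

# read.table() skips the comment lines when comment.char = "#".
# Caveat: a "#" inside a data field would truncate that line as well.
data <- read.table(demo_file, sep = ";", header = TRUE, comment.char = "#",
                   quote = "", check.names = FALSE, stringsAsFactors = FALSE)

print(metadata)
print(data)
```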

quitte::read.quitte() return object not recognized by tidyr::pivot_wider() as tibble

Hey Michaja,

so Silvia and I happened upon this little bug/anomaly:
When reading in a mif file with read.quitte(), the returned object is not recognized by pivot_wider() as a tibble. Even though is_tibble() returns TRUE, you have to pass the quitte object through as_tibble(), before pivot_wider() accepts it.

Below is some code describing the problem. Specifically a warning message that is only displayed every 8 hours(?). Does the problem maybe have something to do with the use of the deprecated data_frame() function?

So I was wondering if this is something you were aware of and if it needs fixing. I am glad to work on it, I just wanted to get your input on this.

Thanks a lot!

x<- read.quitte("output/default_2020-03-03_09.38.01/REMIND_generic_default.mif")
Warning message:
`data_frame()` is deprecated as of tibble 1.1.0.
Please use `tibble()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
x
A tibble: 356,649 x 7
model  scenario region variable    unit               period value
   <fct>  <fct>    <fct>  <fct>       <fct>               <int> <dbl>
 1 REMIND default  CAZ    Consumption billion US$2005/yr   2005  972.
 2 REMIND default  CHA    Consumption billion US$2005/yr   2005 3493.
 3 REMIND default  EUR    Consumption billion US$2005/yr   2005 7846.
 4 REMIND default  IND    Consumption billion US$2005/yr   2005  898.
 5 REMIND default  JPN    Consumption billion US$2005/yr   2005 2671.
 6 REMIND default  LAM    Consumption billion US$2005/yr   2005 1710.
 7 REMIND default  MEA    Consumption billion US$2005/yr   2005 1086.
 8 REMIND default  NEU    Consumption billion US$2005/yr   2005  858.
 9 REMIND default  OAS    Consumption billion US$2005/yr   2005 1402.
10 REMIND default  REF    Consumption billion US$2005/yr   2005  660.
… with 356,639 more rows
x %>%  pivot_wider(names_from = variable)
Error: `x` must be a vector, not a `tbl_df/tbl/data.frame/quitte` object.
Run `rlang::last_error()` to see where the error occurred.

character.data.frame returns malformed column names

It seems that with the latest version of the tibble package (tibble_3.0.0) the quitte function character.data.frame() returns malformed column names. With an older tibble version (tibble_2.1.1) it worked on my local machine. The issue occurred after updating tibble on my local machine.

The result of:
character.data.frame(as.quitte(read.report("REMIND_generic_SSP2-Base.mif",as.list = FALSE)))

Result with tibble_2.1.1:

# A tibble: 356,174 x 7
   model  scenario  region variable    unit               period value
   <chr>  <chr>     <chr>  <chr>       <chr>               <int> <dbl>
 1 REMIND SSP2-Base CAZ    Consumption billion US$2005/yr   2005 1122.
 2 REMIND SSP2-Base CHA    Consumption billion US$2005/yr   2005 1604.
 3 REMIND SSP2-Base EUR    Consumption billion US$2005/yr   2005 8753.
 4 REMIND SSP2-Base IND    Consumption billion US$2005/yr   2005  517.
 5 REMIND SSP2-Base JPN    Consumption billion US$2005/yr   2005 3032.
 6 REMIND SSP2-Base LAM    Consumption billion US$2005/yr   2005 1867.
 7 REMIND SSP2-Base MEA    Consumption billion US$2005/yr   2005  996.
 8 REMIND SSP2-Base NEU    Consumption billion US$2005/yr   2005  943.
 9 REMIND SSP2-Base OAS    Consumption billion US$2005/yr   2005 1300.
10 REMIND SSP2-Base REF    Consumption billion US$2005/yr   2005  570.
# ... with 356,164 more rows

Result with tibble_3.0.0:

# A tibble: 356,174 x 7
   model[,"model"] [,"scenario"] [,"region"] [,"variable"] [,"unit"] scenario[,"mode~ [,"scenario"] [,"region"] [,"variable"] [,"unit"] region[,"model"]
   <chr>           <chr>         <chr>       <chr>         <chr>     <chr>            <chr>         <chr>       <chr>         <chr>     <chr>           
 1 REMIND          SSP2-Base     CAZ         Consumption   billion ~ REMIND           SSP2-Base     CAZ         Consumption   billion ~ REMIND          
 2 REMIND          SSP2-Base     CHA         Consumption   billion ~ REMIND           SSP2-Base     CHA         Consumption   billion ~ REMIND          
 3 REMIND          SSP2-Base     EUR         Consumption   billion ~ REMIND           SSP2-Base     EUR         Consumption   billion ~ REMIND          
 4 REMIND          SSP2-Base     IND         Consumption   billion ~ REMIND           SSP2-Base     IND         Consumption   billion ~ REMIND          
 5 REMIND          SSP2-Base     JPN         Consumption   billion ~ REMIND           SSP2-Base     JPN         Consumption   billion ~ REMIND          
 6 REMIND          SSP2-Base     LAM         Consumption   billion ~ REMIND           SSP2-Base     LAM         Consumption   billion ~ REMIND          
 7 REMIND          SSP2-Base     MEA         Consumption   billion ~ REMIND           SSP2-Base     MEA         Consumption   billion ~ REMIND          
 8 REMIND          SSP2-Base     NEU         Consumption   billion ~ REMIND           SSP2-Base     NEU         Consumption   billion ~ REMIND          
 9 REMIND          SSP2-Base     OAS         Consumption   billion ~ REMIND           SSP2-Base     OAS         Consumption   billion ~ REMIND          
10 REMIND          SSP2-Base     REF         Consumption   billion ~ REMIND           SSP2-Base     REF         Consumption   billion ~ REMIND          
# ... with 356,164 more rows, and 16 more variables: [,"scenario"] <chr>, [,"region"] <chr>, [,"variable"] <chr>, [,"unit"] <chr>, variable[,"model"] <chr>,
#   [,"scenario"] <chr>, [,"region"] <chr>, [,"variable"] <chr>, [,"unit"] <chr>, unit[,"model"] <chr>, [,"scenario"] <chr>, [,"region"] <chr>, [,"variable"] <chr>,
#   [,"unit"] <chr>, period <int>, value <dbl>

interpolate_missing_periods falsely extrapolates in particular cases

I found an erratic behavior of interpolate_missing_periods when the data has periods before the interpolation period but is missing periods at the end of the interpolation period.

> df <- data.frame(period   = 2000:2004,
+                  variable = rep("a", 5),
+                  value    = sqrt(1:5))

> df
  period variable    value
1   2000        a 1.000000
2   2001        a 1.414214
3   2002        a 1.732051
4   2003        a 2.000000
5   2004        a 2.236068

> interpolate_missing_periods(df, 2002:2005)
# A tibble: 6 x 3
  variable period value
  <chr>     <int> <dbl>
1 a          2002  1.73
2 a          2003  2   
3 a          2004  2.24
4 a          2005  1.41
5 a          2000  1   
6 a          2001  1.41

To my understanding, the function should not extrapolate until 2005, as expand.values is FALSE. Instead, it fills missing periods at the end of the interpolation range with the last value from the periods before the interpolation range (here the 2001 value).
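For comparison, base R's approx() with rule = 1 shows the behaviour one might expect without extrapolation: periods outside the range of the data come back as NA. This is only a reference calculation on the same data, not a description of how interpolate_missing_periods works.

```r
df <- data.frame(period = 2000:2004,
                 value  = sqrt(1:5))

# Linear interpolation onto 2002:2005; rule = 1 returns NA outside the data range
expected <- approx(x = df$period, y = df$value, xout = 2002:2005,
                   method = "linear", rule = 1)$y

print(data.frame(period = 2002:2005, value = expected))
```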

write.mif could support append

magclass::write.report supports appending to an existing .mif-style output file. It would be nice to have this functionality also in write.mif.
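A rough base R sketch of what appending could look like, assuming semicolon-separated output and that all appended data share the same columns in the same order. The function name `append_rows` and the demo data are invented; this does not reflect the actual write.mif or write.report implementations.

```r
# Hypothetical helper: write a wide data frame to a semicolon-separated file,
# adding the header only when the file does not exist yet.
append_rows <- function(x, path) {
  file_present <- file.exists(path)
  write.table(x, path, sep = ";", quote = FALSE, row.names = FALSE,
              col.names = !file_present, append = file_present)
  invisible(path)
}

first <- data.frame(Model = "M", Scenario = "A", Region = "R",
                    Variable = "X", Unit = "U", `2020` = 1,
                    check.names = FALSE, stringsAsFactors = FALSE)
second <- data.frame(Model = "M", Scenario = "B", Region = "R",
                     Variable = "X", Unit = "U", `2020` = 4,
                     check.names = FALSE, stringsAsFactors = FALSE)

demo_file <- tempfile(fileext = ".mif")
append_rows(first, demo_file)   # creates the file with a header
append_rows(second, demo_file)  # appends without repeating the header
written <- readLines(demo_file)
print(written)
```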
