
streamMetabolizer's Introduction

streamMetabolizer: Models for Estimating Aquatic Photosynthesis and Respiration

! In summer or fall 2023, this package will move from
! https://github.com/USGS-R/streamMetabolizer to
! https://github.com/DOI-USGS/streamMetabolizer.
! Please update your links accordingly.

The streamMetabolizer R package uses inverse modeling to estimate aquatic photosynthesis and respiration (collectively, metabolism) from time series data on dissolved oxygen, water temperature, depth, and light. The package assists with data preparation, handles data gaps during modeling, and provides tabular and graphical reports of model outputs. Several time-honored methods are implemented along with many promising new variants that produce more accurate and precise metabolism estimates.

This package has been described, with special focus on the Bayesian model options, by Appling et al. 2018a. An application to 356 streams across the U.S. is described in Appling et al. 2018b.

Appling, A. P., Hall, R. O., Yackulic, C. B., & Arroita, M. (2018a). Overcoming equifinality: Leveraging long time series for stream metabolism estimation. Journal of Geophysical Research: Biogeosciences, 123(2), 624–645. https://doi.org/10.1002/2017JG004140

Appling, A. P., Read, J. S., Winslow, L. A., Arroita, M., Bernhardt, E. S., Griffiths, N. A., Hall, R. O., Harvey, J. W., Heffernan, J. B., Stanley, E. H., Stets, E. G., & Yackulic, C. B. (2018b). The metabolic regimes of 356 rivers in the United States. Scientific Data, 5(1), 180292. https://doi.org/10.1038/sdata.2018.292

To see the recommended citation for this package, please run citation('streamMetabolizer') at the R prompt.

citation('streamMetabolizer')
## 
## To cite streamMetabolizer in publications, please use:
## 
##   Appling, Alison P., Robert O. Hall, Charles B. Yackulic, and Maite
##   Arroita. “Overcoming Equifinality: Leveraging Long Time Series for
##   Stream Metabolism Estimation.” Journal of Geophysical Research:
##   Biogeosciences 123, no. 2 (February 2018): 624–45.
##   https://doi.org/10.1002/2017JG004140.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     author = {Alison P. Appling and Robert O. {Hall Jr.} and Charles B. Yackulic and Maite Arroita},
##     title = {Overcoming Equifinality: Leveraging Long Time Series for Stream Metabolism Estimation},
##     journal = {Journal of Geophysical Research: Biogeosciences},
##     year = {2018},
##     volume = {123},
##     number = {2},
##     doi = {10.1002/2017JG004140},
##     url = {https://github.com/USGS-R/streamMetabolizer},
##   }

Installation

To install the streamMetabolizer package, use the remotes package (running install.packages('remotes') first if needed). To use remotes::install_github() it is convenient to set a GitHub Personal Access Token (PAT). There are several methods for setting a PAT within R; the simplest is to call Sys.setenv(GITHUB_PAT="yyyy"), replacing yyyy with the PAT you established on the GitHub website.
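For example (a minimal sketch; yyyy is a placeholder, not a real token):

install.packages('remotes')      # if remotes is not yet installed
Sys.setenv(GITHUB_PAT = "yyyy")  # replace yyyy with your own PAT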

You may first need to install the unitted dependency:

remotes::install_github('appling/unitted')

You can then install the most cutting edge version of streamMetabolizer with this command:

remotes::install_github(
  "USGS-R/streamMetabolizer", # soon to be "DOI-USGS/streamMetabolizer"
  build_vignettes = TRUE)

Software dependencies for Bayesian models

The major dependency for Bayesian models is the rstan package, and installation of that package is rarely as simple as a call to install.packages(). Start at the rstan wiki page for the most up-to-date installation instructions, which differ by operating system.

Getting started

After installing and loading streamMetabolizer, run vignette() in R to see tutorials on getting started and customizing your metabolism models.

vignette(package='streamMetabolizer')
## displays a list of available vignettes

vignette('get_started', package='streamMetabolizer')
## displays an html or pdf rendering of the 'get_started' vignette

You can also view pre-built html versions of these vignettes in the “inst/doc” folder in the source code, e.g., inst/doc/get_started.html, which you can download and then open in a browser.

Development and Maintenance Status

streamMetabolizer is a USGS Archive Research Package.

Project funding has ended and our maintenance time is limited, but we do attempt to provide bug fixes and lightweight support as we are able. Submit questions or suggestions to https://github.com/USGS-R/streamMetabolizer/issues.

Contributing

We want to encourage a warm, welcoming, and safe environment for contributing to this project. See CODE_OF_CONDUCT.md for more information.

For technical details on how to contribute, see CONTRIBUTING.md.

Development History

streamMetabolizer was developed 2015-2018 with support from the USGS Powell Center (through a working group on Continental Patterns of Stream Metabolism), the USGS National Water Quality Program, and the USGS Office of Water Information.

Model Archive

The following versions of R and package dependencies were used most recently to pass the embedded tests within this package. There is no guarantee of reproducible results using future versions of R or updated versions of package dependencies; however, we aim to test this package in, and update it for, future modeling environments.

sessioninfo::session_info()

## ─ Session info ───────────────────────────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.2.3 (2023-03-15)
##  os       macOS Ventura 13.4.1
##  system   x86_64, darwin17.0
##  ui       RStudio
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       America/New_York
##  date     2023-07-02
##  rstudio  2023.06.0+421 Mountain Hydrangea (desktop)
##  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────
##  package           * version  date (UTC) lib source
##  cli                 3.6.1    2023-03-23 [1] CRAN (R 4.2.0)
##  deSolve             1.35     2023-03-12 [1] CRAN (R 4.2.0)
##  digest              0.6.32   2023-06-26 [1] CRAN (R 4.2.0)
##  dplyr               1.1.2    2023-04-20 [1] CRAN (R 4.2.0)
##  evaluate            0.21     2023-05-05 [1] CRAN (R 4.2.0)
##  fansi               1.0.4    2023-01-22 [1] CRAN (R 4.2.0)
##  fastmap             1.1.1    2023-02-24 [1] CRAN (R 4.2.0)
##  generics            0.1.3    2022-07-05 [1] CRAN (R 4.2.0)
##  glue                1.6.2    2022-02-24 [1] CRAN (R 4.2.0)
##  htmltools           0.5.5    2023-03-23 [1] CRAN (R 4.2.0)
##  knitr               1.43     2023-05-25 [1] CRAN (R 4.2.0)
##  LakeMetabolizer     1.5.5    2022-11-15 [1] CRAN (R 4.2.0)
##  lazyeval            0.2.2    2019-03-15 [1] CRAN (R 4.2.0)
##  lifecycle           1.0.3    2022-10-07 [1] CRAN (R 4.2.0)
##  lubridate           1.9.2    2023-02-10 [1] CRAN (R 4.2.0)
##  magrittr            2.0.3    2022-03-30 [1] CRAN (R 4.2.0)
##  pillar              1.9.0    2023-03-22 [1] CRAN (R 4.2.0)
##  pkgconfig           2.0.3    2019-09-22 [1] CRAN (R 4.2.0)
##  plyr                1.8.8    2022-11-11 [1] CRAN (R 4.2.0)
##  purrr               1.0.1    2023-01-10 [1] CRAN (R 4.2.0)
##  R6                  2.5.1    2021-08-19 [1] CRAN (R 4.2.0)
##  Rcpp                1.0.10   2023-01-22 [1] CRAN (R 4.2.0)
##  rLakeAnalyzer       1.11.4.1 2019-06-09 [1] CRAN (R 4.2.0)
##  rlang               1.1.1    2023-04-28 [1] CRAN (R 4.2.0)
##  rmarkdown           2.22     2023-06-01 [1] CRAN (R 4.2.0)
##  rstudioapi          0.14     2022-08-22 [1] CRAN (R 4.2.0)
##  sessioninfo         1.2.2    2021-12-06 [1] CRAN (R 4.2.0)
##  streamMetabolizer * 0.12.1   2023-07-02 [1] local
##  tibble              3.2.1    2023-03-20 [1] CRAN (R 4.2.0)
##  tidyr               1.3.0    2023-01-24 [1] CRAN (R 4.2.0)
##  tidyselect          1.2.0    2022-10-10 [1] CRAN (R 4.2.0)
##  timechange          0.2.0    2023-01-11 [1] CRAN (R 4.2.0)
##  unitted             0.2.9    2023-06-05 [1] Github (appling/unitted@d1f1172)
##  utf8                1.2.3    2023-01-31 [1] CRAN (R 4.2.0)
##  vctrs               0.6.3    2023-06-14 [1] CRAN (R 4.2.0)
##  xfun                0.39     2023-04-20 [1] CRAN (R 4.2.0)
##  yaml                2.3.7    2023-01-23 [1] CRAN (R 4.2.0)
## 
##  [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library

Disclaimer

This software is preliminary or provisional and is subject to revision. It is being provided to meet the need for timely best science. The software has not received final approval by the U.S. Geological Survey (USGS). No warranty, expressed or implied, is made by the USGS or the U.S. Government as to the functionality of the software and related material nor shall the fact of release constitute any such warranty. The software is provided on the condition that neither the USGS nor the U.S. Government shall be held liable for any damages resulting from the authorized or unauthorized use of the software.

streamMetabolizer's People

Contributors

aappling-usgs, appling, arroita, jesse-ross, jpadilla-usgs, ldecicco-usgs, waffle-iron, weisoon


streamMetabolizer's Issues

allow K600 to be specified as instantaneous values

You can currently (soon) specify K600 as daily values in metab_mle. But why not also allow instantaneous values, given that K600 will probably be predicted from Q with the nighttime regression method anyway? This would be cool but is nonessential, so marking it as a task for down the road.

clean up use of mm_model_by_ply

The same pattern applies, with minor modifications, to all descendants of metab_model and other uses of mm_model_by_ply. Here's the formatting I'll use:

#the metab_model function has args:
#' @inheritParams metab_model_prototype
#' @inheritParams mm_is_valid_day
#' @inheritParams model_specific_fun
metab_model <- function(
  data, data_daily, info, day_start=xx, day_end=xx, # inheritParams metab_model_prototype
  tests=c('full_day', 'even_timesteps', 'complete_data'), # inheritParams mm_is_valid_day
  model-specific args # inheritParams model_specific_fun
) {
  ...
  metab_model("metab_xxx", ...
    args=list(day_start=day_start, day_end=day_end, tests=tests, model-specific args) ...)
  ...
}

#it calls mm_model_by_ply with:
mm_model_by_ply(
  model_fun, data=data, data_daily=data_daily, # for mm_model_by_ply
  day_start=day_start, day_end=day_end, # for mm_model_by_ply and mm_is_valid_day
  tests=tests, # for mm_is_valid_day
  model-specific args # for model_specific_fun
)

#the model_fun function has args:
#' @inheritParams mm_model_by_ply_prototype
#' @inheritParams mm_is_valid_day
#' @inheritParams model_specific_fun
model_fun <- function(
  data_ply, data_daily_ply, day_start=xx, day_end=xx, local_date, # inheritParams mm_model_by_ply_prototype
  tests=c('full_day', 'even_timesteps', 'complete_data'), # inheritParams mm_is_valid_day
  model-specific args # inheritParams model_specific_fun
) {
  ...
  model_specific_fun(model-specific args)
  ...
}

This all gets simplified when mm_is_valid_day is not called and the first function is not a model constructor.

#the multi-day function has args:
#' @inheritParams model_specific_fun
metab_model <- function(
  data, data_daily, day_start=xx, day_end=xx, # inheritParams metab_model_prototype
  model-specific args # inheritParams model_specific_fun
) {
  ...
  metab_model("metab_xxx", ...
    args=list(day_start=day_start, day_end=day_end, model-specific args) ...)
  ...
}

#it calls mm_model_by_ply with:
mm_model_by_ply(
  model_fun, data=data, data_daily=data_daily, # for mm_model_by_ply
  day_start=day_start, day_end=day_end, # for mm_model_by_ply
  model-specific args # for model_specific_fun
)

#the model_fun function has args:
#' @inheritParams mm_model_by_ply_prototype
#' @inheritParams model_specific_fun
model_fun <- function(
  data_ply, data_daily_ply, day_start=xx, day_end=xx, local_date, # inheritParams mm_model_by_ply_prototype
  model-specific args # inheritParams model_specific_fun
) {
  ...
  model_specific_fun(model-specific args)
  ...
}

runjags and rjags installation

On Travis-CI, e.g., https://travis-ci.org/USGS-R/streamMetabolizer/builds/70088636#L1478:

* installing *source* package ‘runjags’ ...
** package ‘runjags’ successfully unpacked and MD5 sums checked
checking for prefix by checking for jags... no
configure: error: "Location of JAGS headers not defined. Use configure arg '--with-jags-include' or environment variable 'JAGS_INCLUDE'"
ERROR: configuration failed for package ‘runjags’
* removing ‘/usr/local/lib/R/site-library/runjags’

Saw similar issues with rjags on Condor before I [temporarily] removed it.

align default dates for all inter-operating models

For metab_mle, e.g., we may be using 6am to 6am for multiple days in a row. If this is the case, then the date for a metabolism estimate should refer to the 6am-12am period of that day, and the date for a separately estimated K600 value (e.g., from metab_night) should refer to the ~8pm-12am period of that night (i.e., the first rather than the last date represented by local.time for a time series used in metab_night).

add K600 argument to metab_mle

K600 can be a ts-style data.frame with Date as the first column. It needs to be passed through mm_model_by_ply to mle_1ply, filtered there to the date of the ply, and passed as a single number to the negloglik function. If it's possible to share code between the PRK and PR versions of metab_mle, that'll be great.
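A minimal sketch of what such an input and call might look like (the column names and the K600 argument are assumptions about the proposed API, not the implemented one):

# hypothetical ts-style daily K600 input, with Date as the first column
K600.daily <- data.frame(
  Date = as.Date(c("2015-06-01", "2015-06-02", "2015-06-03")),
  K600 = c(24, 27, 25))  # d^-1
metab_mle(data = dat, K600 = K600.daily)  # proposed, not yet implemented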

data interpolation functions

Data interpolation is currently outside the scope of this package; users need to provide pre-interpolated data for the models to work. (See the thread in #79.)

  1. It would be nice to help users with interpolation by providing functions to do it well. Some functionality is already in mda.streams (e.g., combine_ts) and could be ported over.

  2. This is a problem that has been solved for many other cases already; there's probably a good existing package for it. Look into existing options before getting fancy.
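For reference, base R can already do simple linear gap-filling; a minimal sketch (dat and its columns are hypothetical):

# approx() drops NA pairs, then interpolates back onto the full time axis
dat$doobs <- approx(x = as.numeric(dat$DateTime), y = dat$doobs,
                    xout = as.numeric(dat$DateTime))$y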

Create html_vignette for basic usage

This is probably lower priority, but once we have a full pass-through of data, it would be good to start documenting the patterns in a vignette.

calc_DO_mod

Port from core_model_metab_mle. Create a wrapper and/or shared internal function to keep this function efficient but also make it possible to add noise (observation, process, or autocorrelated process) for simulating data.
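A rough sketch of the three noise options this would enable (DO.mod is a fake modeled series; all values are illustrative):

DO.mod <- 8 + sin(seq(0, 2 * pi, length.out = 48))  # fake modeled DO curve
DO.obs <- DO.mod + rnorm(48, sd = 0.1)              # iid observation noise
DO.prc <- DO.mod + cumsum(rnorm(48, sd = 0.03))     # accumulating process noise
DO.ar1 <- DO.mod + as.numeric(arima.sim(list(ar = 0.7), n = 48, sd = 0.05))  # AR(1) process noise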

Include NA rows when there are missing data

Here's what I do to make sure the time lag from row to row is constant. Let me know if you want me to modify it or check the data sets! Hope it helps.

#Just to use an example
library(powstreams)
t <- "nwis_07239450"
d <- c("depth_calcDischHarvey","doobs_nwis","dosat_calcGGbts","wtr_nwis","par_calcLat")
dat <- get_ts(d, t)

#Create a reference date-time column. I used POSIXct format because this is the format of the data set.
#For each stream, starting and ending dates and frequency should be checked.
dtime <- seq(from=as.POSIXct("2007-10-01 05:00:00", tz="GMT"), to=as.POSIXct("2015-06-06 23:30:00", tz="GMT"), by="30 min")

#I checked the dimensions for both, and it seems in this particular case there are a lot of rows missing.
#Plotting the DateTime column works too.
dim(dat)
length(dtime)
plot(dat$DateTime)

#Match the DateTime column in your data set with the reference dtime column
depth <- dat$depth[match(dtime, dat$DateTime)]
doobs <- dat$doobs[match(dtime, dat$DateTime)]
#....

#I usually create a new data frame. It's also great to check that it matches the number of rows you expect given the number of days and frequency you have.
dat <- data.frame(dtime, depth, doobs)  # ...plus the other matched variables

implement hierarchical bayesian options

(expansion on issue #56 for hierarchical bayesian models)

All the hierarchical models I think we might implement include an expectation for a distribution of daily values of GPP, ER, and K600. They therefore usually include these lines:

for(d in 1:nday) {
  GPP.daily[d] ~ dnorm(GPP.daily.mu, GPP.daily.tau)
  ER.daily[d] ~ dnorm(ER.daily.mu, ER.daily.tau)
  K600.daily[d] ~ dnorm(K600.daily.mu, K600.daily.tau)
}

For non-hierarchical models, GPP.daily.mu, GPP.daily.tau, etc. are constants supplied by the user, and so the above lines can be the end of the model description.

For hierarchical variants, additional lines need to be included.

  • Constrain overall average P, R, and/or K, i.e., fit values for the probability distributions of GPP.daily.mu and/or GPP.daily.tau, etc.
# constrain the means
GPP.daily.mu ~ dnorm(GPP.daily.mu.mu, GPP.daily.mu.tau)
ER.daily.mu ~ dnorm(ER.daily.mu.mu, ER.daily.mu.tau)
K600.daily.mu ~ dnorm(K600.daily.mu.mu, K600.daily.mu.tau)
# and/or constrain the taus
GPP.daily.tau ~ dgamma(GPP.daily.tau.r, GPP.daily.tau.lambda)
ER.daily.tau ~ dgamma(ER.daily.tau.r, ER.daily.tau.lambda)
K600.daily.tau ~ dgamma(K600.daily.tau.r, K600.daily.tau.lambda)
  • Constrain day-to-day variation in P, R, and/or K, i.e., specify a normal distribution with mu=0 for the diffs between consecutive days, e.g., (GPP.daily[d] - GPP.daily[d-1])
for(d in 2:nday) {
  # equivalent to putting dnorm(0, GPP.daily.diff.tau) on the day-to-day diff;
  # JAGS does not allow a node to be both deterministic and stochastic
  GPP.daily[d] ~ dnorm(GPP.daily[d-1], GPP.daily.diff.tau)
  # and so on for ER and/or K600
}
  • Constrain K to be near the daily K values estimated by, e.g., nighttime regression, i.e., accept daily prior means for K600.daily (K600.daily.mu[d]) and either set or fit the value of K600.daily.tau for K600.daily[d] ~ dnorm(K600.daily.mu[d], K600.daily.tau)
# replace the first code chunk in this issue comment with these lines:
for(d in 1:nday) {
  GPP.daily[d] ~ dnorm(GPP.daily.mu, GPP.daily.tau)
  ER.daily[d] ~ dnorm(ER.daily.mu, ER.daily.tau)
  # K600.daily[d] ~ dnorm(K600.daily.mu, K600.daily.tau) # replace this with the following:
  K600.daily.fQ[d] <- K600.A + K600.B*Q[d]
  K600.daily[d] ~ dnorm(K600.daily.fQ[d], K600.daily.tau)
  # K600.A and K600.B are K vs. Q coefficients that are either given as inputs or are fit here, e.g., by
  #   K600.A ~ dnorm(K600.A.mu, K600.A.tau)
  #   K600.B ~ dnorm(K600.B.mu, K600.B.tau)
}

Could do any combination of the above. Choosing among these options requires selecting different values for:

  • the JAGS txt file, which specifies the hierarchy to assume. See the code chunks above.
  • the constants and/or daily values supplied to metab_bayes, then passed to prepjags_bayes_simple, then included in the jags dataList, and ultimately made available within the JAGS model, because the priors differ for each option. The necessary priors are implied by the above code chunks.
  • the outputs stored in the metab_bayes@fit slot, because different parameters are fit in each option. The useful outputs will include any parameters that are fit in a given variant.

function[s] to interpolate K600

model_K600 - produce a model to predict daily K from a smaller number of daily K estimates (e.g., from metab_night). Start with a spline of K vs. Q, with mean(K) replacing Q bins that have few observations.

calc_K600_interp - accept an output from model_K600 and a ts of discharge and/or dates; produce a ts of daily K600 estimates

Alternatively, calc_K600_interp could do it all, first modeling and then predicting. This would be simpler for multi-part models like the spline + means, but it makes it harder to access the model fits themselves. Maybe give an option to also return the model[s]?
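A minimal sketch of the two-function flavor, assuming a smoothing spline on log(Q) is adequate (the transform and spline choice are illustrative, not decided):

# fit a K vs. Q model from a small set of daily K estimates (e.g., metab_night)
model_K600 <- function(K600, Q) {
  smooth.spline(log(Q), K600)
}
# use the fitted model plus a ts of daily discharge to produce daily K600
calc_K600_interp <- function(K600.model, Q) {
  predict(K600.model, x = log(Q))$y
}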

take advantage of data_daily handling for mm_predict_1ply

mm_predict_1ply currently picks out the right row from metab_ests. But now that mm_model_by_ply splits both instantaneous and daily data, mm_predict_1ply should be able to pass metab_ests as the daily data and expect the row to be picked out for it.

is mm_is_valid_day::need_complete obsolete?

Think about it. It might be, now that mm_data records which columns are optional.

If it's not, then more functions ought to be accepting and passing this argument. metab_bayes, metab_mle, etc. would be candidates.

switch from local.time to solar.time

It doesn't actually matter which time we use as long as it has regular time steps (i.e., not apparent solar time), is interpretable (UTC is a poor choice because in plots/tables you'll have to guess at the timing of peak sun), and comes close to representing solar time (because model_by_ply splits data up into days according to the hours specified in this time column).

Both local.time (standard) and solar.time (mean) are decent candidates, but solar.time is best because it's even more closely aligned with true solar time. Users will generally have to calculate it anyway in order to model light.

Does this mean we should also switch from 'local.date' to 'solar.date'? Or just 'date', maybe.
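For reference, a sketch of the conversion using the helper the package now exports (check ?calc_solar_time for the current signature):

library(streamMetabolizer)
utc.time <- as.POSIXct("2015-06-01 12:00:00", tz = "UTC")
solar.time <- calc_solar_time(utc.time, longitude = -106.3)  # mean solar time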

limiting dependencies?

How careful do we want to be about additional dependencies? For example, in PR #34 I just added lubridate to aid in a single function, but I could probably remove that dependency if needed. Not sure yet whether lubridate will be useful in functions we'll be writing later.

handle tryCatch warnings better

I've learned how to catch warnings better (something about the muffleWarning restart). This needs to be done in mle_1ply and bayes_simple_1ply in particular.
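A minimal sketch of that base-R mechanism, for reference:

result <- withCallingHandlers(
  log(-1),                                    # emits NaN plus a warning
  warning = function(w) {
    message("caught: ", conditionMessage(w))  # record the warning
    invokeRestart("muffleWarning")            # then suppress it
  })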

calc_depth.R

Need a function that takes discharge (or something fancier, later) and returns an estimate of depth in m.
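A minimal sketch of the hydraulic-geometry approach such a function could take (depth = c * Q^f; the coefficient defaults below are illustrative assumptions, not calibrated values):

calc_depth <- function(Q, c = 0.4, f = 0.3) {
  c * Q ^ f  # Q in m^3 s^-1, depth in m
}
calc_depth(Q = 10)  # ~0.8 m with these illustrative coefficients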

travis & appveyor issues

As of 6a3afb2:

AppVeyor is currently failing because unitted has that unitted_ordered S4 problem in the new (devel?) version[s] of R.

Travis is failing here, in a testthat check: https://travis-ci.org/USGS-R/streamMetabolizer#L7019

Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
         time.type = "standard"))
  13: with_tz(local.time, "GMT")
  14: is.POSIXlt(time)
  15: is(x, "POSIXlt")
  16: convert_GMT_to_localtime(adate, latitude = 40, longitude = -103.8, time.type = "standard")
  17: stop("sorry, could not find time zone for specified lat/long")

This works on my computer, so I suspect it has to do with the call from a Travis process to Google to get the local time zone.

plan interface for the suite of expected models

(This issue will be modified as I continue to think about it)

Desired models:

  • Day-by-day MLE with observation error to estimate P+R+K
  • Day-by-day Bayes with observation error to estimate P+R+K
  • Day-by-day MLE with process error to estimate P+R+K
  • Day-by-day Bayes with process error to estimate P+R+K
  • Nighttime regression by OLS to estimate K
  • Bayesian hierarchical approach to estimate K vs Q function?
  • Day-by-day MLE with observation error to estimate P+R given K
  • Day-by-day Bayes with observation error to estimate P+R given K
  • Day-by-day MLE with process error to estimate P+R given K
  • Day-by-day Bayes with process error to estimate P+R given K
  • Hierarchical Bayes with observation error to estimate P+R+K with which hierarchy? So many options. Could do any combination of the following. See issue #57 for more.
    • Constrain overall average (mean and/or tau) P, R, and/or K.
    • Constrain day-to-day variation in P, R, and/or K.
    • Constrain K to be near the daily K values estimated by nighttime regression

Options shared across MLE and Bayesian models

  • observation vs process error: calc_DO_fun = c('calc_DO_mod', 'calc_DO_mod_by_diff')
  • date delineation: c(start_hour, end_hour)
  • if taking K as given, then ts of K values should be supplied as an arg to metab_xxx
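A hypothetical illustration of how these shared options might combine in a single call (argument names follow the wording above, not necessarily the final API):

metab_mle(
  data = dat,
  calc_DO_fun = 'calc_DO_mod_by_diff',  # process rather than observation error
  day_start = 6, day_end = 30,          # 6am-to-6am date delineation
  K600 = K600.daily)                    # ts of K values if taking K as given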

MLE models: metab_mle

  • constant parameters: inits are c(GPP=3, ER=-5, K600=5)
  • if taking K as given, then we should use a variant on onestation_negloglik that doesn't expect K600.daily among the params

Bayesian models: metab_bayes

  • if non-hierarchical (independent days), constant parameters: DO.err.tau.shape=0.001, DO.err.tau.rate=0.001, GPP.daily.mu = 10, GPP.daily.tau = 1/(10^2), ER.daily.mu = -10, ER.daily.tau = 1/(10^2), K600.daily.mu = 10, K600.daily.tau = 1/(10^2)
  • if hierarchical, use mm_model_by_ply to produce new input data with non-overlapping (partially copied) plys
  • if taking K as given, then we should use variants on prepjags_bayes_simple and runjags_bayes_simple that don't expect K600.daily among the params to estimate

Questions

  • Is it OK to have overlapping days when modeling consecutive days? Does it matter whether the model is distinct for each day vs hierarchical using the distribution of daily estimates?
  • What to do about hierarchical models for which we have missing days? Can we ignore that there are gaps?

simulate ER to compare the effectiveness of different daily time windows

See #56. From that thread, here's a copy of Bob's thoughts on simulating ER data to evaluate the utility of ~31 hour days in metabolism estimation:

I think we should base how we solve for ER on data and not a hunch one way or the other. The thing to do would be to generate a month-long time series with varying and known ER and then try both approaches and see which gives back the best ER. The key question is how to generate the fake data. We do not want ER varying randomly. Maybe allow it to wander, or put in a shock (say, a flood) lowering ER, after which it recovers.

One thing about ER is that it is particularly tricky to measure. Unlike GPP, which is a relative change in O2, ER is an absolute difference. So from an estimation perspective it is probably best to use both nights to get more data. Yes, it adds autocorrelation, but if ER is not biologically autocorrelated, then we have big problems. The one way that we could make a mistake with using both nights is in high-GPP streams where the ER on any one night is a function of GPP the day before, so that daily variation in ER responds to the daily variation in GPP. Furthering the problem, ER will change through the night as the stream temperature changes or as the yummy carbon from the day's photosynthesis gets eaten up. And ER during the day might be 2-10 times higher than at night, but there is not much we can do about that.

Hmm, that might be a way to vary ER with fake data: vary GPP and make ER a fraction of GPP above some base, as did Hall and Beaulieu.
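A minimal sketch of that simulation idea (all parameter values are illustrative):

set.seed(1)
nday <- 30
GPP <- pmax(0, 5 + cumsum(rnorm(nday, sd = 0.5)))  # let GPP wander day to day
ER <- -(2 + 0.4 * GPP)                             # ER = base + fraction of GPP
ER[15:17] <- ER[15:17] * 0.5                       # flood shock lowers ER,
                                                   # then ER recovers with GPP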

Model to simulate data

This might not make sense, but a question: I agree that simulated data should be created including the process error, but can the model used to create the data influence the results of the comparison between this model and another (the one without process error)? I mean, could someone say that results obtained with model A are better because the data were created with model A? This is why the objective of confirming we've made the right assumption about the presence of process error in real data was included.

We need to figure out what phi should be to create the simulated data.
