Code Monkey home page Code Monkey logo

rmo's People

Contributors

hsloot avatar

Stargazers

 avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

gzeller1 lexurifl

rmo's Issues

[REVIEW] Test concept and custom expectations

Checklist

  • Usability
  • Output

Proposal

The test concept is based on using the function expect_equal_sampling_result to easily compare to sampling algorithms against each other for a given parameter-set, a number of simulations, and dimension.

It was already evident, that testing the sample_cpp function in this concept was not straightforward. Thus a change might be necessary.

[REFACTOR] Parametrization of the exchangeable Markov model

Summary

The exchangeable Markov model is currently parametrized with the exchangeable intensities $\lambda_i$. However, the relevant quantity for the algorithm are the scaled exchangeable intensities $\binom{d}{I} \lambda_i$. Computationally, it is desirable to calculate the latter directly.

Proposal

Rewrite the algorithm such that the parameter ex_intensity has the interpretation of the scaled exchangeable intensity.

Additional context

Here are three mathematically equivalent ways to calculate the intensity matrix

qmatrix1 <- function(bf, d) {
  ex_intensities <- sapply(1:d, function(i) valueOf(bf, d-i, i))
  out <- matrix(0, nrow = d+1, ncol = d+1)
  for (i in 0:d) {
    if (i < d) {
      for (j in (i+1):d) {
        out[1+i, 1+j] <- rmo:::multiply_binomial_coefficient(
          sum(sapply(
            0:i, 
            function(k) {
              rmo:::multiply_binomial_coefficient(ex_intensities[k + j-i], i, k)
            })), d-i, j-i)
      }
      out[1+i, 1+i] <- -sum(out[1+i, ])
    }
  }
  
  out
}
qmatrix2 <- function(bf, d) {
  out <- matrix(0, nrow = d+1, ncol = d+1)
  for (i in 0:d) {
    if (i < d) {
      for (j in (i+1):d) {
        out[1+i, 1+j] <- valueOf(bf, d-j, j-i, n=d-i, k=j-i)
      }
      out[1+i, 1+i] <- -valueOf(bf, d-i, n = d-i, k = j-i)
    }
  }
  
  out
}
qmatrix3 <- function(bf, d) {
  ex_intensities <- sapply(
    1:d, 
    function(i) {
      valueOf(bf, d-i, i, n = d, k = i)
      })
  out <- matrix(0, nrow = d+1, ncol = d+1)
  out[1, -1] <- ex_intensities
  out[1, 1] <- -sum(out[1, -1])
  for (i in 1:d) {
    if (i < d) {
      for (j in (i+1):d) {
        out[1+i, 1+j] <- (d-j+1)/(d-i+1) * out[i, j] + (j+1-i)/(d-i+1) * out[i, 1+j]
      }
      out[1+i, 1+i] <- -sum(out[1+i, ])
    }
  }
  
  out
}

[BUG] Integration test shows that there are numerical problems for large dimension `d > 100`

Description

It was discovered in the integration test that the representation of the distribution via intensities might lead to numerical problems for large dimension parameters and certain parametrisations.

  1. Certain Kolmogorov-Smirnov test rejects Null-Hypothesis for rmo_ex_arnold and large d.
  2. rmo_lfm_cpp sometimes produces ties for large d.

reprex (1)

library(rmo)
n <- 1e4L
alpha <- 0.4
bf <- AlphaStableBernsteinFunction(alpha=alpha)

d <- 10L
valueOf(bf, d, 0L)
#> [1] 2.511886
ex_intensities <- ex_intensities_alpha_stable(d, alpha=alpha)
scale <- sum(ex_intensities * sapply(1:d, function(x) choose(d, x)))
## Kolmogorov-Smirnov test should not reject
ks.test(scale * apply(rmo_ex_arnold(n, d, ex_intensities), 1, min), pexp)
#> 
#>  One-sample Kolmogorov-Smirnov test
#> 
#> data:  scale * apply(rmo_ex_arnold(n, d, ex_intensities), 1, min)
#> D = 0.0069895, p-value = 0.713
#> alternative hypothesis: two-sided
## Should be very similar
all.equal(scale, valueOf(bf, d, 0L))
#> [1] "Mean relative difference: 2.618647e-07"

d <- 125L
valueOf(bf, d, 0L)
#> [1] 6.898648
ex_intensities <- ex_intensities_alpha_stable(d, alpha=alpha)
scale <- sum(ex_intensities * sapply(1:d, function(x) choose(d, x)))
## Kolmogorov-Smirnov test should not reject
ks.test(scale * apply(rmo_ex_arnold(n, d, ex_intensities), 1, min), pexp)
#> 
#>  One-sample Kolmogorov-Smirnov test
#> 
#> data:  scale * apply(rmo_ex_arnold(n, d, ex_intensities), 1, min)
#> D = 0.067735, p-value < 2.2e-16
#> alternative hypothesis: two-sided
## Should be very similar
all.equal(scale, valueOf(bf, d, 0L))
#> [1] "Mean relative difference: 0.0002861992"

Created on 2020-09-19 by the reprex package (v0.3.0)

reprex (2)

library(rmo)

n <- 1e4L
d <- 125L

set.seed(1623L)
length(unique(apply(rmo_lfm_cpp(n, d, 0.1, 0.2, 0.4, "rexp", list("rate" = 0.6)), 1, min)))
#> [1] 9998

Created on 2020-09-19 by the reprex package (v0.3.0)

Scientific documentation

Feature request: Scientific documentation of Marshall-Olkin distributions

For a good collaboration, the background material must be accessible to all possible collaborators.

To-Do's

  • Create a document for background information

Write a section on

  • the classical definition of Marshall-Olkin distributions
  • the exogenous shock model
  • the Arnold model
  • operations on Marshall-Olkin distributions
  • the exchangeable Marshall-Olkin distribution
  • the modification of the Arnold model for exchangeable distributions
  • the extendible Marshall-Olkin distribution
  • the Cuadras-Augé distribution
  • Lévy frailty models
  • Models based on hierarchical building blocks

References

Mai, J. F., & Scherer, M. (2017). Simulating Copulas.

[REVIEW] Implementation of the exchangeable Arnold model

Checklist

  • Rcpp implementation
  • Original R implementation
  • Bivariate R implementation

Proposal

Review the implementation of the exogenous shock model in Rcpp:

rmo/src/rmo.cpp

Lines 114 to 164 in cf85b82

NumericMatrix Rcpp__rmo_ex_arnold(unsigned int n, unsigned int d, NumericVector ex_intensities) {
double tmp, waiting_time;
unsigned int state;
double total_intensity;
NumericVector transition_probs;
NumericVector values;
IntegerVector perm;
NumericMatrix generator_matrix(d+1, d+1);
for (unsigned int i=0; i<d+1; i++) {
for (unsigned int j=0; j<d+1; j++) {
if (j < i) {
generator_matrix(i, j) = 0.;
} else if (j > i) {
tmp = 0.;
for (unsigned int k=0; k<i+1; k++) {
tmp += R::choose(i, k) * ex_intensities[k+j-i-1];
}
tmp *= R::choose(d-i, (j-i));
generator_matrix(i, j) = tmp;
}
}
generator_matrix(i, i) = -sum(generator_matrix(i, _));
}
NumericMatrix out(n, d);
for (unsigned int k=0; k<n; k++) {
if ((d*k) % C_CHECK_USR_INTERRUP == 0)
checkUserInterrupt();
values = NumericVector(d, 0.);
state = 0;
while (state < d) {
total_intensity = -generator_matrix(state, state);
transition_probs = NumericVector(d-state, 0.);
for (unsigned int i=state+1; i<d+1; i++) {
transition_probs[i-state-1] = generator_matrix(state, i) / total_intensity;
}
waiting_time = R::exp_rand() / total_intensity;
for (unsigned int i=state; i<d; i++) {
values[i] += waiting_time;
}
state += 1 + sample(d-state, 1, false, transition_probs, false)[0];
}
perm = sample(d, d, false, R_NilValue, false); // Use `RNGkind(sample.kind="Rounding")` for comparison, since R.3.6.x not implemented in Rcpp
values = values[perm];
out(k, _) = values;
}
return wrap( out );
}

Review, simplify, and document the test-implementations in R:

test__rmo_ex_arnold_R <- function(n, d, ex_intensities) { # nolint
## store total_intensity and transition_probs for all possible states
## (number of destroyed components) in a list
generator_list <- list()
for (i in 1:d) {
transition_probs <- vapply(1:i, function(x) sum(vapply(0:(d-i), function(y) choose((d-i), y) * ex_intensities[[x + y]], FUN.VALUE=0.5)) , FUN.VALUE=0.5) * vapply(1:i, function(x) choose(i, x), FUN.VALUE = 0.5) # nolint intermediate result
total_intensity <- sum(transition_probs)
transition_probs <- transition_probs / total_intensity
generator_list[[i]] <- list("total_intensity" = total_intensity,
"transition_probs" = transition_probs)
}
out <- matrix(0, nrow=n, ncol=d)
for (k in 1:n) {
value <- test__rmo_ex_arnold_sorted_R(d, generator_list)
perm <- sample.int(d, d, replace = FALSE)
out[k, ] <- value[perm]
}
out
}

test__rmo_ex_arnold_sorted_R <- function(d, generator_list) { # nolint
total_intensity <- generator_list[[d]]$total_intensity
transition_probs <- generator_list[[d]]$transition_probs
waiting_time <- rexp(1, total_intensity)
num_affected <- sample.int(d, 1, replace=FALSE, prob=transition_probs)
if (d == num_affected) {
return(rep(waiting_time, d))
}
waiting_time + c(rep(0, num_affected),
test__rmo_ex_arnold_sorted_R(d-num_affected, generator_list))
}

test__rmo_ex_arnold_alternative_R <- function(n, d, ex_intensities) { # nolint
ex_a <- vapply(0:(d-1), function(x) sum(vapply(0:(d-x-1), function(y) choose(d-x-1, y) * ex_intensities[[y+1]], FUN.VALUE=0.5)), FUN.VALUE=0.5) # nolint
out <- matrix(0, nrow=n, ncol=d)
for (i in 1:n) {
ex_a_tmp <- ex_a
d_tmp <- d
while (d_tmp > 0) {
ex_a_tmp <- ex_a_tmp[1:d_tmp]
ex_intensities_tmp <- vapply(1:d_tmp, function(x) sum(vapply(0:(x-1), function(y) (-1)^(y) * choose(x-1, y) * ex_a_tmp[[d_tmp-x+y+1]], FUN.VALUE=0.5)), FUN.VALUE=0.5) # nolint
transition_probs <- vapply(1:d_tmp, function(x) choose(d_tmp, x), FUN.VALUE=0.5) *
ex_intensities_tmp # intermediate result
total_intensity <- sum(transition_probs)
transition_probs <- transition_probs / total_intensity
waiting_time <- rexp(1, total_intensity)
num_affected <- sample.int(d_tmp, 1, replace=FALSE, prob=transition_probs)
out[i, (d-d_tmp+1):d] <- out[i, (d-d_tmp+1):d] + waiting_time
d_tmp <- d_tmp - num_affected
}
perm <- sample.int(d, d, replace=FALSE)
out[i, ] <- out[i, perm]
}
out
}

test__rmo_ex_arnold_bivariate_R <- function(n, d, ex_intensities) { # nolint
total_intensity <- 2*ex_intensities[[1]] + ex_intensities[[2]]
transition_probs <- c(2*ex_intensities[[1]], ex_intensities[[2]]) /
total_intensity
out <- matrix(0, nrow=n, ncol=2)
for (i in 1:n) {
waiting_time <- rexp(1, total_intensity)
num_affected <- sample.int(2, 1, replace=FALSE, prob=transition_probs)
if (num_affected < 2) {
out[i, 1] <- waiting_time
waiting_time <- rexp(1, sum(ex_intensities))
num_affected <- sample.int(1, 1, replace=FALSE) # dummy
out[i, 2] <- out[i, 1] + waiting_time
perm <- sample.int(2, 2, replace=FALSE)
out[i, ] <- out[i, perm]
} else {
out[i, ] <- waiting_time
perm <- sample.int(2, 2, replace=FALSE) # dummy
}
}
out
}

[FEAT] Calculate entries of Markovian intensity matrix directly

Summary

At the moment, we are able to calculate the exchangeable intensities either directly or via their integral representation. For the main application, the input often might be a scaled version thereof (multiplied by a binomial coefficient).

Proposal

Generalize the valueOf-function such that the controlled integration error is calculated for the scaled value (binomial coefficient goes under the integral).

Better collaboration instructions

Github: Better collaboration

Currently, the guidelines on collaborations for this project are not very specific.

To-Do's:

  • Create issue template
  • Create pull request template
  • Create code of conduct
  • Create contributing document
  • Create section in README.Rmd for contributing

[BUG] Wrong results for `is_within` with high values

Description

The function is_within(i, j) produces wrong results for very high values of j>2^31-1, see also #35.

The reason for this is undocumented, implementation-dependent, or maybe just unknown behaviour (to me) how conversion is implemented in Rcpp and c++. R prevents integer overflow, by conversing an integer higher than 2^31-1 to a double. However, if a (theoretical) integer between 2^31-1 (32bit) and 2^63-1 (64bit) is passed from R to an Rcpp-function taking a long int there might be some loss due to previous internal conversions to a double. See also https://stackoverflow.com/a/1848762 for a discussion on the largest representable integer in a 64bit double.

reprex

library(rmo)
rmo:::is_within(1L, 2L^55L+1L) ## FALSE, should be TRUE
#> [1] FALSE

Created on 2020-02-16 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.2 (2019-12-12)
#>  os       macOS Catalina 10.15.3      
#>  system   x86_64, darwin19.2.0        
#>  ui       unknown                     
#>  language (EN)                        
#>  collate  de_DE.UTF-8                 
#>  ctype    de_DE.UTF-8                 
#>  tz       Europe/Berlin               
#>  date     2020-02-16                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source        
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.6.0)
#>  backports     1.1.5      2019-10-02 [1] CRAN (R 3.6.1)
#>  callr         3.4.1      2020-01-24 [1] CRAN (R 3.6.2)
#>  cli           2.0.1      2020-01-08 [1] CRAN (R 3.6.1)
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.6.0)
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 3.6.0)
#>  devtools      2.2.1      2019-09-24 [1] CRAN (R 3.6.1)
#>  digest        0.6.23     2019-11-23 [1] CRAN (R 3.6.1)
#>  ellipsis      0.3.0      2019-09-20 [1] CRAN (R 3.6.1)
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 3.6.0)
#>  fansi         0.4.1      2020-01-08 [1] CRAN (R 3.6.1)
#>  fs            1.3.1      2019-05-06 [1] CRAN (R 3.6.0)
#>  glue          1.3.1      2019-03-12 [1] CRAN (R 3.6.0)
#>  highr         0.8        2019-03-20 [1] CRAN (R 3.6.0)
#>  htmltools     0.4.0      2019-10-04 [1] CRAN (R 3.6.1)
#>  knitr         1.28       2020-02-06 [1] CRAN (R 3.6.2)
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.6.0)
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.6.0)
#>  pkgbuild      1.0.6      2019-10-09 [1] CRAN (R 3.6.1)
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.6.0)
#>  prettyunits   1.1.1      2020-01-24 [1] CRAN (R 3.6.2)
#>  processx      3.4.2      2020-02-09 [1] CRAN (R 3.6.2)
#>  ps            1.3.0      2018-12-21 [1] CRAN (R 3.6.0)
#>  R6            2.4.1      2019-11-12 [1] CRAN (R 3.6.1)
#>  Rcpp          1.0.3      2019-11-08 [1] CRAN (R 3.6.1)
#>  remotes       2.1.0      2019-06-24 [1] CRAN (R 3.6.0)
#>  rlang         0.4.4      2020-01-28 [1] CRAN (R 3.6.2)
#>  rmarkdown     2.1        2020-01-20 [1] CRAN (R 3.6.2)
#>  rmo         * 0.2.0.0000 2020-02-10 [1] local         
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.6.0)
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.6.0)
#>  stringi       1.4.5      2020-01-11 [1] CRAN (R 3.6.1)
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.6.0)
#>  testthat      2.3.1      2019-12-01 [1] CRAN (R 3.6.1)
#>  usethis       1.5.1      2019-07-04 [1] CRAN (R 3.6.0)
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.6.0)
#>  xfun          0.12       2020-01-13 [1] CRAN (R 3.6.1)
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 3.6.2)
#> 
#> [1] /usr/local/lib/R/3.6/site-library
#> [2] /usr/local/Cellar/r/3.6.2/lib/R/library

[REFACTOR] Remove redundant assertions

Description

After moving to a more sophisticated C++ backend, many parameter-checks are performed there and the original assertions became redundant. They should be removed and the error messages of the backend should be reviewed.

Related Issues

#56

[FEAT] Parameter generators

Summary

At the moment, the package does not provide any functions to generate Marshall--Olkin parameter. for high dimensional models.

Proposal

Provide helper functions to generate various types of Marshall-Olkin distribution parameters. These could first be used internally to improve and extend testing.

[REFACTOR] Don't use extended signature for `valueOf`

Summary

At the moment, the valueOf-function uses signatures c("BernsteinFunction", "numeric", "integer"). This leads to impractical consequences. E.g. the third argument must be an actual integer and not and integerish double. Also for some applications a complex argument might be appropriate and for some Bernstein functions, the implementation might be able to process complex input.

Proposal

Only dispatch on first argument and argument check the rest.

[REFACTOR] LFM and CPP

Problem

Currently, the LFM outsources the generation of the whole CPP history which is required to a dedicated method. This implies overhead and produces more problems (e.g. more overhead by re-allocation or the requirement to calculate the Laplace transform of the jump distribution for estimating the required capacity, see #49).

Proposal

  • Refactor the LFM algorithm such that only the current state and next jump is in memory.
  • Create an additional method to test the CPP.

[FEATURE] Add composition scaling class for Bernstein Functions

Summary

Compositions of Bernstein functions are again Bernstein functions. This holds in particular for the inner functions being a linear function. This is easy to implement and will allow a larger variety of Bernstein functions to be created.

Proposal

A possible implementation

## allClass-S4.R

CompositScaledBernsteinFunction <- setClass("CompositScaledBernsteinFunction",
  contains  = "BernsteinFunction",
  slots = c(cscale = "numeric", original = "BernsteinFunction"))
## allValidity-S4.R
setValidity("CompositScaledBernsteinFunction",
  function(object) {
    qassert(object@cscale, "N1(0,)")

    invisible(TRUE)
  })
## allInitialize-S4.R

setMethod("initialize", "CompositScaledBernsteinFunction",
  function(.Object, cscale = 1, original = AlgebraicBernsteinFunction2()) { # nolint
    .Object@cscale <- cscale
    .Object@original <- original
    validObject(.Object)

    invisible(.Object)
  })
## valueOf-S4.R

setMethod("valueOf", "CompositScaledBernsteinFunction",
  function(object, x, difference_order = 0L, cscale = 1, n = 1, k = 0, ...,
           method = c("default", "levy", "stieltjes"),
           tolerance = .Machine$double.eps^0.5) {
    method <- match.arg(method)
    if (!"default" == method) {
      return(callNextMethod())
    }
    assert(combine = "or",
           check_numeric(x, lower = 0, min.len = 1L, any.missing = FALSE),
           check_complex(x, min.len = 1L, any.missing = FALSE))
    qassert(difference_order, "X1[0,)")
    qassert(cscale, "N1(0,)")
    qassert(n, "X1(0,)")
    qassert(k, "N1[0,)")

    valueOf(object@original, x, difference_order, cscale*object@cscale, n, k, ...)
  })

As a showcase, the following shows how this can be used to create the InverseGaussianBernsteinFunction based on the Algebraic Bernstein function no. 2.

## allClass-S4.R

AlgebraicBernsteinFunction2 <- setClass("AlgebraicBernsteinFunction2",
  contains = "CompleteBernsteinFunction",
  slots = c(alpha = "numeric"))
## allValidity-S4.R

setValidity("AlgebraicBernsteinFunction2",
  function(object) {
    qassert(object@alpha, "N1(0,)")

    invisible(TRUE)
  })
## allInitialize-S4.R

setMethod("initialize", "AlgebraicBernsteinFunction2",
  function(.Object, alpha = log2(2 - 0.5)) { # nolint
    .Object@alpha <- alpha
    validObject(.Object)

    invisible(.Object)
  })
## levyDensity-S4.R

setMethod("levyDensity", "AlgebraicBernsteinFunction2",
  function(object) {
    structure(
      function(x) {
        object@alpha / gamma(1 - object@alpha) * x ^ (-1 - object@alpha) * exp(-x)
      },
      lower = 0, upper = Inf, type = "continuous"
    )
  })
## stieltjesDensity-S4.R

setMethod("stieltjesDensity", "AlgebraicBernsteinFunction2",
  function(object) {
    structure(
      function(x) {
        sin(object@alpha * pi) / pi * (x - 1) ^ (object@alpha - 1) / x
      },
      lower = 1, upper = Inf, type = "continuous"
    )
  })
eta <- 2
valueOf(InverseGaussianBernsteinFunction(eta = eta), seq(0, 2, by = 0.25), 0L, cscale = 1)

valueOf(
  ScaledBernsteinFunction(
    scale = eta,
    original = CompositScaledBernsteinFunction(
      cscale = 2 / eta^2,
      original = AlgebraicBernsteinFunction2(
        alpha = 0.5
        )
      )
    ),
  seq(0, 2, by = 0.25))

[REVIEW] Implementation of the Cuadras-Augé model

Checklist

  • Rcpp implementation
  • Original R implementation
  • Bivariate R implementation

Proposal

Review the implementation of the exogenous shock model in Rcpp:

rmo/src/rmo.cpp

Lines 171 to 189 in cf85b82

NumericMatrix Rcpp__rmo_esm_cuadras_auge(unsigned int n, unsigned int d, double alpha, double beta) { // alpha, beta >= 0
NumericVector individual_shocks;
double global_shock;
NumericMatrix out(n, d);
for (unsigned int k=0; k<n; k++) {
if ((d*k) % C_CHECK_USR_INTERRUP == 0)
checkUserInterrupt();
individual_shocks = Rcpp::rexp(d, alpha);
global_shock = ((0 == beta) ? R_PosInf : exp_rand() / beta);
for (unsigned int i=0; i<d; i++) {
out(k, i) = min2(individual_shocks[i], global_shock);
}
}
return wrap( out );
}

Review, simplify, and document the test-implementations in R:

test__rmo_esm_cuadras_auge_R <- function(n, d, alpha, beta) { # nolint
out <- matrix(NA, nrow=n, ncol=d)
for (k in 1:n) { # use rexp_if_rate_zero_then_infinity from `R/sample.R`
individual_shocks <- rexp_if_rate_zero_then_infinity(d, alpha)
global_shock <- rexp_if_rate_zero_then_infinity(1L, beta)
out[k, ] <- pmin(individual_shocks, rep(global_shock, d))
}
out
}

test__rmo_esm_cuadras_auge_bivariate_R <- function(n, d, alpha, beta) { # nolint
out <- matrix(NA, nrow=n, ncol=2L)
for (k in 1:n) { # use rexp_if_rate_zero_then_infinity from `R/sample.R`
individual_shock_1 <- rexp_if_rate_zero_then_infinity(1L, alpha)
individual_shock_2 <- rexp_if_rate_zero_then_infinity(1L, alpha)
global_shock <- rexp_if_rate_zero_then_infinity(1L, beta)
out[k, ] <- pmin(c(individual_shock_1, individual_shock_2), rep(global_shock, 2L))
}
out
}

[REVIEW] Implementation of `rmo_esm`

Checklist

  • Rcpp implementation
  • Original R implementation
  • Bivariate R implementation

Proposal

Review the implementation of the exogenous shock model in Rcpp:

rmo/src/rmo.cpp

Lines 44 to 67 in cf85b82

NumericMatrix Rcpp__rmo_esm(unsigned int n, unsigned int d, NumericVector intensities) {
double intensity, shock_time;
NumericVector value;
NumericMatrix out(n, d);
for (unsigned int k=0; k<n; k++) {
if ((d*k) % C_CHECK_USR_INTERRUP == 0)
checkUserInterrupt();
value = NumericVector(d, R_PosInf);
for (unsigned int j=0; j < pow(2., d)-1; j++) {
intensity = intensities[j];
shock_time = (intensity == 0 ? R_PosInf : R::exp_rand() / intensity);
for (unsigned int i=0; i<d; i++) {
if (is_within(i+1, j+1)) {
value[i] = min2(value[i], shock_time);
}
}
}
out(k, _) = value;
}
return wrap( out );
}

Review, simplify, and document the test-implementations in R:

test__rmo_esm_R <- function(n, d, intensities) { # nolint
out <- matrix(NA, nrow=n, ncol=d)
for (k in 1:n) {
value <- rep(Inf, d)
for (j in 1:(2^d - 1)) {
shock_time <- rexp_if_rate_zero_then_infinity(1, intensities[[j]])
for (i in 1:d) {
if (test__is_within_R(i, j))
value[i] <- min(c(value[[i]], shock_time))
}
}
out[k, ] <- value
}
out
}

test__rmo_esm_bivariate_R <- function(n, d, intensities) { # nolint
out <- matrix(0, nrow = n, ncol = 2)
for (i in 1:n) {
shock_for_1 <- rexp_if_rate_zero_then_infinity(1, intensities[[1]])
shock_for_2 <- rexp_if_rate_zero_then_infinity(1, intensities[[2]])
shock_for_1_and_2 <- rexp_if_rate_zero_then_infinity(1, intensities[[3]])
out[i, ] <- pmin(c(shock_for_1, shock_for_2), shock_for_1_and_2)
}
out
}

[TEST] Use mocking framework in simulation tests

Summary

The simulation tests, which test if a C++ implementation based on R's RNG and univariate sampling algorithms yield the same result as a naive R implementation, currently use separate C++ functions following the naming convention rtest__*. It might be more clear to mock the production R-functions.

Proposal

Use mockery's stub function similarly to

## in tests/testthat/test_esm-model.R
...
test_that("ESM implementation for d = 2", {
  stub(rmo_esm, 'Rcpp__rmo_esm', rtest__rmo_esm)
  ...
  expect_equal_rn_generation(
    "rmo_esm", "test__rmo_esm_bivariate",
    args, n, use_seed)
}
...

[FEATURE REQUEST]: Implement argument checks

Summary

The current implementations of the sampling algorithms, rmo_esm, rmo_arnold, and rmo_ex_arnold, do not perform any argument checks. To avoid any potential problems, the algorithms should check the input arguments for proper type and range as well as matching dimensions.

Proposal

As argument checks are only performed once in the beginning, code readability instead of speed should be the main concern. As these checks are only performed once for all simulations, it is more important that the assertions are readable than that they are fast.

For rmo_esm, rmo_arnold, and rmo_ex_arnold it has to be checked that the number of sample n and the dimension d are counting numbers:

assert_that(is.count(n), is.count(d))

For rmo_esm and rmo_arnold, it has to be checked that intensities is a numeric, non-negative vector of length 2^d-1 and that the marginal rates are strictly positive:

assert_that(is.numeric(intensities), length(intensities) == 2^d-1, all(intensities >= 0))
marginal_intensities <- numeric(d)
for (i in 1:d) {
  for (j in :1:(2^d-1)) {
    if (is_within(i, j) { ## use `is_within` from sets.R
      marginal_intensities[i] <- marginal_intensities[i] + intensities[j]
    }
  }
}
assert_that(all(marginal_intensities > 0)) ## could also raise a warning instead of an error

For rmo_ex_arnold , it has to be checked that ex_intensities is a numeric, non-negative random vector with lentgh d and that the marginal rates are strictly positive:

assert_that(is.numeric(ex_intensites), all(ex_intensities >= 0), length(ex_intensities) == d)
marginal_intensity <- sum(vapply(0:(d-1), function(y) choose(d-1, y) * ex_intensities[[y+1]], FUN.VALUE=0.5))
assert_that(marginal_intensity > 0)

Additional context

A comparison of multiple packages to implement assertions can be found here: https://cran.r-project.org/web/packages/checkr/vignettes/assertive-programming.html.

[REVIEW] Custom assertions

Summary

The custom assertions should be as simple as possible, follow the single responsibility principle, have descriptive names. The error messages should be informative and the corresponding variables should follow a naming convention.

Checklist

Miscellaneous assertions

  • is_positive_number: Tests if a number is scalar and positive; not contained in assertthat-package
  • is_nonnegative_number: Tests if a number is scalar and non-negative; not contained in assertthat-package

Discribution assertions

  • is_dimension: Tests if a number is a natural number larger than 1

Marshall--Olkin assertions

  • is_32bit_complient_dimension: Tests if a dimension for a Marshall--Olkin distribution is 32-bit compliant, i.e. if the number of distribution parameters will not exceed the 32-bit limit.
  • is_mo_parameter: Tests if a parameter vector of shock rates is a valid Marshall--Olkin parameter vector, i.e. if it has the correct length, all elements are non-negative numbers, and if the resulting marginal rates are positive.
  • is_ex_mo_parameter: Tests if a parameter vector of shock rates is a valid Marshall--Olkin parameter vector, i.e. if it has the correct length, all elements are non-negative numbers, and if the resulting marginal rate is positive.
  • is_lfm_cpp_param: Tests if a parameter list is a valid parameterisation of a Levy-frailty Marshall--Olkin model, i.e. if the rate, drift, and killing parameters are non-negative numbers and sum up to a positive number and if jump distribution is correctly parametrised.

Jump distribution assertions

  • is_rjump_name: Tests if the given name is a valid method to simulate jumps; jump distributions are currently limited to exponential and fixed jump distributions.
  • is_rjump_param: Tests if the given parameter list is valid for the given method name by performing a test run (warning: this might change the RNG state).

[FEAT] Pareto jumps in CPP + Pareto jump CPP Bernstein function

Summary and Proposal

In Sec. 6.3 of the following reference, it is described how an alpha-stable subordinator can be approximated by a CPP with Pareto-type jumps and a drift.

Fernández Loroño, Lexuri. "Selected topics in financial engineering: first-exit times and dependence structures of Marshall-Olkin Kind." (2015).

It would be good to implement

  • Pareto-type jumps and
  • A Bernstein function for a CPP process with Pareto jumps

Additional context

  • There is already a package Pareto which implements the sampling algorithm via the probability integral transform.
  • Comparing the approximation of the alpha-stable subordinator via a CPP subordinator with Pareto jumps and drift vs. the corresponding ex-Arnold model would be good material for the README or a vignette.

[TEST] Integration tests

Summary

While unit tests are already extensive, an integration test of some sort is missing. This could be a Monte-Carlo simulation, which produces estimators for intensities or for shock arrival probabilities.

[REVIEW] Binary representation and `is_within`

Checklist

  • Rcpp implementation
  • Original R implementation

Proposal

Review the implementation of the MO parameter mapping via the is_within function in Rcpp:

rmo/src/rmo.cpp

Lines 11 to 24 in cf85b82

bool is_within(unsigned int i, unsigned int j) {
if (1 + log2(j+1) < i) // j >= 2^(i-1)-1 <=> 1+log2(j+1) >= i
return false;
int count = 1;
while (j>0) {
if (1 == (j%2) && count == i)
return true;
j /= 2;
count++;
}
return false;
} // bool is_within(int i, int j);

[REVIEW] Parameter checks

Proposal

At the moment, most of the parameter checks are implemented in R using assertthat. A consequence of that is a performance hit - in particular, if large parameters are involved. It would be good to either incorporate parameter checks directly in the support library or at least rewrite some of the tests in C++.

[REFACTOR] Expose jump generators

  • Create a scr/jump_generators.cpp to expose all jump generators.
  • Create dedicates tests for all jump generators.
  • Refactor Test helpers to use these generators, where appropriate.

[FEATURE] Add S4 methods to create `intensities`, `ex_intensities` and `qmatrix`

Summary

The package should provide S4 methods to create either an intensities vector, an ex_intensities vector or the Markov intensity matrix qmatrix.

Possible implementation

## valueOf-S4.R

setGeneric("intensities", 
  function(object, d, ...) {
    standardGeneric("intensities")
  })

setGeneric("exIntensities", 
  function(object, d, ...) {
    standardGeneric("exIntensities")
  })

setGeneric("exQMatrix", 
  function(object, d, ...) {
    standardGeneric("exQMatrix")
  })

setMethod("intensities", "BernsteinFunction",
  function(object, d, ...) {
    tmp <- sapply(1:d, function(i) valueOf(object, d-i, i, ...))
    out <- numeric(2^d-1)
    for (j in seq_along(out)) {
      count <- 0
      for (i in 1:d) {
        count <- count + Rcpp__is_within(i, j)
      }
      out[j] <- tmp[count]
      }

  out
  })

setMethod("exIntensities", "BernsteinFunction", 
  function(object, d, ...) {
    sapply(1:d, function(i) valueOf(object, d-i, i, n = d, k = i))
  })

setMethod("exQMatrix", "BernsteinFunction", 
  function(object, d, ...) {
    out <- matrix(0, nrow = d+1, ncol = d+1)
    out[1, -1] <- exIntensities(object, d, ...)
    for (i in 1:d) {
      if (i < d) {
        for (j in (i+1):d) {
          out[1+i, 1+j] <- (d-j+1)/(d-i+1) * out[i, j] + (j+1-i) / (d-i+1) * out[i,j+1]
        }
      }
    }
    diag(out) <- -apply(out, 1, sum)
    
    out
  })

[FEATURE REQUEST] Add `R` implementation of the Arnold model

Summary

Add an implementation of the Arnold model in sample.R in plain R.

Proposal

The algorithm for the Arnold model is described in section 3.1.2 in Mai, J. F., & Scherer, M. (2017). Simulating Copulas.

An implementation that I found in my old files, but which has to be tested and documented is the following:

rmo_Arnold <- function(n, d, intensities) {
  total_intensity <- sum(intensities)
  transition_probs <- intensities / total_intensity
  out <- matrix(nrow=n, ncol=d)

  for (k in 1:n) {
    destroyed <- logical(d)
    value <- numeric(d)

    while(!all(destroyed)) {
      W <- rexp(1, total_intensity)
      Y <- sample.int(n = 2^d-1, size=1, prob = transition_probs)

      for (i in 1:d) {
        if (!destroyed[[i]]) {
          if (is_within(i, Y)) { ## uses the fn. `is_within` (implemented in `sets.R`)
            destroyed[[i]] <- TRUE
          }
          value[[i]] <- value[[i]] + W
        }
      }
    }
    out[k, ] <- value
  }

  out
}

This algorithm uses the function is_within which is implemented in sets.R. For the reference:

is_within <- function(i, j) {
  count <- 1
  while (j > 0) {
    if (1 == (j %% 2) && count == i)
      return(TRUE)
    j <- j %/% 2
    count <- count + 1
  }

  return(FALSE)
}

Alternatives

Ultimately, a C or Rcpp implementation of the algorithm would be preferable, but for the beginning, a plain R implementation should be sufficient. Especially, because this implementation can later be used for testing if a reimplementation in C or Rcpp is done.

Ultimately, if a reimplementation in C or Rcpp is done, one has to carefully think on the algorithms which are used such that a comparison with the R implementation for testing is possible. Particularly, the sample.int function might be a candidate for review.

Additional context

No additional context.

[BUG] LFM implementation does not properly account for drift

Description

The implementation of the compound Poisson LFM does not properly take into account the drift. In particular, it is not considered that the compound Poisson subordinator, simulated in sample_cpp, can drift over multiple barrier_values before the next waiting_time/killing_time is attained.

[REFACTOR] Use checkmate

Summary

Use the checkmate package for simple assert statements (such as value is scalar and integerish).

Proposal

A clear and concise description of what you want to happen.

[REFACTOR] Remove `lambda` parameter from `PoissonBernsteinFunction`

Proposal

The scale parameter lambda from the PoissonBernsteinFunction is redundant since the following Bernstein Functions are equivalent:

PoissonBernsteinFunction(lambda = lambda, eta = eta)
ScaledBernsteinFunction(scale = lambda, original = PoissonBernsteinFunction(eta = eta))

Removing lambda will not remove any feature, but will make the user interface more consistent, since most Bernstein function families have a single parameter.

[DOC] Include CI benchmark report on website

Proposal

It would be good a have the most recent benchmarks included on the website. For this reason, we have to write our own plotting routine such that these display only tagged commits and are separated by their algorithm.

Integrate lintr

Feature Request: Integrate lintrcode-checking

The R-package lintr allows to define static code tests to ensure certain code standards and styles are fulfilled.

To-Do's:

  • Choose coding style
  • Create lintr setup according to code style
  • Use lintr with travis-ci
  • Write section in contributing document
  • Update pull request document

[REFACTOR] cpp11 conversion

Summary

Since we use a limited range of Rcpp's functionality, we might benefit from switching to cpp11 (see https://github.com/r-lib/cpp11). However, some of the currently exported functions and tests might not work and have to be changed. The main benefit would probably be reduced compilation time.

Proposal

TODO

  • Create branch from master
  • Refactor functions with in-out parameters
  • Refactor functions with non-standard parameters (e.g. Nullable)

[REVIEW] Implementation of the Arnold model

Checklist

  • Rcpp implementation
  • Original R implementation
  • Bivariate R implementation

Proposal

Review the implementation of the Arnold model in Rcpp:

rmo/src/rmo.cpp

Lines 73 to 108 in cf85b82

NumericMatrix Rcpp__rmo_arnold(unsigned int n, unsigned int d, NumericVector intensities) {
double total_intensity, waiting_time;
unsigned int affected;
LogicalVector destroyed;
NumericVector transition_probs, values;
total_intensity = sum(intensities);
transition_probs = intensities / total_intensity;
NumericMatrix out(n, d);
for (unsigned int k=0; k<n; k++) {
if ((d*k) % C_CHECK_USR_INTERRUP == 0)
checkUserInterrupt();
destroyed = LogicalVector(d, false);
values = NumericVector(d, 0.);
while (is_false(all(destroyed))) {
waiting_time = R::exp_rand() / total_intensity;
affected = sample((int) pow(2., d)-1, 1, false, transition_probs, true)[0];
for (unsigned int i=0; i<d; i++) {
if (!destroyed[i]) {
values[i] += waiting_time;
if (is_within(i+1, affected)) {
destroyed[i] = true;
}
}
}
}
out(k, _) = values;
}
return wrap( out );
}

Review, simplify, and document the test-implementations in R:

test__rmo_arnold_R <- function(n, d, intensities) { # nolint
total_intensity <- sum(intensities)
transition_probs <- intensities / total_intensity
out <- matrix(nrow=n, ncol=d)
for (k in 1:n) {
destroyed <- logical(d)
value <- numeric(d)
while (!all(destroyed)) {
waiting_time <- rexp(1, total_intensity)
affected <- sample.int(n=2^d-1, size=1, replace=FALSE, prob=transition_probs)
for (i in 1:d) {
if (!destroyed[[i]]) {
if (test__is_within_R(i, affected)) {
destroyed[[i]] <- TRUE
}
value[[i]] <- value[[i]] + waiting_time
}
}
}
out[k, ] <- value
}
out
}

test__rmo_arnold_bivariate_R <- function(n, d, intensities) { # nolint
total_intensity <- sum(intensities)
transition_probs <- intensities / total_intensity
out <- matrix(0, nrow=n, ncol=2)
for (i in 1:n) {
destroyed <- rep(FALSE, 2)
while (!all(destroyed)) {
waiting_time <- rexp(1, total_intensity)
affected <- sample.int(3, 1, replace=FALSE, prob=transition_probs)
out[i, !destroyed] <- out[i, !destroyed] + waiting_time
if (affected == 1) {
destroyed[1] <- TRUE
} else if (affected == 2) {
destroyed[2] <- TRUE
} else {
destroyed <- rep(TRUE, 2)
}
}
}
out
}

[BUG]: Problem with zero rates in `rmo_esm`

Description

The rmo_esm sampling algorithm does not properly deal with zero rates. By convention, a rate of zero should correspond to Inf, but rexp produces NA. Problem is also present in the corresponding test.

reprex

library(rmo)

rmo_esm(1, 2, c(1, 1, 0))
#> Warning in rexp(1, intensities[[j]]): NAs produziert
#>      [,1] [,2]
#> [1,]  NaN  NaN

Created on 2019-11-14 by the reprex package (v0.3.0)

[BUG]: `rmo_lfm_cpp` produces `Inf`-values

Description

The function rmo_lfm_cpp produces Inf-values for the example code for the independence case. Also, the last two examples are marked with NOT RUN. This should be removed.

reprex

library(rmo)

sessionInfo()
#> R version 3.6.1 (2019-07-05)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 14393)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
#> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
#> [5] LC_TIME=German_Germany.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] rmo_0.1.0.9000
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_3.6.1   assertthat_0.2.1 magrittr_1.5     tools_3.6.1     
#>  [5] htmltools_0.4.0  yaml_2.2.0       Rcpp_1.0.3       stringi_1.4.3   
#>  [9] rmarkdown_1.17   highr_0.8        knitr_1.26       stringr_1.4.0   
#> [13] xfun_0.11        digest_0.6.22    rlang_0.4.1      evaluate_0.14

rmo_lfm_cpp(10L, 2L, 0, 0, 1, "rposval", list("value"=1))  ## independence
#>            [,1]       [,2]
#>  [1,] 1.1298525        Inf
#>  [2,]       Inf 0.32693263
#>  [3,]       Inf 0.09800674
#>  [4,] 0.0532895        Inf
#>  [5,]       Inf 0.02729041
#>  [6,] 0.5211671        Inf
#>  [7,] 0.8131072        Inf
#>  [8,]       Inf 0.21877149
#>  [9,]       Inf 0.22773527
#> [10,]       Inf 0.19788030

Created on 2019-11-27 by the reprex package (v0.3.0)

[TEST] Statistical unit tests

Summary

The "production" sampling algorithms should be subject to statistical unit tests. A general framework for this is outlined in https://github.com/hsloot/rmo/blob/master/other/integration-test.md.

Proposal

Create a test utility class for providing a Kolmogorov-Smirnov or a chi-squared test (for discrete distributions). For MO distributions, subsequently, test the (appropriately) scaled joined minimum of each observation-row against a unit exponential distribution. For univariate distributions, test against the theoretical distribution. Boost/random uses a similar test-framework with a test-failure if p<0.01.

Alternatives

  • Don't test univariate distributions and helper classes.
  • Don't test alternative variants for test-purposes
  • Use other statistical tests or different methods to attain a univariate rv for the KS test
  • Use a different value for p-threshold

[BUG]: `rmo_lfm_cpp` should accept `rate = 0` with conditions

Description

As of now, rmo_lfm_cpp requires that is_positive_number(rate) is TRUE. This is too strong since it would be valid to choose e.g.

rate=0, rate_killing=1, rate_drift=0 ## comonotone case 
rate=0, rate_killing=0, rate_drift=1 ## independence case 

[FEATURE REQUEST]: Refactor test and test more input configurations

Summary

As of now, all test are run only for specific parameter configurations. To ensure that all algorithms are properly working, multiple input combinations should be tested.

Proposal

The test implementations of the simulation algorithms should be refactored into separate functions which are stored in helper__*.R files in the tests/testthat directory.

Then, tests should be always run with various different input configurations.

Perhaps, one can use the MonteCarlo-package to facilitate these tests.

Additional context

For more information on test setup, see the setup and teardown section of https://www.tidyverse.org/blog/2017/12/testthat-2-0-0.

[REVIEW] Implementation of the Lévy-frailty model

Checklist

  • Rcpp implementation
  • Original R implementation
  • Bivariate R implementation

Proposal

Review the implementation of the exogenous shock model in Rcpp:

rmo/src/rmo.cpp

Lines 289 to 310 in cf85b82

NumericMatrix Rcpp__rmo_lfm_cpp(unsigned int n, unsigned int d, double rate, double rate_killing, double rate_drift, Function rjump, List rjump_arg_list) {
NumericVector unit_exponentials(d);
NumericMatrix cpp_subordinator;
unsigned int count;
NumericMatrix out(n, d);
for (unsigned int k=0; k<n; k++) {
unit_exponentials = Rcpp::rexp(d);
cpp_subordinator = sample_cpp(rate, rate_killing, rate_drift, rjump, rjump_arg_list, unit_exponentials);
for (unsigned int i=0; i<d; i++) {
count = 0;
while (cpp_subordinator(count, 1) < unit_exponentials[i] && count < cpp_subordinator.nrow())
count += 1;
if (cpp_subordinator.nrow() == count)
stop("internal error: exponential value out of subordinator range");
out(k, i) = cpp_subordinator(count, 0);
}
}
return out;
}

rmo/src/rmo.cpp

Lines 213 to 283 in cf85b82

NumericMatrix sample_cpp(double rate, double rate_killing, double rate_drift, Function rjump, List rjump_arg_list, NumericVector barrier_values) {
barrier_values = clone(barrier_values);
if (rate_drift>0.) {
std::sort(barrier_values.begin(), barrier_values.end());
} else {
barrier_values = NumericVector(1, max(barrier_values));
}
unsigned int d = barrier_values.size();
double waiting_time;
double jump_value;
double killing_waiting_time;
double current_value;
double intermediate_waiting_time;
double intermediate_value;
Function do_call("do.call");
rjump_arg_list.push_back(1, "n");
std::vector<double> times(1);
std::vector<double> values(1);
for (unsigned int i=0; i<d; i++) {
while (values.back() < barrier_values[i]) {
waiting_time = ((0. == rate) ? R_PosInf : R::exp_rand()/rate);
// requires RNGstate synchronisation
PutRNGstate();
jump_value = as<double>(do_call(rjump, rjump_arg_list));
GetRNGstate();
killing_waiting_time = ((0. == rate_killing) ? R_PosInf : R::exp_rand()/rate_killing);
if (killing_waiting_time < R_PosInf && killing_waiting_time <= waiting_time) {
for (unsigned int j=i; j<d; j++) {
if (rate_drift > 0. && (barrier_values[j] - values.back())/rate_drift <= killing_waiting_time) {
intermediate_waiting_time = (barrier_values[j] - values.back()) / rate_drift;
times.push_back(times.back() + intermediate_waiting_time);
values.push_back(barrier_values[j]);
killing_waiting_time -= intermediate_waiting_time;
}
}
times.push_back(times.back() + killing_waiting_time);
values.push_back(R_PosInf);
} else {
for (unsigned int j=i; j<d; j++) {
if (rate_drift > 0. && (barrier_values[j] - values.back())/rate_drift <= waiting_time) {
intermediate_waiting_time = (barrier_values[j] - values.back())/rate_drift;
times.push_back(times.back() + intermediate_waiting_time);
values.push_back(barrier_values[j]);
waiting_time -= intermediate_waiting_time;
}
}
if (rate > 0.) { // waiting_time < R_PosInf
times.push_back(times.back() + waiting_time);
values.push_back(values.back() + waiting_time * rate_drift + jump_value);
}
}
}
}
NumericMatrix out(times.size(), 2);
for (unsigned int i=0; i<times.size(); i++) {
out(i, 0) = times[i];
out(i, 1) = values[i];
}
colnames(out) = CharacterVector::create("t", "value");
return out;
}

Review, simplify, and document the test-implementations in R:

test__rmo_lfm_cpp_R <- function(n, d, rate, rate_killing, rate_drift, rjump_name, rjump_arg_list = list()) { # nolint
assert_that(is.count(n), is.count(d), is_nonnegative_number(rate),
is_nonnegative_number(rate_killing), is_nonnegative_number(rate_drift),
is_positive_number(rate + rate_killing + rate_drift),
is_rjump_name(rjump_name), is_rjump_arg_list(rjump_name, rjump_arg_list))
out <- matrix(NA, nrow=n, ncol=d)
for (k in 1:n) {
unit_exponentials <- rexp(d)
cpp_subordinator <- test__sample_cpp_R(rate, rate_killing, rate_drift,
rjump_name, rjump_arg_list, unit_exponentials)
out[k, ] <- vapply(1:d, function(x) {
min(cpp_subordinator[cpp_subordinator[, 2] >= unit_exponentials[[x]], 1])
}, FUN.VALUE=0.5)
}
out
}

test__sample_cpp_R <- function(rate, rate_killing, rate_drift, rjump, rjump_arg_list, barrier_values) { # nolint
if (rate_drift>0.) {
barrier_values <- sort(barrier_values)
} else {
barrier_values <- max(barrier_values)
}
d <- length(barrier_values)
times <- 0.
values <- 0.
for (i in 1:d) {
while (last(values) < barrier_values[[i]]) {
waiting_time <- rexp_if_rate_zero_then_infinity(1, rate)
jump_value <- do.call(rjump, args=c("n"=1, rjump_arg_list))
killing_waiting_time <- rexp_if_rate_zero_then_infinity(1, rate_killing)
if (killing_waiting_time < Inf && killing_waiting_time <= waiting_time) {
for (j in i:d) {
if (rate_drift>0. &&
(barrier_values[[j]] - last(values)) /
rate_drift<=killing_waiting_time) {
intermediate_waiting_time <- (barrier_values[[j]] - last(values)) / rate_drift
times <- c(times, last(times) + intermediate_waiting_time)
values <- c(values, barrier_values[[j]])
killing_waiting_time <- killing_waiting_time - intermediate_waiting_time
}
}
times <- c(times, last(times) + killing_waiting_time)
values <- c(values, Inf)
} else {
for (j in i:d) {
if (rate_drift>0. &&
(barrier_values[[j]] - last(values))/rate_drift <= waiting_time) {
intermediate_waiting_time <- (barrier_values[[j]] - last(values)) / rate_drift
times <- c(times, last(times) + intermediate_waiting_time)
values <- c(values, barrier_values[[j]])
waiting_time <- waiting_time - intermediate_waiting_time
}
}
if (rate>0.) { ## waiting_time<Inf # nolint
times <- c(times, last(times) + waiting_time)
values <- c(values, last(values) + waiting_time * rate_drift + jump_value)
}
}
}
}
cbind("t"=times, "value"=values)
}

test__rmo_lfm_cpp_independence_R <- function(n, d, rate, rate_killing, rate_drift, rjump_name, rjump_arg_list) { # nolint
assertthat::assert_that(assertthat::is.count(n), assertthat::is.count(d),
is_positive_number(rate_drift),
is_rjump_name(rjump_name), is_rjump_arg_list(rjump_name, rjump_arg_list)) ## dummy test
out <- matrix(NA, nrow=n, ncol=d)
for (k in 1:n) {
unit_exponentials <- rexp(d, 1)
out[k, ] <- unit_exponentials / rate_drift
}
out
}

test__rmo_lfm_cpp_comonotone_R <- function(n, d, rate, rate_killing, rate_drift, rjump_name, rjump_arg_list) { # nolint
assertthat::assert_that(assertthat::is.count(n), assertthat::is.count(d),
is_positive_number(rate_killing),
is_rjump_name(rjump_name), is_rjump_arg_list(rjump_name, rjump_arg_list)) ## dummy test
out <- matrix(NA, nrow=n, ncol=d)
for (k in 1:n) {
stopifnot(all(rexp(d, 1) >= 0)) ## dummy
out[k, ] <- rexp(1L, rate_killing)
}
out
}

test__rmo_lfm_cpp_bivariate_R <- function(n, d, rate, rate_killing, rate_drift, rjump_name, rjump_arg_list) { # nolint
if (rjump_name == "rexp") {
return(test__rmo_lfm_cpp_bivariate_rexp_R(n, rate, rate_killing,
rate_drift, rjump_arg_list$rate))
} else if (rjump_name == "rposval") {
return(test__rmo_lfm_cpp_bivariate_rposval_R(n, rate, rate_killing,
rate_drift, rjump_arg_list$value))
}
}

test__rmo_lfm_cpp_bivariate_rexp_R <- function(n, rate, rate_killing, rate_drift, jump_rate) { # nolint
stopifnot(rexp(1L, jump_rate) > 0) ## dummy
out <- matrix(NA, nrow=n, ncol=2L)
for (k in 1:n) {
unit_exponentials <- rexp(2L, 1)
if (rate_drift>0.) {
barrier_values <- sort(unit_exponentials)
} else {
barrier_values <- max(unit_exponentials)
}
d <- length(barrier_values)
times <- 0.
values <- 0.
for (i in 1:d) {
while (last(values) < barrier_values[[i]]) {
waiting_time <- rexp_if_rate_zero_then_infinity(1, rate)
jump_value <- rexp_if_rate_zero_then_infinity(1, jump_rate)
killing_waiting_time <- rexp_if_rate_zero_then_infinity(1, rate_killing)
if (killing_waiting_time < Inf && killing_waiting_time <= waiting_time) {
for (j in i:d) {
if (rate_drift>0. &&
(barrier_values[[j]] - last(values))/rate_drift<=killing_waiting_time) {
intermediate_waiting_time <- (barrier_values[[j]] - last(values)) / rate_drift
times <- c(times, last(times) + intermediate_waiting_time)
values <- c(values, barrier_values[[j]])
killing_waiting_time <- killing_waiting_time - intermediate_waiting_time
}
}
times <- c(times, last(times) + killing_waiting_time)
values <- c(values, Inf)
} else {
for (j in i:d) {
if (rate_drift>0. && (barrier_values[[j]] - last(values))/rate_drift <= waiting_time) {
intermediate_waiting_time <- (barrier_values[[j]] - last(values)) / rate_drift
times <- c(times, last(times) + intermediate_waiting_time)
values <- c(values, barrier_values[[j]])
waiting_time <- waiting_time - intermediate_waiting_time
}
}
if (rate>0) { ## waiting_time<Inf # nolint
times <- c(times, last(times) + waiting_time)
values <- c(values, last(values) + waiting_time * rate_drift + jump_value)
}
}
}
}
cpp_subordinator <- cbind("t"=times, "values"=values)
out[k, ] <- vapply(1:2L, function(x) min(cpp_subordinator[cpp_subordinator[, 2] >= unit_exponentials[[x]], 1]), FUN.VALUE=0.5) # nolint
}
out
}

test__rmo_lfm_cpp_bivariate_rposval_R <- function(n, rate, rate_killing, rate_drift, jump_value) { # nolint
out <- matrix(NA, nrow=n, ncol=2L)
for (k in 1:n) {
unit_exponentials <- rexp(2L, 1)
if (rate_drift>0.) {
barrier_values <- sort(unit_exponentials)
} else {
barrier_values <- max(unit_exponentials)
}
d <- length(barrier_values)
times <- 0.
values <- 0.
for (i in 1:d) {
while (last(values) < barrier_values[i]) {
waiting_time <- rexp_if_rate_zero_then_infinity(1, rate)
killing_waiting_time <- rexp_if_rate_zero_then_infinity(1, rate_killing)
if (killing_waiting_time < Inf && killing_waiting_time <= waiting_time) {
for (j in i:d) {
if (rate_drift>0. &&
(barrier_values[[j]] - last(values))/rate_drift<=killing_waiting_time) {
intermediate_waiting_time <- (barrier_values[[j]] - last(values)) / rate_drift
intermediate_value <- barrier_values[[j]]
times <- c(times, last(times) + intermediate_waiting_time)
values <- c(values, intermediate_value)
killing_waiting_time <- killing_waiting_time - intermediate_waiting_time
}
}
times <- c(times, last(times) + killing_waiting_time)
values <- c(values, Inf)
} else {
for (j in i:d) {
if (rate_drift>0. &&
(barrier_values[[j]] - last(values))/rate_drift <= waiting_time) {
intermediate_waiting_time <- (barrier_values[[j]] - last(values)) / rate_drift
intermediate_value <- barrier_values[[j]]
times <- c(times, last(times) + intermediate_waiting_time)
values <- c(values, intermediate_value)
waiting_time <- waiting_time - intermediate_waiting_time
}
}
if (rate>0.) { ## waiting_time<Inf # nolint
times <- c(times, last(times) + waiting_time)
values <- c(values, last(values) + waiting_time * rate_drift + jump_value)
}
}
}
}
cpp_subordinator <- cbind("t"=times, "values"=values)
out[k, ] <- vapply(1:2L, function(x) min(cpp_subordinator[cpp_subordinator[, 2] >= unit_exponentials[[x]], 1]), FUN.VALUE=0.5) # nolint
}
out
}

[TEST] Statistical unit tests

Summary

The "production" sampling algorithms should be subject to statistical unit tests. A general framework for this is outlined in https://github.com/hsloot/rmo/blob/master/other/integration-test.md.

Proposal

Create a test utility class for providing a Kolmogorov-Smirnov or a chi-squared test (for discrete distributions). For MO distributions, subsequently, test the (appropriately) scaled joined minimum of each observation-row against a unit exponential distribution. For univariate distributions, test against the theoretical distribution. Boost/random uses a similar test-framework with a test-failure if p<0.01.

Alternatives

  • Don't test univariate distributions and helper classes.
  • Don't test alternative variants for test-purposes
  • Use other statistical tests or different methods to attain a univariate rv for the KS test
  • Use a different value for p-threshold

[FEATURE REQUEST] Add `R` implementation of the "exchangeable" Arnold model

Summary

Add an implementation of the modified Arnold model for the exchangeable Marshall-Olkin distribution in plain R.

Proposal

The following algorithm is given in section 3.2.2 in Mai, J. F., & Scherer, M. (2017). Simulating Copulas.

An implementation that I found in my old files, but which has to be tested and documented is listed below. It is based on the parametrised Marshall-Olkin distribution.

rmo_exArnold <- function(n, a) {
  t(vapply(1:n, FUN = function(x) rmo_exArnold_(a), FUN.VALUE = 0.5, USE.NAMES = FALSE))
}

rmo_exArnold_ <- function(a, shuffle=TRUE) {
  d <- length(a)
  value <- numeric(d)

  ex_intensities <- vapply(1:d,
    function(x) sum(vapply(1:x, function(y) (-1)^(y-1) * choose(x-1, y-1) * a[[d-x+y]],
      FUN.VALUE = 0.5 )), FUN.VALUE = 0.5)
   
  transition_probabilities <- vapply(1:d, function(x) choose(d, x), FUN.VALUE = 0.5) * 
    ex_intensities
  total_intensity <- sum(transition_probabilities)
  transition_probabilities <- transition_probabilities / total_intensity

  epsilon <- rexp(1, total_intensity)
  affected <- sample.int(d, 1, replace = TRUE, transition_probabilities)
  Y <- rep(epsilon, affected)

  if (d == affected)
    return(Y)

  value <- c(Y, epsilon + rmo_exArnold_(a[d-affected], shuffle=FALSE))
  if (shuffle) {
    perm <- sample.int(d, d, replace = FALSE)
    value <- value[perm]
  }

  value
}

Alternatives

The implementation based on the reparametrisation seems to be overly complicated. Consider the following alternative:

rmo_exArnold <- function(n, ex_intensities) {
  d <- length(ex_intensities)
  out <- matrix(0, nrow=n, ncol=d)
 
  for (i in 1:n) {
    value <- rmo_exArnold_sorted(ex_intensities)
    perm <- sample.int(d, d, replace = FALSE)
    out[i, ] <- value[perm]
  }

  out
}

rmo_exArnold_sorted <- function(ex_intensities) {
  d <- length(ex_intensities)

  transition_probabilities <- vapply(1:d, function(x) choose(d, x), FUN.VALUE = 0.5) * 
    ex_intensities
  total_intensity <- sum(transition_probabilities)
  transition_probabilities <- transition_probabilities / total_intensity

  epsilon <- rexp(1, total_intensity)
  affected <- sample.int(d, 1, replace = TRUE, transition_probabilities)
  Y <- rep(epsilon, affected)

  if (d == affected)
    return(Y)
  
  for (i in 1:affected) { ## if possible, replace with binomial expression
    ex_intensities <- ex_intensities[1:(d-i)] + ex_intensities[2:(d-i+1)]
  }
  c(Y, epsilon + rmo_exArnold_sorted(ex_intensities))
}

Further aspects:

  • One should consider providing a random factory, i.e. a function do_random(n, FUN, arg_list) which calls FUNwith arg_list ntimes, ensures the results are vectors of length d and stores the result in a n x d matrix.
  • Ultimately, a C or Rcpp implementation of the algorithm would be preferable, but for the beginning, a plain R implementation should be sufficient. Especially, because this implementation can later be used for testing if a reimplementation in C or Rcpp is done.
  • If a reimplementation in C or Rcpp is done, one has to carefully think on the algorithms which are used such that a comparison with the R implementation for testing is possible. Particularly, the sample.int function might be a candidate for review.

Additional context

No additional context.

[DOC] Binary representation

Proposal

An article for the binary representation should be written which allows understanding the internal representation of Marshall--Olkin parameters better.

[FEATURE REQUEST]: Implement sampling algorithm for Cuadras-Augé family

Summary

The Cuadras-Augé family is described on https://hsloot.github.io/rmo/articles/Cuadras-Auge-distributions.html. For this family, all but the individual shocks and the global shock are almost surely infinite and can be ignored in the shock model. Also, the Arnold model and exchangeable Arnold model can be implemented more efficiently. This allows a very efficient implementation of the shock model algorithm.

Proposal

Reimplement the shock model algorithm, the Arnold model, and the exchangeable Arnold model for this class in a more efficient way.

As of now, I am not sure what the best names for the parameters as well as the specialised algorithms could be. Suggestions are very welcome.

[BUG] Numerical integration might fail in some cases

Description

For some boundary cases, the numerical integration might report divergence, even if the integral theoretically should converge (see reprex). This behaviour should somehow be avoided. Possible solutions:

  • Use try-catch and report that parameterisation is unstable
  • Change integration parameters
  • The problem might result from very small values of lambda (consider setting these results to zero)

All implemented numerical integrations should be checked if they have singularities close to zero (or infinity).

reprex

Please include a minimal reproducible example (AKA a reprex). If you've never heard of a reprex before, start by reading https://www.tidyverse.org/help/#reprex.

d <- 125

lambda <- 1
alpha <- 0.5
x0 <- 5e-4

intensities <- lambda * rmo::intensities_pareto(d, alpha, x0)
#> Error in integrate(f = function(u) {: the integral is probably divergent

Created on 2020-06-30 by the reprex package (v0.3.0)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.