jashu / beset

Best Subset Predictive Modeling

beset's Introduction

beset: Best Subset Predictive Modeling

beset is a portmanteau of BEst subSET, which references the overall objective of this package: to identify the best subset of variables for a predictive model.

Installation instructions

To install beset in R, you first need to install the devtools package if you haven’t already:

	install.packages("devtools")

Once devtools is installed, use the following command to install beset on your system:

	devtools::install_github("jashu/beset", build_vignettes = TRUE)

After installing, to learn more about beset, start with the vignettes:

	browseVignettes(package = "beset")

beset's Issues

Can't calculate pseudo R2 for a ZINB done with pscl::zeroinfl

I'm not sure what I'm doing wrong, but I can't calculate pseudo R2 for a zero-inflated model built with pscl::zeroinfl using beset::r2d(). The package documentation is not clear on whether the object returned by pscl::zeroinfl can be passed to r2d.

I would appreciate it if you could clarify that.

Add error message for attempt to use r2d function with mixed effects models
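One way such a check could look is sketched below. This is only an illustration: `check_r2d_input` is a hypothetical helper, not part of beset, and the class names it tests for ("merMod" for lme4 fits, "glmmTMB" for glmmTMB fits) are assumptions about which mixed-effects objects a user might pass in.

```r
# Hypothetical guard; `check_r2d_input` is NOT a beset function.
check_r2d_input <- function(object) {
  # Assumed class names for mixed-effects fits (lme4 and glmmTMB).
  mixed_classes <- c("merMod", "glmmTMB")
  if (inherits(object, mixed_classes)) {
    stop("r2d() does not support mixed-effects models (class: ",
         paste(class(object), collapse = "/"), ").",
         call. = FALSE)
  }
  invisible(object)
}

# Stand-in object carrying only the class attribute, for demonstration.
fake_fit <- structure(list(), class = "glmmTMB")
msg <- tryCatch(check_r2d_input(fake_fit), error = conditionMessage)
print(msg)
```

Failing early with an explicit message avoids the silent empty output that an unsupported model class could otherwise produce.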

Error calculating pseudo R2 for intercept-only ZINB built with pscl::zeroinfl using beset::r2d

I'm trying to calculate the pseudo R2 value for a zero-inflated, negative binomial model built with pscl::zeroinfl, using beset::r2d.

Here are the relevant excerpts from my code:

fish <- read.csv('fish.csv', header=TRUE, row.names = 1)
fish$site.ID <- as.factor(fish$site.ID)
fish$year <- as.factor(fish$year)
colnames(fish)[16] <- "phylo"
library(MASS)
library(pscl)
m14 <- zeroinfl(phylo ~ site.ID + year + site.ID:year + standard.length | 1, data = fish, dist = "negbin")
summary(m14)

The model (m14) contains an intercept-only zero-inflation component. Whenever I run r2d(m14), I get the error:

Error in rep_len(prob, LLL) : cannot replicate NULL to a non-zero length

I don't understand where this error is coming from. Can you advise on how to troubleshoot?

`compare` gives false error when two observed vectors have equal values but different attributes (e.g., names)
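The behavior described in this title can be reproduced with base R alone. The sketch below does not use beset's `compare`; it only shows how a strict comparison fails on a names attribute and two ways to compare values while ignoring attributes.

```r
x <- c(a = 1, b = 2, c = 3)  # named vector
y <- c(1, 2, 3)              # same values, no names

# Strict comparison fails because the names attribute differs.
strict <- identical(x, y)

# Dropping names before comparing looks at values only.
values_only <- identical(unname(x), unname(y))

# all.equal() can be told to ignore attributes altogether.
tolerant <- isTRUE(all.equal(x, y, check.attributes = FALSE))

print(c(strict = strict, values_only = values_only, tolerant = tolerant))
```

Note that `unname()` removes only names; other attributes (such as `dim`) would need `as.vector()` or the `check.attributes = FALSE` route.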

Add defensive code to prevent user from using Poisson family with negative or decimal fraction values.
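A minimal sketch of what such a defensive check might look like follows. The function name `check_count_response` and its error messages are hypothetical and not taken from beset; the tolerance used to decide whether a value is a whole number is also an assumption.

```r
# Hypothetical validator; not part of beset.
check_count_response <- function(y) {
  y <- y[!is.na(y)]  # ignore missing values for this check
  if (any(y < 0)) {
    stop("Poisson family requires non-negative values.", call. = FALSE)
  }
  # Treat values within a small tolerance of an integer as whole numbers.
  if (any(abs(y - round(y)) > sqrt(.Machine$double.eps))) {
    stop("Poisson family requires whole-number counts.", call. = FALSE)
  }
  invisible(TRUE)
}

ok       <- check_count_response(c(0, 1, 5, NA))
neg_msg  <- tryCatch(check_count_response(c(1, -2)),  error = conditionMessage)
frac_msg <- tryCatch(check_count_response(c(1, 2.5)), error = conditionMessage)
print(c(neg_msg, frac_msg))
```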

Customizable Importance Plots

Hi Jason - can we add a feature to customize the scale of the importance plots?

Linear dependency check drops offending variable from data but not formula

Either change warning to an error with suggestion to remove variable before proceeding, or edit formula to also remove offending variable.
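The second option, editing the formula, could be sketched as follows with base R. This is not beset's implementation: it assumes purely numeric predictors (so that model-matrix column names match variable names) and uses the pivoting of `qr()` to identify linearly dependent columns.

```r
set.seed(1)
d <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))
d$x3 <- d$x1 + d$x2          # exact linear combination of x1 and x2
f <- y ~ x1 + x2 + x3

X   <- model.matrix(f, d)
qrX <- qr(X)

# Columns pivoted past the numerical rank are linearly dependent.
dropped <- colnames(X)[qrX$pivot[-seq_len(qrX$rank)]]

# Remove the same terms from the formula so data and formula stay in sync.
f_new <- f
if (length(dropped) > 0) {
  f_new <- update(f, as.formula(paste(". ~ . -", paste(dropped, collapse = " - "))))
}
print(dropped)
print(f_new)
```

With factor predictors or interactions the column names no longer map one-to-one onto formula terms, in which case raising an error that asks the user to remove the variable may be the safer choice.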

Trouble installing from GitHub

I am getting an error message when trying to install this package from GitHub. Please let me know if you see a way to work through this.

Thank you

Error: Failed to install 'beset' from GitHub:
System command 'R' failed, exit status: 1, stdout + stderr (last 10 lines):
E> Quitting from lines 57-58 (validate.Rmd)
E> Error: processing vignette 'validate.Rmd' failed with diagnostics:
E> The engine "block2" is for R Markdown only
E> --- failed re-building ‘validate.Rmd’
E>
E> SUMMARY: processing the following files failed:
E> ‘best-subsets.Rmd’ ‘elastic-net.Rmd’ ‘validate.Rmd’
E>
E> Error: Vignette re-building failed.
E> Execution halted

R - r2d function in beset package not providing any output

I am trying to assess the fit of the following zero-inflated negative binomial model based on its R2 value, using the r2d function of the beset package.

My model looks like this:

zero.infl.neg.bin.mod.reports.ac.year <- glmmTMB(Reports_per_week ~ (1|Park) + Reports_4w +
Number_4w_AC + factor(Year),
ziformula = ~ Reports_4w + Number_4w_AC + (1|Park),
data = Reports_per_park_per_week_3,
family = nbinom2, na.action = "na.fail")

I tried doing the following:

install.packages("remotes")
remotes::install_github("jashu/beset")
library(beset)

r2d(zero.infl.neg.bin.mod.reports.ac.year)

This results in the following:

Fit R-squared:

As you can see, it doesn't actually provide any output. Does that mean that there is something wrong with my model, or with my code? Or is this function simply not compatible with models built using glmmTMB?

`create_folds` can return empty folds when sample size is small (N / fold is in single digits)

Using validate with default arguments (10-fold CV) on smaller samples (roughly N < 75), or using functions that call validate (e.g., beset_elnet) under the same circumstances, occasionally results in an error that traces back to the utility function create_folds. This function applies stratified random sampling to assign observations to folds, attempting to balance N across folds while also achieving a similar distribution of the outcome variable in each fold. As sample size drops below 75 (so that each fold holds out fewer than 7.5 cases on average), equalizing distributions becomes impossible, but the current algorithm continues to attempt stratification, which infrequently and randomly fails to assign any holdout cases to one or more of the requested folds. For example, with 10-fold cross-validation, this error occurred with the following frequency in a test simulation:

With N = 75, occurs 0 times out of every 10,000 random seeds.
With N = 70, occurs 6 times out of every 10,000 random seeds (0.06% of the time).
With N = 60, occurs 21 times out of every 10,000 random seeds (0.21% of the time).
With N = 50, occurs 236 times out of every 10,000 random seeds (2.36% of the time).

Arguably, k-fold cross-validation is not the optimal validation strategy for samples this small (as it approaches the high variance of leave-one-out cross-validation), but this should not result in failing to assign at least one individual to each hold-out fold.
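One assignment scheme that cannot produce an empty fold is sketched below. It is not the algorithm used by `create_folds`; `make_folds` is a hypothetical name, and the approach (sort by outcome, then deal each consecutive block of k observations across the k folds in random order) is a swapped-in technique that trades exact stratification for a guarantee of at least floor(N / k) cases per fold.

```r
# Hypothetical alternative to create_folds; not beset's implementation.
make_folds <- function(y, k = 10) {
  n <- length(y)
  stopifnot(n >= k)
  ord <- order(y)                    # indices of y from smallest to largest
  block <- ceiling(seq_len(n) / k)   # block number for each sorted position
  fold_id <- integer(n)
  for (b in unique(block)) {
    idx <- ord[block == b]           # original indices in this block
    # Each fold label is used at most once per block (no replacement).
    fold_id[idx] <- sample.int(k, length(idx))
  }
  fold_id
}

set.seed(42)
y <- rnorm(53)
folds <- make_folds(y, k = 10)
print(tabulate(folds, nbins = 10))
```

Because every complete block contributes exactly one observation to each fold, fold sizes differ by at most one, and neighboring outcome values are spread across different folds.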

`predict` method for objects inheriting "beset"/"elnet" class returns incorrect units for logistic regression models

Using predict or methods that call predict (e.g., dependence) on "beset"/"elnet" models fit to a binomial response returns erroneous values, as if the response were Gaussian. This traces back to calling glmnet::predict.glmnet to generate predictions, whereas the glmnet package uses a separate function, predict.lognet, for logistic models. (When the binomial family is used, the glmnet function returns an object of class "lognet" rather than just class "glmnet".)
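The difference between link-scale and response-scale predictions for a logistic model can be illustrated with base R's `glm` as a stand-in. That the incorrect units reported here correspond to the link (log-odds) scale is an assumption; the sketch does not call glmnet or beset.

```r
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(0.5 + x))
fit <- glm(y ~ x, family = binomial)

eta <- predict(fit, type = "link")      # log-odds, unbounded
p   <- predict(fit, type = "response")  # probabilities in (0, 1)

# Applying the inverse logit to the link-scale values recovers probabilities.
same <- isTRUE(all.equal(unname(plogis(eta)), unname(p)))
print(range(eta))
print(range(p))
print(same)
```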