beset's Introduction

beset: Best Subset Predictive Modeling

beset is a portmanteau of BEst subSET, which references the overall objective of this package: to identify the best subset of variables for a predictive model.

Installation instructions

To install beset in R, you first need to install the devtools package if you haven’t already:

	install.packages("devtools")

Once devtools is installed, use the following command to install beset on your system:

	devtools::install_github("jashu/beset", build_vignettes = TRUE)

After installing, start with the vignettes to learn more about beset:

	browseVignettes(package = "beset")

beset's Issues

Can't calculate pseudo R2 for a ZINB done with pscl::zeroinfl

I'm not sure what I'm doing wrong, but I can't calculate a pseudo R2 for a zero-inflated model built with pscl::zeroinfl using beset::r2d(). The package documentation does not make clear whether the object returned by zeroinfl can be passed to r2d.

I'd appreciate it if you could clarify this.

Error calculating pseudo R2 for intercept-only ZINB built with pscl::zeroinfl using beset::r2d

I'm trying to calculate the pseudo R2 value for a zero-inflated negative binomial (ZINB) model built with pscl::zeroinfl, using beset::r2d.

Here are the relevant excerpts from my code:

fish <- read.csv('fish.csv', header=TRUE, row.names = 1)
fish$site.ID <- as.factor(fish$site.ID)
fish$year <- as.factor(fish$year)
colnames(fish)[16] <- "phylo"
library(MASS)
library(pscl)
m14 <- zeroinfl(phylo ~ site.ID + year + site.ID:year + standard.length | 1, data = fish, dist = "negbin")
summary(m14)

The model (m14) contains an intercept-only zero-inflation component. Whenever I run r2d(m14), I get the following error:

Error in rep_len(prob, LLL) : cannot replicate NULL to a non-zero length

I don't understand where this error is coming from. Can you advise on how to troubleshoot?
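The thread does not say how beset::r2d() defines its pseudo R2. So as an independent cross-check only, here is a minimal sketch of McFadden's pseudo R2, 1 - logLik(full) / logLik(null). It needs nothing but log-likelihoods, and both pscl::zeroinfl and glmmTMB fits provide a logLik() method. The helper mcfadden_r2 is hypothetical, and a base-R Poisson glm on simulated data stands in for m14. For m14, the matching null model would presumably be zeroinfl(phylo ~ 1 | 1, data = fish, dist = "negbin"). McFadden's statistic is not necessarily the same quantity r2d reports, so the two values need not agree.

```r
# Hypothetical helper (not beset::r2d()): McFadden's pseudo R2,
# 1 - logLik(full) / logLik(null), for any models with a logLik() method.
mcfadden_r2 <- function(full, null) {
  1 - as.numeric(logLik(full)) / as.numeric(logLik(null))
}

# Simulated count data standing in for a real data set.
set.seed(42)
n <- 200
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.3 + 0.8 * x))

full_fit <- glm(y ~ x, family = poisson)  # model with the predictor
null_fit <- glm(y ~ 1, family = poisson)  # intercept-only reference model

r2 <- mcfadden_r2(full_fit, null_fit)
print(r2)
```

Both models must be fit to the same rows with the same response distribution. Otherwise the two log-likelihoods are not comparable.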

Trouble installing from GitHub

I am getting an error message when trying to install this package from GitHub. Please let me know if you see a way to work around this.

Thank you

Error: Failed to install 'beset' from GitHub:
System command 'R' failed, exit status: 1, stdout + stderr (last 10 lines):
E> Quitting from lines 57-58 (validate.Rmd)
E> Error: processing vignette 'validate.Rmd' failed with diagnostics:
E> The engine "block2" is for R Markdown only
E> --- failed re-building ‘validate.Rmd’
E>
E> SUMMARY: processing the following files failed:
E> ‘best-subsets.Rmd’ ‘elastic-net.Rmd’ ‘validate.Rmd’
E>
E> Error: Vignette re-building failed.
E> Execution halted

R - r2d function in beset package not providing any output

I am trying to assess the fit of the following zero-inflated negative binomial model based on its R2 value, using the r2d function of the beset package.

My model looks like this:

zero.infl.neg.bin.mod.reports.ac.year <- glmmTMB(
  Reports_per_week ~ (1|Park) + Reports_4w + Number_4w_AC + factor(Year),
  ziformula = ~ Reports_4w + Number_4w_AC + (1|Park),
  data = Reports_per_park_per_week_3,
  family = nbinom2, na.action = "na.fail")

I tried doing the following:

install.packages("remotes")
remotes::install_github("jashu/beset")
library(beset)

r2d(zero.infl.neg.bin.mod.reports.ac.year)
This results in the following:

Fit R-squared:

As you can see, it doesn't actually provide any output. Does that mean that there is something wrong with my model, or with my code? Or is this function simply not compatible with models built using glmmTMB?

`create_folds` can return empty folds when sample size is small (N / fold is in single digits)

Using validate with default arguments (10-fold cross-validation) on small samples (roughly N < 75) occasionally results in an error. The same happens when functions that call validate, such as beset_elnet, are used under the same circumstances. The error traces back to the utility function create_folds. This function applies stratified random sampling to assign observations to folds, aiming for an equal N per fold while also giving each fold a similar distribution of the outcome variable.

As the sample size drops below 75, each fold holds out fewer than 7.5 cases on average, and equalizing the distributions becomes impossible. The current algorithm nevertheless keeps attempting stratification. As a result, it occasionally fails to assign any holdout cases to one or more of the requested folds. For example, with 10-fold cross-validation, a test simulation produced this error at the following rates:

  • With N = 75: 0 failures per 10,000 random seeds.
  • With N = 70: 6 failures per 10,000 random seeds (0.06% of the time).
  • With N = 60: 21 failures per 10,000 random seeds (0.21% of the time).
  • With N = 50: 236 failures per 10,000 random seeds (2.36% of the time).

Arguably, k-fold cross-validation is not the optimal validation strategy for samples this small, because its variance approaches the high variance of leave-one-out cross-validation. Even so, a small sample should never leave a holdout fold without at least one observation.
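One way to guarantee non-empty folds is sketched below under our own assumptions. It is not beset's actual create_folds fix. The approach sorts observations by the outcome, cuts the sorted order into consecutive blocks of k, and deals the cases in each block to distinct folds at random. Every full block contributes exactly one case to every fold. With N >= k, each fold therefore receives at least floor(N / k) cases, and fold sizes differ by at most one. Neighbouring outcome values are still spread across folds. The function name stratified_folds is hypothetical.

```r
# Hypothetical helper, not beset's create_folds(): outcome-stratified fold
# assignment that never leaves a fold empty when length(y) >= k.
stratified_folds <- function(y, k = 10) {
  n <- length(y)
  if (n < k) stop("need at least k observations to fill k folds")
  # Sort by outcome; the random second key breaks ties at random.
  ord <- order(y, runif(n))
  folds <- integer(n)
  # Walk through the sorted cases in blocks of k.
  for (start in seq(1, n, by = k)) {
    idx <- ord[start:min(start + k - 1, n)]
    # Give each case in the block a distinct fold, in random order.
    folds[idx] <- sample(k, length(idx))
  }
  folds
}

set.seed(1)
y <- rnorm(53)
folds <- stratified_folds(y, k = 10)
table(folds)
```

With N = 53 and k = 10, five full blocks give every fold 5 cases, and the final partial block of 3 adds one case to three randomly chosen folds. Sampling never decides whether a fold is empty, only which cases land in it.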

`predict` method for objects inheriting "beset"/"elnet" class returns incorrect units for logistic regression models

Using predict, or methods that call predict (e.g., dependence), on "beset"/"elnet" models fit to a binomial response returns erroneous values, as if the response were Gaussian. The values are on the unbounded linear-predictor scale rather than probabilities. This traces back to calling glmnet::predict.glmnet to generate predictions, whereas the glmnet package uses a separate method, predict.lognet, for logistic models. (When the binomial family is used, the glmnet function returns an object of class "lognet" in addition to class "glmnet".)
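The dispatch problem can be reproduced with toy S3 classes. The class names toy_glmnet and toy_lognet and their methods are hypothetical stand-ins, not glmnet's code. The parent method returns only the linear predictor, which is log-odds for a logistic model. The subclass method converts that predictor to probabilities with the inverse logit. Calling the generic predict() dispatches to the subclass method. Calling the parent method directly, as the issue describes for glmnet::predict.glmnet, silently skips the conversion.

```r
# Hypothetical stand-ins for glmnet's "glmnet"/"lognet" classes.
predict.toy_glmnet <- function(object, newx, ...) {
  # Parent method: linear predictor only (log-odds for a logistic model).
  drop(newx %*% object$beta)
}
predict.toy_lognet <- function(object, newx, ...) {
  eta <- NextMethod()  # reuse the parent method to get the linear predictor
  plogis(eta)          # inverse logit: log-odds -> probabilities
}

fit <- structure(list(beta = c(0.5, -1)),
                 class = c("toy_lognet", "toy_glmnet"))
x <- rbind(c(1, 0), c(1, 2))  # first column plays the role of an intercept

p_generic <- predict(fit, newx = x)             # dispatches to predict.toy_lognet
p_direct  <- predict.toy_glmnet(fit, newx = x)  # bypasses the subclass method
print(p_generic)
print(p_direct)
```

By analogy, the remedy in the package would presumably be to call the generic predict() with type = "response" on the fitted glmnet object, so that glmnet's own "lognet" method handles the conversion.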
