
specr's Introduction

specr

Conducting and Visualizing Specification Curve Analyses


News

  • 22 May 2024: The new development version (specr version 1.0.1) includes new functions to conduct inferences on the specification curve analysis (the third step outlined in Simonsohn et al., 2020). See this vignette for further details.

  • 20 January 2022: specr version 1.0.0 is now available via CRAN. This is a major update with several new features and functions. Note: it introduces a new framework for conducting specification curve analyses compared to earlier versions (see version history for more details).

  • 4 December 2020: specr development version 0.2.2 is available via github. Mostly minor updates and bug fixes.

  • 25 May 2020: specr version 0.2.1 has been released on CRAN.

What is specr?

The goal of specr is to facilitate specification curve analyses (Simonsohn, Simmons & Nelson, 2020; also known as multiverse analyses, see Steegen, Tuerlinckx, Gelman & Vanpaemel, 2016). The package can be used to investigate how different (theoretically plausible) analytical choices affect outcome statistics within the universe of one single data set. It provides functions to set up, run, evaluate, and plot the multiverse of specifications. A simple example of how to use specr is provided below. For more information about the various functions and specific vignettes and use cases, visit the documentation.

Disclaimer

We do see a lot of value in investigating how analytical choices affect a statistical outcome of interest. However, we strongly caution against using specr as a tool to somehow arrive at a better estimate. Running a specification curve analysis does not make your findings any more reliable, valid or generalizable than a single analysis. The method is only meant to inform about the effects of analytical choices on results; it is not a better way to estimate a correlation or effect.

Installation

Install specr from CRAN:

install.packages("specr")  

Or install the most recent development version from GitHub with:

# install.packages("devtools")
devtools::install_github("masurp/specr")  

Usage

Using specr is comparatively simple. The two main functions are setup(), in which the analytic choices are specified as arguments, and specr(), which fits the models across all specifications. The latter returns an object of class “specr.object”, which can be summarized and plotted with generic functions such as summary() and plot().

# Load package ----
library(specr)

# Setup Specifications ----
specs <- setup(data = example_data, 
               y = c("y1", "y2"), 
               x = c("x1", "x2"), 
               model = c("lm"),
               controls = c("c1", "c2"),
               subsets = list(group1 = unique(example_data$group1),
                              group2 = unique(example_data$group2)))

# Run Specification Curve Analysis ----
results <- specr(specs)

# Plot Specification Curve ----
plot(results)
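
Beyond the default plot, the returned object responds to the usual generics mentioned above. A minimal sketch (the specific `type` values are assumptions based on the specr 1.x documentation; check ?plot.specr.object):

# Summarize the results object ----
summary(results)

# Plot individual components (assumed `type` values; see ?plot.specr.object)
plot(results, type = "curve")
plot(results, type = "choices")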

How to cite this package

citation("specr")
#> 
#> To cite 'specr' in publications use:
#> 
#>   Masur, Philipp K. & Scharkow, M. (2020). specr: Conducting and
#>   Visualizing Specification Curve Analyses. Available from
#>   https://CRAN.R-project.org/package=specr.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Misc{,
#>     title = {specr: Conducting and Visualizing Specification Curve Analyses (Version 1.0.1)},
#>     author = {Philipp K. Masur and Michael Scharkow},
#>     year = {2020},
#>     url = {https://CRAN.R-project.org/package=specr},
#>   }

References

Other resources on multiverse/specification curve analyses

The following papers and websites are interesting resources to explore the method further and learn about potential promises and pitfalls:

Papers that used ‘specr’

If you have published a paper in which you used specr and you would like to be included in the following list, please send an email to Philipp.

  • Akaliyski, P., Minkov, M., Li, J., Bond, M. H., & Gehring, S. (2022). The weight of culture: Societal individualism and flexibility explain large global variations in obesity. Social Science & Medicine, 307. https://doi.org/10.1016/j.socscimed.2022.115167

  • Ballou, N., & van Rooij, A. J. (2021). The relationship between mental well-being and dysregulated gaming: a specification curve analysis of core and peripheral criteria in five gaming disorder scales. The Royal Society Open Science. https://doi.org/10.1098/rsos.201385

  • Ballou, N., & Zendle, D. (2022). “Clinically significant distress” in internet gaming disorder: An individual participant meta-analysis. Computers in Human Behavior, 129. https://doi.org/10.1016/j.chb.2021.107140

  • Burton, J.W., Cruz, N. & Hahn, U. (2021). Reconsidering evidence of moral contagion in online social networks. Nature Human Behaviour. https://doi.org/10.1038/s41562-021-01133-5

  • Cantone, G. G., & Tomaselli, V. (2023). Theory and methods of the multiverse: an application for panel-based models. Quality & Quantity, 1-34.

  • Cantone, G. G., & Tomaselli, V. (2024). Characterisation and Calibration of Multiversal Models. Preprint: https://osf.io/download/6627aeecc5851a0791f66f6c/

  • Cosme, D., & Lopez, R. B. (2023). Neural indicators of food cue reactivity, regulation, and valuation and their associations with body composition and daily eating behavior. Social Cognitive and Affective Neuroscience, 18(1). https://doi.org/10.1093/scan/nsaa155

  • Del Giudice, M., & Gangestad, S. W. (2021). A Traveler’s Guide to the Multiverse: Promises, Pitfalls, and a Framework for the Evaluation of Analytic Decisions. Advances in Methods and Practices in Psychological Science. https://doi.org/10.1177/2515245920954925

  • De Vries, I., Baglivio, M., & Reid, J. A. (2024). Examining individual and contextual correlates of victimization for juvenile human trafficking in Florida. Journal of interpersonal violence, 08862605241243332. https://journals.sagepub.com/doi/abs/10.1177/08862605241243332

  • Haehner, P., Kritzler, S., & Luhmann, M. (2023). Can Perceived and Objective-Descriptive Event Characteristics Explain Individual Differences in Changes in Subjective Well-Being After Negative Life Events? A Specification Curve Analysis.

  • Henson, P., Rodriguez-Villa, E., Torous, J. (2021). Investigating Associations Between Screen Time and Symptomatology in Individuals With Serious Mental Illness: Longitudinal Observational Study Journal of Medical Internet Research, 23(3), e23144. https://doi.org/10.2196/23144

  • Huang, S., Lai, X., Zhao, X., Dai, X., Yao, Y., Zhang, C., & Wang, Y., (2022). Beyond screen time: Exploring associations between types of smartphone use content and adolescents’ social relationships. International Journal of Environmental Research and Public Health, 19, 8940. https://doi.org/10.3390/ijerph19158940

  • Jones, A., Petrovskaya, E., & Stafford, T. (2024). Exploring the multiverse of analysis options for the alcohol Stroop. Behavior Research Methods, 1-11. https://link.springer.com/article/10.3758/s13428-024-02377-5

  • Kleinert, M. (2024). Reconsidering the Relationship Between Anti-immigration Attitudes and Preferences for the AfD Using Implicit Attitudes Measures. Politische Vierteljahresschrift, 65(1), 71-98.

  • Kritzler, S., & Luhmann, M. (2021, March 25). Be Yourself and Behave Appropriately: Exploring Associations Between Incongruent Personality States and Positive Affect, Tiredness, and Cognitive Performance. https://doi.org/10.31234/osf.io/9utyj

  • Mao, Z. F., Li, Q. W., Wang, Y. M., & Zhou, J. (2024). Pro-religion attitude predicts lower vaccination coverage at country level. Humanities and Social Sciences Communications, 11(1), 1-9.

  • Masur, P. K. (2021). Understanding the Effects of Conceptual and Analytical Choices on ‘Finding’ the Privacy Paradox: A Specification Curve Analysis of Large-Scale Survey Data. Information, Communication & Society. https://doi.org/10.1080/1369118X.2021.1963460

  • Masur, P. K., & Ranzini, G. (2024). Privacy Calculus, Privacy Paradox, and Context Collapse: A Replication of Three Key Studies in Communication Privacy Research. SocArXiv. https://osf.io/preprints/socarxiv/8tr2k

  • Prasad, S., Knight, E. L., Sarkar, A., Welker, K. M., Lassetter, B., & Mehta, P. H. (2021). Testosterone fluctuations in response to a democratic election predict partisan attitudes toward the elected leader. Psychoneuroendocrinology, 133, 105396.

  • Rauvola, R. S., & Rudolph, C. W. (2023). Worker aging, control, and well-being: A specification curve analysis. Acta Psychologica, 233.

  • Sekścińska, K., Jaworska, D., & Rudzinska‐Wojciechowska, J. (2024). The effect of state and trait power on financial risk taking: The mediating and moderating roles of focus on rewards versus threats. Journal of Behavioral Decision Making, 37(1), e2363. https://doi.org/10.1002/bdm.2363

  • Tisdall, L., & Mata, R. (2023). Age differences in the neural basis of decision-making under uncertainty. Cognitive, Affective, & Behavioral Neuroscience. https://doi.org/10.3758/s13415-022-01060-6

  • Tünte, M. R., Hoehl, S., Wunderwald, M., Bullinger, J., Boyadziheva, A., Maister, L., … & Kayhan, E. (2023). Respiratory and Cardiac Interoceptive Sensitivity in the First Two Years of Life. eLife, 12. https://elifesciences.org/reviewed-preprints/91579

  • van Veelen, H.P.J., Ibáñez-Álamo, J.D., Horrocks, N.P.C. et al. (2023). Cloacal microbiota are biogeographically structured in larks from desert, tropical and temperate areas. BMC Microbiol 23(40). https://doi.org/10.1186/s12866-023-02768-2

  • Visontay, R., Mewton, L., Sunderland, M., Bell, S., Britton, A., Osman, B., … & Slade, T. (2023). A comprehensive evaluation of the longitudinal association between alcohol consumption and a measure of inflammation: Multiverse and vibration of effects analyses. Drug and Alcohol Dependence, 247. https://doi.org/10.1016/j.drugalcdep.2023.109886

  • Wang, Y., Pitre, T., Wallach, J. D., de Souza, R. J., Jassal, T., Bier, D., … & Zeraatkar, D. (2024). Grilling the data: Application of specification curve analysis to red meat and all-cause mortality. Journal of Clinical Epidemiology, 111278. https://doi.org/10.1016/j.jclinepi.2024.111278

  • Yu, R. P. (2024). Divides in News Verification: Antecedents and Political Outcomes of News Verification by Age. Digital Journalism, 1-21. https://www.tandfonline.com/doi/abs/10.1080/21670811.2024.2314582

  • Yuan, Q., Li, H., Du, B., Dang, Q., Chang, Q., Zhang, Z., … & Guo, T. (2023). The cerebellum and cognition: further evidence for its role in language control. Cerebral Cortex, 33(1), 35-49. https://doi.org/10.1093/cercor/bhac051

specr's People

Contributors

masurp, mscharkow


specr's Issues

Interactions question

Hello,

Incredible tool -- thank you very much.

A question about interactions: I am having trouble specifying an interaction between a control and an x variable. If I include "x_var:control1" in the controls specification, it drops either that specific x_var or the specific controls (even the non-interactive "control1"). It looks like I can include it via add_to_formula, but then it is included across all specifications. Is there some way to specify this in the specs()?

Thank you. If this is not clear, I can try to make a reproducible example.
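
A possible workaround, sketched below (not an official specr feature): compute the product term as its own column and pass it via controls, so that specifications containing both c1 and the product correspond to the interaction model. The column name x1_by_c1 is hypothetical.

library(dplyr)
library(specr)

# Precompute the interaction term as a regular column (hypothetical name)
dat <- example_data %>%
  mutate(x1_by_c1 = x1 * c1)

specs <- setup(data = dat,
               x = "x1",
               y = c("y1", "y2"),
               model = "lm",
               controls = c("c1", "x1_by_c1"))  # specs with both terms = interaction model

results <- specr(specs)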

Progress bars

The run_specs() function should include some sort of progress visualization. Otherwise, one does not know whether the function is working or not.... ;)

Run a list of predefined specifications

Hi there,

I am trying to run a specification curve analysis on a predefined list of specifications that I generate using my own code. This list is a subset of the combinations of a very large set of variables. Theoretically, I could run analyses for my list of specifications by running all combinations and then subsetting the output to include only the relevant specifications, but I think that this is computationally prohibitive. Ideally, I would like to use a dataset of the same format as the output of the setup_specs() function as input for the run_specs() function. This would provide more control with respect to the list of specifications that are run. Is this currently possible or should I fork and modify the code?

I am thinking about something like this:

df_specs <- my_own_setup_specs(...)
run_specs(df_specs) 

Guidance would be much appreciated!
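
In specr >= 1.0.0, one way to get close to this is to build the full grid with setup() and then filter the specification tibble stored inside the returned object before fitting. The assumption that the specifications live in `specs$specs` is based on the 1.x vignettes and should be verified; a sketch:

library(dplyr)
library(specr)

specs <- setup(data = example_data,
               x = c("x1", "x2"),
               y = c("y1", "y2"),
               model = "lm",
               controls = c("c1", "c2"))

# Keep only the specifications you consider plausible (purely illustrative filter)
specs$specs <- specs$specs %>%
  filter(!(x == "x2" & y == "y2"))

results <- specr(specs)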

np Package errors

Hello,

I'm running specr with the np package and encountering an issue.
Here's the code

kde.reg <- function(formula, data){
  bw <- npregbw(formula=formula,
                data = data,
                nmulti = 5,
                ckertype = "epanechnikov", ukertype = "liracine",
                regtype="lc", bwmethod="cv.aic")
  npreg(bw)
}

test_specs <- setup(data = test_df,
                    y = c("y1", "y2"),
                    x = c("x1", "x2", "x3"),
                    model = "kde.reg"
                    )

summary(test_specs, rows = 50)
test_results <- specr(test_specs)

I've attached a snippet of the code error below for context. I'm familiar with the error message and have solved it before by selecting the data for the explanatory variables. However, in specr, selecting inside of the model function causes errors of its own. Would this mean that specr isn't compatible with np or is there a way around this issue?

[screenshot of the error message: npspecr_error]

UPDATE: I've narrowed down the issue. In spec.R, at line 182, when it calls kde.reg, the formula passed in (formula = structure("y1 ~ x2 + 1", class = c("glue", "character"))) seems to be causing the error. I recreated the error by calling npregbw with this formula format. When I replaced it with y1 ~ x2 + 1 as a standalone formula, the error was resolved, even when including the structure(list...)) argument that spec.R creates. However, I'm still unclear on how to resolve this issue.

UPDATE AGAIN: I was able to resolve the error by defining formula like formula=as.formula(formula), since specr defines it as a character when passed into the function. However, I'm running into a tidy error now. Working with tidy on that, but won't delete this for now.
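
Based on the reporter's own resolution, the wrapper can simply coerce the character/glue formula before handing it to npregbw; a sketch:

library(np)

kde.reg <- function(formula, data) {
  formula <- as.formula(formula)  # specr passes a character/glue string, so coerce it first
  bw <- npregbw(formula = formula,
                data = data,
                nmulti = 5,
                ckertype = "epanechnikov", ukertype = "liracine",
                regtype = "lc", bwmethod = "cv.aic")
  npreg(bw)
}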

Categorical independent variable

Not sure if this is truly an issue or just a limitation. I can't seem to run a specr analysis with independent variables that are categorical. I tried using as.factor and as.character; neither works. I tried two different datasets to be sure. Is it a limitation or am I missing something?

Feature request: sets of control variables

When I'm considering which control variables to add, I'm often considering one set or another, not one variable at a time. Would you consider a syntax that supports sets of controls?

Another approach would be to consider every possible subset of controls, but the number of possible combinations increases pretty fast.

Here's an example:

library(specr)

# Currently, this code considers models with c1, c2, and c3 individually,
# no controls, and all three together.
results <- run_specs(
  df = dplyr::mutate(example_data, c3=runif(dplyr::n())), 
  y = c("y1", "y2"), 
  x = c("x1", "x2"), 
  model = c("lm"), 
  controls = c("c1", "c2", "c3")
)
dplyr::distinct(results, controls)
#> # A tibble: 5 x 1
#>   controls     
#>   <chr>        
#> 1 c1 + c2 + c3 
#> 2 c1           
#> 3 c2           
#> 4 c3           
#> 5 no covariates

# But what if these models with only one control variable are
# not in my set of "all plausible specifications"?
# Maybe I would consider c1 + c2 or c1 + c3.
results2 <- run_specs(
  df = dplyr::mutate(example_data, c3=runif(dplyr::n())), 
  y = c("y1", "y2"), 
  x = c("x1", "x2"), 
  model = c("lm"), 
  controls = list(c("c1", "c2"), c("c1", "c3"))
)
#> Error in model.frame.default(formula = "y1 ~ x1 + c(\"c1\", \"c2\") + c(\"c1\", \"c3\")", : variable lengths differ (found for 'c("c1", "c2")')
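
A workaround that other users report (see also the "Calculate models for all combinations of covariates" issue below) is to paste each desired set into a single string, so that every string is treated as one atomic control choice. A sketch; note that run_specs() will still add a "no covariates" spec and the union of all strings:

results2 <- run_specs(
  df = dplyr::mutate(example_data, c3 = runif(dplyr::n())),
  y = c("y1", "y2"),
  x = c("x1", "x2"),
  model = c("lm"),
  controls = c("c1 + c2", "c1 + c3")  # each string is one predefined set of controls
)
dplyr::distinct(results2, controls)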

Problem treating x's as pairs

Hi!

I am interested in running a specification curve based on a multilevel model in which I use the within-centered x variable as well as the cluster mean. Therefore, I would like specr to treat my two variables, within_centered and cluster_mean, as x's that are always present together.

Looking at the response to issue 11 I figured that the below code might do the trick.

results <- run_specs(df = centered,
                      y = c("y"),
                      x = c("within_centered + cluster_mean"), 
                      model = c("lmer_ri_1"),
                      controls = c(var1, var2, var3),
                      keep.results = T)

The result of this code is, however, an empty dataframe.

[screenshot of the empty output]

Do you know why? And is it possible to do what I am intending?

Best regards,
Toric

specs using weighted survey data

Is there a way to specify the inclusion of weights in the specifications?

I'm relying on cross-national survey data (European Social Survey), where a combined post-stratification and country population weight is required to produce representative results.
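
There is no dedicated weights argument, but a custom model function can pass the weight column through to the estimator. A sketch; the data frame ess_data and the weight column anweight are hypothetical stand-ins for the combined ESS weight:

library(specr)

# Weighted least squares via a custom model wrapper (hypothetical weight column)
lm_weighted <- function(formula, data) {
  lm(formula = formula, data = data, weights = data$anweight)
}

specs <- setup(data = ess_data,
               x = c("x1", "x2"),
               y = "y1",
               model = "lm_weighted",
               controls = c("c1", "c2"))

results <- specr(specs)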

Testing the effect of an interaction term

Hello, love the package so far - thanks for your work on it!

I want to add an interaction term to my specification curve analysis. How do I do this?

My current set-up is as follows.

setup(data = dat,
           x = c("lonely_mostly_or_not", "hinctnta"),
           y = c("mean_p_eff_cgg_dpr_z", "mean_p_eff_dpr_z"),
           model = c("lmer"),
           controls = c("agea", "icgndra", "jbexpnt", "health_problems"),
           add_to_formula = "(1 | nuts1)") 

I want to test the interaction between lonely_mostly_or_not and hinctnta. I have tried specifying it as an x variable (lonely_mostly_or_not * hinctnta), and I have tried adding it to the add_to_formula argument (lonely_mostly_or_not * hinctnta + (1 | nuts1)). Neither seemed to work, and I didn't see anything about this in the documentation (but I am probably missing something obvious!).

Thanks for your help!

How to customise the colours in specr plots

Thank you once again for building this great package. I have read the Alternative way to visualize specification results section, but I want to know how to change the colours in type (i.e. curve, results, and samplesizes) as part of the 'standard' output plot.

For curve and results, I did + scale_color_manual(values = c("#HEXCODE")) which worked great. However, for samplesize, although it looks like a geom_bar, I am not sure why scale_fill_manual (or equally scale_color_manual) isn't working? It just returns the default grey colour.

Error when using `subsets` argument in `run_specs`

When running the example:

results <- run_specs(df = example_data,
                     y = c("y1", "y2"),
                     x = c("x1", "x2"),
                     model = c("lm", "lm_gauss"),
                     controls = c("c1", "c2"),
                     subsets = list(group1 = unique(example_data$group1),
                                    group2 = unique(example_data$group2)))

I get the following error:

Error in get(as.character(..2)) : object '1' not found

This occurs with my own code and with the example in the vignette.

Models are not found when they are elements of a list.

I have a model called PQP

This code:

run_specs(df = db$db1,
          y = c("y"),
          x = c("x_000"),
          model = "PQP"
          ) -> a

runs with no errors.

Now I put PQP inside a list called models.
This code:

run_specs(df = db$db1,
          y = c("y"),
          x = c("x_000"),
          model = "models$PQP"
          ) -> a

gives back this error:

Error in `dplyr::mutate()`:
! Problem while computing `res = map2(.data$model, formula,
  ~do.call(.x, list(data = df, formula = .y)))`.
Caused by error in `models$PQP`:
! could not find function "models$PQP"

UPDATE:

TimTeaFan provided a debug:

get_model <- function(x) {
  x_str <- str2lang(x)
  if (is.name(x_str)) {
    return(x)
    } else if (is.call(x_str)) {
    eval(x_str)
  }
}

Then specr:::run_spec is corrected like this:

specs %>%
  dplyr::mutate(formula = pmap(.,
                               specr:::create_formula)
  ) %>% tidyr::unnest(formula) %>% 
  dplyr::mutate(res = map2(model,
                           formula,
                           ~ do.call(get_model(.x), list(data = df, formula = .y))))

Update plotting functions

  • Create functions that allow to plot each part of the overall plot individually (including full functionality of ggplot2)
  • Wrapper functions that combines plots (e.g., using cowplot) and allows to further customize the appearance.

Request: allow weighting and clustering

I want to use weights and clustering using fixest::feols, but this uses a comma in the formula, which seems to break specr.

library(dplyr)
library(fixest)

example_data <- example_data %>%
  mutate(weight = runif(n()))

feols(y1~x1+x2 | group2, weights=~weight, data=example_data)
feols(y1~x1+x2 | group2, cluster=~group1, data=example_data)
feols(y1~x1+x2 | group2, cluster=~group1, weights=~weight, data=example_data) # these work

fe <- function(formula,data){
  formula <- as.formula(paste0(formula, "|group2"))
  feols(formula,data)
}

cluster_weight <- function(formula,data) {
  formula <- as.formula(paste0(formula, "|group2,cluster=~group1,weights=~weight"))
  feols(formula,data)
}

results_fe <- run_specs(df = example_data, 
                     y = c("y1", "y2"), 
                     x = c("x1", "x2"), 
                     model = c("fe"),
                     controls = c("c1", "c2"))
plot_specs(results_fe) # this works

results_cw <- run_specs(df = example_data, 
                     y = c("y1", "y2"), 
                     x = c("x1", "x2"), 
                     model = c("cluster_weight"),
                     controls = c("c1", "c2"))

Running the clustering/weighting model, I get this error:
[screenshot of the error message]
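
Because feols() accepts weights and cluster as separate arguments, one possible fix (a sketch, not an official specr feature) is to keep the comma-separated parts out of the pasted formula string entirely and rely on the `weight` column created above:

cluster_weight <- function(formula, data) {
  # only the fixed-effects part goes into the formula; weights/cluster stay arguments
  formula <- as.formula(paste0(formula, " | group2"))
  feols(formula, data = data, weights = ~weight, cluster = ~group1)
}

results_cw <- run_specs(df = example_data, 
                        y = c("y1", "y2"), 
                        x = c("x1", "x2"), 
                        model = c("cluster_weight"),
                        controls = c("c1", "c2"))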

Error in run_spec with input res

Hi,

Thanks for the wonderful specr package - I really love the plot and its compatibility with custom model functions, so I am very much looking forward to using specr in my forthcoming papers!

However, I am currently running into problems when specifying my custom model function. When trying to run a specification curve analysis with an underlying spatial econometrics model from the splm package, the run_specs() function returns the following error message:

Error (Test.R#216): Problem with 'mutate()' input 'res'.
x $ operator is invalid for atomic vectors
i Input 'res' is 'map2(.data$model, formula, ~do.call(.x, list(data = df, formula = .y)))'.

Here a reproducible example:

library(plm)
library(splm)
library(specr)
data(Produc, package = "plm")
data(usaww, package = "splm")

spml_specr <- function (formula, data,...){
  spml(formula = formula, data = data, listw=mat2listw(usaww),
             effect = "twoways", model = "within")
}

results <- run_specs(df=Produc,
                          y=c("gsp", "log(gsp)"),
                          x= c("pcap","log(pcap)"),
                          model = "spml_specr",
                          controls = c("log(pc)","log(emp)","unemp"))

In my debugging attempts, I have already specified a custom tidy function and I also re-installed the specr package with all dependencies. However, the error persists. I really would like to adjust my custom model function spml_specr() to connect spml() with specr, but I just can't figure out the reason for the error message. The run_specs() function is impressive, but I am not familiar enough with pipe operators and the map() functions to be able to trace the origin of the error.

Can anybody please give me a hint? I would appreciate it very much. Thanks a lot!

Calculate models for all combinations of covariates

Hi there,

I am trying to run specifications for all possible combinations of covariates. Currently I only get specifications for each individual covariate and then an additional one that has all covariates. I would like to look at all possible subsets as well.

As a workaround, I have generated all combinations and pasted them into model formula strings using helper code I wrote myself (e.g.: 'cov1 + cov2 + cov3', 'cov1 + cov2', 'cov1 + cov3', etc.). When passing this to the controls argument, it seems to calculate the respective models. However, there are two issues: 1) It crashes once I use a high number of model specifications (e.g. >4000). 2) The plot doesn't correctly show which covariates were used; instead, it always just colors the "all_covariates" option.

e.g.

covariates <- c("airport_dist", "conservative", 'male', 'age', 'popdens',
                'manufact', 'tourism', 'academics', 'medinc', 'healthcare',
                'y_centroid_county', 'evanrate')


spec_results <- run_specs(df = df_us_slope_prev, 
                     y = c("onset_prev"), 
                     x = c("pers_e"), 
                     model = c("lm"), 
                     controls = covariates)

dim(spec_results)

gives me dimensions of 14x12, but it should be 4095x12 if all combinations were considered.

Guidance would be greatly appreciated!
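
A sketch of the kind of helper described above, using combn() to build every non-empty subset as a single formula string (this remains a workaround; specr itself only crosses "none / each / all", and run_specs() will still append those two extra specs):

# Build every non-empty subset of the covariates as one "control" string
all_subsets <- unlist(lapply(seq_along(covariates), function(k) {
  combn(covariates, k, FUN = function(x) paste(x, collapse = " + "))
}))
length(all_subsets)  # 2^12 - 1 = 4095 subsets for 12 covariates

spec_results <- run_specs(df = df_us_slope_prev,
                          y = c("onset_prev"),
                          x = c("pers_e"),
                          model = c("lm"),
                          controls = all_subsets)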

Request: Perform joint test across specification curves

Hi there,

some authors suggest running additional inferential tests on the distribution of effect sizes and test statistics across the specification curve, e.g. whether the mean/median effect size differs from zero, whether the share of significant results is higher than would be expected under H0, or whether the average test statistic differs from what would be expected under H0.

Source: Simonsohn, Uri, Joseph P. Simmons, and Leif D. Nelson. “Specification Curve Analysis.” Nature Human Behaviour 4, no. 11 (November 2020): 1208–14. https://doi.org/10.1038/s41562-020-0912-z.

What is your opinion on this? Would it make sense to implement (some of) these tests in specr? Can you suggest any current workarounds? Your response would be greatly appreciated.

Best wishes,
Heinrich

Customizable output statistics

Still room for improvement in this regard.
At the moment, we use broom::tidy() to summarize the model statistics, which is a good starting point. However, it would be great if run_specs() further allowed customizing the output beyond this default.
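
In the 1.x interface this has been partly addressed: to the best of our recollection, setup() exposes fun1 and fun2 arguments that control how parameter estimates and fit indices are extracted (verify with ?setup). A sketch:

library(specr)
library(broom)

specs <- setup(data = example_data,
               x = c("x1", "x2"),
               y = c("y1", "y2"),
               model = "lm",
               controls = "c1",
               fun1 = function(x) broom::tidy(x, conf.int = TRUE),  # parameter-level output
               fun2 = function(x) broom::glance(x))                 # model-level output

results <- specr(specs)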

Parallelization of the functions using `furrr`

run_specs() currently uses only one core. In the long run, we should aim at making it as fast as possible (given the large number of specifications that one usually wants to estimate).
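
For illustration only (this is not specr's internal code): the general furrr pattern such a change would build on, mapping the model fits over the rows of a specification tibble on several cores. `specs_df` and its `formula` column are hypothetical.

library(future)
library(furrr)

plan(multisession, workers = 4)   # use 4 local cores

# fit one lm per specification formula, in parallel
fits <- future_map(specs_df$formula, ~ lm(as.formula(.x), data = df))

plan(sequential)                  # reset the plan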

Add support for `lmer` and `glmer` mixed model estimation

Currently, run_specs does not appear to support specification of models with random effects (i.e. glmer or lmer models). I note that since #3, run_specs supports customisable glms, but would love to see a random effects parameter added to run_specs, so e.g. the following usage would be possible:

my_glmer <- function(formula,data){
    glmer(formula, data = data, family = binomial)
}

results <- run_specs(df = example_data,
                      y = c("y1", "y2"),
                      x = c("x1", "x2"),
                      model = c("lm"),
                      controls = c("c1", "c2"),
                      random_groups = c("group1"),
                      random_variance_components = c("1","x1","c1"),
                      subsets = list(group1 = unique(example_data$group1),
                                    group2 = unique(example_data$group2)))

where random_groups would specify the level-2 grouping variables, and random_variance_components would specify whether to include randomly varying intercepts only, or also slopes for the listed variables.
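
Until such arguments exist, a common workaround is to bake the random-effects part into a custom model function; a sketch assuming lme4 (tidying merMod objects may additionally require the broom.mixed package):

library(lme4)

# random intercept for group1; the fixed part comes from specr's generated formula
lmer_ri <- function(formula, data) {
  lmer(as.formula(paste(formula, "+ (1 | group1)")), data = data)
}

results <- run_specs(df = example_data,
                     y = c("y1", "y2"),
                     x = c("x1", "x2"),
                     model = c("lmer_ri"),
                     controls = c("c1", "c2"))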

Singleton control set produces duplicates

When the control set is a singleton, run_specs() outputs duplicates.

library(dplyr)

results <- run_specs(
  df = example_data,
  y = c("y1"),
  x = c("x1"),
  model = c("lm"),
  controls = c("c1")
  # controls = c("c1","c2")
)
plot_specs(results,choices=c("x","y","controls"))
distinct(results,across(x:subsets)) # grab distinct rows

My guess is that it's building the combination category (i.e., c1 + c1), which is the same as c1 by itself.

Problem running PGLM function, even though PLM works fine

Hi!

I work with panel data and have therefore specified a few different custom functions using the PLM package. I input these as the models in run_specs() and it works just fine.

Example:

within.plm.func <- function(formula, data) {
  plm(formula = formula, 
      data = data,
      model = "within")
}

I am, however, interested in running a negative binomial panel data analysis, and for that I need the pglm package, where I can also specify family = negbin.

Example:

within.pglm.func <- function(formula, data) {
  pglm(formula = formula, 
       data = data,
       model = "within",
       family = negbin)
}

When I use this function in run_specs() I get the following error:

Error: Problem with `mutate()` input `coefs`.
x No tidy method recognized for this list.
i Input `coefs` is `map(.data$res, broom::tidy, conf.int = TRUE, conf.level = conf.level)`.

Run `rlang::last_error()` to see where the error occurred.
In addition: There were 50 or more warnings (use warnings() to see the first 50).

I have tried to install the development package, but the error persists.
If you know what causes it or how to fix it, I would greatly appreciate any input.

Best regards,
Toric

Variance decomposition

As an alternative visualization/analysis, provide variance components that show how much of the variability in a predefined parameter is attributable to the different specification choices.
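
Purely illustrative sketch of what this could look like, treating each analytical choice in a run_specs() results tibble as a random factor and inspecting the variance components of the estimates (lme4 is assumed):

library(lme4)

# `results` is a results tibble with an `estimate` column and one column per choice
m <- lmer(estimate ~ 1 + (1 | x) + (1 | y) + (1 | controls) + (1 | subsets),
          data = results)

# variance attributable to each type of analytical choice
as.data.frame(VarCorr(m))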

How does specr handle dummy coded categorical variables

This is probably a dumb question, but how does specr handle dummy coded variables in regression analyses? For example, let's say I dummy coded ethnicity into several new variables, with the biggest group as the reference category (coded as 0). How does specr treat this when it is used as a covariate?

df <- df %>%
  mutate(
    ethnicity_white_european_heritage = ifelse(ethnicg == 1, 1, 0),
    ethnicity_black_caribbean_heritage = ifelse(ethnicg == 2, 1, 0),
    ethnicity_black_african_heritage = ifelse(ethnicg == 3, 1, 0),
    ethnicity_any_other_ethnic_minority = ifelse(ethnicg == 4, 1, 0),
    ethnicity_indian = ifelse(ethnicg == 5, 1, 0),
    ethnicity_pakistani = ifelse(ethnicg == 6, 1, 0),
    ethnicity_bangladeshi = ifelse(ethnicg == 7, 1, 0),
    ethnicity_mixed_race = ifelse(ethnicg == 8, 1, 0)
  )

There's no scenario in which you would consider only some of the n new ethnicity variables on their own as covariates. All the dummy coded variables would have to be entered together, right?

Combinations of subsets

If you define multiple subset variables, you should get separate specs for (a) none, (b) individual subsets, and (c) all combinations of subsets.

run_specs following setup_specs

For my analysis, there's a certain control variable which I want to be included in all specifications.
This is very easy to do after the fact by filtering for only rows where it appears:

 specr_results %>% 
  filter(str_detect(string = controls, pattern = fixed("important_control_var")))

When specr_results is small, this is fine. However, when specr_results is large, actually estimating the models without this crucial control variable is wasteful. I thought I could use setup_specs to set up my estimation tibble, filter that, and then call run_specs on it, thus reducing computation time immensely.

However, it seems that setup_specs is for illustration purposes only(?). There's no way that I see to pass on a setup_specs tibble to run_specs.

If I'm right, I suggest adding an option for run_specs to accept a setup_specs object as input, OR to accept x, y, controls, etc.

P.S

Thanks for a great package!

Error message in Specr

Hi there,

When I try to run through the analysis in the specr package using the example_data or my own data, everything is fine until I try the run_specs command, in which I get the following error message:

> results <- run_specs(df = example_data, 
+                      y = c("y1", "y2"), 
+                      x = c("x1", "x2"), 
+                      model = c("lm", "lm_gauss"), 
+                      controls = c("c1", "c2"), 
+                      subsets = list(group1 = unique(example_data$group1),
+                                     group2 = unique(example_data$group2)))
Error: Problem with `mutate()` input `..1`.
✖ Column `obs` not found in `.data`
ℹ Input `..1` is `.data$obs`.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/dplyr_error>
Problem with `mutate()` input `..1`.
✖ Column `obs` not found in `.data`
ℹ Input `..1` is `.data$obs`.
Backtrace:
  1. specr::run_specs(...)
 33. dplyr:::stop_error_data_pronoun_not_found(...)
 34. dplyr:::stop_dplyr(index, dots, fn, problem = msg, .dot_data = TRUE)
> 

If you have any idea why this error is coming up or how to solve the issue that would be greatly appreciated. Thanks!

Erin

Don't get "no covariates" in fixed effects regression

I'm using fixest::feols to run a fixed effects regression, but the "no covariates" row doesn't show up.

library(dplyr)
library(fixest)
library(specr)

test_formula <- function(formula, data) {
  formula <- as.formula(paste0(formula, " | group1"))  # fixed effects for group1
  feols(formula, data)
}
results <- run_specs(
  df = example_data,
  y = c("y1", "y2"),
  x = c("x1", "x2"),
  model = 'test_formula',
  controls = c("c1", "c2")
)

plot_specs(results)

[screenshot of the resulting plot]

I notice that it works if the regression formula includes a variable before the fixed effects section:

test_formula2 <- function(formula, data) {
  formula <- as.formula(paste0(formula, " + c3 | group1"))  # control for c3 in all specifications
  feols(formula, data)
}
results2 <- run_specs(
  df = dplyr::mutate(example_data, c3 = runif(dplyr::n())),
  y = c("y1", "y2"),
  x = c("x1", "x2"),
  model = 'test_formula2',
  controls = c("c1", "c2")
)

plot_specs(results2)

[screenshot of the resulting plot]

Bug: `plot_specs()` plot is uninterpretable if there are many covariates.

Thank you for your work on this package and sharing it with everyone.

I'm running into an issue where I have a long list of control variables, which causes the output of plot_specs() to be cut off and uninterpretable. See the reprex below, but imagine that there are at least 20 more covariates after the 3 listed. Would it be possible to rename this long label for the controls as simply "all covariates", to correspond with the label "no covariates"?

library(tidyverse)
library(specr)

results_test <- 
  run_specs(iris,
            x = c("Sepal.Length"),
            y = c("Sepal.Width"),
            controls = c("Petal.Length", "Petal.Width", "Species"),
            model = "lm")

plot_specs(results_test)

Created on 2022-08-15 by the reprex package (v2.0.1)
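
A possible interim workaround (a sketch; it simply relabels the combined-controls rows before plotting):

library(dplyr)
library(stringr)

results_test <- results_test %>%
  mutate(controls = if_else(str_detect(controls, fixed(" + ")),
                            "all covariates", controls))

plot_specs(results_test)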

Show choices together with options on the left?

Thanks for this great package! I understand that the charts are based on ggplot2 defaults, but would it be possible to shift the choices to be displayed together with their levels on the left? Given how closely they belong together, that would seem to make it much more readable, as in the example below (unfortunately created with base R plots; code here).

[example plot with choices and their levels shown together on the left]
