ngreifer / cobalt
Covariate Balance Tables and Plots - An R package for assessing covariate balance
Home Page: https://ngreifer.github.io/cobalt/
After creating weights with weightitMSM, bal.tab yields values of Max.Corr.Adj that are > 1, which does not make any sense.
What could be the cause? (just reporting a subset of covariates in the table)
Many thanks!
psDRYEXTRA<-weightitMSM(formula.list =FormulaListDRYEXTRA,
data=FBS,
method = "ps",verbose = T)
bal.tab(psDRYEXTRA, r.threshold = .05, disp.ks = TRUE, which.time = .none)
Balance summary across all time points
Times Type Max.Corr.Adj R.Threshold Max.KS.Adj
pop 1, 2, 3, 4 Contin. 0.0500 Balanced, <0.05 0.2977
city_tt 1, 2, 3, 4 Contin. 1.5959 Not Balanced, >0.05 0.5807
capdist 1, 2, 3, 4 Contin. 1.1841 Not Balanced, >0.05 0.6695
distnearestcountry 1, 2, 3, 4 Contin. 1.7098 Not Balanced, >0.05 0.6100
distownborders 1, 2, 3, 4 Contin. 3.0464 Not Balanced, >0.05 0.2960
Balance tally for treatment correlations
count
Balanced, <0.05 6
Not Balanced, >0.05 36
Variable with the greatest treatment correlation
Variable Max.Corr.Adj R.Threshold
distownborders 3.0464 Not Balanced, >0.05
Effective sample sizes
- Time 1
Total
Unadjusted 5000.
Adjusted 6.57
- Time 2
Total
Unadjusted 5000.
Adjusted 6.57
- Time 3
Total
Unadjusted 5000.
Adjusted 6.57
- Time 4
Total
Unadjusted 5000.
Adjusted 6.57
Sorry, everything is OK with version 4.0.0 on R 3.6.2; the errors appeared with v. 3.9.0 on R 3.5.2.
I have installed cobalt 3.9.0 (for R 3.5.2).
NAs are mistreated in the covariates by bal.tab.formula().
When I have some NAs in a covariate, I get the error message from bal.tab.formula():
Error in `[.data.frame`(C, !vapply(C, all_the_same, logical(1L))) :
undefined columns selected
Besides, when a covariate contains only one value, I also get the error message:
Error: All variables in formula must be variables in data or objects in the global environment.
I tried to run the following code using the test dataset within the cobalt
package:
library("mice"); library("MatchThem")
data("lalonde_mis", package = "cobalt")
#Generate imputed data sets
m <- 10 #number of imputed data sets
imp.out <- mice(lalonde_mis, m = m, print = FALSE)
#Matching for balance on covariates
mt.out <- matchthem(treat ~ age + educ + married +
race + re74 + re75,
datasets = imp.out,
approach = "within",
method = "nearest",
link = "logit",
estimand = "ATT")
bal.tab(mt.out)
However, I get the following error:
Error in imp.complete(mimids$others$source) : 'data' not of class 'mids'
Is there something I'm missing? I tried tracing the error but couldn't exactly find where the check happens for imp.complete.
Hello,
As far as I know, there is no easy way to access the mean covariate balance across times, only the max is available. I'm guessing that could be an easy thing to add to the function?
Minimal example:
library(cobalt)
data("iptwExWide", package = "twang")
library(WeightIt)
Wmsm <- weightitMSM(list(tx1 ~ use0 + gender + age,
tx2 ~ use0 + gender + age + use1 + tx1,
tx3 ~ use0 + gender + age + use1 + tx1 + use2 + tx2),
data = iptwExWide,
method = "ps")
baltab <- bal.tab(Wmsm, un = T)
baltab$Balance.Across.Times
Times Type Max.Diff.Un Max.Diff.Adj
prop.score 1, 2, 3 Distance 0.7862446 0.025135867
use0 1, 2, 3 Contin. 0.2667626 0.055835400
gender 1, 2, 3 Binary 0.2944634 0.026293838
age 1, 2, 3 Contin. 0.3798713 0.070253208
use1 2, 3 Contin. 0.1662348 0.031572818
tx1 2, 3 Binary 0.1694514 0.017114709
use2 3 Contin. 0.1086601 0.031463385
tx2 3 Binary 0.2422819 0.008532322
So we get the max here, but not the mean value. Are you aware of a way to compute those values? It seems it is possible to plot them with love.plot but not to get them directly from bal.tab.
Cheers!
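A minimal sketch of one way to get a mean across time points, assuming the per-time adjusted differences have already been pulled out of the bal.tab() output into plain data frames. The `time_bal` list and all of its values are hypothetical (they are not the numbers printed above), and only base R is used.

```r
# Hypothetical per-time adjusted mean differences (made-up values); in practice
# these would be extracted from the balance table of each time point.
time_bal <- list(
  t1 = data.frame(var = c("use0", "gender", "age"),
                  diff = c(0.056, -0.026, 0.070)),
  t2 = data.frame(var = c("use0", "gender", "age", "use1", "tx1"),
                  diff = c(-0.030, 0.010, 0.040, 0.032, 0.017)),
  t3 = data.frame(var = c("use0", "gender", "age", "use1", "tx1"),
                  diff = c(0.020, 0.012, -0.050, 0.010, -0.005))
)

# Stack the tables, then summarise absolute differences per covariate.
long <- do.call(rbind, time_bal)
mean_abs <- tapply(abs(long$diff), long$var, mean)
max_abs  <- tapply(abs(long$diff), long$var, max)

across_times <- data.frame(
  Mean.Diff.Adj = as.vector(mean_abs),
  Max.Diff.Adj  = as.vector(max_abs),
  row.names     = names(mean_abs)
)
print(across_times)
```

Covariates that only appear at later time points (such as use1 here) are averaged over the time points where they exist.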
Hello, Noah Greifer
I am learning how to use 'cobalt' package for balancing samples from tutorials (https://cran.r-project.org/web/packages/cobalt/vignettes/cobalt.html#using-cobalt-with-multi-category-treatments) and I have a question.
With the following command, I assigned an ID to each of the 614 samples in the 'lalonde' example data.
lalonde$ID <- paste0("ID_", c(1:nrow(lalonde)))
According to the tutorial, we can check "Effective sample sizes" by using bal.tab() function.
The result was as follows :
Effective sample sizes
black hispan white
Unadjusted 243. 72. 299.
Adjusted 138.38 54.99 259.59
I want to get the ID of each of the Adjusted samples.
Could you tell me how to get the IDs of about 451(138+54+259) people?
Yours sincerely,
QANGFQ
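For context, the "Adjusted" row is an effective sample size: a summary of how variable the weights are, not a count of retained people, so it does not correspond to a list of 451 IDs. A minimal sketch, assuming the usual Kish-type formula (squared sum of weights over sum of squared weights) and a hypothetical weight vector `w`; the units that actually contribute to the adjusted sample are simply those with a nonzero weight.

```r
# Kish-type effective sample size (assumed to be the definition in use).
ess <- function(w) sum(w)^2 / sum(w^2)

ids <- paste0("ID_", 1:6)
w   <- c(2, 0.5, 0.5, 1, 0, 0)   # hypothetical weights

print(ess(rep(1, 4)))  # equal weights: ESS equals the number of units
print(ess(w))          # unequal weights: ESS is smaller and usually non-integer

# IDs of the units that carry any weight at all.
contributing_ids <- ids[w > 0]
print(contributing_ids)
```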
Hi! Excited to make use of this tool, but running into some basic issues.
Cobalt version 4.2.0
MatchIt version 3.0.2
R version 3.5.3 (2019-03-11)
OS linux-gnu
The following code generates the following error:
Error: could not find function "str2expression"
xdata <- data.frame(treat = 1 * (runif(100) <= 0.5),
x1 = rnorm(100, 2, 4),
x2 = rnorm(100, 5, 2))
matching <- MatchIt::matchit(treat ~ x1 + x2, data = xdata, distance = "mahalanobis", replace = TRUE)
cobalt::bal.plot(matching)
Let me know if I can clarify anything.
Thanks.
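For reference, `str2expression()` was added to base R in version 3.6.0, so it is not available in the R 3.5.3 reported above. Upgrading R is the more robust route; as a stopgap, a shim along these lines behaves the same way for simple inputs. This is a sketch, not something taken from the package.

```r
# Define a fallback only when running on an R version that lacks the function.
if (!exists("str2expression", mode = "function")) {
  str2expression <- function(text) parse(text = text, keep.source = FALSE)
}

# Either way, the result is an expression vector that can be evaluated.
ex <- str2expression("1 + 2")
print(eval(ex[[1]]))
```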
When I try to save love.plot() plots as WMF or EMF pictures specifying a small picture size, e.g., 3x5 inches, I get the wrong size for the WMF/EMF picture. The plot itself resides in the upper left corner of the picture.
I tried
win.metafile("loveplot1.wmf" ,height=3,width=5)
love.plot(b)
dev.off()
And I tried to save the plot as EMF file from RStudio. The result was the same.
I am trying to find a way of selecting the variables to display in the love.plot().
Following the example in the vignette on love plots, I was hoping to display only 3 variables: age, educ, and married. Is there any way to do that?
In practice, this is useful when including factor variables in the matching procedures (for example industries), and not wanting to display all these dummy variables in the love plot.
I was wondering about the specific formula you use to calculate balance diagnostics for binary variables? I have read and understood your explanation in the function documentation (https://www.rdocumentation.org/packages/cobalt/versions/3.7.0/topics/bal.tab). However, when I check the standardised solution of the function, it does not seem to be consistent with the solution by Austin, 2009 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3472075/), which is often used.
So are you calculating a different standardised solution? If so, how?
In the bal.plot() function I can't pass the alpha argument through to geom_bar() to change the transparency of the colors. I can't pass the position argument either.
Also, love.plot() for sub-classification/stratification doesn't seem to work?
b_s1 <- bal.tab(f_lin, data = Algebra_dat, subclass = "quintile",
method = "subclassification", disp.subclass = TRUE,
estimand = "ATT",
disp.v.ratio = TRUE, un = TRUE)
love.plot(b_s1)
I get this error. I tried adding a facet argument to love.plot() but that doesn't work.
Error in is_not_null(facet) : object 'facet' not found
In addition, when I create a love.plot I get this warning
Standardized mean differences and raw mean differences are present in the same plot.
Use the 'stars' argument to distinguish between them and appropriately label the x-axis.
What is the stars argument? I can't seem to find it in vignettes.
Either provide errors or do MI balance at each time point with no summary
Thanks for providing the package cobalt. I'm trying to use it with the CBPS package, but I have a problem plotting. I can't get "bal.plot()" to generate density or histogram plots of the object generated by CBPS. Even if I specify the "type" and variable name properly, it returns a scatterplot. All of the examples provided online have tried objects generated by other packages such as MatchIt, but none with CBPS. However, other functions like love.plot() are working fine with CBPS. I was wondering if you've tried density plots for objects generated by CBPS?
I started my issue somewhere else cran/cobalt#2 (comment)
Using the example in CRAN
https://www.rdocumentation.org/packages/cobalt/versions/4.3.0/topics/love.plot
Here is my code and error
library(cobalt)
library(WeightIt); data("lalonde", package = "cobalt")
w.out1 <- weightit(treat ~ age + educ + race + married + nodegree + re74 + re75,
data = lalonde)
love.plot(w.out1, thresholds = c(m = .1), var.order = "unadjusted")
Error in as.environment(pos) :
no item called "get(".S3MethodsTable.", envir = asNamespace(i))" on the search list
In addition: Warning messages:
1: In get(".S3MethodsTable.", envir = asNamespace(i)) :
restarting interrupted promise evaluation
2: In get(".S3MethodsTable.", envir = asNamespace(i)) :
internal error -3 in R_decompress1
3: In ls(get(".S3MethodsTable.", envir = asNamespace(i)), pattern = name) :
‘get(".S3MethodsTable.", envir = asNamespace(i))’ converted to character string
I wonder if there is a quick fix, thank you!
I have a list of matchit objects and want to create love plots from them. I'm getting an error using love.plot() with purrr::map().
library(purrr)
library(cobalt)
data("lalonde")
matchits <- vector(mode = "list")
ps_form <- formula(treat ~ age + educ + black + hispan + married)
matchits$nn.wo <- matchit(ps_form, lalonde, method = "nearest", replace = FALSE)
matchits$nn.wr <- matchit(ps_form, lalonde, method = "nearest", replace = TRUE)
matchits$opt.r1 <- matchit(ps_form, lalonde, method = "optimal", ratio = 1)
# These work
love.plot(matchits$nn.wr)
love.plot(matchits[[1]])
# These do not work
map(matchits, love.plot)
# Error in .f(x = bal.tab(.x[[i]])) : could not find function ".f"
map(matchits, ~love.plot(.))
# Error in mc[["x"]][[1]] : object of type 'symbol' is not subsettable
map(matchits, function(x) love.plot(x))
# Error: covs must be a data.frame of covariates.
If you type
devtools::use_cran_badge()
then you should see some text about how to add a line to the top of your README document that adds a little badge for the CRAN version.
Hi. I try to get a bal.tab with preprocessed output from weightit.
I receive the error message: "All weights are zero when treat = TRUE".
However, this is not the case, as all weights are above 1 and none are NA or NULL or whatever.
I have traced the problem to some odd behaviour of the apply function in combination with the check_if_zero function: The check whether all is zero yields "FALSE" if called outside the apply function and "TRUE" (incorrectly) if called via the apply function.
I use latest versions of weightit, cobalt (installed today) and R.
This is from the debugger which puts me in check_if_zero_weights().
Thanks for your help!
Martin
Browse[1]> version
_
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 5.1
year 2018
month 07
day 02
svn rev 74947
language R
version.string R version 3.5.1 (2018-07-02)
nickname Feather Spray
Browse[1]> error
[1] "All weights are zero when treat = TRUE."
Browse[1]> problems
[1] TRUE FALSE
Browse[1]> w.t.mat
Var1 Var2
1 weights TRUE
2 weights FALSE
Browse[1]> problems <- apply(w.t.mat, 1, function(x) all(check_if_zero(weights.df[treat ==
+ x[2], x[1]])))
Browse[1]> problems
[1] TRUE FALSE
Browse[1]> check_if_zero(weights.df[treat == TRUE, "weights"])
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[129] FALSE FALSE FALSE FALSE FALSE
Browse[1]> all(check_if_zero(weights.df[treat == TRUE, "weights"]))
[1] FALSE
Browse[1]> all(check_if_zero(weights.df[w.t.mat[1,2], w.t.mat[1,1]]))
[1] FALSE
Browse[1]> weights.df[treat == TRUE,]
[1] 67.307627 79.826155 1.885578 158.026877 45.196684 56.164825 48.752937 30.879905 11.674147
[10] 26.692058 13.295941 82.674590 196.897668 149.248587 52.762289 39.684289 33.170495 39.308733
[19] 53.171570 41.240477 160.372561 194.005269 60.413108 43.963563 15.230456 42.829538 27.463982
[28] 19.865331 50.847341 186.199997 9.753240 68.585247 36.408196 38.549354 38.755218 40.884345
[37] 29.620660 52.601934 111.332990 56.297451 27.031934 101.012349 34.349574 107.525278 78.727894
[46] 11.777890 70.914950 55.277485 51.883375 71.255899 32.319254 42.992511 72.273144 28.642228
[55] 137.954277 34.807268 60.276977 63.115426 68.655834 133.662160 17.536231 94.708934 65.562180
[64] 60.241740 109.878914 79.942162 28.324305 74.347703 66.622288 26.406760 20.897160 69.370021
[73] 52.737898 46.644920 96.783245 47.111526 35.341429 77.041636 77.046557 30.057204 91.398045
[82] 46.837280 94.873180 37.793427 104.106985 21.611831 18.633768 140.601745 21.072106 84.664917
[91] 171.780325 23.068098 65.262950 45.945273 65.830478 13.585935 14.353937 36.560600 77.410477
[100] 42.240395 11.444596 67.281186 25.100079 117.032776 66.714564 190.680325 27.129495 69.194680
[109] 74.293695 28.874397 32.587939 95.918416 27.744732 94.771610 11.792023 83.279133 31.746677
[118] 36.733866 13.132560 66.008024 40.119701 78.070225 16.603842 35.215006 57.132454 44.612056
[127] 20.949391 81.514315 47.458328 15.125913 70.443032 33.938332 54.767617
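One possible explanation for the session above (not confirmed in the thread) is base R behaviour rather than anything specific to the weights: apply() converts a data frame to a character matrix, logical columns are passed through format() and padded, and all() of an empty vector is TRUE. The sketch below reproduces the `TRUE FALSE` pattern with hypothetical stand-ins; `check_if_zero()` here is an assumed definition, not the package's own.

```r
# Hypothetical stand-ins for the objects in the debugger session above.
treat <- c(TRUE, TRUE, FALSE, FALSE)
weights.df <- data.frame(weights = c(67.3, 79.8, 1.9, 158.0))
# Assumed definition; the package's own check_if_zero() may differ.
check_if_zero <- function(x) abs(x) < sqrt(.Machine$double.eps)

w.t.mat <- expand.grid("weights", c(TRUE, FALSE), stringsAsFactors = FALSE)

# apply() first converts the data frame to a character matrix; the logical
# column goes through format(), which pads TRUE to " TRUE".
as_seen_by_apply <- unname(apply(w.t.mat, 1, function(x) x[2]))
print(as_seen_by_apply)

# treat == " TRUE" matches nothing, so the subset is empty, and all() of an
# empty logical vector is TRUE.
problems <- unname(apply(w.t.mat, 1, function(x)
  all(check_if_zero(weights.df[treat == x[2], x[1]]))))
print(problems)

# Indexing the data frame row by row keeps the logical type intact.
problems_fixed <- vapply(seq_len(nrow(w.t.mat)), function(i)
  all(check_if_zero(weights.df[treat == w.t.mat[i, 2], w.t.mat[i, 1]])),
  logical(1))
print(problems_fixed)
```

This would also explain why the same expression returns FALSE when typed directly with `w.t.mat[1,2]`, since that indexing never goes through the character conversion.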
Under Imports in the package DESCRIPTION file, gridExtra (>= 2.3.0) is listed. gridExtra 2.3.0 does not actually exist; 2.3 does. install.packages() sees these as the same, which is why this has gone unnoticed; however, some package managers (yum, for example) do not, which causes issues on installation.
Suggested fix: replace gridExtra (>= 2.3.0) with gridExtra (>= 2.3) in the DESCRIPTION file.
Hi: I need help trying to install cobalt on my mac:
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
namespace ‘grid’ 3.5.0 is already loaded, but >= 3.6.1 is required
I have just reinstalled Rstudio. What is needed to resolve this namespace 'grid' issue?
All help is appreciated!
I have constructed a CBPS object using CBPS(.... , sample.weights=mydata$myweight). Afterward, cobalt's bal.tab() function gives odd results. For example, the means from bal.tab() don't match the means from the balance() function from the CBPS package. I don't think bal.tab() is incorporating the sample.weights appropriately. Any suggestions?
Here's some example code to reproduce the problem:
library(CBPS)
data(LaLonde)
LaLonde$wgt <- rnorm(rep(1,nrow(LaLonde)), mean = (rep(1,nrow(LaLonde))+LaLonde$treat*.5), sd = .05)
fit <- CBPS(treat ~ age + educ + re75 + re74 + I(re75==0) + I(re74==0), data = LaLonde, ATT = TRUE, sample.weights = LaLonde$wgt)
balance(fit)
library(cobalt)
bal.tab(fit, disp.means = TRUE)
I have CBPS version 0.17 and cobalt version 3.2.0.
I'd suggest that, for a binary variable, interactions be calculated for both values of the variable, not just for 1 (for 0-1 variables). Say we have a continuous variable a and a binary variable b. Currently only the distribution of a*b is assessed. It would be more correct to assess both distributions: a*b and a*(1-b).
Hi there. I'm running bal.tab on the results of a MatchIt run on a dataset of about 120,000 rows. The MatchIt process took about 2 hours to run, producing a matchit object about 229 Mb in size. I tried running bal.tab as follows:
baltab <- bal.tab(m.out1, m.threshold=0.1, binary="std")
and it's taking a long time (still running, currently at over an hour). I was able to run a practice example from the documentation on the lalonde dataset, and that worked fine. I also was able to run matchit on a sample of 2000 rows and ran bal.tab on that (which took about 5 seconds). So I'm confused about why this is taking so long.
I am using R 3.5.1, MatchIt version 3.0.2, and cobalt version 3.6.1.
Thank you!
Edit: I killed the R session after > 4 hours of running, it never seemed to finish.
Hi, the balance table has two distance rows
library(MatchIt)
library(cobalt)
data("lalonde")
m_out <- matchit(treat ~ married, data = lalonde, method = "nearest", distance = "glm")
m_summary <- bal.tab(m_out, un = TRUE)
m_summary$Balance
And output
Type Diff.Un Diff.Adj
distance_0.13725490196086 Distance -0.8240730 0
distance_0.417827298050138 Distance 0.8240730 0
married Binary -0.3236313 0
R version 4.1.0 (2021-05-18), cobalt_4.3.1, MatchIt_4.2.0
When we try to draw a Love plot with no interactions and abs=TRUE, "_1" is added to the names of 0-1 variables. It has no meaning, especially for abs=TRUE. Maybe it would be better to remove "_1" for the variables themselves, leaving "_1" for interactions only?
I am checking the unadjusted standardized difference.
I have the following table:
user h02 N tp
1: control 0 131071 0.98487421
2: control 1 2013 0.01512579
3: user 0 13904 0.97929286
4: user 1 294 0.02070714
I calculated the standardized difference as in Austin:
(0.02070714 - 0.01512579)/sqrt((0.01512579*0.98487421 + 0.02070714*0.97929286)/2) = 0.0420858
But bal.tab returns 0.0056
bal.tab(formula = user ~ h02, data = data)
Balance Measures
Type Diff.Un
h02 Binary 0.0056
Sample sizes
Control Treated
All 133084 14198
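Both numbers can be reproduced from the counts in the table. The raw difference in proportions rounds to 0.0056, the value bal.tab printed, which suggests (an inference, not confirmed in the thread) that the binary covariate was reported as a raw rather than a standardized difference in that call; the `binary = "std"` option that appears elsewhere on this page asks for the standardized version. The sketch uses base R only.

```r
# Counts from the table above.
n_control <- 131071 + 2013
n_user    <- 13904 + 294
p0 <- 2013 / n_control   # proportion with h02 = 1 among controls
p1 <- 294 / n_user       # proportion with h02 = 1 among users

# Raw difference in proportions.
raw_diff <- p1 - p0

# Standardized difference for a binary variable as in Austin (2009):
# difference divided by the square root of the average binomial variance.
smd <- raw_diff / sqrt((p0 * (1 - p0) + p1 * (1 - p1)) / 2)

print(round(raw_diff, 4))
print(round(smd, 4))
```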
When plotting binary categorical data (e.g. sex with the values "male" & "female"), bal.plot() gives the message "The dropped category for [variable] will be set to NA.", leading all bars to be plotted as 100%. This doesn't occur if the variable is recoded as 0/1 or if there are 3 or more possible values. I'm using MatchIt to match; I don't know if this behaviour occurs with other packages.
I'm using cobalt v4.2.4 & MatchIt v3.0.2.
binary categorical variable (male/female)
df <- tibble(sex = sample(c("male", "female"), 100, replace = T),
group = sample(c(0, 1), prob = c(0.7, 0.3), 100, replace = T))
m.out <- matchit(group ~ sex, data = df)
bal.plot(m.out, "sex")
The dropped category for sex will be set to NA.
binary numeric variable (0/1)
df2 <- df %>%
mutate(sex = recode(sex, "male" = 0, "female" = 1))
m.out <- matchit(group ~ sex, data = df2)
bal.plot(m.out, "sex")
categorical variable with 3 values (male/female/unknown)
df3 <- tibble(sex = sample(c("male", "female", "unknown"), 100, replace = T),
group = sample(c(0, 1), prob = c(0.7, 0.3), 100, replace = T))
m.out <- matchit(group ~ sex, data = df3)
bal.plot(m.out, "sex")
When you do this then your github repo will be linked directly from your CRAN page.
I think this function would do it all for you (perhaps you need to follow a step or two, but it should explain)
devtools::use_github_links()
Used in Imbens & Rubin, similar scale to SMD so easy to interpret
Factor treatment makes the sample sizes of the groups switch (i.e., labels are not correct), probably due to using 0 and 1 somewhere or an incorrect binarize() call.
From the twang example:
library(twang)
data(AOD)
mnps.AOD <- mnps(treat ~ illact + crimjust + subprob + subdep + white,
data = AOD,
estimand = "ATE",
verbose = FALSE,
stop.method = c("es.mean", "ks.mean"),
n.trees = 3000)
bal.tab(mnps.AOD)
Error in rep("all", length(errors) - 1) : argument 'times' incorrect
When subset is all FALSE, the error it provides is not informative.
For some strange reason, bal.tab does not work with tibbles. Minimal reproducible example:
df <- tbl_df(lalonde)
treat <- "treat"
outcome <- "re78"
covs <- setdiff(names(df), c(treat, outcome))
covs_df <- dplyr::select(df, -treat, -re78, -nodegree, -married)
bal.tab(covs_df, treat = df[[treat]], method = "weighting")
This results in:
Note: estimand and s.d.denom not specified; assuming ATT and treated.
Error: No names in var.name are names of factor variables in data.
But this works (note that the only difference is that I'm not using tibbles):
data("lalonde", package = "cobalt")
df <- lalonde
treat <- "treat"
outcome <- "re78"
covs <- setdiff(names(df), c(treat, outcome))
covs_df <- dplyr::select(df, -treat, -re78, -nodegree, -married)
bal.tab(covs_df, treat = df[[treat]], method = "weighting")
This gives the desired result:
Note: estimand and s.d.denom not specified; assuming ATT and treated.
Balance Measures:
Type Diff.Un
age Contin. -0.3094
educ Contin. 0.0550
race_black Binary 0.6404
race_hispan Binary -0.0827
race_white Binary -0.5577
re74 Contin. -0.7211
re75 Contin. -0.2903
Sample sizes:
Control Treated
All 429 185
Error in deparse1(substitute(x)) :
could not find function "deparse1"
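For reference, `deparse1()` was added to base R in version 4.0.0, so this error is what an older R would produce. Upgrading R is the more robust route; as a stopgap, a shim along these lines mirrors the base definition. This is a sketch, not something taken from the package.

```r
# Define a fallback only when running on an R version that lacks the function.
if (!exists("deparse1", mode = "function")) {
  deparse1 <- function(expr, collapse = " ", width.cutoff = 500L, ...) {
    paste(deparse(expr, width.cutoff, ...), collapse = collapse)
  }
}

# Either way, the result is always a single string.
out <- deparse1(quote(x + y))
print(out)
```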
Hello,
I want to produce a love plot of the mean covariate balance adjustments from 4 matched samples of the same data (the number of observations is too large to match in its entirety).
I have tried this two ways so far: first by creating a single match object (using the Matching package) that has all four samples included and matched separately as clusters (Matchby).
However when I try to use this object in a bal.tab or love.plot call I get an error:
Error in names(object) <- nm :
'names' attribute [400000] must be the same length as the vector [1]
In addition: Warning messages:
1: Deprecated
2: Deprecated
This is my script:
#Matching call with exact matching on sample no.
SP_2010_GLM <- SP_2010_samples_combined %>% glm(formula = FCL_out ~ SFCL + ELC_Dist + Pop + Slope + Precip + Elevation + Cap_dist + Border_dist + Road_dist + Soil, family = binomial())
SP_2010_covs <- subset(SP_2010_samples_combined, select = -c(T_C, Temp, FCL_out, sample_no.))
X1 <- SP_2010_GLM$fitted #the propensity score
Y1 <- SP_2010_samples_combined$FCL_out #the outcome
Tr1 <- SP_2010_samples_combined$T_C #a vector of the treatment
SP_2010_combined_match <- Matchby(Y=Y1, Tr=Tr1, X=X1, by= SP_2010_samples_combined$sample_no., M=1, replace= TRUE, caliper = 0.5, Weight=1, ties = FALSE)
summary(SP_2010_combined_match)
#Call to bal.tab
SP_2010_combined_balance <- bal.tab(SP_2010_combined_match, treat = SP_2010_samples_combined$T_C, cluster = "sample_no.",
distance = X1, covs = SP_2010_covs, un = TRUE, stats= c("mean.diffs", "ks.statistics"))
Alternatively, I have tried creating a list of the separate match objects after performing matching for each of the samples separately and then introducing this through the 'weights' specification in love.plot, as you suggested in another issue:
library(cobalt)
library(purrr)
match_objects <- vector(mode = "list")
match_objects$sample1 <- Sample1_match
match_objects$sample2 <- Sample2_match
match_objects$sample3 <- Sample3_match
match_objects$sample4 <- Sample4_match
match_formula <- SP_2010_sample1 %>%formula(FCL_out ~ SFCL + ELC_Dist + Pop + Slope + Precip + Elevation + Cap_dist + Border_dist + Road_dist + Soil)
love.plot(match_formula, data = SP_2010_sample1, weights = map(match_objects, get.w))
However this call runs a long time without producing a result, where am I going wrong?
Apologies if my explanation is unclear; I am still relatively new to R.
Many thanks, Ben.
Hello,
Not really a bug/issue with cobalt, rather a question about SMDs I'd be grateful if you could help me with.
Following a 1:1 NNM matching, some of the treated subjects are left unmatched. When computing the SMD, cobalt (with the option s.d.denom = "treated") uses the SD in all treated subjects, i.e., including those unmatched. This is consistent with MatchIt's behaviour.
In a similar fashion, cobalt with the option s.d.denom = "pooled" computes the denominator of the SMDs using the SD in all untreated subjects (matched and unmatched).
I understand that the denominator of a SMD is –at the end of the day– arbitrary: it's just a value used to standardise the MD (duh!) and we could use –in principle– the SD of any population.
However, I wonder if you have any reference that supports the use of just those SDs as opposed to the SDs in the subjects (treated and untreated) who are successfully matched.
> m <- bal.tab(trt ~ x,
+ data = s,
+ method = "weighting",
+ s.d.denom = "pooled",
+ weights= s$w,
+ continuous = "std")
# 1:1 matching, weights are either 0 or 1
> with(s, table(w))
w
0 1
6468 1120
# 560 untreated subjects matched to 560 treated subjects
> m
Balance Measures
Type Diff.Adj
x Contin. 0.005
Effective sample sizes
Control Treated
Unadjusted 6997 591
Adjusted 560 560
> m$Balance
Type M.0.Un SD.0.Un M.1.Un SD.1.Un Diff.Un M.Threshold.Un V.Ratio.Un V.Threshold.Un KS.Un KS.Threshold.Un M.0.Adj
x Contin. 1.362974 0.5868113 2.14297 0.9667456 0.9753969 NA NA NA NA NA 2.050798
SD.0.Adj M.1.Adj SD.1.Adj Diff.Adj M.Threshold V.Ratio.Adj V.Threshold KS.Adj KS.Threshold
x 0.862962 2.054828 0.7274142 0.005040633 NA NA NA NA NA
> smd_pooled <- setNames((m$Balance["M.1.Adj"] - m$Balance["M.0.Adj"]) / sqrt(.5*m$Balance["SD.1.Un"]^2 + .5*m$Balance["SD.0.Un"]^2), nm = "SMD")
> smd_pooled
SMD
x 0.005040633
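To make the comparison concrete, the sketch below recomputes the SMD from the rounded values printed in m$Balance above, once with the pooled SD of the full (unadjusted) sample and once with the pooled SD of the matched sample. The helper name `pooled_sd` is introduced here for illustration only.

```r
# Rounded values copied from the m$Balance output above.
m1_adj  <- 2.054828;  m0_adj  <- 2.050798    # matched-sample means
sd1_un  <- 0.9667456; sd0_un  <- 0.5868113   # SDs in the full sample
sd1_adj <- 0.7274142; sd0_adj <- 0.862962    # SDs in the matched sample

# Pooled SD: square root of the average of the two group variances.
pooled_sd <- function(s1, s0) sqrt(0.5 * s1^2 + 0.5 * s0^2)

smd_full_sample    <- (m1_adj - m0_adj) / pooled_sd(sd1_un, sd0_un)
smd_matched_sample <- (m1_adj - m0_adj) / pooled_sd(sd1_adj, sd0_adj)

print(smd_full_sample)     # close to the Diff.Adj reported above
print(smd_matched_sample)  # alternative denominator from matched subjects only
```

In this example the two denominators are nearly equal, so the choice barely changes the result; that need not hold in other data.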
If I take a constant independent variable, I get the error
Error in relevel.factor(C[[i]], levels(C[[i]])[2]) :
'ref' must be an existing level
Code to reproduce the problem:
n=20
a=sample(2,n,replace=T)-1
b=runif(n)
c=rep(0,n)
l=glm(a~b+c)
matched=sample(2,n,replace=T)-1
b1=bal.tab(a~b+c,weights=matched,method="matching",s.d.denom="pooled")
Previous versions of Cobalt just silently removed constant variables from the analysis. The most convenient way may be that constant variables are removed with a warning, so that the script is not stopped.
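Until the behaviour is settled, constant covariates can be dropped before calling bal.tab(). A minimal sketch of the "remove with a warning" idea, using base R only; the helper name `drop_constant_covs` is hypothetical and not part of cobalt.

```r
# Drop columns that take a single (non-missing) value, warning about each.
drop_constant_covs <- function(covs) {
  is_const <- vapply(covs, function(x) length(unique(x[!is.na(x)])) <= 1L,
                     logical(1))
  if (any(is_const)) {
    warning("Dropping constant covariate(s): ",
            paste(names(covs)[is_const], collapse = ", "), call. = FALSE)
  }
  covs[, !is_const, drop = FALSE]
}

set.seed(1)
n <- 20
covs <- data.frame(b = runif(n), c = rep(0, n))

# The warning is suppressed here only to keep the example output clean.
kept <- suppressWarnings(drop_constant_covs(covs))
print(names(kept))
```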
Hi,
from the twang example, if treat has character levels, it works:
library(twang)
data(AOD)
mnps.AOD <- mnps(treat ~ illact + crimjust + subprob + subdep + white,
data = AOD,
estimand = "ATE",
verbose = FALSE,
stop.method = c("es.mean"),
n.trees = 3000)
bal.tab(mnps.AOD)
If, however, you add: levels(AOD$treat) <- 1:3
at the beginning, you get
Error in `[.data.frame`(do.call("cbind", unname(lapply(bal.tab.multi.list, :
undefined columns selected
love.plot and bal.tab are not dealing correctly with the formula object when there are inline operations applied to a variable. The bug occurs only when the raw variable is not included in the formula object.
Replicable code:
library(MatchIt)
data("lalonde", package = "cobalt")
#Works
m.out1 <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75, data = lalonde)
love.plot(m.out1, var.order = "unadjusted")
#Does not work
m.out1 <- matchit(treat ~ log(age) + educ + race + married + nodegree + re74 + re75,data = lalonde)
love.plot(m.out1, var.order = "unadjusted") #KO
#Works again
m.out1 <- matchit(treat ~ log(age) + age + educ + race + married + nodegree + re74 + re75,data = lalonde)
love.plot(m.out1, var.order = "unadjusted")
The behavior is true in both versions: cobalt_4.2.2 and cobalt_4.2.1
Environment for replication (also works on 4.x).
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.5
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] cobalt_4.2.1 MatchIt_3.0.2
loaded via a namespace (and not attached):
[1] rstudioapi_0.11 knitr_1.29 magrittr_1.5 MASS_7.3-51.6 tidyselect_1.1.0 munsell_0.5.0 colorspace_1.4-1 R6_2.4.1 rlang_0.4.6 dplyr_1.0.0
[11] tools_3.6.1 grid_3.6.1 gtable_0.3.0 xfun_0.15 htmltools_0.5.0 ellipsis_0.3.1 yaml_2.2.1 digest_0.6.25 tibble_3.0.1 lifecycle_0.2.0
[21] crayon_1.3.4 purrr_0.3.4 ggplot2_3.3.2 vctrs_0.3.1 glue_1.4.1 evaluate_0.14 rmarkdown_2.3 pillar_1.4.4 compiler_3.6.1 backports_1.1.8
[31] generics_0.0.2 scales_1.1.1 pkgconfig_2.0.3
I'd suggest that the names for interactions in love.plot() be taken from the names of variables in the var.names parameter. Now the names for interactions are taken from the names of variables in the code.
It seems that love.plot() with abs=FALSE changes the sign of the adjusted (or of the unadjusted) differences. When the unadjusted and adjusted differences have different signs, they are displayed as if they had the same sign.
bal.tab()$Max.Imbalance.Variances$V.Ratio.Adj contains the value with maximal absolute value. So it can be <2 and >0.5 when there are variance ratios <0.5.
The same issue is with summary output. The variable with maximal absolute variance ratio is printed, so it can be balanced even when there are unbalanced variables with respect to variance ratio.
I would also suggest introducing a simple function like feasible.matching(b), where b=bal.tab(...), which would check whether all variables are balanced. It would be helpful, e.g., for finding feasible calipers for matching.
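The point about variance ratios can be illustrated with made-up numbers: taking the largest ratio misses ratios far below 1, whereas comparing each ratio with its reciprocal treats 0.4 and 2.5 as equally imbalanced. The values and the threshold of 2 below are hypothetical, and the final line is only a sketch of the kind of check a function like feasible.matching() might perform.

```r
# Hypothetical adjusted variance ratios.
v_ratio <- c(age = 1.8, educ = 0.4, re74 = 1.1)

# Largest ratio: picks age (1.8), which is within (0.5, 2).
largest <- names(which.max(v_ratio))

# Symmetric version: ratios below 1 are replaced by their reciprocals.
sym_ratio <- pmax(v_ratio, 1 / v_ratio)
worst <- names(which.max(sym_ratio))   # educ, since 1 / 0.4 = 2.5

# A single yes/no answer: are all variables within the threshold?
all_balanced <- all(sym_ratio < 2)

print(largest); print(worst); print(all_balanced)
```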
I am attempting to assess the covariate balance following propensity score weighting implemented in the WeightIt package. When I use method = "ps" in WeightIt, everything functions normally. However, when I use method = "gbm" and try to assess the balance with bal.tab I get the following love plot:
as well as the warning message, "Warning message:
Large mean differences detected; you may not be using standardized mean differences for continuous variables."
I tried to make sure standardized differences were being used with set.cobalt.options(binary = "std", continuous = "std"), but this did not resolve the problem. The difference in prop.score does seem to be sensitive to the stop.method used, but in all cases it's still way larger than I would expect. I'm not sure what else to try, and would greatly appreciate any advice. My code is below, but it's all standard stuff, and again, it works fine when method = "ps" so I'm not sure what's going on here. Thanks much.
weight.gbm <- weightit(RCTflag ~ Urbanicity + Region + GradRate + StTchRatio + TotEnroll + FARMS + StudentN + Grade10 + Grade11 + Grade12, data=mydata.complete, method="gbm", estimand="ATT", stop.method="es.mean")
weight.balance <- bal.tab(weight.gbm, un = TRUE)
weight.balance
love.plot(weight.balance, thresholds = .25, title="GBM weighting")
I created a matchit object setting distance="mahalanobis" and exact=~specialty + event_month, where specialty is of character type (only two possible values) and event_month is of date type.
Calling summary on the matchit object correctly returns the balance statistics:
The event_month variable was implicitly converted to numeric, while specialty seems to be converted into 0s or 1s for each possible value. For both variables, summary.matchit is able to compute a std. mean difference.
However, calling bal.tab on the matchit object results in the error:
Any advice on how to handle this error?
If I take a logical treatment indicator in the corresponding formula for bal.tab, I get the error:
Error: The argument to treat must be a vector of treatment statuses or the (quoted) name of a variable in data that contains treatment status.
Code to reproduce the problem:
n=20
a=as.logical(sample(2,n,replace=T)-1)
b=runif(n)
l=glm(a~b)
matched=sample(2,n,replace=T)-1
b1=bal.tab(as.integer(a)~b,weights=matched,method="matching",s.d.denom="pooled")
b2=bal.tab(a~b,weights=matched,method="matching",s.d.denom="pooled")
It would be convenient to have bal.tab() accepting logical values as well.
I see that you have a file do_not_include/tests.R. It doesn't seem to me that this is strictly in the unit testing framework. If it's not, I think a huge improvement would be to implement a unit testing framework, like from the testthat package.
An easy way to do this is to type in devtools::use_testthat(), and the directory structure will be added. I'd be happy to help you set up your first couple of tests to get you started. If you're interested, please respond below (but don't close the issue)
Make sure inputs to aes() are correct.
geom_point now supports strings, so update that in love.plot
Update facet_grid with new syntax
Hello, Noah Greifer
I would like to estimate the effect of a continuous exposure on a binary outcome.
For my dataset, the exposure is 'ADLScaled'; the covariates are 'Age', 'sex', 'HT', 'DM', 'Stroke', and 'MI'; the outcome is 'sequela'; and the weights are 'swtTrimmed'.
I have run the following R code, but I cannot get the adjusted correlations.
library(cobalt)
library(data.table)
dt <- fread('dt_sample.csv')
dt_covs <- dt[, .(Age, sex, HT, DM, Stroke, MI)]
baltab <- bal.tab(x = dt_covs,
data = dt,
treat = 'ADLScaled',
method = 'weighting',
weigths = 'swtTrimmed',
un = T,
thresholds = 0.1
)
The output is as follows, and only 'Corr.Un' is shown (Corr.Adj is not):
Balance Measures
Type Corr.Un R.Threshold.Un
Age Contin. -0.8869 Not Balanced, >0.1
sex_男 Binary 0.0013 Balanced, <0.1
HT Binary -0.5920 Not Balanced, >0.1
DM Binary -0.5876 Not Balanced, >0.1
Stroke Binary -0.5057 Not Balanced, >0.1
MI Binary -0.5535 Not Balanced, >0.1
Balance tally for treatment correlations
count
Balanced, <0.1 1
Not Balanced, >0.1 5
Variable with the greatest treatment correlation
Variable Corr.Un R.Threshold.Un
Age -0.8869 Not Balanced, >0.1
Sample sizes
Total
All 15000
I would appreciate it if you could tell me how to calculate adjusted (weighted) correlation values for continuous exposure.
The dataset is as follows: dt_sample.csv
Sincerely yours,
yohei-h
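Note that the call above spells the argument `weigths`; whether that is the reason Corr.Adj is missing is not confirmed here, but it is worth checking. Independently of cobalt, a weighted Pearson correlation between an exposure and a covariate can be computed by hand. The sketch below uses simulated data and a hypothetical helper `w_cor`; cobalt's own definition of the adjusted correlation may standardize differently.

```r
# Weighted Pearson correlation: weighted covariance over weighted SDs.
w_cor <- function(x, y, w) {
  w  <- w / sum(w)
  mx <- sum(w * x)
  my <- sum(w * y)
  sum(w * (x - mx) * (y - my)) /
    sqrt(sum(w * (x - mx)^2) * sum(w * (y - my)^2))
}

# Simulated stand-ins for an exposure, a covariate, and weights.
set.seed(1)
n <- 200
covariate <- rnorm(n)
exposure  <- 0.5 * covariate + rnorm(n)
w         <- runif(n, 0.5, 2)

print(w_cor(exposure, covariate, rep(1, n)))  # equal weights: matches cor()
print(w_cor(exposure, covariate, w))          # weighted version
```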
It seems the requirement on the CRAN page for grid for version 3.8 of cobalt is what's causing the issue (grid (≥ 3.6.1))
Warning in install.packages :
dependency ‘grid’ is not available
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
namespace 'grid' 3.5.3 is already loaded, but >= 3.6.1 is required
I am running R 3.5.3 and would prefer not to update to 3.6 yet -- is it possible to fix the dependency issue so that 3.8 can be installed, or is grid 3.6.1 or greater strictly required? If so, could you update the R version requirement from 3.3.0?
Thanks for the excellent package.
Hi, I have a suggestion to add # of units discarded in bal.tab()$Observations when using subclassification with discard.
For example,
library(MatchIt)
library(cobalt)
data("lalonde")
m_out <- matchit(treat ~ age + educ + race + married + nodegree,
data = lalonde, method = "subclass", distance = "glm", discard = "both")
m_summary <- bal.tab(m_out)
m_summary$Observations
Currently it returns
1 2 3 4 5 6 All
Control 297 21 24 15 9 14 429
Treated 31 31 30 31 31 30 185
Total 328 52 54 46 40 44 614
Add a column of "discarded" so that the numbers add up to "All".
1 2 3 4 5 6 Discarded All
Control 297 21 24 15 9 14 49 429
Treated 31 31 30 31 31 30 1 185
Total 328 52 54 46 40 44 50 614
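The proposed column can be derived from the existing table, since the discarded count is just "All" minus the sum of the subclass columns. A minimal sketch in base R, with the table retyped by hand as a matrix (the real m_summary$Observations object may be structured differently).

```r
# Subclass counts and totals copied from the output above.
obs <- rbind(
  Control = c(297, 21, 24, 15,  9, 14, 429),
  Treated = c( 31, 31, 30, 31, 31, 30, 185),
  Total   = c(328, 52, 54, 46, 40, 44, 614)
)
colnames(obs) <- c(1:6, "All")

# Discarded units are those counted in "All" but in no subclass.
subclasses <- as.character(1:6)
discarded  <- obs[, "All"] - rowSums(obs[, subclasses])

obs_with_discarded <- cbind(obs[, subclasses],
                            Discarded = discarded,
                            All = obs[, "All"])
print(obs_with_discarded)
```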