
ergm's Introduction

ergm: Fit, Simulate and Diagnose Exponential-Family Models for Networks


An integrated set of tools to analyze and simulate networks based on exponential-family random graph models (ERGMs). 'ergm' is a part of the Statnet suite of packages for network analysis. See Hunter, Handcock, Butts, Goodreau, and Morris (2008) doi:10.18637/jss.v024.i03 and Krivitsky, Hunter, Morris, and Klumb (2023) doi:10.18637/jss.v105.i06.

Public and Private repositories

To facilitate open development of the package while giving the core developers an opportunity to publish on their developments before opening them up for general use, this project comprises two repositories:

  • A public repository statnet/ergm
  • A private repository statnet/ergm-private

The intention is that all developments in statnet/ergm-private will eventually make their way into statnet/ergm and onto CRAN.

Developers and Contributing Users to the Statnet Project should read https://statnet.github.io/private/ for information about the relationship between the public and the private repository and the workflows involved.

Latest Windows and MacOS binaries

A set of binaries is built after every commit to the repository. We strongly encourage testing against them before filing a bug report, as they may contain fixes that have not yet been sent to CRAN. They can be downloaded through the following links:

You will need to extract the MacOS .tgz or the Windows .zip file from the outer .zip file before installing. These binaries are usually built under the latest version of R and their operating system and may not work under other versions.

You may also want to install the corresponding latest binaries for packages on which ergm depends, in particular statnet.common and ergm.count.

ergm's People

Contributors

alexjiahaowang, aryakarami, chad-klumb, cloehle, drh20drh20, facorread, handcock, jeffreyhorner, joycecheng, kecoli, krivit, lxwang2, martinamorris, mbojan, schmid86, sgoodreau, skyebend

ergm's Issues

Should *cov() terms be able to return matrix predictors?

Given the more sophisticated UI we are implementing, it might make sense to also enable *cov() terms to add (n * p) matrix predictors, adding more than one statistic to the model. For example, nodecov(~poly(age,2)) would return a two-column matrix with orthogonalised age effects, equivalent (up to an affine transformation) to nodecov(~age)+nodecov(~age^2).

Similarly, we might want to be able to interpret factors, so that, say, nodecov(~factor(occupation)) would produce a matrix of dummy variables equivalent to nodefactor(~occupation).

This should be a backwards-compatible change.

Warnings generated when building current ergm@master from source

MHproposals.c:110:3: warning: variable 'logratio' is used uninitialized whenever 'for' loop exits because its condition is false [-Wsometimes-uninitialized]
  BD_LOOP({
  ^~~~~~~~~
./ergm_MHproposal.h:100:23: note: expanded from macro 'BD_LOOP'
#define BD_LOOP(proc) BD_COND_LOOP({proc}, TRUE, 1)
                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./ergm_MHproposal.h:104:22: note: expanded from macro 'BD_COND_LOOP'
  for(trytoggle = 0; trytoggle < MAX_TRIES*tryfactor; trytoggle++){     \
                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
MHproposals.c:128:20: note: uninitialized use occurs here
  MHp->logratio += logratio;
                   ^~~~~~~~
MHproposals.c:110:3: note: remove the condition if it is always true
  BD_LOOP({
  ^
./ergm_MHproposal.h:100:23: note: expanded from macro 'BD_LOOP'
#define BD_LOOP(proc) BD_COND_LOOP({proc}, TRUE, 1)
                      ^
./ergm_MHproposal.h:104:22: note: expanded from macro 'BD_COND_LOOP'
  for(trytoggle = 0; trytoggle < MAX_TRIES*tryfactor; trytoggle++){     \
                     ^
MHproposals.c:109:18: note: initialize the variable 'logratio' to silence this warning
  double logratio;
                 ^
                  = 0.0
1 warning generated.
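The warning is about control flow rather than the proposal logic: logratio is assigned only inside the body of the loop generated by BD_LOOP, so if the loop condition is false on entry, the later += reads an indeterminate value. Below is a minimal sketch of the pattern, with a plain for loop standing in for the macro; propose_with_retries() and its parameters are hypothetical. Whether 0.0 is the right value when no proposal is made depends on how the caller treats a failed proposal, and this sketch does not decide that.

```c
/* Hypothetical stand-in for a BD_LOOP-style retry loop. Each try produces a
   candidate whose log-ratio is `step_logratio`; the loop stops at the try
   numbered `ok_on_try`, which plays the role of the degree-bound check. */
double propose_with_retries(int max_tries, int ok_on_try, double step_logratio) {
  /* The fix suggested by -Wsometimes-uninitialized: give logratio a value
     that is defined even if the loop body never executes. */
  double logratio = 0.0;
  for (int trytoggle = 0; trytoggle < max_tries; trytoggle++) {
    logratio = step_logratio;        /* a candidate was proposed on this try */
    if (trytoggle == ok_on_try) break;
  }
  return logratio;
}
```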

Current ergm@master causes fatal error on windows

Any model estimation with the master branch of ergm causes R to crash on Windows when ergm::ergm() is called. For example, this simple model:

nw <- network.initialize(100, directed = FALSE)
fit <- ergm(nw ~ edges, target.stats = c(50))

Using debug(), I have pinpointed the crash to this call within ergm():

	nw.stats <- summary(formula.no, response = response, 
			term.options = control$term.options)

Shouldn't the argument to getOption() be quoted?

here: https://github.com/statnet/ergm-private/blob/790afe02397055239451dd4e6d85f44b94f13235/R/logLik.ergm.R#L194

After updating the package to 4.0-4834 I see in the logs (trial run with 1 iteration):

(yaddayadda)
Optimizing with step length 0.418084281310255.
Using lognormal metric (see control.ergm function).
Using log-normal approx (no optim)
Starting MCMC s.e. computation.
The log-likelihood improved by 4.107.
Estimating equations are not within tolerance region.
MCMLE estimation did not converge after 1 iterations. The estimated coefficients may not be accurate. Estimation may be resumed by passing the coefficients as initial values; see 'init' under ?control.ergm for details.
Finished MCMLE.
Stopping the running cluster.
Error in getOption(ergm.loglik.warn_dyads) : 
  object 'ergm.loglik.warn_dyads' not found

rlebdm() construction does not handle network sizes over 46340 (Was: Problem specifying ppopsize=50000)

@krivit @mbojan

I am running into trouble using the ppopsize control, for larger sizes. Our desired size is 50K. I can run 45K but not 50K.

So, this works:

data(faux.mesa.high)
fmh.ego <- as.egodata(faux.mesa.high)
fit <- ergm.ego(fmh.ego ~ edges, 
                         control=control.ergm.ego(ppopsize=45000))

But this throws the error below (I included some traceback, not sure if that helps):

data(faux.mesa.high)
fmh.ego <- as.egodata(faux.mesa.high)
fit <- ergm.ego(fmh.ego ~ edges, 
                         control=control.ergm.ego(ppopsize=50000))


@martinamorris @dth FYI
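The cutoff in the title is consistent with 32-bit integer overflow: 46340 is the largest n for which n*n still fits in a signed 32-bit int (46340^2 = 2147395600, while 46341^2 = 2147488281 exceeds 2^31 - 1 = 2147483647). That rlebdm() keeps an n*n dyad count or dyad index in an int is an assumption here, not something this issue confirms. A minimal C sketch of the arithmetic, with hypothetical helper names:

```c
#include <limits.h>
#include <stdint.h>

/* Number of ordered dyads n*n, computed in 64-bit arithmetic so that it
   cannot overflow for any int n. */
int64_t dyad_count64(int n) {
  return (int64_t)n * (int64_t)n;
}

/* Largest n for which n*n still fits in a signed 32-bit int. Code that
   stores n*n (or an index into the n-by-n dyad space) in an int breaks
   for networks larger than this. */
int max_n_for_int_dyads(void) {
  int n = 0;
  while (dyad_count64(n + 1) <= INT_MAX) n++;
  return n;
}
```

With ppopsize=50000, the dyad count is 2.5 billion, which fits in int64_t but not in int.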

Help wanted -- nodemix term

Let's say we want to build a model for the faux.mesa.high network, with a nodemix term accounting for cross-Grade pairing effects. Specifically, suppose we only want to consider such effects between the two grades with the largest number of members (grade 7 and grade 9). What would be the appropriate way to specify the model?

I tried the following code --

nodemix_try <- ergm(faux.mesa.high ~ edges + nodemix("Grade", base = c(7,9)), estimate = "MPLE")
summary(nodemix_try)

It seems that the effects of all possible pairs of grades are included in the model, even though I specified base = c(7,9) as the argument to nodemix.

May I know what I should do to achieve the aforementioned goal?

Thanks!

switch off warning for diff() terms and bipartite models

Hi Pavel,

I think (but I might be wrong) that using diff() terms in bipartite models is quite reasonable, even though bipartite networks are undirected. If so, the warning message should be switched off:

In InitErgmTerm.R, line 1272:

if(sign.action!="abs" && !is.directed(nw)) message("Note that behavior of term diff() on undirected networks may be unexpected. See help(\"ergm-terms\") for more information.")

should be replaced by something like

if(sign.action!="abs" && !is.directed(nw) && !is.bipartite(nw)) message("Note that behavior of term diff() on undirected, non-bipartite networks may be unexpected. See help(\"ergm-terms\") for more information.")

Best,
Damien

more informative warning for valued ergm model requirements

from statnet help:

Hi,

To model counts, you need to do it in a valued ERGM framework, which requires additional information about the model. See https://statnet.github.io/Workshops/Valued/Valued.pdf for some examples. The error message could be more informative, though.

I hope this helps,
Pavel

On Fri, 2019-01-25 at 07:00 +0000, Douglas Kerlin wrote:

   Hi,


  In short, I am having some issues trying to fit an ergm model using weighted edges, and getting an error message:

"Error in locate.InitFunction(name, NVL2(response, "InitWtErgmProposal",  :
  Metropolis-Hastings proposal ‘TNT’ initialization function ‘InitWtErgmProposal.TNT’ not found". 

Enable ergm_MCMC_slave() to store intermediate networks without returning to R

Currently, simulate.formula() with output!="stats" makes nsim (indirect) calls to .C("MCMC_wrapper"). The C code then

  1. Reinitialises the model struct and the Network struct.
  2. Calls the sampler.
  3. Saves the Network to an edge list.
  4. Destroys model and Network.
  5. Returns.

This also entails copying data between R and C. There are several ways this could be improved:

Let the C code save multiple networks.

This is as simple as passing in more space for the newnetwork* vectors and saving to the edge list after every iteration.
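A sketch of that first option, assuming the caller preallocates single tail and head vectors large enough for all nsim networks, plus an offsets vector marking where each network's edges begin. EdgeStore and edgestore_append() are hypothetical names, not part of ergm. The return value lets the C code report that the preallocated space ran out instead of writing past it.

```c
/* Hypothetical container for several simulated networks' edge lists, packed
   into single tail/head vectors as a .C() caller would preallocate them.
   offsets[k] is the index of network k's first edge; offsets[nnets] is the
   total number of edges stored so far (offsets[0] must start at 0). */
typedef struct {
  int *tails, *heads;
  int capacity;          /* total edge slots preallocated by the caller */
  int *offsets;          /* length max_nets + 1 */
  int nnets, max_nets;
} EdgeStore;

/* Append one network's edge list after a sampling interval.
   Returns 0 on success, -1 if the preallocated space is exhausted. */
int edgestore_append(EdgeStore *s, const int *tails, const int *heads, int nedges) {
  int start = s->offsets[s->nnets];
  if (s->nnets >= s->max_nets || start + nedges > s->capacity) return -1;
  for (int i = 0; i < nedges; i++) {
    s->tails[start + i] = tails[i];
    s->heads[start + i] = heads[i];
  }
  s->nnets++;
  s->offsets[s->nnets] = start + nedges;
  return 0;
}
```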

Make it possible to "pause" and "resume" MCMC sampling.

Rather than deallocating everything, the pointers to all the data structures representing the state of the sampler are saved and returned as perhaps a raw vector. The next call to the C code can then pass this vector in and have the MCMC code resume.

Care would need to be taken to make sure that this pointer is deallocated.
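A minimal sketch of the "pause and resume" idea: the sampler state lives in a heap-allocated struct whose pointer is handed back to the caller, and a separate free function is the single place where it is released. SamplerState, sampler_start(), sampler_resume(), and sampler_free() are hypothetical, and the toy "chain" just adds a fixed step. In R's C API, an external pointer (R_MakeExternalPtr()) with a finaliser registered through R_RegisterCFinalizerEx() is one standard way to make sure such a pointer is eventually deallocated, as an alternative to passing a raw vector around.

```c
#include <stdlib.h>

/* Hypothetical sampler state kept alive between calls instead of being
   rebuilt each time. In ergm this would hold the Network, model, and
   proposal structures; here it only holds a toy chain position. */
typedef struct {
  long iterations_done;
  double position;
} SamplerState;

SamplerState *sampler_start(double init) {
  SamplerState *s = malloc(sizeof *s);
  if (s) { s->iterations_done = 0; s->position = init; }
  return s;
}

/* Run n more steps from wherever the previous call stopped. */
void sampler_resume(SamplerState *s, long n, double step) {
  for (long i = 0; i < n; i++) s->position += step;
  s->iterations_done += n;
}

/* Call exactly once per handle; in an R package this would be the
   finaliser attached to the external pointer. */
void sampler_free(SamplerState *s) { free(s); }
```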

Switch to a .Call interface and allocate return vector dynamically.

This can be viewed as an enhancement on the first two possibilities: .C must preallocate the space for return values, whereas .Call can construct new R objects and pass them back to R.

ergm_MCMC_slave function parameter name changes

In the transition from ergm.mcmcslave to ergm_MCMC_slave in 3.9.4, you changed the names of two parameters:

  1. MHproposal became proposal
  2. eta0 became eta

If that was intentional, I would suggest documenting those changes in your deprecation help page that covers ergm.mcmcslave.

Additionally, in the deprecation function here:
https://github.com/statnet/ergm/blob/3.9.4/R/ergm.getMCMCsample.R#L238-L241

  1. You switch from MHproposal that was in 3.8.0 (
    https://github.com/statnet/ergm/blob/3.8.0/R/ergm.getMCMCsample.R#L241) to proposal without warning
  2. There is a mismatch between eta0 there and eta in the primary function definition here:
    https://github.com/statnet/ergm/blob/3.9.4/R/ergm.getMCMCsample.R#L265

Towards predict.ergm()

I wrote a function that calculates conditional tie probabilities for a given model fit and a network object net by:

  1. Extracting model formula from fit
  2. Calculating observed sufficient statistics for net
  3. For all dyads in net
    a. toggle the dyad
    b. calculate sufficient stats after toggle
    c. calculate change stats (2)-(3b) or (3b)-(2), depending on whether the toggle created the tie or removed it
    d. compute expit based on changestats (c)

I used fix.curved() on (1) if is.curved(fit) to be able to compute sufficient statistics for curved ERGMs in (2) and (3b).

Would that be correct? Seems to be working fine for some simple ERGMs I tried.

Robbins-Monro algorithm to fit ERGM

wants <- c("stringr", "parallel", "data.table", "dplyr", "stringi",
           "ggplot2", "tibble", "mixtools", "openxlsx", "magic", "invgamma",
           "statnet", "ergm", "brainGraph", "intergraph", "CINNA")
has <- wants %in% rownames(installed.packages())
if (any(!has)) install.packages(wants[!has])
lapply(wants, library, character.only = TRUE)

data("faux.mesa.high")
mesa_new_mod <- ergm(faux.mesa.high ~ edges + nodematch("Race", diff = FALSE) +
                       gwesp(0.25, fixed = TRUE),
                     eval.loglik = TRUE,
                     control = control.ergm(main.method = "Robbins-Monro",
                                            MCMC.burnin = 1e6,
                                            MCMC.interval = 5e5))

Help page for the ddsp term is incorrect

There are 5 possible options for the type argument in directed networks (noted correctly in the ddsp help text), plus UTP for undirected networks, but only 4 of the directed options are then listed.

The full list should be:

UTP - Undirected two-path (undirected graphs only)

OTP - Outgoing two-path (i->k->j)

ITP - Incoming two-path (i<-k<-j)

RTP - Reciprocated two-path (i<->k<->j)

OSP - Outgoing shared partner (i->k<-j)

ISP - Incoming shared partner (i<-k->j)

The RTP is missing now from the help.

It would also be good to have this info duplicated in the help for all of the dgw*sp terms -- right now the help for those terms just points users to the help for ddsp.
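The five directed types differ only in the directions of the two ties that pass through the partner k. Below is a minimal C sketch that counts each type of shared partner for one ordered pair (i, j), following the definitions as listed in this issue rather than ergm's own C code; SPType and shared_partners() are hypothetical names. UTP is omitted because, on a symmetric adjacency matrix, it coincides with OTP. ddsp itself tallies dyads by this count, which is not shown.

```c
/* Directed shared-partner types, as defined in the list above. */
typedef enum { OTP, ITP, RTP, OSP, ISP } SPType;

/* adj is an n-by-n 0/1 matrix in row-major order: adj[a*n + b] == 1 means a -> b. */
static int edge(const int *adj, int n, int a, int b) { return adj[a * n + b]; }

/* Number of partners k (k != i, j) of the given type for the ordered pair (i, j). */
int shared_partners(const int *adj, int n, int i, int j, SPType type) {
  int count = 0;
  for (int k = 0; k < n; k++) {
    if (k == i || k == j) continue;
    int hit = 0;
    switch (type) {
    case OTP: hit = edge(adj, n, i, k) && edge(adj, n, k, j); break; /* i->k->j */
    case ITP: hit = edge(adj, n, j, k) && edge(adj, n, k, i); break; /* i<-k<-j */
    case RTP: hit = edge(adj, n, i, k) && edge(adj, n, k, i) &&      /* i<->k<->j */
                    edge(adj, n, k, j) && edge(adj, n, j, k); break;
    case OSP: hit = edge(adj, n, i, k) && edge(adj, n, j, k); break; /* i->k<-j */
    case ISP: hit = edge(adj, n, k, i) && edge(adj, n, k, j); break; /* i<-k->j */
    }
    count += hit;
  }
  return count;
}
```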

etamapping

I am trying to understand how ergm.eta and friends work and I tried:

library(ergm)
#> Loading required package: statnet.common
#> 
#> Attaching package: 'statnet.common'
#> (...)
data(sampson)
# Curved model with offset
gest <- ergm(
  samplike ~ offset(edges) + edges + gwidegree(decay=0.5, fixed=FALSE), 
  offset.coef = -log(network.size(samplike))
  # , control=control.ergm(MCMLE.maxit=1)
)
#> Starting maximum likelihood estimation via MCMLE:
#> Iteration 1 of at most 20:
#> (...) 
#  It converged
is.curved(gest)
#> [1] TRUE
(m <- ergm.etamap(gest))
#> $canonical
#> NULL
#> 
#> $offsetmap
#> NULL
#> 
#> $offset
#> [1]  TRUE FALSE FALSE FALSE
#> 
#> $offsettheta
#> NULL
#> 
#> $curved
#> list()
#> 
#> $etalength
#> [1] 0
ergm.eta(coef(gest), m)
#> numeric(0)

Is this expected? Am I missing something?

Problem in Penalized

I ran into a small strange problem while using MPLE.type "penalized."

The error I ran into was: "Error in if (loglik > loglik.old) break : \n missing value where TRUE/FALSE needed\n"

I traced the problem back to ergm.pen.glm at line 120. From what I can tell, a rounding error was causing pi == 1. I solved this by adding the following lines. I suspect this is very rare, and this is probably not the best way to fix it, but it works well enough:

pi <- ifelse(pi!=1, pi, max(pi[pi<1]) + (1-max(pi[pi<1]))/10)
pi <- ifelse(pi!=0, pi, min(pi[pi>0]) - (min(pi[pi>0])/10))

Processing time on simulate.ergm has substantially slowed due to constraint processing in ergm_proposal

The Problem

Over the summer, there was a revision in how attributes for constraints were stored. I'm not exactly sure why, but they are now stored in an .attributes sublist. It looks like the processing of these attributes by ergm_proposal within simulate.ergm is now substantially slower than before this change. In dynamic simulations that call simulate.ergm repeatedly, this makes many of our "built-in" models in EpiModel that use the full functionality of ergm/tergm effectively unrunnable on laptops during timed events like workshops. One way around this is to use tergmLite 100% of the time, but that would be unfortunate.

Profile Example

Here is an R profile of the clock time it takes to run simulate.ergm on a network of 10k vertices and ~2000 edges:

[Screenshot: profvis profile of simulate.ergm, 2018-12-14]

Clock time was about 3.6 seconds, with 3.4 seconds taken up by ergm:::InitErgmConstraint..attributes. I don't have an equivalent profile of how long this would have taken before the changes over the summer, but my eyeball estimate was 500 ms in the older versions of ergm.

Interestingly, this model (fit with a target stats approach) has no "true" constraints (i.e., arguments passed to the constraints parameter in ergm). We have "effective" constraints on the model by passing in 0's for target stats to our formation terms. Here's the model; we use 0's for the last three target stats:

model_casl <- ~edges +
               nodematch("age.grp", diff = TRUE) +
               nodefactor("age.grp", base = 3) +
               nodematch("race") +
               nodefactor("race", base = 1) +
               nodefactor("deg.main", base = 3) +
               concurrent +
               degrange(from = 4) +
               nodematch("role.class", diff = TRUE, keep = 1:2)

Why did the run time substantially increase? What are the different strategies to reduce the computational time in this use case? Are the new .attributes for constraints relevant in this example? Could we reduce the writing and reading time through ergm:::InitErgmConstraint..attributes for models with no true constraints? Even in models with true constraints (really, only ~bd(maxout = X) ever), do we need ergm:::InitErgmConstraint..attributes?

Reproducible Example for profiling

  1. Download Fitted ERGM
  2. Run this script
#install.packages("profvis")
library(profvis)
est <- readRDS("path/to/dir/artnet.NetEst.Atlanta.rda")
fit <- est[[2]]$fit
profvis(simulate(fit, basis = fit$newnetwork))

cc: @martinamorris @sgoodreau

problem reading the network model from the .RData file

I am trying to build an ERGM and simulate some artificial networks from the model using the statnet suite in R. When I simulate from the model immediately after building it, I can do so without any problems.

netModel <- ergm(net ~ edges + density + triangles + gwdegree(0.25))
simulatedNet <- simulate(netModel, nsim=1)

However, when I save the model to an .RData file and load it some time later, I run into problems. For example:

save(netModel, file = "netModelRdata.Rdata")

and then, later:

load("netModelRdata.Rdata")
simulatedNet <- simulate(netModel, nsim=1)

I get the following error:

Error in eval(expr, envir, enclos) :
Invalid network on the LHS of the formula.

Thank you for your replies in advance.

Update built-in ergm terms to make use of ergm_get_vattr(), ergm_attr_levels(), and ergm_Init_*() API.

Currently, the Init*Ergm*() routines use their own message writing, and Init*ErgmTerm*() functions use get.node.attr(). Recently, new helper functions have been implemented:

  • ergm_get_vattr() can extract and check nodal (vertex) attributes and their transformations in a robust and flexible manner.
  • ergm_attr_levels() can similarly filter levels of a categorical attribute.
  • ergm_Init_(warn|abort|inform)() functions can print sensible error and other messages, including information about which ergm initializer is responsible.

In addition, two constants are exported:

  • ERGM_VATTR_SPEC
  • ERGM_LEVELS_SPEC

which can be used in the vartypes= argument of check.ErgmTerm() to specify permitted argument types for ergm_get_vattr() and ergm_attr_levels() respectively.

Porting existing terms to use these would make the interface for terms that use them more consistent and flexible.

control.ergm param SAN.burnin.times deprecation warning

The control.ergm parameter SAN.burnin.times was changed to SAN.nsteps.times without warning, so existing code generates an error:

Error in control.ergm(MCMLE.maxit = 500, SAN.maxit = 2, SAN.burnin.times = 2) : 
  Unrecognized control parameter: SAN.burnin.times.

Suggest adding a deprecation warning here to changed argument names.

typo in ergm-terms help file - diff() term

Hi,

I think there is a typo in the diff() term description.

In ergm-terms.Rd (lines 832-833), shouldn't t-h and tail-head be h-t and head-tail?

original chunk:

and of \code{sign.action(attrname[j]-attrname[i])^pow} if
    \code{"t-h"}, \code{"tail-head"}, or \code{"b2-b1"}.

modified one:

and of \code{sign.action(attrname[j]-attrname[i])^pow} if
    \code{"h-t"}, \code{"head-tail"}, or \code{"b2-b1"}.

In fact, perhaps it is the piece just before this that needs to be adapted, but something is going wrong somewhere.

I would find it easier to understand if:

  • h-t, head-tail and b1-b2 were associated with sign.action(attrname[i]-attrname[j])^pow
  • t-h, tail-head and b2-b1 were associated with sign.action(attrname[j]-attrname[i])^pow

Best,
Damien

Incorrect warning message for simulate function

Running the following:

library(ergm)
nw <- network.initialize(100, directed = FALSE)
fit <- ergm(nw ~ edges, target.stats = 50)
sim <- simulate(fit)

Now generates this warning message in the new ergm:

Warning message:
You appear to be calling simulate.formula() directly. simulate.formula() is a method, and will not be exported in a future version of ergm. Use simulate() instead, or getS3method() if absolutely necessary.

This only shows up the first time the simulate function is run in an R session.

Add info on gof defaults to man page

Would be useful to know how many networks are simulated by default -- you can see that now by clicking on the link to the control.gof.ergm function, but it's a basic enough bit of info that it would be worth including in the main documentation.

Some C-level diagnostic output produces integer overflows.

When MCMC/CD/SAN/etc. total step count (i.e., sample size * interval) exceeds the maximum signed integer value (2147483647), it produces a signed integer overflow resulting in funny diagnostic output such as

Sampler accepted  15.319% of -1089934592 proposed steps.

This is a benign issue, since the actual iteration uses a nested loop, but it would be good for someone to go through all our sampler code and replace instances of

    if (fVerbose){
	  if (samplesize > 0 && interval > LONG_MAX / samplesize) {
		// overflow
		Rprintf("Sampler accepted %7.3f%% of %d proposed steps.\n",
	      tottaken*100.0/(1.0*interval*samplesize), interval, samplesize); 
	  } else {
	    Rprintf("Sampler accepted %7.3f%% of %d proposed steps.\n",
	      tottaken*100.0/(1.0*interval*samplesize), interval*samplesize); 
	  }
    }

and similar with something more robust.
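One option is to compute the step count in floating point (a double represents every integer up to 2^53 exactly) and print it with %.0f; a 64-bit integer printed with %lld would also work. Note that the overflow branch quoted above passes two integers for a single %d. The sketch below formats into a buffer with snprintf() so that it can be checked in isolation; in the sampler, the same format string would go to Rprintf(). format_acceptance() and its parameters are hypothetical.

```c
#include <stdio.h>

/* Format the acceptance message without integer overflow: the product
   interval * samplesize is computed in double and printed with %.0f. */
int format_acceptance(char *buf, size_t len, double tottaken,
                      int interval, int samplesize) {
  double nsteps = (double)interval * (double)samplesize;
  double pct = nsteps > 0 ? tottaken * 100.0 / nsteps : 0.0;
  return snprintf(buf, len, "Sampler accepted %7.3f%% of %.0f proposed steps.\n",
                  pct, nsteps);
}
```

For example, interval = 65536 and samplesize = 50000 give 3276800000 steps, which would wrap around in a 32-bit int.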

Error in estimation with ERGM 3.9.4 and tergm 3.5.2

I am fitting an ergm and I get the following error:

Error in UseMethod("as.edgelist") :
no applicable method for 'as.edgelist' applied to an object of class "NULL"
Error: $ operator is invalid for atomic vectors

The formula for the ergm is:

Formulas

formation.asmm <- ~edges +
nodefactor("riskg",base = 3)+
nodefactor("yamsm", base=1) +
absdiff("sqrt.age") +
offset(nodefactor("oamsm",base = 1)) +
offset(nodematch("yamsm", diff = TRUE, keep = 2)) +
offset(nodematch("role.class", diff = TRUE, keep = 1:2)) +
offset(nodefactor("debuted",base=2))

If I remove either the nodefactor("riskg", base =3) or the absdiff("sqrt.age") the model will converge. I can get the model to run as is by reverting to ergm_3.8.0 and tergm_3.4.1

Should NetworkInitialize and MHInitialize return a pointer?

Right now, they both return the structure itself (Network and MHproposal, respectively). Having them return pointers makes more sense, since they get passed around as pointers anyway.

Judging by the reverse-LinkingTo: list, the only package that would need to be adjusted is tergm. Any thoughts?
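For comparison, here is a minimal sketch of the pointer-returning style on a deliberately tiny, hypothetical struct (ToyNetwork, ToyNetworkInitialize(), and ToyNetworkDestroy() are illustrative names, not ergm's API). The initialiser allocates and returns a pointer, and a paired destroy function becomes the single owner of deallocation, so callers never copy the struct by value.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified network structure for illustration only. */
typedef struct {
  int nnodes;
  int directed;
  long nedges;
} ToyNetwork;

/* Pointer-returning initialiser: the caller owns the result and must release
   it with ToyNetworkDestroy(). Returns NULL on allocation failure. */
ToyNetwork *ToyNetworkInitialize(int nnodes, int directed) {
  ToyNetwork *nw = malloc(sizeof *nw);
  if (!nw) return NULL;
  nw->nnodes = nnodes;
  nw->directed = directed;
  nw->nedges = 0;
  return nw;
}

void ToyNetworkDestroy(ToyNetwork *nw) {
  free(nw); /* free(NULL) is a no-op, so this is safe after a failed init */
}
```

The cost of the change is that every caller, including LinkingTo packages such as tergm, must pair each initialiser call with the corresponding destroy call.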

"Robbins-Monro" returns with error when using `parallel`

Using multiple cores with main.method="Robbins-Monro" returns an error. These are the steps to reproduce the issue:

This model returns OK

library(ergm)
data(sampson)

> ergm(samplike ~ edges + mutual, control = control.ergm(main.method = "Robbins-Monro", seed=1))
Robbins-Monro algorithm with theta_0 equal to:
    edges    mutual 
-1.760011  2.319627 
Phase 1:  13 iterations (interval=1024)
Phase 1 complete; estimated variances are:
   edges   mutual 
92.46154 22.76923 
Phase 2, subphase 1 : a= 0.1 , 9 iterations (burnin=16384)
theta new: -1.76747333321467 theta new: 2.31435628848381 
Phase 2, subphase 2 : a= 0.05 , 23 iterations (burnin=16384)
theta new: -1.7613086077571 theta new: 2.32393061280814 
Phase 2, subphase 3 : a= 0.025 , 58 iterations (burnin=16384)
theta new: -1.76160377553303 theta new: 2.31726042699733 
Phase 2, subphase 4 : a= 0.0125 , 146 iterations (burnin=16384)
theta new: -1.75814173134114 theta new: 2.32323135290205 
Phase 3:  20 iterations (interval=1024)
Evaluating log-likelihood at the estimate. Using 20 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 .
This model was fit using MCMC.  To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.
MCMC sample of size based on: 
 edges  mutual  
-1.760   2.322  

Monte Carlo MLE Coefficients:
 edges  mutual  
-1.762   2.264  

This returns with error

> ergm(samplike ~ edges + mutual, control = control.ergm(main.method = "Robbins-Monro", seed=1, parallel = 2))
Robbins-Monro algorithm with theta_0 equal to:
    edges    mutual 
-1.760011  2.319627 
Phase 1:  13 iterations (interval=1024)
Phase 1 complete; estimated variances are:
    edges    mutual 
181.64286  40.42857 
Error in ergm.robmon(init, nw, model, MHproposal = MHproposal, verbose = verbose,  : 
  (list) object cannot be coerced to type 'double'

Session info:

Session info ------------------------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.3 (2017-11-30)
 system   x86_64, linux-gnu           
 ui       RStudio (1.1.423)           
 language en_US                       
 collate  en_US.UTF-8                 
 tz       US/Pacific                  
 date     2018-03-23                  

Packages ----------------------------------------------------------------------------------------------------------------------------------------------
 package        * version date       source         
 backports        1.1.2   2017-12-13 CRAN (R 3.4.3) 
 bookdown         0.7     2018-02-18 CRAN (R 3.4.3) 
 coda             0.19-1  2016-12-08 CRAN (R 3.4.1) 
 DEoptimR         1.0-8   2016-11-19 CRAN (R 3.4.3) 
 devtools         1.12.0  2016-06-24 CRAN (R 3.3.1) 
 digest           0.6.15  2018-01-28 cran (@0.6.15) 
 ergm           * 3.8.0   2017-08-20 CRAN (R 3.4.3) 
 evaluate         0.10.1  2017-06-24 CRAN (R 3.4.0) 
 htmltools        0.3.6   2017-04-28 CRAN (R 3.4.1) 
 knitr            1.20    2018-02-20 CRAN (R 3.4.3) 
 lattice          0.20-35 2017-03-25 CRAN (R 3.3.3) 
 lpSolve          5.6.13  2015-09-19 CRAN (R 3.4.3) 
 magrittr         1.5     2014-11-22 CRAN (R 3.3.1) 
 MASS             7.3-49  2018-02-23 CRAN (R 3.4.3) 
 Matrix           1.2-11  2017-08-16 CRAN (R 3.4.1) 
 memoise          1.0.0   2016-01-29 CRAN (R 3.3.1) 
 network        * 1.13.0  2015-09-19 CRAN (R 3.4.0) 
 Rcpp             0.12.15 2018-01-20 cran (@0.12.15)
 rmarkdown        1.9     2018-03-01 CRAN (R 3.4.3) 
 robustbase       0.92-8  2017-11-01 CRAN (R 3.4.3) 
 rprojroot        1.3-2   2018-01-03 CRAN (R 3.4.3) 
 statnet.common * 4.0.0   2017-08-16 CRAN (R 3.4.1) 
 stringi          1.1.6   2017-11-17 CRAN (R 3.4.3) 
 stringr          1.3.0   2018-02-19 CRAN (R 3.4.3) 
 trust            0.1-7   2015-07-04 CRAN (R 3.4.3) 
 withr            2.1.1   2017-12-19 CRAN (R 3.4.3) 
 xfun             0.1     2018-01-22 CRAN (R 3.4.3) 
 yaml             2.1.17  2018-02-27 CRAN (R 3.4.3) 

A formula-style interface for the response= argument.

The idea is two-fold:

  1. Give the user a convenient way to specify the response variable for binary and valued ERGMs when it is a function of edge attributes.
  2. Give the user a way to force a binary ERGM regime even when the response is an edge attribute.

For example, the network dataset tribes in the latentnet package is a complete graph, with the information contained in its edge attributes (pos, neg, sign, and sign.012).

So, for example, something like ergm(tribes~edges, response=binary~pos) or ergm(tribes~edges, response=~pos|binary) would estimate the network of positive relations, in the binary mode.

Convert ergm-terms documentation to Roxygen?

At the moment, all the ergm terms are documented in one big file (ergm-terms.Rd) with a "standard" syntax for keywords. Porting them to Roxygen (somehow) would greatly simplify maintenance in the following ways:

  • Documentation "source" could be placed next to the InitErgmTerm function definitions, for much better maintainability.
  • Using @template, common bits of text (like meanings of arguments common among many terms) only need to be written and updated once.
  • Markdown is less clunky than the Rd format.

AFAIK, roxygen2 has an API for custom @ tags and document types, and we could write them to generate automatically formatted Rd files. Some other things we might be able to do:

  • Generate a help file for each term and a help file with all the terms and/or a term index. This would allow us to link to a specific term's help.
  • @seealso-style cross-referencing for related terms.

Any thoughts?

Use sQuote() and dQuote() in output where practical.

This is a largely aesthetic change, though it might also simplify the code in places. E.g.,

> message("I said ", '"', "Hello world!", '"')
I said "Hello world!"
> message("I said ", sQuote("Hello world!"))
I said ‘Hello world!’
> message("I said ", dQuote("Hello world!"))
I said “Hello world!”

Deprecation warnings

Hi there,

Recently, the ergm package version on CRAN (3.8.0) shows deprecation warnings of the form

Warning: 'term.list.formula' is deprecated.
Warning: 'ergm.update.formula' is deprecated.
Warning: 'append.rhs.formula' is deprecated.

See here: https://cran.r-project.org/web/checks/check_results_ergm.html

My xergm package depends on ergm and thus shows some of those warnings as well. The CRAN maintainers asked me to remove these warnings from my package. When I pointed them to the dependency issue, they asked if I could contact you about it. Thanks for your support.

Add "artificial multiplicity" proposals to SAN

The idea is to make a series of calls to the proposal function before deciding whether to accept or reject them wholesale. Similar functionality already exists in the contrastive divergence code.

Make SAN more efficient

There are several ways to make san() more efficient:

  1. Autodetect when the annealing has "converged" for a given temperature and so should move to the next lower temperature.
  2. A smoother annealing curve: currently, the temperature only changes between C calls.
  3. Keep track of the best network configuration so far, rather than the last one.
  4. Make sure the "covariance" calculation does not get stuck in a feedback loop (i.e., statistics that vary more get a smaller weight, which causes them to vary more, etc.).

Etas and changestats

This is a child of #6

I am having trouble understanding what exactly ergm.eta returns and how it squares with the output of ergmMPLE. For example:

data(sampson)
gest <- ergm(
  samplike ~ edges + gwesp(decay=0.5, fixed=FALSE),
  control=control.ergm(MCMLE.maxit=1)
)
etas <- ergm.eta(coef(gest), gest$etamap)
a <- ergmMPLE(
  samplike ~ edges + gwesp(decay=0.5, fixed=FALSE),
  output="array"
)

Now etas has length 17 (!) while the predictors array has only two layers for edges and GWESP terms. Does etas contain entries for the whole ESP distribution?

Make MacOS dmg and Windows zip package files built by Travis CI and make them available to end-users.

This will enable end-users who do not have MacOS and Windows development tools installed to use unreleased versions of packages. TravisCI + Windows + R doesn't work yet, but probably will soon enough.

The steps seem to be straightforward: https://docs.travis-ci.com/user/deployment/ .

In our case, we want the following behaviour:

  • Deploy after every push, not just the tags or the releases.
  • Group or label deployments by branch.
  • Don't clobber past deployments.
  • Deploy the -private branches to a space that is only accessible by those with access to those repositories.

If this is not feasible, a compromise is to deploy on tagging. This is suboptimal, but is much better than nothing.

ergm fits with low concurrent target.stats fail

With both the CRAN and development versions of ergm, this fit:

nw <- network.initialize(100, directed = FALSE)
fit <- ergm(nw ~ edges + concurrent,
            target.stats = c(50, 0))

errors after 9 iterations with:

Iteration 9 of at most 20:
Error in rbind(if (!is.null(x2)) t(gamma * t(x2crs) + (1 - gamma) * c(m1crs)),  : 
  object 'm2crs' not found

Also, a fit with a low but non-zero target stat for concurrent does not mix:

fit <- ergm(nw ~ edges + concurrent,
            target.stats = c(50, 5))
...
The log-likelihood improved by 0.002221.
Iteration 18 of at most 20:
Error in ergm.MCMLE(init, nw, model, initialfit = (initialfit <- NULL),  : 
  Unconstrained MCMC sampling did not mix at all. Optimization cannot continue.

We have fit these models routinely in the past without error. They are a primary feature of our modeling classes (NME), where they demonstrate the impact of concurrency on epidemic dynamics, so this is a critical issue to fix.

cc: @martinamorris, @sgoodreau

Runtime traceplots and missing categories leads to an error

Hey,

This is a small, odd bug, but I thought I'd report it anyway. If runtime traceplots are turned on and a term's coefficient goes to -Inf, ergm crashes. The code below shows this: the first ergm() call runs; the second fails (there are no C-C edges and the traceplots are turned on); and the third runs again because I've turned off the traceplots.

Thanks for all the work on this,

library(ergm)

set.seed(1)
adj <- matrix(rbinom(400, size = 1, prob = 0.5), 20, 20)
net <- network(adj, directed = FALSE)

# vertices 1-10 are group A, 11-15 are group B, 16-20 are group C
set.vertex.attribute(net, "Group", c(rep("A", 10), rep("B", 5), rep("C", 5)))

#### works
mod <- ergm(net ~ edges + nodematch("Group", diff = TRUE), eval.loglik = FALSE,
            verbose = TRUE,
            control = control.ergm(force.main = TRUE, MCMC.runtime.traceplot = TRUE))

### remove the connections between C and C
net[16:20, 16:20] <- 0

#### FAILS
mod <- ergm(net ~ edges + nodematch("Group", diff = TRUE), eval.loglik = FALSE,
            verbose = TRUE,
            control = control.ergm(force.main = TRUE, MCMC.runtime.traceplot = TRUE))

### Turn off runtime traceplots and it runs again
mod <- ergm(net ~ edges + nodematch("Group", diff = TRUE), eval.loglik = FALSE,
            verbose = TRUE,
            control = control.ergm(force.main = TRUE, MCMC.runtime.traceplot = FALSE))

Error in the curved target+offset test with certain seeds.

As of dd1c70a,

library(ergm)
data(florentine)
set.seed(5)
ergm(flomarriage ~ offset(edges) + edges + gwdegree(fix = FALSE) + degree(0) + offset(degree(1)),
     target.stats = summary(flomarriage ~ edges + gwdegree(fix = FALSE) + degree(0)),
     offset.coef = c(0, -0.25))

With set.seed(5) (as above) or set.seed(15), this produces

Starting contrastive divergence estimation via CD-MCMLE:
Iteration 1 of at most 60:
Convergence test P-value:2.1e-274
Optimizing with step length 1.
Error in check.objfun.output(out, minimize, d) : 
  objfun returned value that is NA or NaN
Error: $ operator is invalid for atomic vectors

However, it works for most seeds.

Edit: Updated to a more recent version and problematic seeds.

New deprecation error for simulate.ergm when statsonly = TRUE

With the latest on the master branch of ergm, simulating with statsonly = TRUE generates a deprecation error:

nw <- network.initialize(100, directed = FALSE)
fit <- ergm(nw ~ edges, target.stats = 50)
sim <- simulate(fit, nsim = 100, statsonly = TRUE)

Error message:

Error in .Deprecate_once(msg = paste0("Use of ", sQuote("statsonly="),  : 
  could not find function ".Deprecate_once"

The error does not arise if statsonly = FALSE.

Session info:

R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ergm_3.10.0-4403 network_1.14-355

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18         rstudioapi_0.7       bindr_0.1.1          magrittr_1.5         MASS_7.3-50          tidyselect_0.2.4    
 [7] lattice_0.20-35      R6_2.2.2             rlang_0.2.2          dplyr_0.7.6          tools_3.5.0          parallel_3.5.0      
[13] grid_3.5.0           packrat_0.4.9-3      lpSolve_5.6.13       coda_0.19-1          assertthat_0.2.0     tibble_1.4.2        
[19] crayon_1.3.4         Matrix_1.2-14        bindrcpp_0.2.2       purrr_0.2.5          trust_0.1-7          robustbase_0.93-2   
[25] glue_1.3.0           statnet.common_4.1.4 DEoptimR_1.0-8       compiler_3.5.0       pillar_1.3.0         pkgconfig_2.0.2  

Passing existing cluster does not seem to work

Creating a cluster manually and passing it to control.ergm(parallel=) does not seem to work:

data(florentine)

fit <- ergm(flomarriage ~ absdiff("wealth") + kstar(1:2) + triangles, 
            control=control.ergm(parallel=2) )


clus <- parallel::makeForkCluster(2)
fit2 <- ergm(flomarriage ~ absdiff("wealth") + kstar(1:2) + triangles, 
            control=control.ergm(parallel=clus) )
parallel::stopCluster(clus)


mcmc.diagnostics(fit) # two chains OK
mcmc.diagnostics(fit2) # one chain?

Add examples and/or unit tests for the san() family of methods.

Currently, san() help does not have any examples, and there are no tests in tests/ testing its functionality directly. (It is tested indirectly, when ergm() is called with target.stats=.)

It would be good to

  • Add an example for using san().
  • Add some tests (in the testthat framework, preferably) for its functionality.

SAN_wrapper error (Updated Issue) -- Windows Only

Originally reported by @andsv2 in EpiModel/EpiModel#308, updated now with a reproducible example and sessionInfo:

This code:

nw <- network.initialize(n = 100, directed = FALSE)
fit <- ergm(nw ~ edges, target.stats = 50)

results in this error:

Error in .C("SAN_wrapper", as.integer(nedges), as.integer(tails), as.integer(heads),  : 
  Incorrect number of arguments (29), expecting 32 for 'SAN_wrapper'

when running the latest master branches of ergm and its dependent packages on Windows. Note: I am not able to reproduce this error on Mac/Linux, only on Windows.

SessionInfo:

R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server >= 2012 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] EpiModel_1.7.2       tergm_3.6.0-1659     ergm_3.10.0-4717     networkDynamic_0.9.0 network_1.14-355     deSolve_1.21        

loaded via a namespace (and not attached):
 [1] tidyselect_0.2.5         remotes_2.0.2            lpSolve_5.6.13           purrr_0.2.5              lattice_0.20-38          colorspace_1.3-2        
 [7] yaml_2.2.0               rlang_0.3.0.1            pkgbuild_1.0.2           pillar_1.3.1             glue_1.3.0               withr_2.1.2             
[13] RColorBrewer_1.1-2       bindrcpp_0.2.2           trust_0.1-7              foreach_1.4.4            bindr_0.1.1              plyr_1.8.4              
[19] robustbase_0.93-3        munsell_0.5.0            gtable_0.2.0             codetools_0.2-15         coda_0.19-2              callr_3.1.0             
[25] doParallel_1.0.14        ps_1.2.1                 parallel_3.5.1           curl_3.2                 DEoptimR_1.0-8           Rcpp_1.0.0              
[31] backports_1.1.3          scales_1.0.0             ggplot2_3.1.0            processx_3.2.1           dplyr_0.7.8              grid_3.5.1              
[37] rprojroot_1.3-2          cli_1.0.1                tools_3.5.1              magrittr_1.5             lazyeval_0.2.1           tibble_1.4.2            
[43] crayon_1.3.4             ape_5.2                  pkgconfig_2.0.2          MASS_7.3-51.1            Matrix_1.2-15            prettyunits_1.0.2       
[49] iterators_1.0.10         assertthat_0.2.0         rstudioapi_0.8           statnet.common_4.2.0-208 R6_2.3.0                 nlme_3.1-137            
[55] compiler_3.5.1

Suggestion for `?ergm`

As a newbie to both ERGMs and the R package, I found it confusing to see that the numbers in the sample slot of the ergm class object neither matched the parameter estimates nor the target network statistics.

It took me at least a couple of hours to figure out that this matrix actually contains the centered network statistics sampled in the last iteration of the fit. Perhaps adding those two pieces of information to the following documentation would help:

From ergm/R/ergm.R, lines 222 to 225 at commit 7433c4b:

#' \item{sample}{The \eqn{n\times p} matrix of network statistics,
#' where \eqn{n} is the
#' sample size and \eqn{p} is the number of network statistics specified in the
#' model, that is used in the maximum likelihood estimation routine.}

Again, this is from someone who just started using ERGMs.

Possible bug in fix.curved handling of dgwesp term

I noticed that the fix.curved function does not behave as expected for the dgwesp terms.

I have a reproducible example:

library(ergm)
net <- network.initialize(10)
form <- net ~ dgwesp(fixed = FALSE)

Try fixing dgwesp:

fix.curved(form, c(0.5, 0.5))$formula

Returns:
net ~ dgwesp(fixed = FALSE)

The result still has fixed = FALSE, but it should be fixed = TRUE.

Comparing to GWESP:

form <- net ~ gwesp(fixed = FALSE)
fix.curved(form, c(0.5, 0.5))$formula

Returns:
net ~ gwesp(fixed = TRUE, decay = 0.5)

Here fixed = TRUE, as expected.
