pwrCGP - Power Analysis for Social Networks Built From Count Data using a Gamma-Poisson model

This package can be used to estimate the accuracy of observed social networks built from count data. It uses a Gamma-Poisson model of social event counts to estimate the correlation between a sampled network and the true underlying network. The package also includes methods to conduct power analysis on nodal regression and to estimate the point at which increases in sampling effort lead to diminishing returns in accuracy.

If using this package, please cite the paper, which includes more detailed information about the methods:

Hart, J. D. A., Franks, D. W., Brent, L. J. N., & Weiss, M. N. (2021). Accuracy and Power Analysis of Social Interaction Networks. BioRxiv, 2021.05.07.443094. https://doi.org/10.1101/2021.05.07.443094

How to use

Installation

To install this package, you'll need the devtools package. You can then install pwrCGP from GitHub using its install_github function.

install.packages("devtools") # Skip this line if devtools is already installed.
devtools::install_github("JHart96/pwrCGP")

Example

First import the package.

library(pwrCGP)

Simulate undirected network data with 8 nodes, a mean of 10 units of observation time per dyad, a social differentiation of 2, and a mean interaction rate of 0.5 interactions per unit time. Then extract the symmetric square matrices for use in the net_cor function. The simulation step isn't necessary if you're using your own data.

set.seed(1)
sim_data <- simulate_data_gp(8, 10, 2, 0.5) # 8 nodes, mean observation time 10, social differentiation 2, mean interaction rate 0.5.

X <- sim_data$X # 8 x 8 symmetric matrix of integer observation counts.
D <- sim_data$D # 8 x 8 matrix of positive real-valued sampling times.

Let's have a look at what X looks like:

X
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    0    0    2    0    2    2    0    0
[2,]    0    0    0    7    0    0    6   13
[3,]    2    0    0   12    5   14    1    0
[4,]    0    7   12    0    0    0    0    4
[5,]    2    0    5    0    0    1    0    1
[6,]    2    0   14    0    1    0    0    0
[7,]    0    6    1    0    0    0    0   11
[8,]    0   13    0    4    1    0   11    0
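If you are starting from your own observations rather than simulated data, you need to build the two symmetric matrices yourself. The following is a minimal base R sketch, assuming your data are stored as one row per dyad. The data frame, its column names, and the X_own and D_own variables are hypothetical and are not part of pwrCGP. The sketch leaves the diagonal at zero, which is an assumption; check how net_cor treats self-pairs before relying on it.

```r
# Hypothetical dyad-level data: one row per pair of individuals, with the
# number of events observed and the time that pair was under observation.
obs <- data.frame(
  id1   = c(1, 1, 2),
  id2   = c(2, 3, 3),
  count = c(4, 0, 7),
  time  = c(10.5, 8.0, 12.25)
)

n <- max(obs$id1, obs$id2)
X_own <- matrix(0, n, n) # Event counts.
D_own <- matrix(0, n, n) # Sampling times.

# Fill both triangles so that the matrices are symmetric (undirected network).
idx <- cbind(obs$id1, obs$id2)
rev_idx <- idx[, 2:1, drop = FALSE]
X_own[idx] <- obs$count
X_own[rev_idx] <- obs$count
D_own[idx] <- obs$time
D_own[rev_idx] <- obs$time

# Fail early if the matrices are not symmetric.
stopifnot(isSymmetric(X_own), isSymmetric(D_own))
```

The resulting matrices play the same role as X and D in the rest of this example.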

Use the two matrices X and D to estimate the correlation between the sampled network and the true, underlying network. This will provide both a summary table of several properties of the data as well as a QQ diagnostic plot to qualitatively verify that the data fit the Gamma-Poisson model.

net_cor_obj <- net_cor(X, D)
net_cor_obj

[Figure: Gamma-Poisson QQ plot]

                                Estimate     SE Lower CI Upper CI
Observed Social Differentiation    1.540     NA       NA       NA
Mean Event Rate                    0.274     NA       NA       NA
Sampling Effort                    2.700     NA       NA       NA
Est. Event Rate                    0.274 0.1060    0.137    0.549
Est. Social Differentiation        1.780 0.3450    1.230    2.580
Est. Correlation                   0.946 0.0287    0.869    0.979

The QQ diagnostic plot shows that the data fit the model well, with a slight deviation at the tail. We should be okay to proceed now.

The first three rows give the social differentiation, mean event rate, and sampling effort according to the observed data. The next three rows use the estimates from the Gamma-Poisson model to estimate the true event rate, social differentiation, and the correlation between the observed event rates and the true event rates.
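As a sanity check on how these rows relate to each other, the printed estimated correlation is consistent with the closed-form relationship rho = sqrt(S^2 * I / (1 + S^2 * I)), where S is the estimated social differentiation and I is the sampling effort. The sketch below is our own illustration; expected_cor is a hypothetical helper, and we assume rather than know that net_cor computes the value this way. See the paper for the actual derivation.

```r
# Hypothetical helper, not part of pwrCGP: expected correlation between
# observed and true event rates for a given social differentiation (S)
# and sampling effort (I).
expected_cor <- function(social_diff, sampling_effort) {
  signal <- social_diff^2 * sampling_effort
  sqrt(signal / (1 + signal))
}

# Plug in the values printed in the summary table above.
round(expected_cor(1.78, 2.7), 3) # 0.946, matching "Est. Correlation".
```

The function is vectorised, so it can also be applied to several candidate values of S or I at once.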

If you want to conduct power analysis for nodal regression, you will need to extract the social differentiation, event rate, and sampling times from the summary object and the data matrices. We also recommend extracting the confidence intervals of the first two to capture the uncertainty in the data. You will also need to provide an effect value. This is the effect size (correlation coefficient) that reflects the true relationship between the response and predictors in the regression, that is, the effect size we'd see with perfect, infinite sampling.

social_differentiations <- net_cor_obj$summary[5, c(3, 1, 4)] # Columns c(3, 1, 4) give the lower CI, estimate, and upper CI.
interaction_rates <- net_cor_obj$summary[4, c(3, 1, 4)] # Same columns for the estimated event rate.
sampling_times <- D # Either the matrix of sampling times or a single mean sampling time.

# Calculate power of nodal regression for effect size r = 0.5
effect <- 0.5
pwr_nodereg(8, effect, social_differentiations, interaction_rates, sampling_times)
Running simulations...Done!

Number of nodes: 8
Effect size: 0.5
 Social Differentiation       Event Rate Power
                   1.23            0.137 0.207
                   1.78            0.274 0.227
                   2.58            0.549 0.249

This shows us that we could expect a power of between 20.7% and 24.9%, given the properties of the data and a true effect size of 0.5.
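For intuition about why power is this low with only 8 nodes, it can help to look at a baseline that ignores network sampling noise altogether. The sketch below estimates, by Monte Carlo simulation, the power of a plain Pearson correlation test on perfectly measured variables. It is not how pwr_nodereg works, and sim_power_cor is a hypothetical helper introduced only for illustration.

```r
# Hypothetical helper, not part of pwrCGP: Monte Carlo power of a two-sided
# Pearson correlation test with n observations and true correlation r.
sim_power_cor <- function(n, r, n_sims = 2000, alpha = 0.05) {
  p_values <- replicate(n_sims, {
    x <- rnorm(n)
    # The population correlation between x and y is r.
    y <- r * x + sqrt(1 - r^2) * rnorm(n)
    cor.test(x, y)$p.value
  })
  # Power is the proportion of simulations that reject the null.
  mean(p_values < alpha)
}

set.seed(1)
baseline_power <- sim_power_cor(n = 8, r = 0.5)
baseline_power
```

Comparing this baseline with the pwr_nodereg output gives a rough sense of how much of the low power is due to the small number of nodes rather than to sampling effort.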

If you are conducting a different type of analysis, such as network subsetting, you can use the diminishing-returns (elbow) estimator to check whether sufficient data are available:

pwr_elbow(social_differentiations, rho_max=0.99) # Use rho_max=0.99 as in the paper.
 Social Differentiation Sampling Effort Correlation
                   1.23            2.81   0.8997606
                   1.78            1.34   0.8996477
                   2.58            0.64   0.8999386

The elbow method says we need a sampling effort of between 0.64 and 2.81 to reach the optimal level of correlation, which is roughly 0.90. From our run of net_cor we know that the observed sampling effort is 2.7. That is slightly below the 2.81 required at the lower confidence bound of social differentiation (1.23) and well above the 1.34 required at the point estimate (1.78). So even in the worst-case scenario, we probably have roughly the optimal amount of sampling for our level of social differentiation.
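The comparison between observed and required sampling effort can be written out directly. This is a minimal sketch using the numbers printed above; the variable names are our own, and the labels refer to the lower bound, estimate, and upper bound of social differentiation.

```r
# Required sampling effort from the pwr_elbow output, labelled by the
# social differentiation value each one corresponds to.
required_effort <- c(lower = 2.81, estimate = 1.34, upper = 0.64)

# Observed sampling effort from the net_cor summary table.
observed_effort <- 2.7

# Does the observed effort meet each requirement?
enough <- observed_effort >= required_effort
enough # FALSE for "lower", TRUE for "estimate" and "upper".

# Observed effort as a fraction of what is required in each case.
round(observed_effort / required_effort, 2)
```

In the worst case (the lower bound of social differentiation) the observed effort is about 96% of the requirement, which is why the amount of sampling is described as roughly optimal.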
