sparsebn's Introduction

sparsebn

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Introducing sparsebn: A new R package for learning sparse Bayesian networks and other graphical models from high-dimensional data via sparse regularization. Designed from the ground up to handle:

  • Experimental data with interventions
  • Mixed observational / experimental data
  • High-dimensional data with p >> n
  • Datasets with thousands of variables (tested up to p=8000)
  • Continuous and discrete data

The emphasis of this package is scalability and statistical consistency on high-dimensional datasets. Compared to existing algorithms, sparsebn scales much better and is under active development. For more details on this package, including worked examples and the methodological background, please see our new preprint [1].

Overview

The main methods for learning graphical models are:

  • estimate.dag for directed acyclic graphs (Bayesian networks).
  • estimate.precision for undirected graphs (Markov random fields).
  • estimate.covariance for covariance matrices.

Currently, estimation of precision and covariance matrices is limited to Gaussian data.
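As a minimal sketch of how these pieces fit together, the following learns a solution path of DAGs from the cytometryContinuous dataset bundled with the package. Default settings are assumed throughout, and the variable names (cyto.data, cyto.learn, idx) are only illustrative.

```r
library(sparsebn)

# Bundled flow cytometry dataset: a list holding the data and the interventions
data(cytometryContinuous)
cyto.data <- sparsebnData(cytometryContinuous[["data"]],
                          type = "continuous",
                          ivn  = cytometryContinuous[["ivn"]])

# Learn a solution path of DAGs, one estimate per regularization parameter
cyto.learn <- estimate.dag(data = cyto.data)

# Pick one estimate from the path and inspect it
idx <- select.parameter(cyto.learn, cyto.data)
print(cyto.learn[[idx]])
```

The result of estimate.dag is a whole path of estimates rather than a single graph, which is why an index is selected before inspecting one of them.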

The workhorse behind sparsebn is the sparsebnUtils package, which provides various S3 classes and methods for representing and manipulating graphs. The basic algorithms are implemented in ccdrAlgorithm and discretecdAlgorithm.

Installation

You can install:

  • the latest CRAN version with

    install.packages("sparsebn")

  • the latest development version from GitHub with

    devtools::install_github(c("itsrainingdata/sparsebn/", "itsrainingdata/sparsebnUtils/dev", "itsrainingdata/ccdrAlgorithm/dev", "gujyjean/discretecdAlgorithm"))

References

[1] Aragam, B., Gu, J., and Zhou, Q. (2017). Learning large-scale Bayesian networks with the sparsebn package. arXiv: 1703.04025.

[2] Aragam, B. and Zhou, Q. (2015). Concave penalized estimation of sparse Gaussian Bayesian networks. The Journal of Machine Learning Research, 16(Nov): 2273-2328.

[3] Fu, F., Gu, J., and Zhou, Q. (2014). Adaptive penalized estimation of directed acyclic graphs from categorical data. arXiv: 1403.2310.

[4] Aragam, B., Amini, A. A., and Zhou, Q. (2015). Learning directed acyclic graphs with penalized neighbourhood regression. arXiv: 1511.08963.

[5] Fu, F. and Zhou, Q. (2013). Learning sparse causal Gaussian networks with experimental intervention: Regularization and coordinate descent. Journal of the American Statistical Association, 108: 288-300.

sparsebn's Issues

Correct error message for mixed data

current <- cbind(c(0, 1, 1, 0),
                 c(2, 1, 0, 1),
                 c(0, 0, 3, 0),
                 c(0.35, 5, 10, 7),
                 c(4, 1, 0, 7))
current.data <- sparsebnData(current, type = "mixed")

# Error in sparsebnData.data.frame(as.data.frame(x), type, levels, ivn) :
#   Invalid 'type' entered: Must match one of 'continuous', 'discrete', 'mixed'.

In RStudio, this command terminates the R session

Function to_igraph not converting network to graph?


> library(sparsebn)
> library(igraph)
> cyto.data <- sparsebnData(cytometryContinuous[["data"]],
+                           type = "continuous",
+                           ivn = cytometryContinuous[["ivn"]])
> cyto.learn <- estimate.dag(data = cyto.data)
> cyto.param <- estimate.parameters(cyto.learn, data = cyto.data)
> param = select.parameter(cyto.learn, cyto.data)
> cyto.learn.igraph = to_igraph(cyto.learn[[param]])
> get.edges(cyto.learn.igraph)
Error in ends(graph, es, names = FALSE) : Not a graph object

Get error with select.parameter with discrete data

Error:

> param=select.parameter(current.learn, current.data)
 Error in eval(expr, envir, enclos) : object 'Anti' not found

This error only appears when the input dataset of sparsebnData() is a matrix. I observed it with discrete data. The error does not occur when the input dataset of sparsebnData() is a data.frame.

Maybe issue a warning when the input to sparsebnData() is not a data.frame?

Get error with estimate.dag with continuous data

Error:

cyto.learn <- estimate.dag(cyto.data, whitelist = whitelist,lambdas.length = 10)
Error in weights < -1 || weights > 1 :
'length = 6922161' in coercion to 'logical(1)'

I get this error only on my workstation; the same code runs fine on another workstation.
Also, I am working with 9897 nodes and 6 observational data points. The job has been running for the last 6 days on a single CPU core. How long does it take to estimate the DAGs? And can the sparsebn R package run on a multicore processor or on a GPU?
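
One possible explanation for the machine-dependent behaviour, offered as an assumption rather than a confirmed diagnosis: the error text has the form R produces since version 4.3.0 when `||` or `&&` receives an operand of length greater than one. Earlier versions used only the first element (with a warning in R 4.2.x), so the same code can pass on an older installation. The base-R sketch below uses a made-up weights matrix to show a range check that reduces the comparison with any() and therefore behaves the same in every R version.

```r
# Hypothetical weights matrix standing in for the one in the failing check
weights <- matrix(c(0, 1, -1, 0.5), nrow = 2)

# Element-wise comparison with |, reduced to a single TRUE/FALSE by any().
# Unlike `weights < -1 || weights > 1`, this is valid for inputs of any length.
out_of_range <- any(weights < -1 | weights > 1)
print(out_of_range)

# How the scalar operator behaves depends on the running R version:
# an error (R >= 4.3.0), a warning (R 4.2.x), or a silent first-element result.
scalar_result <- tryCatch(
  weights < -1 || weights > 1,
  error   = function(e) "error",
  warning = function(w) "warning"
)
print(getRversion())
print(scalar_result)
```

Comparing the output of getRversion() on the two workstations would confirm or rule out this explanation.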

Improve methods for exploring graphs

Extend and improve the existing show.parents method to filter by richer criteria such as minimum number of parents, number of children, v-structures, etc.

  • Write a method for filtering: filter.nodes that returns a list of filtered nodes
  • Write a wrapper that calls show.parents on the output of filter.nodes
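
A rough sketch of what the proposed filter could look like, written in base R against a plain adjacency matrix rather than the package's own graph classes. The function below is hypothetical (filter.nodes does not exist in the package yet), and it assumes the convention that adj[i, j] != 0 means there is an edge from node i to node j.

```r
# Hypothetical helper, not part of sparsebn. Assumes adj[i, j] != 0 encodes
# an edge i -> j, so column sums count parents and row sums count children.
filter.nodes <- function(adj, min.parents = 0, min.children = 0) {
  num.parents  <- colSums(adj != 0)  # incoming edges per node
  num.children <- rowSums(adj != 0)  # outgoing edges per node
  keep <- num.parents >= min.parents & num.children >= min.children
  nodes <- colnames(adj)
  if (is.null(nodes)) nodes <- seq_len(ncol(adj))
  nodes[keep]
}

# Toy DAG: A -> C, B -> C, C -> D
adj <- matrix(0, 4, 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))
adj["A", "C"] <- 1
adj["B", "C"] <- 1
adj["C", "D"] <- 1

two.parents <- filter.nodes(adj, min.parents = 2)   # nodes with >= 2 parents
has.child   <- filter.nodes(adj, min.children = 1)  # nodes with >= 1 child
```

The returned vector of node names could then be passed on to show.parents by the wrapper described in the second item.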

Improve output when estimate.parameters is singular

Instead of returning an error when estimate.parameters encounters a singular Gram matrix, return valid estimates for those nodes that are nonsingular (< n parents). Output a warning, and return NA for nodes with > n parents.

Also, improve the error message, which is a bit cryptic:

  Error in fit_glm_dag(edges, data$data, call = "lm.fit", ...) :
        Node 465 has too many parents! <27 > 22>
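
The requested behaviour can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: fit_nodes is a hypothetical name, the graph is a plain adjacency matrix with adj[i, j] != 0 meaning i -> j, regressions are fit without an intercept for brevity, and a node with exactly n parents is treated like one with more than n.

```r
# Hypothetical sketch: regress each node on its parents with lm.fit, and
# return NA (with a warning) for nodes that have at least n parents.
fit_nodes <- function(dat, adj) {
  n <- nrow(dat)
  coefs <- matrix(0, ncol(dat), ncol(dat))
  for (j in seq_len(ncol(dat))) {
    parents <- which(adj[, j] != 0)
    if (length(parents) == 0) next
    if (length(parents) >= n) {
      warning(sprintf("Node %d has %d parents but only %d observations; returning NA",
                      j, length(parents), n))
      coefs[parents, j] <- NA
      next
    }
    fit <- lm.fit(x = as.matrix(dat[, parents, drop = FALSE]), y = dat[, j])
    coefs[parents, j] <- fit$coefficients
  }
  coefs
}

set.seed(1)
dat <- matrix(rnorm(15), nrow = 3)  # n = 3 observations, 5 nodes
adj <- matrix(0, 5, 5)
adj[1, 2] <- 1                      # node 2 has one parent: estimable
adj[1:4, 5] <- 1                    # node 5 has four parents: more than n
est <- fit_nodes(dat, adj)          # warns about node 5 only
```

Here the column for node 5 comes back as NA while the coefficient for node 2 is still estimated, and the warning names the affected node along with both counts.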

Need to add an argument in estimate.dag function

In the cd.run() function, I added an argument: adaptive.
It used to be:
cd.run <- function(indata, weights=NULL, lambdas=NULL, lambdas.length=30, error.tol=0.0001, convLb=0.01, weight.scale=1.0, upperbound = 100.0) {...}

I have now updated it to:
cd.run <- function(indata, weights=NULL, lambdas=NULL, lambdas.length=30, error.tol=0.0001, convLb=0.01, weight.scale=1.0, upperbound = 100.0, adaptive = FALSE) {...}

Jiaying
