Code Monkey home page Code Monkey logo

wmms's Introduction

wmms

Codecov test coverage R-CMD-check

Prune a tree to be the most profitable possible subtree that is still connected to the root node. If every node has two numeric values, a and b, the most profitable subtree will be the one where sum(a) / sum(b) over all remaining nodes is the maximum it could possibly be. Minimising can be done by subtracting the a values from zero.

This package implements the algorithm in Carlson J., Eppstein D. (2006) The Weighted Maximum-Mean Subtree and Other Bicriterion Subtree Problems. In: Arge L., Freivalds R. (eds) Algorithm Theory โ€“ SWAT 2006. SWAT 2006. Lecture Notes in Computer Science, vol 4059. Springer, Berlin, Heidelberg.

Pure base R is used, yet the package is reasonably quick and has been used to find optimal trees for every LSOA11 (Lower Layer Super Output Area) in Wales in less than a minute on a standard laptop.

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("ukgovdatascience/wmms")

Example

Here is an example of building a broadband network out from an existing node. The cost of extending the network to each node is a function of its distance from its parent. We want to minimise the cost of connecting each node, when averaged over all the nodes that are connected.

In this graph, the root node in the centre has no value, because it already exists. The node beside it has the value 10, because its distance from the root is 10. The b value of all nodes is 1, so that the algorithm will find an unweighted mean.

Create this graph by making a data frame of edges, one row per edge, with the columns from and to referring to node IDs, and a and b giving the values of the to node.

(One way to obtain such an edgelist from geographical coordinates of each node is the emstreeR package, which calculates a Euclidean Minimum Spanning Tree.)

library(wmms)

# This graph was invented by Jo Keefe at the UK Department for Digital, Culture,
# Media and Sport.
edges <- data.frame(from = c(1, rep(2:7, each = 5)),
                    to = c(2, 3:32),
                    a = c(10, rep(1:5, 6)),
                    b = 1)
edges
#>    from to  a b
#> 1     1  2 10 1
#> 2     2  3  1 1
#> 3     2  4  2 1
#> 4     2  5  3 1
#> 5     2  6  4 1
#> 6     2  7  5 1
#> 7     3  8  1 1
#> 8     3  9  2 1
#> 9     3 10  3 1
#> 10    3 11  4 1
#> 11    3 12  5 1
#> 12    4 13  1 1
#> 13    4 14  2 1
#> 14    4 15  3 1
#> 15    4 16  4 1
#> 16    4 17  5 1
#> 17    5 18  1 1
#> 18    5 19  2 1
#> 19    5 20  3 1
#> 20    5 21  4 1
#> 21    5 22  5 1
#> 22    6 23  1 1
#> 23    6 24  2 1
#> 24    6 25  3 1
#> 25    6 26  4 1
#> 26    6 27  5 1
#> 27    7 28  1 1
#> 28    7 29  2 1
#> 29    7 30  3 1
#> 30    7 31  4 1
#> 31    7 32  5 1

Now prune the tree to obtain the optimum possible average value over all nodes. In this example, the values a are made negative, so that a/b will be minimised rather than maximised.

root_id <- 1

edges$a <- 0 - edges$a

maximum <- weighted_maximum_mean_subtrees(edges, root_id)

maximum$a <- 0 - maximum$a
maximum$value <- 0 - maximum$value

maximum
#>    from to  a b depth ancestor    value
#> 1     1  2 10 1     1        2 2.461538
#> 2     2  3  1 1     2        2 2.461538
#> 3     2  4  2 1     2        2 2.461538
#> 4     2  5  3 1     2        2 2.461538
#> 5     2  6  4 1     2        2 2.461538
#> 6     2  7  5 1     2        7 2.666667
#> 7     3  8  1 1     3        2 2.461538
#> 8     3  9  2 1     3        2 2.461538
#> 9     3 10  3 1     3       10 3.000000
#> 10    3 11  4 1     3       11 4.000000
#> 11    3 12  5 1     3       12 5.000000
#> 12    4 13  1 1     3        2 2.461538
#> 13    4 14  2 1     3        2 2.461538
#> 14    4 15  3 1     3       15 3.000000
#> 15    4 16  4 1     3       16 4.000000
#> 16    4 17  5 1     3       17 5.000000
#> 17    5 18  1 1     3        2 2.461538
#> 18    5 19  2 1     3        2 2.461538
#> 19    5 20  3 1     3       20 3.000000
#> 20    5 21  4 1     3       21 4.000000
#> 21    5 22  5 1     3       22 5.000000
#> 22    6 23  1 1     3        2 2.461538
#> 23    6 24  2 1     3        2 2.461538
#> 24    6 25  3 1     3       25 3.000000
#> 25    6 26  4 1     3       26 4.000000
#> 26    6 27  5 1     3       27 5.000000
#> 27    7 28  1 1     3        7 2.666667
#> 28    7 29  2 1     3        7 2.666667
#> 29    7 30  3 1     3       30 3.000000
#> 30    7 31  4 1     3       31 4.000000
#> 31    7 32  5 1     3       32 5.000000

Plot the maximum subtree, and all subtrees connected to it. The blue nodes are part of the maximum subtree. Their average cost is 2.46, which is lower than the original value of the top node (10). The cost of appending any other nodes is higher than 2.46. The pink nodes comprise another subtree of nodes that have been averaged together, to achieve the value 2.67, which is lower than the original value of their top node (5).

Algorithm

A worked example is in a Google Sheet.

It is also explained in the docs ?algorithm in the file R/algorithm.R.

wmms's People

Contributors

nacnudus avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.