Code Monkey home page Code Monkey logo

bayesian-k2-r's Introduction

The K2 algorithm in R

Group members // Barone, Nagaro, Ninni

This is our final project for Advanced Statistics for Physics Analysis.

Bayesian networks

A Bayesian network (also known as a belief network) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG).

Such DAGs consist of:

  • $\color{red}{\text{nodes}}\leftrightarrow$ variables in the Bayesian sense (observable quantities, latent variables, unknown parameters or hypotheses)
    • unconnected nodes $\leftrightarrow$ conditionally independent variables
    • each node $\leftrightarrow$ probability function (INPUT: a particular set of values for the node's parent variables, OUTPUT: probability of the variable represented by the node)
  • $\color{blue}{\text{edges}}\leftrightarrow$ conditional dependencies

Given a database of records, it is interesting to construct a Bayesian network which can provide insights into the probabilistic dependencies existing among the variables in the database itself. Such network can be further used to predict the future behaviour of the system thus modeled.

Topology learning

In the simplest case, a Bayesian network has a small number of nodes and therefore can be specified manually by an expert. Even in the case of a large number of nodes, an expert could be able to assess the network structure, if his knowledge of the system is sufficient. In general, it is crucial to learn the network topology following a data-driven approach.

Automatically learning the graph structure of a Bayesian network is a challenge pursued within machine learning. For instance, $\color{green}{\text{score-based}}$ algorithms use optimization-based search according to:

  • a scoring function
  • a search strategy

Such score-based algorithms focus on the DAG as a whole and approach the topology learning task as an optimization problem:

  1. each candidate DAG is assigned a score which represents a statistical measurement of how well the DAG itself mirrors the dependence structure of the data
  2. the output DAG is the one corresponding to the highest score

K2 algorithm

The K2 algorithm is a score-based algorithm introduced by G. F. Cooper and E. Herskovits in 1992. It allows to learn the topology of a Bayesian network, i.e. to find the most probable belief network structure, given a database of cases.

The algorithm holds on the following assumptions:

  • there is an ordering on the domain variables
  • a priori, all structures are considered equally likely

The algorithm is based on the use of a greedy-search method. In particular, for each node:

  1. it begins by making the assumption that the current node has no parents
  2. it adds incrementally that parent whose addition most increases the probability of the resulting structure
  3. when the addition of no single parent can increase the probability, it stops adding parents to the current node and moves on to the next node

Bibliography

  • M. Scutari and J. B. Denis, Bayesian Networks, CRC Press, 2022, Taylor and Francis Group
  • G. F. Cooper and E. Herskovits, A Bayesian Method for the Induction of Probabilistic Networks from Data, Machine Learning 9, (1992) 309
  • C. Ruiz, Illustration of the K2 Algorithm for learning Bayes Net Structures web.cs.wpi.edu
  • A. Franzin et al., bnstruct: an R package for Bayesian Network structure learning in the presence of missing data, Bioinformatics 33(8) (2017) 1250
  • F. Sambo and A. Franzin, bnstruct: an R package for Bayesian Network Structure Learning with missing data, December 12, 2016
AdvStat4Phys project
AY 2021/2022 University of Padua

โ€ƒ

bayesian-k2-r's People

Contributors

baronefr avatar danieleninni avatar gnagaro avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.