Code Monkey home page Code Monkey logo

factoranalysis's Introduction

Background

Factor analysis ("FA") is a classical technique in multivariate statistics enjoying a rich history dating back over 100 years. In its simplest form, FA focuses on decomposing a given covariance matrix S into two respective components: S=T+P, where T is a (low-rank) positive semidefinite ("PSD") matrix (representing the common variances) and P is a diagonal matrix with non-negative diagonal entries (representing the individual variances). In reality, this noiseless decomposition can often be too restrictive, and so instead it is better to decompose S approximately as a low-rank matrix T plus a diagonal matrix P.

Our approach

This page contains several sample implementations of an estimation procedure for FA as described in Bertsimas, Copenhaver, and Mazumder, "Certifiably Optimal Low Rank Factor Analysis", Journal of Machine Learning Research 18 (2017) ("BCM17"). The approach is based on conditional gradient methods from convex optimization. We provide several sample implementations of Algorithm 1 (see page 13). Given the well-structured nature of the problems solved in Algorithm 1, there are algorithmic improvements that can be made to the implementations here, but these serve as a good starting point.

The problem that we focus on is the case of q=1 outlined in BCM17. This special case is known as Approximate Minimum Rank Factor Analysis, or MRFA, and takes the form

minimize	nuclear_norm( S - ( T + P ) )
subject to 	P, T >= 0
		S - P >= 0
		P diagonal
		rank(T) <= r

Here the optimization variables are T and P; the notation A >= 0 denotes that a matrix A is symmetric positive semidefinite; and the nuclear norm is the sum of the singular values. As shown in BCM17, this can be rewritten exactly as

minimize	trace( W*S - W*P )
subject to 	W, P >= 0
		S - P >= 0
		P diagonal
		I - W >= 0
		trace(W) = p - r

where p is the number of variables (i.e. S is a p by p matrix) and I is the p by p identity matrix.

The resulting algorithm described in BCM17 amounts to an alternating minimization scheme in W and P.

Implementations

The two implementations are as follows:

  1. R

    This implementation is as described in the paper. In particular, the update with respect to P, where W is fixed, is solved using the customized ADMM (Section 3.2.2).

  2. MATLAB

    This implementation relies on cvx and highlights the simplicity of the alternating minimization approach.

Citation

If you would like to cite this work, please use the following citation for BCM17:

@article{bcm17,
  author  = {Dimitris Bertsimas and Martin S. Copenhaver and Rahul Mazumder},
  title   = {Certifiably Optimal Low Rank Factor Analysis},
  journal = {Journal of Machine Learning Research},
  year    = {2017},
  volume  = {18},
  number  = {29},
  pages   = {1-53},
  url     = {http://jmlr.org/papers/v18/15-613.html}
}

factoranalysis's People

Contributors

copenhaver avatar

Stargazers

 avatar  avatar  avatar  avatar

Watchers

 avatar

factoranalysis's Issues

License?

Hi,

Do you intend to release this code under an free/open license? There is no LICENSE/COPYING file nor any copyright note on the source files.
This implies the code is proprietary.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.