Code Monkey home page Code Monkey logo

elbow_inference's Introduction

Post-selection inference on the proportion of variance explained by PCA

Principal component analysis (PCA) is a longstanding and well-studied approach for dimension reduction. It rests upon the assumption that the underlying signal in the data has low rank, and thus can be well-summarized using a small number of dimensions. The output of PCA is typically represented using a scree plot, which displays the proportion of variance explained (PVE) by each principal component. While the PVE is extensively reported in routine data analyses, to the best of our knowledge the notion of inference on the PVE remains unexplored.

We consider inference on the PVE. We first introduce a new population quantity for the PVE with respect to an unknown matrix mean. Critically, our interest lies in the PVE of the sample principal components (as opposed to unobserved population principal components); thus, the population PVE that we introduce is defined conditional on the sample singular vectors. We show that it is possible to conduct inference, in the sense of confidence intervals and p-values, on this population quantity. Furthermore, we can conduct valid inference on the PVE of a subset of the principal components, even when the subset is selected using a data-driven approach such as the elbow rule. We demonstrate the proposed approach in simulation and in an application to a gene expression dataset.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.