Code Monkey home page Code Monkey logo

lol's Introduction

Linear Optimal Low Rank Projection (lolR)

CRAN Status Badge arXiv shield Travis-CI Build Status Codecov status Downloads badge DOI

Contents

Overview

Supervised learning techniques designed for the situation when the dimensionality exceeds the sample size have a tendency to overfit as the dimensionality of the data increases. To remedy this high dimensionality; low sample size (HDLSS) situation, we attempt to learn a lower-dimensional representation of the data before learning a classifier. That is, we project the data to a situation where the dimensionality is more manageable, and then we are able to better apply standard classification or clustering techniques since we will have fewer dimensions to overfit. A number of previous works have focused on how to strategically reduce dimensionality in the unsupervised case, yet in the supervised HDLSS regime, few works have attempted to devise dimensionality reduction techniques that leverage the labels associated with the data. In this package, we provide several methods for feature extraction, some utilizing labels and some not, along with easily extensible utilities to simplify cross-validative efforts to identify the best feature extraction method. Additionally, we include a series of adaptable benchmark simulations to serve as a standard for future investigative efforts into supervised HDLSS. Finally, we produce a comprehensive comparison of the included algorithms across a range of benchmark simulations and real data applications.

Repo Contents

  • R: R package code.
  • docs: package documentation, and usage of the lolR package on many real and simulated data examples.
  • man: package manual for help in R session.
  • tests: R unit tests written using the testthat package.
  • vignettes: R vignettes for R session html help pages.

System Requirements

Hardware Requirements

The lol package requires only a standard computer with enough RAM to support the operations defined by a user. For minimal performance, this will be a computer with about 2 GB of RAM. For optimal performance, we recommend a computer with the following specs:

RAM: 16+ GB
CPU: 4+ cores, 3.3+ GHz/core

The runtimes below are generated using a computer with the recommended specs (16 GB RAM, 4 [email protected] GHz) and internet of speed 25 Mbps.

Software Requirements

OS Requirements

The package development version is tested on Linux operating systems. The developmental version of the package has been tested on the following systems:

Linux: Ubuntu 16.04
Mac OSX:
Windows:

The CRAN package should be compatible with Windows, Mac, and Linux operating systems.

Before setting up the lolR package, users should have R version 3.4.0 or higher, and several packages set up from CRAN.

Installing R version 3.4.2 on Ubuntu 16.04

the latest version of R can be installed by adding the latest repository to apt:

sudo echo "deb http://cran.rstudio.com/bin/linux/ubuntu xenial/" | sudo tee -a /etc/apt/sources.list
gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
gpg -a --export E084DAB9 | sudo apt-key add -
sudo apt-get update
sudo apt-get install r-base r-base-dev

which should install in about 20 seconds.

Installation Guide

Stable Release

lolR is available in a stable release on CRAN:

install.packages('lolR')

Development Version

Package dependencies

Users should install the following packages prior to installing lolR, from an R terminal:

install.packages(c('ggplot2', 'abind', 'irlba', 'knitr', 'rmarkdown', 'latex2exp', 'MASS', 'randomForest'))

which will install in about 30 seconds on a machine with the recommended specs.

The lolR package functions with all packages in their latest versions as they appear on CRAN on December 13, 2017. Users can check CRAN snapshot for details. The versions of software are, specifically:

abind_1.4-5
latex2exp_0.4.0
ggplot2_2.2.1
irlba_2.3.1
Matrix_1.2-3
MASS_7.3-47
randomForest_4.6-12

If you are having an issue that you believe to be tied to software versioning issues, please drop us an Issue.

Package Installation

From an R session, type:

require(devtools)
install_github('neurodata/lol', build_vignettes=TRUE, force=TRUE)  # install lol with the vignettes
require(lolR)
vignette("lol", package="lolR")  # view one of the basic vignettes

The package should take approximately 40 seconds to install with vignettes on a recommended computer.

Demo

Functions

For interactive demos of the functions, please check out the vignettes built into the package. They can be accessed as follows:

require(lolR)
vignette('lol')
vignette('pca')
vignette('cpca')
vignette('lrcca')
vignette('mdp')
vignette('xval')
vignette('qoq')
vignette('simulations')
vignette('nearestCentroid')

Extending the lolR Package

The lolR package makes many useful resources available (such as embedding and cross-validation) for simple extension.

To extend the lolR package, check out the vignettes:

require(lolR)
vignette('extend_embedding')
vignette('extend_classification')

Results

In this benchmark comparison, we show that LOL does better than all linear embedding techniques in supervised HDLSS settings when dimensionality is high (d > 100, ntrain <= d) on 20 benchmark problems from the UCI and PMLB datasets. LOL provides a good tradeoff between maintaining the class conditional difference (good misclassification rate) in a small number of dimensions (low number of embedding dimensions).

Citation

For usage of the package and associated manuscript, please cite according to the enclosed citation.bib.

lol's People

Contributors

loftusa avatar jasonkyuyim avatar instigatorofawe avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.