Code Monkey home page Code Monkey logo

sparse's Introduction

Sparse matrix formats

License: MIT GoDoc Go Report Card

Implementations of selected sparse matrix formats for linear algebra supporting scientific and machine learning applications.

Machine learning applications typically model entities as vectors of numerical features so that they may be compared and analysed quantitively. Typically the majority of the elements in these vectors are zeros. In the case of text mining applications, each document within a corpus is represented as a vector and its features represent the vocabulary of unique words. A corpus of several thousand documents might utilise a vocabulary of hundreds of thousands (or perhaps even millions) of unique words but each document will typically only contain a couple of hundred unique words. This means the number of non-zero values in the matrix might only be around 1%.

Sparse matrix formats capitalise on this premise by only storing the non-zero values thereby reducing both storage/memory requirements and processing effort for manipulating the data. Sparse matrices can effectively be divided into 3 main categories:

  1. Creational - Sparse matrix formats suited to construction and building of matrices. Matrix formats in this category include DOK (Dictionary Of Keys) and COO (COOrdinate aka triplet).

  2. Operational - Sparse matrix formats suited to arithmetic operations e.g. multiplication. Matrix formats in this category include CSR (Compressed Sparse Row aka CRS - Compressed Row Storage) and CSC (Compressed Sparse Column aka CCS - Compressed Column Storage)

  3. Specialised - Specialised matrix formats suiting specific sparsity patterns. Matrix formats in this category include DIA (DIAgonal) for efficiently storing and manipulating symmetric diagonal matrices.

A common practice is to construct sparse matrices using a creational format e.g. DOK or COO and then convert them to an operational format e.g. CSR for arithmetic operations.

Implemented Features

  • DOK (Dictionary Of Keys) format
  • COO (COOrdinate) format (sometimes referred to as 'triplet')
  • CSR (Compressed Sparse Row) format
  • CSC (Compressed Sparse Column) format
  • DIA (DIAgonal) format
  • CSR dot product (matrix multiplication) of 2 matrices (with optimisations for operands of type DIA (as LHS or RHS operand), CSR (LHS operand only) and CSC (RHS operand only when LHS operand is CSR) but supporting any implementation of gonum/mat64.Matrix interface).
  • CSR addition of 2 matrices (with optimisations for operands of type CSR but supporting any implementation of gonum/mat64.Matrix interface).
  • Row and column slicing of CSR and CSC types.

Planned

  • Further optimisations of CSR dot product for sparse matrix type operands (only considering non-zero values as with CSR operands currently), even as RHS operand ((AB)^T = B^T A^T)
  • Consider implicitly converting sparse matrix operands to CSR/CSC for arithmetic operations
  • Implement Parallel/fast matrix multiplication algorithm for sparse matrices
  • Implement further arithmetic operations e.g. subtract, divide, element wise multiplication, etc.
  • Consider utilising LAPACK/BLAS/etc. to perform matrix arithmetic (as an option if available on host).
  • Improve memory allocation for matrix multiplication - pre-calculating sparsity pattern for product and allocate storage in advance rather than incrementally.
  • standard API for iteration over non-zero elements for each sparse format type
  • row and column slicing of other types

sparse's People

Contributors

james-bowman avatar eugene-shvarts avatar

Watchers

Paul Bohm avatar Christopher Gilson avatar Konrad Reiche avatar James Cloos avatar Kai Faust avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.