d2lib
is a C++ framework for processing data represented as discrete distributions
(d2). It also includes a collection of computational tools for analyzing
d2 data.
[under construction]
- discrete distribution over Euclidean space
- discrete distribution with finite possible supports in Euclidean space (e.g., bag-of-word-vectors and sparsified histograms)
- n-gram data with cross-term distance
- dense histogram
- distributed/serial IO
- compute the Wasserstein distance between a pair of d2 objects
- nearest neighbors [TBA]
- D2-clustering [TBA]
- Dirichlet process [TBA]
- document analysis: from bag-of-words to .d2s format [TBA]
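
To make the Wasserstein distance item above concrete, here is a minimal sketch for the simplest case: two discrete distributions on the real line, where the 1-Wasserstein distance equals the area between the two cumulative distribution functions. The names `Discrete1D` and `wasserstein1` are hypothetical and are not part of d2lib's API. In higher-dimensional Euclidean space no such closed form applies and a transportation problem generally has to be solved instead, which this sketch does not attempt.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical type (not d2lib's): a discrete distribution on the real line.
struct Discrete1D {
  std::vector<double> support;  // support points x_i
  std::vector<double> weight;   // probabilities w_i >= 0, assumed to sum to 1
};

// Pair each support point with its weight and sort by position.
static std::vector<std::pair<double, double>> sorted_atoms(const Discrete1D& d) {
  if (d.support.empty() || d.support.size() != d.weight.size()) {
    throw std::invalid_argument("support and weight must be non-empty and equal in length");
  }
  std::vector<std::pair<double, double>> atoms;
  for (std::size_t i = 0; i < d.support.size(); ++i) {
    atoms.emplace_back(d.support[i], d.weight[i]);
  }
  std::sort(atoms.begin(), atoms.end());
  return atoms;
}

// 1-Wasserstein distance in one dimension: the integral of |F_p(x) - F_q(x)| dx,
// accumulated piecewise between consecutive support points of either distribution.
double wasserstein1(const Discrete1D& p, const Discrete1D& q) {
  const auto a = sorted_atoms(p);
  const auto b = sorted_atoms(q);
  std::size_t i = 0, j = 0;
  double cdf_p = 0.0, cdf_q = 0.0;  // CDF values just left of the current point
  double prev = 0.0, dist = 0.0;
  while (i < a.size() || j < b.size()) {
    // Next breakpoint: the smallest support point not yet consumed.
    double x;
    if (j == b.size() || (i < a.size() && a[i].first <= b[j].first)) {
      x = a[i].first;
    } else {
      x = b[j].first;
    }
    // Both CDFs are constant on [prev, x). On the first pass both are zero,
    // so the term contributes nothing.
    dist += std::fabs(cdf_p - cdf_q) * (x - prev);
    // Absorb every atom located exactly at x.
    while (i < a.size() && a[i].first == x) cdf_p += a[i++].second;
    while (j < b.size() && b[j].first == x) cdf_q += b[j++].second;
    prev = x;
  }
  return dist;
}
```

For instance, calling `wasserstein1` on a distribution with equal mass at 0 and 1 and a copy of it shifted to 1 and 2 gives a distance of 1, the size of the shift. The sweep costs O(n log n) because of the sort, which is why the one-dimensional case is a convenient sanity check before moving to a general solver.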