dpmm's Introduction

DpMM

A set of Dirichlet Process Mixture Model (DPMM) sampling-based inference algorithms.

This is research code and builds on the following two papers (please cite them appropriately):

  • [1] Jason Chang and John W. Fisher III. Parallel Sampling of DP Mixture Models using Sub-Cluster Splits, NIPS 2013.
  • [2] Julian Straub, Jason Chang, Oren Freifeld and John W. Fisher III. A Dirichlet Process Mixture Model for Spherical Data, AISTATS 2015.

Dependencies

This code depends on the following libraries: Eigen3, Boost, and CUDA. OpenMP is optional.

It has been tested under Ubuntu 14.04 with:

  • Eigen3 (3.0.5)
  • Boost (1.52)
  • CUDA (6.5)

It has been tested under Windows 7 with:

  • Visual Studio 2012
  • Eigen3 (3.2.3)
  • Boost (1.57)
  • CUDA (6.5)

It has been tested under Mac OS (10.9.4) with:

  • clang-600.51
  • Eigen3 (3.2.1)
  • Boost (1.55)
  • CUDA (6.5)
  • No OpenMP

Compiling

  • Linux:

    Install Eigen3 and Boost

    sudo apt-get install libeigen3-dev libboost-dev 
    

    Install the CUDA version that matches your NVIDIA drivers. On our machines we use nvidia-340-dev with libcuda1-340, cuda-6-5, and cuda-toolkit-6-5.

    Clone this repository and compile the code:

    git clone https://github.com/jstraub/dpMM; cd dpMM; mkdir build; cd build; cmake ..; make -j6
    

Getting Started

After compiling the sampler executable as described above, the quickest way to get started is python/dpmmSampler.py. It loads a dataset and runs the sub-cluster split/merge algorithm with the selected base measure:

python ./python/dpmmSampler.py -i ./data/rndSphereDataIwUncertain.csv -b DpNiwSphereFull -T 400
python ./python/dpmmSampler.py -i ./data/rndSphereDataIwUncertain.csv -b DpNiw -T 400

Here DpNiwSphereFull selects the DP-TGMM [2] and DpNiw the standard DP-GMM [1]. After the specified number of iterations (set via the -T option), the script shows the log likelihood and the number of clusters over the iterations. The true number of clusters in ./data/rndSphereDataIwUncertain.csv is 30, with 333 data points per cluster. A ground-truth labeling can be found in ./data/rndSphereDataIwUncertain_gt.lbl. Make sure you have compiled the C++ code beforehand, since the Python script only wraps the call to dpmmSampler.
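To inspect a run yourself, the label file written by the sampler can be post-processed directly. The sketch below counts the distinct labels in each row, following the layout given for the output option (rows: time; cols: different datapoints). It assumes integer labels separated by spaces and/or commas, which this README does not confirm, and `count_clusters_per_iteration` is a hypothetical helper that is not part of dpMM.

```python
import re


def count_clusters_per_iteration(text):
    """Return the number of distinct labels in each non-empty row of `text`.

    Assumption (not confirmed by the README): each row holds the integer
    labels of all data points for one sampler iteration, separated by
    spaces and/or commas.
    """
    counts = []
    for line in text.splitlines():
        tokens = [t for t in re.split(r"[,\s]+", line.strip()) if t]
        if not tokens:
            continue  # skip blank lines
        counts.append(len(set(int(t) for t in tokens)))
    return counts


# Tiny made-up example: 3 iterations, 6 data points.
example = "0 0 0 0 0 0\n0 0 1 1 0 0\n0 2 1 1 0 2\n"
print(count_clusters_per_iteration(example))  # prints [1, 2, 3]
```

For a real run, pass the contents of the output label file instead of the made-up string. On the example dataset, the last entries would be expected to approach the true value of 30 if the sampler has converged.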

Executables

  • dpmmSampler: Sampler for Dirichlet process mixture model (DPMM) inference using different algorithms. It is usually preferable to use the python script in python/dpmmSampler.py which wraps around this executable to provide an easier-to-use interface.

    Allowed options:
      -h [ --help ]         produce help message
      --seed arg            seed for random number generator
      -N [ --N ] arg        number of input datapoints
      -D [ --D ] arg        number of dimensions of the data
      -T [ --T ] arg        number of sampler iterations
      -a [ --alpha ] arg    alpha parameter of the DP (if single value assumes all alpha_i are the same)
      -K [ --K ] arg        number of initial clusters
      -n [ --nopropose ]    flag to disable the proposal of splits and merges
      -s [ --silhouette ]   flag to enable output of silhouette value of the last iteration
      --shuffle             shuffle the data before processing
      --base arg            which base measure to use (StickNiw, DpNiw (DP-GMM), DpNiwSphereFull (DP-TGMM), DpNiwSphere, NiwSphere, DirNiwSphereFull, NiwSphereUnifNoise, CrpvMF, DirvMF right now)
      -p [ --params ] arg   parameters of the base measure
      --brief arg           brief parameters of the base measure (ie Delta = delta*I; theta=t*ones(D))
      -i [ --input ] arg    path to input dataset .csv file (rows: dimensions; cols: different datapoints)
      -o [ --output ] arg   path to output labels .csv file (rows: time; cols: different datapoints)

    Parameter arguments by model (compare src/dpmmSampler.cpp and python/dpmmSampler.py):

      • DpNiw (DP-GMM): nu kappa theta0 ... thetaD Delta00 Delta01 ... DeltaDD
      • DpNiwSphereFull (DP-TGMM): nu Delta00 Delta01 ... Delta(D-1)(D-1)

  • generateSphericalData: generate spherical data for synthetic data experiments.

    Allowed options:
      -h [ --help ]         produce help message
      --seed arg            seed for random number generator
      -N [ --N ] arg        number of input datapoints
      -D [ --D ] arg        number of dimensions of the data
      -K [ --K ] arg        number of initial clusters
      -n [ --nu ] arg       nu parameter of IW from which sigmas are sampled
      -a [ --minAngle ] arg min angle between means on sphere
      -d [ --delta ] arg    delta of NIW
      -o [ --output ] arg   path to output labels and data .csv file (rows: time; cols: different datapoints)
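The parameter layout listed for dpmmSampler above is a flat sequence of numbers, which is easy to get wrong by hand. The following sketch assembles that sequence for the two documented models. It assumes that theta has D entries and that Delta is written row by row; neither is spelled out here, so check src/dpmmSampler.cpp before relying on it. All function names and the numeric values are hypothetical and chosen only for illustration.

```python
def flatten_row_major(matrix):
    """Flatten a square matrix (list of rows) into one list, row by row."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    return [float(v) for row in matrix for v in row]


def dpniw_params(nu, kappa, theta, delta):
    """Ordered values for DpNiw: nu, kappa, theta entries, Delta entries.

    Assumption: theta has D entries and Delta is D x D in row-major order.
    """
    if len(delta) != len(theta):
        raise ValueError("Delta must be D x D for a D-dimensional theta")
    head = [float(nu), float(kappa)]
    return head + [float(t) for t in theta] + flatten_row_major(delta)


def dpniw_sphere_full_params(nu, delta):
    """Ordered values for DpNiwSphereFull: nu, then the (D-1) x (D-1) Delta."""
    return [float(nu)] + flatten_row_major(delta)


def scaled_identity(n, scale):
    """Return scale * I as a list of rows (the form described for --brief)."""
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]


D = 3  # data dimension; made-up example values below
gmm = dpniw_params(nu=D + 2, kappa=1.0, theta=[0.0] * D,
                   delta=scaled_identity(D, 0.1))
tgmm = dpniw_sphere_full_params(nu=D + 1, delta=scaled_identity(D - 1, 0.1))
print(" ".join(str(v) for v in gmm))
print(" ".join(str(v) for v in tgmm))
```

The sketch only builds the ordered list of values. How those values are handed to the params option is defined by the executable and the Python wrapper.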

Collaborators

Julian Straub (jstraub) and Randi Cabezas (rcabezas)

dpmm's Issues

Errors in sphereGpu_test

When running testSphere I get the following error:

➜  build git:(master) ✗ ./testSphere 
Running 2 test cases...
----------------------- sphere ----------------------
sampled point:  0.959749 -0.280846 0.0028591
q             :   0.886007 -0.0371732   0.462179
q in TpS north: 0.888576 0.465101
q (Log->Exp)  :   0.886007 -0.0371732   0.462179
x in TpS:   -0.577018    0.432391     1.04507    0.769108 -0.00310499    -0.20087    0.411206    -0.16583    0.409948
muTrue=
 -0.591753  0.0185825   0.541358   0.570023   0.513567
  0.190217 -0.0309612  -0.639339  -0.472942   -0.33348
 -0.783355   0.999348   0.546056  -0.671863   0.790595
muEst =
 -0.687199  0.0471051   0.175038    0.80972   0.515675
  0.246121 -0.0251527  -0.763128  -0.401764    -0.3977
 -0.683508   0.998573   0.622091  -0.427714    0.75889
muInit =
  -0.628084     0.24783    0.544876 0.000865349    0.439955
   0.562772    0.786237   -0.827546   -0.397558   -0.179061
    -0.5374   -0.566049   -0.135196   -0.917577    0.879987
mapping -1  0  0 to TpS at1 0 0
3.48787e-16           0    -3.14159
Exp ing it back down:           -1            0 -1.22465e-16
----------------------- sphere Gpu ----------------------
 karcher means on CPU
-0.642447
 0.261111
-0.720474
-0.483174
 0.340593
 -0.80656
D=3 N=20 K=6
ClTGMMDataGpu<T>::karcherMeans__: converged after 0 residual = 0 0 0 0 0 0
karcherMeansFull: 6.623ms
-0.885112  0.897132  -0.75364  0.770911  0.156756 -0.654378
-0.412325    0.3888 -0.311664 -0.136646  0.127642 -0.734173
 0.215789 -0.209737 -0.578699 -0.622113  0.979355  0.181049
true centers
-0.591753 -0.495743
 0.190217  0.423411
-0.783355 -0.758262
/home/akonneker/code/dpMM/test/sphere.cpp(145): error in "sphereGpu_test": check ((karcherMeans.leftCols(2) - mus).array().abs() < 1e-2).all() failed
Log_p_north GPU: 0.064ms
/home/akonneker/code/dpMM/test/sphere.cpp(150): error in "sphereGpu_test": check x.rows() == static_cast<myFlt>(D-1) failed
/home/akonneker/code/dpMM/test/sphere.cpp(157): error in "sphereGpu_test": check fabs(x.col(i).norm() -acosf(ps.col((*sz)(i)).transpose()*sq->col(i)) ) < 1e-4 failed
0.498654
/home/akonneker/code/dpMM/test/sphere.cpp(157): error in "sphereGpu_test": check fabs(x.col(i).norm() -acosf(ps.col((*sz)(i)).transpose()*sq->col(i)) ) < 1e-4 failed
0.0388851
/home/akonneker/code/dpMM/test/sphere.cpp(157): error in "sphereGpu_test": check fabs(x.col(i).norm() -acosf(ps.col((*sz)(i)).transpose()*sq->col(i)) ) < 1e-4 failed
0.780706
/home/akonneker/code/dpMM/test/sphere.cpp(157): error in "sphereGpu_test": check fabs(x.col(i).norm() -acosf(ps.col((*sz)(i)).transpose()*sq->col(i)) ) < 1e-4 failed
0.297408
/home/akonneker/code/dpMM/test/sphere.cpp(157): error in "sphereGpu_test": check fabs(x.col(i).norm() -acosf(ps.col((*sz)(i)).transpose()*sq->col(i)) ) < 1e-4 failed
0.791581
/home/akonneker/code/dpMM/test/sphere.cpp(157): error in "sphereGpu_test": check fabs(x.col(i).norm() -acosf(ps.col((*sz)(i)).transpose()*sq->col(i)) ) < 1e-4 failed
0.0459907
/home/akonneker/code/dpMM/test/sphere.cpp(157): error in "sphereGpu_test": check fabs(x.col(i).norm() -acosf(ps.col((*sz)(i)).transpose()*sq->col(i)) ) < 1e-4 failed
0.434165
/home/akonneker/code/dpMM/test/sphere.cpp(157): error in "sphereGpu_test": check fabs(x.col(i).norm() -acosf(ps.col((*sz)(i)).transpose()*sq->col(i)) ) < 1e-4 failed
1.75733
/home/akonneker/code/dpMM/test/sphere.cpp(157): error in "sphereGpu_test": check fabs(x.col(i).norm() -acosf(ps.col((*sz)(i)).transpose()*sq->col(i)) ) < 1e-4 failed
1.18908
/home/akonneker/code/dpMM/test/sphere.cpp(157): error in "sphereGpu_test": check fabs(x.col(i).norm() -acosf(ps.col((*sz)(i)).transpose()*sq->col(i)) ) < 1e-4 failed
0.945637
/home/akonneker/code/dpMM/test/sphere.cpp(157): error in "sphereGpu_test": check fabs(x.col(i).norm() -acosf(ps.col((*sz)(i)).transpose()*sq->col(i)) ) < 1e-4 failed
0.203667
/home/akonneker/code/dpMM/test/sphere.cpp(157): error in "sphereGpu_test": check fabs(x.col(i).norm() -acosf(ps.col((*sz)(i)).transpose()*sq->col(i)) ) < 1e-4 failed
0.369025
/home/akonneker/code/dpMM/test/sphere.cpp(157): error in "sphereGpu_test": check fabs(x.col(i).norm() -acosf(ps.col((*sz)(i)).transpose()*sq->col(i)) ) < 1e-4 failed
0.710309
/home/akonneker/code/dpMM/test/sphere.cpp(157): error in "sphereGpu_test": check fabs(x.col(i).norm() -acosf(ps.col((*sz)(i)).transpose()*sq->col(i)) ) < 1e-4 failed
0.174019
/home/akonneker/code/dpMM/test/sphere.cpp(157): error in "sphereGpu_test": check fabs(x.col(i).norm() -acosf(ps.col((*sz)(i)).transpose()*sq->col(i)) ) < 1e-4 failed
2.09557
/home/akonneker/code/dpMM/test/sphere.cpp(157): error in "sphereGpu_test": check fabs(x.col(i).norm() -acosf(ps.col((*sz)(i)).transpose()*sq->col(i)) ) < 1e-4 failed
1.55793
/home/akonneker/code/dpMM/test/sphere.cpp(157): error in "sphereGpu_test": check fabs(x.col(i).norm() -acosf(ps.col((*sz)(i)).transpose()*sq->col(i)) ) < 1e-4 failed
1.44941
/home/akonneker/code/dpMM/test/sphere.cpp(157): error in "sphereGpu_test": check fabs(x.col(i).norm() -acosf(ps.col((*sz)(i)).transpose()*sq->col(i)) ) < 1e-4 failed
1.29958
/home/akonneker/code/dpMM/test/sphere.cpp(157): error in "sphereGpu_test": check fabs(x.col(i).norm() -acosf(ps.col((*sz)(i)).transpose()*sq->col(i)) ) < 1e-4 failed
1.19854
/home/akonneker/code/dpMM/test/sphere.cpp(157): error in "sphereGpu_test": check fabs(x.col(i).norm() -acosf(ps.col((*sz)(i)).transpose()*sq->col(i)) ) < 1e-4 failed
1.75418
Log_p_north CPU: 0.021ms
/home/akonneker/code/dpMM/test/sphere.cpp(173): error in "sphereGpu_test": check ((x-xx).array().abs() < 1.0e-4).all() failed
sufficient statistics test --------------------
-12.2931  9.45833        0        0        0        0
 1.58493  13.4296        0        0        0        0
 15.5211  10.5617        0        0        0        0
-1.94223  12.2331        0        0        0        0
-1.94223  12.2331        0        0        0        0
0.921035  19.3358        0        0        0        0
      10       10        0        0        0        0
unknown location(0): fatal error in "sphereGpu_test": memory access violation at address: 0x00000000: no mapping at fault address
/home/akonneker/code/dpMM/test/sphere.cpp(173): last checkpoint

*** 24 failures detected in test suite "distributions test"

Any thoughts as to what the issue might be? All the other tests run just fine.
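For reference, the check that fails repeatedly at sphere.cpp(157) compares the norm of each tangent-space vector with the arccosine of the dot product between the point and its cluster center, i.e. the length of the log-mapped point should equal the geodesic angle. A minimal CPU sketch of that invariant, written independently of dpMM's code, is given below. It returns the tangent vector in ambient coordinates rather than the D-1 coordinates the test expects, and `sphere_log_map` is a hypothetical helper name.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def sphere_log_map(p, q):
    """Log map of unit vector q into the tangent space at unit vector p.

    The result is orthogonal to p and its length equals the angle between
    p and q. The direction is not unique when q is antipodal to p.
    """
    c = max(-1.0, min(1.0, dot(p, q)))  # clamp against rounding errors
    theta = math.acos(c)
    if theta < 1e-12:
        return [0.0] * len(p)  # q coincides with p
    # Component of q orthogonal to p, rescaled to length theta.
    ortho = [qk - c * pk for pk, qk in zip(p, q)]
    n = norm(ortho)
    if n < 1e-12:
        raise ValueError("log map is not unique for antipodal points")
    return [theta * o / n for o in ortho]


# Made-up points: p is the north pole, q lies 0.5 rad away from it.
p = [0.0, 0.0, 1.0]
q = [math.sin(0.5), 0.0, math.cos(0.5)]
x = sphere_log_map(p, q)

# The invariant the failing check tests: ||Log_p(q)|| == acos(p . q).
assert abs(norm(x) - math.acos(dot(p, q))) < 1e-9
# Tangent vectors are orthogonal to the base point.
assert abs(dot(x, p)) < 1e-12
```

If this relation holds on the CPU path but not for the GPU output, the discrepancy would point at the GPU computation or at the data copied back from the device rather than at the math itself.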
