Spark Implementation of ToPMine, a topic modeling algorithm based on a bag of variable-length phrases.
From "Scalable Topical Phrase Mining from Text Corpora" by Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare R. Voss, Jiawei Han (http://arxiv.org/pdf/1406.6312.pdf)
Includes 3 main components:
-Parallelelized Frequent Phrase Mining
-Agglomerative Phrase Merging
-LDA