Ultra-Scalable Spectral Clustering and Ensemble Clustering

Overview

This repository provides the MATLAB source code for two large-scale clustering algorithms: Ultra-Scalable Spectral Clustering (U-SPEC) and Ultra-Scalable Ensemble Clustering (U-SENC). Both have nearly linear time and space complexity, and both can robustly and efficiently partition nonlinearly separable datasets with on the order of ten million points on a PC with 64 GB of memory.

If you find this repository helpful for your research, please cite the paper below.

Dong Huang, Chang-Dong Wang, Jian-Sheng Wu, Jianhuang Lai, and Chee-Keong Kwoh.
Ultra-Scalable Spectral Clustering and Ensemble Clustering, 
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2020, vol.32, no.6, pp.1212-1226. 
DOI: https://doi.org/10.1109/TKDE.2019.2903410

Description of Files

Code

| Function | Description |
| --- | --- |
| demo_1_USPEC.m | A demo of the U-SPEC algorithm. |
| demo_2_USENC.m | A demo of the U-SENC algorithm. |
| USPEC.m | Call this function to perform the U-SPEC algorithm. |
| USENC.m | Call this function to perform the U-SENC algorithm. |
| litekmeans.m | A fast implementation of k-means. |
| computeNMI.m | Call this function to compute the NMI score. |
| synthesizeLargescaleDatasets.p | Call this function to synthesize the five large-scale datasets, whose sizes range from one million to twenty million points. |
| synthesizeLargescaleDatasets_withArbitrarySizes.p | Call this function to synthesize the five synthetic datasets with arbitrary sizes. |
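For readers unfamiliar with the NMI (normalized mutual information) score that computeNMI.m reports, the following is a minimal, self-contained sketch of how such a score can be computed from two label vectors. It is not the repository's computeNMI.m: the function name `nmiSketch` is hypothetical, and the normalization by the geometric mean of the two entropies is an assumption (other conventions exist), so its values may differ from those of the repository's function.

```matlab
function score = nmiSketch(labelsA, labelsB)
% nmiSketch  Hypothetical illustration, NOT the repository's computeNMI.m.
% Assumption: NMI = I(A;B) / sqrt(H(A) * H(B)).
    labelsA = labelsA(:);
    labelsB = labelsB(:);
    n = numel(labelsA);
    assert(numel(labelsB) == n, 'Label vectors must have equal length.');

    % Map arbitrary label values to consecutive indices 1..k.
    [~, ~, ia] = unique(labelsA);
    [~, ~, ib] = unique(labelsB);

    % Joint distribution from the contingency table of the two partitions.
    joint = accumarray([ia, ib], 1) / n;
    pa = sum(joint, 2);   % marginal of A (column vector)
    pb = sum(joint, 1);   % marginal of B (row vector)

    % Mutual information, skipping empty cells to avoid log(0).
    outer = pa * pb;
    nz = joint > 0;
    mi = sum(joint(nz) .* log(joint(nz) ./ outer(nz)));

    % Entropies of the two partitions.
    ha = -sum(pa(pa > 0) .* log(pa(pa > 0)));
    hb = -sum(pb(pb > 0) .* log(pb(pb > 0)));

    if ha == 0 || hb == 0
        % Degenerate case: at least one partition has a single cluster.
        score = double(ha == hb);
    else
        score = mi / sqrt(ha * hb);
    end
end
```

Two partitions that agree up to a relabeling of the clusters, for example `nmiSketch([1 1 2 2 3 3], [2 2 3 3 1 1])`, should score 1 up to floating-point rounding, while statistically independent partitions score close to 0.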

Data

This repository includes the data files for five real-world datasets: PenDigits, USPS, Letters, MNIST, and Covertype. It also includes the MATLAB code to reproduce the five large-scale synthetic datasets used in our paper.

How to Reproduce the Synthetic Datasets?

  • To generate the five large-scale synthetic datasets, call the synthesizeLargescaleDatasets function. It takes a single input parameter, the dataset name, which must be one of the following five:

    • 'TB1M'
    • 'SF2M'
    • 'CC5M'
    • 'CG10M'
    • 'Flower20M'

    Example (to synthesize the CC5M dataset):

     synthesizeLargescaleDatasets('CC5M');
     % The synthesized dataset will be saved in 'data_CC5M.mat'.
    
  • To generate the five synthetic datasets with arbitrary sizes, call the synthesizeLargescaleDatasets_withArbitrarySizes function, which takes two input parameters:

    • dataName: one of the five names 'TB', 'SF', 'CC', 'CG', or 'Flower'.
    • dataSize: any positive integer, provided that you have enough disk space to save the data.

    Example (to synthesize a CG dataset with one million points):

     dataName = 'CG';
     dataSize = 1000000;
     synthesizeLargescaleDatasets_withArbitrarySizes(dataName, dataSize); 
     % The synthesized dataset will be saved in 'data_CG_1000000.mat'.
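The two calls above can be combined into a small batch script. The following is a sketch under stated assumptions, not part of the repository: it assumes that every output file follows the 'data_<name>.mat' pattern documented above for CC5M, and it makes no assumption about the variable names stored inside the files, which is why it lists them with `whos` rather than loading them. Generating the largest datasets may need substantial memory and disk space.

```matlab
% Dataset names as listed above.
dataNames = {'TB1M', 'SF2M', 'CC5M', 'CG10M', 'Flower20M'};

% Assumption: every output file follows the 'data_<name>.mat' pattern
% that is documented for CC5M.
outFiles = strcat('data_', dataNames, '.mat');

% exist(..., 'file') is nonzero when the generator is on the MATLAB path.
haveGenerator = exist('synthesizeLargescaleDatasets', 'file') ~= 0;

for i = 1:numel(dataNames)
    if exist(outFiles{i}, 'file') == 2
        % Skip datasets that were already generated.
        fprintf('%s already exists, skipping.\n', outFiles{i});
    elseif haveGenerator
        fprintf('Synthesizing %s ...\n', dataNames{i});
        synthesizeLargescaleDatasets(dataNames{i});
    else
        fprintf('Generator not on the path, cannot create %s.\n', outFiles{i});
    end
end

% List the variables stored in one file without loading it into memory.
% The variable names inside the file are not documented here, so inspect first.
if exist('data_CC5M.mat', 'file') == 2
    whos('-file', 'data_CC5M.mat');
end
```

Checking for an existing file before each call avoids regenerating a dataset that is already on disk, which matters most for the ten- and twenty-million-point cases.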
    

Further Questions?

Don't hesitate to contact me if you have any questions regarding this work.
Email: huangdonghere at gmail dot com
Website: https://www.researchgate.net/publication/330760669
