Code Monkey home page Code Monkey logo

cop-kmeans's Introduction

COP-Kmeans

DOI

This is an implementations of the Constrained K-means algorithm, introduced by Wagstaff et al. This implementation is developed according to the description of algorithm as presented in [1].

The COP-Kmeans algorithm

This is the COP-Kmeans algorithm, as described in [1]:

Usage

usage: run_ckm.py [-h] [--ofile OFILE] [--n_rep N_REP] [--m_iter M_ITER] [--tol TOL] dfile cfile k

Run COP-Kmeans algorithm

positional arguments:
  dfile            data file
  cfile            constraint file
  k                number of clusters

optional arguments:
  -h, --help       show this help message and exit
  --ofile OFILE    file to store the output
  --n_rep N_REP    number of times to repeat the algorithm
  --m_iter M_ITER  maximum number of iterations of the main loop
  --tol TOL        tolerance for deciding on convergence

To see a run of the algorithm on example data and constraints, run the script runner.sh in the examples directory.

Citing

If you want to cite this implementation, you can use the following bibtex entry (other formats are also available):

@misc{behrouz_babaki_2017_831850,
  author       = {Behrouz Babaki},
  title        = {COP-Kmeans version 1.5},
  month        = jul,
  year         = 2017,
  doi          = {10.5281/zenodo.831850},
  url          = {https://doi.org/10.5281/zenodo.831850}
}

There's more ...

Other implementations

Other types of constraints

There is another version of constrained Kmeans that handles size constraints [2]. A python implementation of the algorithm (and its extensions) is available here.

Exact algorithms for constrained clustering

In 2013-14, I was working on developing an integer linear programming formulation for an instance of the constrained clustering problem. The approach that I chose was Branch-and-Price (also referred to as column-generation). In the initialization step of my algorithm, I needed another algorithm that can produce solutions of reasonably good quality very quickly. The algorithm COP-Kmeans turned out to be exactly what I was looking for. Interested in knowing more about my own work? Go to my homepage, from where you can access my paper [3] and the corresponding code.

There is also a body of work on using constraint programming for exact constrained clustering. In particular, [4] is the state-of-the art in exact constrained clustering.

References

  1. Wagstaff, K., Cardie, C., Rogers, S., & Schrödl, S. (2001, June). Constrained k-means clustering with background knowledge. In ICML (Vol. 1, pp. 577-584).

  2. Bradley, P. S., K. P. Bennett, and Ayhan Demiriz. "Constrained k-means clustering." Microsoft Research, Redmond (2000): 1-8.

  3. Babaki, B., Guns, T., & Nijssen, S. (2014). Constrained clustering using column generation. In Integration of AI and OR Techniques in Constraint Programming (pp. 438-454). Springer International Publishing.

  4. Guns, Tias, Christel Vrain, and Khanh-Chuong Duong. "Repetitive branch-and-bound using constraint programming for constrained minimum sum-of-squares clustering." 22nd European Conference on Artificial Intelligence. 2016.

cop-kmeans's People

Watchers

Kevin Bi avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.