Code Monkey home page Code Monkey logo

pyccalg's Introduction

PyCCAlg - Correlation-Clustering Algorithms in Python

Python implementation of some algorithms for Correlation Clustering. Specifically:

  • Linear-programming + region-growing O(log n)-approximation algorithms for general weighted graphs
  • kwikcluster in src/pyccalg.py: KwikCluster randomized, linear-time algorithm (Ailon et al., JACM 2008), achieving constant-factor approximation guarantees on complete graphs satisfying certain constraints (e.g., probability constraint and/or triangle-inequality constraint)

Requirements:

  • Python v3.6+
  • For linear-programming-based algorithms:
    • SciPyv1.6+ and/or PuLP
    • SciPy linprog comes with various solvers: 'Method highs-ds is a wrapper of the C++ high performance dual revised simplex implementation (HSOL). Method highs-ipm is a wrapper of a C++ implementation of an interior-point method; it features a crossover routine, so it is as accurate as a simplex solver. Method highs chooses between the two automatically. For new code involving linprog, we recommend explicitly choosing one of these three method values instead of interior-point (default), revised simplex, and simplex (legacy)'. See here for more details.
    • PuLP comes with two solvers by default: CBC (linear and integer programming) and CHOCO (constraint programming), but it can connect to many others (e.g., GUROBI, CPLEX, SCIP, MIPCL, XPRESS, GLPK9) if you have them installed
    • Here we use highs-ipm with SciPy linprog and the default CBC with PuLP
    • However, any linear-programming Python (other than SciPy linprog or PuLP) library can alternatively be used with minimal adaption

Usage:

python src/pyccalg.py -d <DATASET_FILE> [-r <LB,UB>] [-a <PROB>] [-s {'pulp','scipy'}] [-m {'charikar','demaine','kwik'}]

  • Optional arguments:
    • -r <LB,UB>, if you want to generate random edge weights from [LB,UB] range
    • -a <PROB>, if you want to randomly add edges with probability PROB
    • -m {'charikar','demaine','kwik'}, to choose the algorithm (default: 'charikar'). NOTE: kwikcluster is always run too
    • -s {'pulp','scipy'}, to select the solver to be used (default: 'scipy' (it seems faster))
  • Dataset-file format:
    • First line: #VERTICES \t #EDGES
    • One line per edge; every line is a quadruple: NODE1 \t NODE2 \t POSITIVE_WEIGHT \t NEGATIVE_WEIGHT (POSITIVE_WEIGHT and NEGATIVE_WEIGHT are ignored if code is run with -r option)
    • Look at data folder for some examples

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.