
clusterpy's Introduction

ClusterPy

Analytical regionalization is a scientific way to decide how to group a large number of geographic areas or points into a smaller number of regions based on similarities in one or more variables (e.g. income, ethnicity, or environmental condition) that the researcher believes are important for the topic at hand. Conventional conceptions of how areas should be grouped into regions may either not be relevant to the information one is trying to illustrate (e.g. using political regions to map air pollution) or may actually be designed in ways that bias aggregated results.

Current algorithms

  • AZP: Openshaw and Rao (1995)
  • AZP-Simulated Annealing: Openshaw and Rao (1995)
  • AZP-Tabu: Openshaw and Rao (1995)
  • AZP-R-Tabu: Openshaw and Rao (1995)
  • Max-p-regions (Greedy): Duque, Anselin and Rey (2010)
  • Max-p-regions (Tabu): Duque, Anselin and Rey (2010)
  • Max-p-regions (Simulated Annealing): Duque, Anselin and Rey (2010)
  • AMOEBA: Aldstadt and Getis (2006)
  • SOM: Kohonen (1990)
  • geoSOM: Bacao (2004)
  • Random

Special Features

  • Customized 'analytical' regionalizations based on the following user specifications/inputs:
  • Key areal attribute to regionalize on: the user regionalizes (or clusters) data based on the variables she considers important for her problem at hand (i.e. use your own 'analytical' regions versus normative or administrative regions).
  • Maximum or minimum number of regions.
  • Threshold conditions of the maximum or minimum value that all regional clusters must meet for a given variable (e.g. a minimum threshold for a social or business project might be for all regions to have at least 100,000 people, or for an ecological project regions should have an area of at least 100 square miles).
  • Spatial contiguity constraints (W matrix, GAL, GWT formats), or they will be created for you based on the shared geographic borders of your areal units.
  • Time-series signature clustering: not only can areas be clustered by a cross-sectional variable, but also by the correlation of their time-series signatures of the variable.
  • Create New ESRI shapefiles:

Related information

Citing Clusterpy

Please cite ClusterPy when using the software in your work.

Duque, J.C.; Dev, B.; Betancourt, A.; Franco, J.L. (2011). ClusterPy: Library of spatially constrained clustering algorithms, Version 0.9.9. RiSE-group (Research in Spatial Economics). EAFIT University. http://www.rise-group.org.

A BibTeX entry for LaTeX users is:

@Manual{ClusterPy,
title = {ClusterPy: {Library} of spatially constrained clustering algorithms,
{Version} 0.9.9.},
author = {Juan C. Duque and Boris Dev and Alejandro Betancourt and Jose L. Franco},
organization = {RiSE-group (Research in Spatial Economics). EAFIT University.},
address = {Colombia},
year = {2011},
url = {http://www.rise-group.org}
}

License information

See the file "LICENSE.txt" for information on the history of this software, terms & conditions for usage, and a DISCLAIMER OF ALL WARRANTIES.

clusterpy's People

Contributors

abetan16, juancaduque, lauraecheverri, sergiobuj

clusterpy's Issues

Gurobipy not present

Handle the case that the user doesn't have gurobipy.
Also handle the case when they try to use it without gurobipy.

This is important because gurobipy is a very specific import and a lot of people may not have it.
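
A minimal sketch of one way to handle both cases, treating gurobipy as an optional dependency. The names `HAS_GUROBI`, `require_gurobi` and `solve_exact_model` are hypothetical and are not part of ClusterPy; only `import gurobipy` and `gurobipy.Model` are real.

```python
# Sketch only: treat gurobipy as an optional dependency.
try:
    import gurobipy  # optional solver bindings, often not installed
    HAS_GUROBI = True
except ImportError:
    gurobipy = None
    HAS_GUROBI = False


def require_gurobi(feature):
    """Return the gurobipy module, or raise a clear error if it is missing.

    `feature` is only used to build the error message.
    """
    if not HAS_GUROBI:
        raise RuntimeError(
            "%s needs the optional package 'gurobipy', which is not installed."
            % feature
        )
    return gurobipy


def solve_exact_model():
    # Hypothetical entry point for a solver-backed algorithm.
    grb = require_gurobi("The exact model")
    return grb.Model("regions")
```

With this layout, importing the library never fails because of gurobipy, and the error only appears when a solver-backed feature is actually requested.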

random seed

When running an algorithm multiple times from the same script, the random number generator gets its seed from the process ID. This bug comes from the multi-core implementation.
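
A possible direction, sketched under assumptions: give each run its own generator, seeded from a base seed plus the run index, instead of relying on the process ID. The helpers `make_rng` and `random_solution` are hypothetical stand-ins, not ClusterPy functions.

```python
import random


def make_rng(base_seed, run_index):
    """Return an independent generator for one run.

    Combining a base seed with the run index gives each run its own
    reproducible stream. A seed taken from the process ID cannot do this,
    because the process ID is the same for every run inside one process.
    """
    return random.Random(base_seed * 1000003 + run_index)


def random_solution(n_areas, n_regions, rng):
    # Stand-in for a randomized algorithm: assign each area to a region.
    return [rng.randrange(n_regions) for _ in range(n_areas)]


# Three runs from the same script, each with its own seed.
runs = [random_solution(10, 3, make_rng(42, i)) for i in range(3)]
# Repeating the experiment with the same base seed reproduces the runs.
repeat = [random_solution(10, 3, make_rng(42, i)) for i in range(3)]
```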

Warning notice for unavailable required libraries

What steps will reproduce the problem?

  1. Install ClusterPy without having any of the required libraries.
  2. Import ClusterPy.

What is the expected output? What do you see instead?
ClusterPy should not install if the required libraries are not installed.
At the moment the library installs without any warning or notice.

[This issue was raised by a user who was trying to install the library and
contacted the group directly]
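
One way to produce such a notice is to check for the required modules before doing anything else. The sketch below uses the standard library only and targets Python 3; the helper names are hypothetical, and the actual list of required libraries is not stated in this report, so the caller has to supply it.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_requirements(names):
    """Raise ImportError listing every missing module, or return quietly."""
    missing = missing_modules(names)
    if missing:
        raise ImportError("Missing required libraries: " + ", ".join(missing))
```

A call such as `check_requirements(["numpy", "scipy"])` (an assumed list, for illustration) at import or install time would then fail early with a readable message.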

Using disjoint polygons or centroids will fail?

Hi

When my initial geometry contains non-touching polygons, or simply centroid points instead of polygons, it seems the algorithm does not work anymore?

See the example below, modifying the cluster example of California by:

  • making each county of California slightly smaller, hence not touching
  • using the counties' centroids

Note that the code will not break, but just runs indefinitely? Is there a problem with how the distance metric is measured? I thought that conceptually, the algorithms would run independently of the geometry of the initial dataset?

Thanks!

import os
os.chdir("path_to_clusterpy/clusterpy/")
os.getcwd()
#> '/home/matifou/gitReps/clusterpy/clusterpy'
if not os.path.exists("tempDir"):
    os.mkdir("tempDir")
import geopandas
import clusterpy
#> ClusterPy: Library of spatially constrained clustering algorithms

calif_gpd = geopandas.read_file("data_examples/CA_Polygons.shp")

## Smaller buffers
calif_gpd_buf = calif_gpd.copy()
calif_gpd_buf['geometry'] = calif_gpd_buf["geometry"].simplify(2000, preserve_topology=False).buffer(-6000)
calif_gpd_buf = calif_gpd_buf.set_geometry('geometry')
calif_gpd_buf.plot()
#> <matplotlib.axes._subplots.AxesSubplot at 0x7f54e9ec28d0>

## Centroid
calif_centroid = calif_gpd.copy()
calif_centroid['geometry'] = calif_gpd.centroid
calif_centroid = calif_centroid.set_geometry('geometry')

calif_centroid.plot();

## Write to disk
calif_gpd_buf.to_file("tempDir/CA_Polygons_buffer.shp")
calif_centroid.to_file("tempDir/CA_Polygons_centroid.shp")

## Load with clusterPy
calif = clusterpy.importArcData("data_examples/CA_Polygons")
#> Loading data_examples/CA_Polygons.dbf
#> Loading data_examples/CA_Polygons.shp
#> Done
calif_buffer = clusterpy.importArcData("tempDir/CA_Polygons_buffer")
#> Loading tempDir/CA_Polygons_buffer.dbf
#> Loading tempDir/CA_Polygons_buffer.shp
#> Done
calif_centroid = clusterpy.importArcData("tempDir/CA_Polygons_centroid")
#> Loading tempDir/CA_Polygons_centroid.dbf
#> Loading tempDir/CA_Polygons_centroid.shp
#> Done

## Run

### Classic: works
calif.cluster('arisel', ['PCR2002'], 9, wType='rook', inits=10, dissolve=1)
#> Getting variables
#> Variables successfully extracted
#> Running original Arisel algorithm
#> Number of areas:  58
#> Number of regions:  9
#> initial Solution:  [8, 4, 4, 1, 4, 1, 8, 5, 7, 4, 1, 1, 4, 4, 4, 4, 1, 1, 0, 4, 3, 4, 1, 4, 1, 4, 0, 7, 7, 0, 7, 1, 4, 7, 4, 4, 0, 2, 4, 0, 2, 0, 8, 6, 1, 1, 1, 7, 7, 4, 1, 1, 1, 4, 4, 0, 7, 1]
#> initial O.F:  0.5022200552944863
#> FINAL SOLUTION:  [8, 4, 4, 1, 4, 1, 8, 5, 7, 4, 5, 1, 4, 4, 4, 4, 1, 5, 0, 4, 3, 4, 1, 4, 5, 4, 0, 7, 7, 0, 7, 1, 4, 7, 0, 4, 0, 2, 4, 0, 2, 0, 8, 6, 1, 5, 5, 7, 7, 4, 1, 5, 5, 4, 4, 0, 1, 5]
#> FINAL OF:  0.4011089126984128
#> Done
#> Adding variables
#> Done
#> Dissolving lines
#> Done
calif.results[0]
#> <clusterpy.core.layer.Layer instance at 0x7f54e9e43dc0>

Created on 2020-03-02 by the reprexpy package

Try now:

calif_buffer.cluster('arisel', ['PCR2002'], 9, wType='rook', inits=10, dissolve=1)
calif_buffer.results[0]

or:

calif_centroid.cluster('arisel', ['PCR2002'], 9, wType='rook', inits=10, dissolve=1)
calif_centroid.results[0]
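
One possible explanation, which is an assumption and not confirmed in this thread, is that rook contiguity finds no shared borders between disjoint polygons or points, so every area ends up without neighbours. If that is the cause, a neighbour structure built from distances rather than borders would be needed. The sketch below builds a symmetric k-nearest-neighbour dictionary from centroid coordinates in plain Python; the function name is hypothetical, and whether ClusterPy accepts a neighbour dictionary in this form is not established here.

```python
import math


def knn_neighbours(points, k):
    """Build a symmetric neighbour dictionary from point coordinates.

    points: list of (x, y) tuples, indexed by area id.
    Returns {area_id: sorted list of neighbour ids}.
    """
    neighbours = {i: set() for i in range(len(points))}
    for i, (xi, yi) in enumerate(points):
        # Distances from area i to every other area, nearest first.
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        for _, j in dists[:k]:
            neighbours[i].add(j)
            neighbours[j].add(i)  # keep the relation symmetric
    return {i: sorted(js) for i, js in neighbours.items()}


# Four centroids on a line; the last one is far from the others.
centroids = [(0, 0), (1, 0), (2, 0), (10, 0)]
w = knn_neighbours(centroids, 1)
# w == {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

Even the isolated point gets a neighbour this way, so the areas form a single connected graph in this example.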

Matplotlib

Present the map with matplotlib on the interactive python console.
This will be useful when used inside things like the IPython notebook to present the map inline.

Python 3.x support

I see clusterpy is not getting updated anymore, but I find its functionality is still useful. Are there any plans to support Python 3.x? Or maybe development has moved somewhere else and I've missed it?

Anyway, I'm willing to give it a try and port this to Python 3.6. Any pointers on what the roadblocks might be?

Thanks!

Add Travis CI

Great way to show people the status of the project

A question about the realization function in the SAR data module.

I'm a beginner with spatial regression models, and I have a question about the simulation formulation in the realization function in the SAR data module.
Why is the response variable Y equal to the product of self.cvcv (the Cholesky factor of self.vcv) and the normally distributed random series e?

In the definition of the SAR class, (I-rho*W) and its inverse matrix AI are already calculated. So, can I get the response variable Y by simply multiplying AI and the normally distributed random series e? What is the meaning of the vcv matrix, and why is its Cholesky factorization needed?

Finally, I found that the parameter meanY seems to be unused in the DGP and SAR initialization step. How can we give a basic mean value to the response variable Y during the simulation process?

Thanks for your time, and I look forward to your response.
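
A small numerical sketch may help with the first two questions. It assumes that vcv denotes the covariance matrix implied by the SAR model with unit-variance errors, i.e. AI times its transpose; that reading of the module is an assumption and is not confirmed here. Under it, multiplying e by AI or by the Cholesky factor of vcv gives draws with the same covariance, although the individual numbers differ. The matrix W and all variable names below are illustrative.

```python
import numpy as np

n, rho = 4, 0.7
# Row-standardized contiguity matrix for 4 areas on a line (illustrative).
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])

AI = np.linalg.inv(np.eye(n) - rho * W)  # (I - rho W)^{-1}
vcv = AI @ AI.T                          # Cov(Y) when Cov(e) = I (assumed meaning)
C = np.linalg.cholesky(vcv)              # lower triangular, C @ C.T == vcv

rng = np.random.default_rng(0)
e = rng.standard_normal(n)

y_direct = AI @ e   # Y = (I - rho W)^{-1} e
y_chol = C @ e      # Y = C e, same covariance as y_direct

# One simple way to give Y a non-zero mean: shift by a constant.
mean_y = 10.0
y_shifted = mean_y + y_chol
```

Since Cov(AI e) = AI AIᵀ and Cov(C e) = C Cᵀ, and both equal vcv, the two constructions are interchangeable for Gaussian e. Whether the module applies meanY in the way shown in the last lines is not known from this report.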

Versioning

Fix versioning.

Add a simple way to check ClusterPy's version.

Region maker as best possible solution

A common strategy with the algorithms is to create multiple instances to test for the best possible configuration and then work on that one. It would be better if the process of getting the best possible region were part of the creation of the region maker itself.
That would avoid the need to create multiple instances and decide which one to use.
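
The multiple-instance strategy described above can be sketched as a single helper that runs a constructor several times and keeps the solution with the lowest objective value. Everything here is a toy illustration: `best_of`, `random_split` and `within_ss` are hypothetical names, and the constructor ignores contiguity.

```python
import random


def best_of(build_solution, objective, n_inits, seed=0):
    """Run `build_solution` n_inits times and keep the lowest-objective result."""
    rng = random.Random(seed)
    best, best_value = None, float("inf")
    for _ in range(n_inits):
        candidate = build_solution(rng)
        value = objective(candidate)
        if value < best_value:
            best, best_value = candidate, value
    return best, best_value


values = [1.0, 1.2, 5.0, 5.1, 9.0, 9.3]  # one attribute value per area


def random_split(rng):
    # Toy constructor: assign each area to one of 3 regions at random.
    return [rng.randrange(3) for _ in values]


def within_ss(solution):
    # Objective: total within-region sum of squared deviations.
    total = 0.0
    for region in set(solution):
        members = [v for v, r in zip(values, solution) if r == region]
        mean = sum(members) / len(members)
        total += sum((v - mean) ** 2 for v in members)
    return total


solution, of = best_of(random_split, within_ss, n_inits=20)
```

Moving this loop inside the region maker would give callers a single object that already holds the best of its initial solutions.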

Examples of usage

An entry on the wiki with a list of examples (ideally notebooks) showing different ways of using the library.
A big contribution to this is getting #14 done.

Curate data examples

Many sample files are in the data_examples directory.
For the PyPI version we need to slim this down, since otherwise installing would mean downloading many files.

Add debug flag

A debug flag would be very useful in case of setting a random seed and trying to get consistent results between executions.

Objective functions with 'f' added in the end?

When adding a new objective function, in some cases the method to compute
the objective function will try to fetch the method by appending an 'f' to the end.

def getObjectiveFast(self, region2AreaDict, modifiedRegions=[]):
 [code]
                _fun = objectiveFunctionTypeDispatcher[_objFunType+'f']
                distance=_fun(self, region2AreaDict, modifiedRegions, indexData)
 [code]

This has to be documented, fixed, or both.
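
To illustrate the behaviour and one possible fix, here is a simplified sketch of a dispatcher lookup that prefers the 'f' variant and falls back to the plain function when no such variant is registered. The dispatcher contents and function signatures are hypothetical and much simpler than the real ones.

```python
def sum_of_squares(values):
    return sum(v * v for v in values)


def sum_of_squares_fast(values):
    # Stand-in for an incremental ("fast") version of the same objective.
    return sum(v * v for v in values)


# Hypothetical dispatcher: "SS" has an "f" variant, "MAX" does not.
dispatcher = {
    "SS": sum_of_squares,
    "SSf": sum_of_squares_fast,
    "MAX": max,
}


def get_objective(name, fast=True):
    """Prefer the 'f' (fast) variant; fall back to the regular function."""
    if fast and name + "f" in dispatcher:
        return dispatcher[name + "f"]
    return dispatcher[name]
```

A bare `dispatcher[name + "f"]` lookup raises `KeyError` for "MAX" in this example, which is the kind of surprise a newly added objective function can run into.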

Layer variables for colors in Matplotlib

Use a specific variable in the layer as the value for the colormap.
This will allow the data to be presented with intensity levels in Matplotlib.

Very useful for presenting index data. Related to #14.

How to change versions

Entry in the wiki listing the different places where the version should be updated.

  • clusterpy/__init__.py file
  • setup.py file
  • pypi version
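
The number of places to update can be reduced by keeping the version in one file and reading it everywhere else. A minimal sketch, assuming the package defines a `__version__` string in its init file (an assumption, not something this entry confirms); `read_version` is a hypothetical helper that a setup script could call on the file's text.

```python
import re


def read_version(init_source):
    """Extract the value of __version__ from the text of a package init file."""
    match = re.search(
        r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", init_source, re.M
    )
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


# Illustrative file content; 0.9.9 is the version given in the citation.
example_init = '"""ClusterPy package."""\n__version__ = "0.9.9"\n'
version = read_version(example_init)
# version == "0.9.9"
```

The same attribute would also give users a simple way to check the installed version from the interpreter.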

Remove unnecessary output

Output like:

Done
Adding variables
Done
Dissolving lines
Done

only makes it difficult to use ClusterPy in a Unix fashion, with pipes and redirections.
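
One common approach is to send progress messages through the standard `logging` module to stderr, so that stdout carries only results. The sketch below uses a hypothetical logger name and a stand-in function; it is not ClusterPy code.

```python
import logging
import sys

log = logging.getLogger("clusterpy_demo")  # hypothetical logger name


def configure_logging(verbose=False):
    """Send progress messages to stderr, and only when verbose is set."""
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter("%(message)s"))
    log.handlers = [handler]
    log.setLevel(logging.INFO if verbose else logging.WARNING)


def dissolve_demo():
    # Stand-in for a library routine that reports its progress.
    log.info("Dissolving lines")
    result = [0, 0, 1]
    log.info("Done")
    return result


configure_logging(verbose=False)
print(dissolve_demo())  # stdout receives only the result
```

With `verbose=True` the progress lines still go to stderr, so piping stdout into another program keeps working.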

maxpTabu using queen

When running the maxptabu algorithm, the execution assumes rook contiguity, and works with it.
If only queen contiguity is available, the execution never reaches a result.

Specifics of papers

Some functionality of ClusterPy, especially parameters, is specific to a publication or project. Developments of this kind, when they are not general to the usage of the library, should live on a branch of their own or be separated somehow.

Unknown function stdobs

A function used to standardize, stdobs, is referenced but does not exist.

It appears in the file clusterpy/core/toolboxes/cluster/componentsAlg/distanceFunctions.py,
in the distanceA2AEuclideanSquared function.

...
    if std:
        x = nparray(x)
        x = stdobs(x)  #  standardize
        x = x.tolist()
...
NameError: name 'stdobs' is not defined
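
What stdobs originally did is not known from this report. If column-wise z-score standardization was intended, which is an assumption, a replacement could look like the plain-Python sketch below. The name `standardize_columns` is hypothetical, and the population standard deviation is used.

```python
def standardize_columns(rows):
    """Z-score each column of a list of equal-length rows.

    Columns with zero variance are mapped to 0.0 to avoid dividing by zero.
    """
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [
        (sum((v - m) ** 2 for v in c) / n) ** 0.5
        for c, m in zip(cols, means)
    ]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


z = standardize_columns([[1.0, 10.0], [3.0, 10.0]])
# z == [[-1.0, 0.0], [1.0, 0.0]]
```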

dissolveLayer list.remove(x): x not in list

What steps will reproduce the problem?

  1. Setting dissolve to 1 on some map instances.

The cause or type of map/configuration is not clear.

ERROR MESSAGE:
Dissolving lines
clusterPy is not able to dissolve your map based on this solution.Please execute the command Layer.exportArcData(dissolveProblem) and send us the resulting files to [email protected] to analyse the problem and give you a solution as soon asposible. Your feedback is important for us.

TRACEBACK:
Traceback (most recent call last):
  File "performance_script.py", line 46, in <module>
    instance.cluster('azp', ['SAR1'], pReg, dissolve=1)
  File "clusterpy/source/clusterpy/core/layer.py", line 1240, in cluster
    self.dissolveMap(dataOperations=dataOperations)
  File "clusterpy/source/clusterpy/core/layer.py", line 222, in dissolveMap
    dissolveLayer(self, sh, self.region2areas)
  File "clusterpy/source/clusterpy/core/geometry/dissolve.py", line 70, in dissolveLayer
    raise ve
ValueError: list.remove(x): x not in list

Tests for AZP algs.

Need to add tests for the AZP* clustering algorithms.

  • AZP
  • AZP Tabu
  • AZP R-Tabu
  • AZP SA
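
Because these algorithms are randomized, tests may be easier to write against properties of the result than against exact solutions. A sketch of one such check, assuming solutions are lists of region labels 0..p-1 (as in the printed outputs) and neighbours are given as a dictionary; both helper names are hypothetical.

```python
def region_is_connected(members, neighbours):
    """Return True if the areas in `members` form one connected block."""
    members = set(members)
    if not members:
        return False
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:
        area = stack.pop()
        for nb in neighbours[area]:
            if nb in members and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == members


def is_valid_regionalization(solution, neighbours, n_regions):
    """Check that every area is assigned, all labels 0..n_regions-1 are used,
    and each region is spatially contiguous."""
    if len(solution) != len(neighbours):
        return False
    if set(solution) != set(range(n_regions)):
        return False
    for region in range(n_regions):
        members = [a for a, r in enumerate(solution) if r == region]
        if not region_is_connected(members, neighbours):
            return False
    return True


# 2 x 2 grid with rook neighbours, areas laid out as:
#   0 1
#   2 3
grid = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
```

For example, `[0, 0, 1, 1]` is valid on this grid, while `[0, 1, 1, 0]` is not, because areas 0 and 3 only touch at a corner.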

Problem dissolving solution

I can't dissolve the shapefile for a solution. Am I missing anything? I'm posting here an example using toy example data (from pysal), but I have not been able to get it to work with other datasets either.

import pysal as ps
import clusterpy as clp
col = clp.importArcData(ps.examples.get_path('columbus'))
col.cluster('azpRTabu', ['HOVAL', 'CRIME'], 2, dissolve=1)

Which prints out the following output:

Getting variables
Variables successfully extracted
Running original AZP-R-Tabu algorithm (Openshaw and Rao, 1995)
Number of areas:  49
Number of regions:  2
Constructing regions
initial Solution:  [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
initial O.F:  24864.1208571
Performing local search
FINAL SOLUTION:  [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
FINAL OF:  24572.9644776
Done
Adding variables
Done
Dissolving lines
Problem: Amount of assigned regions does not match number of areas
Regions: [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
Done

generateData : local variable 'y' referenced before assignment

What steps will reproduce the problem?

l = clusterpy.createGrid(4,4)
l.generateData("SAR", "rook", 1, 0.7)
l.generateData("SAR1", "rook", 1, 0.7)

GIVES:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "clusterpy/core/layer.py", line 539, in generateData
    self.Y[i] = self.Y[i] + y[i]
UnboundLocalError: local variable 'y' referenced before assignment

What is the expected output? What do you see instead?
A layer with two data vars, SAR and SAR1

Local search as functions

The local search procedures should be outside the region maker.
The region maker is handling more things than it should and having the local search outside will help towards #30

[Workflow] Cluster templates

All the algorithms in the cluster module should be presented as 'templates' rather than as the implementation of the algorithm.
A user should be able to recreate any algorithm with functions solely from the componentsAlg module.
E.g.

  1. Create a layer.
  2. Use any clustering algorithm.
  3. Run any local search algorithm on the layer, one or multiple times.

This workflow is not possible with the current implementation.
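
As a rough illustration of the proposed workflow, the sketch below treats local searches as plain functions that take a solution and an objective and return a solution, and a template as a function that chains them. All names are hypothetical, the data is a toy example, and contiguity is ignored to keep it short.

```python
values = [1.0, 1.1, 5.0, 5.2]  # one attribute value per area


def within_ss(solution):
    # Objective: total within-region sum of squared deviations.
    total = 0.0
    for region in set(solution):
        members = [v for v, r in zip(values, solution) if r == region]
        mean = sum(members) / len(members)
        total += sum((v - mean) ** 2 for v in members)
    return total


def greedy_move(solution, objective):
    """Toy local search: move single areas between regions while it helps.

    A region is never emptied. Contiguity is ignored in this sketch.
    """
    best = list(solution)
    labels = sorted(set(best))
    improved = True
    while improved:
        improved = False
        for area in range(len(best)):
            for label in labels:
                if label == best[area] or best.count(best[area]) == 1:
                    continue
                trial = list(best)
                trial[area] = label
                if objective(trial) < objective(best):
                    best, improved = trial, True
    return best


def run_template(initial_solution, local_searches, objective):
    """Apply each local search in order, keeping results that do not get worse."""
    solution = list(initial_solution)
    for search in local_searches:
        candidate = search(solution, objective)
        if objective(candidate) <= objective(solution):
            solution = candidate
    return solution


result = run_template([0, 1, 0, 1], [greedy_move, greedy_move], within_ss)
# result == [1, 1, 0, 0]
```

The point of the design is that the same `run_template` could be reused with any initial solution and any list of local searches, which is the kind of composition the steps above ask for.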
