Comments (2)
Hey Chris,
I was actually playing around with something similar a few months ago and
implemented something myself based on the dbscan algorithm:
http://en.wikipedia.org/wiki/DBSCAN. I never untangled that code from our
own stuff so I never bothered to open source it; so I'll just paste it
below. It uses a multi map to look up nearby points using their geohash,
and my north, east,south, & west functions for calculating neighboring
geohashes. The pois are then post processed using distance calculation.
The algorithm should be fairly easy to adapt since you can easily mess
around with the nearby function to change how it works.
I have a variation of this class that uses my geokv github project instead
of a multi map. I've run it on city sized areas with tens of thousands of
pois. It is very sensitive to the threshold parameters of course and
depending on how you set them it can range from very fast to very slow.
Worst case performance of this algorithm is n^2 if I remember correctly.
Jilles
package com.localstream.processing.clustering;
import static com.localstream.common.Fields.point;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map.Entry;
import java.util.Set;
import com.github.jsonj.JsonObject;
import com.google.common.collect.HashMultimap;
import com.google.common.collect.ImmutableMultimap;
import com.google.common.collect.Sets;
import com.jillesvangurp.geo.GeoGeometry;
import com.jillesvangurp.geo.GeoHashUtils;
public class InMemoryGeoHashDbScanClustering {
HashMultimap<String, JsonObject> points = HashMultimap.create();
private final int length;
private final int minimumClusterSize;
private final long threshold;
/**
* This configures the clustering.
*
* @param threshold distance in meters used as a threshold
* @param minimumClusterSize This dictates the minimum size of a
cluster.
*/
public InMemoryGeoHashDbScanClustering(long threshold,
intminimumClusterSize) {
this.threshold = threshold;
this.length = GeoHashUtils.getSuitableHashLength(threshold, 50, 0);
this.minimumClusterSize = minimumClusterSize;
}
/**
* Add a point. The object should have a latitude and longitude field.
* @param object
*/
public void add(JsonObject object) {
String hash = getHash(object);
points.put(hash, object);
}
private String getHash(JsonObject object) {
double[] point = point(object);
return GeoHashUtils.encode(point[0], point[1],length);
}
public Set<Set<JsonObject>> cluster() {
Set<Set<JsonObject>> clusters = Sets.newHashSet();
Set<JsonObject> visited = Sets.newHashSet();
Set<JsonObject> clustered = Sets.newHashSet();
ImmutableMultimap.copyOf(points);
for(Entry<String, JsonObject> entry
:ImmutableMultimap.copyOf(points).entries())
{
JsonObject object = entry.getValue();
if(!visited.contains(object)) {
Set<JsonObject> nearBy = getNearBy(entry.getValue());
if(nearBy.size() > minimumClusterSize) {
Set<JsonObject> cluster = Sets.newHashSet();
expandClusterNonRecursive(object, visited,clustered,
cluster, nearBy);
if(cluster.size() > minimumClusterSize) {
clusters.add(cluster);
}
}
}
}
return clusters;
}
public long size() {
return points.size();
}
// private void expandCluster(JsonObject p, Set visited,
Set clustered, Set cluster, Set nearBy)
{
// cluster.add(p);
// clustered.add(p);
// for(JsonObject o: ImmutableSet.copyOf(nearBy)) {
// if(!clustered.contains(o)) {
// cluster.add(o);
// clustered.add(o);
// }
// if(!visited.contains(o)) {
// visited.add(o);
// Set nearBy2 = getNearBy(o);
// expandCluster(o, visited, clustered, cluster, nearBy2);
// }
// }
// }
private void expandClusterNonRecursive(JsonObject p, Set<JsonObject>
visited, Set clustered, Set cluster,
Set nearBy) {
List<JsonObject> stack = new LinkedList<>();
stack.add(p);
stack.addAll(nearBy);
while(stack.size() > 0) {
JsonObject o = stack.remove(stack.size()-1);
if(!visited.contains(o)) {
visited.add(o);
Set<JsonObject> nearBy2 = getNearBy(o);
if(nearBy2.size() > minimumClusterSize) {
stack.addAll(nearBy2);
}
}
if(!clustered.contains(o)) {
cluster.add(o);
clustered.add(o);
}
}
}
private double distance(JsonObject o1, JsonObject o2) {
double[] p1 = point(o1);
double[] p2 = point(o2);
return GeoGeometry.distance(p1[0],p1[1],p2[0],p2[1]);
}
private Set<JsonObject> getNearBy(JsonObject o) {
Set<JsonObject> unFiltered = getNearByHash(getHash(o));
Iterator<JsonObject> iterator = unFiltered.iterator();
while (iterator.hasNext()) {
JsonObject candidate = iterator.next();
if(distance(o,candidate) > threshold) {
iterator.remove();
}
}
return unFiltered;
}
private Set<JsonObject> getNearByHash(String hash) {
Set<JsonObject> nearBy = points.get(hash);
String next = GeoHashUtils.north(hash);
nearBy.addAll(points.get(next));
next = GeoHashUtils.east(next);
nearBy.addAll(points.get(next));
next = GeoHashUtils.south(next);
nearBy.addAll(points.get(next));
next = GeoHashUtils.south(next);
nearBy.addAll(points.get(next));
next = GeoHashUtils.west(next);
nearBy.addAll(points.get(next));
next = GeoHashUtils.west(next);
nearBy.addAll(points.get(next));
next = GeoHashUtils.north(next);
nearBy.addAll(points.get(next));
next = GeoHashUtils.north(next);
nearBy.addAll(points.get(next));
return nearBy;
}
}
On Thu, Feb 21, 2013 at 7:43 AM, Christopher Schmidt <
[email protected]> wrote:
1st: Thanks for this great library...
We have a common use case to show clustered data because we cannot show
thousands of POIs in a map window. The algorithms I found are using single
points for that and are very slow on large data. Shouldn't it be possible
to combine this clustering with the more performance oriented GeoHashes? In
fact GeoHashes are already a grid clustering and aggregation. Do you know a
lib or a algorithm?—
Reply to this email directly or view it on GitHubhttps://github.com/jillesvangurp/geotools/issues/1.
from geogeometry.
Hi Jilles, thanks for your answer and code. I'll test it...
from geogeometry.
Related Issues (20)
- geoHashesForPolygon Poles Condition HOT 6
- This repo's master and maven repository are not at the same version HOT 2
- lineCross function is resulting in a false positive HOT 3
- Bbox lat/lng mixup HOT 2
- bounding boxes do not follow GeoJson spec HOT 1
- add tests for Douglas Peucker algorithm
- get rid of testng & kotest in favor of junit5 + kotest assertions
- Build out geojson model support
- bug with GeoHashUtils.decode _box HOT 1
- improve validation
- add a max hashes to geoHashesForPolygon
- improve geoHashesForPolygon split and filter HOT 1
- expandPolygon produces weird results/broken
- GeoHashUtils.geoHashesForCircle is not generating correctly HOT 3
- poles check bug HOT 1
- Version 3.3.8 build with Java 21 HOT 2
- decode_bbox return array comment HOT 2
- polygonContains doesn't seem to work correctly HOT 12
- Looks like GeoGeometry.overlap function is not perfect HOT 4
- geoHashesForPolygon HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from geogeometry.