kmeans-postgresql's Introduction

kmeans-postgresql

This module implements the k-means clustering algorithm in PostgreSQL. It is a true user-defined window function, separate from the built-in window functions, and is written in C.

Designed for PostgreSQL 8.4+

Hitoshi Harada

kmeans-postgresql's People

Contributors

scw, umitanuki

kmeans-postgresql's Issues

Support multi-pass calculation

kmeans currently implements only a simple single-pass algorithm. K-means results can be improved with a more elaborate multi-pass approach.
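As an illustration of what a multi-pass approach might look like, here is a minimal sketch of Lloyd-style iteration in C. It is not taken from this module's source: the Point type, the function name kmeans_multipass, and the max_passes parameter are all hypothetical, and the sketch handles 2-D points only. Each pass reassigns every point to its nearest centroid and then recomputes the centroids, stopping once no assignment changes.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical 2-D point type; not part of kmeans-postgresql. */
typedef struct { double x, y; } Point;

/* Squared Euclidean distance (no sqrt needed for comparisons). */
static double sq_dist(Point a, Point b)
{
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

/*
 * Multi-pass (Lloyd-style) k-means sketch.
 *   centers[0..k-1]  initial centroids on entry, final centroids on return
 *   labels[0..n-1]   receives the cluster index of each point
 * Returns the number of passes that changed at least one assignment.
 */
int kmeans_multipass(const Point *pts, size_t n, Point *centers, size_t k,
                     int *labels, int max_passes)
{
    int pass;

    assert(k >= 1); /* the sketch assumes at least one centroid */

    for (size_t i = 0; i < n; i++)
        labels[i] = -1; /* nothing assigned yet */

    for (pass = 0; pass < max_passes; pass++) {
        int changed = 0;

        /* Assignment step: attach each point to its nearest centroid. */
        for (size_t i = 0; i < n; i++) {
            int best = 0;
            double best_d = DBL_MAX;
            for (size_t c = 0; c < k; c++) {
                double d = sq_dist(pts[i], centers[c]);
                if (d < best_d) {
                    best_d = d;
                    best = (int) c;
                }
            }
            if (labels[i] != best) {
                labels[i] = best;
                changed = 1;
            }
        }
        if (!changed)
            break; /* converged: assignments are stable */

        /* Update step: move each centroid to the mean of its members. */
        for (size_t c = 0; c < k; c++) {
            double sx = 0.0, sy = 0.0;
            size_t cnt = 0;
            for (size_t i = 0; i < n; i++) {
                if (labels[i] == (int) c) {
                    sx += pts[i].x;
                    sy += pts[i].y;
                    cnt++;
                }
            }
            if (cnt > 0) { /* leave an empty cluster's centroid where it is */
                centers[c].x = sx / (double) cnt;
                centers[c].y = sy / (double) cnt;
            }
        }
    }
    return pass;
}
```

A caller would fill centers with k starting points, then call kmeans_multipass(pts, n, centers, k, labels, 100). Leaving the centroid of an empty cluster in place is just one simple choice; other strategies reseed it instead.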

signal 11: Segmentation fault created when certain values are passed into kmeans

First off, I'd like to say that kmeans is spectacular and is doing an awesome job at clustering our data points! We are using kmeans to cluster coordinates on Google Maps and have come across a rather odd error. Whenever we pass exactly 40 or 44 points to the kmeans function and ask for 65 or more clusters, it forces a complete shutdown of Postgres, killing all active queries. Here is the log output:
2014-02-04 16:03:49 UTC LOG: statement: with sites as ( select latitude, longitude, testsite_id
from locator.testsites
order by longitude desc
limit 40 ),
clusterd as ( select kmeans,count(*) ct,array_agg(testsite_id) as site_id
FROM ( select kmeans(array[longitude,latitude], 75) over (), testsite_id
from sites ) as wakka
group by kmeans )

    select * from clusterd

2014-02-04 16:03:49 UTC LOG: server process (PID 11900) was terminated by signal 11: Segmentation fault
2014-02-04 16:03:49 UTC LOG: terminating any other active server processes
2014-02-04 16:03:49 UTC WARNING: terminating connection because of crash of another server process
2014-02-04 16:03:49 UTC DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2014-02-04 16:03:49 UTC HINT: In a moment you should be able to reconnect to the database and repeat your command.
2014-02-04 16:03:49 UTC LOG: all server processes terminated; reinitializing
2014-02-04 16:03:49 UTC LOG: database system was interrupted; last known up at 2014-02-04 16:02:24 UTC
2014-02-04 16:03:49 UTC LOG: database system was not properly shut down; automatic recovery in progress

The oddest thing that I find about this is the query returns perfectly fine when the limit is decreased to any other number or increased to any other number (aside from 44, that seems to be the other magical number). Since kmeans is doing such a fantastic job outside of this strange oddity we're currently working around it by simply adjusting the cluster count based on the number of sites passed in, but I thought I'd bring this to your attention.

Please let me know if you need any more information from me!

EDIT : Forgot to mention, we're running PostgreSQL 9.1.11 on Ubuntu 12.04.4 LTS and can replicate this error 100% of the time on PostgreSQL 9.1.10 and Ubuntu 12.04.3 LTS (haven't tried it on any other major versions)

Thanks,
Paul
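
The post above does not say how the cluster count is adjusted. One plausible reading, shown here as a sketch only, is to cap the requested count at the number of input points before calling kmeans. The helper name clamp_cluster_count is hypothetical and does not exist in this module, and nothing here identifies the actual cause of the crash.

```c
#include <stddef.h>

/*
 * Hypothetical caller-side helper, not part of kmeans-postgresql:
 * cap the requested cluster count at the number of input points.
 * Returns 0 when there are no points, and at least 1 otherwise.
 */
size_t clamp_cluster_count(size_t requested_k, size_t n_points)
{
    if (n_points == 0)
        return 0;               /* nothing to cluster */
    if (requested_k == 0)
        return 1;               /* fall back to a single cluster */
    return requested_k < n_points ? requested_k : n_points;
}
```

With the numbers from the report, clamp_cluster_count(75, 40) yields 40, so the query would never ask for more clusters than there are rows.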

Probably doing something quite wrong... :-)

This one compiled great, however your makefiles seem to want me to be in some special location.

DB:~/pgdevel/kmeans-postgresql $ sudo make installcheck USE_PGXS=1 PGUSER=postgres
/opt/local/lib/postgresql91/pgxs/src/makefiles/../../src/test/regress/pg_regress --inputdir=. --psqldir=/opt/local/lib/postgresql91/bin   --dbname=contrib_regression kmeans
(using postmaster on Unix socket, default port)
pg_regress: could not open file "sql/kmeans.sql" for writing: No such file or directory
make: *** [installcheck] Error 2
DB:~/pgdevel/kmeans-postgresql $ sudo make installcheck PGUSER=postgres
Makefile:34: ../../src/Makefile.global: No such file or directory
Makefile:35: /contrib/contrib-global.mk: No such file or directory
make: *** No rule to make target `/contrib/contrib-global.mk'.  Stop.

Terribly sorry if this may sound trivial to you. I take it I need to do this stuff from the postgres/contrib folder. Or might I have missed exporting something or adding an extra parameter somewhere?

Never getting more than 3 clusters unless manual initial points are given

Somehow, we never get more than 3 clusters. Except for a few points, all of them fall into a single cluster.

We are using PostGIS with the following query:

SELECT id, kmeans(ARRAY[ST_X(point), ST_Y(point)], 40) OVER () AS k FROM our_table ORDER BY k DESC;

Is there any way to improve the initially chosen points?

Version compatible with postgres 10+

I'm running a version higher than Postgres 9.6 and when I try to install this I get the error "version mismatch". It seems nothing higher than 9.6 is supported. Is there a newer version or a way to make it compatible?

installation tutorial

Can you share a tutorial on how to install kmeans-postgresql on Ubuntu 10.04, from start to end?

unrecognized command line option "-Wno-format-truncation"

make
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-unused-but-set-variable -Werror=implicit-fallthrough=3 -Wno-format-truncation -m64 -O3 -fargument-noalias-global -fno-omit-frame-pointer -g -Werror=uninitialized -Werror=implicit-function-declaration -fPIC -I. -I./ -I/usr/local/matrixdb-4.2.0.enterprise/include/postgresql/server -I/usr/local/matrixdb-4.2.0.enterprise/include/postgresql/internal -D_GNU_SOURCE -I/usr/include/libxml2 -I/tmp/build/matrixdb/gpAux/ext/rhel7_x86_64/include -c -o kmeans.o kmeans.c
cc1: error: -Werror=implicit-fallthrough=3: no option -Wimplicit-fallthrough=3
cc1: warning: unrecognized command line option "-Wno-format-truncation" [enabled by default]
make: *** [kmeans.o] Error 1
