dbscan1d's People

Contributors

d-chambers

dbscan1d's Issues

dbscan metric

Hi there,
I noticed that the constructor does not accept a metric argument, which makes it incompatible with the official DBSCAN version. It should support:

dbs = DBSCAN1D(eps=1.0, min_samples=10, metric='euclidean')

I have looked into the code and I can see from here:

    def _get_is_core(self, ar):
        """ Determine if each point is a core. """
        mineps = np.searchsorted(ar, ar - self.eps, side="left")
        maxeps = np.searchsorted(ar, ar + self.eps, side="right")
        core = (maxeps - mineps) >= self.min_samples
        return core

That is equivalent to the Euclidean distance in one dimension, abs(p1 - p2). Is this correct?
Would there be any point in supporting other distances in one dimension? There are not many, but (p1 - p2)^2, for example, would be useful to have.

Maybe throw an exception for any unsupported distance?
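
The question above can be checked numerically. The following is a minimal sketch, not part of dbscan1d: the helper names `window_counts` and `brute_force_counts` are hypothetical, and the input is assumed to be a sorted 1D float array. It compares the searchsorted window from `_get_is_core` with an explicit abs(p1 - p2) <= eps count, and also shows that thresholding (p1 - p2)^2 at eps^2 selects the same neighbors.

```python
import numpy as np


def window_counts(ar, eps):
    """Neighbor counts from the searchsorted window (ar must be sorted)."""
    lo = np.searchsorted(ar, ar - eps, side="left")
    hi = np.searchsorted(ar, ar + eps, side="right")
    return hi - lo


def brute_force_counts(ar, eps):
    """Neighbor counts from explicit pairwise abs(p1 - p2) <= eps."""
    dist = np.abs(ar[:, None] - ar[None, :])
    return (dist <= eps).sum(axis=1)


# Integer-valued floats keep the boundary comparisons exact.
rng = np.random.default_rng(0)
ar = np.sort(rng.integers(0, 50, size=200).astype(float))
eps = 3.0

same_as_abs = np.array_equal(window_counts(ar, eps), brute_force_counts(ar, eps))

# Squared distance is monotonic in abs distance, so comparing it against
# eps**2 yields the same neighborhoods as comparing abs distance against eps.
squared_counts = ((ar[:, None] - ar[None, :]) ** 2 <= eps**2).sum(axis=1)
same_as_squared = np.array_equal(window_counts(ar, eps), squared_counts)

print(same_as_abs, same_as_squared)
```

Both counts include the point itself. If this holds, a squared-distance metric in one dimension would only amount to rescaling eps, which may be an argument for raising an exception on unsupported metrics rather than implementing them.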

Possible bug in label vs core counts

Hi,
I found an issue when comparing the number of labeled points with the number of core points.
If you try this:


import numpy as np
from sklearn.datasets import make_blobs

from dbscan1d.core import DBSCAN1D

for seed in range(1, 100):
    # make blobs to test clustering; make_blobs ignores random.seed,
    # so pass the seed directly to keep each run reproducible
    X, y = make_blobs(1_000_000, centers=2, n_features=1, random_state=seed)

    # init dbscan object
    dbs = DBSCAN1D(eps=.5, min_samples=4)

    # get labels for each point
    labels = dbs.fit_predict(X)
    core_pts = dbs.core_sample_indices_

    # number of core points vs. number of points assigned to a cluster
    core_size = core_pts[0].size
    label_size = np.where(labels >= 0)[0].size

    if core_size != label_size:
        print('Total points %d' % len(X))
        print('Cluster ID: %s' % np.unique(labels))
        print('Total noise %s' % np.where(labels < 0)[0].size)
        # "Total core" below counts every point with a label >= 0
        print('Total core %s' % label_size)
        print('Total core points %d' % core_size)

Very often you will find this situation:

Total points 1000000
Cluster ID: [-1  0  1]
Total noise 1
Total core 999999
Total core points 999998

The labels do not match the core point counts.
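
One possible explanation, which is not confirmed in this thread, is that the extra labeled point is a border point. In standard DBSCAN a border point is not a core point but lies within eps of one, so it receives a cluster label without appearing among the core samples. The sketch below illustrates this with plain NumPy. `classify_points` is a hypothetical helper, not part of dbscan1d, and it assumes the same searchsorted neighbor count quoted in the metric issue.

```python
import numpy as np


def classify_points(ar, eps, min_samples):
    """Split 1D points into core, border, and noise masks (sorted order)."""
    ar = np.sort(np.asarray(ar, dtype=float))
    lo = np.searchsorted(ar, ar - eps, side="left")
    hi = np.searchsorted(ar, ar + eps, side="right")
    is_core = (hi - lo) >= min_samples

    # Prefix sums let us count the core points inside each eps window.
    core_cumsum = np.concatenate([[0], np.cumsum(is_core)])
    has_core_neighbor = (core_cumsum[hi] - core_cumsum[lo]) > 0

    is_border = ~is_core & has_core_neighbor
    is_noise = ~is_core & ~has_core_neighbor
    return is_core, is_border, is_noise


points = [0.0, 0.5, 1.0, 2.0, 10.0]
is_core, is_border, is_noise = classify_points(points, eps=1.0, min_samples=3)

n_core = int(is_core.sum())
n_labeled = int((is_core | is_border).sum())
n_noise = int(is_noise.sum())
print(n_core, n_labeled, n_noise)
```

Here the point 2.0 has only two neighbors within eps, so it is not core, but it is within eps of the core point 1.0 and would still be labeled. Under that reading, a labeled count one higher than the core count would be expected behavior rather than a bug.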

Inconsistent label values when there is only one cluster

First and foremost, thank you for this efficient implementation of the algorithm for 1D data.
That being said, I noticed that the labelling seems to be inconsistent: when there is only one cluster, the points belonging to it are sometimes labeled 1 and sometimes 0. Why does this happen, and could it also happen when there is more than one cluster?
Here is a reproducible example:
label_is_zero=[86400.0,86400.0,86400.0,86401.0,86399.0,86400.0,86401.0,86399.0,86400.0,86400.0,86400.0,86402.0,86398.0,86401.0,86399.0,86401.0,86400.0,86399.0,86399.0,86401.0,86399.0,86401.0,86399.0,86402.0,86399.0,86400.0,86401.0,86401.0]
label_is_one=[46823, 46818, 46816, 46816, 46819]
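
The thread does not say why the single cluster is sometimes labeled 1. If downstream code only needs stable, consecutive cluster ids, one possible workaround is to renumber the labels after clustering. The sketch below assumes the usual DBSCAN convention that noise is labeled -1 and clusters have labels >= 0. `normalize_labels` is a hypothetical helper, not part of dbscan1d.

```python
import numpy as np


def normalize_labels(labels):
    """Renumber cluster labels to 0..k-1 in ascending order, keeping noise at -1."""
    labels = np.asarray(labels)
    out = np.full(labels.shape, -1, dtype=int)
    clustered = labels >= 0
    # return_inverse gives, for each clustered point, the rank of its label.
    _, inverse = np.unique(labels[clustered], return_inverse=True)
    out[clustered] = inverse
    return out


print(normalize_labels([1, 1, -1, 1]))
print(normalize_labels([3, -1, 7, 3]))
```

This only renumbers the output and does not change which points are grouped together.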
