cogset's Introduction

cogset's People

Contributors

huonw

cogset's Issues

Zero assigned clusters leading to zero means

The current design chooses the first k points as starting values.

If any of these data points are identical, the first of the identical centres is assigned all the points and the second is assigned none. The mean over the second centre's 0 members is then NaN, which derails the whole clustering algorithm.

There are 2 solutions I can think of to avoid this condition:

  • Select the first k distinct points for centres.
  • Move any centre which ends up with a cluster of size 0 to a random other point.

The first one seems simpler and more predictable in performance, so it looks like the better place to start.
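
A minimal sketch of the first option, assuming points are plain `Vec<f64>` values rather than cogset's own point types. The names `first_k_distinct` and `mean` are hypothetical and not part of the crate. The mean helper returns `None` for an empty cluster, so a caller has to handle that case instead of propagating NaN.

```rust
/// Hypothetical helper: pick the first `k` pairwise-distinct points as
/// initial centres. Returns `None` if fewer than `k` distinct points exist.
fn first_k_distinct(points: &[Vec<f64>], k: usize) -> Option<Vec<Vec<f64>>> {
    let mut centres: Vec<Vec<f64>> = Vec::with_capacity(k);
    for p in points {
        if centres.len() == k {
            break;
        }
        // Exact equality on purpose: this targets the identical-points
        // case described in the issue, not near-duplicates.
        if !centres.iter().any(|c| c == p) {
            centres.push(p.clone());
        }
    }
    if centres.len() == k {
        Some(centres)
    } else {
        None
    }
}

/// Hypothetical helper: component-wise mean of a cluster.
/// An empty cluster yields `None` instead of a NaN-filled vector.
fn mean(cluster: &[Vec<f64>]) -> Option<Vec<f64>> {
    let first = cluster.first()?;
    let mut sum = vec![0.0; first.len()];
    for p in cluster {
        for (s, x) in sum.iter_mut().zip(p) {
            *s += *x;
        }
    }
    let n = cluster.len() as f64;
    Some(sum.into_iter().map(|s| s / n).collect())
}
```

With the data `[1, 1], [1, 1], [2, 3]` and `k = 2`, this skips the duplicate and starts from `[1, 1]` and `[2, 3]`. If the data has fewer than `k` distinct points, the caller gets `None` and can decide whether to lower `k` or report an error.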

impl Point + Euclidean for Euclid<Vec<f64>>

I would like to be able to use arbitrary-length Vec<f64>s as points in the clustering algorithms. I assume that the problem is likely that it would be ideal to have compile-time checks on the length of the vectors, to ensure that you don't accidentally add a dimension somewhere.

My use-case is computing a series of audio spectrum coefficients, and because of this limitation changing the number of bins (dimensions) means that I have to recompile the code. I would like to allow the end-user to change the number of bins, but I'm not sure how to do that without arbitrary-length vectors. I'm not particularly concerned about efficiency of working on the stack.
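
To make the request concrete, here is a rough sketch of what a runtime-sized point could look like. It is standalone and does not implement cogset's `Point` or `Euclidean` traits, whose exact signatures I have not reproduced here. `DynPoint` and `common_dim` are hypothetical names. The compile-time length guarantee is replaced by a runtime check that returns `None` on a dimension mismatch.

```rust
/// Hypothetical runtime-sized point; not part of cogset.
#[derive(Clone, Debug, PartialEq)]
struct DynPoint(Vec<f64>);

impl DynPoint {
    /// Euclidean distance to `other`, or `None` if the dimensions differ.
    fn dist(&self, other: &DynPoint) -> Option<f64> {
        if self.0.len() != other.0.len() {
            return None;
        }
        let squared: f64 = self
            .0
            .iter()
            .zip(&other.0)
            .map(|(a, b)| (a - b) * (a - b))
            .sum();
        Some(squared.sqrt())
    }
}

/// Check once, before clustering, that every point has the same dimension.
/// Returns that dimension, or `None` for empty or inconsistent input.
fn common_dim(points: &[DynPoint]) -> Option<usize> {
    let dim = points.first()?.0.len();
    if points.iter().all(|p| p.0.len() == dim) {
        Some(dim)
    } else {
        None
    }
}
```

Validating the whole data set once with something like `common_dim` would catch an accidentally added dimension up front, which covers most of what the compile-time check provides for my use case.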

Thought I'd open an issue since this hasn't already been opened and closed, but I understand that it's an enhancement not a bug.

Performance benchmarks please?

Could you please provide some performance benchmarks against scikit-learn for training and inference with the clustering algorithms you implemented? If you have the time, benchmarks against the Intel DAAL library would also be useful.

I am looking for a starting point to implement fast clustering algorithms and want to know whether switching to Rust would give significant gains compared to scikit-learn in Python (which uses NumPy, which can use an MKL backend) or compared to scikit-learn in the Intel Distribution for Python, which uses the aforementioned DAAL library.

Thanks!

Hierarchical bottom-up clustering with complete linkage

I use a custom implementation of hierarchical bottom-up clustering with complete linkage in a private project and I would like to move it to a public external crate for obvious reasons, i.e., help others, get help, thin my code base and focus on my main idea.

Well, I found your crate and I was wondering if it is the right place. It should be fairly simple to incorporate even more hierarchical linkage criteria.
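
For reference, the general technique can be sketched as below. This is a naive illustration that works on a precomputed distance matrix, not my private implementation and not anything from cogset. The names `complete_linkage` and `agglomerate` and the threshold-based stopping rule are assumptions made for the example.

```rust
/// Complete-linkage distance: the largest pairwise distance between a
/// member of cluster `a` and a member of cluster `b`.
/// `dist` is assumed to be a full symmetric distance matrix.
fn complete_linkage(a: &[usize], b: &[usize], dist: &[Vec<f64>]) -> f64 {
    let mut max = 0.0_f64;
    for &i in a {
        for &j in b {
            max = max.max(dist[i][j]);
        }
    }
    max
}

/// Bottom-up clustering: start with one cluster per point and keep merging
/// the closest pair until no pair is within `threshold` of each other.
/// Returns clusters as lists of point indices.
fn agglomerate(dist: &[Vec<f64>], threshold: f64) -> Vec<Vec<usize>> {
    let mut clusters: Vec<Vec<usize>> = (0..dist.len()).map(|i| vec![i]).collect();
    loop {
        // Find the pair of clusters with the smallest linkage distance.
        let mut best: Option<(usize, usize, f64)> = None;
        for i in 0..clusters.len() {
            for j in (i + 1)..clusters.len() {
                let d = complete_linkage(&clusters[i], &clusters[j], dist);
                if best.map_or(true, |(_, _, bd)| d < bd) {
                    best = Some((i, j, d));
                }
            }
        }
        match best {
            Some((i, j, d)) if d <= threshold => {
                // j > i, so removing j leaves index i valid.
                let merged = clusters.swap_remove(j);
                clusters[i].extend(merged);
            }
            _ => return clusters,
        }
    }
}
```

Swapping `complete_linkage` for a function that takes the minimum or the average of the pairwise distances would give single or average linkage, which is why adding further criteria should be cheap. The sketch recomputes every linkage on each pass, so it is only suitable for small inputs.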

cc @huonw (added by @huonw to give me an email about this issue)
