Comments (80)
I did a quick round-up of crates that implement the algorithms listed on the roadmap. I probably missed quite a few, but this can be a good starting point.
It was just a quick search, so I don't know how relevant each crate is, but I tried to make a note when a crate was old and unmaintained. Hopefully this is useful for informing algorithm design, or for saving us from reimplementing something that already exists.
from linfa.
Hi everyone, I've implemented the semi-supervised learning algorithm called dynamic label propagation in Rust. I'm getting accuracy scores of up to 98% on one of the datasets I've been using. I don't think this algorithm is very well known, but could it be added to the Linfa library?
from linfa.
Hi there! First off, I don't have any experience in ML, but I read a lot about it (and listen to way too many podcasts on the topic). I'm interested in jumping in. I have quite a bit of experience developing in Rust, specifically high-fidelity simulation tools (cf. nyx and hifitime).
I wrote an Ant Colony Optimizer in Rust. ACOs are great for traversing graphs which represent a solution space, a problem which is considered NP hard if I'm not mistaken. Is that something used at all in ML? If so, would it be of interest to this library, or is there a greater interest (for now) to focus on the problems listed in the first post?
Cheers
from linfa.
Hello, if possible, I would like to take on the implementation of the ICA algorithm. My implementation will use this paper as a guide.
from linfa.
@Sauro98 I believe linfa currently has an implementation of the vanilla DBSCAN algorithm here, but an approximate version would be a great addition! It would be great if you could open a pull request following that same style under the `linfa-clustering` sub-crate (i.e. `appx_dbscan` vs. `dbscan`), in a way that keeps your API close to the existing algorithm's (`Dbscan::predict(&hyperparams, &dataset);` -> `AppxDbscan::predict(&hyperparams, &dataset);`), which I believe usually accepts data as an ndarray `&Array2<f64>`. I'd also find it really interesting to see benchmarks comparing performance between the two!
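To make the suggested symmetry concrete, here is a rough sketch of how the two entry points could mirror each other. All type definitions below are illustrative stand-ins, not linfa's actual types, and plain `Vec`s replace ndarray's `Array2<f64>` so the snippet is self-contained:

```rust
// Hypothetical stand-ins for linfa's types, showing how the approximate
// variant could mirror the exact DBSCAN entry point with an identical signature.

struct DbscanHyperParams {
    tolerance: f64,
    min_points: usize,
}

struct Dbscan;
struct AppxDbscan;

impl Dbscan {
    // Assigns each point a cluster id (Some) or marks it as noise (None).
    fn predict(_params: &DbscanHyperParams, dataset: &[Vec<f64>]) -> Vec<Option<usize>> {
        // Placeholder body: the real implementation does density-based clustering.
        vec![Some(0); dataset.len()]
    }
}

impl AppxDbscan {
    // Same signature as the exact variant, so callers can swap one for the other.
    fn predict(_params: &DbscanHyperParams, dataset: &[Vec<f64>]) -> Vec<Option<usize>> {
        vec![Some(0); dataset.len()]
    }
}

fn main() {
    let params = DbscanHyperParams { tolerance: 0.5, min_points: 3 };
    let data = vec![vec![0.0, 0.0], vec![0.1, 0.1]];
    let exact = Dbscan::predict(&params, &data);
    let appx = AppxDbscan::predict(&params, &data);
    assert_eq!(exact.len(), appx.len());
}
```

Keeping the signatures identical would also make side-by-side benchmarking of the two variants trivial.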
from linfa.
I'd like to implement Random Forest.
from linfa.
Cool! I worked a bit on linear regression a while ago - you can find a very vanilla implementation of it here: https://github.com/rust-ndarray/ndarray-examples/tree/master/linear_regression @Nimpruda
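For anyone new to the topic, the core of a "very vanilla" linear regression can be sketched with the closed-form slope/intercept formulas for a single feature. This is only the one-dimensional special case; the linked ndarray example and linfa's implementation handle multiple features via matrix algebra:

```rust
// Minimal ordinary least squares for one feature:
// slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).

fn simple_linear_regression(x: &[f64], y: &[f64]) -> (f64, f64) {
    assert_eq!(x.len(), y.len());
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;
    let cov: f64 = x.iter().zip(y).map(|(xi, yi)| (xi - mean_x) * (yi - mean_y)).sum();
    let var: f64 = x.iter().map(|xi| (xi - mean_x).powi(2)).sum();
    let slope = cov / var;
    let intercept = mean_y - slope * mean_x;
    (slope, intercept)
}

fn main() {
    // y = 2x + 1 exactly, so the fit should recover those coefficients.
    let x = [0.0, 1.0, 2.0, 3.0];
    let y = [1.0, 3.0, 5.0, 7.0];
    let (slope, intercept) = simple_linear_regression(&x, &y);
    assert!((slope - 2.0).abs() < 1e-9);
    assert!((intercept - 1.0).abs() < 1e-9);
}
```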
from linfa.
Both are ok with me.
An issue in friedrich's repository might help avoid overcrowding linfa with issues, but do as you prefer.
from linfa.
I think this is really great. I just started on a sklearn-like implementation of their pipelines, here, but it's more or less for experimentation, nothing serious. I'll be sure to keep my eye on the issues/goals here and help out where I can. Thanks for the initiative!
from linfa.
Hi @ChristopherRabotin, I've never heard of ACOs, but since they relate to graphs you should check whether they have any uses alongside Markov Chains.
from linfa.
I would like to take the Naive Bayes one.
from linfa.
Tracking `friedrich` <-> `linfa` integration here: nestordemeure/friedrich#1
from linfa.
Hey @LukeMathWalker, could you add me next to normalization? I plan to have it done by New Year's, as I'm still not very experienced with Rust, but I have an idea of how to implement it.
from linfa.
Done @InCogNiTo124
from linfa.
Implementation of DBSCAN merged to master - thanks @xd009642
from linfa.
I would consider both of them to be out of scope for this project - it's already incredibly broad as it is right now.
I'd love to see something spring up for reinforcement learning, though, especially gym environments!
from linfa.
Hi there!
I've been working on an implementation of decision trees here. It's still a WIP and needs documentation but it's a start at least. Once it's a bit more polished I can look at random forests also.
from linfa.
ordinary linear regression was added in #20, thanks to @Nimpruda and @paulkoerbitz
from linfa.
linear decision trees were added in #18, kudos to @mossbanay
from linfa.
I'm gonna start working on OPTICS at some point soon
from linfa.
Gaussian naΓ―ve Bayes was added in #51, kudos to @VasanthakumarV
from linfa.
Is there any interest for linfa supporting model selection algorithms such as grid search or hyperparameter tuning?
from linfa.
Could "causal inference" be added, like the https://github.com/microsoft/dowhy library, or would it be out of scope for linfa?
from linfa.
Infrastructure Goals
Aside from adding new algorithms, there are also some infrastructure tasks that will significantly improve the ergonomics and performance of Linfa. They are listed here in descending order of importance, in my opinion:
- #220: Python bindings allow `linfa` to target the same userbase as `scikit-learn`, broadening the reach of the project. Having Python benchmarks also allows a fair performance comparison between `linfa` and `scikit-learn`.
- #228: This task potentially allows removing BLAS dependencies from `linfa` completely, significantly increasing code quality. It also has the side-effect of increasing benchmark coverage.
- #103: More benchmarks allow us to make performance optimizations with confidence.
- #161: Allows users to have more visibility into the internals of longer-running algorithms, similar to other mainstream ML libraries.
from linfa.
I'd like to contribute quantile regression
from linfa.
Hi, I'm eager to help I'll take Linear regression, Lasso and ridge.
from linfa.
What does `Normalization` mean? Is it like sklearn's `StandardScaler`, or something else?
from linfa.
Exactly @InCogNiTo124.
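For reference, the z-score standardization that sklearn's `StandardScaler` performs can be sketched in a few lines (per column: subtract the mean, divide by the standard deviation). This is a simplified stand-in, not linfa's API:

```rust
// z-score standardization of one column: (v - mean) / std.
// Assumes the input is non-constant (otherwise std is zero).

fn standard_scale(values: &[f64]) -> Vec<f64> {
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let var = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt();
    values.iter().map(|v| (v - mean) / std).collect()
}

fn main() {
    let scaled = standard_scale(&[1.0, 2.0, 3.0, 4.0]);
    let mean: f64 = scaled.iter().sum::<f64>() / scaled.len() as f64;
    assert!(mean.abs() < 1e-9); // scaled data is centered at zero
}
```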
from linfa.
This is an interesting project and I will work on the PCA implementation
from linfa.
I am the author of the friedrich crate which implements Gaussian Processes.
While it is still a work in progress, it is fully featured and I would be happy to help integrate it into the project if you have directions to do so.
from linfa.
That would be awesome @nestordemeure - I'll have a look at the project and I'll get back to you! Should I open an issue on `friedrich`'s repository when I am ready? Or would you prefer it to be tracked here on the `linfa` repository?
from linfa.
I'd love to take the Nearest Neighbors implementation
from linfa.
So far, I haven't found a way to use the two together. The closest I came was several papers which use Markov Chains to analyze ACOs.
from linfa.
I'll take on Gaussian Processes.
from linfa.
I'll put some work towards the text tokenization algorithms (CountVectorizer and TFIDF). I'm also extremely interested in a good SVM implementation in Rust. Whoever is working on that, let me know if you'd like some help or anything.
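For anyone following along, the essence of CountVectorizer + TF-IDF is: count term occurrences per document, then down-weight terms that appear in many documents. A self-contained sketch over whitespace-tokenized documents (real implementations add smoothing variants, n-grams, vocabulary caps, etc.; the plain `idf = ln(n_docs / df)` form is used here):

```rust
use std::collections::HashMap;

// Returns one map of term -> tf-idf weight per input document.
fn tfidf(docs: &[&str]) -> Vec<HashMap<String, f64>> {
    let n_docs = docs.len() as f64;
    let tokenized: Vec<Vec<String>> = docs
        .iter()
        .map(|d| d.split_whitespace().map(|t| t.to_lowercase()).collect())
        .collect();
    // document frequency: number of documents containing each term
    let mut df: HashMap<String, f64> = HashMap::new();
    for tokens in &tokenized {
        let mut seen: Vec<&String> = tokens.iter().collect();
        seen.sort();
        seen.dedup();
        for t in seen {
            *df.entry(t.clone()).or_insert(0.0) += 1.0;
        }
    }
    // raw term counts per document, weighted by idf = ln(n_docs / df)
    tokenized
        .iter()
        .map(|tokens| {
            let mut tf: HashMap<String, f64> = HashMap::new();
            for t in tokens {
                *tf.entry(t.clone()).or_insert(0.0) += 1.0;
            }
            tf.into_iter()
                .map(|(t, count)| {
                    let idf = (n_docs / df[&t]).ln();
                    (t, count * idf)
                })
                .collect()
        })
        .collect()
}

fn main() {
    let docs = ["the cat sat", "the dog sat", "the cat ran"];
    let weights = tfidf(&docs);
    // "the" appears in every document, so its idf (and weight) is zero.
    assert!(weights[0]["the"].abs() < 1e-9);
    // "dog" appears only in the second document, so it gets a positive weight there.
    assert!(weights[1]["dog"] > 0.0);
}
```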
from linfa.
Please take a look at what is already out there before diving head-down into a reimplementation @tyfarnan - I haven't had the time to look at friedrich by @nestordemeure yet (taking a break after the final push to release the blog post and related code), but we should definitely start from there, as well as from the GP sub-module in rusty-machine.
from linfa.
@tyfarnan, don't hesitate to contact me via an issue on friedrich's repository once @LukeMathWalker has spelled out what is expected of code integrated into Linfa and how the integration will be done.
from linfa.
I have updated the issue to make it immediately clear who is working on what and which items are still looking for an owner
from linfa.
Started implementing DBSCAN in #12.
Also, as a suggestion, Gaussian Mixture Models would be cool
from linfa.
Hi, really cool project!
I have a question concerning the scope: do you eventually want to have deep learning and reinforcement learning algorithms too? I'm curious whether adding them is the eventual plan, and you just want to start with the easier stuff, or whether you think along the lines of the scikit-learn devs themselves: here.
Either way, I'll be glad to help spread the Rust gospel. Right now I'm going through the Reinforcement Learning book and will implement some of its algorithms; if that's in scope for linfa, I'll be glad to try adding them. If not, I plan to read through Understanding Machine Learning afterwards, and will thus eventually reach some of the algorithms in the roadmap, which I can then help implement. :)
from linfa.
From previous discussions, deep learning etc. is out of scope, for the same reasons as for scikit-learn. @LukeMathWalker might have more to say about it, or about reinforcement learning
from linfa.
Ok, thanks.
from linfa.
Can you also include Non-Negative Matrix Factorization (NMF) in the list of pre-processing steps? It's a standard algorithm in NLP/audio enhancement and decomposes a matrix into the product of two positive-valued matrices. (https://en.wikipedia.org/wiki/Non-negative_matrix_factorization)
One of its nice properties is that there is a simple incremental algorithm for solving the problem, with a simple modification for sparsity constraints.
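The incremental algorithm referred to is (I assume) the Lee-Seung multiplicative update scheme. A minimal sketch factoring V (n x m) into W (n x k) * H (k x m), using dense `Vec`-based matrices to stay self-contained; a real version would use ndarray and add the sparsity modification:

```rust
// NMF via multiplicative updates:
//   H <- H .* (W^T V) ./ (W^T W H)
//   W <- W .* (V H^T) ./ (W H H^T)
// All entries stay non-negative because updates only multiply by non-negative ratios.

type Mat = Vec<Vec<f64>>;

fn matmul(a: &Mat, b: &Mat) -> Mat {
    let (n, k, m) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0; m]; n];
    for i in 0..n {
        for p in 0..k {
            for j in 0..m {
                out[i][j] += a[i][p] * b[p][j];
            }
        }
    }
    out
}

fn transpose(a: &Mat) -> Mat {
    let (n, m) = (a.len(), a[0].len());
    let mut out = vec![vec![0.0; n]; m];
    for i in 0..n {
        for j in 0..m {
            out[j][i] = a[i][j];
        }
    }
    out
}

fn nmf(v: &Mat, k: usize, iters: usize) -> (Mat, Mat) {
    let (n, m) = (v.len(), v[0].len());
    // deterministic positive init so the example is reproducible
    let mut w: Mat = (0..n).map(|i| (0..k).map(|j| 0.5 + 0.1 * ((i + j) as f64)).collect()).collect();
    let mut h: Mat = (0..k).map(|i| (0..m).map(|j| 0.5 + 0.1 * ((i + j) as f64)).collect()).collect();
    let eps = 1e-9; // guards against division by zero
    for _ in 0..iters {
        let wt = transpose(&w);
        let num = matmul(&wt, v);
        let den = matmul(&matmul(&wt, &w), &h);
        for i in 0..k {
            for j in 0..m {
                h[i][j] *= num[i][j] / (den[i][j] + eps);
            }
        }
        let ht = transpose(&h);
        let num = matmul(v, &ht);
        let den = matmul(&matmul(&w, &h), &ht);
        for i in 0..n {
            for j in 0..k {
                w[i][j] *= num[i][j] / (den[i][j] + eps);
            }
        }
    }
    (w, h)
}

fn main() {
    let v: Mat = vec![vec![1.0, 2.0], vec![2.0, 4.0]]; // rank-1 matrix
    let (w, h) = nmf(&v, 1, 200);
    let approx = matmul(&w, &h);
    // a rank-1 non-negative matrix should be reconstructed almost exactly with k = 1
    assert!((approx[0][0] - 1.0).abs() < 1e-3);
    assert!((approx[1][1] - 4.0).abs() < 1e-3);
}
```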
from linfa.
For hierarchical clustering there is the wonderful kodama crate. It's based on this paper and implements a list of algorithms for hierarchical clustering (choosing the fastest one). I think it would be a waste to re-implement them. Perhaps we can just re-export it in a module?
from linfa.
Please take a look at the PR rust-ndarray/ndarray-linalg#184, which adds `TruncatedEig` and `TruncatedSvd` routines to the library. Both are based on the LOBPCG algorithm and allow an iterative approach to eigenvalue and singular value decomposition. This is used in PCA, manifold learning (e.g. spectral clustering), and discriminant analysis, and is therefore useful here too. The algorithm also supports sparse problems, because the operator is defined in a matrix-free way (the matrix `A` is provided as a closure in the call to LOBPCG).
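The "matrix-free" idea is that the solver only ever needs the action `v -> A v`, supplied as a closure, never the matrix itself. A sketch using plain power iteration as a stand-in for LOBPCG (which consumes the operator the same way):

```rust
// Power iteration against a matrix-free operator: only the closure `apply`
// (computing v -> A v) is needed, so A can be sparse or implicit.

fn power_iteration<F>(apply: F, dim: usize, iters: usize) -> (f64, Vec<f64>)
where
    F: Fn(&[f64]) -> Vec<f64>,
{
    let mut v = vec![1.0; dim];
    for _ in 0..iters {
        let w = apply(&v);
        let norm = w.iter().map(|x| x * x).sum::<f64>().sqrt();
        v = w.iter().map(|x| x / norm).collect();
    }
    // Rayleigh quotient v^T (A v) estimates the dominant eigenvalue
    let av = apply(&v);
    let lambda = v.iter().zip(&av).map(|(a, b)| a * b).sum();
    (lambda, v)
}

fn main() {
    // Operator for diag(3, 1), without ever materializing the matrix.
    let apply = |v: &[f64]| vec![3.0 * v[0], 1.0 * v[1]];
    let (lambda, _eigvec) = power_iteration(apply, 2, 100);
    assert!((lambda - 3.0).abs() < 1e-6);
}
```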
from linfa.
Can you also add me to the spectral clustering task? I will try to implement classical Multidimensional Scaling. The t-SNE technique is interesting too, but requires more time because it is based on a custom optimization problem. Furthermore, Gaussian embedding is another interesting technique I used recently, but it requires at least SGD for a single-layer NN. See papers here:
- https://www.lcayton.com/resexam.pdf
- http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf
- https://arxiv.org/abs/1707.03815
from linfa.
@VirtualSpaceman That would be great - any pull requests are very welcome
from linfa.
fast Independent Component Analysis was added in #47, kudos to @VasanthakumarV
from linfa.
Hello, I am working on an implementation of the approximated DBSCAN algorithm here, and I was wondering whether it might be interesting for this project. Right now it has all the basic functionality implemented, and I would happily make any changes needed to make it fit here.
from linfa.
Hello, I've started to work on a port of sklearn's Gaussian Mixture Model (mentioned by @xd009642 above). I would be happy to contribute it to the `linfa-clustering` sub-crate. By the way, thanks for the linfa initiative, which is really promising.
from linfa.
Gaussian Mixture Models were added in #56, thanks to @relf
from linfa.
Fast K-medoids clustering (PAM, FasterPAM) implementations: https://crates.io/crates/kmedoids
from linfa.
Markov Chains were mentioned above, but I'd really like to see Hidden Markov Models. I think they'd fit better under the "supervised learning" set of algorithms, even though they have unsupervised applications as well.
I made a toy project with https://github.com/paulkernfeld/hmmm a while back and it seemed solid enough.
There are no (published) API docs, but the code itself is quite small and very well documented.
from linfa.
the Partial Least Squares family was added in #95, thanks to @relf
from linfa.
Preprocessing with normalisation, count-vectorizer and tf-idf merged in #93, kudos to @Sauro98
from linfa.
Nearest neighbours merged in #120, thanks to @YuhanLiin
from linfa.
hi all i'd like to help implement too. what's the best way to pick up a task?
from linfa.
> hi all i'd like to help implement too. what's the best way to pick up a task?

Not difficult, just mention your interest here and I will add you to the list once you've submitted the initial draft :)
from linfa.
@Clara322 I personally think that would be a good candidate for a new linfa crate. If you want to open an issue for it specifically, there can then be some discussion on the specifics of the design and the steps to implement it
from linfa.
> Could it be added "causal inference" like https://github.com/microsoft/dowhy library or would be out of scope for linfa?

There are many interesting patterns which linfa could learn from there, but we would first need to support graphical models.

> Hi everyone, I've implemented the semi-supervised learning algorithm called dynamic label propagation using Rust. I'm getting accuracy score up to 98% for one of the datasets I've been using. I don't think this algorithm is very well known, but could it be added to the Linfa library?

Cool, sure! Once you have a working prototype, submit a PR and I will review the integration. We do have to see how to add support for incomplete datasets, though.
from linfa.
@bytesnake I'm playing with it, creating graph and identification support. If one day I feel it's ready, I'll submit a PR. https://github.com/vaijira/linfa/tree/causal/algorithms/linfa-causal-inference
from linfa.
Can I work on adding Linear Discriminant Analysis to linfa? Here is a link to the Sklearn analog
from linfa.
I've been working on some features like: Categorical Encoding, MAPE and random forest. How can I contribute?
from linfa.
> Can I work on adding Linear Discriminant Analysis to linfa? Here is a link to the Sklearn analog

Does LDA output the dimensionally-reduced data at all? If so, it should go into `linfa-reduction`.
from linfa.
> I've been working on some features like: Categorical Encoding, MAPE and random forest. How can I contribute?

Random forests are covered by this PR.
Categorical encoding would go into `linfa-preprocessing`. I'm pretty sure we don't have it, but just check to make sure.
MAPE is a simple function that would go into `linfa/src/metrics_regression.rs`.
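Since MAPE really is just one function, here is roughly what it could look like. The signature below is illustrative, not linfa's actual metrics API:

```rust
// Mean absolute percentage error: mean of |(y_true - y_pred) / y_true| * 100.
// Assumes no true value is zero (which would divide by zero).

fn mean_absolute_percentage_error(y_true: &[f64], y_pred: &[f64]) -> f64 {
    assert_eq!(y_true.len(), y_pred.len());
    let sum: f64 = y_true
        .iter()
        .zip(y_pred)
        .map(|(t, p)| ((t - p) / t).abs())
        .sum();
    100.0 * sum / y_true.len() as f64
}

fn main() {
    // each prediction is off by exactly 10% of the true value
    let mape = mean_absolute_percentage_error(&[100.0, 200.0], &[110.0, 180.0]);
    assert!((mape - 10.0).abs() < 1e-9);
}
```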
from linfa.
> Can I work on adding Linear Discriminant Analysis to linfa? Here is a link to the Sklearn analog

> Does LDA output the dimensionally-reduced data at all? If so it should go into `linfa-reduction`

It can perform dimensionality reduction (transform). It can also just be used to predict classes (predict). The parentheses hold the analogous method in Sklearn. Is there a preference for which should be implemented? Also, I am still getting familiar with Rust, so it may take a few weeks to get done.
from linfa.
Preferably implement both if possible.
from linfa.
Are there plans to implement ridge regression in the linear sub-package? Looking for models to contribute.
from linfa.
Ridge regression should already be in `linfa-elasticnet`
from linfa.
What about imputation, similar to scikit-learn's impute module?
from linfa.
We don't have that. It can go in `linfa-preprocessing`
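For context, the simplest imputation strategy (what sklearn's `SimpleImputer` does with `strategy="mean"`) just replaces missing values in a column with the column mean. A self-contained sketch, using NaN to mark missing entries:

```rust
// Mean imputation for one column: NaN entries are replaced by the mean of the
// present values. Assumes at least one non-missing value exists.

fn impute_mean(column: &[f64]) -> Vec<f64> {
    let present: Vec<f64> = column.iter().copied().filter(|v| !v.is_nan()).collect();
    let mean = present.iter().sum::<f64>() / present.len() as f64;
    column
        .iter()
        .map(|v| if v.is_nan() { mean } else { *v })
        .collect()
}

fn main() {
    let filled = impute_mean(&[1.0, f64::NAN, 3.0]);
    assert_eq!(filled, vec![1.0, 2.0, 3.0]); // NaN replaced by (1 + 3) / 2
}
```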
from linfa.
Hi!
Can I take up Random Forests?
Also can we look to implement xgboost and adaboost?
from linfa.
#229 implements bootstrap aggregation, which is a generalization of random forests, so you could work on that.
xgboost and adaboost both seem to be ensemble algorithms that are not necessarily tied to decision trees (correct me if I'm wrong), so we should probably put them in a new algorithm crate called `linfa-ensemble` or something. Bootstrap aggregation should probably go in there as well.
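The bootstrap step that bagging (and hence random forests) builds on is just sampling n indices with replacement per base model. A dependency-free sketch, where a tiny deterministic LCG stands in for a real RNG like the `rand` crate:

```rust
// Draw n indices uniformly with replacement; each base model of a bagging
// ensemble is then trained on the rows these indices select.

fn bootstrap_indices(n: usize, seed: &mut u64) -> Vec<usize> {
    (0..n)
        .map(|_| {
            // linear congruential generator step (Knuth's MMIX constants)
            *seed = seed
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            ((*seed >> 33) as usize) % n
        })
        .collect()
}

fn main() {
    let mut seed = 42u64;
    let sample = bootstrap_indices(10, &mut seed);
    assert_eq!(sample.len(), 10); // resample is the same size as the original set
    assert!(sample.iter().all(|&i| i < 10)); // every index is valid
}
```

Random forests then add per-split feature subsampling on top of this; a shared bagging crate could leave that part to the tree learner.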
from linfa.
In terms of functionality, the mid-term end goal is to achieve an offering of ML algorithms and pre-processing routines comparable to what is currently available in Python's `scikit-learn`.
These algorithms can either be:
* re-implemented in Rust;
* re-exported from an existing Rust crate, if available on [crates.io](https://crates.io) with a compatible interface.

In no particular order, focusing on the main gaps:
* Clustering:
  * [x] DBSCAN
  * [x] Spectral clustering
  * [x] Hierarchical clustering
  * [x] OPTICS
* Preprocessing:
  * [x] PCA
  * [x] ICA
  * [x] Normalisation
  * [x] CountVectoriser
  * [x] TFIDF
  * [x] t-SNE
* Supervised Learning:
  * [x] Linear regression
  * [x] Ridge regression
  * [x] LASSO
  * [x] ElasticNet
  * [x] Support vector machines
  * [x] Nearest Neighbours
  * [ ] Gaussian processes (integrating `friedrich` - tracking issue [Integrating friedrich into linfa nestordemeure/friedrich#1](https://github.com/nestordemeure/friedrich/issues/1))
  * [x] Decision trees
  * [ ] Random Forest
  * [x] Naive Bayes
  * [x] Logistic Regression
  * [ ] Ensemble Learning
  * [ ] Least Angle Regression
  * [x] PLS
The collection is deliberately loose and non-exhaustive, and it will evolve over time - if there is an ML algorithm that you find yourself using often day to day, please feel free to contribute it
I'd love to take on Random Forest! I have previously implemented it in a simplistic way in Go, but I'd love to make it happen in Rust. This is my first open source contribution - let me know how I can make it happen :)
from linfa.
I'd also like to help with this.
from linfa.
I'm interested in least angle regression (LARS). It seems that PR #115 was trying to implement it, but it has been paused for 3 years, so I guess it's basically abandoned. I'm going to pick it up.
from linfa.
I am interested in random forests.
from linfa.
@MarekJReid @giorgiozoppi did either of you take a crack at random forests?
from linfa.
I'll look into it; we covered this at school just this week. For Python bindings, maturin is perfect. @zenconnor, should I look inside scikit-learn? I was looking at the scikit-learn implementation; as soon as I can, I'll provide a class diagram of it.
from linfa.