Code Monkey home page Code Monkey logo

np_ml's Introduction

NP_ML

Introduction

Classical machine learning algorithms implemented with pure numpy.

The repo to help you understand the ml algorithms instead of blindly using APIs.

Directory

Algorithm List

Classify

  • Perceptron

For perceptron, the example used the UCI/iris dataset. Since the basic perceptron is a binary classifier, the example used the data for versicolor and virginica. Also, since the iris dataset is not linear separable, the result may vary much.

Figure: versicolor and virginica. Hard to distinguish... Right?

Perceptron result on the Iris dataset.

  • K Nearest Neightbor (KNN)

For KNN, the example also used the UCI/iris dataset.

KNN result on the Iris dataset.

  • Naive Bayes

For naive bayes, the example used the UCI/SMS Spam Collection Dataset to do spam filtering.

For this example only, for tokenizing, nltk is used. And the result is listed below:

preprocessing data...
100%|#####################################################################| 5572/5572 [00:00<00:00, 8656.12it/s]
finish preprocessing data.

100%|#####################################################################| 1115/1115 [00:00<00:00, 55528.96it/s]
accuracy:  0.9757847533632287

We got 97.6% accuracy! That's nice!

And we try two examples, a typical ham and a typical spam. The result show as following.

example ham:
Po de :-):):-):-):-). No need job aha.
predict result:
ham

example spam:
u r a winner U ave been specially selected 2 receive 澹1000 cash or a 4* holiday (flights inc) speak to a 
live operator 2 claim 0871277810710p/min (18 )
predict result:
spam
  • Decision Tree

For decision tree, the example used the UCI/tic-tac-toe dataset. The input is the status of 9 block and the result is whether x win.

tic tac toe.

Here, we use ID3 and CART to generate a one layer tree.

For the ID3, we have:

root
├── 4 == b : True
├── 4 == o : False
└── 4 == x : True
accuracy = 0.385

And for CART, we have:

root
├── 4 == o : False
└── 4 != o : True
accuracy = 0.718

In both of them, feature_4 is the status of the center block. We could find out that the center block matters!!! And in ID3, the tree has to give a result for 'b', which causes the low accuracy.

  • Random Forest
  • SVM
  • AdaBoost
  • HMM

Cluster

  • Kmeans

For kmeans, we use the make_blob() function in sklearn to produce toy dataset.

Kmeans result on the blob dataset.

  • Affinity Propagation

You can think affinity propagation as an cluster algorithm that generate cluster number automatically.

Kmeans result on the blob dataset.

Manifold Learning

In manifold learning, we all use the simple curve-s data to show the difference between algorithms.

Curve S data.

  • PCA

The most popular way to reduce dimension.

PCA visualization.

  • LLE

A manifold learning method using only local information.

LLE visualization.

NLP

  • LDA

Time Series Analysis

  • AR

Usage

  • Installation

If you want to use the visual example, please install the package by:

  $ git clone https://github.com/zhuzilin/NP_ML
  $ cd NP_ML
  $ python setup.py install
  • Examples in section "Algorithm List"

Run the script in NP_ML/example/ . For example:

  $ cd example/
  $ python affinity_propagation.py

(Mac/Linux user may face some issue with the data directory. Please change them in the correspondent script).

  • Examples for Statistical Learning Method(《统计学习方法》)

Run the script in NP_ML/example/StatisticalLearningMethod/ .For example:

  $ cd example/StatisticalLearningMethod
  $ python adaboost.py

Reference

Classical ML algorithms was validated by naive examples in Statistical Learning Method(《统计学习方法》)

Time series models was validated by example in Bus 41202

Something Else

Currently, this repo will only implement algorithms that do not need gradient descent. Those would be arranged in another repo in which I would implement those using framework like pytorch. Coming soon:)

np_ml's People

Contributors

zhuzilin avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.