ML_for_learner

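This project aims to implement a scikit-learn-like mini machine learning library using numpy. Each topic comes with a blog post explaining the theory, and some features also have a notebook that walks through the implementation details. The goal is to give people learning these algorithms a one-stop resource, from theory to implementation.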

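Because my knowledge is limited and I have no Python development experience, the library is currently still a very loose collection of code. If you find a mistake or bug in a blog post, notebook, or code file, or even just a passage that reads awkwardly, feel free to contact me or open an issue on the project page. Thank you.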

QQ: 435248055   |   WeChat: QQ435248055   |   Blog


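Click an algorithm name to open the corresponding blog post on its theory. The notebook shows how to implement the algorithm step by step, and code is the modularized source file.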

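Note: unless stated otherwise, every model accepts data as numpy.ndarray. Some also accept a list or nested list; I make no guarantees for any other data format for now. Because Python type hints do not currently support numpy, the accepted types are not annotated in the code (thanks to the WeChat user @Stream for pointing this out).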
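The input convention described in the note can be illustrated with a small coercion helper. This is only a sketch: `as_feature_matrix` is a hypothetical name that does not come from the library, and treating a flat list as a single feature column is an assumption made for the example.

```python
import numpy as np


def as_feature_matrix(X):
    """Hypothetical helper: coerce an ndarray, list, or nested list to a 2-D float array."""
    X = np.asarray(X, dtype=float)  # returns the input unchanged if it is already a float ndarray
    if X.ndim == 1:
        # Assumption for this sketch: a flat list is one feature column.
        X = X.reshape(-1, 1)
    if X.ndim != 2:
        raise ValueError(f"expected 1-D or 2-D input, got {X.ndim}-D")
    return X


print(as_feature_matrix([[1, 2], [3, 4]]).shape)  # nested list
print(as_feature_matrix([1, 2, 3]).shape)         # flat list
```

Calling a helper like this at the top of `fit` and `predict` would let the rest of a model assume a single array type.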

Supervised learning

| Class | Algorithm | Implementation | Code |
| --- | --- | --- | --- |
| Generalized Linear Models | Linear Regression | notebook | code |
| | Logistic Regression | notebook | code |
| Nearest Neighbors | Nearest Neighbors Classification | notebook | code |
| Naive Bayes | Gaussian Naive Bayes | notebook | code |
| Support Vector Machine | SVC | notebook | code |
| Decision Trees | ID3 Classification | notebook | code |
| | ID3 Regression | notebook | code |
| | CART Classification | notebook | code |
| | CART Regression | notebook | code |
| Ensemble methods | Random Forests Classification | notebook | code |
| | Random Forests Regression | notebook | code |
| | AdaBoosting Classification | notebook | code |

Unsupervised learning

| Class | Algorithm | Implementation | Code |
| --- | --- | --- | --- |
| Gaussian mixture models | Gaussian Mixture | notebook | code |
| Clustering | K-means | notebook | code |
| | DBSCAN | notebook | code |
| Association Rules | Apriori | notebook | |
| Collaborative Filtering | User-based | notebook | |
| | Item-based | notebook | |
| | LFM | notebook | |

Model selection and evaluation

| Class | Approach | Code |
| --- | --- | --- |
| Model Selection | Dataset Split | code |
| | K-Fold | code |
| | Stratified K-Fold | code |
| Metrics | Accuracy | code |
| | Log loss | code |
| | F1-score | code |
| | AUC | code |
| | Explained Variance | code |
| | Mean Absolute Error | code |
| | Mean Squared Error | code |
| | R Square | code |
| | Euclidean Distances | code |

Preprocessing data

| Class | Algorithm | Implementation | Code |
| --- | --- | --- | --- |
| Feature Scaling | StandardScaler | | code |
| | MinMaxScaler | | code |
| Unsupervised dimensionality reduction | PCA | notebook | code |
| | SVD | notebook | code |
| Supervised dimensionality reduction | Linear Discriminant Analysis | notebook | code |
| Text Feature | Count Feature | | code |
| | TF-IDF | | code |

Known Issues

Overall code reuse is low.

Random forest is not parallelized.

The LDA code is missing some functionality.

The K-Fold code uses np.append(), which is inefficient.
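On the last point: `np.append()` copies the whole array on every call, so growing an index array element by element costs quadratic time. One possible way around it, sketched below, is to split the index range once with `np.array_split` and assemble each training set with a single `np.concatenate`. The function name `k_fold_indices` and its parameters are hypothetical and are not taken from the library's K-Fold code.

```python
import numpy as np


def k_fold_indices(n_samples, n_splits=5, shuffle=False, seed=None):
    """Hypothetical sketch: yield (train_idx, test_idx) pairs without np.append()."""
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    indices = np.arange(n_samples)
    if shuffle:
        np.random.default_rng(seed).shuffle(indices)
    # One split up front; fold sizes differ by at most one sample.
    folds = np.array_split(indices, n_splits)
    for i in range(n_splits):
        test_idx = folds[i]
        # One concatenation per fold instead of repeated appends.
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_idx, test_idx


for train_idx, test_idx in k_fold_indices(10, n_splits=3):
    print(len(train_idx), len(test_idx))
```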
