NimbusML

nimbusml is a Python module that provides Python bindings for ML.NET.

ML.NET was originally developed in Microsoft Research and is used across many Microsoft product groups, such as Windows, Bing, PowerPoint, and Excel. nimbusml was built to enable data science teams that are more familiar with Python to take advantage of ML.NET's functionality and performance.

nimbusml enables training ML.NET pipelines or integrating ML.NET components directly into scikit-learn pipelines. It follows existing scikit-learn conventions, so nimbusml and scikit-learn components interoperate easily, and it adds a suite of fast, highly optimized, and scalable algorithms, transforms, and components written in C++ and C#.
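
This interoperability rests on duck typing: scikit-learn's Pipeline requires every intermediate step to implement fit and transform, while the final step only needs fit (plus predict if you want predictions). The toy, pure-Python sketch below illustrates that convention only; ToyPipeline, Lowercase, and MajorityClass are hypothetical names, not nimbusml or scikit-learn classes.

```python
# Hypothetical toy classes illustrating the fit/transform/predict convention.
# They are not part of nimbusml or scikit-learn.


class Lowercase:
    """A transform: fit() learns nothing, transform() lowercases strings."""

    def fit(self, X, y=None):
        return self  # convention: fit returns the estimator itself

    def transform(self, X):
        return [s.lower() for s in X]


class MajorityClass:
    """A learner: predicts the most frequent label seen during fit()."""

    def fit(self, X, y):
        counts = {}
        for label in y:
            counts[label] = counts.get(label, 0) + 1
        self.label_ = max(counts, key=counts.get)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class ToyPipeline:
    """Chains (name, step) pairs; every step except the last must transform."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y=None):
        for _, step in self.steps[:-1]:
            # Relies on fit() returning self, so transform() can be chained.
            X = step.fit(X, y).transform(X)
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        for _, step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1][1].predict(X)


pipe = ToyPipeline([('lower', Lowercase()), ('clf', MajorityClass())])
pipe.fit(['Good', 'BAD', 'good'], [1, 0, 1])
print(pipe.predict(['Meh', 'Great']))  # → [1, 1]
```

Because each component only has to honor these method names, a pipeline can mix components from different libraries without knowing where they come from.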

The examples below show interoperability with scikit-learn. A more detailed example in the documentation shows how to use a nimbusml component in a scikit-learn pipeline and how to create a pipeline using only nimbusml components.

nimbusml supports numpy.ndarray, scipy.sparse.csr_matrix, and pandas.DataFrame as inputs. nimbusml can also stream data from files with FileDataStream instead of loading the whole dataset into memory, which allows training on datasets significantly larger than memory.
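
To make the streaming idea concrete, here is a minimal standard-library sketch that reads a tab-separated file in fixed-size batches instead of all at once. It only illustrates the general concept and does not reflect FileDataStream's internals; iter_batches is a hypothetical helper.

```python
import csv
import io


def iter_batches(stream, batch_size, sep='\t'):
    """Yield lists of row dicts, holding at most batch_size rows at a time."""
    reader = csv.DictReader(stream, delimiter=sep)
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []  # start a new batch; earlier rows are no longer held here
    if batch:
        yield batch  # final, possibly shorter, batch


# An in-memory stand-in for a large tab-separated file on disk.
data = io.StringIO("Label\tText\n1\tgreat\n0\tawful\n1\tfine\n")
for batch in iter_batches(data, batch_size=2):
    print([row['Label'] for row in batch])
# → ['1', '0']
# → ['1']
```

With a real file you would pass an open file object (for example `open(path, newline='')`), and memory use would stay bounded by the batch size rather than the file size.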

Documentation can be found here and additional notebook samples can be found here.

Installation

nimbusml runs on Windows, Linux, and macOS.

nimbusml requires a 64-bit version of Python 2.7, 3.5, 3.6, or 3.7.
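
Because only 64-bit builds of these versions are supported, it can help to check the interpreter before running pip. A minimal sketch using only the standard library; is_supported_interpreter is a hypothetical helper, and its version set simply mirrors the list above.

```python
import sys

# Python versions listed as supported in this README.
SUPPORTED = {(2, 7), (3, 5), (3, 6), (3, 7)}


def is_supported_interpreter(version_info=sys.version_info, maxsize=sys.maxsize):
    """Return True for a 64-bit interpreter whose major.minor is listed above."""
    is_64bit = maxsize > 2 ** 32  # sys.maxsize is 2**63 - 1 on 64-bit builds
    return tuple(version_info[:2]) in SUPPORTED and is_64bit


if __name__ == '__main__':
    print('supported' if is_supported_interpreter() else 'not supported')
```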

Install nimbusml using pip with:

pip install nimbusml

nimbusml has been reported to work on Windows 10, macOS 10.13, Ubuntu 14.04, Ubuntu 16.04, Ubuntu 18.04, CentOS 7, and RHEL 7.

Examples

Here is an example of how to train a model to predict sentiment from text samples (based on this ML.NET example). The full code for this example is here.

from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import FastTreesBinaryClassifier
from nimbusml.feature_extraction.text import NGramFeaturizer

train_file = get_dataset('gen_twittertrain').as_filepath()
test_file = get_dataset('gen_twittertest').as_filepath()

train_data = FileDataStream.read_csv(train_file, sep='\t')
test_data = FileDataStream.read_csv(test_file, sep='\t')

pipeline = Pipeline([ # nimbusml pipeline
    NGramFeaturizer(columns={'Features': ['Text']}),
    FastTreesBinaryClassifier(feature=['Features'], label='Label')
])

# fit and predict
pipeline.fit(train_data)
results = pipeline.predict(test_data)
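
As a rough mental model, a text featurizer such as NGramFeaturizer turns each sample into counts of word sequences (n-grams) that a learner such as FastTreesBinaryClassifier can consume. The pure-Python sketch below shows only that core idea; word_ngrams is hypothetical, and NGramFeaturizer's actual tokenization, n-gram lengths, weighting, and normalization are governed by its own parameters and defaults.

```python
from collections import Counter


def word_ngrams(text, n_max=2):
    """Count lowercase word n-grams of length 1..n_max in one text sample."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, n_max + 1):
        # Slide a window of n tokens across the sample.
        for i in range(len(tokens) - n + 1):
            counts[' '.join(tokens[i:i + n])] += 1
    return counts


features = word_ngrams('Not bad not bad')
print(features['not bad'])  # → 2
```

Bigrams such as "not bad" let a model distinguish phrases whose individual words point the other way, which is why n-gram features are common for sentiment tasks.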

Instead of creating a nimbusml pipeline, you can also integrate nimbusml components into scikit-learn pipelines:

from sklearn.pipeline import Pipeline
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import FastTreesBinaryClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

train_file = get_dataset('gen_twittertrain').as_filepath()
test_file = get_dataset('gen_twittertest').as_filepath()

train_data = pd.read_csv(train_file, sep='\t')
test_data = pd.read_csv(test_file, sep='\t')

pipeline = Pipeline([ # sklearn pipeline
    ('tfidf', TfidfVectorizer()), # sklearn transform
    ('clf', FastTreesBinaryClassifier()) # nimbusml learner
])

# fit and predict
pipeline.fit(train_data["Text"], train_data["Label"])
results = pipeline.predict(test_data["Text"])
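
In the scikit-learn pipeline above, TfidfVectorizer turns each text into a sparse vector of term weights, and that sparse matrix is what FastTreesBinaryClassifier receives. For intuition, the sketch below re-implements the weighting scikit-learn documents as TfidfVectorizer's default: raw term counts, smoothed idf ln((1 + n) / (1 + df)) + 1, L2 normalization, and tokens of two or more word characters. tfidf is a hypothetical helper that returns dicts rather than a scipy sparse matrix and omits options such as stop words, n-gram ranges, and vocabulary limits.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"(?u)\b\w\w+\b")  # tokens of two or more word characters


def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    tokenized = [TOKEN.findall(d.lower()) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1.0 + n) / (1.0 + c)) + 1 for t, c in df.items()}
    rows = []
    for toks in tokenized:
        weights = {t: c * idf[t] for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({t: w / norm for t, w in weights.items()})
    return rows


rows = tfidf(['good movie', 'bad movie', 'good good fun'])
print(sorted(rows[0]))  # → ['good', 'movie']
```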

Many additional examples and tutorials can be found in the documentation.

Building

To build nimbusml from source, please visit our developer guide.

Contributing

The contributions guide can be found here.

Support

If you have an idea for a new feature or encounter a problem, please open an issue in this repository or ask your question on Stack Overflow.

License

NimbusML is licensed under the MIT license.
