
mleap-demo's People

Contributors

ancasarb · hollinwilkins · seme0021


mleap-demo's Issues

Airbnb price regression, dataset unzip error

Following the tutorial at https://github.com/combust/mleap-demo/blob/master/notebooks/airbnb-price-regression.ipynb, I downloaded the dataset from https://s3-us-west-2.amazonaws.com/mleap-demo/datasources/airbnb.avro.zip. However, the file can't be unzipped; it fails with the following error:

  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
note:  airbnb.avro.zip may be a plain executable, not an archive
unzip:  cannot find zipfile directory in one of airbnb.avro.zip or
        airbnb.avro.zip.zip, and cannot find airbnb.avro.zip.ZIP, period.
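
For what it's worth, that unzip output usually means the downloaded file isn't a zip archive at all, e.g. it is already the raw Avro file, a truncated download, or an HTML error page. A minimal sketch for checking the magic bytes before unzipping (the local filename is an assumption):

# Minimal sketch (assumed local path): zip archives start with b'PK',
# Avro object container files start with b'Obj\x01'.
with open('airbnb.avro.zip', 'rb') as f:
    magic = f.read(4)

if magic.startswith(b'PK'):
    print('looks like a real zip archive')
elif magic.startswith(b'Obj\x01'):
    print('already a raw Avro file; rename it to airbnb.avro instead of unzipping')
else:
    print('unrecognized header (truncated or error-page download?):', magic)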

AttributeError: 'OneHotEncoder' object has no attribute 'n_values_'


Doesn't work on 0.8.1

It seems the PySpark and Scala notebooks don't work due to changes in the underlying API. The notebooks need to be updated to work with recent versions.

Doesn't work on mleap 0.6.0?

I'm going through this example, and it doesn't seem to work using the current master branch on 0.6.0: there is no mleap.pyspark in master. I also tried the branch feature/scikit-v2, which does have mleap.pyspark, but when I get to the bottom it just says 'Pipeline' object has no attribute 'serializeToBundle'. Any ideas on what is going on here?
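
In case it helps, MLeap's Python package exposes serializeToBundle by monkey-patching Spark ML models when its support module is imported, so the method is missing unless that import has run. A hedged sketch of the usage pattern, assuming the mleap.pyspark layout of that era and a Spark ML Pipeline `pipeline` and DataFrame `df` from the notebook:

# Hedged sketch: importing spark_support patches serializeToBundle onto
# Spark ML transformers and pipeline models.
import mleap.pyspark
from mleap.pyspark.spark_support import SimpleSparkSerializer

fitted = pipeline.fit(df)
fitted.serializeToBundle('jar:file:/tmp/model.zip', fitted.transform(df))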

Multiple artifacts of the module net.sourceforge.f2j#arpack_combined_all;0.1 are retrieved to the same file! Update the retrieve pattern to fix this error

I am trying to launch spark-shell with MLeap as a package, using the following command:

spark-shell --packages ml.combust.mleap:mleap-runtime_2.11:0.7.0

Here is the error that I get:
Exception in thread "main" java.lang.RuntimeException: problem during retrieve of org.apache.spark#spark-submit-parent: java.lang.RuntimeException: Multiple artifacts of the module net.sourceforge.f2j#arpack_combined_all;0.1 are retrieved to the same file! Update the retrieve pattern to fix this error.
at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:249)
at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:83)
at org.apache.ivy.Ivy.retrieve(Ivy.java:551)
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1086)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:296)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:160)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: Multiple artifacts of the module net.sourceforge.f2j#arpack_combined_all;0.1 are retrieved to the same file! Update the retrieve pattern to fix this error.
at org.apache.ivy.core.retrieve.RetrieveEngine.determineArtifactsToCopy(RetrieveEngine.java:417)
at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:118)

Please help me resolve this. I am using Spark 2.1.
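
A commonly reported workaround for this Ivy "multiple artifacts" failure is to clear the local Ivy cache so spark-shell re-resolves the packages from scratch. A hedged sketch, assuming the default Ivy location under the home directory:

# Hedged sketch: remove cached Ivy artifacts so the next
# spark-shell --packages run re-resolves them cleanly.
# Assumes the default ~/.ivy2 layout.
import shutil
from pathlib import Path

for sub in ('cache', 'jars'):
    target = Path.home() / '.ivy2' / sub
    if target.exists():
        shutil.rmtree(target)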

Serializing mlp classifier (sk learn) with mleap serialize_to_bundle

I tried serializing a pipeline with MLeap, but it's giving the error
"'ColumnTransformer' object has no attribute 'op'".

Below are segments from the pipeline:
# Imports assumed by the snippet (the original showed only segments)
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
import mleap.sklearn.pipeline  # adds MLeap methods (mlinit, serialize_to_bundle) to Pipeline

numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, numeric_features),
    ('cat', categorical_transformer, categorical_features)])

mlp = MLPClassifier(hidden_layer_sizes=(8, 6, 1), max_iter=300,
                    activation='tanh', solver='adam', random_state=123)

pipe = Pipeline([('preprocessor', preprocessor), ('mlp', mlp)])
pipe.mlinit()

model = pipe.fit(X_train, y_train)

model.serialize_to_bundle("jar:file:/C://Users/logReg.zip")
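
One observation: elsewhere in these issues (see the market.rf example below) serialize_to_bundle is called with a directory path and a model name rather than a single jar:file: URI. A hedged variant of the final call, with 'logReg' as an assumed model name:

# Hedged sketch: (path, model_name) form as used in the market.rf example
# below; 'logReg' is an assumed model name.
model.serialize_to_bundle('/tmp', 'logReg', init=True)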

AttributeError: 'Pipeline' object has no attribute 'name'

I tried serializing the pipeline below.

If I remove the init argument from serialize_to_bundle, the error becomes:
"AttributeError: 'OutletTypeEncoder' object has no attribute 'op'"

# importing required libraries
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
import category_encoders as ce
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
# from sklearn.preprocessing import StandardScaler, MinMaxScaler, Imputer, Binarizer, PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.base import BaseEstimator
import mleap.sklearn.pipeline
import mleap.sklearn.feature_union
import mleap.sklearn.base
import mleap.sklearn.logistic
import mleap.sklearn.preprocessing

# read the training data set
data = pd.read_csv('market.csv')

# top rows of the data
# print(data.head(5))

# separate the independent and target variables
train_x = data.drop(columns=['Item_Outlet_Sales'])
train_y = data['Item_Outlet_Sales']

# define the class OutletTypeEncoder.
# This is our custom transformer, which creates 3 new binary columns.
# A custom transformer must have fit and transform methods.
class OutletTypeEncoder(BaseEstimator):

    def __init__(self):
        pass

    def fit(self, documents, y=None):
        return self

    def transform(self, x_dataset):
        x_dataset['outlet_grocery_store'] = (x_dataset['Outlet_Type'] == 'Grocery Store') * 1
        x_dataset['outlet_supermarket_3'] = (x_dataset['Outlet_Type'] == 'Supermarket Type3') * 1
        x_dataset['outlet_identifier_OUT027'] = (x_dataset['Outlet_Identifier'] == 'OUT027') * 1
        return x_dataset

# pre-processing step:
# - drop the listed columns
# - impute the missing values in column Item_Weight by the mean
# - scale the data in the column Item_MRP
pre_process = ColumnTransformer(
    remainder='passthrough',
    transformers=[
        ('drop_columns', 'drop', ['Item_Identifier',
                                  'Outlet_Identifier',
                                  'Item_Fat_Content',
                                  'Item_Type',
                                  'Outlet_Identifier',
                                  'Outlet_Size',
                                  'Outlet_Location_Type',
                                  'Outlet_Type']),
        ('impute_item_weight', SimpleImputer(strategy='mean'), ['Item_Weight']),
        ('scale_data', StandardScaler(), ['Item_MRP'])])

# define the pipeline:
#   Step 1: get the outlet binary columns
#   Step 2: pre-processing
#   Step 3: train a Random Forest model
model_pipeline = Pipeline(steps=[
    ('get_outlet_binary_columns', OutletTypeEncoder()),
    ('pre_processing', pre_process),
    ('random_forest', RandomForestRegressor(max_depth=10, random_state=2))])

# fit the pipeline with the training data
model_pipeline.fit(train_x, train_y)

# read the test data
test_data = pd.read_csv('test.csv')

# predict the target variable on the test data
# model_pipeline.predict(test_data)

# serialize the random forest model
model_pipeline.serialize_to_bundle('/tmp', 'market.rf', init=True)
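
The two AttributeErrors above ('name' on the Pipeline, 'op' on OutletTypeEncoder) both point at missing MLeap metadata: MLeap's serializer expects each step to expose the op/name attributes that its wrapped transformers gain from mlinit(), which a plain custom estimator never acquires. A quick diagnostic sketch over the fitted pipeline:

# Diagnostic sketch: report which steps lack the MLeap metadata that
# serialize_to_bundle expects; custom estimators like OutletTypeEncoder will.
for step_name, step in model_pipeline.steps:
    print(step_name, 'op:', hasattr(step, 'op'), 'name:', hasattr(step, 'name'))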

-

Hello there,

I have an issue running just a basic script:

from river import linear_model
from river import metrics
from river import evaluate
from river import preprocessing
import pandas as pd

data = pd.read_csv("C:/Users/Monster/Desktop/LveR.csv")

# Import label encoder
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

# label_encoder object knows how to understand word labels.
label_encoder = preprocessing.LabelEncoder()

# Encode labels in column 'Class'.
data['Class'] = label_encoder.fit_transform(data['Class'])
data['Class'].unique()

X = data.iloc[:, :-1]
y = data.iloc[:, -1]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=11, test_size=0.25, shuffle=True)

model = (
    preprocessing.StandardScaler() |
    linear_model.LogisticRegression())

metric = metrics.ROCAUC()
evaluate.progressive_val_score(X_test, y_test, model, metric)

My aim is to perform a logistic regression, and I encoded my categorical variable as 0/1.

It says "Pipeline object has no attribute 'works_with'", which has me confused. Any kind of help will be appreciated. Thanks!
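
Two things stand out in the script. First, "from sklearn import preprocessing" shadows the earlier "from river import preprocessing", so the StandardScaler in the pipeline is scikit-learn's rather than river's, which would trip river's compatibility check. Second, progressive_val_score expects a stream of (x, y) pairs, not two arrays. A hedged sketch of the adjusted tail of the script, assuming river's stream.iter_pandas helper and the X_test/y_test split above:

# Hedged sketch: alias sklearn's preprocessing so it no longer shadows
# river's, and feed progressive_val_score a stream of (x, y) pairs.
from river import linear_model, metrics, evaluate, preprocessing, stream
from sklearn import preprocessing as sk_preprocessing  # use this alias for LabelEncoder above

model = (
    preprocessing.StandardScaler() |  # river's scaler, not sklearn's
    linear_model.LogisticRegression())

metric = metrics.ROCAUC()
evaluate.progressive_val_score(
    dataset=stream.iter_pandas(X_test, y_test),
    model=model,
    metric=metric)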

Error
