
hyperopt-sklearn's Introduction

hyperopt-sklearn

Hyperopt-sklearn provides Hyperopt-based model selection among machine learning algorithms in scikit-learn.

See how to use hyperopt-sklearn through the examples below. More examples can be found in the Example Usage section of the SciPy paper:

Komer B., Bergstra J., and Eliasmith C. "Hyperopt-Sklearn: automatic hyperparameter configuration for Scikit-learn" Proc. SciPy 2014. http://conference.scipy.org/proceedings/scipy2014/pdfs/komer.pdf

Installation

Installation from the GitHub repository is supported using pip:

pip install git+https://github.com/hyperopt/hyperopt-sklearn

Optionally, you can install a specific tag, branch, or commit:

pip install git+https://github.com/hyperopt/[email protected]
pip install git+https://github.com/hyperopt/hyperopt-sklearn@master
pip install git+https://github.com/hyperopt/hyperopt-sklearn@fd718c44fc440bd6e2718ec1442b1af58cafcb18

Usage

If you are familiar with sklearn, adding hyperparameter search with hyperopt-sklearn is only a one-line change from the standard pipeline.

from hpsklearn import HyperoptEstimator, svc
from sklearn import svm

# Load data into X_train, y_train, X_test, y_test
# ...

use_hpsklearn = True  # toggle between hyperopt-sklearn and plain sklearn

if __name__ == "__main__":
    if use_hpsklearn:
        estim = HyperoptEstimator(classifier=svc("mySVC"))
    else:
        estim = svm.SVC()

    estim.fit(X_train, y_train)

    print(estim.score(X_test, y_test))
    # <<show score here>>

Each component comes with a default search space. The search space for each parameter can be changed or set constant by passing in keyword arguments. In the following example the penalty parameter is held constant during the search, and the loss and alpha parameters have their search space modified from the default.

from hpsklearn import HyperoptEstimator, sgd_classifier
from hyperopt import hp
import numpy as np

sgd_penalty = "l2"
sgd_loss = hp.pchoice("loss", [(0.50, "hinge"), (0.25, "log"), (0.25, "huber")])
sgd_alpha = hp.loguniform("alpha", low=np.log(1e-5), high=np.log(1))

if __name__ == "__main__":
    estim = HyperoptEstimator(classifier=sgd_classifier("my_sgd", penalty=sgd_penalty, loss=sgd_loss, alpha=sgd_alpha))
    estim.fit(X_train, y_train)

Complete example using the Iris dataset:

from hpsklearn import HyperoptEstimator, any_classifier, any_preprocessing
from sklearn.datasets import load_iris
from hyperopt import tpe
import numpy as np

# Download the data and split into training and test sets

iris = load_iris()

X = iris.data
y = iris.target

test_size = int(0.2 * len(y))
np.random.seed(13)
indices = np.random.permutation(len(X))
X_train = X[indices[:-test_size]]
y_train = y[indices[:-test_size]]
X_test = X[indices[-test_size:]]
y_test = y[indices[-test_size:]]


if __name__ == "__main__":
    # Instantiate a HyperoptEstimator with the search space and number of evaluations
    estim = HyperoptEstimator(classifier=any_classifier("my_clf"),
                              preprocessing=any_preprocessing("my_pre"),
                              algo=tpe.suggest,
                              max_evals=100,
                              trial_timeout=120)
    
    # Search the hyperparameter space based on the data
    estim.fit(X_train, y_train)
    
    # Show the results
    print(estim.score(X_test, y_test))
    # 1.0
    
    print(estim.best_model())
    # {'learner': ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',
    #           max_depth=3, max_features='log2', max_leaf_nodes=None,
    #           min_impurity_decrease=0.0, min_impurity_split=None,
    #           min_samples_leaf=1, min_samples_split=2,
    #           min_weight_fraction_leaf=0.0, n_estimators=13, n_jobs=1,
    #           oob_score=False, random_state=1, verbose=False,
    #           warm_start=False), 'preprocs': (), 'ex_preprocs': ()}

Here's an example using the scikit-learn digits dataset, being more specific about the classifier and preprocessing.

from hpsklearn import HyperoptEstimator, extra_tree_classifier
from sklearn.datasets import load_digits
from hyperopt import tpe
import numpy as np

# Download the data and split into training and test sets

digits = load_digits()

X = digits.data
y = digits.target

test_size = int(0.2 * len(y))
np.random.seed(13)
indices = np.random.permutation(len(X))
X_train = X[indices[:-test_size]]
y_train = y[indices[:-test_size]]
X_test = X[indices[-test_size:]]
y_test = y[indices[-test_size:]]


if __name__ == "__main__":
    # Instantiate a HyperoptEstimator with the search space and number of evaluations
    estim = HyperoptEstimator(classifier=extra_tree_classifier("my_clf"),
                              preprocessing=[],
                              algo=tpe.suggest,
                              max_evals=10,
                              trial_timeout=300)

    # Search the hyperparameter space based on the data
    estim.fit(X_train, y_train)

    # Show the results
    print(estim.score(X_test, y_test))
    # 0.962785714286

    print(estim.best_model())
    # {'learner': ExtraTreesClassifier(bootstrap=True, class_weight=None, criterion='entropy',
    #           max_depth=None, max_features=0.959202875857,
    #           max_leaf_nodes=None, min_impurity_decrease=0.0,
    #           min_impurity_split=None, min_samples_leaf=1,
    #           min_samples_split=2, min_weight_fraction_leaf=0.0,
    #           n_estimators=20, n_jobs=1, oob_score=False, random_state=3,
    #           verbose=False, warm_start=False), 'preprocs': (), 'ex_preprocs': ()}

Available Components

Almost all classifier, regressor, and preprocessing components from scikit-learn are implemented. If there is something you would like that is not yet implemented, feel free to open an issue or a pull request!

Classifiers

random_forest_classifier
extra_trees_classifier
bagging_classifier
ada_boost_classifier
gradient_boosting_classifier
hist_gradient_boosting_classifier

bernoulli_nb
categorical_nb
complement_nb
gaussian_nb
multinomial_nb

sgd_classifier
sgd_one_class_svm
ridge_classifier
ridge_classifier_cv
passive_aggressive_classifier
perceptron

dummy_classifier

gaussian_process_classifier

mlp_classifier

linear_svc
nu_svc
svc

decision_tree_classifier
extra_tree_classifier

label_propagation
label_spreading

elliptic_envelope

linear_discriminant_analysis
quadratic_discriminant_analysis

bayesian_gaussian_mixture
gaussian_mixture

k_neighbors_classifier
radius_neighbors_classifier
nearest_centroid

xgboost_classification
lightgbm_classification

one_vs_rest
one_vs_one
output_code

For a simple generic search space across many classifiers, use any_classifier. If your data is in a sparse matrix format, use any_sparse_classifier. For a complete search space across all possible classifiers, use all_classifiers.

Regressors

random_forest_regressor
extra_trees_regressor
bagging_regressor
isolation_forest
ada_boost_regressor
gradient_boosting_regressor
hist_gradient_boosting_regressor

linear_regression
bayesian_ridge
ard_regression
lars
lasso_lars
lars_cv
lasso_lars_cv
lasso_lars_ic
lasso
elastic_net
lasso_cv
elastic_net_cv
multi_task_lasso
multi_task_elastic_net
multi_task_lasso_cv
multi_task_elastic_net_cv
poisson_regressor
gamma_regressor
tweedie_regressor
huber_regressor
sgd_regressor
ridge
ridge_cv
logistic_regression
logistic_regression_cv
orthogonal_matching_pursuit
orthogonal_matching_pursuit_cv
passive_aggressive_regressor
quantile_regression
ransac_regression
theil_sen_regressor

dummy_regressor

gaussian_process_regressor

mlp_regressor

cca
pls_canonical
pls_regression

linear_svr
nu_svr
one_class_svm
svr

decision_tree_regressor
extra_tree_regressor

transformed_target_regressor

hp_sklearn_kernel_ridge

bayesian_gaussian_mixture
gaussian_mixture

k_neighbors_regressor
radius_neighbors_regressor

k_means
mini_batch_k_means

xgboost_regression

lightgbm_regression

For a simple generic search space across many regressors, use any_regressor. If your data is in a sparse matrix format, use any_sparse_regressor. For a complete search space across all possible regressors, use all_regressors.
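The README has no end-to-end regression run, so here is a minimal sketch mirroring the classification examples above; the diabetes dataset, split, and budget values are illustrative assumptions rather than part of the original examples.

from hpsklearn import HyperoptEstimator, any_regressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from hyperopt import tpe

# Illustrative data: any regression X/y works here
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13)

if __name__ == "__main__":
    estim = HyperoptEstimator(regressor=any_regressor("my_reg"),
                              preprocessing=[],
                              algo=tpe.suggest,
                              max_evals=10,
                              trial_timeout=60)
    estim.fit(X_train, y_train)
    print(estim.score(X_test, y_test))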

Preprocessing

binarizer
min_max_scaler
max_abs_scaler
normalizer
robust_scaler
standard_scaler
quantile_transformer
power_transformer
one_hot_encoder
ordinal_encoder
polynomial_features
spline_transformer
k_bins_discretizer

tfidf

pca

ts_lagselector

colkmeans

For a simple generic search space across many preprocessing algorithms, use any_preprocessing. If your data is in a sparse matrix format, use any_sparse_preprocessing. For a complete search space across all preprocessing algorithms, use all_preprocessing. If you are working with raw text data, use any_text_preprocessing. Currently, only TFIDF is used for text, but more may be added in the future.

Note that the preprocessing parameter in HyperoptEstimator is expecting a list, since various preprocessing steps can be chained together. The generic search space functions any_preprocessing and any_text_preprocessing already return a list, but the others do not, so they should be wrapped in a list. If you do not want to do any preprocessing, pass in an empty list [].
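For instance, a single component such as pca must be wrapped in a list, while any_preprocessing already returns one; a short sketch of both forms, using component names from the list above:

from hpsklearn import HyperoptEstimator, svc, pca, any_preprocessing

# Single component: wrap it in a list yourself.
estim = HyperoptEstimator(classifier=svc("my_svc"),
                          preprocessing=[pca("my_pca")])

# Generic search space: already a list, pass it through as-is.
estim = HyperoptEstimator(classifier=svc("my_svc"),
                          preprocessing=any_preprocessing("my_pre"))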

hyperopt-sklearn's People

Contributors

adodge, bjkomer, boba-and-beer, davidbreuer, dwf, gentrexha, hyun-seo, idex, jaberg, jjerphan, linminhtoo, lishen, maechler, mandjevant, mlmlm, mrfroll, rharish101, ronnyli, seanny123, soravux, xtannier, yet-another-account


hyperopt-sklearn's Issues

Text classification - error when fitting HyperoptEstimator

Hi there,

I am encountering an issue regarding the text classifier example explained at http://hyperopt.github.io/hyperopt-sklearn/.

I'm literally copy-pasting the example, but an error is raised when I execute the fit command.

estim.fit( X_train, y_train )

Traceback (most recent call last):
  File "<ipython-input-12-8ecb54c50078>", line 1, in <module>
    estim.fit( X_train, y_train )

  File "/home/robin/Documents/Python/hyperopt-sklearn/hpsklearn/estimator.py", line 708, in fit
    fit_iter.send(increment)
  File "/home/robin/Documents/Python/hyperopt-sklearn/hpsklearn/estimator.py", line 619, in fit_iter
    return_argmin=False, # -- in case no success so far
  File "/home/robin/anaconda3/lib/python3.5/site-packages/hyperopt/fmin.py", line 307, in fmin
    return_argmin=return_argmin,
  File "/home/robin/anaconda3/lib/python3.5/site-packages/hyperopt/base.py", line 635, in fmin
    return_argmin=return_argmin)
  File "/home/robin/anaconda3/lib/python3.5/site-packages/hyperopt/fmin.py", line 320, in fmin
    rval.exhaust()
  File "/home/robin/anaconda3/lib/python3.5/site-packages/hyperopt/fmin.py", line 199, in exhaust
    self.run(self.max_evals - n_done, block_until_done=self.async)
  File "/home/robin/anaconda3/lib/python3.5/site-packages/hyperopt/fmin.py", line 173, in run
    self.serial_evaluate()
  File "/home/robin/anaconda3/lib/python3.5/site-packages/hyperopt/fmin.py", line 92, in serial_evaluate
    result = self.domain.evaluate(spec, ctrl)
  File "/home/robin/anaconda3/lib/python3.5/site-packages/hyperopt/base.py", line 840, in evaluate
    rval = self.fn(pyll_rval)
  File "/home/robin/Documents/Python/hyperopt-sklearn/hpsklearn/estimator.py", line 582, in fn_with_timeout
    raise fn_rval[1]

ValueError: zero-dimensional arrays cannot be concatenated 

I have checked that my numpy version is 1.11.0 (I saw that this could be an issue).

Do you have an idea of what's going on? Maybe Python 3.5?
Thank you,

Unbalanced Data - Change Loss Metric

Hyperopt-sklearn for classification uses accuracy as the scorer. I'm using it on unbalanced data, so accuracy seems like a poor choice, and I would prefer to use F1. Is it fine to just switch accuracy_score for f1_score in estimator.py line 288? Or is there a better approach; am I missing something?
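One alternative to editing estimator.py, sketched under the assumption that your hpsklearn version exposes the loss_fn constructor parameter (a callable receiving the validation targets and predictions):

from hpsklearn import HyperoptEstimator, any_classifier
from sklearn.metrics import f1_score

# Hyperopt minimizes the loss, so return 1 - F1.
def f1_loss(y_target, y_prediction):
    return 1.0 - f1_score(y_target, y_prediction, average="weighted")

# Assumption: loss_fn is supported by the installed version.
estim = HyperoptEstimator(classifier=any_classifier("clf"), loss_fn=f1_loss)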

Incorrect "Best" returned at the end of search

When performing a search instead of returning the best trial it's returning... well I'm not sure.

Code below

from hpsklearn import HyperoptEstimator
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
import numpy as np

max_leaf_n = [-11, -10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

def hyperopt_train_test(params):
    clf = GradientBoostingClassifier(**params)
    return cross_val_score(clf, X_train, y_train, cv=4, scoring='roc_auc').mean()

space4gbc = {
    'learning_rate': hp.choice('learning_rate', np.array(range(1, 10)) / 100.),
    'n_estimators': hp.choice('n_estimators', np.array(range(100, 10000, 100))),
    'subsample': hp.choice('subsample', np.array(range(5, 10)) / 10.),
    'max_depth': hp.choice('max_depth', np.array(range(3, 11))),
    'max_features': hp.choice('max_features', np.array(range(5, 10)) / 10.),
    'max_leaf_nodes': hp.choice('max_leaf_nodes', np.array(max_leaf_n))
}

best = 0
def f(params):
    global best
    acc = hyperopt_train_test(params)
    if acc > best:
        best = acc
        print 'new best:', best, params
    return {'loss': -acc, 'status': STATUS_OK}

trials = Trials()
best = fmin(f, space4gbc, algo=tpe.suggest, max_evals=1500, trials=trials)
print 'best:'
print best

Returns:

WARN: OMP_NUM_THREADS=None =>
... If you are using openblas if you are using openblas set OMP_NUM_THREADS=1 or risk subprocess calls hanging indefinitely
new best: 0.935256874069 {'n_estimators': 6000, 'subsample': 0.5, 'max_features': 0.90000000000000002, 'max_leaf_nodes': 2, 'learning_rate': 0.089999999999999997, 'max_depth': 3}
new best: 0.95180737041 {'n_estimators': 6200, 'subsample': 0.59999999999999998, 'max_features': 0.5, 'max_leaf_nodes': -8, 'learning_rate': 0.050000000000000003, 'max_depth': 5}
new best: 0.952759609985 {'n_estimators': 1800, 'subsample': 0.90000000000000002, 'max_features': 0.69999999999999996, 'max_leaf_nodes': -10, 'learning_rate': 0.070000000000000007, 'max_depth': 8}
new best: 0.953281116353 {'n_estimators': 4000, 'subsample': 0.59999999999999998, 'max_features': 0.5, 'max_leaf_nodes': -10, 'learning_rate': 0.089999999999999997, 'max_depth': 8}
new best: 0.95399352664 {'n_estimators': 5800, 'subsample': 0.90000000000000002, 'max_features': 0.5, 'max_leaf_nodes': -4, 'learning_rate': 0.089999999999999997, 'max_depth': 9}
new best: 0.954052296118 {'n_estimators': 2000, 'subsample': 0.80000000000000004, 'max_features': 0.59999999999999998, 'max_leaf_nodes': -11, 'learning_rate': 0.040000000000000001, 'max_depth': 10}
new best: 0.954097993402 {'n_estimators': 2000, 'subsample': 0.80000000000000004, 'max_features': 0.59999999999999998, 'max_leaf_nodes': -2, 'learning_rate': 0.040000000000000001, 'max_depth': 7}
new best: 0.954157499429 {'n_estimators': 5400, 'subsample': 0.80000000000000004, 'max_features': 0.5, 'max_leaf_nodes': -11, 'learning_rate': 0.02, 'max_depth': 10}
new best: 0.954237104722 {'n_estimators': 6500, 'subsample': 0.80000000000000004, 'max_features': 0.5, 'max_leaf_nodes': -4, 'learning_rate': 0.02, 'max_depth': 10}
new best: 0.954264561746 {'n_estimators': 8400, 'subsample': 0.80000000000000004, 'max_features': 0.5, 'max_leaf_nodes': -3, 'learning_rate': 0.029999999999999999, 'max_depth': 10}
new best: 0.954437874474 {'n_estimators': 6300, 'subsample': 0.80000000000000004, 'max_features': 0.5, 'max_leaf_nodes': -3, 'learning_rate': 0.029999999999999999, 'max_depth': 7}
new best: 0.954742832472 {'n_estimators': 7100, 'subsample': 0.80000000000000004, 'max_features': 0.5, 'max_leaf_nodes': -6, 'learning_rate': 0.029999999999999999, 'max_depth': 7}
best:
{'max_leaf_nodes': 5, 'learning_rate': 2, 'n_estimators': 70, 'subsample': 3, 'max_features': 0, 'max_depth': 4}

Add a "warm start" feature

I'd like to help add a "warm start" feature so that when the fit method of a HyperoptEstimator object is called, the estimator can start from an existing sequence of trials. This can be implemented by recycling the existing self.trials object and adjusting the max_evals of the while loop in the fit method. Let me know if you would rather implement it otherwise. @jaberg @bjkomer
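For reference, plain hyperopt already supports this pattern at the fmin level; a minimal sketch of the trials-recycling idea on a toy objective:

from hyperopt import fmin, tpe, hp, Trials

space = hp.uniform("x", -10, 10)
trials = Trials()

# First search: 50 evaluations.
best = fmin(lambda x: x ** 2, space, algo=tpe.suggest,
            max_evals=50, trials=trials)

# Warm start: pass the same Trials object and raise max_evals;
# fmin resumes from the 50 completed trials instead of restarting.
best = fmin(lambda x: x ** 2, space, algo=tpe.suggest,
            max_evals=100, trials=trials)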

merge

@bjkomer Could we merge in some of the work you've been doing to this project?

Hyperopt-sklearn For Regression

After going through the code, I tried to implement the optimization for multivariate regression as follows:

from hpsklearn import HyperoptEstimator, any_regressor

estimreg = HyperoptEstimator(regressor=any_regressor('reg'))
estimreg.fit(X_train, y_train)

It gives me an error. I am not able to run any regression algorithms. Is this a limitation?

Restrict hyperparameter choice based on data format

For example, KNN has a distance metric parameter and some metrics cannot be used on sparse data (e.g. chebyshev). Need a nice way to prevent these from being selected when sparse data is used.

One way could be to have a separate search space defined for sparse and dense data. (i.e. knn() and knn_sparse())
Another option could be to have a sparse/dense flag that changes how the space is defined (i.e. knn( sparse=True ))

I'm leaning towards the second option.
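A rough sketch of what the second option could look like; knn_metric_space is a hypothetical helper, not an existing hyperopt-sklearn function:

from hyperopt import hp

def knn_metric_space(name, sparse=False):
    # chebyshev (among others) cannot be used on sparse input,
    # so only offer it when the data is dense.
    metrics = ["euclidean", "manhattan", "minkowski"]
    if not sparse:
        metrics.append("chebyshev")
    return hp.choice(name, metrics)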

Error running demo iris notebook - Iteration of zero-sized operands is not enabled

I was trying to follow demo notebook published here: http://nbviewer.jupyter.org/github/hyperopt/hyperopt-sklearn/blob/master/notebooks/Demo-Iris.ipynb but I'm getting the following error on cell # 4.
I tried to run just estimator.fit(X_train,y_train) after cell #3 execution but I'm getting the same error. Any insight would be appreciated.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-28-b2e0025bc20c> in <module>()
      5                                                 mintodate_ylim=(-.01, .05))
      6 while len(estimator.trials.trials) < estimator.max_evals:
----> 7     fit_iterator.send(1) # -- try one more model
      8     plot_helper.post_iter()
      9 plot_helper.post_loop()

/Users/nreeves/projects/automl/env/lib/python2.7/site-packages/hpsklearn/estimator.pyc in fit_iter(self, X, y, EX_list, valid_size, n_folds, cv_shuffle, warm_start, random_state, weights, increment)
    614                               #    so we notice them.
    615                               catch_eval_exceptions=False,
--> 616                               return_argmin=False, # -- in case no success so far
    617                              )
    618             else:

/Users/nreeves/projects/automl/env/lib/python2.7/site-packages/hyperopt/fmin.pyc in fmin(fn, space, algo, max_evals, trials, rstate, allow_trials_fmin, pass_expr_memo_ctrl, catch_eval_exceptions, verbose, return_argmin)
    305             verbose=verbose,
    306             catch_eval_exceptions=catch_eval_exceptions,
--> 307             return_argmin=return_argmin,
    308         )
    309 

/Users/nreeves/projects/automl/env/lib/python2.7/site-packages/hyperopt/base.pyc in fmin(self, fn, space, algo, max_evals, rstate, verbose, pass_expr_memo_ctrl, catch_eval_exceptions, return_argmin)
    633             pass_expr_memo_ctrl=pass_expr_memo_ctrl,
    634             catch_eval_exceptions=catch_eval_exceptions,
--> 635             return_argmin=return_argmin)
    636 
    637 

/Users/nreeves/projects/automl/env/lib/python2.7/site-packages/hyperopt/fmin.pyc in fmin(fn, space, algo, max_evals, trials, rstate, allow_trials_fmin, pass_expr_memo_ctrl, catch_eval_exceptions, verbose, return_argmin)
    318                     verbose=verbose)
    319     rval.catch_eval_exceptions = catch_eval_exceptions
--> 320     rval.exhaust()
    321     if return_argmin:
    322         return trials.argmin

/Users/nreeves/projects/automl/env/lib/python2.7/site-packages/hyperopt/fmin.pyc in exhaust(self)
    197     def exhaust(self):
    198         n_done = len(self.trials)
--> 199         self.run(self.max_evals - n_done, block_until_done=self.async)
    200         self.trials.refresh()
    201         return self

/Users/nreeves/projects/automl/env/lib/python2.7/site-packages/hyperopt/fmin.pyc in run(self, N, block_until_done)
    171             else:
    172                 # -- loop over trials and do the jobs directly
--> 173                 self.serial_evaluate()
    174 
    175             if stopped:

/Users/nreeves/projects/automl/env/lib/python2.7/site-packages/hyperopt/fmin.pyc in serial_evaluate(self, N)
     90                 ctrl = base.Ctrl(self.trials, current_trial=trial)
     91                 try:
---> 92                     result = self.domain.evaluate(spec, ctrl)
     93                 except Exception as e:
     94                     logger.info('job exception: %s' % str(e))

/Users/nreeves/projects/automl/env/lib/python2.7/site-packages/hyperopt/base.pyc in evaluate(self, config, ctrl, attach_attachments)
    838                 memo=memo,
    839                 print_node_on_error=self.rec_eval_print_node_on_error)
--> 840             rval = self.fn(pyll_rval)
    841 
    842         if isinstance(rval, (float, int, np.number)):

/Users/nreeves/projects/automl/env/lib/python2.7/site-packages/hpsklearn/estimator.pyc in fn_with_timeout(*args, **kwargs)
    577             assert fn_rval[0] in ('raise', 'return')
    578             if fn_rval[0] == 'raise':
--> 579                 raise fn_rval[1]
    580 
    581             # -- remove potentially large objects from the rval

ValueError: Iteration of zero-sized operands is not enabled

Tuning parameters for sklearn.decomposition algorithms?

Hey,

I'd like to use this to tune the kernel parameter for k-PCA, but all the examples are for use with labeled data. How would I go about setting up hyperopt-sklearn to allow me to tune the RBF kernel parameter for kernel PCA?

I've got some labeled data that I'd like to push through a pipeline of k-PCA -> K means -> V-measure. I don't see any mention of kernel PCA in the hyperopt-sklearn repository, is it supported yet?

TypeError: cannot serialize '_io.TextIOWrapper' object

From what I can tell, this error is related to multiprocessing. I'm on OS X, MacBook Pro.

from hpsklearn import HyperoptEstimator, any_classifier
from hyperopt import tpe
import numpy as np

print(X_train_matrix.shape, y_train_matrix.shape, X_test_matrix.shape, y_test_matrix.shape, type(X_train_matrix), type(y_train_matrix), type(X_test_matrix), type(y_test_matrix))

if __name__ == '__main__':
    estim = HyperoptEstimator(classifier=xgb,  # xgb defined earlier in the session
                              algo=tpe.suggest,
                              preprocessing=None,
                              trial_timeout=100.0)

    estim.fit(X_train_matrix, y_train_matrix)

    print( estim.score(X_test_matrix, y_test_matrix) )

    print( estim.best_model() )

Full log:

(31016, 133) (31016,) (7755, 133) (7755,) <class 'numpy.ndarray'> <class 'numpy.ndarray'> <class 'numpy.ndarray'> <class 'numpy.ndarray'>
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-151-69ac7b68d098> in <module>()
     13                              )
     14 
---> 15     estim.fit(X_train_matrix, y_train_matrix)
     16 
     17     print( estim.score(X_test_matrix, y_test_matrix) )

~/hyperopt-sklearn/hpsklearn/estimator.py in fit(self, X, y, EX_list, valid_size, n_folds, cv_shuffle, warm_start, random_state, weights)
    706             increment = min(self.fit_increment,
    707                             adjusted_max_evals - len(self.trials.trials))
--> 708             fit_iter.send(increment)
    709             if filename is not None:
    710                 with open(filename, 'wb') as dump_file:

~/hyperopt-sklearn/hpsklearn/estimator.py in fit_iter(self, X, y, EX_list, valid_size, n_folds, cv_shuffle, warm_start, random_state, weights, increment)
    617                               #    so we notice them.
    618                               catch_eval_exceptions=False,
--> 619                               return_argmin=False, # -- in case no success so far
    620                              )
    621             else:

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/hyperopt/fmin.py in fmin(fn, space, algo, max_evals, trials, rstate, allow_trials_fmin, pass_expr_memo_ctrl, catch_eval_exceptions, verbose, return_argmin)
    305             verbose=verbose,
    306             catch_eval_exceptions=catch_eval_exceptions,
--> 307             return_argmin=return_argmin,
    308         )
    309 

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/hyperopt/base.py in fmin(self, fn, space, algo, max_evals, rstate, verbose, pass_expr_memo_ctrl, catch_eval_exceptions, return_argmin)
    633             pass_expr_memo_ctrl=pass_expr_memo_ctrl,
    634             catch_eval_exceptions=catch_eval_exceptions,
--> 635             return_argmin=return_argmin)
    636 
    637 

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/hyperopt/fmin.py in fmin(fn, space, algo, max_evals, trials, rstate, allow_trials_fmin, pass_expr_memo_ctrl, catch_eval_exceptions, verbose, return_argmin)
    318                     verbose=verbose)
    319     rval.catch_eval_exceptions = catch_eval_exceptions
--> 320     rval.exhaust()
    321     if return_argmin:
    322         return trials.argmin

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/hyperopt/fmin.py in exhaust(self)
    197     def exhaust(self):
    198         n_done = len(self.trials)
--> 199         self.run(self.max_evals - n_done, block_until_done=self.async)
    200         self.trials.refresh()
    201         return self

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/hyperopt/fmin.py in run(self, N, block_until_done)
    171             else:
    172                 # -- loop over trials and do the jobs directly
--> 173                 self.serial_evaluate()
    174 
    175             if stopped:

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/hyperopt/fmin.py in serial_evaluate(self, N)
     90                 ctrl = base.Ctrl(self.trials, current_trial=trial)
     91                 try:
---> 92                     result = self.domain.evaluate(spec, ctrl)
     93                 except Exception as e:
     94                     logger.info('job exception: %s' % str(e))

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/hyperopt/base.py in evaluate(self, config, ctrl, attach_attachments)
    838                 memo=memo,
    839                 print_node_on_error=self.rec_eval_print_node_on_error)
--> 840             rval = self.fn(pyll_rval)
    841 
    842         if isinstance(rval, (float, int, np.number)):

~/hyperopt-sklearn/hpsklearn/estimator.py in fn_with_timeout(*args, **kwargs)
    580             assert fn_rval[0] in ('raise', 'return')
    581             if fn_rval[0] == 'raise':
--> 582                 raise fn_rval[1]
    583 
    584             # -- remove potentially large objects from the rval

TypeError: cannot serialize '_io.TextIOWrapper' object

Any ideas? My searching hasn't found much...

rstate problem

Hi, I've been trying to get some examples working using hyperopt-sklearn on the Iris dataset, but haven't had any luck yet. After modifying the example code so that the iris dataset is loaded and split correctly, I am running into a problem using the demo code:

# Hyperopt SKL estimator
estimator = hpsklearn.HyperoptEstimator(
    preprocessing=hpsklearn.components.any_preprocessing('pp'),
    classifier=hpsklearn.components.any_classifier('clf'),
    algo=hyperopt.tpe.suggest,
    trial_timeout=15.0, # seconds
    max_evals=15,
    )

# Demo version of estimator.fit()
fit_iterator = estimator.fit_iter(X_train,y_train)
fit_iterator.next()
plot_helper = hpsklearn.demo_support.PlotHelper(estimator,
                                                mintodate_ylim=(-.01, .05))
while len(estimator.trials.trials) < estimator.max_evals:
    fit_iterator.send(1) # -- try one more model
    plot_helper.post_iter()
plot_helper.post_loop()

# -- Model selection was done on a subset of the training data.
# -- Now that we've picked a model, train on all training data.
estimator.retrain_best_model_on_full_data(X_train, y_train) 

and the error I get back:

Traceback (most recent call last):
  File "dir/iris_test.py", line 45, in <module>
    fit_iterator.send(1) # -- try one more model
  File "dir/hyperopt-sklearn/hpsklearn/estimator.py", line 353, in fit_iter
    return_argmin=False, # -- in case no success so far
TypeError: fmin() got an unexpected keyword argument 'rstate'

Does hyperopt work for multiclass and multilabel classification?

Hi,

I have 2 questions:

  1. I would like to know if hyperopt works for multiclass / multilabel classification? For example, something like:

estimator = HyperoptEstimator(classifier=OneVsRest(svc('my_est')), algo=tpe.suggest, preprocessing=[], use_partial_fit=True, trial_timeout=timeout)

  2. I found that hyperopt is quite slow when the training data is large. I think the parameter 'use_partial_fit' might speed up the fitting process; am I right? Is this the best practice to tell hyperopt not to train on the entire training data when it is too large?

Thank you in advance!

failed to install on win10 python3.6

cd hyperopt-sklearn && pip install -e .

Obtaining file:///C:/Users/AStupidBear/hyperopt-sklearn
Complete output from command python setup.py egg_info:
Extracting in C:\Users\ASTUPI~1\AppData\Local\Temp\tmpz2hb3cqe
Traceback (most recent call last):
File "c:\users\astupidbear\documents\codes\julia\hpsklearn\deps\hyperopt\distribute_setup.py", line 150, in use_setuptools
raise ImportError
ImportError

During handling of the above exception, another exception occurred:                                                               
                                                                                                                                  
Traceback (most recent call last):                                                                                                
  File "<string>", line 1, in <module>                                                                                            
  File "C:\Users\AStupidBear\hyperopt-sklearn\setup.py", line 36, in <module>                                                     
    distribute_setup.use_setuptools()                                                                                             
  File "c:\users\astupidbear\documents\codes\julia\hpsklearn\deps\hyperopt\distribute_setup.py", line 152, in use_setuptools      
    return _do_download(version, download_base, to_dir, download_delay)                                                           
  File "c:\users\astupidbear\documents\codes\julia\hpsklearn\deps\hyperopt\distribute_setup.py", line 132, in _do_download        
    _build_egg(egg, tarball, to_dir)                                                                                              
  File "c:\users\astupidbear\documents\codes\julia\hpsklearn\deps\hyperopt\distribute_setup.py", line 105, in _build_egg          
    _extractall(tar)                                                                                                              
  File "c:\users\astupidbear\documents\codes\julia\hpsklearn\deps\hyperopt\distribute_setup.py", line 500, in _extractall         
    self.chown(tarinfo, dirpath)                                                                                                  
TypeError: chown() missing 1 required positional argument: 'numeric_owner'                                                        
                                                                                                                                  
----------------------------------------                                                                                          

Command "python setup.py egg_info" failed with error code 1 in C:\Users\AStupidBear\hyperopt-sklearn\

AssertionError: assert regressor is not None

Hello,

I am using hpsklearn for the first time. When doing the following basic things:

>>> from hpsklearn import HyperoptEstimator
>>> estimator = HyperoptEstimator()

I got the AssertionError:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "hpsklearn/estimator.py", line 466, in __init__
    assert regressor is not None
AssertionError

Can you tell me why? By the way I am using sklearn 0.18 with anaconda python 2.7.

Thank you in advance!

estimator.fit crashes for sparse data in spite of using any_sparse_classifier()

I am trying to optimize for a text classification problem. For pre-processing I used scikit-learn's CountVectorizer and TFIDF to get a sparse matrix as my data set. I am trying to optimize using the any_sparse_classifier() method like:

    # Preprocessing
    X_train_counts = count_vect.fit_transform(train)
    X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)

    X_train, X_test, y_train, y_test = train_test_split(X_train_tfidf, target, test_size=0.3, random_state=42)

    # Optimizing
    from hpsklearn import HyperoptEstimator, any_sparse_classifier
    from hyperopt import tpe


    estim = HyperoptEstimator( classifier=any_sparse_classifier('myclf') ,
                            algo=tpe.suggest, trial_timeout=300, verbose=1)

    estim.fit(X_train, y_train)

    print(estim.score(X_test, y_test))
    # <<show score here>>
    print( estim.best_model() )
    # <<show model here>>

However, it runs for a couple of iterations but then crashes, each time giving a different error.
These are some of the trace backs I have received:

Fitting Normalizer(copy=True, norm='l1') to X of shape (30536, 252461)
Transforming fit and Xval (30536, 252461) (7635, 252461)
Training classifier SVC(C=1909.81396385, cache_size=1000.0, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='linear',
  max_iter=160507086, probability=False, random_state=4, shrinking=False,
  tol=0.0107899097808, verbose=False) on X of dimension (30536, 252461)
Scoring on Xval of shape (7635, 252461)
OK trial with accuracy 98.7 +- 0.1
Traceback (most recent call last):
  File "/home/ubuntu/train.py", line 273, in trainclassifier
    estim.fit(X_train, y_train)
  File "/home/ubuntu/hyperopt-sklearn/hpsklearn/estimator.py", line 384, in fit
    fit_iter.send(increment)
  File "/home/ubuntu/hyperopt-sklearn/hpsklearn/estimator.py", line 353, in fit_iter
    return_argmin=False, # -- in case no success so far
  File "/home/ubuntu/hyperopt/hyperopt/fmin.py", line 306, in fmin
    return_argmin=return_argmin,
  File "/home/ubuntu/hyperopt/hyperopt/base.py", line 633, in fmin
    return_argmin=return_argmin)
  File "/home/ubuntu/hyperopt/hyperopt/fmin.py", line 319, in fmin
    rval.exhaust()
  File "/home/ubuntu/hyperopt/hyperopt/fmin.py", line 198, in exhaust
    self.run(self.max_evals - n_done, block_until_done=self.async)
  File "/home/ubuntu/hyperopt/hyperopt/fmin.py", line 172, in run
    self.serial_evaluate()
  File "/home/ubuntu/hyperopt/hyperopt/fmin.py", line 89, in serial_evaluate
    result = self.domain.evaluate(spec, ctrl)
  File "/home/ubuntu/hyperopt/hyperopt/base.py", line 838, in evaluate
    rval = self.fn(pyll_rval)
  File "/home/ubuntu/hyperopt-sklearn/hpsklearn/estimator.py", line 322, in fn_with_timeout
    raise fn_rval[1]
TypeError: MinMaxScaler does no support sparse input. You may consider to use MaxAbsScaler instead.
Fitting MinMaxScaler(copy=True, feature_range=(-1.0, 1.0)) to X of shape (30536, 252461) 
Training classifier KNeighborsClassifier(algorithm='auto', leaf_size=41, metric='euclidean',
           metric_params=None, n_jobs=1, n_neighbors=40, p=2,
           weights='distance') on X of dimension (30536, 252461)
Scoring on Xval of shape (7635, 252461)
OK trial with accuracy 96.7 +- 0.2
Traceback (most recent call last):
  File "/home/ubuntu/train.py", line 273, in trainclassifier
    estim.fit(X_train, y_train)
  File "/home/ubuntu/hyperopt-sklearn/hpsklearn/estimator.py", line 384, in fit
    fit_iter.send(increment)
  File "/home/ubuntu/hyperopt-sklearn/hpsklearn/estimator.py", line 353, in fit_iter
    return_argmin=False, # -- in case no success so far
  File "/home/ubuntu/hyperopt/hyperopt/fmin.py", line 306, in fmin
    return_argmin=return_argmin,
  File "/home/ubuntu/hyperopt/hyperopt/base.py", line 633, in fmin
    return_argmin=return_argmin)
  File "/home/ubuntu/hyperopt/hyperopt/fmin.py", line 319, in fmin
    rval.exhaust()
  File "/home/ubuntu/hyperopt/hyperopt/fmin.py", line 198, in exhaust
    self.run(self.max_evals - n_done, block_until_done=self.async)
  File "/home/ubuntu/hyperopt/hyperopt/fmin.py", line 172, in run
    self.serial_evaluate()
  File "/home/ubuntu/hyperopt/hyperopt/fmin.py", line 89, in serial_evaluate
    result = self.domain.evaluate(spec, ctrl)
  File "/home/ubuntu/hyperopt/hyperopt/base.py", line 837, in evaluate
    print_node_on_error=self.rec_eval_print_node_on_error)
  File "/home/ubuntu/hyperopt/hyperopt/pyll/base.py", line 912, in rec_eval
    rval = scope._impls[node.name](*args, **kwargs)
  File "/home/ubuntu/hyperopt-sklearn/hpsklearn/components.py", line 20, in sklearn_SVC
    return sklearn.svm.SVC(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/svm/classes.py", line 539, in __init__
    random_state=random_state)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/svm/base.py", line 507, in __init__
    random_state=random_state)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/svm/base.py", line 86, in __init__
    raise ValueError(msg)
ValueError: The gamma value of 0.0 is invalid. Use 'auto' to set gamma to a value of 1 / n_features.

What is the issue here?

parameter importance

support tagging hyperparameters with importance level

hp.uniform('foo', 0, 1, importance=1.0) # average importance
hp.uniform('bar', 0, 1, importance=2.0) # above-average importance
hp.uniform('baz', 0, 1, importance=0.1) # way-below-average importance

adaptive time budget

multiprocessing gives a way to cut off jobs if they exceed a budget, but how should that budget be adapted to a new dataset?

Ideas:

  1. set the budget to be something small-ish, like 60s
  2. keep increasing it until "most jobs" finish (80% ?)
  3. monitor whether the longest jobs are the best jobs, and consider raising the budget
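A rough, self-contained sketch of idea 2, with a sleep standing in for a real model fit; the probe configs, 80% threshold, and doubling rule are illustrative assumptions:

import multiprocessing as mp
import random
import time

def fit_fn(cfg):
    # Stand-in for a real model fit; sleeps a config-dependent time.
    time.sleep(cfg)

def fraction_finished(configs, budget_s):
    done = 0
    for cfg in configs:
        p = mp.Process(target=fit_fn, args=(cfg,))
        p.start()
        p.join(budget_s)   # wait up to the budget
        if p.is_alive():
            p.terminate()  # over budget: cut the job off
            p.join()
        else:
            done += 1
    return done / len(configs)

if __name__ == "__main__":
    probe_configs = [random.uniform(0, 4) for _ in range(10)]
    budget = 1.0  # start small-ish
    while fraction_finished(probe_configs, budget) < 0.8:
        budget *= 2  # keep increasing until "most jobs" finish
    print("adapted budget: %.1fs" % budget)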

KNN wrapper optimizes algorithm param

If the algorithm param is a runtime/dimensionality consideration that does not actually change the classifier's output, then it should not be a hyperparameter.
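If it were pinned rather than searched, the keyword-override pattern from the Usage section would do it; a sketch assuming k_neighbors_classifier forwards the sklearn algorithm keyword the way sgd_classifier forwards penalty:

from hpsklearn import HyperoptEstimator, k_neighbors_classifier

# Hold algorithm constant so only true hyperparameters are searched
# (assumption: the component accepts the sklearn keyword).
estim = HyperoptEstimator(
    classifier=k_neighbors_classifier("my_knn", algorithm="auto"))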

extending hyperopt-sklearn to clustering?

Hello, do you have any plans to extend hyperopt-sklearn functionality to unsupervised machine learning, i.e. clustering?
There are many clustering algorithms available in scikit-learn, and for some of them setting hyperparameters can be tricky. It would be great if hyperopt-sklearn could search for the best clustering algorithm and hyperparameters, similarly as for supervised learning.

K-fold evaluation & early-stopping of it

It should be possible for the estimator's inner cost function to evaluate loss by K-fold cross-validation.

Also, that K-fold cross-validation should be aborted after < K folds if performance is terrible.

There are clever-er ways to do it, but it would already be a good start if promising configurations went through all K folds on their first (and only) evaluation, and un-promising ones stopped after it becomes highly improbable that they are dealing with a new best configuration.

See "racing algorithms" for some criteria for this.

How to use hyperopt-sklearn to select a model when a big data set is divided into smaller ones?

I have a large data set which, as a single file, will not fit in my memory. So I divided the big data set into much smaller ones in such a way that I can access them in memory. This is a sample code:

from hpsklearn import HyperoptEstimator, one_vs_rest
from hyperopt import tpe
import numpy as np
np.random.seed(111)

x=np.random.rand(5,50,3) 
y=np.random.randint(0,2,(5,50,2))

estim = HyperoptEstimator( classifier=one_vs_rest('clf'),algo=tpe.suggest, trial_timeout=300)
arr=[]
for i,j in enumerate(x):
    print i
    estim.fit( j, y[i] )
    print(estim.best_model())
    arr=np.append(arr,estim.best_model())
    
print arr

Here x and y represent the small data sets (5 of them) formed from the big data set.

The problem is that hyperopt-sklearn treats every small data set as a different data set and outputs different models:

array([ {'learner': OneVsRestClassifier(estimator=ExtraTreesClassifier(bootstrap=True, class_weight=None, criterion='entropy',
           max_depth=None, max_features=None, max_leaf_nodes=None,
           min_impurity_split=1e-07, min_samples_leaf=27,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           n_estimators=447, n_jobs=1, oob_score=False, random_state=3,
           verbose=False, warm_start=False),
          n_jobs=1), 'preprocs': (Normalizer(copy=True, norm='l1'),), 'ex_preprocs': ()},
       {'learner': OneVsRestClassifier(estimator=ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',
           max_depth=None, max_features=0.329952415663,
           max_leaf_nodes=None, min_impurity_split=1e-07,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=242, n_jobs=1,
           oob_score=False, random_state=1, verbose=False,
           warm_start=False),
          n_jobs=1), 'preprocs': (PCA(copy=True, iterated_power='auto', n_components=3, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False),), 'ex_preprocs': ()},
       {'learner': OneVsRestClassifier(estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='euclidean',
           metric_params=None, n_jobs=1, n_neighbors=1, p=2,
           weights='distance'),
          n_jobs=1), 'preprocs': (MinMaxScaler(copy=True, feature_range=(-1.0, 1.0)),), 'ex_preprocs': ()},
       {'learner': OneVsRestClassifier(estimator=SVC(C=0.0044670713832, cache_size=512, class_weight=None, coef0=0,
  decision_function_shape=None, degree=2.0, gamma=128.034173656,
  kernel='poly', max_iter=130374554.0, probability=False, random_state=2,
  shrinking=False, tol=6.0354795324e-05, verbose=False),
          n_jobs=1), 'preprocs': (MinMaxScaler(copy=True, feature_range=(-1.0, 1.0)),), 'ex_preprocs': ()},
       {'learner': OneVsRestClassifier(estimator=AdaBoostClassifier(algorithm='SAMME', base_estimator=None,
          learning_rate=0.78302675986, n_estimators=93, random_state=0),
          n_jobs=1), 'preprocs': (StandardScaler(copy=True, with_mean=True, with_std=False),), 'ex_preprocs': ()}], dtype=object)

How do I use hyperopt-sklearn to select a single model using these smaller data sets as if they were the whole data set?

multiple defaults

Multiple defaults can be good (a) to encode overall best defaults (e.g. from sklearn) and (b) to encode sub-joint-assignments that have been found to be good, e.g. good SVM parameters for some task.

hp.uniform('foo', 0, 1, defaults={'a': 0.1, 'sklearn': 0.75})
hp.uniform('bar', 0, 1, defaults={'sklearn': 0.0001})

Then somehow modify hyperopt's search algos to sometimes pick sub-assignments from the defaults that are mentioned within a search space.

dataset demo: 20newsgroups

Produce an IPython notebook that illustrates how to train a model on 20newsgroups using the ipython parallelization backend.

Show some training curves, and do some basic analysis of the space to determine what kinds of models worked.

  • what preproc do we need
  • what version of task to run?

N.B. 20 newsgroups data should not include the "publication details" at the bottom of each file. There's a note to this effect in the sklearn website describing 20newsgroups data set.

cannot import name 'hyperopt_estimator'

I tried to run the example code:

from hpsklearn import svc, hyperopt_estimator
from sklearn import svm

if use_hpsklearn:
    estim = hyperopt_estimator( classifier=svc('mySVC') )
else:
    estim = svm.SVC( )

estim.fit( X_train, y_train )

print( estim.score( X_test, y_test ) )

But I get the following error message:

File "E:/Dokumente/somefolder/untitled0.py", line 1, in
from hpsklearn import svc, hyperopt_estimator

ImportError: cannot import name 'hyperopt_estimator'

Note: I can import svc, it's just the hyperopt_estimator that causes errors.

Error occurred when calling estim.fit() in demo

Hello,

I got the error below when running the second example copied from http://hyperopt.github.io/hyperopt-sklearn/.

D:\ProgramData\Anaconda3\envs\py27\lib\site-packages\sklearn\cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
Traceback (most recent call last):
File "", line 1, in
File "D:\ProgramData\Anaconda3\envs\py27\lib\multiprocessing\forking.py", line 380, in main
prepare(preparation_data)
File "D:\ProgramData\Anaconda3\envs\py27\lib\multiprocessing\forking.py", line 509, in prepare
'parents_main', file, path_name, etc
File "E:\Hyperopt-sklearn\hysklearn_examples\sparse_classifier.py", line 20, in
estim.fit( X_train, y_train )
File "d:\programdata\hyperopt-hyperopt-sklearn-4cbcc64\hpsklearn\estimator.py", line 708, in fit
fit_iter.send(increment)
File "d:\programdata\hyperopt-hyperopt-sklearn-4cbcc64\hpsklearn\estimator.py", line 619, in fit_iter
return_argmin=False, # -- in case no success so far
File "D:\ProgramData\Anaconda3\envs\py27\lib\site-packages\hyperopt\fmin.py", line 307, in fmin
return_argmin=return_argmin,
File "D:\ProgramData\Anaconda3\envs\py27\lib\site-packages\hyperopt\base.py", line 635, in fmin
return_argmin=return_argmin)
File "D:\ProgramData\Anaconda3\envs\py27\lib\site-packages\hyperopt\fmin.py", line 320, in fmin
rval.exhaust()
File "D:\ProgramData\Anaconda3\envs\py27\lib\site-packages\hyperopt\fmin.py", line 199, in exhaust
self.run(self.max_evals - n_done, block_until_done=self.async)
File "D:\ProgramData\Anaconda3\envs\py27\lib\site-packages\hyperopt\fmin.py", line 173, in run
self.serial_evaluate()
File "D:\ProgramData\Anaconda3\envs\py27\lib\site-packages\hyperopt\fmin.py", line 92, in serial_evaluate
result = self.domain.evaluate(spec, ctrl)
File "D:\ProgramData\Anaconda3\envs\py27\lib\site-packages\hyperopt\base.py", line 840, in evaluate
rval = self.fn(pyll_rval)
File "d:\programdata\hyperopt-hyperopt-sklearn-4cbcc64\hpsklearn\estimator.py", line 567, in fn_with_timeout
th.start()
File "D:\ProgramData\Anaconda3\envs\py27\lib\multiprocessing\process.py", line 130, in start
self._popen = Popen(self)
File "D:\ProgramData\Anaconda3\envs\py27\lib\multiprocessing\forking.py", line 258, in init
cmd = get_command_line() + [rhandle]
File "D:\ProgramData\Anaconda3\envs\py27\lib\multiprocessing\forking.py", line 358, in get_command_line
is not going to be frozen to produce a Windows executable.''')
RuntimeError:
Attempt to start a new process before the current process
has finished its bootstrapping phase.

        This probably means that you are on Windows and you have
        forgotten to use the proper idiom in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce a Windows executable.

D:\ProgramData\Anaconda3\envs\py27\lib\site-packages\sklearn\cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
Traceback (most recent call last):
File "E:\Hyperopt-sklearn\hysklearn_examples\sparse_classifier.py", line 20, in
estim.fit( X_train, y_train )
File "d:\programdata\hyperopt-hyperopt-sklearn-4cbcc64\hpsklearn\estimator.py", line 708, in fit
fit_iter.send(increment)
File "d:\programdata\hyperopt-hyperopt-sklearn-4cbcc64\hpsklearn\estimator.py", line 619, in fit_iter
return_argmin=False, # -- in case no success so far
File "D:\ProgramData\Anaconda3\envs\py27\lib\site-packages\hyperopt\fmin.py", line 307, in fmin
return_argmin=return_argmin,
File "D:\ProgramData\Anaconda3\envs\py27\lib\site-packages\hyperopt\base.py", line 635, in fmin
return_argmin=return_argmin)
File "D:\ProgramData\Anaconda3\envs\py27\lib\site-packages\hyperopt\fmin.py", line 320, in fmin
rval.exhaust()
File "D:\ProgramData\Anaconda3\envs\py27\lib\site-packages\hyperopt\fmin.py", line 199, in exhaust
self.run(self.max_evals - n_done, block_until_done=self.async)
File "D:\ProgramData\Anaconda3\envs\py27\lib\site-packages\hyperopt\fmin.py", line 173, in run
self.serial_evaluate()
File "D:\ProgramData\Anaconda3\envs\py27\lib\site-packages\hyperopt\fmin.py", line 92, in serial_evaluate
result = self.domain.evaluate(spec, ctrl)
File "D:\ProgramData\Anaconda3\envs\py27\lib\site-packages\hyperopt\base.py", line 840, in evaluate
rval = self.fn(pyll_rval)
File "d:\programdata\hyperopt-hyperopt-sklearn-4cbcc64\hpsklearn\estimator.py", line 567, in fn_with_timeout
th.start()
File "D:\ProgramData\Anaconda3\envs\py27\lib\multiprocessing\process.py", line 130, in start
self._popen = Popen(self)
File "D:\ProgramData\Anaconda3\envs\py27\lib\multiprocessing\forking.py", line 280, in init
to_child.close()
IOError: [Errno 22] Invalid argument

I have tried other demos such as iris and MNIST, and encountered the same error too. What's the problem?

Installation Problem using setup.py

I got the zip file, unzipped it, cd'd into the folder, and ran

python setup.py install

I eventually got the error

byte-compiling build\bdist.win-amd64\egg\hpsklearn\estimator.py to estimator.cpython-35.pyc
  File "build\bdist.win-amd64\egg\hpsklearn\estimator.py", line 329
    except (NonFiniteFeature,), exc:
                              ^
SyntaxError: invalid syntax

Error occurred when replacing demo iris data with a new dataset

I replaced the demo's iris data with my own data, but I got the following error:
Traceback (most recent call last):
File "/opt/hyperopt-sklearn/hpsklearn/tests/test_demo.py", line 92, in
test_demo_yyb()
File "/opt/hyperopt-sklearn/hpsklearn/tests/test_demo.py", line 82, in test_demo_yyb
fit_iterator.send(1) # -- try one more model
File "/opt/hyperopt-sklearn/hpsklearn/estimator.py", line 348, in fit_iter
max_evals=len(self.trials.trials) + increment
File "/usr/local/lib/python2.7/dist-packages/hyperopt/fmin.py", line 334, in fmin
rval.exhaust()
File "/usr/local/lib/python2.7/dist-packages/hyperopt/fmin.py", line 294, in exhaust
self.run(self.max_evals - n_done, block_until_done=self.async)
File "/usr/local/lib/python2.7/dist-packages/hyperopt/fmin.py", line 268, in run
self.serial_evaluate()
File "/usr/local/lib/python2.7/dist-packages/hyperopt/fmin.py", line 187, in serial_evaluate
result = self.domain.evaluate(spec, ctrl)
File "/usr/local/lib/python2.7/dist-packages/hyperopt/fmin.py", line 122, in evaluate
dict_rval.keys())
ValueError: ('dictionary must have "loss" key', ['status', 'failure'])

How to pass "probability=True" to a support vector machine classifier?

I need to pass probability=True to the SVC algorithm, because it is used later for an ensemble soft-voting classifier (VotingClassifier), which requires the calculated probabilities.
I tried the following, but get the error: "Error: _svm_hp_space() got an unexpected keyword argument 'probability'"

svc1 = HyperoptEstimator(preprocessing=[],
                         classifier=svc('my_svc', probability=True),
                         algo=hyperopt.tpe.suggest,
                         trial_timeout=120.0,
                         max_evals=100,
                         verbose=2)

How to pass this probability=True parameter value to the SVC classifier utilizing hpsklearn?

Documentation for the available search spaces

It would be nice to have one easy to find page that lists all of the available classifiers/regressors/preprocessing in hyperopt-sklearn. Currently people have to look in components.py to find them. The README may be a good place for this.

ValueError: Iteration of zero-sized operands is not enabled

I am trying to use hyperopt-sklearn for classification. My data has about 1500 samples, each with about 16000 features.

I have initialized my hyperopt estimator by:

estim = hpsklearn.HyperoptEstimator(
    preprocessing=hpsklearn.components.any_preprocessing('pp'),
    classifier=hpsklearn.components.any_classifier('clf'),
    algo=hyperopt.tpe.suggest,
    trial_timeout=3000,  # seconds
    max_evals=15,
    verbose=True)

I am constantly getting this error:

Traceback (most recent call last):
File "stage4_transfer_with_hyperopt.py", line 280, in
estim.fit(to_fit,to_fit_label)
File "build/bdist.linux-x86_64/egg/hpsklearn/estimator.py", line 705, in fit
File "build/bdist.linux-x86_64/egg/hpsklearn/estimator.py", line 616, in fit_iter
File "/usr/local/lib/python2.7/dist-packages/hyperopt-0.1-py2.7.egg/hyperopt/fmin.py", line 307, in fmin
return_argmin=return_argmin,
File "/usr/local/lib/python2.7/dist-packages/hyperopt-0.1-py2.7.egg/hyperopt/base.py", line 635, in fmin
return_argmin=return_argmin)
File "/usr/local/lib/python2.7/dist-packages/hyperopt-0.1-py2.7.egg/hyperopt/fmin.py", line 320, in fmin
rval.exhaust()
File "/usr/local/lib/python2.7/dist-packages/hyperopt-0.1-py2.7.egg/hyperopt/fmin.py", line 199, in exhaust
self.run(self.max_evals - n_done, block_until_done=self.async)
File "/usr/local/lib/python2.7/dist-packages/hyperopt-0.1-py2.7.egg/hyperopt/fmin.py", line 173, in run
self.serial_evaluate()
File "/usr/local/lib/python2.7/dist-packages/hyperopt-0.1-py2.7.egg/hyperopt/fmin.py", line 92, in serial_evaluate
result = self.domain.evaluate(spec, ctrl)
File "/usr/local/lib/python2.7/dist-packages/hyperopt-0.1-py2.7.egg/hyperopt/base.py", line 840, in evaluate
rval = self.fn(pyll_rval)
File "build/bdist.linux-x86_64/egg/hpsklearn/estimator.py", line 579, in fn_with_timeout
ValueError: Iteration of zero-sized operands is not enabled

skdata

After #33, I still get this error. Is there some modification to skdata that hasn't been merged?

======================================================================
ERROR: test_demo.test_demo_iris
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/anaconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/dwf/src/hyperopt-sklearn/hpsklearn/tests/test_demo.py", line 21, in test_demo_iris
    data_view.split[0].train.X,
AttributeError: 'KfoldClassification' object has no attribute 'split'

Demo error on win10, python3.6, sklearn 0.18

running

import numpy as np
import skdata.iris.view
import hyperopt.tpe
import hpsklearn
import hpsklearn.demo_support
import time

data_view = skdata.iris.view.KfoldClassification(4)
attrs = 'petal_length', 'petal_width', 'sepal_length', 'sepal_width'
labels = 'setosa', 'versicolor', 'virginica'
X_all = np.asarray([map(d.__getitem__, attrs) for d in data_view.dataset.meta])
y_all = np.asarray([labels.index(d['name']) for d in data_view.dataset.meta])
idx_all = np.random.RandomState(1).permutation(len(y_all))
idx_train = idx_all[:int(.8 * len(y_all))]
idx_test = idx_all[int(.8 *  len(y_all)):]

# TRAIN AND TEST DATA
X_train = X_all[idx_train]
y_train = y_all[idx_train]
X_test = X_all[idx_test]
y_test = y_all[idx_test]

estimator = hpsklearn.HyperoptEstimator(
    preprocessing=hpsklearn.components.any_preprocessing('pp'),
    classifier=hpsklearn.components.any_classifier('clf'),
    algo=hyperopt.tpe.suggest,
    trial_timeout=15.0, # seconds
    max_evals=15,
    )

# Demo version of estimator.fit()
es = estimator.fit(X_train,y_train)

gives

C:\PortableSoftware\Scoop\apps\python\current\lib\site-packages\sklearn\cross_validation.py:44: Depr
ecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module in
to which all the refactored classes and functions are moved. Also note that the interface of the new
 CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\users\astupidbear\documents\codes\julia\hpsklearn\deps\hyperopt-sklearn\hpsklearn\estimat
or.py", line 705, in fit
    fit_iter.send(increment)
  File "c:\users\astupidbear\documents\codes\julia\hpsklearn\deps\hyperopt-sklearn\hpsklearn\estimat
or.py", line 616, in fit_iter
    return_argmin=False, # -- in case no success so far
  File "C:\PortableSoftware\Scoop\apps\python\current\lib\site-packages\hyperopt\fmin.py", line 307,
 in fmin
    return_argmin=return_argmin,
  File "C:\PortableSoftware\Scoop\apps\python\current\lib\site-packages\hyperopt\base.py", line 635,
 in fmin
    return_argmin=return_argmin)
  File "C:\PortableSoftware\Scoop\apps\python\current\lib\site-packages\hyperopt\fmin.py", line 320,
 in fmin
    rval.exhaust()
  File "C:\PortableSoftware\Scoop\apps\python\current\lib\site-packages\hyperopt\fmin.py", line 199,
 in exhaust
    self.run(self.max_evals - n_done, block_until_done=self.async)
  File "C:\PortableSoftware\Scoop\apps\python\current\lib\site-packages\hyperopt\fmin.py", line 173,
 in run
    self.serial_evaluate()
  File "C:\PortableSoftware\Scoop\apps\python\current\lib\site-packages\hyperopt\fmin.py", line 92, in serial_evaluate
    result = self.domain.evaluate(spec, ctrl)
  File "C:\PortableSoftware\Scoop\apps\python\current\lib\site-packages\hyperopt\base.py", line 840, in evaluate
    rval = self.fn(pyll_rval)
  File "c:\users\astupidbear\documents\codes\julia\hpsklearn\deps\hyperopt-sklearn\hpsklearn\estimator.py", line 579, in fn_with_timeout
    raise fn_rval[1]
TypeError: float() argument must be a string or a number, not 'map'

dataset demo: reuters

Pretty much same idea as the 20 newsgroups dataset demo. Make an IPython notebook illustrating what happens.

Python 3 Compatibility

Hello!

On line 72 of setup.py, there is a line that reads

subdirectories = os.walk(package_to_path(package)).next()[1]

but to be compatible with Python 3 as well, this should read

subdirectories = next(os.walk(package_to_path(package)))[1]

I cloned the repository and tested this out myself, and it installed without a hitch in Python 3.5. Would there be a chance this change could be implemented in the repository?

Use soft-timeout on estimators with iterative fit method

Some sklearn estimators can be fit iteratively. For such estimators, the timeout should be handled more gracefully: simply stop the fitting procedure and go on with testing. Currently the timeout just kills the job, regardless of how close it was to converging.
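A minimal sketch of the soft-timeout idea on an estimator with partial_fit; SGDClassifier and the budget value are illustrative assumptions, and hyperopt-sklearn would hook this into its own fit loop:

import time
import numpy as np
from sklearn.linear_model import SGDClassifier

def fit_with_soft_timeout(X, y, budget_s=60.0):
    # Stop fitting gracefully when the budget runs out, keeping
    # whatever progress has been made, then go on with testing.
    clf = SGDClassifier()
    classes = np.unique(y)
    deadline = time.time() + budget_s
    while time.time() < deadline:
        clf.partial_fit(X, y, classes=classes)
    return clf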
