Dear Emanuele,
Thank you for generously making your blending code public. It is a great help.
I need your help with blend.py. While testing it, I could run it successfully on the "bioresponse" data. However, with two different datasets I get almost the same error:
Dataset 1:
Loading data...
/usr/lib64/python2.7/site-packages/sklearn/cross_validation.py:525: Warning: The least populated class in y has only 1 members, which is too few. The minimum number of labels for any class cannot be less than n_folds=10.
% (min_labels, self.n_folds)), Warning)
Creating train and test sets for blending.
0 RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
Fold 0
Traceback (most recent call last):
File "blend.py", line 77, in <module>
clf.fit(X_train, y_train)
File "/usr/lib64/python2.7/site-packages/sklearn/ensemble/forest.py", line 211, in fit
X = check_array(X, dtype=DTYPE, accept_sparse="csc")
File "/usr/lib64/python2.7/site-packages/sklearn/utils/validation.py", line 392, in check_array
% (n_samples, shape_repr, ensure_min_samples))
ValueError: Found array with 0 sample(s) (shape=(0, 0)) while a minimum of 1 is required.
The error with another dataset was:
Dataset 2:
Loading data...
/usr/lib64/python2.7/site-packages/sklearn/cross_validation.py:525: Warning: The least populated class in y has only 1 members, which is too few. The minimum number of labels for any class cannot be less than n_folds=10.
% (min_labels, self.n_folds)), Warning)
Creating train and test sets for blending.
0 RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
Fold 0
Traceback (most recent call last):
File "blend.py", line 77, in <module>
clf.fit(X_train, y_train)
File "/usr/lib64/python2.7/site-packages/sklearn/ensemble/forest.py", line 211, in fit
X = check_array(X, dtype=DTYPE, accept_sparse="csc")
File "/usr/lib64/python2.7/site-packages/sklearn/utils/validation.py", line 392, in check_array
% (n_samples, shape_repr, ensure_min_samples))
ValueError: Found array with 0 sample(s) (shape=(0, 45)) while a minimum of 1 is required.
I am using scikit-learn version 0.17.dev
In both cases, the failure occurs in forest.py and validation.py. When I use random forest and GBM individually from R, I can make predictions, but when I run them through your code, it fails.
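In case it helps with diagnosis, the condition named in the warning (a class in y with fewer members than n_folds=10) can be checked directly on the labels before the folds are built. The snippet below is only a minimal sketch: `rare_classes` is a hypothetical helper, not part of blend.py or scikit-learn, and the toy labels are made up for illustration. Whether the rare class is actually related to the empty training array in the ValueError is an assumption that the logs above do not confirm.

```python
from collections import Counter


def rare_classes(y, n_folds=10):
    """Return {label: count} for every class with fewer than n_folds members.

    Hypothetical helper for illustration; it only mirrors the condition
    described in the stratified cross-validation warning.
    """
    counts = Counter(y)
    return {label: n for label, n in counts.items() if n < n_folds}


# Toy labels (invented, not taken from either dataset):
# two well-populated classes and one class with a single member.
y_example = [0] * 12 + [1] * 11 + [2]
print(rare_classes(y_example, n_folds=10))
```

Running the same check on the real label vector of each failing dataset would show which classes trigger the warning and how many members they have.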
Could you please suggest where the problem is?