natekupp / ffx

Fast Function Extraction

Home Page: http://trent.st/ffx

License: Other

Python 99.07% Makefile 0.93%

ffx's Issues

coef_init argument of ElasticNet.fit is going to be removed in sklearn 0.15

When invoking the API's run function, I get many instances of the following warning:

/usr/local/lib/python2.7/dist-packages/ffx/core.py:860: DeprecationWarning: coef_init is now ignored and will be removed in 0.15. See enet_path function.
return ElasticNet.fit(self, *args, **kwargs)

Exception when running example code

Hi,

When I execute the example code from the readme I get the following exception:

~/.local/lib/python3.5/site-packages/ffx/core.py in _pathwiseLearn(self, ss, varnames, bases, X_orig, X_orig_regress, y_orig, max_num_bases, target_nmse, verbose, **fit_params)
    863         st, fin = numpy.log10(alpha_max * ss.eps()), numpy.log10(alpha_max)
    864         alphas1 = numpy.logspace(
--> 865             st, fin, num=ss.numAlphas() * 10)[::-1][:ss.numAlphas() / 4]
    866         alphas2 = numpy.logspace(st, fin, num=ss.numAlphas())
    867         alphas = sorted(set(alphas1).union(alphas2), reverse=True)

TypeError: slice indices must be integers or None or have an __index__ method

This is on Python 3.5.

Any idea what the issue is here?

Thanks,

Ben
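The traceback points at `ss.numAlphas() / 4` being used as a slice bound. In Python 3, `/` always returns a float, and a float is not a valid slice index. A minimal illustration with a plain list follows; the value 1000 is a made-up stand-in for `ss.numAlphas()`, not the library's actual setting.

```python
num_alphas = 1000  # hypothetical stand-in for ss.numAlphas()
values = list(range(num_alphas * 10))

# Python 3: true division yields a float, which cannot be used as a slice bound.
try:
    values[::-1][:num_alphas / 4]
except TypeError as exc:
    error_message = str(exc)

# Floor division keeps the bound an integer and behaves the same on Python 2 and 3.
head = values[::-1][:num_alphas // 4]
```

Using `//` (or wrapping the quotient in `int(...)`) at the failing line should avoid the error, assuming nothing else in that expression depends on float division.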

ElasticNetWithTimeout.__init__() got an unexpected keyword argument 'l1_ratio'

I get the following error using the latest version of core.py

Traceback (most recent call last):
  File "/home/keyvan/Git/opencog/opencog/python/spatiotemporal/temporal_events/__init__.py", line 2, in <module>
    from generic import TemporalEvent, TemporalInstance, TemporalEventPiecewiseLinear
  File "/home/keyvan/Git/opencog/opencog/python/spatiotemporal/temporal_events/generic.py", line 2, in <module>
    from spatiotemporal.temporal_events.formulas import FormulaCreator, TemporalFormulaConvolution
  File "/home/keyvan/Git/opencog/opencog/python/spatiotemporal/temporal_events/__init__.py", line 14, in <module>
    models = ffx.run(train_X, train_y, test_X, test_y, ["predictor_a", "predictor_b"])
  File "/usr/local/lib/python2.7/dist-packages/ffx/api.py", line 4, in run
    return core.MultiFFXModelFactory().build(train_X, train_y, test_X, test_y, varnames, verbose)
  File "/usr/local/lib/python2.7/dist-packages/ffx/core.py", line 442, in build
    next_models = FFXModelFactory().build(train_X, train_y, ss, varnames, verbose)
  File "/usr/local/lib/python2.7/dist-packages/ffx/core.py", line 583, in build
    ss, varnames, order1_bases, X, y, max_num_bases, target_train_nmse, verbose)
  File "/usr/local/lib/python2.7/dist-packages/ffx/core.py", line 682, in _basesToModels
    max_num_bases, target_train_nmse, verbose)
  File "/usr/local/lib/python2.7/dist-packages/ffx/core.py", line 726, in _pathwiseLearn
    max_iter=max_iter, **fit_params)
TypeError: __init__() got an unexpected keyword argument 'l1_ratio'

Fails to identify y=x^2

Added a test [5510110] with data y=x^2 for x=[0, 1, 2, 3]. FFX does pretty well, but I expected the exact relationship. In the 4-base model, there are two x^2 terms -- could this be related to the handling of second-order bases mentioned in #5 ?

Num bases,Test error (%),Model
0, 62.4453, 3.50
1, 11.4284, 0.640 + 0.817*x^2
2, 1.6635, 0.0846 + 0.972*x^2 + 0.00984*x
4, 0.7507, (0.0973 + 0.523*x^2 + 0.440*x^2) / (1.0 - 0.00214*x - 0.00168*x)

Choose one model from the Pareto front as our champion

FFX creates a Pareto front of good models, trading off numBases against accuracy. For some applications we would like to put forward just one model as the champion, for example when FFX is being benchmarked against other symbolic regression techniques, e.g. https://github.com/EpistasisLab/regression-benchmark.

Our current option (see api.py/FFXRegressor) is simply to choose the model with the highest accuracy and highest numBases. But there are at least two other options:

Try to find an "elbow" in the Pareto front (idea: we are willing to give up a little accuracy for simplicity).
Reserve some training data to use as a validation set, and choose the model with the best accuracy on the validation set (idea: the more complex models may be overfitting, and our goal is to avoid that).
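Both ideas can be sketched on a toy Pareto front. This is only an illustration: the front is represented as made-up (num_bases, validation_error) pairs rather than real FFX models, and the "elbow" rule shown (simplest model within a tolerance of the best error) is just one possible reading of that idea.

```python
# Each entry is (num_bases, validation_error); the numbers are invented for illustration.
front = [(0, 0.62), (1, 0.11), (2, 0.017), (4, 0.0075), (7, 0.0074)]

def pick_by_validation(front):
    """Validation idea: the model with the lowest error on held-out data."""
    return min(front, key=lambda entry: entry[1])

def pick_elbow(front, tolerance=0.01):
    """Elbow idea (one simple variant): the simplest model whose error is
    within `tolerance` of the best error on the front."""
    best_error = min(error for _, error in front)
    candidates = [entry for entry in front if entry[1] - best_error <= tolerance]
    return min(candidates, key=lambda entry: entry[0])

champion_validation = pick_by_validation(front)
champion_elbow = pick_elbow(front)
```

With these numbers the validation rule picks the 7-base model, while the elbow rule settles for the 2-base model. The tolerance is a free parameter that would need tuning per application.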

ElasticNet.fit not working, says "Array contains NaN or infinity"

I changed the example in Readme to:

import numpy as np
import ffx

train_X = np.array([(2, 7, 3), (3, 2, 8)]).T
train_y = np.array([5, 9, 11])

test_X = np.array([(1.5, 2.5, 3.5), (2.4, 3.6, 5.2)]).T
test_y = np.array([3.9, 6.1, 8.7])

models = ffx.run(train_X, train_y, test_X, test_y, ["predictor_a", "predictor_b"])
for model in models:
    yhat = model.simulate(test_X)
    print model

I was expecting to simply get "predictor_a + predictor_b"

But instead got:

/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/coordinate_descent.py:418: UserWarning: Objective did not converge. You might want to increase the number of iterations
  ' to increase the number of iterations')
/home/keyvan/Git/ffx/ffx/core.py:805: RuntimeWarning: invalid value encountered in divide
  X_unbiased = (Xin - X_avgs) / X_stds
Traceback (most recent call last):
  File "/home/keyvan/Git/opencog/opencog/python/spatiotemporal/temporal_events/__init__.py", line 2, in <module>
    from generic import TemporalEvent, TemporalInstance, TemporalEventPiecewiseLinear
  File "/home/keyvan/Git/opencog/opencog/python/spatiotemporal/temporal_events/generic.py", line 2, in <module>
    from spatiotemporal.temporal_events.formulas import FormulaCreator, TemporalFormulaConvolution
  File "/home/keyvan/Git/opencog/opencog/python/spatiotemporal/temporal_events/__init__.py", line 14, in <module>
    models = ffx.run(train_X, train_y, test_X, test_y, ["predictor_a", "predictor_b"])
  File "/home/keyvan/Git/ffx/ffx/api.py", line 4, in run
    return core.MultiFFXModelFactory().build(train_X, train_y, test_X, test_y, varnames, verbose)
  File "/home/keyvan/Git/ffx/ffx/core.py", line 443, in build
    next_models = FFXModelFactory().build(train_X, train_y, ss, varnames, verbose)
  File "/home/keyvan/Git/ffx/ffx/core.py", line 653, in build
    ss, varnames, bases, X, y, ss.final_max_num_bases, ss.final_target_train_nmse, verbose)
  File "/home/keyvan/Git/ffx/ffx/core.py", line 683, in _basesToModels
    max_num_bases, target_train_nmse, verbose)
  File "/home/keyvan/Git/ffx/ffx/core.py", line 729, in _pathwiseLearn
    clf.fit(X_unbiased, y_unbiased)
  File "/home/keyvan/Git/ffx/ffx/core.py", line 849, in new_f
    result = f(*args, **kwargs)
  File "/home/keyvan/Git/ffx/ffx/core.py", line 861, in fit
    return ElasticNet.fit(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/coordinate_descent.py", line 610, in fit
    copy=self.copy_X and self.fit_intercept)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 124, in atleast2d_or_csc
    "tocsc", force_all_finite)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 111, in _atleast2d_or_sparse
    force_all_finite=force_all_finite)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 93, in array2d
    _assert_all_finite(X_2d)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 27, in _assert_all_finite
    raise ValueError("Array contains NaN or infinity.")
ValueError: Array contains NaN or infinity.

Any idea on how to resolve this?

P.S. I'm not sure that I'm using test_x and test_y correctly, what do they do anyway?

Cheers,
K
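The issue does not confirm a root cause, but the RuntimeWarning on `(Xin - X_avgs) / X_stds` would be consistent with a column whose standard deviation is zero, which gives 0/0 and hence NaN in NumPy. Below is a minimal sketch of a guarded standardization, written in plain Python so it runs without dependencies. The function name and the choice to leave constant columns unscaled are assumptions, not FFX's actual behaviour.

```python
import statistics

def standardize_columns(columns):
    """Center and scale each column; constant columns are centered but left unscaled."""
    result = []
    for column in columns:
        mean = statistics.mean(column)
        std = statistics.pstdev(column)
        scale = std if std > 0 else 1.0  # avoid dividing by zero for constant columns
        result.append([(value - mean) / scale for value in column])
    return result

columns = [[2.0, 7.0, 3.0], [5.0, 5.0, 5.0]]  # second column is constant
scaled = standardize_columns(columns)
```

An alternative would be to drop constant columns before fitting, since they carry no information beyond the intercept.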

FFX sometimes crashes with NaN or infinity with apparently simple data

This file crashes:

#!/usr/bin/env python

import numpy as np
import ffx

# This creates a dataset of 1 predictor
train_X = np.array([[0, 1, 2, 3]]).T
train_y = np.array([0, 1, 4, 9])

test_X = np.array([[4, 5, 6, 7]]).T
test_y = np.array([16, 25, 36, 49])

models = ffx.run(train_X, train_y, test_X, test_y, ["x"])

Traceback (most recent call last):
  File "./test2.py", line 13, in <module>
    models = ffx.run(train_X, train_y, test_X, test_y, ["x"])
  File "/Users/jmmcd/Documents/vc/ffx/ffx/api.py", line 4, in run
    return core.MultiFFXModelFactory().build(train_X, train_y, test_X, test_y, varnames, verbose)
  File "/Users/jmmcd/Documents/vc/ffx/ffx/core.py", line 443, in build
    next_models = FFXModelFactory().build(train_X, train_y, ss, varnames, verbose)
  File "/Users/jmmcd/Documents/vc/ffx/ffx/core.py", line 584, in build
    ss, varnames, order1_bases, X, y, max_num_bases, target_train_nmse, verbose)
  File "/Users/jmmcd/Documents/vc/ffx/ffx/core.py", line 683, in _basesToModels
    max_num_bases, target_train_nmse, verbose)
  File "/Users/jmmcd/Documents/vc/ffx/ffx/core.py", line 729, in _pathwiseLearn
    clf.fit(X_unbiased, y_unbiased, coef_init=cur_unbiased_coefs)
  File "/Users/jmmcd/Documents/vc/ffx/ffx/core.py", line 849, in new_f
    result = f(*args, **kwargs)
  File "/Users/jmmcd/Documents/vc/ffx/ffx/core.py", line 861, in fit
    return ElasticNet.fit(self, *args, **kwargs)
  File "/Users/jmmcd/Documents/dev/anaconda/lib/python2.7/site-packages/sklearn/linear_model/coordinate_descent.py", line 179, in fit
    copy=self.copy_X and self.fit_intercept)
  File "/Users/jmmcd/Documents/dev/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 107, in atleast2d_or_csc
    "tocsc")
  File "/Users/jmmcd/Documents/dev/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 96, in _atleast2d_or_sparse
    X = array2d(X, dtype=dtype, order=order, copy=copy)
  File "/Users/jmmcd/Documents/dev/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 81, in array2d
    _assert_all_finite(X_2d)
  File "/Users/jmmcd/Documents/dev/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 18, in _assert_all_finite
    raise ValueError("Array contains NaN or infinity.")
ValueError: Array contains NaN or infinity.

Whereas this one works ok -- only difference is the data:

#!/usr/bin/env python

import numpy as np
import ffx

# This creates a dataset of 1 predictor
train_X = np.array([[0, 1, 2, 3]]).T
train_y = np.array([1, 2, 3, 4])

test_X = np.array([[4, 5, 6, 7]]).T
test_y = np.array([5, 6, 7, 8])

models = ffx.run(train_X, train_y, test_X, test_y, ["x"])

How to identify the prediction model used from FFX.models_ via the Scikit-Learn API?

I am using the Scikit-Learn interface and have successfully printed the entire Pareto front through FFX.models_. However, I am uncertain about how to identify and print the specific model used for predictions.

The README.md states that FFX selects the model with the highest accuracy and complexity for predictions. Could you please clarify which model from the FFX.models_ list this corresponds to? Is it the last model in the list?

Thank you in advance.
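Whether the prediction model is the last entry of `FFX.models_` is not confirmed anywhere on this page. A way to avoid relying on list order, assuming the models expose `numBases()` as the models returned by `ffx.run` do in other examples here, is to select by complexity explicitly. `StubModel` below is a hypothetical stand-in so the sketch runs on its own.

```python
class StubModel:
    """Hypothetical stand-in that exposes only numBases()."""

    def __init__(self, name, num_bases):
        self.name = name
        self._num_bases = num_bases

    def numBases(self):
        return self._num_bases

models = [StubModel("constant", 0), StubModel("linear", 1), StubModel("rational", 4)]

# Pick the most complex model regardless of where it sits in the list.
most_complex = max(models, key=lambda model: model.numBases())
```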

can FFX learn the relationship within time series?

Thank you for your great work, which offers a new approach to symbolic regression. From the examples in your slides, FFX can output linear or nonlinear function expressions. But when I study stock time series, I usually need expressions that relate a series to its own past values. For example:

ts_min(x, d) = time-series min over the past d days
ts_argmin(x, d) = which day ts_min(x, d) occurred on
ts_rank(x, d) = time-series rank in the past d days
stddev(x, d) = moving time-series standard deviation over the past d days

I am having trouble getting FFX to output expressions like the ones above. Can you give me some advice? Thank you for your help. @jmmcd
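One possible workaround, which is not an FFX feature, is to precompute the windowed quantities and pass them to FFX as extra input columns, so that models can use them like any other variable. The helper names below mirror the operators listed in the question; their implementations are an illustrative sketch in plain Python.

```python
import statistics

def rolling(series, d, func):
    """Apply func to each trailing window of length d; None where the window is incomplete."""
    out = []
    for i in range(len(series)):
        if i + 1 < d:
            out.append(None)
        else:
            out.append(func(series[i + 1 - d:i + 1]))
    return out

def ts_min(series, d):
    return rolling(series, d, min)

def ts_argmin(series, d):
    # Position of the minimum inside the window (0 = oldest day in the window).
    return rolling(series, d, lambda window: window.index(min(window)))

def ts_stddev(series, d):
    return rolling(series, d, statistics.pstdev)

prices = [5.0, 3.0, 4.0, 6.0, 2.0]
min_3 = ts_min(prices, 3)
argmin_3 = ts_argmin(prices, 3)
std_3 = ts_stddev(prices, 3)
```

Rows with incomplete windows (the `None` entries) would need to be dropped before the columns are stacked into a training matrix.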

is there a way to do classification?

Working with a large dataset

Hi @jmmcd, etc.,

I'm hoping to get ffx working on a dataset with 10 million rows, 26 features.

I ran into a few issues, which I have summarised below:

The time complexity of the algorithm, is it exponential? For 100 rows it takes 25 seconds, 37s for 200 rows and 163s for 300 rows.

I'm getting a new warning:

/home/keyvan/Git/ffx/ffx/core.py:568: RuntimeWarning: invalid value encountered in double_scalars
  minx + 0.2*rangex, maxx - 0.2*rangex + 0.1*rangex, stepx)

I see some names in the formulas that I don't have, for example, I do have variables named 'm' and 'o', but I get things like 'm3', 'm8' and 'o1' in the formulas:

0.510
0.296 + 0.473*m3
0.503 + 0.171*m3 - 0.163*max(0,0.867-m3) + 0.117*max(0,m3-0.200) - 0.101*max(0,0.733-m3) + 0.0163*max(0,m3-0.333)
(0.186 + 0.243*m3 + 0.242*m3 + 0.198*m5 * p) / (1.0 - 0.0998*m3 - 0.0995*m3 - 0.0774*p)
(0.178 + 0.250*m3 + 0.249*m3 + 0.235*m5 * p) / (1.0 - 0.0990*m3 - 0.0988*m3 - 0.0779*p - 0.0345*o1 * p)
(0.142 + 0.403*m5 * p + 0.286*m3 + 0.286*m3) / (1.0 - 0.197*o1 * p - 0.0944*m3 - 0.0934*m3 - 0.0888*m8 - 0.0728*p)
Performance: 163.948806047 seconds

what do these mean?

Cheers,
K
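On the time-complexity question, the three reported timings can be turned into local scaling exponents k, assuming time grows roughly like rows**k between consecutive measurements. This is only arithmetic on the numbers quoted above, not a statement about the algorithm's true complexity.

```python
import math

timings = [(100, 25.0), (200, 37.0), (300, 163.0)]  # (rows, seconds) from the report

# Local exponent k in time ~ rows**k between consecutive measurements.
exponents = [
    math.log(t2 / t1) / math.log(n2 / n1)
    for (n1, t1), (n2, t2) in zip(timings, timings[1:])
]
```

The two exponents come out at roughly 0.57 and 3.66, which differ too much to support either a polynomial or an exponential conclusion. More measurements over a wider range of row counts would be needed.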

FFX crashes when X has only 1 variable

This code produces an "iteration over 0-d array" error, pasted below. I think the reason is that FFX calls list() on the coefficients, which have been numpy.squeeze()ed by sklearn ElasticNet. I don't see why they squeeze it, but ok. I guess the solution is to hack it by checking if the array is 0-d, and if so just reshape it. I'll commit that fix and then close this issue. Just raising it for posterity.

#!/usr/bin/env python

import numpy as np
import ffx

# This creates a dataset of 1 predictor
train_X = np.array([[0, 1, 2, 3]]).T
train_y = np.array([0, 1, 4, 9])

test_X = np.array([[4, 5, 6, 7]]).T
test_y = np.array([16, 25, 36, 49])

models = ffx.run(train_X, train_y, test_X, test_y, ["x"])

Traceback (most recent call last):
  File "./test2.py", line 12, in <module>
    models = ffx.run(train_X, train_y, test_X, test_y, ["x"])
  File "/Users/jmmcd/Documents/vc/ffx/ffx/api.py", line 4, in run
    return core.MultiFFXModelFactory().build(train_X, train_y, test_X, test_y, varnames, verbose)
  File "/Users/jmmcd/Documents/vc/ffx/ffx/core.py", line 443, in build
    next_models = FFXModelFactory().build(train_X, train_y, ss, varnames, verbose)
  File "/Users/jmmcd/Documents/vc/ffx/ffx/core.py", line 584, in build
    ss, varnames, order1_bases, X, y, max_num_bases, target_train_nmse, verbose)
  File "/Users/jmmcd/Documents/vc/ffx/ffx/core.py", line 683, in _basesToModels
    max_num_bases, target_train_nmse, verbose)
  File "/Users/jmmcd/Documents/vc/ffx/ffx/core.py", line 740, in _pathwiseLearn
    coefs = self._rebiasCoefs([0.0] + list(cur_unbiased_coefs), X_stds, X_avgs, y_std, y_avg)
TypeError: iteration over a 0-d array
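The failure mode and one way to implement the reshape described above can be reproduced with NumPy alone. The coefficient value is arbitrary, and `numpy.atleast_1d` is used here as one option; the actual fix committed to FFX may differ.

```python
import numpy as np

coefs_2d = np.array([[0.817]])    # one coefficient for one predictor
squeezed = np.squeeze(coefs_2d)   # shape (): a 0-d array

try:
    list(squeezed)
except TypeError as exc:
    error_message = str(exc)

safe = np.atleast_1d(squeezed)    # shape (1,): iterable again
coef_list = [0.0] + list(safe)
```

`atleast_1d` leaves arrays that already have one or more dimensions unchanged, so it should be safe to apply unconditionally.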

version number bump and pypi upgrade?

Hi,

A few months ago there was a Python 3 division fix.
However, I think the PyPI package wasn't updated.

FFX not summing expressions of same variable with different coefficients

I just came across the following FFX model:

(-0.124 - 0.228*Land_Use - 0.228*Land_Use) / (1.0 + 2.62*Land_Use + 2.60*Land_Use)

Why isn't FFX reducing this expression?
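The issue does not say why the duplicates appear, but they can be merged in a post-processing step. The sketch below assumes each side of the model can be represented as (coefficient, basis string) pairs, which is an illustrative representation and not FFX's internal one.

```python
def combine_like_terms(terms):
    """Sum the coefficients of terms that share the same basis string."""
    combined = {}
    for coefficient, basis in terms:
        combined[basis] = combined.get(basis, 0.0) + coefficient
    return [(coefficient, basis) for basis, coefficient in combined.items()]

numerator = [(-0.124, "1"), (-0.228, "Land_Use"), (-0.228, "Land_Use")]
denominator = [(1.0, "1"), (2.62, "Land_Use"), (2.60, "Land_Use")]

reduced_numerator = combine_like_terms(numerator)
reduced_denominator = combine_like_terms(denominator)
```

For the model above this gives approximately (-0.124 - 0.456*Land_Use) / (1.0 + 5.22*Land_Use).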

No module named 'api'

I followed the tutorial to install ffx, but when I import ffx I get the following error:

In [1]: import ffx

ImportError Traceback (most recent call last)
in ()
----> 1 import ffx

E:\Program Files\Anaconda3\lib\site-packages\ffx\__init__.py in <module>()
----> 1 from api import *

ImportError: No module named 'api'

How do I fix this problem?
My Python version is 3.5.1.
I would really appreciate any help!
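A likely cause is that `from api import *` inside a package's `__init__.py` relies on Python 2's implicit relative imports, which Python 3 removed. The self-contained sketch below builds a throwaway package on disk to show the explicit relative form; `demo_pkg` and its `run` function are made-up names for illustration.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk; "demo_pkg" is a made-up name.
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "demo_pkg")
os.mkdir(package_dir)

with open(os.path.join(package_dir, "api.py"), "w") as handle:
    handle.write("def run():\n    return 'ok'\n")

# Python 3 needs the leading dot. A bare "from api import *" only worked
# on Python 2, where sibling modules were found implicitly.
with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
    handle.write("from .api import *\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
demo_pkg = importlib.import_module("demo_pkg")
result = demo_pkg.run()
```

If the installed ffx still contains the bare import, installing a newer version from the repository may help, assuming the fix has been made there.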

How to increase iterations?

I'm trying to train the models but it is not converging. I tried to set max_iter in regressor, but it is not recognizing it. This is the warning I'm getting.

ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 0.17428639502685428, tolerance: 0.0699999999999999

Should I just wait for it to finish, or, if not, how can I increase iterations?

How to get reproducible results?

EDIT: Never mind this issue. The code is deterministic, so the result is always the same. Sorry for the trouble.

Greetings,

I was wondering if there is a way to achieve reproducible results.
Specifically, I am looking for something like an integer parameter (random_state) in FFXRegressor(),
that would allow for consistent outcomes across different runs or another similar way.

I have checked the core files, but to my knowledge an option for ensuring reproducibility is not present.

Could you please confirm if this functionality is indeed not yet implemented,
or if there is an alternative method to achieve reproducibility?

Thank you.

SimpleBase.simulate throws IndexError

At some random step that I can't trace, SimpleBase.simulate throws "IndexError: too many indices". I'm getting this in here:

return X[:,self.var] ** self.exponent

self.var is 13 (I have 26 features)

For a quick fix, I added:

    if len(X.shape) == 1:
        return X ** self.exponent

right before the above mentioned line, but I'm not sure if it's the right fix...

Using out-of-date scikit.learn

It looks like scikit-learn has changed a lot since FFX was written. I had to edit core.py to import as follows:

from sklearn.linear_model import ElasticNet

After fixing that, it imported ok. When I ran the sample code in Readme.md, I got an error concerning args in ElasticNet.fit() (pasted below) -- I tried fiddling with the args, but then I got a different error, and I don't really know those modules, so I stopped fiddling.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-5756048b6a36> in <module>()
      8 test_y = np.array( [3.03,0.9113,1.823])
      9 
---> 10 models = ffx.run(train_X, train_y, test_X, test_y, ["predictor_a", "predictor_b"])
     11 for model in models:
     12     yhat = model.simulate(test_X)

/Users/jmmcd/Documents/dev/anaconda/lib/python2.7/site-packages/ffx/api.pyc in run(train_X, train_y, test_X, test_y, varnames, verbose)
      2 
      3 def run(train_X, train_y, test_X, test_y, varnames=None, verbose=False):
----> 4     return core.MultiFFXModelFactory().build(train_X, train_y, test_X, test_y, varnames, verbose)

/Users/jmmcd/Documents/dev/anaconda/lib/python2.7/site-packages/ffx/core.py in build(self, train_X, train_y, test_X, test_y, varnames, verbose)
    441             ss = FFXBuildStrategy(approach)
    442 
--> 443             next_models = FFXModelFactory().build(train_X, train_y, ss, varnames, verbose)
    444 
    445             #set test_nmse on each model

/Users/jmmcd/Documents/dev/anaconda/lib/python2.7/site-packages/ffx/core.py in build(self, X, y, ss, varnames, verbose)
    582             target_train_nmse = 0.01
    583             models = self._basesToModels(
--> 584                 ss, varnames, order1_bases, X, y, max_num_bases, target_train_nmse, verbose) 
    585             if models is None: #fit failed.
    586                 model = ConstantModel(y[0], 0)

/Users/jmmcd/Documents/dev/anaconda/lib/python2.7/site-packages/ffx/core.py in _basesToModels(self, ss, varnames, bases, X, y, max_num_bases, target_train_nmse, verbose)
    681         #compute models.
    682         models = self._pathwiseLearn(ss, varnames, bases, X, regress_X, y, 
--> 683                                      max_num_bases, target_train_nmse, verbose)
    684         return models
    685 

/Users/jmmcd/Documents/dev/anaconda/lib/python2.7/site-packages/ffx/core.py in _pathwiseLearn(self, ss, varnames, bases, X_orig, X_orig_regress, y_orig, max_num_bases, target_nmse, verbose, **fit_params)
    727             try:
    728                 clf.fit(X_unbiased, y_unbiased, coef_init=cur_unbiased_coefs, 
--> 729                         max_iter=max_iter, **fit_params)
    730             except TimeoutError:
    731                 print '    Regularized update failed. Returning None'

/Users/jmmcd/Documents/dev/anaconda/lib/python2.7/site-packages/ffx/core.py in new_f(*args, **kwargs)
    839             signal.alarm(seconds_before_timeout)
    840             try:
--> 841                 result = f(*args, **kwargs)
    842             finally:
    843                 signal.signal(signal.SIGALRM, old)

/Users/jmmcd/Documents/dev/anaconda/lib/python2.7/site-packages/ffx/core.py in fit(self, *args, **kwargs)
    851     @timeout(MAX_TIME_REGULARIZE_UPDATE) #if this freezes, then exit with a TimeoutError
    852     def fit(self, *args, **kwargs):
--> 853         return ElasticNet.fit(self, *args, **kwargs)
    854 
    855 #========================================================================================

TypeError: fit() got an unexpected keyword argument 'precompute'

/Users/jmmcd/Documents/dev/anaconda/lib/python2.7/site-packages/sklearn/linear_model/coordinate_descent.py:137: DeprecationWarning: rho was renamed to l1_ratio and will be removed in 0.15
  "in 0.15", DeprecationWarning)

Website is broken

http://trent.st/ffx/ gives a 403 error

Error on some data

We want to use FFX in a comparative study but we ran into a problem: for some data, in STEP 2 there are somehow 0 bases (STEP 2: Regress on all 0 bases: begin.) and in this case the pathwise learn fails because of an empty array:

...
Build with approach 2/7 (inter1 denom0 expon0 nonlin0 thresh1): begin
  STEP 1A: Build order-1 bases: begin
  STEP 1A: Build order-1 bases: done.  Have 65 order-1 bases.
  STEP 1B: Find order-1 base infls: begin
    Pathwise learn: begin. max_num_bases=65
      alpha 1/1249 (4.091439e-02): num_bases=0, nmse=0.009177, time 1.83 s
    Pathwise learn: Early stop because nmse < target
  STEP 1B: Find order-1 base infls: done
  STEP 1C: Build order-2 bases: begin
  STEP 1C: Build order-2 bases: done.  Have 0 order-2 bases.
  STEP 2: Regress on all 0 bases: begin.
    Pathwise learn: begin. max_num_bases=250
Traceback (most recent call last):
  File "/mnt/data/gandalv/School/PhD/Research/sr-comparison/ffx/code/run.py", line 205, in <module>
    main()
  File "/mnt/data/gandalv/School/PhD/Research/sr-comparison/ffx/code/run.py", line 122, in main
    verbose=True)
  File "/mnt/data/gandalv/School/PhD/Research/sr-comparison/ffx/ffx-venv/local/lib/python2.7/site-packages/ffx/api.py", line 4, in run
    return core.MultiFFXModelFactory().build(train_X, train_y, test_X, test_y, varnames, verbose)
  File "/mnt/data/gandalv/School/PhD/Research/sr-comparison/ffx/ffx-venv/local/lib/python2.7/site-packages/ffx/core.py", line 443, in build
    next_models = FFXModelFactory().build(train_X, train_y, ss, varnames, verbose)
  File "/mnt/data/gandalv/School/PhD/Research/sr-comparison/ffx/ffx-venv/local/lib/python2.7/site-packages/ffx/core.py", line 653, in build
    ss, varnames, bases, X, y, ss.final_max_num_bases, ss.final_target_train_nmse, verbose)
  File "/mnt/data/gandalv/School/PhD/Research/sr-comparison/ffx/ffx-venv/local/lib/python2.7/site-packages/ffx/core.py", line 683, in _basesToModels
    max_num_bases, target_train_nmse, verbose)
  File "/mnt/data/gandalv/School/PhD/Research/sr-comparison/ffx/ffx-venv/local/lib/python2.7/site-packages/ffx/core.py", line 729, in _pathwiseLearn
    max_iter=max_iter, **fit_params)
  File "/mnt/data/gandalv/School/PhD/Research/sr-comparison/ffx/ffx-venv/local/lib/python2.7/site-packages/ffx/core.py", line 841, in new_f
    result = f(*args, **kwargs)
  File "/mnt/data/gandalv/School/PhD/Research/sr-comparison/ffx/ffx-venv/local/lib/python2.7/site-packages/ffx/core.py", line 853, in fit
    return ElasticNet.fit(self, *args, **kwargs)
  File "/mnt/data/gandalv/School/PhD/Research/sr-comparison/ffx/ffx-venv/local/lib/python2.7/site-packages/scikits/learn/linear_model/coordinate_descent.py", line 122, in fit
    beta, Gram, Xy, y, max_iter, tol)
  File "cd_fast.pyx", line 225, in scikits.learn.linear_model.cd_fast.enet_coordinate_descent_gram (scikits/learn/linear_model/cd_fast.c:2518)
  File "/mnt/data/gandalv/School/PhD/Research/sr-comparison/ffx/ffx-venv/local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 2072, in norm
    return abs(x).max(axis=axis)
  File "/mnt/data/gandalv/School/PhD/Research/sr-comparison/ffx/ffx-venv/local/lib/python2.7/site-packages/numpy/core/_methods.py", line 26, in _amax
    return umr_maximum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation maximum which has no identity

The training data can be downloaded here, the last column is the y-value (i.e. the target value). Testing data are identical to training data.

Is there any possibility this could be resolved? FFX will get its citation :).

Index out of bounds error

I'm trying FFX with gridded data (not certain if that is an issue) with a set of approx. 3000 data points consisting of 4 variables. The traceback is:

Traceback (most recent call last):
  File "", line 1, in
  File "/Library/Frameworks/EPD64.framework/Versions/7.2/lib/python2.7/site-packages/ffx-1.3.4-py2.7.egg/ffx/api.py", line 4, in run
    return core.MultiFFXModelFactory().build(train_X, train_y, test_X, test_y, varnames, verbose)
  File "/Library/Frameworks/EPD64.framework/Versions/7.2/lib/python2.7/site-packages/ffx-1.3.4-py2.7.egg/ffx/core.py", line 443, in build
    next_models = FFXModelFactory().build(train_X, train_y, ss, varnames, verbose)
  File "/Library/Frameworks/EPD64.framework/Versions/7.2/lib/python2.7/site-packages/ffx-1.3.4-py2.7.egg/ffx/core.py", line 653, in build
    ss, varnames, bases, X, y, ss.final_max_num_bases, ss.final_target_train_nmse, verbose)
  File "/Library/Frameworks/EPD64.framework/Versions/7.2/lib/python2.7/site-packages/ffx-1.3.4-py2.7.egg/ffx/core.py", line 683, in _basesToModels
    max_num_bases, target_train_nmse, verbose)
  File "/Library/Frameworks/EPD64.framework/Versions/7.2/lib/python2.7/site-packages/ffx-1.3.4-py2.7.egg/ffx/core.py", line 700, in _pathwiseLearn
    (X_unbiased, y_unbiased, X_avgs, X_stds, y_avg, y_std) = self._unbiasedXy(X_orig_regress, y_orig)
  File "/Library/Frameworks/EPD64.framework/Versions/7.2/lib/python2.7/site-packages/ffx-1.3.4-py2.7.egg/ffx/core.py", line 810, in _unbiasedXy
    X_unbiased[i] = (Xin[i] - X_avgs[i])
IndexError: index out of bounds

SIGALRM on windows

Hey!
I'm using this module on Windows, and when I try to run the example it shows me this error:

'module' object has no attribute 'SIGALRM'

I know that this function is only available on Unix. Do you have an equivalent for Windows?

Thanks!
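Since `signal.SIGALRM` is only available on Unix, one portable alternative is to run the call in a worker thread and stop waiting after a deadline. The sketch below uses only the standard library; `run_with_timeout` and `slow_fit` are hypothetical names and not part of FFX. Note the limitation: the worker thread is not killed on timeout, it is merely abandoned.

```python
import concurrent.futures
import time

def run_with_timeout(func, seconds, *args, **kwargs):
    """Return func(*args, **kwargs), or raise TimeoutError after `seconds`.

    The worker thread is not killed on timeout; it keeps running in the
    background until func returns on its own.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=seconds)
    except concurrent.futures.TimeoutError:
        raise TimeoutError("call did not finish within %s seconds" % seconds)
    finally:
        executor.shutdown(wait=False)

def slow_fit():
    time.sleep(0.3)  # stands in for a regularized fit that takes too long
    return "done"

fast_result = run_with_timeout(lambda: 42, 1.0)
try:
    run_with_timeout(slow_fit, 0.05)
    timed_out = False
except TimeoutError:
    timed_out = True
```

A process-based approach could terminate the work for real, at the cost of having to pickle the inputs and the result.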

ImportError: cannot import name core

I got the above error message while trying to run the following example code from the README (with both Python 2 and Python 3):

import numpy as np
import ffx

train_X = np.array( [ (1.5,2,3), (4,5,6) ] ).T
train_y = np.array( [1,2,3])

test_X = np.array( [ (5.241,1.23, 3.125), (1.1,0.124,0.391) ] ).T
test_y = np.array( [3.03,0.9113,1.823])

models = ffx.run(train_X, train_y, test_X, test_y, ["predictor_a", "predictor_b"])
for model in models:
    yhat = model.simulate(test_X)
    print(model)

Indexing error

FFX stopped working (on Python 3.5.1) a few days ago, due to an array indexing error.

This is caused by core.py at line 865:
st, fin, num=ss.numAlphas() * 10)[::-1][:ss.numAlphas() / 4]
should be changed to:
st, fin, num=ss.numAlphas() * 10)[::-1][:int(ss.numAlphas() / 4)]

Cheers,

Fails on a real time series

Hello. I have a time series 'x'. I use np.gradient to calculate dx/dt, and I want to learn a function of the form dx/dt = f(x). So for FFX the target is dx/dt, and train_x and test_x are the original time series. However, I run into the error below. The same procedure works without problems for data generated from the model dx/dt = 0.2x.

The error is as follows:

IndexError Traceback (most recent call last)
in ()
4 test_y = dx[101::]
5 varnames = 'x'
----> 6 dxdt_fitted_result = run_ffx(train_x,train_y,test_x,test_y,varnames)

in run_ffx(train_X, train_y, test_X, test_y, varnames)
3
4 def run_ffx(train_X, train_y, test_X, test_y, varnames):
----> 5 models = ffx.run(train_X, train_y, test_X, test_y, varnames)
6 base = [model.numBases() for model in models]
7 test_error = [model.test_nmse for model in models]

/home/happyling/anaconda3/lib/python3.5/site-packages/ffx/api.py in run(train_X, train_y, test_X, test_y, varnames, verbose)
2
3 def run(train_X, train_y, test_X, test_y, varnames=None, verbose=False):
----> 4 return core.MultiFFXModelFactory().build(train_X, train_y, test_X, test_y, varnames, verbose)

/home/happyling/anaconda3/lib/python3.5/site-packages/ffx/core.py in build(self, train_X, train_y, test_X, test_y, varnames, verbose)
557 ss = FFXBuildStrategy(approach)
558
--> 559 next_models = FFXModelFactory().build(train_X, train_y, ss, varnames, verbose)
560
561 # set test_nmse on each model

/home/happyling/anaconda3/lib/python3.5/site-packages/ffx/core.py in build(self, X, y, ss, varnames, verbose)
658 #'lin' version of base
659 simple_base = SimpleBase(var_i, exponent)
--> 660 lin_yhat = simple_base.simulate(X)
661 # checking exponent is a speedup
662 if exponent in [1.0, 2.0] or not yIsPoor(lin_yhat):

/home/happyling/anaconda3/lib/python3.5/site-packages/ffx/core.py in simulate(self, X)
314 y -- 1d array of [sample_i] : float
315 """
--> 316 return X[:, self.var] ** self.exponent
317
318 def __str__(self):

IndexError: too many indices for array
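The failing line indexes `X[:, self.var]`, which requires a 2-D array of shape (samples, variables). A single time series is usually 1-D, which would produce exactly this error. The sketch below reproduces it with NumPy and shows a reshape that should avoid it. The data values are made up, and passing `varnames` as a list rather than the bare string 'x' follows the other examples on this page.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])   # a single time series, shape (4,)

try:
    x[:, 0]                           # 2-D indexing on a 1-D array
except IndexError as exc:
    error_message = str(exc)

X = x.reshape(-1, 1)                  # shape (4, 1): 4 samples, 1 variable
column = X[:, 0] ** 2                 # the same indexing now works
varnames = ["x"]                      # a list of names, not the bare string 'x'
```

Both `train_X` and `test_X` would need the same reshape before being passed to `ffx.run`.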

change sklearn to scikit-learn

PyPI doesn't accept sklearn as a dependency anymore; it has to be scikit-learn.

ffx/setup.py

Line 18 in ced56d9

    
           install_requires=['click>=5.0', 'contextlib2>=0.5.4', 'numpy', 'pandas', 'six', 'sklearn',],
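The change would presumably amount to swapping the last entry of that list. The sketch below shows it as a standalone list; the other entries are copied from the line above.

```python
# Same dependency list as above, with the PyPI distribution name "scikit-learn"
# in place of "sklearn" (which is only the import name).
install_requires = [
    'click>=5.0',
    'contextlib2>=0.5.4',
    'numpy',
    'pandas',
    'six',
    'scikit-learn',
]
```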

Second order interactions code is flawed

In the first part of the code, which catches xi*xi:

order2_exponent is not used anywhere.
In the creation of combined_base, var_i is used instead of basei.var. This variable is still alive from Step 1A, so it causes no error. However, only the last variable is added.

Furthermore, it seems more logical to me to include 2 in the list of allowed exponents; that way this part of the order-2 code could be removed completely.

In the second part, a break statement on len(order2_bases) >= max_n_order2_bases is added at the end of each for loop. However, only the inner for loop actually adds anything to order2_bases and is able to violate the constraint. This means only the first one will ever be used.
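One way to sidestep both the separate xi*xi branch and the nested break statements is to generate all pairs, squares included, from a single iterator and apply the cap once. This is a sketch of that idea and not FFX's code: bases are represented as plain strings, and `order2_pairs` is a hypothetical name.

```python
from itertools import combinations_with_replacement, islice

def order2_pairs(order1_bases, max_pairs):
    """Return at most max_pairs pairs of order-1 bases, including squares."""
    pairs = combinations_with_replacement(order1_bases, 2)
    return list(islice(pairs, max_pairs))

bases = ["x0", "x1", "x2"]
pairs = order2_pairs(bases, 4)
```

Because the cap is applied to one flat sequence, no loop can overshoot it, and squared terms such as ("x0", "x0") need no special case.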

TypeError: slice indices must be integers or None or have an __index__ method

Hi, just tried the example. Alas, no success ...

# now trying ffx
import numpy as np
import ffx

train_X = np.array( [ (1.5,2,3), (4,5,6) ] ).T
train_y = np.array( [1,2,3])

test_X = np.array( [ (5.241,1.23, 3.125), (1.1,0.124,0.391) ] ).T
test_y = np.array( [3.03,0.9113,1.823])

models = ffx.run(train_X, train_y, test_X, test_y, ["a", "b"])
for model in models:
    yhat = model.simulate(test_X)
    print(model)

I'm running:
numpy version 1.12.1
scikit-learn version 0.18.1
Python 3.6.1 |Anaconda 4.4.0 (x86_64)| (default, May 11 2017, 13:04:09)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin

Trace:

TypeError Traceback (most recent call last)
in ()
9 test_y = np.array( [3.03,0.9113,1.823])
10
---> 11 models = ffx.run(train_X, train_y, test_X, test_y, ["a", "b"])
12 for model in models:
13 yhat = model.simulate(test_X)

/Users/thiemo/miniconda3/envs/py36_64/lib/python3.6/site-packages/ffx/api.py in run(train_X, train_y, test_X, test_y, varnames, verbose)
2
3 def run(train_X, train_y, test_X, test_y, varnames=None, verbose=False):
----> 4 return core.MultiFFXModelFactory().build(train_X, train_y, test_X, test_y, varnames, verbose)

/Users/thiemo/miniconda3/envs/py36_64/lib/python3.6/site-packages/ffx/core.py in build(self, train_X, train_y, test_X, test_y, varnames, verbose)
557 ss = FFXBuildStrategy(approach)
558
--> 559 next_models = FFXModelFactory().build(train_X, train_y, ss, varnames, verbose)
560
561 # set test_nmse on each model

/Users/thiemo/miniconda3/envs/py36_64/lib/python3.6/site-packages/ffx/core.py in build(self, X, y, ss, varnames, verbose)
709 target_train_nmse = 0.01
710 models = self._basesToModels(
--> 711 ss, varnames, order1_bases, X, y, max_num_bases, target_train_nmse, verbose)
712 if models is None: # fit failed.
713 model = ConstantModel(y[0], 0)

/Users/thiemo/miniconda3/envs/py36_64/lib/python3.6/site-packages/ffx/core.py in _basesToModels(self, ss, varnames, bases, X, y, max_num_bases, target_train_nmse, verbose)
829 # compute models.
830 models = self._pathwiseLearn(ss, varnames, bases, X, regress_X, y,
--> 831 max_num_bases, target_train_nmse, verbose)
832 return models
833

/Users/thiemo/miniconda3/envs/py36_64/lib/python3.6/site-packages/ffx/core.py in _pathwiseLearn(self, ss, varnames, bases, X_orig, X_orig_regress, y_orig, max_num_bases, target_nmse, verbose, **fit_params)
863 st, fin = numpy.log10(alpha_max * ss.eps()), numpy.log10(alpha_max)
864 alphas1 = numpy.logspace(
--> 865 st, fin, num=ss.numAlphas() * 10)[::-1][:ss.numAlphas() / 4]
866 alphas2 = numpy.logspace(st, fin, num=ss.numAlphas())
867 alphas = sorted(set(alphas1).union(alphas2), reverse=True)

TypeError: slice indices must be integers or None or have an __index__ method
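The failing expression in the trace is the slice [:ss.numAlphas() / 4]. Under Python 3 the / operator always returns a float, and a float is not accepted as a slice index. The sketch below reproduces the error in isolation and shows floor division as a likely fix. The value num_alphas = 10 and the logspace bounds are made up for illustration; they are not taken from ffx.

```python
import numpy as np

num_alphas = 10  # hypothetical stand-in for ss.numAlphas()
grid = np.logspace(-3, 0, num=num_alphas * 10)[::-1]

message = None
try:
    grid[:num_alphas / 4]  # Python 3: 10 / 4 == 2.5, a float
except TypeError as err:
    message = str(err)

# Floor division keeps the slice bound an integer.
alphas1 = grid[:num_alphas // 4]

print(message)
print(len(alphas1))
```

Applying the same change to line 865 of core.py (// in place of /) would be the corresponding edit, assuming integer truncation is what the Python 2 code relied on.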

AttributeError: module 'ffx' has no attribute 'run'

The error occurs when I simply try to run the example in the readme. I'm running Anaconda 4.4.0, Python 3.6. Is the Python version the issue?
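The report does not say what causes this. One possibility, stated here purely as an assumption, is that Python is importing something other than the installed package (for example a local file or folder named ffx). A small diagnostic along these lines shows which file a module was loaded from and whether it exposes a given attribute. The helper name describe_module is hypothetical.

```python
import importlib


def describe_module(name, attr):
    """Report where a module was imported from and whether it has attr."""
    module = importlib.import_module(name)
    return {
        "file": getattr(module, "__file__", None),
        "has_attr": hasattr(module, attr),
    }


# Demonstrated on a standard-library module so the snippet runs anywhere;
# for this issue the call would be describe_module("ffx", "run").
info = describe_module("json", "dumps")
print(info)
```

If the reported file is not inside site-packages, the import is probably being shadowed.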

How to troubleshoot a bad model fit?

Hello,

I tried to use your library with some measurement data but the model obtained is not a good fit.

The x and y values are already normalized. I have tried several different train/test partitions and raised max_iter to 50,000, but beyond that I don't know what else to do. Is the number of data points too small? Does ffx not handle non-deterministic models (with some random noise)? Any help would be very much appreciated.

Here is some sample code:

import ffx
import numpy as np
import matplotlib.pyplot as plt

# Data to model, two measurements of 'y' for each 'x' value
x = np.array([[-1.392], [-0.985], [-0.308], [0.293], [1.046], [1.347], [-1.392], [-0.985], [-0.308], [ 0.293], [ 1.046], [ 1.347]])
y = np.array([[-1.691], [-0.925], [ 0.109], [0.768], [0.826], [0.829], [-1.673], [-1.049], [ 0.123], [ 0.833], [ 0.947], [ 0.903]])

# Plot y vs x
fig, ax = plt.subplots(1)
ax.scatter(x, y, facecolor='b', marker='o')

# Split into train and test sets; two possibilities:
if False:  # Alternate samples go to the train set
    x_train = x[1:12:2].reshape((6, 1))
    y_train = y[1:12:2].reshape((6, 1))

    x_test = x[0:12:2].reshape((6, 1))
    y_test = y[0:12:2].reshape((6, 1))
else:  # Each 'x' value appears in the train set
    x_train = x[0:6].reshape((6, 1))
    y_train = y[0:6].reshape((6, 1))

    x_test = x[6:12].reshape((6, 1))
    y_test = y[6:12].reshape((6, 1))

# Plot train/test sets
fig, ax = plt.subplots(1)
ax.scatter(x_train, y_train, facecolor='b', marker='o', label='train')
ax.scatter(x_test,  y_test,  facecolor='b', marker='x', label='test')
ax.legend()

# max_iter changed to 50000 in model_factories.py
models = ffx.run(x_train, y_train, x_test, y_test, varnames=['x'])

for model in models:
    yhat = model.simulate(x_test)
    print(model)

    fig, ax = plt.subplots(1)
    ax.scatter(x, y, facecolor='b', marker='o', label='measurement')
    ax.scatter(x_test, yhat, facecolor='r', marker='x', label='model')
    ax.legend()

The models I obtain with the first partition are:
0.227
0.187 + 0.179*x

and with the second partition are:
-0.0140
0.0116 / (1.0 - 0.150*abs(x))

Do you have any ideas about what I should try? Any help would be much appreciated.
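One way to put numbers on how far off the reported models are is to evaluate the printed expressions on the posted data and compare them with a plain least-squares line. The sketch below does that with NumPy only. The mse helper and the least-squares baseline are additions for illustration; the metric is a simple mean squared error over all 12 points, not the NMSE that ffx computes internally.

```python
import numpy as np

# Data copied from the post (two measurements of y for each x value).
x = np.array([-1.392, -0.985, -0.308, 0.293, 1.046, 1.347] * 2)
y = np.array([-1.691, -0.925, 0.109, 0.768, 0.826, 0.829,
              -1.673, -1.049, 0.123, 0.833, 0.947, 0.903])


def mse(y_true, y_pred):
    """Mean squared error; an illustrative metric, not ffx's own NMSE."""
    return float(np.mean((y_true - y_pred) ** 2))


# Models printed in the post, re-typed by hand as NumPy expressions.
reported = {
    "0.227": np.full_like(x, 0.227),
    "0.187 + 0.179*x": 0.187 + 0.179 * x,
    "-0.0140": np.full_like(x, -0.0140),
    "0.0116 / (1.0 - 0.150*abs(x))": 0.0116 / (1.0 - 0.150 * np.abs(x)),
}

# Baseline: ordinary least-squares straight line fitted to all 12 points.
slope, intercept = np.polyfit(x, y, deg=1)
baseline_mse = mse(y, slope * x + intercept)

errors = {name: mse(y, pred) for name, pred in reported.items()}
for name, err in errors.items():
    print(f"{name:32s} MSE = {err:.3f}")
print(f"{'least-squares line (baseline)':32s} MSE = {baseline_mse:.3f}")
```

If the baseline error comes out well below the errors of the reported models, the data itself is probably not the obstacle, and the regularization path or the stopping criteria would be the next things to inspect.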