cavalab / ellyn

Python-wrapped version of ellen, a linear genetic programming system for symbolic regression and classification.

Home Page: http://cavalab.org/ellyn

License: Other

C++ 88.26% C 1.94% Makefile 0.26% Shell 0.21% Python 9.32%

ellyn's People

arita37 lacava codeaudit kel85uk vishalbelsare zxp-proteus pizzooid tianmuwang allan2 zendra123

ellyn's Issues

Lexicase verbose

When running ellyn without verbose output, it still constantly prints information about lexicase selection (see below).
Is it possible to turn off these prints with a verbosity flag somewhere?
If not, which cpp file do they originate from?

Best regards

lexpool = 1 getting lexpool lexpool = 1 getting lexpool lexpool = 1 getting lexpool lexpool = 1 getting lexpool lexpool = 1
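It may not be possible to turn the messages off from Python. In that case, one workaround is to silence them at the file-descriptor level. This is an assumption-based workaround, not an ellyn option. It assumes the `lexpool = 1 getting lexpool` lines are written by the compiled C++ core straight to the process's standard output (file descriptor 1). `contextlib.redirect_stdout` cannot catch such output, because it only swaps Python's `sys.stdout` object. The names `suppress_fd_output` and `_flush_c_stdio` are hypothetical helpers.

```python
import contextlib
import ctypes
import os
import sys


def _flush_c_stdio():
    """Flush C-level stdio buffers (POSIX); silently skipped where libc is not loadable."""
    try:
        ctypes.CDLL(None).fflush(None)  # fflush(NULL) flushes every open C stream
    except (OSError, AttributeError, TypeError):
        pass


@contextlib.contextmanager
def suppress_fd_output(fd=1):
    """Temporarily point file descriptor `fd` (1 = stdout) at os.devnull.

    Catches output written directly by native code (printf, std::cout),
    which bypasses Python's sys.stdout object.
    """
    sys.stdout.flush()                 # push pending Python output out first
    _flush_c_stdio()
    saved_fd = os.dup(fd)              # remember where fd currently points
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, fd)        # fd now writes to the null device
        yield
    finally:
        _flush_c_stdio()               # discard buffered native output while still redirected
        os.dup2(saved_fd, fd)          # restore the original target
        os.close(devnull_fd)
        os.close(saved_fd)
```

Usage: wrap only the noisy call, e.g. `with suppress_fd_output(): learner.fit(X, y)`. Everything the native code prints during that call is discarded, including output you may actually want. A proper fix would therefore still be to guard the print statements in the C++ source.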

Pareto archive isn't being updated with islands=True and g=0

Code to reproduce the issue:

from ellyn import ellyn
from sklearn.preprocessing import LabelEncoder
from pmlb import fetch_data

problem = 'breast-cancer-wisconsin'
X, y = fetch_data(problem, return_X_y = True)

y = LabelEncoder().fit_transform(y)

# Run 1: islands=False, g=0 -> works
learner = ellyn(g=0, popsize=100, selection='tournament', classification=True,
                islands=False, num_islands=8, stop_threshold=0.01, AR=False, tourn_size=3,
                rt_cross=0.5, rt_mut=0.2, verbosity=2, random_state=42,
                prto_arch_on=True, class_m4gp=True)
learner.fit(X, y)
fitness_score = learner.score(X, y)
print(fitness_score)

# Run 2: identical settings except islands=True -> fails
learner = ellyn(g=0, popsize=100, selection='tournament', classification=True,
                islands=True, num_islands=8, stop_threshold=0.01, AR=False, tourn_size=3,
                rt_cross=0.5, rt_mut=0.2, verbosity=2, random_state=42,
                prto_arch_on=True, class_m4gp=True)
learner.fit(X, y)  # fails with islands=True and g=0

Stdout and Stderr

_______________________________________________________________________________
                                    ellenGP
_______________________________________________________________________________
Results Path: /home/weixuanf/AI/ellyn/ellyn
parameter name: ellenGP
data file: d
Settings:
Evolutionary Method: Standard Tournament
ERCs on
Total Population Size: 100
Maximum Generations: 0
Number of log points: 0 (0 means log all points)
fitness type: MSE
verbosity: 2
Number of threads: 48
seeds:
1496330673
2660479011
1025990726
1884380119
1746222507
3630602626
111734222
582074169
1690355396
2910370845
3574735515
3242553180
3186686069
1940247230
194024723
4212676795
1552197784
3048528457
776098892
2078404842
831966003
2466454288
55867111
443916557
388049446
2134271953
3380710792
2992661346
1164148338
3824627349
4156809684
2716346122
3768760238
4018652072
2328296676
1414040172
2522321399
3104395568
1220015449
249891834
637941280
1608064895
1358173061
970123615
3962784961
2854503734
2272429565
3436577903
 number of evals: 100

 Program finished sucessfully.
Storing DistanceClassifier...
final model(s):
        ['x_6', 'x_13']
[best]  ['x_1', 'x_2', '-0.982']
        ['x_7', '0.683', 'x_7', '0.654']
0.88927943761
_______________________________________________________________________________
                                    ellenGP
_______________________________________________________________________________
Results Path: /home/weixuanf/AI/ellyn/ellyn
parameter name: ellenGP
data file: d
Settings:
Evolutionary Method: Standard Tournament
ERCs on
Total Population Size: 100
Maximum Generations: 0
Number of log points: 0 (0 means log all points)
fitness type: MSE
verbosity: 2
Number of threads: 8
seeds:
1496330673
1884380119
3380710792
1690355396
3186686069
388049446
2992661346
194024723
8 islands of 12 individuals, total pop 96
thread 2 exited while loop...
thread 1 exited while loop...
thread 6 exited while loop...
thread 7 exited while loop...
thread 3 exited while loop...
thread 5 exited while loop...
thread 0 exited while loop...
thread 4 exited while loop...
exited parallel region ...

 Program finished sucessfully.
Traceback (most recent call last):
  File "/home/weixuanf/anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
AttributeError: 'list' object has no attribute 'argmax'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ellyn_islands_tests.py", line 19, in <module>
    learner.fit(X, y) # fail with islands=Ture but g==1
  File "/home/weixuanf/AI/ellyn/ellyn/ellyn.py", line 186, in fit
    self.best_estimator_ = self.hof[np.argmax(self.fit_v)]
  File "/home/weixuanf/anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 963, in argmax
    return _wrapfunc(a, 'argmax', axis=axis, out=out)
  File "/home/weixuanf/anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 67, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/home/weixuanf/anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 47, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
ValueError: attempt to get argmax of an empty sequence
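The traceback shows that `fit` calls `np.argmax(self.fit_v)` on an empty list. In other words, nothing was stored in the archive (`self.hof`) when islands are enabled and `g=0`. The sketch below is a hypothetical defensive helper, not the library's code. It keeps the same selection rule as line 185 of `ellyn.py` but fails with an explicit message. That makes the root cause easier to spot: the archive not being filled in the island code path.

```python
import numpy as np


def select_best(hof, fit_v):
    """Return the archived model with the highest validation fitness.

    `hof` and `fit_v` mirror the attribute names in the traceback;
    this helper is an illustrative sketch, not ellyn's implementation.
    """
    if len(hof) == 0 or len(fit_v) == 0:
        raise RuntimeError(
            "Pareto archive is empty: no models were stored during fit "
            "(observed with islands=True and g=0)."
        )
    if len(hof) != len(fit_v):
        raise ValueError("hof and fit_v must have the same length")
    return hof[int(np.argmax(fit_v))]  # same rule as self.hof[np.argmax(self.fit_v)]
```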

Regression parameter problem

Thank you for providing Python code, great work!
I am a beginner. I attempted to reproduce the bacterial respiration system using the d_bacres1.txt and d_bacres2.txt files you provided, but failed. My code is attached below. I found that many parameters can be set in ellyn(). Which ones are necessary? How can I determine a suitable range for each of them? I also found that the results differ from run to run. Is there a way to make the output reproducible?
Looking forward to your reply, thank you!

from ellyn import ellyn
import numpy as np
import pandas as pd

df = pd.DataFrame()
#with open("D:\software\software\\anaconda3\envs\ellyn-master\data\\d_bacres1.txt") as file:
with open("d_bacres1.txt") as file:
    for item in file:
        data = np.array(list(map(float,item.split(',')))).reshape((3, 1)).T
        #print(item)
        data1 = pd.DataFrame(data)
        #df = df.append(data1,ignore_index=True)
        df = pd.concat([df, data1], ignore_index=True)
df1 = pd.DataFrame()  # create an empty DataFrame
#with open("D:\software\software\\anaconda3\envs\ellyn-master\data\\d_bacres2.txt") as file:
with open("d_bacres2.txt") as file:
    for item in file:
        data = np.array(list(map(float,item.split(',')))).reshape((3, 1)).T
        #print(item)
        data1 = pd.DataFrame(data)
        #df = df.append(data1,ignore_index=True)
        df1 = pd.concat([df1, data1], ignore_index=True)

inpu = df.iloc[:,1:3].values 
oupu = df.iloc[:,0].values

learner = ellyn(g=100, popsize=100, selection='tournament', tourn_size=4, cross_ar=0.8, mut_ar=0.2,
                verbosity=1, op_list=['n', 'v', '+', '-', '*', '/'])
learner.fit(inpu, oupu)
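On the data loading: the `reshape((3, 1))` implies that each line of the d_bacres files holds three comma-separated numbers. The code above builds the DataFrame one row at a time with `pd.concat`, which is slow and verbose. Below is a more compact sketch using `np.loadtxt`. Like the code above, it assumes column 0 is the target and columns 1-2 are the inputs. The function name `load_xy` is hypothetical.

```python
import numpy as np


def load_xy(source):
    """Load a comma-separated numeric file; column 0 is y, columns 1-2 are X.

    The column layout is taken from the indexing used above
    (df.iloc[:, 0] for the output and df.iloc[:, 1:3] for the inputs).
    """
    data = np.loadtxt(source, delimiter=",", ndmin=2)  # shape (n_rows, 3)
    y = data[:, 0]
    X = data[:, 1:3]
    return X, y
```

Usage, assuming the files are in the working directory: `X, y = load_xy("d_bacres1.txt")`, followed by `learner.fit(X, y)`. For repeatable runs, the first thing to try is the `random_state` argument used elsewhere on this page (e.g. `random_state=42`). Note, though, that another issue on this page reports it may not fully remove run-to-run variation.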

Add custom primitive functions

Thanks for your work.
When doing regression, there are primitive functions such as cos, sin and exp for building the regression tree.
Is there a way to add custom primitive functions to this list?

It would help accelerate convergence on specific problems by renormalizing the primitives and reducing the tree depth.
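To make the request concrete: ellyn's programs appear to be stack-based (the wrapper converts them with `stack_2_eqn`, which shows up in a traceback further down this page). Adding a primitive then amounts to registering a new name, with an arity and a function, in the evaluator. The sketch below is a hypothetical pure-Python postfix evaluator that illustrates this idea. `PRIMITIVES`, `add_primitive` and `eval_postfix` are invented names, not ellyn's API; a real change would have to be made in the C++ core.

```python
import math

# Hypothetical primitive table: name -> (arity, function). Not ellyn's API.
PRIMITIVES = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "/": (2, lambda a, b: a / b if b != 0 else 1.0),  # protected division, a common GP convention
    "cos": (1, math.cos),
    "sin": (1, math.sin),
    "exp": (1, math.exp),
}


def add_primitive(name, arity, func):
    """Register a user-defined primitive with a fixed arity (>= 1)."""
    if arity < 1:
        raise ValueError("arity must be at least 1")
    PRIMITIVES[name] = (arity, func)


def eval_postfix(program, variables):
    """Evaluate a postfix (stack) program given as a list of tokens.

    Tokens are primitive names, variable names, or numeric constants.
    Operators without enough operands on the stack are skipped, a
    convention used by many stack-based GP systems.
    """
    stack = []
    for token in program:
        if token in PRIMITIVES:
            arity, func = PRIMITIVES[token]
            if len(stack) < arity:
                continue  # not enough operands: skip this operator
            args = stack[-arity:]
            del stack[-arity:]
            stack.append(func(*args))
        elif token in variables:
            stack.append(variables[token])
        else:
            stack.append(float(token))  # numeric constant
    return stack[-1] if stack else float("nan")
```

For example, after `add_primitive("sq", 1, lambda a: a * a)`, the program `["x_1", "sq", "2", "+"]` evaluates to 11.0 when `x_1 = 3.0`.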

ModuleNotFoundError: No module named 'ellyn.elgp'

Hi.
I am a GP beginner.
Thank you for sharing this GP code.

However, the installation of "ellyn.elgp" did not complete correctly.
After installation, Python prints the error message below.

(ellyn-env) renpoo@ubuntu:/mnt/hgfs/myWiki/Projects/ellyn$ python
Python 3.9.1 | packaged by conda-forge | (default, Dec 21 2020, 22:08:58) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ellyn import ellyn
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/hgfs/myWiki/Projects/ellyn/ellyn/__init__.py", line 1, in <module>
    from .ellyn import ellyn
  File "/mnt/hgfs/myWiki/Projects/ellyn/ellyn/ellyn.py", line 21, in <module>
    import ellyn.elgp as elgp
ModuleNotFoundError: No module named 'ellyn.elgp'

My environment is

Ubuntu 20.04 on VMware (a clean install prepared specifically for ellyn)
Anaconda 4.9.2

Could you tell me how to fix this?
Thanks in advance.
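One thing worth checking: the traceback shows Python importing `ellyn` from the source checkout (`/mnt/hgfs/myWiki/Projects/ellyn/ellyn/__init__.py`). If the compiled `elgp` extension was built or installed somewhere else, starting Python from inside the repository can shadow the installed package. The diagnostic sketch below uses only the standard library. It assumes the extension file follows the usual CPython naming (`elgp.*.so`, or `.pyd` on Windows); `find_extension` is a hypothetical helper.

```python
import importlib.util
import pathlib


def find_extension(package, module):
    """Locate `package` without importing it and list compiled files for `module`.

    Returns (package_dir, [matching file names]), or (None, []) if not found.
    """
    spec = importlib.util.find_spec(package)  # top-level lookup; does not run __init__.py
    if spec is None or not spec.submodule_search_locations:
        return None, []
    pkg_dir = pathlib.Path(list(spec.submodule_search_locations)[0])
    matches = sorted(
        p.name for p in pkg_dir.glob(module + ".*") if p.suffix in (".so", ".pyd")
    )
    return pkg_dir, matches
```

Usage: `print(find_extension("ellyn", "elgp"))`. Suppose the printed directory is the repository's `ellyn/ellyn` folder and the list is empty. In that case, try starting Python from a different directory so that the installed copy is picked up.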

update default configurations for python wrapper

It would be useful to have a derived class to represent M4GP, and also to update the default arguments to those used in the relevant publications.
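Until such a class exists, a factory function is a lightweight way to get M4GP-style defaults. The sketch below is hypothetical. Its default values are simply the M4GP-related settings that reporters use elsewhere on this page (`classification=True`, `class_m4gp=True`, `prto_arch_on=True`, `selection='lexicase'`). They are not the settings from the publications, which would still need to be looked up.

```python
# Hypothetical defaults, copied from settings used in issues on this page.
M4GP_DEFAULTS = {
    "classification": True,
    "class_m4gp": True,
    "prto_arch_on": True,
    "selection": "lexicase",
}


def make_m4gp(estimator_cls, **overrides):
    """Build an estimator with M4GP-style defaults; keyword overrides win."""
    params = {**M4GP_DEFAULTS, **overrides}
    return estimator_cls(**params)
```

Usage: `from ellyn import ellyn; clf = make_m4gp(ellyn, random_state=0)`. A factory is used here instead of a subclass whose `__init__` takes `**kwargs`. The reason is that scikit-learn's `get_params` only sees parameters listed explicitly in the `__init__` signature, and so do `clone` and grid search, which rely on it.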

Some problems with elgp

I am trying to add the ellyn package to my Python environment. I think the package installed successfully, since ellyn appears in "conda list" and in PyCharm. However, importing it always fails with: ImportError: DLL load failed while importing elgp. Since elgp is on the path, I suspect an inconsistency between the versions of the dependent libraries. My versions are python=3.9.6, boost=1.74.0, Eigen=3.3.7 and ellyn=0.2.6. Has anyone solved a similar problem?
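One possible cause worth ruling out (an assumption, not a confirmed diagnosis): since Python 3.8, Windows no longer searches `PATH` for the DLLs that an extension module depends on. Boost DLLs that are "on the path" may therefore still not be found when `elgp` is loaded. Such directories can be registered explicitly with `os.add_dll_directory` before the import. `add_dll_dirs` is a hypothetical helper.

```python
import os
import sys


def add_dll_dirs(dirs):
    """Register existing directories for DLL lookup on Windows (Python >= 3.8).

    Returns the handles from os.add_dll_directory; keep them referenced
    while the extension is in use. Does nothing on other platforms.
    """
    handles = []
    if sys.platform != "win32" or not hasattr(os, "add_dll_directory"):
        return handles
    for d in dirs:
        if os.path.isdir(d):  # skip paths that do not exist
            handles.append(os.add_dll_directory(d))
    return handles
```

Usage: call `add_dll_dirs([r"C:\path\to\boost\lib"])` before `from ellyn import ellyn`; the path is a placeholder for the folder that holds the Boost DLLs. If the import still fails, a mismatch with the Boost version `elgp` was built against remains possible.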

Uniqueness of solution

Thanks a lot for this python version of ellenGP!

I have been playing with a simple synthetic dataset, something along the lines of y = polynomial(x).
Ideally I would like the equation to be recovered exactly, but often this doesn't happen. Is there a way to get ellyn to converge to a unique solution, at least for synthetic datasets created from simple closed-form expressions?

Since I couldn't resolve the uniqueness issue, I tried, as an academic exercise, to "force" ellyn to give a "unique" solution by controlling the randomness involved in the process. As a first step, I was able to control the train_test_split step inside ellyn by setting random_state="some-seed-number". However, this alone isn't sufficient, because there is also randomness in population generation. Is there a way to freeze or control this randomness?
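Exact recovery is hard to judge for one reason in particular. GP can return expressions that are algebraically equal to the target but written differently, for example `x*(x+1)` versus `x**2 + x`, so comparing equation strings understates success. The sketch below checks functional equivalence numerically on random sample points using NumPy. It is a heuristic check, not a symbolic proof, and `equivalent_on_samples` is a hypothetical helper.

```python
import numpy as np


def equivalent_on_samples(f, g, low=-5.0, high=5.0, n=200, seed=0, rtol=1e-6, atol=1e-9):
    """Return True if f and g agree (within tolerance) on n random points in [low, high)."""
    rng = np.random.default_rng(seed)  # fixed seed so the check itself is repeatable
    x = rng.uniform(low, high, size=n)
    return bool(np.allclose(f(x), g(x), rtol=rtol, atol=atol))
```

For instance, `equivalent_on_samples(lambda x: x**2 + x, lambda x: x * (x + 1))` is `True`, while comparing `x**2 + x` with `x**2` gives `False`.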

makefile configuration

Hi...

I'm having trouble configuring the Makefile in the "ellen" repo.
I changed line 2 to point to my folders:
CFLAGS = -c -fPIC -I/usr/include/eigen3 -I/usr/include/python3.6m -std=c++0x -fopenmp -Ofast

I also changed the following lines:
Line 6:
LDFLAGS = -std=c++0x -fopenmp -Ofast -shared -lpython3.6m -L/home/andres/boost_1_62_0/stage/lib/
Line 7:
LDFLAGS2 = -lboost_python3 -Wl,-rpath,'/home/andres/boost_1_62_0/stage/lib/'

Could you help me configure the file correctly?

Greetings and thanks.
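Rather than hard-coding the Python paths in the Makefile, you can query them from the interpreter that will import the module. This avoids mismatches such as building against 3.6 headers for a different Python. The sketch below uses the standard `sysconfig` module; similar information is available from `python3-config --includes --ldflags`. The Eigen and Boost paths remain machine-specific and are not covered. `python_build_flags` is a hypothetical helper.

```python
import sysconfig


def python_build_flags():
    """Return (include_flag, link_flags) for the running Python interpreter."""
    include_dir = sysconfig.get_paths()["include"]  # directory containing Python.h
    # LDVERSION is e.g. "3.6m" or "3.9"; it can be None on Windows builds
    ldversion = sysconfig.get_config_var("LDVERSION") or sysconfig.get_python_version()
    libdir = sysconfig.get_config_var("LIBDIR")     # where libpythonX.Y is installed
    include_flag = f"-I{include_dir}"
    link_flags = f"-lpython{ldversion}"
    if libdir:
        link_flags += f" -L{libdir}"
    return include_flag, link_flags


if __name__ == "__main__":
    inc, link = python_build_flags()
    print("CFLAGS  += " + inc)
    print("LDFLAGS += " + link)
```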

The poor performance of M4GP for classification

What are the best practices for using M4GP on classification problems? I trained a classifier on the "soybean" dataset and found that it performs much worse than KNN. How should I tune the parameters to improve its performance?
Code:

import numpy as np
from pmlb import fetch_data
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

from ellyn import ellyn

dataset = 'soybean'
data = fetch_data(dataset, return_X_y=False)
x = data.drop('target', axis=1).values
y = data['target'].values
print('y unique:', np.unique(y))
print('x:', x)

le = LabelEncoder()
yle = le.fit_transform(y)
print('yle unique:', np.unique(yle))

X_train, X_test, y_train, y_test = train_test_split(x, yle,
                                                    test_size=0.2, random_state=0)
e = ellyn(g=200, popsize=50, classification=True, verbosity=2,
          selection='lexicase')
e.fit(X_train, y_train)
print(e.predict(X_test))
print('M4GP', accuracy_score(y_test, e.predict(X_test)))

e = KNeighborsClassifier()
e.fit(X_train, y_train)
print('KNN', accuracy_score(y_test, e.predict(X_test)))

Result:

M4GP 0.17777777777777778
KNN 0.762962962962963
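With a single 80/20 split on a small, many-class dataset like soybean, accuracy estimates are noisy. A fairer comparison uses stratified k-fold cross-validation with identical folds for every model. The sketch below uses scikit-learn, and `cv_accuracy` is a hypothetical helper. It assumes, without verification here, that `ellyn` behaves as a scikit-learn estimator that can be cloned (it exposes `fit`/`predict`/`score` above), so it can be passed to the same helper.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def cv_accuracy(estimator, X, y, n_splits=5, seed=0):
    """Mean and std of accuracy over shuffled, stratified k-fold CV."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(estimator, X, y, cv=cv, scoring="accuracy")
    return float(np.mean(scores)), float(np.std(scores))


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)  # stand-in dataset for a quick demo
    mean, std = cv_accuracy(KNeighborsClassifier(), X, y)
    print(f"KNN accuracy: {mean:.3f} +/- {std:.3f}")
```

On soybean, the same folds would be used for `cv_accuracy(KNeighborsClassifier(), x, yle)` and for the ellyn model. `StratifiedKFold` warns when a class has fewer members than `n_splits`, which may happen on this dataset.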

Fail to run M4GP on some datasets

M4GP crashes when executing the following code, but no error message appears. It classifies the "iris" dataset without problems but fails on the "soybean" dataset. What should I do?

import os
from pathlib import Path

from pmlb import fetch_data
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from ellyn import ellyn

# iris = load_iris()
# x = iris.data
# y = iris.target

dataset = 'soybean'
local_dir = os.path.join(Path.home(), "pmlb_dataset")
data = fetch_data(dataset, return_X_y=False, local_cache_dir=local_dir)
x = data.drop('target', axis=1).values
y = data['target'].values
print(x.shape)

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
e = ellyn(g=5, popsize=5, classification=True)
e.fit(X_train, y_train)
print(e.predict(X_test))
print(accuracy_score(y_test, e.predict(X_test)))
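The crash happens inside the compiled code without a Python error, so it can help to rule out input problems first. Unlike the previous issue, this snippet passes `y` to ellyn without `LabelEncoder`. The sketch below checks three things that commonly cause native-code failures: finite numeric features, labels encoded as consecutive integers 0..k-1, and very rare classes. Whether ellyn actually requires each of these is an assumption, and `check_classification_inputs` is a hypothetical helper.

```python
import numpy as np


def check_classification_inputs(X, y):
    """Return (list of problems found, y re-encoded as 0..k-1)."""
    X = np.asarray(X, dtype=float)  # raises ValueError on non-numeric entries
    problems = []
    if not np.all(np.isfinite(X)):
        problems.append("X contains NaN or infinite values")
    classes, y_encoded, counts = np.unique(
        np.asarray(y), return_inverse=True, return_counts=True
    )
    if classes.dtype.kind not in "iu" or not np.array_equal(
        classes, np.arange(len(classes))
    ):
        problems.append("labels are not consecutive integers 0..k-1 (try LabelEncoder)")
    n_rare = int(np.sum(counts < 2))
    if n_rare:
        problems.append(f"{n_rare} class(es) have fewer than 2 samples")
    return problems, y_encoded
```

Usage: `problems, y_enc = check_classification_inputs(x, y)`; print `problems`, and if the list is non-empty, retry the fit with `y_enc`.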

Example of the seeds parameter

Thank you for the work, it's amazing!

Could you please elaborate on the seeds parameter? The documentation describes it as follows:

Seed GP initialization with partial solutions, e.g. (x+y). Each partial solution must be enclosed in parentheses.

But when I use code like this

learner = ellyn(g=2, popsize=200, verbosity=2, num_islands=2,
                scoring_function='r2', max_len=20, seeds='(x+y)')

or this

learner = ellyn(g=2, popsize=200, verbosity=2, num_islands=2,
                scoring_function='r2', max_len=20, seeds='((x_4)*(x_67))')

I get this error message:

Traceback (most recent call last):
  File "train_ellyn.py", line 18, in <module>
    learner.fit(X_train, y_train)
  File "/opt/notebooks/ellyn.py", line 213, in fit
    print('best model:',self.stack_2_eqn(self.best_estimator_))
  File "/opt/notebooks/ellyn.py", line 383, in stack_2_eqn
    return stack_eqn[-1]
IndexError: list index out of range
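The IndexError at `return stack_eqn[-1]` suggests that `stack_2_eqn` produced an empty equation stack, i.e. the seeded run ended without a usable best model. Before digging into the parser, a quick sanity check on the seed string can rule out simple mistakes. The sketch below is hypothetical and checks only assumptions inferred from this page:

- the seed starts with `(` and ends with `)`, as the documentation asks;
- its parentheses balance;
- any `x_<i>` variable refers to an existing column. This uses the naming seen in ellyn's model output and assumes 0-based indices.

Note that the documentation's own example uses `x` and `y`, so it is unclear which variable names the parser accepts.

```python
import re


def check_seed(seed, n_features):
    """Return a list of problems found in one seed string (empty if none)."""
    problems = []
    depth = 0
    for ch in seed:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.append("unbalanced ')'")
                break
    if depth > 0:
        problems.append("unclosed '('")
    # only checks the first and last characters, not full enclosure
    if not (seed.startswith("(") and seed.endswith(")")):
        problems.append("seed is not enclosed in parentheses")
    for idx in re.findall(r"x_(\d+)", seed):  # assumes 0-based x_<index> names
        if int(idx) >= n_features:
            problems.append(f"x_{idx} out of range for {n_features} features")
    return problems
```

For example, `check_seed('((x_4)*(x_67))', X_train.shape[1])` flags `x_67` if the training data has 67 or fewer columns.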

Problem with random_state

I really like your project and implementation. I wanted to use ellyn as a classifier, as below:

learner = ellyn(classification=True,
                random_state=101,
                class_m4gp=True,
                prto_arch_on=True,
                selection='lexicase',
                fit_type='F1')  # can be 'F1' or 'F1W' (weighted F1)

However, I get a different result every time I run it, so y_pred differs between runs. Could you please clarify what I am doing wrong?
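A quick way to confirm whether the variation comes from the estimator itself is to fit it twice with the same `random_state` on the same data and compare the predictions. The helper below is a generic sketch that works with any object exposing `fit` and `predict`; `make_estimator` is a hypothetical zero-argument factory. If it returns `False` for ellyn with a fixed `random_state`, the nondeterminism lies inside the library. One candidate worth ruling out is multi-threaded evaluation (the logs on this page report the number of threads used), though that is not a confirmed cause.

```python
import numpy as np


def is_reproducible(make_estimator, X, y, n_runs=2):
    """Fit fresh estimators n_runs times and check that predictions are identical."""
    preds = []
    for _ in range(n_runs):
        est = make_estimator()  # new instance each run, same settings
        est.fit(X, y)
        preds.append(np.asarray(est.predict(X)))
    return all(np.array_equal(preds[0], p) for p in preds[1:])
```

Usage: `is_reproducible(lambda: ellyn(classification=True, random_state=101, class_m4gp=True, prto_arch_on=True, selection='lexicase', fit_type='F1'), X_train, y_train)`.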