darioizzo / dcgp Goto Github PK

Implementation of a differentiable CGP (Cartesian Genetic Programming)

License: GNU General Public License v3.0

CMake 8.02% C++ 84.39% Python 6.15% Shell 1.08% PowerShell 0.37%

dcgp's Issues

Python Getting Started is broken

Invoking https://github.com/darioizzo/dcgp/blob/master/doc/examples/getting_started.py

yields

Traceback (most recent call last):
  File "getting_started.py", line 20, in <module>
    print("Expression:", ex(in_sym)[0])
NameError: name 'in_sym' is not defined

and after defining in_sym = ["x"] the sympy module cannot be found.

Where can I find a requirements.txt or a similar list of the Python dependencies that get installed, and are there additional dependencies needed for the examples to work?

expression.simplify with weight substitution does not work

Setup: dcgpy v1.2.1 from PyPI installed via pip

The following code will throw a TypeError. I believe the reason is that expression.get_arity() now returns a list with arities per node instead of a single int.

from dcgpy import kernel_set_gdual_vdouble as kernel_set
from dcgpy import expression_weighted_gdual_vdouble as dcgpy_expression

kernels = ['sum', 'diff']
n_in = 1
n_out = 1
rows = 1
cols = 15
levels_back = 16
arity = 2
kernels = kernel_set(kernels)()
ex = dcgpy_expression(n_in, n_out, rows=rows, cols=cols,
                      levels_back=levels_back, arity=arity,
                      kernels=kernels)
ex.simplify(['x'], subs_weights=True)
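If get_arity() can indeed return either a single int or a per-node list, calling code can normalise the value before using it. A minimal sketch of that idea follows; normalise_arity is a hypothetical helper written for illustration and is not part of dcgpy.

```python
def normalise_arity(arity, n_nodes):
    """Return one arity per node, whether `arity` is an int or a sequence.

    Hypothetical helper for illustration only; not a dcgpy function.
    """
    if isinstance(arity, int):
        # A single int applies to every node.
        return [arity] * n_nodes
    arities = list(arity)
    if len(arities) != n_nodes:
        raise ValueError(f"expected {n_nodes} arities, got {len(arities)}")
    return arities


print(normalise_arity(2, 3))          # a single int is broadcast
print(normalise_arity([2, 2, 1], 3))  # a per-node list is passed through
```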

Return type mismatches between simplified and non-simplified expressions

from dcgpy import expression_gdual_vdouble
from dcgpy import kernel_set_gdual_vdouble

kernels = kernel_set_gdual_vdouble(['sum', 'mul'])()
expr = expression_gdual_vdouble(inputs=3, outputs=1, rows=1, cols=15, levels_back=16, arity=2, kernels=kernels, seed = 4)

Now type(expr(['x','y','z'])[0]) is str, while type(expr.simplify(['x','y','z'])[0]) is sympy.core.add.Add. It would be better to have the two consistent.

expression.simplify return only one output

At the moment the following code:

dCGP = expression(inputs=1, outputs=2, rows=1, cols=15, levels_back=16, arity=2, kernels=kernels, seed = 13)
print("Simplified expression: ", dCGP.simplify(["x"]))

returns only the simplified expression corresponding to the first output node. It should instead return a list of simplified symbolic expressions (one per output node). This would also be consistent with the return type of the call operator.

pdiv not implemented

Hi,

Fantastic package! I just started using it and it looks like a very useful alternative to Eureqa ever since they stopped giving out free academic licenses.

I just wanted to note that I'm unable to get the symbolic regression example working as it is (though some modifications fix it). When I declare the kernels, it says that the pdiv kernel is unimplemented:

from dcgpy import expression_gdual_vdouble as expression
from dcgpy import kernel_set_gdual_vdouble as kernel_set
from pyaudi import gdual_vdouble as gdual
import pyaudi
from matplotlib import pyplot as plt
import numpy as np
from random import randint
%matplotlib inline

kernels = kernel_set(["sum", "mul", "diff", "pdiv"])()

Gives:

ValueError: Unimplemented function pdiv for this type

I can change pdiv to div and it works, but I was wondering what I was missing to get pdiv working.

Let me know if you'd like more debugging information. I'm installed from pip on Python 3.7.3, with:

    dcgpy.__version__: '1.2.1'
    pyaudi.__version__: '1.6.4'
    numpy.__version__: '1.16.2'
    sympy.__version__: '1.4'

Thanks!
Miles

Wrong computation made by a dCGP expression

The following computation is wrong. As the outputs below show, the result is zero when x is 1 and y is 0, although log(x**2 + y**2 + 1) evaluates to log(2) ≈ 0.693147 there:

from dcgpy import expression_gdual_vdouble as expression
from dcgpy import kernel_set_gdual_vdouble as kernel_set
from pyaudi import gdual_vdouble as gdual
kernels = kernel_set(["sum", "mul", "div", "log"])() 
dCGP = expression(inputs=2, outputs=1, rows=1, cols=15, levels_back=16, arity=2, kernels=kernels, seed = 13)
x = [1, 0, 0, 1, 1, 1, 2, 1, 1, 0, 2, 3, 0, 4, 5, 3, 6, 4, 0, 2, 6, 1, 6, 3, 3, 8, 7, 3, 0, 5, 1, 3, 11, 2, 11, 2, 2, 7, 2, 1, 5, 4, 3, 12, 2, 7]
x = dCGP.set(x)
print(dCGP.simplify(["x", "y"]))

#out[6] = [log(x**2 + y**2 + 1)]

dCGP([gdual([0]), gdual([1])])
#Out[7]: [[0.693147]]

dCGP([gdual([1]), gdual([0])])
#Out[8]: [0]
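To see where the constant 1 in the simplified expression comes from, the chromosome above can be decoded by hand. The sketch below assumes the usual CGP layout (one function gene followed by two connection genes per node, inputs numbered 0 and 1, a single output gene at the end) and assumes that log only uses its first connection. Both are assumptions about the encoding, not confirmed dcgpy internals, although the decoded result agrees with the simplify output.

```python
KERNELS = ["sum", "mul", "div", "log"]  # same order as in the kernel_set call
CHROMOSOME = [1, 0, 0, 1, 1, 1, 2, 1, 1, 0, 2, 3, 0, 4, 5, 3, 6, 4, 0, 2, 6,
              1, 6, 3, 3, 8, 7, 3, 0, 5, 1, 3, 11, 2, 11, 2, 2, 7, 2, 1, 5, 4,
              3, 12, 2, 7]
ARITY = 2


def decode(chromosome, input_names):
    """Build the unsimplified expression string for the single output."""
    exprs = list(input_names)  # node ids 0..n_in-1 are the inputs
    genes_per_node = ARITY + 1
    n_nodes = (len(chromosome) - 1) // genes_per_node
    for k in range(n_nodes):
        f_id, c0, c1 = chromosome[k * genes_per_node:(k + 1) * genes_per_node]
        a, b = exprs[c0], exprs[c1]
        name = KERNELS[f_id]
        if name == "sum":
            exprs.append(f"({a}+{b})")
        elif name == "mul":
            exprs.append(f"({a}*{b})")
        elif name == "div":
            exprs.append(f"({a}/{b})")
        else:
            # Assumption: log ignores its second connection.
            exprs.append(f"log({a})")
    return exprs[chromosome[-1]]


expr = decode(CHROMOSOME, ["x", "y"])
print(expr)
```

Under these assumptions the constant 1 is encoded as y/y, which is undefined at y = 0. That may be related to the result reported for the second call, though the report itself does not confirm it.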

[BUG] Issues with tutorial

Nice looking package! I'm eager to try it out as an alternative to Eureqa. Here are the issues I am facing:

Describe the bug
Two issues:

1. It complained about missing libpagmo.so.1 and libipopt.so.1. I fixed this (hopefully) by symlinking libpagmo.so.3 and libipopt.so.3 (the versions installed by conda-forge).
2. I see the following error when running the Python tutorial: http://darioizzo.github.io/dcgp/notebooks/symbolic_regression_1.html

---------------------------------------------------------------------------
ArgumentError                             Traceback (most recent call last)
<ipython-input-9-0a362e511eb6> in <module>
      1 ss = dcgpy.kernel_set_double(["sum", "diff", "mul", "pdiv"])
      2 udp = dcgpy.symbolic_regression(points=X, labels=Y, kernels=ss())
----> 3 uda  = dcgpy.es4cgp(gen=10000, max_mut=4)
      4 prob = pg.problem(udp)
      5 algo = pg.algorithm(uda)
ArgumentError: Python argument types in
    es4cgp.__init__(es4cgp)
did not match C++ signature:
    __init__(_object*, unsigned int gen=1, unsigned int mut_n=1, double ftol=0.0001, bool learn_constants=True, unsigned int seed)
    __init__(_object*, unsigned int gen=1, unsigned int mut_n=1, double ftol=0.0001, bool learn_constants=True)
    __init__(_object*)

To Reproduce
conda-forge install with Python 3.7, then run the tutorial.

Environment (please complete the following information):

OS: CentOS Linux release 7.8.2003
Installation method: conda-forge
Version: conda 4.8.3; python 3.7; gcc 8.3.0

Evolve a neural network

Is it possible to evolve the topology of a neural network through mutation? (I know the implementation does not do crossover) Maybe I am not following the python code correctly, but it appears that in the feed forward neural network example, the following creates a fixed topology. If this was possible, the implementation could be very powerful (the code as it is is also quite powerful).

dcgpann = dcgpy.encode_ffnn(2,1,[50,20,10],["sig", "sig", "sig", "sum"], 5)

[BUG] Python getting_started.py with error

Invoking https://github.com/darioizzo/dcgp/blob/master/doc/examples/getting_started.py

yields

Traceback (most recent call last):
  File "getting_started.py", line 33, in <module>
    print("Expression in x=1.2:", ex([x])[0])
TypeError: __call__(): incompatible function arguments. The following argument types are supported:
    1. (self: dcgpy.core.expression_gdual_double, arg0: List[audi::gdual<double, obake::polynomials::d_packed_monomial<unsigned long long, 8u, void> >]) -> List[audi::gdual<double, obake::polynomials::d_packed_monomial<unsigned long long, 8u, void> >]
    2. (self: dcgpy.core.expression_gdual_double, arg0: List[str]) -> List[str]

When I comment out lines 33 and 34, i.e. the two lines that use x = gdual(1.2, "x", 2), it runs through.

I simply ran:
conda config --add channels conda-forge
conda install dcgp-python

in a new conda environment. It's hard to tell what's going on, but I run into the same problem when trying out one of the ODE problems.

Furthermore, I had to install graphviz to get it to run this far in the first place.

Problem with evaluation of expression_gdual_vdouble

I have a problem with evaluation of expression_gdual_vdouble on a gdual_vdouble. If I run the following program

from pyaudi import gdual_vdouble as gdual
import pyaudi
from dcgpy import expression_gdual_vdouble as expression
from dcgpy import kernel_set_gdual_vdouble as kernel_set
import numpy as np

kernels = kernel_set(["sum", "mul", "diff", "div"])()

dCGP = expression(1, 1, rows=1, cols=15, levels_back=16, arity=2, kernels=kernels, seed=np.random.randint(1233456))

x = np.linspace(0,1,10)
x = gdual(x)
dCGP([x])

I get the following error:

TypeError: No registered converter was able to produce a C++ rvalue of type std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > from this Python object of type gdual_vdouble

Ephemeral constants for ANN and inconsistencies in documentation

The docs say that the dcgpy.expression_ann can take ephemeral constants, however the python constructor does not allow it.

https://darioizzo.github.io/dcgp/docs/python/expression_ann.html

Some of the example code does not work, for example

>>> from dcgpy import *
>>> dcgp = expression_double(1,1,1,10,11,2,kernel_set(["sum","diff","mul","div"])(), 0u, 32u)
>>> print(dcgp)

does not work (there is no "expression_double" anymore).

There are inconsistencies in the signatures of some functions, e.g.:

Symbolic regression wants to have columns

https://darioizzo.github.io/dcgp/docs/python/symbolic_regression.html

dCGP-expressions want to have cols

https://darioizzo.github.io/dcgp/docs/python/expression.html

Add constants

Hi,
I want to add "constants" to the function set. My first guess was to use a function that returns a random value, but the problem is that it returns a different value every time it is called.
Is there a way to add constants?

By the way, at the moment the only way to add functions to the function set is by "name". Maybe it would be better to add a method for adding functions directly. If you think it is a good idea, I can make a PR.

Thanks

Alberto

Querying for node_idx in bias is not what it seems to be....

dcgpann = dcgpy.encode_ffnn(1, 1, [5] , ['sig', 'sum'] , levels_back=1)
dcgpann.randomise_biases()

This network has 7 active nodes (1 input, 1 output + 5 hidden).
There should be 6 biases (5 hidden + 1 output)
The input node should not have a bias.

Querying for node_id = 0 should thus not give a bias.

But: dcgpann.get_bias(0) gives the bias of the node with id 1 instead. This is confusing and currently inconsistent with the documentation.

Moreover, it is possible to query for biases outside the bounds, potentially leading to undefined behavior.

dcgpann.get_bias(42)
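The behaviour asked for above can be sketched with a toy container. BiasTable is a hypothetical stand-in written for illustration, not dcgpy's class; it only shows the id-to-index mapping and the two checks (input nodes have no bias, out-of-range ids are rejected).

```python
class BiasTable:
    """Toy bias store keyed by node id (hypothetical, not part of dcgpy)."""

    def __init__(self, n_inputs, n_nodes):
        self.n_inputs = n_inputs
        self.biases = [0.0] * n_nodes  # one bias per non-input node

    def _index(self, node_id):
        if node_id < 0 or node_id >= self.n_inputs + len(self.biases):
            raise IndexError(f"node id {node_id} is out of range")
        if node_id < self.n_inputs:
            raise ValueError(f"node {node_id} is an input node and has no bias")
        return node_id - self.n_inputs

    def get_bias(self, node_id):
        return self.biases[self._index(node_id)]

    def set_bias(self, node_id, value):
        self.biases[self._index(node_id)] = value


# 1 input, then 5 hidden nodes and 1 output node, as in the report.
table = BiasTable(n_inputs=1, n_nodes=6)
table.set_bias(1, 0.25)
print(table.get_bias(1))
```

With this mapping, get_bias(0) raises a ValueError and get_bias(42) raises an IndexError instead of returning another node's bias or reading out of bounds.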

[BUG] n_eph >= 5 causes crash for symbolic regression

Describe the bug
A large number of ephemeral constants causes Python to crash when doing symbolic regression.

To Reproduce
Steps to reproduce the behavior:

1. Copy the code from here: http://darioizzo.github.io/dcgp/notebooks/symbolic_regression_3.html
2. Choose n_eph to be 5 or larger in the argument to dcgpy.symbolic_regression.
3. Run the code.

This causes a Python crash.

Screenshots
Here is the log, with verbosity=1000 (I'm not sure what the highest verbosity level is)

2aab23143000-2aab23144000 ---p 00000000 00:00 0 
2aab23144000-2aab23384000 rw-p 00000000 00:00 0 
2aab24000000-2aab24021000 rw-p 00000000 00:00 0 
2aab24021000-2aab28000000 ---p 00000000 00:00 0 
2aab28000000-2aab28021000 rw-p 00000000 00:00 0 
2aab28021000-2aab2c000000 ---p 00000000 00:00 0 
555555554000-5555555af000 r--p 00000000 00:2c 4221074                    /mnt/home/mcranmer/miniconda3/envs/dcgpy/bin/python3.7
5555555af000-555555788000 r-xp 0005b000 00:2c 4221074                    /mnt/home/mcranmer/miniconda3/envs/dcgpy/bin/python3.7
555555788000-55555582f000 r--p 00234000 00:2c 4221074                    /mnt/home/mcranmer/miniconda3/envs/dcgpy/bin/python3.7
55555582f000-555555832000 r--p 002da000 00:2c 4221074                    /mnt/home/mcranmer/miniconda3/envs/dcgpy/bin/python3.7
555555832000-55555589b000 rw-p 002dd000 00:2c 4221074                    /mnt/home/mcranmer/miniconda3/envs/dcgpy/bin/python3.7
55555589b000-555559556000 rw-p 00000000 00:00 0                          [heap]
7ffffffdc000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

   Gen:        Fevals:     Best loss: Ndf size:  Compl.:
      0              0        1.41375         4         7
[I 22:01:59.528 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports

Environment (please complete the following information):

OS: CentOS Linux release 7.8.2003 (Core)
Installation method: conda create --name dcgpy --channel conda-forge python=3.7 numpy scipy matplotlib cython h5py pyfftw notebook tqdm jupyter jupyterlab mpi4py openmpi astropy ipywidgets dcgp-python -y

Versions:

python: 3.7.5
dcgpy: 1.5
pygmo: 2.15.0

Cross entropy always 0.0 with dcgpann

Describe the bug
For some reason, loss(X,y, "CE") always returns 0.0 but works fine for MSE.

To Reproduce

1. Create a network using dcgpy.expression_ann with 2 inputs and 1 output.
2. Run the loss function with any valid input, e.g. loss([[0.5, 0.6]], [[1]], "CE").

Expected behavior
A non-zero cross-entropy loss.
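One possible explanation, offered as an assumption rather than a confirmed diagnosis of dcgpy: if the "CE" loss is computed on a softmax over the network outputs, then a network with a single output always assigns probability 1 to its only class, and the cross-entropy is 0 whatever the output value is. The standalone sketch below shows that arithmetic; it does not call dcgpy.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(logits, labels):
    """Cross-entropy between one-hot style labels and softmax(logits)."""
    probs = softmax(logits)
    return sum(-y * math.log(p) for y, p in zip(labels, probs))


# One output: softmax gives [1.0], so the loss vanishes for any logit.
print(softmax_cross_entropy([0.5], [1]))
print(softmax_cross_entropy([-7.0], [1]))

# Two outputs: the loss is non-zero as expected.
print(softmax_cross_entropy([0.5, -1.0], [1, 0]))
```

If that is what happens here, encoding the target with two outputs (one per class) would be the thing to try.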

Environment (please complete the following information):

Installation method: [conda]
Version: [latest]

mutation operators do not scale well

https://github.com/darioizzo/d-CGP/blob/bbef535fa3ddd81e2fcf90faf8215874e5fd0c31/include/dcgp/expression.hpp#L642

In cases where there are a lot of input nodes (but a relatively small network) the "try and discard"-while loops in this area are somewhat inefficient.
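The general pattern can be illustrated without reproducing the linked C++ code. The sketch below assumes that the discarded draws are input nodes (which carry no genes to mutate); that is a guess about the cause, and the function names are hypothetical. It contrasts a try-and-discard loop with drawing once from the pre-filtered eligible nodes.

```python
import random


def pick_by_rejection(active_nodes, n_inputs, rng):
    """Draw active nodes until one with genes (id >= n_inputs) turns up."""
    draws = 0
    while True:
        draws += 1
        node = rng.choice(active_nodes)
        if node >= n_inputs:
            return node, draws


def pick_directly(active_nodes, n_inputs, rng):
    """Filter once, then draw a single time from the eligible nodes."""
    eligible = [n for n in active_nodes if n >= n_inputs]
    return rng.choice(eligible)


rng = random.Random(0)
n_inputs = 1000
# Many active inputs, only three active function nodes.
active = list(range(n_inputs)) + [1000, 1001, 1002]

node, draws = pick_by_rejection(active, n_inputs, rng)
print(f"rejection sampling needed {draws} draws to pick node {node}")
print(f"direct sampling picked node {pick_directly(active, n_inputs, rng)}")
```

The expected number of draws in the rejection loop grows with the ratio of inputs to function nodes, whereas the filtered list can be built once per chromosome and reused across mutations.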

Defining a custom kernel in C++

Hello,
I am sorry if I am missing something obvious, but I am having a lot of trouble defining a custom kernel. I would like to use the Lambert W function as a kernel. It is implemented in Boost.
I tried to adapt the first C++ example by adding the kernel as specified in the documentation for kernel and in the one for kernel_set, as follows:

#include <boost/math/special_functions/lambert_w.hpp>
// [...]

template <typename T>
inline T my_lambertw(const std::vector<T> &in) {
    return boost::math::lambert_w0(in[0]);
}

inline std::string print_my_lambertw(const std::vector<std::string> &in) {
    std::string retval(in[0]);
    return "Lw(" + retval + ")";
}

int main() {
    // [...]
    kernel<double> f(my_lambertw<double>, print_my_lambertw, "my_lambertw");
    kernel_set<double> kernels({"sum", "diff", "mul", "pdiv"});
    kernels.push_back(f);
    symbolic_regression udp(X, Y, 1, 20, 21, 2, kernels(), n_eph);
    // [...]
}

The compilation succeeds but at execution the program fails with

terminate called after throwing an instance of 'std::invalid_argument'
  what(): Unimplemented function my_lambertw for this type
Aborted

The error makes me think that the kernel should maybe be implemented for dual numbers as well? But I don't know how to do it.

Could you please help me to understand the issue? When I understand how to do it, I would be very happy to complete the documentation or write a short example exhibiting custom kernels to help future users.
Thank you very much in advance!

Cannot invoke python tests

Hi,

I installed dcgpy as described and end up getting a ModuleNotFoundError

conda create -n dcgpy-dev python=3.8
conda activate  dcgpy-dev 
conda install dcgp-python -c conda-forge
python -c "from dcgpy import test; test.run_test_suite(); import pygmo; pygmo.mp_island.shutdown_pool(); pygmo.mp_bfe.shutdown_pool()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/mq/repo/github/dcgp/dcgpy/__init__.py", line 2, in <module>
    from ._version import __version__
ModuleNotFoundError: No module named 'dcgpy._version'

[DOCS] Hyperparameter tuning advice

Hi @darioizzo,

I was wondering if you had any hyperparameter tuning advice for symbolic regression? I've read descriptions of hyperparameters, but don't have much intuition for what they should be chosen to be.

For a given multi-objective symbolic regression problem, how should I select the following?

cols
rows
levels_back
gen
max_mut
size (for population)

I've started with the ones given in the examples. But I have no intuition for how to tune them to a different problem. Furthermore, if I encounter one of the following situations, how should I re-tune my parameters?

Expressions are too simple
or
Expressions are overcomplicated, but still don't find the true equation

Thanks!
Miles

obake not found

I was missing the environment creation in the conda command. Now I am able to install everything and run the python tests referenced on the repository. Thank you very much for the help. I think you should change the installation command in the README to the one that you mentioned (that includes the environment creation). I am not very familiar with conda and there could be readers who are also not experts.

But I am still unable to build the git package. The error is as follows:

CMake Error at /Users/shah/Documents/anaconda/anaconda3/envs/dcgpy/lib/cmake/audi/audi-config.cmake:9 (find_package):
  By not providing "Findobake.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "obake", but
  CMake did not find one.

  Could not find a package configuration file provided by "obake" with any of
  the following names:

    obakeConfig.cmake
    obake-config.cmake

  Add the installation prefix of "obake" to CMAKE_PREFIX_PATH or set
  "obake_DIR" to a directory containing one of the above files.  If "obake"
  provides a separate development package or SDK, be sure it has been
  installed.
Call Stack (most recent call first):
  CMakeLists.txt:193 (find_package)

Keras vs dCGP plot

In the plot of keras vs dCGP, is the x-axis wrongly labeled as RMSE? Should it be epochs?

https://darioizzo.github.io/dcgp/notebooks/dCGPANNs_for_function_approximation.html

Logging can lead to crash

If symbolic expressions become too convoluted or long during evolution, the log grows until it fills up the memory, leading to poor performance and freezes. The following code can produce this behavior:

import dcgpy
import pygmo as pg

X, Y = dcgpy.generate_salutowicz()

kernels = dcgpy.kernel_set_double(["sum", "diff", "mul", "cos"])

udp_dcgpy = dcgpy.symbolic_regression(
    points = X, 
    labels = Y, 
    kernels=kernels(), 
    rows = 1, 
    cols = 100, 
    n_eph = 0,
    levels_back = 5,  # this setting creates convoluted and complex expressions
    multi_objective=False)
prob = pg.problem(udp_dcgpy)

uda  = dcgpy.es4cgp(gen = 100, max_mut = 4)
algo = pg.algorithm(uda)

# uncomment the following line to eat up your memory
#algo.set_verbosity(10)

pop = pg.population(prob, 4)
pop = algo.evolve(pop)
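One generic way to keep a log from growing without bound is to cap both the length of each logged expression and the number of retained lines. The sketch below is independent of dcgpy and does not describe how its logger works; the limits and the function names are assumptions chosen for illustration.

```python
from collections import deque

MAX_EXPR_CHARS = 80    # assumed cap on a logged expression
MAX_LOG_LINES = 1000   # assumed cap on retained log lines


def shorten(expression, limit=MAX_EXPR_CHARS):
    """Truncate an expression string and mark the cut with '...'."""
    if len(expression) <= limit:
        return expression
    return expression[:limit - 3] + "..."


# A deque with maxlen silently drops the oldest entries once it is full.
log = deque(maxlen=MAX_LOG_LINES)


def log_generation(gen, loss, expression):
    log.append(f"{gen:>6} {loss:>12.6g} {shorten(expression)}")


# A deliberately huge expression, standing in for a convoluted individual.
log_generation(0, 1.41375, "cos(x0)*" * 10_000 + "x0")
print(log[-1])
```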

Building examples

I am trying to build the C++ examples. Here is the cmake command I am using:

cmake -DCMAKE_PREFIX_PATH=/Users/shah/Documents/anaconda/anaconda3/envs/dcgpy/ -DCMAKE_CXX_STANDARD_INCLUDE_DIRECTORIES=/Users/shah/Documents/anaconda/anaconda3/envs/dcgpy/include/ .

The Eigen library is installed in an eigen3 directory, and it appears that it is unable to find headers inside the eigen3/Eigen/ directory.

/Users/shah/Documents/anaconda/anaconda3/envs/dcgpy/include/audi/invert_map.hpp:4:10: fatal error: 
      'Eigen/Dense' file not found
#include <Eigen/Dense>
         ^~~~~~~~~~~~~

mes4cgp takes up all cpu cores [BUG]

Describe the bug
In contrast to es4cgp, which can be constrained to run single-core by setting parallel_batches = 0 in the corresponding symbolic regression udp, the memetic variant mes4cgp does not have this option and reserves all available CPUs. This makes it hard or impossible to deploy fairly on large compute servers.

To Reproduce
The behavior should be reproducible simply by running some tutorial code.

Expected behavior
mes4cgp should respect parallel_batches similar to es4cgp.

Environment (please complete the following information):

OS: archlinux
Installation method: conda
Version: dcgp-python 1.5

Check if an expression depends on all its input variables

Hi,
which is the best way to check if an expression depends on all its input variables?
I use CGP to solve some PDEs. Sometimes the expression does not depend on all its input variables, but I want the solution to depend on all of them. At the moment I check whether the first derivatives are zero and, if so, add a penalty term. It works (without it the evolution gets stuck in non-optimal solutions), but I don't know if it is the best way.
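A cheaper structural check is to walk the graph back from the output genes and collect the inputs that are reachable. The sketch below assumes the usual CGP chromosome layout (per node: one function gene followed by `arity` connection genes, then the output genes at the end, with inputs numbered first); active_inputs is a hypothetical helper, not a dcgpy function.

```python
def active_inputs(chromosome, n_in, n_out, arity):
    """Return the set of input ids reachable from the output genes."""
    genes_per_node = arity + 1
    stack = list(chromosome[len(chromosome) - n_out:])  # output genes
    seen = set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node < n_in:
            continue  # input nodes have no connection genes
        start = (node - n_in) * genes_per_node
        # Skip the function gene, follow the connection genes.
        stack.extend(chromosome[start + 1:start + 1 + arity])
    return {n for n in seen if n < n_in}


# Two inputs (ids 0 and 1), two nodes (ids 2 and 3), one output gene.
chromosome = [0, 0, 0,   # node 2: f0(in0, in0)
              1, 2, 2,   # node 3: f1(node2, node2)
              3]         # output reads node 3
used = active_inputs(chromosome, n_in=2, n_out=1, arity=2)
missing = set(range(2)) - used
print("unused inputs:", sorted(missing))
```

This over-approximates the true dependence: an input can be connected and still cancel out (for example x - x), or feed a connection that the kernel ignores. An input that is not reachable certainly has no effect, so the check can act as a fast first filter before a derivative-based test.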
Thanks
Alberto

Expressions cannot be pickled

from dcgpy import expression_gdual_vdouble
from dcgpy import kernel_set_gdual_vdouble
import pickle

kernels = kernel_set_gdual_vdouble(['sum', 'mul'])()
expr = expression_gdual_vdouble(inputs=3, outputs=1, rows=1, cols=15, levels_back=16, arity=2, kernels=kernels, seed = 4)
pickle.dumps(expr)

Results in a RuntimeError. Cloudpickle produces no error, but also nothing that could be deserialized again.
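Until the expression objects themselves can be pickled, one workaround is to pickle the plain data needed to rebuild them. The sketch below assumes that an expression is fully described by its constructor arguments, its kernel names and the chromosome returned by get(); ExpressionSnapshot is a hypothetical helper, and the chromosome shown is a placeholder rather than real output.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class ExpressionSnapshot:
    """Plain-data description of an expression (hypothetical helper)."""
    kwargs: dict
    kernel_names: list
    chromosome: list = field(default_factory=list)


snapshot = ExpressionSnapshot(
    kwargs=dict(inputs=3, outputs=1, rows=1, cols=15,
                levels_back=16, arity=2, seed=4),
    kernel_names=["sum", "mul"],
    chromosome=[0, 1, 2],  # placeholder; in practice the list from expr.get()
)

# Only built-in containers are involved, so the round trip is lossless.
restored = pickle.loads(pickle.dumps(snapshot))
print(restored == snapshot)
```

To restore, one would build the kernel set from kernel_names, construct the expression with kwargs, and then call set() with the stored chromosome.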

make dcgp expressions available in python

There should be a way to retrieve individuals of a population of symbolic_regression udps as expression_double, expression_gdual_double, etc.

Right now, the chromosome is accessible in raw by pop.get_x(), but expressions would be easier to work with.

dcgpy.core ImportError

First off, I should say that in early 2023, I had no issues installing and using dcgpy on Windows with Conda.

Now, however, when importing core.pyd via SomeCondaInstallation\envs\dcgp\Lib\site-packages\dcgpy\__init__.py
I consistently run into

ImportError: DLL load failed while importing core: The specified module could not be found.

The remarkable thing about it is that everything works just fine with the pyaudi core.pyd import!

The core.pyd file for dcgpy is in its place and does not differ from former successful installations on other Windows machines. So, no worries about that.
I tried with several Windows machines and different versions of Anaconda and Miniconda to no avail.
My only guess is that the issue might have started with the dcgpy conda package using Python 3.11.2, but I could easily be wrong about that.

Problem with python interface

Hi,
I can use d-CGP with C++, but when I tried to use the Python interface I got the following error:

lib/python2.7/site-packages/dcgpy/_core.so: undefined symbol: _ZN5boost6python7objects23register_dynamic_id_auxENS0_9type_infoEPFNSt3__14pairIPvS2_EES5_E

I checked _core.so with ldd and it is linked against libpython2.7.so and Boost.Python.
Do you have any idea what causes this error?
Thanks
Alberto

set_output_f should be available for all dcgp expressions

set_output_f exists only for dcgpann types at the moment, which is a pity. It makes it difficult to simply swap one type of dcgp expression for another (like expression_double). Having better control over the output is always helpful and should not be limited to ANNs.

Parallelism

Hi @darioizzo,

I was wondering if there were any options for parallelism in dCGPy for symbolic regression? I thought about parallelizing the following loops:

for i in range(offsprings):
    dCGP.set(best_chromosome)
    cumsum = 0
    dCGP.mutate_active(i + 1)
    fit, constant[i] = err2(dCGP, x, yt, best_constants)
    fitness[i] = fit
    chromosome[i] = dCGP.get()

in the run_experiment code in Python, but it appears dCGP is a global var.
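One way around the shared global is to make each offspring evaluation a pure function of the parent chromosome, so that every worker mutates its own copy. The sketch below uses toy stand-ins (a list of ints as the chromosome, a made-up fitness and mutation) rather than dcgpy calls, and all names in it are hypothetical. It uses threads for simplicity; whether threads give a real speed-up depends on whether the evaluation releases the GIL, which is not something this sketch can confirm for dcgpy. A process pool would need picklable inputs, which is a reason to pass chromosome lists around rather than expression objects.

```python
import random
from concurrent.futures import ThreadPoolExecutor

TARGET = [3, 1, 4, 1, 5, 9, 2, 6]  # toy target standing in for the data


def fitness(chromosome):
    """Toy loss: squared distance to TARGET."""
    return sum((g - t) ** 2 for g, t in zip(chromosome, TARGET))


def make_offspring(job):
    """Mutate a private copy of the parent and return (fitness, child)."""
    parent, n_mutations, seed = job
    rng = random.Random(seed)  # per-offspring generator, no shared state
    child = list(parent)
    for _ in range(n_mutations):
        child[rng.randrange(len(child))] = rng.randint(0, 9)
    return fitness(child), child


best_chromosome = [0] * len(TARGET)
offsprings = 4
jobs = [(best_chromosome, i + 1, 100 + i) for i in range(offsprings)]

with ThreadPoolExecutor(max_workers=offsprings) as pool:
    results = list(pool.map(make_offspring, jobs))

best_fit, best_child = min(results, key=lambda r: r[0])
print(best_fit, best_child)
```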

Thanks,
Miles

[FEATURE] use previous computation

Just wondering whether there is a way to use previous formulas
as starting points for evolution?

Mostly, to load them as the initial population (instead of starting from scratch every time).
Thanks

expression.set_weight() for weighted expressions does not respect the input_id

Using expression.set_weight(node_id, input_id, w) will always set the weight at input_id=0 of node_id=node_id to w.

Minimal working example:

import dcgpy

expr = dcgpy.expression_weighted_double(1, 1, 1, 1, 2, 2, dcgpy.kernel_set_double(['sum', 'diff', 'div', 'mul'])())
print(expr.get_weights())  # prints [1.0, 1.0]
expr.set_weight(0, 1, 0.5)  # should set the second weight to 0.5
print(expr.get_weights())  # prints [0.5, 1.0]

If I see it right the issue is that in the C++ function we get the node_id but do not add the input_id when constructing the index to the weight vector.
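The suspected indexing mistake can be reproduced with a toy flat weight vector. The layout below (ARITY consecutive weights per node) and the function names are assumptions made for illustration; this is not the dcgp C++ code.

```python
ARITY = 2  # weights per node in the toy layout


def set_weight_buggy(weights, node_pos, input_id, w):
    """Mimics the reported behaviour: input_id is ignored."""
    weights[node_pos * ARITY] = w


def set_weight_fixed(weights, node_pos, input_id, w):
    """Adds input_id to the offset of the node's first weight."""
    if not 0 <= input_id < ARITY:
        raise IndexError("input_id out of range")
    weights[node_pos * ARITY + input_id] = w


buggy = [1.0, 1.0]
set_weight_buggy(buggy, 0, 1, 0.5)
print(buggy)   # [0.5, 1.0], as in the report

fixed = [1.0, 1.0]
set_weight_fixed(fixed, 0, 1, 0.5)
print(fixed)   # [1.0, 0.5], the expected result
```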

If you agree that this is the problem and if it helps, I can prepare a pull request for that.

PIP: No matching distribution found for dcgpy

I've tried to install dcgpy and pyaudi from both pip2.7 and pip3.
Both of them threw the same error:


[bernardo@sager-pc Downloads]$ sudo pip install dcgpy
Collecting dcgpy
  Could not find a version that satisfies the requirement dcgpy (from versions: )
No matching distribution found for dcgpy

[bernardo@sager-pc Downloads]$ sudo pip2 install pyaudi
Collecting pyaudi
  Could not find a version that satisfies the requirement pyaudi (from versions: )
No matching distribution found for pyaudi
