Code Monkey home page Code Monkey logo

powerfulga's Introduction

Powerful GA

This is a powerful C++ library, that implements Genetic Algorithm. Why you should choose this one?

It supports these features:

  1. Optional Multithreading support for computing your fitness function. (as many threads as you want)
  2. Advanced matting system (There are special realistic distributions for genes while matting)
  3. Flexibly configurable random and heuristics anywhere it`s appropriate.
  4. Each generation consists of many different types of genomes
  5. Fast normal distributed numbers generator.
  6. There were other measures arranged to improve performance (for example, distributing genome pointers instead of genomes themselves)

Usage

This library uses CMake as its build system. However, you`ll probably have troubles with building this library, because it uses "pythonic" library from "https://github.com/donRumata03/pythonic", which I often modify and use for multiple projects. But to use it you can add cmake_build-release/GA.lib or cmake_build-debug/GA.lib (for debug mode) as static libraries, which are built by MSVC 2019 compiler, and header files in GA/. To explore the project there are also files .cpp in GA/src There is an example of using the library in Tests/main.cpp There are functions ga_optimize, which implements the main algorithm and log_ga_optimize, which is the wrapper, which logariphmates different parameters (you can choose them) and can also generate fitness function, which is nesessary for GA from loss function, which is usualy esasier to define.

ga_optimize function takes following parameters:

fitness_function ( std::function(double(std::vector<double>)) ) - it\`s actually the function to maximize

variable_ranges ( std::vector<std::pair<double, double>> ) - Refers to the ranges of different variables, where you are going to find maximum. 

params : ( type: GA_params ) : hyperparameters of the algorithm, which are necessary for genetic algorithms.
This is the struct, specially defined to be passed to the function.

Here are its poles with description:

  1. population_size (type : size_t) - doesn`t need description
  2. epoch_num (type : size_t) - also doesn`t need description
  3. allow_multithreading (type : bool, default : false). If this parameter is set true, there are some threads created to compute your fitness_function (assume, that it takes the biggest amount of time)
  4. threads (type : size_t, default : (std::\thread::hardware_concurrency() - 2)). This parameter only makes sense if multithreading is allowed. It determines the number of additional threads used to compute fitness_function for all the genomes in each population.
  5. Hazing percent

Implementation notes

This is not absolutely standard GA implementation. It`s designed with assumption, that the majority of time used used for computing fitness_function. That means, that almost the only goal for performance is to minimize the number of its computations. In order to achieve the aim, different heuristics are used.

To start with, there`s one difference between this implementation and the standard one, which could seem to be suspicious to you. It is the fact, that the genomes are stored as vectors of floating-point values instead of usual binary genome representation. This method is usually followed by low-quality code written by ...

But I`ll try to explain why I do it this way. Genetic Algorithm is nothing more than a tool for function optimization. That function must take floating-point parameters, which are continuous by their nature. The initial reason to represent genes the binary way is that in nature there is the operation called crossover. A point is chose randomly for crossover and the new genome is formed from two halves from different genome. The idea is that this is the way how the child gets some features from booth its parents, what helps us to achieve the main aim. Here we notice, that all the genes in genome are "crossovered" absolutely independently.

"We mostly need to search in the places where the function is bigger" - this is the idea formalized in GAs.

In each iterations we use some genomes from the previous generation somehow and add the results of their matting to them. So, the described method of crossover is OK. It pays respect to both parents and the possible result of matting for each gene is somehow distributed based on parents` genes. I declare, that the only advantage of this method over mine is that it might be a bit faster, but as far as it said above, it doesn`t matter. So, what is the method? Considering, that the genes in genomes are crossovered independently (which is, obviously, a good idea), we have the following task for each gene: given 2 numbers and optionally, this variable`s range, we should somehow chose that number for the child. The simplest way to do it is by returning the average between them. Obviously, it isn`t the best idea. This method terribly decreases variance of genomes and makes algorithm diverge to local, not global maxima. And, of course, it should be a non-deterministic operation, so, we need to form the function of probability density. I tried out and thought about different functions and I came to conclusion, that it should be made of 3 normal probability density distributions built above the genomes. One of them has it`s peak in the point of one genome`s value, the second - in the point of the next one and the last one finds itself in the middle point. Sigma values are proportional to the difference between the parents` values. The coefficient is chosen experimentally from reasons of maintaining population variance from the one hand and fast convergence from the other one.

powerfulga's People

Contributors

donrumata03 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

powerfulga's Issues

Proposal to add binary search to the annealing method

The problem is that it's so difficult to find the temperature parameter for the annealing optimization method by hand!

So, it might make sense to do this automatically: there should be a particular percent of accepted mutations achieved.
The more the temperature, the more this percent is. So, this function is monotonous by nature (and is in fact if it't smoothed).
It might make sense to use something like n-ary search to find the optimal temperature with the best-fitting percent.

Automatic performance estimating

It would be great to be able to estimate optimization algorithm performance automatically.
Now the creator has no ideas which algorithm is better is worth continuing developing.
This might prevent situations like this one #1

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.