aimclub / iopt

Framework of intelligent optimization methods iOpt

License: BSD 3-Clause "New" or "Revised" License

global-optimization global-optimization-algorithms global-optimizers hyperparameter-optimization parameter-tuning

iopt's Introduction

iOpt is an open source framework for automatic selection of parameter values both for mathematical models of complex industrial processes and for AI and ML methods used in industry. The framework is distributed under the 3-Clause BSD license.

Key features of the framework

Automatic selection of parameter values both for mathematical models and for AI and ML methods used in industry.
Intelligent control of the process of choosing the optimal parameters for industrial applications.
Integration with external artificial intelligence and machine learning libraries or frameworks as well as applied models.
Automation of the preliminary analysis of the models under study, e.g., by identifying different types of model dependencies on different groups of parameters.
Visualization of the process of choosing optimal parameters.

Installation

Automatic installation

The simplest way to install iOpt is using pip:

pip install iOpt

Manual installation

On Unix-like systems:

git clone https://github.com/aimclub/iOpt
cd iOpt
pip install virtualenv
virtualenv ioptenv
source ioptenv/bin/activate
python setup.py install

On Windows:

git clone https://github.com/aimclub/iOpt
cd iOpt
pip install virtualenv
virtualenv ioptenv
ioptenv\Scripts\activate.bat
python setup.py install

Docker

Download the image:

docker pull aimclub/iopt:latest

Using the iOpt image:

docker run -it aimclub/iopt:latest

How to Use

The following example uses the iOpt framework to minimize the Rastrigin test function.

from problems.rastrigin import Rastrigin
from iOpt.solver import Solver
from iOpt.solver_parametrs import SolverParameters
from iOpt.output_system.listeners.static_painters import StaticPainterNDListener
from iOpt.output_system.listeners.console_outputers import ConsoleOutputListener

if __name__ == "__main__":
    # Minimize the Rastrigin test function and visualize the result
    # Create the test problem
    problem = Rastrigin(2)
    # Set up the solver options
    params = SolverParameters(r=2.5, eps=0.01, iters_limit=300, refine_solution=True)
    # Create the solver
    solver = Solver(problem, parameters=params)
    # Print results to console while solving
    cfol = ConsoleOutputListener(mode='full')
    solver.add_listener(cfol)
    # 3D visualization at the end of the solution
    spl = StaticPainterNDListener("rastrigin.png", "output", vars_indxs=[0, 1], mode="surface", calc="interpolation")
    solver.add_listener(spl)
    # Run problem solution
    sol = solver.solve()
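
For reference, the Rastrigin function is commonly defined as f(x) = 10n + sum(x_i^2 - 10 cos(2 pi x_i)), with its global minimum f = 0 at the origin. The sketch below is a plain-Python version of that textbook definition. It does not reproduce the Rastrigin class imported from problems.rastrigin, whose search bounds and internals are not shown here, and the helper name rastrigin is illustrative.

```python
import math


def rastrigin(x):
    """Textbook Rastrigin function; global minimum f(0, ..., 0) = 0."""
    return 10.0 * len(x) + sum(
        xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) for xi in x
    )


if __name__ == "__main__":
    print(rastrigin([0.0, 0.0]))  # value at the global minimizer
    print(rastrigin([0.5, 0.5]))  # value between neighbouring local minima
```

The many local minima of this function are what make it a useful test for global optimization methods.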

Examples

The next example uses the iOpt framework to tune the hyperparameters of a machine learning method. For a support vector classifier (SVC), we search for the optimal hyperparameters (the regularization parameter C and the kernel coefficient gamma) on the breast cancer classification problem (detailed description of the data).

import numpy as np
from sklearn.utils import shuffle
from sklearn.datasets import load_breast_cancer

from iOpt.output_system.listeners.static_painters import StaticPainterNDListener
from iOpt.output_system.listeners.animate_painters import AnimatePainterNDListener
from iOpt.output_system.listeners.console_outputers import ConsoleOutputListener
from iOpt.solver import Solver
from iOpt.solver_parametrs import SolverParameters
from examples.Machine_learning.SVC._2D.Problems import SVC_2d


def load_breast_cancer_data():
    dataset = load_breast_cancer()
    x_raw, y_raw = dataset['data'], dataset['target']
    inputs, outputs = shuffle(x_raw, y_raw ^ 1, random_state=42)
    return inputs, outputs


if __name__ == "__main__":
    x, y = load_breast_cancer_data()
    regularization_value_bound = {'low': 1, 'up': 6}
    kernel_coefficient_bound = {'low': -7, 'up': -3}

    problem = SVC_2d.SVC_2D(x, y, regularization_value_bound, kernel_coefficient_bound)

    method_params = SolverParameters(r=np.double(3.0), iters_limit=100)
    solver = Solver(problem, parameters=method_params)

    apl = AnimatePainterNDListener("svc2d_anim.png", "output", vars_indxs=[0, 1], to_paint_obj_func=False)
    solver.add_listener(apl)

    spl = StaticPainterNDListener("svc2d_stat.png", "output", vars_indxs=[0, 1], mode="surface", calc="interpolation")
    solver.add_listener(spl)

    cfol = ConsoleOutputListener(mode='full')
    solver.add_listener(cfol)

    solver_info = solver.solve()
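
The bounds 1..6 and -7..-3 in the example above read most naturally as base-10 exponents, that is, C between 10^1 and 10^6 and gamma between 10^-7 and 10^-3. That reading is an assumption about how SVC_2D interprets its two variables; the example itself does not state it. Under that assumption, a point found by the solver could be converted to actual hyperparameter values roughly as follows (decode_point is a hypothetical helper, not part of iOpt):

```python
def decode_point(float_variables):
    """Map a 2D search point to SVC hyperparameters.

    Assumption: the first variable is log10(C) and the second is log10(gamma).
    """
    log_c, log_gamma = float_variables
    return {"C": 10.0 ** log_c, "gamma": 10.0 ** log_gamma}


if __name__ == "__main__":
    print(decode_point([3.0, -5.0]))
```

Searching over exponents rather than raw values is a common choice for parameters that span several orders of magnitude.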

The next example uses multicriteria optimization with two real-valued objectives: precision and recall. The result is a plot of the Pareto set.

from examples.Machine_learning.SVC._2D.Problems import mco_breast_cancer

from iOpt.solver import Solver
from iOpt.solver_parametrs import SolverParameters
from iOpt.output_system.listeners.console_outputers import ConsoleOutputListener
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

if __name__ == "__main__":

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_valid, y_train, y_valid = train_test_split(X, y)
    problem = mco_breast_cancer.mco_breast_cancer(X, y, X_train, y_train)

    params = SolverParameters(r=3.0, eps=0.01, iters_limit=200, number_of_lambdas=50,
                              start_lambdas=[[0, 1]], is_scaling=False)

    solver = Solver(problem=problem, parameters=params)

    cfol = ConsoleOutputListener(mode='full')
    solver.add_listener(cfol)

    sol = solver.solve()

    var = [trial.point.float_variables for trial in sol.best_trials]
    val = [[-trial.function_values[i].value for i in range(2)] for trial in sol.best_trials]

    print("size pareto set: ", len(var))
    for fvar, fval in zip(var, val):
        print(fvar, fval)

    fv1 = [-trial.function_values[0].value for trial in sol.best_trials]
    fv2 = [-trial.function_values[1].value for trial in sol.best_trials]
    plt.plot(fv1, fv2, 'ro')
    plt.show()
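
The example above plots the trials that the solver returns in sol.best_trials. To illustrate what Pareto-optimal means for two objectives that are both maximized, the sketch below filters a list of (precision, recall) pairs down to the non-dominated ones. It is a generic dominance check written for this illustration, not iOpt's internal procedure, and the sample values are made up.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates (maximization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Made-up (precision, recall) pairs, for illustration only
    scores = [(0.90, 0.80), (0.85, 0.95), (0.80, 0.70), (0.95, 0.60), (0.85, 0.90)]
    print(pareto_front(scores))
```

A point is dropped only when some other point is no worse in both objectives and better in at least one, so the remaining points represent different trade-offs between precision and recall.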

Project Structure

The latest stable release of iOpt is in the main branch. The repository includes the following directories:

The iOpt directory contains the framework core in the form of Python classes.
The examples directory contains examples of using the framework for both test and applied problems.
Unit tests are located in the test directory.
Documentation source files are located in the docs directory.

Documentation

A detailed description of the iOpt framework API is available at Read the Docs.

Supported by

The study is supported by the Research Center Strong Artificial Intelligence in Industry of ITMO University as part of the plan of the center's program: Framework of intelligent heuristic optimization methods.

iopt's Issues

Add names to problems

The Problem class needs a Name field holding the name of the specific problem (Hill, X2, Rastrigin, ...).

Proposal for benchmarking

One idea for testing the library: try running it on https://github.com/automl/HPOBench

Tests: tidy up the structure.

The folder structure and the location of the tests should mirror the iOpt package.

Rename "test/problem" to "test/problems" (Kozinov, Lebedev)

Move "test/test_evolvent.py" and the test data "test/evolventTestData" to "test/iOpt/evolvent/" (Shtanyuk)
Move "test/test_optim_task.py" to "test/iOpt/method/test_optim_task.py" (Silenko)

Tests also need to be written for the following classes:

Solver - check that a result is obtained, check a single iteration, etc. (Kudryavtsev)
Process - all methods; this class is currently untested. (Chernykh?)
OptimizationTask - check the evaluation of a problem with constraints. (Silenko)

Discrete parameters optimization

While working on #89 I ran into a problem with optimizing categorical and discrete parameters (e.g. RandomForest has the parameter criterion with options ["gini", "entropy"] and the parameter min_samples_split, which can be any int from 2 upwards).

Currently the Problem class has attributes such as discreteVariableValues and discreteVariableNames, but Solver doesn't seem to optimize them.

Can such parameters be optimized now?
How can I correctly specify possible values for discrete variables (Problem.discreteVariablesValues)?

Add timeout

For practical use, it would be helpful to be able to limit the time spent searching for parameters.
For example, a timeout could be passed through SolverParameters as the maximum amount of time in minutes (example for 5 minutes).

solver_parameters = SolverParameters(timeout=5)

That way the algorithm is forcibly stopped once the given time has elapsed, even if not all of the requested iterations have been completed.
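
One way the requested behavior could look is sketched below: an iteration loop that stops either when the iteration limit is reached or when a wall-clock deadline passes, whichever comes first. This is a minimal sketch and not iOpt's implementation. The names run_with_timeout and step and the clock argument are hypothetical, and the timeout is given in minutes as the issue proposes.

```python
import time


def run_with_timeout(step, iters_limit, timeout_minutes, clock=time.monotonic):
    """Call step() until iters_limit is reached or the time budget runs out.

    Returns the number of iterations that were actually performed.
    """
    deadline = clock() + timeout_minutes * 60.0
    done = 0
    while done < iters_limit and clock() < deadline:
        step()
        done += 1
    return done


if __name__ == "__main__":
    # A cheap stand-in for one optimization iteration
    performed = run_with_timeout(lambda: time.sleep(0.01), iters_limit=20, timeout_minutes=5)
    print("iterations performed:", performed)
```

The deadline is checked only between iterations, so a single long iteration can still overrun the budget. The injectable clock makes the stopping rule easy to test without waiting.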

Getter for the last added point

A getter is needed that returns the most recently added trial point. For example, when the first iteration is performed at the midpoint of the interval, there is no reasonable way to retrieve the resulting point in order to plot it or print it to the console. Hence the need for a getter.

Move the problems folder out of the iOpt folder.

The folder "iOpt/problems" should be moved to the repository root and renamed to "problems".
The move affects classes in the "examples" and "test" folders, so their dependencies need to be fixed. For example:
from iOpt.problems.GKLS import GKLS
becomes from problems.GKLS import GKLS

Add gitlab mirror

The repository should be mirrored to GitLab (https://gitlab.actcognitive.org/itmo-sai-code) according to the tutorial:

https://github.com/ITMO-NSS-team/open-source-ops/blob/master/tutorials/mirror_repo_to_gitlab.md

README should become multi-language.

Rename methods and variables PEP8

Currently, method names in the project are written in CapWords, and class attributes and function arguments in mixedCase. PEP8 recommends snake_case in all of these cases, but permits mixedCase in the following case:

mixedCase is allowed only in contexts where that’s already the prevailing style (e.g. threading.py), to retain backwards compatibility.

However, this naming breaks naming consistency when iOpt is used in other projects that mostly follow PEP8, for example GOLEM.

I would like to propose:

bringing the names in line with PEP8 by converting them to snake_case
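
As a rough illustration of the proposed renaming, the helper below converts CapWords and mixedCase identifiers to snake_case with a regular expression. It is a hypothetical utility sketched for this issue, not a tool from the iOpt repository, and a real rename would also have to update every call site.

```python
import re

# Insert "_" at a lower-to-upper boundary, or before the last capital
# of an acronym that is followed by a lowercase letter.
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def to_snake_case(name):
    """Convert a CapWords or mixedCase identifier to snake_case."""
    return _BOUNDARY.sub("_", name).lower()


if __name__ == "__main__":
    for old in ["numberOfParallelPoints", "startPoint", "GetResults", "GKLSProblem"]:
        print(old, "->", to_snake_case(old))
```

Names that are already in snake_case pass through unchanged, so the helper can be applied to a mixed code base.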

In addition:

clean the project of commented-out code (example) and uninformative comments (example)
replace some comments with TODO (example)
fix the type annotations and default values in some methods (mutable types are sometimes used as default values, which can lead to bugs)

It may be worth enabling pep8speaks to make the code style easier to monitor.
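
The mutable-default problem mentioned in the list above is a general Python pitfall: a default value is created once, when the function is defined, and is then shared by every call. The sketch below shows the pitfall and the usual fix. The function names are made up for illustration and do not come from the iOpt code base.

```python
def add_listener_buggy(listener, listeners=[]):
    """Buggy: the default list is created once and shared across calls."""
    listeners.append(listener)
    return listeners


def add_listener_fixed(listener, listeners=None):
    """Fixed: use None as the default and create a fresh list per call."""
    if listeners is None:
        listeners = []
    listeners.append(listener)
    return listeners


if __name__ == "__main__":
    first = add_listener_buggy("console")
    second = add_listener_buggy("painter")
    print(first is second)  # both calls returned the same shared list
    print(add_listener_fixed("console"), add_listener_fixed("painter"))
```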

The iOpt/examples folder is installed as a separate library.

Hello. I was installing the dependencies for GOLEM and installed iOpt via pip. After that, when I tried to import modules from the GOLEM/examples... folder (from the folder of my working project), I got a "module not found" error. This happens because the path to venv/Lib/site-packages is one of the first entries in Python's search path, so Python finds the examples module in site-packages first. Adding the path to the project's working directory via sys.path.append() does not help, because the examples module in site-packages is still found first. And if the path to the working project comes first, nothing can be imported from iOpt's examples.
So the question/proposal is this: does the iOpt/examples folder need to be installed as a separate library? If not, could this folder be added to the exclusions? If it is needed after all, could it be renamed so that conflicts do not arise?
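
The shadowing described in this issue can be reproduced in isolation. The sketch below creates two throwaway directories that each contain a package with the same name and shows that whichever directory comes first on sys.path wins. The package name examples_demo and the directory names are made up for the demonstration; nothing here touches a real iOpt or GOLEM installation.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PKG = "examples_demo"  # hypothetical package name used only for this demo


def make_package(root, marker):
    """Create root/examples_demo/__init__.py that records where it lives."""
    pkg_dir = Path(root) / PKG
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text(f"ORIGIN = {marker!r}\n")


def resolve_origin(search_dirs):
    """Import the package with search_dirs placed first on sys.path."""
    saved = list(sys.path)
    sys.modules.pop(PKG, None)
    sys.path[:0] = [str(d) for d in search_dirs]
    try:
        importlib.invalidate_caches()
        return importlib.import_module(PKG).ORIGIN
    finally:
        # Restore the interpreter state so the demo has no side effects
        sys.path[:] = saved
        sys.modules.pop(PKG, None)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        site, project = Path(tmp) / "site-packages", Path(tmp) / "project"
        make_package(site, "site-packages")
        make_package(project, "project")
        return resolve_origin([site, project]), resolve_origin([project, site])


if __name__ == "__main__":
    print(demo())
```

Because only one package of a given top-level name can be imported at a time, reordering sys.path merely changes which of the two is hidden, which is why the issue suggests excluding or renaming the installed folder.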

Serialization problems when using numberOfParallelPoints

When I try to set numberOfParallelPoints to a value greater than 1, I get the following error

Error when no discrete parameters

As I understand it, if the problem has no discrete parameters, an error occurs at the result output stage.

Implement usage of startPoint

Currently an initial guess can be passed to SolverParameters via the startPoint parameter; however, the initial guess is not used during the parameter search.

Create FEDOT-based example

An example for FEDOT-GOLEM-iOpt should be prepared.

Its efficiency should be compared with the existing HyperOpt tuner.

Infinite loop when the function value is infinite

I am currently trying to add the iOpt optimizer to FEDOT and ran into the following problem.

When searching for hyperparameters of composite machine learning models, a combination of hyperparameters can turn out to be invalid, so the metric cannot be computed for it (training the model raises an error). In that case I tried returning the worst possible metric value (for minimization), np.inf, to the optimizer. However, the algorithm then enters an infinite loop while trying to add a dataItem to the queue on the first iteration (this is what I could work out while debugging, but I may be wrong).

Is there perhaps a way to solve this problem? I thought that replacing np.inf with sys.maxsize would most likely help, but decided to report the problem anyway; perhaps this extreme case can be handled inside iOpt.

I used iOpt version 0.1.6, but I also tried the current version from main, and the algorithm hangs in the same way.

The problem is hard to reproduce without deliberately breaking something, but here is code that sometimes fails:

import pandas as pd
from golem.core.tuning.iopt_tuner import IOptTuner

from fedot.core.data.data import InputData
from fedot.core.pipelines.pipeline_builder import PipelineBuilder
from fedot.core.pipelines.tuning.search_space import PipelineSearchSpace
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from fedot.core.repository.dataset_types import DataTypesEnum
from fedot.core.repository.tasks import TaskTypesEnum, Task, TsForecastingParams
from fedot.core.utils import fedot_project_root

search_space = PipelineSearchSpace().get_parameters_dict()

# You can use
# pipeline = PipelineBuilder().add_node('cgru').build()
# to break it deliberately; the infinite loop then reproduces reliably
pipeline = PipelineBuilder().add_node('lagged').add_node('cgru').build()

task = Task(TaskTypesEnum.ts_forecasting, task_params=TsForecastingParams(forecast_length=10))
time_series = pd.read_csv(f'{fedot_project_root()}/examples/data/ts/beer.csv')
idx = time_series['idx'].values
time_series = time_series['value'].values
data = InputData(idx=idx,
                 features=time_series,
                 target=time_series,
                 task=task,
                 data_type=DataTypesEnum.ts)

tuner = TunerBuilder(task).with_tuner(IOptTuner).with_iterations(20).build(data)

tuner.tune(pipeline)

Dataset: https://github.com/aimclub/FEDOT/blob/master/examples/data/ts/beer.csv
I am using FEDOT and its requirements from my own branch: https://github.com/aimclub/FEDOT/tree/add-iopt (PR aimclub/FEDOT#1102)
FEDOT uses the iOpt-based tuner implementation from this GOLEM branch: https://github.com/aimclub/GOLEM/tree/iopt-tuner
The IOptTuner code itself: https://github.com/aimclub/GOLEM/blob/iopt-tuner/golem/core/tuning/iopt_tuner.py
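
The workaround considered in this issue, returning a large finite value instead of np.inf, can be applied on the caller's side by wrapping the objective. The sketch below is one possible wrapper and assumes nothing about iOpt's internals. The names make_safe_objective and PENALTY are hypothetical, the penalty magnitude is an arbitrary choice, and whether a finite penalty actually avoids the hang is not confirmed here.

```python
import math

PENALTY = 1e12  # hypothetical large finite stand-in for the worst metric value


def make_safe_objective(objective, penalty=PENALTY):
    """Wrap an objective so failures and non-finite values become a finite penalty."""

    def safe(params):
        try:
            value = float(objective(params))
        except Exception:
            # Invalid hyperparameter combination: training or scoring failed
            return penalty
        return value if math.isfinite(value) else penalty

    return safe


if __name__ == "__main__":
    def flaky_metric(params):
        if params["depth"] < 1:
            raise ValueError("invalid hyperparameters")
        return 1.0 / params["depth"]

    safe_metric = make_safe_objective(flaky_metric)
    print(safe_metric({"depth": 4}), safe_metric({"depth": 0}))
```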

Timeout does not work

When the timeout parameter is used, solver.Solve() returns None. It seems that this line should be changed to sol = solv_with_timeout(). After this fix everything worked.