
Comments (13)

N-Wouda commented on July 17, 2024

Should we create an IntEnum for the evaluation outcome, and pass that to update, rather than a meaningless integer s_idx?

I'm thinking of

import enum

class Outcome(enum.IntEnum):
    BEST = 0
    BETTER = 1
    ACCEPTED = 2
    REJECTED = 3


N-Wouda commented on July 17, 2024

If/when we do this, we should probably also implement more options from the literature. That would also be a good check on whether the interface for adaptive schemes is sufficiently general.


leonlan commented on July 17, 2024

Stochastic universal sampling is an alternative to the roulette wheel mechanism. It's used in one ALNS paper (Chowdhury et al., 2019).
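
For reference, a minimal standalone sketch of stochastic universal sampling over operator weights (the weights and number of draws below are made up for illustration; this is not tied to the current alns interface):

import numpy as np


def sus(weights, num_draws, rnd):
    """
    Stochastic universal sampling: instead of spinning a roulette wheel
    num_draws times, place num_draws equally spaced pointers on the
    cumulative-weight wheel using a single random offset. This lowers the
    variance of the selection compared to repeated roulette wheel spins.
    """
    cumulative = np.cumsum(weights)
    spacing = cumulative[-1] / num_draws
    start = rnd.uniform(0, spacing)
    pointers = start + spacing * np.arange(num_draws)
    return np.searchsorted(cumulative, pointers)  # selected operator indices


rnd = np.random.RandomState(seed=1)
print(sus(np.array([5.0, 1.0, 3.0, 1.0]), num_draws=4, rnd=rnd))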


leonlan commented on July 17, 2024

Reinforcement learning (Sutton and Barto, 2018) could provide a framework for adaptive operator selection. For example, multi-armed bandits, which are already mentioned in #73.
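
As an illustration of the bandit view, a minimal epsilon-greedy sketch (the class name, reward bookkeeping, and parameters are hypothetical, not part of alns):

import numpy as np


class EpsilonGreedy:
    """
    Treats each operator as a bandit arm: with probability eps an operator
    is selected uniformly at random (exploration); otherwise the operator
    with the highest average observed reward is selected (exploitation).
    """

    def __init__(self, num_ops, eps=0.1):
        self.eps = eps
        self.counts = np.zeros(num_ops)      # times each operator was applied
        self.avg_reward = np.zeros(num_ops)  # running mean reward per operator

    def select(self, rnd):
        if rnd.rand() < self.eps:
            return rnd.randint(len(self.counts))  # explore
        return int(np.argmax(self.avg_reward))    # exploit

    def update(self, op_idx, reward):
        self.counts[op_idx] += 1
        # Incremental update of the running mean reward for this operator.
        self.avg_reward[op_idx] += (reward - self.avg_reward[op_idx]) / self.counts[op_idx]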


leonlan commented on July 17, 2024

The dissertation by Karimi Mamaghan (2022) may be helpful for our changes related to adaptive operator selection in ALNS.

  • In Section 3.5.1, the adaptive operator selection problem is described as a sequence of five steps: 1) performance criteria identification, 2) reward computation, 3) credit assignment, 4) selection, 5) move acceptance. Step 5) is already covered by our implementation of acceptance criteria.
  • Q-learning is the mechanism employed for operator selection. It is described in Section 4.1. The main idea is that we have a set of states $S$ (e.g., did we improve last time or not), a set of actions $A$ (e.g., which operator to select) and a reward function $R$ (e.g., how well did the selected operators perform).
  • Karimi-Mamaghan et al. (2023) use Q-learning as the operator selection mechanism in a large neighborhood search heuristic with $d$ destroy operators and a single repair operator; a minimal sketch of the idea follows after this list.
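
For concreteness, a minimal tabular Q-learning sketch along these lines (the two-state encoding, the number of operators, and the learning parameters are illustrative assumptions, not the setup of Karimi-Mamaghan et al. (2023)):

import numpy as np

# States: 0 = the last candidate did not improve the current solution, 1 = it did.
# Actions: index of the operator to apply next.
NUM_STATES, NUM_OPS = 2, 4
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1  # learning rate, discount factor, exploration rate

Q = np.zeros((NUM_STATES, NUM_OPS))


def select_operator(state, rnd):
    if rnd.rand() < EPS:
        return rnd.randint(NUM_OPS)   # explore
    return int(np.argmax(Q[state]))   # exploit the learned Q-values


def update_q(state, action, reward, next_state):
    # One-step Q-learning update: move Q(s, a) towards the observed reward
    # plus the discounted value of the best action in the next state.
    Q[state, action] += ALPHA * (reward + GAMMA * Q[next_state].max() - Q[state, action])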


leonlan commented on July 17, 2024

On adaptive operator selection, from Karimi Mamaghan (2022, pp. 64-65). For each step, I'll comment on how this could relate to our new implementation.

Performance criteria identification – Whenever an operator is selected and applied to a problem instance, a set of feedback information can be collected that represents the performance of the algorithm. This feedback can be different performance criteria such as Objective Function Value (OFV), Diversity of Solutions (DOS), CT, and Depth of the Local Optima (DLP).

Performance criteria should be collected by the selection mechanism itself and not by ALNS. So a selection scheme could keep a history of costs, diversity values, etc. This can be collected during SelectionScheme.update, when the internal state variables of the selection mechanism are updated and feedback is collected. Because the performance criteria mainly relate to solutions, we should probably pass best, curr, cand to SelectionScheme.update.

Reward computation – Once the performance criteria are identified, this step computes how much the application of each operator improves/deteriorates the performance criteria.

Reward computation should also be managed by the selection mechanism. This is again something that should be done in SelectionScheme.update. It will again most likely be based on best, curr, cand; e.g., the reward can be defined as the gain/decrease in objective value.
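
A minimal sketch of what that could look like inside update, assuming best, curr, cand expose an objective() method like alns solution states do (the class name and the _rewards bookkeeping are purely illustrative):

from collections import defaultdict


class GainBasedScheme:
    """
    Illustrative scheme that records, per (destroy, repair) pair, how much
    each application improved on the current solution.
    """

    def __init__(self):
        self._rewards = defaultdict(list)  # (d_idx, r_idx) -> list of rewards

    def update(self, d_idx, r_idx, s_idx, best, curr, cand):
        # Reward: objective gain of the candidate over the current solution
        # (positive when the candidate is better). Other performance criteria,
        # e.g. solution diversity, could be collected here in the same way.
        reward = curr.objective() - cand.objective()
        self._rewards[d_idx, r_idx].append(reward)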

Credit assignment (CA) – In this step, a credit is assigned to an operator based on the rewards calculated in the reward computation step. There are different credit assignment methods, including Score-based CA (SCA) [Pen+19], Q-Learning based CA (QLCA) [Wau+13], Compass CA (CCA) [Mat+09], Learning Automata based CA (LACA) [GLL18], Average CA (ACA) [Fia10], and Extreme Value-based CA (EVCA) [Fia10], presented in Table 5.

We currently have a simple weight scheme that is updated based on the 4 outcomes: best, better, acceptance, rejection. This is now baked into ALNS, but it's probably better to let the selection mechanism deal with this. Not sure though, because the outcomes are inherent to ALNS as well.

I'm not entirely sure why reward computation and credit assignment are separated. But this is probably similar to how SegmentedRouletteWheel works: the rewards (i.e., outcomes) are gathered over a segment of iterations. But the credit (i.e., score) is only assigned once the segment has finished. I think this is also the case for Q-learning, which uses episodes (i.e., segments) and then updates the Q-table after each episode.
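
To make that separation concrete, a small illustrative sketch (all names and parameters are assumptions, not the existing SegmentedRouletteWheel): rewards are accumulated every iteration, while credits are only recomputed when a segment ends.

class SegmentedCredit:
    """Gathers rewards every iteration; assigns credit once per segment."""

    def __init__(self, num_ops, seg_length=100, decay=0.8):
        self.credits = [1.0] * num_ops       # credits used by the selection step
        self.seg_rewards = [0.0] * num_ops   # rewards gathered in this segment
        self.seg_length = seg_length
        self.decay = decay
        self.iteration = 0

    def record(self, op_idx, reward):
        # Reward computation: every iteration.
        self.seg_rewards[op_idx] += reward
        self.iteration += 1

        if self.iteration % self.seg_length == 0:
            # Credit assignment: only at the end of a segment, smoothing the old
            # credit with the rewards gathered since the last segment boundary.
            for idx, seg_reward in enumerate(self.seg_rewards):
                self.credits[idx] = self.decay * self.credits[idx] + (1 - self.decay) * seg_reward
            self.seg_rewards = [0.0] * len(self.credits)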

Selection – Once a credit has been assigned to each operator, AOS selects the operator to apply in the next iteration. Different selection methods including Random selection (RS), Max-Credit Selection (MCS), Roulette-wheel Selection (RWS) [GLL18], Probability Matching Selection (PMS) [Fia+08], Adaptive Pursuit Selection (APS) [Fia+08], Soft-Max Selection (SMS) [GB17], Upper Confidence Bound Multi-Armed Bandit Selection (UCB-MABS) [Fia+08], Dynamic Multi-Armed Bandit Selection (D-MABS) [Mat+09], Epsilon Greedy Selection (EGS) [San+14], and Heuristic-based Selection (HS) [CWL18] are presented in Table 6.

This is precisely what SelectionScheme.__call__ should do: select the best destroy/repair operator based on operator credits. Currently, the selection is only based on the scores that are updated internally. But it would probably be useful to also pass in the current state information best, curr. There is no candidate solution yet, so that's not needed.
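
For illustration, a standalone roulette-wheel helper over operator credits that returns index pairs, similar in spirit to what __call__ does now (the function name and arguments are assumptions):

import numpy as np


def select_indices(d_credits, r_credits, rnd):
    """Selects destroy and repair indices with probability proportional to credit."""
    d_idx = rnd.choice(len(d_credits), p=d_credits / d_credits.sum())
    r_idx = rnd.choice(len(r_credits), p=r_credits / r_credits.sum())
    return int(d_idx), int(r_idx)


rnd = np.random.RandomState(seed=1)
print(select_indices(np.array([3.0, 1.0]), np.array([1.0, 1.0, 2.0]), rnd))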


Essentially, a selection mechanism consists of two things: 1) keeping track of operator scores and 2) selecting operators.

ALNS interacts with a selection scheme in two ways: 1) ALNS provides relevant information (possible actions, current states) to the selection mechanism, which then returns the selected operator. 2) ALNS provides relevant information about the ruin-and-recreate iteration (current states, the outcome), and the selection mechanism gets updated but doesn't need to return anything.


leonlan commented on July 17, 2024

For SelectionScheme.__call__ to select operators, I like the current approach of using indices to determine which destroy/repair operator to select. So I think we only need to add best and curr as arguments to cover most things:

def __call__(self, best: State, curr: State, rnd: RandomState) -> Tuple[int, int]:

For SelectionScheme.update, we need the selected destroy/repair indices, the outcome index, and also best, curr, cand:

    def update(
        self,
        d_idx: int,
        r_idx: int,
        s_idx: int,
        best: State,
        curr: State,
        cand: State,
    ):

The reason we still need the outcome index is that the selection schemes also interact with the acceptance criteria. I also think that ALNS should still keep track of the outcomes, since they are a useful statistic to have.
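
Putting the two signatures together, a sketch of how the proposed interface could look (assuming State can be imported from alns.State; this just restates the proposal above and is not settled API):

from abc import ABC, abstractmethod
from typing import Tuple

from numpy.random import RandomState

from alns.State import State


class SelectionScheme(ABC):
    """Sketch of the proposed selection scheme interface."""

    @abstractmethod
    def __call__(self, best: State, curr: State, rnd: RandomState) -> Tuple[int, int]:
        """Returns the (destroy, repair) operator indices to apply next."""

    @abstractmethod
    def update(self, d_idx: int, r_idx: int, s_idx: int, best: State, curr: State, cand: State):
        """Processes the outcome of the last iteration; returns nothing."""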


N-Wouda commented on July 17, 2024

def __call__(self, best: State, curr: State, rnd: RandomState) -> Tuple[int, int]:

To match the acceptance and stopping criteria, I propose switching the argument order to (self, rnd, best, current). Otherwise looks good to me.

For SelectionScheme.update, we need [..] best, curr, cand:

I think we can limit this further to just the observed candidate solution: if needed, the others can be derived from s_idx and the previous __call__. Something like (self, cand, d_idx, r_idx, s_idx)?


leonlan commented on July 17, 2024

To match the acceptance and stopping criteria, I propose switching the argument order to (self, rnd, best, current). Otherwise looks good to me.

Sounds good.

I think we can limit this further to just the observed candidate solution: if needed, the others can be derived from s_idx and the previous __call__. Something like (self, cand, d_idx, r_idx, s_idx)?

I'm not 100% convinced about this yet, but I'll try implementing it and see how it works. One benefit of providing all states is that it makes computing the objective difference (cand vs. best or cand vs. curr) easier, just like we do in the acceptance criteria.

Should we create an IntEnum for the evaluation outcome, and pass that to update, rather than a meaningless integer s_idx?

I'm thinking of

import enum

class Outcome(enum.IntEnum):
    BEST = 0
    BETTER = 1
    ACCEPTED = 2
    REJECTED = 3

Works for me!


leonlan commented on July 17, 2024

To match the acceptance and stopping criteria, I propose switching the argument order to (self, rnd, best, current). Otherwise looks good to me.

Some function signatures take rnd as last positional argument (e.g., the operators, on_best), whereas others take it as the first one (e.g., the acceptance/stopping criteria). Should we make this consistent everywhere?


N-Wouda commented on July 17, 2024

Some function signatures take rnd as last positional argument (e.g., the operators, on_best)

I'm leaning no. What we're changing now is mostly internal to the package: while other people can, in theory, write their own acceptance criteria or operator selection scheme, from what I have seen, very few people actually do.

Changing the signature of operators and callbacks, however, will directly impact users. We should probably not do that.


leonlan commented on July 17, 2024

TODOs:

  • Rename weights to select, issue deprecation warnings for weights, rename simple weights to roulette wheel
  • Introduce new interface for selection schemes (as discussed above)
  • Update README and notebooks for new selection schemes interface


N-Wouda commented on July 17, 2024

@leonlan can this issue now be closed? The notebooks are already OK, and the readme no longer explicitly links to anything (but points to the docs instead).

