
Comments (13)

N-Wouda commented on July 17, 2024

Should we create an IntEnum for the evaluation outcome, and pass that to update, rather than a meaningless integer s_idx?

I'm thinking of

import enum

class Outcome(enum.IntEnum):
    BEST = 0
    BETTER = 1
    ACCEPTED = 2
    REJECTED = 3


N-Wouda commented on July 17, 2024

If/when we do this, we should probably also implement more options from the literature. That would also be a good check on whether the interface for adaptive schemes is sufficiently general.


leonlan commented on July 17, 2024

Stochastic universal sampling is an alternative to the roulette wheel mechanism. It's used in one ALNS paper (Chowdhury et al., 2019).
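
For reference, a minimal standalone sketch of stochastic universal sampling over operator weights (the weights and number of draws below are made up for illustration; this is not tied to the current alns interface):

import numpy as np


def sus(weights, num_draws, rnd):
    """
    Stochastic universal sampling: instead of spinning a roulette wheel
    num_draws times, place num_draws equally spaced pointers on the
    cumulative-weight wheel using a single random offset. This lowers the
    variance of the selection compared to repeated roulette wheel spins.
    """
    cumulative = np.cumsum(weights)
    spacing = cumulative[-1] / num_draws
    start = rnd.uniform(0, spacing)
    pointers = start + spacing * np.arange(num_draws)
    return np.searchsorted(cumulative, pointers)  # selected operator indices


rnd = np.random.RandomState(seed=1)
print(sus(np.array([5.0, 1.0, 3.0, 1.0]), num_draws=4, rnd=rnd))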


leonlan commented on July 17, 2024

Reinforcement learning (Sutton and Barto, 2018) could provide a framework for adaptive operator selection. For example, multi-armed bandits, which are already mentioned in #73.
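
As an illustration of the bandit view, a minimal epsilon-greedy sketch (the class name, reward bookkeeping, and parameters are hypothetical, not part of alns):

import numpy as np


class EpsilonGreedy:
    """
    Treats each operator as a bandit arm: with probability eps an operator
    is selected uniformly at random (exploration); otherwise the operator
    with the highest average observed reward is selected (exploitation).
    """

    def __init__(self, num_ops, eps=0.1):
        self.eps = eps
        self.counts = np.zeros(num_ops)      # times each operator was applied
        self.avg_reward = np.zeros(num_ops)  # running mean reward per operator

    def select(self, rnd):
        if rnd.rand() < self.eps:
            return rnd.randint(len(self.counts))  # explore
        return int(np.argmax(self.avg_reward))    # exploit

    def update(self, op_idx, reward):
        self.counts[op_idx] += 1
        # Incremental update of the running mean reward for this operator.
        self.avg_reward[op_idx] += (reward - self.avg_reward[op_idx]) / self.counts[op_idx]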


leonlan commented on July 17, 2024

The dissertation by Karimi Mamaghan (2022) may be helpful for our changes related to adaptive operator selection in ALNS.

  • In Section 3.5.1, the adaptive operator selection problem is described as a sequence of five steps: 1) performance criteria identification, 2) reward computation, 3) credit assignment, 4) selection, 5) move acceptance. Step 5) is already covered by our implementation of acceptance criteria.
  • Q-learning is the mechanism employed for operator selection. It is described in Section 4.1. The main idea is that we have a set of states $S$ (e.g., did we improve last time or not), a set of actions $A$ (e.g., which operator to select) and a reward function $R$ (e.g., how well did the selected operators perform).
  • Karimi-Mamaghan et al. (2023) use Q-learning as the operator selection mechanism in a large neighborhood search heuristic with $d$ destroy operators and a single repair operator; a minimal sketch of the idea follows after this list.
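
For concreteness, a minimal tabular Q-learning sketch along these lines (the two-state encoding, the number of operators, and the learning parameters are illustrative assumptions, not the setup of Karimi-Mamaghan et al. (2023)):

import numpy as np

# States: 0 = the last candidate did not improve the current solution, 1 = it did.
# Actions: index of the operator to apply next.
NUM_STATES, NUM_OPS = 2, 4
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1  # learning rate, discount factor, exploration rate

Q = np.zeros((NUM_STATES, NUM_OPS))


def select_operator(state, rnd):
    if rnd.rand() < EPS:
        return rnd.randint(NUM_OPS)   # explore
    return int(np.argmax(Q[state]))   # exploit the learned Q-values


def update_q(state, action, reward, next_state):
    # One-step Q-learning update: move Q(s, a) towards the observed reward
    # plus the discounted value of the best action in the next state.
    Q[state, action] += ALPHA * (reward + GAMMA * Q[next_state].max() - Q[state, action])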


leonlan commented on July 17, 2024

On adaptive operator selection, from Karimi Mamaghan (2022, pp. 64-65). For each step, I'll comment on how this could relate to our new implementation.

Performance criteria identification – Whenever an operator is selected and applied to a problem instance, a set of feedback information can be collected that represents the performance of the algorithm. This feedback can be different performance criteria such as Objective Function Value (OFV), Diversity of Solutions (DOS), CT, and Depth of the Local Optima (DLP).

Performance criteria should be collected by the selection mechanism itself and not by ALNS. So a selection scheme could keep a history of costs, diversity values, etc. This can be collected during SelectionScheme.update, when the internal state variables of the selection mechanism are updated and feedback is collected. Because the performance criteria mainly relate to solutions, we should probably pass best, curr, cand to SelectionScheme.update.

Reward computation – Once the performance criteria are identified, this step computes how much the application of each operator improves/deteriorates the performance criteria.

Reward computation should also be managed by the selection mechanism. This is again something that should be done in SelectionScheme.update. It will again most likely be based on best, curr, cand; e.g., the reward can be defined as the gain/decrease in objective value.
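
A minimal sketch of what that could look like inside update, assuming best, curr, cand expose an objective() method like alns solution states do (the class name and the _rewards bookkeeping are purely illustrative):

from collections import defaultdict


class GainBasedScheme:
    """
    Illustrative scheme that records, per (destroy, repair) pair, how much
    each application improved on the current solution.
    """

    def __init__(self):
        self._rewards = defaultdict(list)  # (d_idx, r_idx) -> list of rewards

    def update(self, d_idx, r_idx, s_idx, best, curr, cand):
        # Reward: objective gain of the candidate over the current solution
        # (positive when the candidate is better). Other performance criteria,
        # e.g. solution diversity, could be collected here in the same way.
        reward = curr.objective() - cand.objective()
        self._rewards[d_idx, r_idx].append(reward)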

Credit assignment (CA) – In this step, a credit is assigned to an operator based on the rewards calculated in the reward computation step. There are different credit assignment methods, including Score-based CA (SCA) [Pen+19], Q-Learning based CA (QLCA) [Wau+13], Compass CA (CCA) [Mat+09], Learning Automata based CA (LACA) [GLL18], Average CA (ACA) [Fia10], and Extreme Value-based CA (EVCA) [Fia10], presented in Table 5.

We currently have a simple weight scheme that is updated based on the 4 outcomes: best, better, acceptance, rejection. This is now baked into ALNS, but it's probably better to let the selection mechanism deal with this. Not sure though, because the outcomes are inherent to ALNS as well.

I'm not entirely sure why reward computation and credit assignment are separated. But this is probably similar to how SegmentedRouletteWheel works: the rewards (i.e., outcomes) are gathered over a segment of iterations. But the credit (i.e., score) is only assigned once the segment has finished. I think this is also the case for Q-learning, which uses episodes (i.e., segments) and then updates the Q-table after each episode.
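
To make that separation concrete, a small illustrative sketch (all names and parameters are assumptions, not the existing SegmentedRouletteWheel): rewards are accumulated every iteration, while credits are only recomputed when a segment ends.

class SegmentedCredit:
    """Gathers rewards every iteration; assigns credit once per segment."""

    def __init__(self, num_ops, seg_length=100, decay=0.8):
        self.credits = [1.0] * num_ops       # credits used by the selection step
        self.seg_rewards = [0.0] * num_ops   # rewards gathered in this segment
        self.seg_length = seg_length
        self.decay = decay
        self.iteration = 0

    def record(self, op_idx, reward):
        # Reward computation: every iteration.
        self.seg_rewards[op_idx] += reward
        self.iteration += 1

        if self.iteration % self.seg_length == 0:
            # Credit assignment: only at the end of a segment, smoothing the old
            # credit with the rewards gathered since the last segment boundary.
            for idx, seg_reward in enumerate(self.seg_rewards):
                self.credits[idx] = self.decay * self.credits[idx] + (1 - self.decay) * seg_reward
            self.seg_rewards = [0.0] * len(self.credits)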

Selection – Once a credit has been assigned to each operator, AOS selects the operator to apply in the next iteration. Different selection methods including Random selection (RS), Max-Credit Selection (MCS), Roulette-wheel Selection (RWS) [GLL18], Probability Matching Selection (PMS) [Fia+08], Adaptive Pursuit Selection (APS) [Fia+08], Soft-Max Selection (SMS) [GB17], Upper Confidence Bound Multi-Armed Bandit Selection (UCB-MABS) [Fia+08], Dynamic Multi-Armed Bandit Selection (D-MABS) [Mat+09], Epsilon Greedy Selection (EGS) [San+14], and Heuristic-based Selection (HS) [CWL18] are presented in Table 6.

This is precisely what SelectionScheme.__call__ should do: select the best destroy/repair operator based on operator credits. Currently, the selection is only based on the scores that are updated internally. But it would probably be useful to also pass in the current state information best, curr. There is no candidate solution yet, so that's not needed.
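
For illustration, a standalone roulette-wheel helper over operator credits that returns index pairs, similar in spirit to what __call__ does now (the function name and arguments are assumptions):

import numpy as np


def select_indices(d_credits, r_credits, rnd):
    """Selects destroy and repair indices with probability proportional to credit."""
    d_idx = rnd.choice(len(d_credits), p=d_credits / d_credits.sum())
    r_idx = rnd.choice(len(r_credits), p=r_credits / r_credits.sum())
    return int(d_idx), int(r_idx)


rnd = np.random.RandomState(seed=1)
print(select_indices(np.array([3.0, 1.0]), np.array([1.0, 1.0, 2.0]), rnd))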


Essentially, a selection mechanism consists of two things: 1) keeping track of operator scores and 2) selecting operators.

ALNS interacts with a selection scheme in two ways: 1) ALNS provides relevant information (possible actions, current states) to the selection mechanism, which then returns the selected operator. 2) ALNS provides relevant information about the ruin-and-recreate iteration (current states, the outcome), and the selection mechanism gets updated but doesn't need to return anything.


leonlan commented on July 17, 2024

For SelectionScheme.__call__ to select operators, I like the current approach of using indices to determine which destroy/repair operator to select. So I think we only need to add best and curr as arguments to cover most things:

def __call__(self, best: State, curr: State, rnd: RandomState) -> Tuple[int, int]:

For SelectionScheme.update, we need the selected destroy/repair indices, the outcome index, and also best, curr, cand:

    def update(
        self,
        d_idx: int,
        r_idx: int,
        s_idx: int,
        best: State,
        curr: State,
        cand: State,
    ):

The reason we still need the outcome index is that the selection schemes also interact with the acceptance criteria. I also think that ALNS should still keep track of the outcomes, since they are a useful statistic to have.
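
Putting the two signatures together, a sketch of how the proposed interface could look (assuming State can be imported from alns.State; this just restates the proposal above and is not settled API):

from abc import ABC, abstractmethod
from typing import Tuple

from numpy.random import RandomState

from alns.State import State


class SelectionScheme(ABC):
    """Sketch of the proposed selection scheme interface."""

    @abstractmethod
    def __call__(self, best: State, curr: State, rnd: RandomState) -> Tuple[int, int]:
        """Returns the (destroy, repair) operator indices to apply next."""

    @abstractmethod
    def update(self, d_idx: int, r_idx: int, s_idx: int, best: State, curr: State, cand: State):
        """Processes the outcome of the last iteration; returns nothing."""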


N-Wouda commented on July 17, 2024

def __call__(self, best: State, curr: State, rnd: RandomState) -> Tuple[int, int]:

To match the acceptance and stopping criteria, I propose switching the argument order to (self, rnd, best, current). Otherwise looks good to me.

For SelectionScheme.update, we need [..] best, curr, cand:

I think we can limit this further to just the observed candidate solution: if needed, the others can be derived from s_idx and the previous __call__. Something like (self, cand, d_idx, r_idx, s_idx)?


leonlan commented on July 17, 2024

To match the acceptance and stopping criteria, I propose switching the argument order to (self, rnd, best, current). Otherwise looks good to me.

Sounds good.

I think we can limit this further to just the observed candidate solution: if needed, the others can be derived from s_idx and the previous __call__. Something like (self, cand, d_idx, r_idx, s_idx)?

I'm not 100% convinced about this yet, but I'll try implementing it and see how it works. One benefit of providing all states is that it makes computing the objective difference (cand vs. best or cand vs. curr) easier, just like we do in the acceptance criteria.

Should we create an IntEnum for the evaluation outcome, and pass that to update, rather than a meaningless integer s_idx?

I'm thinking of

import enum

class Outcome(enum.IntEnum):
    BEST = 0
    BETTER = 1
    ACCEPTED = 2
    REJECTED = 3

Works for me!


leonlan commented on July 17, 2024

To match the acceptance and stopping criteria, I propose switching the argument order to (self, rnd, best, current). Otherwise looks good to me.

Some function signatures take rnd as last positional argument (e.g., the operators, on_best), whereas others take it as the first one (e.g., the acceptance/stopping criteria). Should we make this consistent everywhere?


N-Wouda commented on July 17, 2024

Some function signatures take rnd as last positional argument (e.g., the operators, on_best)

I'm leaning no. What we're changing now is mostly internal to the package: while other people can, in theory, write their own acceptance criteria or operator selection scheme, from what I have seen, very few people actually do.

Changing the signature of operators and callbacks, however, will directly impact users. We should probably not do that.


leonlan commented on July 17, 2024

TODOs:

  • Rename weights to select, issue deprecation warnings for weights, rename simple weights to roulette wheel
  • Introduce new interface for selection schemes (as discussed above)
  • Update README and notebooks for new selection schemes interface


N-Wouda commented on July 17, 2024

@leonlan can this issue now be closed? The notebooks are already OK, and the readme no longer explicitly links to anything (but points to the docs instead).

