Comments (13)
Should we create an `IntEnum` for the evaluation outcome, and pass that to `update`, rather than a meaningless integer `s_idx`? I'm thinking of

```python
import enum


class Outcome(enum.IntEnum):
    BEST = 0
    BETTER = 1
    ACCEPTED = 2
    REJECTED = 3
```
---
If/when we do this, we should probably also implement more options from the literature. That would also be a good check on whether the interface for adaptive schemes is sufficiently general.
---
Stochastic universal sampling is an alternative to the roulette wheel mechanism. It's used in one ALNS paper (Chowdhury et al., 2019).
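A minimal sketch of the idea, to contrast it with repeated roulette-wheel spins: all selections are made with one random offset and equally spaced pointers, so the spread of selected operators closely matches the weights. This is purely illustrative and not the alns API; the weights and `rng` setup are assumptions.

```python
import numpy as np


def sus(weights, num_select, rng):
    """Stochastic universal sampling: draw num_select indices using a
    single random offset and equally spaced pointers over the cumulative
    (non-negative) weights."""
    cum = np.cumsum(weights)
    total = cum[-1]
    step = total / num_select

    start = rng.uniform(0, step)  # one random draw for all selections
    pointers = start + step * np.arange(num_select)

    return np.searchsorted(cum, pointers).tolist()


rng = np.random.default_rng(42)
selected = sus([1.0, 1.0, 8.0], num_select=4, rng=rng)  # heavy operator 2 gets most picks
```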
---
Reinforcement learning (Sutton and Barto, 2018) could provide a framework for adaptive operator selection, for example via multi-armed bandits, which are already mentioned in #73.
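To get a sense of what a bandit-based selection scheme involves, here is a minimal epsilon-greedy sketch over operator indices. The class name, reward definition, and demo loop are all made up for illustration; this is not tied to the alns interface or to the proposal in #73.

```python
import random


class EpsilonGreedy:
    """Minimal epsilon-greedy multi-armed bandit over operator indices."""

    def __init__(self, num_ops, eps=0.1):
        self.eps = eps
        self.counts = [0] * num_ops
        self.values = [0.0] * num_ops  # running mean reward per operator

    def select(self, rng):
        if rng.random() < self.eps:  # explore
            return rng.randrange(len(self.values))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, op_idx, reward):
        self.counts[op_idx] += 1
        n = self.counts[op_idx]
        self.values[op_idx] += (reward - self.values[op_idx]) / n


rng = random.Random(1)
bandit = EpsilonGreedy(num_ops=3)
for _ in range(200):
    op = bandit.select(rng)
    bandit.update(op, reward=1.0 if op == 2 else 0.0)  # pretend operator 2 is best
```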
---
The dissertation by Karimi Mamaghan (2022) may be helpful for our changes related to adaptive operator selection in ALNS.
- In Section 3.5.1, the adaptive operator selection problem is described as a sequence of five steps: 1) performance criteria identification, 2) reward computation, 3) credit assignment, 4) selection, 5) move acceptance. Step 5) is already covered by our implementation of acceptance criteria.
- Q-learning is the mechanism employed for operator selection. It is described in Section 4.1. The main idea is that we have a set of states $S$ (e.g., did we improve last time or not?), a set of actions $A$ (e.g., which operator to select), and a reward function $R$ (e.g., how well did the selected operators perform).
- Karimi-Mamaghan et al. (2023) use Q-learning as the operator selection mechanism in a large neighborhood search heuristic with $d$ destroy operators and a single repair operator.
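To make the $S$/$A$/$R$ description concrete, a tabular Q-learning update for operator selection might look like this. All names here are assumptions for illustration; this is a sketch of the standard algorithm, not code from the dissertation or from alns.

```python
import random


class QLearningSelector:
    """Tabular Q-learning over (state, operator) pairs: states encode
    recent search progress, actions are operator indices."""

    def __init__(self, num_states, num_ops, alpha=0.1, gamma=0.9, eps=0.1):
        self.q = [[0.0] * num_ops for _ in range(num_states)]
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def select(self, state, rng):
        if rng.random() < self.eps:  # explore
            return rng.randrange(len(self.q[state]))
        row = self.q[state]
        return max(range(len(row)), key=row.__getitem__)

    def update(self, state, op_idx, reward, next_state):
        # Standard Q-learning update:
        # Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
        best_next = max(self.q[next_state])
        td = reward + self.gamma * best_next - self.q[state][op_idx]
        self.q[state][op_idx] += self.alpha * td


rng = random.Random(0)
ql = QLearningSelector(num_states=2, num_ops=3)
op = ql.select(state=0, rng=rng)
ql.update(state=0, op_idx=op, reward=1.0, next_state=1)
```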
---
On adaptive operator selection, from Karimi Mamaghan (2022, pp. 64-65). For each step, I'll comment on how this could relate to our new implementation.
> Performance criteria identification – Whenever an operator is selected and applied to a problem instance, a set of feedback information can be collected that represents the performance of the algorithm. This feedback can be different performance criteria such as Objective Function Value (OFV), Diversity of Solutions (DOS), CT, and Depth of the Local Optima (DLP).
Performance criteria should be collected by the selection mechanism itself, not by ALNS. So a selection scheme could keep a history of costs, diversity values, etc. This can be collected during `SelectionScheme.update`, when any internal state variables of the selection mechanism are updated and feedback is collected. Because the performance criteria mainly relate to solutions, we should probably pass `best`, `curr`, and `cand` to `SelectionScheme.update`.
> Reward computation – Once the performance criteria are identified, this step computes how much the application of each operator improves/deteriorates the performance criteria.
Reward computation should also be managed by the selection mechanism. This is again something that should be done in `SelectionScheme.update`. It will again most likely be based on `best`, `curr`, and `cand`; e.g., the reward can be defined as the objective function gain/decrease.
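The two steps so far (collecting feedback and computing rewards inside `update`) could look roughly like the following. `HistoryScheme` is a hypothetical name, and I assume states expose an `objective()` method as in alns.

```python
class HistoryScheme:
    """Hypothetical sketch: the scheme itself collects performance
    criteria and computes rewards inside update()."""

    def __init__(self):
        self.costs = []    # performance criterion: candidate objective values
        self.rewards = []  # reward: objective gain of cand over curr

    def update(self, d_idx, r_idx, s_idx, best, curr, cand):
        self.costs.append(cand.objective())
        # Positive reward when the candidate improves on the current solution.
        self.rewards.append(curr.objective() - cand.objective())
```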
> Credit assignment (CA) – In this step, a credit is assigned to an operator based on the rewards calculated in the reward computation step. There are different credit assignment methods including Score-based CA (SCA) [Pen+19], Q-Learning based CA (QLCA) [Wau+13], Compass CA (CCA) [Mat+09], Learning Automata based CA (LACA) [GLL18], Average CA (ACA) [Fia10], and Extreme Value-based CA (EVCA) [Fia10], presented in Table 5.
We currently have a simple weight scheme that is updated based on the four outcomes: best, better, accepted, rejected. This is now baked into ALNS, but it's probably better to let the selection mechanism deal with this. I'm not sure, though, because the outcomes are inherent to ALNS as well.
I'm not entirely sure why reward computation and credit assignment are separated. But it is probably similar to how `SegmentedRouletteWheel` works: the rewards (i.e., outcomes) are gathered over a segment of iterations, but the credit (i.e., score) is only assigned once the segment has finished. I think this is also the case for Q-learning, which uses episodes (i.e., segments) and then updates the Q-table after each episode.
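A sketch of that segmented idea: rewards accumulate per operator during a segment, and credit (the weight used for selection) is only reassigned when the segment ends. The class name, segment length, decay, and score values are illustrative assumptions, not the actual `SegmentedRouletteWheel` implementation.

```python
class SegmentedCredit:
    """Illustrative segment-based credit assignment."""

    def __init__(self, num_ops, seg_length=100, decay=0.8,
                 scores=(5.0, 2.0, 1.0, 0.0)):
        self.scores = scores  # reward per outcome: best, better, accepted, rejected
        self.seg_length = seg_length
        self.decay = decay
        self.weights = [1.0] * num_ops      # credit used for selection
        self.seg_rewards = [0.0] * num_ops  # rewards gathered this segment
        self.iters = 0

    def update(self, op_idx, outcome):
        self.seg_rewards[op_idx] += self.scores[outcome]
        self.iters += 1

        if self.iters % self.seg_length == 0:
            # Segment finished: only now is credit assigned.
            for op, reward in enumerate(self.seg_rewards):
                self.weights[op] = (self.decay * self.weights[op]
                                    + (1 - self.decay) * reward)
            self.seg_rewards = [0.0] * len(self.weights)
```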
> Selection – Once a credit has been assigned to each operator, AOS selects the operator to apply in the next iteration. Different selection methods including Random selection (RS), Max-Credit Selection (MCS), Roulette-wheel Selection (RWS) [GLL18], Probability Matching Selection (PMS) [Fia+08], Adaptive Pursuit Selection (APS) [Fia+08], Soft-Max Selection (SMS) [GB17], Upper Confidence Bound Multi-Armed Bandit Selection (UCB-MABS) [Fia+08], Dynamic Multi-Armed Bandit Selection (D-MABS) [Mat+09], Epsilon Greedy Selection (EGS) [San+14], and Heuristic-based Selection (HS) [CWL18] are presented in Table 6.
This is precisely what `SelectionScheme.__call__` should do: select the best destroy/repair operator based on operator credits. Currently, the selection is based only on the scores that are updated internally. But it would probably be useful to also pass in the current state information `best` and `curr`. There is no candidate yet, so that's not needed.
Essentially, a selection mechanism consists of two things: 1) keeping track of operator scores, and 2) selecting operators.
ALNS interacts with a selection scheme in two ways: 1) ALNS provides relevant information (possible actions, current states) to the selection mechanism, which then returns the selected operator; 2) ALNS provides relevant information about the ruin-and-recreate iteration (current states, the outcome), and the selection mechanism gets updated but doesn't need to return anything else.
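Those two interactions could be sketched as a single iteration. Everything here — the operator lists, the `accept` callable returning an outcome index — is assumed for illustration; this is not the actual ALNS loop.

```python
def iterate(scheme, destroy_ops, repair_ops, accept, best, curr, rng):
    # 1) ALNS asks the scheme which operators to apply.
    d_idx, r_idx = scheme(best, curr, rng)

    destroyed = destroy_ops[d_idx](curr, rng)
    cand = repair_ops[r_idx](destroyed, rng)

    # Outcome index (best/better/accepted/rejected) of this iteration.
    s_idx = accept(best, curr, cand, rng)

    # 2) ALNS feeds the result back; the scheme updates internally.
    scheme.update(d_idx, r_idx, s_idx, best, curr, cand)
    return cand, s_idx
```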
---
For `SelectionScheme.__call__` to select operators, I like the current approach of using indices to determine which destroy/repair operator to select. So I think we only need to add `best` and `curr` as arguments to cover most things:

```python
def __call__(self, best: State, curr: State, rnd: RandomState) -> Tuple[int, int]:
```
For `SelectionScheme.update`, we need the selected destroy/repair indices, the outcome index, and also `best`, `curr`, and `cand`:

```python
def update(
    self,
    d_idx: int,
    r_idx: int,
    s_idx: int,
    best: State,
    curr: State,
    cand: State,
):
```
The reason we still need the outcome index is that the selection schemes also interact with acceptance criteria. I also think that ALNS should still keep track of the outcomes, since they're a useful statistic to have.
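Putting the proposed signatures together, a minimal scheme could look like the following. The class name and per-outcome scores are made up for illustration, and I drop the type annotations so the sketch is self-contained (the `State` arguments are just passed through).

```python
import numpy as np


class ProportionalScheme:
    """Hypothetical scheme implementing the proposed interface:
    __call__(best, curr, rnd) -> (d_idx, r_idx) and
    update(d_idx, r_idx, s_idx, best, curr, cand)."""

    def __init__(self, num_destroy, num_repair, scores=(5.0, 2.0, 1.0, 0.5)):
        self.scores = scores  # illustrative reward per outcome
        self.d_weights = np.ones(num_destroy)
        self.r_weights = np.ones(num_repair)

    def __call__(self, best, curr, rnd):
        # Select proportionally to the current weights.
        d_idx = rnd.choice(len(self.d_weights),
                           p=self.d_weights / self.d_weights.sum())
        r_idx = rnd.choice(len(self.r_weights),
                           p=self.r_weights / self.r_weights.sum())
        return d_idx, r_idx

    def update(self, d_idx, r_idx, s_idx, best, curr, cand):
        # Reward the applied operator pair according to the outcome.
        self.d_weights[d_idx] += self.scores[s_idx]
        self.r_weights[r_idx] += self.scores[s_idx]


rnd = np.random.RandomState(seed=0)
scheme = ProportionalScheme(num_destroy=2, num_repair=2)
d_idx, r_idx = scheme(None, None, rnd)
scheme.update(d_idx, r_idx, 0, None, None, None)  # outcome 0: new best
```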
---
> def __call__(self, best: State, curr: State, rnd: RandomState) -> Tuple[int, int]:

To match the acceptance and stopping criteria, I propose switching the argument order to `(self, rnd, best, current)`. Otherwise looks good to me.
> For `SelectionScheme.update`, we need [..] `best`, `curr`, `cand`:

I think we can limit this further to just the observed candidate solution: if needed, the others can be derived from `s_idx` and the previous `__call__`. Something like `(self, cand, d_idx, r_idx, s_idx)`?
---
> To match the acceptance and stopping criteria, I propose switching the argument order to `(self, rnd, best, current)`. Otherwise looks good to me.

Sounds good.
> I think we can limit this further to just the observed candidate solution: if needed, the others can be derived from `s_idx` and the previous `__call__`. Something like `(self, cand, d_idx, r_idx, s_idx)`?

I'm not 100% convinced about this yet, but I'll try implementing it and see how it works. One benefit of providing all states is that it makes computing the objective difference (`cand` vs. `best`, or `cand` vs. `curr`) easier, just like we do in the acceptance criteria.
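For completeness, the reduced-signature idea could be sketched like this: `__call__` caches the states it receives, so `update` only needs the candidate. Hypothetical, and it assumes an `objective()` method on states as in alns.

```python
class CachingScheme:
    """Sketch of the reduced signature: states are remembered in
    __call__, so update() only receives the candidate."""

    def __call__(self, best, curr, rnd):
        self._best, self._curr = best, curr  # cache for the next update()
        return 0, 0  # operator selection elided in this sketch

    def update(self, cand, d_idx, r_idx, s_idx):
        # Objective differences remain computable via the cached states.
        self.delta = cand.objective() - self._curr.objective()
```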
> Should we create an `IntEnum` for the evaluation outcome, and pass that to `update`, rather than a meaningless integer `s_idx`? I'm thinking of
>
> ```python
> import enum
>
> class Outcome(enum.IntEnum):
>     BEST = 0
>     BETTER = 1
>     ACCEPTED = 2
>     REJECTED = 3
> ```

Works for me!
---
> To match the acceptance and stopping criteria, I propose switching the argument order to `(self, rnd, best, current)`. Otherwise looks good to me.

Some function signatures take `rnd` as the last positional argument (e.g., the operators, `on_best`), whereas others take it as the first (e.g., the acceptance/stopping criteria). Should we make this consistent everywhere?
---
> Some function signatures take `rnd` as the last positional argument (e.g., the operators, `on_best`)
I'm leaning no. What we're changing now is mostly internal to the package: while other people can, in theory, write their own acceptance criteria or operator selection scheme, from what I have seen, very few people actually do.
Changing the signature of operators and callbacks, however, will directly impact users. We should probably not do that.
---
TODOs:
- Rename `weights` to `select`, issue deprecation warnings for `weights`, and rename simple weights to roulette wheel
- Introduce new interface for selection schemes (as discussed above)
- Update README and notebooks for new selection schemes interface
---
@leonlan can this issue now be closed? The notebooks are already OK, and the README no longer explicitly links to anything (it points to the docs instead).
---