Interactive Simulated Annealing
Posted on April 7, 2025

The Problem
There exist many problems where finding the local, and in some cases global, minima of a function is desirable - most commonly in optimization problems such as the traveling salesman problem (TSP). Oftentimes, however, computing such a minimum exactly requires traversing the entire state space - this can be extremely difficult (read: NP-hard) to do efficiently. For some classes of problems, though, an approximate solution often suffices. How might we find an approximate solution?
The Metropolis Algorithm
Let us take a break from finding global minima and focus on a closely related problem. Say we wish for a method to sample from some arbitrary probability distribution, \(p(x)\). In many cases we can evaluate \(p(x)\) (at least up to a normalizing constant), but we do not know how to draw samples \(x\) from \(p(x)\). For a (trivial) example, let \[p(x)= \frac{3}{8} e^{-x^2} + \frac{1}{4} e^{-(x - 4)^2} + \frac{1}{4} e^{-(x + 2)^2}\] This is the problem solved by the Metropolis algorithm (Metropolis, Rosenbluth, Rosenbluth, Teller and Teller, 1953), originally proposed in 1953, though many other variations exist.
The idea is to define a Markov chain over possible values of \(x\) such that we asymptotically reach the unique stationary distribution \(p(x)\).
We start with the detailed balance condition representing this requirement:
\[
\begin{align*}
p(x' | x)\ p(x) = p(x | x')\ p(x')
\end{align*}
\]
\[
\frac{p(x' | x)}{p(x | x')} =\frac{p(x')}{p(x)}
\]
We then split this into a proposal step and an acceptance/rejection step, where \(q(x' | x)\) represents the conditional probability of proposing \(x'\) given the current state \(x\), and \(A(x',x)\) is the probability of accepting the proposed state
\[
p(x' | x) = q(x' | x)\ A(x',x)
\]
Thus
\[
A(x',x)=\min\left( 1, \frac{p(x')}{p(x)} \times \frac{q(x | x')}{q(x' | x)} \right)
\]
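As a quick sanity check, the acceptance rule above can be verified numerically against the detailed balance condition. This is a sketch; the function names and the Gaussian choice of \(q\) are my own illustrative assumptions, not part of the original derivation:

```python
import math

# Unnormalized target density from the example above
def p(x):
    return (3/8)*math.exp(-x**2) + (1/4)*math.exp(-(x - 4)**2) + (1/4)*math.exp(-(x + 2)**2)

# A symmetric Gaussian proposal density q(x_to | x_from)
def q(x_to, x_from, sigma=1.0):
    return math.exp(-(x_to - x_from)**2 / (2*sigma**2)) / (sigma*math.sqrt(2*math.pi))

# Acceptance probability A(x', x) as defined above
def A(x_new, x_old):
    return min(1.0, (p(x_new) / p(x_old)) * (q(x_old, x_new) / q(x_new, x_old)))

# Detailed balance: p(x'|x) p(x) should equal p(x|x') p(x')
x, x_prime = 0.3, 1.7
lhs = q(x_prime, x) * A(x_prime, x) * p(x)
rhs = q(x, x_prime) * A(x, x_prime) * p(x_prime)
print(abs(lhs - rhs))  # equal up to floating-point error
```

The equality holds for any pair of states, which is exactly why the chain's stationary distribution is \(p(x)\).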
As an algorithm, it runs in a few relatively simple steps:
- Pick some initial state \(x_0\) (typically random), let \(t=0\);
- Generate a random candidate \(x'\) according to \(q(x'|x_t)\);
- Calculate the acceptance probability \(A(x',x_t)\).
- Accept the state if \(random(0,1) \leq A(x',x_t)\) and set \(x_{t+1}=x'\), otherwise reject it (\(x_{t+1}=x_t\));
- Increment \(t\) and repeat from step 2.
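The steps above can be sketched in Python. The symmetric Gaussian proposal, step size, and sample count here are illustrative choices of mine, not part of the original algorithm statement; note that with a symmetric \(q\) the proposal ratio \(q(x|x')/q(x'|x)\) cancels:

```python
import math
import random

# Unnormalized target from the example; Metropolis only needs p(x)
# up to a constant, so the missing normalizer is harmless.
def p(x):
    return (3/8)*math.exp(-x**2) + (1/4)*math.exp(-(x - 4)**2) + (1/4)*math.exp(-(x + 2)**2)

def metropolis(p, x0=0.0, n_samples=50_000, step=1.0, seed=0):
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        x_prime = x + rng.gauss(0.0, step)            # propose x' from q(x'|x)
        if rng.random() <= min(1.0, p(x_prime) / p(x)):
            x = x_prime                               # accept: x_{t+1} = x'
        samples.append(x)                             # reject keeps x_{t+1} = x_t
    return samples

samples = metropolis(p)
print(sum(samples) / len(samples))  # rough sample mean over the three modes
```

After enough iterations the histogram of `samples` traces out the three-bump density above, despite us never computing its normalizing constant.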
Simulated Annealing
It should now hopefully be clearer how we might approach a minimization problem like the one presented at the beginning. Simulated annealing was initially devised to model the physical process of (you guessed it) annealing metals! As such, it revolves around the idea that heating and cooling affect both the temperature and free energy of a given system. We can derive it by modifying our Metropolis algorithm in two key ways:
- introduce a temperature \(T\) and a cooling schedule which decreases \(T\), typically exponentially (other cooling schedules exist, though they may not be optimal);
- introduce an energy function \(E(x)\), which we wish to minimize, in place of the probability function, with \(p(x) \propto e^{-E(x)/T}\), which aligns with the Boltzmann distribution.
Then
\[
A(x',x)= \min \left( 1,\frac{e^{-E(x')/T}}{e^{-E(x)/T}} \right) = \min\left(1, e^{-(E(x')-E(x))/T}\right)
\]
which is exactly the acceptance probability for simulated annealing. Here temperature serves as an analog for risk: at high temperatures, the algorithm is more likely to accept an uphill move to a higher-energy state. As the temperature decreases, the algorithm becomes less and less willing to take an uphill move in the "hope" of finding a new global minimum beyond it, reducing to hill climbing as \(T\to 0\). This also turns out to be a relatively good way (barring some objections) to minimize some arbitrary function - which is really nice if we happen to have a problem which is difficult to solve exactly.
The algorithm for simulated annealing is quite similar to what we've already seen with the Metropolis algorithm (shortened here for simplicity):
- Pick some initial state \(x\ \leftarrow x_0\) (typically random);
- For \(k=0\) through to \(k_{max}\),
- \(T\ \leftarrow\ schedule(k)\)
- \(x'=neighbor(x)\)
- if \(A(x',x) > random(0,1)\)
- \(x\ \leftarrow\ x'\)
- Return \(x\)
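As a sketch, here is the loop above applied to a toy one-dimensional landscape. The energy function, cooling rate, and neighbor step are my own illustrative assumptions; the landscape has a shallow local minimum near \(x \approx 2.1\) and a deeper global minimum near \(x \approx -2.35\):

```python
import math
import random

# Toy energy landscape (illustrative, not from the post):
# local minimum near x = 2.1, global minimum near x = -2.35
def energy(x):
    return 0.1*x**4 - x**2 + 0.5*x

def simulated_annealing(E, x0=0.0, k_max=20_000, T0=5.0, alpha=0.999, seed=0):
    rng = random.Random(seed)
    x = x0
    for k in range(k_max):
        T = T0 * alpha**k                     # exponential cooling schedule(k)
        x_prime = x + rng.gauss(0.0, 0.5)     # neighbor(x)
        dE = E(x_prime) - E(x)
        # Always accept downhill moves; accept uphill with prob e^{-dE/T}
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            x = x_prime
    return x

x_min = simulated_annealing(energy)
print(x_min, energy(x_min))
```

Early on, the high temperature lets the walker hop between basins; as \(T\) decays toward zero, the loop degenerates into hill climbing and settles into whichever minimum it last occupied - with a slow enough schedule, usually the global one.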
References
METROPOLIS, Nicholas, ROSENBLUTH, Arianna W., ROSENBLUTH, Marshall N., TELLER, Augusta H. and TELLER, Edward, 1953. Equation of state calculations by fast computing machines. The Journal of Chemical Physics. 1953. Vol. 21, no. 6, p. 1087–1092.