Posted on 2020-05-26
by Oleg Grenrus

While reading about sorting networks for my previous blog post, I noticed the *END* method being mentioned.

*Evolving non-determinism* (END) is an optimization method. It was new to me. (It doesn't have its own Wikipedia page!) I had previously only really used simulated annealing, for finding magic constants for a 32-bit `splitmix` variant. While SA was a good fit for that problem, it's not easily applicable to all problems. END is similarly a "the way nature works" method, and it too is applicable to problems of a specific shape, such that:

- a solution consists of discrete *steps*; there is no gradient, so hill climbing is not applicable
- good first steps are important for the optimality of the solution
- each *partial solution* can be extended into a *complete solution*

In other words, the method works well when finding *a solution* is easy, but finding a rather good or optimal solution is *the problem*. It doesn't work for constraint-solving problems, where finding *a solution* is the problem in itself.

I decided to play with END. It was fun. In this blog post I will show some code, and then some example problems with results.

The method is described in great detail in the tech report *Evolving Non-Determinism: An Inventive and Efficient Tool for Optimization and Discovery of Strategies* by Hugues Juillé: http://demo.cs.brandeis.edu/papers/END_tech_report.pdf

The report is from 1995; surely I can replicate it on late-201x hardware. (Spoiler: not really, but close.)

Luckily, Haskell is a great language for prototyping and modeling. Let us use it to understand how END works.

First, the `Problem` data type.

```
data Problem metric solution = Problem
    { fitness           :: solution -> metric
    , maximumFitness    :: metric
    , completeSolution  :: SMGen -> solution -> solution
    , initialSolution   :: solution
    , commitment        :: Int -> solution -> solution
    , initialCommitment :: Int
    }
```

There are six fields:

- `fitness` takes a `solution` and returns some `metric`. END will then try to find the global maximum (you can use `Down` to search for minimums).
- `maximumFitness` is one of the termination criteria. If we find a solution with `fitness` greater than or equal to it, we stop early. This allows searching for "good enough" results.
- `completeSolution` takes a possibly partial solution and completes it, using the non-determinism provided by a random number generator.
- `initialSolution` is, as the name implies, an initial solution. It doesn't necessarily need to be empty, if we are interested in solutions starting with some known prefix. (We will see later that "prefix" is an abstract notion.)
- `commitment` is a function taking a *commitment degree* and a solution, and truncating that solution to the given degree, for example to *the first N steps*.
- `initialCommitment` is then the initial commitment degree. It should be the "length" of `initialSolution` plus one.

Then, given these data, the algorithm proceeds as follows:

1. Generate a square grid of solutions, starting from `initialSolution` and using `completeSolution`.
2. For each cell on the grid, pick the best solution from its neighborhood based on the `fitness` metric, and replace the cell's solution with it. Thus the best solutions will conquer the area.
3. Truncate the solutions with `commitment`, using the current degree.
4. Regenerate the solutions, and go to step 2.
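The selection sweep of step 2 can be sketched as a standalone toy: a list-of-lists grid and a max-metric (Moore) neighborhood of radius 1. The `selectStep` name, the grid representation, and the `fit = id` toy fitness are all illustrative assumptions, not the post's actual implementation:

```haskell
import Data.List (maximumBy)
import Data.Ord  (comparing)

type Grid a = [[a]]

-- all cells within max-metric distance 1 (including the cell itself)
neighbors :: Grid a -> Int -> Int -> [a]
neighbors g y x =
    [ (g !! y') !! x'
    | dy <- [-1, 0, 1], dx <- [-1, 0, 1]
    , let y' = y + dy, let x' = x + dx
    , y' >= 0, y' < length g
    , x' >= 0, x' < length (head g)
    ]

-- replace every cell with the fittest solution in its neighborhood
selectStep :: Ord m => (a -> m) -> Grid a -> Grid a
selectStep fit g =
    [ [ maximumBy (comparing fit) (neighbors g y x)
      | x <- [0 .. length (head g) - 1] ]
    | y <- [0 .. length g - 1] ]
```

On a 2×2 grid every cell sees every other cell, so one sweep makes the best "solution" conquer the whole grid.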

The algorithm works well when the initial step(s) are important for a good `fitness` outcome. I decided to parameterize the algorithm with type-level `Nat`s:

```
data Config
    (threads      :: Nat)
    (gridside     :: Nat)
    (threshold    :: Nat)
    (eralength    :: Nat)
    (maxsteps     :: Nat)
    (neighborhood :: Type -> Type)
  = Config
```

- `threads` tells how many threads to use to run the algorithm (it's very parallel).
- `gridside` is the length of the side of the grid (easier for calculations).
- `maxsteps` is the absolute maximum number of steps the algorithm will perform.
- `neighborhood` describes how the neighborhood is defined: whether we use the max or the Manhattan metric, and with which radius.
- the remaining `eralength` and `threshold` are related to configuring when the commitment degree is *increased*.

When solutions are compared with each other, the better prefixes start to dominate. However, always truncating to one single step won't produce good outcomes: the algorithm would restart almost from the beginning every time. To narrow down the solution space we increase the commitment degree when the overall population is still diverse, but dominating species are emerging. This way simple 1-step-prefix solutions become more complex 2-step-prefix solutions; simple "species" evolve into more complex "subspecies".

The `eralength` parameter is a fixed number of steps after which the commitment degree is increased. The `threshold` parameter applies to a *disorder* metric of the population: when the population disorder drops below this threshold, we increase the commitment degree.
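The two criteria can be combined into a single predicate. This is only a sketch of my understanding; the name `bumpDegree` and the exact comparisons are my assumption, not taken from the report:

```haskell
-- Decide whether to increase the commitment degree: either the era has
-- run its fixed length, or the population disorder fell to the threshold.
bumpDegree
    :: Int   -- ^ eralength: steps per era
    -> Int   -- ^ threshold: disorder level considered "settled"
    -> Int   -- ^ steps taken in the current era
    -> Int   -- ^ current population disorder
    -> Bool
bumpDegree eralength threshold stepsInEra disorderNow =
    stepsInEra >= eralength || disorderNow <= threshold
```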

The tech report doesn't describe how their disorder metric is calculated, but I think I figured it out.

Count the number of adjacent cells of the grid whose solution prefixes differ. The tech report seems to count each pair twice, but that is irrelevant. This reminds me of the Ising model, where the *energy* of a configuration is calculated similarly. It's not an *entropy*, but it is close enough for our purposes.
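My reading of that metric, as a standalone sketch (the function name and the horizontal-plus-vertical scan are my interpretation of the report):

```haskell
-- Count borders between adjacent differing cells of a grid.
disorder :: Eq a => [[a]] -> Int
disorder g = horizontal + vertical
  where
    -- borders between horizontally adjacent differing cells
    horizontal = length [ () | row <- g, (a, b) <- zip row (tail row), a /= b ]
    -- borders between vertically adjacent differing cells
    vertical   = length [ () | (r1, r2) <- zip g (tail g)
                             , (a, b)   <- zip r1 r2, a /= b ]
```

On the 2×2 grid `[[0,0],[0,1]]` the lone one contributes one horizontal and one vertical border, so the disorder is 2; a uniform grid has disorder 0.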

The figures below are generated using a very simple reference problem. The number of *borders* is the disorder measure: its minimum is zero, when there is only a single "species", and its maximum is double the grid size. You can count the borders around the small islands of ones in the last (step 50) population drawn; there are 14 + 20.

The reference problem: given the numbers 0..9, arrange them so that there is the maximum number of *ascents*. This is a very trivial problem, as the optimal answer is 0123456789 (and 9876543210 is the worst), but it is hard for generic algorithms: naive crossover is difficult, as the resulting solution might be invalid. Also, an algorithm easily ends up finding a suboptimal solution with a single descent, like 1345890267.

The code explains the details. The solution type is `[Int]` and the metric is `Int`:

```
type Solution = [Int]
type Metric = Int
```

The `Problem` fields are defined as follows. The initial solution, initial commitment degree, and maximum fitness are trivial (for ten numbers the maximum number of ascents, `maxN`, is 9):

```
initialSolution   = []   :: Solution
initialCommitment = 1    :: Int
maximumFitness    = maxN :: Metric
```

Commitment is implemented by taking N elements from the end:

```
commitment :: Int -> Solution -> Solution
commitment n = reverse . take n . reverse
```
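As a quick standalone check of that truncation (the worked values are mine, not from the post): the list is stored in reverse, so "the first two steps" are the last two list elements.

```haskell
-- Truncate a reversed-storage solution to the given commitment degree.
commitment :: Int -> [Int] -> [Int]
commitment n = reverse . take n . reverse

-- [5,3,0,1] stored in reverse means the actual step order is 1,0,3,5;
-- degree 2 keeps steps 1,0 - i.e. [0,1] in stored form.
check :: Bool
check = commitment 2 [5,3,0,1] == [0,1]
```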

This is because we use `(:)` to append a value to the "end" of the solution, so the list is stored in reverse. The fitness function calculates the ascents (with `>`, because the order is reversed).

```
fitness :: Solution -> Metric
fitness xs = length $ filter (uncurry (>)) $ pairs xs

pairs :: [a] -> [(a, a)]
pairs xs = zip xs (tail xs)
```
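To double-check the reversed-order convention, here is a standalone re-statement of the same fitness applied to the stored form of the optimal arrangement (the worked example is mine):

```haskell
-- Ascent count on a reversed-storage solution: count adjacent pairs
-- where the earlier element is greater (i.e. an ascent in actual order).
fitness :: [Int] -> Int
fitness xs = length (filter (uncurry (>)) (zip xs (tail xs)))

-- The optimal arrangement 0123456789 is stored as its reverse,
-- and indeed scores the maximum, 9.
optimalStored :: [Int]
optimalStored = [9,8,7,6,5,4,3,2,1,0]
```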

And the most interesting part is the completion of partial solutions. We calculate the set of remaining elements and repeatedly pick one at random, adding it to the solution. (The `bitmaskWithRejection64 n` function returns a random number `0 <= x < n`.)

```
completeSolution :: SM.SMGen -> [Int] -> [Int]
completeSolution g0 xs0 =
    go g0 (fullnodes `Set.difference` Set.fromList xs0)
  where
    go :: SM.SMGen -> Set.Set Int -> [Int]
    go g rest
        | Set.null rest = xs0
        | otherwise     =
            let (w, g') = SM.bitmaskWithRejection64
                    (fromIntegral (Set.size rest)) g
                x = Set.elemAt (fromIntegral w) rest
            in x : go g' (Set.delete x rest)
```
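For a variant that runs without the `splitmix` and `containers` dependencies, here is the same completion logic with a plain list for the remaining elements and a toy LCG standing in for `SMGen`. Both stand-ins are my assumptions for illustration; the post's real code uses `Data.Set` and splitmix's rejection sampling:

```haskell
-- Toy generator state standing in for SM.SMGen.
type Gen = Word

-- Draw a number 0 <= x < n and the next generator state.
-- (Plain `mod` has bias, unlike bitmaskWithRejection64; ignored here.)
draw :: Word -> Gen -> (Word, Gen)
draw n g = (g' `mod` n, g')
  where g' = g * 6364136223846793005 + 1442695040888963407

completeSolution :: Gen -> [Int] -> [Int]
completeSolution g0 xs0 = go g0 (filter (`notElem` xs0) [0 .. 9])
  where
    go _ []   = xs0     -- nothing left: the partial solution is the tail
    go g rest =
        let (w, g') = draw (fromIntegral (length rest)) g
            x       = rest !! fromIntegral w
        in x : go g' (filter (/= x) rest)
```

Completing the empty solution yields a random permutation of 0..9 (in stored, reversed form); completing an already full solution returns it unchanged.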

The evolving non-determinism performs *very well*. After 25 steps the population is dominated by solutions starting with 0 or 1. If we didn't increase the commitment degree, the zeros would slowly dominate the whole grid.

Determining a good disorder threshold for your problem is tricky. I used 15-20% for the problems presented here. At that stage there is still some diversity left, and the threshold is usually reached quickly.

**Step 2, Disorder 234 (45.7%)**

**Step 5, Disorder 186 (36.3%)**

**Step 25, Disorder 110 (21%)**

**Step 50, Disorder 34 (6.7%)**

Sorting networks are the second problem I tried. I have surely missed some low-level bit-trickery detail needed to make the evolving part of the problem *very fast*, but I got it fast enough to find the solutions below.

They are beautiful in their own irregularity. Would you design such? (I'm also not sure whether they are proper sorting networks; I haven't verified them.)

**7 inputs, length 16, depth 6**

or

or

**8 inputs, length 19, depth 6**

The number eight asks for some regularity; it is a power of two, after all. But no, the random number generator disagrees:

or

or

Compare these to the (boring and regular) bitonic sorting network, which also has depth 6 but uses 24 comparators!

**9 inputs, length 25, depth 7**

Nine. Another odd number of inputs opens up the creativity of the machine: