# Evolving non-determinism

Posted on 2020-05-26 by Oleg Grenrus

While reading about sorting networks for my previous blog post, I noticed the END method being mentioned.

Evolving non-determinism (END) is an optimization method. It was new to me. (It doesn't even have its own Wikipedia page!) Previously I had only really used simulated annealing, for finding the magic constants of a 32-bit splitmix variant. While SA was a good fit for that problem, it isn't easily applicable to all problems. END is similarly a "works the way nature works" method, and is likewise applicable to specific kinds of problems, such that:

• a solution consists of discrete steps; there is no gradient, so hill-climbing is not applicable
• good first steps are important for the optimality of the solution
• each partial solution can be extended into a complete solution

In other words, the method works well when finding a solution is easy, but finding a good or optimal one is the hard part. It doesn't work for constraint-solving problems, where finding any solution is the problem in itself.

I decided to play with END. It was fun. In this blog post I will show some code, and then some example problems with results.

The method is described in great detail in the tech report Evolving Non-Determinism: An Inventive and Efficient Tool for Optimization and Discovery of Strategies by Hugues Juillé http://demo.cs.brandeis.edu/papers/END_tech_report.pdf

The report is from 1995. Surely I can replicate it on late-201x hardware. (Spoiler: not really, but close.)

Luckily, Haskell is a great language for prototyping and modeling. Let us use it to understand how END works.

First the Problem data type.

```haskell
data Problem metric solution = Problem
    { fitness           :: solution -> metric
    , maximumFitness    :: metric
    , completeSolution  :: SMGen -> solution -> solution
    , initialSolution   :: solution
    , commitment        :: Int -> solution -> solution
    , initialCommitment :: Int
    }
```

There are six fields:

1. The fitness function takes a solution and returns some metric. END will then try to find the global maximum (you can use Down to search for minimums).

2. maximumFitness is one of the termination criteria. If we find a solution with fitness greater than or equal to it, we stop early. This allows searching for "good enough" results.

3. completeSolution takes a possibly partial solution and completes it using non-determinism provided by random number generator.

4. initialSolution is, as the name implies, an initial solution. It doesn't necessarily need to be empty, if we are interested in solutions starting with some known prefix. (We will see later that "prefix" is an abstract notion.)

5. commitment is a function taking a commitment degree and a solution, and truncating that solution to the given degree, for example to the first N steps.

6. initialCommitment is then the initial commitment degree. It should be the "length" of initialSolution plus one.

Given these data, the algorithm proceeds as follows:

1. Generate a square grid of solutions, each built from initialSolution using completeSolution.

2. Then, for each cell on the grid, pick the best solution in its neighborhood according to the fitness metric and replace the cell's solution with it. Thus the best solutions conquer the area.

3. Truncate the solutions with commitment, using the current commitment degree.

4. Regenerate the solutions, and go to step 2.
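The selection-and-truncation core of the loop above can be sketched in a few lines. This is a simplified illustration, not the post's actual implementation: the grid is a plain list of lists, the neighborhood is fixed to the von Neumann one (the cell and its four torus neighbours), and the names `step` and `neighbours` are mine.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

type Grid a = [[a]]

-- The cell itself and its four neighbours, on a torus grid.
neighbours :: Grid a -> Int -> Int -> [a]
neighbours g r c =
    [ (g !! r') !! c'
    | (dr, dc) <- [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    , let r' = (r + dr) `mod` n
    , let c' = (c + dc) `mod` n
    ]
  where
    n = length g

-- Steps 2 and 3: every cell adopts the fittest solution in its
-- neighbourhood, truncated to the current commitment degree.
step :: Ord m => (a -> m) -> (a -> a) -> Grid a -> Grid a
step fitness commit g =
    [ [ commit (maximumBy (comparing fitness) (neighbours g r c))
      | c <- [0 .. n - 1]
      ]
    | r <- [0 .. n - 1]
    ]
  where
    n = length g
```

On a 2×2 grid of plain numbers, with identity fitness and no truncation, the maximum spreads by one neighbourhood radius per step.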

The algorithm works well when the initial steps are important for a good fitness outcome. I decided to encode the algorithm parameters as type-level Nats:

```haskell
data Config
    (threads      :: Nat)
    (gridside     :: Nat)
    (threshold    :: Nat)
    (eralength    :: Nat)
    (maxsteps     :: Nat)
    (neighborhood :: Type -> Type)
  = Config
```

• threads tells how many threads to use to run the algorithm (it's very parallel).
• gridside is the length of the side of the grid (easier for calculations)
• maxsteps is the absolute maximum of steps algorithm will perform
• neighborhood describes how the neighborhood is defined: whether we use the max or Manhattan metric, and with which radius.
• the remaining parameters, eralength and threshold, configure when the commitment degree is increased.

When solutions are compared with each other, the better prefixes start to dominate. However, always truncating to a single first step won't produce good outcomes: the algorithm would restart almost from the beginning each time. To narrow down the solution space, we increase the commitment degree while the overall population is still diverse but dominating species are emerging. This way simple 1-step-prefix solutions become more complex 2-step-prefix solutions; simple "species" evolve into more complex "subspecies".

The eralength parameter is a fixed number of steps after which the commitment degree is increased. The threshold parameter applies to the disorder metric of the population: when the population disorder drops below this threshold, we also increase the commitment degree.
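As a sketch of that rule (the function and argument names are mine, not from the post): the degree increases when either the era runs out or the disorder drops below the threshold.

```haskell
-- Hypothetical helper: decide whether to increase the commitment
-- degree. An era ends after eraLength steps, or early when the
-- population disorder falls below the threshold.
increaseDegree :: Int -> Int -> Int -> Int -> Bool
increaseDegree eraLength stepsInEra threshold disorder =
    stepsInEra >= eraLength || disorder < threshold
```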

## Ramps and Disorder

The tech report doesn't describe how their disorder metric is calculated, but I think I figured it out.

Count the number of differing solution prefixes between adjacent cells of the grid. The tech report seems to count each border twice, but that is irrelevant. This reminds me of the Ising model, where the energy of a configuration is calculated similarly. It's not an entropy, but it is close enough for our purposes.
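My reading of the metric, as a sketch (counting each border once, on a torus grid; the function name is mine):

```haskell
-- Disorder: the number of borders between adjacent grid cells holding
-- different (committed) solution prefixes. Each border counted once.
disorder :: Eq a => [[a]] -> Int
disorder g = length
    [ ()
    | r <- [0 .. n - 1]
    , c <- [0 .. n - 1]
    , (dr, dc) <- [(1, 0), (0, 1)]   -- look down and right, wrapping
    , let r' = (r + dr) `mod` n
    , let c' = (c + dc) `mod` n
    , (g !! r) !! c /= (g !! r') !! c'
    ]
  where
    n = length g
```

A uniform grid has disorder 0; a checkerboard maximizes it, at twice the number of cells.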

The figures below are generated using a very simple reference problem: given the numbers 0..9, arrange them so that there is the maximum number of ascents. This is a trivial problem, as the optimal answer is 0123456789 (and 9876543210 is the worst), but it is hard for generic algorithms: naive crossover is difficult, as the resulting solution might be invalid. An algorithm also easily ends up with a suboptimal solution containing a single descent, like 1345890267.

The number of borders is the disorder measure. Its minimum is zero, when there is only a single "species", and its maximum is double the grid size. You can count the borders around the small islands of ones in the last (step 50) population drawn: there are 14 + 20.

The code explains the details. The solution type is [Int] and the metric is Int:

```haskell
type Solution = [Int]
type Metric   = Int
```

The Problem fields are defined as follows. The initial solution, initial commitment degree, and maximum fitness are trivial:

```haskell
initialSolution   = []    :: Solution
initialCommitment = 1     :: Int
maximumFitness    = maxN  :: Metric
```

Commitment is implemented by taking N elements from the end:

```haskell
commitment :: Int -> Solution -> Solution
commitment n = reverse . take n . reverse
```

This is because we use (:) to append a value to the "end" of the solution, i.e. the list is stored reversed. The fitness function calculates the ascents (with >, because the order is reversed):

```haskell
fitness :: Solution -> Metric
fitness xs = length $ filter (uncurry (>)) $ pairs xs

pairs :: [a] -> [(a, a)]
pairs xs = zip xs (tail xs)
```
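A quick sanity check of these two definitions (repeated here so the snippet stands alone): the optimal ramp, stored reversed, scores 9 ascents, and the worst scores 0.

```haskell
fitness :: [Int] -> Int
fitness xs = length $ filter (uncurry (>)) $ pairs xs

pairs :: [a] -> [(a, a)]
pairs xs = zip xs (tail xs)

-- The solution list is reversed, so the optimal 0123456789 is stored
-- as [9,8..0]: every adjacent pair is a descent in list order, which
-- is an ascent of the actual solution.
best, worst :: [Int]
best  = [9, 8 .. 0]
worst = [0 .. 9]
```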

And the most interesting part is the completion of partial solutions. We calculate the set of remaining elements and repeatedly pick one at random, adding it to the solution. (The bitmaskWithRejection64 n function returns a random number 0 <= x < n.)

```haskell
completeSolution :: SM.SMGen -> [Int] -> [Int]
completeSolution g0 xs0 =
    go g0 (fullnodes `Set.difference` Set.fromList xs0)
  where
    go :: SM.SMGen -> Set.Set Int -> [Int]
    go g rest
        | Set.null rest = xs0
        | otherwise     =
            let (w, g') = SM.bitmaskWithRejection64
                              (fromIntegral (Set.size rest)) g
                x       = Set.elemAt (fromIntegral w) rest
            in x : go g' (Set.delete x rest)
```
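Splitmix details aside, the same completion idea can be sketched with only base and containers, using a toy linear congruential generator in place of SMGen. Everything here (lcg, complete, the fixed 0..9 domain) is illustrative, not the post's code:

```haskell
import qualified Data.Set as Set
import Data.Word (Word64)

-- A toy LCG; splitmix's output is much better distributed.
lcg :: Word64 -> Word64
lcg s = 6364136223846793005 * s + 1442695040888963407

-- Complete a partial (reversed) solution over the domain 0..9 by
-- repeatedly picking a random remaining element.
complete :: Word64 -> [Int] -> [Int]
complete s0 xs0 =
    go s0 (Set.fromList [0 .. 9] `Set.difference` Set.fromList xs0)
  where
    go s rest
        | Set.null rest = xs0
        | otherwise     =
            let s' = lcg s
                i  = fromIntegral (s' `mod` fromIntegral (Set.size rest))
                x  = Set.elemAt i rest
            in x : go s' (Set.delete x rest)
```

Completions are always permutations of 0..9, and any committed prefix survives at the tail of the (reversed) list.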

Evolving non-determinism performs very well. After 25 steps the population is dominated by solutions starting with 0 or 1. If we didn't increase the commitment degree, the zeros would slowly dominate the whole grid.

Determining a good disorder threshold for your problem is tricky. I used 15-20% for the problems presented here. At that stage there is still some diversity left, and that threshold is usually reached quickly.

Step 2, Disorder 234 (45.7%)

Step 5, Disorder 186 (36.3%)

Step 25, Disorder 110 (21%)

Step 50, Disorder 34 (6.7%)

## Sorting networks

Sorting networks are the second problem I tried. I surely missed some low-level bit-trickery detail that would make the evolving part of the problem very fast, but I got it fast enough to find the solutions below.

They are beautiful in their own irregularity. Would you design such? (I'm also not sure whether they are proper sorting networks; I haven't verified them.)
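For the record, checking a candidate network is cheap thanks to the zero-one principle: a comparator network sorts all inputs if and only if it sorts all 2^n zero-one inputs. A sketch of such a checker (the 4-input example in the test is the classic 5-comparator network, not one of the evolved networks from this post):

```haskell
import Data.Bits (testBit)
import Data.List (foldl', sort)

type Comparator = (Int, Int)

-- Run the comparators left to right; a comparator (i, j) with i < j
-- puts the smaller value on wire i and the larger on wire j.
apply :: Ord a => [Comparator] -> [a] -> [a]
apply cs xs0 = foldl' step xs0 cs
  where
    step xs (i, j) =
        let a = xs !! i
            b = xs !! j
        in [ if k == i then min a b
             else if k == j then max a b
             else x
           | (k, x) <- zip [0 ..] xs
           ]

-- Zero-one principle: check all 2^n bit vectors.
sorts :: Int -> [Comparator] -> Bool
sorts n cs = all ok [0 .. 2 ^ n - 1 :: Int]
  where
    ok m =
        let bits = [ fromEnum (testBit m k) | k <- [0 .. n - 1] ]
        in apply cs bits == sort bits
```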

7 inputs, length 16, depth 6

8 inputs, length 19, depth 6

The number eight asks for some regularity; it is a power of two, after all. But no, the random number generator disagrees:

Compare these to the (boring and regular) bitonic sorting network, which also has depth 6 but uses 24 comparators!

9 inputs, length 25, depth 7

Nine, another odd number of inputs, opens up the creativity of the machine: