Scope


In this article, we'll see some of the basics of genetic algorithms, i.e. what they are, what they're good for, why we call them "genetic", and so on. This won't just be theory. We'll see how a simple genetic algorithm could be implemented in Visual Basic .NET. Source code is provided to and further experimentation encouraged. 

Introduction


A genetic algorithm could be seen as a component in the artificial intelligence field. They are ultimately a search technique using a metaheuristic approach to attain a particular goal - searching for a solution to a problem. We use the term "genetic" because, in order to solve a problem, we will use procedures inspired by the process of natural selection. Darwin meets computing. Similar in some ways to the iterative development approach except with code altering itself. Finding an ideal solution to a problem in one step is very difficult. This approach creates code which can "evolve" towards our goal, finding it's way by itself (well, almost).

Our natural world


In his «Principles of Biology», Herbert Spencer coined the expression "survival of the fittest", as a synonym for "most well adapted to the current environment". Darwin, in his «Origin of Species» used Spencer's quote alongside the concept of "natural selection", which could be intended as "better designed for an immediate, local environment". In any environment, the environment itself sets the standards to determine who is "fit".

For example, in a place with strong predators, a fundamental requirement to survive could be the ability to run away with great speed, or fight back with proficiency. In such a world, a fast runner or a strong fighter will be candidates for longer/better survival than those unprovided with those abilities. We could say the world poses a question like «how you will deal with those predators?», and those who can find a good solution to the riddle will be able to survive (that is an over-simplification, but we aren't biologists, after all!).

We can identify some parameters: a world to live in (or "a problem to solve"), a population that will face the world's challenges (or, individuals that will try to solve the problems posed by their natural conditions), a story regarding individuals which will pass on their experiences, their successful traits (and unsuccessful, too), their DNA, hoping their successors will grow better and better.

In computer science, we could take inspiration from biological approach to solve problems. Nature doesn't brute force solutions, it creates a multitude of possible solutions through recombinations of previous answers. Again, it's an over-simplification because in the real world challenges modify themselves, too, but the basis of the approach are still valid in a digital world. GAs (genetic algorithms) inherits some biology terms. We could identify: chromosome/individual, as a single solution to a given problem; population, as the array of possible solutions; gene, or a part (byte) of an individual; fitness, or the degree of adequacy of a certain solution; crossover/inbreeding, as the recombinations of solutions into new ones; mutation, or the random change which could occur into a solution.

Genetic algorithm overall

Let's consider a simple problem: we have a numeric target, let's say the number 5. This number can be seen as our world's challenge (or the solution domain). In our world we'll have inhabitants, and let's say the population is composed by a certain amount of mathematical operations.
We take for example 100 random operations for start. Each operation must confront itself with the challenge, trying to solve it. This match is called "fitness". Fitness could be defined, as we saw above regarding to biological beings, as the rate of adequacy in a given solution domain.

For example, the operation "0 + 0", will have a low fitness, because it cannot produce a value near to our target, but "0 + 5" will be a perfect valid answer. We must proceed in evaluating every member of our population, calculating fitness for each of them. Through this process, we'll find the best (or "fittest") answers to our problems. At this point, we take some pairs of individuals amongst the best, "inbreeding" them to produce new elements that share characteristics from both parents, and repeat the process for our entire population (if we wish to maintain a specific number of elements, at the end of the process we must discard a number of the worst elements which equals the number of new elements created).

A new generation of solutions is born - nearer to our target, we hope - awaiting to be tested against our problem. Rinse and repeat until valid solutions are born. Living beings pass on their DNA, their genetic code, to their sons and daughters. In computer science, everything is composed by bytes, and we could consider them like a kind of digital DNA, a representation of what an individual is. In the above example, we could represent an operation using six bytes: the first and the last two to represent numbers between 0 and 255 (from 00 to FF), while the two bytes between them can represent the operator (for example 00 = +, 01 = -, etc.).

Through the breeding phase, we extract a part of those bytes from a parent, and a part from the other, and combining it we will have a new item. Like in a real environment, we could think of introducing a sort of "mutation", which can alter in some ways the genes passed on. In other words, GAs are about finding the best path to solve a particular problem, creating better and better solutions towards it. For further specific information, check the Bibliography at the end of the article.

A practical implementation

Words are good, but when it comes to programming, are pretty cheap, and a piece of code if worth thousands words. So, let's see how we can profit from GAs in a practical way. In the following code, we'll discuss a graphical problem: given a target icon, and starting from a population composed by images made by random bytes, trying to "evolve" our random pixels to the target, hopefully reaching it completely.

Like i said in the opening, we'll use Visual Basic for this. The final user must be able to select an ICO file of 16x16 pixels, and the program will generate 100 random icons. For each of those icons, we will evaluate their fitness. This will be trivial: since an icon is an array of bytes, we'll simply match our target bytes with those of each population's element. The more they are alike, the better will be our image. For each generation of solutions, i will extract 4 element to inbreed, generating 4 new brand individuals, in which their parent's byte are reproduced, discarding the 4 worst element to maintain a constant populations of 100 elements.

Genetic image class

Our "genetic image" class can be written like this:

Public Class GAImage
 
    Const _FIXED_BYTES As Integer = 42
 
    Dim _targetBytes() As Byte
 
    Dim _personalBytes() As Byte
    Dim _isMutable() As Boolean
 
    Public ReadOnly Property FixedBytes As Integer
        Get
            Return _FIXED_BYTES
        End Get
    End Property
 
    Public Property PersonalBytes As Byte()
        Get
            Return _personalBytes
        End Get
        Set(value As Byte())
            _personalBytes = value
        End Set
    End Property
 
    Public ReadOnly Property Fitness As Single
        Get
            Dim _fitness As Single = 0
            For ii As Integer = 0 To _targetBytes.Length - 1
                If _targetBytes(ii) = _personalBytes(ii) Then
                    _fitness += 1
                    _isMutable(ii) = False
                End If
 
            Next
            Return _fitness
        End Get
    End Property
 
    Public ReadOnly Property FitnessPercent As Single
        Get
            Return (Me.Fitness * 100 / _targetBytes.Length).ToString("000.00")
        End Get
    End Property
 
    Public Function isByteMutable(numByte As Integer) As Boolean
        Return _isMutable(numByte)
    End Function
 
    Public Sub New(_target() As Byte, _code() As Byte, Optional randomInit As Boolean = True)
        ReDim _personalBytes(_target.Length - 1)
        ReDim _targetBytes(_target.Length - 1)
        _targetBytes = _target
 
        ReDim _isMutable(_target.Length - 1)
        For ii As Integer = 0 To _isMutable.Length - 1
            _isMutable(ii) = True
        Next
 
        For ii As Integer = 0 To _FIXED_BYTES
            _personalBytes(ii) = _targetBytes(ii)
        Next
 
        If Not randomInit Then Exit Sub
 
        Randomize()
        For ii As Integer = _FIXED_BYTES + 1 To _targetBytes.Length - 1
            _personalBytes(ii) = _code(Int(Rnd() * (_code.Length - 1)))
        Next
    End Sub
 
End Class

Its constructor asks for a copy of target's bytes, a genetic pool to extract from, and a flag to determine if a random initialization must take place (we'll see why in few lines). The class exposes some properties: PersonalBytes allows access to the "genetic code" of our individual, returning an array of its bytes; Fitness calculates how much the current image and the target are alike.

With the help of _isMutable array, we will set if a particular byte is resistant to mutation (miming a solid genetic acquired trait); FitnessPercent simply expose the precious value in a percentual format; FixedBytes has a technical-only utility: since the first 42 bytes represent the icon's header, those must be mainlined from target to individuals, in order to produce a set of valid icons. In that class we have all it takes to produce full-fledged genetic icons.

We need an environment in which our play could take place. A simple form in which observe how our generations changes will do.

The playground

Let's take a look at the final outlook our Form will have:



The button "Choose target" allows the user to choose an icon, which will be the image we want to reach evolving our solutions. One hundred PictureBox are dynamically created to contain the output of each GAImage. Random GAImages are generated, and we can see none of them has apparent correlation with our target (lots of chaos in them!). Still, evaluating them, we could identify the better samples, and start our loop of breeding.

Let's see a typical execution, to discuss later each part of it:



I will skip entirely the parts related to control's creation or use, which the reader could see in the source code. What will be discussed here is related exclusively to the aspects of GA. Most of the code below will be incomplete for the sake of comprehension. I suggest to download the source code to have the most possible complete overview.

Create a start solutions set

In opening our target image, we could access to its bytes. The bytes of the target image will be our ideal genetic code, and each byte represent an element of the genetic pool each created element draws by for its personal DNA.

Private Sub LoadImage(path As String)
    Dim b As New Icon(path)
    pbOriginal.Image = b.ToBitmap
 
    Dim ms As New MemoryStream
    b.Save(ms)
    ReDim _targetBytes(ms.ToArray.Length - 1)
    _targetBytes = ms.ToArray
 
    _GACode = New List(Of Byte)
    Dim _tmpCode As New List(Of Byte)
    _tmpCode.AddRange(_targetBytes)
    _GACode.AddRange(From bt As Byte In _tmpCode Distinct Select bt)
End Sub


The target icon is written in the pbOriginal PictureBox. After that, the image's bytes are accessed through a MemoryStream and written in the _targetBytes array. From that array, we select distinct values to populate a list of possible "genes" to be used when creating new icons. Think about this as our genetic code: DNA is made from adenine, guanine, cytosine, thymine. Our icons genetic code must follow a non completely random path: is the original set doesn't contain the byte AF, for example, our starting byte pool won't contain it either (i.e. it will be impossible to generate a specimen that contains a byte which is missed in the "standard" pool).

Private Sub InitializePopulation()
    _GAPopulation = New List(Of GAImage)
 
    For ii As Integer = 0 To _SAMPLE_NUMBER - 1
        Dim _ga As New GAImage(_targetBytes, _GACode.ToArray)
        _GAPopulation.Add(_ga)
 
        Dim pb As PictureBox = CType(GroupBox3.Controls("pbSample" & ii.ToString("00")), PictureBox)
        Try
           pb.Image = Image.FromStream(New MemoryStream(_ga.PersonalBytes))
        Catch
        End Try
    Next
End Sub


We populate a start list composed of GAImages, initializing them with the target byte array (which, as we saw, is useful for fitness calculation), and passing the pool from which our bytes could be drawn. Please refer to the GAImage code above for details on the initialization procedure.
The icon i've used in the above example is 318 bytes long: it's very unlikely a member of starting population will produce our final result.
It's necessary to loop through generations, combining the best items of each set.

A proposal for evolution

Let's see my proposal for a evolutive behavior. In my code, it takes place in clicking Button2 control. Non GA relevant code is omitted (refer to source code for it).

Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
    ' omission
 
    While _runFlag
        ' omission
 
        Dim bestSamples As IEnumerable(Of GAImage) = From ga As GAImage In _GAPopulation Order By ga.Fitness Descending Take (4)
             
        ' omission
 
        If bestSamples(0).FitnessPercent > 95 Then Button3_Click(sender, e)
 
        Dim sample01 As GAImage = GenerateImage(bestSamples(0), bestSamples(1))
        Dim sample02 As GAImage = GenerateImage(bestSamples(1), bestSamples(0))
        Dim sample03 As GAImage = GenerateImage(bestSamples(2), bestSamples(3))
        Dim sample04 As GAImage = GenerateImage(bestSamples(3), bestSamples(2))
 
        _GAPopulation.Insert(0, sample01)
        _GAPopulation.Insert(1, sample02)
        _GAPopulation.Insert(2, sample03)
        _GAPopulation.Insert(3, sample04)
 
        Dim worstSamples As IEnumerable(Of GAImage) = From ga As GAImage In _GAPopulation Order By ga.Fitness Ascending Take (4)
        _GAPopulation.Remove(worstSamples(0))
        _GAPopulation.Remove(worstSamples(1))
        _GAPopulation.Remove(worstSamples(2))
        _GAPopulation.Remove(worstSamples(3))
 
        ' omission
    End While
End Sub

Simply as this, we loop undefinitely through the list of GAImages (or, at least until 95% of result has been reached), taking each time the 4 fittest samples, and crossing their bytes. We insert the new created individuals on top of our list (supposing they are the bests), and removing an equal number of elements judged the worst in terms of fitness. It is possible, if something "goes wrong" during the generation phase, that a new created element is worse than a pre-existent one. In this case, it is discarded anyway.

Private Function GenerateImage(firstGA As GAImage, secondGA As GAImage) As GAImage
    Dim _ga As New GAImage(_targetBytes, _GACode.ToArray, False)
    For ii As Integer = _ga.FixedBytes To firstGA.PersonalBytes.Length - 1 Step 1
        _ga.PersonalBytes(ii) = IIf(secondGA.isByteMutable(ii), Mutation(secondGA.PersonalBytes(ii), 2), Mutation(secondGA.PersonalBytes(ii), 20))
    Next
    Return _ga
End Function

The inbreeding is pretty simple: each accessible byte from the first element is looped, and the resultant byte is calculated as follows: if the same byte from the second element is mutation-resistant, we call on mutation routine with a probability of 1/20. Otherwise, we use a 1/2 probability rate.

Private Function Mutation(bt As Byte, rate As Byte) As Byte
    Randomize()
    Dim prob As Integer = Int(Rnd() * rate)
 
    If prob Mod rate = 0 Then
        Return _GACode(Int(Rnd() * (_GACode.Count - 1)))
    Else
        Return bt
    End If
End Function

Mutation() takes place with a probability: if the mutation rate catches up, the specific byte is randomly extracted from the standard pool. Otherwise, the second element byte will be reproduced in the offspring. The loop through each generation/population for single generation will eventually evolve towards the final (and expected) result. This is only one of the methods that can be used to implement evolutive algorithms: each coder is the judge in defining what are the parameters to be called into question, and what are the general rules to solve a particular problem.

In our case, as long as it can bring valid results, other kinds of approaches are possible.

Sample video

A sample running session could be seen here: g6JebAjtWQ

Download source code


Bibliography