Genetic Algorithms

Genetic Algorithms are a great way to reduce a search-space into a solution space, and an efficient way to find a near-optimal solution, quickly.

Genetic Algorithms are cool! I feel no one talks about them even though their applications are wide and their effectiveness extremely powerful. It’s a way to help optimize a search space and turn it into a solution space. They’re far better than brute-forcing solutions for problems with astronomical possibilities, and they can be much simpler to create than a specialized solution to a problem.

What are Genetic Algorithms?

Genetics Algorithms (GA) are loosely based on the Theory of Evolution by Natural Selection. There’s some common terminology, like population, chromosome, gene, and mutation.

The basic idea is that you start with a population – a group of randomly generated possible solutions. Each solution is like an individual – they have chromosomes which have genes. Individuals love to reproduce, but some individuals are more desirable than others, so they’re more likely to reproduce and give their good traits to the next generation.

Actually, I’m going to pause here and change my analogy. Think of the population more as sea lions. You have the leader, the bull – the most desirable mate. The bull reproduces with the most number of sea lions in the rookery, but sometimes the stragglers get lucky.

Each time a sea-lion reproduces, some genes from the mother and father are brought together. This is known as a crossover. But sometimes, genes don’t copy correctly and become altered. This is known as a mutation. Each mating season, or generation, the reproductive process creates the next population.

Like Evolution, the goal of each population is to find a better set of genes that make the individual more fit for the environment – which for an algorithm is a better-evaluating solution to the problem.

Can we please see an example already!?

Great, lots of definitions but no action. Let’s go to a simple example.

How about trying to find the minimum of a quadratic function? Sure, we could solve this ourselves – I’m sure we’ve all been through Algebra 1 – but let’s focus on the GA. For now I’ll go over the basic premise, but the concrete example with code will be in part 2.

For this, I want to find the minimum of the function f(x) = x^2 – 4x + 2. Now intuitively, we know that the solution is going to be between 0 and 63. So let’s make each individual have a chromosome that’s an array of bits representing an integer; each gene is then one bit. So, one possible solution would be [0, 0, 1, 0, 1], or 5.

A crossover will take some bits from one parent and some bits from the other. So if one parent was [0, 0, 0, 0, 0] and the other had [1, 1, 1, 1, 1], then we could have a crossover result in [0, 0, 1, 1, 1] or [0, 1, 0, 1, 0], or any combination. It’s up to us to organize how that happens.

Then a mutation can alter a bit, or in this case flip it. So a mutation could change the chromosome in the offspring from [0, 0, 1, 0, 0] to, say, [0, 1, 1, 0, 0] or [0, 0, 0, 0, 0].

Doesn’t Evolution Take Millennia to Prove Effective Though?

Well, not always. Bacteria can evolve in laboratory conditions in the matter of weeks, but that’s a whole other discussion. And plus, remember that computers are FAST!

Just because GAs are loosely based on Evolution doesn’t mean we have to strictly stick to its ways. One way to really increase the efficiency of a GA is to carry the best individual(s) of the population from one generation to the next. This is known as elitism, and it makes the algorithm greedy and converge faster.

Crossover and mutation rates can also be controlled for certain cases. They’re the way to tune the amount of time spent on expanding over a search space and the time to spend refining a solution that’s already been evaluated; breadth versus depth, really.

The speed of the search is mostly affected by the size of a population and how difficult it is to measure an individual’s performance. Optimizing the evaluation function is the biggest factor.

For the quadratic example, the performance test is quite fast because the possible solutions just needs to be entered into the quadratic function, and computers are great at arithmetic. Using elitism, we just have to keep the minimum.

With a population of sufficient size, the minimum should be found quickly, but if it’s too large, we’ll be mimicking a brute-force approach, where every number from 0 to 63 is evaluated. While this isn’t too strenuous or time-consuming for this particular example, it’s important to remember when attempting to find solutions for other problems.

So how do we know what a good population size is? Or how to crossover effectively, or how often to mutate? Or how many generations to have? Well, that’s up to the programmer and the problem. A little trial-and-error may be appropriate.

In my experience, a mutation rate of 10% and crossing over roughly 50% of the genes is pretty good. For the crossovers, I usually try to choose an axis in the middle of each gene based on random chance against the crossover rate, and everything from before the axis will come from the first parent while the rest is from the second parent. Another way is just to split the list of genes in half and randomly decide which half comes from which parent. Again, this is preference on how far of a spread you want at any given time for a search, and different problems will require different solutions.

What other applications GAs be used for?

Plenty! The Traveling Salesman Problem, creating schedules, minimizing conflicts for seating at a wedding reception, ways to pack cargo boxes or storage trucks, predicting football scores; there’s so many applications.

I’ve even tried recently using them to help optimize settings for a neural network I was creating. The genes were essentially the number of hidden neurons and the number of generations to train. This allowed my models to be trained on-demand, quickly and effectively, without over fitting.

I’ve also taken fantasy football data and created optimal lineups to use on FanDuel, which I’ll go over in Part 3.

GAs are a great way to find a pretty good solution, if not the optimal solution, to a problem where there’s too many possibilities to check. They don’t usually take long to program, and can be used in a plethora of situations. Now go forth and conquer!


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s