# Genetic Algorithms: Simulating Evolution on the Computer, Part 2

by Jeff Smith

We continue with our discussion of using Java to create genetic algorithms that enable you to handle such diverse tasks as optimizing networks, maximizing mathematical functions, designing proteins in the pursuit of new drugs, and so on. Here, we look at more sophisticated algorithmic problems.

Review Part 1

#### A More Complicated (Traveling Salesman) Example

A more complicated and interesting example is to use genetic algorithms (GAs) to solve the "traveling salesman" problem [4]. In this classic problem, a salesman is traveling to N cities, all in one trip, and wants to take the shortest route that brings him to all N cities without visiting any twice. This problem is actually very similar to a problem faced on the Internet everyday: How should a packet of information be routed through the Internet (a network of N nodes) so that it gets from point A to point B while going through the fewest nodes possible?

It turns out that the search space (the number of possible solutions) of the traveling salesman problem increases as the factorial of the number of cities. So if our salesman needed us to optimize his route for a 20-city tour, there are 2.4 x 10¹8 possible routes he can take. If we had a supercomputer evaluate all possible solutions at a rate of 1,000 possible routes per second, it would take 7,710 years to arrive at the optimal route! And adding just one more city to his itinerary would increase by the possible solutions by a factor of 21! Obviously, this problem cannot be solved by a brute force evaluation of all possible solutions.

We can solve this problem, however, by extending the GASequenceList class, which stores chromosomes in the form of character strings. We can code these solutions in the form of chromosome strings such as "ABCDEFGHIJ" and "ACBDEGFHJI", where each letter represents a city. The GASequenceList class automatically prevents duplicate genes from appearing in our chromosomes. This is important for our salesman problem, because we don't want the GA to determine that the best chromosome is "AAAAAAAAAA" (basically, minimize the distance traveled by staying in city A for the duration of the trip!).

In our constructor, we pass in a parameter designating our "gene space" or list of possible gene values as the letters we've coded for each city. So if we have the cities A-J, we specify "ABCDEFGHIJ" for that parameter. GASequenceList will enforce the constraint that all chromosomes have some permutation of those 10 characters with no duplicates. So "CBADEFGHIJ" would be a legitimate chromosome, but "CCADEFGHIJ" would not since it duplicated the gene "C".

We will also need to write a getFitness() function that will calculate the distance traveled by our salesman as defined in the route stored in a given chromosome. For example, a chromosome like "ABCDEFGHIJ" might result in a route of 3,200 miles, while a chromosome like "ACBEDFGJIH" might result in 4,000 miles. Each city's coordinates could be stored in one, two, or three dimensions. In my example, I placed each city in a one-dimensional universe (a straight line) to simplify the code, but it's easy enough to extend the getFitness() function to calculate distances in two or three dimensions. And remember, the getFitness() function will need to return a high fitness value when the total distance traveled is small and a lower value when the distance is great. To achieve this, my getFitness() function returns one divided by the distance.

Using two-point crossover and performing 50 preliminary runs with a population of 300 chromosomes and a mutation probability of 5 percent, my GASalesman example solves the 20-city problem in 50 seconds (on a Pentium III 700 laptop).

#### A Curve-Fitting Example Using Floating-point Numbers

As a (nearly!) universal problem solver, GAs can also be used to do curve fitting. That is, suppose you have a set of data points collected from some electronic instrument and you want to come up with a polynomial equation that fits (and can plot) the data. Such an equation could be useful for extrapolating values for points not in your measured data set. While other mathematical methods [5] have been specially tailored to do this (e.g., Least Squares), GAs are also surprisingly good at solving this problem.

To do curve fitting with my GA library, simply extend the GAFloat class. Each chromosome will then consist of a set of floating point numbers (genes), each of which will represent a coefficient in the polynomial to be discovered. The polynomial being fitted will look something like this (assuming you are looking for a fourth degree polynomial):

c4*x4 + c3*x3 + c2*x2 + c1  + c0  = D

Where c4, c3, c2, c1, and c0 are the coefficients you are solving for and D is a data point value. You'll need to write a getFitness() function that plugs each gene from a candidate chromosome into these coefficients and then tests the fitness of this "equation" against the set of data points. Highly fit chromosomes (polynomial equations) will generate values that closely match the data points. Over time, the GA will evolve the correct set of coefficients to the equation, giving you a good curve-fitting solution.

Using two-point crossover, two decimal points precision, a population of 100 chromosomes, and a mutation probability of 10%, my GACurveFit example finds a second degree polynomial, that fits a set of eight data points, in under 4 seconds.

#### Running Sample Genetic Algorithms

To run my sample genetic algorithms, I created a simple applet that lets you choose between binary ones, curve fitting, traveling salesman, and trigonometric functions. It also plots the statistics computed by the algorithm: average deviation and average fitness. By looking at these graphs, you can get an idea of how many generations were run before the GA converged on the solution.

The following diagram illustrates the interesting observation that evolution occasionally stagnates at a certain fitness level (a local maxima) for many generations and then dramatically improves in just several generations. Is this a manifestation of Eldredge and Gould's theory of punctuated equilibrium [6], which postulates that long periods of relative evolutionary equilibrium are punctuated by short periods of rapid change?

Figure 3. Plots of the Traveling Salesman genetic algorithm.

#### Summary

While there is no guarantee that genetic algorithms will find the optimal or "best" solution to a complex problem, they have been successfully applied to a wide variety of theoretical problems such as optimizing information networks and optimizing obtuse mathematical functions. They are also applied to more "concrete" problems in areas such as engineering. For example, suppose you want to build a bridge that is stable, cheap, and simple to construct. You could use GAs to build thousands of simulated bridges (encoded as chromosomes), test them in a fitness function, and evolve a good bridge design.

Or perhaps you are an aeronautical engineer looking to design a superior aircraft wing. GAs could evolve a better wing design for you. Mortgage lenders use GAs to determine the best criteria for determining whether to extend loans. Stockbrokers use them to optimize trading strategies or to look at mountains of data to find subtle stock trends (such as sector A tends to rise when the prime rate is below X and inflation is greater than Y). There are probably many clever uses for GAs that have yet to be discovered.

The trick, if one exists, is to write a fitness function that can successfully guide the evolutionary process. A GA will never be able to "evolve" a solution for which there is no fitness function. For example, without the positive feedback of a fitness function, you'd never be able to create a GA that can break an encrypted password or descramble a satellite signal. But if you are clever enough to create a viable fitness function, you can probably find a good solution to your problem by implementing a GA and harnessing the power of simulated evolution.

#### References

[1] John Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, 1975.
[2] David E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley Pub Co., 1989.
[3] Melanie Mitchell, An Introduction to Genetic Algorithms, MIT Press, 1998.
[4] David Applegate, Robert Bixby, Vasek Chvatal, William Cook, "Solving Traveling Salesman Problems", Princeton University.
[5] Cuthbert Daniel, Fred S. Wood, John Wayne Gorman, Fitting Equations to Data: Computer Analysis of Multifactor Data, John Wiley & Sons, 1999.
[6] Niles Eldredge and Steven Jay Gould, "Punctuated equilibria: An alternative to phyletic gradualism", Models in Paleobiology, 1972.