The Happy Technologist Interesting Geekdom

1Aug/110

Marbled Rye and Evolutionary Algorithms

Since no one has yet taken it upon themselves to write my unauthorized biography, it falls to me to make the following piece of information available to the public: I like to bake.

Breads and pies mostly -- I've got a couple of recipes posted here, including a pie crust that I'm pretty happy about, and a few things I've borrowed from other people. I've made a few rhubarbs lately that really turned out quite well.

One thing I recently attempted, was a Marbled Rye. This isn't a terribly difficult bread to make -- there are recipes everywhere. I was mostly pleased with it, though -- I didn't have any Caraway seeds, which add a lot of flavor, but the bread looked nice and better than a lot that I've made lately.
Chip's First Marbled Rye
One thing I experimented with, though, was yeast.

Yeast is one of those things I don't really understand. This is because the most I remember about the biological classification taxonomy was that everything was an "Animal", "Vegetable", or "Mineral" -- I have no idea which one a yeast would be. This was a problem for biologists as well, so in 1990 they changed the top three domains to be "Archaea," "Bacteria," and "Eukaryota," which has helped me in no way whatsoever because not only do I still not know which one yeast would be, but I no longer know which one I'm supposed to be, and I much preferred back when I was an Animal and the world made sense.

Anyway, yeast are largely responsible for the existence of Bourbon, which automatically qualifies them as A Good Thing™ no matter what biologists call them. Baker's yeast, which makes us happy, is "Saccharomyces cerevisiae" (note the interesting comment in Wikipedia about Crohn's and Colitis on that page -- I never knew that), and lives everywhere, so it's pretty easy to get hold of. You can leave potato-starch filled water out for a while and yeast will just show up. All it does, really, is convert sugar into bubbles and alcohol. In breads, the bubbles (Carbon Dioxide) make the breads rise... in alcohols, the alcohol well, makes the alcohol alcoholic. Yeast is glorious.

I've tried a number of yeasts -- a local farmer's market sells some non-specific "yeast" which smells nice, so I use it a lot. I've tried whatever Kroger sells, and most of the time I'm not picky -- Fleischmann's yeasts are the norm and they work just fine. In fact, I picked up some specific for pizza crusts and specific for fast rising breads as well as the more mundane packets on my last trip. Sadly, I accidentally put some pizza yeast in the lighter of the marbled rye doughs, and fast-rise yeast in the darker one. The result was a very tasty bread that had a slightly pizza dough flavor and was half-thick-dense dough and half-fluffy-light due to the differing rising times. It was a very tasty bread, but straightforwardly imperfect.

It's these different types of yeasts that interest us. What makes yeast fast-rising? What makes it better for pizza? And how does someone make that yeast?

The basic answer is that yeasts are cultivated over large periods of time -- bred, as it were, for the features that people want... fast rising, for example, or a specific taste which works better for pizza, bourbon, beer, or marbled rye. The same mechanism that people have exploited for thousands of years to breed house cats from tigers and Chihuahuas from wolves has given us yeast that produces double the gas with less sugar and water, and in a larger temperature range than bakers had a few thousand years ago. Batches of yeast that have desired characteristics are fed and cultivated while batches that are disliked or difficult to manage are discarded.

People were able to make strides in this area long before the underlying mechanisms were understood... Louis Pasteur was one of the first to understand yeast at a cellular level due largely to contemporary discoveries and advances in microscopy leading to more advanced methods for culturing specific traits. This was in the late 1800s, however; large companies such as Fleischmann's and Lesaffre can date their initial yeasts prior to Pasteur's work, and Christian bible contains stories of Jewish slaves taking dough with them during their flight from Egypt. Some descendants of those yeasts almost certainly survive today.

What am I doing talking about all of this on a technology blog? Well, the practice of cultivating and growing yeast is a human-driven evolutionary process that has lasted thousands of years and gone through countless generations to produce specific characteristics. Can we emulate this mechanism to produce desired results in data analysis and programming? Of course! Otherwise we wouldn't be talking about it. Enter the modern programming practices of Evolutionary Algorithms and Genetic Programming.

Both of these are techniques where computations are performed over and over -- frequently millions of times or more -- to achieve a desired result that would be difficult to achieve through other means. An Evolutionary Algorithm (EA) is one which manipulates (evolves) a data set to have particular characteristics. A Genetic Program (GP) does the same thing but operates on the algorithm itself, rather than the data. This is an important difference, and the two approaches can address radically different problems...

The "Damn Interesting" blog has a must-read article on the topic of genetic programming. The article relates stories of the work of Dr. Adrian Thompson, University of Sussex, who evolved a simple 10x10 grid of transistors (on an FPGA -- more happy tech) to differentiate between noises such as the words "Stop" and "Go". The fact that hardware that size can even achieve the feat is fascinating, but that it was evolved in this fashion is even more educational. Most interesting to me is the note that the "program" only works on the particular chip on which it was evolved. If the exact code is copied to a chip manufactured in an identical way, it doesn't work... the evolved code relies on some very nuanced differences, within manufacturing tolerances, to execute properly. That simply amazes me.

Such algorithms are good for image and facial recognition, stock trading, and, of course, video games to name just a few things. The process is straightforward -- generate a group of random "programs," consisting of variables (inputs) and operators (addition, subtraction, exponentiation, or more advanced string or date algorithms, or even loops for example). After all, that's all programs are, right? Why not generate them randomly? Once a few hundred are created, "run" them and compare the output to the desired result. Programs generate descendants either by mutation, where some operators and variables are changed (asexually), or by combining part of their code with part from another program to produce an offspring with traits from both parents (sexually). These children compete with their living ancestors, and those with the traits we're most looking for are used to create another generation.

Evolutionary Algorithms behave similarly, but instead of treating the mutating sets as code, they are treated as data. Random populations are created and evaluated for "fitness" against some criteria. As with the GPs, candidates from the population are mutated and mated to generate offspring, and the best-fitting members of the population survive for another generation.

EAs are better at problems where the data set is well understood, and criteria about the target can be easily converted into a good fitness test. The Traveling Salesman Problem is one where this approach is quite effective. Here's an article about someone using an EA to draw the Mona Lisa using semi-transparent polygons, although with a standing population of only 1 the Mona Lisa algorithm is the simplest case of an EA: a hill climbing algorithm.

The target space for these algorithms is growing daily, and I'd encourage anyone to try to play with them. I have seen some definite promise for these algorithms in the Heritage Health Prize, as well as for other projects to optimize some network interfaces, and to minimize costs and standing inventory in large supply chain networks. As an example, consider that you have a limited number of resources, say gasoline, flu shots, ammunition, or food. Then let's say you have many restrictions on how you hand them out -- the cost of delivering them, the amount that's needed for each location, the risks of delivering too little or too much. Building those calculations into a fitness test, so that you can compare different scenarios, and simulating possibilities by using an Evolutionary Algorithm is actually a quite efficient way to find good solutions.

You should play with this. There are countless tutorials out there and I doubt I'd do any better writing one of my own (although I may, eventually)... here are a few places to read more to get you started:

Enjoy!

Comments (0) Trackbacks (0)

No comments yet.


Leave a comment

No trackbacks yet.