In the storybook portrayal of science, theories are tested by experiments, which are conducted in laboratories so that the conditions can be rigorously controlled.

How would group selection be tested in the laboratory? Let's begin with the thousands of selection experiments that have already been conducted in the laboratory at the individual level. A population of animals, such as fruit flies or chickens, is measured for a particular trait, such as bristle number or egg productivity. Individuals that score high or low (depending upon the desired direction of selection) for the trait are selected to breed the next generation. If the average value of the trait in the offspring generation shifts in the direction of selection, then the trait is heritable and there has been a response to selection. Over many generations, artificial selection can cause organisms to become completely different from their ancestors, as our domesticated plants and animals attest.

Group selection can be studied in the laboratory by a simple extension of the protocol outlined above. A population of groups is created, a particular trait is measured for the groups, and the highest (or lowest) scoring groups are used to breed the next generation. If the average value of the trait in the offspring generation shifts in the direction of selection, then group selection is proven to be efficacious, at least under the conditions of the laboratory experiment.

To make the procedure less abstract, consider my favorite group selection experiment, which I have written about in Evolution for Everyone. William Muir, an animal breeder at Purdue University, selected for egg productivity in hens in two different ways. Both involved housing hens in cages (groups), which is standard practice in the poultry industry. The first method involved selecting the most productive hen within each cage to breed the next generation of hens. The second method involved selecting the most productive cages and using all the hens from those cages to breed the next generation of hens. It might seem that this is a subtle difference, that the same trait (egg productivity) should be selected in both cases and that the first method should be more efficacious. After all, eggs are produced by individual hens, so why not directly select the best? Why select at the group level, when even the best groups might have some individual duds?

The results told a completely different story. The first method caused egg productivity to perversely decline, even though the most productive hens were chosen each and every generation. The second method caused egg productivity to increase 160 percent in six generations, an astonishing response as artificial selection experiments go.

What happened? If you've been paying attention to my Truth and Reconciliation blogs, you'll recognize a classic case of multilevel selection. Selection within groups is sensitive only to relative fitness, relentlessly favoring hens who lay more eggs than their neighbors. The first method favored the nastiest hens, who achieved their productivity by suppressing the productivity of other hens. After six generations, Muir had produced a nation of psychopaths, who plucked and murdered each other in their incessant attacks. No wonder egg productivity plummeted! It would be hard to imagine a more graphic example of what I have called "the original problem" throughout this series of blogs: traits that are "for the good of the group" are not always locally advantageous within the group and require a process of group-level selection to evolve.

That's why the second method worked. Selecting the most productive groups favored peaceful and cooperative hens, despite their selective disadvantage within groups. Moreover, group-level selection was sufficiently strong to successfully counteract selection within groups, which was taking place within cages for the second method, just as much as the first. Muir's experiment proves the efficacy of group selection, at least under the conditions of the experiment.
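The logic of the two methods can be sketched with a toy simulation. This is not Muir's protocol or his data: the single "aggression" score per hen, the payoff constants (BASE, GAIN, HARM), asexual inheritance with a little noise, and the cage counts are all assumptions chosen for illustration. A hen's aggression raises her own egg count but lowers her cagemates' counts by more, and cages are sibling groups under both methods.

```python
import random

# All constants are illustrative assumptions, not values from Muir's experiment.
N_CAGES = 20        # cages (groups) per generation
HENS_PER_CAGE = 4   # hens housed together in one cage
BASE = 10.0         # eggs laid by a hen in a fully peaceful cage
GAIN = 1.0          # eggs a hen gains per unit of her own aggression
HARM = 3.0          # eggs a hen loses per unit of her cagemates' mean aggression
MUT_SD = 0.05       # noise added when a daughter inherits her mother's aggression


def egg_output(cage):
    """Eggs per hen: own aggression helps, cagemates' aggression hurts more."""
    total = sum(cage)
    n = len(cage)
    return [BASE + GAIN * a - HARM * (total - a) / (n - 1) for a in cage]


def breed(parents, rng):
    """Each parent founds one cage of sibling daughters (asexual, noisy copy)."""
    return [
        [min(1.0, max(0.0, p + rng.gauss(0.0, MUT_SD))) for _ in range(HENS_PER_CAGE)]
        for p in parents
    ]


def select_within_cages(cages):
    """Method 1: the most productive hen in every cage becomes a parent."""
    parents = []
    for cage in cages:
        eggs = egg_output(cage)
        best = max(range(len(cage)), key=lambda i: eggs[i])
        parents.append(cage[best])
    return parents


def select_whole_cages(cages):
    """Method 2: every hen from the most productive cages becomes a parent."""
    ranked = sorted(cages, key=lambda cage: sum(egg_output(cage)), reverse=True)
    keep = len(cages) // HENS_PER_CAGE  # keeps the number of parents equal to N_CAGES
    return [a for cage in ranked[:keep] for a in cage]


def mean_output(cages):
    eggs = [e for cage in cages for e in egg_output(cage)]
    return sum(eggs) / len(eggs)


def run(select, generations=6, seed=1):
    """Return mean eggs per hen for the starting flock and each later generation."""
    rng = random.Random(seed)
    cages = [[rng.random() for _ in range(HENS_PER_CAGE)] for _ in range(N_CAGES)]
    history = [mean_output(cages)]
    for _ in range(generations):
        cages = breed(select(cages), rng)
        history.append(mean_output(cages))
    return history


within = run(select_within_cages)
between = run(select_whole_cages)
print("within-cage selection:", [round(x, 2) for x in within])
print("whole-cage selection: ", [round(x, 2) for x in between])
```

With these assumed payoffs, the top layer in any cage is its most aggressive hen, so the first method drives aggression up and mean output down. Ranking whole cages favors the least aggressive cages, so mean output rises. Only the direction of the two trends is meaningful; the magnitudes depend entirely on the made-up constants.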

By the way, the groups of chickens were siblings. When some of my colleagues learn this fact, they shout "Aha! It's kin selection, not group selection!" Wrong. The groups were siblings in both methods, so their kinship cannot explain the difference between the methods. As I relate in T&R XIII, everything about kin selection theory can be understood in terms of the parameters of multilevel selection theory. Creating groups of siblings caused the psychopaths to cluster in some groups and the peaceniks to cluster in other groups, providing lots of variation to select upon at the group level. Psychopaths still beat peaceniks within each group; the fact that they were siblings is beside the point.

This experiment also raises important questions about what counts as an individual trait. Egg productivity seems like an individual trait because you can count the eggs coming out the hind end of a hen. The experiment reveals that egg productivity is in fact a highly social trait that depends upon the genetic composition of one's group, not just the individual's genetic composition. This example has profound consequences for how we think about human traits that seem individual but in fact are highly social.

This is only one of many experiments demonstrating the efficacy of group selection in the laboratory, for creatures as diverse as insects, plants, and vertebrates. In 1997, I organized a symposium on multilevel selection in my capacity as Vice President of the American Society of Naturalists, one of the premier evolution-oriented societies. The symposium took place at their annual meeting and was published as a special issue of the American Naturalist, arguably the premier journal for evolutionary research at the time. Among the speakers was John Maynard Smith, one of the premier evolutionists in the world and a chief critic of group selection, as I recount in T&R VIII and IX. I mention these credentials not to boast, but to emphasize how much the symposium occupied center stage in the world of evolutionary biology.

Another speaker at the symposium was Charles Goodnight, a student of Michael Wade, who conducted the first group selection experiments in the 1970s. Charles reviewed the literature and concluded that every group selection experiment conducted in the laboratory demonstrated an efficacious role for between-group selection, even when between-group selection was opposed by within-group selection. You can judge for yourself from the published version of Charles' talk (co-authored with Laurie Stevens), titled "Experimental Studies of Group Selection: What Do They Tell Us About Group Selection in Nature?"

As Charles recently recounted to me, he was convinced that his talk would be a career-maker for himself and a turning point for acceptance of group selection. Why not, given the import of his conclusions for one of the most important controversies in evolutionary theory, the prominent forum, and the likes of John Maynard Smith in the audience? He was sorely disappointed. The laboratory evidence for group selection had virtually no impact on the acceptance of group selection by the evolutionary community at large. So much for the storybook portrayal of science.

The only legitimate reason to discount the results of a laboratory experiment is that the conditions are highly artificial and therefore irrelevant to real-world situations. But this is not the case for the group selection experiments, in which groups are formed much as they might form in nature. Moreover, the whole beauty of laboratory experiments is that conditions can be varied in a systematic fashion. If a critic thinks that the conditions of one experiment are artificial, the answer is to conduct another experiment, not to discount laboratory evidence entirely.

In fact, group selection is so efficacious in the laboratory that even the proponents of group selection were surprised. As often happens, the laboratory experiments revealed factors operating in real biological systems that were beyond the imagination of the theorists. In particular, the theorists had limited their models to traits with a simple genetic basis, such as selfish and altruistic genes that code directly for selfish and altruistic behaviors. Given this assumption, phenotypic variation among groups corresponds directly to genetic variation among groups, which in turn depends critically on the number of individuals initiating each group. The larger the initial group size, the less variation among groups for between-group selection to act upon. That is the entire import of kin selection and the early conclusion that group selection requires special conditions, such as small initial group size, to be efficacious.
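The dependence of among-group variation on founder number follows from ordinary sampling statistics. As a minimal sketch, assume each group is founded by n individuals drawn at random from a population in which a fraction p are altruists. This is a haploid, single-trait simplification in the spirit of the simple models described above, not any particular published model, and the function names are hypothetical. Under that assumption the variance in altruist frequency among groups is p(1 - p)/n, which a quick simulation can check.

```python
import random
import statistics


def expected_variance(p, n_founders):
    """Binomial sampling variance of a group's altruist frequency: p(1 - p)/n."""
    return p * (1.0 - p) / n_founders


def simulated_variance(p, n_founders, n_groups=5000, seed=0):
    """Found many groups at random and measure the variance in altruist frequency."""
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_groups):
        altruists = sum(rng.random() < p for _ in range(n_founders))
        freqs.append(altruists / n_founders)
    return statistics.pvariance(freqs)


for n in (2, 10, 100):
    print(n, expected_variance(0.5, n), round(simulated_variance(0.5, n), 4))
```

For p = 0.5 the formula gives 0.125, 0.025, and 0.0025 for groups founded by 2, 10, and 100 individuals, so a fifty-fold increase in founders cuts the variation available to between-group selection fifty-fold. That is the sense in which the simple models made small initial group size look like a special requirement.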

In the laboratory, groups vary substantially at the phenotypic level, even when they are initiated by large numbers of individuals -- because the relationship between the genetic composition of a group and its phenotype is complex rather than simple. Even when groups initially vary by only a small amount, complex interactions within groups cause them to become more variable over time, a kind of "butterfly effect" that also accounts for why complex physical systems, such as the weather, are so variable.
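The amplification of tiny initial differences can be illustrated with the logistic map, a standard textbook example of a chaotic system. It is not a model of hens, soil microbes, or any experiment described here; the map, the growth parameter r = 4, and the starting values are assumptions picked only to show sensitive dependence on initial conditions.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x), returning the starting value and every step."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two hypothetical "groups" that start almost identical.
group_a = logistic_trajectory(0.3, 60)
group_b = logistic_trajectory(0.3 + 1e-9, 60)

# Absolute difference between the two groups at every step.
gaps = [abs(a - b) for a, b in zip(group_a, group_b)]
print(f"initial gap: {gaps[0]:.1e}")
print(f"largest gap over 60 steps: {max(gaps):.3f}")
```

The two trajectories begin about one part in a billion apart and end up differing by an amount comparable to the full range of the variable. Nonlinear interactions, not large initial differences, supply the variation.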

An experiment that I performed with my former student, William Swenson, will make this idea less abstract. We grew a fast-growing plant called Arabidopsis in small flowerpots. The soil was sterilized except for a slurry of six grams of unsterilized soil from a single well-mixed source. To be precise, to make the slurry, we placed unsterilized soil and sterilized water in a kitchen blender and blended it like crazy before delivering six grams of soil to each pot of sterilized soil. If you know anything about microbiology, you know that millions and millions of microbes comprising hundreds and hundreds of species are contained in a single gram of soil. Thus, the initial variation among pots in the genetic and species composition of the soil microbes was vanishingly small.

We grew the pots under constant environmental conditions until the plants were large enough to harvest. We weighed the biomass of the plants and performed a standard artificial selection experiment with a single twist. Instead of selecting the largest or smallest plants to breed for the next generation, we selected the soil from under the largest and smallest plants (in separate treatments) to make into a slurry and inoculate the next generation of pots. In other words, we were selecting at the level of whole microbial ecosystems rather than at the level of individual plants. Plant biomass was being used as a phenotypic trait of the ecosystem.

Even though the initial variation among pots was minuscule, given the large number of microbes colonizing each pot, variation did not stay minuscule because each pot was a complex biological system. Just as a butterfly flapping its wings can change the trajectory of a complex physical system such as the weather, each pot embarked upon a separate trajectory during the course of the first plant generation. This was apparent even to the naked eye; for example, some pots but not others developed a mat of algae over the surface of the soil. These differences made a difference for plant growth, so that by selecting the soil from beneath the largest and smallest plants, we were selecting microbial ecosystems that caused the plants to become large or small. Over a number of ecosystem "generations" (each comprising many microbial generations), the high and low selected lines diverged from each other -- proof that variation among ecosystems was heritable. This work was published in one of the nation's premier science journals, the Proceedings of the National Academy of Sciences. Once again, I say this not to boast, but to demonstrate that if multilevel selection experiments in the laboratory have failed to have an impact, it is not for lack of legitimacy or exposure. As with Goodnight's review of the literature, however, our experiment had virtually no impact on attitudes about group selection by the evolutionary community at large.

Muir's chicken experiment and our soil ecosystem experiment show that multilevel selection is not just an arcane scientific subject; it can be put to practical use. The eggs in your refrigerator come from group-selected hens, regardless of what you might think of group selection. In a second set of experiments, Bill Swenson and I selected microbial ecosystems to degrade a toxic compound. In their current research, Bill Muir and his colleagues are using group selection to create strains of livestock that have not been inadvertently selected to make each other miserable. What else might we select groups and ecosystems to do?

Multilevel selection experiments in the laboratory vividly illustrate why a truth and reconciliation process is needed for the subject of group selection. The experiments are published in the best journals because a core group of evolutionists does understand their import. For them, the storybook portrayal of science actually takes place. For the evolutionary community at large, however, the rules governing the acceptance and rejection of group selection are a different story.