
The Art of Genes How Organisms Make Themselves
rooms, much of biological evolution has involved changes in the binding sites within regulatory regions (locks) rather than in the master proteins themselves (keys). Because a typical master protein might bind to as many as one hundred different interpreting genes, an enormous constraint is imposed on the extent to which the shape of this master protein can be modified during evolution: any significant change may jeopardise the expression of all one hundred genes it normally binds to, most likely with disastrous consequences for the development and survival of the organism. In contrast, by changing a binding site in the regulatory region of a gene, only the expression of that gene will be directly affected. For this reason, evolutionary changes are often likely to involve mutations in the sites within regulatory regions, rather than alterations in the regions coding for the master proteins themselves.
An example of a change in a regulatory region might be the creation of a new binding site. Binding sites are quite short stretches of DNA, typically six to ten bases long. In a regulatory region a few thousand bases long, it is not too improbable that a chance mutation altering one or two bases in the DNA could create a new binding site for a master protein, or at least something that came reasonably close to a new binding site. This sort of mutation might start to couple the interpreting gene to a different master protein, modifying the gene's pattern of expression. If this new pattern proved advantageous for the organism, further mutations in the regulatory region might then be selected for to improve the match, creating an even better binding site. A new binding site (lock) has evolved to match a master protein (key) that was already around. It is also easy to see how a mutation could lead to the loss of a binding site in a regulatory region. A change in just one base in the DNA sequence of a binding site could mean that the master protein that normally recognises it can no longer bind. The interpreting gene would then no longer respond to this particular master protein.
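A rough back-of-envelope calculation makes this plausibility argument concrete. The sketch below assumes, purely for illustration, that the four bases occur with equal frequency and independently at every position, and it estimates how many positions in a regulatory region lie exactly one base change away from a given binding site. The function name and the figures of 3000 and 6 bases are illustrative choices within the ranges mentioned above, not measurements.

```python
from math import comb

def expected_near_sites(region_length, site_length, mismatches=1):
    """Expected number of windows in a random DNA region that differ
    from a given binding site at exactly `mismatches` positions.

    Assumes (for illustration only) that the four bases are equally
    likely and independent at every position.
    """
    windows = region_length - site_length + 1
    # Choose which positions differ; each differing position can hold any
    # of 3 wrong bases, and any one specific window has chance (1/4)^L.
    p_window = comb(site_length, mismatches) * 3 ** mismatches / 4 ** site_length
    return windows * p_window

# A 3000-base regulatory region and a 6-base binding site.
exact = expected_near_sites(3000, 6, mismatches=0)
one_away = expected_near_sites(3000, 6, mismatches=1)
print(f"already matching: {exact:.2f}, one change away: {one_away:.2f}")
```

Under these simplifying assumptions, a region of this size would be expected to hold several stretches that are a single base change away from a six-base site. For a ten-base site the same calculation gives well under one, which is why a mutation that merely comes reasonably close to a binding site may be the more realistic starting point.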
Thus, the combination of binding sites in the regulatory region of a gene is something that has gradually evolved. During the course of evolution, particular binding sites have arisen or been lost, changing the way interpreting genes respond to the patchwork of master proteins.
Master proteins are only masters in the sense that they can influence the activity of many genes, just as a key might open many doors. They are not masters in the sense of dictators, having evolved all the information that decides which interpreting gene should be on or off. This is because of the way the system has evolved, through genes modifying their response to the master proteins, rather than the master proteins evolving more and more complex shapes that allow them to dictate to more and more genes.
I do not want to give you the impression that master proteins never change. At the early stages in the evolution of a master protein, when it may bind to just a few genes, there may be quite a bit of room for change. But as more interpreting genes evolve suitable binding sites and come under the influence of the master protein, the possibilities for change become more limited. It then becomes increasingly likely that altered patterns of gene expression involve changes in interpretation, rather than changes in the master proteins themselves.
We can now see that genes interpret hidden colours, in the sense that interpretation was defined in Chapter 3: (1) The hidden colours provide a frame of reference, a distribution of master proteins of various types. (2) Each gene responds to this pattern selectively, being expressed at various times and places in the organism according to the set of binding sites in its regulatory region. A different combination of binding sites leads to a different pattern of expression. (3) The particular selection made in each case is historically informed, depending on a series of historical events that have led to one set of binding sites in the regulatory region rather than another.
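The first two points can be pictured with a small sketch. Everything in it is hypothetical: the region names, the single-letter colours and the two example genes are invented for illustration, and the rule that a gene is switched on wherever at least one matching master protein is present is a deliberate simplification of how real regulatory regions behave.

```python
# Hypothetical map of hidden colours: which master proteins are present
# in each region of the organism (names are illustrative only).
hidden_colours = {
    "region 1": {"a"},
    "region 2": {"a", "b"},
    "region 3": {"b", "c"},
    "region 4": {"c"},
}

def expression_pattern(binding_sites, colour_map):
    """Return the regions in which a gene is switched on.

    Simplifying assumption: the gene is on wherever at least one master
    protein that matches one of its binding sites is present.
    """
    return [region for region, proteins in colour_map.items()
            if proteins & binding_sites]

# Two interpreting genes that differ only in their sets of binding sites.
gene_x = {"a"}
gene_y = {"b", "c"}

print(expression_pattern(gene_x, hidden_colours))
print(expression_pattern(gene_y, hidden_colours))
```

The same map of hidden colours gives a different pattern of expression for each gene, simply because the two genes carry different combinations of binding sites. Gaining or losing a site, as in point (3), would amount to editing one of these sets.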
Families of colour
Although the evolution of master proteins is constrained, there is one very important way in which these restrictions can be partially overcome: through a process called gene duplication. This occurs when a mistake is made during the copying of DNA. Remember that DNA is normally copied once every time a cell divides. Occasionally an error is made in the copying process such that one stretch of DNA ends up being copied twice instead of once. The details of how this occurs need not concern us here: what matters is that it sometimes results in an extra copy of a gene being incorporated in the DNA. This means that if we have an organism with one gene for a master protein, very occasionally a descendant will be produced with an extra copy of the gene. The descendant now has two copies of the gene for a hidden colour (Fig. 6.1, upper part). These two copies will have exactly the same sequence of bases in their DNA: they will be 100% identical.
Fig. 6.1 Duplication and divergence of a gene coding for a master protein, resulting in genes for two different types of blue hidden colour.
Duplication seems to be of little consequence at first, but in the longer term it can provide greater evolutionary flexibility as the duplicate copies diverge. This is for the same reason that if you have two copies of a key, you can tinker around with one of them without jeopardising your ability to open doors, because the other copy acts as a backup. If you have two copies of a gene, some mutations that might normally be detrimental to the organism could be allowed because the genes act as backups for each other. Mutations would be expected to accumulate over a period of evolutionary time, so that the duplicate genes eventually start to diverge in sequence, indicated by the different shading of the genes in Fig. 6.1. The duplication has been followed by divergence between the copies, so the DNA sequences of the duplicate genes are no longer 100% but, say, only 90% identical (that is, out of every ten bases of DNA sequence, there is now on average one difference, just as the word convection only differs in one out of ten letters from conviction). As the DNA sequences diverge, so the proteins encoded by each of the duplicates may also start to diverge. Perhaps this would lead to their shapes becoming slightly different in some way, so that they now bind to regulatory regions with a slightly different specificity. Having started off with one hidden colour, two different versions have evolved. The two hidden colours will be closely related to each other.
Many of the hidden colours affecting identity in flowers and flies are thought to have arisen by duplication and divergence. This became very clear when the identity genes needed for the hidden colours were isolated in the early 1980s. Once a gene has been isolated, the sequence of its DNA and encoded protein can be determined. By comparing the sequences of different identity genes, it soon became apparent that they had arisen by gene duplications. For example, the eight fly genes needed for colours a-h (page 77) all have similar DNA sequences. The percentage similarity between the eight genes is particularly high in one stretch of their DNA, about 180 bases long, named the homeobox.* The homeobox provides a sort of common signature, showing that the eight genes all started as duplicates of each other. Because of this, the identity genes of the fly are also sometimes referred to as homeobox genes, as they all share this region of similarity in their DNA. This does not mean that the homeobox region is identical in the different genes; they each have slight differences in this region due to divergence. It is just that the homeobox is the region of greatest similarity between the genes.
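The percentage figures used here, both for diverging duplicates and for the homeobox, come from comparing sequences position by position. A minimal sketch of that comparison follows, applied to the pair of words used in the analogy. The function name is an illustrative choice, and the sketch assumes the two sequences are already lined up and of equal length; comparing real genes would first require aligning them, which is not attempted here.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences agree."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100 * matches / len(seq_a)

# The two ten-letter words differ only at their fifth letter.
print(percent_identity("convection", "conviction"))  # prints 90.0
```

The same function works unchanged on strings of DNA bases, so a freshly duplicated pair of genes would score 100 and the score would fall as the copies diverge.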
The reason that the homeobox is thought to be so well conserved between duplicates is that it codes for the part of the master protein (called the homeodomain) that makes direct contact with the DNA: the region of the master protein that fits into the binding site (equivalent to the part of a key you insert into a lock). Any major alterations in the sequence of this region are likely to disrupt the ability of the master protein to work at all, and alterations have therefore been selected against during evolution.
Because the master proteins encoded by these genes are similar or related to each other, we can symbolise them as a family of related hidden colours, say various types of green (Fig. 6.2). We can replace the colours a-h with types of green that begin with the same letter: colour a becomes apple-green, colour b is now bottle-green, c is cyprus-green, d is deep-green, e is emerald-green, f is forest-green, g is grass-green and h is herb-green. The fruit fly is divided up into territories coloured with various types of green, starting with apple-green at the head end and finishing with herb-green at the tail end. Each type of green territory corresponds to a region of cells where a particular type of master protein is made.
______________________________________________________________________________
* Homeo derives from homeosis, the term originally used to describe mutants with mistaken identities, and box is appended because the DNA sequence could be highlighted by drawing a box around it.
_____________________________________________________________________________
As shown in Fig. 6.2, the identity genes needed for the various green colours are arranged in two clusters in the DNA: five genes are in one cluster and three in the other. You may notice that the order of the genes in the clusters is the same as the order of hidden colours from head to tail in the organism. For example, looking at the cluster on the left, the gene for apple-green is followed by the gene for bottle-green, followed by the gene for cyprus-green and so on, paralleling the order of the corresponding territories of hidden colour from head to tail in the animal. It is still not clear why the order of these genes in the DNA should correspond so nicely with their order of expression in the organism, but it most likely has to do with the way the duplications have evolved.
Fig. 6.2 Cluster of identity (homeobox) genes in fruit fly DNA with the corresponding hidden colours they code for.
Many of the identity genes affecting whorls of flower organs have also arisen by duplication and divergence. These genes each contain a similar stretch or signature in their DNA sequence. This signature is not the same as the one in the fly identity genes, and is therefore given a different name: the MADS-box.*
Instead of greens, we could represent the set of master proteins encoded by these genes as reds. The hidden colours of the flower previously referred to as a, b and c can now be replaced by amarone-red, burgundy-red and claret-red respectively (fortunately the wine trade has provided us with many names for reds). The flower bud can be thought of as containing concentric territories of red colour: starting with amarone-red in the outermost whorl (sepals), then amarone-red + burgundy-red (petals), then burgundy-red + claret-red (stamens), and ending with claret-red in the centre (carpels). This pattern of hidden colours is what gives a separate identity to the various whorls of flower organs.
___________________________________________________________________________
*MADS is an acronym based on the names of some of the earliest described members of this family.
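The way combinations of red hidden colours give each whorl its identity can be written down as a simple lookup. The sketch below only restates the four combinations given in the text; the variable and function names are illustrative, and combinations the text does not mention are deliberately left without an identity.

```python
# Combination of red hidden colours in each whorl, from outermost to centre,
# as described in the text.
WHORLS = [
    ("sepals",  {"amarone"}),
    ("petals",  {"amarone", "burgundy"}),
    ("stamens", {"burgundy", "claret"}),
    ("carpels", {"claret"}),
]

# Lookup from a combination of colours to the organ identity it specifies.
IDENTITY = {frozenset(colours): organ for organ, colours in WHORLS}

def organ_for(colours):
    """Return the organ identity for a set of hidden colours, or None
    if the text gives no identity for that combination."""
    return IDENTITY.get(frozenset(colours))

for organ, colours in WHORLS:
    print(sorted(colours), "->", organ_for(colours))
```

Three colours are enough to distinguish four whorls because it is the combination present in a territory, not any single colour, that is being looked up.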
Let me summarise the main points so far. Flies and flowers contain a set of identity genes that are expressed in various regions of the organism to produce master proteins. This distribution of master proteins is equivalent to a map or patchwork of hidden colours. Many of the master proteins are related to each other because the various identity genes arose by duplication and divergence, and this can be symbolised by the red (flowers) or green (flies) families of hidden colour. The map of hidden colours provides a frame of reference that can be interpreted by many genes through their regulatory regions. The combination of binding sites in a regulatory region acts like a specific molecular antenna, responding to the pattern of hidden colours in such a way that each of these genes comes to be expressed at certain times and places in the organism.
Genes and language
It is useful to compare this view of genes with the way our own language works. Each gene, made up of a sequence of DNA bases, is often compared to a word comprising a series of letters. The equivalent of all the thousands of genes in the total DNA of an organism might then be a large dictionary with a vocabulary of thousands of words. There is, however, a fundamental difference between dictionaries and DNA when it comes to expressing their contents. To express a word from a dictionary, someone needs to look up the word and pronounce it. The word itself does not carry information that tells you whether to say it or not— this comes from the reader who is using the dictionary. A gene, however, does carry information in its regulatory region that determines when and where it is expressed. The gene contains a molecular antenna, a series of binding sites, ensuring that it is expressed in some cells and not others. It would be as if each word had a large prefix that ensured it was pronounced at certain times and places.
Although analogies with the written word break down here, there is in my view a better type of linguistic comparison: with the way we use words in our head. When you talk, or experience a train of thought, the words seem to come automatically. Suppose you have a thought like 'I wonder how bees look at flowers'. You do not look up each word, like 'I', then 'wonder', then 'how', in a mental dictionary, because to do so you would first have to know what words you wanted to look up: to look up 'wonder' in your head you would already have to know that 'wonder' is the word you wanted, defeating the whole point of the exercise. The words we use are in a sense stored in our brain, but they occur to us under particular conditions rather than being something we look up in a mental reference library. We are most conscious of this when a word is on the tip of our tongue and we have difficulty in recalling it. We have to wait until the word comes to us almost of its own volition. Thoughts are not something we plan and then execute by looking up the appropriate words; we just have them. We can of course plan to think about something, like 'I am going to spend the next hour thinking about bees'; but we do not plan the thoughts we will then have about bees and retrieve the words accordingly, because to do so would mean that we had already had the thoughts. Rather, we might start by contemplating some aspect of bees, and this would lead to other thoughts and words coming to mind. We experience a wandering train of thought rather than a planned series of events.
By analogy with genes, we might notionally divide each word in our brain into two parts. One part has to do with what gets expressed as the word occurs to us, and is responsible for how the word 'sounds' in our head. By 'sound' here, I mean the experience of having the word in our conscious mind, irrespective of whether we say it aloud or not. This would be equivalent to the coding region of a gene producing a particular protein. The second part of a word would determine when the word occurs to us, ensuring that each word comes to our mind under certain conditions. This would correspond to the regulatory region of a gene. It is as if each mental word carries information that leads to its being expressed or manifesting itself in our consciousness according to the conditions in our brain, rather than just being an entity that we retrieve. Words in our mind are not the same as those written down on a page; they are networked or locked into the thinking process. Of course the way they are locked in is not immutable: it can change as our experiences and mental processes develop. At any one time, the 'regulatory part' of each word is historically informed, depending on our previous learning experiences.
I am not saying that mental words are as simple as a linear sequence of subunits in a gene. We do not yet know how words work in our mind, but they most likely reflect a complex set of interactions between cells in our brain. It may be that these interactions would defy being simply broken down into the equivalent of regulatory and coding parts of a gene. My reason for drawing this comparison between mental words and genes is not to give an oversimplistic view of the mind, but to give us a better sense of how genes work than is implied by the notion of a dictionary. There is no independent reader dipping into the gene volumes held within each cell. Genes carry information that leads to their being expressed at certain times and places.
Now of course the whole process of thinking is remarkably interactive. Every word or thought that occurs to you leads to new words coming to mind. There is an ever changing state of mind in which each word or thought feeds off the previous ones. I have begun to show that this is also true for genes. Genes come to be expressed by interpreting hidden colours; and these hidden colours or master proteins themselves depend on a set of genes (identity genes). Genes feed off each other much as words do. But I have yet to explain how the genes coding for the hidden colours themselves get to be expressed in a pattern. It is all very well saying that genes interpret a complex patchwork, but what sets up the patchwork to begin with? I have started my explanations in mid-stream, as it were, assuming that the pattern of hidden colours is already given. As we shall see, the production of this pattern depends on further interactions between genes and proteins. Before dealing with this, however, I want to address another issue in the next chapter: the hidden colours of humans.
Chapter 7 The hidden skeleton
As Gregor Samsa awoke one morning from uneasy dreams he found himself transformed in his bed into a gigantic insect. He was lying on his hard, as it were armour-plated, back and when he lifted his head a little he could see his dome-like brown belly divided into stiff arched segments on top of which the bed-quilt could hardly keep in position and was about to slide off completely. His numerous legs, which were pitifully thin compared to the rest of his bulk, waved helplessly before his eyes.
This is how Kafka starts his short story, Metamorphosis, describing the life of Gregor after he was transformed into an insect. Kafka's choice of an insect as the vehicle of his nightmare was not accidental. Insects live within their skeletons: they carry their supporting framework on the outside of their body, conjuring up a claustrophobic image of being helplessly trapped and entombed within a hard casing. They seem to have an imprisoned and alien existence, providing useful fodder for science fiction and horror stories. Nevertheless, there is a basic similarity between insects and ourselves. Like us, they have a head with mouth and eyes at one end, a body bearing limbs, and an anus at the tail end. This is what allows Kafka's transformation of a human into an insect to work so well: we can readily substitute human parts for corresponding parts of a beetle, head for head, main body for main body. We might take this as indicating that vertebrates, animals with internal bony skeletons such as ourselves, and arthropods, jointed animals with an outer casing such as insects, are formed on similar principles. Perhaps there is a common system that underlies the formation of these different types of animal. Another view would be that this similarity between vertebrates and arthropods is simply a trivial consequence of the limited number of ways that an animal can function. There are only so many ways that animals can operate. Having a head at one end, a main body with limbs, and a tail end, is a particularly convenient arrangement and it is not surprising to find it in different types of animal. In this view, a beetle and human are entirely different types of creature, any resemblance between them being a superficial consequence of limitations on the ways animals can function.
The question of whether vertebrates and arthropods are formed in a similar way or are built entirely differently was a burning issue in the early nineteenth century. It culminated in a famous confrontation in 1830 between two highly respected professors at the Muséum d'Histoire Naturelle in Paris: Georges Cuvier and Etienne Geoffroy Saint-Hilaire (for an excellent description of this debate, see The Cuvier-Geoffroy Debate by Toby Appel).
Form and function
The early part of the nineteenth century was the golden era of comparative anatomy, with the Muséum d'Histoire Naturelle in Paris being at the heart of many of the most exciting new discoveries. Scientists were busily dissecting, describing and comparing many different types of animal for the first time, trying to discover the secrets of their internal anatomy. The legacy of this era can still be observed today in the Paris Museum's Gallery of Comparative Anatomy. There you can see what looks like a skeleton march: a stunning display of skeletons from all sorts of animals, arranged so that they all appear to be walking in the same direction (Fig. 7.1). The skeletons bear witness to a period in which the study of anatomy was at the cutting edge of biological research. Based on these sorts of investigations, Cuvier and Geoffroy each thought they had uncovered the unifying principles that governed anatomy. Their approaches were, however, very different.
Fig. 7.1 Gallery of Comparative Anatomy, Muséum National d'Histoire Naturelle, Paris.
You might classify musical instruments in two ways. One would be to classify them according to how they look: violins and cellos have a similar shape and are made of wood and string; trumpets and horns look similar and are made of metal. This is a classification based on the form or structure of the instrument. Alternatively, you could classify them according to how they make sounds: you have to pluck or bow a stringed instrument; a woodwind instrument is sounded by blowing, either directly or through a reed; to play a brass instrument you have to press your lips against a mouthpiece so that they vibrate when you blow; finally, you bang or hit a percussion instrument. In this case, the classification is based on the way the instrument functions. These two approaches to classification (the form of the instrument or the way it functions) emphasise different aspects of the instruments. If you were to classify a saxophone based on form, you might place it with the brass instruments, whereas if function was your main criterion you would place it in the woodwind section because you play it by blowing through a reed (saxophones are normally classified as woodwinds for this reason). In spite of these differences, the two types of classification will often give similar answers because the structure of the instrument is obviously closely connected to the way you make it sound.
For Cuvier, the key to understanding the structure of an animal lay in the way it functioned. Each animal was beautifully designed to function in a particular way and this dictated its form and structure. To Geoffroy, things were the other way round: the fundamental feature of animals was their unity of form. The specific way they functioned was a secondary matter. Following these two approaches, Cuvier and Geoffroy each arrived at a different formulation of the key rules underlying anatomy.
Cuvier and functional integration
Based on extensive animal dissections and studies, Cuvier decided that it was no good trying to understand organs or tissues in isolation: they only made sense when they were seen as parts of an integrated active individual. Why, for example, should birds have feathers? Feathers only make sense if they can be attached to a specialised type of forelimb, making a wing. A wing only makes sense if there is a certain type of collar-bone and breastbone for the wing muscles to attach to. The muscles in turn can only work if they can be provided with high levels of oxygen, requiring a particular type of chest and breathing system. Carrying on in this vein, the whole body plan of a bird could be deduced, starting from just a feather. Once told that an animal has a feather, we could work out that it must also have a certain type of collar-bone, and given a particular collar-bone we could infer that the owner had feathers. A feather without a bird, or a bird without feathers, just would not make any sense: the resulting animal would be illogical, unable to function properly and so could not exist. Cuvier applied and extended this principle to an enormous range of animals. Given a single fossilised bone he became expert at predicting what the rest of the skeleton looked like, what the animal ate and how it moved.
Cuvier was aware, however, that in practice many of the relations between parts had to be worked out retrospectively. If someone had never seen a bird and was given a feather, it is very unlikely that he or she would be able to deduce the structure of a bird from scratch. In practice, we cannot work everything out from first principles, but need to glean knowledge from the animals around us. Cuvier's deductions using fossilised bones were based on comparisons with bones of other known skeletons rather than purely logical deductions. He thought that this had more to do with our inadequacies in logical thinking than with any weakness in his theory. If we were intelligent enough, perhaps we could work out the structure of a bird from a feather without first needing to look at any examples.
It is a pity that Cuvier never met Sherlock Holmes, another great exponent of the power of deductive logic. This is how Holmes describes the art of deduction in A Study in Scarlet:
From a drop of water, a logician could infer the possibility of an Atlantic or a Niagara without having seen or heard of one or the other. So life is a great chain, the nature of which is known wherever we are shown a single link of it. Like all other arts, the Science of Deduction and Analysis is one which can only be acquired by long and patient study, nor is life long enough to allow any mortal to attain the highest possible perfection in it.
To Cuvier, the logical principle that connected the different parts of an organism together was their functional interdependence: parts were inextricably linked because they relied on each other to work properly. To some degree this may seem rather obvious and it had indeed been stated by others before. Cuvier's achievement was to pursue this idea in a systematic way and to elevate it to a fundamental law which he called the Conditions of Existence:
It is on this mutual dependence of the functions and the aid which they reciprocally lend one another that are founded the laws which determine the relations of their organs and which possess a necessity equal to that of metaphysical or mathematical laws, since it is evident that the seemly harmony between organs which interact is a necessary condition of existence of the creature to which they belong and that if one of these functions were modified in a manner incompatible with the modifications of the others the creature could no longer continue to exist.
For a creature to exist at all, its various parts must function together properly. This means that if one part was altered so that it no longer worked well with the others, the organism would not be able to survive and exist. Functional integration was therefore a necessary condition for existence and was the guiding principle behind all animal designs. Cuvier took the harmonious arrangements observed in animal anatomy to be a reflection of God's wisdom: species had been created by God along logical principles with parts that worked well together.
Perhaps Cuvier's crowning achievement was the way he applied his principle to the overall classification of animals. After surveying the animal kingdom, he proposed that there were basically four types of animal, each organised along different lines: (1) vertebrates, animals with a backbone, (2) molluscs, soft bodied animals such as slugs and snails, (3) articulates, jointed animals such as insects and shrimps (i.e. arthropods), (4) radiates, radially symmetrical animals such as starfish and jellyfish. Any animal could be classified as belonging to one of these four categories, or embranchements, just as an orchestral instrument can be classified as belonging to strings, woodwinds, brass or percussion.
Coming up with this four-fold classification may not seem particularly astounding, but imagine you had the task of dividing up the entire animal kingdom in a sensible way. Where would you start and what would your criteria be? You might begin with divisions like birds, fishes, mammals, etc. The trouble with this sort of classification is that it is somewhat superficial and anthropocentric: it is biased towards animals we are more familiar with. When set in the context of the whole animal kingdom, birds, fishes and mammals are seen to be all organised along very similar lines. Cuvier came up with a less parochial classification that recognised fundamental features of animal function and structure. To Cuvier, the four embranchements represented qualitatively different types of functional organisation. In his system, mammals, birds and fishes all belonged to just one of the embranchements, the vertebrates. The remaining three