John Wiley & Sons - 2004 - Analysis of Genes and Genomes
9.1 Genomic Mapping
In eukaryotes the simplest, and most natural, way to split a genome into smaller fragments is to consider the DNA contained within each chromosome individually. Since each is composed of one double-stranded DNA molecule, the chromosome provides the first level of genome mapping. The chromosome content of an organism (its karyotype) can be visualized using a microscope. Each chromosome is composed of two arms separated by a centromere. By convention, the shorter arm of each chromosome is designated as p and the longer arm is designated as q. The different chromosomes of an organism are usually different sizes (ranging in the human from 279 × 10^6 bp for chromosome 1 to 45 × 10^6 bp for chromosome 21), but most chromosomes are difficult to distinguish based on size alone by microscopy. Distinct chromosome banding patterns can be obtained, however, when they are treated with certain dyes. Approximately 500 different bands can be obtained reproducibly after treating human chromosomes with the stain Giemsa (Figure 9.2). These banding patterns can be used to generate a cytological map of each chromosome and provide a low-resolution mechanism to distinguish one portion of a chromosome from another. Some chromosome abnormalities that cause inherited genetic diseases can be observed by karyotype analysis – additional copies of chromosomes can be easily identified, e.g. Down’s Syndrome results from an extra copy (trisomy) of all or part of chromosome 21, and sufferers from Klinefelter’s Syndrome possess three sex chromosomes (XXY). Additionally, a variety of other chromosome abnormalities, e.g. deletions, inversions, translocations etc., can be detected as alterations in the normal banding pattern. The banding pattern also provides a mechanism for labelling chromosome regions. For example, using some of the techniques described below, the gene mutated in sufferers of cystic fibrosis has been mapped to the long arm of chromosome 7 in banding region 31. The chromosomal location of the gene in the cytological map is therefore designated as 7q31.
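The labelling convention just described (chromosome, then arm p or q, then band) can be illustrated with a minimal sketch. The names `CytoLocation`, `parse_location` and `describe` are hypothetical helpers introduced only for this illustration, and the pattern assumes human chromosomes (1 to 22, X, Y) with an optional sub-band after a decimal point.

```python
import re
from typing import NamedTuple


class CytoLocation(NamedTuple):
    """A cytological map position such as 7q31 (hypothetical helper type)."""
    chromosome: str
    arm: str   # "p" = shorter arm, "q" = longer arm
    band: str


# Assumption: human chromosomes only; band may carry a sub-band, e.g. "31.2".
_LOCATION = re.compile(r"^(1[0-9]|2[0-2]|[1-9]|X|Y)([pq])(\d+(?:\.\d+)?)$")

ARM_NAMES = {"p": "short", "q": "long"}


def parse_location(label: str) -> CytoLocation:
    """Split a label like '7q31' into chromosome, arm and band."""
    match = _LOCATION.match(label.strip())
    if match is None:
        raise ValueError(f"not a recognised cytological location: {label!r}")
    return CytoLocation(*match.groups())


def describe(location: CytoLocation) -> str:
    """Spell the location out in words."""
    return (f"chromosome {location.chromosome}, "
            f"{ARM_NAMES[location.arm]} arm ({location.arm}), "
            f"band {location.band}")


cftr = parse_location("7q31")
print(describe(cftr))  # chromosome 7, long arm (q), band 31
```

The cystic fibrosis example from the text parses to chromosome 7, arm q, band 31; anything that does not follow the convention raises an error rather than being guessed at.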
Isolated DNA fragments can be plotted onto the cytological map by a variety of methods. For example, fluorescently labelled single-stranded DNA fragments will hybridize to chromosome spreads like those shown in Figure 9.2 to yield the location of the complementary sequence (Taanman et al., 1991). This method of fluorescent in situ hybridization (FISH) is a powerful way to localize DNA sequences to individual chromosomes and even parts of chromosomes, but is low resolution in that sequences closer than approximately 3 Mbp apart will hybridize indistinguishably from each other. A number of additional genetic and physical maps of chromosomes have also been produced to aid the localization of specific DNA sequences (Figure 9.3), and we will discuss these below.
Figure 9.2. The human male karyotype showing the G bands. Metaphase chromosomes from a male were treated with the protease trypsin (to remove protein) and then stained with a mixture of dyes called Giemsa (named after Gustav Giemsa, who first used it) and viewed using a light microscope. Each pair of chromosomes has a similar length and banding pattern that allows them to be aligned. Chromosomes from a female would have two X chromosomes rather than the X and Y shown here
9.2 Genetic Mapping
A genetic map is a representation of the distance between two DNA elements based upon the frequency at which recombination occurs between the two. The first genetic map of a chromosome was constructed by Alfred Sturtevant using data from Drosophila mating crosses collected by Thomas Morgan (Morgan, 1910). Sturtevant used the frequency at which particular observable phenotypes were separated from other genes (through recombination events) during meiosis. The information gained from the experimental crosses could be used to plot out the location of genes – tightly linked genes are physically
sequence variations between individuals. It is estimated that more than 99 per cent of human DNA sequences are the same across the population. This still allows for huge numbers of variations in DNA sequence between individuals. Several different methods have been used to exploit the inheritance of these variations to map their genomic location.
• Single-nucleotide polymorphisms. The most common type of sequence variation between individuals is the single-nucleotide polymorphism (SNP), in which a single base pair differs between one individual and another. These differences may occur as frequently as about once every 100–300 bp (Collins et al., 1998). Some of these alterations will be disease-causing mutations – they may change the sequence of amino acids within a protein or alter the way in which gene expression occurs to impair the function of the resulting protein. Many SNPs, however, occur in non-coding regions of DNA or, even if they do occur within a coding region, they may not alter the amino acid sequence of the encoded polypeptide due to the degeneracy of the genetic code. Some of the nucleotide differences between individuals will, however, result in the alteration of restriction enzyme recognition sites such that existing sites are destroyed or new sites are created (Figure 9.4). Base changes at these sites result in DNA fragments of different lengths being produced upon restriction digestion. These restriction fragment length polymorphisms (RFLPs) are usually detected by Southern blotting (Chapter 2) using a radioactive DNA probe. RFLPs are inherited and segregate in crosses and they can therefore be mapped using linkage analysis like genes (NIH/CEPH Collaborative Mapping Group, 1992).
• VNTRs. Another common variation in humans involves short DNA sequences that are present in the genome as tandem repeats. The number of copies of variable number tandem repeats (VNTRs) at a specific genomic location can vary widely between individuals, and such locations are described as being highly polymorphic. Restriction fragments produced by enzymes that cleave the DNA in regions flanking the repeats (again detected by Southern blotting) will be of different sizes depending on the number of repeats present.
• Microsatellites. Microsatellites are short (2–6 bp), tandemly repeated sequences that occur, seemingly at random, throughout the genome of all higher organisms. They are generally found in non-coding regions of DNA, and their function (if any) is unknown. The number of repeats found at any particular genomic location is highly individual specific. The repeats are thought to be generated by polymerase ‘slippage’ during replication (Schlötterer, 2000). In humans, the most common type of microsatellite is 5′-AC-3′ and several thousand different AC arrays may occur throughout the genome. Dinucleotide microsatellites in mammals typically vary in repeat number from about 10 to 30 repeats. The microsatellite DNA is subjected to PCR amplification using primers that flank the repeated region. The size of the PCR product obtained will therefore depend on the number of repeats. Microsatellites are inherited from one generation to the next and can thus be used for mapping by linkage analysis (Dib et al., 1996).

[Figure 9.4 diagram: (a) DNA 1 carries three EcoRI sites (GAATTC/CTTAAG); in DNA 2 the middle site reads GGATTC/CCTAAG and only the two outer EcoRI sites remain. The hybridization probe is marked on both. (b) Gel lanes 1 and 2.]
Figure 9.4. Restriction fragment length polymorphisms. (a) A section of DNA that contains three recognition sites for the restriction enzyme EcoRI. A single base change within one of the sites destroys the recognition sequence. (b) Cutting the DNA with EcoRI will generate different sized fragments that will be able to hybridize to the labelled DNA fragment (hybridization probe) shown. In the first case two small fragments will be formed that are capable of binding the probe, while in the second a single, larger fragment will bind. The restriction fragments are separated on an agarose gel and subjected to Southern blotting (see Figure 2.21) to identify sequences that are complementary to the probe
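Two of the markers described above lend themselves to a small worked sketch: the RFLP of Figure 9.4 and the length of a microsatellite PCR product. The sequences, spacer lengths, probe coordinates and flank lengths below are invented purely for illustration, and the function names are hypothetical; only the EcoRI site (GAATTC, cut after the first G) and the AC repeat unit come from the text.

```python
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts between G and AATTC


def digest(sequence, site=ECORI_SITE, cut_offset=ECORI_CUT_OFFSET):
    """Cut a linear sequence at every site; return (start, end) fragments."""
    cuts = []
    hit = sequence.find(site)
    while hit != -1:
        cuts.append(hit + cut_offset)
        hit = sequence.find(site, hit + 1)
    bounds = [0] + cuts + [len(sequence)]
    return list(zip(bounds, bounds[1:]))


def probed_fragment_lengths(sequence, probe_start, probe_end):
    """Lengths of the fragments that overlap the probe region."""
    return [end - start
            for start, end in digest(sequence)
            if start < probe_end and end > probe_start]


def pcr_product_size(repeat_unit, repeat_count, left_flank, right_flank):
    """Product length when primers sit in the flanks around a tandem repeat."""
    return left_flank + len(repeat_unit) * repeat_count + right_flank


# Hypothetical alleles: poly-A spacers guarantee no accidental EcoRI sites.
allele_1 = "A" * 50 + "GAATTC" + "A" * 200 + "GAATTC" + "A" * 300 + "GAATTC" + "A" * 40
allele_2 = "A" * 50 + "GAATTC" + "A" * 200 + "GGATTC" + "A" * 300 + "GAATTC" + "A" * 40

# Hypothetical probe spanning the polymorphic middle site.
print(probed_fragment_lengths(allele_1, 200, 320))  # [206, 306]
print(probed_fragment_lengths(allele_2, 200, 320))  # [512]

# An (AC)n microsatellite with assumed 40 bp and 45 bp flanks.
print(pcr_product_size("AC", 10, 40, 45))  # 105
print(pcr_product_size("AC", 30, 40, 45))  # 145
```

As in Figure 9.4, the allele with the intact middle site gives two smaller probe-binding fragments and the allele with the single base change gives one larger fragment; for the microsatellite, each extra repeat adds the length of the repeat unit to the PCR product.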
9.3 Physical Mapping
The information held within genetic maps provides vital clues as to the order and approximate distance between particular DNA sequences within a chromosome. The map, although not providing sequence information itself, yields a framework onto which subsequently obtained sequence information can be applied. The physical map of a genome is a map of genetic markers made by analysing a genomic DNA sequence directly, rather than analysing recombination events. As with genetic maps, physical maps for each chromosome within the genome can be constructed. Again, a variety of different techniques have been used to construct physical maps in the absence of complete sequence information.
• Restriction maps. The digestion of genomic DNA, or even isolated chromosomes, with restriction enzymes produces a large number of fragments that appear to run as a continuous smear, rather than as discrete bands, on agarose gels after electrophoresis. However, certain restriction enzymes, e.g. NotI, have a comparatively large recognition sequence (5′-GCGGCCGC-3′) that is rarely found in human DNA sequences. The recognition site for NotI would be expected to occur, by chance, every 4^8 = 65 536 bp. Experimentally, NotI cleaves human DNA on average once every 10 Mbp. The discrepancy between these two numbers arises from the fact that the DNA sequence within the genome is not random. For example, the sequence 5′-CG-3′ occurs comparatively rarely in the human genome and clusters of this dinucleotide tend to accumulate only at the 5′-end of actively transcribed genes (Cross and Bird, 1995). The recognition sequence for the NotI restriction enzyme contains two of these dinucleotides, which explains why the enzyme cuts human DNA so infrequently. Even using rare-cutting restriction enzymes such as NotI, the construction of genomic restriction maps like those generated for small DNA fragments (Figure 9.1) is extremely difficult. Restriction mapping does provide highly reliable fragment ordering and distance estimation, but has only been completed for a few human chromosomes (Ichikawa et al., 1993; Hosoda et al., 1997).
• Radiation hybrid maps. A radiation hybrid is, usually, a hamster cell line that carries a relatively small DNA fragment from the genome of another organism, e.g. human. Irradiating human cells with X-rays causes random breaks within the DNA and produces fragments. The size of the fragments produced decreases as the dose of X-rays increases. The radiation levels used are sufficient to kill the human cells, but the chromosome fragments can be rescued by fusing the irradiated cells with a hamster cell in vitro. Typically, the human DNA fragments in the hybrid are a few Mbp long. The human DNA within the hybrid cell line is then analysed for the genetic markers it carries, either by hybridization or by PCR. The closer two markers are, the greater the probability that they will be on the same DNA fragment and therefore end up in the same radiation hybrid.
Clone 1: A B C D
Clone 2:     C D E F
Clone 3:           F G H
Figure 9.5. Aligning clones by STS mapping. Each clone contains several STSs. Clone 1 has four (A, B, C and D). Clone 2 also contains STSs C and D. Therefore clones 1 and 2 overlap with each other
• STS maps. A sequence tagged site (STS) is a DNA fragment, typically 100–200 bp in length, generated by PCR using primers based on already known DNA sequences. The genomic site for the sequence in question can be ‘tagged’ by its ability to hybridize with that sequence. STSs can be generated from previously cloned genes, or from other random non-gene sequences. Genomic DNA fragments that have been cloned into a library can then be ordered on the basis of the STSs they contain (Figure 9.5). This technique has been used to order inserts from individual human chromosomes in a YAC library (Foote et al., 1992), but ran into difficulties when it was discovered that some YACs contained DNA from more than one human genome location. An STS map of the human genome has, however, been constructed using a series of radiation hybrids (Hudson et al., 1995).
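The clone-ordering idea of Figure 9.5 can be sketched in a few lines. This is only an illustration under a strong assumption, namely that the clones overlap to form a single unbranched chain; the function names `shared_sts` and `order_contig` are hypothetical, and the STS content is taken from the figure.

```python
from itertools import combinations


def shared_sts(clones):
    """Map each overlapping pair of clones to the STSs they have in common."""
    overlaps = {}
    for (name_a, sts_a), (name_b, sts_b) in combinations(clones.items(), 2):
        common = set(sts_a) & set(sts_b)
        if common:
            overlaps[(name_a, name_b)] = common
    return overlaps


def order_contig(clones):
    """Order clones along a contig by walking from one end through shared STSs.

    Assumption: overlaps form one simple, unbranched path. Anything else
    (gaps, branches) raises an error rather than being resolved.
    """
    neighbours = {name: set() for name in clones}
    for name_a, name_b in shared_sts(clones):
        neighbours[name_a].add(name_b)
        neighbours[name_b].add(name_a)

    ends = sorted(name for name, linked in neighbours.items() if len(linked) == 1)
    if len(ends) != 2:
        raise ValueError("clones do not form a single unbranched contig")

    path, visited = [ends[0]], {ends[0]}
    while len(path) < len(clones):
        unvisited = [n for n in neighbours[path[-1]] if n not in visited]
        if len(unvisited) != 1:
            raise ValueError("clones do not form a single unbranched contig")
        path.append(unvisited[0])
        visited.add(unvisited[0])
    return path


figure_9_5 = {
    "clone 1": ["A", "B", "C", "D"],
    "clone 2": ["C", "D", "E", "F"],
    "clone 3": ["F", "G", "H"],
}

print(shared_sts(figure_9_5))
print(order_contig(figure_9_5))  # ['clone 1', 'clone 2', 'clone 3']
```

A clone that carries DNA from two unrelated genomic locations, like the problematic YACs mentioned above, would share STSs with clones that do not truly overlap; in this sketch that typically shows up as a branch and is reported as an error.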
The physical maps, although not aligning DNA base sequences themselves, have proved immensely useful in producing ordered library clones. The final stage of any sequencing project is then to determine the individual base sequence of each clone. Before we look at how the human genome sequence was attained and assembled, we need to understand how the DNA sequence information itself is obtained.
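As an aside on the rare-cutter arithmetic quoted for NotI in the restriction-map discussion, the expected spacing of a recognition site can be checked with a short sketch. It assumes random DNA in which all four bases are equally frequent, which is precisely the assumption the text says the human genome violates; the function names are hypothetical.

```python
def expected_spacing(site: str) -> int:
    """Expected bp between occurrences of a site in random, equal-base DNA."""
    return 4 ** len(site)


def count_dinucleotide(site: str, dinucleotide: str = "CG") -> int:
    """Count (possibly overlapping) occurrences of a dinucleotide in a site."""
    return sum(1 for i in range(len(site) - 1) if site[i:i + 2] == dinucleotide)


NOTI_SITE = "GCGGCCGC"
OBSERVED_SPACING_BP = 10_000_000  # "once every 10 Mbp", as quoted in the text

expected = expected_spacing(NOTI_SITE)
print(expected)                                  # 65536
print(round(OBSERVED_SPACING_BP / expected))     # 153
print(count_dinucleotide(NOTI_SITE))             # 2
```

On these figures NotI cuts human DNA roughly 150 times less often than chance alone would predict, consistent with its recognition sequence containing two copies of the under-represented 5′-CG-3′ dinucleotide.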
9.4 Nucleotide Sequencing
The uniformity of the DNA molecule and the seemingly monotonous repetition of the nucleotide bases may seem like impenetrable barriers to determining the precise sequence order of the bases within nucleic acid. In 1966, Robert Holley published the results of a 7 year project to sequence the alanine tRNA from yeast (Holley, 1966). At 80 nucleotides in length, tRNAs are relatively small molecules in comparison to complete genes, or even complete genomes. The first DNA molecule to be sequenced was that of the bacteriophage λ cohesive (cos) ends (Wu and Taylor, 1971). These sequences, which are only 12 bases long, were obtained after the synthesis of a complementary RNA molecule and the subsequent use of RNA sequencing procedures. The methods used were, however, impractical for DNA sequencing on a large scale. In 1975, Fred Sanger and Alan Coulson devised a method of direct DNA sequencing referred to as the plus–minus method (Sanger and Coulson, 1975). This method utilized a DNA polymerase, primed by synthetic radio-labelled oligonucleotides, to generate fragments of DNA that could be analysed following electrophoresis and autoradiography. This technique was used to determine the entire 5386 bp sequence of the bacteriophage φX174 genome (Sanger et al., 1977).
9.4.1 Manual DNA Sequencing
Two alternative, and improved, sequencing methods were described in 1977. Allan Maxam and Walter Gilbert devised a chemical method for cleaving the sugar–phosphate backbone of a radio-labelled DNA fragment at specific bases (Maxam and Gilbert, 1977). They used specific chemicals to modify individual DNA bases (e.g. the modification of T residues with potassium permanganate) or sets of bases (e.g. the modification of both A and G residues with formic acid) prior to cleavage of the sugar–phosphate backbone with piperidine at the modified bases (Maxam and Gilbert, 1980). The separation of the cleaved products using high-resolution polyacrylamide gel electrophoresis allowed unequivocal assignment of individual bases within a DNA sequence. Their method was, however, limited by the length of DNA that could be sequenced in a single reaction (approximately 100 bases) and by the harsh chemicals required to modify and cleave the DNA.
Fred Sanger and his colleagues devised an alternative sequencing approach based upon the faithful replication of DNA using a DNA polymerase (Sanger, Nicklen and Coulson, 1977b). They relied on the incorporation of 2′,3′-dideoxynucleotides into a newly replicated DNA chain to generate DNA fragments that ended at a specific base (Figure 9.6). The dideoxynucleotide lacks a 3′-hydroxyl group and, consequently, when it is incorporated into an extending DNA chain, DNA replication cannot continue as the 3′-hydroxyl group is not available for the addition of further nucleotides. Thus, the growing DNA chain is terminated after the addition of the dideoxynucleotide.

[Figure 9.6 diagram: a deoxynucleotide triphosphate, with OH and H on the sugar ring, beside a dideoxynucleotide triphosphate, with H and H.]
Figure 9.6. The structure of a deoxynucleotide triphosphate and its dideoxy derivative

As originally described by Sanger, DNA replication was initiated by the binding of a complementary oligonucleotide to the DNA sequence and subsequent incubation with DNA polymerase. The newly synthesized DNA will thus be complementary to the strand of DNA to which the oligonucleotide binds. The sequencing reaction was then split into four separate parts. To each was added a mixture of the four nucleotide triphosphates (dNTPs) required for the synthesis of new DNA. One of these was radio-labelled so that the newly synthesized DNA could be easily detected. Additionally, a single dideoxynucleotide triphosphate (either ddATP, ddGTP, ddCTP or ddTTP) was included in each reaction at a concentration of approximately 1/10 that of its deoxynucleotide counterpart. Therefore, in the reaction containing ddATP, for example, when a T residue occurs on the template strand, in most cases a dATP will be inserted into the newly synthesized chain. However, at a relatively low frequency the dideoxy form of the nucleotide will be incorporated and the chain will terminate at this point. Since many DNA molecules are produced at the same time, this process results in the formation of a population of partially synthesized radioactive DNA molecules each having a common 5′-end, but each varying in length to a specific base at the 3′-end (Figure 9.7). These products can be separated using polyacrylamide gel electrophoresis and the sequence of the newly synthesized DNA can be read. The gel used to separate the newly synthesized DNA fragments usually contains high concentrations of urea (7 M) and is run at a high power level to heat the gel to about 70 °C. Both of these have denaturing effects on DNA fragments and help to reduce secondary structure in the single-stranded molecules, which could otherwise make them run anomalously through the gel.
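The logic of the four-reaction scheme can be made concrete with a minimal sketch. It is an idealized illustration, not a model of the chemistry: every possible termination product is assumed to be present and detectable, fragment lengths are counted from the first base added after the primer, and the template sequence is invented. The helper `termination_fraction` additionally assumes that the polymerase picks the dideoxy form in proportion to its relative concentration, which is a simplification introduced here.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesized_strand(template_3_to_5: str) -> str:
    """New strand (5'->3') made by copying a template read 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


def termination_lengths(new_strand: str) -> dict:
    """For each ddNTP reaction, the chain lengths at which synthesis can stop."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)  # the ddN reaction stops wherever N is added
    return lanes


def read_gel(lanes: dict) -> str:
    """Read the bands from the shortest fragment to the longest."""
    band_to_lane = {length: base
                    for base, lengths in lanes.items()
                    for length in lengths}
    return "".join(band_to_lane[length] for length in sorted(band_to_lane))


def termination_fraction(occurrence: int, dd_to_d_ratio: float = 0.1) -> float:
    """Fraction of chains stopping at the k-th occurrence of the base.

    Assumption: incorporation is proportional to concentration, so the
    chance of picking the dideoxy form at each site is ratio / (1 + ratio).
    """
    p_stop = dd_to_d_ratio / (1 + dd_to_d_ratio)
    return p_stop * (1 - p_stop) ** (occurrence - 1)


template = "TACGGTCA"                 # hypothetical template, written 3'->5'
new_strand = synthesized_strand(template)
lanes = termination_lengths(new_strand)

print(new_strand)        # ATGCCAGT
print(lanes)             # {'A': [1, 6], 'C': [4, 5], 'G': [3, 7], 'T': [2, 8]}
print(read_gel(lanes))   # ATGCCAGT
```

Each lane contains only the fragments ending in its own base, yet reading all four lanes together from the bottom of the gel upwards recovers the complete sequence of the newly synthesized strand. Under the stated assumption, a 1/10 ratio stops about one chain in eleven at each matching site, so longer products remain in reasonable supply.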
The use of DNA replication as a tool for sequencing has several advantages.
• DNA synthesis can be initiated at any known point in a DNA sequence through the design of an oligonucleotide. This does mean that some knowledge of the DNA sequence is required before sequencing can commence. Many popular cloning vectors (Chapter 3) contain common oligonucleotide
