Genomics: The Science and Technology Behind the Human Genome Project.
Charles R. Cantor, Cassandra L. Smith
Copyright © 1999 John Wiley & Sons, Inc.
ISBNs: 0-471-59908-5 (Hardback); 0-471-22056-6 (Electronic)
10 DNA Sequencing: Current Tactics
WHY DETERMINE DNA SEQUENCE
A complete DNA sequence of a representative human genome is the major goal of the human genome project. Complete DNA sequences of other genomes are also sought. Why do we want or need this information? All descriptions of the organization of a genome, at lower resolution than the sequence, appear to offer little insight into genome function. Sometimes genes with common or related functions are clustered. This is particularly true in bacteria where the clustering allows polycistronic messages to ensure even production of a set of interactive gene products. However, in higher cells, related genes are not necessarily close together. For example, in humans, genes for alpha and beta globin chains are located on different chromosomes, even though it is desirable to produce their products in equal amounts because they associate to form a heterotetramer, (alpha)2(beta)2. The major purpose served by low-resolution maps is that they help us find things in the genome. We usually want to find genes in order to study or characterize their function. It is only at the level of the DNA sequence where we have any chance of drawing direct inferences about the function of a gene from its structure. Admittedly, our ability to do this today is still rather limited, as will be demonstrated in Chapter 15. However, from the rate of progress in our ability to interpret DNA sequences de novo in terms of plausible gene function, we can be reasonably optimistic that by the time the human genome is completely sequenced, coding regions will be identifiable with almost perfect accuracy, and most new genes will carry in their sequence immediately recognizable clues about function.
A second reason to have the DNA sequence of genomes is that it gives us direct access to the DNA molecules of these genomes via PCR. Using the sequence, it will almost always be possible to design primers that will amplify a small DNA target of interest, or to provide a probe that will uniquely allow effective screening of a library for a larger segment of DNA containing the region of interest. The key point is that once DNA sequence is available, clones do not have to be stored and distributed. DNA sequences also often allow us to search for similar genes in related organisms (or even more distant organisms) more efficiently than by using DNA probes of unknown sequence. For example, to find a mouse gene comparable to a human gene, one can try to use the human gene as a hybridization probe at reduced stringency (lower temperature, higher salt) against a mouse library or use the human gene to design PCR primers for probing the mouse genome. But, if one had both the relevant human and mouse DNA sequences available, a comparison between these might reveal consensus regions that are more highly conserved than average and thus better suited for hybridization or PCR to find corresponding genes in other species. This becomes increasingly important when searching for homologs of very distantly related proteins.
A continual debate in the human genome project is whether to determine the DNA sequence of the junk: DNA that as far as we can tell is noncoding. Sydney Brenner was quick to point out early in the project that this DNA is rightly called junk and not garbage because, like junk, this DNA has been retained, while garbage is discarded. Today, admittedly, we cannot interpret much from noncoding DNA sequences. But this does not mean they are nonfunctional. The fact that they remain in the genome argues for function, at least at the level of evolution. However, there are surely also functions for these sequences at the level of gene regulation, chromosome function, and perhaps properties we know nothing about today. The junk is certainly worth sequencing, but it will be best to do this later in the genome project when the cost of DNA sequencing has diminished. An analogy can be made between the genome project and the exploration of a new continent. At the time the interior of North America was first explored, a major target was river valleys because they were accessible and because they were commercially valuable. No one willingly spent much time in deserts or arctic slopes. However, most of our oil deposits are located far from river valleys, and if we had not pushed exploration of the continent to completion, we would never have found very valuable resources. It is probably this way also with the genome; when we finally make our way through the junk, systematically, there will be some unexpectedly valuable finds. We may not know enough today to realize they were valuable, even if we could find them.
DESIGN OF DNA SEQUENCING PROJECTS
The first DNA sequence was determined in 1970 by Ray Wu at Cornell University. It consisted of the 12-base single-stranded overhang at each end of bacteriophage lambda DNA. The samples needed were readily in hand. Two investigators worked on the project for three years. Data handling and analysis did not present any unexpected or formidable problems. The major chore was developing techniques for actually determining the order of the bases. The method employed, selective addition of subsets of the four dpppN’s, still has many attractive features, and we will revisit it several times in this and the next chapter.
Today, the complete DNA sequencing of 50-kb DNA targets, the size of the entire bacteriophage lambda, is a common task in specialized high-throughput sequencing laboratories. However, such projects are not yet routine in most laboratories that do DNA sequencing. The sequencing of targets 3 to 90 times larger has been accomplished in quite a few cases. Sequencing of continuous Mb blocks of human DNA is now becoming commonplace in quite a few research groups. These projects, even 50-kb projects, pose obstacles that were inconceivable at the dawn of DNA sequencing.
It is useful to divide discussion about DNA sequencing projects into tactics and strategy. Tactics is how the order of the bases on a single DNA sample is read and confirmed. Strategy, as illustrated in Figure 10.1, has a number of components. Presumably the target
is selected in a rational manner, given the amount of effort that is actually required to complete a sequencing project. The upstream strategy is concerned with how the target is reduced to DNA samples suitable for application of the particular tactics selected. The tactics are then used, piece by piece, in as efficient and automated a way as possible. Then the downstream strategy consists in assembling the data into contiguous blocks of DNA sequence, filling any gaps, and correcting the inevitable errors that creep into all DNA sequence data.
Several caveats must be noted when thinking about DNA sequencing projects. Both the ideal tactics and strategy may depend on the types of targets. Effective strategies may combine several types of targets and several types of tactics. The key variable to judge efficiency and cost is the throughput: the number of base pairs of DNA sequence generated per day for each individual working in the laboratory. With current methods, except at the largest and most efficient genome sequencing groups, personnel costs are the completely dominant expense; chemicals, enzymes, and instrument depreciation all pale in comparison with salaries. In a few very automated and experienced centers, reagents and supplies are now the dominant costs.
Figure 10.1 Design of a typical DNA sequencing project.
Three terms are useful in evaluating sequencing progress. Raw DNA sequence is the direct data read from an experimental curve or photograph with local error correction done, for example, a manual override to correct an ambiguous call by sequence reading software. Finished sequence is the assembled DNA sequence for the entire target, with error corrections made by comparing redundant samples. In general, the complete DNA sequence is read separately from both DNA strands. This is a major contributor to finding and correcting some of the most common kinds of errors. Sequencing redundancy is the ratio of the number of raw base pairs of sequence acquired to the number of base pairs of finished sequence determined. It is usually at least 2, because of the need just cited to examine both strands. In general, the redundancy is dependent on the strategy used, and it has often been as high as 10 in many of the relatively large DNA sequencing projects that have been accomplished to date.
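The definitions of redundancy and throughput above can be turned into a small calculation. The following is only a sketch: the function names are hypothetical, and the throughput figure used in the example (10,000 raw bp per person per day) is an illustrative assumption, not a measured value. It also assumes constant throughput and ignores the time spent on assembly and gap filling.

```python
def sequencing_redundancy(raw_bp: float, finished_bp: float) -> float:
    """Ratio of raw base pairs read to finished base pairs determined."""
    if finished_bp <= 0:
        raise ValueError("finished_bp must be positive")
    return raw_bp / finished_bp


def person_days_needed(target_bp: float, redundancy: float,
                       raw_bp_per_person_day: float) -> float:
    """Person-days of raw sequencing needed to finish a target.

    Hypothetical helper: assumes a constant throughput and ignores
    assembly, gap filling, and error correction time.
    """
    if raw_bp_per_person_day <= 0:
        raise ValueError("raw_bp_per_person_day must be positive")
    return target_bp * redundancy / raw_bp_per_person_day


if __name__ == "__main__":
    # A 50-kb target read with 500 kb of raw sequence has redundancy 10.
    print(sequencing_redundancy(500_000, 50_000))
    # Assumed throughput of 10,000 raw bp per person per day (illustrative).
    print(person_days_needed(50_000, 10, 10_000))
```

Under these assumptions, the cost of a project scales directly with the redundancy, which is why the choice of strategy matters as much as the raw reading speed.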
LADDER SEQUENCING TACTICS
Virtually all current de novo DNA sequencing methods are based on the ability to fractionate single-stranded DNA by gel electrophoresis in the presence of a denaturant with single base resolution. Information about the location of particular bases in the sequence is converted into a specific DNA fragment size. Then these fragments are separated and analyzed. The gels used are either polyacrylamide or variants on this matrix like Long Ranger™. The denaturant is usually 7 M urea. Its presence is required to eliminate most of the secondary structure that individual DNA strands can achieve by intramolecular base pairing, where this is allowed by the DNA sequence. It is possible, under ideal cases, to maintain single base resolution up to sizes of 1 kb. Some success has been reported with ever larger sizes by the use of gel-filled capillaries. The use of denaturing gels is an unfortunate aspect of current DNA sequencing. Since urea solutions are not stable to long-term storage, the gels must be cast within a few days of their use, and it is difficult to reuse most gels more than several times without a serious decrease in performance. In the two decades since Wu’s first DNA sequencing, the ladder methods we will describe have produced more than 1,000 Mb of DNA sequence deposited in databases, and perhaps an equal amount or more that has not been published or deposited.
|
Two rather different approaches have been used to generate DNA sizes based on DNA sequence. We will describe how they are carried out starting with a single-stranded DNA template. Slightly more complex procedures are required if the original template is double stranded. The first of these methods, developed by Allan Maxam and Walter Gilbert, is shown in Figure 10.2. The ends of the DNA are distinguished by specifically labeling one of them. Usually this is done directly, and covalently, with a kinase that places a radiolabeled phosphate at the 5′-terminus of the template. There are other ways to label the 5′-end or 3′-end directly, and it is also possible to label either end indirectly, by hybridization with an appropriate complementary sequence. This requires that the end sequence be known; it usually is known, since the DNA template is cloned into a vector of known flanking sequence.
|
In Maxam-Gilbert sequencing, base-specific or base-selective partial chemical cleavage is used to fragment the DNA. This is carried out under conditions where there is an average of only one cut per template molecule with each cleavage scheme employed. Thus a very broad range of fragment sizes is produced that reflects the entire sequence of the template. Four separate chemical fragmentation reactions are carried out; each one favors cleavage after a specific base. The fragments are fractionated, and the sizes of the labeled pieces are measured, usually in four parallel electrophoretic lanes. The DNA sequence can be read directly off the gel as indicated by the example in Figure 10.3. The pattern of bands seen is often called a ladder for reasons obvious from the figure. Note that in the Maxam-Gilbert approach, there are additional fragments produced that are not detected because they are not labeled, but they are present in the sample. For some alternate schemes of detecting DNA fragments for sequencing, like mass spectrometry, these additional pieces are undesirable.
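The logic of reading a Maxam-Gilbert ladder can be illustrated with a toy model. This sketch assumes four idealized, perfectly base-specific cleavage reactions and a 5′ label; the function names are hypothetical. A cut at a given base is modeled as leaving a labeled fragment containing every base before it, so the fragment length identifies the position of the cleaved base. Real reactions are only base-selective, and the shortest fragments are not readable in practice.

```python
def maxam_gilbert_lanes(template: str) -> dict:
    """Return, for each base-specific reaction, the labeled fragment lengths.

    Idealized model: a cut at 0-based position p leaves the 5'-labeled
    fragment template[:p], whose length is p.
    """
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(template.upper()):
        if base not in lanes:
            raise ValueError(f"unexpected base {base!r}")
        lanes[base].append(position)
    return lanes


def read_ladder(lanes: dict) -> str:
    """Read the sequence from the shortest band to the longest band."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = maxam_gilbert_lanes("GATTACA")
    print(lanes)
    print(read_ladder(lanes))
```

Because every position appears in exactly one lane, sorting all bands by size and noting the lane of each band recovers the template sequence.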
Figure 10.2 Maxam-Gilbert sequence technique: Preparation of end-labeled, size-fractionated DNA sample.
Figure 10.3 Typical Maxam-Gilbert sequencing ladder and its interpretation.
The second general approach to DNA fragmentation for ladder sequencing was developed by Frederick Sanger (Fig. 10.4). This is the approach in widespread use today, for a variety of reasons, including the ability to avoid the use of toxic chemicals and the ease of adapting it to four-color fluorescent detection. One starts with a single-stranded template. A primer is annealed to this template, near the 3′-end of the DNA to be sequenced. The primer must be long enough so that it binds only to one unique place on the template. This primer must correspond to known DNA sequence, either in the target or, more commonly, in the flanking vector sequence. A DNA polymerase is used to extend the primer in a sequence-specific manner along the template. However, the sequence extension is halted, in a base-specific manner, by allowing the occasional uptake of chain terminators: dpppN analogs that cannot be further extended by the enzyme. Almost all current DNA sequencing uses dideoxy-pppN’s as terminators. As shown in Figure 10.5, these derivatives lack the 3′ OH needed to form the next phosphodiester bond. Four separate chain extension reactions are carried out, each one with a different terminator. Label can be introduced in several different ways: through the primer, the terminator, or internal dpppN’s. The resulting mixture of DNA fragments is melted off the template and analyzed by gel electrophoresis.
Figure 10.4 Sanger sequencing technique: Preparation of an end-labeled, size-fractionated DNA sample. The actual sequencing ladder will be virtually identical to that seen with the Maxam-Gilbert method.
Figure 10.5 Structure of a dideoxynucleoside triphosphate terminator.
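The chain-termination scheme can be sketched in the same spirit. The following toy model assumes that the primer anneals to the extreme 3′-end of the template, that every position terminates with some probability in the matching reaction, and that all fragments are detected; the function names and the example template are hypothetical. The newly synthesized strand is the reverse complement of the template, and each terminated fragment consists of the primer plus the extension up to the terminator.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def sanger_lanes(template: str, primer_len: int) -> dict:
    """Fragment lengths expected in each of the four terminator reactions.

    The primer is assumed to cover the first primer_len bases of the
    new strand; termination at index i gives a fragment of length i + 1.
    """
    new_strand = reverse_complement(template)
    lanes = {base: [] for base in "ACGT"}
    for i in range(primer_len, len(new_strand)):
        lanes[new_strand[i]].append(i + 1)
    return lanes


def read_ladder(lanes: dict) -> str:
    """Read the extended sequence from the shortest band to the longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    # Hypothetical template; its reverse complement is GGCCGATTACA.
    lanes = sanger_lanes("TGTAATCGGCC", primer_len=4)
    print(lanes)
    print(read_ladder(lanes))
```

Note that the ladder reports the sequence of the newly synthesized strand beyond the primer, which is complementary to the template.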
ISSUES IN LADDER SEQUENCING
The major goal is to maximize sequencing throughput. A second, significant goal is to minimize the number of sequencing errors. An important element of these goals is to be able to read the longest possible sequencing ladders, accurately. There are two significant variables in this. The resolution of the gel electrophoresis will determine how far the sequencing data can be read, if there are data to be read at all. Ultimately there are trade-offs between how fast the gel can be run, which also affects the throughput, how well certain artifacts can be eliminated, and how much sample must be applied. The more sample we have, the easier is the detection but, in general, the lower is the resolution. Large double-stranded DNAs show negligible diffusion during gel electrophoresis as described earlier in Chapter 5 (Yarmola et al., 1996). This is not the case for the smaller single-stranded DNAs used in sequencing where diffusion is a significant cause of band broadening. This motivates the use of higher fields where shorter running times can be achieved, hence minimizing the effects of diffusion. However, higher fields lead to greater joule heating. This increases the effects of thermal inhomogeneities which also lead to band broadening. The issues are complex because field strength also influences the shape of DNA in a gel and thus affects its diffusion coefficient. Other factors that affect band shape and thus resolution are the volume in which the sample is loaded, the volume sampled by the detector, and any inhomogeneities in gel concentration. For a thorough discussion of the effects of these variables, see Luckey et al. (1993), and Luckey and Smith (1993).
To a good approximation, the velocity, $v$, of DNA in denaturing acrylamide gel electrophoresis is proportional to $1/L$, where $L$ is the length of the molecule. In automated fluorescent detection (or the bottom wiper shown later in Fig. 10.10), the sample is examined at a constant distance from the starting point, $D$. The time it takes a fragment of a particular length to reach this distance is proportional to $D/v \propto DL$. Hence the spacing between two bands of length $L$ and $L - 1$ is $DL - D(L - 1) = D$. Thus the band spacing is independent of size, but it can be increased, more or less at will, by using longer and longer running gels.
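The band-spacing argument can be checked numerically. This sketch assumes the idealized relation that velocity is inversely proportional to fragment length, with a hypothetical proportionality constant `k` (time per unit distance per base) set to 1 in arbitrary units; the function names are likewise illustrative.

```python
import math


def arrival_time(length: int, distance: float, k: float = 1.0) -> float:
    """Time for a fragment to reach the detector.

    Assumes v = 1 / (k * L), so t = D / v = k * D * L.
    """
    return k * distance * length


def band_spacing(length: int, distance: float, k: float = 1.0) -> float:
    """Time between the arrival of fragments of length L - 1 and L."""
    return arrival_time(length, distance, k) - arrival_time(length - 1, distance, k)


if __name__ == "__main__":
    for length in (100, 500, 1000):
        print(length, band_spacing(length, distance=30.0))
    # Doubling the detection distance doubles the spacing.
    print(band_spacing(500, distance=60.0))
```

Within this model the spacing is the same for every fragment length and grows linearly with the distance to the detector, which is the point made in the text.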
The second determinant of how far a ladder can be read is the uniformity of the sample fragment yield. It is important to realize that the larger the target is, the smaller the yield of each piece even if the distribution of fragments is absolutely uniform. Thus, with perfect cleavage, sequencing a 100-base piece of DNA will require only 10% of the amount of sample that a 1-kb target requires. Put another way, for constant amounts of DNA sample loaded, the detection sensitivity will have to increase in proportion to the length of the DNA target. The relative yield of particular DNA fragments is affected by the choice of DNA polymerase, the nature of the terminators and primers used, the actual DNA template, and the reaction conditions. Much optimization has been required to produce reproducible runs of DNA sequence data that extend longer than 500 bases.
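The scaling of fragment yield with target length is simple arithmetic, sketched below. It assumes a perfectly uniform distribution in which each of the N possible fragment lengths receives 1/N of the labeled molecules; the function names are hypothetical.

```python
def fraction_per_band(target_len: int) -> float:
    """Fraction of labeled molecules in each band for a uniform ladder."""
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    return 1.0 / target_len


def relative_sample_needed(short_len: int, long_len: int) -> float:
    """Sample needed for the short target relative to the long target,
    for equal signal per band."""
    if short_len <= 0 or long_len <= 0:
        raise ValueError("lengths must be positive")
    return short_len / long_len


if __name__ == "__main__":
    # A 100-base target versus a 1-kb target, as in the text.
    print(fraction_per_band(100), fraction_per_band(1000))
    print(relative_sample_needed(100, 1000))
```

Equivalently, for a fixed amount of sample the required detection sensitivity grows in proportion to the target length.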
It is also important to realize that throughput is really the product of the number of lanes per gel and the speed of the electrophoresis. Speed can be controlled by the electrical field applied. In fact higher fields appear to improve electrophoretic performance. What limits the speed, once efficient cooling is provided to keep the running temperature of the gel constant, is the sensitivity of the detection scheme, if it is done on line. With off-line detection, the sensitivity is still important, not for speed, but for determining the number of lanes that can be used. The smaller the width of each lane, the more lanes one can place on a single gel but the smaller the amount of DNA one can actually load into each lane.
A major factor that affects the quality of DNA sequence data is the quality of the template DNA. When fluorescent labeling is used, great care must be taken not to introduce fluorescent contaminants into the DNA sample. A number of automated methods for DNA preparation routinely yield DNA suitable for sequencing. These methods are convenient because they are so standardized. A laboratory that tries to sequence DNA from many different types of sources will frequently encounter difficulties.
In early DNA sequencing, 32P was the label of choice, introduced from [γ-32P]pppA via kinasing of the primer for Sanger sequencing or the strand to be cleaved for Maxam-Gilbert sequencing. This isotope has a short half-life which results in very high experimental sensitivity. However, 32P also has a relatively high energy beta particle, which causes an artifactual broadening of the thin fragment bands on DNA sequencing gels. Instead of 32P one can use the radioisotope 35S, as γ-thio-pppA. This still has a short half-life, but the decay is softer, leading to sharper bands. At first, DNA sequence data were obtained by using X-ray film in autoradiography to make an image of the sequencing gel. This can be read by hand, which is still done by some, perhaps with the help of devices and software to expedite transferring the data into a computer file. Alternatively, the film can be scanned and digitized by a device like a charge-coupled device (CCD) camera. This then allows most of the data to be processed by image analysis software, with human intervention needed in difficult places. The accuracy of using film and some of the existing software does not appear to be as good as the fluorescent systems we will describe later.
A new approach to recording data from radioactive decay is the use of imaging plates. These consist of individual pixels that record local decays. After the plate is exposed, it is read out by laser excitation in a raster pattern (scanning successive lines, in the same manner as a TV camera or screen), and the resulting data are transferred into a computer file. A great advantage of imaging plates over film is that their response is a linear function of dose over more than five orders of magnitude in intensity, and most important, they are linear down to the lowest detectable doses. In contrast, film shows a dead zone at very low doses, and it easily saturates at high doses. Imaging plates are reusable, and for the heavy user, the great savings in film that result eventually compensate for the high costs of the imaging plates and the instrument needed to read them out. Although it is possible, in principle, to use several different radioisotopes simultaneously, as is common in liquid scintillation counting, and thus achieve multicolor labeling and detection, in practice, this is rarely done with radioactive DNA sequencing data.
In most contemporary DNA sequencing, radioisotopes have been replaced by fluorescent labels. These can be used on the primers, the terminators, or internally. It may seem surprising that fluorescent detection can be competitive with radioisotopes. However, one can gain enormous amounts of sensitivity in fluorescence by sequential excitation and emission from the same fluorophore until it undergoes some chemical side reaction and becomes bleached. This makes up in large part for the difference in energy between a
beta particle and the fluorescent photon. The major determinant of sensitivity in fluorescence detection is, then, not really signal; it is background. Scrupulous care must be taken to avoid the use of reagents, solvents, plastics, glove powder, and detergents that have fluorescent contaminants.
Four different colored fluorescent dyes are used in several of the most common DNA sequencing detection schemes. One dye is used for each base-specific primer extension. The ideal set of dyes would have very similar chemical structures so that their presence would affect the electrophoretic mobility of labeled DNA fragments in identical ways.
They would also have emission spectra as distinct as possible, and they would all be excitable by the same wavelength so that a single excitation source would suffice for all four dyes. The dyes would also allow similar very high sensitivity detection so that signal intensities from the four different cleavage reactions would be comparable. Inevitably with currently available dyes there are compromises. For example, a set of nearly identical dye-labeled chain terminators was produced for DNA sequencing that led to very good electrophoretic properties, but the emission spectra of these compounds were too similar
for the kind of accuracy needed in reading long sequence ladders. Subsequently, a better-resolved set of fluorescent terminators that are substrates for Sequenase, the most popular enzyme used in Sanger sequencing, became commercially available. These have
the advantage that all four terminators can be used simultaneously in a single sequencing reaction.
All currently used dyes for four-color DNA sequencing are excited in the UV/visible wavelength range. The limits of this range and the typical widths of emission spectra of high quantum yield dyes make it rather difficult to detect more than four colors simultaneously. The infrared (IR) spectrum is much broader, and work is in progress trying to develop DNA sequencing dyes in this range. If the lower sensitivity of IR detection can be tolerated, such dyes would offer two advantages. The laser sources needed to excite them
are inexpensive, and at least eight different colors would be obtainable. This could be used to double the throughput of four-color sequencing, or it could be used to include a known standard in every sequencing lane to improve the accuracy of automatic sequence calling. Recently IR-excited dyes have begun to make an impact on automated DNA sequencing. Multiple IR colors are presumably soon on the horizon.
A significant improvement in fluorescent dyes for automated sequencing is the use of energy transfer methods (Glazer and Mathies, 1997). Primers contain a pair of fluorescent dyes (Fig. 10.6). One dye is common to all four primers. This is optimized to absorb the exciting laser light. The second dye is different in each primer, and it is close enough in each case that fluorescence resonance energy transfer is 100% efficient. Thus all the excitation energy migrates to the second dye where it is subsequently emitted. The second dyes are chosen so that they have as different emission spectra as possible to maximize the ability to accurately discriminate the four different colors.
Figure 10.6 Energy transfer primers (provided by Richard Mathies). (a) Schematic design of a set of four primers. (b) Structure of the donor dye. (c) Structure of four different acceptor fluorescent dyes that can be detected simultaneously in DNA sequencing.
An alternative to fluorescent labels is chemiluminescence. This has the great advantage that no exciting light is needed. Thus the sensitivity can be extremely high, since there is no contamination from scattering of the exciting light used in fluorescence, or the effects of fluorescent impurities. Today, chemiluminescent detection schemes exist that can readily be used in DNA sequencing. They have a few disadvantages. Only one color is currently available, and once the chemiluminescence has been read, it is difficult to use the gel or filter again. While this is not often a problem in most forms of DNA sequencing, it is a problem in most mapping applications where the same filter replica of a gel is frequently probed many times in succession. Nevertheless, the sensitivity of chemiluminescence makes it attractive for some mapping applications. The advantages of four-color fluorescence are also beginning to be felt in some aspects of genome mapping. An example was given in Chapter 8.
CURRENT FLUORESCENT DNA SEQUENCING
There are two basically distinct implementations of fluorescent detected DNA sequence determination. These are the current commonly available state-of-the-art tools used today in most large-scale DNA sequencing projects. They each can produce more than 10^4 to 10^5 bp of raw DNA sequence per laboratory worker per day. Most allow 400 to 800 bases of data to be read per lane; most of the lanes give readable data when proper DNA preparation methods are used. The detection schemes used in the two approaches are illustrated in Figure 10.7. Both are on-line gel readers. These two schemes have a number of serious trade-offs. In the Applied Biosystems (ABI) instrument, based on original developments by Leroy Hood and Lloyd Smith, four different colored dyes are used to analyze a mixture of four different samples in a single gel lane (Fig. 10.7a). This allows four times more samples to be loaded per gel, if the width of the lanes is kept constant. The use of four colors in a single lane avoids the problem of compensating for any differences in the mobility of fragments in adjacent lanes; that is, there is no lane registration problem. In order to do the four-color analysis, a laser perpendicular to the gel is used to excite one lane at a time, and the signal is detected through a rotating four-color wheel to separate the emission from the four different dyes. Thus the effective power of the laser is time-shared among the lanes and the colors. With 20 lanes, the actual time-averaged illumination available is, at most, 1/80 the laser intensity.
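The 1/80 figure follows from sharing the laser among lanes and color filters. A minimal sketch of that arithmetic is given below; it assumes equal dwell time on every lane and every filter position and ignores dead time during scanning, and the function name is hypothetical.

```python
import math


def time_averaged_illumination(n_lanes: int, n_colors: int) -> float:
    """Upper bound on the fraction of laser intensity seen per lane and color.

    Assumes the laser excites one lane at a time and the detector looks
    through one color filter at a time, with equal dwell times.
    """
    if n_lanes <= 0 or n_colors <= 0:
        raise ValueError("n_lanes and n_colors must be positive")
    return 1.0 / (n_lanes * n_colors)


if __name__ == "__main__":
    fraction = time_averaged_illumination(n_lanes=20, n_colors=4)
    print(fraction, math.isclose(fraction, 1 / 80))
```

Adding lanes to such a scanned design therefore lowers the signal available per lane in direct proportion.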
|
In the alternative implementation, embodied in the Pharmacia automated laser fluorescence (ALF) instrument, only a single fluorescent dye is used (Fig. 10.7b). The dye originally selected was fluorescein because it is the most sensitive available for the particular laser exciting wavelength used. In a newer version of the instrument, a different laser and an infrared emitting dye, Cy5, are used. The key feature of the ALF is that the laser excitation is in the plane of the gel, through all the sample lanes simultaneously. This design, which is based on an instrument originally developed by Wilhelm Ansorge, is possible because at the concentrations of label used for DNA sequencing the samples are optically thin. This means that the amount of light absorbed at each lane is an insignificant fraction of the original laser intensity, so all lanes receive, effectively, equal excitation. The emission from all the lanes is recorded simultaneously by an array of detectors, one for each lane. While these could be made four-color detectors, in principle, the cost and complexity are not warranted. Instead the ALF reads data from four closely spaced lanes, one for each base-specific fragmentation. Thus the number of lanes needed for one sample in the