CURRENT FLUORESCENT DNA SEQUENCING
Figure 10.7 Schematic operation of current commercially available automated DNA sequencing gel readers. (a) ABI four-color instrument. (b) Pharmacia one-color instrument.
ALF is four times the number used in the ABI instrument. The use of a single color simplifies the construction of labeled primers, since only a single primer is needed for the four sequencing reactions, whereas with the ABI approach, if fluorescent primers are used, four different ones must be made, one for each color. The basic advantage of the ALF is the higher signal to noise for a given laser strength, since the full intensity of the exciting beam can be used to illuminate each sample continuously. This higher signal to noise allows faster running and, in principle, smaller lanes.
At present the advantages of the two different fluorescent approaches really depend on the application for which they are intended. If high sample throughput on relatively small DNA fragments is most important, the ABI has an edge because of the larger number of samples that can be run per gel. If longer fragments are important, or if sample amounts are limited, or if the raw data must be scrutinized as in mutation detection (see Chapter 13), the ALF has the edge because of its greater sensitivity.
The remaining status of most state-of-the-art DNA sequencing is easy to summarize. The gels are still made manually. Attempts to manufacture and distribute precast gels have been a dismal failure. Samples can be loaded manually; however, semiautomatic methods, such as multiple-headed microsyringes, are widely available. Fully automated gel-loading robots will be commonplace soon. The sequencing chemistry can be done in
336 DNA SEQUENCING: CURRENT TACTICS
a fully automated form for almost all choices of templates and tactics (described later in this chapter). Several different robotic systems that perform the sequencing reactions in microtiter plate wells are now available (see Chapter 9). Three basic choices are available for the template. The sample to be sequenced can be cloned into the single-stranded DNA bacteriophage M13, and then a single M13 sequencing primer can be used for all templates. Alternatively, PCR can be used to prepare the template by a process called cycle sequencing, in which one strand is differentially labeled or synthesized during cycles of linear amplification and terminators are introduced to allow the sequence to be read. The third general approach is to sequence directly from genomic DNA. Here the primer is used directly on double-stranded plasmid, bacteriophage, cosmid, or even bacterial DNA after that sample has been melted.
VARIATIONS IN CONTEMPORARY DNA SEQUENCING TACTICS
A tremendous amount of energy and cleverness has gone into attempts to improve and optimize current DNA sequencing technology. Here we cover some recent developments near the cutting edge of conventional DNA sequencing. The first improves throughput by increasing the running speed. Major improvements in DNA sequencing rates are achievable by the use of thin gel samples. These can be either gel-filled capillaries or thin gel slabs. The advantage of a thin gel is that heat dissipation is more effective. The sample has a more even temperature distribution, and, probably more important, higher field strengths can be used that can increase the running speed by up to an order of magnitude. This increases sample throughput, and it diminishes any residual effects of diffusion. However, greater detection sensitivity is needed to process the fluorescence as the samples whip by the detector. Special gel materials are also available that allow faster running. For example, the Long Ranger™ gel speeds up the electrophoresis; it also appears to have better resolution than standard polyacrylamide when longer than conventional running gels are used. Perhaps most significant, these gels can be reloaded and rerun several times before their performance starts to deteriorate.
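The heat dissipation argument for thin gels can be made concrete with a rough calculation. The sketch below is a minimal illustration only: it assumes a gel slab with uniform Joule heating whose two faces are held at the same temperature, and it ignores the glass plates and any cooling apparatus. The function names and the default conductivity values are assumptions chosen for illustration, not measured properties of any particular gel.

```python
def center_temperature_rise(field_v_per_m, thickness_m,
                            elec_conductivity_s_per_m=0.1,
                            thermal_conductivity_w_per_m_k=0.6):
    """Steady-state temperature rise (K) at the mid-plane of a slab,
    relative to its two cooled faces, for uniform Joule heating.

    Heat generated per unit volume: q = sigma * E**2   (W/m^3)
    Mid-plane rise for thickness d:  dT = q * d**2 / (8 * k)
    """
    q = elec_conductivity_s_per_m * field_v_per_m ** 2
    return q * thickness_m ** 2 / (8.0 * thermal_conductivity_w_per_m_k)


def max_field_for_rise(max_rise_k, thickness_m,
                       elec_conductivity_s_per_m=0.1,
                       thermal_conductivity_w_per_m_k=0.6):
    """Largest field (V/m) that keeps the mid-plane rise at max_rise_k."""
    return (8.0 * thermal_conductivity_w_per_m_k * max_rise_k
            / (elec_conductivity_s_per_m * thickness_m ** 2)) ** 0.5


# Compare a conventional 0.4 mm slab with a 0.1 mm slab (illustrative sizes).
thick = max_field_for_rise(1.0, 0.0004)
thin = max_field_for_rise(1.0, 0.0001)
print(f"field ratio thin/thick: {thin / thick:.1f}")
```

In this simplified model the tolerable field strength scales inversely with gel thickness, which is the qualitative reason a thinner gel can be run faster for the same internal temperature gradient.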
The major improvement in recent DNA sequencing chemistry has been the use of engineered polymerases like the modified form of bacteriophage T7 DNA polymerase, called Sequenase (version 2). This genetically modified form of the enzyme has improved processivity, which means it makes more even sequencing ladders. The behavior of the enzyme is further enhanced by using Mn²⁺ ions instead of Mg²⁺ (Fig. 10.8). With Sequenase, the limiting factor in long DNA sequencing reads is at the level of the electrophoresis and not the sequencing chemistry. Different genetically engineered polymerases have been optimized for cycle sequencing. These include Amplitaq FS and Thermosequenase.
In most DNA sequencing the position of a band is used as the sole source of information. The intensity of the band is ignored. With the very even sequencing ladders provided by the use of Sequenase, and the high signal-to-noise ratio of the ALF-type systems, it is possible to use intensity as a base-specific label. A typical result is shown in Figure 10.9. Here different amounts of fluorescent and nonlabeled primers were used for the four different base-specific terminations. All samples were combined into a single lane. The result is equivalent to four-color sequencing in terms of throughput and eliminating the need to register adjacent lanes. However, it is not clear how resistant the intensity-labeling process is to various potential errors. Thus this method has not seen widespread use.
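The idea of using intensity as a base-specific label can be illustrated with a toy base caller. This is a sketch under stated assumptions, not the procedure used to produce Figure 10.9: the relative intensity levels (1, 0.5, 0.25, 0.125) and the peak heights are hypothetical, and the normalization assumes that at least one band of the brightest class is present in the lane.

```python
def call_bases_by_intensity(peak_heights, level_for_base):
    """Assign each peak the base whose expected relative intensity is nearest.

    peak_heights: band intensities in order of increasing fragment length.
    level_for_base: dict mapping base -> expected relative intensity
                    (hypothetical values, set in practice by the ratio of
                    labeled to unlabeled primer in each termination reaction).
    """
    # Normalize to the tallest peak so that levels are relative intensities.
    tallest = max(peak_heights)
    calls = []
    for height in peak_heights:
        rel = height / tallest
        base = min(level_for_base, key=lambda b: abs(level_for_base[b] - rel))
        calls.append(base)
    return "".join(calls)


# Hypothetical labeling scheme and peak heights, for illustration only.
levels = {"A": 1.0, "C": 0.5, "G": 0.25, "T": 0.125}
peaks = [980, 510, 130, 240, 1000, 495]
print(call_bases_by_intensity(peaks, levels))  # ACTGAC
```

A caller of this kind has no redundancy: any unevenness in the ladder shifts a peak toward the neighboring intensity class, which is one way to see why the robustness of intensity labeling is a concern.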
Figure 10.8 Example of how engineered polymerases (bottom) provide more even DNA sequencing ladders than natural polymerases (top). Provided by Wilhelm Ansorge.
Figure 10.9 Example of the use of bases labeled with different amounts of a single dye to get information about all four bases in a single lane with only one color. (a) Normal sequencing (four separate lanes). (b) Two bases at a time (two lanes). (c) All four bases in one lane. Taken from Ansorge et al. (1990).
Three other DNA sequencing tricks are worth describing briefly. One solution to the problems of fast on-line analysis is off-line analysis. A very clever way to do this is the bottom wiper. As shown in Figure 10.10, this consists of a short sequencing gel atop a moving membrane. As the electrophoresis proceeds, samples are automatically eluted and transferred to the membrane. The resulting blot can then be analyzed off line in any way one wishes. The unique aspect of this approach is that the spacing between adjacent bands is not only a function of the electrophoresis; it can also be manipulated by how fast the membrane moves. Thus samples that are too close to each other to be well resolved by a given detector, even though the bands are resolved in the electrophoresis, can be separated and analyzed individually.
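The way membrane speed controls band spacing in the bottom wiper can be sketched in a few lines. This is a minimal illustration, assuming each band is deposited at the instant it elutes and that the membrane moves at constant speed; the function names and the elution times are hypothetical.

```python
def membrane_positions(elution_times_s, membrane_speed_mm_per_s):
    """Position (mm) at which each band lands on the membrane,
    measured from where the first band is deposited."""
    t0 = elution_times_s[0]
    return [(t - t0) * membrane_speed_mm_per_s for t in elution_times_s]


def speed_for_min_spacing(elution_times_s, min_spacing_mm):
    """Slowest membrane speed that keeps every pair of neighboring
    bands at least min_spacing_mm apart on the membrane."""
    smallest_gap_s = min(later - earlier for earlier, later
                         in zip(elution_times_s, elution_times_s[1:]))
    return min_spacing_mm / smallest_gap_s


# Hypothetical elution times (seconds) for four successive bands.
times = [100.0, 112.0, 120.0, 135.0]
speed = speed_for_min_spacing(times, min_spacing_mm=1.0)
print(speed, membrane_positions(times, speed))
```

Because the spacing on the membrane is the elution time difference multiplied by the membrane speed, bands that elute only seconds apart can be spread to any distance the detector needs, at the cost of a longer membrane.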
The second method, developed by Barbara Shaw, is a novel variation on Sanger sequencing chemistry. Here the use of dideoxynucleoside triphosphate terminators is avoided. Instead, trace amounts of boron derivatives of the normal deoxynucleoside triphosphates are added to the cocktail of substrates used for DNA polymerase. Compounds like 5′-α-[P-borano]-triphosphates are well tolerated by most DNA polymerases. These compounds have a BH₃ group replacing an oxygen on the alpha phosphate (nearest the sugar). Thus they lead to incorporation of boron-substituted phosphates into the DNA chain. The polymerization process is efficient enough that the derivatives can be introduced as part of PCR amplification. These derivatives are resistant to exonuclease III cleavage. Thus, when the PCR product is digested with exonuclease III, a ladder of fragments is produced that terminates at the first location at which a boron derivative is encountered.
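How random incorporation of a boron derivative followed by exonuclease digestion yields a base-specific ladder can be shown with a small simulation. This is a sketch under simplifying assumptions: each occurrence of the chosen base carries the modification independently with a fixed probability, digestion proceeds strictly from the 3′ end, and it stops exactly at the first modified nucleotide it meets. The function names, the incorporation probability, and the example sequence are all hypothetical.

```python
import random


def exo_resistant_length(strand, modified_base, p_modified, rng):
    """Length of one strand after 3'->5' exonuclease digestion.

    strand: newly synthesized strand, written 5'->3'.
    Each occurrence of modified_base independently carries the boron
    derivative with probability p_modified (an illustrative parameter).
    Digestion removes nucleotides from the 3' end until it reaches the
    first modified position; returns 0 if the strand has none.
    """
    for i in range(len(strand) - 1, -1, -1):
        if strand[i] == modified_base and rng.random() < p_modified:
            return i + 1
    return 0


def ladder(strand, modified_base, p_modified=0.05, molecules=10000, seed=0):
    """Histogram {fragment_length: count} over many simulated molecules."""
    rng = random.Random(seed)
    lengths = {}
    for _ in range(molecules):
        n = exo_resistant_length(strand, modified_base, p_modified, rng)
        if n:
            lengths[n] = lengths.get(n, 0) + 1
    return dict(sorted(lengths.items()))


# Every fragment length in the ladder marks a position of the chosen base.
print(ladder("GATTACAGATTACA", "A"))
```

Running the simulation once per base, each time with a different boron-derivatized nucleotide, gives four ladders that play the same role as the four dideoxy termination reactions.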
|
|
|
|
A final trick that appears to be extremely promising is internal labeling. Here a primer is used adjacent to some 3′ known flanking sequence. Label is introduced into the primer by selective extension in the absence of one of the four ordinary dpppN's (essentially the original Ray Wu sequencing strategy) selected from the known sequence. Either fluorescein-labeled or IR-dye-labeled dU or dA can be used (Fig. 10.11). The latter appears to be superior. There are several advantages in this approach. The fluorophores introduced are internal, which protects them from any exonuclease degradation; there is no background from a great excess of labeled primer, and there is no need to synthesize fluorescent primers. An example of the success in using internal labeling is shown in Figure 10.12. Admittedly this is an extraordinarily good sequencing result. However, recent experience at the EMBL, where this technique was developed, indicates that even in a teaching setting, the use of internal labeling routinely provided highly accurate raw sequencing
Figure 10.10 Schematic illustration of the bottom wiper used to transfer DNA sequencing ladders to a membrane.
Figure 10.11 Fluorescent dyes used for internal labeling. (a) Dye structures. (b) Typical procedure for internal labeling.
Figure 10.12 Results of a very good but not exceptional DNA sequencing ladder obtained with internal labeling. Taken from Grothues et al. (1993).
data with typical reads of 400 to 500 bases. Since the required fluorescent materials are readily available, there is no reason why they should not be incorporated into all one-color fluorescent DNA sequencing methods.
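The selective extension step used for internal labeling can be sketched as follows. This is a minimal illustration, assuming the sequence to be synthesized next to the primer is known and that extension halts cleanly at the first position requiring the withheld nucleotide; the function name and the example sequence are hypothetical.

```python
def selective_extension(to_synthesize, omitted, labeled):
    """Extend a primer along known flanking sequence with one dNTP left out.

    to_synthesize: bases the polymerase would add after the primer, 5'->3'
                   (the complement of the known template region).
    omitted: the base whose triphosphate is withheld; extension stops just
             before the first position that needs it.
    labeled: the base supplied as a dye-labeled analog
             (for example 'A', or 'T' standing in for labeled dU).
    Returns (extension, number_of_labels_incorporated).
    """
    if omitted == labeled:
        raise ValueError("the labeled nucleotide cannot be the omitted one")
    stop = to_synthesize.find(omitted)
    extension = to_synthesize if stop == -1 else to_synthesize[:stop]
    return extension, extension.count(labeled)


# Withholding C stops extension after five bases, three of them labeled A.
print(selective_extension("AATGACCGT", omitted="C", labeled="A"))
```

Choosing which nucleotide to withhold from the known sequence therefore sets both how far the primer is extended and how many fluorophores it carries before the full sequencing reaction begins.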
ERRORS IN DNA SEQUENCING
A major factor in automated DNA sequencing is continued improvements in the software used to analyze the data. From manual sequence calling, the state of the art has progressed to automated sequence reading, regardless of whether radioactive, chemiluminescent, or fluorescent sequencing data are recorded. Most software allows manual editing and overriding. Ideal software indicates where bases are known with great accuracy and where there are ambiguities. One recent report cited a 1% error rate for automatic calling in 350-base fluorescent sequencing runs, with significant deterioration beyond this point to 17% error in 500-base runs (Koop et al., 1993). A more recent study (Table 10.1) is more optimistic and demonstrates that manual editing can improve, sometimes substantially, on the accuracy of automated calling software (Naeve et al., 1995). The best current automatic software is claimed to read 500-base runs with less than 1% error. It is very important to realize that most of the issues that have been addressed thus far at the software level are how to deal with cases where adjacent fragments are only partially resolved. The resolution of successive bands in DNA sequencing gel electrophoresis gradually deteriorates as the size of the fragments increases. Larger fragments spend more time in the gel and have correspondingly more time to disperse. They also are increasingly subject to electrical orientation (Chapter 5), which leads to a loss in size-dependent electrophoretic mobility. For these reasons one usually tries to sequence both strands of a target, and this provides the most accurate sequence possible at both ends of the target.
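Accuracy figures of the kind quoted above are obtained by comparing called sequence against a known reference in successive windows of distance from the primer. The sketch below shows one simple way to compute such a profile. It assumes the called and reference sequences are already aligned position by position, with no insertions or deletions; real comparisons need an alignment step first. The function name and the toy sequences are hypothetical.

```python
def accuracy_by_window(called, reference, window=100):
    """Percent of correct base calls in successive windows from the primer.

    Returns a list of (first_position, last_position, percent_correct),
    with positions counted from 1 at the base nearest the primer.
    """
    if len(called) != len(reference):
        raise ValueError("sequences must be the same length")
    results = []
    for start in range(0, len(reference), window):
        ref_win = reference[start:start + window]
        call_win = called[start:start + window]
        correct = sum(c == r for c, r in zip(call_win, ref_win))
        results.append((start + 1, start + len(ref_win),
                        100.0 * correct / len(ref_win)))
    return results


# Toy example: a 20-base read with two miscalls near its far end.
reference = "ACGTACGTACGTACGTACGT"
called = list(reference)
called[12] = "N"
called[17] = "N"
print(accuracy_by_window("".join(called), reference, window=10))
```

Reporting accuracy per window rather than per read makes the deterioration with fragment size visible, which a single overall error rate would hide.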
Many of the errors and ambiguities that occur in DNA sequencing data are systematic. This encourages the use of clever computer algorithms or artificial intelligence approaches to refine automated sequence calling even further. Since the average separation between adjacent fragment lengths varies very gradually in DNA electrophoresis, from the width of a band, one usually knows how many bases it represents even if these are not well resolved, as in the case of a run of the same base at very large fragment sizes. However, when the results are examined more closely, it is apparent that the band spacing is not perfect; it varies slightly depending on the identity of the last base in the chain, and perhaps the one before (Fig. 10.13a). Band intensities are also a function of the local sequence, and they are markedly affected by the particular DNA polymerase used. Secondary structure in the single-stranded DNA is not totally eliminated by the denaturing conditions used (Fig. 10.13b). This can be partially compensated for by using base analogs like 7-deazaG instead of G, since these form weaker secondary structures. When DNA strands form hairpins under the conditions of sequencing gel electrophoresis, they migrate faster than expected. The result is called a compression; part of the sequence may be missed or may just be impossible to read (Fig. 10.14). Frequently compressions occur on one strand but not on the complementary strand; this is one of the major reasons to sequence both strands. At first, the strand dependence of compressions may seem puzzling. After all, whatever intrastrand base pairs can be formed by one strand can also be formed by its complement. However, two additional complications arise. The first is that any strand with a high local density of G residues can form unusual helical structures with G–G pairing. These are very stable; fortunately the complementary strand will be C-rich
TABLE 10.1 Accuracy of Automated DNA Sequencing as a Function of the Distance from the Primer

Columns give the distance from the primer in bases.

Maximum Correct (%)
Method                     1–100  101–200  201–300  301–400  401–500  501–600
Dye-primer, unedited          98      100      100      100       99       96
Dye-primer, edited            96      100      100      100       99       65
Dye-terminator, unedited     100      100      100      100      100       75
Dye-terminator, edited       100      100      100      100      100       84

Median Correct (%)
Method                     1–100  101–200  201–300  301–400  401–500  501–600
Dye-primer, unedited          93      100      100       99       92       44
Dye-primer, edited            93      100      100       99       98       61
Dye-terminator, unedited      97      100       99       98       83       27
Dye-terminator, edited        97      100      100      100       84       15

Minimum Correct (%)
Method                     1–100  101–200  201–300  301–400  401–500  501–600
Dye-primer, unedited          38       99       96       71       14        9
Dye-primer, edited            38      100      100       98       69        0
Dye-terminator, unedited      51       92       88       36        0        0
Dye-terminator, edited        21       94       95       39        0        0

Source: Adapted from Naeve et al. (1995).
Figure 10.13 Effect of local DNA sequences on the band spacing and intensities seen in sequencing data. Taken from Tibetts et al. (1996). (a) Relative band spacings are intrinsic to the sequence. (b) Relative band intensities depend on the polymerase.
Figure 10.14 Example of compression artifacts in DNA sequencing caused by stable secondary structures in the single-stranded sample. Shown with the sequence data is the automatic software call; the true sequence is shown above. Taken from Yamakawa et al. (1996).
instead of G-rich. The use of 7-deazaG presumably eliminates such structures. The second, and more serious, complication is the sequence of the loop of the hairpin. It turns out that certain loop sequences promote particularly stable hairpin formation. An example is GCGAAAGC, which forms a hairpin with only two base pairs, but its melting temperature is 76°C in 0.15 M salt (Hirao et al., 1990). A recent study surveyed a large number of natural and synthetic sequences that led to compressions (Yamakawa et al., 1996).
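Candidate hairpins of the kind that cause compressions can be located by a simple scan for short inverted repeats. The sketch below is illustrative only: it counts Watson-Crick pairs around a loop of a given size and says nothing about stability, which, as noted above, depends strongly on the loop sequence itself. The function name and the default stem and loop parameters are assumptions chosen so that the GCGAAAGC example is detected.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def find_hairpins(seq, min_stem=2, loop_sizes=(3, 4)):
    """Return (start, stem_length, loop) for candidate hairpins in seq.

    A candidate is a run of Watson-Crick pairs closing a loop whose
    length is in loop_sizes. For each possible loop, the stem is
    extended outward for as long as the flanking bases keep pairing.
    start is the 0-based index of the 5' end of the stem.
    """
    hits = []
    n = len(seq)
    for loop_len in loop_sizes:
        for loop_start in range(1, n - loop_len):
            left = loop_start - 1
            right = loop_start + loop_len
            stem = 0
            while (left >= 0 and right < n
                   and COMPLEMENT.get(seq[left]) == seq[right]):
                stem += 1
                left -= 1
                right += 1
            if stem >= min_stem:
                loop = seq[loop_start:loop_start + loop_len]
                hits.append((left + 1, stem, loop))
    return hits


# The example from the text: a two-base-pair stem closing a GAAA loop.
print(find_hairpins("GCGAAAGC"))  # [(0, 2, 'GAAA')]
```

A scan like this can flag regions worth sequencing on the opposite strand, but it would need an empirical table of stable loop sequences before it could predict which candidates actually produce compressions.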
Remarkably, all but 2% of these were formed by only two types of sequence motifs. About a third, which showed up on both strands, were hairpins with a GC-rich stem ( 3 bp) connected by a 3–4 base loop. The remaining two-thirds occurred on only 1
