
CURRENT FLUORESCENT DNA SEQUENCING


Figure 10.7 Schematic operation of current commercially available automated DNA sequencing gel readers. (a) ABI four-color instrument. (b) Pharmacia one-color instrument.

ALF is four times the number used in the ABI instrument. The use of a single color simplifies the construction of labeled primers, since only a single primer is needed for the four sequencing reactions, whereas with the ABI approach, if fluorescent primers are used, four different ones must be made, one for each color. The basic advantage of the ALF is the higher signal-to-noise ratio for a given laser strength, since the full intensity of the exciting beam can be used to illuminate each sample continuously. This higher signal-to-noise ratio allows faster running and, in principle, smaller lanes.

At present the relative advantages of the two fluorescent approaches depend on the application for which they are intended. If high sample throughput on relatively small DNA fragments is most important, the ABI has an edge because of the larger number of samples that can be run per gel. If longer fragments are important, or if sample amounts are limited, or if the raw data must be scrutinized as in mutation detection (see Chapter 13), the ALF has the edge because of its greater sensitivity.

The remaining status of most state-of-the-art DNA sequencing is easy to summarize. The gels are still made manually. Attempts to manufacture and distribute precast gels have been a dismal failure. Samples can be loaded manually; however, semiautomatic methods, like multiple-headed microsyringes, are widely available. Fully automated gel-loading robots will be commonplace soon. The sequencing chemistry can be done in a fully automated form for almost all choices of templates and tactics (described later in this chapter). Several different robotic systems that perform the sequencing reactions in microtiter plate wells are now available (see Chapter 9). Three basic choices are available for the template. The sample to be sequenced can be cloned into the single-stranded DNA bacteriophage M13, and then a single M13 sequencing primer can be used for all templates. Alternatively, PCR can be used to prepare the template by a process called cycle sequencing, in which one strand is differentially labeled or synthesized during cycles of linear amplification and terminators are introduced to allow the sequence to be read. The third general approach is to sequence directly from genomic DNA. Here the primer is used directly on double-stranded plasmid, bacteriophage, cosmid, or even bacterial DNA after that sample has been melted.

VARIATIONS IN CONTEMPORARY DNA SEQUENCING TACTICS

A tremendous amount of energy and cleverness has gone into attempts to improve and optimize current DNA sequencing technology. Here we cover some recent developments near the cutting edge of conventional DNA sequencing. The first of these improves throughput by increasing the running speed. Major improvements in DNA sequencing rates are achievable by the use of thin gel samples. These can be either gel-filled capillaries or thin gel slabs. The advantage of a thin gel is that heat dissipation is more effective. The sample has a more even temperature distribution, and probably more important, higher field strengths can be used that can increase the running speed by up to an order of magnitude. This increases sample throughput, and it diminishes any residual effects of diffusion. However, greater detection sensitivity is needed to process the fluorescence as the samples whip by the detector. Special gel materials are also available that allow faster running. For example, the Long Ranger™ gel speeds up the electrophoresis; it also appears to have better resolution than standard polyacrylamide when longer than conventional running gels are used. Perhaps most significant, these gels can be reloaded and rerun several times before their performance starts to deteriorate.

The major improvement in recent DNA sequencing chemistry has been the use of engineered polymerases like the modified form of bacteriophage T7 DNA polymerase, called Sequenase (version 2). This genetically modified form of the enzyme has improved processivity, which means it makes more even sequencing ladders. The behavior of the enzyme is further enhanced by using Mn²⁺ ions instead of Mg²⁺ (Fig. 10.8). With Sequenase, the limiting factor in long DNA sequencing reads is at the level of the electrophoresis and not the sequencing chemistry. Different genetically engineered polymerases have been optimized for cycle sequencing. These include Amplitaq FS and Thermosequenase.

In most DNA sequencing the position of a band is used as the sole source of information. The intensity of the band is ignored. With the very even sequencing ladders provided by the use of Sequenase, and the high signal-to-noise ratio of the ALF-type systems, it is possible to use intensity as a base-specific label. A typical result is shown in Figure 10.9. Here different amounts of fluorescent and nonlabeled primers were used for the four different base-specific terminations. All samples were combined into a single lane. The result is equivalent to four-color sequencing in terms of throughput and of eliminating the need to register adjacent lanes. However, it is not clear how resistant the intensity-labeling process is to various potential errors. Thus this method has not seen widespread use.
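The logic of intensity-coded base calling can be illustrated with a minimal Python sketch. The nominal intensity levels, the tolerance, and the function names below are assumptions made for illustration; the text does not give the actual ratios of labeled to unlabeled primer. Each band is assigned to the base whose nominal level is nearest on a logarithmic scale, and a band that falls too far from every level is reported as ambiguous, which is the kind of error the method would be sensitive to.

```python
import math

# Hypothetical nominal band intensities (arbitrary units) for the four
# base-specific reactions. These ratios are assumed for illustration only.
NOMINAL_LEVELS = {"A": 8.0, "C": 4.0, "G": 2.0, "T": 1.0}


def call_base(intensity, levels=NOMINAL_LEVELS, tolerance=0.35):
    """Call one base from a single band intensity.

    The band is assigned to the base whose nominal level is closest on a
    log2 scale. If even the closest level is more than `tolerance` log2
    units away, the call is 'N' (ambiguous).
    """
    if intensity <= 0:
        return "N"
    log_i = math.log2(intensity)
    base, level = min(levels.items(),
                      key=lambda kv: abs(math.log2(kv[1]) - log_i))
    if abs(math.log2(level) - log_i) > tolerance:
        return "N"
    return base


def call_sequence(band_intensities, **kwargs):
    """Call a sequence from band intensities listed in order of fragment size."""
    return "".join(call_base(i, **kwargs) for i in band_intensities)


# Four well-separated bands, then one band midway between the G and C levels.
print(call_sequence([7.5, 1.2, 3.6, 2.1]))  # ATCG
print(call_base(2.9))                       # N
```

A single-lane scheme of this kind depends entirely on even ladders: any sequence-dependent variation in band intensity comparable to the spacing between levels would produce miscalls or ambiguous calls.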

 


Figure 10.8 Example of how engineered polymerases (bottom) provide more even DNA sequencing ladders than natural polymerases (top). Provided by Wilhelm Ansorge.

Figure 10.9 Example of the use of bases labeled with different amounts of a single dye to get information about all four bases in a single lane with only one color. (a) Normal sequencing (four separate lanes). (b) Two bases at a time (two lanes). (c) All four bases in one lane. Taken from Ansorge et al. (1990).


Three other DNA sequencing tricks are worth describing briefly. One solution to the problems of fast on-line analysis is off-line analysis. A very clever way to do this is the bottom wiper. As shown in Figure 10.10, this consists of a short sequencing gel atop a moving membrane. As the electrophoresis proceeds, samples are automatically eluted and transferred to the membrane. The resulting blot can then be analyzed off line in any way one wishes. The unique aspect of this approach is that the spacing between adjacent bands is not only a function of the electrophoresis; it can also be manipulated by how fast the membrane moves. Thus samples that are too close to each other to be well resolved by a given detector, even though the bands are resolved in the electrophoresis, can be separated and analyzed individually.
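The dependence of band spacing on membrane speed can be made concrete with a short Python sketch. It assumes an idealized bottom wiper in which the membrane moves at constant speed and each band is deposited at the instant it elutes, so that a band's position on the membrane is simply speed times elution time. The function names and the numbers are hypothetical.

```python
def membrane_positions(elution_times_s, membrane_speed_mm_per_s):
    """Position (mm) of each band on the membrane.

    Idealized model: constant membrane speed, instantaneous transfer,
    so position = speed * elution time.
    """
    return [membrane_speed_mm_per_s * t for t in elution_times_s]


def band_spacings(positions_mm):
    """Distances (mm) between successive bands on the membrane."""
    return [b - a for a, b in zip(positions_mm, positions_mm[1:])]


def required_speed(elution_times_s, min_spacing_mm):
    """Slowest membrane speed (mm/s) at which every pair of adjacent bands
    lands at least `min_spacing_mm` apart.

    Assumes elution times are sorted in increasing order and distinct.
    """
    gaps_s = [b - a for a, b in zip(elution_times_s, elution_times_s[1:])]
    return min_spacing_mm / min(gaps_s)


# Three bands eluting 12 s and 8 s apart; the detector needs 0.5 mm spacing.
times = [600.0, 612.0, 620.0]
speed = required_speed(times, min_spacing_mm=0.5)
spacings = band_spacings(membrane_positions(times, speed))
print(speed, spacings)
```

In this model the spacing on the membrane scales linearly with membrane speed, so doubling the speed doubles the separation between any two bands without changing the electrophoresis itself.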

 

 

The second method, developed by Barbara Shaw, is a novel variation on Sanger sequencing chemistry. Here the use of dideoxynucleoside triphosphate terminators is avoided. Instead, trace amounts of boron derivatives of the normal deoxynucleoside triphosphates are added to the cocktail of substrates used for DNA polymerase. Compounds like 5′-α-[P-borano]-triphosphates are well tolerated by most DNA polymerases. These compounds have a BH₃ group replacing an oxygen on the alpha phosphate (nearest the sugar). Thus they lead to incorporation of boron-substituted phosphates into the DNA chain. The polymerization process is efficient enough that the derivatives can be introduced as part of PCR amplification. These derivatives are resistant to exonuclease III cleavage. Thus, when the PCR product is digested with exonuclease III, a ladder of fragments is produced that terminates at the first location at which a boron derivative is encountered.
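How random boron incorporation followed by exonuclease digestion yields a base-specific ladder can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of the published protocol: the strand is written 5′ to 3′, digestion proceeds one nucleotide at a time from the 3′ end, the boron-modified nucleotide that blocks digestion is retained as the new 3′ end, and only one of the four bases carries the modification in a given reaction. The function names and the incorporation fraction are hypothetical.

```python
import random


def digest_from_3prime(strand, boron_positions):
    """Return the fragment left after digestion of one strand (5'->3').

    Nucleotides are removed from the 3' end until the first boron-modified
    position is met; that nucleotide is kept as the new 3' end (assumption).
    A strand with no modified positions is digested completely.
    """
    if not boron_positions:
        return ""
    stop = max(boron_positions)  # the modification nearest the 3' end
    return strand[: stop + 1]


def simulate_ladder(strand, base, fraction, n_molecules, seed=0):
    """Fragment lengths observed over many molecules.

    Each occurrence of `base` is boron-modified independently with
    probability `fraction` (a stand-in for "trace amounts").
    """
    rng = random.Random(seed)
    lengths = set()
    for _ in range(n_molecules):
        marked = [i for i, b in enumerate(strand)
                  if b == base and rng.random() < fraction]
        fragment = digest_from_3prime(strand, marked)
        if fragment:
            lengths.add(len(fragment))
    return sorted(lengths)


strand = "GATTACAGATTACA"
ladder = simulate_ladder(strand, base="A", fraction=0.3, n_molecules=2000)
# Every fragment in the ladder ends on the modified base.
print(all(strand[n - 1] == "A" for n in ladder))  # True
```

Because every surviving fragment ends at a modified nucleotide, the set of fragment lengths marks the positions of that base, just as a dideoxy termination ladder would.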

 

 

 

A final trick that appears to be extremely promising is internal labeling. Here a primer is used adjacent to some 3′ known flanking sequence. Label is introduced into the primer by selective extension in the absence of one of the four ordinary dpppN’s (essentially the original Ray Wu sequencing strategy) selected from the known sequence. Either fluorescein-labeled or IR-dye-labeled dU or dA can be used (Fig. 10.11). The latter appears to be superior. There are several advantages in this approach. The fluorophores introduced are internal, which protects them from any exonuclease degradation; there is no background from a great excess of labeled primer, and there is no need to synthesize fluorescent primers. An example of the success in using internal labeling is shown in Figure 10.12. Admittedly this is an extraordinarily good sequencing result. However, recent experience at the EMBL, where this technique was developed, indicates that even in a teaching setting, the use of internal labeling routinely provided highly accurate raw sequencing data with typical reads of 400 to 500 bases. Since the required fluorescent materials are readily available, there is no reason why they should not be incorporated into all one-color fluorescent DNA sequencing methods.

Figure 10.10 Schematic illustration of the bottom wiper used to transfer DNA sequencing ladders to a membrane.

Figure 10.11 Fluorescent dyes used for internal labeling. (a) Dye structures. (b) Typical procedure for internal labeling.

Figure 10.12 Results of a very good but not exceptional DNA sequencing ladder obtained with internal labeling. Taken from Grothues et al. (1993).

ERRORS IN DNA SEQUENCING

A major factor in automated DNA sequencing is continued improvements in the software used to analyze the data. From manual sequence calling, the state of the art has progressed to automated sequence reading, regardless of whether radioactive, chemiluminescent, or fluorescent sequencing data are recorded. Most software allows manual editing and overriding. Ideal software indicates where bases are known with great accuracy and where there are ambiguities. One recent report cited a 1% error rate for automatic calling in 350-base fluorescent sequencing runs, with significant deterioration beyond this point to 17% error in 500-base runs (Koop et al., 1993). A more recent study (Table 10.1) is more optimistic and demonstrates that manual editing can improve, sometimes substantially, on the accuracy of automated calling software (Naeve et al., 1995). The best current automatic software is claimed to read 500-base runs with less than 1% error. It is very important to realize that most of the issues that have been addressed thus far at the software level are how to deal with cases where adjacent fragments are only partially resolved. The resolution of successive bands in DNA sequencing gel electrophoresis gradually deteriorates as the size of the fragments increases. Larger fragments spend more time in the gel and have correspondingly more time to disperse. They also are increasingly subject to electrical orientation (Chapter 5), which leads to a loss in size-dependent electrophoretic mobility. For these reasons one usually tries to sequence both strands of a target, and this provides the most accurate sequence possible at both ends of the target.
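Accuracy figures of the kind reported in Table 10.1 are percentages of correctly called bases in successive 100-base windows measured from the primer. A minimal Python sketch of that bookkeeping follows. It assumes the called and reference sequences are already aligned position by position, with no insertions or deletions; real comparisons would require a sequence alignment first. The function name and the test sequences are hypothetical.

```python
def percent_correct_by_window(called, reference, window=100):
    """Percent of matching bases in successive windows from the primer.

    Returns a list of (first_base, last_base, percent_correct) tuples with
    1-based positions. Assumes the two sequences are aligned position by
    position (no insertions or deletions), which is a simplification.
    """
    n = min(len(called), len(reference))
    results = []
    for start in range(0, n, window):
        end = min(start + window, n)
        matches = sum(c == r for c, r in
                      zip(called[start:end], reference[start:end]))
        results.append((start + 1, end, 100.0 * matches / (end - start)))
    return results


# A 200-base reference and a call with three errors in the second window.
reference = "ACGT" * 50
called = list(reference)
for position in (150, 160, 170):
    called[position] = "N"
called = "".join(called)

for first, last, pct in percent_correct_by_window(called, reference):
    print(f"{first}-{last}: {pct:.0f}% correct")
```

Reporting accuracy by window rather than as a single average makes the deterioration at long read lengths visible, since a few poor windows far from the primer can be hidden by many good ones near it.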

 

 

Table 10.1 Accuracy of Automated DNA Sequencing as a Function of the Distance from the Primer

Distance from primer (bases)    1–100  101–200  201–300  301–400  401–500  501–600

Maximum Correct (%)
Dye-primer, unedited               98      100      100      100       99       96
Dye-primer, edited                 96      100      100      100       99       65
Dye-terminator, unedited          100      100      100      100      100       75
Dye-terminator, edited            100      100      100      100      100       84

Median Correct (%)
Dye-primer, unedited               93      100      100       99       92       44
Dye-primer, edited                 93      100      199       99       98       61
Dye-terminator, unedited           97      100       99       98       83       27
Dye-terminator, edited             97      100      100      100       84       15

Minimum Correct (%)
Dye-primer, unedited               38       99       96       71       14        9
Dye-primer, edited                 38      100      100       98       69        0
Dye-terminator, unedited           51       92       88       36        0        0
Dye-terminator, edited             21       94       95       39        0        0

Source: Adapted from Naeve et al. (1995).

Figure 10.13 Effect of local DNA sequences on the band spacing and intensities seen in sequencing data. Taken from Tibetts et al. (1996). (a) Relative band spacings are intrinsic to the sequence. (b) Relative band intensities depend on the polymerase.

Figure 10.14 Example of compression artifacts in DNA sequencing caused by stable secondary structures in the single-stranded sample. Shown with the sequence data is the automatic software call; the true sequence is shown above. Taken from Yamakawa et al. (1996).

Many of the errors and ambiguities that occur in DNA sequencing data are systematic. This encourages the use of clever computer algorithms or artificial intelligence approaches to refine automated sequence calling even further. Since the average separation between adjacent fragment lengths varies very gradually in DNA electrophoresis, from the width of a band, one usually knows how many bases it represents even if these are not well resolved, as in the case of a run of the same base at very large fragment sizes. However, when the results are examined more closely, it is apparent that the band spacing is not perfect; it varies slightly depending on the identity of the last base in the chain, and perhaps the one before (Fig. 10.13a). Band intensities are also a function of the local sequence, and they are markedly affected by the particular DNA polymerase used. Secondary structure in the single-stranded DNA is not totally eliminated by the denaturing conditions used (Fig. 10.13b). This can be partially compensated for by using base analogs like 7-deazaG instead of G, since these form weaker secondary structures. When DNA strands form hairpins under the conditions of sequencing gel electrophoresis, they migrate faster than expected. The result is called a compression; part of the sequence may be missed or may just be impossible to read (Fig. 10.14). Frequently compressions occur on one strand but not on the complementary strand; this is one of the major reasons to sequence both strands. At first, the strand dependence of compressions may seem puzzling. After all, whatever intrastrand base pairs can be formed by one strand can also be formed by its complement. However, two additional complications arise. The first is that any strand with a high local density of G residues can form unusual helical structures with G–G pairing. These are very stable; fortunately the complementary strand will be C-rich instead of G-rich. The use of 7-deazaG presumably eliminates such structures. The second, and more serious, complication is the sequence of the loop of the hairpin. It turns out that certain loop sequences promote particularly stable hairpin formation. An example is GCGAAAGC, which forms a hairpin with only two base pairs, but its melting temperature is 76°C in 0.15 M salt (Hirao et al., 1990). A recent study surveyed a large number of natural and synthetic sequences that led to compressions (Yamakawa et al., 1996). Remarkably all but 2% of these were formed by only two types of sequence motifs. About a third, which showed up on both strands, were hairpins with a GC-rich stem (≥3 bp) connected by a 3–4 base loop. The remaining two-thirds occurred on only 1
