Genomics: The Science and Technology Behind the Human Genome Project.

Charles R. Cantor, Cassandra L. Smith

 

Copyright © 1999 John Wiley & Sons, Inc.

 

ISBNs: 0-471-59908-5 (Hardback); 0-471-22056-6 (Electronic)

12 Future DNA Sequencing without Length Fractionation

WHY TRY TO AVOID LENGTH FRACTIONATIONS?

All but one of the methods we have described for DNA sequencing in Chapter 10 involved a gel electrophoretic fractionation of DNA. The exception used mass spectroscopy instead of electrophoresis, but a length fractionation was still needed because all of the information about base location had been translated into fragment sizes prior to analysis. There are several motivations to try to get away from this basic paradigm and develop DNA sequencing methods that do not depend on size fractionation. First, size fractionation, except by mass spectroscopy, is really quite a slow and indirect method of reading sequence data. Second, fractionations are intrinsically hard to parallelize. Third, it is not obvious how fractionation methods could ever be used to look efficiently just at sequence differences, and most of the long-term future of DNA sequencing applications probably lies in this key area of differential sequencing. This is true not only in the potential use of DNA sequencing for human diagnostics, but also for evolutionary applications, for population genetic applications, and for ecological screening.

For all of these reasons there is considerable current interest in trying to develop entirely new approaches to DNA sequencing. Many of the techniques that will be mentioned in this chapter are going to be discussed only briefly. While they are probably capable of maturing into methods that can read DNA sequences, they are unlikely to do this soon enough, or ultimately efficiently enough, to be of much use for large-scale DNA sequence processing. However, at least one of the second-generation methods treated in this chapter, sequencing by hybridization (SBH), does appear to offer a significant chance of making an impact on the current human genome project, and an even better chance of making a major impact on future DNA sequencing in clinical diagnostics.

SINGLE-MOLECULE SEQUENCING

 

A number of different potential DNA sequencing methods require that data be obtained from one single molecule at a time. They include handling DNAs or their reaction products in flow systems, or observing DNAs by microscopy. One approach, investigated by Richard Keller, would use an exonuclease to degrade a tethered DNA continually and detect individual nucleotides as they are cleaved by the enzyme and liberated into a flowing stream. This method exploits the power of flow cytometry for very sensitive detection of a fluorescent target whose location is known rather precisely. A schematic illustration of this approach to single-molecule sequencing is given in Figure 12.1.

Figure 12.1 Schematic illustration of one approach to sequencing single DNA molecules in solution. Provided by Richard Keller. [Figure labels: fluorescently labeled DNA strand; exonuclease; fluid flow; cleaved, labeled nucleotides; focused laser beam; spectral filter; photodetector; computer.]

To see all four bases in a single molecule, each would have to be labeled with a different fluorophore. There have been suggestions to use the intrinsic fluorescence of each base directly, but this fluorescence is weak, so it would require very sophisticated and expensive detection methods to be practical at the single-molecule limit. The challenge is to make fluorescent base analogs that are acceptable substrates for DNA polymerase. Substitution at every single nucleotide position must be accomplished. This requirement has been met with one and two bases, which is an impressive accomplishment. It remains to be seen if it can be met with all four bases. The ideal exonuclease would liberate bases at a kinetically smooth and rapid rate. It would be processive so that a single enzyme would suffice to degrade the DNA molecule. Otherwise, with the arrangement shown in Figure 12.1, there would be pauses during which no product would be appearing, and one would constantly have to replenish the supply of enzyme as molecules fall off and are lost to the flowing stream. The properties of actual exonucleases, for the most part, are not this ideal, but they are rapid, and some are reasonably processive.

 

 

 

 

It is possible to detect, in a flowing stream, single fluorophores of the sort used to label nucleic acid bases, like those that would be used with a tethered DNA. Some typical results are illustrated in Figure 12.2. The issues that remain unresolved are the chances of missing a base (a false negative) and the chances of seeing a noise fluctuation, caused by scattering from a microscopic dust particle or the like, that imitates a base when none is present (a false positive).

Figure 12.2 Typical data obtained in pilot experiments to test the scheme shown in Figure 12.1. Top panel: A dilute concentration of a labeled nucleotide is allowed to flow past the detector. Bottom panel: A control with no labeled nucleotides.

Figure 12.3 Plus-minus DNA sequencing. (a) Extension until a template coding for a missing dpppN is encountered. (b) Degradation of a chain until a specific nucleotide is reached.

It is interesting to examine what the consequences would be if, instead of one molecule, many were tethered together in the flowing stream, and exonucleases were allowed to process them all. If the digestion could be kept in synchrony, the signal-to-noise problem in detection would be alleviated considerably. However, there is no way to keep the reactions in synchrony. The best one can do is to find a way to start the exonucleases synchronously and use the most processive enzymes available. Even in this case, however, there is an inevitable dephasing of the reaction, as it proceeds, because of the basic stochastic (noisy) nature of chemical reactions. Given some microscopic rate constant for a reaction, at the level of individual molecules the actual reaction times vary in such a way that a distribution of rates actually contributes to the average value seen macroscopically. The reaction zone broadens like a random walk as the reaction proceeds down the chain. Its width soon becomes more than several bases, even under ideal kinetic circumstances.
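The dephasing argument can be illustrated with a small simulation. The sketch below is only an idealization, assuming that each exonuclease removes bases as a simple Poisson process with one rate constant and that all molecules start at the same instant; the function names, rate, and molecule count are hypothetical choices, not values from the text. Under these assumptions the mean number of bases removed after time t is kt, while the spread across molecules grows roughly as the square root of kt, which is the random-walk broadening described above.

```python
import math
import random


def simulate_positions(n_molecules, rate, t_total, seed=0):
    """Return the number of bases removed from each molecule after t_total.

    Each removal step has an exponentially distributed waiting time with
    the given rate (an assumed, idealized kinetic model).
    """
    rng = random.Random(seed)
    positions = []
    for _ in range(n_molecules):
        elapsed = 0.0
        steps = 0
        while True:
            elapsed += rng.expovariate(rate)
            if elapsed > t_total:
                break
            steps += 1
        positions.append(steps)
    return positions


def mean_and_sd(values):
    """Population mean and standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    for t_total in (1.0, 10.0, 100.0):
        mean, sd = mean_and_sd(simulate_positions(500, 10.0, t_total))
        print(f"t={t_total:6.1f}  mean bases removed={mean:8.1f}  spread (SD)={sd:5.1f}")
```

With a hypothetical rate of 10 bases per unit time, the spread in this model is expected to reach about 10 bases once roughly 100 bases have been removed, so even perfectly synchronized starts would blur the signal within a short read.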

 

 

 

 

 

 

 

 

 

One way to circumvent the statistical problems with sequencing by exonucleases would be to find a method to stop the reaction at fixed points and then allow it to restart. This, in essence, is what is done with DNA polymerase in plus-minus sequencing, the very earliest method used by Ray Wu. If one dpppN is left out, the reaction proceeds up to the place where this base is demanded by the template, and it stalls there (Fig. 12.3a). Adding the missing dpppN then allows the reaction to continue. If a DNA polymerase with a 3′-editing exonuclease activity is used, a similar result can be achieved by having only one dpppN present. In this case the enzyme degrades the 3′-end of a DNA chain until it reaches a place where the dpppN present is called for by the template (Fig. 12.3b). As long as a sufficient supply of dpppN remains, the enzyme will stall at this position. These are useful tricks; they work well for sequencing, and there is no reason why they could not be incorporated into strategies for sequencing a single molecule or small numbers of molecules. However, the major potential advantage of the original scheme proposed by Keller is speed, and steps that require changing substrates numerous times are likely to slow down the whole process considerably.
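The two stalling behaviors can be written out as a toy model. The sketch below is an illustration under simplifying assumptions: the template is given as a string written 3′ to 5′ so that position i pairs with position i of the growing chain, the enzymes are error free, and the function names are hypothetical. `minus_stall` models extension with one dpppN left out; `plus_stall` models 3′ degradation with only one dpppN present.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def nascent_strand(template_3to5):
    """Full-length chain (5' to 3') that the template would direct."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def minus_stall(template_3to5, primer_len, missing):
    """Extend from primer_len until the template demands the missing base."""
    product = nascent_strand(template_3to5)
    pos = primer_len
    while pos < len(product) and product[pos] != missing:
        pos += 1
    return product[:pos]


def plus_stall(template_3to5, chain_len, present):
    """Degrade from the 3' end until the terminal base is the one supplied.

    A removed base can only be replaced if it is the single dpppN present,
    so the chain stalls with that base at its 3' terminus.
    """
    product = nascent_strand(template_3to5)
    pos = chain_len
    while pos > 0 and product[pos - 1] != present:
        pos -= 1
    return product[:pos]


if __name__ == "__main__":
    template = "TACGGTCA"  # hypothetical template, written 3' to 5'
    print(minus_stall(template, 2, "C"))
    print(plus_stall(template, 8, "C"))
```

In this model a chain that stalls in the minus reaction ends just before the omitted base, while a chain that stalls in the plus reaction ends with the supplied base, which is what allows the stall positions to be read as sequence information.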

 

 

 

 

 

 

 

 

SEQUENCING BY HIGH-RESOLUTION MICROSCOPY

 

 

 

 

 

 

 

 

One of the earliest attempts at the development of alternative DNA sequencing methods was Michael Beer’s strategy for determining nucleic acid sequence by electron microscopy. Beer’s plan was to label individual bases with particular electron-dense heavy metal clusters and then image these. Two problems made this approach unworkable. First, the nucleic acids were labeled by covalent modification after DNA synthesis. This leads to less than perfect stoichiometry of the desired product, and it undoubtedly also leads to some unwanted side reactions with other bases. The second problem is that sample damage in the conventional electron microscope is considerable; this makes it very difficult to achieve accurate enough images to read the DNA sequence as the spacing between metal-tagged bases, since the structure of the molecule moves around in response to the electron beam. This problem of molecular perturbation by microscopes remains with us today as the greatest obstacle to using high-resolution microscopy for DNA sequencing.

 

Currently a new generation of ultramicroscopes has reopened the issue of whether DNA could be sequenced, directly or indirectly, by looking at it. The new instruments are scanning tip microscopes; the best studied of these are the scanning tunneling microscope (STM) and the atomic force microscope (AFM). Both of these instruments read the surfaces of samples in much the same way that a blind person reads braille. The surface is scanned with a sharp tip in a raster pattern, as shown schematically in Figure 12.4. In AFM what leads to the image is the force between the tip and the sample. Van der Waals forces will attract the tip to the surface at long distances and repel the tip at short distances. What is usually done is to have a feedback loop via a piezoelectric device. This can be used to place a force on the tip to keep its vertical position constant, and the voltage needed to accomplish this is measured. Alternatively, one can apply a constant force, and measure the vertical displacement of the tip, for example, by bouncing a laser beam off the tip and seeing where it is reflected to. In STM an electrical potential is maintained between the tip and the surface. This leads to a current flow from the tip to the surface at short distances. The current is dependent on the distance between the tip and the surface, and the electrical properties of the surface. In practice, one can adjust the position of the tip to maintain a constant current, and measure the tip position, or keep the vertical height of the tip constant and measure the current.

For AFM or STM to be successful, very flat surfaces are required. With hard samples on such surfaces, atomic resolution is routinely observed, and even subatomic resolution has been reported, where information about the distribution of electron density within the sample is uncovered. DNA is not a hard sample, and it does not easily adhere to most very flat surfaces. These difficulties have produced many frustrations in early attempts to image DNA by AFM or STM. In retrospect, most or all of the spectacular early pictures of DNA have been artifacts, perhaps caused by helixlike imperfections in the underlying surfaces. The best that can be said is that images that looked like DNA were few and far between, and not generally reproducible. One problem that soon became quite apparent is that the forces used in these early attempts were sufficient in most, if not all, cases to knock the DNA molecules off the surface being imaged.

Figure 12.4 Operating principle of a typical scanning tip microscope. In STM the electrical current between the tip and the surface is measured. In AFM the repulsive force between the tip and the surface is measured.

More recent attempts to image DNA with scanning tip microscopes, particularly with AFM, have been more successful, at least in the sense that dense arrays of molecules can be seen reproducibly. This is accomplished by using surfaces to which DNA adheres better, like freshly cleaved mica, instead of the graphite or silicon surfaces used earlier. Sharper tips give high enough resolution to be able to measure the lengths of the molecules reliably. The current images are, however, a long way from the resolution needed to read the sequence directly by looking at the bases. A number of severe obstacles will have to be overcome if this is ever to be done. First, the current images are mostly of double-stranded DNA. This is understandable since it is a much more regular structure, much more amenable to detection and quantitation in the microscope. However, in the double strand, only short bits of sequence are readable from the outside, as one is forced to do in AFM. This will lead to a difficult, but apparently not insurmountable, sequence reconstruction problem where data from many molecules will have to be combined to synthesize the final sequence. A second problem is that the DNA molecules could still be distorted quite a bit as the tip of the microscope moves over them. This may or may not be alleviated by newer microscope designs that would allow lower forces to be used.

 

 

 

A third problem with AFMs is that the image seen is a convolution of the shape of the tip and the shape of the molecule, as shown in Figure 12.5. Thus, unless very sharp tips can be made, or tips of known shape, it can be difficult with a soft, deformable molecule to deconvolute the image and see the true shape of the DNA. One approach to circumvent many of these difficulties would be to label the DNA with base-specific reagents that are more distinctive either in AFM, where larger, specific shapes could be used, or in STM, where labels with different electrical properties might serve. As a test of this, and to make sure DNA imaging was now reliable in the AFM, proteins were attached to the ends of DNAs before AFM imaging. Two examples of the sorts of images seen are shown in Figure 12.6. The protein used, purely because it was available and of a size that made it easy to distinguish from DNA, was a chimera between streptavidin and a fragment of staphylococcal protein A, which was already introduced in Chapter 4, where it was used for immunoPCR. The DNA was biotinylated on either one or both ends. Two different lengths of DNAs were used, and since streptavidin is tetrameric, the resulting images show a progression of structures from DNA monomers up to trimers. Because of the nature of these structures, the proper measured lengths of the DNAs within them, and the expected height difference between the protein label at the ends or vertices of the DNA and the DNA itself, one can be very confident that these are true images of DNA and protein. However, the resolution is still far too low to allow sequencing.

Figure 12.5 In scanning tip microscopy, what is actually measured is a convolution of the shape of the object and the shape of the tip.
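The broadening caused by the tip can be sketched numerically. One common way to model the tip-sample convolution, adopted here as an assumption rather than taken from the text, is a morphological dilation: at each lateral position the tip apex rests at the highest point allowed by contact anywhere along the tip profile. The one-dimensional profile, the parabolic tip, and all names and dimensions below are hypothetical, and the sample is treated as rigid even though DNA is not.

```python
def parabolic_tip(half_width, radius):
    """Height of the tip surface above its apex at offsets -half_width..half_width."""
    return [(u * u) / (2.0 * radius) for u in range(-half_width, half_width + 1)]


def afm_image(surface, tip):
    """Apparent height profile: dilation of the surface by the tip shape.

    image[x] = max over offsets u of (surface[x + u] - tip(u)),
    i.e. the apex height at which the tip first touches the sample.
    """
    half = len(tip) // 2
    n = len(surface)
    image = []
    for x in range(n):
        best = float("-inf")
        for j, tip_height in enumerate(tip):
            xs = x + (j - half)
            if 0 <= xs < n:
                best = max(best, surface[xs] - tip_height)
        image.append(best)
    return image


if __name__ == "__main__":
    surface = [0.0] * 21
    surface[10] = 2.0  # a single narrow feature standing in for a molecule
    image = afm_image(surface, parabolic_tip(5, 4.0))
    true_width = sum(1 for h in surface if h > 0)
    seen_width = sum(1 for h in image if h > 0)
    print("true width:", true_width, "apparent width:", seen_width)
```

In this rigid-sample model the peak height is preserved but the apparent width is set largely by the tip radius, which is why a sharper tip or a tip of known shape is needed before the true shape can be recovered.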

Figure 12.6 Two typical AFM images of short, end-biotinylated DNA molecules labeled at one or both ends by a chimeric protein fusion of streptavidin and staphylococcal protein A. Since the streptavidin is tetrameric, one can see figures representing more than one DNA molecule bound to the same streptavidin. Reproduced from Murray et al. (1993).

It should be possible to use progressively smaller labels and to increase the resulting resolution. Whether this will lead to direct AFM DNA sequencing soon is anyone’s guess. If it does, a real advantage is that one could sequence a wide variety of different molecules in a single experiment without the need to clone or fractionate. The labeling would almost certainly be introduced by PCR using analogs of the four bases. This will be much more accurate than the original chemical modification methods used for electron microscopy. However, the resulting images, as elegant as they may look some day, might have to be analyzed as images to extract the DNA sequence data. By current methods this could become a serious bottleneck. What is still needed is a way to direct the tip of the microscope so that it tracks just over the DNA molecule of interest, rather than scanning a grid that is mostly background. If this can somehow be achieved, the problem of image analysis ought to become much simpler, and the rate of image acquisition also ought to be increased considerably.

STEPWISE ENZYMATIC SEQUENCING

A major success story in the history of protein sequencing was the development of stepwise chemical degradation. Amino acid residues are removed one at a time from one end of the polypeptide chain and their identity is determined successively. Automated Edman degradation currently provides our main source of direct protein sequence data. The yield in each step is the critical variable, since it determines how far from the original end the sequence can be read. Comparable chemical approaches for DNA or RNA sequencing have not been terribly successful. Recently, however, several stepwise enzymatic sequencing approaches have been suggested. As individual processes, they do not at first glance seem all that attractive. However, they have the potential to be implemented in massively parallel configurations, which, if successful, could ultimately provide very high throughput. These schemes are distinct from the single molecule methods described earlier in that any desired number of target molecules of each type can be employed. Thus detection sensitivity is not an issue here.
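The dependence of read length on stepwise yield can be made explicit with a rough calculation. The sketch below assumes the simplest possible model, in which every step has the same yield y and the fraction of chains still in register after n steps is y to the power n; the threshold fraction at which the signal is deemed unreadable is a hypothetical parameter, as are the function names.

```python
import math


def in_register_fraction(step_yield, n_steps):
    """Fraction of chains still in phase after n_steps, assuming equal yields."""
    return step_yield ** n_steps


def max_read_length(step_yield, min_fraction):
    """Largest n for which step_yield**n is still at least min_fraction."""
    if not (0.0 < step_yield < 1.0) or not (0.0 < min_fraction < 1.0):
        raise ValueError("step_yield and min_fraction must lie strictly between 0 and 1")
    return math.floor(math.log(min_fraction) / math.log(step_yield))


if __name__ == "__main__":
    for step_yield in (0.90, 0.95, 0.99):
        print(step_yield, max_read_length(step_yield, 0.5))
```

Under this model, raising the stepwise yield from 90% to 99% extends the readable length roughly tenfold, which is why yield per step dominates the design of any stepwise scheme, chemical or enzymatic.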

 

 

 

 

 

 

One strategy, developed by Mathias Uhlen, is to divide the sample into four separate wells, each containing a DNA polymerase without a 3′-proofreading activity. To each well one of the four dpppN’s is added. Chain extension will occur only in one well, with the concomitant release of pyrophosphate (Fig. 12.7). This product can be detected with great sensitivity; for example, it can be enzymatically converted to ATP, and that can be measured using luciferase, to generate a chemiluminescent signal. The amount of light emitted is proportional to the amount of ATP made. Thus one can quantitate the amount of dpppN incorporated and determine, within limits, how many units the chain was extended by. Sample from the well that was successfully extended is then divided into four new wells, and the process is repeated. Actually three wells would suffice, since one knows that the next base is not the same as the one or ones just added, but it is probably good to have the fourth as an internal control. One obvious complication with the scheme is that the sample keeps getting divided, so one has to either start with a large amount of it or have a sensitive enough assay that only a small aliquot need be removed and assayed. A variation on this basic approach adds dpppN’s in a cyclical order. This avoids the problem of sample subdivision. It appears to have considerable promise. The method is called pyrosequencing.
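The cyclical variant can be sketched as a simple simulation. The code below is a toy model only: it assumes complete, error-free incorporation, a light signal exactly proportional to the number of bases added, and a fixed dispensing order; the function names and the example sequence are hypothetical. Each dispensation yields a signal equal to the length of the run of that base at the current position, and the sequence is read back from the list of signals.

```python
def flowgram(new_strand, order="AGCT"):
    """Signals produced when dpppN's are dispensed cyclically.

    new_strand is the chain to be synthesized (5' to 3'). Returns a list of
    (base, bases_incorporated) pairs, one per dispensation.
    """
    if any(base not in order for base in new_strand):
        raise ValueError("new_strand contains a base that is never dispensed")
    flows = []
    pos = 0
    dispensation = 0
    while pos < len(new_strand):
        base = order[dispensation % len(order)]
        run = 0
        while pos + run < len(new_strand) and new_strand[pos + run] == base:
            run += 1
        flows.append((base, run))  # run == 0 means no light for this base
        pos += run
        dispensation += 1
    return flows


def decode(flows):
    """Rebuild the sequence from the per-dispensation signals."""
    return "".join(base * run for base, run in flows)


if __name__ == "__main__":
    flows = flowgram("AAGCTTTG")
    print(flows)
    print(decode(flows))
```

Note that a run of identical bases gives a single, proportionally larger signal, so in practice the accuracy of run-length calls depends on how linear the light measurement is, which this idealized sketch does not address.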

 

 

 

 

 

 

 

 

A second strategy is similar in spirit but uses dideoxy pppN’s. This is shown in Figure 12.8. To each of four separate wells containing target DNA and DNA polymerase is added one of the ddpppN’s carrying a label. Alternatively, one could use a single well and a mixture of four different fluorescently labeled ddpppN’s. Only one of the ddpppN’s becomes incorporated. From the location of the well, or the color, the identity of the base just added is known. The base just added is now removed by treating with the 3′-editing exonuclease activity of DNA polymerase I in the presence of all of the dpppN’s except the one just added. Next the labeled ddpN just removed is replaced by an unlabeled dpppN by using DNA polymerase with only this particular dpppN present. Then the process is repeated. This scheme avoids the sample division or aliquoting problem of the previous strategy.

Figure 12.7 One scheme for stepwise enzymatic DNA sequencing. Here, when a particular base is added, pyrophosphatase is used to synthesize ATP from the pyrophosphate (pp) released, and the ATP in turn is used to generate a chemiluminescent signal by serving as a substrate for the enzyme, luciferase.

Figure 12.8 A second scheme for stepwise enzymatic DNA sequencing. This is similar in spirit to plus-minus DNA sequencing illustrated in Figure 12.3. It uses fluorescent terminators, ddpppN*, like those employed in conventional automated Sanger sequencing.

To detect a run of the same base, one will have to be able to vary the scheme. What will happen in this case is that the exonuclease treatment will degrade the chain back to the location of the first base in the run. To determine the length of the run, one possible approach is to add a labeled analog of the particular dpppN involved in the run, in the presence of DNA polymerase, and detect the amount of synthesis by quantitating the incorporation of label. Then the entire block of labeled dpN’s has to be removed and replaced by unlabeled dpN’s, and the next base after the block can then be determined. This is an unfortunate complication. However, in principle, the entire scheme could be set up in a microtitre plate format and run in a highly parallel way. As in the first scheme, the whole process would be best carried out in a solid state sequencing format so that the DNA could be purified away from small molecules and enzymes easily and efficiently after each step.

 

 

A third strategy has not been tested, to our knowledge, because it depends on finding a dpppNx derivative that has two special properties. Like a ddpppN, the derivative must not be extendable by DNA polymerase. However, there must be a way to change the dpNx after incorporation into the DNA chain so that it becomes extendable. The scheme then, as shown in Figure 12.9, is to add dpppNx to the target in four separate tubes. Whichever one extends could be determined by a pyrophosphatase assay as in the first scheme. Then the incorporated dpNx is converted to dpN, and the process continued. One candidate, in principle, for the desired compound is dpppNp, which, after incorporation of the dpNp into DNA, could be converted to dpN by alkaline phosphatase. This latter step certainly works; however, it is uncertain how well DNA polymerase will utilize compounds like dpppNp’s. Apparently quite a few other reversible dpppN derivatives have been tried as DNA polymerase substrates without much success thus far. This is a pity because the scheme has real appeal.

Figure 12.9 A third, as yet unrealized scheme for stepwise enzymatic DNA sequencing. Here the key ingredient would be a 3′-blocked pppNx that could be deblocked after each single base elongation step. Incorporation could be detected by pyrophosphate release as in Figure 12.7 or by the use of a fluorescent blocking agent, x.

 

 

 

 

 

 

 

 

 

In deciding how, eventually, to implement any of the above schemes in an automated fashion, one needs to consider an interesting trade-off between the time to sequence one sample and the number of samples that can be handled simultaneously. Instead of trying all four single base additions simultaneously, they could be tried one at a time, in a cyclical pattern, say A, then G, then C, then T, as in pyrosequencing. With an immobilized DNA sample, the target is simply moved from one set of reagents, after washing, to another, and the point where positive incorporation occurs is recorded. The advantage of this is that the logistics and design of the system become much simpler, particularly for the cases where pyrophosphate release is measured. It will take four times as long to complete a given length of DNA sequence, but one could handle precisely four times as many sequences simultaneously.

 

 

 

 

 

 

DNA SEQUENCING BY HYBRIDIZATION (SBH)

 

 

 

 

We will devote the rest of this chapter to a number of approaches for determining the sequence of DNA by hybridization. In all of these approaches one uses relatively short oligonucleotides as probes to determine whether the target contains the precise complementary sequences. Essentially SBH reads a word at a time rather than a letter at a time. Intuitively this is quite efficient; it is after all the way written language is usually read. For reasons that will become readily apparent, attempts to perform SBH have usually focused on oligonucleotides with 6 to 10 bases. The conception of SBH appears to have had at least four independent origins. Many groups are now working to develop an efficient practical scheme to implement SBH.
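The idea of reading a target a word at a time can be sketched as follows. This is a minimal illustration assuming ideal hybridization, in which a probe gives a signal only if its exact complement occurs somewhere in a single-stranded target; the function names and the example target are hypothetical. The set of words found is often called the spectrum of the target, a term used here for convenience.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Sequence of the strand that pairs with seq, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbh_spectrum(target, k):
    """All distinct k-base words that occur in the target."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def positive_probes(target, k):
    """Probes of length k expected to hybridize under ideal discrimination."""
    return {reverse_complement(word) for word in sbh_spectrum(target, k)}


def probe_set_size(k):
    """Number of different probes of length k (four choices per position)."""
    return 4 ** k


if __name__ == "__main__":
    target = "ATGCAGGTCC"  # hypothetical target
    print(sorted(sbh_spectrum(target, 3)))
    print(sorted(positive_probes(target, 3)))
    for k in (6, 8, 10):
        print(k, probe_set_size(k))
```

The probe count grows fourfold with each added base, from 4,096 for 6-mers to more than a million for 10-mers, which gives one practical reason for working with short words.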

 

 

 

 

 

 

There seems to be a consensus that SBH will eventually work well for some high-throughput DNA sequencing applications, like sequence comparison, sequence checking, and clinical diagnostic sequencing. In all of these cases one is not trying to determine large tracts of sequence de novo; instead, the targets of interest are mostly small differences between a known or expected sequence and what has actually been found. There is also agreement that SBH will work for determining partial sequences, for fingerprinting, and for mapping. However, SBH may not work for direct complete de novo sequencing unless some of the enhancements or variations that have been proposed to circumvent a number of problems turn out to work in practice.

 

 

 

 

The two critical features of SBH are illustrated in Figures 12.10 and 12.11. As we demonstrated in Chapter 3, the stability of a perfectly matched duplex is greater than that of an end-mismatched duplex, and much greater than that of a duplex with an internal mismatch (Fig. 12.10). Thus a key step in SBH is finding conditions where there is excellent discrimination between perfect matches and mismatches. An immediate problem is that for a sequence of length n, there is only one perfect match, there are six possible end mismatches (each base on each end of the target can be any one of the three noncomplementary bases), and there are 3(n − 2) possible internal mismatches. Unless the discrimination is

 
