
Genomics: The Science and Technology Behind the Human Genome Project.

Charles R. Cantor, Cassandra L. Smith

 

Copyright © 1999 John Wiley & Sons, Inc.

 

ISBNs: 0-471-59908-5 (Hardback); 0-471-22056-6 (Electronic)

10 DNA Sequencing: Current Tactics

WHY DETERMINE DNA SEQUENCE

 

A complete DNA sequence of a representative human genome is the major goal of the human genome project. Complete DNA sequences of other genomes are also sought. Why do we want or need this information? All descriptions of the organization of a genome, at lower resolution than the sequence, appear to offer little insight into genome function. Sometimes genes with common or related functions are clustered. This is particularly true in bacteria, where the clustering allows polycistronic messages to ensure even production of a set of interactive gene products. However, in higher cells, related genes are not necessarily close together. For example, in humans, genes for alpha and beta globin chains are located on different chromosomes, even though it is desirable to produce their products in equal amounts because they associate to form a heterotetramer, (alpha)2(beta)2. The major purpose served by low-resolution maps is that they help us find things in the genome. We usually want to find genes in order to study or characterize their function. It is only at the level of the DNA sequence where we have any chance of drawing direct inferences about the function of a gene from its structure. Admittedly, our ability to do this today is still rather limited, as will be demonstrated in Chapter 15. However, from the rate of progress in our ability to interpret DNA sequences de novo in terms of plausible gene function, we can be reasonably optimistic that by the time the human genome is completely sequenced, coding regions will be identifiable with almost perfect accuracy, and most new genes will carry in their sequence immediately recognizable clues about function.

 

 

 

 

A second reason to have the DNA sequence of genomes is that it gives us direct access to the DNA molecules of these genomes via PCR. Using the sequence, it will almost always be possible to design primers that will amplify a small DNA target of interest, or to provide a probe that will uniquely allow effective screening of a library for a larger segment of DNA containing the region of interest. The key point is that once DNA sequence is available, clones do not have to be stored and distributed. DNA sequences also often allow us to search for similar genes in related organisms (or even more distant organisms) more efficiently than by using DNA probes of unknown sequence. For example, to find a mouse gene comparable to a human gene, one can try to use the human gene as a hybridization probe at reduced stringency (lower temperature, higher salt) against a mouse library or use the human gene to design PCR primers for probing the mouse genome. But, if one had both the relevant human and mouse DNA sequences available, a comparison between these might reveal consensus regions that are more highly conserved than average and thus better suited for hybridization or PCR to find corresponding genes in other species. This becomes increasingly important when searching for homologs of very distantly related proteins.

 

 

A continual debate in the human genome project is whether to determine the DNA sequence of the junk: DNA that as far as we can tell is noncoding. Sydney Brenner was quick to point out early in the project that this DNA is rightly called junk and not garbage because, like junk, this DNA has been retained, while garbage is discarded. Today, admittedly, we cannot interpret much from noncoding DNA sequences. But this does not mean they are nonfunctional. The fact that they remain in the genome argues for function, at least at the level of evolution. However, there are surely also functions for these sequences at the level of gene regulation, chromosome function, and perhaps properties we know nothing about today. The junk is certainly worth sequencing, but it will be best to do this later in the genome project when the cost of DNA sequencing has diminished. An analogy can be made between the genome project and the exploration of a new continent. At the time the interior of North America was first explored, a major target was river valleys because they were accessible and because they were commercially valuable. No one willingly spent much time in deserts or arctic slopes. However, most of our oil deposits are located far from river valleys, and if we had not pushed exploration of the continent to completion, we would never have found very valuable resources. It is probably this way also with the genome; when we finally make our way through the junk, systematically, there will be some unexpectedly valuable finds. We may not know enough today to realize they were valuable, even if we could find them.

DESIGN OF DNA SEQUENCING PROJECTS

The first DNA sequence was determined in 1970 by Ray Wu at Cornell University. It consisted of the 12-base single-stranded overhang at each end of bacteriophage lambda DNA. The samples needed were readily in hand. Two investigators worked on the project for three years. Data handling and analysis did not present any unexpected or formidable problems. The major chore was developing techniques for actually determining the order of the bases. The method employed, selective addition of subsets of the four dpppN's, still has many attractive features, and we will revisit it several times in this and the next chapter.

Today, the complete DNA sequencing of 50-kb DNA targets, the size of the entire bacteriophage lambda, is a common task in specialized high-throughput sequencing laboratories. However, such projects are not yet routine in most laboratories that do DNA sequencing. The sequencing of targets 3 to 90 times larger has been accomplished in quite a few cases. Sequencing of continuous Mb blocks of human DNA is now becoming commonplace in quite a few research groups. These projects, even 50-kb projects, pose obstacles that were inconceivable at the dawn of DNA sequencing.

It is useful to divide discussion about DNA sequencing projects into tactics and strategy. Tactics is how the order of the bases on a single DNA sample is read and confirmed. Strategy, as illustrated in Figure 10.1, has a number of components. Presumably the target is selected in a rational manner, given the amount of effort that is actually required to complete a sequencing project. The upstream strategy is concerned with how the target is reduced to DNA samples suitable for application of the particular tactics selected. The tactics are then used, piece by piece, in as efficient and automated a way as possible. Then the downstream strategy consists in assembling the data into contiguous blocks of DNA sequence, filling any gaps, and correcting the inevitable errors that creep into all DNA sequence data.

 

 

 

 

 

Several caveats must be noted when thinking about DNA sequencing projects. Both the ideal tactics and strategy may depend on the types of targets. Effective strategies may combine several types of targets and several types of tactics. The key variable to judge efficiency and cost is the throughput: the number of base pairs of DNA sequence generated per day for each individual working in the laboratory. With current methods, except at the largest and most efficient genome sequencing groups, personnel costs are the completely dominant expense; chemicals, enzymes, and instrument depreciation all pale in comparison with salaries. In a few very automated and experienced centers, reagents and supplies are now the dominant costs.

Figure 10.1 Design of a typical DNA sequencing project.

Three terms are useful in evaluating sequencing progress. Raw DNA sequence is the direct data read from an experimental curve or photograph with local error correction done, for example, a manual override to correct an ambiguous call by sequence reading software. Finished sequence is the assembled DNA sequence for the entire target, with error corrections made by comparing redundant samples. In general, the complete DNA sequence is read separately from both DNA strands. This is a major contributor to finding and correcting some of the most common kinds of errors. Sequencing redundancy is the ratio of the number of raw base pairs of sequence acquired to the number of base pairs of finished sequence determined. It is usually at least 2, because of the need just cited to examine both strands. In general, the redundancy is dependent on the strategy used, and it has often been as high as 10 in many of the relatively large DNA sequencing projects that have been accomplished to date.
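The two bookkeeping quantities defined above, throughput and sequencing redundancy, are simple ratios. The following is a minimal sketch of how they might be computed; the function names and the project numbers (a 50-kb target, 400 kb of raw reads, four people, 20 days) are hypothetical and chosen only for illustration.

```python
def sequencing_redundancy(raw_bp: int, finished_bp: int) -> float:
    """Ratio of raw base pairs acquired to finished base pairs determined."""
    if finished_bp <= 0:
        raise ValueError("finished_bp must be positive")
    return raw_bp / finished_bp


def throughput_per_person_day(bp: int, people: int, days: float) -> float:
    """Base pairs of sequence generated per day for each person in the laboratory."""
    if people <= 0 or days <= 0:
        raise ValueError("people and days must be positive")
    return bp / (people * days)


# Hypothetical project: a 50-kb target covered by 400 kb of raw reads,
# produced by 4 people in 20 working days.
finished_bp = 50_000
raw_bp = 400_000

print(sequencing_redundancy(raw_bp, finished_bp))            # redundancy
print(throughput_per_person_day(raw_bp, people=4, days=20))  # raw bp per person per day
```

A redundancy below 2 in such a calculation would suggest that both strands have not been fully covered.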

LADDER SEQUENCING TACTICS

Virtually all current de novo DNA sequencing methods are based on the ability to fractionate single-stranded DNA by gel electrophoresis in the presence of a denaturant with single base resolution. Information about the location of particular bases in the sequence is converted into a specific DNA fragment size. Then these fragments are separated and analyzed. The gels used are either polyacrylamide or variants on this matrix like Long Ranger™. The denaturant is usually 7 M urea. Its presence is required to eliminate most of the secondary structure that individual DNA strands can achieve by intramolecular base pairing, where this is allowed by the DNA sequence. It is possible, under ideal cases, to maintain single base resolution up to sizes of 1 kb. Some success has been reported with even larger sizes by the use of gel-filled capillaries. The use of denaturing gels is an unfortunate aspect of current DNA sequencing. Since urea solutions are not stable to long-term storage, the gels must be cast within a few days of their use, and it is difficult to reuse most gels more than several times without a serious decrease in performance. In the two decades since Wu's first DNA sequencing, the ladder methods we will describe have produced more than 1,000 Mb of DNA sequence deposited in databases, and perhaps an equal amount or more that has not been published or deposited.

 

 

 

 

 

Two rather different approaches have been used to generate DNA sizes based on DNA sequence. We will describe how they are carried out starting with a single-stranded DNA template. Slightly more complex procedures are required if the original template is double stranded. The first of these methods, developed by Allan Maxam and Walter Gilbert, is shown in Figure 10.2. The ends of the DNA are distinguished by specifically labeling one of them. Usually this is done directly, and covalently, with a kinase that places a radiolabeled phosphate at the 5′-terminus of the template. There are other ways to label the 5′-end or 3′-end directly, and it is also possible to label either end indirectly, by hybridization with an appropriate complementary sequence. This requires that the end sequence be known; it usually is known, since the DNA template is cloned into a vector of known flanking sequence.

 

 

 

 

 

 

 

In Maxam-Gilbert sequencing, base-specific or base-selective partial chemical cleavage is used to fragment the DNA. This is carried out under conditions where there is an average of only one cut per template molecule with each cleavage scheme employed. Thus a very broad range of fragment sizes is produced that reflects the entire sequence of the template. Four separate chemical fragmentation reactions are carried out; each one favors cleavage after a specific base. The fragments are fractionated, and the sizes of the labeled pieces are measured, usually in four parallel electrophoretic lanes. The DNA sequence can be read directly off the gel as indicated by the example in Figure 10.3. The pattern of bands seen is often called a ladder for reasons obvious from the figure. Note that in the Maxam-Gilbert approach, there are additional fragments produced that are not detected because they are not labeled, but they are present in the sample. For some alternate schemes of detecting DNA fragments for sequencing, like mass spectrometry, these additional pieces are undesirable.

Figure 10.2 Maxam-Gilbert sequence technique: Preparation of end-labeled, size-fractionated DNA sample.


Figure 10.3 Typical Maxam-Gilbert sequencing ladder and its interpretation.
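The logic of reading a ladder can be illustrated with a small simulation. This is an idealized sketch, not the actual chemistry: it assumes four perfectly base-specific reactions, a 5′-end label, and that cleavage after base i leaves a labeled fragment of i + 1 bases. The function names are hypothetical.

```python
def maxam_gilbert_lanes(template: str) -> dict[str, list[int]]:
    """Return, for each base-specific reaction, the lengths of the labeled fragments.

    Idealized assumption: cleavage after the base at 0-indexed position i
    leaves a 5'-labeled fragment of i + 1 bases. The unlabeled 3' pieces
    are present in the sample but are not detected, so they are ignored.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for i, base in enumerate(template):
        lanes[base].append(i + 1)
    return lanes


def read_ladder(lanes: dict[str, list[int]]) -> str:
    """Read the sequence 5' to 3' by ordering bands from shortest to longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = maxam_gilbert_lanes("GATTACA")
print(lanes)               # band positions in each of the four lanes
print(read_ladder(lanes))  # sequence recovered from the ladder
```

Each band length occurs in exactly one lane, which is why the sequence can be read off by stepping up the ladder one rung at a time.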

The second general approach to DNA fragmentation for ladder sequencing was developed by Frederick Sanger (Fig. 10.4). This is the approach in widespread use today, for a variety of reasons, including the ability to avoid the use of toxic chemicals and the ease of adapting it to four-color fluorescent detection. One starts with a single-stranded template. A primer is annealed to this template, near the 3′-end of the DNA to be sequenced. The primer must be long enough so that it binds only to one unique place on the template. This primer must correspond to known DNA sequence, either in the target or, more commonly, in the flanking vector sequence. A DNA polymerase is used to extend the primer in a sequence-specific manner along the template. However, the sequence extension is halted, in a base-specific manner, by allowing the occasional uptake of chain terminators: dpppN analogs that cannot be further extended by the enzyme. Almost all current DNA sequencing uses dideoxy-pppN's as terminators. As shown in Figure 10.5, these derivatives lack the 3′ OH needed to form the next phosphodiester bond. Four separate chain extension reactions are carried out, each one with a different terminator. Label can be introduced in several different ways: through the primer, the terminator, or internal dpppN's. The resulting mixture of DNA fragments is melted off the template and analyzed by gel electrophoresis.

Figure 10.4 Sanger sequencing technique: Preparation of an end-labeled, size-fractionated DNA sample. The actual sequencing ladder will be virtually identical to that seen with the Maxam-Gilbert method.

Figure 10.5 Structure of a dideoxynucleoside triphosphate terminator.
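The chain-termination scheme can be sketched in the same spirit. The example below is an idealized model under stated assumptions: every position terminates with equal probability, the template is given in the order the polymerase reads it (3′ to 5′, starting just past the primer), and the primer length of 20 bases is arbitrary. All names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sanger_lanes(template_3to5: str, primer_len: int = 20) -> dict[str, list[int]]:
    """Return the fragment lengths expected in each terminator reaction.

    template_3to5 lists the template bases in the order the polymerase
    encounters them. A terminator complementary to template position i
    yields a fragment of primer_len + i + 1 bases (primer plus extension).
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for i, template_base in enumerate(template_3to5):
        terminator = COMPLEMENT[template_base]  # ddN incorporated opposite the template
        lanes[terminator].append(primer_len + i + 1)
    return lanes


def read_new_strand(lanes: dict[str, list[int]]) -> str:
    """Read the newly synthesized strand 5' to 3', shortest band first."""
    bands = sorted((n, base) for base, sizes in lanes.items() for n in sizes)
    return "".join(base for _, base in bands)


lanes = sanger_lanes("CTAATGT")
print(lanes)                   # fragment sizes per terminator lane
print(read_new_strand(lanes))  # complement of the template, read 5' to 3'
```

Note that the ladder reports the sequence of the newly synthesized strand; the template sequence follows by complementation.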

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

ISSUES IN LADDER SEQUENCING

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

The major goal is to maximize sequencing throughput. A second, significant goal is to minimize the number of sequencing errors. An important element of these goals is to be able to read the longest possible sequencing ladders, accurately. There are two significant variables in this. The resolution of the gel electrophoresis will determine how far the sequencing data can be read, if there are data to be read at all. Ultimately there are trade-offs between how fast the gel can be run, which also affects the throughput, how well certain artifacts can be eliminated, and how much sample must be applied. The more sample we have, the easier is the detection but, in general, the lower is the resolution. Large double-stranded DNAs show negligible diffusion during gel electrophoresis as described earlier in Chapter 5 (Yarmola et al., 1996). This is not the case for the smaller single-stranded DNAs used in sequencing, where diffusion is a significant cause of band broadening. This motivates the use of higher fields where shorter running times can be achieved, hence minimizing the effects of diffusion. However, higher fields lead to greater joule heating. This increases the effects of thermal inhomogeneities, which also lead to band broadening. The issues are complex because field strength also influences the shape of DNA in a gel and thus affects its diffusion coefficient. Other factors that affect band shape and thus resolution are the volume in which the sample is loaded, the volume sampled by the detector, and any inhomogeneities in gel concentration. For a thorough discussion of the effects of these variables, see Luckey et al. (1993), and Luckey and Smith (1993).

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

To a good approximation, the velocity, $v$, of DNA in denaturing acrylamide gel electrophoresis is proportional to $1/L$, where $L$ is the length of the molecule. In automated fluorescent detection (or the bottom wiper shown later in Fig. 10.10), the sample is examined at a constant distance from the starting point, $D$. The time it takes a fragment of a particular length to reach this distance is proportional to $D/v \propto DL$. Hence the spacing between two bands of length $L$ and $L + 1$ is proportional to $D(L + 1) - DL = D$. Thus the band spacing is independent of size, but it can be increased, more or less at will, by using longer and longer running gels.
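A short numerical check makes the argument concrete. This sketch assumes the idealized relation v = k/L, with an arbitrary proportionality constant k and arbitrary units; the function names are hypothetical.

```python
import math


def arrival_time(length: int, distance: float, k: float = 1.0) -> float:
    """Time for a fragment of `length` bases to reach the detector at `distance`.

    Assumes the idealized velocity v = k / length (arbitrary units).
    """
    velocity = k / length
    return distance / velocity


def band_spacing(length: int, distance: float, k: float = 1.0) -> float:
    """Difference in arrival time between fragments of length + 1 and length."""
    return arrival_time(length + 1, distance, k) - arrival_time(length, distance, k)


# The spacing is the same for short and long fragments...
print(band_spacing(100, distance=30.0), band_spacing(500, distance=30.0))
# ...and grows in proportion to the distance to the detector.
print(band_spacing(100, distance=60.0))

assert math.isclose(band_spacing(100, 30.0), band_spacing(500, 30.0), rel_tol=1e-6)
```

In a real gel the 1/L dependence holds only over a limited size range, so this constant spacing should be read as an approximation.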

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

The second determinant of how far a ladder can be read is the uniformity of the sample fragment yield. It is important to realize that the larger the target is, the smaller the yield of each piece, even if the distribution of fragments is absolutely uniform. Thus, with perfect cleavage, sequencing a 100-base piece of DNA will require only 10% the amount of sample that a 1-kb target requires. Put another way, for constant amounts of DNA sample loaded, the detection sensitivity will have to increase in proportion to the length of the DNA target. The relative yield of particular DNA fragments is affected by the choice of DNA polymerase, the nature of the terminators and primers used, the actual DNA template, and the reaction conditions. Much optimization has been required to produce reproducible runs of DNA sequence data that extend longer than 500 bases.
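The 10% figure follows from dividing a fixed amount of labeled product evenly among all rungs of the ladder. A minimal sketch, assuming a perfectly uniform fragment distribution and one band per base (both idealizations), with hypothetical function names:

```python
import math


def yield_per_band(target_length: int) -> float:
    """Fraction of the total labeled product found in each band.

    Assumes a perfectly uniform distribution over one band per base.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return 1.0 / target_length


def relative_sample_needed(short_target: int, long_target: int) -> float:
    """Sample needed for the short target relative to the long one,
    for equal signal in each band."""
    return yield_per_band(long_target) / yield_per_band(short_target)


ratio = relative_sample_needed(100, 1000)
print(ratio)  # about one tenth
assert math.isclose(ratio, 0.1)
```

Equivalently, at constant sample load the signal per band falls as 1/N for an N-base target, so the detector sensitivity must rise in proportion to N.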

 

 

 

 

 

 

 

 

 

It is also important to realize that throughput is really the product of the number of lanes per gel and the speed of the electrophoresis. Speed can be controlled by the electrical field applied. In fact higher fields appear to improve electrophoretic performance. What limits the speed, once efficient cooling is provided to keep the running temperature of the gel constant, is the sensitivity of the detection scheme, if it is done on line. With off-line detection, the sensitivity is still important, not for speed, but for determining the number of lanes that can be used. The smaller the width of each lane, the more lanes one can place on a single gel but the smaller the amount of DNA one can actually load into each lane.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A major factor that affects the quality of DNA sequence data is the quality of the template DNA. When fluorescent labeling is used, great care must be taken not to introduce fluorescent contaminants into the DNA sample. A number of automated methods for DNA preparation routinely yield DNA suitable for sequencing. These methods are convenient because they are so standardized. A laboratory that tries to sequence DNA from many different types of sources will frequently encounter difficulties.

 

 

 

 

 

 

 

 

 

In early DNA sequencing, 32P was the label of choice, introduced from [γ-32P]pppA via kinasing of the primer for Sanger sequencing or the strand to be cleaved for Maxam-Gilbert sequencing. This isotope has a short half-life, which results in very high experimental sensitivity. However, 32P also has a relatively high energy beta particle, which causes an artifactual broadening of the thin fragment bands on DNA sequencing gels. Instead of 32P one can use the radioisotope 35S, as γ-thio-pppA. This still has a short half-life, but the decay is softer, leading to sharper bands. At first, DNA sequence data were obtained by using X-ray film in autoradiography to make an image of the sequencing gel. This can be read by hand, which is still done by some, perhaps with the help of devices and software to expedite transferring the data into a computer file. Alternatively, the film can be scanned and digitized by a device like a charge-coupled device (CCD) camera. This then allows most of the data to be processed by image analysis software, with human intervention needed in difficult places. The accuracy of using film and some of the existing software does not appear to be as good as the fluorescent systems we will describe later.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A new approach to recording data from radioactive decay is the use of imaging plates. These consist of individual pixels that record local decays. After the plate is exposed, it is read out by laser excitation in a raster pattern (scanning successive lines, in the same manner as a TV camera or screen), and the resulting data are transferred into a computer file. A great advantage of imaging plates over film is that their response is a linear function of dose over more than five orders of magnitude in intensity, and most important, they are linear down to the lowest detectable doses. In contrast, film shows a dead zone at very low doses, and it easily saturates at high doses. Imaging plates are reusable, and for the heavy user, the great savings in film that result eventually compensate for the high costs of the imaging plates and the instrument needed to read them out. Although it is possible, in principle, to use several different radioisotopes simultaneously, as is common in liquid scintillation counting, and thus achieve multicolor labeling and detection, in practice, this is rarely done with radioactive DNA sequencing data.

In most contemporary DNA sequencing, radioisotopes have been replaced by fluorescent labels. These can be used on the primers, the terminators, or internally. It may seem surprising that fluorescent detection can be competitive with radioisotopes. However, one can gain enormous amounts of sensitivity in fluorescence by sequential excitation and emission from the same fluorophore until it undergoes some chemical side reaction and becomes bleached. This makes up in large part for the difference in energy between a beta particle and the fluorescent photon. The major determinant of sensitivity in fluorescence detection is, then, not really signal; it is background. Scrupulous care must be taken to avoid the use of reagents, solvents, plastics, glove powder, and detergents that have fluorescent contaminants.

Four different colored fluorescent dyes are used in several of the most common DNA sequencing detection schemes. One dye is used for each base-specific primer extension. The ideal set of dyes would have very similar chemical structures so that their presence would affect the electrophoretic mobility of labeled DNA fragments in identical ways. They would also have emission spectra as distinct as possible, and they would all be excitable by the same wavelength so that a single excitation source would suffice for all four dyes. The dyes would also allow similar very high sensitivity detection so that signal intensities from the four different cleavage reactions would be comparable. Inevitably with currently available dyes there are compromises. For example, a set of nearly identical dye-labeled chain terminators was produced for DNA sequencing that led to very good electrophoretic properties, but the emission spectra of these compounds were too similar for the kind of accuracy needed in reading long sequence ladders. Subsequently a better-resolved set of fluorescent terminators that are substrates for Sequenase, the most popular enzyme used in Sanger sequencing, became commercially available. These have the advantage that all four terminators can be used simultaneously in a single sequencing reaction.

All currently used dyes for four-color DNA sequencing are excited in the UV/visible wavelength range. The limits of this range and the typical widths of emission spectra of high quantum yield dyes make it rather difficult to detect more than four colors simultaneously. The infrared (IR) spectrum is much broader, and work is in progress trying to develop DNA sequencing dyes in this range. If the lower sensitivity of IR detection can be tolerated, such dyes would offer two advantages. The laser sources needed to excite them are inexpensive, and at least eight different colors would be obtainable. This could be used to double the throughput of four-color sequencing, or it could be used to include a known standard in every sequencing lane to improve the accuracy of automatic sequence calling. Recently IR-excited dyes have begun to make an impact on automated DNA sequencing. Multiple IR colors are presumably soon on the horizon.

 

A significant improvement in fluorescent dyes for automated sequencing is the use of energy transfer methods (Glazer and Mathies, 1997). Primers contain a pair of fluorescent dyes (Fig. 10.6). One dye is common to all four primers. This is optimized to absorb the exciting laser light. The second dye is different in each primer, and it is close enough in each case that fluorescence resonance energy transfer is 100% efficient. Thus all the excitation energy migrates to the second dye, where it is subsequently emitted. The second dyes are chosen so that their emission spectra are as different as possible, to maximize the ability to accurately discriminate the four different colors.
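Why close spacing of the two dyes gives nearly complete transfer can be seen from the standard Förster relation, $E = 1 / (1 + (r/R_0)^6)$, where $r$ is the donor-acceptor distance and $R_0$ is the distance at which transfer is 50% efficient. The sketch below is illustrative only: the Förster relation is an assumption brought in here rather than something stated in the text, and the distances used are hypothetical, not values for the primers in Figure 10.6.

```python
def fret_efficiency(distance_nm: float, forster_radius_nm: float) -> float:
    """Energy transfer efficiency from the Foerster relation E = 1 / (1 + (r/R0)^6)."""
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be non-negative and R0 positive")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


# Hypothetical Foerster radius of 5 nm.
for r in (1.5, 2.5, 5.0, 7.5):
    print(r, round(fret_efficiency(r, 5.0), 4))
```

Because of the sixth-power dependence, efficiency approaches 100% quickly once the dyes are held well inside the Förster radius.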

Figure 10.6 Energy transfer primers (provided by Richard Mathies). (a) Schematic design of a set of four primers. (b) Structure of the donor dye. (c) Structure of four different acceptor fluorescent dyes that can be detected simultaneously in DNA sequencing.

An alternative to fluorescent labels is chemiluminescence. This has the great advantage that no exciting light is needed. Thus the sensitivity can be extremely high, since there is no contamination from scattering of the exciting light used in fluorescence, or the effects of fluorescent impurities. Today, chemiluminescent detection schemes exist that can readily be used in DNA sequencing. They have a few disadvantages. Only one color is currently available, and once the chemiluminescence has been read, it is difficult to use the gel or filter again. While this is not often a problem in most forms of DNA sequencing, it is a problem in most mapping applications where the same filter replica of a gel is frequently probed many times in succession. Nevertheless, the sensitivity of chemiluminescence makes it attractive for some mapping applications. The advantages of four-color fluorescence are also beginning to be felt in some aspects of genome mapping. An example was given in Chapter 8.

CURRENT FLUORESCENT DNA SEQUENCING

There are two basically distinct implementations of fluorescent detected DNA sequence determination. These are the current commonly available state-of-the-art tools used today in most large-scale DNA sequencing projects. They each can produce more than 10^4 to 10^5 bp of raw DNA sequence per laboratory worker per day. Most allow 400 to 800 bases of data to be read per lane; most of the lanes give readable data when proper DNA preparation methods are used. The detection schemes used in the two approaches are illustrated in Figure 10.7. Both are on-line gel readers. These two schemes have a number of serious trade-offs. In the Applied Biosystems (ABI) instrument, based on original developments by Leroy Hood and Lloyd Smith, four different colored dyes are used to analyze a mixture of four different samples in a single gel lane (Fig. 10.7a). This allows four times more samples to be loaded per gel, if the width of the lanes is kept constant. The use of four colors in a single lane avoids the problem of compensating for any differences in the mobility of fragments in adjacent lanes; that is, there is no lane registration problem. In order to do the four-color analysis, a laser perpendicular to the gel is used to excite one lane at a time, and the signal is detected through a rotating four-color wheel to separate the emission from the four different dyes. Thus the effective power of the laser is time shared among the lanes and the colors. With 20 lanes, the actual time-averaged illumination available is, at most, 1/80 the laser intensity.
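The 1/80 figure is the product of two time-sharing factors: one lane in 20 is illuminated at any instant, and one color in 4 is being detected. A minimal sketch of the arithmetic, with a hypothetical function name and ignoring any dead time between lanes or filters (which is why the text says "at most"):

```python
from fractions import Fraction


def time_averaged_fraction(lanes: int, colors: int) -> Fraction:
    """Upper bound on the fraction of laser intensity available per lane and color
    when the beam is time shared among lanes and a filter wheel cycles the colors."""
    if lanes <= 0 or colors <= 0:
        raise ValueError("lanes and colors must be positive")
    return Fraction(1, lanes * colors)


print(time_averaged_fraction(lanes=20, colors=4))  # 1/80
```

Adding lanes to raise throughput therefore reduces the illumination available to each lane in direct proportion.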

 

 

 

 

 

In the alternative implementation, embodied in the Pharmacia automated laser fluorescence (ALF) instrument, only a single fluorescent dye is used (Fig. 10.7b). The dye originally selected was fluorescein because it is the most sensitive available for the particular laser exciting wavelength used. In a newer version of the instrument, a different laser and an infrared emitting dye, Cy5, are used. The key feature of the ALF is that the laser excitation is in the plane of the gel, through all the sample lanes simultaneously. This design, which is based on an instrument originally developed by Wilhelm Ansorge, is possible because at the concentrations of label used for DNA sequencing the samples are optically thin. This means that the amount of light absorbed at each lane is an insignificant fraction of the original laser intensity, so all lanes receive, effectively, equal excitation. The emission from all the lanes is recorded simultaneously by an array of detectors, one for each lane. While these could be made four-color detectors, in principle, the cost and complexity are not warranted. Instead the ALF reads data from four closely spaced lanes, one for each base-specific fragmentation. Thus the number of lanes needed for one sample in the
