
404 FUTURE DNA SEQUENCING WITHOUT LENGTH FRACTIONATION

Figure 12.10 Effect of mismatches on the stability of short DNA duplexes.

Figure 12.11 Reading a text by reconstruction from overlapping n -tuple words.

very strong, there will be an inevitable background problem where a specific signal is diluted by a large number of weak mismatches.

The second key aspect of SBH is that the sequence can be read by overlapping words, as shown in Figure 12.11. In principle, with perfect data one would not need to try all the words to reconstruct the sequence. The problem of reconstructing a sequence from all n-tuple subsegments is highly overdetermined, except for the complications that we will discuss below. Simulations show that reconstructing DNA sequences from oligonucleotide hybridization fingerprints is very robust and very error resistant. Even significant levels of insertions and deletions can be tolerated without badly degrading the final sequence. A key element of SBH that is easily forgotten is that negatives as well as positives are extremely informative. Knowing that a specific oligonucleotide like AACTGGAC does not exist anywhere in the target provides a constraint that can sometimes be quite useful in assembling the data from words that are found to be present.
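The reconstruction from overlapping words can be illustrated with a minimal Python sketch. It assumes ideal, error-free data and assumes that the first word and the target length are known; the function names `spectrum` and `reconstruct` are hypothetical and are not taken from any published SBH software. Words that are absent from the data are never used as extensions, which is one simple way in which negative results constrain the assembly.

```python
from collections import defaultdict


def spectrum(seq, n):
    """Return the set of all n-tuple words present in seq (ideal data)."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def reconstruct(words, start, length):
    """List every sequence of the given length that begins with `start`
    and whose n-tuple words are exactly the set `words`."""
    n = len(start)
    by_prefix = defaultdict(list)
    for w in words:
        by_prefix[w[:n - 1]].append(w)
    results = []

    def extend(seq, used):
        if len(seq) == length:
            # Accept only if every observed word has been accounted for.
            if used == words:
                results.append(seq)
            return
        # Only words that were actually detected can extend the chain.
        for w in by_prefix[seq[-(n - 1):]]:
            extend(seq + w[-1], used | {w})

    extend(start, {start})
    return sorted(results)


target = "ATGCAGGTCC"
words = spectrum(target, 4)
print(reconstruct(words, target[:4], len(target)))
```

The exhaustive search is exponential in the worst case, so this is suitable only for short illustrative targets.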

 

 

 

BRANCH POINT AMBIGUITIES

The major theoretical limitation of simple direct implementations of SBH is the inability to determine sequences uniquely if repeating sequences are present. There are two kinds of repeats: tandemly repeated sequences and interspersed repeats. The presence of a tandemly repeated sequence can be detected, but it is difficult to determine the number of repeats present. When the monomer repeat is longer than the SBH word length, the number of copies of a tandemly repeated sequence can be determined only if the hybridization signal is quantitated rather than simply scored as positive. These problems are relatively easily dealt with by conventional sequencing or PCR assays, since the unique sequence flanking the simple sequence or tandem repeat will generally be known.

The more serious problem caused by repeats is called a branch point ambiguity (Fig. 12.12). If the SBH word length used is $n$, these ambiguities arise whenever there is an exact recurrence of any sequence with length $n - 1$ in the target. What happens, as shown in Figure 12.12, is that the data produced by the complete pattern of n-tuple words can be assembled in two different ways. In general, there is no way to distinguish between the alternative assemblies. In principle, if one could read the sequence out to the very end of the SBH target, the particular ambiguity shown in Figure 12.12 could be resolved.


Figure 12.12 A tandem repeating sequence results in a branch point ambiguity: Two different reconstructions are possible from the pattern of n-tuple words detected.

However, there is no guarantee that this will be possible. As soon as there is more than one recurrence, the ambiguities become almost intractable. For example, the case shown in Figure 12.13, in which a sequence recurs three times, cannot be resolved even if one could read all the way to the ends of the SBH target.
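A small Python sketch can make the three-copy case concrete. The two 17-base sequences below are hypothetical examples constructed for illustration (they are not the sequences drawn in the figures): both contain the 3-base sequence ACG three times, with the two blocks between the copies exchanged, and both begin and end identically. With a word length of 4 they give exactly the same set of words.

```python
from collections import Counter


def spectrum(seq, n):
    """Set of all n-tuple words present in seq."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def repeated_subwords(seq, n):
    """Return the (n-1)-tuples that occur more than once in seq,
    i.e. the potential branch points for word length n."""
    counts = Counter(seq[i:i + n - 1] for i in range(len(seq) - n + 2))
    return {w: c for w, c in counts.items() if c > 1}


# Hypothetical targets: TG-ACG-CA-ACG-GT-ACG-AA and the same sequence
# with the blocks CA and GT exchanged.
s1 = "TGACGCAACGGTACGAA"
s2 = "TGACGGTACGCAACGAA"

print(repeated_subwords(s1, 4))
print(spectrum(s1, 4) == spectrum(s2, 4))
```

Because the two targets share their first and last words as well as their complete word set, reading out to the ends would not distinguish them.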

 

 

 

The probability of any particular 8-base sequence recurring in a target of 200 bp is very low: a rough estimate gives $192/4^8 \approx 3 \times 10^{-3}$. However, the probability that the target will have one or more recurrences once all possible recurrences are considered is actually quite high. An analogy is asking what the probability is that two people in a room have the same birthday. If you specify a particular date, or pick a particular person, the odds of a match are very low. However, if you allow all possible pairings to be considered, the odds are quite high that in a room with 30 people, two will share the same birthday. So sequence recurrences are a serious problem. They limit the length of sequence that can be directly and unambiguously read. In general, the chances of recurrences diminish as the length of the word used increases.
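The contrast between a specified match and any match can be checked numerically. The sketch below is an illustration under stated assumptions: bases are taken to be independent and equally probable, and the chance of at least one recurrence is approximated with a Poisson estimate over all pairs of words. That approximation is an assumption introduced here, not a formula from the text, and the function names are hypothetical.

```python
import math


def p_specific_word(target_len, n):
    """Rough chance that one specified n-mer recurs in the target,
    using the estimate (target_len - n) / 4**n."""
    return (target_len - n) / 4 ** n


def p_any_recurrence(target_len, n):
    """Poisson approximation to the chance that at least one pair of
    n-mers in a random target is identical."""
    windows = target_len - n + 1
    pairs = windows * (windows - 1) / 2
    return 1 - math.exp(-pairs / 4 ** n)


def p_shared_birthday(people, days=365):
    """Exact chance that at least two people share a birthday."""
    p_distinct = 1.0
    for i in range(people):
        p_distinct *= (days - i) / days
    return 1 - p_distinct


print(f"specified 8-mer recurs in 200 bp: {p_specific_word(200, 8):.4f}")
print(f"any 8-mer recurs in 200 bp:       {p_any_recurrence(200, 8):.2f}")
print(f"shared birthday among 30 people:  {p_shared_birthday(30):.2f}")
```

Under these assumptions the chance of some recurrence is roughly two orders of magnitude larger than the chance for one specified word, which mirrors the birthday analogy.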

Figure 12.13 A more serious branch point ambiguity that leads to uncertainty in the arrangement of two blocks of single-copy DNA sequence.


 

 

There are $4^n$ possible words of length $n$ for a four-letter alphabet. Therefore to sequence by hybridization could require examining as many as 65,536 possible 8-mers or 262,144 possible 9-mers. Making complete sets of compounds larger than this and controlling their quality is likely to be challenging with present or currently extrapolated technology. It turns out that an estimate of the average sequence length that can be read before a branch point ambiguity arises is given approximately by the square root of the number of words used. When the words are DNA sequences, this is $4^{n/2}$. For 8-mers, the average length of sequence determined between branch points will be 256. This is quite an acceptable size scale. It seems like a losing proposition to increase the word size much beyond 8, unless technical considerations in the hybridizations demand this. Reducing the word length below 8 will lead to an unacceptably high frequency of branch points, unless some specific additional strategy is introduced to resolve these ambiguities.
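The trade-off between the size of the probe set and the readable length can be tabulated directly from the two expressions above. This is a short Python sketch; the function names are hypothetical, and the square-root rule is taken from the text as an approximation rather than derived here.

```python
def word_count(n):
    """Number of distinct words of length n over a four-letter alphabet."""
    return 4 ** n


def mean_read_length(n):
    """Approximate mean length read between branch points: the square
    root of the number of words, 4**(n/2), which equals 2**n."""
    return 2 ** n


for n in range(6, 11):
    print(f"n = {n:2d}   words = {word_count(n):9,d}   "
          f"mean read ~ {mean_read_length(n):5d} bases")
```

Each extra base in the word quadruples the number of compounds to be made but only doubles the expected read length, which is the sense in which going much beyond 8 is a losing proposition.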

 

 

 

Branch point ambiguities have one additional implication that must be dealt with in all attempts to implement a successful SBH strategy. The number of branch points present in a target sequence will grow rapidly as the total length of the target increases. Thus one must subdivide the target into relatively short DNA fragments in order to have a reasonable chance of sequencing each fragment unambiguously. This is a relatively undesirable feature of SBH. However, even if branch points could be resolved some other way, short targets are probably still mandated in order to diminish complications that may arise from intramolecular secondary or tertiary structure in the target. More will be said about this later.

 

 

 

 

 

 

 

 

 

SBH USING OLIGONUCLEOTIDE CHIPS

 

 

 

 

 

It is obvious that SBH cannot be practical if one is forced to look at hybridizations between a single oligonucleotide and a single target one at a time. If 8-mers were used, 65,536 different experiments would have to be done to determine a sequence that on average would be a DNA fragment less than 256 bases in length. The major appeal of SBH is that it seems readily adaptable to highly parallel implementations. There are two very different approaches that are being explored for this. The first is to hybridize a single labeled sample to an array of all of the possible oligonucleotides it may contain. This is sometimes called format I SBH. The ideal array would be very small to minimize the amount of sample that was needed. Hence it is conventional to call the array a chip, by analogy with a semiconductor chip. A schematic illustration of an SBH experiment using such a chip is shown in Figure 12.14. A real chip would probably contain all 65,536 possible 8-mers, probably each present several times to allow signal averaging and control for reproducibility. The location of each particular oligonucleotide would be known. The actual patterns of oligonucleotides would probably be rather particular, a consequence of whatever systematic method is used to produce them. The chip surface itself could be silicon, or glass, or plastic. The key aspect is that the oligonucleotides must be covalently attached to it, and the surface must not interfere with the hybridization. The surface must not show significant amounts of nonspecific adsorption of the target, and it must not hinder, sterically or electrostatically, the approach of the target to the bound oligonucleotides. The ideal surface will also assist, or at least not interfere with, whatever detection system is ultimately used to quantitate the amount of hybridization that has occurred.

Several approaches are being tested to see how to fabricate efficiently a usable chip containing 65,536 8-mers. One basic strategy is to premake all of the compounds in the


Figure 12.14 An example of the expected hybridization pattern of a labeled target exposed to an oligonucleotide chip. The actual chip might contain 65,000 or more probes.

 

array separately, and then develop a parallelized automated device to spot or spray the compounds onto the right locations on the chip. The disadvantage of this approach is that the rate of manufacture of each chip could be fairly slow. The major advantage of this approach is that the oligonucleotides only have to be made once, and their individual sequence and purity can be checked. There is no consensus at the present time what the optimal way would be to manufacture chips given samples of all the 65,536 8-mers. A key variable is how they will be attached to the chip surface. A long enough spacer must be used to keep the 8-mers well above the surface. Otherwise, the surface is likely to pose a steric restriction for the much bulkier target DNA.

 

The alternate strategies involve synthesizing the array on the chip. One potential general way to do this is photolithography, a technique that has been very powerful in the construction of semiconductor chips. It has been used quite successfully by Steven Fodor and others at Affymetrix, Inc. to make dense arrays of peptides, and more recently to make dense arrays of oligonucleotides. The basic requirement is that nucleotide derivatives are needed that are blocked from extending, say because the 3′-OH is esterified. The block used, however, can be removed by photolysis. A mask is used to allow selective illumination of only those chains that require extension by a particular base in this position (Fig. 12.15). Thus the light activates just a subset of the oligomers on the chip. The chain extension reaction is carried out in the dark. Then, in turn, three other masks are used to complete one cycle of synthesis. The key requirement in this approach is that the photoreaction must proceed at virtually 100% yield. Otherwise, the desired sequences will not be made in sufficient purity. This is a very difficult demand to satisfy with photochemical reactions. Instead, one can still use the principles of masks but just do more standard solid


Figure 12.15 Construction of an oligonucleotide chip by in situ synthesis using photolithography techniques.

state oligonucleotide synthesis by spraying liquid reagents through the masks. The great power of the lithographic approach is that one can make any array desired—that is, any compound or compounds can be put in any positions on the array.
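The masking logic described above can be sketched in Python. This is a schematic model only: sites are numbered positions in a hypothetical layout, each cycle uses four masks (one per base), and chemistry, chain orientation, and real mask geometry are ignored. The names `build_masks`, `synthesize`, and `full_length_fraction` are illustrative assumptions, and the purity estimate simply assumes an identical, independent yield at every step.

```python
from itertools import product

BASES = "ACGT"


def all_words(n):
    """All 4**n words of length n, in a fixed order (the array layout)."""
    return ["".join(p) for p in product(BASES, repeat=n)]


def build_masks(layout):
    """One mask per base per cycle: the set of sites to be deprotected
    so that they receive `base` at chain position `pos`."""
    masks = []
    for pos in range(len(layout[0])):
        for base in BASES:
            sites = {i for i, word in enumerate(layout) if word[pos] == base}
            masks.append((pos, base, sites))
    return masks


def synthesize(layout):
    """Apply the masks in order and return the chain grown at each site."""
    chains = [""] * len(layout)
    for _pos, base, sites in build_masks(layout):
        for i in sites:
            chains[i] += base  # only the unmasked sites are extended
    return chains


def full_length_fraction(step_yield, n):
    """Fraction of chains that are full length after n steps, assuming
    the same independent yield at each step."""
    return step_yield ** n


layout = all_words(4)
print(len(build_masks(layout)), "masks for", len(layout), "sites")
print(synthesize(layout) == layout)
print(f"{full_length_fraction(0.95, 8):.2f}")
```

In this model an array of 8-mers needs 32 masks, and a per-step yield of 95% would leave only about two-thirds of the chains full length, which illustrates why a yield of virtually 100% is demanded.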

A different synthetic approach, with more limited versatility, is shown in Figure 12.16. This is the conception of Edwin Southern. It is actually a very simple lithographic approach that makes one particular array configuration efficiently. Figure 12.16 shows the steps in synthesis by stripes that would be needed to generate all possible tetranucleotides. The configuration that results is similar to the way the genetic code is ordinarily written down. Southern actually uses a glass plate as his chip. The reagents needed for the synthesis are pumped through channels between two glass plates as shown in Figure 12.17. It may be hard to miniaturize this design sufficiently to make a really small chip, but the plates made by Southern in this manner were the first dense oligonucleotide arrays actually being tested in real sequencing experiments.

An alternative approach being used to make arrays of 8-mers involves the use of a thin gel rather than a surface. This has the advantage that the sample thickness potentially allows larger amounts of oligonucleotide to be localized. In this approach, developed by Andrei Mirzabekov and coworkers, a glass plate was covered with a 50-micron thin gel. Pre-made oligonucleotides were deposited on the gel in 1 mm spots. The major effect of using a gel rather than a surface is that one has to be concerned about the local concentration of sample during washing steps. On a surface, solvent exchange is quite rapid, so the concentration of free sample can be reduced to zero quickly, and no back reactions of released material with the chip need be considered. With a gel, if target is released, it will take quite a while to leave the gel, and during this period there is a significant chance of back reaction with the chip if conditions permit duplex formation. This has both advantages and disadvantages, as we will illustrate later.

Figure 12.16 Pattern of stepwise DNA synthesis used in Southern’s procedure for in situ synthesis of an oligonucleotide chip. Four successive synthetic steps are indicated. Within each square of the array, the sequence of the tetranucleotide synthesized is read left to right in the 3′ to 5′ direction starting from the upper row and continuing with the lower row.

 

 

Regardless of the method of synthesis, the key technical issue that must be overcome is how the oligonucleotides are anchored at their position in the array. Mirzabekov uses direct chemical coupling to an oxidized ribonucleotide placed at the 3′-end of the 8-mer (Fig. 12.18). This is time-honored nucleic acid chemistry, but it does offer some risk of changing the stability of the resulting duplex because of the altered chemical structure at the sugar. The approach used by Southern is to attach a long hydrophilic linker arm to the glass surface (Fig. 12.19). This arm has a free primary hydroxyl group that can be used to

Figure 12.17 Glass plates separated by rubber dams are used to direct the reagents in each step of the procedure illustrated in Figure 12.16.


Figure 12.18 Method of attachment of oligonucleotides to polyacrylamide gels used by Mirzabekov and coworkers. From Khrapko et al. (1991).

Figure 12.19 Method of attachment of oligonucleotides to glass plates used by Southern et al. (1992).

initiate the synthesis of the first nucleotide of the 8-mer in standard DNA synthesis protocols. It acts chemically exactly like the 3′-OH of a nucleoside in coupling to an activated phosphate of the next nucleotide.

 

 

 

SEQUENCING BY HYBRIDIZATION TO SAMPLE CHIPS

 

 

The second general SBH approach is to make a large, dense array of samples and probe it by hybridization with one labeled oligonucleotide at a time. This is sometimes called format II SBH. In this format, while it takes a long time to complete the sequence, one is actually sequencing a large number of samples simultaneously. A schematic illustration of this approach is shown in Figure 12.20. It looks deceptively similar to the use of oligonucleotide chips, but everything is reversed. The array might, for example, correspond to an entire cDNA library, perhaps $2 \times 10^4$ clones in all. Because SBH can only use relatively short samples, each cDNA might have to be broken down into fragments. It is not immediately obvious how to do this with large numbers of clones at once. One possibility is to subclone two different restriction enzyme digests of the cDNA inserts. This scrambles up connectivity information in the original clones; however, in most cases that information would be easily restored by the sequencing process itself, or by rehybridization of any ambiguous fragments back to the original, intact clones. If each 1.5-kb average cDNA clone yielded six fragments in each of the two digests, one would want to array $3.6 \times 10^5$ subclones in order to maintain the redundancy of coverage of the original library. This would constitute the sample chip.

 

 

 


Figure 12.20 An example of the expected hybridization pattern of a labeled oligonucleotide to a sample chip. The actual chip might contain 20,000 or more samples.

Since the individual components are available in any desired quantity, one could, in principle, make as many copies of the array as could conveniently be handled simultaneously. In practice, it does not seem at all unreasonable to suppose that 100 copies of the array could be processed in parallel. It is envisaged that the sample chips be made by using the robotic x-y tables common in the semiconductor industry (Fig. 12.21). These are very accurate and fast. It has been estimated that a sample density of $2 \times 10^4$ per 10 to 20 cm$^2$ is quite practical. Thus the entire array of subclones could be contained in 180 to 360 cm$^2$, which is about 30% to 60% the size of a typical 8.5 × 11 inch piece of paper.

Figure 12.21 How a robotic x-y table can be used in offset printing to construct dense arrays of samples or probes (at right) starting from more expanded arrays (at left).

 

 

If octanucleotides are used as hybridization probes to this large array, only a small fraction of the samples will show a positive signal. For one 250-bp subclone, the odds of containing any particular 8-mer are $243/4^8 \approx 3.8 \times 10^{-3}$; the possibility of a positive hybridization is less than 0.4%. When this is multiplied by the number of subclones in the array, there should be an average of $1.4 \times 10^3$ positive subclones per hybridization. Since each yields eight bases of DNA sequence data, the rate of sequence acquisition per single hybridization is $1.1 \times 10^4$ bp. If 100 chips can really be managed simultaneously, and if three hybridizations can be done per day, the overall throughput is $3.3 \times 10^6$ bp per day. This is quite an impressive rate, and many of the variables used to estimate it are probably conservative.
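The chain of estimates in this paragraph can be reproduced with a few lines of Python. The sketch uses the figures quoted in the text as default parameters; the function name and the dictionary keys are hypothetical. Because the intermediate values are not rounded at each step here, the results are expected to agree with the rounded figures in the text only to within a few percent.

```python
def sample_chip_throughput(subclone_bp=250, word=8, subclones=3.6e5,
                           chips=100, hybs_per_chip_per_day=3):
    """Estimate sequence acquisition for format II SBH on a sample chip."""
    windows = subclone_bp - word + 1           # 8-mer positions per subclone
    p_hit = windows / 4 ** word                # chance a subclone is positive
    positives = p_hit * subclones              # positive subclones per probe
    bp_per_hyb = positives * word              # bases read per hybridization
    bp_per_day = bp_per_hyb * chips * hybs_per_chip_per_day
    return {
        "windows": windows,
        "p_hit": p_hit,
        "positives": positives,
        "bp_per_hyb": bp_per_hyb,
        "bp_per_day": bp_per_day,
    }


for name, value in sample_chip_throughput().items():
    print(f"{name:12s} {value:.3g}")
```

Changing any one argument, for example the number of chips handled in parallel, shows how the daily throughput scales linearly with each of the assumed factors.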

 

 

 

 

 

 

 

 

 

 

 

 

A major feature of the use of sample chips in sequencing projects is that the approach does not scale down conveniently. Sample chips are only useful if entire libraries are to be sequenced as a unit. Such a method makes good sense for cDNAs and the genomes of model organisms. If all 65,536 8-mers must be used, at the rates we estimated above of 300 hybridizations per day, it will take more than half a year to complete the sequencing. Scaling down would not reduce the time of the effort at all; it would just reduce the amount of sequence data ultimately obtained. In practice, one does not have to use all 65,536 compounds to determine the sequence. Because of the considerable redundancy in the method, one ought to be able to use just a fraction of all 8-mers. The exact fraction will depend on error rates in the hybridization, how branch points will be resolved, and what kind of sequencing accuracy one desires. In some of the enhanced SBH schemes that will be described later, it has been estimated that one might be able to operate close to a redundancy of one rather than eight. However, this remains to be demonstrated in practice.
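The time estimate follows from dividing the number of probes by the daily hybridization rate. The Python sketch below is a rough illustration; the function name is hypothetical, and treating a redundancy of one as needing one-eighth of the 8-mers is an assumption made here for illustration, not a figure given in the text.

```python
import math


def days_to_finish(n_probes, hybridizations_per_day=300):
    """Whole days needed to hybridize every probe once."""
    return math.ceil(n_probes / hybridizations_per_day)


all_8mers = 4 ** 8
print(days_to_finish(all_8mers))        # full set of 8-mers
print(days_to_finish(all_8mers // 8))   # assumed one-eighth subset
```

The number of days depends only on the number of probes and the hybridization rate, not on the number of samples on the chip, which is why scaling the library down does not shorten the project.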

 

 

 

 

 

 

 

 

 

 

 

EARLY EXPERIENCES WITH SBH

 

 

 

 

 

 

 

 

A major difficulty in testing the potential of SBH and evaluating the merits of different SBH strategies or particular variations on conditions, sample attachment, and so on, is that the method does not scale down. A particular problem occurs in the use of oligonucleotide chips. It is difficult to vary parameters using the set of all 65,536 8-mers. Indeed, no one has yet actually made this set of compounds. Instead, several more limited tests of SBH have been carried out.

 

 

 

 

 

 

 

 

 

Southern has used the scheme shown in Figure 12.16 to make a chip containing four copies of all possible octapurine sequences (A and G only). An example of some data obtained with this chip is shown in Figure 12.22. In the actual example used, the labeled target DNA was a specific sequence of 24 pyrimidines (C and T). This contains 17 different 8-tuples, and so 17 positive hybridization spots would be expected. The actual results in Figure 12.22 are much more complex than this. Two problems need to be dealt with, which illustrate some of the basic issues in trying to implement SBH on a large scale. First, the amount of oligonucleotide at each position in the array differs. More important, the strength of hybridization to different sequences varies quite a bit. Duplexes rich in G+C will be more stable, under most ordinary hybridization conditions, than duplexes rich in A+T. There are ways to compensate for this, as we will illustrate later, and one of these was actually used with the samples in Figure 12.22. But the compensation is not


Figure 12.22 Properties of an octapurine chip which contains four replicas of all 256 octapurines. (a) Hybridization pattern with an equimolar mixture of all octapyrimidines to show variation in the amount of attached purine. (b) Pattern of hybridization seen with a labeled 24-base target. From Southern et al. (1992).

 

perfect, and so there is a variability in the signal intensity that needs to be evaluated before a hybridization is scored as positive.
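The expected number of positive spots on the octapurine chip can be counted with a short Python sketch. The 24-base pyrimidine target used here is a hypothetical sequence chosen so that all of its 8-tuples differ; it is not the target used by Southern, whose sequence is not given in the text. Each 8-tuple of the target is assumed to pair with its reverse-complementary octapurine on the chip, and mismatched hybridization is ignored.

```python
from itertools import product

# All 2**8 octapurines (A and G only) present on the chip.
octapurines = {"".join(p) for p in product("AG", repeat=8)}

COMPLEMENT = {"C": "G", "T": "A"}


def reverse_complement(pyrimidine_word):
    """Purine word that pairs with a pyrimidine word (antiparallel)."""
    return "".join(COMPLEMENT[b] for b in reversed(pyrimidine_word))


def expected_spots(target, n=8):
    """Chip probes that perfectly match some n-tuple of the target."""
    tuples = {target[i:i + n] for i in range(len(target) - n + 1)}
    return {reverse_complement(t) for t in tuples}


# Hypothetical 24-base pyrimidine target (an assumption for illustration).
target = "CCCCCTCCCTTCCTCTCCTTTCTC"
spots = expected_spots(target)
print(len(octapurines), "probes on the chip")
print(len(spots), "perfect-match spots expected")
```

Comparing this ideal count with the much more complex pattern actually observed gives a sense of how large the contributions from uneven probe amounts and unequal duplex stabilities are.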

The second basic problem is cross-hybridization with single mismatches. This is a serious problem under the conditions used in Figure 12.22. From the results of these and other experiments, it has been estimated that most of the sample is not hybridized to the correct matches but instead forms a background halo of hybridization with numerous mismatches. Methods are being developed to correct for all these problems and make the best estimates of the right sequence in cases like Figure 12.22. It is too early to judge the effectiveness of these methods. A final potential problem with the test case used by Southern is that homopurine sequences can form triplexes with two antiparallel pyrimidine complements. These triplexes are quite stable, as we will illustrate in Chapter 14. It is conceivable that triplexes could have formed under the conditions used to test the oligopurine arrays, and since they would lead to systematic errors, one could go back and look for them.

Hans Lehrach has shown that oligonucleotide hybridization works well in fingerprinting samples for mapping (Chapter 9). These experiments provide some insight into the potential use of sample arrays for sequencing. Lehrach hybridizes single oligomers, or small pools of compounds, with large arrays of clones. This has successfully led to finished maps, so the sequence specificity under the conditions used must be reasonably good. However, since the mapping systems can tolerate considerable error, this is not a robust test of whether this approach will actually give usable sequence. What greatly expedites these experiments is that for fingerprinting, any oligonucleotide is as good as any other, so a large set of synthetic compounds is not needed to test the basic strategy.

 

Using the same approach, with a few immobilized samples, Radoje Drmanac and Radomir Crkvenjakov successfully completed two short pilot sequencing projects by SBH. In the first case, the 100-base sequence was known in advance, as was
