
Genomics: The Science and Technology Behind the Human Genome Project.

Charles R. Cantor, Cassandra L. Smith

 

Copyright © 1999 John Wiley & Sons, Inc.

 

ISBNs: 0-471-59908-5 (Hardback); 0-471-22056-6 (Electronic)

14 Sequence-Specific Manipulation of DNA

EXPLOITING THE SPECIFICITY OF BASE-BASE RECOGNITION

In this chapter various methods will be described that take advantage of the specific recognition of DNA sequences to allow analytical or preparative procedures to be carried out on a selected fraction of a complex DNA sample. For example, one can design chemical or enzymatic schemes to cut at extremely specific DNA sites, to purify specific DNA sequences, or to isolate selected classes of DNA sequences. Methods have been developed that allow the presence of repeated DNA sequences in genomes to be used as powerful analytical tools instead of serving as roadblocks for mapping and DNA sequencing. Other methods have been developed that allow the isolation of DNAs that recognize specific ligands. Finally, a large number of programs are underway to explore the direct use of DNA or RNA sequences as potential drugs.

In almost all of the objectives just outlined, a fundamental strategic decision must be made at the outset. If the DNA target of interest can be melted without introducing unwanted complications, then the single-stranded DNA sequence can be read directly, and the full power of PCR can usually be brought to bear to assist in the manipulation of the DNA target. PCR has been well described in Chapter 4, and there is no need to reintroduce the principles here. In those cases where it is not safe or desirable to melt the DNA, alternative methods are needed. Such cases include working with very large DNA, which will break if melted, and working in vivo. Here a very attractive approach is to use DNA triplexes that are capable of recognizing the specific sequence of selected portions of an intact duplex DNA. Triplexes may not have been encountered by some readers before, and so their basic properties will be described before their utility is demonstrated.

STRUCTURE OF TRIPLE-STRANDED DNA

 

 

Unanticipated formation of triple-stranded DNA helices was a scourge of early experiments with model DNA polymers. Most of the first available synthetic DNAs were homopolymers like poly dA and poly dT. Contamination of samples or buffers with magnesium ion was rampant. DNAs love to form triplexes under these conditions, if the sequence permits it. Many homopolymeric or simple repeating sequences can form triple-stranded complexes consisting of two purine-rich and one pyrimidine-rich strand or one purine-rich and two pyrimidine-rich strands, depending on the conditions. This is true for DNAs, RNAs, or DNA-RNA mixtures. Eventually conditions were found where the unwanted formation of these triplexes could be suppressed. The whole issue was forgotten and lay dormant for more than a decade. Triplexes were rediscovered, under much more interesting circumstances, when a decade ago investigators began to explore the chromatin structure surrounding active genes.

Figure 14.1 Appearance of S1 nuclease hypersensitive sites upstream from the start of transcription of some genes.

The key observation that led to a renaissance of interest in triplexes is a phenomenon called S1 hypersensitivity. S1 nuclease is an enzyme that cleaves single-stranded DNA specifically, usually at slightly acidic pH. It will not cleave double strands; it will not even cleave a single-base mismatch efficiently, although it will cut at larger mismatches. Investigators were using various nucleases to examine the accessibility of DNA segments near or in genes as a function of the potential for gene expression in particular tissues. Unexpectedly, many genes showed occasional sites where S1 could nick one of the DNA strands, upstream from the start of transcription, quite efficiently (Fig. 14.1). The phenomenon was termed S1 hypersensitivity. Its implication was that some unusual structure must exist in the region, rendering the normal duplex DNA susceptible to attack. To identify the sequences responsible for S1 hypersensitivity, upstream sequences were cloned and tested for S1 sensitivity. Fortunately they were initially tested within the plasmids used for cloning. These plasmids were highly supercoiled, and S1 hypersensitive sites were found and rapidly localized to complex homopurine stretches like the example shown in Figure 14.2. The S1 nicks were found to lie predominantly on the purine-rich strand. It was soon realized that the S1 hypersensitivity, under the conditions used, required a supercoiled target. The effect was lost when the plasmid was linearized, even by cuts far away from the purine block.
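As a rough illustration of how candidate homopurine-homopyrimidine stretches of this kind could be located in a cloned sequence, the following Python sketch scans one strand for uninterrupted runs of purines or pyrimidines. The function name and the 12-base minimum length are illustrative assumptions, not values taken from the text, and real S1 hypersensitive sites may contain interruptions that this simple scan would miss.

```python
import re

def find_purine_pyrimidine_tracts(seq, min_len=12):
    """Return (start, end, kind) for each uninterrupted run of purines ("R")
    or pyrimidines ("Y") of at least min_len bases on the given strand.
    Coordinates are 0-based, end-exclusive. min_len is an assumed threshold."""
    pattern = re.compile(r"[AG]{%d,}|[CT]{%d,}" % (min_len, min_len))
    tracts = []
    for match in pattern.finditer(seq.upper()):
        kind = "R" if match.group()[0] in "AG" else "Y"
        tracts.append((match.start(), match.end(), kind))
    return tracts

# A made-up sequence with one 12-base purine run flanked by mixed sequence.
example = "ACGT" + "AGGAGGAGAAGG" + "CATG"
print(find_purine_pyrimidine_tracts(example))
```

A run reported as "R" on one strand is, of course, a "Y" run on the complementary strand, so scanning a single strand is enough to find both.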

The problem that remained was to identify the nature of the altered DNA structure responsible for S1 hypersensitivity. The dependence of cleavage on a high degree of superhelicity implied that the sites must, overall, be unwound relative to the normal B DNA duplex. Obvious possibilities were melted loops, left-hand helix formation, or cruciform extrusion (formation of an intramolecular junction of four duplexes like the Holliday structure illustrated in Chapter 1). None of these, however, were consistent with the particular DNA sequences that formed the S1 hypersensitive sites, and none could explain why only the purine strand suffered extensive nicking. The key observation that resolved this dilemma was made by Maxim Frank-Kamenetskii, then working in Moscow. He noted that there was a direct correlation between the amount of supercoiling needed to reveal the S1 hypersensitivity and the pH used for the S1 treatment. A quantitative analysis of this effect indicated that both the amount of unwinding that occurred when the S1 hypersensitive site was created, and the number of protons that had to be bound during this process, could be explained by a simple model, which involved the formation of a specific intramolecular pyrimidine-purine-pyrimidine (YRY) triple helix.

Figure 14.2 DNA sequence of a typical S1 hypersensitive site.

Figure 14.3 Formation of a DNA triplex by disproportionation of two homopurine-homopyrimidine duplexes.

It is easiest to examine intermolecular triplex formation before considering the ways in which such structures might be formed intramolecularly at the S1 hypersensitive site. Figure 14.3 illustrates a disproportionation reaction between two duplexes that results in a triplex and a free single strand. This is precisely the sort of reaction that occurred so frequently in early studies with DNA homopolymers and led to the presence of unwanted DNA triplexes. If such a reaction is assayed by S1 sensitivity, disproportionation would be detected by the appearance of a single-stranded polypurine. The corresponding possible intramolecular reactions are illustrated in Figure 14.4. Here a block of homopyrimidine sequence folds back on itself (spaced by a short hairpin) to make an intramolecular triplex; the remaining homopurine stretch, not involved in the triplex, is left as a large single-stranded loop. It is this loop that is the target for the S1 nuclease. The net topological effect is an unwinding of roughly half of the homopurine-homopyrimidine duplex stretch. This is consistent with what is seen experimentally. In order to form base triplets between two pyrimidines and one purine, a T:A-T complex can form directly, but a CH+:G-C complex requires protonation of the N3 of one C, as shown in Figure 14.5.

Figure 14.4 Two intramolecular routes for formation of triplexes from a long homopurine-homopyrimidine duplex. The structure shown on the right is the one consistent with a large body of available chemical modification data. Y and R refer, respectively, to homopyrimidine and homopurine tracts.

Figure 14.5 Acid-stabilized triplex base pairing schemes. A dash (-) indicates the normal Watson-Crick base pairing scheme, while a colon (:) indicates the base pairs which involve the third strand. (a) As written schematically. (b) Actual proposed structures.

There are two possible isomeric models consistent with the unwinding and pH dependence of formation of the YRY triple strand. In both, the two pyrimidine strands run antiparallel to each other; the Watson-Crick pyrimidine strand is antiparallel to the purine strand; the triplex pyrimidine strand is parallel to the purine strand. The specific structural models proposed require that the pyrimidine sequences have mirror symmetry. This can be tested by manipulating particular DNA sequences, and it turns out to be valid. In addition, the two models in Figure 14.4 can be evaluated by looking at the pattern of accessibility of the S1 hypersensitive structure to various agents that chemically modify DNA in a structure-dependent manner. These studies reveal that the correct model for the S1 hypersensitive structure is the one shown on the right in Figure 14.4, where the 3′ segment of the purine stretch is the one incorporated into the triplex. The reason why this structure predominates is not known.
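The mirror-symmetry requirement can be checked mechanically. The sketch below, in Python, tests whether a pyrimidine tract reads the same in both directions along one strand, optionally allowing a short central spacer to stand in for the hairpin loop. The function names and the spacer parameter are assumptions made for illustration; the text does not specify how long the loop may be.

```python
def is_homopyrimidine(seq):
    """True if seq is non-empty and contains only C and T."""
    seq = seq.upper()
    return len(seq) > 0 and set(seq) <= {"C", "T"}

def is_mirror_repeat(seq, max_spacer=0):
    """True if seq consists of two arms that are mirror images of each other
    on the same strand, separated by a central spacer of 0..max_spacer bases.
    max_spacer is an assumed allowance for the hairpin loop."""
    seq = seq.upper()
    n = len(seq)
    for spacer in range(max_spacer + 1):
        if (n - spacer) % 2 != 0:
            continue  # the two arms must have equal length
        arm = (n - spacer) // 2
        if arm <= 0:
            continue
        left = seq[:arm]
        right = seq[arm + spacer:]
        if left == right[::-1]:
            return True
    return False

def could_fold_into_yry_triplex(seq, max_spacer=4):
    """Necessary (not sufficient) sequence test: homopyrimidine and mirror-symmetric."""
    return is_homopyrimidine(seq) and is_mirror_repeat(seq, max_spacer)

print(could_fold_into_yry_triplex("TCCTCTTCTCCT"))
```

Note that mirror symmetry is not the same as the inverted-repeat (palindromic) symmetry that favors cruciforms: a mirror repeat compares a strand with its own reversal, without complementing it.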

With the principles of triplex formation in S1 hypersensitive sites understood, a number of research groups began to explore the properties of simple linear triplexes more systematically. The need for superhelical density to drive the formation of triplex can be avoided simply by working at a low enough pH. In practice, pH 5 to 6 suffices for most sequences capable of forming triplexes at all. A surprise was the remarkable stability of triplexes. They can survive electrophoresis, even at lengths as short as 12 bases. Proof that the third strand lies in the major groove of the Watson-Crick duplex, and that the third, pyrimidine, strand is antiparallel to the Watson-Crick pyrimidine strand, was obtained by an elegant series of experiments in which agents capable of chemically nicking bases on the duplex were attached to the ends of the third strand. The specific pattern of nicks provided a detailed picture of the structure of the complex (Fig. 14.6).
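The strand orientations described above translate into a simple design rule for a pyrimidine third strand, sketched here in Python. The sketch assumes a perfect homopurine target and uses only the T:A-T and CH+:G-C triplets; it ignores pH, length, and sequence-context effects on stability, and the function names are hypothetical.

```python
# Third-strand pyrimidine chosen for each purine of the duplex target,
# following the T:A-T and CH+:G-C triplets.
YRY_THIRD_BASE = {"A": "T", "G": "C"}
WATSON_CRICK = {"A": "T", "G": "C", "C": "G", "T": "A"}

def yry_third_strand(purine_strand):
    """Return the third strand, written 5'->3', for a homopurine target
    that is itself written 5'->3'."""
    purine_strand = purine_strand.upper()
    if not purine_strand or set(purine_strand) - set("AG"):
        raise ValueError("target must be a non-empty homopurine sequence")
    # The third strand is parallel to the purine strand, so no reversal is needed.
    return "".join(YRY_THIRD_BASE[base] for base in purine_strand)

def watson_crick_partner(strand):
    """Return the Watson-Crick complement written 5'->3' (the reverse complement)."""
    return "".join(WATSON_CRICK[base] for base in reversed(strand.upper()))

target = "AAGGAGA"                       # made-up homopurine target, 5'->3'
third = yry_third_strand(target)         # parallel to the purine strand
partner = watson_crick_partner(target)   # antiparallel to the purine strand
print(third, partner)
```

Read 5' to 3', the third strand is the Watson-Crick pyrimidine strand written backwards, which is one way of seeing why the two pyrimidine strands are antiparallel.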

A second type of DNA triplex, stable at pH 7, was soon rediscovered. This purine-purine-pyrimidine (RRY) triplex was precisely the form known two decades before, stabilized by Mg2+ or other polyvalent cations. This structure could also lead to an S1 hypersensitive site in supercoiled plasmids. However, here the pyrimidine-rich strand was nicked by the enzyme instead of the purine-rich strand. Some DNA sequences can actually form both types of triplexes, depending on the conditions. The structure of the S1 hypersensitive sites favored by divalent ions is shown in Figure 14.7. This particular isomer is the one consistent with the observed pattern of modification with various chemical agents that react with DNA covalently. Three types of base triples can be accommodated in this structure: G:G-C, A:A-T, and T:A-T. Their patterns of hydrogen bonding are shown in Figure 14.8. The two non-Watson-Crick base-paired strands in these complexes are antiparallel; this is supported by studies on particular DNA sequences. As in the type of triplex described earlier, the third strand, in this case an additional purine or an additional pyrimidine strand, lies in the major groove of the Watson-Crick duplex (Fig. 14.6). Studies using circular oligonucleotides can help confirm assignments about the direction of strands in triplexes (Kool, 1995).

Figure 14.6 Structure of DNA triplexes determined from chemical modification experiments. (a) The third strand lies in the major groove of the duplex helix. (b) Strand directions in structures with two pyrimidine strands and one purine strand. (c) Strand directions in structures with two purine strands and one pyrimidine strand. Shading in (b) and (c) indicates the Watson-Crick duplex.
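The three base triples allowed in the Mg2+-stabilized structure can be written as a small lookup table and used to check a proposed third strand. The Python sketch below reads the antiparallel statement as meaning that the third strand runs antiparallel to the purine strand of the duplex; the function name is hypothetical, and the check covers base identity and orientation only, not stability.

```python
# Allowed (third-strand base, duplex purine base) combinations,
# from the triples G:G-C, A:A-T, and T:A-T.
RRY_TRIPLES = {("G", "G"), ("A", "A"), ("T", "A")}

def is_valid_rry_third_strand(third, purine):
    """True if `third` (5'->3') can lie on `purine` (5'->3') in antiparallel
    orientation using only the allowed triples."""
    third, purine = third.upper(), purine.upper()
    if not third or len(third) != len(purine):
        return False
    # Antiparallel: the 3' end of the third strand sits on the 5' end of the purine strand.
    return all((t, r) in RRY_TRIPLES for t, r in zip(reversed(third), purine))

purine_target = "AAGGAGA"  # made-up homopurine strand, 5'->3'
print(is_valid_rry_third_strand("AGAGGAA", purine_target))  # all-purine third strand
print(is_valid_rry_third_strand("TGTGGTT", purine_target))  # T used opposite each A
```

Because A in the duplex accepts either A or T in the third strand, more than one third strand can satisfy the table for the same target.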

 

 

 

 

 

 

 

More complex triple helices can also be made. An example is shown in Figure 14.9. Here all three strands must contain alternating blocks of homopurine and homopyrimidine sequences. The third strand lies in the major groove of the Watson-Crick duplex, and alternate blocks make triplexes with two pyrimidine strands and one purine strand, and triplexes with one pyrimidine strand and two purine strands. As our knowledge of triplex structures increases, and as base analogs are tested, it will undoubtedly be possible to design a wealth of triplexes in which a third strand can be used to recognize a wide variety of DNA duplex sequences. Based on experience to date, these triplexes are likely to be quite stable. One strong caveat to using them in various biological applications must be noted. The kinetics of triplex formation and dissociation are very slow, much slower than the rates of corresponding processes in duplexes.

Figure 14.7 Intramolecular triplex structure formed at neutral pH in the presence of Mg2+ ions.

Figure 14.8 Triplex base pairing schemes favored by Mg2+ ions. (a) As written schematically. (b) Actual proposed structures.

Figure 14.9 An example of a more complex triplex structure formed by alternating blocks of purines and pyrimidines. The third strand lies in the major groove of the Watson-Crick duplex. Its interactions with the duplex are indicated by dots.

TRIPLEX-MEDIATED DNA CLEAVAGE

The first application of triplexes to be discussed is their use in recognizing particular duplexes and rendering these susceptible to specific chemical or enzymatic cleavage. This potential was already described briefly in the previous section when chemical derivatives of the third strand were used to help analyze the structure of the triplex. The appeal of this approach is that it will be relatively easy to find or introduce a unique DNA sequence capable of forming triple strands into a target of interest. Subsequent cleavage at this sequence would represent the sort of cut that is extremely useful for executing any of the Smith-Birnstiel-like mapping strategies we described in Chapter 8.

Chemical cleavage agents that have been tried include Cu-phenanthroline complexes, iron-EDTA-ethidium bromide complexes, and others shown elsewhere in the chapter. The types of reactions one would like to be able to carry out with these modified oligonucleotides are shown schematically in Figure 14.10. Rather good yields and specificities have been observed when the chemical cleavage is used to cut the complementary strand of a duplex (Fig. 14.10a). Much less success has been had with direct triplex-mediated cleavage of a duplex (Fig. 14.10b). Generally, nicking of one strand of the duplex proceeds very well, but it is difficult to make the second cut needed to effect a true double-strand cleavage. The reason for this is that many of the chemical agents used are stoichiometric rather than catalytic. They have to be reactivated or replaced by a fresh reagent in order to be able to perform a second strand cleavage. While elegant chemical methods have been proposed to circumvent this problem, to date, specific efficient duplex chemical cleavage has been an elusive goal. However, this has not proved to be a serious roadblock, because alternative methods for using triplexes to promote specific enzymatic cleavage of duplexes have been very successful.

Achilles’s heel strategies are based on the general notion of using a restriction methylase to modify all of its recognition sites except a single site, or a small set of sites, that has been masked by a protecting agent. This renders most of the potential sites in the sample resistant to the conjugate restriction nuclease. Then the protecting agent is removed, and the nuclease added. Cleavage only occurs at the sites that escaped the initial methylation. These strategies were named by one of their developers, Waclaw Szybalski, at the University of Wisconsin, by analogy to the myth of Achilles. As an infant Achilles was dipped into the river Styx by his mother, which rendered him immune to all physical harm except for his heel, which was masked from the effects of the Styx because his mother was holding him there. Two straightforward examples of Achilles’s heel specific cleavage of DNA are shown in Figure 14.11. These approaches are applicable to very large DNA or even intact genomic DNA, since they can be carried out in situ in agarose. In one case it is necessary to find a tight binding site for a protein that masks a restriction site (Fig. 14.11a). Suitable proteins and binding sites are provided by lac repressor, lambda repressor, E. coli lexA protein, or a host of eukaryotic transcription factors, particularly viral factors like the NFAT protein. To be useful, the site must contain an internal or nearby flanking restriction site. There is no guarantee that such a site will conveniently exist in a target of interest. However, given the large number of potentially useful restriction sites, there are many possibilities. If necessary, for some applications the desired site can always be designed and introduced.

Figure 14.10 Triplex-mediated DNA cleavage using a DNA strand containing a chemically reactive group (x) that generates radicals. (a) Cleavage of a single strand. (b) Cleavage of a duplex.

Instead of proteins, triplexes can be used to mask restriction sites (Fig. 14.11b). It turns out that embedding a four-base restriction site in a homopurine-homopyrimidine stretch destabilizes the resulting triplex only slightly. However, triplex formation renders the site totally inaccessible to the restriction methylase. After the remaining restriction sites have been methylated, conditions are altered to dissociate the triplex. Then the restriction enzyme is added, and cleavage is allowed to occur. This Achilles’s heel approach works very well, even at the level of single sites in the human genome. However, it still suffers from the limitation that only a small subset of sequences within a target will be potential sites of triplex-mediated specific cleavage.

Figure 14.11 Achilles’s heel strategies for specific DNA cleavage. (a) Blocking a restriction enzyme cleavage site E with a DNA binding protein. (b) Blocking a restriction enzyme cleavage site E with a triplex. M indicates methylation sites.
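To see how restrictive this limitation is for a given target, one can search for occurrences of a four-base site that are flanked on both sides by a purine or pyrimidine run on the same strand. The Python sketch below does this under stated assumptions: the site string, the 8-base flank length, and the function name are illustrative choices, not values from the text, and whether a triplex actually forms across a given site would still have to be tested experimentally.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")

def find_embedded_sites(seq, site, flank=8):
    """Return 0-based start positions of `site` in `seq` where the `flank`
    bases on each side are all purines, or all pyrimidines, on this strand.
    Both flanks must be of the same class so that one third strand could
    span the site. `flank` is an assumed minimum arm length."""
    seq, site = seq.upper(), site.upper()
    hits = []
    start = seq.find(site)
    while start != -1:
        left = seq[max(0, start - flank):start]
        right = seq[start + len(site):start + len(site) + flank]
        if len(left) == flank and len(right) == flank:
            flanking = set(left + right)
            if flanking <= PURINES or flanking <= PYRIMIDINES:
                hits.append(start)
        start = seq.find(site, start + 1)
    return hits

# Made-up target: a four-base site (TCGA, used purely as an example string)
# interrupting a purine run.
target = "CAT" + "AGGAGAAG" + "TCGA" + "GAAGGAGA" + "TGC"
print(find_embedded_sites(target, "TCGA"))
```

In a sequence of mixed composition most occurrences of the site will fail the flank test, which is the practical meaning of the small subset mentioned above.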

 

 

 

 

 

 

 

 

 

 

 

 

 

A generalization of the Achilles’s heel approach is possible by the use of the E. coli recA protein. Developed by Camerini-Otero and elaborated by Szybalski (Koob et al., 1992), this method has been called recA-assisted restriction endonuclease (RARE) cleavage (Fig. 14.12). The method is applicable to genomic DNA because all of the steps can be carried out in agarose. The recA protein has a number of different activities. One of these is a cooperative binding to single-stranded DNA, leading to a completely coated complex containing about one recA monomer for every five bases (Fig. 14.12a). The coated complex will then interact with double-stranded DNA molecules in a search for sequences homologous to the single strand. In E. coli this process constitutes one of the early steps that eventually lead to strand invasion and recombination. In the test tube, without accessory nucleases, the reaction stops if a homologous duplex sequence is found, and the third strand remains complexed to this homolog, even if the recA protein is subsequently removed (Fig. 14.12b). The mechanism of the sequence search is unknown. Similarly, the actual nature of the complexes formed with recA protein present or after recA protein removal is still not completely understood, despite intense efforts to study these processes because of their importance in basic E. coli biology. Some sort of triple strand is believed to be involved, although this has never been proven. What is key, however, for Achilles’s heel applications is that the recA protein-mediated complex blocks the access of restriction methylases to duplex DNA sequences contained within it.

A schematic outline of RARE cleavage is given in Figure 14.12c. The technique has worked well to cut at two selected sites 200 to 500 kb apart in a target to generate a specific internal fragment, and generation of a 1.3-Mb telomeric DNA fragment that requires only a single RARE cleavage has been reported. In practice, it has been more efficient to use a six-base specific restriction system rather than the four-base systems used with other Achilles’s heel methods. The reason is that a common source of background in these approaches is incomplete methylation. This produces a diverse distribution of hemimethylated sites which are cut, albeit slowly, by the conjugate nuclease. The result is a significant background of nonspecific cleavage. This background can be markedly reduced by going to the six-base enzyme, since its sites are 16 times less frequent, on average. Since recA-mediated cleavage is applicable, in principle, to any selected DNA sequence, the rarity of six-base cleavage sites does not pose a particular obstacle.
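The factor of 16 follows from simple counting, shown here as a short Python sketch. It assumes a random sequence in which the four bases are equally frequent and independent, which real genomes only approximate; the function name is illustrative.

```python
def expected_site_spacing(site_length, alphabet_size=4):
    """Average spacing, in bases, between occurrences of one specific site
    in a random sequence with equally frequent, independent bases."""
    return alphabet_size ** site_length

four_base = expected_site_spacing(4)   # 4**4 = 256 bases
six_base = expected_site_spacing(6)    # 4**6 = 4096 bases

# Ratio of spacings: how much rarer a six-base site is than a four-base site.
print(four_base, six_base, six_base // four_base)
```

Under the same assumption, every additional base in the recognition site reduces the number of potential background sites by a further factor of 4.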

 

 

 

 

The recA protein-coated single strands can be as short as 15 bases for RARE cleavage, although in practice targets two to four times this length are usually employed. One makes a trade-off between the increased efficiency and specificity obtained with longer complexes, and the lowered efficiency of their diffusion into agarose-embedded DNA samples. Yields of the desired duplex of 40% to 60% have been reported in early experiments. It remains to be seen how generally obtainable such high yields will be. The power of RARE cleavage in physical mapping is that given two DNA probes spaced within about 1 Mb, RARE cleavage should provide the DNA between these probes as a unique fragment free from major contamination by the remainder of the genome.

Figure 14.12 RARE cleavage of DNA. (a) Complex formed between E. coli recA protein and single-stranded DNA. (b) Complex between a recA-protein coated DNA strand and the homologous sequence in duplex DNA. (c) Outline of the procedure used to generate a large DNA fragment between two known sequences by RARE cleavage.
