
Genomics: The Science and Technology Behind the Human Genome Project.	Charles R. Cantor, Cassandra L. Smith
	Copyright © 1999 John Wiley & Sons, Inc.
	ISBNs: 0-471-59908-5 (Hardback); 0-471-22056-6 (Electronic)

3 Analysis of DNA Sequences by Hybridization

BASIC REQUIREMENTS FOR SELECTIVITY AND SENSITIVITY

The haploid human genome is $3 \times 10^{9}$ base pairs, and a typical human cell, as described in the last chapter, is somewhere between diploid and tetraploid in DNA content. Thus each cell has about $10^{10}$ base pairs of DNA. A single base pair is 660 Da. Hence the weight of DNA in a single cell can be calculated as $10^{10} \times 660 / (6 \times 10^{23}) \approx 10^{-11}$ g, or 10 pg. Ideally we would like to be able to do analyses on single cells. This means that if only a small portion of the genome is the target for analysis, far less than 10 pg of material will need to be detected. By current methodology we are in fact able to determine the presence or absence of almost any 20-bp DNA sequence within a single cell, such as the sequence ATTGGCATAGGAGCCCATGG. This analysis takes place at the level of single molecules. Two requirements must be met to perform such an exquisitely demanding analysis. There must be sufficient experimental sensitivity to detect the presence of the sequence. This sensitivity is provided by either chemical or biological amplification procedures or a combination of these procedures. There must also be sufficient experimental selectivity to discriminate between the desired, true target sequence and all other similar sequences, which may differ from the target by as little as one base. That specificity lies with the intrinsic selectivity of DNA base pairing itself.
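The mass estimate above can be checked with a few lines of arithmetic. This is a minimal sketch: the function name `dna_mass_grams` is an illustrative choice, and Avogadro's number is taken as 6.022 × 10^23 rather than the rounded 6 × 10^23 used in the text.

```python
AVOGADRO = 6.022e23   # molecules per mole
BP_MASS_DA = 660.0    # average mass of one base pair in daltons (g/mol), as in the text

def dna_mass_grams(base_pairs: float) -> float:
    """Mass in grams of a DNA sample containing the given number of base pairs."""
    return base_pairs * BP_MASS_DA / AVOGADRO

# About 1e10 bp per cell (between diploid and tetraploid), as estimated in the text.
cell_mass_pg = dna_mass_grams(1e10) * 1e12   # grams -> picograms

# A single 20-bp target present in one copy.
target_mass_g = dna_mass_grams(20)

print(f"DNA per cell: {cell_mass_pg:.1f} pg")
print(f"One 20-bp target: {target_mass_g:.2e} g")
```

A single-copy 20-bp target weighs about 5 × 10^8 times less than the total DNA of the cell, which is why amplification is needed to reach sufficient sensitivity.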

The target of a 20-bp DNA sequence is not picked casually. Twenty bp is just about the smallest DNA length that has a high probability, a priori, of being found in a single copy in the human genome. This can be deduced as follows from simple binomial statistics (Box 3.1). For simplicity, pretend that the human genome contains equal amounts of the four bases, A, T, C, and G, and that the occurrences of the bases are random. (These constraints will be relaxed elsewhere in the book when some of the unusual statistical properties of natural DNAs need to be considered explicitly.) Then the expected frequency of occurrence of any particular stretch of DNA sequence, such as $n$ bases beginning as ATCCG . . ., is $4^{-n}$. The average number of occurrences of this particular sequence in the haploid human genome is $3 \times 10^{9} \times 4^{-n}$. For a sequence of 16 bases, $n = 16$, the average occurrence is $3 \times 10^{9} \times 4^{-16}$, which is about 1 (more precisely, about 0.7). Thus such a length will tend to be seen as often as not by chance; it is not long enough to be a unique identifier. There is a reasonable chance that a sequence 16 bases long will occur several times in different places in the genome. Choosing $n = 20$ gives an average occurrence of about 0.003 (0.3%). Such sequences will almost always be unique genome landmarks. One corollary of this simple exercise is that it is futile to look at random for the occurrence of particular 20-mers in the sequence of a higher organism unless there is good a priori reason for suspecting the presence of these sequences. This means that sequences of length 20 or more can be used as unique identifiers (see Box 3.2).
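The expected number of chance occurrences can be tabulated directly from the expression genome size × 4^(-n). The sketch below assumes, as the text does, equal base frequencies and random base order; the function name `expected_occurrences` is hypothetical.

```python
def expected_occurrences(n: int, genome_size: float = 3e9) -> float:
    """Average number of times one particular n-base sequence occurs by chance,
    assuming equal amounts of A, T, C, G and random base order."""
    return genome_size * 4.0 ** (-n)

# Compare a few probe lengths against the haploid human genome.
for n in (12, 16, 20):
    print(f"n = {n:2d}: {expected_occurrences(n):.4g} expected occurrences")
```

Under these assumptions a 12-mer is expected roughly 180 times, a 16-mer a little less than once, and a 20-mer only about 0.003 times.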


BOX 3.1
BINOMIAL STATISTICS

Binomial statistics describe the probable outcome of events like coin flipping, events that depend on a single random variable. While a normal coin has a 50% chance of heads or tails with each flip, we will consider here the more general case of a weighted coin with two possible outcomes with probabilities $p$ (heads) and $q$ (tails). Since there are no other possible outcomes, $p + q = 1$. If $N$ successive flips are executed, and the outcome is a particular string, such as hhhhttthhh, the chance of this particular outcome is $p^{n} q^{N-n}$, where $n$ is the number of times heads was observed. Note that all strings with the same numbers of heads and tails will have the same a priori probability, since in binomial statistics each event does not affect the probability of subsequent events. Later in this book we will deal with cases where this extremely simple model does not hold. If we care only about the chance of an outcome with $n$ heads and $N - n$ tails, without regard to sequence, the number of such events is $N!/[n!\,(N-n)!]$, and so the fraction of times this outcome will be seen is $p^{n} q^{N-n}\, N!/[n!\,(N-n)!]$.
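The two quantities in this paragraph, the probability of one particular string and the probability of any string with the same number of heads, can be sketched as follows. The function names are illustrative, and `math.comb` supplies the factor N!/[n!(N-n)!].

```python
from math import comb

def string_probability(n_heads: int, n_flips: int, p: float) -> float:
    """Chance of one particular ordered string with n_heads heads in n_flips flips."""
    q = 1.0 - p
    return p ** n_heads * q ** (n_flips - n_heads)

def count_probability(n_heads: int, n_flips: int, p: float) -> float:
    """Chance of seeing n_heads heads in n_flips flips, in any order."""
    return comb(n_flips, n_heads) * string_probability(n_heads, n_flips, p)

# The string hhhhttthhh from the text has 7 heads in 10 flips.
print(string_probability(7, 10, 0.5))   # one specific string, fair coin
print(count_probability(7, 10, 0.5))    # any arrangement of 7 heads and 3 tails
```

Summing `count_probability` over all possible numbers of heads gives 1, which is a quick sanity check on the formula.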

A simple binomial model can also be used to estimate the frequency of occurrence of particular DNA base sequences. Here there are four possible outcomes (not quite as complex as dice throwing, where six possible outcomes occur). For a particular string with $n_A$ A's, $n_C$ C's, $n_G$ G's, and $n_T$ T's, and a base composition $X_A$, $X_C$, $X_G$, and $X_T$, the chance occurrence of that string is $X_A^{n_A} X_C^{n_C} X_G^{n_G} X_T^{n_T}$. The number of possible strings with a particular base composition is $N!/(n_A!\,n_C!\,n_G!\,n_T!)$, and by combining this with the previous term, the probability of a string with a particular base composition can easily be computed. Incidentally, the number of possible strings of length $N$ is $4^{N}$, while the number of different base compositions of this length is $(N+3)!/(N!\,3!)$.
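The four-letter version of the model can be sketched in the same way. The function names and the `UNIFORM` composition below are assumptions made for illustration; any base composition whose fractions sum to 1 could be passed instead.

```python
from collections import Counter
from math import comb, factorial

# Illustrative base composition: equal amounts of the four bases.
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def string_probability(seq: str, composition: dict) -> float:
    """Chance occurrence of one particular string: the product of X_b over its bases."""
    prob = 1.0
    for base in seq:
        prob *= composition[base]
    return prob

def strings_with_same_counts(seq: str) -> int:
    """Number of distinct strings with the same base counts: N!/(nA! nC! nG! nT!)."""
    total = factorial(len(seq))
    for n_base in Counter(seq).values():
        total //= factorial(n_base)
    return total

def number_of_compositions(length: int) -> int:
    """Number of different base compositions of a given length: (N+3)!/(N! 3!)."""
    return comb(length + 3, 3)

seq = "ATCCG"
print(string_probability(seq, UNIFORM))   # equals 4**-5 for a uniform composition
print(strings_with_same_counts(seq))      # orderings of 1 A, 1 T, 2 C, 1 G
print(number_of_compositions(len(seq)))
```

Multiplying the first two results gives the probability of observing any string with that base composition.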

The same statistical models can be used to make estimates that two people will share the same DNA sequences. Such estimates are very useful in DNA-based identity testing. Here we consider just the simple case of two-allele polymorphisms. In a particular place in the genome, suppose that a fraction of all individuals, $f_1$, have one base, while the remainder, $f_2$, have another. The chance that two individuals share the same allele is $g^{2} = f_1^{2} + f_2^{2}$. If a set of two-allele polymorphisms ($i, j, k, \ldots$) is considered simultaneously, the chance that two individuals are identical for all of them is $g_i^{2} g_j^{2} g_k^{2} \cdots$. By choosing a sufficiently large set, we can clearly make the overall chance too low to occur, unless the individuals in question are one and the same.
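A short sketch shows how quickly the chance of a coincidental match falls as more two-allele sites are combined. It follows the simple model in the text (one allele per individual per site, sites independent, match chance f1² + f2² at each site); the allele frequencies used are hypothetical, not measured values.

```python
from math import prod

def site_match_probability(f1: float) -> float:
    """Chance that two unrelated individuals carry the same allele at one
    two-allele site with allele fractions f1 and f2 = 1 - f1."""
    f2 = 1.0 - f1
    return f1 ** 2 + f2 ** 2

def overall_match_probability(allele_fractions) -> float:
    """Chance of matching at every site, assuming the sites are independent."""
    return prod(site_match_probability(f1) for f1 in allele_fractions)

# Hypothetical panel of 20 sites, each with a 50:50 allele split.
panel = [0.5] * 20
print(overall_match_probability(panel))
```

Sites with evenly split alleles are the most informative, since f1² + f2² is smallest when f1 = f2 = 0.5.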

However, two caveats apply to this reasoning. First, related individuals will show a much higher degree of similarity than predicted by this model. Monozygotic twins, in principle, should share an identical set of alleles at the germ-line level. Second, the proper allele frequencies to use will depend on the racial, ethnic, and other genetic characteristics of the individuals in question. Thus it may not always be easy to select appropriate values. These difficulties notwithstanding, DNA testing offers a very powerful approach to identification of individuals, paternity testing, and a variety of forensic applications.


BOX 3.2
DNA SEQUENCES AS UNIQUE SAMPLE IDENTIFIERS
The following table shows the number of different sequences of length $n$ and compares these values to the sizes of various genomes. Since the number of short substrings contained in a genome is virtually the same as the genome size, it is easy to determine the lengths of short sequences that will occur on average only once per genome. Sequences a few bases longer than these lengths will, for all practical purposes, occur either once or not at all, and hence they can serve as unique identifiers.

LENGTH, $n$	NUMBER OF SEQUENCES, $4^{n}$	GENOME, GENOME SIZE (BP)
8	$6.55 \times 10^{4}$	Bacteriophage lambda, $5 \times 10^{4}$
9	$2.62 \times 10^{5}$
10	$1.05 \times 10^{6}$
11	$4.19 \times 10^{6}$	E. coli, $4 \times 10^{6}$
12	$1.68 \times 10^{7}$	S. cerevisiae, $1.3 \times 10^{7}$
13	$6.71 \times 10^{7}$
14	$2.68 \times 10^{8}$	All mammalian mRNAs, $2 \times 10^{8}$
15	$1.07 \times 10^{9}$
16	$4.29 \times 10^{9}$	Human haploid genome, $3 \times 10^{9}$
17	$1.72 \times 10^{10}$
18	$6.87 \times 10^{10}$
19	$2.75 \times 10^{11}$
20	$1.10 \times 10^{12}$
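The row at which each genome is placed in the table can be reproduced by finding the shortest length n for which 4^n reaches the genome size. This is a minimal sketch; the genome sizes are the rounded values quoted in the table, and the function name is illustrative.

```python
# Genome sizes in base pairs, as quoted in Box 3.2.
GENOMES = {
    "Bacteriophage lambda": 5e4,
    "E. coli": 4e6,
    "S. cerevisiae": 1.3e7,
    "All mammalian mRNAs": 2e8,
    "Human haploid genome": 3e9,
}

def shortest_matching_length(genome_size: float) -> int:
    """Smallest n such that the number of possible n-mers, 4**n, is at least
    the genome size, so a given n-mer occurs about once or less on average."""
    n = 1
    while 4 ** n < genome_size:
        n += 1
    return n

for name, size in GENOMES.items():
    n = shortest_matching_length(size)
    print(f"{name}: n = {n}, 4^n = {4 ** n:.3g}")
```

Adding a few bases to each of these lengths gives sequences that are expected to be effectively unique in the corresponding genome.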

DETECTION OF SPECIFIC DNA SEQUENCES
DNA molecules themselves are the perfect set of reagents to identify particular DNA sequences. This is because of the strong, sequence-specific base pairing between complementary DNA strands. Here one strand of DNA will be considered to be a target, and the other, a probe. (If both are not initially available in a single-stranded form, there are many ways to circumvent this complication.) The analysis for a particular DNA sequence consists in asking whether a probe can find its target in the sample of interest. If the probe does so, a double-stranded DNA complex will be formed. This process is called hybridization, and all we have to do is to discriminate between this complex and the initial single-stranded starting materials (Fig. 3.1a).

The earliest hybridization experiments were carried out in homogeneous solutions. Hybridization was allowed to proceed for a fixed time period, and then a physical separation was performed to capture double-stranded material and discard single strands. Hydroxyapatite chromatography was used to do this discrimination because conditions could be found in which double-stranded DNA bound to a column of hydroxyapatite, while single strands were eluted (Fig. 3.1b). The amount of double-stranded DNA could be quantitated by using a radioisotopic label on the probe or the target, or by measuring the bulk amount of DNA captured or eluted. This method is still used today in select cases, but it is very tedious because only a few samples can be conveniently analyzed simultaneously.


Figure 3.1 Detecting the formation of specific double-stranded DNA sequences. (a) The problem is to tell whether the sequence of interest is present in single-stranded (s.s.) or double-stranded (d.s.) form. (b) Physical purification by methods such as hydroxyapatite chromatography. (c) Physical purification by the use of one strand attached to an immobilized phase. A label is used to detect hybridization of the single-stranded probe.

Modern hybridization protocols immobilize one of the two DNAs on a solid support (Fig. 3.1c). The immobilized phase can be either the probe or the target. The complementary sample is labeled with a radioisotope, a fluorescent dye, or some other specific moiety that later allows a signal to be generated in situ. The amount of color or radioactivity on the immobilized phase is measured after hybridization for a fixed period and subsequent washing of the solid support to remove adsorbed, but nonhybridized, material. As will be shown later, an advantage of this method is that many samples can be processed in parallel. However, there are also some disadvantages that will become apparent as we proceed.


EQUILIBRIA BETWEEN DNA DOUBLE AND SINGLE STRANDS

The fraction of single- and double-stranded DNA in solution can be monitored by various spectroscopic properties that effectively average over different DNA sequences. Such measurements allow us to view the overall reaction of DNA single strands. Ultraviolet absorbance, circular dichroism, or the fluorescence of dyes that bind selectively to duplex DNA can all be used for this purpose. If the amount of double-stranded (duplex) DNA in a sample is monitored as a function of temperature, the results typically obtained are shown in Figure 3.2. The DNA is transformed from double strands at low temperature, rather abruptly at some critical temperature, to single strands. The process, for long DNA, is usually so cooperative that it can be likened to the melting of a solid, and the transition is called DNA melting. The midpoint of the transition for a particular DNA sample is called the melting temperature, $T_m$. For DNAs that are very rich in the bases G + C, this can be 30 or 40°C higher than for extremely (A + T)-rich samples. It is such spectroscopic observations, on large numbers of small DNA duplexes, that have allowed us to achieve a quantitative understanding of most aspects of DNA melting.

Figure 3.2 Typical melting behavior for DNA as a function of average base composition. Shown is the fraction of single-stranded molecules as a function of temperature. The midpoint of the transition is the melting temperature, $T_m$.

The goal of this section is to define conditions that allow sequence-specific analyses of DNA using DNA hybridization. Specificity means the ratio of perfectly base-paired duplexes to duplexes with imperfections or mismatches. Thus high specificity means that the conditions maximize the amount of double-stranded perfectly base-paired complex and minimize the amount of other species. Key variables are the concentrations of the DNA probes and targets that are used, the temperature, and the salt concentration (ionic strength).

The melting temperature of long DNA is concentration independent. This arises from the way in which $T_m$ is defined. Large DNA melts in patches, as shown in Figure 3.3a. At $T_m$, the temperature at which half the DNA is melted, (A + T)-rich zones are melted, while (G + C)-rich zones are still in duplexes. No net strand separation will have taken place because no duplexes will have been completely melted. Thus there can be no concentration dependence to $T_m$.

In contrast, the melting of short DNA duplexes, DNAs of 20 base pairs or less, is effectively all or none (Fig. 3.3b). In this case the concentration of intermediate species, partly single-stranded and partly duplex, is sufficiently small that it can be ignored. The reaction of two short complementary DNA strands, A and B, may be written as

$$A + B \rightleftharpoons AB$$


Figure 3.3 Melting behavior of DNA. (a) Structure of a typical high molecular weight DNA at its melting temperature. (b) Status of a short DNA sample at its melting temperature.

The equilibrium (association) constant for this reaction is defined as

$$K_a = \frac{[AB]}{[A][B]}$$

Most experiments will start with an equal concentration of the two strands (or a duplex melted to give the two strands). The initial total concentration of strands (whatever their form) is $C_T$. If, for simplicity, all of the strands are initially single stranded, their concentrations are

$$[A]_0 = [B]_0 = \frac{C_T}{2}$$

At $T_m$ half the strands must be in duplex. Hence the concentrations of the different species at $T_m$ will be

$$[AB] = [A] = [B] = \frac{C_T}{4}$$

The equilibrium constant at $T_m$ is

$$K_a = \frac{[AB]}{[A][B]} = \frac{C_T/4}{(C_T/4)^{2}} = \frac{4}{C_T}$$

Do not be misled by this expression into thinking that the equilibrium constant is concentration dependent. It is the $T_m$ that is concentration dependent. The equilibrium constant is temperature dependent. The above expression indicates the value seen for the equilibrium constant, $4/C_T$, at the temperature $T_m$. This particular $T_m$ occurs when the equilibrium is observed at the total strand concentration, $C_T$.
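The distinction between the equilibrium constant and the melting midpoint can be made concrete by computing the fraction of strands in duplex for a given association constant and total strand concentration. The sketch below is not from the text. It assumes the all-or-none model with equal amounts of A and B, writes the duplex fraction as θ so that [AB] = θ·C_T/2 and [A] = [B] = (1 − θ)·C_T/2, and solves K_a = 2θ / ((1 − θ)²·C_T) for θ. The function name is hypothetical.

```python
from math import sqrt

def duplex_fraction(k_a: float, c_total: float) -> float:
    """Fraction of strands in duplex for A + B <-> AB with equal strand amounts.

    Solves a*(1 - theta)**2 = 2*theta with a = k_a * c_total and returns
    the physically meaningful root (between 0 and 1)."""
    a = k_a * c_total
    return ((a + 1.0) - sqrt(2.0 * a + 1.0)) / a

c_total = 1e-6                 # mol/L, illustrative concentration
k_at_tm = 4.0 / c_total        # value of K_a at the melting temperature
print(duplex_fraction(k_at_tm, c_total))        # half the strands are in duplex
print(duplex_fraction(k_at_tm, 10 * c_total))   # same K_a, more concentrated sample
```

At a fixed temperature (fixed K_a), raising the concentration pushes more strands into duplex, so a more concentrated sample reaches its half-melted point only at a higher temperature.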


A special case must be considered in which hybridization occurs between two short single strands of the same identical sequence, C. Such strands are self-complementary. An example is GGGCCC, which can base pair with itself. In this case the reaction can be written

$$2C \rightleftharpoons C_2$$

The equilibrium constant, $K_a$, becomes

$$K_a = \frac{[C_2]}{[C]^{2}}$$

At the melting temperature half of the strands must be in duplex. Hence

$$[C_2] = \frac{[C]}{2} = \frac{C_T}{4}$$

where $C_T$ as before is the total concentration of strands. Thus we can evaluate the equilibrium expression at $T_m$:

$$K_a = \frac{C_T/4}{(C_T/2)^{2}} = \frac{1}{C_T}$$

As before, what this really means is that $T_m$ is concentration dependent. In both cases simple mass action considerations ensure that $T_m$ will increase as the concentration is raised.
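Whether a strand falls into the self-complementary case can be checked by comparing it with its own reverse complement. This is a minimal sketch with illustrative function names, assuming sequences are written 5' to 3' using only the letters A, C, G, and T.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Sequence of the complementary strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def is_self_complementary(seq: str) -> bool:
    """True if the strand can form a perfect duplex with a copy of itself."""
    return seq.upper() == reverse_complement(seq)

print(is_self_complementary("GGGCCC"))                 # example from the text
print(is_self_complementary("ATTGGCATAGGAGCCCATGG"))   # 20-mer used earlier in the chapter
```

Only strands that pass this test need the 1/C_T form of the equilibrium constant at T_m; two distinct complementary strands at equal concentration follow the 4/C_T form.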

The final case we need to consider is when one strand is in vast excess over the other instead of both being at equal concentrations. This is frequently the case when a trace amount of probe is used to interrogate a concentrated sample or, alternatively, when a large amount of probe is used to interrogate a very minute sample. The formation of duplex can be written as before as $A + B \rightleftharpoons AB$, but now the initial starting conditions are

$$[B]_0 \gg [A]_0$$

Effectively the total strand concentration, $C_T$, is thus simply the initial concentration of the excess strand, $[B]_0$. At $T_m$,

$$[AB] = [A]$$

Thus the equilibrium expression at $T_m$ becomes

$$K_a = \frac{[AB]}{[A][B]} = \frac{1}{[B]} \approx \frac{1}{C_T}$$

The melting temperature is still concentration dependent.
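The three values of the equilibrium constant at the melting temperature can be recovered numerically from the species concentrations at the half-melted point. This sketch simply restates the three cases described in this section; the function names and the concentrations used are illustrative.

```python
def ka_equal_strands(c_total: float) -> float:
    """A + B <-> AB with equal amounts of A and B: [AB] = [A] = [B] = C_T/4 at Tm."""
    ab = a = b = c_total / 4.0
    return ab / (a * b)

def ka_self_complementary(c_total: float) -> float:
    """2C <-> C2: [C2] = C_T/4 and [C] = C_T/2 at Tm."""
    c2 = c_total / 4.0
    c = c_total / 2.0
    return c2 / c ** 2

def ka_excess_strand(a0: float, b0: float) -> float:
    """A + B <-> AB with B in vast excess: half of A is in duplex at Tm."""
    ab = a = a0 / 2.0
    b = b0 - ab            # B is barely depleted because b0 >> a0
    return ab / (a * b)

c_total = 1e-6   # mol/L, illustrative
print(ka_equal_strands(c_total) * c_total)        # close to 4
print(ka_self_complementary(c_total) * c_total)   # close to 1
print(ka_excess_strand(1e-12, c_total) * c_total) # close to 1
```

In every case the product of K_a at T_m and C_T is a constant, which is why changing the concentration shifts T_m rather than the equilibrium constant at a given temperature.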

The importance of our ability to drive duplex formation cannot be overestimated. Figure 3.4 illustrates the practical utility of this ability. It shows the concentration dependence of the melting temperature of two different duplexes. We can characterize each reaction by its melting temperature and can attempt to extract thermodynamic parameters like the enthalpy change, $\Delta H$, and the free energy change, $\Delta G$, for each reaction. However, with the $T_m$'s different for the two reactions, if we do this at $T_m$, the thermodynamic parameters derived will refer to reactions at two different temperatures. There will be no way, in general, to compare these parameters, since they are expected to be intrinsically temperature dependent. The concentration dependence of melting saves us from this dilemma. By varying the concentration, we can produce conditions where the two duplexes have the same $T_m$. Now thermodynamic parameters derived from each are comparable. We can, if we wish, choose any temperature for this comparison. In practice, 298 K has been chosen for this purpose.

Figure 3.4 The dependence of the melting temperature, $T_m$, of two short duplexes on the total concentration of DNA strands, $C_T$.
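The idea of tuning the concentration until a duplex melts at a chosen temperature can be sketched with the relation K_a = 4/C_T at T_m. The sketch is not from the text and rests on two loudly stated assumptions: the all-or-none model with equal concentrations of two distinct strands, and values of ΔH and ΔS that are treated as temperature independent over the range of interest. Setting ΔH − T_m·ΔS = −R·T_m·ln(4/C_T) gives T_m = ΔH / (ΔS + R·ln(C_T/4)). The ΔH and ΔS values below are hypothetical, chosen only for illustration.

```python
from math import exp, log

R = 1.987e-3   # gas constant in kcal/(K*mol)

def melting_temperature(dh: float, ds: float, c_total: float) -> float:
    """Tm in kelvin for A + B <-> AB at equal strand concentrations.

    dh is in kcal/mol, ds in kcal/(K*mol), c_total in mol/L."""
    return dh / (ds + R * log(c_total / 4.0))

def concentration_for_tm(dh: float, ds: float, tm: float) -> float:
    """Total strand concentration (mol/L) at which the duplex melts at tm."""
    return 4.0 * exp((dh / tm - ds) / R)

# Two hypothetical duplexes with different stabilities.
duplexes = {"duplex 1": (-50.0, -0.140), "duplex 2": (-60.0, -0.165)}

for name, (dh, ds) in duplexes.items():
    tm = melting_temperature(dh, ds, 1e-6)
    c_298 = concentration_for_tm(dh, ds, 298.0)
    print(f"{name}: Tm at 1 uM = {tm:.1f} K; melts at 298 K when C_T = {c_298:.2e} M")
```

Each duplex needs a different total concentration to melt at 298 K, which is the practical route to comparing their thermodynamic parameters at a common temperature.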

THERMODYNAMICS OF THE MELTING OF SHORT DUPLEXES

The model we will use to analyze the melting of short DNA double helices is shown in Figure 3.5. The two strands come together, in a nucleation step, to form a single base pair. Double strands can form by stacking of adjacent base pairs above or below the initial nucleus until a full duplex has zippered up. It does not matter, in our treatment, where the initial nucleus forms. We also need not consider any intermediate steps beyond the nucleus and the fully duplex state. In that state, for a duplex of $n$ base pairs, there will be $n - 1$ stacking interactions (Fig. 3.6). Each interaction reflects the energetics of stacking two adjacent base pairs on top of each other. There are ten distinct such interactions, ApG/CpT, ApA/TpT, and so on (where the slash indicates two complementary antiparallel strands). Because their energetics are very different, we must consider the DNA sequence explicitly in calculating the thermodynamics of DNA melting.

Figure 3.5 A model for the mechanism of the formation of duplex DNA from separated complementary single strands.


Figure 3.6 Stacking interactions in double-stranded DNA.

For each of the ten possible stacking interactions, we can define a standard free energy of stacking, $\Delta G^0_s$, a standard enthalpy of stacking, $\Delta H^0_s$, and a standard entropy of stacking, $\Delta S^0_s$. These quantities will be related at a particular temperature by the expression

$$\Delta G^0_s = \Delta H^0_s - T \Delta S^0_s$$
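This relation can be used to check that tabulated enthalpies, entropies, and free energies are mutually consistent. The sketch below uses three rows of Table 3.1 at 298 K. As an assumption, the tabulated magnitudes are entered with negative signs, corresponding to duplex formation, and the entropy is converted from cal/(K·mol) to kcal/(K·mol).

```python
def stacking_free_energy(dh_kcal: float, ds_cal: float, temperature: float = 298.0) -> float:
    """Delta G = Delta H - T * Delta S, with Delta S given in cal/(K*mol)."""
    return dh_kcal - temperature * ds_cal / 1000.0

# (Delta H in kcal/mol, Delta S in cal/(K*mol), tabulated Delta G in kcal/mol)
# Signs are an assumption: values refer to duplex formation.
rows = {
    "AA/TT": (-9.1, -24.0, -1.9),
    "TA/AT": (-6.0, -16.9, -0.9),
    "CG/GC": (-11.9, -27.8, -3.6),
}

for name, (dh, ds, dg_table) in rows.items():
    dg_calc = stacking_free_energy(dh, ds)
    print(f"{name}: calculated {dg_calc:.2f}, tabulated {dg_table:.1f} kcal/mol")
```

The calculated and tabulated values agree to within about 0.1 kcal/mol, which is the rounding precision of the table.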

For any particular duplex DNA sequence, we can compute the thermodynamic parameters for duplex formation by simply combining the parameters for the component stacking interactions plus a nucleation term. Thus

$$\Delta G^0 = \Delta G^0_{nuc} + \sum \Delta G^0_s + g_{sym}$$

$$\Delta H^0 = \Delta H^0_{nuc} + \sum \Delta H^0_s$$

and similarly for the entropy, where the sums are taken over all of the stacking interactions in the duplex, and where $g_{sym} = 0.4$ kcal/mol if the two strands are identical; otherwise, $g_{sym} = 0$. The equilibrium constant for duplex formation is given by

$$K = k\, s_1 s_2 s_3 \cdots s_{n-1}$$

where $k$ is the equilibrium constant of nucleation, related to the $\Delta G^0_{nuc}$ by

$$\Delta G^0_{nuc} = -RT \ln k$$

and each $s_i$ is the microscopic equilibrium constant for a particular stacking reaction, related to the $\Delta G^0_s$ for that reaction by

$$\Delta G^0_s = -RT \ln s_i$$
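The equivalence between multiplying equilibrium constants and adding free energies can be illustrated with a short sketch. It assumes a non-self-complementary duplex (so the symmetry term is zero), and the free energy values passed in are hypothetical placeholders rather than measured parameters.

```python
from math import exp, prod

R = 1.987e-3   # gas constant in kcal/(K*mol)

def equilibrium_constant(dg: float, temperature: float = 298.0) -> float:
    """Equilibrium constant from a standard free energy: Delta G = -RT ln K."""
    return exp(-dg / (R * temperature))

def duplex_equilibrium_constant(dg_nuc: float, dg_stacks: list, temperature: float = 298.0) -> float:
    """K = k * s_1 * s_2 * ... * s_(n-1) for a duplex with n - 1 stacking interactions."""
    k = equilibrium_constant(dg_nuc, temperature)
    s_values = [equilibrium_constant(dg, temperature) for dg in dg_stacks]
    return k * prod(s_values)

# Hypothetical 6-bp duplex: unfavorable nucleation, five favorable stacks (kcal/mol).
dg_nuc = 5.0
dg_stacks = [-1.9, -1.5, -3.1, -1.6, -1.3]

k_product = duplex_equilibrium_constant(dg_nuc, dg_stacks)
k_from_sum = equilibrium_constant(dg_nuc + sum(dg_stacks))
print(k_product, k_from_sum)   # the two routes agree
```

Because the logarithm turns the product into a sum, each additional stacking interaction multiplies K by its own factor s_i, so duplex stability grows roughly exponentially with length.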

The key factor involved in predicting the stability of DNA (and RNA) duplexes is that all of these thermodynamic parameters have been measured experimentally. One takes advantage of the enormous power available to synthesize particular DNA sequences, combines complementary pairs of such sequences, and measures their extent of duplex formation as a function of temperature and concentration. For example, we can study the properties of a duplex such as $A_n \cdot T_n$ and compare these with $A_{n+1} \cdot T_{n+1}$. Since the only difference between these complexes is an extra ApA/TpT stacking interaction, the differences will yield the thermodynamic parameters for that interaction. Other sequences are more complex to handle, but this has all been accomplished.

It is helpful to choose a single standard temperature and set of environmental conditions for the tabulation of thermodynamic data: 298 K has been selected, and $\Delta G^0_s$'s at 298 K in 1 M NaCl are listed in Table 3.1. Enthalpy values for the stacking interactions can be obtained in two ways: either by direct calorimetric measurements or by examining the temperature dependence of the stacking interactions. $\Delta H^0_s$'s are also listed in Table 3.1. So are $\Delta S^0_s$'s, which can be calculated from the relationship $\Delta G = \Delta H - T \Delta S$. From these data, thermodynamic values at other temperatures can be estimated as shown in Box 3.3. The effects of salt are well understood and are described in detail elsewhere (Cantor and Schimmel, 1980).

The results shown in Table 3.1 make it clear that the effects of the DNA sequence on duplex stability are very large. The average $\Delta H^0_s$ is $-8$ kcal/mol; the range is $-5.6$ to $-11.9$ kcal/mol. The average $\Delta G^0_s$ is $-1.6$ kcal/mol with a range of $-0.9$ to $-3.6$ kcal/mol. Thus the DNA sequence must be considered explicitly in estimating duplex stabilities. The two additional parameters needed to do this concern the energetics of nucleation. These are relatively sequence independent, and we can use average values of $\Delta G^0_{nuc} = +5$ kcal/mol (except if no G–C pairs are present, then $+6$ kcal/mol should be used) and $\Delta H^0_{nuc} = 0$. For estimating the stability of perfectly paired duplexes, these nucleation parameters and the stacking energies in Table 3.1 are used. Table 3.2 shows typical results when calculated and experimentally measured $\Delta G^0$'s are compared. The agreement in almost all cases is excellent, and the few discrepancies seen are probably within the range of typical experimental errors. The approach described above has been generalized to predict the thermodynamic properties of triple helices, and presumably it will also serve for four-stranded DNA structures.

TABLE 3.1 Nearest-neighbor Stacking Interactions in Double-stranded DNA

Nearest-neighbor Thermodynamics

Interaction	$\Delta H^0$ (kcal/mol)	$\Delta S^0$ (cal/K·mol)	$\Delta G^0$ (kcal/mol)
AA/TT	-9.1	-24.0	-1.9
AT/TA	-8.6	-23.9	-1.5
TA/AT	-6.0	-16.9	-0.9
CA/GT	-5.8	-12.9	-1.9
GT/CA	-6.5	-17.3	-1.3
CT/GA	-7.8	-20.8	-1.6
GA/CT	-5.6	-13.5	-1.6
CG/GC	-11.9	-27.8	-3.6
GC/CG	-11.1	-26.7	-3.1
GG/CC	-11.0	-26.6	-3.1

Source: Adapted from Breslauer et al. (1986).
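The nearest-neighbor procedure can be sketched as a small calculator that sums the stacking free energies of Table 3.1 along a sequence and adds nucleation and symmetry terms. Several things here are assumptions and should be read as such: the tabulated magnitudes are entered with negative signs (duplex formation), the nucleation free energies of +5 kcal/mol (G–C pairs present) and +6 kcal/mol (A–T only) are taken to be the values of Breslauer et al. (1986), and all function names are illustrative.

```python
# Delta G of stacking at 298 K in kcal/mol, keyed by the 5'->3' step on one strand.
# Signs are an assumption: negative values refer to duplex formation.
STACK_DG = {
    "AA": -1.9, "AT": -1.5, "TA": -0.9, "CA": -1.9, "GT": -1.3,
    "CT": -1.6, "GA": -1.6, "CG": -3.6, "GC": -3.1, "GG": -3.1,
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Sequence of the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def stack_dg(step: str) -> float:
    """Look up a dinucleotide step; a step and its reverse complement
    describe the same stacked pair of base pairs (e.g. TT is AA/TT)."""
    if step in STACK_DG:
        return STACK_DG[step]
    return STACK_DG[reverse_complement(step)]

def duplex_dg(seq: str, nuc_gc: float = 5.0, nuc_at: float = 6.0, g_sym: float = 0.4) -> float:
    """Estimated Delta G (kcal/mol, 298 K) for forming a perfectly paired duplex."""
    seq = seq.upper()
    stacks = sum(stack_dg(seq[i:i + 2]) for i in range(len(seq) - 1))
    nucleation = nuc_gc if ("G" in seq or "C" in seq) else nuc_at
    symmetry = g_sym if seq == reverse_complement(seq) else 0.0
    return nucleation + stacks + symmetry

print(f"GGGCCC: {duplex_dg('GGGCCC'):.1f} kcal/mol")
print(f"20-mer: {duplex_dg('ATTGGCATAGGAGCCCATGG'):.1f} kcal/mol")
```

For the self-complementary hexamer GGGCCC the five stacks (GG, GG, GC, CC, CC) contribute 5 × (−3.1) kcal/mol, so with the assumed nucleation and symmetry terms the estimate is −15.5 + 5 + 0.4 = −10.1 kcal/mol.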
