Genomics: The Science and Technology Behind the Human Genome Project.
Charles R. Cantor, Cassandra L. Smith
Copyright © 1999 John Wiley & Sons, Inc.
ISBNs: 0-471-59908-5 (Hardback); 0-471-22056-6 (Electronic)

3 Analysis of DNA Sequences by Hybridization

BASIC REQUIREMENTS FOR SELECTIVITY AND SENSITIVITY

The haploid human genome is $3 \times 10^9$ base pairs, and a typical human cell, as described in the last chapter, is somewhere between diploid and tetraploid in DNA content. Thus each cell has about $10^{10}$ base pairs of DNA. A single base pair has a mass of about 660 Da. Hence the weight of DNA in a single cell can be calculated as $10^{10} \times 660 / (6 \times 10^{23}) \approx 10^{-11}$ g, or 10 pg. Ideally we would like to be able to do analyses on single cells. This means that if only a small portion of the genome is the target for analysis, far less than 10 pg of material will need to be detected. By current methodology we are in fact able to determine the presence or absence of almost any 20-bp DNA sequence within a single cell, such as the sequence ATTGGCATAGGAGCCCATGG. This analysis takes place at the level of single molecules. Two requirements must be met to perform such an exquisitely demanding analysis. There must be sufficient experimental sensitivity to detect the presence of the sequence. This sensitivity is provided by either chemical or biological amplification procedures or by a combination of these procedures. There must also be sufficient experimental selectivity to discriminate between the desired, true target sequence and all other similar sequences, which may differ from the target by as little as one base. That specificity lies with the intrinsic selectivity of DNA base pairing itself.

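The mass estimate above can be checked with a few lines of Python. This is only a sketch: the constant and function names are illustrative choices, not part of the original text, and Avogadro's number is taken as 6.022 × 10^23 rather than the rounded 6 × 10^23 used above.

```python
AVOGADRO = 6.022e23      # molecules per mole
BP_MASS_DA = 660.0       # approximate mass of one base pair in daltons (g/mol)

def dna_mass_grams(base_pairs):
    """Mass in grams of the given number of DNA base pairs."""
    return base_pairs * BP_MASS_DA / AVOGADRO

# Whole-cell DNA content (about 10^10 bp), converted from grams to picograms.
cell_mass_pg = dna_mass_grams(1e10) * 1e12
# A single copy of a 20-bp target sequence, in grams.
target_mass_g = dna_mass_grams(20)

print(f"DNA per cell: {cell_mass_pg:.1f} pg")
print(f"One 20-bp target: {target_mass_g:.1e} g")
```

The second number makes the sensitivity requirement concrete: a single copy of a 20-bp target weighs many orders of magnitude less than the roughly 10 pg of DNA in the cell.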
The target of a 20-bp DNA sequence is not picked casually. Twenty bp is just about the smallest DNA length that has a high probability, a priori, of being found in a single copy in the human genome. This can be deduced from simple binomial statistics (Box 3.1) as follows.

For simplicity, pretend that the human genome contains equal amounts of the four bases, A, T, C, and G, and that the occurrences of the bases are random. (These constraints will be relaxed elsewhere in the book when some of the unusual statistical properties of natural DNAs need to be considered explicitly.) Then the expected frequency of occurrence of any particular stretch of DNA sequence, such as $n$ bases beginning as ATCCG . . ., is $4^{-n}$. The average number of occurrences of this particular sequence in the haploid human genome is $3 \times 10^9 \times 4^{-n}$. For a sequence of 16 bases, $n = 16$, the average occurrence is $3 \times 10^9 \times 4^{-16}$, which is about 1. Thus such a length will tend to be seen as often as not by chance; it is not long enough to be a unique identifier. There is a reasonable chance that a sequence 16 bases long will occur several times in different places in the genome. Choosing $n = 20$ gives an average occurrence of about 0.003 (0.3%). Such sequences will almost always be unique genome landmarks. One corollary of this simple exercise is that it is futile to look at random for the occurrence of particular 20-mers in the sequence of a higher organism unless there is good a priori reason for suspecting the presence of these sequences. This means that sequences of length 20 or more can be used as unique identifiers (see Box 3.2).

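The expected-occurrence estimate is easy to reproduce. The sketch below assumes, as the text does, equal and random base frequencies; the function name is an illustrative choice.

```python
GENOME_BP = 3e9   # haploid human genome size used in the text

def expected_occurrences(n, genome_bp=GENOME_BP):
    """Average number of times a given n-base sequence occurs in a random genome."""
    return genome_bp * 4.0 ** (-n)

for n in (12, 16, 20):
    print(n, expected_occurrences(n))
```

For n = 16 the result is of order 1, and for n = 20 it falls to roughly 0.003, in line with the figures quoted above.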
BOX 3.1
BINOMIAL STATISTICS

Binomial statistics describe the probable outcome of events like coin flipping, events that depend on a single random variable. While a normal coin has a 50% chance of heads or tails with each flip, we will consider here the more general case of a weighted coin with two possible outcomes with probabilities $p$ (heads) and $q$ (tails). Since there are no other possible outcomes, $p + q = 1$. If $N$ successive flips are executed, and the outcome is a particular string, such as hhhhttthhh, the chance of this particular outcome is $p^n q^{N-n}$, where $n$ is the number of times heads was observed. Note that all strings with the same numbers of heads and tails will have the same a priori probability, since in binomial statistics each event does not affect the probability of subsequent events. Later in this book we will deal with cases where this extremely simple model does not hold. If we care only about the chance of an outcome with $n$ heads and $N - n$ tails, without regard to sequence, the number of such events is $N!/[n!\,(N-n)!]$, and so the fraction of times this outcome will be seen is $p^n q^{N-n}\,N!/[n!\,(N-n)!]$.

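As a small illustration of the two expressions in this paragraph, the following sketch computes the chance of one particular string and the chance of any string with the same number of heads. The function names are illustrative only.

```python
from math import comb

def string_probability(n_heads, n_flips, p):
    """Chance of one particular ordered string: p^n * q^(N-n)."""
    q = 1.0 - p
    return p ** n_heads * q ** (n_flips - n_heads)

def binomial_probability(n_heads, n_flips, p):
    """Chance of n heads in N flips, in any order: N!/(n!(N-n)!) * p^n * q^(N-n)."""
    return comb(n_flips, n_heads) * string_probability(n_heads, n_flips, p)

# The string hhhhttthhh has N = 10 flips and n = 7 heads.
print(string_probability(7, 10, 0.5))
print(binomial_probability(7, 10, 0.5))
```

Summing `binomial_probability` over all n from 0 to N gives 1, as it must for a complete set of outcomes.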
A simple binomial model can also be used to estimate the frequency of occurrence of particular DNA base sequences. Here there are four possible outcomes (not quite as complex as dice throwing, where six possible outcomes occur). For a particular string with $n_A$ A's, $n_C$ C's, $n_G$ G's, and $n_T$ T's, and a base composition of $X_A$, $X_C$, $X_G$, and $X_T$, the chance occurrence of that string is $X_A^{n_A} X_C^{n_C} X_G^{n_G} X_T^{n_T}$. The number of possible strings with a particular base composition is $N!/(n_A!\,n_C!\,n_G!\,n_T!)$, and by combining this with the previous term, the probability of a string with a particular base composition can easily be computed. Incidentally, the number of possible strings of length $N$ is $4^N$, while the number of different base compositions of this length is $(N+3)!/(N!\,3!)$.

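The four-outcome expressions can be sketched in the same way. The helper names below are illustrative; the base frequencies passed in play the role of $X_A$, $X_C$, $X_G$, and $X_T$.

```python
from collections import Counter
from math import comb, factorial, prod

def dna_string_probability(seq, freqs):
    """Chance of one particular string: X_A^nA * X_C^nC * X_G^nG * X_T^nT."""
    return prod(freqs[base] for base in seq)

def strings_with_composition(counts):
    """Number of distinct strings with the given base counts: N!/(nA! nC! nG! nT!)."""
    total = sum(counts.values())
    denominator = prod(factorial(n) for n in counts.values())
    return factorial(total) // denominator

def composition_probability(seq, freqs):
    """Chance of any string that shares the base composition of seq."""
    return strings_with_composition(Counter(seq)) * dna_string_probability(seq, freqs)

def number_of_compositions(length):
    """Number of different base compositions of a given length: (N+3)!/(N! 3!)."""
    return comb(length + 3, 3)

uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(composition_probability("ATCCG", uniform))
print(number_of_compositions(20))
```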
The same statistical models can be used to estimate the chance that two people will share the same DNA sequences. Such estimates are very useful in DNA-based identity testing. Here we consider just the simple case of two-allele polymorphisms. In a particular place in the genome, suppose that a fraction of all individuals, $f_1$, have one base, while the remainder, $f_2$, have another. The chance that two individuals share the same allele is $f_1^2 + f_2^2 = g^2$. If a set of $M$ two-allele polymorphisms ($i, j, k, \ldots$) is considered simultaneously, the chance that two individuals are identical for all of them is $g_i^2 g_j^2 g_k^2 \cdots$. By choosing $M$ sufficiently large, we can clearly make the overall chance of a coincidental match negligibly small, unless the individuals in question are one and the same. However, two caveats apply to this reasoning. First, related individuals will show a much higher degree of similarity than predicted by this model. Monozygotic twins, in principle, should share an identical set of alleles at the germ-line level. Second, the proper allele frequencies to use will depend on the racial, ethnic, and other genetic characteristics of the individuals in question. Thus it may not always be easy to select appropriate values. These difficulties notwithstanding, DNA testing offers a very powerful approach to identification of individuals, paternity testing, and a variety of forensic applications.

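A short sketch shows how quickly the chance of a coincidental match falls as more polymorphisms are combined. The allele frequencies used here are purely illustrative, not measured values, and the function names are not from the original text.

```python
from math import prod

def match_probability(f1):
    """Chance that two individuals share the allele at one two-allele site: f1^2 + f2^2."""
    f2 = 1.0 - f1
    return f1 ** 2 + f2 ** 2

def overall_match_probability(allele_fractions):
    """Chance of identity at every site, treating the sites as independent."""
    return prod(match_probability(f1) for f1 in allele_fractions)

# Illustrative case: M sites, each with the two alleles equally common.
for m in (10, 20, 30):
    print(m, overall_match_probability([0.5] * m))
```

With equally common alleles each site contributes a factor of 0.5, so thirty such sites already bring the chance of a coincidental match to about one in a billion. Unequal allele frequencies give a larger per-site factor and therefore require more sites.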
BOX 3.2
DNA SEQUENCES AS UNIQUE SAMPLE IDENTIFIERS

The following table shows the number of different sequences of length $n$ and compares these values to the sizes of various genomes. Since the number of short substrings in a genome is virtually the same as the genome size, it is easy to determine the lengths of short sequences that will occur on average only once per genome. Sequences a few bases longer than these lengths will, for all practical purposes, occur either once or not at all, and hence they can serve as unique identifiers.

LENGTH   NUMBER OF SEQUENCES   GENOME, GENOME SIZE (BP)
  8      6.55 × 10^4           Bacteriophage lambda, 5 × 10^4
  9      2.62 × 10^5
 10      1.05 × 10^6
 11      4.19 × 10^6           E. coli, 4 × 10^6
 12      1.68 × 10^7           S. cerevisiae, 1.3 × 10^7
 13      6.71 × 10^7
 14      2.68 × 10^8           All mammalian mRNAs, 2 × 10^8
 15      1.07 × 10^9
 16      4.29 × 10^9           Human haploid genome, 3 × 10^9
 17      1.72 × 10^10
 18      6.87 × 10^10
 19      2.75 × 10^11
 20      1.10 × 10^12

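The alignment of genome sizes with sequence lengths in the table can be reproduced with a short sketch. The genome sizes are those listed in the table; the function name is an illustrative choice.

```python
GENOME_SIZES_BP = {
    "Bacteriophage lambda": 5e4,
    "E. coli": 4e6,
    "S. cerevisiae": 1.3e7,
    "All mammalian mRNAs": 2e8,
    "Human haploid genome": 3e9,
}

def shortest_length_matching(genome_bp):
    """Smallest n for which the number of possible n-mers, 4^n, reaches the genome size."""
    n = 1
    while 4 ** n < genome_bp:
        n += 1
    return n

for name, size in GENOME_SIZES_BP.items():
    n = shortest_length_matching(size)
    print(f"{name}: n = {n}, 4^n = {4 ** n:.2e}")
```

Sequences a few bases longer than the length returned for a given genome are the ones that, as the text notes, can be expected to occur once or not at all.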
DETECTION OF SPECIFIC DNA SEQUENCES

DNA molecules themselves are the perfect set of reagents to identify particular DNA sequences. This is because of the strong, sequence-specific base pairing between complementary DNA strands. Here one strand of DNA will be considered to be a target, and the other, a probe. (If both are not initially available in a single-stranded form, there are many ways to circumvent this complication.) The analysis for a particular DNA sequence consists in asking whether a probe can find its target in the sample of interest. If the probe does so, a double-stranded DNA complex will be formed. This process is called hybridization, and all we have to do is to discriminate between this complex and the initial single-stranded starting materials (Fig. 3.1a).

The earliest hybridization experiments were carried out in homogeneous solutions. Hybridization was allowed to proceed for a fixed time period, and then a physical separation was performed to capture double-stranded material and discard single strands. Hydroxyapatite chromatography was used to do this discrimination because conditions could be found in which double-stranded DNA bound to a column of hydroxyapatite, while single strands were eluted (Fig. 3.1b). The amount of double-stranded DNA could be quantitated by using a radioisotopic label on the probe or the target, or by measuring the bulk amount of DNA captured or eluted. This method is still used today in select cases, but it is very tedious because only a few samples can be conveniently analyzed simultaneously.

Figure 3.1 Detecting the formation of specific double-stranded DNA sequences. (a) The problem is to tell whether the sequence of interest is present in single-stranded (s.s.) or double-stranded (d.s.) form. (b) Physical purification by methods such as hydroxyapatite chromatography. (c) Physical purification by the use of one strand attached to an immobilized phase. A label is used to detect hybridization of the single-stranded probe.

Modern hybridization protocols immobilize one of the two DNAs on a solid support (Fig. 3.1c). The immobilized phase can be either the probe or the target. The complementary sample is labeled with a radioisotope, a fluorescent dye, or some other specific moiety that later allows a signal to be generated in situ. The amount of color, or radioactivity, on the immobilized phase is measured after hybridization for a fixed period and subsequent washing of the solid support to remove adsorbed, but nonhybridized, material. As will be shown later, an advantage of this method is that many samples can be processed in parallel. However, there are also some disadvantages that will become apparent as we proceed.

EQUILIBRIA BETWEEN DNA DOUBLE AND SINGLE STRANDS

The fraction of single- and double-stranded DNA in solution can be monitored by various spectroscopic properties that effectively average over different DNA sequences. Such measurements allow us to view the overall reaction of DNA single strands.
Ultraviolet absorbance, circular dichroism, or the fluorescence of dyes that bind selectively to duplex DNA can all be used for this purpose. If the amount of double-stranded (duplex) DNA in a sample is monitored as a function of temperature, the results typically obtained are shown in Figure 3.2. The DNA is transformed from double strands at low temperature, rather abruptly at some critical temperature, to single strands. The process, for long DNA, is usually so cooperative that it can be likened to the melting of a solid, and the transition is called DNA melting. The midpoint of the transition for a particular DNA sample is called the melting temperature, $T_m$. For DNAs that are very rich in the bases G + C, this can be 30 or 40°C higher than for extremely (A + T)-rich samples. It is such spectroscopic observations, on large numbers of small DNA duplexes, that have allowed us to achieve a quantitative understanding of most aspects of DNA melting.

Figure 3.2 Typical melting behavior for DNA as a function of average base composition. Shown is the fraction of single-stranded molecules as a function of temperature. The midpoint of the transition is the melting temperature, $T_m$.

The goal of this section is to define conditions that allow sequence-specific analyses of DNA using DNA hybridization. Specificity means the ratio of perfectly base-paired duplexes to duplexes with imperfections or mismatches. Thus high specificity means that the conditions maximize the amount of double-stranded perfectly base-paired complex and minimize the amount of other species. Key variables are the concentrations of the DNA probes and targets that are used, the temperature, and the salt concentration (ionic strength).

The melting temperature of long DNA is concentration independent. This arises from the way in which $T_m$ is defined. Large DNA melts in patches, as shown in Figure 3.3a. At $T_m$, the temperature at which half the DNA is melted, (A + T)-rich zones are melted, while (G + C)-rich zones are still in duplexes. No net strand separation will have taken place because no duplexes will have been completely melted. Thus there can be no concentration dependence to $T_m$.

In contrast, the melting of short DNA duplexes, DNAs of 20 base pairs or less, is effectively all or none (Fig. 3.3b). In this case the concentration of intermediate species, partly single-stranded and partly duplex, is sufficiently small that it can be ignored. The reaction of two short complementary DNA strands, $A$ and $B$, may be written as

$$A + B \rightleftharpoons AB$$

The equilibrium (association) constant for this reaction is defined as

$$K_a = \frac{[AB]}{[A][B]}$$

Figure 3.3 Melting behavior of DNA. (a) Structure of a typical high molecular weight DNA at its melting temperature. (b) Status of a short DNA sample at its melting temperature.

Most experiments will start with an equal concentration of the two strands (or a duplex melted to give the two strands). The initial total concentration of strands (whatever their form) is $C_T$. If, for simplicity, all of the strands are initially single stranded, their concentrations are

$$[A]_0 = [B]_0 = \frac{C_T}{2}$$

At $T_m$ half the strands must be in duplex. Hence the concentrations of the different species at $T_m$ will be

$$[AB] = [A] = [B] = \frac{C_T}{4}$$

The equilibrium constant at $T_m$ is

$$K_a = \frac{[AB]}{[A][B]} = \frac{C_T/4}{(C_T/4)^2} = \frac{4}{C_T}$$

Do not be misled by this expression into thinking that the equilibrium constant is concentration dependent. It is the $T_m$ that is concentration dependent. The equilibrium constant is temperature dependent. The above expression indicates the value seen for the equilibrium constant, $4/C_T$, at the temperature $T_m$. This particular $T_m$ occurs when the equilibrium is observed at the total strand concentration $C_T$.

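The distinction between $K_a$ and $T_m$ can be made concrete with a small mass-balance sketch for two strands at equal concentration. Writing $\theta$ for the fraction of strands in duplex, $[AB] = \theta C_T/2$ and $[A] = [B] = (1 - \theta) C_T/2$, so that $K_a = 2\theta / [C_T (1 - \theta)^2]$. The function below, whose name is an illustrative choice, solves this quadratic for $\theta$.

```python
from math import sqrt

def fraction_duplex(K_a, C_T):
    """Fraction of strands in duplex for A + B <-> AB with [A]0 = [B]0 = C_T/2.

    Solves K_a = 2*theta / (C_T * (1 - theta)**2), taking the root below 1.
    """
    a = K_a * C_T
    return ((a + 1.0) - sqrt(2.0 * a + 1.0)) / a

C_T = 1e-6                 # total strand concentration, mol/L (illustrative)
K_at_Tm = 4.0 / C_T        # value of K_a at the melting temperature
print(fraction_duplex(K_at_Tm, C_T))         # half the strands are duplex
print(fraction_duplex(K_at_Tm, 100 * C_T))   # same K_a (same temperature), more strands
```

At a fixed temperature, and hence fixed $K_a$, raising the strand concentration pushes the duplex fraction above one half, which is another way of saying that the melting temperature has moved upward.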
A special case must be considered in which hybridization occurs between two short single strands of identical sequence, $C$. Such strands are self-complementary. An example is GGGCCC, which can base pair with itself. In this case the reaction can be written

$$2C \rightleftharpoons C_2$$

The equilibrium constant, $K_a$, becomes

$$K_a = \frac{[C_2]}{[C]^2}$$

At the melting temperature half of the strands must be duplex. Hence

$$[C_2] = \frac{[C]}{2} = \frac{C_T}{4}$$

where $C_T$ as before is the total concentration of strands. Thus we can evaluate the equilibrium expression at $T_m$ as

$$K_a = \frac{C_T/4}{(C_T/2)^2} = \frac{1}{C_T}$$

As before, what this really means is that $T_m$ is concentration dependent. In both cases simple mass action considerations ensure that $T_m$ will increase as the concentration is raised.

The final case we need to consider is when one strand is in vast excess over the other instead of both being at equal concentrations. This is frequently the case when a trace amount of probe is used to interrogate a concentrated sample or, alternatively, when a large amount of probe is used to interrogate a very minute sample. The formation of duplex can be written as before as $A + B \rightleftharpoons AB$, but now the initial starting conditions are

$$[B]_0 \gg [A]_0$$

Effectively the total strand concentration, $C_T$, is thus simply the initial concentration of the excess strand, $[B]_0$. At $T_m$,

$$[AB] = [A]$$

Thus the equilibrium expression at $T_m$ becomes

$$K_a = \frac{[AB]}{[A][B]} = \frac{1}{C_T}$$

The melting temperature is still concentration dependent.

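The self-complementary and excess-strand cases can be checked in the same spirit. The sketch below uses mass balance only; the function names are illustrative. For self-complementary strands, with $\theta$ the fraction of strands in duplex, $[C_2] = \theta C_T/2$ and $[C] = (1 - \theta) C_T$. For the excess case, $[B]$ is taken as constant at $[B]_0$.

```python
from math import sqrt

def fraction_duplex_self(K_a, C_T):
    """Duplex fraction for 2C <-> C2: solves K_a = theta / (2*C_T*(1 - theta)**2)."""
    a = K_a * C_T
    return ((4.0 * a + 1.0) - sqrt(8.0 * a + 1.0)) / (4.0 * a)

def fraction_bound_excess(K_a, B0):
    """Fraction of the limiting strand A in duplex when [B] stays at B0."""
    return K_a * B0 / (1.0 + K_a * B0)

C_T = 1e-6   # mol/L, illustrative
print(fraction_duplex_self(1.0 / C_T, C_T))    # K_a = 1/C_T at T_m
print(fraction_bound_excess(1.0 / C_T, C_T))   # K_a = 1/[B]0 at T_m
```

Both functions return one half when $K_a = 1/C_T$, consistent with the expressions derived for these two cases.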
The importance of our ability to drive duplex formation should not be underestimated. Figure 3.4 illustrates the practical utility of this ability. It shows the concentration dependence of the melting temperature of two different duplexes. We can characterize each reaction by its melting temperature and can attempt to extract thermodynamic parameters like the enthalpy change, $\Delta H$, and the free energy change, $\Delta G$, for each reaction. However, with the $T_m$'s different for the two reactions, if we do this at $T_m$, the thermodynamic parameters derived will refer to reactions at two different temperatures. There will be no way, in general, to compare these parameters, since they are expected to be intrinsically temperature dependent. The concentration dependence of melting saves us from this dilemma. By varying the concentration, we can produce conditions where the two duplexes have the same $T_m$. Now thermodynamic parameters derived from each are comparable. We can, if we wish, choose any temperature for this comparison. In practice, 298 K has been chosen for this purpose.

Figure 3.4 The dependence of the melting temperature, $T_m$, of two short duplexes on the total concentration of DNA strands, $C_T$.

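The idea of bringing two duplexes to a common $T_m$ can be sketched numerically. The relation used here is not written out in the text: it follows from combining $\Delta G^0 = \Delta H^0 - T\Delta S^0 = -RT \ln K_a$ with $K_a = 4/C_T$ at $T_m$ for two different strands at equal concentration, and it assumes that $\Delta H^0$ and $\Delta S^0$ do not vary with temperature. The two parameter sets are hypothetical values chosen only for illustration.

```python
from math import exp, log

R = 1.987e-3   # gas constant in kcal/(K*mol)

def melting_temperature(dH, dS, C_T):
    """T_m in kelvin for A + B <-> AB at equal strand concentrations.

    dH in kcal/mol, dS in kcal/(K*mol), C_T in mol/L.
    Uses dH - T*dS = -R*T*ln(4/C_T), i.e. T_m = dH / (dS + R*ln(C_T/4)).
    """
    return dH / (dS + R * log(C_T / 4.0))

def concentration_for_tm(dH, dS, T_m):
    """Total strand concentration at which the duplex melts at T_m."""
    return 4.0 * exp((dH / T_m - dS) / R)

# Hypothetical duplexes (not taken from the text).
duplex_1 = (-60.0, -0.160)
duplex_2 = (-70.0, -0.190)

tm_1 = melting_temperature(*duplex_1, 1e-6)
c_2 = concentration_for_tm(*duplex_2, tm_1)
print(tm_1, c_2, melting_temperature(*duplex_2, c_2))
```

Here the concentration of the second duplex is adjusted until it melts at the same temperature as the first, which is the procedure the paragraph above describes in words.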
THERMODYNAMICS OF THE MELTING OF SHORT DUPLEXES

The model we will use to analyze the melting of short DNA double helices is shown in Figure 3.5. The two strands come together, in a nucleation step, to form a single base pair. Double strands can form by stacking of adjacent base pairs above or below the initial nucleus until a full duplex has zippered up. It does not matter, in our treatment, where the initial nucleus forms. We also need not consider any intermediate steps beyond the nucleus and the fully duplex state. In that state, for a duplex of $n$ base pairs, there will be $n - 1$ stacking interactions (Fig. 3.6). Each interaction reflects the energetics of stacking two adjacent base pairs on top of each other. There are ten distinct interactions of this kind, such as ApG/CpT, ApA/TpT, and so on (where the slash indicates two complementary antiparallel strands). Because their energetics are very different, we must consider the DNA sequence explicitly in calculating the thermodynamics of DNA melting.

Figure 3.5 A model for the mechanism of the formation of duplex DNA from separated complementary single strands.

Figure 3.6 Stacking interactions in double-stranded DNA.

For each of the ten possible stacking interactions, we can define a standard free energy of stacking, $\Delta G_s^0$, a standard enthalpy of stacking, $\Delta H_s^0$, and a standard entropy of stacking, $\Delta S_s^0$. These quantities will be related at a particular temperature by the expression

$$\Delta G_s^0 = \Delta H_s^0 - T\,\Delta S_s^0$$

For any particular duplex DNA sequence, we can compute the thermodynamic parameters for duplex formation by simply combining the parameters for the component stacking interactions plus a nucleation term. Thus

$$\Delta G^0 = \Delta G_{nuc}^0 + \sum \Delta G_s^0 + \Delta g_{sym}$$

$$\Delta H^0 = \Delta H_{nuc}^0 + \sum \Delta H_s^0$$

and similarly for the entropy, where the sums are taken over all of the stacking interactions in the duplex, and where $\Delta g_{sym} = 0.4$ kcal/mol if the two strands are identical; otherwise, $\Delta g_{sym} = 0$. The equilibrium constant for duplex formation is given by

$$K = k\,s_1 s_2 s_3 \cdots s_{n-1}$$

where $k$ is the equilibrium constant of nucleation, related to $\Delta G_{nuc}^0$ by

$$\Delta G_{nuc}^0 = -RT \ln k$$

and each $s_i$ is the microscopic equilibrium constant for a particular stacking reaction, related to the $\Delta G_i^0$ for that reaction by

$$\Delta G_i^0 = -RT \ln s_i$$

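Because each equilibrium constant is the exponential of a free energy, the product $k\,s_1 s_2 \cdots s_{n-1}$ corresponds to a sum of free energy terms. The sketch below illustrates this equivalence. The function names are illustrative, and the numerical values (a nucleation free energy of +5 kcal/mol and seven stacking terms of -1.9 kcal/mol each) are used only as an example, with signs chosen for the direction of duplex formation.

```python
from math import exp, prod

R = 1.987e-3   # gas constant in kcal/(K*mol)

def equilibrium_constant(dG, T=298.0):
    """K = exp(-dG / RT), with dG in kcal/mol and T in kelvin."""
    return exp(-dG / (R * T))

def duplex_equilibrium_constant(dG_nuc, stacking_dGs, T=298.0):
    """K = k * s_1 * s_2 * ... for one nucleation step and a list of stacking steps."""
    k = equilibrium_constant(dG_nuc, T)
    s = [equilibrium_constant(dG, T) for dG in stacking_dGs]
    return k * prod(s)

stacks = [-1.9] * 7   # illustrative: seven identical stacking interactions
K_product = duplex_equilibrium_constant(5.0, stacks)
K_from_sum = equilibrium_constant(5.0 + sum(stacks))
print(K_product, K_from_sum)
```

The two printed values agree to within floating-point rounding, which is the numerical counterpart of adding the free energies in the expression for $\Delta G^0$.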
The key factor involved in predicting the stability of DNA (and RNA) duplexes is that all of these thermodynamic parameters have been measured experimentally. One takes advantage of the enormous power available to synthesize particular DNA sequences, combines complementary pairs of such sequences, and measures their extent of duplex formation as a function of temperature and concentration. For example, we can study the properties of $A_8/T_8$ and compare these with $A_9/T_9$. Since the only difference between these complexes is an extra ApA/TpT stacking interaction, the differences will yield the thermodynamic parameters for that interaction. Other sequences are more complex to handle, but this has all been accomplished.

It is helpful to choose a single standard temperature and set of environmental conditions for the tabulation of thermodynamic data: 298 K has been selected, and $\Delta G_s^0$'s at 298 K in 1 M NaCl are listed in Table 3.1. Enthalpy values for the stacking interactions can be obtained in two ways: either by direct calorimetric measurements or by examining the temperature dependence of the stacking interactions. $\Delta H_s^0$'s are also listed in Table 3.1. So are $\Delta S_s^0$'s, which can be calculated from the relationship $\Delta G_s^0 = \Delta H_s^0 - T\,\Delta S_s^0$. From these data, thermodynamic values at other temperatures can be estimated as shown in Box 3.3. The effects of salt are well understood and are described in detail elsewhere (Cantor and Schimmel, 1980).

The results shown in Table 3.1 make it clear that the effects of the DNA sequence on duplex stability are very large. The average $\Delta H_s^0$ is -8 kcal/mol; the range is -5.6 to -11.9 kcal/mol. The average $\Delta G_s^0$ is -1.6 kcal/mol with a range of -0.9 to -3.6 kcal/mol. Thus the DNA sequence must be considered explicitly in estimating duplex stabilities. The two additional parameters needed to do this concern the energetics of nucleation. These are relatively sequence independent, and we can use average values of $\Delta G_{nuc}^0 = +5$ kcal/mol (except if no G–C pairs are present, then +6 kcal/mol should be used) and $\Delta H_{nuc}^0 = 0$.

For estimating the stability of perfectly paired duplexes, these nucleation parameters and the stacking energies in Table 3.1 are used. Table 3.2 shows typical results when calculated and experimentally measured $\Delta G^0$'s are compared. The agreement in almost all cases is excellent, and the few discrepancies seen are probably within the range of typical experimental errors. The approach described above has been generalized to predict the thermodynamic properties of triple helices, and presumably it will also serve for four-stranded DNA structures.

TABLE 3.1 Nearest-neighbor Stacking Interactions in Double-stranded DNA

Nearest-neighbor Thermodynamics

Interaction   ΔH° (kcal/mol)   ΔS° (cal/K·mol)   ΔG° (kcal/mol)
AA/TT          -9.1             -24.0             -1.9
AT/TA          -8.6             -23.9             -1.5
TA/AT          -6.0             -16.9             -0.9
CA/GT          -5.8             -12.9             -1.9
GT/CA          -6.5             -17.3             -1.3
CT/GA          -7.8             -20.8             -1.6
GA/CT          -5.6             -13.5             -1.6
CG/GC         -11.9             -27.8             -3.6
GC/CG         -11.1             -26.7             -3.1
GG/CC         -11.0             -26.6             -3.1

Source: Adapted from Breslauer et al. (1986).
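The nearest-neighbor procedure described in this section can be sketched in a few lines. This is an illustration under stated assumptions, not a definitive implementation: the stacking free energies are the Table 3.1 magnitudes with negative signs (duplex formation taken as favorable), the nucleation term is +5 kcal/mol (+6 kcal/mol if no G–C pairs are present), and the symmetry term is +0.4 kcal/mol for self-complementary strands. The function names and the lookup by reverse complement are choices made here for convenience.

```python
# Stacking free energies at 298 K in kcal/mol, keyed by the 5'->3' step on one
# strand (AA stands for AA/TT, and so on). Signs are an assumption: negative
# values are taken to favor duplex formation.
STACK_DG = {
    "AA": -1.9, "AT": -1.5, "TA": -0.9, "CA": -1.9, "GT": -1.3,
    "CT": -1.6, "GA": -1.6, "CG": -3.6, "GC": -3.1, "GG": -3.1,
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Sequence of the complementary strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def stacking_dg(step):
    """Free energy of one dinucleotide step; e.g. TT is the same stack as AA/TT."""
    if step in STACK_DG:
        return STACK_DG[step]
    return STACK_DG[reverse_complement(step)]

def duplex_dg(seq):
    """Estimated free energy of duplex formation (kcal/mol) for seq and its complement."""
    seq = seq.upper()
    nucleation = 5.0 if ("G" in seq or "C" in seq) else 6.0
    symmetry = 0.4 if seq == reverse_complement(seq) else 0.0
    stacking = sum(stacking_dg(seq[i:i + 2]) for i in range(len(seq) - 1))
    return nucleation + stacking + symmetry

print(duplex_dg("GGGCCC"))                             # self-complementary example
print(duplex_dg("A" * 9) - duplex_dg("A" * 8))         # one extra AA/TT stack
```

The second line of output reproduces the logic of the $A_8/T_8$ versus $A_9/T_9$ comparison: the two duplexes differ by exactly one AA/TT stacking term.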