
BASIC DNA BIOLOGY


BOX 1.5 (Continued)

 

One function of nick translation is to degrade primers used at earlier stages of DNA synthesis. Frequently these primers are RNA molecules. Thus the nick-translation (5′-exonuclease) activity ensures that no RNA segments remain in the finished DNA. This activity is also used in the repair of DNA damaged by radiation or chemicals.

The final DNA polymerase activity commonly encountered is strand displacement. This is observed in some mechanisms of DNA replication.

In most DNA replication, however, a separate enzyme, DNA helicase, is used to melt the double helix (separate the base-paired strands) prior to chain extension. Some DNA polymerases also have a terminal transferase activity. For example, Taq polymerase usually adds a single nontemplated A onto the 3′-ends of the strands that it has synthesized.

 

 

 

 

 

 

 

 

 

 

A large library of enzymes exists for manipulating nucleic acids. Although several enzymes may modify nucleic acids in the same or a similar manner, differences in catalytic activity or in protein structure may determine success or failure in a particular application. Hence choosing a specific enzyme for a novel application may require intimate knowledge of how the enzymes catalyzing the same reaction differ. Given the sometimes unpredictable behavior of enzymes, empirical testing of several similar enzymes may be required.

 

 

 

 

 

 

 

More details are given here about one particularly well-studied enzyme. DNA polymerase I, isolated from Escherichia coli by Arthur Kornberg in 1956, established much of the nomenclature used with enzymes that act on nucleic acids. This enzyme was one of the first enzymes involved in macromolecular synthesis to be isolated, and it also displays multiple catalytic activities. DNA polymerase I replicates DNA, but in vivo it is mostly a DNA repair enzyme rather than the major DNA replication enzyme. This enzyme requires a single-stranded DNA template to provide instructions on which DNA sequence to make, a short oligonucleotide primer with a free 3′-OH terminus to specify where synthesis should begin, activated precursors (deoxynucleoside triphosphates), and a divalent cation such as Mg²⁺ (typically supplied as MgCl₂). The primer oligonucleotide is extended at its 3′-end by the addition, in the 5′- to 3′-direction, of mononucleotides that are complementary to the opposing bases on the template DNA. The activity of DNA polymerase I is distributive rather than processive, since the enzyme falls off the template after incorporating a few bases.
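
The template-directed extension just described can be sketched in Python. This is a minimal illustration under stated assumptions, not a model of the real enzyme: sequences are strings of A/C/G/T, the template is written 3′→5′ from left to right so that the new strand grows 5′→3′ in the same direction, and `bases_per_binding` is a hypothetical parameter standing in for the "few bases" a distributive enzyme adds before it falls off.

```python
# Watson-Crick pairing partners for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_3to5: str, primer_5to3: str, bases_per_binding: int = 3):
    """Extend a primer 5'->3' along a template; return (product, binding_events).

    Assumption: the template is written 3'->5' left to right and the primer is
    annealed to its leftmost bases, so primer position i pairs with template
    position i. bases_per_binding is a hypothetical stand-in for distributive
    synthesis (the enzyme adds a few bases, then dissociates).
    """
    if bases_per_binding < 1:
        raise ValueError("bases_per_binding must be at least 1")
    if len(primer_5to3) > len(template_3to5):
        raise ValueError("primer is longer than the template")
    for i, base in enumerate(primer_5to3):
        if COMPLEMENT[template_3to5[i]] != base:
            raise ValueError(f"primer is not complementary at position {i}")

    product = list(primer_5to3)
    pos = len(primer_5to3)          # next template base to copy
    binding_events = 0
    while pos < len(template_3to5):
        binding_events += 1         # enzyme (re)binds at the free 3'-OH end
        for _ in range(bases_per_binding):
            if pos == len(template_3to5):
                break
            product.append(COMPLEMENT[template_3to5[pos]])  # add complementary base
            pos += 1
        # the enzyme dissociates here, as in distributive synthesis
    return "".join(product), binding_events


product, events = extend_primer("TACGGATCCA", "ATG", bases_per_binding=3)
print(product, events)  # ATGCCTAGGT 3
```

With a primer of three bases on a ten-base template, the seven remaining bases are copied in three separate binding events of at most three bases each.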


DNA CHEMISTRY AND BIOLOGY

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


 

 

 

 

 

 

Besides the polymerase activity, DNA polymerase I has two activities that degrade DNA. These activities are exonucleases because they degrade DNA from either the 5′- or the 3′-end. Both activities require double-stranded DNA. The 3′-exonuclease proofreading activity is the reverse of the 5′- to 3′-polymerase activity. This activity enhances the specificity of the extension reaction by removing a mononucleotide from the 3′-end of the primer when it is mismatched with the template base. As a result, the specificity of the DNA polymerase I extension reaction is enhanced from 10⁸ to 10⁹. Both the extension and 3′-exonuclease activities reside in the same large proteolytic fragment of DNA polymerase I, also called the Klenow fragment.
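
The proofreading logic can be sketched in the same way. In this illustrative Python sketch (same string conventions as before, template written 3′→5′), an exonuclease step trims mismatched nucleotides from the 3′-end and only then does the polymerase step resume; it shows the order of events, "remove the mismatch, then extend", not the kinetics behind the specificity figures quoted above. The function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def proofread_then_extend(template_3to5: str, primer_5to3: str):
    """Return (extended_strand, removed_bases) for a primer on a template.

    Assumption: the template is written 3'->5' left to right, and primer
    position i faces template position i.
    """
    if len(primer_5to3) > len(template_3to5):
        raise ValueError("primer is longer than the template")
    strand = list(primer_5to3)
    removed = []
    # 3'->5' exonuclease: remove the 3'-terminal base while it is mismatched.
    while strand and COMPLEMENT[template_3to5[len(strand) - 1]] != strand[-1]:
        removed.append(strand.pop())
    # 5'->3' polymerase: resume extension from the now correctly paired 3' end.
    for pos in range(len(strand), len(template_3to5)):
        strand.append(COMPLEMENT[template_3to5[pos]])
    return "".join(strand), removed


print(proofread_then_extend("TACGGA", "ATGA"))  # ('ATGCCT', ['A'])
```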

 

 

The 5′- to 3′-exonuclease activity is quite different. This activity resides in the smaller proteolytic fragment of DNA polymerase I. It removes oligonucleotides containing 3–4 bases, and its activity does not depend on the occurrence of mismatches. A nick translation reaction depends on a concerted effort of the extension and 5′-exonuclease activities of DNA polymerase I. Here the extension reaction begins at a single-stranded nick; the 5′-exonuclease activity degrades the strand annealed to the template ahead of the 3′-end being extended, thus providing a single-stranded template for the extension reaction. The DNA polymerase I extension reaction will also act on nicked DNA in the absence of the 5′-exonuclease activity. In that case DNA polymerase I simply displaces the annealed strand as it extends from the nick (strand displacement).
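
The difference between nick translation and strand displacement can be illustrated with a small Python sketch. It assumes the template is written 3′→5′ left to right, and that the upstream strand (ending at the nick) and the downstream strand are both written 5′→3′ and together cover the template exactly; `extend_from_nick` and its `exo_5to3` flag are hypothetical names.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_from_nick(template_3to5: str, upstream: str, downstream: str,
                     n: int, exo_5to3: bool = True):
    """Extend the upstream strand by n bases from a nick.

    Returns (new_upstream, still_annealed_downstream, (fate, removed_bases)).
    """
    if len(upstream) + len(downstream) != len(template_3to5):
        raise ValueError("upstream + downstream must exactly cover the template")
    n = max(0, min(n, len(downstream)))
    start = len(upstream)
    new_bases = "".join(COMPLEMENT[b] for b in template_3to5[start:start + n])
    removed = downstream[:n]  # the 5'-most bases of the downstream strand
    # With the 5'-exonuclease these bases are degraded (nick translation);
    # without it they are peeled off intact as a flap (strand displacement).
    fate = "degraded" if exo_5to3 else "displaced"
    return upstream + new_bases, downstream[n:], (fate, removed)


print(extend_from_nick("TACGGATC", "ATG", "CCTAG", 2))
# ('ATGCC', 'TAG', ('degraded', 'CC'))
```

In both modes the nick moves two bases downstream; only the fate of the removed bases differs. Because the downstream strand was correctly paired, the newly synthesized bases match the removed ones, which is why nick translation leaves the sequence unchanged while replacing the DNA itself.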

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 1.11 Structure of replication forks. (a) A single fork showing the continuously synthesized leading strand and discontinuously synthesized lagging strand. (b) A double fork, common in almost all genomic DNA replication. (c) Topological aspects of DNA replication. Thin arrows show translational motion of the forks; bold arrows show the required DNA rotation around the forks.


Unfortunately, the rotations generated by the two forks do not cancel each other out. They add. If this rotation were actually allowed to occur across massive lengths of DNA, the cell would probably be stirred to death. Instead, topoisomerases are used to restrict the rotation to regions close to the replication fork. Topoisomerases can cut and reseal double-stranded DNA very rapidly without allowing the cut ends to diffuse apart. This transient cutting lets the torque generated by helix unwinding be dissipated locally. (The actual mechanism of topoisomerases is more complex and indirect, but the outcome is the same as we have stated.)
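
To give a sense of the scale of the problem, the number of helical turns that must be removed to separate the strands can be estimated. This Python sketch assumes a helical repeat of about 10.5 bp per turn for B-DNA (a value not given in this section) and bidirectional replication with two forks, as in Figure 1.11b; each fork's contribution is summed with the same sign, reflecting the statement that the rotations add.

```python
BP_PER_TURN = 10.5  # assumption: approximate helical repeat of B-DNA


def turns_to_remove(genome_bp: float, n_forks: int = 2):
    """Return (turns per fork, total turns) for unwinding a whole genome.

    Each fork unwinds an equal share of the genome; because the rotations
    add rather than cancel, the per-fork contributions are summed.
    """
    per_fork = genome_bp / n_forks / BP_PER_TURN
    return per_fork, per_fork * n_forks


per_fork, total = turns_to_remove(4.6e6)  # E. coli, 4.6 Mb
print(f"{per_fork:,.0f} turns per fork, {total:,.0f} in total")
# 219,048 turns per fork, 438,095 in total
```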

 

The information stored in the sequence of DNA bases comprises inherited characteristics called genes. Early in the development of molecular biology, observed facts could be accounted for by the principle that one gene (i.e., one stretch of DNA sequence) codes for one protein, a linear polymer containing a sequence composed of up to 20 different amino acids, as shown in Figure 1.12a. A three-base sequence of DNA directs the incorporation of one amino acid; hence the genetic code is a triplet code. The gene would define both the start and stop points of actual transcription and the ultimate start and stop points of translation into protein. Nearby would be additional DNA sequences that regulate the nature of gene expression: when, where, and how much mRNA should be made. A typical gene in a prokaryote (a bacterium or any other cell without a nucleus) is one to two thousand base pairs long. The resulting mRNA has some upstream and downstream untranslated regions; it encodes one or more proteins with several hundred amino acids each.
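
The triplet reading of a coding sequence can be sketched in Python. The codon table here is deliberately partial (only the codons used in the example, with their standard genetic code assignments, written as coding-strand DNA); a full table has 64 entries. By the same arithmetic, a coding region of about 1,200 bp specifies roughly 400 amino acids, in line with proteins of several hundred amino acids.

```python
# Assumption: a deliberately partial codon table (standard genetic code,
# written as coding-strand DNA) containing only the codons used below.
CODON_TABLE = {
    "ATG": "M",  # methionine, the usual start codon
    "AAA": "K",  # lysine
    "GGC": "G",  # glycine
    "TTT": "F",  # phenylalanine
    "TAA": "*", "TAG": "*", "TGA": "*",  # stop codons
}


def translate(coding_dna: str) -> str:
    """Read successive non-overlapping triplets until a stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGAAAGGCTTTTAA"))  # MKGF
```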

Figure 1.12 What genes are. (a) Transcription and translation of a typical prokaryotic gene. N and C indicate the amino and carboxyl ends of the peptide backbone of a protein. (b) Transcription and translation of a typical eukaryotic gene. Introns (i) are removed by splicing, leaving only exons (e).

 

 

 


Figure 1.13 Six possible reading frames (arrows) for a stretch of DNA sequence.

 

 

The genes of eukaryotes (and even some genes in prokaryotes) are much larger and more complex. A typical mammalian gene might be 30,000 base pairs in length. Its regulatory regions can be far upstream, downstream, or buried in the middle of the gene. Some known genes are almost 100 times larger. A key difference between prokaryotes and eukaryotes is that most of the DNA sequence in eukaryotic genes is not translated (Fig. 1.12b). A very long RNA transcript (called hnRNA, for heterogeneous nuclear RNA) is made; then most of it is removed by a process called RNA splicing. In a typical case several or many sections of the RNA are removed, and the remaining pieces are rejoined. The DNA segments coding for the RNA that actually remains in the mature translated message are called exons (because they are expressed). The excised parts are called introns. The function of introns, beyond their role in supporting the splicing reactions, is not clear. The resulting eukaryotic mRNA, which codes for a single protein, is typically about 3 kb in size, not much bigger than its far more simply made prokaryotic counterpart.
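
The bookkeeping of splicing (cut out the introns, join the exons in order) can be sketched in Python. The transcript and intron coordinates below are hypothetical; in a cell, splice sites are recognized from sequence signals rather than supplied as coordinates. The toy intron begins with GU and ends with AG, as most introns do.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns, given as 0-based half-open (start, end) intervals,
    and join the remaining exons in order."""
    exons = []
    pos = 0
    for start, end in sorted(introns):
        if start < pos or end <= start or end > len(pre_mrna):
            raise ValueError("introns must be non-overlapping and lie inside the transcript")
        exons.append(pre_mrna[pos:start])  # exon preceding this intron
        pos = end                          # skip over the intron
    exons.append(pre_mrna[pos:])           # final exon
    return "".join(exons)


# Hypothetical transcript: exon AUG, intron GUAAGCCCAG, exon GCUAA.
print(splice("AUGGUAAGCCCAGGCUAA", [(3, 13)]))  # AUGGCUAA
```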

 

 

 

 

Now that we know the DNA structure and RNA transcription products of many genes, the notion of one gene, one protein has to be broadened considerably. Some genes show a pattern of multiple starts. Different proteins can be made from the same gene if these starts affect coding regions. Genes with multiple alternative splicing patterns are quite common. In the simplest case this results in the elimination of an exon or the substitution of one exon for another. However, much more complicated variations can be generated in this way.
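
The simplest alternative splicing pattern, elimination of a single exon, can be enumerated with a short Python sketch. The exons are hypothetical, the first and last exons are assumed always to be retained, and exon substitution and more complex patterns are not modeled. Each toy exon is a multiple of three bases long, so skipping one leaves the downstream reading frame intact; skipping an exon whose length is not a multiple of three would shift the frame.

```python
def exon_skipping_isoforms(exons: list[str]) -> list[str]:
    """Return the full-length mRNA followed by every isoform that
    lacks exactly one internal exon (first and last exons always kept)."""
    isoforms = ["".join(exons)]
    for i in range(1, len(exons) - 1):
        isoforms.append("".join(exons[:i] + exons[i + 1:]))
    return isoforms


for mrna in exon_skipping_isoforms(["AUG", "CCC", "GGG", "UAA"]):
    print(mrna)
# AUGCCCGGGUAA
# AUGGGGUAA
# AUGCCCUAA
```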

Finally, DNA sequences can be read in multiple reading frames, as shown in Figure 1.13. If the sequence allows it, as many as six different (but not independent) proteins can be coded for by a single DNA sequence, depending on which strand is read and in what frame. Note that since transcription, like replication, proceeds in only one direction (the new chain grows 5′ to 3′) and the two strands are antiparallel, one DNA strand is transcribed from left to right as the structures are drawn, and the other from right to left.
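
The six reading frames of Figure 1.13 can be generated with a short Python sketch: three offsets on the given strand, and three on its reverse complement, which is the other strand read in its own 5′→3′ direction. It assumes an uppercase A/C/G/T string written 5′→3′; the frame labels +1 to +3 and -1 to -3 are a common convention, not taken from the figure.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def six_frames(seq: str) -> dict[str, list[str]]:
    """Split a sequence into codons in all six reading frames.

    Frames +1..+3 read the given strand from offsets 0..2; frames -1..-3
    read the reverse complement. Incomplete trailing codons are dropped.
    """
    frames = {}
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = [
                strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)
            ]
    return frames


for name, codons in six_frames("ATGGCATTA").items():
    print(name, " ".join(codons))
# +1 ATG GCA TTA
# +2 TGG CAT
# +3 GGC ATT
# -1 TAA TGC CAT
# -2 AAT GCC
# -3 ATG CCA
```

The frames are "not independent" in exactly the sense of the text: every frame is derived from the same nine bases.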

 

 

Recent research has found evidence for genes that lie completely within other genes.

 

 

For example, the gene responsible for some forms of the disease neurofibromatosis is an extremely large one, as shown in Figure 1.14. It has many long introns, and within one of them lies a small gene transcribed from the strand opposite the one used for transcription of the type 1 neurofibromatosis gene. The small gene is expressed, but its function is unknown.

Figure 1.14 In eukaryotes some genes can lie within other genes. A small gene with two introns, coded for by one of the DNA strands, lies within a single intron of a much larger gene coded for by the other strand. Introns are shown as hatched.
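
The arrangement in Figure 1.14 amounts to an interval-containment test plus a strand check, sketched here in Python. The `Gene` record, its field names, and all coordinates are hypothetical, chosen only to reproduce the layout of the figure (a small two-intron gene on one strand inside one intron of a large gene on the other).

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    strand: str                      # "+" or "-"
    start: int                       # 0-based, inclusive
    end: int                         # 0-based, exclusive
    introns: list[tuple[int, int]]   # (start, end) intervals, same coordinates


def nested_in_intron(inner: Gene, outer: Gene) -> bool:
    """True if `inner` lies entirely within one intron of `outer`."""
    return any(s <= inner.start and inner.end <= e for s, e in outer.introns)


def nested_on_opposite_strand(inner: Gene, outer: Gene) -> bool:
    """Nested in an intron of `outer` and encoded on the other strand."""
    return inner.strand != outer.strand and nested_in_intron(inner, outer)


# Hypothetical coordinates, chosen only for illustration.
large = Gene("large", "+", 0, 300_000, [(20_000, 120_000), (150_000, 250_000)])
small = Gene("small", "-", 40_000, 46_000, [(41_000, 42_000), (43_000, 44_000)])
print(nested_on_opposite_strand(small, large))  # True
```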

GENOME SIZES

 

 

 

 

 

 

 

 

 

 

 

 

The purpose of the human genome project is to map and sequence the human genome and find all of the genes it contains. In parallel, the genomes of a number of model organisms will also be studied. The rationale for this is clear-cut. The human is a very poor organism for genetic study. Our lifespan is so long that very few generations can be monitored. It is unethical (and impractical) to control the breeding of humans. As a result one must examine inheritance patterns retrospectively in families, and typical human families are quite small.

 

We are a very heterogeneous, outbred species, with genetic characteristics just the opposite of those of the highly inbred, homogeneous laboratory strains of animals used for genetic studies. For all these reasons experimental genetics is largely restricted to model organisms. The gold-standard test for the function of a previously unknown gene is to knock it out and see the resulting effect; in other words, to determine the phenotype of a deletion. For organisms with two copies of their genome (diploids), like humans, this requires knocking out both gene copies. Such a double knockout is extremely difficult without resorting to controlled breeding. Thus model organisms are a necessary part of the genome project.

 

 

 

 

 

 

Considerable thought has gone into the choice of model organisms. In general, these represent a compromise between genome size and genetic utility. E. coli is the best-studied bacterium; its complete DNA sequence became available early in 1997. Saccharomyces cerevisiae is the best-studied yeast, and for that matter the best-studied single-cell eukaryotic organism. Its genetics is exceptionally well developed, and the complete 12,068 kb DNA sequence was reported in 1996, the result of a worldwide coordinated DNA sequencing effort. Caenorhabditis elegans, a nematode worm, has very well-developed genetics, and its developmental biology is exquisitely refined. Every cell in the mature organism has been identified, as have the cell lineages that lead up to the mature adult. The last invertebrate canonized by the genome project is the fruit fly Drosophila melanogaster. This organism has played a key role in the development of the field of genetics, and it is also an extraordinarily convenient system for studies of development. The fruit fly has an unusually small genome for such a complex organism; thus the utility of genomic sequence data is especially apparent in this case.

 

 

 

 

 

 

 

For vertebrates, if a single model organism must be selected, the mouse is the obvious choice. The size of the genome of Mus musculus is similar to that of humans. However, its generation time is much shorter, and the genetics of the mouse is far easier to manipulate. A number of inbred strains exist with relatively homozygous but different genomes, yet these will crossbreed in some cases. From such interspecific crosses very powerful genetic mapping tools emerge, as we will describe in Chapter 6. Mice are small, hence relatively inexpensive to breed and maintain. Their genetics and developmental biology are relatively advanced. Because of efforts to contain the projected costs of the genome project, no other “official” model organisms exist. However, many other organisms are of intense interest for genome studies; some of these are already under active scrutiny. These include maize, rice, Arabidopsis thaliana, rats, pigs, and cows, as well as a number of simpler organisms.

 

 

 

 

 

 

 

 

 

 

 

 

 

In thinking of which additional organisms to subject to genome analysis, careful attention must be given to what is called the C-value paradox. Within even relatively similar classes of organisms, genome size can vary considerably. Furthermore, as the data in Table 1.1 reveal, there is not a monotonic relationship between genome size and our view of how evolutionarily advanced a particular organism is. Thus, for example, some amphibians have genomes several hundred times larger than the human. Occasional organisms like some hemichordates or the puffer fish have relatively small genomes despite their relatively recent evolution. The same sort of situation exists in plants.

TABLE 1.1 Genome Sizes (base pairs)

Organism                   Size (bp)
Bacteriophage lambda        5.0 × 10⁴
Escherichia coli            4.6 × 10⁶
Yeasts                     12.0 × 10⁶
Giardia lamblia            14.0 × 10⁶
Drosophila melanogaster     1.0 × 10⁸
Some hemichordates          1.4 × 10⁸
Human                       3.0 × 10⁹
Some amphibians             8.0 × 10¹¹

Note: These are haploid genome sizes. Many cells will have more than one copy of the haploid genome.
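
The "several hundred times" comparison follows directly from Table 1.1. This Python sketch simply transcribes the table into a dictionary (the name `GENOME_SIZE_BP` is hypothetical) and computes fold differences between entries.

```python
# Haploid genome sizes (bp) transcribed from Table 1.1.
GENOME_SIZE_BP = {
    "Bacteriophage lambda": 5.0e4,
    "Escherichia coli": 4.6e6,
    "Yeasts": 12.0e6,
    "Giardia lamblia": 14.0e6,
    "Drosophila melanogaster": 1.0e8,
    "Some hemichordates": 1.4e8,
    "Human": 3.0e9,
    "Some amphibians": 8.0e11,
}


def fold_difference(a: str, b: str) -> float:
    """How many times larger genome a is than genome b."""
    return GENOME_SIZE_BP[a] / GENOME_SIZE_BP[b]


print(round(fold_difference("Some amphibians", "Human")))  # 267
```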

 

 

 

 

 

 

 

In planning the future of genome studies, as attention broadens to additional organisms, one must decide whether it will be more interesting to examine closely related organisms or to cast as broad a phylogenetic net as funding permits. Several organisms seem to be of particular interest at the present time. The fission yeast Schizosaccharomyces pombe has a genome about the same size as that of the budding yeast S. cerevisiae. However, these two organisms are as far diverged from each other, evolutionarily, as each is from a human being. The genetics of S. pombe is almost as facile as that of S. cerevisiae. Any features strongly conserved in both organisms are likely to be present throughout life as we know it. Both yeasts are very densely packed with genes. The temptation to sequence both genomes fully and compare them may be irresistible. Just how far genome studies will be extended to other organisms, to large numbers of different individuals, or even to repeated samplings of a given individual will depend on how efficient these studies eventually become. The potential future need for genome analysis is almost unlimited, as described in Box 1.6.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

BOX 1.6 GENOME PROJECT ENHANCEMENTS

DNA Sequencing Rate
(bp per person per day)    Accessible Targets
10⁶                        One human, five selected model organisms
                           Organisms of commercial value
10⁷                        Selected diagnostic DNA sequencing
10⁸                        Human diversity (see Chapter 15):
                           5 × 10⁹ individuals × 6 to 12 × 10⁶ differences
                           = 3 to 6 × 10¹⁶
                           Full diagnostic DNA sequencing
10⁹                        Environment exposure assessment
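
The human diversity entry in Box 1.6 is a simple product, and dividing it by a sequencing rate gives a rough effort estimate. In this Python sketch, reading "bp per person per day" as person-days = total bp / rate is an assumption, as are the function names; integers are used so the results are exact.

```python
INDIVIDUALS = 5 * 10**9                          # population figure used in Box 1.6
DIFFERENCES_PER_PERSON = (6 * 10**6, 12 * 10**6)


def diversity_bp(individuals: int, differences: int) -> int:
    """Total base-pair differences to survey across a population."""
    return individuals * differences


def person_days(total_bp: int, bp_per_person_per_day: int) -> int:
    """Person-days of sequencing effort at a given rate (rounded up)."""
    return -(-total_bp // bp_per_person_per_day)


for d in DIFFERENCES_PER_PERSON:
    total = diversity_bp(INDIVIDUALS, d)
    print(f"{total:.1e} bp -> {person_days(total, 10**9):.1e} person-days at 10^9 bp/person/day")
# 3.0e+16 bp -> 3.0e+07 person-days at 10^9 bp/person/day
# 6.0e+16 bp -> 6.0e+07 person-days at 10^9 bp/person/day
```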

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


NUMBERS OF GENES

 

 

 

 

 

 

 

 

 

 

It has been estimated that half of the genes in the human genome are specific to the central nervous system. For such genes, one must wonder how adequate a model the mouse will be for the human. Even if there are similar genes in both species, it is not easy to see how the counterparts of particular human phenotypes will be found in the mouse. Do mice get headaches, do they get depressed, do they have fantasies, do they dream in color? How can we tell? For such reasons it is desirable, as the technology advances to permit this, to bring into focus the genomes of experimental animals more amenable to neurophysiological and psychological studies. Primates like the chimp are similar enough to the human that it should be easy to study them by using human material as DNA probes. Yet the differences between humans and chimps are likely to be of particular interest in defining the truly unique features of our species. Other vertebrates, like the rat, cat, and dog, while more distant from the human, may also be very attractive genome targets because their physiologies are very convenient to study, and in some cases they display very well-developed personality traits. Other organisms, such as the parasitic protozoan Giardia lamblia or the blowfish (fugu), are eukaryotes of particular interest because of their comparatively small genome sizes.

 

 

 

 

 

 

 

 

 

 

The true goal of the genome project is to discover all of the genes in an organism and make them available in a form convenient for future scientific study. It is not so easy, with present tools and information, to estimate the number of genes in any organism. The first complete bacterial genome to be sequenced was that of H. influenzae Rd. It has 1,830,137 base pairs and 1743 predicted protein-coding regions, plus six sets of three rRNA genes and numerous genes for other cellular RNAs like tRNA. H. influenzae is not as well studied as E. coli, and we do not yet know how many of these coding regions are actually expressed. For the bacterium E. coli, we believe that almost all genes are expressed and translated to at least a detectable extent. In two-dimensional electrophoretic fractionations of E. coli proteins, about 2500 species can be seen. An average E. coli gene is about 1 to 2 kb in size; thus the 4.6 Mb genome is fully packed with genes. Yeasts are similarly packed. Further details about gene density are given in Chapter 15.
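
The gene-density statements above reduce to simple division, sketched here in Python with the figures from the text; the function names are hypothetical. For H. influenzae there is about one predicted coding region per 1.05 kb, and a 4.6 Mb E. coli genome packed end to end with 1 to 2 kb genes would hold roughly 2300 to 4600 genes, a range that brackets the roughly 2500 protein species seen on two-dimensional gels.

```python
def bp_per_gene(genome_bp: float, n_genes: float) -> float:
    """Average genome length available per gene."""
    return genome_bp / n_genes


def max_genes(genome_bp: float, mean_gene_bp: float) -> float:
    """Upper bound on gene count if genes were packed end to end."""
    return genome_bp / mean_gene_bp


# H. influenzae Rd: 1,830,137 bp and 1743 predicted protein-coding regions.
print(round(bp_per_gene(1_830_137, 1743)))             # 1050
# E. coli: 4.6 Mb genome with genes of about 1 to 2 kb.
print(max_genes(4.6e6, 2000), max_genes(4.6e6, 1000))  # 2300.0 4600.0
```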

 

 

 

 

 

 

In vertebrates the gene density is much more difficult to estimate. An average gene is probably about 30 kb. In any given cell type, two-dimensional electrophoresis reveals several thousand protein products. However, these products are very different in different cell types, so there is no way to do an exhaustive search. Various estimates of the total number of human genes range from 5 × 10⁴ to 2 × 10⁵. The true answer will probably not be known until long after we have the complete human DNA sequence, because of the problems of multiple splicing patterns and genes within genes discussed earlier. However, once the entire human genome has been cloned and sequenced, any section of DNA suspected of harboring one or more genes will be easy to scrutinize further.

 

 

 

 

 

 

 

 

