

Figure 11.10 Delta restriction subcloning. Top panel shows restriction digests of the target plasmid. Bottom panel shows sequences read from delta subclones. Adapted from Ansorge et al. (1996).
the multicloning site adjacent to what was originally an internal segment of the insert and allows vector sequence to be used as a primer to obtain this internal sequence. In test cases about two-thirds of a 2- to 3-kb insert can be sequenced by testing 10 enzymes that cut within the polylinker. Only a few of these will need to be used for actual subcloning. The problem with this approach is that one is at the mercy of an unknown distribution of restriction sites, and at present, it is not clear how the whole process could be automated to the point where human intervention becomes unnecessary.

NESTED DELETIONS

This is a more systematic variant of the type of delta restriction cloning approach just described. Here a clone is systematically truncated from one or both ends by the use of exonucleases. The original procedure, developed by Stephen Henikoff, is illustrated in Figure 11.11. A DNA target is cut with two different restriction nucleases. One yields a 3′-overhang; the other yields a 5′-overhang. The enzyme E. coli exonuclease III degrades a 3′-overhang very inefficiently, while it degrades the 3′-strand in a 5′-overhang very efficiently. The result is degradation from only a single end of the DNA. After exonuclease treatment, the ends of the shortened insert must be trimmed to produce cloneable blunt ends. The DNA target is then taken up in a new vector and sequenced using primers from
372 STRATEGIES FOR LARGE-SCALE DNA SEQUENCING
Figure 11.11 Preparation of nested deletion clones. A and B are restriction enzyme cleavage sites that give the overhanging ends indicated. Adapted from Henikoff (1984).
that new vector. In principle, this process ought to be quite efficient. In practice, while this proposed strategy has been known for many years, it does not seem to have found many adherents.
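
As a rough illustration of why nested deletions look efficient in principle, the following sketch estimates which deletion endpoints are needed so that reads primed from the vector tile an insert from one end. The insert length, read length, and overlap are hypothetical values chosen for illustration; none of them comes from the procedure described above.

```python
def deletion_endpoints(insert_len, read_len, overlap):
    """Return deletion sizes (bp removed from one end) whose reads tile the insert.

    Each read is assumed to start at the deletion endpoint and to run
    read_len bases into the insert. Successive endpoints are spaced so
    that adjacent reads share `overlap` bases.
    """
    if not 0 <= overlap < read_len:
        raise ValueError("overlap must be non-negative and smaller than read_len")
    step = read_len - overlap
    endpoints = []
    pos = 0
    while pos < insert_len:
        endpoints.append(pos)
        if pos + read_len >= insert_len:
            break  # this read already reaches the far end of the insert
        pos += step
    return endpoints


# Hypothetical case: 3-kb insert, 500-base reads, 100-base overlaps.
endpoints = deletion_endpoints(3000, 500, 100)
print(len(endpoints), "deletion clones:", endpoints)
```

In practice the exonuclease produces a spread of endpoints rather than exact ones, so more clones than this minimum would have to be screened.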

A variation on the original exonuclease III procedure for generating nested deletions has been described by Chuan Li and Philip Tucker. In this method, termed exoquence DNA sequencing, a DNA fragment with different overhangs at its ends is produced and one end is selectively degraded with exonuclease III. At various time points the reaction is stopped, and the resulting template-primer complexes are treated separately with one of several different restriction enzymes and then subjected to Sanger sequencing reactions, as shown in Figure 11.12. The final DNA sequencing ladders are examined directly by gel electrophoresis. Thus no cloning is required. If the restriction enzymes are chosen well, and the times at which the reactions are stopped are spaced sufficiently closely, sufficient sequence data will be revealed to generate overlaps that will allow the reconstruction of contiguous sequence. This is an attractive method in principle; it remains to be seen whether it will prove more generally appealing than the original nested deletion cloning approach.

Figure 11.12 Strategy for exoquence DNA sequencing. Shown is a relatively simple case; there are more elaborate cases if the restriction enzymes used cut more frequently. A and B are restriction enzyme sites as in Figure 11.11; R is an additional restriction enzyme cleavage site. Taken from Li and Tucker (1993).

PRIMER JUMPING

This strategy has been discussed quite a bit. However, there are as yet no reported examples of its implementation. The basic notion is outlined in Figure 11.13. It is similar in many ways to delta subcloning, but it differs in a number of significant features. PCR is used, rather than subcloning. A very specific set of restriction enzymes is used: one rare cutter, which can have any cleavage pattern, and an additional pair of restriction enzymes consisting of an eight-base cutter and a four- or six-base cutter; the two members of the pair must produce the same set of complementary single-stranded ends. Examples are Not I (GC/GGCCGC) and Sse8387 I (CCTGCA/GG) for the eight-base cutters, and Eae I (Y/GGCCR) and Nsi I (ATGCA/T) or Pst I (CTGCA/G), respectively, as more frequent cutters. In principle, the approach shown in Figure 11.13 ought to be applicable to much larger DNA than delta subcloning, based on the past success at making reasonably large jumping libraries (Chapter 8).
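
The requirement that the two enzymes of a pair leave the same single-stranded ends can be checked directly from the cut-site notation used above. The sketch below is a minimal illustration; it assumes each recognition site is palindromic, so that the bottom strand is cut at the mirror-image position of the top-strand cut. The function and dictionary names are hypothetical.

```python
def overhang(site_with_cut):
    """Return (kind, sequence) of the single-stranded end left by a palindromic site.

    `site_with_cut` uses the notation of the text, e.g. "GC/GGCCGC": the
    slash marks the cut in the top strand. For a palindromic site the
    bottom strand is cut at the mirror-image position.
    """
    left, right = site_with_cut.split("/")
    site = left + right
    top_cut = len(left)                  # bases to the left of the top-strand cut
    mirror_cut = len(site) - top_cut     # mirror-image position on the other strand
    if top_cut < mirror_cut:
        return ("5'", site[top_cut:mirror_cut])
    if top_cut > mirror_cut:
        return ("3'", site[mirror_cut:top_cut])
    return ("blunt", "")


# Recognition sites exactly as quoted in the text.
ENZYMES = {
    "Not I": "GC/GGCCGC",
    "Sse8387 I": "CCTGCA/GG",
    "Eae I": "Y/GGCCR",
    "Nsi I": "ATGCA/T",
    "Pst I": "CTGCA/G",
}


def compatible(enzyme_a, enzyme_b):
    """True if both enzymes leave the same kind and sequence of overhang."""
    return overhang(ENZYMES[enzyme_a]) == overhang(ENZYMES[enzyme_b])


for rare, frequent in [("Not I", "Eae I"), ("Sse8387 I", "Nsi I"), ("Sse8387 I", "Pst I")]:
    print(rare, "+", frequent, "->", overhang(ENZYMES[rare]), compatible(rare, frequent))
```

Comparing the overhang strings is sufficient here because none of the overhangs contains a degenerate base; a more general check would have to expand codes such as Y and R.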

Figure 11.13 Primer jumping, an untested but potentially attractive method for directed DNA sequencing. Restriction enzyme site N must produce an end that is complementary to the end generated by the four-base cutter. Sites a and b can be anything so long as they are not present in the target, but there must be a site for infrequent cleavage between them.

For primer jumping the insert is cloned next to a vector fragment containing any desired infrequent cleavage site between two known unique DNA sequences, shown as a and b in Figure 11.13. The vector fragment is constructed so that it contains no cleavage sites for the 4-base or 6-base cutting enzyme between the unique sequences and the insert, but it does contain a second infrequent cleavage site, N in Figure 11.13, as close to the upstream unique sequence as possible. Ideally the vector will be one arm of a YAC, and the other arm could be treated in a similar fashion. The clone is cut at the distal rare cleavage site to completion and then partially digested with the frequently cutting enzyme. The resulting fragments are separated by length, and the separation gel is sliced into pieces. The resulting DNA fragments are diluted to very low concentration and ligated. This will produce DNA circles in which the vector sequence, including segments a and b in Figure 11.13, is now located next to each site in the partial digest which was cleaved by the frequently cleaving enzyme. Thus the known sequence can now be used for starting a primer walk. The approximate position of the walk within the large clone will be known from the size of the fragment. With the 800-bp to 1-kb sequence reads now being achieved under good circumstances, it is conceivable that one would be able to sequence from the cleaved site up to the next equivalent restriction site without the need to make additional primers, in most cases. If this were the case, one could do a directed walking strategy on a large DNA target using only two primers, one for each vector arm.
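
The suggestion that one read will usually reach the next site can be examined under an idealized model. The sketch below assumes random sequence with equal base frequencies, which real genomes follow only approximately; the function name and the model are illustrative assumptions rather than part of the proposed method.

```python
def prob_site_within(read_len, site_len):
    """Chance that a specific site_len-base site occurs at least once in a read.

    Idealized model: each of the read_len - site_len + 1 possible start
    positions matches independently with probability (1/4) ** site_len.
    """
    positions = read_len - site_len + 1
    if positions <= 0:
        return 0.0
    p_match = 0.25 ** site_len
    return 1.0 - (1.0 - p_match) ** positions


for read_len in (800, 1000):
    for site_len in (4, 6):
        p = prob_site_within(read_len, site_len)
        print(f"{read_len}-base read, {site_len}-base cutter: {p:.2f}")
```

Under this model a four-base cutter nearly always has its next site within an 800-base read, while a six-base cutter usually does not, so the "in most cases" expectation applies mainly to four-base cutters.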

In a similar vein to primer jumping, if single-sided PCR ever works well enough (Chapter 4), these methods could be used for directed cycle sequencing by the approaches just described.

PRIMER MULTIPLEXING
This is a potentially very powerful strategy for large-scale DNA sequencing. It was developed by George Church and has been elaborated, independently, by Raymond Gesteland. A number of features set multiplexing apart from many other approaches. A major peculiarity of the method is that it does not scale down efficiently, so it is best suited for fairly massive projects, typically several hundred kb of DNA sequence or more.

The basic scheme for primer multiplexing is shown in Figure 11.14. In the particular case shown, a multiplexing of 40 is used. Forty different vectors are constructed; each has a unique 20-base sequence on each side of the cloning site. The DNA target of interest is shotgun cloned, separately, into all 40 vectors. This produces 40 different libraries. Pools are constructed by selecting one clone from each of the libraries and mixing them. These 40-clone pools are the samples on which DNA sequencing is performed. The pools are subjected to standard DNA sequencing chemistry to generate a mixture of 40 different ladders, but no radioactivity or other label is introduced into the DNA at this stage. The mixture is fractionated by polyacrylamide electrophoresis and blotted onto a membrane. A particularly convenient way to do this is by the bottom wiper described in Chapter 10. The blotted DNA is crosslinked onto the filter by UV irradiation to attach it very stably. This is a key step, since the filters will be reused many times.

Figure 11.14 Basic scheme used for primer multiplexing: a, b, c, and so on, represent unique vector primer sequences.

To read the DNA sequence from each pool of clones, the filter is hybridized with a probe corresponding to one of the 40 unique 20-base sequences. By this indirect end-labeling method (introduced in Chapter 8), only one of the 40 clones in the sequence ladder is visible. The probe is removed from the filter by washing, and then the hybridization and washing are repeated successively for each of the other unique sequence primers. By this multiplexing approach, most stages of the project are streamlined by a factor of 40. The exceptions are the hybridization, the autoradiography or other color detection, and the washing. Thus great care must be taken to automate these steps in the most efficient way. Recently a fairly successful demonstration of the efficiency that can be achieved with primer multiplexing, combined with transposon jumping, was reported by Robert Weiss and Raymond Gesteland.
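
The bookkeeping behind the factor-of-40 saving, and the steps that escape it, can be sketched as follows. This is a minimal illustration only: the clone count is hypothetical, and the function and key names are assumptions introduced for the example.

```python
import math


def multiplex_workload(n_clones, multiplex=40):
    """Count work units when n_clones are sequenced in pools of `multiplex` clones."""
    pools = math.ceil(n_clones / multiplex)
    return {
        # Sequencing chemistry, electrophoresis, and blotting are done once per pool,
        # so they shrink by the multiplex factor.
        "pools_to_sequence_and_blot": pools,
        # Each filter goes through one hybridize/detect/wash cycle per vector tag.
        "probings_per_filter": multiplex,
        # Every clone's ladder must still be detected and read individually.
        "ladders_to_detect": n_clones,
    }


# Hypothetical project of 4000 clones at a multiplexing of 40.
for step, count in multiplex_workload(4000).items():
    print(f"{step}: {count}")
```

The last entry is the reason the hybridization, detection, and washing steps are the ones that most need automation.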

MULTIPLEX GENOMIC WALKING

A different approach to multiplex sequencing has been suggested by Walter Gilbert. This is designed to be used for the sequencing of entire small genomes like E. coli where direct genomic DNA sequencing is feasible. The method is illustrated in Figure 11.15a. The great appeal of this method is that absolutely no cloning is required. The total genome is digested separately with a set of different restriction enzymes. The products of this digestion are loaded onto polyacrylamide gels in adjacent lanes and fractionated. A highly labeled probe with an arbitrary sequence is selected (with a length chosen to occur on average once per genome) and used to hybridize with a blot of the separated fragments (Fig. 11.15b). In most of the lanes this probe will give a readable sequence. Suppose that the probe lies 60 bp upstream from a given restriction site. The first 60 bases of sequence will be unreadable because data will extend in both directions (Fig. 11.15c). However, longer regions of the ladder will be interpretable, since they must lie in the direction away from the nearby restriction site. In general, one will expect to get a number of usable reads in both directions from the probe, just by the fortuitous occurrence of useful restriction sites. These reads are assembled into a segment of DNA sequence. Next probes are designed from the most distal regions of the segment, and these are used to continue the genomic walk.

Figure 11.15 Multiplex genomic walking. (a) Basic outline of the experiment. (b) Restriction map in a typical region, and resulting segments of sequence, A, B, C revealed by hybridization with one specific probe. (c) Sections of readable and unreadable sequence on a particular restriction fragment. The probe is located L bases from the end of the fragment.

In principle, multiplex genomic walking is a very elegant and spartan approach to DNA sequencing. One has a choice at any time whether to use additional arbitrary probes, and so increase the number of parallel sequencing thrusts, or whether to focus on directed walking. Thus one has a method with some of the advantages of both random and directed strategies. A potential weakness is the relatively high fraction of failed lanes that will occur unless the probe has a single binding site in the genome. Another problem is the technical demands that genomic sequencing makes. It is also not obvious how easy this strategy will be to automate. It does work, but the overall efficiency needs to be established before the method can be compared quantitatively with others.
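
The probe-length criterion (a sequence expected about once per genome) can be made concrete with a short calculation. The sketch below assumes random sequence with equal base frequencies and counts sites on one strand only; the genome size of roughly 4.6 Mb used for E. coli is an assumption supplied for the example, not a figure from the text.

```python
def expected_sites(genome_bp, probe_len):
    """Expected occurrences of one arbitrary probe_len-base sequence.

    Idealized model: random sequence, equal base frequencies, one strand.
    """
    return genome_bp / 4 ** probe_len


def min_probe_length(genome_bp):
    """Shortest length at which an arbitrary probe is expected at most once."""
    length = 1
    while 4 ** length < genome_bp:
        length += 1
    return length


ECOLI_BP = 4_600_000  # assumed approximate genome size

for n in (10, 11, 12, 13):
    print(f"{n}-mer: {expected_sites(ECOLI_BP, n):.2f} expected sites")
print("shortest probe expected at most once:", min_probe_length(ECOLI_BP))
```

Probes shorter than this threshold will tend to bind at several places and give superimposed ladders, which is one source of the failed lanes mentioned above.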

GLOBAL STRATEGIES

A basic issue that has confronted the human genome project since its conception is not how to sequence but what to sequence. From a purely biological standpoint, the most interesting sequencing targets are genes. The choice of genes depends on the sorts of biological questions one is interested in. An evolutionary biologist may want to sequence one homologous gene in a wide variety of organisms. Cell biologists or physiologists may want to focus on a set of functionally related genes or gene families within just a few organisms. However, from the point of view of whole genome studies, the purpose of sequencing is really to find genes and make them available for subsequent biological studies. This puts a very different tilt on the issues that affect the choice of sequencing targets.

For simple gene-rich organisms like bacteria and yeasts, there is little doubt that complete genomic sequencing is desired and worth doing even with existing DNA sequencing technology. Indeed sequencing projects have been completed on many bacteria including H. influenzae, Mycoplasma genitalium, Mycoplasma pneumoniae, Methanococcus jannaschii, Synechocystis strain PCC6803, and Escherichia coli, and one yeast, S. cerevisiae (see Chapter 15). Additional projects are well underway with a number of other microorganisms, including the bacterium Mycobacterium tuberculosis and the yeast, S. pombe. E. coli is an obvious choice as the focus of much of our fundamental studies in prokaryotic molecular biology. Mycoplasmas represent the smallest known free-living genomes. Mycobacterium tuberculosis is important because of the current medical crisis with drug-resistant tuberculosis. The two yeasts account for most of our current knowledge and technical power in fungal genetics. They are also very different from each other, so much will be learned from comparisons between them. The real issue that will have to be faced in the future is this: at what stage in DNA sequencing technology will it be desirable and affordable to sequence the genomes of many other simple organisms?

There are a number of more advanced organisms that appear to have relatively high coding percentages of DNA. These include a simple plant, Arabidopsis thaliana, a much more economically important plant, rice, the fruitfly, Drosophila melanogaster, and the nematode, Caenorhabditis elegans. There are strong arguments in favor of obtaining complete DNA sequences on these organisms rapidly. They all are systems where a great deal of past genetics has been done, and a great deal of ongoing interest in biological studies remains. Certain primitive fishes may also have small genomes as does the puffer fish. Here the argument in favor of sequencing is that it will be relatively easy to find most of the genes. However, these organisms are currently pretty much in a biological vacuum.

For more complex, gene-dilute organisms, the selection of sequencing targets is, not surprisingly, also more complex. Here there is little debate that Homo sapiens and the mouse, Mus musculus, are the obvious first choices. It is much less clear what should come after this. Do we target other primates because they will be most useful in understanding the very large fraction of human genes that are believed to be central nervous system specific? Do we examine genomes of organisms that have long been the focus of physiological studies like rats, dogs, and cats? Or do we aim for a much broader representation of evolutionary diversity? Alternatively, how important should the commercial value of potential genome targets be? Cows, horses, pine trees, maize, and salmon have a much more important economic role than Arabidopsis or C. elegans. These questions are interesting to ponder, but they really do not require answers at the present time. If sufficiently inexpensive DNA sequencing methods are developed in the future, we will want to sequence every genome of biological interest. For the present, technology pretty much limits us to a few choices.

With most complex organisms, only a few percent of the genome is known to be coding sequence. The function of the rest, which we earlier termed junk, is unknown today. With limited resources, and relatively slow sequencing technology, most involved groups are choosing to focus on selectively sequencing genes from human or other sources. There are two ways to go about this. One approach is to find a gene-rich region in a genome and sequence it completely. Regions that have been selected include the T-cell receptor loci, immunoglobulin gene families, and the major histocompatibility complex. All of these regions are of intense interest in understanding the function of the immune system. Another region of interest is the Huntington’s disease region because it is very gene rich, and in the process of finding the particular gene responsible for the disease a large set of cloned DNA samples from this region has become available.

An alternative to genomic sequencing in a gene-rich region is to sequence cDNAs, DNA copies of expressed mRNAs. These are relatively easy to produce, and many cDNA libraries are available. Each represents the pattern of gene expression of the particular tissue or sample from which the original mRNA was obtained. In sequencing a cDNA, one knows one is dealing with an expressed gene, therefore a functional gene. This is a considerable advantage over genomic sequencing where one has no knowledge a priori that a particular gene found at the DNA level is actually ever used by the organism. With cDNA sequencing, one is always examining genes or nearby flanking sequences. This is another great advantage over genomic sequencing where, even in the best of cases, most of the sequence will not be coding. However, there are some potential difficulties with projects to examine massive numbers of cDNA sequences, as we will demonstrate.

SEQUENCE-READY LIBRARIES
Today, the notion of sequencing an entire human chromosome from left to right telomere is being considered seriously at a number of Genome Centers. In some cases the plans are based on a preexisting minimum tiling set of clones. Here, as long as the set is complete and exists in a vector like a cosmid or a BAC that allows direct sequencing, the strategy is predetermined. The clones are selected and sequenced one by one by whatever method is deemed optimal at the time for 50- to 150-kb clones.

Suppose, however, that, with sequencing as the eventual goal, one wishes to create an optimal library to facilitate subsequent sequencing of any particular region deemed interesting. There are two basically similar strategies for achieving this objective. If a dense ordered library already exists in an appropriate vector, one can sequence the ends of all of the clones in a relatively easy and cost-effective manner. Since vector priming can be used, the goal is to read into the cloned insert as far as possible in a single pass of raw DNA sequencing. If this is done for all the clones, the result is a sampling of the genomic sequence (Smith et al., 1994). For example, suppose that the initial library is a 20-fold redundant set of 50-kb cosmids. A cosmid end on average would occur every 1.25 kb. A 700-base sequence read at each end would generate a total of 28 kb of sequence for every 50 kb of target. When realistic failure rates and some inevitable overlaps are considered, the result would still be roughly half the total sequence. This is sufficiently dense that almost any cDNA sequence from the region would be represented in some of the available genomic DNA sequence. Thus all sequenced cDNAs could be mapped by software sequence comparisons without the need for any additional experiments. The average spacing between sequenced genomic regions would be short enough so that PCR primers could be designed to close any of the gaps by cycle sequencing.
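
The arithmetic of the cosmid example can be reproduced in a few lines. The function below is a sketch with a hypothetical name; it ignores read failures and overlaps between reads, which is why it gives the raw 28 kb rather than the "roughly half" that survives in practice.

```python
def end_sampling(insert_kb, redundancy, read_bp):
    """Summarize end-sequencing coverage for a redundant clone library.

    Returns (average spacing between clone ends in kb,
             raw end sequence per insert length of target in kb,
             that raw sequence as a fraction of the target).
    Read failures and overlapping reads are ignored.
    """
    ends_per_insert_length = 2 * redundancy       # two ends per clone
    spacing_kb = insert_kb / ends_per_insert_length
    raw_kb = ends_per_insert_length * read_bp / 1000
    return spacing_kb, raw_kb, raw_kb / insert_kb


# Values from the text: 20-fold redundant 50-kb cosmids, 700-base reads.
spacing, raw, fraction = end_sampling(insert_kb=50, redundancy=20, read_bp=700)
print(f"one end every {spacing} kb; {raw} kb raw sequence per 50 kb ({fraction:.0%})")
```

Because the average spacing between ends (1.25 kb) is larger than the read length, gaps remain between reads, and these are the gaps that would be closed by PCR and cycle sequencing.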

For many targets, however, there is no existing clone map. The effort to create one de novo is considerable, even by the enhanced methods described in Chapter 9. For this reason, as automated DNA sequencing becomes more and more efficient, strategies that avoid the construction of a map altogether become attractive. One recent proposal for such a scheme also relies on the sequencing of the ends of the clones (Venter et al., 1996). Consider, for example, an ordered tenfold redundant BAC library of the human genome. With 150-kb inserts, 200,000 BACs are required. If each of these is sequenced for 500 bp from both ends, the resulting data set will contain 400,000 sequence reads encompassing 200 Mb of DNA. On average, the density of DNA sequence is a 500-bp block every 7.5 kb. Once created, such a resource would serve two functions. Many cDNAs would still match up with a segment of BAC sequence, and they could serve to correlate the BAC library with other existing genome resources and information. The utility of the BACs in this regard could be improved if, for example, they were created so that their ends had a bias to occur in coding sequence. However, even in the absence of cDNA information, the BACs will serve as a starting point for the genomic sequencing of any region of interest. One could choose any BAC that corresponds to the region of interest and sequence it completely. Then, by inspection, the BACs in the library that overlapped least with the first sequenced BAC could be picked out and used for the next round of sequencing. The process would continue until the region of interest was completed. In this way the sequencing project itself would create the minimum tiling set of BACs needed for the region.
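
The clone-selection step described above, picking at each round the clone that overlaps least with what is already sequenced, amounts to a greedy walk. The sketch below illustrates the idea under a simplifying assumption: each clone is given hypothetical left and right coordinates, which stand in for what would really be inferred by matching BAC end sequences against finished sequence. The clone names and coordinates are invented for the example.

```python
def greedy_tiling(clones, start, end):
    """Choose clones that cover start..end, extending as far as possible each round.

    clones: dict mapping clone name -> (left, right) coordinates in kb.
    Among clones that overlap the current edge of finished sequence, the
    one reaching farthest is the one that overlaps least with earlier work.
    """
    path = []
    covered_to = start
    remaining = dict(clones)
    while covered_to < end:
        candidates = {
            name: (left, right)
            for name, (left, right) in remaining.items()
            if left <= covered_to < right
        }
        if not candidates:
            raise ValueError(f"no clone extends past position {covered_to}")
        best = max(candidates, key=lambda name: candidates[name][1])
        path.append(best)
        covered_to = candidates[best][1]
        del remaining[best]
    return path


# Hypothetical 150-kb clones over a 400-kb region of interest.
library = {
    "A": (0, 150),
    "B": (50, 200),
    "C": (140, 290),
    "D": (280, 430),
    "E": (100, 250),
}
print(greedy_tiling(library, start=0, end=400))
```

With these invented coordinates the walk passes over clones B and E, since each of them overlaps the finished sequence more than clone C does.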
SEQUENCING cDNA LIBRARIES
Usually cDNA libraries are made by a scheme like that shown in Figure 11.16. To prepare high-quality cDNAs, it is important to start with a population of intact mRNAs. This is not always easy; mRNAs are very susceptible to cleavage by endogenous cellular ribonucleases, and some tissues or samples are very rich in these enzymes. Most eukaryotic mRNAs have several hundred bases of A at their 3′-end. This poly A tail can be used to capture these mRNAs and remove contaminating rRNA, tRNA, and other small cytoplasmic and nuclear RNAs. Unfortunately, one also loses that fraction of mRNAs that lack a poly A tail. An oligo-T primer can then be used with reverse transcriptase to make a DNA copy of the mRNA strand. Alternatively, random primers can be used to copy the mRNAs, or specific primers can be used if one is searching for a particular mRNA or class of mRNAs. There are two general methods to convert the resulting RNA-DNA duplexes into cDNAs. Left to their own devices, some reverse transcriptases will, once the RNA strand is displaced or degraded, continue synthesis, after making a hairpin, until they have copied the entire DNA strand of the duplex. As shown in Figure 11.16a, S1 nuclease can then be used to cleave the hairpin and generate a cloneable end. Unfortunately, the S1 nuclease treatment can also destroy some of the ends of the cDNA. An alternative procedure is to use RNase H to nick the RNA strand of the duplex. The resulting nicks can serve as primers for DNA polymerases like E. coli DNA polymerase I. This eventually leads to a complete DNA copy except for a few nicks which can be sealed by DNA lig-
Figure 11.16 Approaches to the construction of cDNA libraries: Use of S1 nuclease to generate clonable inserts.