Genomics- The Science and Technology Behind the Human Genome Project. Charles R. Cantor, Cassandra L / genomics11-15 / 11


Figure 11.10 Delta restriction subcloning. Top panel shows restriction digests of the target plasmid. Bottom panel shows sequences read from delta subclones. Adapted from Ansorge et al. (1996).

the multicloning site adjacent to what was originally an internal segment of the insert and allows vector sequence to be used as a primer to obtain this internal sequence. In test cases about two-thirds of a 2- to 3-kb insert can be sequenced by testing 10 enzymes that cut within the polylinker. Only a few of these will need to be used for actual subcloning. The problem with this approach is that one is at the mercy of an unknown distribution of restriction sites, and at present, it is not clear how the whole process could be automated to the point where human intervention becomes unnecessary.

NESTED DELETIONS
This is a more systematic variant of the type of delta restriction cloning approach just described. Here a clone is systematically truncated from one or both ends by the use of exonucleases. The original procedure, developed by Stephen Henikoff, is illustrated in Figure 11.11. A DNA target is cut with two different restriction nucleases. One yields a 3′-overhang; the other yields a 5′-overhang. The enzyme E. coli exonuclease III degrades a 3′-overhang very inefficiently, while it degrades the 3′-strand in a 5′-overhang very efficiently. The result is degradation from only a single end of the DNA. After exonuclease treatment, the ends of the shortened insert must be trimmed to produce cloneable blunt ends. The DNA target is then taken up in a new vector and sequenced using primers from that new vector. In principle, this process ought to be quite efficient. In practice, while this proposed strategy has been known for many years, it does not seem to have found many adherents.

372 STRATEGIES FOR LARGE-SCALE DNA SEQUENCING

Figure 11.11 Preparation of nested deletion clones. A and B are restriction enzyme cleavage sites that give the overhanging ends indicated. Adapted from Henikoff (1984).

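The logic of covering an insert with progressively shortened clones can be sketched numerically. The following is only an illustration and not part of Henikoff's protocol: the function name, the read length, and the required overlap are assumed values, chosen to show how many deletion endpoints an insert would need if each deletion clone is read once from the vector primer.

```python
def deletion_endpoints(insert_len, read_len, min_overlap):
    """Return deletion sizes (bp removed from the primer-proximal end)
    such that consecutive reads overlap by at least min_overlap bases.

    All three parameters are illustrative assumptions, not values
    taken from the text.
    """
    if not 0 <= min_overlap < read_len:
        raise ValueError("min_overlap must be smaller than read_len")
    step = read_len - min_overlap      # new sequence gained per clone
    endpoints = [0]                    # the undeleted clone is read first
    # Add deletions until the last read reaches the end of the insert.
    while endpoints[-1] + read_len < insert_len:
        endpoints.append(endpoints[-1] + step)
    return endpoints


if __name__ == "__main__":
    # Hypothetical 3-kb insert, 500-base reads, 100-base overlaps.
    points = deletion_endpoints(3000, 500, 100)
    print(len(points), "clones with deletions of", points)
```

In practice exonuclease III digestion yields a distribution of endpoints rather than exact ones, so clones would be picked by size to approximate such a spacing.
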
A variation on the original exonuclease III procedure for generating nested deletions has been described by Chuan Li and Philip Tucker. In this method, termed exoquence DNA sequencing, a DNA fragment with different overhangs at its ends is produced and one end is selectively degraded with exonuclease III. At various time points the reaction is stopped, and the resulting template-primer complexes are treated separately with one of several different restriction enzymes and then subjected to Sanger sequencing reactions, as shown in Figure 11.12. The final DNA sequencing ladders are examined directly by gel electrophoresis. Thus no cloning is required. If the restriction enzymes are chosen well, and the times at which the reactions are stopped are spaced sufficiently closely, sufficient sequence data will be revealed to generate overlaps that will allow the reconstruction of contiguous sequence. This is an attractive method in principle; it remains to be seen whether it will prove more generally appealing than the original nested deletion cloning approach.


Figure 11.12 Strategy for exoquence DNA sequencing. Shown is a relatively simple case; there are more elaborate cases if the restriction enzymes used cut more frequently. A and B are restriction enzyme sites as in Figure 11.11; R is an additional restriction enzyme cleavage site. Taken from Li and Tucker (1993).

PRIMER JUMPING

This strategy has been discussed quite a bit. However, there are as yet no reported examples of its implementation. The basic notion is outlined in Figure 11.13. It is similar in many ways to delta subcloning, but it differs in a number of significant features. PCR is used, rather than subcloning. A very specific set of restriction enzymes is used: one rare cutter, which can have any cleavage pattern, and an additional pair of restriction enzymes consisting of an eight-base cutter and a four- or six-base cutter; the two members of the pair have to produce the same set of complementary single-stranded ends. Examples are Not I (GC/GGCCGC) and Sse 8387 I (CCTGCA/GG) for the eight-base cutters, and Eae I (Y/GGCCR) and Nsi I (ATGCA/T) or Pst I (CTGCA/G), respectively, as more frequent cutters. In principle, the approach shown in Figure 11.13 ought to be applicable to much larger DNA than delta subcloning, based on the past success in making reasonably large jumping libraries (Chapter 8).

For primer jumping the insert is cloned next to a vector fragment containing any desired infrequent cleavage site between two known unique DNA sequences, shown as a and b in Figure 11.13. The vector fragment is constructed so that it contains no cleavage sites for the 4-base or 6-base cutting enzyme between the unique sequences and the insert, but it does contain a second infrequent cleavage site, N in Figure 11.13, as close to the upstream unique sequence as possible. Ideally the vector will be one arm of a YAC, and the other arm could be treated in a similar fashion. The clone is cut at the distal rare cleavage site to completion and then partially digested with the frequently cutting enzyme. The resulting fragments are separated by length, and the separation gel is sliced into pieces. The resulting DNA fragments are diluted to very low concentration and ligated. This will produce DNA circles in which the vector sequence, including segments a and b in Figure 11.13, is now located next to each site in the partial digest which was cleaved by the frequently cleaving enzyme. Thus the known sequence can now be used for starting a primer walk. The approximate position of the walk within the large clone will be known from the size of the fragment. With the 800-bp to 1-kb sequence reads now being achieved under good circumstances, it is conceivable that one would be able to sequence from the cleaved site up to the next equivalent restriction site without the need to make additional primers, in most cases. If this were the case, one could do a directed walking strategy on a large DNA target using only two primers, one for each vector arm.

Figure 11.13 Primer jumping, an untested but potentially attractive method for directed DNA sequencing. Restriction enzyme site N must produce an end that is complementary to the end generated by the four-base cutter. Sites a and b can be anything so long as they are not present in the target, but there must be a site for infrequent cleavage between them.

In a similar vein to primer jumping, if single-sided PCR ever works well enough (Chapter 4), these methods could be used for directed cycle sequencing by the approaches just described.
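The requirement that the rare cutter and the frequent cutter leave matching single-stranded ends can be checked directly from the recognition sites quoted above. The sketch below is illustrative only: the helper names are invented here, the enzyme with site Y/GGCCR is labeled Eae I, and the code assumes palindromic sites, so that the cut on the bottom strand mirrors the cut on the top strand.

```python
def overhang(site_with_cut):
    """Return (kind, bases) for a palindromic recognition site written
    with '/' at the top-strand cut, for example 'CTGCA/G'.

    Assumes a palindromic site, so the bottom-strand cut is the mirror
    image of the top-strand cut. Degenerate letters (Y, R) are simply
    passed through.
    """
    cut = site_with_cut.index("/")
    site = site_with_cut.replace("/", "")
    mirror = len(site) - cut           # bottom-strand cut in top-strand coordinates
    lo, hi = sorted((cut, mirror))
    if cut < mirror:
        kind = "5'"
    elif cut > mirror:
        kind = "3'"
    else:
        kind = "blunt"
    return kind, site[lo:hi]


# Recognition sites as quoted in the text.
SITES = {
    "Not I": "GC/GGCCGC",
    "Sse 8387 I": "CCTGCA/GG",
    "Eae I": "Y/GGCCR",
    "Nsi I": "ATGCA/T",
    "Pst I": "CTGCA/G",
}


def compatible(enzyme_a, enzyme_b):
    """True if both enzymes leave the same kind and sequence of overhang."""
    return overhang(SITES[enzyme_a]) == overhang(SITES[enzyme_b])


if __name__ == "__main__":
    for name, site in SITES.items():
        print(name, overhang(site))
```

Under these assumptions Not I pairs with Eae I (both leave a 5′ GGCC extension), while Sse 8387 I pairs with Nsi I or Pst I (all leave a 3′ TGCA extension).
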


PRIMER MULTIPLEXING

This is a potentially very powerful strategy for large-scale DNA sequencing. It was developed by George Church and has been elaborated, independently, by Raymond Gesteland. There are a number of features that set multiplexing apart from many other approaches. A major peculiarity of the method is that it does not scale down efficiently, so it is best suited for fairly massive projects, typically several hundred kb of DNA sequence or more.

The basic scheme for primer multiplexing is shown in Figure 11.14. In the particular case shown, a multiplexing of 40 is used. Forty different vectors are constructed; each has a unique 20-base sequence on each side of the cloning site. The DNA target of interest is shotgun cloned, separately, into all 40 vectors. This produces 40 different libraries. Pools are constructed by selecting one clone from each of the libraries and mixing them. These 40-clone pools are the samples on which DNA sequencing is performed. The pools are subjected to standard DNA sequencing chemistry to generate a mixture of 40 different ladders, but no radioactivity or other label is introduced into the DNA at this stage. The mixture is fractionated by polyacrylamide electrophoresis and blotted onto a membrane. A particularly convenient way to do this is by the bottom wiper described in Chapter 10. The blotted DNA is crosslinked onto the filter by UV irradiation to attach it very stably. This is a key step, since the filters will be reused many times.

To read the DNA sequence from each pool of clones, the filter is hybridized with a probe corresponding to one of the 40 unique 20-base sequences. By this indirect end-labeling method (introduced in Chapter 8), only one of the 40 clones in the sequence ladder is visible. The probe is removed from the filter by washing, and then the hybridization and washing are repeated successively for each of the other unique sequence primers. By this multiplexing approach, most stages of the project are streamlined by a factor of 40. The exceptions are the hybridization, the autoradiography or other color detection, and the washing. Thus great care must be taken to automate these steps in the most efficient way. Recently a fairly successful demonstration of the efficiency that can be achieved with primer multiplexing, combined with transposon jumping, was reported by Robert Weiss and Raymond Gesteland.

Figure 11.14 Basic scheme used for primer multiplexing: a, b, c, and so on, represent unique vector primer sequences.

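The claim that most steps shrink by the multiplex factor while probing does not can be made concrete with a small count. This is a rough sketch: the function name, the dictionary keys, and the 4000-clone project size are assumptions made for illustration, and each "reaction set" stands for whatever sequencing reactions one template or pool requires.

```python
import math


def multiplex_workload(n_clones, plex=40):
    """Count the main operations needed to sequence n_clones templates
    in pools of `plex` clones (40 in the example of Figure 11.14)."""
    pools = math.ceil(n_clones / plex)
    return {
        # One set of sequencing reactions is run on each pool.
        "pooled_reaction_sets": pools,
        # Each pool is run on a gel and blotted once.
        "gels_and_membranes": pools,
        # Every membrane is probed once per unique vector sequence.
        "membrane_probings": pools * plex,
    }


if __name__ == "__main__":
    # Hypothetical project of 4000 clones.
    print(multiplex_workload(4000))
```

For 4000 clones the reactions and gels fall from 4000 to 100, but 4000 hybridize, detect, and wash cycles remain, which is why those steps dominate the automation effort.
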
MULTIPLEX GENOMIC WALKING
A different approach to multiplex sequencing has been suggested by Walter Gilbert. This is designed to be used for the sequencing of entire small genomes like E. coli where direct genomic DNA sequencing is feasible. The method is illustrated in Figure 11.15a. The great appeal of this method is that absolutely no cloning is required. The total genome is digested separately with a set of different restriction enzymes. The products of this digestion are loaded onto polyacrylamide gels in adjacent lanes and fractionated. A highly labeled probe with an arbitrary sequence is selected (with a length chosen to occur on average once per genome) and used to hybridize with a blot of the separated fragments (Fig. 11.15b). In most of the lanes this probe will give a readable sequence. Suppose that the probe lies 60 bp upstream from a given restriction site. The first 60 bases of sequence will be unreadable because data will extend in both directions (Fig. 11.15c). However, longer regions of the ladder will be interpretable, since they must lie in the direction away from the nearby restriction site. In general, one will expect to get a number of usable reads in both directions from the probe, just by the fortuitous occurrence of useful restriction sites. These reads are assembled into a segment of DNA sequence. Next probes are designed from the most distal regions of the segment, and these are used to continue the genomic walk.

Figure 11.15 Multiplex genomic walking. (a) Basic outline of the experiment. (b) Restriction map in a typical region, and resulting segments of sequence, A, B, C, revealed by hybridization with one specific probe. (c) Sections of readable and unreadable sequence on a particular restriction fragment. The probe is located L bases from the end of the fragment.

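How long an arbitrary probe must be to occur about once per genome can be estimated with a simple random-sequence model. The sketch below rests on assumptions not made in the text: equal base frequencies, exact matching on both strands, and an approximate E. coli genome size of 4.6 Mb. The function names are invented for illustration.

```python
def expected_sites(probe_len, genome_bp, both_strands=True):
    """Expected number of exact matches of an arbitrary probe in a
    random genome with equal base frequencies (a simplifying model)."""
    strands = 2 if both_strands else 1
    return strands * genome_bp / 4 ** probe_len


def shortest_unique_probe(genome_bp):
    """Smallest probe length with at most one expected match."""
    n = 1
    while expected_sites(n, genome_bp) > 1:
        n += 1
    return n


if __name__ == "__main__":
    ecoli_bp = 4_600_000   # approximate genome size, assumed here
    n = shortest_unique_probe(ecoli_bp)
    print(n, round(expected_sites(n, ecoli_bp), 2))
```

Real genomes are not random, so repeated sequences would push the useful probe length higher than this estimate.
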

In principle, multiplex genomic walking is a very elegant and spartan approach to DNA sequencing. One has a choice at any time whether to use additional arbitrary probes, and so increase the number of parallel sequencing thrusts, or whether to focus on directed walking. Thus one has a method with some of the advantages of both random and directed strategies. A potential weakness is the relatively high fraction of failed lanes that will occur unless the probe has a single binding site in the genome. Another problem is the technical demands that genomic sequencing makes. It is also not obvious how easy this strategy will be to automate. It does work, but the overall efficiency needs to be established before the method can be compared quantitatively with others.

GLOBAL STRATEGIES

A basic issue that has confronted the human genome project since its conception is not how to sequence but what to sequence. From a purely biological standpoint, the most interesting sequencing targets are genes. The choice of genes depends on the sorts of biological questions one is interested in. An evolutionary biologist may want to sequence one homologous gene in a wide variety of organisms. Cell biologists or physiologists may want to focus on a set of functionally related genes or gene families within just a few organisms. However, from the point of view of whole genome studies, the purpose of sequencing is really to find genes and make them available for subsequent biological studies. This puts a very different tilt on the issues that affect the choice of sequencing targets.

For simple gene-rich organisms like bacteria and yeasts, there is little doubt that complete genomic sequencing is desired and worth doing even with existing DNA sequencing technology. Indeed sequencing projects have been completed on many bacteria including H. influenzae, Mycoplasma genitalium, Mycoplasma pneumoniae, Methanococcus jannaschii, Synechocystis strain PCC6803, and Escherichia coli, and one yeast, S. cerevisiae (see Chapter 15). Additional projects are well underway with a number of other microorganisms, including the bacterium Mycobacterium tuberculosis and the yeast S. pombe. E. coli is an obvious choice as the focus of much of our fundamental studies in prokaryotic molecular biology. Mycoplasmas represent the smallest known free-living genomes. Mycobacterium tuberculosis is important because of the current medical crisis with drug-resistant tuberculosis. The two yeasts account for most of our current knowledge and technical power in fungal genetics. They are also very different from each other, so much will be learned from comparisons between them. The real issue that will have to be faced in the future is this: at what stage in DNA sequencing technology does it become desirable and affordable to sequence the genomes of many other simple organisms?


There are a number of more advanced organisms that appear to have relatively high coding percentages of DNA. These include a simple plant, Arabidopsis thaliana, a much more economically important plant, rice, the fruit fly, Drosophila melanogaster, and the nematode, Caenorhabditis elegans. There are strong arguments in favor of obtaining complete DNA sequences on these organisms rapidly. They all are systems where a great deal of past genetics has been done, and a great deal of ongoing interest in biological studies remains. Certain primitive fishes may also have small genomes, as does the puffer fish. Here the argument in favor of sequencing is that it will be relatively easy to find most of the genes. However, these organisms are currently pretty much in a biological vacuum.

For more complex, gene-dilute organisms, the selection of sequencing targets is, not surprisingly, also more complex. Here there is little debate that Homo sapiens and the mouse, Mus musculus, are the obvious first choices. It is much less clear what should come after this. Do we target other primates because they will be most useful in understanding the very large fraction of human genes that are believed to be central nervous system specific? Do we examine genomes of organisms that have long been the focus of physiological studies like rats, dogs, and cats? Or do we aim for a much broader representation of evolutionary diversity? Alternatively, how important should the commercial value of potential genome targets be? Cows, horses, pine trees, maize, and salmon have a much more important economic role than Arabidopsis or C. elegans. These questions are interesting to ponder, but they really do not require answers at the present time. If sufficiently inexpensive DNA sequencing methods are developed in the future, we will want to sequence every genome of biological interest. For the present, technology pretty much limits us to a few choices.

With most complex organisms, only a few percent of the genome is known coding sequence. The function of the rest, which we earlier termed junk, is unknown today. With limited resources, and relatively slow sequencing technology, most involved groups are choosing to focus on selectively sequencing genes from human or other sources. There are two ways to go about this. One approach is to find a gene-rich region in a genome and sequence it completely. Regions that have been selected include the T-cell receptor loci, immunoglobulin gene families, and the major histocompatibility complex. All of these regions are of intense interest for understanding the function of the immune system. Another region of interest is the Huntington's disease region because it is very gene rich, and in the process of finding the particular gene responsible for the disease a large set of cloned DNA samples from this region has become available.

An alternative to genomic sequencing of a gene-rich region is to sequence cDNAs, DNA copies of expressed mRNAs. These are relatively easy to produce, and many cDNA libraries are available. Each represents the pattern of gene expression of the particular tissue or sample from which the original mRNA was obtained. In sequencing a cDNA, one knows one is dealing with an expressed gene, and therefore a functional gene. This is a considerable advantage over genomic sequencing, where one has no knowledge a priori that a particular gene found at the DNA level is actually ever used by the organism. With cDNA sequencing, one is always examining genes or nearby flanking sequences. This is another great advantage over genomic sequencing where, even in the best of cases, most of the sequence will not be coding. However, there are some potential difficulties with projects to examine massive numbers of cDNA sequences, as we will demonstrate.


SEQUENCE-READY LIBRARIES

Today, the notion of sequencing an entire human chromosome from left to right telomere is being considered seriously at a number of Genome Centers. In some cases the plans are based on a preexisting minimum tiling set of clones. Here, as long as the set is complete and exists in a vector like a cosmid or a BAC that allows direct sequencing, the strategy is predetermined. The clones are selected and sequenced one by one by whatever method is deemed optimal at the time for 50- to 150-kb clones.

Suppose, however, that, with sequencing as the eventual goal, one wishes to create an optimal library to facilitate subsequent sequencing of any particular region deemed interesting. There are two basically similar strategies for achieving this objective. If a dense ordered library already exists in an appropriate vector, one can sequence the ends of all the clones in a relatively easy and cost-effective manner. Since vector priming can be used, the goal is to read into the cloned insert as far as possible in a single pass of raw DNA sequencing. If this is done for all the clones, the result is a sampling of the genomic sequence (Smith et al., 1994). For example, suppose that the initial library is 20-fold redundant in 50-kb cosmids. A cosmid end on average would occur every 1.25 kb. A 700-base sequence read at each end would generate a total of 28 kb of sequence for each 50 kb of target. When realistic failure rates and some inevitable overlaps are considered, the result would still be roughly half the total sequence. This is sufficiently dense that almost any cDNA sequence from the region would be represented in some of the available genomic DNA sequence. Thus all sequenced cDNAs could be mapped by software sequence comparisons without the need for any additional experiments. The average spacing between sequenced genomic regions would be short enough so that PCR primers could be designed to close any of the gaps by cycle sequencing.
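The figures in the cosmid example follow from simple arithmetic, reproduced below. This is a back-of-the-envelope sketch: the function and key names are invented for illustration, and the calculation ignores failed reads and overlaps between reads, which the text notes will reduce the unique yield.

```python
def end_sampling(insert_bp, redundancy, read_bp, region_bp=None):
    """Raw yield of sequencing both ends of every clone in a library.

    region_bp defaults to one insert length, so the numbers are
    expressed per insert-sized stretch of the target.
    """
    region_bp = insert_bp if region_bp is None else region_bp
    n_clones = redundancy * region_bp / insert_bp
    n_ends = 2 * n_clones
    return {
        "clones": n_clones,
        "end_spacing_bp": region_bp / n_ends,   # average distance between clone ends
        "raw_read_bp": n_ends * read_bp,        # total bases read, overlaps included
        "raw_fraction": n_ends * read_bp / region_bp,
    }


if __name__ == "__main__":
    # 20-fold redundant library of 50-kb cosmids, 700-base end reads.
    print(end_sampling(insert_bp=50_000, redundancy=20, read_bp=700))
```

With these inputs the sketch gives 40 ends per 50 kb, one every 1.25 kb, and 28 kb of raw sequence, or 56 percent of the target before failures and overlaps are discounted.
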

For many targets, however, there is no existing clone map. The effort to create one de novo is considerable, even by the enhanced methods described in Chapter 9. For this reason, as automated DNA sequencing becomes more and more efficient, strategies that avoid the construction of a map altogether become attractive. One recent proposal for such a scheme also relies on the sequencing of the ends of the clones (Venter et al., 1996). Consider, for example, an ordered tenfold redundant BAC library of the human genome. With 150-kb inserts, 200,000 BACs are required. If each of these is sequenced for 500 bp from both ends, the resulting data set will contain 400,000 sequence reads encompassing 200 Mb of DNA. On average, the density of DNA sequence is a 500-bp block every 7.5 kb. Once created, such a resource would serve two functions. Many cDNAs would still match up with a segment of BAC sequence, and they could serve to correlate the BAC library with other existing genome resources and information. The utility of the BACs in this regard could be improved if, for example, they were created so that their ends had a bias to occur in coding sequence. However, even in the absence of cDNA information, the BACs will serve as a starting point for the genomic sequencing of any region of interest. One could choose any BAC that corresponds to the region of interest and sequence it completely. Then, by inspection, the BACs in the library that overlapped least with the first sequenced BAC could be picked out and used for the next round of sequencing. The process would continue until the region of interest was completed. In this way the sequencing project itself would create the minimum tiling set of BACs needed for the region.
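The clone-by-clone walk described here amounts to a greedy selection of the least-overlapping neighbor at each step. The sketch below is one possible reading of that idea, not the published procedure: the function name and the toy coordinates are hypothetical, only a rightward walk is shown, and known clone coordinates stand in for what would really be inferred by matching each BAC's end read against the finished sequence of the current BAC.

```python
def tiling_path(bacs, seed, region_end):
    """Greedy rightward walk from `seed` toward `region_end`.

    bacs maps a clone name to (start, end) in region coordinates.
    In a real project these coordinates are not known in advance;
    here they stand in for the position at which a clone's end read
    would match inside the sequenced BAC.
    """
    path = [seed]
    cur_start, cur_end = bacs[seed]
    while cur_end < region_end:
        # Clones whose left end falls inside the current BAC and
        # which extend beyond its right end.
        candidates = [
            (start, name)
            for name, (start, end) in bacs.items()
            if cur_start <= start < cur_end and end > cur_end
        ]
        if not candidates:
            break                       # gap: no clone bridges further
        start, name = max(candidates)   # largest start means least overlap
        path.append(name)
        cur_start, cur_end = bacs[name]
    return path


if __name__ == "__main__":
    # Toy library, coordinates in kb (hypothetical).
    library = {"A": (0, 150), "B": (40, 190), "C": (120, 270),
               "D": (130, 260), "E": (250, 400), "F": (300, 450)}
    print(tiling_path(library, "A", 450))
```

Picking the least-overlapping clone minimizes redundant sequencing at each step, although it does not always maximize the distance gained; in the toy library clone D is chosen over C even though C reaches slightly further.
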


SEQUENCING cDNA LIBRARIES

Usually cDNA libraries are made by a scheme like that shown in Figure 11.16. To prepare high-quality cDNAs, it is important to start with a population of intact mRNAs. This is not always easy; mRNAs are very susceptible to cleavage by endogenous cellular ribonucleases, and some tissues or samples are very rich in these enzymes. Most eukaryotic mRNAs have several hundred bases of A at their 3′-end. This poly A tail can be used to capture these mRNAs and remove contaminating rRNA, tRNA, and other small cytoplasmic and nuclear RNAs. Unfortunately, one also loses that fraction of mRNAs that lack a poly A tail. An oligo-T primer can then be used with reverse transcriptase to make a DNA copy of the mRNA strand. Alternatively, random primers can be used to copy the mRNAs, or specific primers can be used if one is searching for a particular mRNA or class of mRNAs. There are two general methods to convert the resulting RNA-DNA duplexes into cDNAs. Left to their own devices, some reverse transcriptases will, once the RNA strand is displaced or degraded, continue synthesis, after making a hairpin, until they have copied the entire DNA strand of the duplex. As shown in Figure 11.16a, S1 nuclease can then be used to cleave the hairpin and generate a cloneable end. Unfortunately, the S1 nuclease treatment can also destroy some of the ends of the cDNA. An alternative procedure is to use RNase H to nick the RNA strand of the duplex. The resulting nicks can serve as primers for DNA polymerases like E. coli DNA polymerase I. This eventually leads to a complete DNA copy except for a few nicks which can be sealed by DNA lig-

Figure 11.16 Approaches to the construction of cDNA libraries: Use of S1 nuclease to generate clonable inserts.