Добавил:

fench Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Сумский государственный университет

Предмет:

Генетика

Файл:

Genomics- The Science and Technology Behind the Human Genome Project. Charles R. Cantor, Cassandra L / genomics1-10 / 9

.pdf

Скачиваний:

Добавлен:

17.08.2013

Размер:

475.69 Кб

Скачать

☆

<<< < Предыдущая 1 23 / 43 4 > Следующая >>>

					PROBE POOLING IN	S. POMBE	MAPPING	305
ing to increase the throughput			of experiments even further. However, the most efﬁcient
ways to do this have not yet been worked out.
A caveat for all pooling approaches concerns the threshold between background noise
or cross-hybridization and true positive hybridization signals. Theoretical, noise-free
strategies are not likely to be viable with real biological					samples. Instead of striving	for
the most ambitious and efﬁcient possible pooling strategy, it is prudent to use a more
overdetermined approach and sacriﬁce a bit of efﬁciency for a margin of safety. It is im-
portant to keep in mind that a pooling approach does not need to be perfect. Once poten-
tial clones of interest are found or mapped, additional experiments can always be done to
conﬁrm their location. The key idea is to				get the map approximately right with a mini-
mum number of experiments. Then the actual work involved in conﬁrming the pooling is
ﬁnite and can be done regionally by groups interested in particular locales.
PROBE POOLING		IN	S.	POMBE	MAPPING
An example of the power of pooling is illustrated by results obtained in ordering a cosmid
library	of the	yeast	S. pombe.	This organism was chosen for a model mapping project
because a low-resolution restriction map was already available and because of the interest
in this organism as a potential target for genomic sequencing. An arrayed cosmid library
was available. This was ﬁrst screened by hybridization with each of the three							S. pombe
chromosomes puriﬁed by PFG. Typical results are shown in Figure 9.14. It is readily
apparent that most cosmids fall clearly				on one chromosome by their hybridization.
However there are signiﬁcant variations in the amount of DNA present in each spot of the
array. Thus a considerable amount of numerical manipulation of the data is required to
correct for DNA sample variation, and for differences in the signals seen in successive
hybridizations. When this is done, it is possible to assign, uniquely, more than 85% of the
clones to a single chromosome based on only three hybridizations.
The next step is to make a regional assignment of the clones. Here puriﬁed restriction
fragments are labeled and used as hybridization probes with the cosmid array. An exam-
ple is shown in Figure 9.15. Note that it is inefﬁcient to use only a single restriction frag-
ment at a time. For example, once one knows the chromosomal location of the cosmids,
one can mix restriction fragments from different chromosomes and use them simultane-
ously with little chance of introducing errors. After a small number of such experiments,
one has most of the cosmids assigned to a well-deﬁned region.
To ﬁngerprint the cosmid array further, and begin to link up cosmid contigs, arbitrary
mixtures of single-copy DNA probes were				used. These were generated by the method
shown in Figure 9.16. A FIGE separation of a total restriction enzyme digest of							S. pombe
genomic DNA was sliced into fractions. Because of the very high resolution of FIGE in
the size range of interest, these slices should essentially contain nonoverlapping DNA se-
quences. Each slice was then used as a hybridization probe. As an additional ﬁngerprint-
ing tool, mixtures of any available				S. pombe	cloned single-copy sequences were made and
used as probes. For all of these data to be analyzed, it is essential to consider the quantita-
tive hybridization signal, and correct it				both for background and for differences in the
amount of DNA in each clone, the day-to-day variations in labeling, and overall hy-
bridization efﬁciency.
The	hybridization proﬁle of		each of the	cosmid clones with the various probes used is

an indication of where the clone is located. A likelihood analysis was developed to match

306 ENHANCED METHODS FOR PHYSICAL MAPPING

Figure 9.14	Hybridization of a cosmid array of		S. pombe	clones with intact	S. pombe	chromo-
somes I (	a ) and II (	b ).


Figure 9.15	Hybridization of the same cosmid array shown in Figure 9.14 with two large restric-
tion fragments puriﬁed from the		S. pombe	genome.

PROBE POOLING IN

S. POMBE

MAPPING

307

high resolution separation of small restriction fragments by FIGE

purification of probe pools from gel slices

hybridization of probe pools to cosmid filter

gel

filter with cosmid DNA

Figure 9.16 Preparation of nonoverlapping pools of probes by high-resolution FIGE separation of

restriction enzyme-digested	S. pombe	genomic DNA.
up clones with similar hybridization	proﬁles	that indicate possible overlaps (Box 8.3).

The basic logic behind this method is shown in Figure 9.17. Regardless of how pools of probes are made, actually overlapping clones will tend to show a concordant pattern of hybridization when many probe pools are examined. The likelihoods that reﬂect the concordancy of the patterns were then used in a series of different clone-ordering algorithms, including those developed at LLNL, LANL, a cluster analysis, a simulated annealing

analysis, and the method sketched in Box 9.2. In general, we	found	that	the different
methods gave consistent results. Where inconsistencies were seen,	these	could	often be

resolved or rationalized by manual inspection of the data. Figure 9.18 shows the hybridization proﬁle of three clones. This example was chosen because the interpretation of

the proﬁle is straightforward. The clones share hybridization with the same chromosome, the same restriction fragments, and most of the same complex probe mixtures. Thus they

308 ENHANCED METHODS FOR PHYSICAL MAPPING

must	be located	nearby,	and	in fact they form a contig. The clones show some hybridiza-
tion	differences	when	very	simple probe	mixtures are used, since they are not identical
clones. An example of the kinds of contigs that emerge from such an analysis is shown in
Figure 9.19. These are clearly large					and redundant contigs of the sort one hopes to see in
a robust map.

Figure 9.17	Even when complex pools	of probes are used (e.g., pools	a and	b ) overlapping clones
will still tend to show a concordant pattern of		hybridization.

Figure	9.18	Hybridization proﬁle of three different			S. pombe	cosmid	clones,	with a series of 63
different pools of probes.

Figure 9.19		S. pombe	cosmid map	constructed from	the patterns of probe hybridization to the			S.
pombe	cosmid	array. Cosmids	are indicated	by horizontal	lines along the maps. Letters shown
above the map and fragment names (e.g., SHNF					Sﬁ I fragment H and		Not	I fragment F). Gaps in
contigs are indicated by a vertical bar at the right end. Positions of LTRs, 5S rDNAs, and markers
are indicated by *, #, and			, respectively.

FALSE POSITIVES WITH SIMPLE POOLING SCHEMES

309

310 ENHANCED METHODS FOR PHYSICAL MAPPING

BOX 9.2

CONSTRUCTION OF AN ORDERED CLONE MAP

Once a set of likelihood estimates has been obtained for clone overlap, the goal is to assemble a map of these clones that represents the most probable order given the available overlap data. This is a computationally intensive task. A number of different algorithms have been developed for the purpose. They all examine the consistency of particular clone orders with the likelihood results. The available methods all appear to be less than perfectly rigorous, since they all deal with data only on clone pairs and not on higher clusters. Nevertheless, these methods are fairly successful at establishing good ordered clone maps.

Figure 9.20 shows the principle behind a method we used in the construction of a

cosmid	map of	S. pombe.	The objective in this limited example is to test the evidence
in favor of particular schemes for ordering three clones A, B, and C. Various possible
arrangements of these clone are written as the columns and rows of a matrix, each ele-
ment of this matrix is represented by the maximum likelihood estimate that the clones
i and	j overlap by a fraction		f: L ij( f ). For each possible arrangement of clones, we cal-
culated the weight of the matrix,			W	m deﬁned as
			W m	i j L ij(f)
				j i i

The	result for a simple case is shown in the ﬁgure. The true map will have an arrange-
ment of clones that gives a minimum weight or very close to this. The method is par-
ticularly effective in penalizing arrangements of clones where good evidence for over-
lap	exists, and yet the clones are postulated to be nonoverlapping in the ﬁnal

assembled map.

Figure 9.20 Example of an algorithm used to construct ordered clone maps from likelihood data. Details are given in Box 9.2.

	FALSE	POSITIVES WITH	SIMPLE			POOLING SCHEMES		311
The methods developed on	S. pombe	allowed a cosmid			map	to be completed to about
the 98% stage in 1.5 person years of effort. Most of this effort was method or algorithm
development, and we estimate that to repeat this process on a similar size genome would
take only 3 person-months of effort. This is in stark contrast with earlier mapping ap-
proaches. Strict bottom-up ﬁngerprinting methods, scaled to the size			of	the			S.	pombe
genome, would require around 8 to 10 person-years of effort. Thus by the use of pooling
and complex probes, we have gained more than an order of magnitude in mapping speed.
The issue that remains to be tested is how well this kind of approach will do with mam-
malian samples where the effects of repeated DNA sequences will have to be eliminated.
However, between the use of competition hybridization, which has been so successful in
FISH, and selective PCR, we can be reasonably optimistic that probe pooling will be a
generally applicable method.
FALSE POSITIVES WITH SIMPLE POOLING SCHEMES
Row and column pools are very natural ideas for speeding the analysis			of	an	array of
samples. It is easy to implement these	kinds of pools with simple		tools		and	robots.
However, they lead to a signiﬁcant level of false positives when the density of positive
samples in an array becomes large. Here	we illustrate this problem in		detail,			since	it
will be an even more serious problem in more complex pooling schemes. Consider the
simple example shown in Figure 9.21.	Here two clones in the array		hybridize with				a
probe (or show PCR ampliﬁcation with the probe primers, in the case of YAC screening).
If row and column pools are used for the analysis, rather than individual clones, four po-
tentially positive clones are identiﬁed by the combination of two positive rows and two
positive columns. Two are true positives; two are false positives. To decide among them,
each isolated clone can be checked individually. In the case shown, only four additional
hybridizations or PCR reactions would have to be done. This is a small addition to the
number of tests required for screening the rows and columns. However, as the number of
true positives grows linearly, the number of false positives grows quadratically. It soon
becomes hopelessly inefﬁcient to screen them all individually. An alternative approach,
which is much more efﬁcient, in the limit of high numbers of positive samples, is to con-
struct alternate pools. For example, in the case shown in Figure 9.21, if one also included
pools made along the diagonals, most true and false positives could be distinguished.

Figure 9.21 An example of how false positives are generated when a pooled array is probed. Here row and sample pooling will reveal four apparent positive clones whenever only two real positives

occur (unless the two true positives happen to fall on the same row or column).


312			ENHANCED METHODS FOR PHYSICAL MAPPING
MORE		GENERAL POOLING		SCHEMES
A	branch		of mathematics	called	sampling	theory	is	well developed and instructs us how
to	design		pools effectively. In the most		general case there	are two	signiﬁcant	variables:
the	number		of dimensions used for the array and the pools,			and the number of alternate
pool conﬁgurations employed. In the case described in Figure 9.21, the array is two di-
mensional, and the pools are one dimensional. Rows and columns represent one conﬁgu-
ration of the array. Diagonals, in essence, represent another conﬁguration of the array be-
cause		they	would be rows and columns if the elements of			the array were placed in a
different order. Here we want to generalize these ideas. It is most important to realize that
the	dimensionality of an			array or a pool	is a mathematical statement about how we			chose


to describe it. It is not a statement about how the	array is actually composed in space. An
example is shown by the pooling scheme illustrated	in Figure 9.22. Here plate pools are
used in conjunction with vertical pools, made by combining each sample at a		ﬁxed	x-ylo-
cation on all the plates. The arrangement of plates appears to be three		dimensional;	the
plate pools are two dimensional, but the vertical pools are only one dimensional.

Figure 9.22 A pooling scheme that appears, superﬁcially, to be three dimensional, but, in practice, is only two dimensional.

MORE GENERAL POOLING SCHEMES

313

consider

ﬁrst

element library,

and

assume that we

have

sufﬁcient

robotics to

sample it in any way we wish. A two-dimensional square array is constructed by assign-

ing to each element a location:

a ij

where

i 1, N

1/2;j 1, N 1/2

If we pool rows and columns, each of these one-dimensional pools has

N 1/2 components,

and

there are 2

N 1/2 different

pools if we

just consider rows and columns. The

actual

con-

struction of a pool consists of setting one index constant, say

i 3, and

then

combining

all samples that share that index.

The

same

element

library

can

be treated

as a

three-dimensional

cubic

array. This is

constructed, by analogy, with the two-dimensional array by assigning to each element a

location:

a ijk

where

1, N

1/3

1, N

1/3

1, N

1/3

If we pool two-dimensional surfaces of this array, each of these pools has

N 2/3 compo-

nents. There are

N 1/3 different

pools if

consider

three

orthogonal sets

planes. The

actual

process of constructing

these

pools

consists

setting

one

index

constant,

say

j 2, and then combining all samples that share this index.

We can easily generalize the pooling

scheme

to higher dimensions. These become

hard

depict

visually, but there is no

difﬁculty

handling

them

mathematically.

make a four-dimensional array of the library, we assign each of the

clones an index:

a ijkl

where

1, N

1/4

1, N

1/4

1, N

1/4

1, N

1/4

This array is what is actually called a hypercube.

can

make

cubic

pools

samples

from the array by setting one index constant, say

and then combining all samples

that share this index. The result is 4

1/4 different

pools,

each

with

3/4 elements. The

pools actually correspond to four orthogonal sets of sample cubes.

The process can be extended to ﬁve and higher dimensions as needed. Note that the re-

sult of increasing the dimensionality is to decrease, steadily, the number of different pools

needed, at a cost of increasing the size of each pool. Therefore the usefulness of higher-

dimension pooling will depend very much on experimental sensitivity. Can the true posi-

tives b distinguished among an increasingly higher level of background noise as the com-

plexity of the pools grows?

The highest dimension possible for a pooling scheme is called a binary sieve. It is deﬁ-

nitely the most efﬁcient way to ﬁnd a rare event in a complex sample so long as one is

dealing with perfect data. In the examples discussed

above,

note that

the

range

over

which each sample index runs keeps dropping steadily, from

i 1,

1/2 to

i 1, N 1/4 as

the dimension of the array is increased from two to four. The most extreme case possible

would allow each index only a single value; in this case there is really no array, the sam-

ples are just being numbered. A pooling scheme one

step short of this extreme would be

to allow each index to run over just two numbers. If we kept strictly to the above analogy

we would say

i 1, 2; however, it is more useful to let the indices run from 0 to 1. Then

we assign to a particular clone an index like

a 101100. This is

just a

binary

number

(the

equivalent decimal is 44 in this case). Pools are constructed, just as

before,

selecting

all clones with a ﬁxed index, like

a ijk 0mn .

314 ENHANCED METHODS FOR PHYSICAL MAPPING

With a binary sieve we are nominally restricted to libraries where N 2 q. Then the array is constructed by indexing

N is a power of 2:

a ijklmn

. . .

where

i, j, k, l, m,. n,. . 0, 1

Each of the pools made by setting one index to a ﬁxed value will have

/2 elements. The

number

pools will

q log(

)/log(2). This

implicitly

assumes that one scores each

clone for 1 or 0 at each index, but not both. The notion in a pure binary sieve is that if the

clone

not in one pool (k

1),

it must certainly be in the other

(

0),

and so

there

no need to test them both. With real samples, one would almost certainly want to test both

pools to avoid what are usually rather frequent false negatives. The size of each pool is

enormous—it contains half of the clones in the library. However, the number of pools is

very small. It cannot be further reduced without including additional sorts of information

about the samples, such as intensity, color, or other measurable characteristics.

The

binary sieve is constructed by numbering

the

samples

with

binary

numbers,

namely integers in base two. One can back off from the extreme example of the binary

sieve by using indices in other bases. For example, with base three indices, the array is

constructed as

a ijkl . .

. ,

where

i, j, k,.l,.

. 0, 1,

2. This results in a larger number of

pools, each less complex than the pools used in the binary sieve. It is clear that to con-

struct the actual pools used in binary sieves and related schemes would be

quite complex

if one had to do it by hand. However, it is relatively easy to instruct an

x-yrobot to sample

in the required manner. Specialized tools could probably be utilized to make the pooling

process more rapid.

A numerical example will be helpful, here. Consider a library where

N 2 14 16,384

clones. For a two-dimensional array, we need 2

7 by 2

7 or 128 by 128 clones. The row and

column pools will have 128 elements each. There are 256 pools needed. In contrast, the

binary array requires only 14 pools (28 if we

want to protect against false

negatives).

Each pool, however, will have 8192 clones in it! Constructing these pools is not conceiv-

able unless the procedure is fully automated.

As the dimensionality of pooling increases, there are trade-offs between the reduced

number

pools and the increased complexity of the

pools. The advantage of reduced

pool number is obvious: Fewer PCR reactions or hybridizations will have to be done. A

disadvantage of increased pool complexity, beyond background problems that we have al-

ready discussed, is the increasing number of false positives when the array has

high

density of positive targets. For example, with

a two-dimensional array, two

positive

clones a

36and a

24 imply that false positives will appear at a

34 and a

26. In three dimensions,

two positive clones a

826

and

534

will

generate

false

positives

824

, a

834

, a

524

836

536

and a

. The number of false positives is smaller

some

common

orthogonal

526

plane. In

general for

two positive clones

there will

to 2

false

positives

n -dimensional pool. By the time a binary sieve is reached, the false positives become to-

tally impossible to handle. Thus the binary sieve will be useful only for ﬁnding very rare

needles in very large haystacks. For realistic screening of libraries, much lower dimen-

sionality pooling is needed.

The actual optimum dimension pooling scheme to use will depend on the number of

clones in the array, the redundancy of the library (which will increase the rate of false

positives) and the number of false positives that one is willing to tolerate, and

then

re-

screen

for individually or with

an alternate array

conﬁguration. Figure

9.23

gives

some

<<< < Предыдущая 1 23 / 43 4 > Следующая >>>

Соседние файлы в папке genomics1-10

#
17.08.2013296.13 Кб464.pdf
#
17.08.2013326.85 Кб455.pdf
#
17.08.2013406.31 Кб456.pdf
#
17.08.2013277.57 Кб467.pdf
#
17.08.2013634.83 Кб478.pdf
#
17.08.2013475.69 Кб469.pdf
#
17.08.2013192.47 Кб48booktext[1].pdf