Genomics- The Science and Technology Behind the Human Genome Project. Charles R. Cantor, Cassandra L / genomics1-10 / 6



Thus the average contribution from one child to the ELOD is the sum of these two cases weighted by their expected frequency. Since recombination between A and D occurs only 10% of the time,

$$\mathrm{ELOD}(0.1) = 0.9 \log\frac{0.9}{0.5} + 0.1 \log\frac{0.1}{0.5}$$

$$\mathrm{ELOD}(0.1) \approx 0.23 - 0.07 = 0.16$$

Thus observation of cosegregation of A and D adds to the probability of linkage, while observation of separation of A and D subtracts from the evidence for linkage.

What we need to do is develop the tools to assess the statistical significance of a particular ELOD score. Since some markers will appear to cosegregate by chance in any study with a relatively small number of affected individuals, there is always a chance of seeing a significantly positive LOD score simply because of random fluctuations. A near consensus in human genetics is that an observed LOD of 3.0 or higher is required before the probability of purely accidental linkage can be reduced to the point where few errors are made. For the example just described, the number of individuals segregating D with unambiguous pedigrees that would have to be combined to generate a LOD score of 3.0 can be estimated as 3/0.16 ≈ 19. For common inherited diseases this is not a problem, but for very rare diseases it may be extremely difficult to find that many genetically informative individuals for a particular marker with an unambiguous diagnosis.
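The two-point ELOD and the sample-size estimate above can be checked numerically. The following is a minimal sketch, assuming phase-known, fully informative meioses; the function names `elod_two_point` and `meioses_needed` are illustrative and do not come from the text.

```python
import math


def elod_two_point(theta: float) -> float:
    """Expected LOD contributed by one phase-known informative meiosis.

    theta is the true recombination fraction between marker and disease
    locus; the unlinked alternative assigns probability 0.5 to each outcome.
    """
    if not 0.0 < theta < 0.5:
        raise ValueError("theta must lie strictly between 0 and 0.5")
    cosegregation = (1.0 - theta) * math.log10((1.0 - theta) / 0.5)
    separation = theta * math.log10(theta / 0.5)
    return cosegregation + separation


def meioses_needed(theta: float, target_lod: float = 3.0) -> int:
    """Smallest whole number of meioses whose summed ELOD reaches target_lod."""
    return math.ceil(target_lod / elod_two_point(theta))


print(f"ELOD(0.1) = {elod_two_point(0.1):.3f}")
print(f"meioses needed for LOD 3.0: {meioses_needed(0.1)}")
```

Because the count is rounded up to a whole individual, the sketch reports the smallest sample whose expected LOD actually reaches the threshold.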

Note that several constraints apply to the linkage analysis described above. One must have access to a parent with known phase between A and D. The marker A to be tested for linkage must have useful heterozygosity. The diagnosis of D must be unambiguous in all the individuals tested. Note that failing to diagnose an individual who is carrying D (a false negative) does not hurt the analysis, since in this case the individual and the parent are not scored. However, misclassifying an individual as carrying D instead of d (a false positive) causes serious problems because it will weaken the evidence about which alleles at other loci are cosegregating with D.

INTERVAL MAPPING

Once a genetic map is available for a region of interest, the process of linkage analysis can be made more powerful by examining several markers simultaneously. We will consider the simplest possible case, illustrated in Figure 6.19a. As in the previous discussion of simple linkage analysis, we will calculate the average contribution to the LOD score from a single, informative individual inheriting a disease allele D. We wish to test a region of the genome containing two linked loci with markers A and B to see if the disease allele D lies between them or is unlinked. (Here we ignore the case that it might be linked to A and B but lie outside them rather than between them.)

Suppose that the loci containing A and B are 20 cM apart. This is a reasonable model for how human genetic maps are used in average regions of the genome. Thus the recombination fraction between A and B is θ = 0.2. First we calculate the possible contributions from a parent carrying D to a child, also carrying D, if there is no linkage between A and B with D (Fig. 6.19b). Since A and B are on the same chromosome, D, if unlinked, must lie on a different chromosome. Assuming that the parent is heterozygous and informative at all these loci, there are four possible contributions from the parent to the child (Fig. 6.19c).


If no recombination between A and B occurs (80% probability for markers 20 cM apart), the child will either inherit ABD (0.4 odds) or abD (0.4 odds). If recombination between A and B occurs (20% probability), the child will inherit either AbD (0.1 odds) or aBD (0.1 odds).

If D is linked and located between A and B, assuming the phase of the parent is known, the two homologous chromosomes of the parent carry alleles ADB and adb, as shown in Figure 6.19d. In principle, D may lie anywhere between A and B, and the actual position of D is a variable that must be included in the calculations. Here we will consider the simple case where D lies midway between A and B. Assuming that the recombination frequency is uniform in this region of the chromosome, we can then place D 10 cM from A and 10 cM from B (Fig. 6.19e). There are four possible sets of alleles that can be passed from this parent to a child who inherits D (Fig. 6.19f). These are as follows:

ADB: no recombination between A and D, and no recombination between D and B (odds are 0.9 × 0.9).

ADb: no recombination between A and D, but recombination has occurred between D and B (odds are 0.9 × 0.1).

aDB: recombination has occurred between A and D, but no recombination has occurred between D and B (odds are 0.1 × 0.9).

aDb (a double crossover event): recombination has occurred both between A and D and between D and B (odds are 0.1 × 0.1).
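The four cases listed above follow from multiplying the outcome probabilities in the two intervals. A small sketch of that bookkeeping is given here, assuming that crossovers in the A-D and D-B intervals are independent (no interference) and that the parent has phase ADB/adb; the function name `haplotype_probabilities` is hypothetical.

```python
from typing import Dict


def haplotype_probabilities(theta_ad: float, theta_db: float) -> Dict[str, float]:
    """Probabilities of the four D-carrying haplotypes from an ADB/adb parent.

    theta_ad and theta_db are the recombination fractions in the A-D and
    D-B intervals; independence of the two intervals is an assumption.
    """
    return {
        "ADB": (1.0 - theta_ad) * (1.0 - theta_db),  # no crossover in either interval
        "ADb": (1.0 - theta_ad) * theta_db,          # crossover between D and B only
        "aDB": theta_ad * (1.0 - theta_db),          # crossover between A and D only
        "aDb": theta_ad * theta_db,                  # double crossover
    }


probs = haplotype_probabilities(0.1, 0.1)
for haplotype, p in probs.items():
    print(haplotype, round(p, 2))
```

Moving D away from the midpoint only changes the two arguments, for example `haplotype_probabilities(0.05, 0.15)` under the same uniform-recombination assumption used in the text.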

Thus the same four possible genotypes can arise either with or without linkage. However, the odds of particular genotypes vary considerably in the two cases. For the four possible offspring:

Alleles                   ADB        ADb        aDB        aDb
Odds (linked/unlinked)    0.81/0.4   0.09/0.1   0.09/0.1   0.01/0.4

The ELOD for a single statistically representative child can be calculated from these results by realizing that if there is linkage, the probabilities of seeing the four patterns of alleles are 0.81, 0.09, 0.09, and 0.01, respectively. Thus the ELOD is given by

$$\mathrm{ELOD}(0.2) = 0.81 \log\frac{0.81}{0.4} + 0.09 \log\frac{0.09}{0.1} + 0.09 \log\frac{0.09}{0.1} + 0.01 \log\frac{0.01}{0.4}$$

$$\mathrm{ELOD}(0.2) \approx 0.25 - 0.004 - 0.004 - 0.02 \approx 0.22$$
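As a check on the interval-mapping sum, the following sketch evaluates it without intermediate rounding. The probabilities are exactly those tabulated above; the names `linked`, `unlinked`, and `expected_lod` are illustrative only.

```python
import math

# Probability of each D-carrying haplotype under the two hypotheses.
linked = {"ADB": 0.81, "ADb": 0.09, "aDB": 0.09, "aDb": 0.01}
unlinked = {"ADB": 0.4, "ADb": 0.1, "aDB": 0.1, "aDb": 0.4}


def expected_lod(p_linked: dict, p_unlinked: dict) -> float:
    """Average log10 odds per child, weighted by the probabilities under linkage."""
    return sum(p * math.log10(p / p_unlinked[h]) for h, p in p_linked.items())


interval_elod = expected_lod(linked, unlinked)
print(f"interval ELOD = {interval_elod:.3f}")
print(f"children needed for LOD 3.0: {math.ceil(3.0 / interval_elod)}")
```

Carried to three decimals the sum is about 0.224, which corresponds to 14 informative children once the count is rounded up.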


Figure 6.19 Interval mapping to test the hypothesis that a disease allele D is located equidistant between two linked markers A and B, separated by 20 cM. (a) Map of the test region. (b) Parental chromosomes if D is unlinked to A and B. (c) Possible chromosomes inherited by an offspring carrying the disease allele in the absence of linkage. (d) Parental chromosomes if D lies between A and B. (e) Map location assumed for D for the example calculated in the text. (f) Possible parental contributions to an offspring inheriting the disease allele.


GENETIC

ANALYSIS

Note that this ELOD is larger in the case of interval mapping than in the simple case of linkage analysis we considered earlier. The number of informative individuals that would have to be examined to achieve a LOD score of 3 would be about 3/ELOD(0.2) ≈ 14.

FINDING GENES BY GENETIC MAPPING

What is done, in practice, is to repeat the kinds of calculations previously described with all possible values of θ as a variable, using actual genotype data from real families. For simple linkage analysis the sorts of results obtained are shown schematically in Figure 6.20. These yield the expected LOD score as a function of θ. The critical results are the maximum LOD value and the confidence limits on possible values of θ. With interval mapping, the results are more complex, but the basic kind of information obtained is similar, as shown by the example in Figure 6.21. For details, see Ott (1991) and Lalouel and White (1966).
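A LOD curve of the kind described above can be sketched for the simplest situation. The example assumes phase-known meioses scored only as recombinant or nonrecombinant, and the counts (20 meioses, 2 recombinants) are invented for illustration. The one-LOD-unit support interval used for the limits on θ is a common convention in linkage analysis and is not taken from the text.

```python
import math


def lod(theta: float, n_meioses: int, n_recombinants: int) -> float:
    """log10 of the likelihood at theta divided by the likelihood at theta = 0.5."""
    k, n = n_recombinants, n_meioses
    log_linked = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_unlinked = n * math.log10(0.5)
    return log_linked - log_unlinked


# Hypothetical data set, not from the text.
N_MEIOSES, N_RECOMBINANTS = 20, 2

# Scan theta over a grid from 0.01 to 0.50.
thetas = [i / 100 for i in range(1, 51)]
curve = {t: lod(t, N_MEIOSES, N_RECOMBINANTS) for t in thetas}

best_theta = max(curve, key=curve.get)
max_lod = curve[best_theta]

# Values of theta whose LOD lies within one unit of the maximum.
support = [t for t in thetas if curve[t] >= max_lod - 1.0]

print(f"maximum LOD {max_lod:.2f} at theta = {best_theta:.2f}")
print(f"one-LOD support interval: {support[0]:.2f} to {support[-1]:.2f}")
```

For this simple likelihood the maximum falls at the observed recombinant fraction, here 2/20; real pedigrees with unknown phase or missing genotypes need the fuller treatment given in the references cited above.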

In a typical case, no a priori information exists about the putative location of a gene of interest. To have a reasonable chance of finding it, one must test the hypothesis that it lies near (or between) any of about 150 informative markers. This will subdivide the genome into intervals spaced about 20 cM apart. Each marker must be tested with a sufficient number of informative individuals to achieve a LOD score of 3.0 or higher if that particular marker is linked to the gene. With present technology this search is often carried out one marker and one individual at a time. It is easy to estimate that around 150 markers × 60 individuals (parents and offspring) must be tested in ideal cases where parental phase is known and markers are very informative. If the analysis is carried out by ordinary Southern blotting (Chapter 3) of DNA bands, 6000 to 12,000 gel electrophoresis lanes have to be examined by hybridization to afford a reasonable chance of finding a gene, and this is an ideal case! Schemes have recently been described that can reduce the workload by an order of magnitude through the use of pools of samples (Churchill et al., 1993; see also Chapter 9 for examples of the power of pooling).

If a LOD score of 3.0 or greater is achieved, there is a reasonable chance that the correct location of the disease gene of interest has been found. What is usually done is to celebrate, publish a preliminary report, and fend off overoptimistic members of the press or families segregating the disease of interest who confuse the first sighting of a gene location with the identification of the actual disease gene itself. Knowing the location of a disease gene does provide improved diagnostics for the disease but, initially, only in those families where the phase of the disease allele and nearby markers is known.

Figure 6.20 LOD score for linkage of two genes, with a particular recombination frequency, that would be seen in a typical set of family inheritance data.


Figure 6.21 Example of interval mapping data. Shown is the expected LOD score [log(odds)] as a function of the possible location of a gene within the interval mapped by two known linked genes.

(Adapted from Leppert et al., 1987).

Furthermore, at a 10 cM distance, the amount of recombination between the marker and the disease allele in each meiosis is still 10%, so the accuracy of any genetic testing is quite limited. More accurate approaches are outlined in Box 6.3 and Box 6.4.

BOX 6.3

MULTIPOINT MAPPING

More accurate genetic maps can be constructed by considering all the loci simultaneously rather than just dealing with pairs of loci. In this case what one establishes, primarily, is the order of the loci and the relative odds in favor of that order based on the sum of all the available data. In principle, one can write down all possible genetic maps and calculate the relative likelihood of each being correct in the context of the available data. In practice, it is usually quite tedious to do this. Instead, as shown in Figure 6.22, one usually plots the most likely map, and gives the relative odds that the order of each successive pair of markers is reversed from the true order.
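To see why writing down every possible map quickly becomes tedious, one can simply count the candidate orders. The sketch below enumerates them for four loci and counts them for larger sets, treating an order and its mirror image as the same map; that convention, and the function name `distinct_orders`, are assumptions made for illustration.

```python
from itertools import permutations
from math import factorial


def distinct_orders(loci):
    """All locus orders, keeping one representative of each mirror-image pair."""
    orders = []
    for perm in permutations(loci):
        # A map read left-to-right or right-to-left is the same map, so keep
        # only the lexicographically smaller of each order and its reverse.
        if perm <= perm[::-1]:
            orders.append(perm)
    return orders


four_locus_orders = distinct_orders("ABCD")
print(len(four_locus_orders), "distinct orders for 4 loci")

# The count grows as n!/2, so exhaustive enumeration is soon impractical.
for n in (4, 10, 20):
    print(n, "loci:", factorial(n) // 2, "orders")
```

Each of these orders would still need its likelihood evaluated against the family data, which is the expensive step this sketch leaves out.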

Figure 6.22 Typical map data by multipoint analysis. Shown are the relative odds in favor of two orderings of the markers A, B, C, and D.


The next goals are to strengthen the evidence for linkage and narrow the putative location of the gene. Additional examples of affected individuals can be examined using only the closest known markers. If this increases the LOD score, there is little doubt that the gene location has been correctly identified. Once the interval containing the gene is known, one can look for additional markers in the region of interest. Various methods to find polymorphic markers in selected DNA regions will be described later. These methods are quite powerful so long as the region is actually polymorphic in the population. Note that once the approximate gene location is found, the markers used to refine that location need not be informative in all patients in the sample. What is key is to find particular individuals who demonstrate recombination between the disease gene and nearby markers. Until linkage was established such individuals actually weakened the search because there was no way of knowing a priori that they were recombinants, and thus, as shown in earlier examples, they subtracted from the expected LOD score. Once the gene is known to be nearby, such individuals can be recognized as recombinants and properly scored as shown by the example in Figure 6.23. Just two informative individuals with recombination events defined by their haplotypes (patterns of alleles on a single chromosome) are sufficient to pinpoint the location of the disease gene, barring the unlikely occurrence of a gene conversion or double crossover.
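The way two recombinants bracket a gene can be sketched in a few lines. The example assumes an ordered marker list, a parent of known phase, and affected offspring whose disease-carrying chromosome shows a single crossover. Each recombinant is written as a list of booleans, True where the marker allele comes from the parental disease chromosome. The marker names and the two patterns are invented for illustration and are not read from Figure 6.23.

```python
from typing import Optional, Sequence, Tuple


def gene_interval(
    markers: Sequence[str],
    recombinants: Sequence[Sequence[bool]],
) -> Tuple[Optional[str], Optional[str]]:
    """Return the flanking markers between which the disease gene must lie.

    The gene sits inside the stretch of each recombinant chromosome that still
    carries disease-chromosome alleles, so every recombinant contributes a left
    and a right boundary; None means that side is not yet bounded.
    """
    left, right = -1, len(markers)
    for pattern in recombinants:
        kept = [i for i, from_disease in enumerate(pattern) if from_disease]
        if not kept:
            raise ValueError("recombinant retains no disease-chromosome markers")
        left = max(left, kept[0] - 1)     # nearest exchanged marker on the left
        right = min(right, kept[-1] + 1)  # nearest exchanged marker on the right
    if left >= right:
        raise ValueError("recombinants are inconsistent with one gene location")
    left_name = markers[left] if left >= 0 else None
    right_name = markers[right] if right < len(markers) else None
    return left_name, right_name


markers = ["a", "b", "c", "d"]
recombinants = [
    [False, False, True, True],   # crossover between b and c; gene lies right of b
    [True, True, False, False],   # crossover between b and c; gene lies left of c
]
print(gene_interval(markers, recombinants))
```

A double crossover or gene conversion would violate the single-crossover assumption, which is why the text treats those events as the exceptions to this reasoning.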

MOVING FROM WEAK LINKAGE CLOSER TO A GENE
Failure to find a linked marker in an initial test does not mean that no marker is linked to the gene. A disease gene must lie somewhere in the genome. A possibility is that the model for inheritance used in the linkage study was wrong. One must consider dominant and recessive inheritance as well as more complex cases where multiple alleles or even multiple genes are involved. It is very tempting in cases where the maximum LOD score obtained is less than 3.0 to review individual families contributing to the LOD score and ask if the score can be improved by dropping some of the families. This implicitly challenges the diagnosis in these families or presumes that the disease is heterogeneous, that is, influenced by other factors in addition to the particular gene in question.

Figure 6.23 Examples of two recombinant genotypes seen from a parent with known phase. Once the disease allele D is known to lie in this region, the genotype of the two recombinants restricts the possible location of the disease gene to between markers b and c.

This is a very dangerous practice, since if one starts with a sufficient number of families, it will almost always be possible to achieve an alluring LOD score by selectively choosing among them. Clearly the appropriate statistical tests must be employed to discount the resulting LOD score against such selective manipulation of the data. The real issue is not whether one can increase a LOD score by dropping a family with a negative contribution. The issue is whether the magnitude of the increase in LOD is sufficient to justify the additional parameterization implicit in dropping this family. A much safer procedure is to collect more families and try additional markers near the ones that have already shown a hint of linkage, if not yet compelling evidence for linkage. When this has been done, some LODs of 2.0 eventually have produced the desired gene; others have faded into oblivion.

Eventually genetic linkage studies may narrow down the location of a gene to a 2 cM region. However, in such an interval of the genome, there may be a single gene or more than 80. It is very difficult to use conventional linkage analysis to narrow the location further. The available families are likely to have only a limited number of recombination events in the region of interest because they represent just a few generations, which means any recombinations seen must have occurred recently. A 2 cM localization means that already 50 informative meioses have been found. It is usually not efficient to keep gathering more families at this point, although it is efficient to keep trying to find additional informative markers, since these can narrow down the location of any recombination events.
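The link between a 2 cM localization and 50 informative meioses can be made explicit. The sketch below assumes that a short genetic distance converts directly to a recombination fraction (2 cM ≈ 0.02) and that meioses are independent; both function names are illustrative.

```python
def expected_recombinants(n_meioses: int, theta: float) -> float:
    """Average number of recombinants in an interval across n_meioses meioses."""
    return n_meioses * theta


def prob_at_least_one(n_meioses: int, theta: float) -> float:
    """Chance that at least one of n_meioses independent meioses recombines."""
    return 1.0 - (1.0 - theta) ** n_meioses


THETA_2CM = 0.02  # 2 cM treated as a 2% recombination fraction (short-distance approximation)

print(expected_recombinants(50, THETA_2CM))
print(round(prob_at_least_one(50, THETA_2CM), 2))
```

On these assumptions 50 meioses yield one recombinant on average within a 2 cM interval, which is why further narrowing by collecting families alone becomes so costly.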
LINKAGE DISEQUILIBRIUM
In fortunate cases, a variant on linkage analysis can be used to home in on the likely location of a disease gene once it has been assigned to a mapped region of a chromosome. Suppose that most affected individuals have the same disease allele D. This is the case, for example, with the individuals with Huntington's disease who live near Lake Maracaibo in Venezuela; it is also the case with individuals affected with sickle cell disease, and with most individuals of northern European descent afflicted with severe cystic fibrosis. In such cases it is possible that the disease is the result of a founder effect: all affected individuals have inherited the same disease allele-carrying chromosome from a common progenitor. (When no evidence for a single disease allele exists, but phenotypic variation in the disease is evident, one can try to subtype the disease by severity, age of onset, or particular symptoms, and test the presumption that, for this subtype, a founder effect may exist.)

If a disease allele arose once by mutation on a single chromosome, it will be created in the context of a particular haplotype (Fig. 6.24). The chromosome that first carries the disease will have a particular set of polymorphic markers. It will have a particular genetic background. As this chromosome is passed through many generations of offspring, it

Figure 6.24 Generation of a disease allele by a mutation on a founder haplotype sets the stage for linkage disequilibrium.


will suffer frequent meiotic recombination events. These will tend to blur the memory of the original haplotype of the chromosome; they will average out the original genetic background with the general distribution of markers in the human population. However, those markers very close to the disease gene will tend, more likely than average, to retain the haplotype of the original chromosome because, as the distance to the disease gene shrinks, it becomes less likely that recombination events will have occurred in this particular location.

Humans are an outbred population. Most alleles were established when the species was established, and a sufficient number of generations has passed since then that frequent recombination events have occurred between any pair of neighboring loci resolved on our genetic maps. For this reason the distribution of particular haplotypes in neighboring loci in the population (as opposed to particular families) should be close to random.

Consider the case shown in Figure 6.25, for two neighboring loci with two alleles each. Within the population, the frequencies X of the alleles at a particular locus must sum to 1.0:

$$X_a + X_A = 1.0$$

$$X_b + X_B = 1.0$$

The frequencies of particular haplotypes, f, should be given by simple binomial statistics:

$$f_{AB} = X_A X_B \qquad f_{Ab} = X_A X_b \qquad f_{aB} = X_a X_B \qquad f_{ab} = X_a X_b$$

Deviations from these results, measured, for example, by comparing $f_{AB}$ (observed) with $f_{AB}$ (calculated), are evidence for linkage disequilibrium, and they indicate that the individuals examined are not a random sample of the population. Note, however, that deviation of haplotype frequencies from those expected by binomial statistics may have other causes besides genetic linkage. Deviations can reflect improper sampling of the population, or they can reflect actual functional association between specific alleles. The latter process could occur, for example, if the protein products of the two genes in question actually interacted biochemically. (For further discussion see Ott, 1991.)
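A comparison of observed and calculated haplotype frequencies can be sketched as follows. The haplotype counts are hypothetical, and the difference between the observed frequency of AB and the product of the allele frequencies is used here as the measure of deviation. That difference is the usual disequilibrium coefficient, although the text itself does not fix a particular measure.

```python
from typing import Dict, Tuple


def disequilibrium(counts: Dict[str, int]) -> Tuple[float, float, float]:
    """Return (observed f_AB, calculated f_AB, their difference).

    counts maps the four haplotypes 'AB', 'Ab', 'aB', 'ab' to the number of
    chromosomes observed with each.
    """
    total = sum(counts.values())
    freq = {hap: n / total for hap, n in counts.items()}
    x_A = freq["AB"] + freq["Ab"]  # frequency of allele A
    x_B = freq["AB"] + freq["aB"]  # frequency of allele B
    calculated = x_A * x_B         # expectation if the two loci are independent
    return freq["AB"], calculated, freq["AB"] - calculated


# Hypothetical sample of 100 chromosomes with an excess of AB haplotypes.
sample = {"AB": 60, "Ab": 10, "aB": 10, "ab": 20}
observed, calculated, deviation = disequilibrium(sample)
print(f"observed {observed:.2f}, calculated {calculated:.2f}, deviation {deviation:.2f}")

# A sample at equilibrium with the same allele frequencies shows no deviation.
equilibrium = {"AB": 49, "Ab": 21, "aB": 21, "ab": 9}
print(f"deviation at equilibrium {disequilibrium(equilibrium)[2]:.2f}")
```

In practice the deviation would also need a significance test against sampling error, which this sketch omits.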

To search for a gene by linkage disequilibrium, one does not examine families segregating a disease allele D. Instead, one looks across a broad spectrum of the population for unrelated individuals who have the disease allele D. If evidence for linkage disequilibrium is found, it reflects recombinations along the chromosome all the way back in time to the original founder. Since this may extend back hundreds of years, more than ten generations may be involved, and thus the number of recombination events viewed will be much greater than possible with any contemporary family. In the case of linkage disequilibrium, we expect to see the general results shown in Figure 6.26. There will be a gradient of increasing deviation from equilibrium as the neighborhood of the disease gene is reached because of the diminishing likelihood of recombination events occurring in an ever-shrinking region.
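The gradient described above can be illustrated with a standard population-genetic result that the text does not state explicitly: under random mating, the disequilibrium between two loci shrinks by a factor of (1 − θ) each generation. The sketch assumes that model, converts short distances to θ as cM/100, and uses an arbitrary figure of 20 generations since the founder.

```python
def fraction_retained(theta: float, generations: int) -> float:
    """Fraction of the founder's disequilibrium left after the given generations.

    Assumes random mating and a constant recombination fraction theta between
    the marker and the disease gene.
    """
    return (1.0 - theta) ** generations


GENERATIONS = 20  # hypothetical time since the founder mutation

for distance_cm in (0.1, 1.0, 10.0):
    theta = distance_cm / 100.0  # short-distance approximation
    kept = fraction_retained(theta, GENERATIONS)
    print(f"{distance_cm:5.1f} cM from the gene: {kept:.2f} of the original disequilibrium")
```

Markers a fraction of a centimorgan away keep nearly all of the founder association, while markers 10 cM away lose most of it, which is the qualitative gradient of Figure 6.26.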

Figure 6.25 Possible haplotypes in a two-allele system used to examine whether loci are at equilibrium.


Figure 6.26 Gradient of linkage disequilibrium seen near a disease allele in a case where a founder effect occurred.

COMPLICATIONS IN LINKAGE DISEQUILIBRIUM AND GENETIC MAPS IN GENERAL

The human genome is a potential minefield of uncharted genetic events, hidden rearrangements, new mutations, and genetic heterogeneity. Failure to see linkage disequilibrium near a gene does not mean that the gene is far away. Two of the most plausible potential complications are the existence of more than one founder or the existence of a significant fraction of alleles in the population that have arisen by new mutations. For example, in the case of dominant lethal diseases (those in which, nominally, the affected individuals have no offspring), one must expect that most disease alleles will be new mutations. Multiple founders can occur in distinct geographical populations, and they can be tested for by subdividing the linkage disequilibrium analysis accordingly. However, our increasingly mobile population, at least in developed countries, will make such analyses increasingly difficult.

Two other reasonable explanations for a failure to see linkage disequilibrium near a disease gene of interest are shown in Figure 6.27. The first of these is the possible presence of recombination hot spots. If the recombination pattern in the region of interest is punctate, then an even gradient of linkage disequilibrium will not be seen. Instead, markers that lie within a pair of hot spots will appear to be in disequilibrium, while those that lie on opposite sides of a hot spot will appear to have equilibrated. The occurrence of any disequilibrium in the region is presumptive evidence that a disease gene is there, since this is the basis for selection of the particular set of individuals to be examined. However, the complex pattern of allele statistics in the region will make it difficult to narrow in on the location of the disease gene.

Figure 6.27 Complications that can obscure evidence for linkage disequilibrium. (a) Recombination hot spots near the disease gene. (b) Mutation hot spots near the disease gene.


A second potential source of confusion is the presence of mutation hot spots. These are quite common in the human genome. For example, the sequence CpG is quite mutagenic in those regions of the genome where the C is methylated, as discussed in Chapter 1. When mutation hot spots are present, alleles at these sites appear to have equilibrated with their neighbors, while more distant pairs of alleles may still show deviations from equilibrium. As in the case of recombination hot spots, disequilibrium indicates that one has not sampled the population randomly. This is presumptive evidence for a disease gene nearby, but mutation hot spots weaken the power of the disequilibrium approach to actually focus in on the location of the desired gene.
DISTORTIONS IN THE GENETIC MAP

We have already discussed briefly the occurrence of recombination hot spots and their deleterious effect on attempts to find genes by linkage disequilibrium. Some hot spots are inherited; in the mouse Major Histocompatibility Complex (MHC), a set of genes that regulates immune response, a hot spot allele has been found that raises the local frequency of recombination by a hundredfold. All of the recombination events caused by this hot spot have been mapped within the second intron of the Eb gene, 4.3 kb in size. While we are not sure what has caused this hot spot, the region has been sequenced, and one peculiarity is the occurrence of four sequences with 9/11 bases equal to a consensus sequence TGGAAATCCCC. Such sequences have also been found in regions associated with other recombination hot spots.

The genetic map of humans and other organisms is not uniform. Recombination is generally higher near the telomeres and lower near centromeres. The map is strikingly different in males and females; that is, meiosis in males and females appears to display a very different pattern of recombination hot spots. A typical example is shown for a selected region of human chromosome 1 in Figure 6.28. Note that some regions that have short genetic distances in the female have long distances in the male, and vice versa. Genetic linkage analysis is more powerful in regions where recombination is prevalent because, the more recombinants per Mb, the more finely the genetic data will serve to subdivide the region. In general, genetic maps based on female meioses are considerably longer than those based on male meioses. This is summarized in Table 6.1. A frequent practice is to pool data and show a sex-averaged genetic map. It is not very clear that this is a reasonable thing to do. Instead, it would seem that once a region of interest has been selected, meioses should be chosen from either the female or the male depending on which set produces the most expanded and informative map of the region. At present it does not appear that most workers pay much attention to this.
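Returning to the hot spot consensus mentioned above, a search for sites matching at least 9 of the 11 bases of TGGAAATCCCC can be sketched as a sliding-window comparison. The demonstration sequence is invented, only the given strand is scanned, and the function name `near_matches` is hypothetical.

```python
from typing import List, Tuple

CONSENSUS = "TGGAAATCCCC"


def near_matches(
    sequence: str,
    consensus: str = CONSENSUS,
    min_matches: int = 9,
) -> List[Tuple[int, str, int]]:
    """Return (start, window, score) for every window matching the consensus
    at min_matches or more positions; start is a 0-based offset."""
    sequence = sequence.upper()
    width = len(consensus)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(a == b for a, b in zip(window, consensus))
        if score >= min_matches:
            hits.append((start, window, score))
    return hits


# Made-up sequence containing one exact copy and one 9-of-11 copy of the motif.
demo = "ACGT" + "TGGAAATCCCC" + "GGTT" + "TGGTAATCACC" + "AA"
hits = near_matches(demo)
for start, window, score in hits:
    print(start, window, f"{score}/11")
```

Scanning the reverse complement as well would be a natural extension if sites on either strand are of interest.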

Figure 6.28 Comparison of low-resolution genetic maps in female and male meiosis. Shown is a portion of the map of human chromosome 1.
