
 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Thus the average contribution from one child to the ELOD is the sum of these two cases weighted by their expected frequency. Since recombination across 10 cM occurs only 10% of the time,

$$\mathrm{ELOD}(0.1) = 0.9 \log\frac{0.9}{0.5} + 0.1 \log\frac{0.1}{0.5}$$

$$\mathrm{ELOD}(0.1) \approx 0.23 - 0.07 = 0.16$$

Thus observation of cosegregation of A and D adds to the probability of linkage, while observation of separation of A and D subtracts from the evidence for linkage.
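The weighted sum can be checked numerically. The following is a minimal sketch, assuming base-10 logarithms (which reproduces the values quoted in the text); the function name `elod_two_point` is illustrative only.

```python
import math

def elod_two_point(theta: float) -> float:
    """Expected LOD per informative child when the true recombination fraction is theta.

    Each outcome's probability under linkage is compared with 0.5, the
    probability of either outcome when marker and disease locus are unlinked.
    Valid for 0 < theta < 1.
    """
    nonrecombinant = (1.0 - theta) * math.log10((1.0 - theta) / 0.5)
    recombinant = theta * math.log10(theta / 0.5)
    return nonrecombinant + recombinant

elod = elod_two_point(0.1)
print(f"{elod:.2f}")  # prints 0.16
```

At `theta = 0.5` the two terms cancel and the expected LOD is zero, as it should be for an unlinked marker.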

 

 

 

 

 

 

 

 

 

What we need to do is develop the tools to assess the statistical significance of a particular ELOD score. Since some markers will appear to cosegregate by chance in any study with a relatively small number of affected individuals, there is always a chance of seeing a significantly positive LOD score simply because of random fluctuations. A near consensus in human genetics is that an observed LOD of 3.0 or higher is required before the probability of purely accidental linkage can be reduced to the point where few errors are made. For the example just described, the number of individuals segregating D with unambiguous pedigrees that would have to be combined to generate a LOD score of 3.0 can be estimated as 3/0.16 ≈ 18.8, that is, about 19. For common inherited diseases this is not a problem, but for very rare diseases it may be extremely difficult to find that many genetically informative individuals for a particular marker with an unambiguous diagnosis.
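The sample-size estimate above is a single division. As a small sketch, the helper below (a hypothetical name, not from the text) divides the target LOD by the expected LOD per child and rounds up to a whole individual; rounding up rather than to the nearest integer is an assumption made here so that the expected total actually reaches the threshold.

```python
import math

def informative_children_needed(elod_per_child: float, target_lod: float = 3.0) -> int:
    """Smallest whole number of children whose summed expected LOD reaches target_lod."""
    if elod_per_child <= 0:
        raise ValueError("expected LOD per child must be positive")
    return math.ceil(target_lod / elod_per_child)

print(informative_children_needed(0.16))  # prints 19
```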

 

 

 

 

 

 

 

 

 

 

 

 

 

Note that several constraints apply to the linkage analysis described above. One must have access to a parent with known phase between A and D. The marker A to be tested for linkage must have useful heterozygosity. The diagnosis of D must be unambiguous in all the individuals tested. Note that failing to diagnose an individual who is carrying D (a false negative) does not hurt the analysis, since in this case the individual and the parent are not scored. However, misclassifying an individual as carrying D instead of d (a false positive) causes serious problems because it will weaken the evidence about which alleles at other loci are cosegregating with D.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

INTERVAL MAPPING

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Once a genetic map is available for a region of interest, the process of linkage analysis can be made more powerful by examining several markers simultaneously. We will consider the simplest possible case, illustrated in Figure 6.19a. As in the previous discussion of simple linkage analysis, we will calculate the average contribution to the LOD score from a single, informative individual inheriting a disease allele D. We wish to test a region of the genome containing two linked loci with markers A and B to see if the disease allele D lies between them or is unlinked. (Here we ignore the case that it might be linked to A and B but lie outside them rather than between them.)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Suppose that the loci containing A and B are 20 cM apart. This is a reasonable model for how human genetic maps are used in average regions of the genome. Thus $\theta_{AB} = 0.2$. First we calculate the possible contributions from a parent carrying D to a child, also carrying D, if there is no linkage between A and B with D (Fig. 6.19b). Since A and B are on the same chromosome, D, if unlinked, must lie on a different chromosome. Assuming that the parent is heterozygous and informative at all these loci, there are four possible contributions from the parent to the child (Fig. 6.19c).

If no recombination between A and B occurs (80% probability for markers 20 cM apart), the child will inherit either ABD (0.4 odds) or abD (0.4 odds). If recombination between A and B occurs (20% probability), the child will inherit either AbD (0.1 odds) or aBD (0.1 odds).

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

If D is linked and located between A and B, assuming the phase of the parent is known, the two homologous chromosomes of the parent carry alleles ADB and adb, as shown in Figure 6.19d. In principle, D may lie anywhere between A and B, and the actual position of D is a variable that must be included in the calculations. Here we will consider the simple case where D lies midway between A and B. Assuming that the recombination frequency is uniform in this region of the chromosome, we can then place D 10 cM from A and 10 cM from B (Fig. 6.19e). There are four possible sets of alleles that can be passed from this parent to a child who inherits D (Fig. 6.19f). These are as follows:

ADB: resulting from no recombination between A and D, and no recombination between D and B (odds are 0.9 × 0.9).

ADb: resulting from no recombination between A and D, but recombination between D and B (odds are 0.9 × 0.1).

aDB: resulting from recombination between A and D, but no recombination between D and B (odds are 0.1 × 0.9).

aDb (a double crossover event): resulting from recombination both between A and D and between D and B (odds are 0.1 × 0.1).

 

 

 

 

 

 

 

 

Thus the same four possible genotypes can arise either with or without linkage. However, the odds of particular genotypes vary considerably in the two cases. For the four possible offspring:

Alleles                   ADB        ADb        aDB        aDb
Odds (linked/unlinked)    0.81/0.4   0.09/0.1   0.09/0.1   0.01/0.4
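The table can be regenerated for any assumed position of D between the markers. The sketch below is illustrative rather than definitive: the function name is hypothetical, crossovers in the two sub-intervals are treated as independent, and the A to B recombination fraction is taken as the simple sum of the two sub-interval fractions, which is the small-distance approximation the text uses (0.1 + 0.1 = 0.2).

```python
def haplotype_probabilities(theta_ad: float, theta_db: float) -> dict:
    """Return {haplotype: (P if D lies between A and B, P if D is unlinked)}.

    Assumes a phase-known parent with chromosomes ADB and adb and a child
    who inherits D.
    """
    theta_ab = theta_ad + theta_db  # additive approximation for short distances
    linked = {
        "ADB": (1 - theta_ad) * (1 - theta_db),  # no crossover in either interval
        "ADb": (1 - theta_ad) * theta_db,        # crossover between D and B only
        "aDB": theta_ad * (1 - theta_db),        # crossover between A and D only
        "aDb": theta_ad * theta_db,              # double crossover
    }
    # If D is unlinked, only the A-B marker haplotype matters.
    unlinked = {
        "ADB": (1 - theta_ab) / 2,
        "ADb": theta_ab / 2,
        "aDB": theta_ab / 2,
        "aDb": (1 - theta_ab) / 2,
    }
    return {h: (linked[h], unlinked[h]) for h in linked}

table = haplotype_probabilities(0.1, 0.1)
for haplotype, (p_linked, p_unlinked) in table.items():
    print(f"{haplotype}: {p_linked:.2f}/{p_unlinked:.1f}")
```

Calling the function with unequal arguments, for example `haplotype_probabilities(0.05, 0.15)`, gives the corresponding odds when D is assumed to lie closer to A than to B.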

 

The ELOD for a single statistically representative child can be calculated from these results by realizing that if there is linkage, the probabilities of seeing the four patterns of alleles are 0.81, 0.09, 0.09, and 0.01, respectively. Thus the ELOD is given by

$$\mathrm{ELOD}(0.2) = 0.81 \log\frac{0.81}{0.4} + 0.09 \log\frac{0.09}{0.1} + 0.09 \log\frac{0.09}{0.1} + 0.01 \log\frac{0.01}{0.4}$$

$$\mathrm{ELOD}(0.2) \approx 0.25 - 0.004 - 0.004 - 0.02 \approx 0.22$$

 

 

 

Figure 6.19 Interval mapping to test the hypothesis that a disease allele D is located equidistant between two linked markers A and B, separated by 20 cM. (a) Map of the test region. (b) Parental chromosomes if D is unlinked to A and B. (c) Possible chromosomes inherited by an offspring carrying the disease allele in the absence of linkage. (d) Parental chromosomes if D lies between A and B. (e) Map location assumed for D for the example calculated in the text. (f) Possible parental contributions to an offspring inheriting the disease allele.

 

 

 

GENETIC ANALYSIS

 

 

 

 

 

 

 

 

Note that this ELOD is larger in the case of interval mapping than in the simple case of linkage analysis we considered earlier. The number of informative individuals that would have to be examined to achieve a LOD score of 3 would be about 14 (3 divided by this ELOD, rounded up).
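The comparison between interval mapping and single-marker analysis can be reproduced directly from the tabulated odds. This is a sketch under the text's own assumptions (D midway between A and B, base-10 logarithms); the generic helper `elod` is a name introduced here for illustration.

```python
import math

def elod(linked, unlinked):
    """Expected LOD: sum over outcomes of P_linked * log10(P_linked / P_unlinked)."""
    return sum(p * math.log10(p / q) for p, q in zip(linked, unlinked))

# Outcome order: ADB, ADb, aDB, aDb (probabilities from the table in the text).
interval = elod([0.81, 0.09, 0.09, 0.01], [0.4, 0.1, 0.1, 0.4])
# Single marker 10 cM from D: nonrecombinant, recombinant.
two_point = elod([0.9, 0.1], [0.5, 0.5])

print(f"interval ELOD:  {interval:.3f}")
print(f"two-point ELOD: {two_point:.3f}")
print(f"children for LOD 3: {math.ceil(3 / interval)} vs {math.ceil(3 / two_point)}")
```

The double-crossover class aDb is rare under linkage but contributes a strongly negative term when it is seen, which is why the gain from the second marker is real but modest.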

FINDING GENES BY GENETIC MAPPING

 

 

 

 

 

 

 

What is done, in practice, is to repeat the kinds of calculations previously described with all possible values of $\theta$ as a variable using actual genotype data from real families. For simple linkage analysis the sorts of results obtained are shown schematically in Figure 6.20. These yield the expected LOD score as a function of $\theta$. The critical results are the maximum LOD value and the confidence limits on possible values of $\theta$. With interval mapping, the results are more complex, but the basic kind of information obtained is similar, as shown by the example in Figure 6.21. For details, see Ott (1991) and Lalouel and White (1966).
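A LOD curve of the kind sketched in Figure 6.20 can be generated by scanning trial values of the recombination fraction. The example below is a minimal sketch for phase-known meioses only; the counts (2 recombinants in 20 meioses) are hypothetical, and the one-LOD-unit support interval is used as a stand-in for the confidence limits the text mentions, which is a common convention but an assumption here.

```python
import math

def lod(theta: float, recombinants: int, nonrecombinants: int) -> float:
    """LOD score for phase-known meioses at a trial recombination fraction theta."""
    n = recombinants + nonrecombinants
    log_linked = 0.0
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    if nonrecombinants:
        log_linked += nonrecombinants * math.log10(1.0 - theta)
    # Under no linkage every meiosis has probability 0.5 of either outcome.
    return log_linked - n * math.log10(0.5)

# Hypothetical data: 2 recombinants among 20 informative, phase-known meioses.
R, NR = 2, 18
grid = [i / 1000 for i in range(1, 500)]  # theta from 0.001 to 0.499
scores = [lod(t, R, NR) for t in grid]

max_lod = max(scores)
theta_hat = grid[scores.index(max_lod)]
# One-LOD-unit support interval: thetas whose LOD is within 1 of the maximum.
support = [t for t, s in zip(grid, scores) if s >= max_lod - 1.0]

print(f"max LOD {max_lod:.2f} at theta = {theta_hat:.3f}")
print(f"support interval: {support[0]:.3f} to {support[-1]:.3f}")
```

Real pedigree data rarely reduce to simple recombinant counts (phase is often unknown and families differ in structure), so dedicated linkage software replaces the closed-form likelihood used here.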

 

 

 

 

 

 

 

 

 

 

 

 

In a typical case, no a priori information exists about the putative location of a gene of interest. To have a reasonable chance of finding it, one must test the hypothesis that it lies near (or between) any of about 150 informative markers. This will subdivide the genome into intervals spaced about 20 cM apart. Each marker must be tested with a sufficient number of informative individuals to achieve a LOD score of 3.0 or higher if that particular marker is linked to the gene. With present technology this search is often carried out one marker and one individual at a time. It is easy to estimate that around 150 markers × 40 to 60 individuals (parents and offspring) must be tested in ideal cases where parental phase is known and markers are very informative. If the analysis is carried out by ordinary Southern blotting (Chapter 3) of DNA bands, 6000 to 12,000 gel electrophoresis lanes have to be examined by hybridization to afford a reasonable chance of finding a gene, and this is an ideal case! Schemes have recently been described that can reduce the workload by an order of magnitude through the use of pools of samples (Churchill et al., 1993; see also Chapter 9 for examples of the power of pooling).

 

 

 

If a LOD score of 3.0 or greater is achieved, there is a reasonable chance that the correct location of the disease gene of interest has been found. What is usually done is to celebrate, publish a preliminary report, and fend off overoptimistic members of the press or families segregating the disease of interest who confuse the first sighting of a gene location with the identification of the actual disease gene itself. Knowing the location of a disease gene does provide improved diagnostics for the disease but, initially, only in those families where the phase of the disease allele and nearby markers is known. Furthermore, at a 10 cM distance, the amount of recombination between the marker and the disease allele in each meiosis is still 10%, so the accuracy of any genetic testing is quite limited. More accurate approaches are outlined in Box 6.3 and Box 6.4.

Figure 6.20 LOD score for linkage of two genes, with a particular recombination frequency, that would be seen in a typical set of family inheritance data.

Figure 6.21 Example of interval mapping data. Shown is the expected LOD score [log(odds)] as a function of the possible location of a gene within the interval mapped by two known linked genes. (Adapted from Leppert et al., 1987).

BOX 6.3

MULTIPOINT MAPPING

More accurate genetic maps can be constructed by considering all the loci simultaneously rather than just dealing with pairs of loci. In this case what one establishes, primarily, is the order of the loci and the relative odds in favor of that order based on the sum of all the available data. In principle, one can write down all possible genetic maps and calculate the relative likelihood of each being correct in the context of the available data. In practice, it is usually quite tedious to do this. Instead, as shown in Figure 6.22, one usually plots the most likely map, and gives the relative odds that the order of each successive pair of markers is reversed from the true order.

Figure 6.22 Typical map data by multipoint analysis. Shown are the relative odds in favor of two orderings of the markers A, B, C, and D.


 

 

 

 

The next goals are to strengthen the evidence for linkage and narrow the putative location of the gene. Additional examples of affected individuals can be examined using only the closest known markers. If this increases the LOD score, there is little doubt that the gene location has been correctly identified. Once the interval containing the gene is known, one can look for additional markers in the region of interest. Various methods to find polymorphic markers in selected DNA regions will be described later. These methods are quite powerful so long as the region is actually polymorphic in the population. Note that once the approximate gene location is found, the markers used to refine that location need not be informative in all patients in the sample. What is key is to find particular individuals who demonstrate recombination between the disease gene and nearby markers. Until linkage was established such individuals actually weakened the search because there was no way of knowing a priori that they were recombinants, and thus, as shown in earlier examples, they subtracted from the expected LOD score. Once the gene is known to be nearby, such individuals can be recognized as recombinants and properly scored as shown by the example in Figure 6.23. Just two informative individuals with recombination events defined by their haplotypes (patterns of alleles on a single chromosome) are sufficient to pinpoint the location of the disease gene, barring the unlikely occurrence of a gene conversion or double crossover.
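The logic of pinning down a gene with two recombinants can be written out as an interval intersection. The sketch below is illustrative only: the marker names, the boolean encoding of each child's haplotype, and both function names are assumptions made for the example, and each child is assumed to carry a single crossover (no gene conversion or double crossover).

```python
def candidate_span(shares_disease_allele):
    """Bound D using one affected child who carries a recombinant chromosome.

    shares_disease_allele[i] is True when the child's allele at marker i came
    from the parental chromosome known to carry D. Because the child inherited
    D, the disease locus must sit inside the shared segment, that is, between
    the nearest non-shared markers on either side. Returns marker indices
    (left, right); -1 and len(markers) stand for the chromosome ends.
    """
    shared = [i for i, flag in enumerate(shares_disease_allele) if flag]
    if not shared:
        raise ValueError("child shares no marker allele with the disease chromosome")
    return shared[0] - 1, shared[-1] + 1


def narrow(spans):
    """Intersect the spans implied by several recombinant children."""
    left = max(lo for lo, _ in spans)
    right = min(hi for _, hi in spans)
    if left >= right:
        raise ValueError("recombinants are mutually inconsistent")
    return left, right


markers = ["a", "b", "c", "d"]        # hypothetical marker order along the chromosome
child_1 = [True, True, False, False]  # disease-chromosome alleles at a and b: D lies left of c
child_2 = [False, False, True, True]  # disease-chromosome alleles at c and d: D lies right of b

left, right = narrow([candidate_span(child_1), candidate_span(child_2)])

def label(i):
    return markers[i] if 0 <= i < len(markers) else "chromosome end"

print(f"D lies between {label(left)} and {label(right)}")  # prints: D lies between b and c
```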

MOVING FROM WEAK LINKAGE CLOSER TO A GENE

 

 

 

 

 

Failure to find a linked marker in an initial test does not mean that no marker is linked to the gene. A disease gene must lie somewhere in the genome. One possibility is that the model for inheritance used in the linkage study was wrong. One must consider dominant and recessive inheritance as well as more complex cases where multiple alleles or even multiple genes are involved. It is very tempting in cases where the maximum LOD score obtained is less than 3.0 to review individual families contributing to the LOD score and ask if the score can be improved by dropping some of the families. This implicitly challenges the diagnosis in these families or presumes that the disease is heterogeneous, that is, that it is influenced by other factors in addition to the particular gene in question.

Figure 6.23 Examples of two recombinant genotypes seen from a parent with known phase. Once the disease allele D is known to lie in this region, the genotype of the two recombinants restricts the possible location of the disease gene to between markers b and c.

 

 

 

 

 


This is a very dangerous practice, since if one starts with a sufficient number of families, it will almost always be possible to achieve an alluring LOD score by selectively choosing among them. Clearly the appropriate statistical tests must be employed to discount the resulting LOD score against such selective manipulation of the data. The real issue is not whether one can increase a LOD score by dropping a family with a negative contribution. The issue is whether the magnitude of the increase in LOD is sufficient to justify the additional parameterization implicit in dropping this family. A much safer procedure is to collect more families and try additional markers near the ones that have already shown a hint of linkage if not yet compelling evidence for linkage. When this has been done, some LODs of 2.0 eventually have produced the desired gene; others have faded into oblivion.

Eventually genetic linkage studies may narrow down the location of a gene to a 2 cM region. However, in such an interval of the genome, there may be a single gene or more than 80. It is very difficult to use conventional linkage analysis to narrow the location further. The available families are likely to have only a limited number of recombination events in the region of interest because they represent just a few generations, which means any recombinations seen must have occurred recently. A 2 cM localization means that already 50 informative meioses have been found. It is usually not efficient to keep gathering more families at this point, although it is efficient to keep trying to find additional informative markers, since these can narrow down the location of any recombination events.

 

 

 

 

LINKAGE DISEQUILIBRIUM

 

 

 

 

 

In fortunate cases, a variant on linkage analysis can be used to home in on the likely location of a disease gene once it has been assigned to a mapped region of a chromosome. Suppose that most affected individuals have the same disease allele D. This is the case, for example, with the Huntington’s disease individuals who live near Lake Maracaibo in Venezuela; it is also the case with individuals affected with sickle cell disease, and with most individuals of northern European descent afflicted with severe cystic fibrosis. In such cases it is possible that the disease is the result of a founder effect: all affected individuals have inherited the same disease allele-carrying chromosome from a common progenitor.

(When no evidence for a single disease allele exists, but phenotypic variation in the disease is evident, one can try to subtype the disease by severity, age of onset, particular symptoms, and test the presumption that, for this subtype, a founder effect may exist.)

If a disease allele arose once by mutation on a single chromosome, it will be created in the context of a particular haplotype (Fig. 6.24). The chromosome that first carries the disease will have a particular set of polymorphic markers. It will have a particular genetic background. As this chromosome is passed through many generations of offspring, it will suffer frequent meiotic recombination events. These will tend to blur the memory of the original haplotype of the chromosome; they will average out the original genetic background with the general distribution of markers in the human population. However, those markers very close to the disease gene will tend, more likely than average, to retain the haplotype of the original chromosome because, as the distance to the disease gene shrinks, it becomes less likely that recombination events will have occurred in this particular location.

Figure 6.24 Generation of a disease allele by a mutation on a founder haplotype sets the stage for linkage disequilibrium.

 

 

 

 

 

 

 

 

 

 

 

Humans are an outbred population. Most alleles were established when the species was established, and a sufficient number of generations has passed since then that frequent recombination events have occurred between any pair of neighboring loci resolved on our genetic maps. For this reason the distribution of particular haplotypes in neighboring loci in the population (as opposed to particular families) should be close to random.

 

 

 

 

Consider the case shown in Figure 6.25, for two neighboring loci with two alleles each. Within the population, the frequencies $X$ of the alleles at a particular locus must sum to 1.0.

$$X_a + X_A = 1.0 \qquad X_b + X_B = 1.0$$

The frequencies of particular haplotypes, $f$, should be given by simple binomial statistics:

$$f_{AB} = X_A X_B \qquad f_{Ab} = X_A X_b \qquad f_{aB} = X_a X_B \qquad f_{ab} = X_a X_b$$

Deviations from these results, measured, for example, as

$$f_{AB}^{\text{observed}} - f_{AB}^{\text{calculated}},$$

are evidence for linkage disequilibrium, and they indicate that the individuals examined are not a random sample of the population. Note, however, that deviation of haplotype frequencies from those expected by binomial statistics may have other causes besides genetic linkage. Deviations can reflect improper sampling of the population, or they can reflect actual functional association between specific alleles. The latter process could occur, for example, if the protein products of the two genes in question actually interacted biochemically. (For further discussion see Ott, 1991.)
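The deviation measure just described is easy to compute from a list of observed haplotypes. The following is a minimal sketch: the function name and the sample of 100 chromosomes are hypothetical, and each haplotype is encoded as a two-character string (allele at the first locus, then allele at the second).

```python
from collections import Counter

def ld_deviation(haplotypes):
    """Return (deviation, calculated f_AB, observed f_AB) for two-locus haplotypes."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    x_A = sum(c for h, c in counts.items() if h[0] == "A") / n  # allele frequency X_A
    x_B = sum(c for h, c in counts.items() if h[1] == "B") / n  # allele frequency X_B
    f_AB_observed = counts["AB"] / n
    f_AB_calculated = x_A * x_B  # expectation if the loci are at equilibrium
    return f_AB_observed - f_AB_calculated, f_AB_calculated, f_AB_observed

# Hypothetical sample of 100 chromosomes with an excess of AB and ab.
sample = ["AB"] * 40 + ["Ab"] * 10 + ["aB"] * 10 + ["ab"] * 40
deviation, calculated, observed = ld_deviation(sample)
print(f"observed f_AB = {observed:.2f}, calculated f_AB = {calculated:.2f}, "
      f"deviation = {deviation:+.2f}")
```

A sample with 25 chromosomes of each haplotype gives a deviation of zero, the equilibrium case. Whether a nonzero deviation is statistically meaningful depends on sample size, which this sketch does not test.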

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

To search for a gene by linkage disequilibrium, one does not examine families segregating a disease allele D. Instead, one looks across a broad spectrum of the population for unrelated individuals who have the disease allele D. If evidence for linkage disequilibrium is found, it reflects recombinations along the chromosome all the way back in time to the original founder. Since this may extend back hundreds of years, more than ten generations may be involved, and thus the number of recombination events viewed will be much greater than possible with any contemporary family. In the case of linkage disequilibrium, we expect to see the general results shown in Figure 6.26. There will be a gradient of increasing deviation from equilibrium as the neighborhood of the disease gene is reached because of the diminishing likelihood of recombination events occurring in an ever-shrinking region.
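The gradient can be made concrete with a standard population-genetics approximation that is not stated in the text: if a marker and the disease allele recombine with probability theta per meiosis, the chance that a founder chromosome has never been separated at that marker after t generations is roughly (1 - theta)^t. The sketch below assumes this model, a hypothetical founder 20 generations back, and the small-distance rule that 1 cM corresponds to about 1% recombination.

```python
def fraction_unrecombined(theta: float, generations: int) -> float:
    """Probability that no crossover has separated marker and disease allele.

    Assumes an independent chance theta of recombination in each meiosis.
    """
    return (1.0 - theta) ** generations

generations = 20  # hypothetical age of the founder mutation
for distance_cm in (10, 5, 2, 1, 0.5, 0.1):
    kept = fraction_unrecombined(distance_cm / 100.0, generations)
    print(f"{distance_cm:>4} cM: {kept:.2f} of disease chromosomes never recombined with the marker")
```

Under these assumptions the association is largely erased at 10 cM but mostly intact within 1 cM, which is the qualitative shape the text describes for Figure 6.26.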

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 6.25 Possible haplotypes in a two-allele system used to examine whether loci are at equilibrium.


Figure 6.26 Gradient of linkage disequilibrium seen near a disease allele in a case where a founder effect occurred.

COMPLICATIONS IN LINKAGE DISEQUILIBRIUM AND GENETIC MAPS IN GENERAL

The human genome is a potential minefield of uncharted genetic events, hidden rearrangements, new mutations, and genetic heterogeneity. Failure to see linkage disequilibrium near a gene does not mean that the gene is far away. Two of the most plausible potential complications are the existence of more than one founder or the existence of a significant fraction of alleles in the population that have arisen by new mutations. For example, in the case of dominant lethal diseases (those in which, nominally, the affected individuals have no offspring), one must expect that most disease alleles will be new mutations. Multiple founders can occur in distinct geographical populations, and they can be tested for by subdividing the linkage disequilibrium analysis accordingly. However, our increasingly mobile population, at least in developed countries, will make such analyses increasingly difficult.

Two other reasonable explanations for a failure to see linkage disequilibrium near a disease gene of interest are shown in Figure 6.27. The first of these is the possible presence of recombination hot spots. If the recombination pattern in the region of interest is punctate, then an even gradient of linkage disequilibrium will not be seen. Instead, markers that lie between a pair of hot spots will appear to be in disequilibrium, while those that lie on opposite sides of a hot spot will appear to have equilibrated. The occurrence of any disequilibrium in the region is presumptive evidence that a disease gene is there, since this is the basis for selection of the particular set of individuals to be examined. However, the complex pattern of allele statistics in the region will make it difficult to narrow in on the location of the disease gene.

Figure 6.27 Complications that can obscure evidence for linkage disequilibrium. (a) Recombination hot spots near the disease gene. (b) Mutation hot spots near the disease gene.

 


A second potential source of confusion is the presence of mutation hot spots. These are quite common in the human genome. For example, the sequence CpG is quite mutagenic in those regions of the genome where the C is methylated, as discussed in Chapter 1. When mutation hot spots are present, alleles at these sites appear to have equilibrated with their neighbors, while more distant pairs of alleles may still show deviations from equilibrium. As in the case of recombination hot spots, disequilibrium indicates that one has not sampled the population randomly. This is presumptive evidence for a disease gene nearby, but mutation hot spots weaken the power of the disequilibrium approach to actually focus in on the location of the desired gene.

DISTORTIONS IN THE GENETIC MAP

We have already discussed briefly the occurrence of recombination hot spots and their deleterious effect on attempts to find genes by linkage disequilibrium. Some hot spots are inherited; in the mouse Major Histocompatibility Complex (MHC), a set of genes that regulates immune response, a hot spot allele has been found that raises the local frequency of recombination by a hundredfold. All of the recombination events caused by this hot spot have been mapped within the second intron of the $E_\beta$ gene, 4.3 kb in size. While we are not sure what has caused this hot spot, the region has been sequenced, and one peculiarity is the occurrence of four sequences with 9/11 bases equal to a consensus sequence TGGAAATCCCC. Such sequences have also been found in regions associated with other recombination hot spots.

The genetic maps of humans and other organisms are not uniform. Recombination is generally higher near the telomeres and lower near centromeres. The map is strikingly different in males and females; that is, meiosis in males and females appears to display a very different pattern of recombination hot spots. A typical example is shown for a selected region of human chromosome 1 in Figure 6.28. Note that some regions that have short genetic distances in the female have long distances in the male, and vice versa. Genetic linkage analysis is more powerful in regions where recombination is prevalent because the more recombinants per Mb, the more finely the genetic data will serve to subdivide the region. In general, genetic maps based on female meioses are considerably longer than those based on male meioses. This is summarized in Table 6.1. A frequent practice is to pool data and show a sex-averaged genetic map. It is not very clear that this is a reasonable thing to do. Instead, it would seem that once a region of interest has been selected, meioses should be chosen from either the female or the male depending on which set produces the most expanded and informative map of the region. At present it does not appear that most workers pay much attention to this.

Figure 6.28 Comparison of low-resolution genetic maps in female and male meiosis. Shown is a portion of the map of human chromosome 1.
