Eye, Retina, and Visual System of the Mouse (Chalupa and Williams, 2008)
54 The Mouse Eye Transcriptome:
Cellular Signatures, Molecular
Networks, and Candidate Genes
for Human Disease
ELDON E. GEISERT AND ROBERT W. WILLIAMS
The laboratory mouse occupies a unique position at the intersection of basic vision science and clinical medicine. The advantages of this species include its small size and rapid rate of reproduction, its extremely well-characterized genome, and the many well-honed techniques used to engineer gene variants, knockouts, and knock-ins. Temporal and spatial patterns of gene expression can be controlled with impressive precision in eye, retina, brain, and many other tissues of mice (see chapter 39, this volume). Human gene variants, usually those that cause disease, can be inserted into the mouse genome to study mechanisms of normal function, disease progression, and drug efficacy (Pennesi et al., 2003). In a dramatic recent example, Jacobs et al. (2007) inserted the human red opsin gene (OPN1LW) into mice, thereby converting this normally color-blind species into a functional trichromat able to discriminate between red and green. Even more radical engineering is practical. For example, it is now possible to produce so-called humanized mice in which the proteins in an organ such as the liver (Kondo et al., 2002) or the immune system (Shultz et al., 2007) are replaced through genetic engineering by human equivalents. One can only speculate that comparable work humanizing the mouse eye and retina is not far behind. The future holds the potential for having mouse models that will exactly recapitulate human disease affecting vision, allowing us the opportunity to define the fundamental defect in disease states and to develop and test new therapies to prevent the loss of sight.
There are also more subtle and perhaps equally important ways in which mice are having an impact on biomedical research and clinical medicine. Populations of diverse strains of mice can be used to model human genetic diversity. Although most biologists use mice to model actions of specific genes, each in isolation on a fixed genetic background (usually C57BL/6J), a growing number of geneticists are using groups of different strains of mice, particularly offspring of the 16 strains that have been sequenced (Frazer et al., 2007), to model the complexity of human populations. Diverse strains of mice and their progeny can be used to test the utility and predictability of personalized medicine. Approximately 10 million common sequence variants, including single nucleotide polymorphisms (SNPs), insertions, deletions, duplications, and even retrotransposons, contribute to much of human biological diversity. Roughly the same types and numbers of genomic sequence variants account for phenotypic differences among common inbred strains of mice, which, unlike humans, can be raised in nearly identical environments. For example, the two oldest strains of mice, the glaucoma-prone DBA/2J and the comparatively normal C57BL/6J, differ at roughly 1.5 million SNPs, as well as an additional large number of microsatellites and copy number variants. It is now practical to use a relatively modest number of strains, from 10 to 100, to model the response diversity and wide spectrum of disease burden of complex human populations rather than just to model the action of a single gene in a single inbred strain. It is finally practical to exploit variation among strains as if it were a “natural” genomic manipulation (Williams et al., 1996). Variation of this type can provide insight into interactions among genes, transcripts, and proteins expressed in the eye, retina, and brain. This more holistic approach tends to scale better to the whole genome than do studies of single alleles on single genetic backgrounds.
In the first half of this chapter, we discuss the current state of mouse genomics with special reference to the expression of genes in the mouse eye. How many genes are expressed in different parts of the eye? What online resources are now available to track progress in eye genomics in mouse and human? How well are researchers able to use online resources to computationally annotate or infer gene function in the same way that protein function is often inferred by common structural motifs? In the second half of this chapter, we
return to the topic of the population genetics of the mouse eye and review a powerful new data set, the Hamilton Eye Institute Mouse Eye Database, which includes many analytical tools to study associations between variation in gene expression and functional differences in the eye, retina, and brain of a large number of highly diverse strains of mice.
Mouse and human genomes
Early in 2001, initial draft sequences of the human genome were published nearly simultaneously by the International Human Genome Sequencing Consortium (McPherson et al., 2001) and Celera Genomics (Venter et al., 2001). A few months after release of the human sequence, Celera assembled a commercial draft of the mouse genome that combined public and private sequence data from five strains of mice (A/J, C57BL/6J, DBA/2J, 129X1/SvJ, and 129S1/SvImJ). The first public assembly of the genome of strain C57BL/6J was released late in 2002 (Waterston et al., 2002). The strong position of C57BL/6J as the most important and widely used inbred strain was reinforced by this public release. This accounts for the now almost obligatory step of transferring (or introgressing, to use the geneticist’s term) all interesting mutations into the C57BL/6 (B6 or “black-six”) genome.
The revelation of human and mouse genomes has catalyzed a large and unanticipated revolution in the production and use of online genomic databases. Use was initially confined mainly to molecular biologists and bioinformaticists. At present, however, a significant proportion of the biomedical research community has come to take these resources for granted as a vital research infrastructure on par with PubMed. Over the last few years, dynamic Web editions of human, mouse, rat, chimpanzee, dog, yeast, Drosophila, and C. elegans genomes at NCBI, UCSC, and Ensembl have had an enormous impact on the way biomedical research is conducted. These Web portals or knowledgebases represent dynamic encyclopedias that form a framework for most genetic, genomic, molecular, cellular, systems, and clinical research. Learning how to use and evaluate these resources is becoming as important as learning how to read and critique the literature.
Initial comparative analysis of mammalian genomes found that the mouse genome is slightly shorter than the human genome (2.6 billion nucleotides vs. 3.0 billion nucleotides), probably due to “trivial” differences in noncoding sequence and repetitive DNA. As expected, the mouse and human genomes contain very close to the same number of classic protein-coding genes, now estimated at 27,000 ± 3,000 in mammals. A large part of the mouse genome can be partitioned into chromosome strands that are roughly equivalent to homologous strands of human chromosomes, a feature that is referred to as conserved synteny (similar strings). Like
many interesting research efforts, our improved understanding of genome sequence and the process of transcription has revealed how much we now do not understand. While the genome is often referred to metaphorically as an encyclopedia, or even as a simple string of letters, it is actually a deviously clever and incomplete patchwork of code. This code cannot be understood except in the context of a whole organism and its environmental history. Having sequence and global transcriptome data in hand has taught molecular biologists and bioinformaticists to be humble in their claims of having seen a path to a brave new postgenomic world. To give one example, we now have a much greater appreciation of the complex role of noncoding RNAs, micro-RNAs, guide RNAs, and antisense RNA sequences in regulating transcription and translation. There is now good evidence that more of the genome (perhaps as much as 30%), including intergenic “junk DNA,” is transcribed, and that only a fraction of transcripts are destined to be translated into mature protein. There is also strong evidence of far more alternative transcript use, with estimates that up to 80%–90% of genes have two or more variants. The supposedly bland world of DNA and RNA has suddenly become much more of a jungle and may even rival the proteome in its intricacy and complexity (Kampa et al., 2004; Kapranov et al., 2007).
Sequence Differences Among Mice: A Great New Resource
With genomic sequence data from five different inbred strains, Celera was able to systematically extract close to 3 million SNPs (Waterston et al., 2002; Lindblad-Toh et al., 2005). This resource introduced a new approach to genomics in which missense mutations and premature stop codons can be hunted down directly in databases. For example, among the 1.5 million SNPs that distinguish between the two oldest inbred strains of mice, C57BL/6J and DBA/2J, a subset of 42 is located in the tyrosinase-related protein 1 gene (Tyrp1, also known as the brown locus), including two missense mutations that alter protein sequence (figure 54.1). This first wave of SNP variants for five strains was followed three years later (Frazer et al., 2007) by a massive amount of array-based sequence data generated by Perlegen Sciences for the National Institute of Environmental Health Sciences (table 54.1). Perlegen used massive wafers covered with 25-nucleotide DNA strands (68 wafers per genome) to sequence precisely the same gene-rich 1.5 billion base pairs of DNA across a panel of 15 highly diverse strains, including four wild subspecies from which virtually all common strains trace their descent: WSB/EiJ (Mus musculus domesticus), PWD/PhJ (M. m. musculus), MOLF/EiJ (M. m. molossinus), and CAST/EiJ (M. m. castaneus).
The result of this massive effort is an open database that provides access to about 10 million high-quality SNPs that can be downloaded and sorted at several sites, including
660 advanced genomic technologies
the SNP browser at GeneNetwork, www.genenetwork.org/webqtl/snpBrowser (see figure 54.1). Although these new SNPs have not garnered the attention of the original sequencing efforts, they are extremely powerful resources. The millions of SNPs that are now a fixed resource for inbred strains neatly match the 8–10 million common SNPs segregating in human populations (www.hapmap.org). In both mouse and human these SNPs are collectively responsible for much of the phenotypic diversity and differences in health and disease. For example, the first missense mutation in Tyrp1 (figure 54.1, top) changes the wild-type TGT cysteine codon at position 86 in exon 2 (note the Gs and As in the top row), present in virtually all mammals, to a TAT codon in DBA/2J that codes for a mutant tyrosine residue. This SNP is responsible for the brown locus phenotype. The second missense SNP in exon 5 (rs28091461, the less severe b-“light” allele) results in an arginine (R) to histidine (H) replacement at
amino acid position 326. These mutations in a key melanocyte catalase disturb melanin production and lead to oculocutaneous albinism type 3 (OCA3) in humans. In mice, these mutations also contribute significantly to glaucoma susceptibility (see chapter 39, this volume).
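The codon change described above can be checked with a few lines of code. The following is only a minimal sketch: the lookup table holds just the two codons named in the text, and the helper names (`apply_snp`, `classify`) are hypothetical, not part of GeneNetwork or any SNP browser.

```python
# Minimal codon lookup covering only the two codons discussed in the text.
CODON_TO_AMINO_ACID = {
    "TGT": "Cys",  # wild-type codon reported at position 86 of Tyrp1
    "TAT": "Tyr",  # DBA/2J codon after the G-to-A substitution
}


def apply_snp(codon: str, offset: int, new_base: str) -> str:
    """Return the codon with the base at `offset` (0-based) replaced."""
    return codon[:offset] + new_base + codon[offset + 1:]


def classify(ref_codon: str, alt_codon: str) -> str:
    """Label a substitution as silent or missense using the lookup table."""
    ref_aa = CODON_TO_AMINO_ACID[ref_codon]
    alt_aa = CODON_TO_AMINO_ACID[alt_codon]
    if ref_aa == alt_aa:
        return "silent"
    return f"missense ({ref_aa} -> {alt_aa})"


wild_type = "TGT"
mutant = apply_snp(wild_type, offset=1, new_base="A")  # middle G becomes A
print(mutant, classify(wild_type, mutant))
```

A full analysis would use the complete genetic code and the annotated reading frame of each exon; the point here is only that a single-base change in the second codon position is enough to turn a cysteine into a tyrosine.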
To ensure a low error rate, both Celera and Perlegen used stringent criteria to define SNPs. This conservative approach and the focus on gene-rich regions have led to the prediction that there are actually in excess of 40 million common SNPs fixed in the genomes of these 16 common strains (Frazer et al., 2007). There are also many other types of genetic polymorphism variants, including insertions, deletions, polymorphic transposons, duplications, and inversions. In sum, just a small cohort of inbred strains provides direct access to roughly four times the level of common variation segregating in humans. Before the genome projects, this genetic variation remained undiscovered, but after sequencing several
Figure 54.1 The SNPs within the exons of the tyrosinase-related protein 1 gene (Tyrp1) are shown from the SNP browser page of GeneNetwork. This is a list limited to the 11 SNPs found in the exons of Tyrp1 (total of 126 SNPs in the gene) within the 17 strains of mice examined. At the top of the page the critical information can be selected. For this analysis we examined Tyrp1. The chromosomal location is listed. We also selected all strains in the database, exons only, and all functions of the SNPs. The listing at the bottom of the page is self-explanatory, with SNP ID, SNP RS ID, chromosome (Chr), megabase pair (Mb), domain, gap, gene, function of the SNP (Function), conservation, alleles, source, and the specific SNPs in the strains examined. Notice that four of the SNPs create missense mutations, and the remainder of those found in exons are silent.
geisert and williams: the mouse eye transcriptome |
661 |
Table 54.1
Top strains of mice currently sequenced, along with several pertinent ocular phenotypes
Strain | Sequencing Status | Ocular Trait
C57BL/6J | 8X shotgun | Pigmented appearance: black; related genotype: a/a. C57BL/6J was the DNA source for the international collaboration that generated the first high-quality draft sequence of the mouse genome.
DBA/2J | 1.3X + Perlegen | Dilute brown. Glaucoma model, large eye; interacting loci cause severe iris atrophy and glaucoma in DBA/2J mice. Retinal histopathology reveals a loss of RGCs, as well as GABAergic and cholinergic amacrine cells (Moon et al., 2005).
A/J | 1.3X + Perlegen | Tyr-negative albino
129S1/SvImJ | 1.3X + Perlegen | Tyr-negative albino white-bellied agouti
129X1/SvJ | Celera | Tyr chinchilla pink eye ocular albino
CAST/EiJ | Perlegen | Agouti
BTBR T+ tf/J | Perlegen | Black and tan, tufted
MOLF/EiJ | Perlegen | White-bellied agouti; related genotype: Aw/Aw. Homozygous for the retinal degeneration allele Pde6brd1.
KK/HlJ | Perlegen | Tyr-negative albino. KK/HlJ male mice exhibit diabetic symptoms that include hyperglycemia, hyperinsulinemia, and insulin resistance.
AKR/J | Perlegen | Tyr-negative albino
PWD/PhJ | Perlegen | Agouti
NZW/LacJ | Perlegen | Tyrp1b/Tyrp1b p Tyrc/p Tyrc albino; Pde6brd1
BALB/cByJ | Perlegen | Tyr-negative albino
WSB/EiJ | | White-bellied agouti
C3H/HeJ | Perlegen | Agouti; carries Pde6brd1, resulting in retinal degeneration by weaning age
FVB/NJ | Perlegen | Tyr-negative albino; carries Pde6brd1, resulting in retinal degeneration
NOD/LtJ | Perlegen | Tyr-negative albino
Note: The level of coverage and quality produced by Perlegen Sciences’ array-based resequencing technology is equivalent to that of 1.5 × whole-genome shotgun sequencing. For mouse strains and phenotypes, see www.informatics.jax.org/external/festing/mouse/STRAINS.shtml.
different genomes we are finding that the genome contains a surprising number of variations. With the online genomic resources available to investigators, scientists are able to mine these data sources, producing a rapidly evolving understanding of the structure of the genome.
Gene expression in the mouse eye: Results and resources
Recent advances in genomic technologies have facilitated the large-scale examination of tissues, defining transcript abundance for tens of thousands of genes simultaneously in single experiments. There are two basic technological approaches. In the first, mRNA is isolated, processed, and directly sequenced (expressed sequence tags [ESTs] or serial analysis of gene expression [SAGE]). In the second approach, mRNA or its cDNA derivative is hybridized to a complementary probe (microarray methods), and rather than counting mRNA molecules, one measures the binding affinity as an indirect measure of mRNA concentration. In both of these applications mRNA is isolated using oligo-dT to hybridize the poly-A tail of the message. This approach selectively isolates RNA that will be translated into protein (approximately 3% of the total RNA in the cell), eliminating
the highly abundant ribosomal RNA and transfer RNA. The advantage to these methods is the rapid interrogation of all mRNAs. The disadvantage is that untranslated RNAs are not interrogated by these methods and we are not examining the full complexity of transcriptional activity. The EST analysis and the SAGE analysis directly sequence copies of mRNA and can accurately measure the levels of abundant messages. Furthermore, the quality of the database and detection of rare messages can be increased by increasing the number of sequenced clones or cDNA fragments. Microarray has the advantage of examining a large number of transcripts in a single hybridization step. However, microarray methods have two significant disadvantages. For abundant transcripts, the probe can be saturated and will not reflect the true level of message, and for rare transcripts the hybridization may be below the level of detection.
Expressed sequence tag and serial analysis of gene expression analysis
Many different genes are expressed in the diverse tissue types that make up the eye. Early estimates based on the short sequences (300–500 base pairs long) derived from mRNAs
(ESTs) indicated that retina alone produced around 15,000 unique transcripts (Williams, Strom, Zhou, et al., 1998). The UniGene EST collection binds together a set of ESTs that trace back to the same genome location, and often to a single well-known gene. This puts the UniGene collection at a halfway point between the modest number of canonical protein-coding genes and the high number of transcript variants. In the mouse there are a total of around 86,000 UniGene EST clusters (build 163), of which 24,000 are derived from mouse eye tissues. This is roughly twice the number found in organs such as liver (12,000) and heart (11,000), but somewhat less than that found in whole brain (30,000). Unfortunately, differences among tissues and organs have as much to do with ascertainment biases and database coverage as they do with inherent differences in mRNA and cellular diversity. This point is driven home by the fact that the number of UniGene clusters in humans is 50% greater than that in mice (124,000), a difference that is almost certainly technical. However, as in the mouse, the number of human UniGene clusters associated with ocular tissues is close to 24,000.
The National Eye Institute’s NEIBank is an excellent resource to explore the fine details of EST sets extracted from different ocular tissues in several species (http://neibank.nei.nih.gov). NEIBank includes EST/UniGene collection libraries from whole eye and several discrete ocular tissue types, including the ciliary body (human only), the choroid/retinal pigmented epithelium (human, mouse, other), cornea (mouse, human, other), iris (human, rat, zebrafish), lacrimal gland (human and mouse), lens (human, mouse, rat, kangaroo, dog, zebrafish, rabbit, guinea pig, cow, rhesus monkey), optic nerve (human only), trabecular meshwork (human and rat), and retina (human, mouse, rat, dog, zebrafish, rabbit, guinea pig). The NEIBank retinal mRNA library for adult mouse (NbLib0027) consists of approximately 35,000 UniGene clusters and ESTs. Only one-third (12,224) are represented by multiple hits or clones. Not surprisingly, at the top of the list is rhodopsin, represented by 1,050 counts. In other words, rhodopsin sequence was identified in 1,050 independently sequenced clones. If we sum the unique transcripts in NEIBank across all mouse eye libraries, the total reaches about 28,000 (table 54.2). This sum does not include those genes, ESTs, and UniGene clusters that are only active during development. What we can conclude is that a significant proportion of the genome, certainly well over half of all coding genes, is expressed in the eye, with an exceedingly high level of expression in the retina.
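The "one-third" figure quoted above follows directly from the cumulative clone counts reported for the mouse retina library in table 54.2. The short sketch below, with a hypothetical helper name (`exactly`), shows the arithmetic: a cumulative "k or more" count is turned into an "exactly k" count by subtraction, and the multi-hit fraction is the ratio of the "2 or more" to the "1 or more" entry.

```python
# Cumulative counts for the mouse retina library (NbLib0027), table 54.2:
# number of entries represented by at least k clones.
AT_LEAST = {1: 34885, 2: 12224, 3: 8413, 4: 6207, 5: 4757}


def exactly(k: int, at_least: dict) -> int:
    """Entries seen exactly k times = (at least k) - (at least k + 1)."""
    return at_least[k] - at_least[k + 1]


singletons = exactly(1, AT_LEAST)
multi_hit_fraction = AT_LEAST[2] / AT_LEAST[1]

print(f"entries seen exactly once: {singletons}")
print(f"fraction with two or more clones: {multi_hit_fraction:.3f}")
```

Roughly two-thirds of the entries are therefore singletons, which is one reason why deeper sequencing tends to improve the detection of rare messages.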
SAGE analysis, like EST analysis, directly sequences short segments of cDNA, offering the ability to improve the quality of the data set by increasing the number of cDNAs sequenced. For the eye, the SAGE database of the Cepko group (http://itstgp01.med.harvard.edu/retina) examines the expression
Table 54.2
Numbers of clones in different EST libraries in the NEIBank
Number of Clones | Mouse Eye | Human Eye | Mouse Retina
1 or more | 28,300 | 14,342 | 34,885
2 or more | 8,347 | 5,277 | 12,224
3 or more | 5,240 | 3,193 | 8,413
4 or more | 3,619 | 2,058 | 6,207
5 or more | 2,644 | 1,442 | 4,757
6 or more | 1,997 | 1,040 | 3,697
7 or more | 1,494 | 801 | 2,955
8 or more | 1,175 | 628 | 2,374
9 or more | 956 | 516 | 1,920
10 or more | 782 | 433 | 1,576
11 or more | 673 | 374 | 1,330
12 or more | 570 | 326 | 1,110
15 or more | 370 | 241 | 712
20 or more | 221 | 143 | 397
25 or more | 133 | 97 | 230
30 or more | 99 | 68 | 178
Note: Shown are the EST copy numbers in the mouse eye (NbLib0032; highest copy number 353, for Cryg), the human eye (NbLib0079, NEI site; highest copy number 339, for Cryg), and the mouse retina (NbLib0027; highest copy number 1,050, for Rho). Data were extracted from the NEIBank (http://neibank.nei.nih.gov).
patterns in the developing mouse eye. In the case of the Cepko database, SAGE tags are 14 bases in length and are used to identify specific transcripts in the developing C57BL/6 retina (Blackshaw et al., 2004). This high-quality database identifies transcripts expressed at very high levels and low levels. The data are presented in an easy-to-use, highly interactive format presenting SAGE data for different tissues of the eye and at different developmental stages ranging from embryonic day 12.5 to adult. Using this resource one can examine the level of expression of individual transcripts to uncover developmental expression patterns. In addition to the SAGE database, in situ hybridizations of retinal sections are included for many of the mouse transcripts. The data presented at the Cepko site are a starting point for functional analysis of the role of genes in retinal development. For example, Blackshaw et al. (2004) identified genes expressed in the adult retina that formed a photoreceptor-enriched catalogue of transcripts within the inner retina. This group of genes has a correlated developmental expression pattern with rhodopsin and may offer prime candidate genes for human disease.
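Because SAGE is a counting method, the expression level of a transcript is simply the number of times its tag is observed, usually scaled by library size. The sketch below illustrates that idea under stated assumptions: the tag sequences are invented for illustration, the function names are hypothetical, and only the 14-base tag length comes from the text.

```python
from collections import Counter

TAG_LENGTH = 14  # tag length stated in the text for the Cepko database


def count_tags(tags):
    """Count each well-formed 14-base tag, skipping malformed reads."""
    valid = [
        tag.upper()
        for tag in tags
        if len(tag) == TAG_LENGTH and set(tag.upper()) <= set("ACGT")
    ]
    return Counter(valid)


def tags_per_million(counts):
    """Scale raw counts so that libraries of different depth are comparable."""
    total = sum(counts.values())
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}


# Hypothetical reads: three copies of one tag, one of another, one malformed.
reads = ["CATGAAACCCGGGT"] * 3 + ["CATGTTTGGGCCCA", "CATGNNN"]
counts = count_tags(reads)
print(tags_per_million(counts))
```

Comparing such normalized counts for one tag across libraries prepared at different ages is what yields a developmental expression profile of the kind described above.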
Microarray analysis
The advent of microarray technology in the 1990s led to a revolution in the way data are collected and analyzed. With a single chip, one can interrogate tens of thousands of transcripts. This created the need for massive databases and the skilled management of bioinformatic resources. These databases contain significant amounts of data for transcript levels within the eye, retina, and other ocular tissues. In the case of the eye and retina, the data are confounded by the fact that the structures are composed of many different tissue types formed by unique cells. From these data one can extract some unique cellular signatures; however, the results may not accurately reflect each of the cell types and may lack cell-level specificity. Furthermore, the data do not provide the investigator with any information concerning translational regulation (protein expression) or posttranslational modifications important in protein regulation. Nonetheless, several microarray databases offer unique opportunities for vision researchers to investigate the regulation of gene expression during eye development and in different strains of the mouse.
For the development of the eye there is a wonderful microarray-based Web site maintained by the Friedlander group at Scripps Institute. This site is a source of information for the postnatal developmental expression pattern of transcripts in mouse retina. The Friedlander group used Affymetrix microarrays to identify the relative levels of gene expression during the postnatal development of the mouse eye (Dorrell et al., 2004); the data are presented at www.scripps.edu/cb/friedlander/gene_expression. The developmental expression patterns are illustrated in a format that allows the investigator to query a specific gene or groups of genes. These Web sites can serve as references to define the molecular signatures of different tissues in the eye, along with the developmental regulation of gene expression in the eye or retina. The Friedlander analysis groups genes into clusters by developmental expression pattern, and these clusters are specifically related to the developmental patterns of cell birth and maturation. For example, the last group of genes to be upregulated is related to phototransduction and the maturation of rods. The Friedlander database, like many vision-related mouse databases, is based on information generated from only one strain of mouse, the C57BL/6 mouse.
One important goal of genomic research is to combine expression data with variation in phenotype. In the mouse, this can be accomplished by examining strain differences as related to differential susceptibility to disease or naturally occurring differences in phenotype. For the mouse there are several resources that define phenotypic variability. Two examples are the Jackson Laboratory mouse phenotype database (http://phenome.jax.org/pub-cgi/phenome/mpdcgi?rtn=docs/home) and GeneNetwork Phenotypes (www.genenetwork.org). The mouse strains also serve as a rich genomic resource, with expression genetics allowing us to correlate phenotype variation with genomic locus. Quantifying mRNA levels with microarray-based systems has allowed for a rapid interrogation of transcript expression in the mouse. Expression genetics allows for a conceptually unique approach of using this transcript expression data to locate genomic loci controlling disease or phenotype. Furthermore, by using mouse genetic panels such as the BXD strain set, we can provide insights into the genetic networks controlling differences in gene expression and phenotypic variation.
The Hamilton Eye Institute Mouse Eye Database
We have created the Hamilton Eye Institute Mouse Eye Database (HEIMED, available at www.genenetwork.org) to bring expression genetics to the vision research community. By examining variations at the transcript level we can define correlations in gene expression to genomic loci revealing higher-order transcriptional networks, as well as loci capable of modulating morphological features in the eye, for example, retinal ganglion cell (RGC) number. This approach can lead to dissection of entire networks of genes controlling transcriptional networks within the eye that dictate specific ocular phenotypes. To bring this power of expression genetics and the mouse strain set to the study of the eye, we developed a highly interactive database, HEIMED. This database estimates mRNA expression in whole eyes of young adult mice of many lines, generated using the Affymetrix M430 2.0 array, which contains more than 45,000 probe sets representing more than 39,000 transcripts. Within HEIMED, data from a total of 98 mouse strains are presented, including 67 BXD strains, which were generated by crossing C57BL/6J with DBA/2J; the two parental strains (C57BL/6J and DBA/2J), along with their reciprocal F1s; and 27 strains from the mouse diversity panel (plus B6 and D2 for a total of 29 common strains). The BXDs are particularly useful for systems genetics because all of the BXDs are fully mapped and both parental strains are fully sequenced.
The BXD Genetic Reference Panel of Mice
The BXD strain set is an integral part of HEIMED and our eye databases. This unique strain set was originally developed in the laboratory of Benjamin Taylor and was transferred to the Jackson Laboratory from Dr. Taylor’s research colony at his retirement (Taylor et al., 1999). The two parental mouse strains, C57BL/6J and DBA/2J, were inbred strains developed relatively early. These strains have contrasting characteristics. The C57BL/6J is a very widely used inbred strain that was originally developed to be refractory to tumors; in contrast, the DBA/2J strain was developed to be susceptible to tumors. These inbred genetic differences have produced one strain (C57BL/6J) that is resistant to CNS damage and another (DBA/2J) that is susceptible to injury (Inman et al., 2002). The DBA/2J mice used in the traditional
set of BXD strains carried a mutation in the Tyrp1 gene that contributed to iris stromal atrophy. In 1997, a second mutation in Gpnmb resulted in a significant increase in iris diseases, causing pigment dispersion in the DBA/2J strain (Chang et al., 1999). In the advanced BXD RI strain set, which was developed from DBA/2J mice carrying both mutations (Peirce et al., 2004), 15 strains carry both the Tyrp1 and Gpnmb mutations and should develop iris disease similar to that observed in the DBA/2J strain. Since the BXD RI strain set is fully mapped and since each strain can be sampled an unlimited number of times, these mice are ideal for use as a mapping panel to identify genomic loci controlling specific phenotypes such as ganglion cell number (Williams et al., 1996) or controlling genetic networks (Vazquez-Chona et al., 2005).
The BXD RI strain set is at the core of our novel analytical tool set and QTL mapping algorithms (www.genenetwork.org). This includes an SNP database for sequenced mouse strains, along with unique sequence analysis software developed by the Williams group. The collective purpose of these tools and techniques is to extract and test molecular networks that affect gene expression in the eye and the retina. Examples of the innovative use of this approach are described in articles published in Science (Brem et al., 2002) and Nature (Schadt et al., 2003) that have highlighted the power of treating expression data as a quantitative trait and using QTL mapping methods to systematically identify upstream controllers that are responsible for individual differences in transcript abundance (Jansen and Nap, 2001). Our group has been using QTL mapping methods of this type for more than 6 years (Williams et al., 1996; Hittalmani et al., 2003; Wang et al., 2003; Chesler et al., 2004; Vazquez-Chona et al., 2005), and we welcome the opportunity to bring these approaches to the vision research community, specifically for study of the retina (Vazquez-Chona et al., 2005) (see www.genenetwork.org and the HEIMED). The precision with which we can map QTLs that modulate retina transcript networks is determined by the quality of genetic maps that we assemble for RI strains. With the sequence data, we can now completely define all breakpoints in the set of 80 BXD RI strains. These strains incorporate approximately 7,000 recombinations. Ideally, we would know the precise location of each of these breakpoints within 100,000 base pairs. This would have been almost unthinkable a few years ago. Now it is simply a matter of finding polymorphic loci that distinguish the parental strains.
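To make the idea of treating transcript abundance as a quantitative trait concrete, the sketch below scans a handful of markers with simple single-marker regression. It is an illustration only and is not the algorithm implemented at GeneNetwork: the marker names, genotype strings, and expression values are all hypothetical, and the LOD score is computed with the standard regression form, LOD = (n/2) log10(RSS_null / RSS_marker).

```python
import math


def marker_lod(genotypes, expression):
    """LOD score for one marker: (n / 2) * log10(RSS_null / RSS_marker).

    `genotypes` holds one parental allele per strain ("B" or "D");
    `expression` holds the matching strain means for one transcript.
    A marker that fits the data perfectly (RSS_marker == 0) is not handled.
    """
    n = len(expression)
    grand_mean = sum(expression) / n
    rss_null = sum((y - grand_mean) ** 2 for y in expression)

    # Group expression values by the allele each strain inherited.
    groups = {}
    for allele, y in zip(genotypes, expression):
        groups.setdefault(allele, []).append(y)

    rss_marker = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        rss_marker += sum((y - mean) ** 2 for y in values)

    return (n / 2) * math.log10(rss_null / rss_marker)


# Hypothetical expression of one transcript in eight RI strains.
expression = [8.1, 8.3, 9.6, 9.4, 8.0, 9.7, 8.2, 9.5]
# Hypothetical genotypes at three markers, one character per strain.
markers = {"m1": "BBDDBDBD", "m2": "BDBDBDBD", "m3": "BBBBDDDD"}

lods = {name: marker_lod(geno, expression) for name, geno in markers.items()}
best_marker = max(lods, key=lods.get)
print(best_marker, round(lods[best_marker], 2))
```

In this toy example the strains carrying the D allele at `m1` all have high expression, so `m1` receives by far the largest LOD score; in a real scan the same calculation is repeated across thousands of markers, and significance is usually judged by permutation.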
Using HEIMED, we can extract the molecular signatures of specific tissues or cell types by looking for genes that correlate across the entire genome and are coexpressed with tissue-specific or cell-specific marker genes. If we examine these molecular signatures in the HEIMED database, we can begin to unravel the unique genetic networks that regulate tissue-specific expression patterns. An example of this
type of analysis is genes that covary with rhodopsin (Rho). Unlike other approaches that identify all of the genes expressed in a cell type (in the case of rhodopsin it is rods), HEIMED generates a list of genes that are uniquely expressed in rods, forming a unique signature for this cell type. If we examine the top 100 transcripts that correlate with Rho, we can observe a unique molecular signature for photoreceptors with genes expressed in rods (e.g., Pde6b, Gnat1, Guca1b, Nrl, Pde6g, Pdc, Vtn, Rp1, and Cabp4) with similar expression patterns across the reference panel of mice. The genetic covariance with the rod-specific transcripts not only defines a unique rod signature; it can also be used to define genes associated with human retinal diseases. Pearson and Spearman correlations were used to rank candidates with expression tightly coupled (r > 0.9 for the first 100 probe sets) with rhodopsin across all strains. Thirty of the top 100 covariates with rhodopsin are genes currently associated with retinal disease in humans. For example, Rdh12 (r = 0.98 with rhodopsin) has a well-characterized association with Leber’s congenital amaurosis (LCA3). Examples of other known disease genes that covary with rhodopsin are listed in table 54.3. The genomic locations of other genes that covary with rhodopsin suggest that they are candidate genes for human disease. Their genomic locations in the mouse were converted to human chromosome locations using mouse/human synteny maps.
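The ranking step described above can be sketched in a few lines. The example below is a minimal illustration, not the HEIMED implementation: the gene symbols are taken from the text, but every expression value is invented, the panel has only six hypothetical strains, and the function names are assumptions. It computes Pearson and Spearman correlations against a reference transcript and keeps those above the r > 0.9 threshold mentioned in the text.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rank_covariates(reference, candidates, threshold=0.9):
    """Return (name, pearson r, spearman rho) for r above threshold, best first."""
    scored = [
        (name, pearson(reference, values), spearman(reference, values))
        for name, values in candidates.items()
    ]
    kept = [row for row in scored if row[1] > threshold]
    return sorted(kept, key=lambda row: row[1], reverse=True)


# Hypothetical strain means (log2 scale) for six strains.
rho_expression = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5]
candidates = {
    "Gnat1": [9.1, 9.5, 10.1, 10.4, 11.0, 11.6],
    "Pde6b": [8.0, 8.6, 8.9, 9.6, 9.9, 10.6],
    "Aldh3a1": [13.0, 12.1, 13.4, 12.2, 13.1, 12.3],
}

signature = rank_covariates(rho_expression, candidates)
for name, r, rho in signature:
    print(f"{name}: r = {r:.3f}, rho = {rho:.3f}")
```

With these invented values the two rod genes pass the threshold and the corneal marker does not, which mirrors the logic of building a cell-type signature from covariation across strains rather than from absolute expression level.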
We then compared these human locations with disease loci in the RetNet database (www.sph.uth.tmc.edu/Retnet/disease.htm) for which the disease-causing gene has not yet been identified. This analysis allowed us to generate strong biological candidates for uncloned human disease loci, using HEIMED in combination with GeneNetwork. The eight candidate genes for human retinal diseases are listed in table 54.4. These results are one example of the power and potential of mouse genomics for understanding retinal disease in the human.
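The final filtering step, matching covarying genes to mapped but uncloned disease intervals, amounts to an interval lookup. The sketch below assumes the mouse positions have already been converted to human coordinates; every gene symbol, locus name, and coordinate in it is a made-up placeholder, not data from RetNet or HEIMED.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    name: str
    chrom: str
    start_mb: float
    end_mb: float


class Gene(NamedTuple):
    symbol: str
    chrom: str          # human chromosome after synteny conversion
    pos_mb: float       # human position in megabases
    r_with_marker: float


def candidates_for_loci(genes, loci, min_r=0.9):
    """For each disease interval, list covarying genes that fall inside it."""
    hits = {}
    for locus in loci:
        inside = [g for g in genes
                  if g.chrom == locus.chrom
                  and locus.start_mb <= g.pos_mb <= locus.end_mb
                  and g.r_with_marker >= min_r]
        # Strongest covariates first.
        hits[locus.name] = sorted(inside, key=lambda g: g.r_with_marker,
                                  reverse=True)
    return hits


# Placeholder inputs (assumption): none of these are real loci or genes.
loci = [Locus("LOCUS_X", "1", 40.0, 110.0),
        Locus("LOCUS_Y", "16", 18.0, 27.0)]
genes = [Gene("GeneA", "1", 55.2, 0.95),
         Gene("GeneB", "1", 60.1, 0.72),    # inside LOCUS_X but weak covariate
         Gene("GeneC", "16", 30.4, 0.97),   # strong covariate, outside LOCUS_Y
         Gene("GeneD", "16", 20.0, 0.93)]

for name, found in candidates_for_loci(genes, loci).items():
    print(name, [g.symbol for g in found])
```

A gene passes only if it satisfies both criteria, which is what makes the resulting candidates biologically motivated rather than merely positional.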
An example of a tissue-specific molecular signature is that of the cornea. If we use Aldh3a1, the gene for aldehyde dehydrogenase family 3, subfamily A1, as a marker for the cornea and run it through the trait correlation function of GeneNetwork, we retrieve a set of transcripts that appears to be a molecular signature of the cornea. The transcripts in the list include Aldh3a1, keratin 12, members of the KLF transcription factor family, and the corneal crystallin Tkt. If we examine the list of genes from HEIMED, we find that they represent a series of genes that are expressed in the cornea, with some genes being unique to the cornea. By comparing these genes with the total gene expression in the cornea as represented in the NEIBank (NbLib0116, adult mouse cornea; http://neibank.nei.nih.gov/index.shtml), we find that 9 of the top 10 genes on the HEIMED corneal list are highly represented in the NEIBank list. These include aldehyde dehydrogenase family 3, subfamily A1 (Aldh3a1, 32 independent
Table 54.3
Cloned human disease genes found in our correlative analysis of rhodopsin expression in the mouse, with 23 cloned human disease genes and references for the original mapping and cloning of the gene

| Gene | Disease* | Human Locus | Mapping Reference | Cloning Reference |
|---|---|---|---|---|
| Crb1 | Leber congenital amaurosis, AR; Retinitis pigmentosa, AR; Other AD retinopathies | 1q31.3 | den Hollander et al., 1999b | den Hollander et al., 1999a |
| Sag | Retinitis pigmentosa, AR; CSNB, AR | 2q37.1 | Ngo et al., 1990 | Ngo et al., 1990 |
| Rho | CSNB, AD; Retinitis pigmentosa, AR, AD | 3q22.1 | Nathans et al., 1986 | Nathans and Hogness, 1984 |
| Gnat1 | CSNB, AD | 3p21.31 | Sparkes et al., 1987 | Lerman and Minna, 2000 |
| Pde6b | CSNB, AD; Retinitis pigmentosa, AR | 4p16.3 | Bateman et al., 1992 | Pittler et al., 1993 |
| Cnga1 | Retinitis pigmentosa, AR | 4p12 | Dhallan et al., 1991 | Kaupp et al., 1989 |
| Pde6a | Retinitis pigmentosa, AR | 5q33.1 | Ovchinnikov et al., 1987 | Pittler et al., 1990 |
| Guca1b | Macular degeneration, AD | 6p21.1 | Surguchov et al., 1997 | Payne et al., 1999 |
| Guca1a | Cone or cone-rod dystrophy, AD | 6p21.1 | Subbaraya et al., 1994 | Subbaraya et al., 1994 |
| Tulp1 | Leber congenital amaurosis, AR; Retinitis pigmentosa, AR | 6p21.3 | North et al., 1997 | North et al., 1997 |
| Rp1 | Retinitis pigmentosa, AR, AD | 8q12.1 | Pierce et al., 1999 | Pierce et al., 1999 |
| Kcnv2 | Cone or cone-rod dystrophy, AR | 9q24.2 | Ottschytsch et al., 2002 | Ottschytsch et al., 2002 |
| Cabp4 | CSNB, AR | 11q13.1 | Haeseleer et al., 2004 | Haeseleer et al., 2000 |
| Rom1 | Retinitis pigmentosa, AD | 11q12.3 | Bascom et al., 1990 | Bascom et al., 1989 |
| Nrl | Retinitis pigmentosa, AD, AR | 14q11.2 | Yang-Feng and Swaroop, 1992 | Swaroop et al., 1991 |
| Rdh12 | Leber congenital amaurosis, AR | 14q24.1 | Haeseleer et al., 2002 | Haeseleer et al., 2002 |
| Nr2e3 | Retinitis pigmentosa, AR; Other AR retinopathy | 15q23 | Kobayashi et al., 1999 | Kobayashi et al., 1999 |
| Cngb1 | Retinitis pigmentosa, AR | 16q13 | γ subunit: Ardell et al., 1995; β subunit: Ardell et al., 1996 | γ subunit: Ardell et al., 1995; β subunit: Ardell et al., 1996 |
| Aipl1 | Cone or cone-rod dystrophy, AD | 17p13.2 | Sohocki et al., 1999 | Sohocki et al., 1999 |
| Unc119 | Cone or cone-rod dystrophy, AD | 17q11 | Swanson et al., 1998 | Higashide et al., 1996 |
| Fscn2 | Macular degeneration, AD | 17q25.3 | Tubb et al., 2000 | Saishin et al., 1997 |
| Crx | Retinitis pigmentosa, AD; Cone or cone-rod dystrophy, AD | 19q13.32 | Freund et al., 1997 | Freund et al., 1997 |
| Rs1 | XL retinoschisis | Xp22.13 | Dahl et al., 1987 | Sauer et al., 1997 |

*AD, autosomal dominant; AR, autosomal recessive; CSNB, congenital stationary night blindness; XL, X-linked.

Table 54.4
Eight candidate genes for mapped human disease

| Gene | Disease | Human Locus | Mouse Locus | Mapping Reference |
|---|---|---|---|---|
| RP32 | Severe AR RP | 1p34.3–p13.3 | 4 at 154.132529 Mb | Zhang et al., 2005 |
| AXPC1 | AR ataxia | 1q31.1–1q32.3 | 1 at 134.282002 Mb | Higgins et al., 1997 |
| RP29 | AR RP | 4q32.1–4q34.3 | 8 at 26.0 cM | Hameed et al., 2001 |
| MCDR3 | AD macular dystrophy | 5pter–5p13.1 | 15 at 6.994909 Mb | Michaelides et al., 2003 |
| LOC387715 | ARMD, complex etiology | 10q26.13 | Chr 7 F3 | Jakobsdottir et al., 2005 |
| USH1A, USH1 | AR Usher syndrome, French | 14q32.11–14qter | 4 at 150.799494 Mb | Kaplan et al., 1991 |
| RP22 | AR RP | 16p12.3–16p12.1 | 7 at 60.0 cM | Finckh et al., 1998 |
| CACD | AD central areolar choroidal dystrophy | 17pter–17p13.1 | 11 at 68.798082 | Lotery et al., 1996 |

Note: Most of the data came from the website RetNet. The eye signal is the average expression level across all strains of the specific transcript in the eye expressed on a log scale. The clones are expressed at a level of 2 log units less in BXD24, which has acquired a mutation that causes photoreceptor degeneration.
AR, autosomal recessive; RP, retinitis pigmentosa.
clones); keratin complex 1, acidic, gene 12 (Krt1-12, 44 independent clones); transketolase (Tkt, five independent clones); and uroplakin 1B (Upk1b, three independent clones). Furthermore, many of the genes in the list are key features of the corneal proteome (see chapter 57, this volume). These include the corneal crystallins, Aldh3a1 and Tkt, which are highly expressed in the cornea (Piatigorsky, 2000; Estey et al., 2007). The list also includes genes of unknown function in the cornea, such as Upk1b. Uroplakin 1B is usually thought of as a bladder protein, expressed in the transitional epithelium as part of a complex with the other two known uroplakin proteins (Min et al., 2006; Sun, 2006). The ability to extract this corneal transcriptome signature from the whole eye database illustrates the power of HEIMED. The minor but consistent variations in corneal gene expression across the strains of mice in the data set allow the software to extract a group of genes uniquely expressed in the cornea from a data set consisting of the transcriptome of the whole eye.
Genetic networks in the mouse eye
In addition to providing molecular signatures of different cell types and tissues of the eye, HEIMED can reveal genetic networks that underlie phenotypic differences in the mouse eye. The genetic heterogeneity in inbred mice results in significant variation in phenotypes. These genetic heterogeneities can result in differences in behavior, neural structure, number of neurons, protein expression, and relative abundance of mRNA species. The simplest example is a Mendelian trait with a specific phenotype such as albinism. With the number of inbred strains available, the expression levels of specific transcripts can easily be examined. For example, we can look at the levels of tyrosinase (Tyr) across a number of strains (figure 54.2). The lack of tyrosinase function is directly related to a common mouse phenotype, albinism. In figure 54.2, the level of Tyr is illustrated for 21 inbred strains (the data are from HEIMED). The pigmented strains are represented by solid bars and the albino strains by white bars. In general, strains with lower levels of Tyr are albino, but not all albino mice have low levels of Tyr transcript. This illustrates the simple nature of most mutations: the lack of pigment is related not to the level of the Tyr transcript but to whether the tyrosinase protein functions normally. The Tyr example is a reminder always to relate the analysis of data to the biology of the system, especially when examining levels of transcripts, and it is worth keeping in mind as we begin to examine the differences in the genomes of inbred strains and specific transcriptome profiles within the eye. These genomic or genetic differences may or may not be fully reflected in
[Figure 54.2: bar chart. Panel title: "Eye M430v2 09/06 RMA: 1417717_a_at by Case (ranked)". Vertical axis: Value, 9.0 to 12.0. Strains in the order plotted: PWK/PhJ, PWD/PhJ, NZW/LacJ, BXSB/MpJ, FVB/NJ, A/J, D2B6F1, 129S1/SvImJ, NOD/LtJ, BALB/cByJ, LG/J, B6D2F1, KK/HIJ, C57BL/6J, NZO/HILtJ, MOLF/EiJ, WSB/EiJ, C3H/HeJ, DBA/2J, CAST/Ei, CAST/EiJ.]
Figure 54.2 Expression levels of tyrosinase (Tyr) are shown for many of the inbred strains from the mouse diversity panel (see names along the abscissa). The scale to the left is a log scale of transcript level, expressed as a mean and standard deviation as defined by the Affymetrix chip. Note the low levels of Tyr in most of the albino mice (white bars). The strains that have not been sequenced are indicated by an asterisk above the histogram bar.
