Source file: Myopia Animal Models to Clinical Trials_Beuerman, Saw, Tan_2009.pdf

169 New Approaches in the Genetics of Myopia

display of annotations in a single window. They also allow query and retrieval of data from the underlying databases that support the browsers.

The genomic content in the UCSC Genome Browser is organized into several groups, each consisting of relevant tracks, with provision for custom tracks in established formats such as BED (positions of data items in a standard UCSC Browser format) or WIG (continuous-valued data displayed in a track format). Other tools are available for further analysis of the data, one of which is the Table Browser. It is a portal to the relational database underlying the browser and allows information to be queried and retrieved through structured queries. From this portal, relevant genomic information can be retrieved for genomic convergence. Information that may be useful for genomic convergence in GWAS includes gene and regulatory functionality, linkage disequilibrium, and allele-specific information. Genomic information from other disease-specific databases can also be included. In our association study for myopia (manuscript submitted), we converged the myopia loci shown in Table 1 and the EyeSAGE43 database (which contains a rich set of reported loci and corresponding gene expression data obtained using SAGE) with our GWAS results to help prioritize the markers. Figure 1 shows an example of the convergence of GWAS p-values with genomic information in the UCSC Genome Browser.
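As an illustration of how GWAS p-values might be prepared for display as a custom track, the following minimal sketch writes them in bedGraph, a BED-like format that the browser accepts for continuous values. The function name, the track name, and the SNP records are hypothetical and are not taken from the study; the p-values are plotted as -log10(p) purely by convention.

```python
import math

def gwas_to_bedgraph(snps, track_name="GWAS_neglog10P"):
    """Format (chrom, 1-based position, p-value) records as a bedGraph track."""
    lines = [
        f'track type=bedGraph name="{track_name}" '
        'description="-log10 GWAS p-values"'
    ]
    for chrom, pos, p in sorted(snps, key=lambda s: (s[0], s[1])):
        # BED-style coordinates are 0-based and half-open, so a SNP at
        # 1-based position pos occupies the interval [pos - 1, pos).
        score = -math.log10(p)
        lines.append(f"{chrom}\t{pos - 1}\t{pos}\t{score:.3f}")
    return "\n".join(lines)

# Hypothetical SNP records for illustration only.
example_snps = [("chr11", 31800000, 1e-5), ("chr11", 31750000, 0.02)]
print(gwas_to_bedgraph(example_snps))
```

The resulting text can be pasted into the browser's custom track form, where it appears alongside the annotation tracks.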

Pathway Analysis

Pathway analysis in cancer genomics

Pathway analysis, sometimes referred to synonymously as Gene Set Enrichment Analysis (GSEA), comprises statistical methods developed in the field of cancer genomics for expression-based annotations.44–50 It uses biological pathway information in the discovery of candidate genes. Diseases are often regulated by networks of genes or gene sets, each conferring a small effect on the overall phenotype. Traditional data mining or statistical approaches may not capture the small effects of these genes on the disease. Such genetic heterogeneity, common in many complex human diseases, can reduce the power of single-marker analyses to detect genetic associations because of weak marginal effects and multiple testing corrections.51

170 L.K. Goh, R. Metlapally and T. Young

Fig. 1. Genomic convergence of GWAS p-values with genomic information on the UCSC Genome Browser. The top track shows the GWAS p-values followed by the other tracks available in the browser such as Genetic Association Studies of Complex Diseases and Disorders (GAD), Human Quantitative Trait Locus (RGD Human QTL), UCSC Genes, Gene Expression Atlas Ratios (GNF Ratio), CpG Islands, TS miRNA sites (TargetScan miRNA Regulatory Sites), 7X Reg Potential (Regulatory Potential), Mammal Cons (PhastCons Conservation), SNPs, Linkage Disequilibrium, Database of Genomic Variants (DGV), and ENCODE regions.

In pathway analyses, statistical methods are used to compute combined statistics for genes and then assess the significance of these genes in gene sets or pathways. The analysis involves two steps: computing a score statistic for each gene set, and assessing the significance of the gene sets against annotated pathways. For the score statistics, Subramanian et al.48 and Mootha et al.44 calculated an enrichment score for each gene set by ranking the genes according to their association with the phenotype. Tian et al.46 and Kim and Volsky47 used two-sample statistics, such as the t-test, for each gene and aggregated them over each gene set. One difference between these methods is the treatment of genes that are not in the gene set: one approach applies penalties to the non-member genes,44,48 while the other ignores them.46 To assess the significance of the statistics, most of these methods use nonparametric procedures such as permutation to measure the significance of the overlap with the genes in annotated gene sets. Pathway analyses and GSEA have been successfully applied in many studies involving basic science, clinical studies, and pathway deregulation in cancer biology.52–54
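The two steps can be sketched as follows. This is a simplified illustration, not the published GSEA procedure: it uses an unweighted Kolmogorov-Smirnov-style running sum in which non-member genes are penalized, and it assesses significance by shuffling the gene ranking rather than the phenotype labels. All function names are hypothetical.

```python
import random

def enrichment_score(ranked_genes, gene_set):
    """Maximum-deviation running sum over a ranked gene list (unweighted)."""
    n = len(ranked_genes)
    hits = sum(1 for g in ranked_genes if g in gene_set)
    if hits == 0 or hits == n:
        return 0.0
    step_up = 1.0 / hits            # increment at each member gene
    step_down = 1.0 / (n - hits)    # penalty at each non-member gene
    running, best = 0.0, 0.0
    for g in ranked_genes:
        running += step_up if g in gene_set else -step_down
        if abs(running) > abs(best):
            best = running
    return best

def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Empirical p-value from random re-orderings of the gene ranking."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    shuffled = list(ranked_genes)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(enrichment_score(shuffled, gene_set)) >= abs(observed):
            count += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return observed, (count + 1) / (n_perm + 1)
```

A gene set whose members cluster at the top of the ranking yields a score near 1, and one whose members cluster at the bottom yields a score near -1.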

Pathway analysis in GWAS

Single marker-based association tests suffer from low power if each tested marker is in incomplete LD with the unobserved or untyped QTL. This has led to the development of multi-locus analysis, which considers the joint effects of markers simultaneously. It can be performed on the basis of either genotype or haplotype. Instead of testing each marker separately, a group of markers is assessed for association with the trait. For genotype analysis, the approach typically uses multiple linear regression to model the relationship between the trait and a vector of covariates corresponding to the genotypes or to the similarity between pairs of genotypes. Intuitively, this should provide greater power to detect QTL, but because of the large number of degrees of freedom in multivariate test statistics, simulation studies have shown similar or reduced power compared with single-marker analyses. Several novel statistical approaches (parametric and non-parametric) have been developed to overcome this. One approach applies a dimension-reduction procedure, such as principal component analysis55,56 or Fourier transformation,57 to reduce the genetic data, while another uses kernel functions to reduce the score statistics to a global statistic;58–61 all result in fewer degrees of freedom.
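The dimension-reduction idea can be illustrated with a minimal sketch that regresses a quantitative trait on the leading principal components of the genotype matrix, so that the test has m rather than K model degrees of freedom. This is an assumption-laden illustration and not the procedure of the cited studies: it assumes genotypes coded 0/1/2, uses NumPy, and the function name is hypothetical.

```python
import numpy as np

def pc_regression_f_stat(genotypes, trait, n_components):
    """F statistic for regressing a trait on the leading genotype PCs."""
    G = np.asarray(genotypes, dtype=float)   # n samples x K markers
    y = np.asarray(trait, dtype=float)
    n = G.shape[0]
    Gc = G - G.mean(axis=0)                  # centre each marker
    U, s, _ = np.linalg.svd(Gc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # PC scores, n x m
    X = np.column_stack([np.ones(n), scores])         # intercept + m PCs
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss_full = float(np.sum((y - X @ beta) ** 2))
    rss_null = float(np.sum((y - y.mean()) ** 2))      # intercept-only model
    df_model = n_components
    df_resid = n - n_components - 1
    f_stat = ((rss_null - rss_full) / df_model) / (rss_full / df_resid)
    return f_stat, df_model, df_resid
```

The returned statistic would be referred to an F distribution with (m, n - m - 1) degrees of freedom; the choice of m trades retained genetic variation against degrees of freedom.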

An alternative to the genotype analysis described above is to combine p-values from single marker-based association tests.62–64 It has two steps, just as in the pathway analysis or GSEA approaches for cancer genomics. The first uses robust methods for combining test statistics from multiple markers or SNPs into meaningful statistics for genes in a pathway. In the second, these statistics are tested against the null hypothesis that any clustering of associated genes within the gene sets arises by chance. Instead of focusing on the few markers most strongly associated with the disease, the approach considers the biological relevance of the markers as captured in pathways. It is still a relatively new approach for GWAS, though encouraging results have been shown in several studies.65–68
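As one concrete example of the first step, the sketch below combines the p-values of the SNPs in a gene using Fisher's method. Fisher's method is chosen here only for illustration and is not necessarily the method used in the cited studies; it assumes independent tests, which SNPs in LD violate, so in practice the null distribution would usually be obtained by permutation. The function name is hypothetical.

```python
import math

def fisher_combined_p(p_values):
    """Fisher's method: combine independent p-values into a single p-value."""
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    # Under the null, -2 * sum(ln p) follows a chi-square with 2k d.f.
    stat = -2.0 * sum(math.log(p) for p in p_values)
    # Closed-form chi-square survival function for even d.f. 2k:
    # P(X > stat) = exp(-stat/2) * sum_{i=0}^{k-1} (stat/2)^i / i!
    half = stat / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total

# Hypothetical single-marker p-values for the SNPs mapped to one gene.
stat, gene_p = fisher_combined_p([0.04, 0.20, 0.01])
print(f"chi-square = {stat:.2f}, combined p = {gene_p:.4f}")
```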

Lastly, haplotype association tests offer a higher dimension of analysis and may identify markers with small genetic effects that could otherwise be missed in single-marker tests. However, phase information for haplotypes is not easily available, and although it can be inferred statistically, the inference introduces uncertainty that inflates the variance of the test statistic and therefore reduces power.69 As in multi-locus analysis, haplotype analysis suffers from a large number of degrees of freedom. It is also limited to markers within a chromosome, which does not necessarily make biological sense, since genes within a given pathway are often located on different chromosomes. As the focus of this review is pathway analysis, we refer readers to a comprehensive review of haplotype analysis by Salem et al.70 A discussion of haplotype and genotype-based analysis can also be found in Clayton et al.69

In the following sections, we review the multi-locus methods that have been developed to enable pathway analysis in GWAS. As highlighted, they can be categorized as non-parametric, parametric, and p-value-combining approaches.

Non-parametric approaches

A non-parametric method was proposed by Schaid et al.58 using U-statistics71 to compute a global statistic for pair-wise comparison of genotypes between samples, using kernel functions that describe the dosage effects of additive, dominant, recessive, quadratic, or allelic models.

$$U_{global} = \frac{\sum_{i<j} \sum_{k} w_k \, h(g_{i,k}, g_{j,k})}{\binom{n}{2}}$$

where $g_{i,k}$ and $g_{j,k}$ are the genotypes of samples i and j at the kth marker, $h(g_i, g_j)$ is the kernel function, and there are n samples and K markers. $w_k$ is the weight for the kth marker, which is estimated from the covariance matrix of U. The statistical test consists of comparing the global statistics between cases and controls. The statistical power of the test therefore depends on the choice of kernel function, and power is lost when an inappropriate kernel is used. In simulation, the quadratic kernel has been shown to be robust and is thus a reasonable choice when the underlying genetic effect is unknown. Note that the performance of the method is also influenced by the noise accumulated from an increased number of SNPs. The test statistic may deteriorate when too many SNPs among the K markers are not associated with the trait.57
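To make the structure of the global statistic concrete, the following minimal sketch averages a weighted pairwise kernel over all pairs of samples. It rests on several assumptions: genotypes are coded 0/1/2, the weights are supplied by the caller rather than estimated from the covariance matrix of U, and the allele-sharing kernel is an illustrative choice, not one of the kernels defined by Schaid et al. The simple difference between cases and controls shown at the end is likewise only one way to form a contrast.

```python
from itertools import combinations

def u_global(genotypes, weights, kernel):
    """Weighted average of a pairwise genotype kernel over all sample pairs."""
    n = len(genotypes)
    if n < 2:
        raise ValueError("need at least two samples")
    total = 0.0
    for gi, gj in combinations(genotypes, 2):      # all pairs with i < j
        total += sum(w * kernel(a, b) for w, a, b in zip(weights, gi, gj))
    return total / (n * (n - 1) / 2)               # divide by C(n, 2)

def allele_sharing(a, b):
    """Illustrative kernel: alleles shared by two 0/1/2-coded genotypes."""
    return 2 - abs(a - b)

# Hypothetical genotypes (rows are samples, columns are markers).
cases = [[2, 1, 0], [2, 2, 0], [1, 2, 0]]
controls = [[0, 1, 0], [1, 0, 2], [0, 0, 1]]
w = [1.0, 1.0, 1.0]
contrast = u_global(cases, w, allele_sharing) - u_global(controls, w, allele_sharing)
print(contrast)
```

Swapping in a different kernel function changes which genetic model the statistic is sensitive to, which is why the choice of kernel affects power.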

An alternative U-statistic was proposed by Wei et al.,60 which considers the within-group and between-group U-statistics instead of the contrast between cases and controls. This allows for qualitative traits with more than two categories and can be extended to quantitative traits as well. Instead of the kernel functions used by Schaid et al.,58 a Hamming distance kernel was implemented. The weight $w_k$ is SNP-specific and is defined as the negative logarithm of the single-marker p-value.

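$$U_{within} = \frac{\sum_{i<j} \sum_{k} w_k \, I(g_{i,k} \neq g_{j,k})}{\binom{n}{2}}$$

$$U_{between} = \frac{\sum_{i=1}^{n_{cases}} \sum_{j=1}^{n_{control}} \sum_{k} w_k \, I(g_{i,k} \neq g_{j,k})}{n_{cases} \, n_{control}}$$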

Under the null hypothesis, $U_{between}$ is zero, so the test statistic is defined as:

$$T = \frac{U_{between}}{U_{within}}$$

Simulations show that the U-statistic of Wei et al.60 is comparable with that of Schaid et al.58 for additive and multiplicative models. The more interesting result is the scenario in which there are both protective and predisposing effects among the multiple markers; here the method of Wei et al. shows marked improvement. It should be noted that the comparison with Schaid et al. was implemented using the linear dosage kernel, which was acknowledged by the authors to suffer from poor power when the minor alleles are both protective and disease-predisposing across multiple markers. The quadratic kernel, which showed robustness, could have been used for a more comprehensive comparison.

One drawback of the above methods is that they do not accommodate covariates, in addition to the intensive computation sometimes required. Kwee et al.61 proposed a semi-parametric approach that regresses the quantitative trait on a smooth nonparametric function of the genotype, allowing adjustment for any covariates. The nonparametric function is modelled in a reduced-dimension space using a kernel function based on identity-by-state (IBS), thereby reducing the degrees of freedom.

$$Y_i = \beta X_i^{T} + h(g_i) + \varepsilon_i$$

where $Y_i$ denotes the trait for sample i, $X_i$ the covariate vector, $h(g_i)$ the nonparametric function of genotype $g_i$, and $\varepsilon_i$ the random sample-specific