Source file: Myopia Animal Models to Clinical Trials_Beuerman, Saw, Tan_2009.pdf

174 L.K. Goh, R. Metlapally and T. Young

effect. h and β are estimated using least-squares kernel machines. The kernel-based score is similar to the one seen earlier:

$$ S(g_{i,k}, g_{j,k}) = \frac{\sum_{k=1}^{K} w_k \, \mathrm{IBS}(g_{i,k}, g_{j,k})}{\sum_{k=1}^{K} w_k} $$

The weights can reflect minor-allele frequency (MAF, q) and the p-value of association. The former gives more weight to rare SNPs, while the latter intuitively up-weights SNPs with prior evidence of association.

Several options were suggested:

$$ w_k = \frac{1}{q_k} \quad \text{and} \quad w_k = \frac{1}{\sqrt{q_k}} $$

for rare SNPs, and

$$ w_k = -\log_{10}(p_k) $$

for association, which is similar to that used by Wei et al. The test statistic is based on testing the nonparametric function. Simulation results show that the approach achieves higher statistical power than single-locus tests.
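A weighted IBS score of this kind can be sketched in a few lines. This is only an illustration under stated assumptions: genotypes are coded as minor-allele counts 0/1/2, IBS is taken as 2 − |g_i − g_j|, and the inverse square root of MAF is used as one possible rare-variant weight. The function names are hypothetical and do not come from the cited work.

```python
from math import sqrt

def ibs(g1, g2):
    """Alleles shared identical-by-state (0, 1 or 2) for genotypes coded 0/1/2."""
    return 2 - abs(g1 - g2)

def maf_weights(mafs):
    """Assumed weighting: inverse square root of MAF, so rarer SNPs count more."""
    return [1 / sqrt(q) for q in mafs]

def weighted_ibs_score(gi, gj, weights):
    """Weighted average of per-SNP IBS between two individuals over K SNPs."""
    numerator = sum(w * ibs(a, b) for w, a, b in zip(weights, gi, gj))
    return numerator / sum(weights)

# Two individuals typed at three SNPs; the third SNP is the rarest.
gi = [0, 1, 2]
gj = [0, 2, 0]
weights = maf_weights([0.25, 0.25, 0.04])
score = weighted_ibs_score(gi, gj, weights)
print(round(score, 4))
```

Because the individuals differ most at the rare SNP, the weighted score is lower than the unweighted average IBS would be.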

Non-parametric approaches typically utilize similarity measures prior to setting up the test statistics. Several methods on genomic similarity in multi-locus analysis have been discussed in detail by Wessel et al.59 Various similarity measures designed with weights to accommodate genomic functionality, such as IBS, allele frequency, functional variations, nucleotide conservation, single-locus association, haplotype, and ancestry were proposed.

Parametric approaches

Multi-locus analysis suffers from large degrees of freedom, so one approach is to reduce the dimensionality by exploiting the underlying LD structure. Wang and Elston57 proposed a method that uses the Fourier transformation (FT) to capture the genotype variations across different traits. When SNPs are in LD, the genotypic variation among trait groups extends across all the SNPs and hence can be compressed into the low-frequency components of a Fourier transform. To maintain consistency of genotypic variation across the SNPs, the genotype matrix is recoded to obtain positive correlation between the SNPs. For an additive model, this is done by replacing each negatively correlated SNP genotype xij with |2 – xij|.
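The recoding step can be illustrated with a short sketch. It assumes genotypes coded 0/1/2 under an additive model and, as a simplification that is not taken from the source, flips every SNP that is negatively correlated with a single reference SNP (the first one). The helper names are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector has no variation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def recode_positive(snps, ref=0):
    """snps: list of SNP columns (one list of 0/1/2 genotypes per SNP).
    Replace x by |2 - x| for every SNP negatively correlated with the reference."""
    reference = snps[ref]
    recoded = []
    for snp in snps:
        if pearson(reference, snp) < 0:
            snp = [abs(2 - x) for x in snp]
        recoded.append(snp)
    return recoded

snps = [[0, 1, 2, 2], [2, 1, 0, 0], [0, 0, 1, 2]]
recoded = recode_positive(snps)
print(recoded)
```

Flipping relative to one reference SNP does not by itself guarantee that every pair of SNPs ends up positively correlated, so a real analysis may need a more careful scheme.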

175 New Approaches in the Genetics of Myopia

With the assumption that the genotype affects only the mean of the phenotype measure and not its scale, the score statistic for the kth FT component, summed over samples i, is

$$ U_k = \sum_{i=1}^{N} Y_i \, (x_{ik} - \bar{x}_k) $$

The variance of Uk is estimated by

$$ V_k = \frac{1}{n-1} \sum_{i} (Y_i - \bar{Y})^2 \sum_{i} (x_{ik} - \bar{x}_k)(x_{ik} - \bar{x}_k)^T $$

To give more weight to the low-frequency FT components, a weight function [1/(k+1)]^2 is added. The global weighted score statistic is then defined as

$$ T_w = \frac{w^T U}{\sqrt{w^T V_0 w}} $$

which follows an asymptotic normal distribution. V0 is the estimated variance-covariance matrix.
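The following sketch shows how such a weighted score statistic might be computed once the FT components are available. It rests on several assumptions that are not stated in the text: the components are real-valued and supplied as input, the index k starts at 0, the statistic is the weighted score divided by the square root of its estimated variance, and V0 is built by extending the per-component variance estimate to all pairs of components. The function name is hypothetical.

```python
from math import sqrt

def weighted_score_statistic(y, x):
    """y: phenotypes for n samples; x: n rows, each holding K FT components."""
    n, K = len(y), len(x[0])
    y_bar = sum(y) / n
    x_bar = [sum(row[k] for row in x) / n for k in range(K)]
    # Centre each component across samples.
    xc = [[row[k] - x_bar[k] for k in range(K)] for row in x]
    # Score vector: U_k = sum_i Y_i (x_ik - xbar_k)
    U = [sum(y[i] * xc[i][k] for i in range(n)) for k in range(K)]
    # Assumed V0: (1/(n-1)) * sum_i (Y_i - Ybar)^2 * sum_i xc_i xc_i^T
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    V0 = [[ss_y / (n - 1) * sum(xc[i][a] * xc[i][b] for i in range(n))
           for b in range(K)] for a in range(K)]
    # Weights [1/(k+1)]^2 favour the low-frequency components (k from 0, assumed).
    w = [(1 / (k + 1)) ** 2 for k in range(K)]
    numerator = sum(w[k] * U[k] for k in range(K))
    variance = sum(w[a] * V0[a][b] * w[b] for a in range(K) for b in range(K))
    return numerator / sqrt(variance)

y = [1.0, 2.0, 3.0, 5.0]
x = [[0.1, 0.9], [0.4, 0.2], [0.8, 0.5], [1.1, 0.3]]
t_w = weighted_score_statistic(y, x)
print(t_w)
```

Under the null hypothesis the returned value would be compared against a standard normal distribution, in line with the asymptotic result quoted above.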

Another dimension-reduction approach is principal component (PC) analysis. In Gauderman et al.55 and Wang and Abbott,56 PCs were computed for multiple SNPs to capture the underlying LD structure and then tested for association with the disease. Instead of using SNPs in a logistic regression model, PCs of the sample covariance matrix of the SNPs are used. The stronger the LD among SNPs, the fewer PCs are needed. Because the variance of the kth PC is its eigenvalue λk, a subset of PCs that explains most of the SNP variation can be selected, thereby reducing the degrees of freedom. The choice then becomes a trade-off between the amount of variance explained and the degrees-of-freedom penalty. A general rule of thumb is to select PCs that explain at least 80% of the variance.
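The 80% rule of thumb amounts to a cumulative sum over the eigenvalues. The sketch below assumes the eigenvalues of the SNP covariance matrix have already been computed; the function name and the example eigenvalues are made up for illustration.

```python
def select_pcs(eigenvalues, min_explained=0.8):
    """Smallest number of leading PCs whose eigenvalues explain at least
    `min_explained` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, lam in enumerate(ordered, start=1):
        cumulative += lam
        if cumulative / total >= min_explained:
            return count
    return len(ordered)

# Strong LD: one dominant eigenvalue, so a single PC is enough.
print(select_pcs([3.6, 0.3, 0.1]))
# No LD: equal eigenvalues, so every PC is needed.
print(select_pcs([1.0, 1.0, 1.0, 1.0]))
```

The two calls illustrate the point made above: the stronger the LD, the fewer PCs (and degrees of freedom) are required to reach the threshold.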

In a similar spirit of exploiting LD structure, Li et al.72 proposed a gene-based association test by combining optimally weighted markers (ATOM). It assigns weights to markers based on the LD structure from a reference set such as the HapMap. Suppose M markers are available in the reference set; the score statistic is then defined as

$$ s_{i,k} = \frac{1}{M} \sum_{j=1}^{M} \frac{\Delta_{kj}}{p_j q_j} \, g_{i,j} $$


where ∆kj is the LD coefficient between markers k and j, and pj and qj are allele frequencies of marker j.
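A per-sample score of this form could be computed as sketched below. The sketch assumes the score is the LD-weighted average (1/M) Σj ∆kj gi,j / (pj qj), that markers are biallelic so qj = 1 − pj, and that genotypes are coded 0/1/2. The function name and the numbers are illustrative only.

```python
def atom_score(genotypes_i, ld_row_k, allele_freqs):
    """Score of sample i at marker k.
    genotypes_i:  genotypes of sample i at the M reference markers
    ld_row_k:     LD coefficients between marker k and each marker j
    allele_freqs: allele frequency p_j of each marker (q_j = 1 - p_j assumed)"""
    M = len(genotypes_i)
    total = 0.0
    for g, delta, p in zip(genotypes_i, ld_row_k, allele_freqs):
        q = 1.0 - p
        total += delta * g / (p * q)
    return total / M

# Marker k is in LD with the first marker only.
s_ik = atom_score([1, 2], [0.25, 0.0], [0.5, 0.2])
print(s_ik)
```

Markers with a zero LD coefficient contribute nothing, so the score is driven by markers correlated with marker k.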

To test for genetic association, the authors proposed using the aggregate score sk = Σi si,k of marker k over all samples in a single marker-based test, or PCs of the scores in a regression model. The various methods were compared by simulation using the dataset of CHI3L2 on chromosome 1 and CDH17 on chromosome 8, a benchmark dataset with complementary gene expression data in which significant evidence was found for cis-acting regulatory elements.73 Based on the results, it is difficult to conclude which method is better. However, it should be noted that SNPTEST, a method that relies on imputation from IMPUTE,74 shows robust performance.

P-values combining approaches

P-value combining methods depart from multi-locus analysis but still adhere to the methodology of pathway analysis in cancer genomics. The approach uses single-marker test statistics for a region of interest (e.g. a gene), either by selecting the ‘best’ SNP63,65 or by combining the p-values of all SNPs using various methods. P-value combining methods are well established, and some are used in meta-analysis. They include Fisher's combination method, the Simes procedure, False Discovery Rate, Rank Truncated Product and its variants, Fourier Transform, and Bayesian approaches.62,64,66,75,76 Since GWAS test statistics are SNP-based rather than gene-based, the question is which window defines the gene region over which the statistics are combined. In Wang et al.,63 the window is centered at each SNP and extends 500 kb upstream and downstream of it. Others are based on genomic intervals taken from genome databases, some extending the window into promoter regions. In the simulation study by Chapman et al.,62 two separate regions surrounding the genes of interest (24 kb for CTLA4 and 48 kb for IL21R) were selected.
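Two of the simpler combining rules, Fisher's combination method and the Simes procedure, can be sketched as follows. The sketch assumes all p-values are strictly positive and, for Fisher's method, that the tests are independent. That assumption generally fails for SNPs in LD, so the result should be read as illustrative rather than as a calibrated gene-level p-value. The function names are hypothetical.

```python
from math import exp, log

def fisher_combined(pvalues):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2n degrees of
    freedom under independent nulls; returns the upper-tail probability."""
    n = len(pvalues)
    half_x = -sum(log(p) for p in pvalues)  # X / 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{n-1} (x/2)^i / i!
    term, tail = 1.0, 1.0
    for i in range(1, n):
        term *= half_x / i
        tail += term
    return exp(-half_x) * tail

def simes_combined(pvalues):
    """Simes procedure: minimum over ranks of n * p_(rank) / rank."""
    n = len(pvalues)
    ordered = sorted(pvalues)
    return min(n * p / rank for rank, p in enumerate(ordered, start=1))

gene_pvalues = [0.01, 0.04, 0.5]  # made-up SNP p-values within one gene window
print(fisher_combined(gene_pvalues))
print(simes_combined(gene_pvalues))
```

With a single p-value both rules return that p-value unchanged, which is a convenient sanity check.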

Once the GWAS SNP-based statistics are converted into gene-based statistics, the established approach of gene-set enrichment analysis can be applied. This involves computing an enrichment score similar to that in GSEA. Statistical significance and adjustment for multiple testing are assessed by permutation. Another approach is to identify the top-ranked SNPs associated with genes and perform functional analyses using tools such as DAVID,77 GoStat,78 or commercial software Ingenuity Pathway Analysis