Question

Parametric Or Non-Parametric - That Is The Question?

4

13.1 years ago

Darren J. Fitzpatrick ★ 1.1k

Given measures of relative RNA abundance from microarray experiments, after preprocessing, normalising, etc., there always seem to be some genes that refuse to behave normally.

When conducting subsequent analysis on these genes, e.g., eQTL analysis, differential expression - do you go parametric or non-parametric?

What are your thoughts?

gene statistics microarray • 4.8k views

updated 12.9 years ago by David Quigley 11k • written 13.1 years ago by Darren J. Fitzpatrick ★ 1.1k

1

I have worked with microarrays for 4 years and, as far as I know, researchers most often use ANOVA; I do as well.

13.1 years ago by boczniak767 ▴ 870

score 7 · Answer 1 · 2011-11-23

Much of the time I use a parametric test to compute the observed statistic, but a non-parametric test to establish a significance threshold for that statistic. This is a fairly common approach. For example, I think it's safe to say that the vast majority of differential expression analysis is performed with some variation of the t-test or linear regression. SAM, for example, uses a modified t-test and establishes an FDR through permutation testing.
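To make the "parametric statistic, non-parametric significance" pattern concrete, here is a minimal sketch for a single gene. It is not SAM (there is no modified denominator and no FDR estimate); it simply computes a Welch t statistic and derives a p-value by shuffling the group labels. The function names, the made-up expression values, and the choice of 2000 permutations are all assumptions for illustration.

```python
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic comparing two groups of expression values."""
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / (var_a + var_b) ** 0.5


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided p-value for welch_t, obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(welch_t(perm_a, perm_b)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical log-expression values for one gene in two conditions.
control = [7.1, 6.8, 7.4, 7.0, 6.9, 7.2]
treated = [8.3, 8.9, 8.1, 8.6, 9.0, 8.4]

print("t statistic:", round(welch_t(treated, control), 2))
print("permutation p-value:", permutation_p_value(treated, control))
```

The statistic itself is the ordinary parametric one; only the reference distribution comes from the data, so no normality assumption is needed to interpret the p-value.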

This is a common approach for eQTL analysis as well: typically one tests candidate alleles using a linear model but establishes significance by permutation testing. The non-parametric testing is particularly important for eQTL analysis because, in my experience, eQTL results are particularly susceptible to outliers, which hyper-inflate the statistic when looking for trans-eQTLs. This can happen when rare homozygous genotypes coincide with rare high or low expression values; by testing the whole genome, you inevitably identify such cases, which are likely (though not certain) to be spurious associations.
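The outlier problem can be reproduced with a toy example. The sketch below assumes an additive coding of genotype (0, 1, 2 copies of the minor allele), fits expression on genotype by least squares, and then shuffles expression across individuals to get a permutation p-value. The data are invented: 20 individuals, a single minor-allele homozygote, and that individual happens to carry one extreme expression value. Function names and the number of permutations are assumptions.

```python
import random


def slope_t(genotypes, expression):
    """t statistic for the slope of expression ~ genotype dosage (0, 1, 2)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    rss = sum((e - intercept - slope * g) ** 2
              for g, e in zip(genotypes, expression))
    standard_error = (rss / (n - 2) / sxx) ** 0.5
    return slope / standard_error


def eqtl_permutation_p(genotypes, expression, n_perm=2000, seed=0):
    """Two-sided permutation p-value: shuffle expression across individuals."""
    rng = random.Random(seed)
    observed = abs(slope_t(genotypes, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(slope_t(genotypes, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# 15 reference homozygotes, 4 heterozygotes, 1 rare homozygote (invented data).
genotypes = [0] * 15 + [1] * 4 + [2]
expression = [4.7, 4.8, 4.8, 4.9, 4.9, 4.9, 5.0, 5.0, 5.0, 5.1,
              5.1, 5.1, 5.2, 5.2, 5.3,   # reference homozygotes
              4.9, 5.0, 5.0, 5.1,        # heterozygotes
              9.0]                       # the lone rare homozygote, an outlier

print("slope t statistic:", round(slope_t(genotypes, expression), 2))
print("permutation p-value:", eqtl_permutation_p(genotypes, expression))
```

In this toy data the observed statistic can only be matched when shuffling happens to put the extreme value back on the lone homozygote, which occurs about once in 20 shuffles. The permutation p-value therefore stays at a few percent even though the t statistic on its own looks striking, which is the kind of protection against single-point associations described above.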

When I perform genome-wide correlation analysis I take a different approach: I use Spearman rank correlation rather than Pearson correlation. In general, I've found the two to have nearly the same power, and I feel more comfortable with a non-parametric statistic in this case. For genome-wide analysis I also use a permutation method (the Genome-Wide Error Rate method; Churchill and Doerge, Genetics 1994) to establish a significance threshold.
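A rough sketch of both ideas follows: Spearman correlation computed as Pearson correlation on ranks, and a genome-wide threshold taken from the distribution of the maximum absolute correlation across all genes under permutation. This is written in the spirit of the permutation-threshold approach cited above, not as a reproduction of any published implementation; the function names, the simulated data, and the settings (200 permutations, alpha of 0.05) are assumptions.

```python
import random


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def genome_wide_threshold(trait, gene_rows, n_perm=200, alpha=0.05, seed=0):
    """Upper-alpha quantile of the permutation distribution of the maximum
    |Spearman rho| between a trait and every gene."""
    rng = random.Random(seed)
    gene_ranks = [ranks(row) for row in gene_rows]
    # Shuffling the trait's ranks is equivalent to ranking a shuffled trait.
    shuffled = ranks(trait)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max(abs(pearson(shuffled, g)) for g in gene_ranks))
    maxima.sort()
    return maxima[min(n_perm - 1, int((1 - alpha) * n_perm))]


# A monotone relationship with one extreme value: ranks are unaffected by it.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 2, 3, 4, 5, 6, 7, 100]
print("Pearson:", round(pearson(x, y), 3), "Spearman:", round(spearman(x, y), 3))

# Simulated null data: 50 genes by 10 samples, plus an unrelated trait.
data_rng = random.Random(1)
genes = [[data_rng.gauss(0, 1) for _ in range(10)] for _ in range(50)]
trait = [data_rng.gauss(0, 1) for _ in range(10)]
print("genome-wide |rho| threshold:", round(genome_wide_threshold(trait, genes), 3))
```

Because each permutation records only the single largest correlation across all genes, an observed correlation that exceeds the resulting threshold is significant at the experiment-wide level rather than the per-gene level.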