Question

Multiple testing for eQTL data with correlated SNPs

9.8 years ago

Krisr ▴ 470

Hi,

I have downloaded the GTEx eQTL raw p-value data for one tissue of interest. I have about 100 candidate SNPs (some in LD with one another) for which I'd like to look for eQTL evidence.

I wrote a script to extract all eQTLs reported for the 100 SNPs of interest. Using these data, I'd like to apply an FDR correction (or another method). However, I'm concerned this approach may be too conservative because it does not account for the correlation among the 100 SNPs.

Does anyone have an idea of how I could address this issue, or know of any programs/software that offer a solution?

I was thinking of extracting the 1000 Genomes genotype data (~85 individuals from one of the populations of similar background) for the 100 SNPs and using this program to adjust the p-values: http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1000456#pgen-1000456-g009

Any thoughts are appreciated.

SNP correlation eqtl multiple-testing • 3.0k views

updated 2.6 years ago by Ram 44k • written 9.8 years ago by Krisr ▴ 470

Ram · Answer 1 · 2015-03-11

SNPs in strong LD with each other are not independent signals; they tag one or more causal variants and have a complicated correlation structure. If you were starting from the raw genotype calls, a common statistical approach would be to fit a linear model testing the association between each variant and your phenotype, then pick the variant with the strongest single association. Next, add that variant to the model as a covariate and ask whether any other variant is still significantly associated with the phenotype after conditioning on the first. If so, that's evidence that more than one SNP may be independently linked to the phenotype.
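To make the conditioning idea concrete, here is a rough sketch of forward stepwise selection with ordinary least squares. Everything in it is an assumption for illustration: the function names `residual_ss` and `forward_conditional` are hypothetical, genotypes are taken to be a samples-by-SNPs matrix of 0/1/2 allele counts, the p-value uses a large-sample chi-square approximation to the F test (reasonable only for a few hundred samples or more), and the stopping rule is a plain Bonferroni over the remaining candidates. A real analysis would also include covariates such as ancestry components.

```python
import math
import numpy as np


def residual_ss(X, y):
    """Residual sum of squares from an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def forward_conditional(genotypes, phenotype, alpha=0.05, max_steps=10):
    """Greedy conditional analysis (illustrative sketch, not a reference method).

    genotypes: (n_samples, n_snps) array of allele counts (assumed coding).
    phenotype: (n_samples,) array, e.g. normalised expression of one gene.
    Returns the column indices selected, in order of selection.
    """
    n, m = genotypes.shape
    selected = []
    X = np.ones((n, 1))  # start from an intercept-only model
    rss_current = residual_ss(X, phenotype)

    while len(selected) < max_steps:
        best = None
        for j in range(m):
            if j in selected:
                continue
            X_new = np.column_stack([X, genotypes[:, j]])
            rss_new = residual_ss(X_new, phenotype)
            df_resid = n - X_new.shape[1]
            # F statistic (1 numerator df) for adding SNP j to the current model
            f_stat = (rss_current - rss_new) / (rss_new / df_resid)
            if best is None or f_stat > best[1]:
                best = (j, f_stat, rss_new)
        if best is None:
            break

        j, f_stat, rss_new = best
        # Large-sample approximation: F(1, df) is close to chi-square with 1 df
        p_value = math.erfc(math.sqrt(max(f_stat, 0.0) / 2.0))
        # Bonferroni over the candidates still in play (an assumed stopping rule)
        if p_value * (m - len(selected)) > alpha:
            break

        selected.append(j)
        X = np.column_stack([X, genotypes[:, j]])
        rss_current = rss_new

    return selected
```

If the function returns more than one index, the later SNPs remained associated after conditioning on the earlier ones, which is the pattern described above.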

With only the P values (and, presumably, the physical locations of the variants, which you can get from the annotations) you can't use this approach, since you can't build the model. A practical alternative would be to choose a conservative (low) LD cut-off and keep only one SNP within each LD block. You would still need to correct for the complete set of SNPs that you considered, since you tested them all for association. Adjusting for multiple testing using more complex methods would be tricky, since you don't have the genotypes and can't use permutation. The simplest and most defensible approach would be to use Holm's correction for your 100 SNPs; it is never less powerful than Bonferroni and controls the family-wise error rate at the same level, without assuming the tests are independent.
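For reference, Holm's step-down adjustment is short enough to write by hand. The sketch below is a plain-Python illustration; the function name `holm_adjust` and the example p-values are made up for the demonstration and are not taken from GTEx.

```python
def holm_adjust(pvalues):
    """Return Holm step-down adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: adjusted values never decrease along the ranking
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for four tests
example = [0.01, 0.04, 0.03, 0.005]
print(holm_adjust(example))
```

A test is called significant at level alpha when its adjusted p-value is at most alpha. The list passed in should contain every test that was actually carried out, not only the SNPs kept after LD-based filtering.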