Hi,
I am doing an eQTL analysis, but I am struggling with how to correct properly for multiple testing because of the massive number of tests performed. I have roughly 400,000 SNPs and 20,000 expressed transcripts. I am performing a basic analysis using PLINK (yes, I know there are newer programs available, such as MatrixEQTL, but unfortunately it was decided for me that I have to use PLINK). For trans-eQTLs the number of tests is 400,000 * 20,000 = 8.00E+09, so the Bonferroni-corrected threshold at alpha = 0.05 would be 6.25E-12, but it is debated whether this is too conservative. For cis-eQTLs (SNPs within 500 kb upstream or downstream of the transcript) I don't really know the correct way to determine a proper cutoff. Some people have suggested simply using the GWAS genome-wide significance level (about 2E-08), but I am not sure about this. PLINK has the --adjust option to calculate adjusted p-values, but I split the PLINK analysis by chromosome because otherwise it takes almost a month. So I was hoping to find a way to calculate corrected p-values afterwards, or just a good way to find a proper cutoff value. Does anyone have a good idea?
Thanks for your reply! I will look into PLINK 1.9. But do you perhaps also know a method that I could apply to my already generated p-values?
A few observations:
PLINK --adjust does not account for the number of phenotypes being tested (each phenotype is considered a separate run... which, incidentally, is a major reason why PLINK is suboptimal for eQTL analysis). It only corrects for the number of SNPs, so you'd need to postprocess the --adjust p-values anyway.
As for cis vs. trans tests, one procedure which sticks to the Bonferroni principle is:
a. Split your 5% acceptable false positive rate into something like "2% cis false positive, 3% trans false positive". (Don't choose those exact numbers; I made them up with no specialized knowledge of eQTL analysis.)
b. Count the number of cis and trans tests, and use Bonferroni math to determine appropriate thresholds for each.
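Steps a and b amount to very little arithmetic. Here is a rough Python sketch; the 2%/3% split is the made-up one from step a, and the cis test count is a placeholder I invented, so substitute the number of SNP-transcript pairs you actually tested within your 500 kb window.

```python
def bonferroni_thresholds(n_cis_tests, n_trans_tests,
                          alpha_cis=0.02, alpha_trans=0.03):
    """Return per-test p-value cutoffs for cis and trans tests.

    The family-wise false positive rate is at most alpha_cis + alpha_trans.
    """
    return alpha_cis / n_cis_tests, alpha_trans / n_trans_tests


n_snps = 400_000
n_transcripts = 20_000
n_total = n_snps * n_transcripts   # 8e9 SNP-transcript pairs in total

n_cis = 4_000_000                  # HYPOTHETICAL count of cis pairs; use your own
n_trans = n_total - n_cis          # every remaining pair is a trans test

cis_cut, trans_cut = bonferroni_thresholds(n_cis, n_trans)
print(f"cis cutoff:   {cis_cut:.3g}")
print(f"trans cutoff: {trans_cut:.3g}")
```

Because the cutoffs depend only on test counts, they can be applied to unadjusted p-values from per-chromosome runs without rerunning anything.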
False discovery rate control may be more appropriate than Bonferroni correction, but you'll have to check with your collaborators to see if they find it acceptable.
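If FDR control is acceptable, the Benjamini-Hochberg procedure can be applied to p-values you have already generated, provided you pool the unadjusted p-values from all per-chromosome runs first. A minimal pure-Python sketch follows; the function name bh_adjust and its n_total_tests parameter are my own inventions, not part of PLINK or any library.

```python
def bh_adjust(pvalues, n_total_tests=None):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    pvalues: unadjusted p-values pooled over all runs.
    n_total_tests: total number of tests performed, if larger than
        len(pvalues) because p-values above some cutoff were discarded.
    """
    m = len(pvalues) if n_total_tests is None else n_total_tests
    if m < len(pvalues):
        raise ValueError("n_total_tests cannot be smaller than len(pvalues)")

    # Indices of the p-values, from smallest to largest.
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])

    adjusted = [0.0] * len(pvalues)
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(len(order), 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: four kept p-values out of eight tests performed.
q_values = bh_adjust([0.01, 0.04, 0.03, 0.005], n_total_tests=8)
print(q_values)
```

Passing n_total_tests only makes sense if the p-values you kept are the smallest ones, i.e. everything below your reporting cutoff was retained. In that case the ranks are still correct and the adjusted values come out no smaller than they would with the full set, so the result errs on the conservative side.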
Other alternatives include permutation tests (if you have no covariates, PLINK 1.9 --assoc might be fast enough for this, but --linear permutation tests on 20000 phenotypes are probably still impractical), and using weaker Bonferroni corrections based on the "effective number of independent tests" (see e.g. X Gao et al. (2010), "Avoiding the high Bonferroni penalty in genome-wide association studies").
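To make the permutation idea concrete, here is a toy Python sketch of a max-statistic permutation test for a single transcript. It is not how PLINK implements permutation; the function names are my own, and I am using the absolute Pearson correlation as a stand-in for the association statistic. Shuffling the expression values across samples keeps the correlation structure between SNPs intact, which is why the result is less conservative than Bonferroni when SNPs are in LD.

```python
import random


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def max_abs_r(genotypes, expression):
    """Strongest absolute correlation over all SNPs for one transcript."""
    return max(abs(pearson_r(g, expression)) for g in genotypes)


def permutation_adjusted_p(genotypes, expression, n_perm=1000, seed=0):
    """Adjusted p-value for the best SNP, corrected for all SNPs tested.

    genotypes: one list of allele counts (0/1/2) per SNP, same sample order.
    expression: expression values of a single transcript.
    """
    rng = random.Random(seed)
    observed = max_abs_r(genotypes, expression)
    shuffled = list(expression)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the SNP-expression link
        if max_abs_r(genotypes, shuffled) >= observed:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return (exceed + 1) / (n_perm + 1)
```

This corrects only for the SNPs tested against one transcript, so a correction across the 20000 transcripts is still needed on top. It also shows where the cost comes from: every permutation repeats the full scan over SNPs.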