Question

Clarification on P-Value Threshold for GWAS



11.5 years ago

mlscmahe

I have an Illumina GWAS data set with ~900 samples and 10 quantitative traits related to obesity (BMI, weight, waist circumference), diabetes (PGL), hypertension (SBP, DBP) and lipid profile (TGL, total cholesterol, HDL, LDL). I use PLINK for QC and statistical association analysis, and QC is performed according to standard protocols. After QC, I performed LD pruning in PLINK and then ran PCA with EIGENSTRAT.

The original data set had ~700,000 SNPs; after QC, ~600,000 SNPs remained, and LD pruning reduced this to ~300,000. I ran the association analysis separately on two data sets: the ~600,000-SNP QC-passed set and the ~300,000-SNP LD-pruned set. In the association analysis (--linear), I adjusted for age, sex and the first 10 principal components.
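To make the pruning step concrete, here is a simplified greedy sketch of window-based LD pruning using NumPy. It is not PLINK's implementation: the function name `ld_prune`, the window of 50 SNPs and the r² cutoff of 0.2 are illustrative assumptions, and `genotypes` is a hypothetical samples × SNPs matrix of 0/1/2 allele counts in genomic order.

```python
import numpy as np


def ld_prune(genotypes, window=50, r2_threshold=0.2):
    """Return indices of SNPs kept by a simplified greedy LD-pruning pass.

    Illustrative sketch only, not PLINK's --indep-pairwise algorithm.
    A SNP is kept if its squared correlation with every already-kept SNP
    among the previous `window` SNPs is below `r2_threshold`.
    """
    kept = []
    for j in range(genotypes.shape[1]):
        x = genotypes[:, j]
        if x.std() == 0:
            continue  # monomorphic SNP: correlation undefined, drop it
        in_ld = False
        for k in reversed(kept):  # most recently kept SNPs first
            if j - k > window:
                break  # remaining kept SNPs are even further away
            r = np.corrcoef(x, genotypes[:, k])[0, 1]
            if r * r >= r2_threshold:
                in_ld = True
                break
        if not in_ld:
            kept.append(j)
    return kept


# Hypothetical data: 900 samples, 5 random SNPs plus exact copies of SNPs 0 and 1.
rng = np.random.default_rng(0)
base = rng.integers(0, 3, size=(900, 5)).astype(float)
geno = np.hstack([base, base[:, :2]])  # SNPs 5 and 6 duplicate SNPs 0 and 1
print(ld_prune(geno))  # [0, 1, 2, 3, 4]
```

The duplicated SNPs are in perfect LD with SNPs already kept, so they are dropped, while the independent SNPs all survive.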

However, I am confused about how to calculate the p-value threshold. I have seen a couple of links saying that 0.05 / (number of SNPs) gives the p-value threshold. But do I also have to account for the 12 covariates used in --linear when calculating the threshold? I would be grateful if you could clarify this. Thanking you in anticipation.
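The rule quoted from those links fits in a one-line function. This is a minimal sketch using the approximate SNP counts from the question; `bonferroni_threshold` is a hypothetical helper name. Only the number of tests enters the formula.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value cutoff that controls the family-wise error rate at alpha."""
    return alpha / n_tests


# Approximate SNP counts from the question; covariates do not appear anywhere.
for label, n_snps in [("QC-passed set", 600_000), ("LD-pruned set", 300_000)]:
    print(f"{label}: {n_snps} tests -> p < {bonferroni_threshold(n_snps):.3g}")
# QC-passed set: 600000 tests -> p < 8.33e-08
# LD-pruned set: 300000 tests -> p < 1.67e-07
```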

Tags: gwas, p-value, genetics

link updated 11.5 years ago by Charles Warden • written 11.5 years ago by mlscmahe

score 2 · Answer 1 · 2013-12-19

I would be interested to see what others say, but I would say "no".

For example, consider a 2-way ANOVA (perhaps comparing tumor and normal expression while accounting for patient pairing). Here you have two factors (tumor status and patient ID). You could imagine something like a 12-way ANOVA (although I can't imagine a dataset large enough to justify adjusting for 12 expression variables).

Multiple hypothesis correction is based on the number of tests. In that respect, a 1-way, 2-way (or hypothetical 12-way) ANOVA all involve the same correction: the total number of tests, which is typically the number of genes in a gene expression study, the number of SNPs in a GWAS, etc. For gene expression, an FDR correction is more typical than the Bonferroni correction you described, but I agree that the more stringent criterion is more appropriate in this case.
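To illustrate the difference between the two corrections mentioned here, the sketch below applies Bonferroni and the Benjamini-Hochberg FDR procedure to the same p-values. It assumes NumPy; the function names and the ten example p-values are made up for illustration. In both procedures the only design quantity used is m, the number of tests.

```python
import numpy as np


def bonferroni_reject(pvals, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    p = np.asarray(pvals, dtype=float)
    return p <= alpha / p.size


def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m  # p_(i) <= i * alpha / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank whose p-value passes
        reject[order[: k + 1]] = True  # reject it and every smaller p-value
    return reject


example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print("Bonferroni rejections:", int(bonferroni_reject(example_p).sum()))  # 1
print("BH rejections:", int(bh_reject(example_p).sum()))  # 2
```

Bonferroni compares every p-value to 0.05 / 10 = 0.005, whereas Benjamini-Hochberg lets the cutoff grow with rank, which is why it is the less stringent of the two.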

I don't believe you actually conducted more tests (you just conducted one test per SNP, using a model that includes multiple variables at the same time). If this is true, the answer is "no".
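A small simulation makes the "one test per SNP" point concrete. The sketch below assumes NumPy and uses entirely simulated data (hypothetical age, sex and 10 PCs, i.e. 12 covariates). For each SNP it fits y ~ SNP + covariates and keeps only the p-value of the SNP coefficient, analogous to the ADD row in PLINK's `--linear` output. It uses a normal approximation to the t distribution, so its p-values will differ slightly from PLINK's.

```python
import math

import numpy as np


def snp_pvalues(y, genotypes, covariates):
    """Fit y ~ intercept + SNP + covariates once per SNP.

    Returns one two-sided p-value per SNP, for the SNP coefficient only.
    Uses a normal approximation to the t distribution (reasonable for n ~ 900).
    """
    n = y.shape[0]
    base = np.column_stack([np.ones(n), covariates])
    pvals = []
    for j in range(genotypes.shape[1]):
        X = np.column_stack([genotypes[:, j], base])  # SNP dosage is column 0
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])  # residual variance
        se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[0, 0])
        z = beta[0] / se
        pvals.append(math.erfc(abs(z) / math.sqrt(2)))  # two-sided p-value
    return np.array(pvals)


# Simulated data: 900 samples, 200 SNPs, 12 covariates (age, sex, 10 PCs).
rng = np.random.default_rng(1)
n, n_snps = 900, 200
age = rng.normal(45, 10, n)
sex = rng.integers(0, 2, n)
pcs = rng.normal(size=(n, 10))
covariates = np.column_stack([age, sex, pcs])
genotypes = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)
y = 0.02 * age + 0.5 * sex + rng.normal(size=n)  # no SNP affects the trait

pvals = snp_pvalues(y, genotypes, covariates)
print(len(pvals), "tests -> Bonferroni threshold", 0.05 / len(pvals))
```

The number of p-values, and therefore the Bonferroni denominator, equals the number of SNPs whether the model has 0 or 12 covariates.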