I am performing a GWAS analysis. While comparing my pre-imputation and post-imputation data, I observed that the most significant genetic variant in the pre-imputation data (p < 1 × 10^-16) is no longer significant after imputation. Imputation was performed on the Michigan Imputation Server using the 1000 Genomes Phase 3 v5 SAS population as the reference panel. These variants were dropped while matching the target data against the reference data. How do I overcome this? Which results should be reported in the manuscript (pre- or post-imputation)? How do we justify such findings?
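One way to see why a genotyped SNP can be dropped at the target/reference matching step is to compare its alleles with the panel's at the same chromosome and position. The sketch below is a hypothetical helper (`classify_match` is not part of the Michigan server or any imputation tool). It assumes you have already extracted REF/ALT for the site from both your target data and the reference panel, and it only labels the usual failure modes: allele swap, strand flip, strand-ambiguous A/T or C/G SNP, or a genuinely different allele. Sites that are absent from the panel altogether cannot be classified this way.

```python
# Complement of each base on the opposite DNA strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip(allele: str) -> str:
    """Return the opposite-strand complement of a single-base allele."""
    return COMPLEMENT[allele]


def classify_match(target_ref: str, target_alt: str,
                   panel_ref: str, panel_alt: str) -> str:
    """Hypothetical helper: label how a target SNP's alleles relate to the panel's."""
    t = (target_ref.upper(), target_alt.upper())
    p = (panel_ref.upper(), panel_alt.upper())
    # Only single-base A/C/G/T alleles are handled here.
    if any(a not in COMPLEMENT for a in t + p):
        return "non-snv"
    # A/T and C/G SNPs read the same on both strands, so strand cannot be resolved.
    if {t[0], t[1]} in ({"A", "T"}, {"C", "G"}):
        return "ambiguous"
    if t == p:
        return "match"
    if t == (p[1], p[0]):
        return "ref/alt swap"
    flipped = (flip(t[0]), flip(t[1]))
    if flipped == p:
        return "strand flip"
    if flipped == (p[1], p[0]):
        return "strand flip + swap"
    return "allele mismatch"


print(classify_match("T", "C", "A", "G"))
```

Swaps and flips can often be corrected before upload (for example with PLINK's `--flip`), whereas ambiguous and mismatched sites are commonly excluded. If the top pre-imputation hit falls into one of those excluded groups, that alone would explain why it is missing from the imputed output.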
I am no expert, but my guess is that the imputation step drastically increased the number of SNPs and therefore the number of association tests performed. After correcting for multiple testing, your significant SNPs may no longer pass the threshold. How many SNPs were tested before and after imputation? Also, what software/model are you using for the GWAS?
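The multiple-testing argument can be checked with simple arithmetic. The sketch below assumes a Bonferroni correction at a family-wise alpha of 0.05 (one common choice; the conventional genome-wide cutoff of 5 × 10^-8 is another) and plugs in the approximate marker counts the original poster gives in the follow-up.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Approximate marker counts reported in this thread: QC'd genotyped input vs. imputed output.
for label, n in [("pre-imputation", 300_000), ("post-imputation", 8_000_000)]:
    print(f"{label}: {n:,} tests -> p < {bonferroni_threshold(0.05, n):.2e}")
```

Both cutoffs (about 1.7 × 10^-7 and 6.25 × 10^-9) are many orders of magnitude larger than 1 × 10^-16, so a variant at p < 1 × 10^-16 would pass either one. The larger number of tests after imputation cannot by itself explain the lost signal.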
We did not perform any multiple-testing correction. The SNPs that were significant before imputation (unadjusted p < 1 × 10^-16) are no longer significant after imputation, not even at p < 0.05. We genotyped ~800,000 markers; after quality control, ~300,000 markers were used as the input file for imputation. After imputation, the output .vcf file contained ~8,000,000 markers. We performed the association analysis in PLINK, computing summary statistics with a χ² test. Imputation was run on the Michigan Imputation Server with 1000 Genomes Phase 3 v5 (SAS population) as the reference; phasing was performed with Eagle v2.4, and an r² threshold of 0.2 was applied.
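Because PLINK's χ² output depends only on allele counts in cases and controls, one quick diagnostic is to recompute the test by hand for the top variant from both the pre- and post-imputation files and see whether its counts actually changed. The sketch below assumes a case/control phenotype and implements a basic 1-df allelic χ² test on a 2×2 table without continuity correction, similar in spirit to PLINK's basic allelic test. The function name and example counts are hypothetical. Note that it works on hard-call allele counts, so it ignores the dosage uncertainty carried by imputed genotypes.

```python
import math


def allelic_chi2(case_alt: int, case_ref: int,
                 ctrl_alt: int, ctrl_ref: int) -> tuple[float, float]:
    """1-df allelic chi-square on a 2x2 allele-count table (no continuity correction)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(map(sum, table))
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_sums + col_sums:
        raise ValueError("empty group or monomorphic site: test is undefined")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical allele counts for one SNP: 300/700 in cases, 200/800 in controls.
print(allelic_chi2(300, 700, 200, 800))
```

If the counts for the same variant differ sharply between the two files, the problem lies in how the variant was matched or re-encoded during imputation rather than in the test itself.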