Question

GWAS from direct samples... is imputation necessary?

7 months ago

sabrilo171 ▴ 20

I'm currently working on a GWAS of 334 samples that were genotyped with Illumina HumanCoreExome 12.1 chips (2013 version; initial SNP count = 578000). I have already converted the IDAT files to PLINK files (PED, MAP) and performed QC using standard thresholds (MAF 0.01, Sample Call Rate 0.95, Individual Call Rate 0.98, HWE 1e-6). After QC I still have all 334 individuals and 265000 SNPs remaining.

I was able to run them in GEMMA without imputing and obtained some results. The top hits are above the basic statistical significance line, but not above the Bonferroni correction line in the Manhattan plot.
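For reference, the Bonferroni line mentioned here can be worked out from the SNP count given above. The following is a minimal sketch, assuming a family-wise alpha of 0.05 (the post does not state the alpha used) and one test per post-QC SNP; the function name is illustrative only.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# 265000 SNPs remain after QC (from the post); alpha = 0.05 is an assumption.
n_snps = 265_000
threshold = bonferroni_threshold(0.05, n_snps)

# Height of the corresponding line on a -log10(p) Manhattan plot.
line_height = -math.log10(threshold)

print(f"Bonferroni p-value cutoff: {threshold:.2e}")
print(f"Manhattan plot line at -log10(p) = {line_height:.2f}")
```

Under these assumptions the cutoff is roughly 1.9e-7. If imputation adds more variants to the association test, the number of tests grows and the same calculation gives a stricter cutoff.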

This made me wonder: are my results valid if I didn't impute? Does not imputing hurt my chances of being published?

If it turns out to be necessary, what is the simplest way to do it with what I have: IDAT files, PED + MAP files, binary PLINK files, and a PLINK-generated VCF file, on Unix/Ubuntu + Conda (from WSL) and RStudio? I could download the GRCh37 reference file for the chips used to create the IDAT files. The reason I didn't impute in the first place is that I tried to run the PLINK-generated VCF file through the BEAGLE Java program and, after processing, BEAGLE returned an empty VCF file with the Sample_IDs as column names and no SNP information.

Thank you in advance

imputation PLINK GWAS • 637 views

updated 7 months ago by LChart 5.0k • written 7 months ago by sabrilo171 ▴ 20

score 2 · Accepted Answer · 2024-11-03

Imputing is generally necessary, since these older arrays will largely contain tagging SNPs but not lead SNPs. Array genotype calls are also somewhat less accurate than sequencing, and imputation does help to fix bad genotype calls; it will also improve your call rates.

The easiest way is the Michigan Imputation Server. Follow its tutorial.
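Imputation servers generally expect one compressed VCF per chromosome, so the binary PLINK files from the question need splitting first. The sketch below only builds the PLINK 1.9 command lines and makes several assumptions: the fileset prefix `gwas_qc` is hypothetical, only autosomes 1-22 are exported, and `plink` is on the PATH. It does not do strand or reference-allele checks, so the server's own documentation should be consulted for the exact input requirements.

```python
from typing import Iterable, List


def plink_split_commands(
    bfile: str,
    out_prefix: str,
    chromosomes: Iterable[int] = range(1, 23),
) -> List[List[str]]:
    """Build one PLINK 1.9 command per chromosome that writes a bgzipped VCF."""
    commands = []
    for chrom in chromosomes:
        commands.append([
            "plink",
            "--bfile", bfile,            # prefix of the .bed/.bim/.fam fileset
            "--chr", str(chrom),         # keep a single chromosome
            "--recode", "vcf", "bgz",    # export as block-gzipped VCF
            "--out", f"{out_prefix}.chr{chrom}",
        ])
    return commands


# "gwas_qc" is a hypothetical prefix for the post-QC binary PLINK files.
commands = plink_split_commands("gwas_qc", "gwas_qc")
for cmd in commands:
    print(" ".join(cmd))
```

Each printed line can be pasted into a shell, or the lists can be passed to `subprocess.run(cmd, check=True)` to execute them from Python.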