Hello everyone,
I am currently working with imputed genotype data and hope you can help me with a few questions about post-imputation quality control.
I have filtered out SNPs with low imputation quality (r2 < 0.8). However, while running the GWAS in GEMMA, I noticed that some SNPs are still being removed by the software's default filtering thresholds (such as MAF and SNP-level missingness). Should any additional QC steps, beyond filtering poorly imputed SNPs, be applied after imputation and before running the GWAS? Based on the literature, it seems that only imputation quality thresholds are applied after imputation; however, a high missingness rate seems quite problematic...
Any help would be really appreciated.
Best regards,
Aurina
Related post:
Hi Aurina, I'm in a similar situation: I have received imputed data (from IMPUTE) in PLINK format (.bed, .bim, and .fam), as separate files for chr1 to chr22.
Could you please suggest a suitable protocol or tool for post-imputation QC and association analysis?
Thanks in advance
Hi, it is best to open a new, separate question instead of asking the same question under multiple existing answers.
The simplest way to solve your problem is to merge all the per-chromosome bed files with PLINK and then proceed as normal. Alternatively, you can use a for loop in bash to run the QC on each file separately.
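A rough sketch of both options, assuming PLINK 1.9 and filesets named chr1.bed/.bim/.fam through chr22.bed/.bim/.fam. The "chr" prefix, the output names, and the QC thresholds are placeholders for illustration, not recommendations, so adjust them to your data:

```shell
#!/usr/bin/env bash
# Sketch only. Assumed (hypothetical) naming: chr1.bed/.bim/.fam ... chr22.*
prefix="${PREFIX:-chr}"

# Print the filesets to merge into chromosome 1, one prefix per line.
# In a PLINK 1.9 merge list, a line with a single name is read as a
# binary fileset prefix.
make_merge_list() {
    local p="$1" i
    for i in $(seq 2 22); do
        echo "${p}${i}"
    done
}

make_merge_list "$prefix" > merge_list.txt

# Only call PLINK if it is installed and the first fileset is present.
if command -v plink >/dev/null 2>&1 && [ -f "${prefix}1.bed" ]; then
    # Option 1: merge all chromosomes, then run QC once on the merged data.
    plink --bfile "${prefix}1" --merge-list merge_list.txt \
          --make-bed --out merged
    # Illustrative thresholds only (MAF, SNP missingness, HWE).
    plink --bfile merged --maf 0.01 --geno 0.05 --hwe 1e-6 \
          --make-bed --out merged_qc

    # Option 2: run the same QC on each chromosome separately.
    for i in $(seq 1 22); do
        plink --bfile "${prefix}${i}" --maf 0.01 --geno 0.05 --hwe 1e-6 \
              --make-bed --out "${prefix}${i}_qc"
    done
fi
```

You only need one of the two options. Merging first gives a single fileset for the association step, while the loop keeps memory use lower.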
Could you let me know what you ended up doing?