Dear all,
I want to run PCA on post-imputation dosage data to adjust for population stratification in a genome-wide association (GWA) analysis. Are there any packages that support both SNP pruning and principal component analysis directly on post-imputation dosage data (particularly minimac / "Michigan Imputation Server" output)?
Many thanks in advance for your replies.
My thoughts on potential approaches:
- My impression is that bigsnpr (PMID: 32415959) can use dosage data. However:
  - It still relies on PLINK hard calls (bed files) for SNP pruning.
  - It seems to be adapted to BGEN files, so some tweaking would likely be needed to extract the dosages from minimac VCF files and feed them in.
- I could also use hard calls all the way for the PCA calculation, then use those PCs for the GWA on dosage data. I suspect this is justifiable, because the choice between dosages and hard calls did not seem to have much of an impact on the PC calculation in this study: PMID: 32415959 > Results > section 4.5 > last 3 sentences.
- I could use genotype-chip PCs instead of post-imputation dosage PCs; I have already posted that question as a separate thread.
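
To make the "extract the dosage from minimac VCF files" step concrete, here is a minimal Python sketch of what I have in mind. It is only an illustration, not a tested pipeline: it assumes the per-sample dosage is stored under a `DS` key in the FORMAT column (as I understand minimac output to be), that there are no missing dosages, and the function names (`read_dosages`, `hard_calls`, `mean_abs_diff`) and the tiny example VCF are made up. It also shows how far the dosages sit from rounded hard calls, which is the comparison behind the hard-call approach above.

```python
import io


def read_dosages(handle):
    """Yield (variant_id, [dosage per sample]) from VCF text lines.

    Assumes the FORMAT column contains a DS key and that no dosage is missing.
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip meta lines and the #CHROM header line
        fields = line.split("\t")
        fmt_keys = fields[8].split(":")
        if "DS" not in fmt_keys:
            continue  # no dosage field for this variant
        ds_idx = fmt_keys.index("DS")
        dosages = [float(sample.split(":")[ds_idx]) for sample in fields[9:]]
        # Fall back to chrom:pos:ref:alt when the ID column is empty (".")
        if fields[2] != ".":
            variant_id = fields[2]
        else:
            variant_id = ":".join([fields[0], fields[1], fields[3], fields[4]])
        yield variant_id, dosages


def hard_calls(dosages):
    """Round each dosage to the nearest genotype count (0, 1 or 2)."""
    return [int(d + 0.5) for d in dosages]


def mean_abs_diff(dosages):
    """Average distance between the dosages and their rounded hard calls."""
    calls = hard_calls(dosages)
    return sum(abs(d - c) for d, c in zip(dosages, calls)) / len(dosages)


# Made-up three-sample example; real files would be much larger and gzipped.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
               "INFO", "FORMAT", "S1", "S2", "S3"]),
    "\t".join(["1", "1000", "rs1", "A", "G", ".", "PASS", ".",
               "GT:DS", "0|0:0.02", "0|1:1.01", "1|1:1.97"]),
    "\t".join(["1", "2000", ".", "C", "T", ".", "PASS", ".",
               "GT:DS", "0|0:0.40", "0|1:0.90", "0|0:0.10"]),
])

variants = list(read_dosages(io.StringIO(EXAMPLE_VCF)))
for variant_id, dosages in variants:
    print(variant_id, dosages, hard_calls(dosages), mean_abs_diff(dosages))
```

For a real gzipped file the same generator should work on a handle from `gzip.open(path, "rt")`. The resulting variants-by-samples dosage matrix would still need pruning and scaling before being passed to any PCA routine.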