2.9 years ago
coleman_jonathan
We wish to combine genotype data from multiple datasets generated on different arrays. We have imputed each dataset to the TOPMed panel and merged them using VCFtools. We now want to apply quality control to the merged data, in particular removing uncertain variants (i.e. those with a low information score [R2]) as well as low-MAF variants. Calculating MAF is straightforward (there are VCFtools plugins for that), but I am not aware of a similar tool to recalculate R2.
Does anyone know of a tool that does this?
EDIT: Clarified multiple cohorts
Didn't the imputation algorithm you used provide INFO scores for each SNP in the output?
4galaxy77 – yes, sorry, I realise the key piece of information wasn't clear: we have multiple imputed datasets (each with its own R2 values) that we're combining into a single mega-dataset. We need new R2 values for the mega-dataset.
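To make "new R2 for the mega-dataset" concrete, here is a minimal sketch of one common approximation: a MaCH-style estimate, i.e. the ratio of the observed variance of the imputed dosages to the variance expected under Hardy-Weinberg, 2p(1-p), computed on the pooled samples. This is not necessarily what the imputation server itself reports, and it assumes the merged VCF still carries per-sample dosages (e.g. a `DS` field) that have already been extracted into plain lists. The function names `dosage_r2` and `pooled_dosage_r2` are hypothetical, for illustration only.

```python
from statistics import fmean, pvariance


def dosage_r2(dosages):
    """MaCH-style r2 estimate for one variant from diploid dosages in [0, 2].

    Returns observed dosage variance divided by the variance expected
    under Hardy-Weinberg, 2p(1-p), or None if the variant is monomorphic
    (or the list is empty), where the ratio is undefined.
    """
    if not dosages:
        return None
    p = fmean(dosages) / 2.0          # estimated alt-allele frequency
    expected = 2.0 * p * (1.0 - p)    # binomial variance under HWE
    if expected <= 0.0:
        return None
    return pvariance(dosages) / expected


def pooled_dosage_r2(cohorts):
    """Pool the per-cohort dosage lists for one variant, then estimate r2.

    `cohorts` is a list of lists, one list of dosages per cohort.
    """
    pooled = [d for cohort in cohorts for d in cohort]
    return dosage_r2(pooled)
```

With perfectly certain calls in Hardy-Weinberg proportions the estimate is 1, and it shrinks towards 0 as the dosages collapse towards the mean. Note that pooling cohorts with different allele frequencies can inflate the observed variance, so values should be interpreted with that in mind.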