Hi,
I have just downloaded imputation result files from the Michigan Imputation Server. The files include ".info.gz" and ".dosage.gz". I would like to filter the genotype results by R^2 > 0.8 to keep only the well-imputed genotypes. I'm new to this kind of analysis and not sure how to proceed from here. Can someone please advise me on how to do this?
Here is an example of the .info file:
SNP REF(0) ALT(1) ALT_Frq MAF AvgCall Rsq Genotyped LooRsq EmpR EmpRsq Dose0 Dose1
1:62246 C T 0.30621 0.30621 0.69406 0.14963 Imputed - - - - -
1:62209 T G 0.25723 0.25723 0.75622 0.11694 Imputed - - - - -
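For reference, this is a minimal sketch of the kind of filter I have in mind for the .info file, assuming it is whitespace-delimited with the header shown above. The function name high_quality_snps and the third sample row (1:99999) are made up for illustration; only the first two rows come from my file.

```python
def high_quality_snps(info_lines, threshold=0.8):
    """Return the SNP IDs whose Rsq value exceeds the threshold."""
    lines = iter(info_lines)
    header = next(lines).split()
    snp_col = header.index("SNP")
    rsq_col = header.index("Rsq")
    keep = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            rsq = float(fields[rsq_col])
        except ValueError:
            continue  # Rsq is missing or not numeric (e.g. "-")
        if rsq > threshold:
            keep.append(fields[snp_col])
    return keep


sample = [
    "SNP REF(0) ALT(1) ALT_Frq MAF AvgCall Rsq Genotyped LooRsq EmpR EmpRsq Dose0 Dose1",
    "1:62246 C T 0.30621 0.30621 0.69406 0.14963 Imputed - - - - -",
    "1:62209 T G 0.25723 0.25723 0.75622 0.11694 Imputed - - - - -",
    # Hypothetical row, added only so that something passes the filter.
    "1:99999 A G 0.41000 0.41000 0.98000 0.95000 Imputed - - - - -",
]
print(high_quality_snps(sample))
```

For the real file, the lines could presumably be read with gzip.open(path, "rt") from the standard library and passed to the same function.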
Here is an example of the .dosage file:
##fileformat=VCFv4.1
##filedate=2018.4.11
##source=Minimac3
##contig=<ID=10>
##FILTER=<ID=GENOTYPED,Description="Marker was genotyped AND imputed">
##FILTER=<ID=GENOTYPED_ONLY,Description="Marker was genotyped but NOT imputed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DS,Number=1,Type=Float,Description="Estimated Alternate Allele Dosage : [P(0/1)+2*P(1/1)]">
##FORMAT=<ID=GP,Number=3,Type=Float,Description="Estimated Posterior Probabilities for Genotypes 0/0, 0/1 and 1/1 ">
##INFO=<ID=AF,Number=1,Type=Float,Description="Estimated Alternate Allele Frequency">
##INFO=<ID=MAF,Number=1,Type=Float,Description="Estimated Minor Allele Frequency">
##INFO=<ID=R2,Number=1,Type=Float,Description="Estimated Imputation Accuracy">
##INFO=<ID=ER2,Number=1,Type=Float,Description="Empirical (Leave-One-Out) R-square (available only for genotyped variants)">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2
However, the "INFO" column of the dosage file, which contains R2, has this format:
AF=0.00036;MAF=0.00036;R2=0.00035
AF=0.08734;MAF=0.08734;R2=0.18100
What would a filter on R2 look like for this format? Please provide an example.
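To make the question concrete, here is a rough sketch of what I imagine for the dosage file, assuming the data lines are tab-separated VCF records with INFO as the eighth column (as in the #CHROM header above). The names info_r2 and filter_vcf_lines are my own, and the two data records in the usage example are invented apart from their INFO strings.

```python
def info_r2(info_field):
    """Extract R2 from an INFO string such as 'AF=0.1;MAF=0.1;R2=0.9'."""
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        if key == "R2":  # exact match, so ER2 is not picked up by mistake
            return float(value)
    return None  # no R2 entry present


def filter_vcf_lines(lines, threshold=0.8):
    """Yield header lines unchanged and data lines whose R2 exceeds the threshold."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        r2 = info_r2(fields[7])  # INFO is the 8th VCF column
        if r2 is not None and r2 > threshold:
            yield line


# Hypothetical records: positions, alleles, and genotypes are made up.
records = [
    "##source=Minimac3\n",
    "10\t100\t10:100\tA\tG\t.\tPASS\tAF=0.08734;MAF=0.08734;R2=0.18100\tGT:DS\t0|0:0.1\n",
    "10\t200\t10:200\tC\tT\t.\tPASS\tAF=0.25000;MAF=0.25000;R2=0.93000\tGT:DS\t0|1:1.0\n",
]
for kept in filter_vcf_lines(records):
    print(kept, end="")
```

The comparison is strict (R2 > 0.8), matching the cutoff I described above; a variant with R2 exactly 0.8 would be dropped.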