My region of interest is ~120 kb. I have 300 samples, all with 14 SNPs in the region. I tried imputing with Beagle to get more SNPs, using the CEU population (the closest one) as a reference. After filtering the CEU samples I got about 350 SNPs per sample to use as a reference.
After imputation I got the same number of SNPs in my population as in CEU. Is it legit to use them all? I have a gut feeling that I have to filter them according to the imputation quality or something. How do I do that? The output VCF looks something like this:
1 110187031 rs113581509 C T . PASS AR2=0.468;DR2=0.514;AF=0.06 GT:DS:GP 0|1:0.759:0.242,0.758,0 0|0:0.018:0.982,0.018,0 0|0:0.018:0.982,0.018,0 0|0:0.001:0.999,0.001,0
Can anyone give me a clue on how to filter the results? Or should I use different software?
Hi eyb,
I know this was several years ago, but I'm now facing the same problem you had back then: I've just managed to impute my data with Beagle, and now I would like to know how to filter out the low-quality SNPs.
I suspect it is related to the DR2 field, but I'm not quite sure about it. Did you ever resolve your problem? Thank you very much in advance!
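For what it's worth, here is a minimal sketch of filtering on the DR2 field, assuming records laid out like the line in the question (INFO in the eighth column, with `DR2` as one of the `key=value` entries). In the Beagle 4 output header, DR2 is described as the estimated dosage R-squared, so a low value suggests a poorly imputed marker. The function names and the 0.8 cutoff are my own assumptions for illustration, not something Beagle prescribes; choose a threshold that suits your analysis.

```python
def parse_info(info):
    """Turn an INFO string such as 'AR2=0.468;DR2=0.514;AF=0.06' into a dict of strings."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def filter_by_dr2(lines, min_dr2=0.8):
    """Keep header lines and records whose DR2 is at least min_dr2.

    min_dr2=0.8 is an illustrative default (an assumption), not a Beagle recommendation.
    """
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # pass VCF header lines through unchanged
            continue
        cols = line.split()  # handles tab- or space-separated records
        dr2 = parse_info(cols[7]).get("DR2")
        if dr2 is None:
            continue  # no DR2 annotation: drop the record (a conservative assumption)
        # If DR2 holds several comma-separated values, use the lowest one.
        if min(float(v) for v in dr2.split(",")) >= min_dr2:
            kept.append(line)
    return kept


record = ("1 110187031 rs113581509 C T . PASS "
          "AR2=0.468;DR2=0.514;AF=0.06 GT:DS:GP 0|1:0.759:0.242,0.758,0")
print(len(filter_by_dr2([record])))               # record fails the 0.8 cutoff
print(len(filter_by_dr2([record], min_dr2=0.5)))  # record passes a 0.5 cutoff
```

With the example record from the question (DR2=0.514), the marker is dropped at a 0.8 cutoff and kept at 0.5. For a real file you would read the lines from the (decompressed) VCF and write the kept lines back out.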