Hi all, I imputed a set of genotypes by beagle using the following function:
java -jar beagle.jar gt=target.vcf ref=ref.vcf gp=true out=imputed
With gp = true to obtain the genotypes posterior probabilities of 0|0,0|1,1|0 and 1|1. The result obtained look like this:
##fileformat=VCFv4.2
##source="beagle.18May20.d20.jar"
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated ALT Allele Frequencies">
##INFO=<ID=DR2,Number=1,Type=Float,Description="Dosage R-Squared: estimated squared correlation between estimated REF dose [P(RA) + 2*P(RR)] and true REF dose">
##INFO=<ID=IMP,Number=0,Type=Flag,Description="Imputed marker">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DS,Number=A,Type=Float,Description="estimated ALT dose [P(RA) + 2*P(AA)]">
##FORMAT=<ID=GP,Number=G,Type=Float,Description="Estimated Genotype Probability">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A1 A2 A3
1 41466 oar3_OAR1_38175 G A . PASS DR2=0.01;AF=0.1125;IMP GT:DS:GP 0|0:0.2:0.81,0.18,0.01 0|0:0.2:0.81,0.18,0.01 0|0:0.27:0.75,0.24,0.02
Once imputed, I wanted to filter by the genotype posterior probability (GP), so I used an instruction that I found and modified with the help of bcftools, for example if I wanted filter by GP>=80 and exclude (-e) the genotypes with less probability (they will appear as "./."):
bcftools +setGT imputed.vcf.gz -- -t q -n . -e 'GP>0.80' | bcftools query -f'%CHROM\t%POS\t%ID\t%REF\t%ALT\t%QUAL\t%FILTER\t%INFO[\t%GT]\n'
And the result should be something like that:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A1 A2 A3
1 41466 oar3_OAR1_38175 G A . PASS DR2=0.01;AF=0.1125;IMP GT 0|0 0|0 ./.
Because the first and second animal (A1 & A2) has a highest genotype probability of 0.81 to 0|0 genotype, but A3 has less. The problem is that according to the probability that you select, the "exclude" command puts as missign or maintine with their value wrong genotypes. For example if I select GP>=85 the result obtained is:
1 41466 oar3_OAR1_38175 G A . PASS DR2=0.01;AF=0.1125;IMP GT ./. ./. ./.
But if I select GP>=99 the program returns something wrong to my understanding, like this:
1 41466 oar3_OAR1_38175 G A . PASS DR2=0.01;AF=0.1125;IMP GT 0|0 0|0 0|0
These three genotypes should appear as missing, but it returns them to me as if they had passed the filter. I'm sure it is something that I'm not understanding well about the command, or perhaps there is another better command to do it, could you help me?. Thanks!
Try
bcftools +setGT test.vcf -- -t q -n . -i'FORMAT/GP>=0.80'