Entering edit mode
3.5 years ago
Alejandro
▴
10
I had a vcf file with imputed genotypes from beagle like this:
##fileformat=VCFv4.2
##filedate=20210504
##source="beagle.18May20.d20.jar"
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated ALT Allele Frequencies">
##INFO=<ID=DR2,Number=1,Type=Float,Description="Dosage R-Squared: estimated squared correlation between estimated REF dose [P(RA) + 2*P(RR)] and true REF dose">
##INFO=<ID=IMP,Number=0,Type=Flag,Description="Imputed marker">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DS,Number=A,Type=Float,Description="estimated ALT dose [P(RA) + 2*P(AA)]">
##FORMAT=<ID=GP,Number=G,Type=Float,Description="Estimated Genotype Probability">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT AA20098 AA20099 AC10002
1 20509 oar3_OAR1_17218 A G . PASS DR2=0.16;AF=0.2293;IMP GT:DS:GP 0|0:0.28:0.90,0.05,0.05 0|0:0.27:0.99,0
.01,0.01 0|0:0.71:0.70,0.20,0.10
1 21717 oar3_OAR1_18426 A G . PASS DR2=0.01;AF=0.0484;IMP GT:DS:GP 0|1:0.12:0.05,0.90,0.05 0|1:0.11:0.01,0
.99,0.01 0|1:0.06:0.20,0.70,0.10
1 23949 oar3_OAR1_20658 A C . PASS DR2=0.17;AF=0.3023;IMP GT:DS:GP 1|0:0.41:0.05,0.90,0.05 1|0:0.39:0.01,0
.99,0.01 1|0:0.89:0.20,0.70,0.10
1 31587 oar3_OAR1_28296 G A . PASS DR2=0.00;AF=0.0120;IMP GT:DS:GP 1|1:0.02:0.05,0.05,0.90 1|1:0.02:0.01,0
.01,0.99 1|1:1.04:0.10,0.20,0.70
And I did a filter for by GT>85 with bcftools, and to the result I added the header again with a 'cat', getting something like this:
##fileformat=VCFv4.2
##filedate=20210504
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated ALT Allele Frequencies">
##INFO=<ID=DR2,Number=1,Type=Float,Description="Dosage R-Squared: estimated squared correlation between estimated REF dose [P(RA) + 2*P(RR)] and true REF dose">
##INFO=<ID=IMP,Number=0,Type=Flag,Description="Imputed marker">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT AA20098 AA20099 AC10002
1 20509 oar3_OAR1_17218 A G . PASS DR2=0.16;AF=0.2293;IMP GT 0|0 0|0 ./.
1 21717 oar3_OAR1_18426 A G . PASS DR2=0.01;AF=0.0484;IMP GT 0|1 0|1 ./.
1 23949 oar3_OAR1_20658 A C . PASS DR2=0.17;AF=0.3023;IMP GT 1|0 1|0 ./.
1 31587 oar3_OAR1_28296 G A . PASS DR2=0;AF=0.012;IMP GT 1|1 1|1 ./.
After that, I tried to convert it to plink format by vcftools and plink.
vcftools --vcf infile.vcf --out outfile --plink
plink --sheep --vcf infile.vcf --recode --out outfile
But in booth I got this ped:
AA20098 AA20098 0 0 0 -9 0 0 0 0 0 0 0 0 0 0
AA20099 AA20099 0 0 0 -9 0 0 0 0 0 0 0 0 0 0
AC10002 AC10002 0 0 0 -9 0 0 0 0 0 0 0 0 0 0
And I don't understand why in this ped only appear 0 if the logical thing would be that here also appear 1. Any ideas?
Can you post the EXACT contents of an example VCF that is not converting properly, along with the plink .log file, so I can replicate what you’re seeing?
It's already solved, it was silly due to when I edited the header and data I didn't notice that it was several spaces and not tabs like the rest of the vcf had. For that reason neither plink nor vcftools recognized the genotypes, now it works!. Thanks for your interest.
Hello. Could you please share the code you used to filter by a GP>0.85? I am trying to do that with bcftools -selGT but I am having problems. Thank you