3.7 years ago
ziv_attia
I have created a VCF file with GATK using HaplotypeCaller, GenomicsDBImport and GenotypeGVCFs.
The output I get looks very different from the VCF 4.2 format.
For example:
0/1:8,3:11:36:36,0,233 #from vcftools
0|1:2,4:6:72:0|1:4938136_T_C:162,0,72:4938136 #from GATK
             ^_____________^          ^_____^
0|1:2,4:6:72:0:162,0,72 #how it should look like...
This format breaks the downstream pipeline I usually work with.
Any idea what it means and how to get rid of it?
Thanks!
Please show us the exact GATK commands you used. This looks like a Find & Replace operation gone wrong.
hope this info helps
Thank you. For the example entries you've shown in your question, can you also show us the FORMAT field from the 2 VCF files for those entries?
GATK format field -
GT:AD:DP:GQ:PGT:PID:PL:PS
vcftools format field -
GT:DP:GL
This is probably the reason. How do I reformat the VCF so that it contains only the GT:DP:GL fields?
I don't think GATK giving you more information is necessarily a "problem". You can always extract the info you need from what GATK gives you. You should be able to use bcftools annotate to keep/remove FORMAT fields. Extract a small subset of your GATK VCF file and try processing it with bcftools annotate.
Thanks a ton! I will go through it and see how it works.
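To make the keep/remove idea concrete, here is a minimal Python sketch that pairs the GATK FORMAT keys from this thread with the example sample entry and drops the phasing-related keys (PGT, PID, PS). The function name strip_format_keys and its default drop list are made up for illustration; this is not how bcftools works internally, only a way to see which value belongs to which key.

```python
# Illustrative only: pair FORMAT keys with one sample's values and drop the
# phasing-related keys that GATK adds (PGT, PID, PS).
# strip_format_keys is a hypothetical helper, not part of any library.

def strip_format_keys(format_field, sample_field, drop=("PGT", "PID", "PS")):
    """Return (new_format, new_sample) without the keys listed in `drop`."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # VCF allows trailing fields to be omitted from a sample column,
    # so pad with "." (missing) to keep keys and values aligned.
    values += ["."] * (len(keys) - len(values))
    kept = [(k, v) for k, v in zip(keys, values) if k not in drop]
    return ":".join(k for k, _ in kept), ":".join(v for _, v in kept)

# FORMAT and sample strings copied from the posts above.
gatk_format = "GT:AD:DP:GQ:PGT:PID:PL:PS"
gatk_sample = "0|1:2,4:6:72:0|1:4938136_T_C:162,0,72:4938136"

new_format, new_sample = strip_format_keys(gatk_format, gatk_sample)
print(new_format)  # GT:AD:DP:GQ:PL
print(new_sample)  # 0|1:2,4:6:72:162,0,72
```

Note that the genotype itself stays phased (0|1) here; only the extra keys are dropped. On a real file, something like bcftools annotate -x FORMAT/PGT,FORMAT/PID,FORMAT/PS in.vcf.gz -Oz -o out.vcf.gz should do the same job (file names are placeholders), and it also updates the header, which a line-by-line script like the one above does not.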
I'm sorry, but what should we see? Why should any output from GATK be similar to the 'old' vcftools? What are the weird characters? What is the FORMAT column associated with both outputs?