Hi, from a previous run of my sample, I know that at the position in question there is a TAAG>T variant (deletion of AAG), which is also clearly visible in IGV.
A samtools mpileup confirms this:
1 225685621 T 10 ,,,-3aag,-3aag,-3aag,-3aag,,-3aag,,-3aag CFFG7GGWGG 34 <<<<<<<>>><<<<<><>><>>>><<><<<>>>< CFHHJJJHFIIEJJGJJJJJJIJJJFCFFEJJHD
But, and here is my issue, the bcftools call (or bcftools view) command turns this variant into the following, which disagrees with the previous outputs:
1 225685621 . TAAGAAGA TAAGA 174 . INDEL;IDV=7;IMF=0.636364;DP=11;VDB=0.0633369;SGB=-0.616816;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=0,2,0,6;MQ=60 GT:PL:DP:DV:DP4 0/1:208,0,87:8:6:0,2,0,6
Thus, I'd like to understand why this is happening, and how I could prevent that behaviour. Is it possible that this is a bug?
Thanks for any help,
Michael
EDIT:
I could solve the problem by normalizing the call, i.e. by piping the bcftools call output to:
bcftools norm -f genome.fa -
But still, I do not understand the representation given above.
Ha, I was really hoping that was the answer, but I cannot see it.
If you remove TAAGA from TAAGAAGA, the result is AGA as a difference, ergo the deletion. Now I assume I am wrong with my argument, so could you please elaborate? Thank you.
TAAGA isn't what's removed; it's the result of the change (i.e., we start with TAAGAAGA, something happens, and we then see TAAGA). VCF files represent the states observed, not a reference and then a change.
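To make the equivalence concrete, here is a minimal Python sketch of the allele-trimming step that normalization performs. The function name trim_alleles is hypothetical and is not part of bcftools. It only strips bases shared by REF and ALT; it does not left-align against the reference sequence, which is why bcftools norm needs the -f genome.fa argument in the general case. For this record, trimming alone is enough to get back to the expected representation.

```python
def trim_alleles(pos, ref, alt):
    """Strip bases shared by REF and ALT (hypothetical helper, not bcftools code).

    Assumes a single ALT allele and does no left-alignment against a reference.
    """
    # Drop shared trailing bases, keeping at least one base in each allele.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop shared leading bases, advancing the position for each base removed.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# The record reported by bcftools call in the question.
print(trim_alleles(225685621, "TAAGAAGA", "TAAGA"))
# prints (225685621, 'TAAG', 'T')
```

REF and ALT share the trailing AAGA, so both spellings describe the same event at the same position: a deletion of AAG after the T.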