Hi all, I'm trying to pull strand count information from a VCF file made using GATK4's Mutect2. I used the following command to create this VCF:
gatk Mutect2 -I SRR8525881.bam -mbq 20 -R ../../genome/hxb2.fa --mitochondria-mode True -O SRR8525881.vcf
The VCF output for multi-allelic sites looks like this. The fields I'm interested in are the read depths (AD) and the strand counts (SB); the strand counts are in this order: ref forward, ref reverse, alt forward, alt reverse.
K03455.1 2042 . AG GA,GG . . DP=35;ECNT=8;MBQ=20,20,20;MFRL=182,166,90;MMQ=60,60,60;MPOS=56,45;OCM=0;POPAF=2.40,2.40;TLOD=35.34,20.29 GT:AD:AF:DP:F1R2:F2R1:SB 0/1/2:16,10,9:0.289,0.258:35:12,8,2:4,1,5:10,6,12,7
The strand counts given appear to be combined counts for both alternate alleles: alt forward + alt reverse (12 + 7 = 19) equals the sum of the two alternate allele depths in AD (10 + 9 = 19). Is there any way I can get strand counts for each alternate allele individually?
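To make the arithmetic above easy to check, here is a minimal Python sketch that parses the AD and SB fields from the record shown and compares the totals. It only confirms that the alt strand counts are pooled; it does not recover per-allele strand counts. The helper name `parse_sample` is hypothetical, and splitting on any whitespace is an assumption made because the record is pasted here with spaces (a real VCF is tab-delimited).

```python
# Record copied from the post above (whitespace-separated as pasted).
record = (
    "K03455.1 2042 . AG GA,GG . . "
    "DP=35;ECNT=8;MBQ=20,20,20;MFRL=182,166,90;MMQ=60,60,60;"
    "MPOS=56,45;OCM=0;POPAF=2.40,2.40;TLOD=35.34,20.29 "
    "GT:AD:AF:DP:F1R2:F2R1:SB "
    "0/1/2:16,10,9:0.289,0.258:35:12,8,2:4,1,5:10,6,12,7"
)


def parse_sample(line):
    """Map FORMAT keys to the first sample's values (hypothetical helper)."""
    fields = line.split()          # assumption: no field contains spaces
    keys = fields[8].split(":")    # FORMAT column
    values = fields[9].split(":")  # first sample column
    return dict(zip(keys, values))


sample = parse_sample(record)

# AD: one depth per allele, in the order ref, alt1, alt2.
ad = [int(x) for x in sample["AD"].split(",")]

# SB: ref forward, ref reverse, alt forward, alt reverse.
ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in sample["SB"].split(","))

alt_depth_from_ad = sum(ad[1:])          # depths of both alt alleles combined
alt_depth_from_sb = alt_fwd + alt_rev    # alt reads on both strands combined

print("ref depth: AD =", ad[0], " SB =", ref_fwd + ref_rev)
print("alt depth: AD =", alt_depth_from_ad, " SB =", alt_depth_from_sb)
```

If the two alt totals match, as they do for this record, the SB field holds a single pooled forward/reverse pair for all alternate alleles rather than one pair per allele.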
I've tried using GATK's VariantsToTable with the --splitMultiAllelic parameter and vcflib's vcfmulti program to split these multi-allelic sites, but I get the following output for the same site:
K03455.1 2042 . AG GA 0 . DP=35;ECNT=8;MBQ=20,20,20;MFRL=182,166,90;MMQ=60,60,60;MPOS=56;OCM=0;POPAF=2.40;TLOD=35.34 GT:AD:AF:DP:F1R2:F2R1:SB ./0/1:16,10,9:0.289:35:12,8,2:4,1,5:10,6,12,7
K03455.1 2042 . AG GG 0 . DP=35;ECNT=8;MBQ=20,20,20;MFRL=182,166,90;MMQ=60,60,60;MPOS=45;OCM=0;POPAF=2.40;TLOD=20.29 GT:AD:AF:DP:F1R2:F2R1:SB ./0/1:16,10,9:0.258:35:12,8,2:4,1,5:10,6,12,7
As you can see, the alternate strand count info is the same combined total for both alleles.