Hello,
I am trying to filter VCF files produced by mpileup using the average base quality of the reference bases and the average base quality of the variant bases.
In the VCF file, the I16 field is laid out as follows:
1 #reference Q13 bases on the forward strand
2 #reference Q13 bases on the reverse strand
3 #non-ref Q13 bases on the forward strand
4 #non-ref Q13 bases on the reverse strand
5 sum of reference base qualities
6 sum of squares of reference base qualities
7 sum of non-ref base qualities
8 sum of squares of non-ref base qualities
9 sum of ref mapping qualities
10 sum of squares of ref mapping qualities
11 sum of non-ref mapping qualities
12 sum of squares of non-ref mapping qualities
13 sum of tail distance for ref bases
14 sum of squares of tail distance for ref bases
15 sum of tail distance for non-ref bases
16 sum of squares of tail distance for non-ref bases
My problem is that values 1-4 count only the high-quality (Q13) bases, while values 5 and 7 sum the base qualities of all reads, not just the reads counted in 1-4. Dividing the sums by the counts therefore does not give a true average.
Is there a way to change how I16 values 5 and 7 are calculated?
Or are there any other settings I can change so that the output reports the average base quality of only the high-quality bases, i.e. the bases counted in I16[1-4]?
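For reference, here is a minimal Python sketch of the calculation I have in mind. It assumes the I16 values appear in the INFO column as a comma-separated "I16=..." entry; the function names parse_i16 and mean_base_qualities are my own, not part of any tool. If values 5 and 7 really do include bases that are not counted in 1-4, the quotients below are not true averages, which is exactly the problem.

```python
def parse_i16(info):
    """Return the 16 I16 values from a VCF INFO string, or None if absent.

    Assumes INFO entries are separated by ';' and I16 is written as
    'I16=v1,v2,...,v16'.
    """
    for entry in info.split(";"):
        if entry.startswith("I16="):
            values = [float(v) for v in entry[len("I16="):].split(",")]
            if len(values) != 16:
                raise ValueError("expected 16 I16 values, got %d" % len(values))
            return values
    return None


def mean_base_qualities(i16):
    """Return (ref_mean, alt_mean) base quality from the I16 values.

    The list is 0-based, so I16 value n from the post is i16[n - 1].
    A mean is None when the corresponding base count is zero.
    """
    ref_count = i16[0] + i16[1]   # values 1 + 2: ref bases, both strands
    alt_count = i16[2] + i16[3]   # values 3 + 4: non-ref bases, both strands
    ref_mean = i16[4] / ref_count if ref_count else None   # value 5 / count
    alt_mean = i16[6] / alt_count if alt_count else None   # value 7 / count
    return ref_mean, alt_mean
```

I would call this on the eighth tab-separated column of each non-header VCF line and then filter on the two means.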
This is a great question, and I would also love to know the answer.