I produced the VCF file using mpileup with no options, so that the bases would be displayed. However, I don't understand some of the notation in the bases column. First, I don't understand the ^KT (or ^Kt) notation here (more questions follow the example). What does it mean? The SAM file shows that there should be 8 reads there, but there are 9 characters, or 6 if ^K counts as one character. I believe I have also seen ^E characters, but I am not showing them here.
$ grep -m 5 '\^K' 1792tmp_q1_v2/pileup.vcf
CP003069 1 N 3 ^KT^Kt^KT GGI
CP003069 2 N 8 TtT^KT^Kt^Kt^Kt^Kt BFIIEHDD
CP003069 3 N 12 TtTTtttt^KT^Kt^Kt^Kt GIIIHFIIGIGI
CP003069 4 N 20 CcCCccccCccc^Kc^Kc^Kc^Kc^KC^Kc^KC^KC GGIIEFGIHIHHIIGHHIGH
CP003069 5 N 27 AaAAaaaaAaaaaaaaAaAA^Ka^KA^KA^Ka^Ka^Ka^KA BIIIHHGIHHHIIEBHHIGHIIHIEIG
$ samtools view 1792tmp_q1_v2/aln.sorted.bam|head|cut -f 2,3,4,5,6
16 CP003069 1 0 4M1I27M
16 CP003069 1 0 4M1I31M
16 CP003069 1 0 4M1I31M
16 CP003069 1 0 4M1I31M
0 CP003069 1 42 36M
16 CP003069 1 0 4M1I31M
16 CP003069 1 42 36M
0 CP003069 1 42 36M
0 CP003069 2 42 36M
16 CP003069 2 42 36M
Does +1tG mean that there is a T insertion, or does it mean that there is a T followed by an extra base, a G? If it is the first option, is the repeating pattern of an insertion followed by a G base call an indicator that multiple alleles are present? In my case, a haploid, would it indicate incorrect mapping?
CP003069 111111 N 117 G$G$g$G$GGGggG+1Tg+1tG+1Tg+1tG+1TG+1Tg+1tG+1Tg+1tG+1TG+1TG+1Tg+1tG+1TG+1TG+1TG+1TGg+1tG+1Tg+1tG+1Tg+1tg+1tg+1tg+1tg+1tg+1tg+1tG+1TG+1Tg+1tg+1tg+1tG+1TG+1TG+1TG+1TG+1Tg+1tg+1tg+1tg+1tg+1tG+1Tg+1tg+1tG+1TG+1TG+1Tg+1tg+1tG+1TG+1TG+1TG+1TG+1TG+1TG+1TG+1TG+1Tg+1tG+1Tg+1tg+1tG+1TG+1Tg+1tG+1TG+1TG+1Tg+1tg+1tg+1tg+1tG+1TG+1TG+1Tg+1tG+1TG+1TG+1TG+1Tg+1tg+1tg+1tG+1TG+1TG+1Tg+1tG+1Tg+1tG+1Tg+1tg+1tg+1tg+1tG+1Tg+1ttG+1Tg+1tg+1tg+1t^It^9g+1t^IT^9g+1t
And then the last question (for now): can you help me interpret -1NC? Or is it a sign of reads collapsing to one locus and/or bad mapping?
CP003069 111111 N 121 C$c$CcccCcCcccCcCcCCCCCCcccc-1nc-1nc-1nc-1nc-1nC-1NC-1Nc-1nc-1nc-1nc-1nc-1nc-1nC-1Nc-1nc-1nC-1NC-1NC-1Nc-1nC-1Nc-1nC-1Nc-1nC-1NC-1Nc-1nC-1NC-1NC-1NC-1NC-1Nc-1nc-1nc-1nC-1NC-1NC-1NC-1NC-1Nc-1nc-1nc-1nc-1nC-1NC-1NC-1NC-1NC-1Nc-1nC-1NC-1Nc-1nC-1NC-1Nc-1nc-1nc-1nC-1Nc-1nc-1nc-1nC-1Nc-1nc-1nC-1NC-1NC-1Nc-1nC-1Nc-1nc-1nCCCCcCCCCccCCCccCCCcC^@C^@C^@C
Thanks for helping me!!
Ok, thanks. What does the K mean next to the ^ though (^K)?
I guess the K is the mapping quality of the entire read?
Yes. The ASCII value of K is 75, so following the instructions, subtract 33 to give a mapping quality of 42 [for the entire read, which starts at this position].
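To make the arithmetic above concrete, here is a minimal Python sketch that splits a pileup bases column into events. It is not part of samtools; the function names decode_mapq and parse_bases are made up for illustration. It assumes the usual pileup conventions: ^ plus one character marks the start of a read (that character's ASCII value minus 33 is the read's mapping quality), $ marks the end of a read, and +N or -N followed by N letters is an insertion or deletion attached to the preceding base.

```python
def decode_mapq(ch):
    """Mapping quality encoded by the character that follows '^'."""
    return ord(ch) - 33


def parse_bases(bases):
    """Split a pileup bases column into (kind, value) events.

    Hypothetical helper for illustration only; it assumes the
    conventions described in the lead-in and does no validation.
    """
    events = []
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Read start: the next character encodes the mapping quality.
            events.append(("read_start", decode_mapq(bases[i + 1])))
            i += 2
        elif c == "$":
            # Read end: applies to the base call just before it.
            events.append(("read_end", None))
            i += 1
        elif c in "+-":
            # Indel: sign, decimal length, then that many sequence letters.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            n = int(bases[i + 1:j])
            kind = "insertion" if c == "+" else "deletion"
            events.append((kind, bases[j:j + n]))
            i = j + n
        else:
            # Anything else is treated as one base call for one read.
            events.append(("base", c))
            i += 1
    return events


if __name__ == "__main__":
    for kind, value in parse_bases("^KT^Kt^KT"):
        print(kind, value)
```

Run on the first example line, ^KT^Kt^KT yields three read starts with mapping quality 42 and three base calls (T, t, T), which matches the depth column (3) and the three reads with mapping quality 42 in the samtools view output. Under the same assumptions, G+1Tg+1t parses as a G call with a one-base T insertion after it, then a g call with the same insertion on the reverse strand.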