I have a problem. I run samtools this way: mpileup -f <ref.fasta> -l contigs.list input.bam > output.pileup
The input reads have good base qualities, but in the pileup file the base qualities at SNP positions are poor. For example:
Base quality at a SNP position in the pileup file: !!
Base quality at an ordinary position: qq
This does not happen always, or at every SNP position, but it happens very often.
Thanks.
As Istvan correctly pointed out, your data looks odd.
But anyway, if you see differences between the quality scores in a pileup (samtools mpileup) and those of the actual reads (samtools view), this is very likely due to samtools' automatic BAQ computation, which can downgrade quality scores where a misalignment is likely. BAQ is switched on by default, but you can disable it with mpileup's -B option (not recommended, though). See Li, 2011.
I experienced something similar a while ago and BAQ was the cause. So even if the qualities require conversion, don't be surprised if there are still differences.
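One way to check whether BAQ is responsible is to produce two pileups from the same BAM, one with the default settings and one with -B, and list the positions where the quality column differs. The following is only a minimal sketch: it assumes the standard six-column, tab-separated pileup layout (chromosome, position, reference base, depth, read bases, base qualities), the function names are hypothetical, and the two example strings are made-up data standing in for the two files.

```python
def parse_pileup(text):
    """Map (chrom, pos) -> base-quality string from six-column pileup text."""
    quals = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        chrom, pos, _ref, _depth, _bases, qual = fields[:6]
        quals[(chrom, int(pos))] = qual
    return quals


def baq_changed_positions(with_baq, without_baq):
    """Return sorted positions whose quality strings differ between two pileups."""
    a = parse_pileup(with_baq)
    b = parse_pileup(without_baq)
    return sorted(key for key in a.keys() & b.keys() if a[key] != b[key])


# Made-up example data: default mpileup output vs. output produced with -B.
with_baq = "chr1\t100\tA\t2\t..\tqq\nchr1\t101\tC\t2\tTT\t!!\n"
without_baq = "chr1\t100\tA\t2\t..\tqq\nchr1\t101\tC\t2\tTT\tqq\n"

changed = baq_changed_positions(with_baq, without_baq)
print(changed)
```

If the listed positions are mostly the SNP positions with low qualities, BAQ is the likely cause.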
Thank you, Andreas. The problem was indeed the BAQ computation: it underestimates the qualities. As for our data, they were fine. The text below is just part of the SAM file; some fields that are irrelevant to my question were dropped.
q is ASCII 113. Illumina and, I think, SOLiD qualities range from 64 to 126. So it looks like you need to convert your quality scores to the Phred scale. If I remember correctly, both BWA and Bowtie can be told to convert the quality range of the FASTQ. Then again, I have never worked with SOLiD data.
Current (two-year-old) Illumina and SOLiD systems use the Sanger (+33) encoding. Older Illumina systems were indeed on the +64 scale (not sure about SOLiD), but even on that scale the reported qualities typically end at 41 (the entire scale is not used). The q would correspond to a quality of 49, which makes it a somewhat suspicious value even on the older scale. Tools may also behave strangely once they get codes outside the expected range. There is little error checking, perhaps rightfully so, as it would typically mean billions of wasted checks.
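The arithmetic behind these numbers is easy to check. This is just a small sketch of the character-to-quality conversion under the two offsets mentioned here (+33 and +64); the helper name phred is made up for illustration.

```python
def phred(char, offset):
    """Decode one quality character to a Phred score for the given ASCII offset."""
    return ord(char) - offset


# The two characters quoted in the question.
for ch in "q!":
    print(ch, ord(ch), phred(ch, 33), phred(ch, 64))
# q 113 80 49
# ! 33 0 -31
```

So q decodes to 49 under the +64 offset and to 80 under +33, while ! is quality 0 under +33 and falls below zero under +64.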
Did you convert your base qualities at any point? Did you specify Illumina quality encoding?
Reads are actually SOLiD data. And we didn't convert anything.