As for the mean MAPQ, you could combine samtools view and awk as follows (I chose an arbitrary 1Mb region on chr22):
samtools view aln.bam chr22:20000000-21000000 | \
awk '{sum+=$5} END { print "Mean MAPQ =",sum/NR}'
Also, I've hacked (type 'make' in the nawk dir) an elegant change that Heng Li made of awk to allow named columns for common genomiccs formats such as SAM. You can name the mapq column instead of $5 (the idea being that this makes for more readable and better documented pipelines):
samtools view aln.bam chr22:20000000-21000000 | \
awk -c sam '{sum+=$mapq} END { print "Mean MAPQ =",sum/NR}'
Note that this version of awk will allow the first line of the file to serve as a header for naming the columns (like an R data frame). Just use:
awk -c header
As for the mean BQ, the same concept would work, but you just need a little tool to unwind the ASCII BQ values into Phred scores.
samtools view aln.bam chr22:20000000-21000000 | \
perl -lane '@quals = split(undef, $F[10]); foreach $qual (@quals) {print ord($qual)-33}' | \
awk '{sum+=$0} END { print "Mean BASEQUAL =",sum/NR}'
Or something like that.
+1 for this hack for awk . I love it ! I need it ! I want it in the standard distribution !
@lh3 gets all the credit for that. I just added a bit of laziness to it.
thanks! very helpful