Question

Calculation Of Average Mapping Quality Over A Given Genomic Region

1

Entering edit mode

13.2 years ago

Bioscientist ★ 1.7k

For a given genomic region,how can I calculate:

1.average mapping quality(phred-like) for all mapped reads in the bam file over the region.

2.average sequencing quality(also phred score) for all bases in all mapped reads over the region

Any softwares or packages? I know GATK may do such work, but their website is so confusing and I cannot find the information I need.

Thanks

gatk mapping quality • 13k views

ADD COMMENT • link updated 13.2 years ago by Aaronquinlan 12k • written 13.2 years ago by Bioscientist ★ 1.7k

score 8 · Answer 1 · 2012-01-05

8

Entering edit mode

13.2 years ago

Aaronquinlan 12k

As for the mean MAPQ, you could combine samtools view and awk as follows (I chose an arbitrary 1Mb region on chr22):

samtools view aln.bam chr22:20000000-21000000 | \
    awk '{sum+=$5} END { print "Mean MAPQ =",sum/NR}'

Also, I've hacked (type 'make' in the nawk dir) an elegant change that Heng Li made of awk to allow named columns for common genomiccs formats such as SAM. You can name the mapq column instead of $5 (the idea being that this makes for more readable and better documented pipelines):

samtools view aln.bam chr22:20000000-21000000 | \
    awk -c sam '{sum+=$mapq} END { print "Mean MAPQ =",sum/NR}'

Note that this version of awk will allow the first line of the file to serve as a header for naming the columns (like an R data frame). Just use:

awk -c header

As for the mean BQ, the same concept would work, but you just need a little tool to unwind the ASCII BQ values into Phred scores.

samtools view aln.bam chr22:20000000-21000000 | \ 
   perl -lane '@quals = split(undef, $F[10]); foreach $qual (@quals) {print ord($qual)-33}' | \
   awk '{sum+=$0} END { print "Mean BASEQUAL =",sum/NR}'

Or something like that.

ADD COMMENT • link 13.2 years ago by Aaronquinlan 12k

2

Entering edit mode

+1 for this hack for awk . I love it ! I need it ! I want it in the standard distribution !

ADD REPLY • link 13.2 years ago by Pierre Lindenbaum 165k

2

Entering edit mode

@lh3 gets all the credit for that. I just added a bit of laziness to it.

ADD REPLY • link 13.2 years ago by Aaronquinlan 12k

0

Entering edit mode

thanks! very helpful

ADD REPLY • link 13.2 years ago by Bioscientist ★ 1.7k