Calculation Of Average Mapping Quality Over A Given Genomic Region
1
1
Entering edit mode
12.9 years ago
Bioscientist ★ 1.7k

For a given genomic region,how can I calculate:

1.average mapping quality(phred-like) for all mapped reads in the bam file over the region.

2.average sequencing quality(also phred score) for all bases in all mapped reads over the region

Any softwares or packages? I know GATK may do such work, but their website is so confusing and I cannot find the information I need.

Thanks

gatk mapping quality • 12k views
ADD COMMENT
8
Entering edit mode
12.9 years ago

As for the mean MAPQ, you could combine samtools view and awk as follows (I chose an arbitrary 1Mb region on chr22):

samtools view aln.bam chr22:20000000-21000000 | \
    awk '{sum+=$5} END { print "Mean MAPQ =",sum/NR}'

Also, I've hacked (type 'make' in the nawk dir) an elegant change that Heng Li made of awk to allow named columns for common genomiccs formats such as SAM. You can name the mapq column instead of $5 (the idea being that this makes for more readable and better documented pipelines):

samtools view aln.bam chr22:20000000-21000000 | \
    awk -c sam '{sum+=$mapq} END { print "Mean MAPQ =",sum/NR}'

Note that this version of awk will allow the first line of the file to serve as a header for naming the columns (like an R data frame). Just use:

awk -c header

As for the mean BQ, the same concept would work, but you just need a little tool to unwind the ASCII BQ values into Phred scores.

samtools view aln.bam chr22:20000000-21000000 | \ 
   perl -lane '@quals = split(undef, $F[10]); foreach $qual (@quals) {print ord($qual)-33}' | \
   awk '{sum+=$0} END { print "Mean BASEQUAL =",sum/NR}'

Or something like that.

ADD COMMENT
2
Entering edit mode

+1 for this hack for awk . I love it ! I need it ! I want it in the standard distribution !

ADD REPLY
2
Entering edit mode

@lh3 gets all the credit for that. I just added a bit of laziness to it.

ADD REPLY
0
Entering edit mode

thanks! very helpful

ADD REPLY

Login before adding your answer.

Traffic: 1474 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6