Question

bimodal distribution of bam mapping quality

0


9.0 years ago

irritable_phd_syndrome ▴ 130

I have some RNA-Seq data that I aligned with tophat2. The command I used was:

/path/to/tophat-2.1.0/tophat -p 20 -o ouputdir --library-type fr-firststrand /reference/homo_sapiens/GRCh38/ensembl/Sequence/Bowtie2Index/Homo_sapiens.GRC38 my_trimmed_data.fq.gz

This produced a file, accepted_hits.bam, with 50e6 aligned reads. When I plot a histogram of the mapping quality scores, roughly 27e6 reads have a mapping quality in the range [0-3] and 23e6 reads have a mapping quality of 50. There are _no_ values in between.

How should I interpret this bimodal distribution of mapping quality scores? This seems very strange to me.

alignment tophat RNA-Seq • 2.6k views

updated 9.0 years ago by dariober 15k • written 9.0 years ago by irritable_phd_syndrome ▴ 130

1


9.0 years ago

dariober 15k

Tophat avoids intermediate mapq values by design, but even with other aligners (e.g. bwa) the mapping quality tends to be markedly bimodal. That is, a read is usually either mapped unambiguously to one place or it maps equally well to multiple places, so few reads end up with a mapq in between.
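To check which of the two groups the reads fall into, the MAPQ column (field 5 of each SAM record) can be tallied directly. The following is a minimal sketch, not part of the original thread: the helper names `sam_record`, `mapq_histogram` and `summarize` are made up for illustration, the sample records are invented, and the cutoff of 50 for "unambiguous" is taken from the value reported in the question.

```python
from collections import Counter


def sam_record(name, flag, mapq):
    """Build a minimal, made-up SAM record (11 mandatory tab-separated fields)."""
    return "\t".join(
        [name, str(flag), "1", "100", str(mapq), "4M", "*", "0", "0", "ACGT", "IIII"]
    )


def mapq_histogram(sam_lines):
    """Count alignment records per MAPQ value.

    Header lines (starting with '@') and unmapped records (FLAG bit 0x4)
    are skipped. Every remaining record is counted, so a read reported at
    several locations contributes one count per reported alignment.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:
            continue
        counts[int(fields[4])] += 1
    return counts


def summarize(counts, unique_mapq=50):
    """Split the histogram into the two modes.

    unique_mapq=50 is an assumption based on the value seen in the question.
    """
    unique = sum(n for q, n in counts.items() if q >= unique_mapq)
    multi = sum(n for q, n in counts.items() if q < unique_mapq)
    return {"unique": unique, "multi": multi}


# Invented example records, for illustration only.
SAMPLE = [
    "@HD\tVN:1.0",
    sam_record("r1", 0, 50),
    sam_record("r2", 16, 50),
    sam_record("r3", 0, 3),
    sam_record("r4", 256, 1),
    sam_record("r5", 0, 0),
    sam_record("r6", 4, 0),  # unmapped, ignored
]

hist = mapq_histogram(SAMPLE)
for mapq in sorted(hist):
    print(mapq, hist[mapq])
print(summarize(hist))
```

On real data the same functions could be fed the text output of `samtools view accepted_hits.bam`, one line per record.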


score 3 · Accepted Answer · 2016-03-24

3


9.0 years ago

John 13k

This is normal :)

Tophat2 Mapping Qualities
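As a rough illustration of why only a handful of values show up, here is a sketch of the convention commonly described for Tophat2: 50 for a read with a single alignment, and low values for multi-mapped reads that coincide with -10*log10(1 - 1/n) truncated to an integer, where n is the number of locations. This convention is an assumption here, it is not stated in this thread, and the function name `mapq_from_hit_count` is hypothetical.

```python
import math


def mapq_from_hit_count(n_hits, unique_mapq=50):
    """Return the MAPQ expected for a read with n_hits alignment locations.

    Assumption (not confirmed by this thread): a unique hit gets
    unique_mapq, and multi-mapped reads get -10*log10(1 - 1/n_hits)
    truncated to an integer.
    """
    if n_hits < 1:
        raise ValueError("n_hits must be at least 1")
    if n_hits == 1:
        return unique_mapq
    return int(-10 * math.log10(1 - 1 / n_hits))


for n in (1, 2, 3, 4, 5, 10):
    print(n, mapq_from_hit_count(n))
```

Under this assumption the only possible values are 50, 3, 1 and 0, which is consistent with the histogram in the question having nothing between 3 and 50.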
