All the reads aligned using STAR have low mapping quality (0-3). What is happening?
0
1
Entering edit mode
8.3 years ago
kirannbishwa01 ★ 1.6k

I used STAR for mapping my RNAseq data using STAR. I had used 2 pass mapping and had supplied GTF files. I am also interested in how the reads gets aligned to transcriptome. So, I used the following script during 2nd pass mapping and the script for 1st pass mapping is similar:

STAR --runThreadN 16 --runMode alignReads --genomeDir 2nDpassGenomeDir --readFilesIn Trm_PE-2ms01e_R1.fastq Trm_PE-2ms01e_R2.fastq --outFileNamePrefix 2nDpassAlignment/2ms01e/2ms01e --outFilterMultimapNmax 10 --outSAMmapqUnique Integer0to255 --outSAMtype BAM SortedByCoordinate --outReadsUnmapped Fastx --outSAMattributes All --alignIntronMin 10 --quantMode TranscriptomeSAM GeneCounts

  • When I look for the aligned bam file from STAR on IGV I was surprised to see that all the reads have very low mapping quality (0-3), infact reads with mapQ 0 are above 80%. The rest are mapQ 1 and 3. And this is the case for every gene.

  • I also ran awk to capture any reads with mapping quality above 5 and there were none.

  • Also, the bam file generated by aligning RNAseq reads to the transcriptome has lots of read with mapping quality 0 (zero).

For confirmation I compared the aligned genome sequence data (aligned using BWA from several samples) at the same regions for several genes and I see that there are lost of reads with mapQ above 40. Also, the reads from RNAseq and Genome sequence data share several variants at the same site.

What is causing the reads from RNAseq alignment using STAR to the same position to have such a low mapping quality?

Is something wrong with the script?

Is something different with how STAR assigns mapping quality?

or, is there some other problem?

Thanks in advance, - B

mapping quality STAR BWA alignment • 12k views
ADD COMMENT
1
Entering edit mode

Did you take a small sample of reads and blasted them at NCBI to make sure they look correct (right genome)?

ADD REPLY
0
Entering edit mode

Genome is the right one. Its the same genome I used to align the genome resequence data using BWA.

The genome reseq data aligned by BWA at the same position have high mapping quality (above 40).

ADD REPLY
0
Entering edit mode

According to a recent blog post by Dr. Simon Andrews, STAR uses the same MAPQ scores as TopHat which would mean that all those reads are multi-mapping which explains why they have MAPQ scores of 0-3 (mapping from 2 to 10 or more locations).

ADD REPLY
0
Entering edit mode

Thanks genomax2. But, there should still be some uniquely mapping reads when I am using the whole genome. There is not one read that aligned and has mapping quality above 3 (this is the highest).

I am little surprised and confused why should the scoring of mapQ value be so different.

ADD REPLY
1
Entering edit mode

Don't literally use Integer0to255 as the value for --outSAMmapqUnique. Either use a single value, e.g., --outSAMmapqUnique 50, or don't bother with the argument (the default is 255). It's extremely likely that Integer0to255 caused the "unique alignment" MAPQ to be set to 0.

ADD REPLY
0
Entering edit mode

I think that could be the case. I will change the value for --outSAMmapqUnique and will update it on the forum.

Thanks,

ADD REPLY
0
Entering edit mode

Good observation. I see in the code where the integer range is tested but not the argument type. I don't know enough C/C++ to say what the behaviour would be in this case but your comment makes sense to me (and I'm sure you know more about that).

ADD REPLY
0
Entering edit mode

I won't profess to being a C++ aficionado, so I'm also not sure exactly how that's getting handled. I'm curious if this solves the problem though.

ADD REPLY
1
Entering edit mode

@Devon @ SES @ Wouter: Using --outSAMmapqUnique 60 solved the issue. But, there are reads with either mapQ value 60 or 0, 1, 2, 3. I hope STAR will do a better job of assigning mapping quality.

Also, it is stated in STAR manual to use --outSAMmapqUnique Interger0to255 for downstream GATK compatibility. I think this statement could be update to avoid any confusion.

The statement could be better written as --outSAMmapqUnique X - where X is an desired integer value from 0 to 255.

Personally, for me --outSAMmapqUnique Interger0to255 is confusing.

ADD REPLY
0
Entering edit mode

The manual also states (page 26):

--outSAMmapqUnique

default: 255

int: 0 to 255: the MAPQ value for unique mappers

ADD REPLY
0
Entering edit mode

What does the quality distribution of the FASTQ data look like? If that is good then you can debug the alignments. If not, then you'll want to go back and figure out what happened with the trimming or even with the library prep if the raw data is no good.

ADD REPLY
0
Entering edit mode

The FastQ data quality looks good. There were some enriched kmers even after quality trimming but not too much. But, based on the previous discussion on biostars I used the data (after it was trimmed).

I am posting some of the quality checks from fastqc after the data was trimmed.

per base seq quality

ADD REPLY
0
Entering edit mode

Could you have a look at the mapping qualities in the first pass using STAR?

ADD REPLY
0
Entering edit mode

Its the same. I think the problem is with the scripts.

ADD REPLY

Login before adding your answer.

Traffic: 1796 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6