After STAR two-pass alignment, 62% of my reads are uniquely mapped, but featureCounts reports a successfully assigned rate of only 17%.
featureCounts -T 8 -F GTF -p --countReadPairs -t exon -g gene_id -a ~/genome_ref/gencode.v38.annotation.gtf -o ~/expression/all_counts.txt *.bam
I also looked at the other samples in this dataset, and many of them have an assigned rate of around 20%, yet the QC report looks fine. According to the library preparation protocol, the library is unstranded. I also checked the BAM files in IGV, and the read distribution looks normal (although I don't know how to get an overview of the read peaks as described in this answer).
When I analyzed a different dataset previously, this rate was around 70%.
I have 2 questions:
Does a low assigned rate severely affect gene expression quantification? In other words, can these data be used for downstream analysis?
Why does this happen? Is there any solution or explanation?
Thank you.
featureCounts should also write a summary file (all_counts.txt.summary next to the -o output). What does it look like?
There are a lot of multi-mapping reads.
For instance, Assigned is 39054053 and Unassigned_MultiMapping is 174085844, giving a successfully assigned rate of 17.29745159%.
Every sample in this dataset has more multi-mapping reads than assigned reads.
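To check this across all samples at once, the summary table can be parsed and the Assigned and Unassigned_MultiMapping fractions computed per BAM. The sketch below assumes the summary is the tab-separated table featureCounts writes (a header row starting with "Status", one column per BAM, one row per category) and that the percentage featureCounts reports is Assigned divided by the sum of all category rows. The function names `parse_summary` and `rates` and the toy counts are hypothetical.

```python
import csv
import io


def parse_summary(text):
    """Return {sample: {category: count}} from a featureCounts .summary table.

    Assumes a header row "Status<TAB>sample1.bam<TAB>..." followed by one
    row per category (Assigned, Unassigned_MultiMapping, ...).
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], [r for r in rows[1:] if r]
    samples = header[1:]
    table = {s: {} for s in samples}
    for row in body:
        category, counts = row[0], row[1:]
        for sample, value in zip(samples, counts):
            table[sample][category] = int(value)
    return table


def rates(table):
    """Fraction of all counted records per sample that are Assigned / multi-mapping."""
    out = {}
    for sample, cats in table.items():
        total = sum(cats.values())
        out[sample] = {
            "assigned": cats.get("Assigned", 0) / total if total else 0.0,
            "multimapping": cats.get("Unassigned_MultiMapping", 0) / total if total else 0.0,
        }
    return out


if __name__ == "__main__":
    # Hypothetical toy summary; for real data use
    # open("all_counts.txt.summary").read() instead.
    example = (
        "Status\tsample1.bam\n"
        "Assigned\t20\n"
        "Unassigned_MultiMapping\t70\n"
        "Unassigned_NoFeatures\t10\n"
    )
    for sample, r in rates(parse_summary(example)).items():
        print(f"{sample}: assigned {r['assigned']:.1%}, multi-mapping {r['multimapping']:.1%}")
```

If Unassigned_MultiMapping dominates in every sample, as described here, the low rate points at how multi-mapping reads are handled rather than at a problem with the uniquely mapped reads.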
Check your annotation file and how featureCounts handles a read that overlaps two or more features. If your reads do not map to unique exons (or unique gene_id values), that could explain the low assigned rate.
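A rough way to reason about these categories is a toy model of the gene-level decision for a single read. The sketch below is a simplification, not featureCounts' actual implementation: it assumes a read with NH > 1 is Unassigned_MultiMapping unless multi-mapping counting is enabled (loosely mirroring -M), a read overlapping no exon is Unassigned_NoFeatures, and a read overlapping exons of more than one gene_id is Unassigned_Ambiguity unless multi-overlap is allowed (loosely mirroring -O). Exons of the same gene collapse into one meta-feature, as with `-t exon -g gene_id`. The `Exon` class, `classify` function and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    gene_id: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def classify(read_start, read_end, nh, exons,
             count_multimapping=False, allow_multi_overlap=False):
    """Toy gene-level assignment of one read on a single chromosome/strand.

    Returns (status, genes). Simplified sketch, not featureCounts' real logic.
    """
    # Reads with NH > 1 are excluded unless multi-mapping counting is enabled.
    if nh > 1 and not count_multimapping:
        return "Unassigned_MultiMapping", set()
    # Collect distinct genes (meta-features) whose exons overlap the read.
    genes = {e.gene_id for e in exons if e.start <= read_end and read_start <= e.end}
    if not genes:
        return "Unassigned_NoFeatures", set()
    # Overlapping more than one gene is ambiguous unless multi-overlap is allowed.
    if len(genes) > 1 and not allow_multi_overlap:
        return "Unassigned_Ambiguity", set()
    return "Assigned", genes


if __name__ == "__main__":
    # Hypothetical annotation: two genes whose exons overlap on 150-200.
    exons = [Exon("geneA", 100, 200), Exon("geneB", 150, 300)]
    print(classify(160, 190, 1, exons))  # overlaps both genes
    print(classify(110, 140, 3, exons))  # NH=3, multi-mapping
    print(classify(400, 450, 1, exons))  # no overlapping exon
```

In this model, overlapping gene annotations show up as Unassigned_Ambiguity, while reads STAR reports at several loci show up as Unassigned_MultiMapping regardless of the annotation.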