Question

featureCounts has low successfully assigned reads

0

3.2 years ago

tomas4482 ▴ 430

After finishing STAR two-pass alignment, I got 62% uniquely mapped reads, but featureCounts reports only 17% successfully assigned reads.

featureCounts -T 8 -F GTF -p --countReadPairs -t exon -g gene_id -a ~/genome_ref/gencode.v38.annotation.gtf -o ~/expression/all_counts.txt *.bam

I also looked at the other samples in this dataset. Many of them have a successfully assigned rate of around 20%, but the QC reports look fine. According to the library preparation protocol, the library is unstranded. I also checked the BAM file in IGV, and the read distribution seems normal (although I don't know how to get an overview of the read peaks as described in this answer).

Previously, when I analyzed another dataset, this rate was around 70%.

I have two questions:

1. Does a low assigned rate severely affect the quantification of gene expression? In other words, can these data be used for downstream analysis?
2. Why does this happen? Are there any solutions or explanations?

Thank you.

RNA-seq featureCounts • 3.1k views

ADD COMMENT • link updated 17 months ago by Juan Pablo • 0 • written 3.2 years ago by tomas4482 ▴ 430

0

featureCounts should output a summary file. What does it look like?

ADD REPLY • link 3.2 years ago by Rogerio Ribeiro ▴ 110

0

There are a lot of multi-mapping reads.

For instance, Assigned is 39054053 and Unassigned_MultiMapping is 174085844, giving a successfully assigned rate of 17.29745159%.

All samples in this dataset have more multi-mapping reads than assigned reads.
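
For anyone who wants to reproduce this kind of figure, here is a minimal sketch that computes the assigned and multi-mapping percentages for every sample from the featureCounts summary file. It assumes the usual tab-separated layout (a header row starting with Status, then one row per category and one column per BAM file). The function name assigned_rate is made up for illustration.

```shell
# Hypothetical helper: per-sample percentages from a featureCounts .summary file.
# Assumes a tab-separated file: header "Status<TAB>sample1<TAB>sample2...",
# followed by one row per category (Assigned, Unassigned_MultiMapping, ...).
assigned_rate() {
  awk -F'\t' '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; n = NF; next }
    {
      for (i = 2; i <= NF; i++) {
        total[i] += $i                                   # sum over all categories
        if ($1 == "Assigned") assigned[i] = $i
        if ($1 == "Unassigned_MultiMapping") multi[i] = $i
      }
    }
    END {
      # Output columns: sample, % assigned, % unassigned multi-mapping
      for (i = 2; i <= n; i++)
        if (total[i] > 0)
          printf "%s\t%.2f\t%.2f\n", name[i], 100 * assigned[i] / total[i], 100 * multi[i] / total[i]
    }
  ' "$1"
}
```

With the command from the question, the call would be something like assigned_rate ~/expression/all_counts.txt.summary. The denominator is the sum of all categories in the summary, not only the two quoted above.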

ADD REPLY • link 3.2 years ago by tomas4482 ▴ 430

0

Check your annotation file and how featureCounts handles a read that overlaps two or more features. If your reads do not map to unique exons (or unique gene IDs), that could explain the low assigned rate.
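
By default featureCounts does not count multi-mapping reads, and it does not count reads that overlap more than one gene. One way to see how much of the gap this explains is to rerun the original command with -M (count multi-mapping reads) and --fraction (each reported alignment contributes a fractional count). The sketch below is only a diagnostic. The wrapper name fc_with_multimappers and the RUN variable are made up for illustration. It prints the command and only executes it when RUN=1.

```shell
# Hypothetical dry-run wrapper around the command from the question,
# with -M --fraction added so multi-mapping reads are counted fractionally.
# Usage: fc_with_multimappers ANNOTATION.gtf OUTPUT.txt file1.bam [file2.bam ...]
fc_with_multimappers() {
  gtf="$1"; out="$2"; shift 2
  # Rebuild the argument list: the remaining arguments are the BAM files.
  set -- featureCounts -T 8 -F GTF -p --countReadPairs -t exon -g gene_id \
    -M --fraction -a "$gtf" -o "$out" "$@"
  echo "$*"                                   # always show what would be run
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi     # execute only when RUN=1
}
```

If the assigned rate rises sharply with -M, the unassigned reads are mostly multi-mappers that do fall inside annotated exons. Adding -O would similarly test reads that overlap several genes. Fractional counts are not integers, so check what your downstream tool expects before using them.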

ADD REPLY • link 3.2 years ago by Rogerio Ribeiro ▴ 110

score 1 · Answer 1 · 2021-10-22

1

3.2 years ago

tomas4482 ▴ 430

I found the reason: overrepresented sequences.

I ran some tests. No rRNA contamination. No adapter duplication.

Hence it should be a problem with library construction.
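
The tests themselves are not described above. As one possible way to look for overrepresented sequences, the sketch below tallies identical read sequences in a FASTQ file and lists the most frequent ones, which can then be searched against rRNA or adapter references. It assumes standard four-line FASTQ records, and the function name top_sequences is made up for illustration.

```shell
# Hypothetical helper: list the N most frequent read sequences in a FASTQ file.
# Assumes four-line FASTQ records; accepts plain or gzip-compressed input.
# Usage: top_sequences reads.fastq.gz 10
top_sequences() {
  fq="$1"; n="${2:-10}"
  case "$fq" in
    *.gz) gzip -dc "$fq" ;;
    *)    cat "$fq" ;;
  esac |
    awk 'NR % 4 == 2' |   # keep only the sequence line of each record
    sort | uniq -c |      # count identical sequences
    sort -k1,1nr |        # most frequent first
    head -n "$n"
}
```

On a full RNA-seq file the sort step can be slow and memory-hungry, so running it on the first few million reads is usually enough to spot sequences that dominate the library.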

ADD COMMENT • link 3.2 years ago by tomas4482 ▴ 430

0

Hello Tomas

I have the same problem: low assignment after running featureCounts. I remember that when I was checking the QC reports, I had a high number of overrepresented sequences.

The question is: how did you figure out that this was the problem? And, more importantly, does it affect the gene expression analysis?

Kind regards

ADD REPLY • link 17 months ago by Juan Pablo • 0

score 0 · Answer 2 · 2021-10-21

0

3.2 years ago

swbarnes2 14k

Only 20% aligned? RNASeq should work better than that. Are you sure you are aligning to the right thing?

ADD COMMENT • link 3.2 years ago by swbarnes2 14k

0

The 20% means featureCounts could assign only 20% of the reads: the unique ones, not the multi-mapping or ambiguous reads. It is not the same as the STAR alignment rate.

ADD REPLY • link 3.2 years ago by tomas4482 ▴ 430