Question

featureCounts results discordance with library information


6.6 years ago

Hughie ▴ 30

Hi everyone,
I'm counting reads and found something strange:

I ran RSeQC's infer_experiment.py and got the result below, which shows that my data is clearly reverse-stranded.
So I ran featureCounts with -s 2 for counting and got a low assignment rate:

So I tried -s 0 and -s 1 for comparison, and both show a high assignment rate:
-s 0:

-s 1:
Here is the STAR mapping result as well:
(After mapping, I filtered the alignments with samtools view -f 2 -F 256.)
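For reference, the two samtools flags in the filter above can be decoded bit by bit. The sketch below is illustrative only: the helper names `decode_flag` and `passes_filter` are made up for this example, and the bit names follow the SAM specification. It suggests that `-F 256` drops secondary alignment records but keeps the primary record of a multi-mapped read, so a counter that detects multi-mapping from the NH tag may still report such reads as multi-mapping.

```python
# Bit values of the SAM FLAG field, as defined in the SAM specification.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def passes_filter(flag, require=2, exclude=256):
    """Mimic `samtools view -f require -F exclude` for a single FLAG value.

    -f keeps a record only if ALL required bits are set;
    -F drops a record if ANY excluded bit is set.
    """
    return (flag & require) == require and (flag & exclude) == 0


# Hypothetical FLAG values, chosen only to illustrate the filter.
primary_record = 99                      # paired, proper pair, first in pair
secondary_record = primary_record | 256  # same record marked as secondary

print(decode_flag(primary_record), passes_filter(primary_record))
print(decode_flag(secondary_record), passes_filter(secondary_record))
```

Under these assumptions the primary record passes the filter and the secondary one does not, regardless of how many locations the read aligned to.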

Question:
1. Is there anything wrong with my argument settings?
2. Regardless of the -s setting, why does featureCounts always report a low assignment rate? Why is the total number of reads in featureCounts so large? And even though I filtered out multi-mapped reads with samtools beforehand, why does the featureCounts summary still show so many reads as multiply assigned?

Thank you very much for reading and for any suggestions!

RNA-Seq • 1.9k views


STAR has a --quantMode GeneCounts option, which should output counts using the same method as htseq-count. And why do you filter the BAM file before counting?
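As a rough illustration of how the STAR output could be used as a cross-check: with --quantMode GeneCounts, STAR writes a ReadsPerGene.out.tab file whose columns are gene ID, unstranded counts, counts for read 1 on the same strand as the RNA, and counts for read 2 on the same strand as the RNA (the reverse-stranded case), preceded by four N_ summary rows. The sketch below sums each column; the function name `column_totals` and the toy numbers are invented for this example and are not real data.

```python
import csv
import io


def column_totals(handle):
    """Sum the three count columns of a STAR ReadsPerGene.out.tab file.

    Rows whose first field starts with "N_" (N_unmapped, N_multimapping,
    N_noFeature, N_ambiguous) are summary rows and are skipped.
    """
    totals = {"unstranded": 0, "forward": 0, "reverse": 0}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("N_"):
            continue
        totals["unstranded"] += int(row[1])
        totals["forward"] += int(row[2])
        totals["reverse"] += int(row[3])
    return totals


# Toy table with made-up numbers, shaped like ReadsPerGene.out.tab.
toy_table = (
    "N_unmapped\t10\t10\t10\n"
    "N_multimapping\t5\t5\t5\n"
    "N_noFeature\t20\t300\t25\n"
    "N_ambiguous\t3\t0\t1\n"
    "geneA\t100\t2\t98\n"
    "geneB\t50\t1\t49\n"
)

totals = column_totals(io.StringIO(toy_table))
print(totals)
```

In a reverse-stranded library one would expect the "reverse" total to be close to the "unstranded" total and the "forward" total to be small, which can then be compared against the -s 0, -s 1 and -s 2 runs of featureCounts.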

why featureCounts always got low assigned rate?

It may be due to poor annotation. What is the organism? What are the genome and annotation versions?

Why the total reads in featureCounts are so many?

Are you telling featureCounts you have paired reads?

And even after I filtered multiple mapping reads using samtools before, I got so many multiple assigned showed in featureCounts' summary?

Overlapping features?

6.6 years ago by h.mon 35k


Thank you for your kind reply, h.mon!
I normally remove multi-mapped reads after mapping with samtools because I never use them in downstream analysis. I think HTSeq would also discard them.
What is the organism? What are the genome and annotation versions?
I used the mm10 reference genome from Ensembl and its corresponding GTF annotation.
I think the low assignment rate is due to reads that fall in intronic and intergenic regions. And when I use unstranded mode, a read gets counted twice.
Thank you again!
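One way to check the intronic/intergenic explanation is to break down the featureCounts .summary file by status: reads that overlap no annotated feature are reported as Unassigned_NoFeatures, and reads overlapping more than one feature as Unassigned_Ambiguity. The sketch below assumes a single-sample summary (a "Status" header line followed by tab-separated status and count); the function name `summary_fractions`, the file name in the header, and all numbers are made up for illustration.

```python
def summary_fractions(text):
    """Return {status: fraction of all reads} from a featureCounts .summary.

    Assumes one sample column: each line after the header is
    "<status>\t<count>".
    """
    rows = [line.split("\t") for line in text.strip().splitlines()[1:]]
    counts = {status: int(value) for status, value in rows}
    total = sum(counts.values())
    if total == 0:
        return {status: 0.0 for status in counts}
    return {status: count / total for status, count in counts.items()}


# Made-up summary, not taken from the runs discussed in this thread.
toy_summary = (
    "Status\tsample.bam\n"
    "Assigned\t600\n"
    "Unassigned_MultiMapping\t50\n"
    "Unassigned_NoFeatures\t300\n"
    "Unassigned_Ambiguity\t50\n"
)

fractions = summary_fractions(toy_summary)
for status, fraction in fractions.items():
    print(f"{status}\t{fraction:.1%}")
```

If Unassigned_NoFeatures dominates the unassigned reads under -s 2 but not under -s 0, that would point to a strandedness or annotation issue rather than to intronic and intergenic reads alone.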

6.6 years ago by Hughie ▴ 30