I am new to RNA-seq analysis of paired end read data and I was performing read counting using featureCounts. The use of -O flag increases the number of assigned reads drastically (difference of more than 20 million assigned reads). I don't quite understand the use of this option. In the manual it says that it "Assign reads to all their overlapping meta-features (or features if -f is specified)". Please can someone elaborate on this. Thanks in advance
By default, featureCounts does not count multi-overlapping reads
But after specifying -O flag
each overlapping meta-feature/feature receives a count of 1 from a read
So first, multi-overlapping reads will be counted; Second, more features overlapped by one read, more times it will be counted. I think that's why the number of assigned reads drastically
Sounds like a large proportion of your reads map to multiple genes. Do you have rRNA contamination in your data? If so those exist in hundreds of genes and would amplify the total counts (since each feature receives a count). Try doing a log-log count plots of the two quantifications to see the real difference.
Alternatively featureCounts support fractional counting of multi-mappers (so 1/N reads are assigned where N is the number of matches).
But if you are interested in quantification you might want to read this - basically featureCounts might not be the better option for you...
But then why STAR statistics tell a different story? Over 90% reads had mapped uniquely and only about 3% were muti-mapping.
Number of input reads | 32604356
Average input read length | 300
UNIQUE READS:
Uniquely mapped reads number | 30430485
Uniquely mapped reads % | 93.33%
Average mapped length | 297.43
Number of splices: Total | 38026407
Number of splices: Annotated (sjdb) | 37697238
Number of splices: GT/AG | 37644236
Number of splices: GC/AG | 296935
Number of splices: AT/AC | 38377
Number of splices: Non-canonical | 46859
Mismatch rate per base, % | 0.47%
Deletion rate per base | 0.01%
Deletion average length | 1.84
Insertion rate per base | 0.01%
Insertion average length | 1.71
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 1060773
% of reads mapped to multiple loci | 3.25%
Number of reads mapped to too many loci | 7889
% of reads mapped to too many loci | 0.02%
UNMAPPED READS:
I had previously done this analysis with DEXSeq without any difficulties but then I switched to different tools because only very few exons showed significant difference between 2 experimental conditions.
Multiple mapping reads are not the same thing as the reads overlapping multiple features, a read can be mapped to a unique position but there could be some overlapping transcripts which caused this read overlapped by multiple transcripts (features)
Sounds like a large proportion of your reads map to multiple genes. Do you have rRNA contamination in your data? If so those exist in hundreds of genes and would amplify the total counts (since each feature receives a count). Try doing a log-log count plots of the two quantifications to see the real difference.
Alternatively featureCounts support fractional counting of multi-mappers (so 1/N reads are assigned where N is the number of matches).
But if you are interested in quantification you might want to read this - basically featureCounts might not be the better option for you...
But then why STAR statistics tell a different story? Over 90% reads had mapped uniquely and only about 3% were muti-mapping.
I had previously done this analysis with DEXSeq without any difficulties but then I switched to different tools because only very few exons showed significant difference between 2 experimental conditions.
Multiple mapping reads are not the same thing as the reads overlapping multiple features, a read can be mapped to a unique position but there could be some overlapping transcripts which caused this read overlapped by multiple transcripts (features)
If you want more help from me I would need to see the log-log plot of the two quantifications.