Hi all, I have a question about the output of featureCounts (from the Subread package). The total number of my input read pairs is 47168870 (reported by FastQC and STAR):
> Number of input reads | 47168870
> Average input read length | 152
> UNIQUE READS:
> Uniquely mapped reads number | 37604677
> Uniquely mapped reads % | 79.72%
> Average mapped length | 149.39
> Number of splices: Total | 16519867
> Number of splices: Annotated (sjdb) | 0
But the total number of fragments reported by featureCounts is 82845035, almost twice the number of input read pairs. The number of SAM alignment pairs reported by htseq-count is 81037190.
(featureCounts)
> Total fragments : 82845035
> Successfully assigned fragments : 32386307 (39.1%)
(htseq-count)
> 81000000 SAM alignment record pairs processed. Warning: Mate pairing
> was ambiguous for 965089 records; mate key for first such record:
> 81037190 SAM alignment pairs processed.
The GFF file I used for counting is gencode.v24.annotation.gff3.
My question is: what is the definition of a fragment in the featureCounts report, and why are there more fragments than input read pairs? In my understanding, each read pair represents one fragment, so the total number of fragments and the total number of read pairs should be equal.
Thank you in advance for your time.
ewre
Did you use the -p option to count fragments instead of reads?
Yes: featureCounts -p -s 2 -T 5 -a gencode.v24.annotation.gff3 -t exon -g gene_id -o sample.out sample.bam
Ewre, I have the same situation as you. If you found the answers to your questions, could you kindly share them?
If you isolate the read names of all the reads that have mapped, and then sort | uniq them, how many do you get? Is there a chance that reads are aligning to multiple positions?
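The check suggested above could be sketched roughly as follows. This is only a minimal sketch: the function name count_unique_mapped_names is made up for illustration, and it assumes SAM text on standard input (for example from samtools view), with the read name in column 1 and the FLAG in column 2. Records with the "unmapped" FLAG bit (0x4) set are skipped.

```shell
# Hypothetical helper: count distinct read names among mapped SAM records.
# Reads SAM text on stdin; header lines (starting with '@') are ignored.
count_unique_mapped_names() {
  awk -F '\t' '
    /^@/ { next }                        # skip SAM header lines
    int($2 / 4) % 2 == 0 { print $1 }    # FLAG bit 0x4 clear: record is mapped
  ' | sort -u | awk 'END { print NR }'   # one line per distinct read name
}

# Example usage (assumes samtools is installed and sample.bam is the STAR output):
#   samtools view sample.bam | count_unique_mapped_names
```

For paired-end data both mates share one read name, so this number approximates the count of distinct mapped pairs. If it comes out well below the number of alignment pairs reported by the counting tools, that would be consistent with some reads being reported at more than one position, though it does not prove it.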