Hello everyone!
I can't understand why my featureCounts summary differs so much from the RNA STAR one. Both were run in Galaxy. I used the default parameters (which I can't add here right now because Galaxy is returning a Bad Gateway error, but I will add them ASAP). The subsequent PCA plot is not that good, so I suspect that something is wrong with the counting.
featureCounts reported 3681407 Unassigned_Unmapped reads, whereas RNA STAR reported only 663296 reads as unmapped (too many mismatches).
What could be the reason?
An example featureCounts summary:
Status RNA STAR on data 56, data 38, and data 37: mapped.bam (8,5 weeks, sample 3)
Assigned 4705542
Unassigned_Unmapped 3681407
Unassigned_MappingQuality 0
Unassigned_Chimera 0
Unassigned_FragmentLength 0
Unassigned_Duplicate 0
Unassigned_MultiMapping 1557399
Unassigned_Secondary 0
Unassigned_NonSplit 0
Unassigned_NoFeatures 5861961
Unassigned_Overlapping_Length 0
Unassigned_Ambiguity 193083
The RNA STAR result for the same sample:
Number of input reads | 14855335
Average input read length | 478
UNIQUE READS:
Uniquely mapped reads number | 10732876
Uniquely mapped reads % | 72.25%
Average mapped length | 472.17
Number of splices: Total | 8059241
Number of splices: Annotated (sjdb) | 8025445
Number of splices: GT/AG | 7965722
Number of splices: GC/AG | 72373
Number of splices: AT/AC | 4974
Number of splices: Non-canonical | 16172
Mismatch rate per base, % | 0.44%
Deletion rate per base | 0.01%
Deletion average length | 1.24
Insertion rate per base | 0.01%
Insertion average length | 2.05
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 441052
% of reads mapped to multiple loci | 2.97%
Number of reads mapped to too many loci | 67940
% of reads mapped to too many loci | 0.46%
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 663296
% of reads unmapped: too many mismatches | 4.47%
Number of reads unmapped: too short | 2921806
% of reads unmapped: too short | 19.67%
Number of reads unmapped: other | 28365
% of reads unmapped: other | 0.19%
CHIMERIC READS:
Number of chimeric reads | 467140
% of chimeric reads | 3.14%
Any help is highly appreciated!
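As a quick sanity check on the numbers quoted above, the short Python sketch below adds up all three "unmapped" categories from the STAR log rather than only the "too many mismatches" line. It also adds the "too many loci" reads, on the assumption (not confirmed here, since the Galaxy parameters are not available) that STAR writes those reads to the BAM as unmapped records. All values are copied from this post; the variable names are just illustrative.

```python
# Values copied from the STAR log and the featureCounts summary in this post.
star_unmapped = {
    "too many mismatches": 663296,
    "too short": 2921806,
    "other": 28365,
}
star_too_many_loci = 67940  # assumption: these end up flagged as unmapped in the BAM
featurecounts_unassigned_unmapped = 3681407

# Sum of the three categories STAR itself labels as unmapped.
star_unmapped_total = sum(star_unmapped.values())

# Same sum plus the reads mapped to too many loci.
star_total_with_too_many_loci = star_unmapped_total + star_too_many_loci

print(star_unmapped_total)
print(star_total_with_too_many_loci == featurecounts_unassigned_unmapped)
```

For this sample the second sum matches the featureCounts Unassigned_Unmapped value, which suggests the two tools agree on the unmapped reads and the apparent gap comes from comparing against a single STAR category.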
STAR maps reads to the genome; featureCounts assigns the mapped reads to the features in the GTF. You are comparing apples with pears. I do not see anything suspicious. Even if your reads map to the genome, you can have, e.g., genomic DNA contamination, which will result in a lower assignment rate in the featureCounts output. If you make statements about PCA plots etc., then please add an image. "Not that good" is not a very informative description.
Sure. The PCA: https://ibb.co/hRMpy5B The samples are just not well grouped.
Thank you very much for your comment. Indeed, I was looking in the wrong direction.
It is also rather strange that featureCounts reports a much higher Unassigned_MultiMapping number. I understand that the GTF file can cover far fewer sequences than the reference genome, so many reads are simply not assigned. But why is there such an increase in multi-mapped reads?
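One possible explanation, offered here as an assumption rather than a confirmed diagnosis: STAR's log counts each multi-mapping read once, while the featureCounts summary may be tallying every alignment record of such a read separately. If that is the case, the ratio of the two numbers is roughly the average number of reported alignments per multi-mapped read. A minimal Python sketch using the values from this thread (variable names are illustrative):

```python
# Values copied from the STAR log and the featureCounts summary in this thread.
star_multimapped_reads = 441052              # "Number of reads mapped to multiple loci"
featurecounts_unassigned_multimapping = 1557399

# Under the assumption that featureCounts counts alignment records, not reads,
# this ratio approximates the mean number of alignments per multi-mapped read.
ratio = featurecounts_unassigned_multimapping / star_multimapped_reads

print(f"{ratio:.2f} alignments per multi-mapped read")
```

A value between 2 and the aligner's multi-mapping limit would be consistent with this reading; checking the NH tag of a few records in the BAM would be one way to confirm it.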