Hi,
I am working on a set of RNA-seq samples. I am using featureCounts from the Subread package to count fragments (not reads) falling into genomic features, with the command-line option --ignoreDup to exclude duplicate reads. The results seem somewhat strange...
Here is the summary file provided by featureCounts for two samples:
Status sample1.bam
Assigned 12019290
Unassigned_Ambiguity 0
Unassigned_MultiMapping 7794471
Unassigned_NoFeatures 16908358
Unassigned_Unmapped 0
Unassigned_MappingQuality 0
Unassigned_FragementLength 0
Unassigned_Chimera 0
Unassigned_Secondary 0
Unassigned_Nonjunction 0
Unassigned_Duplicate 68111548
Status sample2.bam
Assigned 48247506
Unassigned_Ambiguity 0
Unassigned_MultiMapping 15519394
Unassigned_NoFeatures 67192231
Unassigned_Unmapped 0
Unassigned_MappingQuality 0
Unassigned_FragementLength 0
Unassigned_Chimera 0
Unassigned_Secondary 0
Unassigned_Nonjunction 0
Unassigned_Duplicate 0
All samples were processed in the same way, and the same commands were used to run featureCounts. Why does one sample have 0 unassigned duplicates and the other 68111548? This difference seems so black and white that I am afraid there is an error somewhere. What exactly does it mean to have unassigned duplicates?
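To put the two summaries side by side, the share of fragments in each category can be computed from the summary text. The following is only a rough sketch: the function names `parse_summary` and `fraction` are made up for illustration, and it assumes a single-sample summary with one status and one count per line, as shown above.

```python
def parse_summary(text):
    """Parse a single-sample featureCounts summary into {status: count}."""
    counts = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the "Status <file>" header and anything that is not "name count".
        if len(fields) != 2 or fields[0] == "Status":
            continue
        counts[fields[0]] = int(fields[1])
    return counts


def fraction(counts, status):
    """Return the share of all fragments that fall into the given status."""
    total = sum(counts.values())
    return counts.get(status, 0) / total if total else 0.0


# Values copied from the sample1 summary above (zero-count rows omitted).
SAMPLE1 = """
Status sample1.bam
Assigned 12019290
Unassigned_MultiMapping 7794471
Unassigned_NoFeatures 16908358
Unassigned_Duplicate 68111548
"""

if __name__ == "__main__":
    counts = parse_summary(SAMPLE1)
    print(f"duplicate fraction: {fraction(counts, 'Unassigned_Duplicate'):.2%}")
```

A sample whose duplicate share is exactly zero, next to one where it is the largest category, hints at a difference in how the BAM files were prepared rather than in the libraries themselves.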
You were right. After running the MarkDuplicates tool from Picard and rerunning featureCounts, the duplicates were "recognized" in the fragment counting. However, I am not aware of having used different settings for the alignment, so it is very odd that some of the samples were flagged and others weren't.
Anyway, thanks for your help.
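For anyone hitting the same thing: --ignoreDup does not detect duplicates itself; it relies on the duplicate bit (0x400) in the SAM/BAM FLAG field, which tools such as MarkDuplicates set. A quick way to check whether a file has been marked at all is to count records carrying that bit. The sketch below is an illustration only; `count_duplicates` is a made-up helper, and it assumes SAM text as input (for example, piped from `samtools view`).

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: read is a PCR or optical duplicate


def count_duplicates(sam_lines):
    """Return (total_records, duplicate_records) for an iterable of SAM lines."""
    total = 0
    duplicates = 0
    for line in sam_lines:
        # Header lines start with '@' and carry no alignment records.
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        total += 1
        # FLAG is the second column; test the duplicate bit.
        if int(fields[1]) & DUPLICATE_FLAG:
            duplicates += 1
    return total, duplicates


# Possible usage, assuming this is saved as a script that reads standard input:
#   import sys
#   print(count_duplicates(sys.stdin))
# and run as:  samtools view sample2.bam | python script.py
```

If the second number is 0 for a file, no read in it has been flagged as a duplicate, so featureCounts has nothing to ignore and will report Unassigned_Duplicate as 0.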