:: I cross-posted this question on the Bioconductor support forum (after posting here) because the featureCounts website suggests that forum would be best for getting help with the featureCounts software ::
I am using featureCounts (version 1.5.2) to count the number of reads within bins along the genome:
featureCounts -R -F SAF --fracOverlap 1 -Q 1 --primary -T 20 -a Bins.saf -o readCounts.txt Input.bam
A small percentage of my reads are not being counted:
Status Input.bam
Assigned 25677313
Unassigned_Ambiguity 0
Unassigned_MultiMapping 0
Unassigned_NoFeatures 344088
Unassigned_Unmapped 0
Unassigned_MappingQuality 0
Unassigned_FragmentLength 0
Unassigned_Chimera 0
Unassigned_Secondary 0
Unassigned_Nonjunction 0
Unassigned_Duplicate 0
However, when I remove the --fracOverlap command, 100% of my reads are assigned. When I view the location of the unassigned reads in IGV alongside my bins, they are mapped completely within the bins. Therefore shouldn't they be counted?
Here is the SAM output of some of the unassigned reads:
HISEQ:132:C87JAANXX:8:2102:21334:46258 1040 chr1 64313502 60 47M3S * 0 0 GCCTGTAGAGCATTTTCTTAAATAGTGATTGATGGGGGAGAGCCCAGANA DGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGBGE<<3#3 MD:Z:47 PG:Z:MarkDuplicates RG:Z:C87JAANXX.8 NM:i:0 AS:i:47 XS:i:32
HISEQ:287:C8785ANXX:2:2108:10903:32326 16 chr1 64348161 60 46M4S * 0 0 TAATATATCAATCTTGTTTTTCATAGGTAAACAGGGTGAAAGATTTACTC GGGEGGGFGGGGGBEGBGGGEEG>GGGGGGGGEGGGGGGGGGGGGCCCCC MD:Z:46 PG:Z:MarkDuplicates RG:Z:C8785ANXX.2 NM:i:0 AS:i:46 XS:i:19
HISEQ:132:C87JAANXX:8:2306:5138:50069 0 chr1 64416635 60 48M2S * 0 0 AGTCACAAGTGCACAGGTTCCAGAGGTCAGGCCTTGAGCAGTTTGGGGCT CBBBBFGGGGGFGGGGGGGGGEG1FGGGGGG1=FGGEGGGGGGGGGGFGG MD:Z:31A16 PG:Z:MarkDuplicates RG:Z:C87JAANXX.8 NM:i:1 AS:i:43 XS:i:19
HISEQ:132:C87JAANXX:8:1310:2745:62463 1040 chr1 64441532 60 4S46M * 0 0 AGTTCTCTCAAAAACAACATCTGTAATGTTGTTAGGACCTGGGTCCTAAC :11:E::=/0=1E<1=1111>F:E1=0<1E=1@GGGF0>GBGFGGBBBBB MD:Z:46 PG:Z:MarkDuplicates RG:Z:C87JAANXX.8 NM:i:0 AS:i:46 XS:i:0
HISEQ:287:C8785ANXX:1:2301:12294:65428 1024 chr1 64540417 14 2S48M * 0 0 GTTGTGTGTGTGTGTGTGGGTGTGTGTGTGTGTGTTTGTGTTATATGTGT BABABGGGGGGGGGGGGG/=CCGGGGGGGFGGG>FGGGGF?==F:FGGGG MD:Z:16T31 PG:Z:MarkDuplicates RG:Z:C8785ANXX.1 NM:i:1 AS:i:43 XS:i:39
HISEQ:287:C8785ANXX:5:2304:7860:85658 0 chr1 64540417 18 2S48M * 0 0 GTTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTTGTGTTATATGTGT BCCBBGGGGEGFGGGGGGGGFDGGGGGGGGGGGGGBGGGGGGGD@GGGFG MD:Z:48 PG:Z:MarkDuplicates RG:Z:C8785ANXX.5 NM:i:0 AS:i:48 XS:i:43
HISEQ:287:C8785ANXX:5:2101:11743:10794 16 chr1 64575425 60 3M1I46M * 0 0 TGATTTTTTTTTTTCACATATGGAGATTGTATGGTGAGGAAAACCCAAAC CEFEE<BGGGGEEGEGEGGGGGGGGGEGGGGGGGGGGGGGGGGGGCCCCC MD:Z:49 PG:Z:MarkDuplicates RG:Z:C8785ANXX.5 NM:i:1 AS:i:46 XS:i:21
And here is the flagstat output from the BAM file:
26021401 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
8931396 + 0 duplicates
26021401 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
What kind of annotation are you using that would give you 100% overlap with genes?!?! Your original 80% (I have no clue why one would ever want to use
--fracOverlap
) was more in the range of reasonable. Are you creating fake intergenic genes or something, because otherwise something weird went on.I'm analysing sequencing data from DNA which has been digested at a specific sequence motif. The reads should all align within the windows created by this digestion. I forgot to mention I actually filtered my BAM file previously to feature counting using BEDtools. Where I removed any reads not 100% covered by a fragment.