Question

Reads not assigned to features in featureCounts software

0

Entering edit mode

8.0 years ago

James Ashmore ★ 3.5k

:: I cross-posted this question on the Bioconductor support forum (after posting here) because the featureCounts website suggests that forum would be best for getting help with the featureCounts software ::

I am using featureCounts (version 1.5.2) to count the number of reads within bins along the genome:

featureCounts -R -F SAF --fracOverlap 1 -Q 1 --primary -T 20 -a Bins.saf -o readCounts.txt Input.bam

A small percentage of my reads are not being counted:

Status  Input.bam
Assigned    25677313
Unassigned_Ambiguity    0
Unassigned_MultiMapping 0
Unassigned_NoFeatures   344088
Unassigned_Unmapped 0
Unassigned_MappingQuality   0
Unassigned_FragmentLength   0
Unassigned_Chimera  0
Unassigned_Secondary    0
Unassigned_Nonjunction  0
Unassigned_Duplicate    0

However, when I remove the --fracOverlap command, 100% of my reads are assigned. When I view the location of the unassigned reads in IGV alongside my bins, they are mapped completely within the bins. Therefore shouldn't they be counted?

enter image description here

Here is the SAM output of some of the unassigned reads:

HISEQ:132:C87JAANXX:8:2102:21334:46258  1040    chr1    64313502    60  47M3S   *   0   0   GCCTGTAGAGCATTTTCTTAAATAGTGATTGATGGGGGAGAGCCCAGANA  DGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGBGE<<3#3  MD:Z:47 PG:Z:MarkDuplicates RG:Z:C87JAANXX.8    NM:i:0  AS:i:47 XS:i:32
HISEQ:287:C8785ANXX:2:2108:10903:32326  16  chr1    64348161    60  46M4S   *   0   0   TAATATATCAATCTTGTTTTTCATAGGTAAACAGGGTGAAAGATTTACTC  GGGEGGGFGGGGGBEGBGGGEEG>GGGGGGGGEGGGGGGGGGGGGCCCCC  MD:Z:46 PG:Z:MarkDuplicates RG:Z:C8785ANXX.2    NM:i:0  AS:i:46 XS:i:19
HISEQ:132:C87JAANXX:8:2306:5138:50069   0   chr1    64416635    60  48M2S   *   0   0   AGTCACAAGTGCACAGGTTCCAGAGGTCAGGCCTTGAGCAGTTTGGGGCT  CBBBBFGGGGGFGGGGGGGGGEG1FGGGGGG1=FGGEGGGGGGGGGGFGG  MD:Z:31A16  PG:Z:MarkDuplicates RG:Z:C87JAANXX.8    NM:i:1  AS:i:43 XS:i:19
HISEQ:132:C87JAANXX:8:1310:2745:62463   1040    chr1    64441532    60  4S46M   *   0   0   AGTTCTCTCAAAAACAACATCTGTAATGTTGTTAGGACCTGGGTCCTAAC  :11:E::=/0=1E<1=1111>F:E1=0<1E=1@GGGF0>GBGFGGBBBBB  MD:Z:46 PG:Z:MarkDuplicates RG:Z:C87JAANXX.8    NM:i:0  AS:i:46 XS:i:0
HISEQ:287:C8785ANXX:1:2301:12294:65428  1024    chr1    64540417    14  2S48M   *   0   0   GTTGTGTGTGTGTGTGTGGGTGTGTGTGTGTGTGTTTGTGTTATATGTGT  BABABGGGGGGGGGGGGG/=CCGGGGGGGFGGG>FGGGGF?==F:FGGGG  MD:Z:16T31  PG:Z:MarkDuplicates RG:Z:C8785ANXX.1    NM:i:1  AS:i:43 XS:i:39
HISEQ:287:C8785ANXX:5:2304:7860:85658   0   chr1    64540417    18  2S48M   *   0   0   GTTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTTGTGTTATATGTGT  BCCBBGGGGEGFGGGGGGGGFDGGGGGGGGGGGGGBGGGGGGGD@GGGFG  MD:Z:48 PG:Z:MarkDuplicates RG:Z:C8785ANXX.5    NM:i:0  AS:i:48 XS:i:43
HISEQ:287:C8785ANXX:5:2101:11743:10794  16  chr1    64575425    60  3M1I46M *   0   0   TGATTTTTTTTTTTCACATATGGAGATTGTATGGTGAGGAAAACCCAAAC  CEFEE<BGGGGEEGEGEGGGGGGGGGEGGGGGGGGGGGGGGGGGGCCCCC  MD:Z:49 PG:Z:MarkDuplicates RG:Z:C8785ANXX.5    NM:i:1  AS:i:46 XS:i:21

And here is the flagstat output from the BAM file:

26021401 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
8931396 + 0 duplicates
26021401 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

featurecounts subread • 4.2k views

ADD COMMENT • link 8.0 years ago by James Ashmore ★ 3.5k

1

Entering edit mode

What kind of annotation are you using that would give you 100% overlap with genes?!?! Your original 80% (I have no clue why one would ever want to use --fracOverlap) was more in the range of reasonable. Are you creating fake intergenic genes or something, because otherwise something weird went on.

ADD REPLY • link 8.0 years ago by Devon Ryan 105k

0

Entering edit mode

I'm analysing sequencing data from DNA which has been digested at a specific sequence motif. The reads should all align within the windows created by this digestion. I forgot to mention I actually filtered my BAM file previously to feature counting using BEDtools. Where I removed any reads not 100% covered by a fragment.

ADD REPLY • link 8.0 years ago by James Ashmore ★ 3.5k