Hello fellow Biostars,
I recently stumbled across the phenomenon that FeatureCounts fails to count fragments instead of reads, despite the command option -p. This only seems to happen when the input BAM file starts with several multialigning reads and contains the tag NH:i:10. The problem then disappears when I remove reads from the first 50-100 lines, if they contain the beforementioned tag.
Do you know why FeatureCounts switches from fragment to the default read count if the beginning of a file has too many reads that align multiple times, e.g. 10x? Does the programme perform some sort of initial checkup?
(I assume that the library preparation had something to do with the abundancy of multialigning reads - I work with RNAseq data from a global RNA library preparation protocol (Illumina Ribo Zero Gold).)
Looking forward to your enlightening answers! :-)
Are the mate reads of these multimappers still in the BAM file?
Yes, I only filtered for "listed first in the file", something like this:
The BAM header length is 3319, that's where the record numbers in the AWK command come from; the NH-tag is usually listed in column 22 or 23. I basically checked between lines 3319 and 3369 and left the mates untouched.