Hi,
I'm using filterBam (Rsamtools 1.17.36, R 3.1.0) to perform some custom filtering with a BAM file containing single end reads aligned with Bowtie 2 in local mode (soft clipping), and allowing a certain number of multiple alignments (multihits), so that for the same read ID (qname) I'm having multiple alignment locations, map qualities, etc, up to my specified threshold.
I have groups of genomic regions composed of certain repeated features along the genome and I'm interested in completely removing reads cross-aligning to different groups, plus keeping the 'best' possible alignment for those aligning to multiple locations within a group. I found the custom filtering option of filterBam very attractive for such a purpose.
I first tested manually my custom filtering function, which returns a vector of logicals as long as the original number of records, telling which ones to keep in the ouput. The selection works as it should, and the returned records are the ones I want to keep.
However when used within the filterBam function with the FilterRules option, after double checking I noticed I was getting more reads than I should in the output bam. Later with some pretty printing just before the function return I discovered the custom filtering function actually seems to be called twice per filtering call, first time returning the correct number of records to keep, then the second one returning a number of additional reads taken from I don't know where, but clearly not passing the filtering conditions.
EDIT: I could work around this this by filtering reads parsing them from the SAM file and converting to BAM later, but filterBam is much faster and actually a very handy function.
Thanks in advanced for your help and time !