Hello!
I'm working with PE ATAC-seq data and have been using diffBind to generate consensus peaksets with sizes of 401bp centered around the summits. I've noticed a discrepancy when I compare the counts that diffBind reports and counts that I compute using pysam directly on the bams. Here are the parameters that I used for diffBind (v3.0.13):
counts <- dba.count(samples, minOverlap=0.1, score=DBA_SCORE_READS, bRemoveDuplicates=TRUE, bParallel=TRUE)
And for pysam, I use the following filters to remove: unmapped reads, fail QC, is secondary, is unmappped, and is not a proper pair. I only count reads that are fully contained within the 401bp region.
DiffBind consistently reports fewer reads than what I find with my own method. In most cases, the correlations between the reported counts are quite high but not always. Does anyone know what could be causing this difference?
Thanks!
edit: I'm referring to raw counts in case that wasn't clear.
I'm using v3.0.13. I have been filtering duplicates.