Hi,
I am comparing diffbind and featurecounts based quantification for ATAC-Seq data I quantified first with ATAC-Seq:
First I quantified the reads in intervals using two approached within diffbind as follows:
Way 1
dba_obj <- dba(sampleSheet=diffbind_atacseq_samplesheet, scoreCol=5, minOverlap=1)
count_dba_obj <- dba.count(dba_obj, minOverlap=1, summits=0, score = DBA_SCORE_READS)
Way 2
consensus.peaks <- dba.peakset(dba_obj, bRetrieve=TRUE, minOverlap=1)
count_consensus_obj <- dba.count(dba_obj, peaks=consensus.peaks, score=DBA_SCORE_READS, minOverlap=1, summits=0)
Both gave same count matrix (after obtained counts dataframe with dba.peakset). The peak coordinates are exactly the same.
But since the counts were too low in intervals as compared to one I visualized
I checked using featurecounts by manually creating consensus peaks ( concatenating all peak files and merging by bedtools merge). The reads were high in number.
The peak coordinates were same in diffbind and featurecounts for quantification purpose. for example,
from diffbind
CHR START END DE_r1 DE_r2 hESC_r1 hESC_r2
1 chr1 9998 10599 6 12 8 14
from featurecounts
DE_r1 DE_r2 hESC_r1 hESC_r2
chr1_9998_10599 248 265 191 339
Even counting reads in this region using bedtools agrees with featurecounts.
I am wondering why diffbind and featurecounts are giving different results. featurecounts make sense and seems correct when visualized on IGV but diffbind No. Is there any parameter I am doing wrong in diffbind ? Or some logic I'm misising ?
Please help.
Thank you
Ankit
Wonder if 1 is counting summits and the other is counting the entire broad region.
True in case of featurecounts and same I want to get via diffbind but it's not giving.
I was trying to get the same results with diffbind. I tried to do it with summit = 0, but as per dba.count rdrr.io (the value of summits is TRUE (or 0), the summits will be calculated but the peaksets will be unaffected). So it is not the right parameter to observe the same count matrix as featurecounts. It seems to give the full length coordinate of the peak (or consensu peak) but still quantify across summit and I don't know how many bases upstream and downstream(summit = 0). So If anyone know how to quantify on a full regions (not summit) just like featurecounts, it will be very helpful.
Moreover, if anyone has argument why I should not quantify over a full peak coordinate and just stick to summit based quantification for ATAC-Seq data, I would be happy to welcome the suggestions. Let's say summit=75 (as discussed here https://support.bioconductor.org/p/9150460/#9150477).
Thank you
What happens if you use summits = FALSE, i.e. count over the new footprint of the overlapping regions?