Question

Why counting by diffbind and featurecounts differ?

0

Entering edit mode

7 months ago

Ankit ▴ 500

Hi,

I am comparing diffbind and featurecounts based quantification for ATAC-Seq data I quantified first with ATAC-Seq:

First I quantified the reads in intervals using two approached within diffbind as follows:

Way 1

dba_obj <-  dba(sampleSheet=diffbind_atacseq_samplesheet,  scoreCol=5, minOverlap=1)
count_dba_obj <- dba.count(dba_obj, minOverlap=1, summits=0, score = DBA_SCORE_READS)

Way 2

consensus.peaks <- dba.peakset(dba_obj, bRetrieve=TRUE, minOverlap=1)
count_consensus_obj <- dba.count(dba_obj, peaks=consensus.peaks, score=DBA_SCORE_READS, minOverlap=1, summits=0)

Both gave same count matrix (after obtained counts dataframe with dba.peakset). The peak coordinates are exactly the same.

But since the counts were too low in intervals as compared to one I visualized

I checked using featurecounts by manually creating consensus peaks ( concatenating all peak files and merging by bedtools merge). The reads were high in number.

The peak coordinates were same in diffbind and featurecounts for quantification purpose. for example,

from diffbind

   CHR START   END DE_r1 DE_r2 hESC_r1 hESC_r2 
1 chr1  9998 10599     6    12       8      14

from featurecounts

                   DE_r1   DE_r2   hESC_r1   hESC_r2
chr1_9998_10599     248     265     191      339

Even counting reads in this region using bedtools agrees with featurecounts.

I am wondering why diffbind and featurecounts are giving different results. featurecounts make sense and seems correct when visualized on IGV but diffbind No. Is there any parameter I am doing wrong in diffbind ? Or some logic I'm misising ?

Please help.

Thank you

Ankit

featurecounts diffbind • 540 views

ADD COMMENT • link updated 7 weeks ago by Ian 6.1k • written 7 months ago by Ankit ▴ 500

0

Entering edit mode

Wonder if 1 is counting summits and the other is counting the entire broad region.

ADD REPLY • link 7 months ago by Yogi ▴ 70

0

Entering edit mode

True in case of featurecounts and same I want to get via diffbind but it's not giving.

I was trying to get the same results with diffbind. I tried to do it with summit = 0, but as per dba.count rdrr.io (the value of summits is TRUE (or 0), the summits will be calculated but the peaksets will be unaffected). So it is not the right parameter to observe the same count matrix as featurecounts. It seems to give the full length coordinate of the peak (or consensu peak) but still quantify across summit and I don't know how many bases upstream and downstream(summit = 0). So If anyone know how to quantify on a full regions (not summit) just like featurecounts, it will be very helpful.

Moreover, if anyone has argument why I should not quantify over a full peak coordinate and just stick to summit based quantification for ATAC-Seq data, I would be happy to welcome the suggestions. Let's say summit=75 (as discussed here https://support.bioconductor.org/p/9150460/#9150477).

Thank you

ADD REPLY • link 7 months ago by Ankit ▴ 500

0

Entering edit mode

What happens if you use summits = FALSE, i.e. count over the new footprint of the overlapping regions?

ADD REPLY • link 7 weeks ago by Ian 6.1k