dba.count reduces binding matrix, why?
3.2 years ago
halffedelf ▴ 40
I create a DBA object from my sample sheet:

db <- dba(samplesheet)

This gives me:

db
9 Samples, 138044 sites in matrix (193226 total)

Then I run db.cnt <- dba.count(db), which gives:

db.cnt
9 Samples, 55893 sites in matrix

Why is dba.count removing more than half of the intervals?

diffbind3 bioconductor diffbind atac-seq chip-seq • 1.3k views

I figured out that filter and filterFun were causing the removal of such a large number of intervals. I had the score parameter set to DBA_SCORE_RPKM. In that case, is the filter applied to the DBA_SCORE_RPKM values or to the raw read counts? The defaults are filter = 5 and filterFun = max, so are peaks kept if they reach 5 reads, or 5 RPKM (since my score = DBA_SCORE_RPKM), in at least one sample?
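To make the "at least one sample" reading concrete, here is a minimal sketch in base R of a max-based row filter. It does not call DiffBind: the helper keep_rows and the toy count matrix are made up for illustration, and I am assuming that an interval is kept when the summary value reaches the threshold.

```r
# Toy count matrix: 4 intervals (rows) x 3 samples (columns). Values are made up.
counts <- matrix(
  c( 2,  3,  1,
     6,  0,  2,
    40, 55, 38,
     4,  4,  4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("peak", 1:4), paste0("s", 1:3))
)

# Hypothetical helper, not part of DiffBind: keep a row when the summary
# statistic across samples (default: max) reaches the threshold.
keep_rows <- function(scores, threshold, fun = max) {
  apply(scores, 1, fun) >= threshold
}

kept_raw <- keep_rows(counts, threshold = 5)
print(kept_raw)       # peak1 FALSE, peak2 TRUE, peak3 TRUE, peak4 FALSE
print(sum(kept_raw))  # 2 intervals survive
```

With fun = max, a single sample above the threshold is enough to keep the interval (peak2), while an interval that is uniformly just below it (peak4) is dropped.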


I removed score=DBA_SCORE_RPKM and now 98% of the intervals are retained. I think this is important to mention in the documentation: filter = 5 behaves very differently on raw read counts than on RPKM. RPKM > 5 is a very stringent filter, whereas raw reads > 5 is not.
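A rough back-of-the-envelope check of how far apart the two thresholds are, using the standard RPKM definition (reads per kilobase of interval per million mapped reads). The interval width (500 bp) and library size (20 million reads) are assumed numbers, not taken from my data, and the two helper functions are hypothetical, not DiffBind functions.

```r
# RPKM = reads / (interval width in kb * library size in millions)
rpkm <- function(reads, width_bp, lib_size) {
  reads / ((width_bp / 1e3) * (lib_size / 1e6))
}

# Inverse: how many reads an interval needs to reach a target RPKM.
reads_for_rpkm <- function(target, width_bp, lib_size) {
  target * (width_bp / 1e3) * (lib_size / 1e6)
}

width_bp <- 500    # assumed interval width
lib_size <- 20e6   # assumed number of mapped reads in the sample

rpkm_of_5_reads  <- rpkm(5, width_bp, lib_size)            # 0.5
reads_for_rpkm_5 <- reads_for_rpkm(5, width_bp, lib_size)  # 50
reads_for_rpkm_1 <- reads_for_rpkm(1, width_bp, lib_size)  # 10

print(c(rpkm_of_5_reads, reads_for_rpkm_5, reads_for_rpkm_1))
```

Under these assumptions, an interval with 5 reads has an RPKM of only 0.5, and reaching RPKM 5 takes 50 reads, ten times more than a raw count threshold of 5. Deeper libraries or wider intervals push that number up further.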

3.1 years ago
Rory Stark ★ 2.1k

Which version of DiffBind is this?

The current default should be max RPKM of 1.
