3.2 years ago
halffedelf
db <- dba(sampleSheet = samplesheet)
This gives me:
db
9 Samples, 138044 sites in matrix (193226 total)
Then I run db.cnt <- dba.count(db), which gives:
db.cnt
9 Samples, 55893 sites in matrix
Why is dba.count removing more than half of the intervals?
I suspected that filter and filterFun were causing so many intervals to be removed. I had the score parameter set to DBA_SCORE_RPKM, so my question is whether the filter is applied to the RPKM values or to the raw read counts. By default, filter = 5 and filterFun = max: does that mean a peak is kept only if it reaches 5 raw reads in at least one sample, or 5 RPKM (since my score is DBA_SCORE_RPKM)?
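To make the "in at least 1 sample" reading concrete, here is a minimal base-R sketch of how a threshold combined with a summary function such as max could select peaks. The matrix values and the keep_peaks helper are made up for illustration and are not DiffBind code; whether DiffBind keeps a peak at exactly the threshold (>= rather than >) is also an assumption here.

```r
# Toy score matrix: rows are peaks, columns are samples (made-up numbers).
scores <- matrix(
  c(12, 3, 0,
     4, 2, 1,
     0, 0, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("peak1", "peak2", "peak3"), c("s1", "s2", "s3"))
)

# Hypothetical helper, not part of DiffBind: keep a peak when the summary
# function applied across its samples reaches the threshold.
keep_peaks <- function(mat, threshold = 5, fun = max) {
  apply(mat, 1, fun) >= threshold
}

kept <- keep_peaks(scores)
print(kept)
```

With fun = max, a peak survives as soon as a single sample reaches the threshold, so peak2 (maximum 4) is the only one dropped. The outcome depends entirely on which scores are in the matrix, which is why it matters whether they are raw counts or RPKM.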
After removing score=DBA_SCORE_RPKM, dba.count retains 98% of the intervals. I think this is important to mention in the documentation: filter = 5 behaves very differently on raw read counts than on RPKM. Requiring an RPKM of 5 is a very stringent filter, whereas requiring 5 raw reads is not.
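A rough way to see the difference in stringency is to convert an RPKM threshold back into a read count using the standard RPKM definition (reads per kilobase of interval per million mapped reads). The peak width and library size below are assumed, illustrative values rather than numbers from my data, and reads_for_rpkm is a hypothetical helper.

```r
# Reads needed in a peak to reach a given RPKM, from
# RPKM = reads / (peak_width_kb * library_size_in_millions).
reads_for_rpkm <- function(rpkm, peak_width_bp, library_size) {
  rpkm * (peak_width_bp / 1e3) * (library_size / 1e6)
}

# Assumed values: a 500 bp peak and 20 million mapped reads.
needed <- reads_for_rpkm(rpkm = 5, peak_width_bp = 500, library_size = 20e6)
print(needed)
```

Under these assumptions, an RPKM of 5 corresponds to 50 reads in the peak, ten times the raw-count threshold of 5, and the gap grows with wider peaks or deeper libraries.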