Bam files not accepted in dba.count
1
0
Entering edit mode
3.7 years ago

The script I had working 2 years ago for DiffBind no longer works. Dba.count seems to reject files it accepted before.

   sets_counted_g=dba.count(sets_g)
Computing summits...
Sample: H:/(filepath truncated)/file_sorted.bam125  
Re-centering peaks...
Indexing H:/(filepath truncated)/file_sorted.bam
[E::hts_idx_push] NO_COOR reads not in a single block at the end 2 -1
Error in FUN(X[[i]], ...) : failed to build index
  file: H:/CH_DATA/ATAC/2020_Second_postprocessing/bam/PoolCH-7_sorted.bam
In addition: Warning message:
Parallel execution unavailable: executing serially.

I thought I'd sorted this file by position, but I went ahead and re-sorted it using

samtools sort file_sorted.bam -o file_sorted_byposition.bam

The result still fails to count:

sets_counted_g=dba.count(sets_g)
Computing summits...
Sample: H:/(filepath truncated)/file_sorted_byposition.bam125 
Re-centering peaks...
Error in colnames(result)[4:ncol(result)] <- colnames(classes) : 
  replacement has length zero
In addition: Warning message:
Parallel execution unavailable: executing serially

Tagged this with both samtools and Diffbind because it looks like a samtools problem, but the script worked as-is 2 years ago for DiffBind. So maybe DiffBind is involved. Picard ValidateSamFile returned no errors on the file.

samtools diffbind dba.count bam • 3.7k views
ADD COMMENT
0
Entering edit mode

what is the output of

samtools quickcheck file_sorted_byposition.bam

is file_sorted_byposition.bam indexed with samtools index ?

ADD REPLY
0
Entering edit mode

Thanks Pierre,

Quickcheck exited without error. I did generate an index for file_sorted_byposition.bam, but DiffBind doesn't call for indexes.

ADD REPLY
0
Entering edit mode

The same error appears when a sampleSheet with only one line is used.

Reproducible from the vignette data:

library(DiffBind)
library(QuantitativeChIPseqWorkshop)

setwd(paste(system.file("extdata", package="QuantitativeChIPseqWorkshop")))

samples <- read.csv("tamoxifen.csv")

tam.peaks <- dba(sampleSheet=samples[1,])

tam.counts <- dba.count(tam.peaks)

Computing summits... 
Re-centering peaks... 
Error in colnames(result)[4:ncol(result)] <- colnames(classes) :    replacement has length zero

Is it the expected behaviour?

ADD REPLY
0
Entering edit mode

I can't say I'm surprised, as the package is meant to compare samples, so there's not much point in running it with only 1. You might try posting over on the Bioconductor support forum, as the package dev is active there. They may want to catch that error so that it's more informative.

ADD REPLY
0
Entering edit mode
3.5 years ago
Rory Stark ★ 2.1k

I can verify that this is a bug and not expected behavior. One of those edge cases in R where something that is usually a matrix gets turned into a vector when there is only one column. I'll address this as a bugfix after the Bioconductor 2.13 release. I'll update this issue here when a fix is checked in.

Regarding index files, DiffBind does expect there to be matching index files for each of the bam files. If they are missing, it will attempt to create new ones; if this fails, so will read counting.

ADD COMMENT
0
Entering edit mode

Hi Rory, Could you please comment on how to call for the matching index files in DiffBind? I could not find any information in bioconductor. Thank you.

ADD REPLY
1
Entering edit mode

I'm not positive but I believe if you have the corresponding .bai (indexed file) for each .bam in the same directory you are pointing to in your Samplesheet, DiffBind will find and use these. Also, you can use samtools index to create these files. But like Rory said, DiffBind creates these regardless.

ADD REPLY
0
Entering edit mode

Yes, for each XX.bam file DiffBind looks for a XX.bam.bai or XX.bai file. If it is missing, it will try to create one so long as the current user has permission to write into the directory.

ADD REPLY
0
Entering edit mode

Hi Rory Stark

I am experiencing the same issue with DiffBind (3.14.0). I am using a previous dataset which only 1 replicate per condition :(

dbObj_1 <- dba(sampleSheet=samples, dir=".")

# This outputs as:
# 1 Samples, 4497 sites in matrix:
#         ID Tissue Factor Condition      Treatment Replicate Caller Intervals
# 1 Drought_1   Root     ER   Drought Drought-stress         1 narrow      4497


dbObj <- dba.count(dbObj_1)


# Computing summits...
# Re-centering peaks...
# Error in colnames(result)[4:ncol(result)] <- colnames(classes) : 
#  replacement has length zero

Was a solution found? or is there a work around here? I have tried adjusting some of the parameters, but I feel like I'm missing something.

ADD REPLY

Login before adding your answer.

Traffic: 2194 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6