I'm working through csaw
for paired-end ATAC-seq data (see my other post for more background on my general workflow), and I don't think my BAM files are being properly read into R. I have used numerous files that are coordinate-sorted (also have tried name-sorted) and indexed (via either samtools
or Rsamtools
) and I keep getting these Htslib
errors and warnings upon csaw::windowCounts
or csaw::regionCounts
:
[W::bam_hdr_read] bgzf_check_EOF: No such file or directory
[E::bgzf_read] Read block operation failed with error -1 after zd of zu bytes
As the command continues, the first warning appears many times. The files ultimately end up loading, but I believe they are only partially read, as none of my macs2
peaks on chr1 show up (0 counts across all samples) and chrX and chrY do not appear to be read at all.
Example for
> standard.chr <- paste0("chr", c(1:19, "X", "Y"))
> param <- readParam(max.frag=1000, pe="both",
discard=blacklist, restrict=standard.chr)
> binned <- windowCounts(pe.bams, bin=TRUE, width=10000, param=param)
> binned@rowRanges@seqnames
factor-Rle of length 211698 with 19 runs
Lengths: 4 12711 11876 11694 11717 12151 10069 9504 9176 8749 5822 17839 15645 15223 14806 14637 14194 12570 3311
values : chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9
Levels(21): chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 ... chr3 chr4 chr5 chr6 chr7 chr8 chr9 chrX chrY
As you can see, no 10kb bins were called for chrX or chrY, and only 4 were called for chr1.
I'm pretty certain the BAM file is not corrupted as the headers and EOF exist, and samtools quickcheck
elicits no errors.
Example of header via samtools view -H file.sorted.bam
@HD VN:1.5 SO:coordinate
@SQ SN:chr1 LN:195471971
@SQ SN:chr10 LN:130694993
@SQ SN:chr11 LN:122082543
@SQ SN:chr12 LN:120129022
@SQ SN:chr13 LN:120421639
@SQ SN:chr14 LN:124902244
@SQ SN:chr15 LN:104043685
@SQ SN:chr16 LN:98207768
@SQ SN:chr17 LN:94987271
@SQ SN:chr18 LN:90702639
@SQ SN:chr19 LN:61431566
@SQ SN:chr1_GL456210_random LN:169725
@SQ SN:chr1_GL456211_random LN:241735
@SQ SN:chr1_GL456212_random LN:153618
@SQ SN:chr1_GL456213_random LN:39340
@SQ SN:chr1_GL456221_random LN:206961
@SQ SN:chr2 LN:182113224
@SQ SN:chr3 LN:160039680
@SQ SN:chr4 LN:156508116
@SQ SN:chr4_GL456216_random LN:66673
@SQ SN:chr4_JH584292_random LN:14945
@SQ SN:chr4_GL456350_random LN:227966
@SQ SN:chr4_JH584293_random LN:207968
@SQ SN:chr4_JH584294_random LN:191905
@SQ SN:chr4_JH584295_random LN:1976
@SQ SN:chr5 LN:151834684
@SQ SN:chr5_JH584296_random LN:199368
@SQ SN:chr5_JH584297_random LN:205776
@SQ SN:chr5_JH584298_random LN:184189
@SQ SN:chr5_GL456354_random LN:195993
@SQ SN:chr5_JH584299_random LN:953012
@SQ SN:chr6 LN:149736546
@SQ SN:chr7 LN:145441459
@SQ SN:chr7_GL456219_random LN:175968
@SQ SN:chr8 LN:129401213
@SQ SN:chr9 LN:124595110
@SQ SN:chrX LN:171031299
@SQ SN:chrX_GL456233_random LN:336933
@SQ SN:chrY LN:91744698
@SQ SN:chrY_JH584300_random LN:182347
@SQ SN:chrY_JH584301_random LN:259875
@SQ SN:chrY_JH584302_random LN:155838
@SQ SN:chrY_JH584303_random LN:158099
@SQ SN:chrUn_GL456239 LN:40056
@SQ SN:chrUn_GL456367 LN:42057
@SQ SN:chrUn_GL456378 LN:31602
@SQ SN:chrUn_GL456381 LN:25871
@SQ SN:chrUn_GL456382 LN:23158
@SQ SN:chrUn_GL456383 LN:38659
@SQ SN:chrUn_GL456385 LN:35240
@SQ SN:chrUn_GL456390 LN:24668
@SQ SN:chrUn_GL456392 LN:23629
@SQ SN:chrUn_GL456393 LN:55711
@SQ SN:chrUn_GL456394 LN:24323
@SQ SN:chrUn_GL456359 LN:22974
@SQ SN:chrUn_GL456360 LN:31704
@SQ SN:chrUn_GL456396 LN:21240
@SQ SN:chrUn_GL456372 LN:28664
@SQ SN:chrUn_GL456387 LN:24685
@SQ SN:chrUn_GL456389 LN:28772
@SQ SN:chrUn_GL456370 LN:26764
@SQ SN:chrUn_GL456379 LN:72385
@SQ SN:chrUn_GL456366 LN:47073
@SQ SN:chrUn_GL456368 LN:20208
@SQ SN:chrUn_JH584304 LN:114452
@PG ID:bowtie2 PN:bowtie2 VN:2.2.6 ...
# rest of header...
And here's an example read via samtools view file.sorted.bam | head
:
NS500653:255:H2YWNAFXY:3:11506:4441:16945 99 chr1 3000222 27 75M = 3000302 15 GTCTAGGAAGTTGTCCATTTCATCCAGGTTTTCCTGGTTTTTTTTTAGTATAGCCTTTCATAGTAGAATCTGATG AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEE/EEEEEAEAEEEEAE/E<A<AAA6EEEEA MD:Z:65A9 PG:Z:MarkDuplicates XG:i:0 NM:i:1 XM:i:1 XN:i:0 XO:i:0 AS:i:-4 XS:i:-41 YS:i:-5YT:Z:CP
Any suggestions for how to fix these issues when loading BAMs into csaw
? I'm working on Windows, so maybe it would help to try this out on the Centos cluster.