Hi,
I am processing hundreds of BAM files (merging, sorting, removing duplicates), combining samtools and picard-tools. About 5% of the time I get a BAM that is missing the EOF marker: running "samtools view -c bam_file" gives the warning "[W::bam_hdr_read] EOF marker is absent. The input is probably truncated." It happens at either an intermediate or a final step, and after running either picard-tools or samtools.
I have analysed some of the "erroneous" BAMs and regenerated them. The "EOF" error goes away, but it turns out that the resulting BAM is basically the same. For example, the size changes from 201808668 bytes to 201808713 bytes (just 45 bytes more), and the number of reads is unchanged.
Has anyone experienced this "random-EOF-missingness" before? Is it something related to the machine? Can I simply ignore these errors?
Thanks, Federico
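For anyone wanting to screen a whole batch without calling samtools on each file, here is a minimal Python sketch. It only compares the last 28 bytes of each file with the empty BGZF block that the BAM specification uses as its EOF marker. The function names `has_eof_marker` and `find_truncated` and the `*.bam` glob are my own choices for illustration, not part of samtools or picard-tools.

```python
from pathlib import Path

# The 28-byte empty BGZF block that terminates a well-formed BAM file
# (byte values as listed in the SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def has_eof_marker(path):
    """Return True if the file ends with the BGZF EOF block."""
    with open(path, "rb") as handle:
        handle.seek(0, 2)              # jump to the end of the file
        size = handle.tell()
        if size < len(BGZF_EOF):
            return False               # too short to hold the marker
        handle.seek(size - len(BGZF_EOF))
        return handle.read(len(BGZF_EOF)) == BGZF_EOF


def find_truncated(directory):
    """List the *.bam files directly under `directory` that lack the marker."""
    return sorted(p for p in Path(directory).glob("*.bam")
                  if not has_eof_marker(p))
```

A file that passes this check can still be damaged elsewhere; it only tells you whether the trailing marker is present at the moment the file is read.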
Interesting. The EOF marker for a BAM file is only 28 bytes, so a difference of 45 bytes is odd. That leaves 17 bytes, but a BGZF block carries at least 26 bytes of header and footer overhead, so those 17 bytes cannot be a whole block; they must be data missing from the last block written.
The fact that it's 45 bytes every time is a pretty big giveaway that samtools is causing the issue, possibly since a recent update. If you could upload the last 1 MB of a file that works and of a file that was truncated, that would help greatly in figuring out what samtools is aborting on.
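To make the byte counting concrete, here is a rough Python sketch that takes the EOF block apart using the BGZF layout from the SAM/BAM specification: an 18-byte header, a raw-deflate payload, and an 8-byte footer. The helper name `describe_block` is just something I made up for this post, and the two file sizes are the ones quoted in the question.

```python
import struct
import zlib

# The 28-byte BGZF EOF block, as listed in the SAM/BAM specification.
EOF_BLOCK = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)

HEADER_LEN = 18  # 12-byte gzip header + 6-byte "BC" extra subfield
FOOTER_LEN = 8   # CRC32 + ISIZE, 4 bytes each


def describe_block(block):
    """Split one BGZF block into header fields, payload and footer."""
    (id1, id2, cm, flg, mtime, xfl, os_id, xlen,
     si1, si2, slen, bsize) = struct.unpack("<BBBBIBBHBBHH", block[:HEADER_LEN])
    payload = block[HEADER_LEN:-FOOTER_LEN]
    crc32, isize = struct.unpack("<II", block[-FOOTER_LEN:])
    return {
        "magic_ok": (id1, id2, si1, si2) == (0x1F, 0x8B, ord("B"), ord("C")),
        "block_size": bsize + 1,                        # BSIZE is total size minus one
        "payload_len": len(payload),
        "uncompressed": zlib.decompress(payload, -15),  # -15 selects raw deflate
        "crc32": crc32,
        "isize": isize,
    }


info = describe_block(EOF_BLOCK)
size_gap = 201808713 - 201808668      # file sizes quoted in the question
leftover = size_gap - len(EOF_BLOCK)  # bytes not explained by the EOF block

print(info["block_size"], HEADER_LEN + FOOTER_LEN, size_gap, leftover)
# prints: 28 26 45 17
```

The leftover 17 bytes are fewer than the 26 bytes of header and footer that any block needs, which is why they cannot be a separate block of their own.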
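If it helps with the upload, something like the following Python sketch would cut the last 1 MB off each file. `read_tail` and `save_tail` are hypothetical helper names, and the 1,000,000-byte default is just my reading of "1 MB".

```python
import os


def read_tail(path, n_bytes=1_000_000):
    """Return the last `n_bytes` of a file (the whole file if it is shorter)."""
    with open(path, "rb") as handle:
        size = os.fstat(handle.fileno()).st_size
        handle.seek(max(0, size - n_bytes))
        return handle.read()


def save_tail(src, dst, n_bytes=1_000_000):
    """Copy the tail of `src` into a new file `dst` and return its length."""
    tail = read_tail(src, n_bytes)
    with open(dst, "wb") as out:
        out.write(tail)
    return len(tail)
```

The cut will almost certainly fall in the middle of a BGZF block, so the resulting file is meant for byte-level comparison rather than for opening with samtools.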
This is really strange. I am re-running "samtools view -c" on the same files and... surprise! ... there are no complaints about missing EOF markers today.
The 45-byte difference was just one example; I didn't go through all the cases. Today that file is still 45 bytes smaller than the regenerated one... but now there are no complaints about a missing EOF.
Hm - are you piping the contents of a BAM/SAM to samtools? On older versions of samtools, if you pass a BAM on stdin it always gives the no-EOF error. If you know the samtools version, that would be a big help too :)
I am not piping. Samtools version 1.2 (using htslib 1.2.1). As I said, it also happens with picard-tools. But the errors disappeared today (for the same files; I haven't modified or regenerated them). I guess it was some issue with the filesystem.
Some versions of samtools will print that warning if you pipe to them from standard input, in which case it is safe to ignore. I have not seen it in other situations.