Hi,
I am using samtools version 1.7 to sort a BAM file, but I keep running into an error that I can't understand or resolve by searching online. The job completes, except that the output (sorted) BAM file's header begins with the warning: [W::bam_hdr_read] EOF marker is absent. The input is probably truncated
The only output to stderr is: [bam_sort_core] merging from 40 files and 1 in-memory blocks...
I have searched online about the issue and found conflicting opinions on whether this can be ignored, so I was hoping to get some guidance on figuring this out. Some sources suggest looking at a hexdump of the output file to find the correct EOF marker, but I couldn't work out how to do this. If this is a reasonable idea, could anyone give me some guidance on how? I am downloading the BAM file as-is from the Encode Project. I have re-downloaded the file and checked its size and header, and there are no indications of any problems with it. The first few lines of the input BAM file header are just:
@HD VN:1.4 SO:coordinate SO:coordinate
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chr3 LN:198295559
Any advice on how to figure out whether the file from Encode is truncated (or whether I'm screwing up the processing somehow), or whether this error can be ignored?
Thanks
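On the hexdump idea: a BGZF-compressed file such as a BAM is expected to end with a fixed 28-byte empty block, which is what the "EOF marker" in the warning refers to. Below is a minimal sketch of checking for it, assuming the byte sequence given in the SAM/BAM specification; the function name has_bgzf_eof and the file name sorted.bam are made up for illustration.

```python
import os

# 28-byte empty BGZF block that the SAM/BAM specification defines as the EOF marker.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 "
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        size = fh.tell()
        if size < len(BGZF_EOF):
            # Too short to contain the marker at all.
            return False
        # Read exactly the last 28 bytes and compare them to the marker.
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        return fh.read() == BGZF_EOF


# Hypothetical usage:
#   print(has_bgzf_eof("sorted.bam"))
```

If this returns False for the sorted output but True for the input, the output was probably cut short while being written (for example by a full disk or a killed job) rather than the download being at fault. samtools also has a quickcheck subcommand that performs a similar test.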
"SO:coordinate SO:coordinate" . Is this not already co-ordinate sorted? Are you trying to sort on any other basis?
I doubt the completeness of the downloaded bam though.
Could you please share the link to the file, if it's a publicly available BAM, so I can cross-check?
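The repeated SO tag quoted at the start of this reply can also be picked out programmatically. Here is a minimal sketch, assuming the header text has already been dumped (for example with samtools view -H) and that fields may be separated by tabs or spaces; the function name hd_tags is made up for illustration.

```python
def hd_tags(header_text):
    """Return the (tag, value) pairs found on the @HD line, in order.

    Returns an empty list if the header has no @HD line.
    """
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Skip the "@HD" record type and split each field at the first colon.
            fields = line.split()[1:]
            return [tuple(field.split(":", 1)) for field in fields]
    return []


def sort_orders(header_text):
    """Return every SO value declared on the @HD line (normally one)."""
    return [value for tag, value in hd_tags(header_text) if tag == "SO"]
```

For the header shown in the question, sort_orders would report "coordinate" twice, which matches the duplicated tag noted above.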
Hi, Yes, it is already coordinate sorted but I am unsure about all of the reference sequences used and wanted to ensure I could re-sort it if necessary. The file I am using specifically is: https://www.encodeproject.org/files/ENCFF592KJQ/@@download/ENCFF592KJQ.bam The experiment where the file is contained is: https://www.encodeproject.org/experiments/ENCSR537BCG/ (I need the alignments file, which is ENCFF592KJQ.bam).
Could you please let me know if you find anything regarding the completeness of these? I have already re-downloaded the file several times so I'm not sure what else to do.
Let me check using the link provided. I'll update in a few hours.
Hi, I downloaded the BAM from the link. Using samtools sort (version 1.5) on 40 threads, it completed successfully in around 4 minutes.
So could you please check the completeness of your download? The md5sum should be:
0c21b3a10f191a68be528688a90271c3 ENCFF592KJQ.bam
If it is not, you can be pretty sure that something went wrong with the download.
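If the md5sum command is not available, the same comparison can be sketched in Python with the standard hashlib module. The function name md5sum and the chunk size are illustrative choices; the expected value is the checksum quoted above.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read 1 MiB at a time so large BAM files are not loaded into memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage, assuming the download sits in the current directory:
#   expected = "0c21b3a10f191a68be528688a90271c3"
#   print(md5sum("ENCFF592KJQ.bam") == expected)
```

A matching digest means the input file is byte-for-byte identical to the one checked here, so any truncation would have to come from the sorting step rather than the download.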
I have the same md5sum for my file -- I am assuming then that this issue is with my samtools version and will try 1.5. Thanks so much for the help!
Since it says the problem is in line 1, why don't you start by showing us what the top few lines of your input file look like? Telling us the version of samtools never hurts either.
Hey, I updated the original post with this information + the (improved) results of more troubleshooting (which was really just using the most recent version of samtools, 1.7, available). Let me know if I can add anything else that would be helpful