Persistent parse and truncation errors sorting BAM files with samtools
6.4 years ago
gewa ▴ 20

Hi, I am using samtools version 1.7 to sort a BAM file, but I keep running into an error that I can't understand or resolve by searching online. The job completes, except that the output (sorted) BAM file's header begins with the warning: [W::bam_hdr_read] EOF marker is absent. The input is probably truncated

The only output to stderr is: [bam_sort_core] merging from 40 files and 1 in-memory blocks...

I have searched online about the issue and found conflicting opinions on whether this warning can be ignored, so I was hoping to get some guidance on figuring this out. Some sources suggest looking at a hexdump of the output file to find the correct EOF marker, but I couldn't work out how to do this. If this is a reasonable idea, could anyone give me some guidance on how? I am downloading the BAM file as-is from the ENCODE Project. I have re-downloaded the file and checked its size and header, and there are no indications of any problems with it. The first few lines of the input BAM file header are just:

@HD VN:1.4  SO:coordinate   SO:coordinate
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chr3 LN:198295559

Any advice on how to figure out whether the file from ENCODE is truncated (or whether I'm messing up the processing somehow), or whether this error can be ignored?

Thanks
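
For the hexdump idea mentioned in the question, here is a minimal sketch in Python rather than a definitive check. It relies on the SAM/BAM specification, which defines the EOF marker as a fixed 28-byte empty BGZF block at the very end of the file. The function name has_bgzf_eof is hypothetical, chosen only for this illustration.

```python
import os

# The 28-byte empty BGZF block that the SAM/BAM specification defines as the
# end-of-file marker. Spaces in the hex string are ignored by bytes.fromhex.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 42430200 1b00 0300 00000000 00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker.

    `has_bgzf_eof` is a hypothetical helper name, not part of samtools.
    """
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Jump to the last 28 bytes and compare them with the marker.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

Running has_bgzf_eof on both the input and the sorted output should show which file, if either, is missing the marker. If only the output lacks it, the sort itself probably did not finish writing, for example because of a full disk or a killed job (an assumption to verify, not a diagnosis). The samtools quickcheck subcommand performs a similar end-of-file test.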

software error alignment • 1.7k views

"SO:coordinate SO:coordinate": isn't this already coordinate-sorted? Are you trying to sort on some other basis?

I doubt the completeness of the downloaded BAM, though.

Could you please share the link to the file, if it's a publicly available BAM, so I can cross-check?


Hi, yes, it is already coordinate-sorted, but I am unsure about all of the reference sequences used and wanted to make sure I could re-sort it if necessary. The specific file I am using is: https://www.encodeproject.org/files/ENCFF592KJQ/@@download/ENCFF592KJQ.bam The experiment that contains the file is: https://www.encodeproject.org/experiments/ENCSR537BCG/ (I need the alignments file, which is ENCFF592KJQ.bam).
Could you please let me know if you find anything regarding the completeness of this file? I have already re-downloaded it several times, so I'm not sure what else to do.


Let me check using the link provided. I'll update in a few hours.


Hi, I downloaded the BAM from the link. Using samtools sort (version 1.5) on 40 threads, it completed successfully in around 4 minutes.

So please check the completeness of your download. The md5sum should be:

0c21b3a10f191a68be528688a90271c3 ENCFF592KJQ.bam

If it is not, you can be fairly sure that something went wrong with the download.
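
Where the md5sum command is not available, the same comparison can be sketched in Python with the standard hashlib module. This is only an illustration: md5_of_file is a hypothetical helper name, and the expected value is the checksum quoted above for ENCFF592KJQ.bam.

```python
import hashlib

# Checksum quoted in this reply for ENCFF592KJQ.bam.
EXPECTED_MD5 = "0c21b3a10f191a68be528688a90271c3"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks.

    `md5_of_file` is a hypothetical helper name used for illustration.
    Reading in chunks keeps memory use low for multi-gigabyte BAM files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With the file in the working directory, md5_of_file("ENCFF592KJQ.bam") == EXPECTED_MD5 should be True for an intact download.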


I have the same md5sum for my file, so I am assuming that the issue is with my samtools version and will try 1.5. Thanks so much for the help!


Since it says the problem is in line 1, why don't you start by showing us what the top few lines of your input file look like? Telling us the version of samtools never hurts either.


Hey, I updated the original post with this information, plus the (improved) results of more troubleshooting (which was really just using the most recent available version of samtools, 1.7). Let me know if I can add anything else that would be helpful.
