Error sorted paired-end .bam using samtools
8.0 years ago

Hi

I created some .bam files by aligning reads to the human genome using ernebs5 (http://erne.sourceforge.net/manual.php).

I had both paired-end and singleton reads, which I aligned separately. I have had no problem manipulating the singleton BAM files in samtools. However, my paired-end read files won't sort. Here is the flagstat for one:

117254947 + 616693 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
103481375 + 0 mapped (88.25% : 0.00%)
117254947 + 616693 paired in sequencing
58925941 + 9879 read1
58329006 + 606814 read2
264604 + 0 properly paired (0.23% : 0.00%)
97335658 + 0 with itself and mate mapped
6145717 + 0 singletons (5.24% : 0.00%)
2896848 + 0 with mate mapped to a different chr
1538594 + 0 with mate mapped to a different chr (mapQ>=5)

I can convert it to a .sam file, and to my (inexperienced) eye it looks fine and similar to the singleton alignments. However, when I try to sort I get an error saying that the chromosome labels are found in the binary header but not the text header. I don't understand this, or why it did not affect the singleton alignments (aligned against the same reference).

[ bam_sort_core] merging from 85 files...

[E::trans_tbl_add_sq] @SQ SN (chr1) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr10) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr11) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr11_gl000202_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr12) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr13) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr14) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr15) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr16) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr17) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr17_ctg5_hap1) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr17_gl000203_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr17_gl000204_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr17_gl000205_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr17_gl000206_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr18) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr18_gl000207_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr19) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr19_gl000208_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr19_gl000209_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr1_gl000191_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr1_gl000192_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr2) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr20) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr21) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr21_gl000210_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr22) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr3) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr4) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr4_ctg9_hap1) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr4_gl000193_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr4_gl000194_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr5) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr6) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr6_apd_hap1) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr6_cox_hap2) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr6_dbb_hap3) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr6_mann_hap4) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr6_mcf_hap5) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr6_qbl_hap6) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr6_ssto_hap7) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr7) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr7_gl000195_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr8) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr8_gl000196_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr8_gl000197_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr9) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr9_gl000198_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr9_gl000199_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr9_gl000200_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chr9_gl000201_random) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrM) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000211) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000212) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000213) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000214) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000215) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000216) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000217) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000218) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000219) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000220) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000221) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000222) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000223) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000224) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000225) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000226) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000227) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000228) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000229) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000230) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000231) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000232) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000233) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000234) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000235) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000236) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000237) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000238) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000239) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000240) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000241) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000242) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000243) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000244) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000245) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000246) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000247) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000248) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrUn_gl000249) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrX) found in binary header but not text header.
[E::trans_tbl_add_sq] @SQ SN (chrY) found in binary header but not text header.

I get this with or without the -n flag of samtools sort, e.g. samtools sort -n Sample5c_paired.bam > sorted.bam

Am I missing something very obvious about sorting paired-end .bam files in samtools? Or is there something wrong with my alignment file? Any help would be very gratefully appreciated.

Jo

Samtools ERNE paired-end BAM • 3.3k views

Thanks genomax2 and MacSpider

Sorry for not being clear. I am using samtools sort to sort my .bam files. It's weird that I don't have the same issue with the singleton reads, even though that file has the same headers. Anyway, I will try the new samtools version (and now I think I can just delete the @SQ SN headers and keep going if necessary).

Thanks!!


Hi again

I am using samtools 1.3.1, which seems to be the latest? Jo


Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

7.9 years ago
John 13k

The BAM format has two headers. One in binary, and one in text. The binary header only contains chromosome names/lengths. Sometimes these two headers come out-of-sync.

To debug, you can try using pybam:

Download pybam from https://github.com/JohnLonginotto/pybam/blob/master/pybam.py. In the terminal, go to whatever directory it downloaded to, start python (2.x), and type the following at the Python prompt:

import pybam
bam_data = pybam.bgunzip('/path/to/your.bam') # change the path :)
print bam_data.header_text                    # The text header
print bam_data.chromosome_names               # From the binary header
print bam_data.chromosome_lengths             # From the binary header

If they don't match, you'll need to reheader the BAM file, which is a very error-prone process depending on what the above result gives you (i.e. if you can just alter the text portion of the header, or if you need to write a new binary bit).
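The comparison can also be done programmatically rather than by eye. The following is only a minimal sketch, assuming the text header and the list of binary-header names have already been obtained as in the pybam session above; the helper name `missing_from_text_header` and the example values are hypothetical and not part of pybam or samtools.

```python
def missing_from_text_header(header_text, chromosome_names):
    """Return binary-header names that have no @SQ SN: entry in the text header."""
    text_names = set()
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # @SQ fields are tab-separated TAG:VALUE pairs; SN holds the name.
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                text_names.add(field[3:])
    return [name for name in chromosome_names if name not in text_names]


# Illustrative values only (made-up length); in practice pass
# bam_data.header_text and bam_data.chromosome_names instead.
example_text = "@HD\tVN:1.0\tSO:unsorted\n@SQ\tSN:chr1\tLN:1000\n"
example_names = ["chr1", "chr10", "chrM"]
print(missing_from_text_header(example_text, example_names))
```

An empty result would suggest the two headers agree on names; a result listing every chromosome would be consistent with a text header that has no @SQ lines at all.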

8.0 years ago
GenoMax 147k

Are you using the latest samtools? If not that would be the first thing to try. This error has been referenced in this thread and appears to have been fixed.

If you sorted your files with unix sort then follow @Macspider's suggestion below.


Hi I am using samtools 1.3.1

8.0 years ago

found in binary header but not text header

You are trying to sort a BAM file, which is the binary (machine-readable) form of SAM. To do so, you need to take care of the header. Take a moment to check the difference between samtools view file.bam and samtools view -h file.bam: with -h, the header appears in the output.

In these cases, the practical way to sort is samtools sort, which can order either by position or by name. bamtools also has a sort sub-command, if you prefer it to samtools.

Sorting with the ordinary Unix sort won't work on a BAM file! The header is made up of lines starting with @, and these would be sorted along with the alignment records, generating a corrupted SAM file.
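To illustrate the point about header lines, here is a small sketch in plain Python, assuming SAM text has already been produced with samtools view -h. The SAM content, the helper name `split_sam`, and the plain lexicographic name ordering are all made up for illustration; samtools sort -n uses its own read-name ordering, so this is not a substitute for it.

```python
def split_sam(sam_text):
    """Split SAM text into header lines (starting with '@') and alignment records."""
    lines = sam_text.splitlines()
    header = [line for line in lines if line.startswith("@")]
    records = [line for line in lines if not line.startswith("@")]
    return header, records


# Made-up SAM content for illustration only.
sam_text = "\n".join([
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:1000",
    "7_read\t0\tchr1\t50\t30\t4M\t*\t0\t0\tACGT\tIIII",
    "3_read\t0\tchr1\t10\t30\t4M\t*\t0\t0\tACGT\tIIII",
])

# Naive line sort: header lines are treated like any other line.
naive = sorted(sam_text.splitlines())

# Header-aware sort: keep the header on top, order only the records by name.
header, records = split_sam(sam_text)
header_aware = header + sorted(records, key=lambda rec: rec.split("\t")[0])

print(naive[0].split("\t")[0])         # first field of the first line, naive sort
print(header_aware[0].split("\t")[0])  # first field of the first line, header-aware
```

In this toy input the read names start with digits, which order before @, so the naive sort pushes the header lines below the reads, while the header-aware version leaves them at the top.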


I am using samtools sort, not Unix sort.

7.9 years ago

Hi

I am still stuck on this.

It does not happen when I sort a singleton reads file also created by ernebs5 and with identical @SQ headers. It is not due to memory constraints as I have diverted the temp files to a big enough repository and checked the usage in our HPC facility.

I'd still be grateful for any more ideas.

Jo


I suggest that you post this question to the Samtools mailing list to make them aware of the problem: https://lists.sourceforge.net/lists/listinfo/samtools-help

I will tag John Marshall, who is the official maintainer of Samtools.


Thank you genomax2, I have done as you suggested.


Hi

Just to close this loop

The problem was that Erne does not output @SQ headers. They appear to be present when I use samtools view, because samtools automatically generates basic ones. The missing @SQ headers cause samtools sort to fail, but only when the alignments are big enough to need merging (hence the singletons worked but the paired reads did not). Samtools 1.3.1 does not have this problem, but I was running two versions of samtools on my HPC, and samtools 1.3 was failing on these files.

Thanks very much for the help and the first suggestion to update samtools was right!
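For anyone who cannot switch samtools versions, one possible workaround is to write the missing @SQ lines into a text header. This is only a rough sketch, assuming the reference names and lengths are available as two parallel lists (for example from the binary header); the function name `build_sq_lines` and the values shown are hypothetical.

```python
def build_sq_lines(names, lengths):
    """Build SAM @SQ header lines (tab-separated SN and LN fields)."""
    if len(names) != len(lengths):
        raise ValueError("names and lengths must have the same number of entries")
    return ["@SQ\tSN:%s\tLN:%d" % (name, length)
            for name, length in zip(names, lengths)]


# Made-up names and lengths for illustration only.
for line in build_sq_lines(["chr1", "chrM"], [1000, 200]):
    print(line)
```

A header file assembled this way could then, in principle, be applied with samtools reheader; whether that is appropriate for a given file is worth checking with a validator before relying on the result.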


If Erne is not outputting SQ headers, that's frankly a very worrying sign. While you have managed to work around the issue by adding in SQ headers, it makes me extremely suspicious of Erne as an aligner to begin with. I hadn't heard of it until now, and if it doesn't write out spec-compliant BAM data, that's a serious issue. I'd consider using a more mainstream aligner if possible...

At the very least, run your BAM file through Picard's ValidateSamFile to make sure there aren't more surprises waiting for you :)
