Is My Bam File Sorted ?
5
50
Entering edit mode
13.8 years ago

Hi all,

I wonder if there is any tool (in samtools ?) to quickly know whether a BAM file has been sorted or not.

(longer story: I've merged a set of sorted bams using 'samtools merge' and I've then used MarkDuplicates with the option: 'assume_sorted=true'. Am I right ?)

Thanks,

Pierre

UPDATE: I wrote a post about it : http://plindenbaum.blogspot.com/2011/02/testing-if-bam-file-is-sorted-using.html

bam next-gen sequencing sort • 72k views
ADD COMMENT
1
Entering edit mode

For those just looking for a quick way to do this using modern samtools, see martin.triska's answer further down.

ADD REPLY
0
Entering edit mode

Hi Pierre, do you have similar solution to check if sam file is sorted or not? Thank you

ADD REPLY
0
Entering edit mode

just pipe your SAM into 'samtools view' before using my program.

ADD REPLY
0
Entering edit mode

got it. Thank you :)

ADD REPLY
0
Entering edit mode

Hi Pierre, You didn't give us the explicit answer yet? Do we still need sort the bam after we 'samtools merge' sorted small bams? I made a small test and find the merged sorted bam are already sorted.

ADD REPLY
40
Entering edit mode
13.8 years ago

You can use the sort order (SO) flag in the header to check if the file has been sorted:

% samtools view -H 5_110118_FC62VT6AAXX-hg18-unsort.bam
@HD    VN:1.0    SO:unsorted
% samtools view -H 5_110118_FC62VT6AAXX-hg18-sort.bam
@HD    VN:1.0    SO:coordinate

Unfortunately samtools index will work on both types of files without raising a command-line error, but will produce a smaller index file which is indicative of a problem:

% samtools index 5_110118_FC62VT6AAXX-hg18-sort.bam 
% echo $? 
0
% samtools index 5_110118_FC62VT6AAXX-hg18-unsort.bam 
% echo $?
0
% ls -lh *.bai 
chapman users 6.2M 2011-02-01 06:46 5_110118_FC62VT6AAXX-hg18-sort.bam.bai
chapman users  408 2011-02-01 06:47 5_110118_FC62VT6AAXX-hg18-unsort.bam.bai

Edit:

From the comments it appears as if samtools will sometimes raise an error in this case. Here is what an unsorted BAM file that finishes with a truncated index and no error messages looks like:

% samtools view 5_110118_FC62VT6AAXX-hg18-unsort.bam | head
HWI-EAS264:5:1:1353:936#0       77      *       0       0       *       *       0       0       NTTTCCATTCCATTATGGATGATTCCATTCCATTGTATTC        ########################################        XM:i:0
HWI-EAS264:5:1:1353:936#0       141     *       0       0       *       *       0       0       TGGAACGGAATGGAATGGAATGGAATGGAATCAATCCGAC        DDGGDEGGGGDG@BGGG=@G:EFFEEB4FF3BBB@B<EBA        XM:i:0
HWI-EAS264:5:1:1463:944#0       99      chr4    79057890        255     40M     =       79058357       507      NTAGATATGTAAATAAAGTACAGGAAAATGTGGTGAGTAT        ########################################       XA:i:1   MD:Z:0A39       NM:i:1
HWI-EAS264:5:1:1463:944#0       147     chr4    79058357        255     40M     =       79057890       -507     AAAATATATACCAGGTAGATAAAGGGTCGTTCGTATTTCA        @B2EEDEB8@ED@DE?,A:?:DGEGDD:GGA@=8@182@A       XA:i:0   MD:Z:40 NM:i:0
HWI-EAS264:5:1:1534:933#0       77      *       0       0       *       *       0       0       NACTCCATTCAATTCCATTCCTTTTAGCTCGGGTTGTTTC        ########################################        XM:i:0
HWI-EAS264:5:1:1534:933#0       141     *       0       0       *       *       0       0       AATGGAATGGAATGGAATAAATCCCGGGGGAGTGGAATGG        BD4CC=G@D@BB-A?BB-:B==DB,==*==)4&::A=>CB        XM:i:0
HWI-EAS264:5:1:1691:931#0       99      chr18   28072372        255     40M     =       28072771       439      NCTGGGTTTTTATGTCTTCCAGCTAGTAAGTGCTGGAGCT        #322244442<:22<@@C@C@@;@;646666446664536       XA:i:1   MD:Z:0G39       NM:i:1
HWI-EAS264:5:1:1691:931#0       147     chr18   28072771        255     40M     =       28072372       -439     GTTCTCCCGCCTGCACCTCTCTCTGTGTGTTGGCTTTGTT        >B@D>ADDBDD>BD-DDDEDGGFHIIFI@HIIIHHIIIIH       XA:i:0   MD:Z:40 NM:i:0
ADD COMMENT
3
Entering edit mode

STrange... we have sorted BAM files with @HD VN:1.0 SO:unsorted in the header... I need to investigate further it seems...

ADD REPLY
1
Entering edit mode

Thanks very much Heng, that did it. I now get the error '[bam_index_core] the alignment is not sorted: reads without coordinates prior to reads with coordinates.' I appreciate you looking into this.

ADD REPLY
0
Entering edit mode

The only problem is that changing that flag is program dependent and not required by the BAM spec. Picard does set the flag correctly, although samtools sort does not. I remember reading on SeqAnswers that Heng was planning on implementing this to avoid confusion; he can probably chime in if that's still a plan. Another tool to check for "bad" indexes on unsorted files is 'samtools idxstats' which might be nicer than looking at the size.

ADD REPLY
0
Entering edit mode

Which version of samtools is indexing an unsorted file? I happen to have version 0.1.8 (r613) here and it stops with a message that the alignments are not sorted if I try to index a file sorted by queryname.

ADD REPLY
0
Entering edit mode

That's with the most recent version: 0.1.12a (r862). I would prefer to get an error.

ADD REPLY
0
Entering edit mode

Same observation as keith I do get an error: [mdo041@astrakan ~]$ samtools-0.1.12a/samtools index test.fas.blat.sam.bam [bam_index_core] the alignment is not sorted (NG-1755): 2158255 > 2132773 in 1-th chr using a newly imported sam file. Brad: I think the reason for not seeing an error is that the file you are trying is actually sorted just by chance. So I don't know about the flags.

ADD REPLY
0
Entering edit mode

Try the version here: https://github.com/lh3/samtools

ADD REPLY
0
Entering edit mode

Same result for me: [bam_index_core] the alignment is not sorted

ADD REPLY
0
Entering edit mode

Thanks. I mainly ask Brad to try it on the unsorted file for which samtools does not give any error messages.

ADD REPLY
0
Entering edit mode

Michael and Keith, the file is definitely unsorted. Not sure what I'm doing different. Heng, I tried the GitHub version and got the same behavior. Edited the answer above with the first few lines of the file if that helps; it's from a paired end Bowtie alignment.

ADD REPLY
0
Entering edit mode

Hmm... Please try again. Updated a minute ago.

ADD REPLY
0
Entering edit mode

I have realized that such input will cause problems, but it seems that the fix is not correct. Please try again. Updated a minute ago.

ADD REPLY
18
Entering edit mode
7.6 years ago
martin.triska ▴ 180

If anyone arriving to this thread years later as me, you can run:

samtools stats <file.bam> | grep "is sorted:"

(I'm using the samtools 1.4, don't know when was this added to stats)

Enjoy :)

ADD COMMENT
1
Entering edit mode

If it gives back "is sorted: 1" means it is, indeed, sorted? I want to corroborate because the Samtools manual does not say anything about the meaning of this output code (in the manual entry for the stats command; I mean, something like explaining that 1 is sorted and 0 is not).

ADD REPLY
1
Entering edit mode

Samtools did check the order. In the source code:

if ( stats->tid==bam_line->core.tid && bam_line->core.pos<stats->pos )
stats->is_sorted = 0;

and

fprintf(to, "SN\tis sorted:\t%d\n", stats->is_sorted ? 1 : 0);

Link to source on github

ADD REPLY
0
Entering edit mode

From documentation:

is sorted - flag indicating whether the file is coordinate sorted (1) or not (0).

ADD REPLY
7
Entering edit mode
13.8 years ago
Michael 55k

Hi,

I think if you run samtools index, it will throw an error if the file was not sorted before, can use return code e.g. in a script. Otherwise it would do no harm to run sort again, would it?

Edit: So there seems to be some confusion here about flags and samtools index, but the safest method to make sure the file is really sorted is to always run samtools sorton the bam file. This changes nothing if the file is already sorted. Also, I have never seen samtools index work with a non-sorted file, I am refering to 0.1.12a, exept in the case that the imported sam file was already sorted from the beginning, thus this method is safe, imho.

ADD COMMENT
2
Entering edit mode
13.8 years ago

Unless you have extra information about the BAM file, it is not possible to tell without scanning the file. Extra information might be that you know that the optional sort order field in the header is present and correct, you know the complete history of the file or you have the expected checksum of the sorted data.

Otherwise, you can only be sure of the sort order by tracking the reference sequence and coordinates of the previous and current alignments while processing the file (in the case of coordinate sorting), or by pre-scanning with a program that performs this check. This is what 'samtools index' does while indexing.

It may be too expensive to pre-scan a huge BAM file to check for sortedness, so the cheapest way is to do it while you work on the data.

ADD COMMENT
1
Entering edit mode
13.8 years ago
Tim_Yates ▴ 110

I ran into this problem recently, and no it looks like there is no way of telling.

The best I could come up with, was requiring sorted files to then be indexed to generate a FILE.bai file. Any BAM files that were missing this paired file were sorted and indexed as part of the pipeline...

ADD COMMENT

Login before adding your answer.

Traffic: 1805 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6