I wonder if there is any tool (in samtools ?) to quickly know whether a BAM file has been sorted or not.
(longer story: I've merged a set of sorted bams using 'samtools merge' and I've then used MarkDuplicates with the option: 'assume_sorted=true'. Am I right ?)
Hi Pierre, You didn't give us the explicit answer yet? Do we still need sort the bam after we 'samtools merge' sorted small bams? I made a small test and find the merged sorted bam are already sorted.
Unfortunately samtools index will work on both types of files without raising a command-line error, but will produce a smaller index file which is indicative of a problem:
From the comments it appears as if samtools will sometimes raise an error in this case. Here is what an unsorted BAM file that finishes with a truncated index and no error messages looks like:
Thanks very much Heng, that did it. I now get the error '[bam_index_core] the alignment is not sorted: reads without coordinates prior to reads with coordinates.' I appreciate you looking into this.
The only problem is that changing that flag is program dependent and not required by the BAM spec. Picard does set the flag correctly, although samtools sort does not. I remember reading on SeqAnswers that Heng was planning on implementing this to avoid confusion; he can probably chime in if that's still a plan. Another tool to check for "bad" indexes on unsorted files is 'samtools idxstats' which might be nicer than looking at the size.
Which version of samtools is indexing an unsorted file? I happen to have version 0.1.8 (r613) here and it stops with a message that the alignments are not sorted if I try to index a file sorted by queryname.
Same observation as keith I do get an error:
[mdo041@astrakan ~]$ samtools-0.1.12a/samtools index test.fas.blat.sam.bam
[bam_index_core] the alignment is not sorted (NG-1755): 2158255 > 2132773 in 1-th chr
using a newly imported sam file. Brad: I think the reason for not seeing an error is that the file you are trying is actually sorted just by chance. So I don't know about the flags.
Michael and Keith, the file is definitely unsorted. Not sure what I'm doing different. Heng, I tried the GitHub version and got the same behavior. Edited the answer above with the first few lines of the file if that helps; it's from a paired end Bowtie alignment.
If it gives back "is sorted: 1" means it is, indeed, sorted? I want to corroborate because the Samtools manual does not say anything about the meaning of this output code (in the manual entry for the stats command; I mean, something like explaining that 1 is sorted and 0 is not).
I think if you run samtools index, it will throw an error if the file was not sorted before, can use return code e.g. in a script. Otherwise it would do no harm to run sort again, would it?
Edit:
So there seems to be some confusion here about flags and samtools index, but the safest method to make sure the file is really sorted is to always run samtools sorton the bam file. This changes nothing if the file is already sorted. Also, I have never seen samtools index work with a non-sorted file, I am refering to 0.1.12a, exept in the case that the imported sam file was already sorted from the beginning, thus this method is safe, imho.
Unless you have extra information about the BAM file, it is not possible to tell without scanning the file. Extra information might be that you know that the optional sort order field in the header is present and correct, you know the complete history of the file or you have the expected checksum of the sorted data.
Otherwise, you can only be sure of the sort order by tracking the reference sequence and coordinates of the previous and current alignments while processing the file (in the case of coordinate sorting), or by pre-scanning with a program that performs this check. This is what 'samtools index' does while indexing.
It may be too expensive to pre-scan a huge BAM file to check for sortedness, so the cheapest way is to do it while you work on the data.
I ran into this problem recently, and no it looks like there is no way of telling.
The best I could come up with, was requiring sorted files to then be indexed to generate a FILE.bai file. Any BAM files that were missing this paired file were sorted and indexed as part of the pipeline...
For those just looking for a quick way to do this using modern samtools, see martin.triska's answer further down.
Hi Pierre, do you have similar solution to check if sam file is sorted or not? Thank you
just pipe your SAM into 'samtools view' before using my program.
got it. Thank you :)
Hi Pierre, You didn't give us the explicit answer yet? Do we still need sort the bam after we 'samtools merge' sorted small bams? I made a small test and find the merged sorted bam are already sorted.