Question

Why is coordinate sorting required before finding read depths?

0

23 months ago

guntul ▴ 40

I have a WGS dataset, and when I try to run the sambamba depth command on it, I get the error "sambamba-depth: All files must be coordinate-sorted". What is the reason for this, and why is coordinate sorting required?

wgs sambamba • 1.1k views

updated 23 months ago by zhang yi xing ▴ 50 • written 23 months ago by guntul ▴ 40

2

Hi, you can look at this figure from the mosdepth software. By traversing a sorted BAM file from the beginning, one can obtain depth information in a single pass. This approach is faster and more memory-efficient. I believe sambamba depth works on the same principle.

23 months ago by zhang yi xing ▴ 50

0

I'm moving this to a comment, as "use a different tool with the same requirement" is not an answer to "why is it a requirement".

23 months ago by Ram 45k

score 4 · Accepted Answer · 2023-06-02

In an unsorted BAM file, reads can appear in any order. In a coordinate-sorted BAM file, reads are ordered by the position at which they map to the reference genome. With that ordering, to find the depth at a given position, the program only needs to navigate to that position (typically via the BAM index) and count the reads that overlap it. As soon as it encounters a read that starts after that position, it can stop looking, because every later read starts further downstream still.

In an unsorted BAM, the algorithm will need to look at every single read in the entire file before it's sure that all reads aligned to the position of interest are accounted for.
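To make the difference concrete, here is a minimal sketch in plain Python. It is not how sambamba is implemented. It assumes reads on a single contig are represented as hypothetical (start, end) tuples with 0-based, half-open coordinates, and it counts how many reads each approach has to inspect. Without an index the sorted version still starts from the top of the list, but it can stop early; a real tool would use the BAM index to skip ahead as well.

```python
from typing import Iterable, Tuple

# Hypothetical stand-in for an alignment: (start, end), 0-based, half-open.
Read = Tuple[int, int]

def depth_unsorted(reads: Iterable[Read], pos: int) -> Tuple[int, int]:
    """Depth at pos when reads are in arbitrary order.

    Every read must be inspected, because any later read might overlap pos.
    Returns (depth, number_of_reads_inspected).
    """
    depth = 0
    inspected = 0
    for start, end in reads:
        inspected += 1
        if start <= pos < end:
            depth += 1
    return depth, inspected

def depth_sorted(reads: Iterable[Read], pos: int) -> Tuple[int, int]:
    """Depth at pos when reads are sorted by start coordinate.

    Once a read starts after pos, no later read can overlap pos, so stop.
    Returns (depth, number_of_reads_inspected).
    """
    depth = 0
    inspected = 0
    for start, end in reads:
        inspected += 1
        if start > pos:
            break  # all remaining reads start even further downstream
        if pos < end:
            depth += 1
    return depth, inspected

if __name__ == "__main__":
    sorted_reads = [(0, 5), (2, 7), (4, 9), (10, 15), (12, 18)]
    shuffled_reads = [(12, 18), (2, 7), (10, 15), (0, 5), (4, 9)]
    print(depth_sorted(sorted_reads, 4))      # stops at the first read starting after 4
    print(depth_unsorted(shuffled_reads, 4))  # has to inspect every read
```

Both functions return the same depth; only the amount of work differs, and that gap grows with the size of the file.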

If your input has 500 positions, the sorted approach means going through the file once, jumping to each of the 500 positions. The unsorted approach means going through the file 500 times, which is extremely inefficient.
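The single-pass idea can be sketched as a sweep over the sorted reads. This is an illustrative sketch only, not sambamba's or mosdepth's actual code. It assumes reads are hypothetical (start, end) tuples (0-based, half-open, one contig) sorted by start, and that the query positions are also sorted. A min-heap of read end coordinates tracks which reads are still "open" at the current position.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Read = Tuple[int, int]  # hypothetical (start, end), 0-based, half-open

def depths_single_pass(sorted_reads: Iterable[Read],
                       sorted_positions: Iterable[int]) -> Dict[int, int]:
    """Compute depth at many positions in one pass over start-sorted reads."""
    active_ends: List[int] = []   # min-heap of end coordinates of open reads
    reads = iter(sorted_reads)
    pending = next(reads, None)   # next read not yet admitted
    depths: Dict[int, int] = {}

    for pos in sorted_positions:
        # Admit every read that starts at or before this position.
        while pending is not None and pending[0] <= pos:
            heapq.heappush(active_ends, pending[1])
            pending = next(reads, None)
        # Discard reads that end at or before this position (half-open ends).
        while active_ends and active_ends[0] <= pos:
            heapq.heappop(active_ends)
        depths[pos] = len(active_ends)

    return depths

if __name__ == "__main__":
    reads = [(0, 5), (2, 7), (4, 9), (10, 15), (12, 18)]
    print(depths_single_pass(reads, [1, 4, 8, 11, 13, 20]))
```

Each read is pushed and popped at most once, so the work is proportional to the number of reads plus the number of positions, and memory is bounded by the number of reads overlapping any one position. None of that holds if the reads arrive in arbitrary order.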