Why is a coordinate sort required before finding read depths?
17 months ago
guntul ▴ 40

I have a WGS dataset, and when I attempt to use it with the sambamba depth command, it gives the error "sambamba-depth: All files must be coordinate-sorted". What is the reason for this, and why is coordinate sorting required?

Hi, you can look at the figure illustrating how the mosdepth software works (image not preserved). By traversing a sorted BAM file from the beginning, one can obtain depth information; this algorithm is faster and more memory-efficient. I believe sambamba depth works on the same principle.
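
A minimal Python sketch of the single-pass idea, loosely modeled on the per-chromosome array approach described for mosdepth (this is not mosdepth's or sambamba's actual code). It assumes reads arrive as hypothetical (chrom, start, end) tuples, 0-based and half-open, in coordinate order, plus a dictionary of chromosome lengths. Because sorted input delivers each chromosome as one contiguous block, only one chromosome's array is in memory at a time, and its depths can be reported as soon as the next chromosome begins. If a chromosome reappears, the sketch rejects the input, much as sambamba refuses unsorted files.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical input record: (chrom, start, end), 0-based and half-open,
# listed in coordinate order as they would come out of a sorted BAM.
Read = Tuple[str, int, int]
Run = Tuple[str, int, int, int]  # (chrom, start, end, depth)


def depth_runs(reads: Iterable[Read], chrom_lengths: Dict[str, int]) -> Iterator[Run]:
    """Yield runs of constant depth in a single pass over coordinate-sorted reads."""
    seen = set()
    for chrom, chrom_reads in groupby(reads, key=lambda r: r[0]):
        # With sorted input each chromosome appears as one contiguous block.
        if chrom in seen:
            raise ValueError(f"{chrom} appears twice: input is not coordinate-sorted")
        seen.add(chrom)

        # Only this chromosome's difference array is held in memory.
        length = chrom_lengths[chrom]
        diff = [0] * (length + 1)
        for _, start, end in chrom_reads:
            diff[start] += 1  # depth goes up where a read starts
            diff[end] -= 1    # and back down just past where it ends

        # The running sum of the differences is the per-base depth;
        # report it as runs of equal depth instead of one line per base.
        depth = run_start = run_depth = 0
        for pos in range(length):
            depth += diff[pos]
            if depth != run_depth:
                if pos > run_start:
                    yield chrom, run_start, pos, run_depth
                run_start, run_depth = pos, depth
        yield chrom, run_start, length, run_depth
```

For example, reads ("chr1", 0, 4) and ("chr1", 2, 6) on an 8 bp chr1 give the runs (0, 2, 1), (2, 4, 2), (4, 6, 1) and (6, 8, 0).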

I've moved this to a comment, as "use a different tool with the same requirement" is not an answer to "why is it a requirement".

17 months ago
Ram 44k

In an unsorted BAM file, reads can be in any order. In a coordinate-sorted BAM file, reads are in the order in which they map to the reference genome (by chromosome, then by start position). When they're sorted that way, to find the depth at a certain position, the program only needs to navigate to that position (using the BAM index) and count all reads that overlap it. As soon as it finds a read that starts after that position, the algorithm can stop looking, because no later read in the file can overlap it.
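
To make the "navigate, then stop" lookup concrete, here is a minimal Python sketch for a single chromosome. It assumes reads are hypothetical (start, end) tuples, 0-based and half-open, sorted by start, and that no read is longer than an assumed max_read_len. Binary search (bisect) stands in for the seek; real tools rely on the BAM index (.bai/.csi) rather than a read-length bound. Only reads starting in the window (pos - max_read_len, pos] are examined, and the scan ends at the first read starting after pos.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def depth_at(sorted_reads: List[Tuple[int, int]], starts: List[int],
             pos: int, max_read_len: int) -> int:
    """Depth at 0-based `pos` for (start, end) reads sorted by start.

    `starts` holds the reads' start positions in the same order, so it can
    be binary-searched. `max_read_len` is an assumed upper bound on
    end - start; it is what lets us skip everything far upstream.
    """
    # A read covering pos must satisfy start <= pos < end, and with the
    # length bound that means start lies in (pos - max_read_len, pos].
    lo = bisect_left(starts, pos - max_read_len + 1)  # "navigate" to the window
    hi = bisect_right(starts, pos)  # first read starting after pos: stop here
    return sum(1 for _, end in sorted_reads[lo:hi] if end > pos)
```

Usage: with reads [(0, 4), (2, 6), (5, 9), (10, 14)], starts [0, 2, 5, 10] and max_read_len=4, depth_at(reads, starts, 3, 4) examines only the first two reads and returns 2.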

In an unsorted BAM, the algorithm will need to look at every single read in the entire file before it's sure that all reads aligned to the position of interest are accounted for.

If your input has 500 positions, the sorted approach means going through the file once, jumping to each of the 500 positions in turn. The unsorted approach means going through the entire file 500 times, which is extremely inefficient.
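
The difference in work can be measured with a small Python sketch that counts how many reads each strategy inspects for many query positions on one chromosome. The (start, end) read tuples and the function names are hypothetical. The unsorted strategy rescans every read for every position; the sorted strategy sorts once (standing in for a coordinate-sorted BAM), processes the queries in increasing order, and sweeps forward a single time, keeping a min-heap of the end coordinates of reads that may still overlap upcoming positions.

```python
import heapq
import random
from typing import List, Tuple

Read = Tuple[int, int]  # hypothetical (start, end), 0-based half-open, one chromosome


def depths_unsorted(reads: List[Read], positions: List[int]) -> Tuple[List[int], int]:
    """Answer each query by scanning every read; returns (depths, reads inspected)."""
    depths, inspected = [], 0
    for pos in positions:
        count = 0
        for start, end in reads:  # order is arbitrary, so no early exit is safe
            inspected += 1
            if start <= pos < end:
                count += 1
        depths.append(count)
    return depths, inspected


def depths_sorted(reads: List[Read], positions: List[int]) -> Tuple[List[int], int]:
    """Answer all queries in one forward pass over start-sorted reads."""
    reads = sorted(reads)  # stands in for a coordinate-sorted BAM
    order = sorted(range(len(positions)), key=positions.__getitem__)
    depths = [0] * len(positions)
    active_ends: List[int] = []  # min-heap of end coordinates of reads started so far
    i = inspected = 0
    for q in order:
        pos = positions[q]
        # Read forward only until the first read that starts after pos.
        while i < len(reads) and reads[i][0] <= pos:
            heapq.heappush(active_ends, reads[i][1])
            i += 1
            inspected += 1
        # Reads ending at or before pos cannot cover pos or any later query.
        while active_ends and active_ends[0] <= pos:
            heapq.heappop(active_ends)
        depths[q] = len(active_ends)
    return depths, inspected


if __name__ == "__main__":
    rng = random.Random(0)
    starts = [rng.randrange(1_000_000) for _ in range(2_000)]
    reads = [(s, s + 150) for s in starts]
    positions = [rng.randrange(1_000_000) for _ in range(500)]
    d_unsorted, n_unsorted = depths_unsorted(reads, positions)
    d_sorted, n_sorted = depths_sorted(reads, positions)
    assert d_unsorted == d_sorted
    # n_unsorted is exactly 500 * 2,000 = 1,000,000; n_sorted is at most 2,000.
    print(n_unsorted, n_sorted)
```

Sorting is a one-time O(n log n) cost, after which any number of queries can be answered in one forward pass, which is one reason depth tools ask for sorted input up front rather than rescanning the file per position.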
